Department of Systems Engineering and Engineering Management,

                    The Chinese University of Hong Kong


A Few Steps towards Robust Chat Text Processing

Dr. Yunqing Xia
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong

Date : March 24, 2006 (Friday)

Time : 4:30 p.m. - 5:30 p.m.

Venue : Room 513, William M.W. Mong Engineering Building
        (Engineering Building Complex Phase 2), CUHK

Chat text refers to the distinctive language used in network communication on
platforms such as online chat rooms and instant messaging tools (ICQ, MSN,
etc.), bulletin board systems (BBS), email systems and blogs. Chat text has
become ubiquitous, driven in particular by the rapid proliferation of Internet
applications such as online education and customer relationship management.
At the same time, it is misused in chat rooms and on BBS by those soliciting
for terrorism, pornography and crime. Together, these facts show the rising
importance of chat text understanding.

Compared with standard language, chat text is anomalous, dynamic and
source-specific, which renders conventional NLP tools inapplicable. In this
talk, a few research steps towards robust chat text processing are presented
to resolve these problems. In the first step, pattern matching and support
vector machines (SVMs) are applied to recognize anomalous chat terms, but both
approaches prove ineffective because chat terms change over time. In the
second step, an error-driven approach is proposed to recognize dynamic chat
terms by calculating the confidence that the input text is chat text. The
approach uses a standard corpus and a static chat text corpus, and performs
consistently on time-varying test sets. However, it leaves normalization,
i.e. mapping chat terms to their standard equivalents, unaddressed. In the
third step, normalization is addressed with an extended source channel model
that incorporates phonetic mappings between chat terms and standard words.
Because these phonetic mappings are static compared with the dynamic chat
terms, the approach performs robustly on time-varying test sets drawn from
different sources.
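
The abstract does not spell out how the error-driven confidence in the second
step is computed. As a rough illustration only, the Python sketch below scores
a string against two corpora, a standard one and a chat one, using add-one
smoothed character bigram models, and turns the per-character log-likelihood
ratio into a confidence between 0 and 1. The class CharBigramLM, the function
chat_confidence, the vocabulary size V = 256 and the toy corpora are all
assumptions made for this example; this is not the speaker's method.

```python
import math
from collections import Counter
from typing import List

V = 256  # assumed character vocabulary size, shared by both models for add-one smoothing


class CharBigramLM:
    """Add-one smoothed character bigram model (hypothetical, illustrative only)."""

    def __init__(self, corpus: List[str]) -> None:
        self.history = Counter()  # counts of each character used as a bigram history
        self.bigrams = Counter()  # counts of (previous char, next char) pairs
        for line in corpus:
            padded = "^" + line  # "^" marks the start of a line
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))

    def log_prob(self, text: str) -> float:
        """Sum of smoothed log P(next char | previous char) over the text."""
        padded = "^" + text
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + V))
            for a, b in zip(padded, padded[1:])
        )


def chat_confidence(text: str, std_lm: CharBigramLM, chat_lm: CharBigramLM) -> float:
    """Sigmoid of the per-character log-likelihood ratio (chat model vs. standard model)."""
    ratio = (chat_lm.log_prob(text) - std_lm.log_prob(text)) / max(len(text), 1)
    return 1.0 / (1.0 + math.exp(-ratio))


if __name__ == "__main__":
    # Hypothetical toy corpora standing in for a standard corpus and a static chat corpus.
    standard = ["see you later", "thanks a lot", "see you tomorrow"]
    chat = ["c u l8r", "thx alot", "c u 2moro"]
    std_lm, chat_lm = CharBigramLM(standard), CharBigramLM(chat)
    for s in ["c u", "see you"]:
        print(s, round(chat_confidence(s, std_lm, chat_lm), 3))
```

In this sketch a score above 0.5 means the string is closer to the chat corpus
than to the standard one. A real system would need far larger corpora and a
tuned decision threshold.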

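For the third step, the classic source channel (noisy channel) formulation
picks the standard word w that maximises P(w) * P(c | w) for an observed chat
term c. The Python sketch below shows one simple way a phonetic layer could
enter the channel model, by summing over phonetic forms psi:
P(c | w) = sum over psi of P(psi | w) * P(c | psi). This factorisation, the
context-free unigram language model, the functions channel_prob and normalize,
and all probability tables are hypothetical choices for illustration; they are
not the extended model presented in the talk.

```python
from typing import Dict

# Hypothetical unigram language model P(w) over standard words (illustrative numbers only).
LM: Dict[str, float] = {"for": 0.6, "four": 0.3, "great": 0.1}

# Hypothetical pronunciation model P(psi | w): standard word -> phonetic forms.
P_PRON_GIVEN_WORD: Dict[str, Dict[str, float]] = {
    "for": {"F AO R": 1.0},
    "four": {"F AO R": 1.0},
    "great": {"G R EY T": 1.0},
}

# Hypothetical phonetic mapping P(c | psi): phonetic form -> chat terms.
P_CHAT_GIVEN_PRON: Dict[str, Dict[str, float]] = {
    "F AO R": {"4": 0.7, "for": 0.3},
    "G R EY T": {"gr8": 0.8, "great": 0.2},
}


def channel_prob(chat_term: str, word: str) -> float:
    """P(c | w), marginalising over the phonetic forms of w."""
    return sum(
        p_pron * P_CHAT_GIVEN_PRON.get(pron, {}).get(chat_term, 0.0)
        for pron, p_pron in P_PRON_GIVEN_WORD.get(word, {}).items()
    )


def normalize(chat_term: str) -> str:
    """Return argmax_w P(w) * P(c | w); keep the term unchanged if no word scores above 0."""
    best_word, best_score = chat_term, 0.0
    for word, p_word in LM.items():
        score = p_word * channel_prob(chat_term, word)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    for term in ["4", "gr8", "xyz"]:
        print(term, "->", normalize(term))
```

Running it prints "4 -> for", "gr8 -> great" and "xyz -> xyz"; the last case
shows the fallback of leaving an unknown term unchanged. In this sketch the
phonetic tables are the part expected to stay stable, which echoes the point
above that phonetic mappings change less than chat terms do.
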
Dr. Xia obtained his PhD degree from the Institute of Computing Technology,
Chinese Academy of Sciences, in 2001. In January 2003, he joined the University
of Sheffield as a postdoctoral researcher in the Sheffield NLP group. In
December 2005, he joined The Chinese University of Hong Kong as a postdoctoral
research fellow in the Department of SE&EM. His research interests cover
statistical language processing and machine learning approaches to text
classification and text mining.


Note : Cookies and drinks will be available at 4:15 p.m.



                       ***** ALL ARE WELCOME *****

Host : Prof. Kam Fai Wong
Tel : 2609 8332
Email : kfwong@se.cuhk.edu.hk

For more information, please refer to http://www.se.cuhk.edu.hk/~seg5810/