********************************************************************


                                                     Seminar

             Department of Systems Engineering and Engineering Management
                                  The Chinese University of Hong Kong

------------------------------------------------------------------------------------------

Title    : Tree-based and Forest-based Translation

Speaker  : Dr. Liang Huang
           University of Pennsylvania

Date     : November 26th, 2008 (Wednesday)

Time     : 4:30 p.m. - 5:30 p.m.

Venue    : Room 513
           William M.W. Mong Engineering Building
           (Engineering Building Complex Phase 2)
           CUHK

------------------------------------------------------------------------------------------

Abstract:
 

What can machine translation systems learn from human translators? And what do translating English into Chinese and compiling C++ into machine code have in common?

In this talk I will first introduce a tree-based paradigm for machine translation, inspired by both human translators and compilers. In this paradigm, a source language sentence is first parsed into a syntax tree, which is then recursively converted into a target language sentence via tree-to-string transformation rules. Since the translation process is driven by the syntax, this approach resembles the "syntax-directed translation" method used by almost all compilers.
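
The recursive conversion described above can be illustrated with a minimal sketch. The tree, the reordering rules, and the placeholder lexicon below are all invented for illustration and are not taken from the talk; real tree-to-string rules match larger tree fragments and carry probabilities.

```python
# Toy tree-to-string translation. TREE, RULES and LEXICON are hypothetical
# examples, not the speaker's actual rules or data.

# A tree is either a leaf word (str) or a tuple (label, child, child, ...).
TREE = ("S",
        ("NP", "she"),
        ("VP", ("V", "reads"), ("NP", "books")))

# A rule maps (node label, child labels) to a target-side template.
# Integers pick a child by position; strings would be emitted literally.
# The VP rule swaps verb and object to mimic a verb-final target language.
RULES = {
    ("S", ("NP", "VP")): [0, 1],
    ("VP", ("V", "NP")): [1, 0],
}

# Placeholder target words (upper-cased source words stand in for a real lexicon).
LEXICON = {"she": "SHE", "reads": "READS", "books": "BOOKS"}


def label(node):
    """Return the label of an internal node, or the word itself for a leaf."""
    return node if isinstance(node, str) else node[0]


def translate(node):
    """Recursively convert a source tree into a list of target tokens."""
    if isinstance(node, str):
        return [LEXICON[node]]
    children = node[1:]
    key = (node[0], tuple(label(c) for c in children))
    # If no rule matches, keep the children in their source order.
    template = RULES.get(key, list(range(len(children))))
    output = []
    for item in template:
        if isinstance(item, int):
            output.extend(translate(children[item]))
        else:
            output.append(item)
    return output


print(" ".join(translate(TREE)))
```

As in a compiler, the output order is decided entirely by the rule chosen at each tree node, so a parsing error in the input tree propagates directly into the translation.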

However, natural languages are crucially different from programming languages in that they are fundamentally ambiguous. So we don't (and will probably never) have perfect parsers, and parsing errors adversely affect translation quality. An obvious solution is to use the top-k parses rather than a single 1-best tree, but this helps only slightly because a k-best list covers very few of the possible parses. We instead propose a "forest-based approach", which translates a packed forest encoding *exponentially* many parses in a compact (polynomial) space by sharing common subtrees. Large-scale experiments showed very significant improvements in translation quality over the 1-best baseline, outperforming the best reported systems to date, and confirmed that translating on a forest of millions of trees can be even faster than translating on top-30 individual trees thanks to dynamic programming.
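
To make the space argument concrete, here is a minimal sketch of a packed forest stored as a hypergraph, with the number of encoded trees counted by dynamic programming. The forest shape (`build_forest`), the rule names "A" and "B", and the function names are assumptions made for illustration; they do not come from the system described in the talk.

```python
from functools import lru_cache


def build_forest(n):
    """Build a toy packed forest (hypothetical shape).

    Each node maps to a list of hyperedges (rule_name, child_nodes).
    Node i can be derived in two ways, and both share the subtree at node i-1.
    """
    forest = {0: [("leaf", ())]}  # node 0: a single derivation with no children
    for i in range(1, n + 1):
        forest[i] = [("A", (i - 1,)), ("B", (i - 1,))]
    return forest


def count_trees(forest, root):
    """Count the trees packed under root, visiting each node only once."""
    @lru_cache(maxsize=None)
    def count(node):
        total = 0
        for _rule, children in forest[node]:
            product = 1
            for child in children:
                product *= count(child)
            total += product
        return total

    return count(root)


forest = build_forest(20)
num_edges = sum(len(edges) for edges in forest.values())
print(len(forest), "nodes,", num_edges, "hyperedges,",
      count_trees(forest, 20), "trees")
```

With 21 nodes and 41 hyperedges this toy forest encodes 2^20 (about one million) distinct trees. Because shared subtrees are processed once, any computation that decomposes over hyperedges costs time proportional to the forest size rather than to the number of trees.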

This is joint work with Haitao Mi and Qun Liu.


-------------------------------------------------------------------------------------------

Biography:
 

Liang Huang recently defended his PhD thesis at the University of Pennsylvania, co-supervised by Aravind Joshi and Kevin Knight (USC/ISI). He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, generic dynamic programming, and formal properties of synchronous grammars. His thesis develops a set of "forest-based methods" that have been applied to many problems in NLP including k-best parsing, forest rescoring and reranking, and forest-based translation. He received an Outstanding Paper Award at ACL 2008, and a University Teaching Award at Penn in 2005.


************************* ALL ARE WELCOME ************************

Host      : Prof. Kam-Fai Wong
Tel       : (852) 2609-8332
Email     : kfwong@se.cuhk.edu.hk

Enquiries : Prof. Nan Chen or Prof. Sean X. Zhou
            Department of Systems Engineering and Engineering Management
            CUHK
Website   : http://www.se.cuhk.edu.hk/~seg5810
Email     : seg5810@se.cuhk.edu.hk

********************************************************************