********************************************************************

Seminar

Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong

------------------------------------------------------------------------------------------


Title	:	Tree-based and Forest-based Translation

Speaker	:	Dr. Liang Huang
		University of Pennsylvania


Date	:	November 26th, 2008 (Wednesday)

Time	:	4:30 p.m. - 5:30 p.m.

Venue	:	Room 513
		William M.W. Mong Engineering Building
		(Engineering Building Complex Phase 2)
		CUHK

------------------------------------------------------------------------------------------

Abstract:

What can machine translation systems learn from human translators? And what do translating English into Chinese and compiling C++ into machine code have in common?

In this talk I will first introduce a tree-based paradigm for machine translation, inspired by both human translators and compilers. In this paradigm, a source language sentence is first parsed into a syntax tree, which is then recursively converted into a target language sentence via tree-to-string transformation rules. Since the translation process is driven by the syntax, this approach resembles the "syntax-directed translation" method used by almost all compilers.

However, natural languages are crucially different from programming languages in that they are fundamentally ambiguous. So we don't (and will probably never) have perfect parsers, and parsing errors adversely affect translation quality. An obvious solution is to use the top-k parses rather than a single 1-best tree, but this helps only slightly due to the limited scope of the k-best list. We instead propose a "forest-based approach", which translates a packed forest that encodes *exponentially* many parses in compact (polynomial) space by sharing common subtrees. Large-scale experiments showed very significant improvements in translation quality over the 1-best baseline, outperforming the best reported systems to date. They also confirmed that, thanks to dynamic programming, translating a forest of millions of trees can be even faster than translating the top-30 individual trees.

This is joint work with Haitao Mi and Qun Liu.

-------------------------------------------------------------------------------------------

Biography:

Liang Huang recently defended his PhD thesis at the University of Pennsylvania, co-supervised by Aravind Joshi and Kevin Knight (USC/ISI). He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, generic dynamic programming, and formal properties of synchronous grammars. His thesis develops a set of "forest-based methods" that have been applied to many problems in NLP including k-best parsing, forest rescoring and reranking, and forest-based translation. He received an Outstanding Paper Award at ACL 2008, and a University Teaching Award at Penn in 2005.

************************* ALL ARE WELCOME ************************


Host	:	Prof. Kam-Fai Wong
Tel	:	(852) 2609-8332
Email	:	kfwong@se.cuhk.edu.hk

Enquiries	:	Prof. Nan Chen or Prof. Sean X. Zhou
		Department of Systems Engineering and Engineering Management
		CUHK
Website	:	http://www.se.cuhk.edu.hk/~seg5810
Email	:	seg5810@se.cuhk.edu.hk

********************************************************************