Call for Participation in the Bake-off Tasks of CLP2010
The CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010) will feature four international bake-offs in Chinese language processing:
• Chinese Word Segmentation
• Chinese Parsing
• Chinese Personal Name Disambiguation
• Chinese Word Sense Induction
Task 1: Chinese Word Segmentation
Building on the success of previous SIGHAN-sponsored international bake-offs, the CIPS-SIGHAN-2010 word segmentation task will draw its training and test data from different domains in order to improve the robustness of current systems. In addition, selected examples for various test points will be added to expose potential problems that must be solved to take the state of the art to the next level. The evaluation will help improve the performance of automatic Chinese word segmentation by identifying crucial language resources and new natural language processing algorithms.
Organizers: Liu, Qun; Zhao, HongMei
Task 2: Chinese Parsing
Chinese syntactic parsing has been a highly active research area in recent years, and there is a pressing need for a common evaluation platform where different approaches can be compared and progress can be gauged. The purpose of the CIPS-ParsEval campaign is to provide such a platform. The first CIPS-ParsEval (CIPS-ParsEval-2009) was successfully held in Beijing in 2009. Building on this success, the second CIPS-ParsEval (CIPS-ParsEval-2010), jointly sponsored by CIPS and SIGHAN, will be held in the summer of 2010. The hope is that through such evaluation campaigns, more advanced Chinese syntactic parsing techniques will emerge, more effective Chinese language processing resources will be built, and the state of the art will advance as a result.
This evaluation includes two sub-tasks: sub-sentence parsing and complete sentence parsing. For complex sentences, the performance of automatic parsers will be evaluated at three different levels (phrase level, simple sentence level and complex sentence level).
For each sub-task, there are two tracks. 1) In the closed track, participants can use only the training data provided by the organizers. 2) In the open track, participants can use any data source in addition to the training data provided by the organizers. Entries in the two tracks will be evaluated separately.
In addition, single systems and combined systems will be evaluated separately in the closed track. 1) Single system: parsers that use a single parsing model to accomplish the parsing task. 2) System combination: participants are allowed to combine multiple models to improve performance. Collaborative decoding methods will be regarded as a combination method.
Organizers: Zhou, Qiang; Zhu, Jingbo
Task 3: Chinese Personal Name Disambiguation
Personal names are usually highly ambiguous in text because different people may have the same name and the same name can be written in different ways. Solving this problem will have a huge impact on the accuracy of web search and potentially other natural language applications. There have been two recent Web People Search (WePS) evaluation campaigns on personal name disambiguation using data from English language web pages. Chinese personal name disambiguation is potentially more challenging due to the need for word segmentation, which could introduce errors that can in large part be avoided in the English task. The Chinese personal name disambiguation task will thus be an adapted version of the English WePS task that takes word segmentation into account.
Organizers: Li, Maggie; Huang, Chu-Ren; Chen, Ying; Jin, Peng
Task 4: Chinese Word Sense Induction
The use of word senses instead of word forms has been shown to improve performance in information retrieval, information extraction and machine translation. Word Sense Disambiguation (WSD) generally requires the use of large-scale manually annotated lexical resources. Word Sense Induction (WSI) can overcome this limitation, and it has become one of the most important topics in current computational linguistics research.
Compared with work on European languages such as English, the study of WSI and WSD in Chinese remains limited. In addition, Chinese word senses have their own characteristics, so methods that work well for English may not work well for Chinese. This task is intended to promote the exchange of ideas among participants and improve the performance of Chinese WSI systems.
Organizers: Sun, Le; Dong, Qiang; Zhang, Zhenzhong
Please visit the website (http://www.cipsc.org.cn/clp2010/cfpa.htm) for details of these competitions.