Call for Papers: CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010)
Background and Goals
With the rapid expansion of Chinese language materials on the Internet, the use of natural language technology as a way of harnessing Chinese language content is drawing growing interest from researchers around the globe. The rise of China as a global power with increasing influence on the world stage is only fanning this interest. The Chinese language also has a number of characteristics that make Chinese language processing particularly challenging and intellectually rewarding. For example, written Chinese text does not have conventionalized word boundaries like English and other Western languages, and researchers have devoted an enormous amount of energy to figuring out the best way to identify words, which is generally considered to be the first step for more advanced language processing tasks. There have been four successful international Chinese word segmentation bakeoffs sponsored by the ACL Special Interest Group on Chinese Language Processing (SIGHAN); they have drawn wide participation and have greatly advanced the state of the art in this area. The Chinese language is also characterized by the lack of formal devices such as morphological tense and number that often provide important clues for shallow language processing tasks like part-of-speech tagging and syntactic chunking. As a result, solutions to Chinese language processing problems often require more sophisticated language processing techniques that are capable of drawing inferences from more subtle information.
Against this backdrop, the first conference on Chinese Language Processing (CLP2010), jointly organized by the Chinese Information Processing Society of China (CIPS) and SIGHAN, will be held on August 28-29, 2010 in Beijing, right after COLING 2010 and at the same venue. The goal is to bring together both established and aspiring researchers from around the globe and provide a unified forum for them to showcase their research achievements, share their ideas, and frame research problems that are crucial in advancing the state of the art in Chinese language processing.
Papers are invited on substantial, original and unpublished research on all aspects of Chinese language processing, including but not limited to:
• word segmentation
• part-of-speech tagging
• syntactic chunking and parsing
• lexical semantics
• semantic role labeling
• word sense disambiguation
• lexicon acquisition
• corpus development and language resources
• evaluation methods and user studies
• computational models of discourse
• temporal and spatial information processing
• sentiment analysis and opinion mining
• language generation
Call for Participation in Bake-off Tasks of CLP2010
The CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010) will also feature four international bake-offs in Chinese language processing:
• Chinese Word Segmentation
• Chinese Parsing
• Chinese Personal Name Disambiguation
• Chinese Word Sense Induction
Task 1: Chinese Word Segmentation
Building on the successes of previous SIGHAN-sponsored international bakeoffs, the CIPS-SIGHAN-2010 word segmentation task will draw its training and test data from different domains in order to improve the robustness of current systems. In addition, selected examples for various test points will be added to expose potential problems that need to be solved to take the state of the art to the next level. The evaluation will help improve the performance of automatic segmentation for Chinese by identifying crucial language resources and new natural language processing algorithms.
Organizers: Liu, Qun; Zhao, HongMei
Task 2: Chinese Parsing
Chinese syntactic parsing has been a highly active research area in recent years, and there is a pressing need for a common evaluation platform where different approaches can be compared and progress can be gauged. The purpose of the CIPS-ParsEval campaign is to provide such a platform. The first CIPS-ParsEval (CIPS-ParsEval-2009) was successfully held in Beijing in 2009. Building on this success, the second CIPS-ParsEval (CIPS-ParsEval-2010), jointly sponsored by CIPS and SIGHAN, will be held in the summer of 2010. The hope is that through such evaluation campaigns, more advanced Chinese syntactic parsing techniques will emerge, more effective Chinese language processing resources will be built, and the state of the art will be advanced as a result.
This evaluation includes two sub-tasks: sub-sentence parsing and complete sentence parsing. For complex sentences, the performance of automatic parsers will be evaluated at three different levels (phrase level, simple sentence level and complex sentence level).
For each sub-task, there are two tracks. 1) In the closed track, participants can use only the training data provided by the organizers. 2) In the open track, participants can use any data source in addition to the training data provided by the organizers. Entries in the two tracks will be evaluated separately.
In addition, single systems and combined systems will be evaluated separately in the closed track. 1) Single system: a parser that uses a single parsing model to accomplish the parsing task. 2) System combination: participants are allowed to combine multiple models to improve performance. Collaborative decoding methods will be regarded as system combination.
Organizers: Zhou, Qiang; Zhu, Jingbo
Task 3: Chinese Personal Name Disambiguation
Personal names are usually highly ambiguous in text because different people may have the same name and the same name can be written in different ways. Solving this problem will have a huge impact on the accuracy of web search and potentially other natural language applications. There have been two recent Web People Search (WePS) evaluation campaigns on personal name disambiguation using data from English language web pages. Chinese personal name disambiguation is potentially more challenging due to the need for word segmentation, which could introduce errors that can in large part be avoided in the English task. The Chinese personal name disambiguation task will thus be an adapted version of the English WePS task that takes word segmentation into account.
Organizers: Li, Maggie; Huang, Chu-Ren; Chen, Ying; Jin, Peng
Task 4: Chinese Word Sense Induction
The use of word senses instead of word forms has been shown to improve performance in information retrieval, information extraction and machine translation. Word Sense Disambiguation (WSD) generally requires the use of large-scale manually annotated lexical resources. Word Sense Induction (WSI) can overcome this limitation, and it has become one of the most important topics in current computational linguistics research.
Compared with work on European languages such as English, WSI and WSD in Chinese remain understudied. In addition, Chinese word senses have their own characteristics, so methods that work well for English may not work well for Chinese. This task is intended to promote the exchange of ideas among participants and improve the performance of Chinese WSI systems.
Organizers: Sun, Le; Dong, Qiang; Zhang, Zhenzhong