Call for Abstracts - Workshop on Linguistic Processing Pipelines

Pre-Conference: Workshop on Linguistic Processing Pipelines
Date: 29.09.2009
Location: Potsdam (Germany)
Main Conference: http://www.ling.uni-potsdam.de/acl-lab/gscl09/index.en.html

A range of NLP techniques have reached a degree of reliability that makes them valuable, if not indispensable, components in more complex text processing tasks, both in the development of information systems (information retrieval, question answering, document summarization, opinion mining, etc.) and in computational and corpus linguistic research (e.g., semantic role labelling, theme identification, lexical chaining).

To achieve a specific analysis task, linguists set up linguistic processing pipelines or workflows, i.e., configurations of processing steps that communicate with one another. Typically, such a pipeline starts from a data set in the form of a corpus of raw text, which is pre-processed (text normalization, encoding, segmentation) and then annotated (both automatically and manually). Here, some basic types of processing, such as tokenization or part-of-speech tagging, can provide a solid basis for subsequent, more sophisticated types of analysis. Based on such enriched data sets, new data is extracted and analyzed further (e.g., by querying, data mining or descriptive statistical analysis).

The steadily increasing interest in authentic texts and their interpretation in the NLP and related communities marks a turn towards empirical methods, which rely on existing NLP techniques to a considerable extent. It seems a good point in time to take stock of the current practices in the design and deployment of linguistic processing pipelines in order to reflect upon these practices, assess them and identify future directions of research in this promising subfield of NLP/Corpus Linguistics.

The workshop invites participants to present their practices in text/corpus processing, focusing on linguistic processing pipelines for selected research areas in information systems and linguistics such as

* information retrieval
* information extraction
* question answering
* automatic summarization
* (machine) translation
* grammar induction
* lexicology/lexicography
* register analysis
* discourse analysis

The goal of the workshop is to provide a forum for discussing and comparing the processing pipelines people build and the issues involved, including for instance

* configurations (types, order) of processing steps
* data structures and formats
* use of frameworks (e.g., GATE, UIMA)
* automatic and manual components for annotation
* integration/harmonization of components
* status of the data produced (transient vs. sustainable)
* exploitation of the resulting data

By providing the opportunity for an exchange of experiences in building processing pipelines, the workshop seeks to enable the sharing of processing pipelines and to foster the development of best practices in linguistic resource building and corpus processing more generally.

Invited Talk:
Thilo Götz, IBM

This workshop is an activity of the AK Texttechnologie of GSCL: http://www.gscl.info/ak_txt.html

Organizers:
Elke Teich (TU Darmstadt, teich_at_linglit.tu-darmstadt.de)
Andreas Witt (IDS Mannheim, witt_at_ids-mannheim.de)
Peter Fankhauser (L3S Hannover, fankhauser_at_l3s.de)

Reviewing committee:
Núria Bel
António Branco
Stefanie Dipper
Tomaz Erjavec
Iryna Gurevych
Thilo Götz
Uli Heid
Erhard Hinrichs
Kimmo Koskenniemi
Jonas Kuhn
Henning Lobin
Laurent Romary
Mike Rosner
Koenraad De Smedt
Marko Tadić

Submission details:
We invite extended abstracts (500-700 words excluding references) addressing one of the topics listed above.

Submission of abstract: 15/06/09
Notification of acceptance: 15/07/09
Submission of revised abstract: 31/08/09

Abstracts should be sent in PDF format to witt_at_ids-mannheim.de. Please put "GSCL Workshop" in the subject line. The workshop proceedings will comprise the extended abstracts only. However, we intend to publish a book of selected papers after the conference.