Book Announcement: Introduction to Linguistic Annotation and Text Analytics


Introduction to Linguistic Annotation and Text Analytics

Graham Wilcock (University of Helsinki)

Synthesis Lectures on Human Language Technologies #3 (Morgan & 
Claypool Publishers), 2009, 159 pages

Linguistic annotation and text analytics are active areas of research 
and development, with academic conferences and industry events such as 
the Linguistic Annotation Workshops and the annual Text Analytics 
Summits. This book provides a basic introduction to both fields, and 
aims to show that good linguistic annotations are the essential 
foundation for good text analytics. After briefly reviewing the basics 
of XML, with practical exercises illustrating in-line and stand-off 
annotations, a chapter is devoted to explaining the different levels 
of linguistic annotations. The reader is encouraged to create example 
annotations using the WordFreak linguistic annotation tool. The next 
chapter shows how annotations can be created automatically using
statistical NLP tools, and compares two sets of tools: the OpenNLP and
Stanford NLP tools. The second half of the book describes different
annotation formats and gives practical examples of how to interchange 
annotations between different formats using XSLT transformations. The 
two main text analytics architectures, GATE and UIMA, are then 
described and compared, with practical exercises showing how to 
configure and customize them. The final chapter is an introduction to 
text analytics, describing the main applications and functions 
including named entity recognition, coreference resolution and 
information extraction, with practical examples using both open-source
and commercial tools. Copies of the example files, scripts, and 
stylesheets used in the book are available from the companion website, 
located at http://sites.morganclaypool.com/wilcock.

Table of Contents: Working with XML / Linguistic Annotation / Using 
Statistical NLP Tools / Annotation Interchange / Annotation 
Architectures / Text Analytics


This title is available online without charge to members of 
institutions that have licensed the Synthesis Digital Library of 
Engineering and Computer Science.  Members of licensing institutions 
have unlimited access to download, save, and print the PDF without 
restriction; use of the book as a course text is encouraged.  To find 
out whether your institution is a subscriber, visit
<http://www.morganclaypool.com/page/licensed>, or just click on the
book's URL above from an institutional IP
address and attempt to download the PDF.  Others may purchase the book 
from this URL as a PDF download for US$30 or in print for US$40.  
Printed copies are also available from Amazon and from booksellers 
worldwide at approximately US$40 or local currency equivalent.