[eu_members at aclweb dot org] FIFTH RECOGNIZING TEXTUAL ENTAILMENT CHALLENGE AT TAC 2009 (RTE-5) -- Call for Participation
RTE-5: Call for Participation
FIFTH RECOGNIZING TEXTUAL ENTAILMENT CHALLENGE AT TAC 2009
Since 2004, the RTE Challenges have promoted research in textual entailment recognition as a task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and multi-document summarization. Over the years, encouraging progress, both in the number of researchers involved and in the results achieved, has prompted further investigation of the phenomena involved, with the challenge renewed every year and moved toward more realistic scenarios.
Building on the very good response obtained so far, the RTE Organizing Committee is glad to launch the Fifth Recognizing Textual Entailment Challenge, proposed for the second year as a track of the Text Analysis Conference (TAC).
WHAT IS NEW IN RTE-5
-A Textual Entailment Search Pilot task will be proposed, based on the data used in the Summarization task at TAC 2008/2009.
-The main task will be similar to the RTE-4 task, with the following changes:
-Texts will be longer, usually corresponding to a portion of the source document that a reader would naturally select, such as a paragraph or a group of related sentences.
-Texts will come from a variety of sources and will not be edited from their source documents. Thus, systems will be asked to handle real text that may include typographical errors and ungrammatical sentences.
-A development set will be released.
-The textual entailment recognition task will be based on only three application settings: QA, IE, and IR.
-Mandatory ablation tests for major knowledge resources will be required for those systems that employ these resources.
The Textual Entailment Search Pilot, representing a first step towards more realistic scenarios in the Textual Entailment Recognition task, is aimed at:
1) producing a data set which reflects the natural distribution of entailment in a corpus and presents all the problems that can arise while detecting textual entailment in a natural setting
2) analyzing the potential impact of textual entailment recognition on a real NLP application task, namely the Summarization task.
The Textual Entailment Search task consists of finding all the sentences in a set of documents that entail a given Hypothesis.
The task is situated in the Summarization application setting, where the Hypothesis (H) is taken from a Summary Content Unit (SCU), and the systems must find all the entailing sentences (Ts) in a corpus of 10 newswire documents about a common topic.
The following example is taken from the development set:
<H_sentence>Russia requested international help to rescue the AS-28.</H_sentence>
<text doc_id="APW_ENG_20050806.0018" s_id="1" evaluation="YES">At Moscow's request, Japan has dispatched four naval vessels to help rescue a Russian submarine snagged on the floor of the Pacific Ocean, but the ships aren't expected to arrive at the scene until early next week.</text>
<text doc_id="APW_ENG_20050806.0018" s_id="7" evaluation="YES">Navy spokesman Capt. Igor Dygalo said the U.S. Navy has also been asked for assistance, the RIA-Novosti news agency reported.</text>
<text doc_id="APW_ENG_20050806.0726" s_id="6" evaluation="YES">Russian authorities hope British and American unmanned submersibles, sent after a Russian plea for help, can cut the submarine loose.</text>
As can be seen from the example above, in the Entailment Search task both Text and Hypothesis are to be interpreted in the context of the corpus and contain explicit and implicit references to entities, events, dates, places, situations, etc. pertaining to the topic.
As this Pilot requires the retrieval of entailing sentences only, contradicting sentences are not to be taken into account, and thus the entailment judgment may be seen as a two-way decision between “yes” and “no” entailment.
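The data format shown in the example above can be read with a standard XML parser. The following is a minimal sketch, not an official tool: it assumes the fragment is wrapped in a single root element (the name `topic` is hypothetical, added only so the fragment parses) and that entailing sentences carry `evaluation="YES"` as in the example. The function name `entailing_sentences` is likewise an illustration.

```python
import xml.etree.ElementTree as ET

# Two sentences copied from the development-set example above.
# The <topic> wrapper is a hypothetical root element, not part of the source data.
FRAGMENT = """<topic>
<H_sentence>Russia requested international help to rescue the AS-28.</H_sentence>
<text doc_id="APW_ENG_20050806.0018" s_id="7" evaluation="YES">Navy spokesman Capt. Igor Dygalo said the U.S. Navy has also been asked for assistance, the RIA-Novosti news agency reported.</text>
<text doc_id="APW_ENG_20050806.0726" s_id="6" evaluation="YES">Russian authorities hope British and American unmanned submersibles, sent after a Russian plea for help, can cut the submarine loose.</text>
</topic>"""


def entailing_sentences(xml_text):
    """Return the hypothesis and the (doc_id, s_id) of each sentence marked YES."""
    root = ET.fromstring(xml_text)
    hypothesis = root.findtext("H_sentence")
    hits = [
        (node.get("doc_id"), node.get("s_id"))
        for node in root.iter("text")
        if node.get("evaluation") == "YES"
    ]
    return hypothesis, hits


hypothesis, hits = entailing_sentences(FRAGMENT)
print(hypothesis)
for doc_id, s_id in hits:
    print(doc_id, s_id)
```

Identifying each entailing sentence by its document and sentence identifiers keeps the result independent of the sentence text itself.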
The guidelines for participants, together with one topic taken from the development set, are available at the RTE-5 website (http://www.nist.gov/tac/2009/RTE/index.html).
RTE is the task of recognizing that the meaning of one text, termed H(ypothesis), can be inferred from the content of another, termed T(ext). Given a set of pairs of T's and H's as input, the systems must recognize whether each T entails the corresponding H, deciding whether:
-T entails H
-T contradicts H, or shows it false
-the veracity of H is unknown on the basis of T.
The RTE-5 main task will consist of two sub-tasks:
1) The three-way RTE task, where the system must decide whether:
-T entails H - in which case the pair will be marked as ENTAILMENT
-T contradicts H - in which case the pair will be marked as CONTRADICTION
-The truth of H cannot be determined on the basis of T - in which case the pair will be marked as UNKNOWN
2) The two-way RTE task, where the system must decide whether:
-T entails H - in which case the pair will be marked as ENTAILMENT
-T does not entail H - in which case the pair will be marked as NO ENTAILMENT
Participants may take part in either or both sub-tasks.
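The relation between the two label sets can be sketched as follows. This is an illustration under one assumption: that CONTRADICTION and UNKNOWN both fall under NO ENTAILMENT, since in neither case does T entail H. The guidelines for participants remain the authoritative reference, and the function name `to_two_way` is hypothetical.

```python
# Labels of the three-way sub-task, as listed above.
THREE_WAY_LABELS = ("ENTAILMENT", "CONTRADICTION", "UNKNOWN")


def to_two_way(label):
    """Collapse a three-way judgment into a two-way judgment.

    Assumption: every pair that is not ENTAILMENT counts as NO ENTAILMENT.
    """
    if label not in THREE_WAY_LABELS:
        raise ValueError(f"unexpected three-way label: {label!r}")
    return "ENTAILMENT" if label == "ENTAILMENT" else "NO ENTAILMENT"


for label in THREE_WAY_LABELS:
    print(label, "->", to_two_way(label))
```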
System results will be compared to a human-annotated gold-standard test corpus. Examples of three-way judgments are given at the end of this document.
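As an illustration of comparing system output to a gold standard, the sketch below computes plain accuracy, the fraction of pairs whose label matches the gold label. Accuracy is an assumption made here for the example; the official evaluation measures are those defined in the guidelines. The gold labels are taken from three of the RTE-4 examples in this message, while the system labels are invented for the illustration.

```python
def accuracy(system, gold):
    """Fraction of pairs for which the system label equals the gold label.

    Both arguments map a pair id to a label; they must cover the same ids.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    if system.keys() != gold.keys():
        raise ValueError("system and gold must contain the same pair ids")
    correct = sum(1 for pair_id, label in gold.items() if system[pair_id] == label)
    return correct / len(gold)


# Gold labels from the RTE-4 examples (pairs 16, 60 and 100).
gold = {"16": "ENTAILMENT", "60": "CONTRADICTION", "100": "UNKNOWN"}
# Hypothetical system output, invented for this sketch.
system = {"16": "ENTAILMENT", "60": "UNKNOWN", "100": "UNKNOWN"}

print(f"accuracy = {accuracy(system, gold):.3f}")
```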
As in previous challenges, the test data sets will be based on multiple data sources, intended to be representative of typical problems encountered by applied systems. Specifically, data types corresponding to the following application areas will be used:
-Question Answering (QA): simulating a QA scenario in which the hypothesized answer has to be inferred from the candidate text passage
-Information Retrieval (IR): choosing propositional queries as hypotheses, and proposing relevant and irrelevant sentences retrieved by IR systems as texts
-Information Extraction/Relation Extraction (IE): generating T-H pairs, picking positive and negative examples of typical outputs of IE systems
More details are provided in the guidelines for participants available at the RTE-5 website (http://www.nist.gov/tac/2009/RTE/index.html).
THE RTE RESOURCE POOL AT ACLwiki
The RTE Resource Pool, set up for the first time during RTE-3, serves as a portal and forum for publicizing and tracking resources, and reporting on their use. All the RTE participants and other members of the NLP community who develop or use relevant resources are encouraged to contribute to this important resource.
This year we are also planning to extend the RTE Resource Pool (http://www.aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool) with a section specifically dedicated to the knowledge resources used. The new page will mainly contain a list of the "standard" RTE resources, that is, those most widely selected and used in the design of RTE systems during the challenges held so far, together with links to the locations where they are made available. In addition, a shortlist of the "top" resources will be provided, as well as some results of the data analyses conducted so far on the resources presented on the page.
Pilot Development Set Release: 3 April 2009
Main Development Set Release: 29 May 2009
Main and Pilot Test Set Release: 2 September 2009
Submissions: 9 September 2009
Release of individual evaluated results: 18 September 2009
TRACK COORDINATORS AND ORGANIZERS:
Luisa Bentivogli, CELCT and FBK, Italy (Track coordinator, bentivo at fbk dot it)
Ido Dagan, Bar Ilan University, Israel
Hoa Trang Dang, NIST, USA
Danilo Giampiccolo, CELCT, Italy (Track coordinator, giampiccolo at celct dot it)
Bernardo Magnini, FBK, Italy
Examples of main task three-way judgments taken from the RTE-4 test set (downloadable from http://www.nist.gov/tac/data/):
<pair id="16" entailment="ENTAILMENT" task="IR">
<t>A 66-year-old man has been sentenced to life in prison by a French court for murdering seven girls and young women. Michel Fourniret, dubbed the "Ogre of the Ardennes", had admitted kidnapping and killing his victims between 1987 and 2001.</t>
<h>Michel Fourniret was sentenced to life imprisonment.</h>
</pair>
<pair id="60" entailment="CONTRADICTION" task="IR">
<t>Syrian officials have said the bombed building was an empty military warehouse. They have refused to let nuclear inspectors visit the location, which was bulldozed after the bombing.</t>
<h>Nuclear inspectors are to visit Syria.</h>
</pair>
<pair id="100" entailment="UNKNOWN" task="IR">
<t>British and American diplomats were today attacked as they tried to investigate political violence in Zimbabwe, the US Embassy in Harare has said.</t>
<h>Diplomats were detained in Zimbabwe.</h>
</pair>
<pair id="307" entailment="ENTAILMENT" task="QA">
<t>African Union leaders ended their summit in Egypt yesterday refusing to condemn President Mugabe, cementing his hold on power even as they urged the establishment of a national unity government in Zimbabwe.</t>
<h>African Union leaders had a meeting in Egypt.</h>
</pair>
<pair id="316" entailment="CONTRADICTION" task="QA">
<t>Adopting just a couple of elements of the Mediterranean diet could cut the risk of cancer by 12%, say scientists. A study of 26,000 Greek people found just using more olive oil alone cut the risk by 9%.</t>
<h>Mediterranean foods increase the risk of cancer because of olive oil.</h>
</pair>
<pair id="327" entailment="UNKNOWN" task="QA">
<t>Speaking at a press conference held by video link from Lebanon, Shiekh Hassan Nasrallah said that the Shia Islamist group had also agreed to supply Israel with information on the airman Ron Arad, who went missing in 1986.</t>
<h>Shiekh Hassan Nasrallah is from Lebanon.</h>
</pair>
<pair id="417" entailment="UNKNOWN" task="QA">
<t>The acceleration of the shrinking of Arctic ice continues to threaten the survival of these animals. Scientists predict that the numbers of polar bears will fall by about a third, if sea ice in the Arctic continues to melt at its present rate.</t>
<h>The level of Arctic ice will fall by a third.</h>
</pair>
<pair id="534" entailment="CONTRADICTION" task="SUM">
<t>Much of the world has moved toward democracy and freedom, but China hasn't moved much and Russia seems headed in the opposite direction. Of the two, China is probably easier to deal with. It appears to have a collective leadership, which gives a certain continuity to its policy.</t>
<h>China and Russia will move toward democracy.</h>
</pair>
<pair id="614" entailment="ENTAILMENT" task="SUM">
<t>Political analyst Earl Ofari Hutchinson says Barack Obama has to capture the votes of Latinos for his Democratic presidential bid in the March 4 Texas primary.</t>
<h>Latino voters are crucial for Obama in Texas.</h>
</pair>
<pair id="709" entailment="ENTAILMENT" task="IE">
<t>A new report by the International Federation of Journalists (IFJ) documents 129 cases where media workers have been killed because of their work during 2004. They expect the number to increase as more information reaches them. This could make 2004 the deadliest year ever. 49 casualties (close to 40%) occurred in Iraq, making it by far the deadliest country for journalists. At least 20 of those appeared to be cases where journalists were directly targeted because of their profession.</t>
<h>49 media workers were killed in Iraq in 2004.</h>
</pair>
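Pairs in this format can be loaded with a standard XML parser. The sketch below is an illustration only: it assumes a well-formed file in which each pair is closed with `</pair>` and all pairs sit under a single root element (the name `entailment-corpus` is an assumption made for the example). The `Pair` type and the `read_pairs` function are hypothetical helpers, and the sample reuses pairs 60 and 100 from the examples above.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Pair(NamedTuple):
    pair_id: str
    label: str       # ENTAILMENT, CONTRADICTION or UNKNOWN
    task: str        # application setting, e.g. IR, QA, IE
    text: str        # T
    hypothesis: str  # H


# Root element name is an assumption; the pairs are copied from the examples above.
SAMPLE = """<entailment-corpus>
<pair id="60" entailment="CONTRADICTION" task="IR">
<t>Syrian officials have said the bombed building was an empty military warehouse. They have refused to let nuclear inspectors visit the location, which was bulldozed after the bombing.</t>
<h>Nuclear inspectors are to visit Syria.</h>
</pair>
<pair id="100" entailment="UNKNOWN" task="IR">
<t>British and American diplomats were today attacked as they tried to investigate political violence in Zimbabwe, the US Embassy in Harare has said.</t>
<h>Diplomats were detained in Zimbabwe.</h>
</pair>
</entailment-corpus>"""


def read_pairs(xml_text):
    """Parse T-H pairs from an XML string into a list of Pair records."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.iter("pair"):
        pairs.append(Pair(
            pair_id=node.get("id"),
            label=node.get("entailment"),
            task=node.get("task"),
            text=(node.findtext("t") or "").strip(),
            hypothesis=(node.findtext("h") or "").strip(),
        ))
    return pairs


for pair in read_pairs(SAMPLE):
    print(pair.pair_id, pair.task, pair.label, "|", pair.hypothesis)
```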