BRIEFS General Description
Basic Linguistic Processing
The heart of the BRIEFS system is the Functional Dependency Grammar (Fdg) parser provided by Connexor Oy. The parser splits the text into tokens and sentences and produces the following information for each token:
base form (aka lemma)
functional dependency representing structural information within sentences
name type information if the token is recognised as a name.
Consider the analysis below of the sentence "I remember having seen her somewhere before". The fifth word in the sentence, her, has the base form she. It is analysed as a direct object (obj) that depends on word number 4. The functional dependency type is shown by the functional tag @OBJ, followed by the surface-syntactic label %NH, which indicates that the word is a head at the phrase level. The morphological labels further specify that the word is the third-person singular form of a personal pronoun in the accusative case.
1  I          i          subj:>2  @SUBJ     %NH   PRON PERS NOM SG1
2  remember   remember   main:>0  @+FMAINV  %VA   V PRES -SG3
3  having     have       v-ch:>4  @-FAUXV   %AUX  ING
4  seen       see        obj:>2   @-FMAINV  %VA   EN
5  her        she        obj:>4   @OBJ      %NH   PRON PERS ACC SG3
6  somewhere  somewhere  loc:>4   @ADVL     %EH   ADV
7  before     before     tmp:>4   @ADVL     %EH   ADV
.  .
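The token lines above can be read into simple records for further processing. The following is a minimal sketch only: the field order (index, surface form, base form, dependency, tags) is inferred from the example, and the names Token and parse_token_line are hypothetical, not part of any Connexor interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    index: int            # position of the token in the sentence
    form: str             # surface form as it appears in the text
    lemma: str            # base form
    dep: Optional[str]    # dependency function, e.g. "obj"
    head: Optional[int]   # index of the governing token, 0 for the root
    tags: List[str]       # functional, surface-syntactic and morphological tags


def parse_token_line(line: str) -> Token:
    """Split one whitespace-separated token line into its fields."""
    fields = line.split()
    index, form, lemma = int(fields[0]), fields[1], fields[2]
    dep, head, rest = None, None, fields[3:]
    # The dependency field has the shape "function:>head".
    if rest and ":>" in rest[0]:
        dep, head_str = rest[0].split(":>")
        head = int(head_str)
        rest = rest[1:]
    return Token(index, form, lemma, dep, head, rest)


SAMPLE = """\
1 I i subj:>2 @SUBJ %NH PRON PERS NOM SG1
2 remember remember main:>0 @+FMAINV %VA V PRES -SG3
3 having have v-ch:>4 @-FAUXV %AUX ING
4 seen see obj:>2 @-FMAINV %VA EN
5 her she obj:>4 @OBJ %NH PRON PERS ACC SG3
6 somewhere somewhere loc:>4 @ADVL %EH ADV
7 before before tmp:>4 @ADVL %EH ADV
"""

tokens = [parse_token_line(line) for line in SAMPLE.splitlines() if line.strip()]
her = tokens[4]
print(her.lemma, her.dep, her.head, her.tags)
```

With records of this kind, the statement that "her" is the object of word number 4 becomes a direct lookup of the dep and head fields.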
For more information on the parser please contact Connexor Oy.
Some additional processing follows the basic processing by Fdg. First, numerical expressions are identified. Second, proper names (e.g. companies, persons, products) are recognised and classified. Third, the noun and verb phrases of the domain are identified.
Briefs Numex identifies numerical expressions (dates, currencies and physical measures) and converts them to standard formats with a "value" and a "unit of measure". This makes the expressions computable, which is the prime goal of the BRIEFS project.
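The idea of a "value" plus "unit of measure" representation can be illustrated with a small sketch. It does not reproduce Briefs Numex: the unit table, the multiplier words, the pattern and the name normalise are all assumptions made for illustration, and dates and thousands separators are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Measure:
    value: float
    unit: str


# Hypothetical unit table mapping surface forms to a standard unit.
UNITS = {"eur": "EUR", "euros": "EUR", "usd": "USD", "dollars": "USD",
         "kg": "kg", "km": "km", "%": "%"}
# Hypothetical multiplier words.
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# A number, an optional multiplier word, then a candidate unit.
PATTERN = re.compile(
    r"(?P<num>\d+(?:[.,]\d+)?)\s*"
    r"(?P<mult>thousand|million|billion)?\s*"
    r"(?P<unit>[A-Za-z%]+)",
    re.IGNORECASE,
)


def normalise(text: str) -> List[Measure]:
    """Return the expressions in text whose unit is in the unit table."""
    results = []
    for match in PATTERN.finditer(text):
        unit = UNITS.get(match.group("unit").lower())
        if unit is None:
            continue  # a number followed by an ordinary word
        # Treat a comma as a decimal separator (an assumption of this sketch).
        value = float(match.group("num").replace(",", "."))
        if match.group("mult"):
            value *= MULTIPLIERS[match.group("mult").lower()]
        results.append(Measure(value, unit))
    return results


found = normalise("Net sales rose to 12.5 million euros, up 8 % from 2002.")
print(found)
```

Once the expressions are in this form they can be compared or summed, which is what "computable" means here.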
The name processing in Briefs Name relies on name identification by Fdg. However, a finer-grained classification of names than Fdg provides may be required, so Briefs Name classifies names using custom name lists. It also changes the name of a known company to the form specified in the custom lists. The co-reference processing in Briefs Name handles the different ways of writing, for example, the name of a company, and resolves simple references to a name by pronouns or expressions such as "they" or "the company". A more detailed description of name processing can be found in the documentation for Named Entities.
To assist in the identification, classification and accumulation of names and special terms, a names editor has been built as part of the BRIEFS IE Config Tool. The editor supports the verification and editing of the results of Briefs Name. It is described in detail in the documentation for Information Extraction.
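A rough sketch of list-based normalisation and simple co-reference is given below. It is an illustration under stated assumptions, not the Briefs Name implementation: the name list, the company "Example Oy", the set of referring expressions and the function resolve are all hypothetical, and the rule "refer back to the most recent company" is a simplification.

```python
from typing import List, Optional

# Hypothetical custom name list: variant spelling -> (canonical form, class).
NAME_LIST = {
    "example": ("Example Oy", "company"),
    "example oy": ("Example Oy", "company"),
    "example corporation": ("Example Oy", "company"),
}

# Hypothetical expressions treated as references to a company.
COMPANY_REFERENCES = {"the company", "they", "it"}


def resolve(mentions: List[str]) -> List[Optional[str]]:
    """Map each mention to a canonical name, or None if it stays unresolved."""
    resolved = []
    last_company = None  # most recently mentioned company, if any
    for mention in mentions:
        key = mention.lower()
        if key in NAME_LIST:
            canonical, name_class = NAME_LIST[key]
            if name_class == "company":
                last_company = canonical
            resolved.append(canonical)
        elif key in COMPANY_REFERENCES:
            # Simplification: refer back to the most recent company.
            resolved.append(last_company)
        else:
            resolved.append(None)
    return resolved


result = resolve(["Example Corporation", "the company", "Helsinki", "they"])
print(result)
```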
Briefs Xp uses the output of the Fdg parser to form noun and verb phrases. Noun phrases may indicate multi-word terms or show which kinds of attributes are used to qualify the concepts in a domain. Verb phrases contain the main verb and its auxiliaries.
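As an illustration of how verb phrases might be collected from the dependency output, the sketch below attaches auxiliaries to their main verb by following v-ch links, as they appear in the example sentence analysed earlier in this document. It is not the Briefs Xp algorithm. The tuple layout and the function verb_phrases are assumptions, and each token carries only its functional tag.

```python
from collections import defaultdict
from typing import List, Tuple

# (index, form, dependency function, head index, functional tag)
TokenRow = Tuple[int, str, str, int, str]

TOKENS: List[TokenRow] = [
    (1, "I", "subj", 2, "@SUBJ"),
    (2, "remember", "main", 0, "@+FMAINV"),
    (3, "having", "v-ch", 4, "@-FAUXV"),
    (4, "seen", "obj", 2, "@-FMAINV"),
    (5, "her", "obj", 4, "@OBJ"),
    (6, "somewhere", "loc", 4, "@ADVL"),
    (7, "before", "tmp", 4, "@ADVL"),
]


def verb_phrases(tokens: List[TokenRow]) -> List[str]:
    """Group each main verb with the auxiliaries linked to it by v-ch."""
    forms = {i: form for i, form, _, _, _ in tokens}
    heads = {i: head for i, _, _, head, _ in tokens}
    deps = {i: dep for i, _, dep, _, _ in tokens}

    groups = defaultdict(list)
    for i, _, dep, head, _ in tokens:
        if dep == "v-ch":
            # Follow the verb chain until a token that is not itself chained.
            top = head
            while deps.get(top) == "v-ch":
                top = heads[top]
            groups[top].append(i)

    phrases = []
    for i, _, _, _, tag in tokens:
        if tag.endswith("FMAINV"):  # matches @+FMAINV and @-FMAINV
            members = sorted(groups.get(i, []) + [i])
            phrases.append(" ".join(forms[m] for m in members))
    return phrases


phrases = verb_phrases(TOKENS)
print(phrases)
```

For the example sentence this yields one phrase for "remember" alone and one that joins the auxiliary "having" with the main verb "seen".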
HUT/TAI Research Centre; Matti Keijola, Lauri Seitsonen; Last modified 24.4.2003
- Documentation: /usr/share/doc/briefs/documents
- Home page: Connexor Oy
Wed Apr 27 13:41:04 2005