ag2at.txt
The task is, essentially, to take a file in AG format, e.g. /web/ling/kieliteknologia/tutkimus/interact/AT/dialogi80ag.xml,
and transform it first into a file in AT format, e.g. dialogi80at.xml, and then further into DiAT format, e.g. dialogi80.xml.
A file in AG format is basically a chart: a list of edges of the form "from anchor a to anchor b there is an edge with label c".
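A minimal sketch of how such a chart might be held in memory, assuming integer anchors. The names Edge, start, end and label are made up for illustration and are not taken from the AG format itself.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    start: int   # the "from" anchor (hypothetical field name)
    end: int     # the "to" anchor (hypothetical field name)
    label: str   # the edge label

# A chart is just an unordered list of such edges.
chart = [
    Edge(1, 2, "B"),
    Edge(0, 2, "AB"),
    Edge(0, 1, "A"),
]

# Order the edges by start anchor; among edges that start together,
# put the widest first so an enclosing edge precedes what it contains.
ordered = sorted(chart, key=lambda e: (e.start, -e.end))
```

Sorting this way is only one possible convention, but it makes the later nesting step easier because outer edges are seen before inner ones.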
The task is to massage the list back into a form where the edge text runs in the order determined by the anchors and the tags are nested in
some suitable way. For instance, if the chart says:
"from anchor 0 to anchor 1 there is a FOO edge with text foo".
"from anchor 1 to anchor 2 there is a BAR edge with text bar".
"from anchor 2 to anchor 3 there is a BAZ edge with text baz".
"from anchor 0 to anchor 2 there is a FOOBAR edge with text foo bar".
"from anchor 1 to anchor 3 there is a BARBAZ edge with text bar baz".
this should be converted into something like
foo bar baz
where the overlap of FOOBAR and BARBAZ edges is handled by splitting one of them into pieces which are coindexed with the "set" feature.
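One possible way to do the splitting, as a rough sketch only. It assumes integer anchors, assumes that every anchor-to-anchor unit span is covered by an edge carrying its text, and emits text only for unsplit edges with nothing nested inside them. The names Edge, Piece, split_span, nest and render are hypothetical, and the greedy "first come, first kept whole" policy is just one of the many possible ways to choose the splits.

```python
from typing import NamedTuple, Optional

class Edge(NamedTuple):
    start: int
    end: int
    label: str
    text: str

class Piece(NamedTuple):
    start: int
    end: int
    label: str
    text: Optional[str]    # None for fragments of a split edge
    set_id: Optional[int]  # shared by all fragments of one split edge

def split_span(start, end, accepted):
    """Cut (start, end) until no fragment crosses an accepted piece."""
    spans = [(start, end)]
    changed = True
    while changed:
        changed = False
        for p in accepted:
            result = []
            for s, t in spans:
                if p.start < s < p.end < t:      # p ends inside the span
                    result += [(s, p.end), (p.end, t)]
                    changed = True
                elif s < p.start < t < p.end:    # p starts inside the span
                    result += [(s, p.start), (p.start, t)]
                    changed = True
                else:
                    result.append((s, t))
            spans = result
    return spans

def nest(chart):
    """Return non-crossing pieces, outermost first at each anchor."""
    accepted = []
    next_set = 1
    for e in sorted(chart, key=lambda e: (e.start, -e.end)):
        spans = split_span(e.start, e.end, accepted)
        if len(spans) == 1:
            accepted.append(Piece(e.start, e.end, e.label, e.text, None))
        else:
            accepted += [Piece(s, t, e.label, None, next_set) for s, t in spans]
            next_set += 1
    return sorted(accepted, key=lambda p: (p.start, -p.end))

def render(pieces):
    """Write the nested pieces out as tags, without any whitespace."""
    out, stack = [], []   # each stack entry is [piece, has_child]

    def close():
        piece, has_child = stack.pop()
        if not has_child and piece.text is not None:
            out.append(piece.text)
        out.append(f"</{piece.label}>")

    for p in pieces:
        while stack and stack[-1][0].end <= p.start:
            close()
        if stack:
            stack[-1][1] = True
        attr = f' set="{p.set_id}"' if p.set_id is not None else ""
        out.append(f"<{p.label}{attr}>")
        stack.append([p, False])
    while stack:
        close()
    return "".join(out)

chart = [
    Edge(0, 1, "FOO", "foo"),
    Edge(1, 2, "BAR", "bar"),
    Edge(2, 3, "BAZ", "baz"),
    Edge(0, 2, "FOOBAR", "foo bar"),
    Edge(1, 3, "BARBAZ", "bar baz"),
]
result = render(nest(chart))
```

With this policy FOOBAR is kept whole and BARBAZ is cut at anchor 2 into two fragments that share set="1". Processing the edges in a different order would keep BARBAZ whole and split FOOBAR instead, which is the non-uniqueness in question.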
The conversion is not going to be unique in that there are many ways to do the splits. The conversions should be conservative in the
sense that iterating them does not lose information: the AG files which are generated in each back and forth conversion should be
equivalent. (Whether they should be identical bears considering.)
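To make the difference between "equivalent" and "identical" concrete, here is a small sketch. It takes equivalence to mean "the same edges, in any order", which is an assumption and not a definition given anywhere; edge text is left out for brevity, and fragments sharing a label and set index are assumed to be contiguous. The names merge_sets and equivalent are hypothetical.

```python
from collections import Counter

def merge_sets(pieces):
    """Rejoin fragments that share a label and a set index into one edge."""
    merged = {}
    edges = []
    for start, end, label, set_id in pieces:
        if set_id is None:
            edges.append((start, end, label))
        else:
            s, t = merged.get((label, set_id), (start, end))
            merged[(label, set_id)] = (min(s, start), max(t, end))
    edges += [(s, t, label) for (label, _), (s, t) in merged.items()]
    return edges

def equivalent(chart_a, chart_b):
    """Same edges with the same multiplicity, regardless of order."""
    return Counter(chart_a) == Counter(chart_b)

original = [
    (0, 1, "FOO"), (1, 2, "BAR"), (2, 3, "BAZ"),
    (0, 2, "FOOBAR"), (1, 3, "BARBAZ"),
]
# The pieces as they might come back from the nested form,
# with BARBAZ split into two fragments coindexed by set 1.
pieces = [
    (0, 2, "FOOBAR", None), (0, 1, "FOO", None),
    (1, 2, "BARBAZ", 1), (1, 2, "BAR", None),
    (2, 3, "BARBAZ", 1), (2, 3, "BAZ", None),
]
round_trip = merge_sets(pieces)
```

Here the round trip recovers the same five edges but in a different order, so the two charts are equivalent in the above sense without being identical as lists.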
There are complications: what if there are alternative texts going on at the same time (e.g. speaker overlap)? But that only makes the
problem interesting :).