Symbol Layer


Datatypes for Symbols and Symbol Alphabets

typedef Symbol Symbol
 A handle for a symbol name, i.e. a string.
typedef SymbolSet SymbolSet
 A set of symbols aka an alphabet of symbols.
typedef SymbolIterator SymbolIterator
 Iterator over the symbols in a SymbolSet.
typedef SymbolPair SymbolPair
 A pair of symbols representing a transition in a transducer.
typedef SymbolPairSet SymbolPairSet
 A set of symbol pairs aka an alphabet of symbol pairs.
typedef SymbolPairIterator SymbolPairIterator
 Iterator over the set of symbol pairs in a SymbolPairSet.
typedef KeyTable KeyTable
 A table for storing Key-to-Symbol associations.

Defining and Using Symbols

Symbol define_symbol (const char *s)
 Define a symbol with name s.
bool is_symbol (const char *s)
 Whether the string s indicates a name for a symbol.
Symbol get_symbol (const char *s)
 Find the symbol for the symbol name s.
const char * get_symbol_name (Symbol s)
 Find the symbol name for the symbol s.
bool is_equal (Symbol s1, Symbol s2)
 Whether the symbol s1 is identical to symbol s2.

Defining and Using Alphabets of Symbols

SymbolSet * create_empty_symbol_set ()
 Define an empty set of symbols.
SymbolSet * insert_symbol (Symbol s, SymbolSet *Si)
 Insert s into the set of symbols Si and return the updated set.
bool has_symbol (Symbol s, SymbolSet *Si)
 Whether symbol s is a member of the set of symbols Si.

Iterators over Symbols

SymbolIterator begin_sigma_symbol (SymbolSet *Si)
 Beginning of the iterator for the symbol set Si.
SymbolIterator end_sigma_symbol (SymbolSet *Si)
 End of the iterator for the symbol set Si.
size_t size_sigma_symbol (SymbolSet *Si)
 Size of the iterator for the symbol set Si.
Symbol get_sigma_symbol (SymbolIterator Si)
 Get the symbol pointed to by the symbol iterator Si.

Defining and Using Symbol Pairs

SymbolPair * define_symbolpair (Symbol s1, Symbol s2)
 Define a symbol pair with input symbol s1 and output symbol s2.
Symbol get_input_symbol (SymbolPair *s)
 Get the input symbol of SymbolPair s.
Symbol get_output_symbol (SymbolPair *s)
 Get the output symbol of SymbolPair s.

Defining and Using Alphabets of Symbol Pairs

SymbolPairSet * create_empty_symbolpair_set ()
 Define an empty set of symbol pairs.
SymbolPairSet * insert_symbolpair (SymbolPair *p, SymbolPairSet *Pi)
 Insert p into the set of symbol pairs Pi and return the updated set.
bool has_symbolpair (SymbolPair *p, SymbolPairSet *Pi)
 Whether symbol pair p is a member of the set of symbol pairs Pi.

Iterators over Symbol Pairs

SymbolPairIterator begin_pi_symbol (SymbolPairSet *Pi)
 Beginning of the iterator for the symbol pair set Pi.
SymbolPairIterator end_pi_symbol (SymbolPairSet *Pi)
 End of the iterator for the symbol pair set Pi.
size_t size_pi_symbol (SymbolPairSet *Pi)
 Size of the iterator for the symbol pair set Pi.
SymbolPair * get_pi_symbolpair (SymbolPairIterator pi)
 Get the symbol pair pointed to by the symbol pair iterator pi.

Defining the Connection between Symbols and Transducer Keys

The relation 1:N between keys and symbols is useful for dealing with equivalence classes of symbols.

KeyTable * create_key_table ()
 Create an empty key table.
bool is_key (Key i, KeyTable *T)
 Whether i indicates an existing key in key table T.
bool is_symbol (Symbol s, KeyTable *T)
 Whether s indicates an existing symbol in key table T.
void associate_key (Key i, KeyTable *T, Symbol s)
 Associate the key i in the key table T with the symbol s.
Key get_key (Symbol s, KeyTable *T)
 Find the key for the symbol s in key table T.
Key get_unused_key (KeyTable *T)
 Return a Key which hasn't been associated to any symbol in key table T.
Symbol get_key_symbol (Key i, KeyTable *T)
 Find a symbol for the key i in key table T.
KeySet * get_key_set (KeyTable *T)
 A set of keys in key table T.
SymbolSet * get_symbol_set (KeyTable *T)
 A set of symbols in key table T.
KeyTable * read_symbol_table (istream &is, bool binary=false)
 Read a symbol table from istream is and transform it to a key table. binary defines whether the symbol table is in binary or text format.
void write_symbol_table (KeyTable *T, ostream &os, bool binary=false)
 Transform the key table T to a symbol table and write it to ostream os. binary defines whether the symbol table is written in binary or text format.
KeyTable * gather_flag_diacritic_table (KeyTable *kt)
 Return a new key table only including those key/symbol pairs which correspond to flag-diacritic symbol names.

Reading Symbol Strings and Transducers

Read transducers

(1) in text format from pair strings and input streams and

(2) in binary format from files and input streams so that the keys used in the transducer are harmonized according to a key table.

TransducerHandle longest_match_tokenizer (KeySet *ks, KeyTable *kt)
 Create a left to right longest match tokenizer for symbols in key set ks.
TransducerHandle longest_match_tokenizer2 (KeyTable *kt)
 Create a left to right longest match tokenizer for the symbols in key table kt.
KeyTable * recode_key_table (KeyTable *kt, const char *epsilon_replacement)
 Replace the epsilon in kt with epsilon_replacement.
KeyPairVector * tokenize_string_pair (TransducerHandle tokeniser, const char *upper, const char *lower, KeyTable *inputKeys)
 Change two strings into a transducer aligned character by character according to the tokenisation by tokeniser. The path(s) of the result of composing the strings’ UTF-8 representations against tokeniser are paired up to a new tokeniser from beginning to end. Empty positions at the end are filled with ε’s.
KeyVector * tokenize_string (TransducerHandle tokeniser, const char *string, KeyTable *inputKeys)
 Change string into an identity pair transducer as tokenised by tokeniser.
KeyVector * longest_match_tokenize (TransducerHandle tokenizer, const char *string, KeyTable *inputKeys)
 Use tokenizer to tokenize string.
KeyPairVector * longest_match_tokenize_pair (TransducerHandle tokenizer, const char *string1, const char *string2, KeyTable *inputKeys)
 Use tokenizer to tokenize string1 and string2 and align the tokenized strings to a key pair vector.
KeyPairVector * tokenize_pair_string (TransducerHandle tokeniser, char *pairs, KeyTable *inputKeys)
 Tokenise with tokeniser the string pairs, which consists of individual characters and colon-separated pairs, into a transducer.
TransducerHandle pairstring_to_transducer (const char *str, KeyTable *T)
 Create a one-path transducer as defined in pairstring form in str using the symbols defined in key table T.
TransducerHandle read_transducer_text (istream &is, KeyTable *T, bool sfst=false)
 Make a transducer as defined in text form in istream is using the key-to-printname relations defined in key table T. The parameter sfst defines whether SFST text format is used, otherwise AT&T format is used.
bool has_symbol_table (istream &is)
 Whether the transducer coming from istream is has a symbol table stored with it.
TransducerHandle read_transducer (istream &is, KeyTable *T)
 Read a transducer in binary form from input stream is and harmonize it according to the key table T.
TransducerHandle harmonize_transducer (TransducerHandle t, KeyTable *T_old, KeyTable *T_new)
 Harmonize transducer t that uses key table T_old according to key table T_new.

Writing Symbol Strings and Transducers

Write transducers

(1) in text format into pair strings and output streams and

(2) in binary format to output streams so that the print names associated to keys are stored with the transducer.

char * transducer_to_pairstring (TransducerHandle t, KeyTable *T, bool spaces=true, bool print_epsilons=true)
 A pairstring representation of one-path transducer t using the symbols defined in key table T. spaces defines whether pairs are separated by spaces.
void print_transducer (TransducerHandle t, KeyTable *T, bool print_weights=false, ostream &ostr=std::cout, bool old=false)
 Print transducer t in text format using the symbols defined in key table T. The parameter print_weights indicates whether weights are included, the output stream ostr indicates where printing is directed. Parameter old indicates whether transducer t should be printed in old SFST text format instead of AT&T format.
void write_transducer (TransducerHandle t, KeyTable *T, ostream &os=std::cout, bool backwards_compatibility=false)
 Write t in binary form to output stream os. Key table T is stored with the transducer.
void write_runtime_transducer (TransducerHandle t, KeyTable *kt, FILE *output_file)
 Write a transducer t with key table kt into file output_file. Write its symbols into the file with name symbol_file_name.

Detailed Description

Datatypes and functions related to symbols and the relation between symbols and keys.

Typedef Documentation

typedef KeyTable KeyTable

A table for storing Key-to-Symbol associations.

A key can be associated to several symbols but a symbol is associated to only one key.

Definition at line 57 of file symbol-layer.h.

typedef Symbol Symbol

A handle for a symbol name, i.e. a string.

Symbol is the type of a handle for a symbol that can occur in a cell of an input or output tape, or as an input or output label of a transition in a transducer. It also covers special-use symbols that do not occur on tapes but only as input or output transition labels with a special interpretation (e.g. any, default, failure), which is indicated by an attribute of the transducer.

There is a global, session-specific table of Symbol-to-string relations, called the global symbol cache. In the symbol cache, one Symbol is associated with one string and for one string there is one Symbol representing it, i.e. the relation between strings and Symbols is one-to-one.

Definition at line 34 of file symbol-layer.h.
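
The one-to-one relation between symbol names and handles can be pictured with a small stand-in cache. This is only a sketch: the class SymbolCache, its member names and the choice of unsigned integers as handles are assumptions made for illustration and are not taken from symbol-layer.h.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the global symbol cache; not HFST code.
using Symbol = unsigned;  // assumption: a handle is a small integer

class SymbolCache {
public:
    // Return the existing handle for name, or create a new one.
    // Defining the same name twice yields the same handle, which keeps
    // the string-to-Symbol relation one-to-one.
    Symbol define_symbol(const std::string& name) {
        auto it = by_name_.find(name);
        if (it != by_name_.end()) return it->second;
        Symbol s = static_cast<Symbol>(names_.size());
        names_.push_back(name);
        by_name_.emplace(name, s);
        return s;
    }

    // Whether name has already been defined as a symbol name.
    bool is_symbol(const std::string& name) const {
        return by_name_.count(name) != 0;
    }

    // Precondition: s was returned by define_symbol on this cache.
    const std::string& get_symbol_name(Symbol s) const {
        return names_.at(s);
    }

private:
    std::vector<std::string> names_;                   // handle -> name
    std::unordered_map<std::string, Symbol> by_name_;  // name -> handle
};
```

In this sketch, calling define_symbol("a") twice returns the same handle, and get_symbol_name maps that handle back to "a".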

typedef SymbolIterator SymbolIterator

Iterator over the symbols in a SymbolSet.

Definition at line 40 of file symbol-layer.h.

typedef SymbolPair SymbolPair

A pair of symbols representing a transition in a transducer.

Definition at line 43 of file symbol-layer.h.

typedef SymbolPairIterator SymbolPairIterator

Iterator over the set of symbol pairs in a SymbolPairSet.

Definition at line 49 of file symbol-layer.h.

typedef SymbolPairSet SymbolPairSet

A set of symbol pairs aka an alphabet of symbol pairs.

Definition at line 46 of file symbol-layer.h.

typedef SymbolSet SymbolSet

A set of symbols aka an alphabet of symbols.

Definition at line 37 of file symbol-layer.h.


Function Documentation

void associate_key ( Key  i,
KeyTable *  T,
Symbol  s 
)

Associate the key i in the key table T with the symbol s.

The symbol that is first associated with a key, becomes the primary symbol for that key. If key i has already been associated with one or more symbol(s) not equal to s, the symbol s becomes a parallel symbol for the key i.
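
The primary/parallel distinction can be sketched with a small stand-in table. The struct KeyTableSketch and the integer aliases below are hypothetical. Refusing to rebind a symbol that already belongs to another key is an assumption, derived only from the statement that a symbol is associated to only one key.

```cpp
#include <cassert>
#include <map>
#include <vector>

using Key = unsigned;     // assumption: keys are small integers
using Symbol = unsigned;  // assumption: symbol handles are small integers

// Hypothetical stand-in for a key table; not the HFST implementation.
struct KeyTableSketch {
    std::map<Key, std::vector<Symbol>> symbols_of;  // first entry is the primary symbol
    std::map<Symbol, Key> key_of;                   // each symbol has exactly one key

    // Associate key i with symbol s. Returns false if s already belongs
    // to a different key (assumed behaviour).
    bool associate_key(Key i, Symbol s) {
        auto it = key_of.find(s);
        if (it != key_of.end()) return it->second == i;
        key_of[s] = i;
        symbols_of[i].push_back(s);  // later symbols become parallel symbols
        return true;
    }

    bool is_key(Key i) const { return symbols_of.count(i) != 0; }

    // Precondition: is_key(i). Returns the primary symbol of key i.
    Symbol get_key_symbol(Key i) const { return symbols_of.at(i).front(); }

    // Smallest key that has no symbol associated with it.
    Key get_unused_key() const {
        Key k = 0;
        while (symbols_of.count(k)) ++k;
        return k;
    }
};
```

Associating key 0 first with symbol 10 and then with symbol 11 leaves 10 as the primary symbol, so get_key_symbol(0) returns 10.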

SymbolPairIterator begin_pi_symbol ( SymbolPairSet *  Pi  ) 

Beginning of the iterator for the symbol pair set Pi.

SymbolIterator begin_sigma_symbol ( SymbolSet *  Si  ) 

Beginning of the iterator for the symbol set Si.

SymbolSet* create_empty_symbol_set (  ) 

Define an empty set of symbols.

SymbolPairSet* create_empty_symbolpair_set (  ) 

Define an empty set of symbol pairs.

KeyTable* create_key_table (  ) 

Create an empty key table.

The result has no associations defined between symbols and keys.

Symbol define_symbol ( const char *  s  ) 

Define a symbol with name s.

SymbolPair* define_symbolpair ( Symbol  s1,
Symbol  s2 
)

Define a symbol pair with input symbol s1 and output symbol s2.

SymbolPairIterator end_pi_symbol ( SymbolPairSet *  Pi  ) 

End of the iterator for the symbol pair set Pi.

SymbolIterator end_sigma_symbol ( SymbolSet *  Si  ) 

End of the iterator for the symbol set Si.

KeyTable* gather_flag_diacritic_table ( KeyTable *  kt  ) 

Return a new key table only including those key/symbol pairs which correspond to flag-diacritic symbol names.

Flag-diacritic symbol names begin and end with an '@'.
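
The naming rule can be checked with a few lines of code. In the sketch below the key table is replaced by a plain map from keys to print names, and the function names are hypothetical. Requiring at least two characters, so that a lone "@" does not count, is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// True if name begins and ends with '@'. The minimum length of two
// characters is an assumption, so that "@" alone is rejected.
inline bool looks_like_flag_diacritic(const std::string& name) {
    return name.size() >= 2 && name.front() == '@' && name.back() == '@';
}

// Keep only the key/name pairs whose name looks like a flag diacritic.
// The map stands in for a key table; it is not the HFST KeyTable type.
std::map<unsigned, std::string>
gather_flags(const std::map<unsigned, std::string>& table) {
    std::map<unsigned, std::string> out;
    for (const auto& entry : table) {
        if (looks_like_flag_diacritic(entry.second)) out.insert(entry);
    }
    return out;
}
```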

Symbol get_input_symbol ( SymbolPair *  s  ) 

Get the input symbol of SymbolPair s.

Key get_key ( Symbol  s,
KeyTable *  T 
)

Find the key for the symbol s in key table T.

KeySet* get_key_set ( KeyTable *  T  ) 

A set of keys in key table T.

Symbol get_key_symbol ( Key  i,
KeyTable *  T 
)

Find a symbol for the key i in key table T.

If there are several symbols associated with the key, the primary symbol (the symbol that was first associated with the key) is returned.

Symbol get_output_symbol ( SymbolPair *  s  ) 

Get the output symbol of SymbolPair s.

SymbolPair* get_pi_symbolpair ( SymbolPairIterator  pi  ) 

Get the symbol pair pointed to by the symbol pair iterator pi.

Symbol get_sigma_symbol ( SymbolIterator  Si  ) 

Get the symbol pointed to by the symbol iterator Si.

Symbol get_symbol ( const char *  s  ) 

Find the symbol for the symbol name s.

Precondition:
s must refer to a symbol name. Use is_symbol to check this if you are not sure.

const char* get_symbol_name ( Symbol  s  ) 

Find the symbol name for the symbol s.

SymbolSet* get_symbol_set ( KeyTable *  T  ) 

A set of symbols in key table T.

Key get_unused_key ( KeyTable *  T  ) 

Return a Key which hasn't been associated to any symbol in key table T.

TransducerHandle harmonize_transducer ( TransducerHandle  t,
KeyTable *  T_old,
KeyTable *  T_new 
)

Harmonize transducer t that uses key table T_old according to key table T_new.

See also:
read_transducer

Definition at line 2496 of file hofst.C.

bool has_symbol ( Symbol  s,
SymbolSet *  Si 
)

Whether symbol s is a member of the set of symbols Si.

bool has_symbol_table ( istream &  is  ) 

Whether the transducer coming from istream is has a symbol table stored with it.

Precondition:
The transducer is in valid format and the end of stream has not been reached. Use read_format to check this.

Definition at line 2520 of file hofst.C.

bool has_symbolpair ( SymbolPair *  p,
SymbolPairSet *  Pi 
)

Whether symbol pair p is a member of the set of symbol pairs Pi.

SymbolSet* insert_symbol ( Symbol  s,
SymbolSet *  Si 
)

Insert s into the set of symbols Si and return the updated set.

SymbolPairSet* insert_symbolpair ( SymbolPair *  p,
SymbolPairSet *  Pi 
)

Insert p into the set of symbol pairs Pi and return the updated set.

bool is_equal ( Symbol  s1,
Symbol  s2 
)

Whether the symbol s1 is identical to symbol s2.

bool is_key ( Key  i,
KeyTable *  T 
)

Whether i indicates an existing key in key table T.

bool is_symbol ( Symbol  s,
KeyTable *  T 
)

Whether s indicates an existing symbol in key table T.

bool is_symbol ( const char *  s  ) 

Whether the string s indicates a name for a symbol.

KeyVector* longest_match_tokenize ( TransducerHandle  tokenizer,
const char *  string,
KeyTable *  inputKeys 
)

Use tokenizer to tokenize string.

The transducer tokenizer should be created using the function longest_match_tokenizer2. The key table inputKeys should contain all characters in string and be compatible with tokenizer.

Definition at line 2167 of file hofst.C.

KeyPairVector* longest_match_tokenize_pair ( TransducerHandle  tokenizer,
const char *  string1,
const char *  string2,
KeyTable *  inputKeys 
)

Use tokenizer to tokenize string1 and string2 and align the tokenized strings to a key pair vector.

The transducer tokenizer should be created using the function longest_match_tokenizer2. The key table inputKeys should contain all characters in string1 and string2 and be compatible with tokenizer. The tokenized strings will be aligned into a key pair vector. The shorter one of the tokenized strings will be padded with zeroes at the end.

Definition at line 2178 of file hofst.C.

TransducerHandle longest_match_tokenizer ( KeySet *  ks,
KeyTable *  kt 
)

Create a left to right longest match tokenizer for symbols in key set ks.

The keytable kt should contain the letters which make up the symbols for keys in ks. The keyset ks should not contain the key epsilon! The resulting transducer can be composed with other transducers to accomplish tokenization.

Definition at line 1904 of file hofst.C.

TransducerHandle longest_match_tokenizer2 ( KeyTable *  kt  ) 

Create a left to right longest match tokenizer for the symbols in key table kt.

The keytable kt should contain the letters which make up its multicharacter symbols. Tokenization can be accomplished using functions longest_match_tokenize and longest_match_tokenize_pair.

Definition at line 2005 of file hofst.C.
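
Left to right longest match tokenization can be illustrated on plain strings. The sketch below does not build a transducer as longest_match_tokenizer2 does; it only shows the greedy matching strategy. The function name and the use of a std::set of symbol names are assumptions, as is the choice to return an empty vector when some position cannot be matched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Greedy left-to-right longest match over a set of symbol names.
// Works on bytes, so a multibyte UTF-8 character must itself be listed
// as a symbol to be matched. Returns an empty vector if some position
// cannot be matched by any symbol (assumed error convention).
std::vector<std::string>
longest_match_split(const std::string& input,
                    const std::set<std::string>& symbols) {
    std::size_t longest = 0;
    for (const auto& s : symbols) longest = std::max(longest, s.size());

    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        // Try the longest candidate first and shrink until one matches.
        std::size_t len = std::min(longest, input.size() - pos);
        for (; len > 0; --len) {
            if (symbols.count(input.substr(pos, len))) break;
        }
        if (len == 0) return {};
        tokens.push_back(input.substr(pos, len));
        pos += len;
    }
    return tokens;
}
```

With the symbols c, a, t, +, p, l and the multicharacter symbol +pl, the input cat+pl is split into c, a, t and +pl, because +pl is preferred over the shorter match +.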

TransducerHandle pairstring_to_transducer ( const char *  str,
KeyTable *  T 
)

Create a one-path transducer as defined in pairstring form in str using the symbols defined in key table T.

The transitions must be written one after another separated by a space. (For automatic tokenization of symbols, see tokenize_pair_string.) If the input and output symbols are not equal, they are separated by a colon. If the backslash '\' and colon ':' are part of a symbol name, they must be escaped as "\\" and "\:".

For example the string "a:\: cd:e" represents a transducer with consecutive transitions mapping "a" to ":" and "cd" to "e".

See also:
transducer_to_pairstring

Definition at line 4629 of file hofst.C.
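
The pairstring syntax described for pairstring_to_transducer can be sketched as a small parser that returns pairs of symbol names instead of a transducer. The function name parse_pairstring and the result type are hypothetical. How the real function treats malformed input is not stated here, so the sketch simply takes a second unescaped colon literally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using SymbolNamePair = std::pair<std::string, std::string>;

// Split a pairstring into (input, output) symbol names.
// Transitions are separated by spaces, input and output by an unescaped
// colon, and a backslash makes the following character literal.
std::vector<SymbolNamePair> parse_pairstring(const std::string& str) {
    std::vector<SymbolNamePair> result;
    std::string input, output;
    bool in_output = false;   // true after an unescaped ':'
    bool have_token = false;  // true once the current transition has content

    auto flush = [&]() {
        if (!have_token) return;
        // Without a colon the transition is an identity pair.
        result.emplace_back(input, in_output ? output : input);
        input.clear();
        output.clear();
        in_output = false;
        have_token = false;
    };

    for (std::size_t i = 0; i < str.size(); ++i) {
        char c = str[i];
        if (c == '\\' && i + 1 < str.size()) {
            (in_output ? output : input) += str[++i];  // escaped character
            have_token = true;
        } else if (c == ':' && !in_output) {
            in_output = true;
            have_token = true;
        } else if (c == ' ') {
            flush();
        } else {
            (in_output ? output : input) += c;
            have_token = true;
        }
    }
    flush();
    return result;
}
```

Applied to the pairstring a:\: cd:e, the sketch yields the pairs ("a", ":") and ("cd", "e").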

void print_transducer ( TransducerHandle  t,
KeyTable *  T,
bool  print_weights = false,
ostream &  ostr = std::cout,
bool  old = false 
)

Print transducer t in text format using the symbols defined in key table T. The parameter print_weights indicates whether weights are included, the output stream ostr indicates where printing is directed. Parameter old indicates whether transducer t should be printed in old SFST text format instead of AT&T format.

In HFST the print_weights parameter is ignored.

In AT&T and SFST format, the newline, horizontal tab, carriage return, vertical tab, formfeed, bell character, backspace, backslash and space are printed as "\n", "\t", "\r", "\v", "\f", "\a", "\b", "\\" and "\0x20". In SFST format, the colon and angle brackets are printed as "\:", "\<" and "\>".

See also:
read_transducer_text

Definition at line 2589 of file hofst.C.
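
The escaping rules listed for print_transducer can be sketched as a function over print names. The name escape_print_name and its signature are hypothetical; only the character mappings come from the description of print_transducer.

```cpp
#include <cassert>
#include <string>

// Escape a print name for text output. With sfst set, the colon and the
// angle brackets are escaped as well.
std::string escape_print_name(const std::string& name, bool sfst) {
    std::string out;
    for (char c : name) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            case '\r': out += "\\r"; break;
            case '\v': out += "\\v"; break;
            case '\f': out += "\\f"; break;
            case '\a': out += "\\a"; break;
            case '\b': out += "\\b"; break;
            case '\\': out += "\\\\"; break;
            case ' ':  out += "\\0x20"; break;
            case ':':
            case '<':
            case '>':
                if (sfst) out += '\\';  // only escaped in SFST format
                out += c;
                break;
            default:
                out += c;
        }
    }
    return out;
}
```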

KeyTable* read_symbol_table ( istream &  is,
bool  binary = false 
)

Read a symbol table from istream is and transform it to a key table. binary defines whether the symbol table is in binary or text format.

Key table and symbol table are two ways of representing key-to-string mappings. Key tables are used during a session and symbol tables when moving or storing information between sessions.

During a session, a key table associates keys to symbol handles and the global symbol cache associates symbol handles to strings.

Between sessions, a symbol table associates keys directly to strings, as there is no symbol cache.

A symbol table in OpenFst text format lists each symbol name and its associated key on one line. The symbol name and the associated key are separated by a tabulator. If several symbol names are associated to the same key, the one listed first is considered the primary print name for that key.

An example:


KeyTable          Global symbol cache      Symbol table            Symbol table in text format     
--------          -------------------      ------------            ---------------------------

Key  Symbol       Symbol    string         Key   string            <> TAB 0
                                                                   <eps> TAB 0
 0     0, 1         0         "<>"          0      "<>", "<eps>"   a TAB 1 
 1     2            1         "<eps>"       1      "a"             b TAB 2
 2     4            2         "a"           2      "b"             c TAB 3
 3     5            3         "A"           3      "c" 
                    4         "b"
                    5         "c"
                    6         "d"
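
Reading the text format shown in the rightmost column can be sketched as follows. The function name and the result type, a map from keys to print names with the primary name first, are assumptions. The real read_symbol_table returns a KeyTable and also registers the names in the global symbol cache. Skipping lines without a tab is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Read "name TAB key" lines. For each key the names are kept in file
// order, so the first one listed acts as the primary print name.
std::map<unsigned, std::vector<std::string>>
read_symbol_table_text(std::istream& is) {
    std::map<unsigned, std::vector<std::string>> table;
    std::string line;
    while (std::getline(is, line)) {
        std::size_t tab = line.rfind('\t');
        if (tab == std::string::npos) continue;  // assumed: skip malformed lines
        std::string name = line.substr(0, tab);
        unsigned key = static_cast<unsigned>(std::stoul(line.substr(tab + 1)));
        table[key].push_back(name);
    }
    return table;
}
```

For the five lines of the example, key 0 ends up with the names "<>" (primary) and "<eps>", and keys 1 to 3 with "a", "b" and "c".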

TransducerHandle read_transducer ( istream &  is,
KeyTable *  T 
)

Read a transducer in binary form from input stream is and harmonize it according to the key table T.

The following notation is used: Ts = the transducer read from istream is, and S = the symbol table of transducer Ts.

Harmonization is done in the following way:

If T is empty (made with create_key_table), S is copied to T as such and all keys used in Ts remain the same i.e. no harmonization is done.

If T is not empty, the harmonization goes as follows. For each input and output key in a transition in Ts, the corresponding primary print name is looked up in S. The key value for this print name is then looked up in T and the original input or output key is replaced with this key. Epsilon keys are copied as such (the primary name of epsilon is thus defined solely by T). If a primary print name used in Ts is not found in T, it is added to T and to the global symbol cache at the next free position.

Some special cases: (1) If a key used in Ts is not found in S, it is replaced by next free key in T, but it is not added to T as it has no print name (the side effect is that the key after next free key in T is associated with a dummy Symbol, so it is recommended that all keys used in Ts are in S.) (2) Keys defined in S that are not used in Ts are not copied to T.

Precondition:
The transducer read from istream is must have a symbol table stored with it.
Returns:
The harmonized version of the transducer read from istream is. If end of stream is reached, NULL.

Definition at line 2426 of file hofst.C.
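
The harmonization steps can be sketched on simplified data: Ts is reduced to a list of key pairs, S to a map from keys to primary print names, and T to a map from print names to keys. All type and function names are hypothetical. Taking the next free position to be one past the largest key in T is an assumption, and the special case of a key missing from S is not handled (S.at throws in that case).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Key = unsigned;
using KeyPair = std::pair<Key, Key>;            // input key, output key
using PrintNames = std::map<Key, std::string>;  // S: key -> primary print name
using NameKeys = std::map<std::string, Key>;    // T: print name -> key

// Assumption: the next free position is one past the largest key in T.
Key next_free_key(const NameKeys& T) {
    Key next = 0;
    for (const auto& entry : T) {
        if (entry.second >= next) next = entry.second + 1;
    }
    return next;
}

// Map one key of Ts to its key in T, extending T when the name is new.
Key harmonize_key(Key k, const PrintNames& S, NameKeys& T) {
    if (k == 0) return 0;  // epsilon keys are copied as such
    const std::string& name = S.at(k);
    auto it = T.find(name);
    if (it != T.end()) return it->second;
    Key fresh = next_free_key(T);
    T.emplace(name, fresh);
    return fresh;
}

std::vector<KeyPair> harmonize(const std::vector<KeyPair>& Ts,
                               const PrintNames& S, NameKeys& T) {
    if (T.empty()) {
        // Empty T: copy S as such and keep all keys unchanged.
        for (const auto& entry : S) T.emplace(entry.second, entry.first);
        return Ts;
    }
    std::vector<KeyPair> out;
    for (const auto& kp : Ts) {
        Key in = harmonize_key(kp.first, S, T);
        Key o = harmonize_key(kp.second, S, T);
        out.emplace_back(in, o);
    }
    return out;
}
```

If S maps 1 to "a" and 2 to "b" while T maps "b" to 1 and "a" to 2, the transition (1, 2) becomes (2, 1). A name that is missing from T is appended to T with a fresh key.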

TransducerHandle read_transducer_text ( istream &  is,
KeyTable *  T,
bool  sfst = false 
)

Make a transducer as defined in text form in istream is using the key-to-printname relations defined in key table T. The parameter sfst defines whether SFST text format is used, otherwise AT&T format is used.

In AT&T and SFST format, the newline, horizontal tab, carriage return, vertical tab, formfeed, bell character, backspace, backslash and space must be escaped as "\n", "\t", "\r", "\v", "\f", "\a", "\b", "\\" and "\0x20". In SFST format, the colon and angle brackets must be escaped as "\:", "\<" and "\>".

An example of a transducer file:

AT&T                                       AT&T UNWEIGHTED               SFST                         

0      0                                   0                             final  0
0      1      a      aa     0.3            0      1      a      aa       0      a:aa   1
0      2      b      b      0              0      2      b      b        0      b      2
1      0      c      C      0.5            1      0      c      C        1      c:C    0
2      1      \n     c      0              2      1      \n     c        2      \n:c   1
2      0      a      A      1.2            2      0      a      A        2      a:A    0
2      2      d      D      1.65           2      2      d      D        2      d:D    2
2      0.5                                 2                             final  2 

The syntax of the lines in the text format is one of the following in the AT&T format:

  • originating_node TAB destination_node TAB input_symbol TAB output_symbol (TAB transition_weight)
  • final_node (TAB final_weight)

and one of the following in SFST format:

  • originating_node TAB input_symbol:output_symbol TAB destination_node
  • final TAB final_node

When AT&T format is used in HFST, weights are ignored. When SFST or AT&T unweighted format is used in HWFST, weights are set to zero.

Precondition:
All printnames used in the text format representation of the transducer must be in the key table T.
Returns:
A transducer as defined in is. If end of stream is reached, NULL.
See also:
print_transducer

Definition at line 5138 of file hofst.C.
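
The two AT&T line shapes can be told apart by their number of tab-separated fields. The sketch below parses a single line into a small record. The struct AttLine and both function names are hypothetical. Unescaping of symbol names and the lookup of print names in a key table are left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of AT&T text format (hypothetical record type).
struct AttLine {
    bool is_final = false;
    int source = 0;             // originating node, or the final node
    int target = 0;             // destination node, unused for final lines
    std::string input, output;  // symbol names, still escaped
    double weight = 0.0;        // 0 when the optional weight is absent
};

std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, '\t')) fields.push_back(field);
    return fields;
}

// Returns false if the line has an unexpected number of fields.
bool parse_att_line(const std::string& line, AttLine& out) {
    std::vector<std::string> f = split_tabs(line);
    out = AttLine();
    if (f.size() == 1 || f.size() == 2) {  // final_node (TAB final_weight)
        out.is_final = true;
        out.source = std::stoi(f[0]);
        if (f.size() == 2) out.weight = std::stod(f[1]);
        return true;
    }
    if (f.size() == 4 || f.size() == 5) {  // transition (TAB weight)
        out.source = std::stoi(f[0]);
        out.target = std::stoi(f[1]);
        out.input = f[2];
        out.output = f[3];
        if (f.size() == 5) out.weight = std::stod(f[4]);
        return true;
    }
    return false;
}
```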

KeyTable* recode_key_table ( KeyTable *  kt,
const char *  epsilon_replacement 
)

Replace the epsilon in kt with epsilon_replacement.

When tokenizing input-strings, the strings should never contain a substring matching the symbol name of the epsilon key in the KeyTable used in tokenization. Therefore the epsilons in the tokenizer should be replaced by an internal epsilon-symbol, which is unlikely to occur in real input-strings.

recode_key_table returns a KeyTable, which is the same as kt, except the key 0 corresponds to the internal epsilon symbol name epsilon_replacement and the original epsilon symbol name corresponds to the first unused key in kt.

Definition at line 1984 of file hofst.C.
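
The effect of recoding can be sketched on a plain map from keys to print names. The function name recode, the map type and the replacement name used in the usage note are hypothetical. The sketch assumes that the first unused key is the smallest key absent from kt.

```cpp
#include <cassert>
#include <map>
#include <string>

using Key = unsigned;
using NameTable = std::map<Key, std::string>;  // stand-in for a key table

// Return a copy of kt in which key 0 carries epsilon_replacement and the
// original epsilon name has moved to the first unused key of kt.
NameTable recode(const NameTable& kt, const std::string& epsilon_replacement) {
    NameTable out = kt;
    Key unused = 0;
    while (kt.count(unused)) ++unused;  // assumed: smallest key not in kt
    auto eps = kt.find(0);
    if (eps != kt.end()) out[unused] = eps->second;
    out[0] = epsilon_replacement;
    return out;
}
```

For a table with keys 0 ("<>"), 1 ("a") and 2 ("b"), recoding with the made-up name "<internal-eps>" gives key 0 the new name and moves "<>" to key 3.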

size_t size_pi_symbol ( SymbolPairSet *  Pi  ) 

Size of the iterator for the symbol pair set Pi.

size_t size_sigma_symbol ( SymbolSet *  Si  ) 

Size of the iterator for the symbol set Si.

KeyPairVector* tokenize_pair_string ( TransducerHandle  tokeniser,
char *  pairs,
KeyTable *  inputKeys 
)

Tokenise with tokeniser the string pairs, which consists of individual characters and colon-separated pairs, into a transducer.

E.g. a string cat+pl:s will be made to c a t +pl:s given that tokeniser creates such tokens.

Parameters:
tokeniser A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
pairs UTF-8 encoded string for transducer
inputKeys KeyTable that matches mapping of UTF-8 characters on input side of tokeniser.
Returns:
Transducer that contains as paths all possible aligned tokenisation(s) of upper : lower.
Todo:
does not support ambiguous tokenisations (i.e. with more than one path).

Definition at line 593 of file hsfst.C.

KeyVector* tokenize_string ( TransducerHandle  tokeniser,
const char *  string,
KeyTable *  inputKeys 
)

Change string into an identity pair transducer as tokenised by tokeniser.

E.g. a string cat will be tokenised as transducer c a t, given that tokeniser creates tokens for c, a, and t.

Parameters:
tokeniser A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
string UTF-8 encoded string for transducer pairs.
inputKeys KeyTable that matches mapping of UTF-8 characters on input side of tokeniser.
Returns:
Transducer that contains as its paths the tokenisation(s) of string produced by tokeniser.
Todo:
does not support ambiguous tokenisations (i.e. with more than one path).

Definition at line 2158 of file hofst.C.

KeyPairVector* tokenize_string_pair ( TransducerHandle  tokeniser,
const char *  upper,
const char *  lower,
KeyTable *  inputKeys 
)

Change two strings into a transducer aligned character by character according to the tokenisation by tokeniser. The path(s) of the result of composing the strings’ UTF-8 representations against tokeniser are paired up to a new tokeniser from beginning to end. Empty positions at the end are filled with ε’s.

E.g. the strings cat and dog are aligned as c:d a:o t:g. The strings ääliö and ääliöitä are aligned as ä ä l i ö ε:i ε:t ε:ä, and talo+NOUN+SINGULAR+NOMINATIVE and talo as t a l o +NOUN:ε +SINGULAR:ε +NOMINATIVE:ε, given that tokeniser and the key table contain those symbols.

If a specific alignment is required, it is possible to specify ε’s manually using the string for ε that is defined in inputKeys.

A tokeniser may be built manually or with functions such as longestMatchTokeniser(...)

Parameters:
tokeniser A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
upper UTF-8 encoded string for input side of transducer.
lower UTF-8 encoded string for output side of transducer.
inputKeys KeyTable that matches mapping of UTF-8 characters on input side of tokeniser.
Returns:
Transducer that contains as paths all possible aligned tokenisation(s) of upper : lower.
Todo:
does not support ambiguous tokenisations (i.e. with more than one path).

Definition at line 2207 of file hofst.C.
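
The position-by-position alignment with ε padding can be sketched on token lists that have already been produced by a tokeniser. The function name align_tokens and the use of strings as tokens are assumptions. The real function returns a KeyPairVector of keys rather than symbol names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Token = std::string;
using TokenPair = std::pair<Token, Token>;  // upper token, lower token

// Pair the tokens of upper and lower from beginning to end. The shorter
// sequence is padded at the end with the given epsilon name.
std::vector<TokenPair> align_tokens(const std::vector<Token>& upper,
                                    const std::vector<Token>& lower,
                                    const Token& epsilon) {
    std::vector<TokenPair> out;
    std::size_t n = std::max(upper.size(), lower.size());
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back(i < upper.size() ? upper[i] : epsilon,
                         i < lower.size() ? lower[i] : epsilon);
    }
    return out;
}
```

Aligning the tokens t a l o +NOUN with t a l o pairs the first four positions identically and the fifth as +NOUN with the epsilon name.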

char* transducer_to_pairstring ( TransducerHandle  t,
KeyTable *  T,
bool  spaces = true,
bool  print_epsilons = true 
)

A pairstring representation of one-path transducer t using the symbols defined in key table T. spaces defines whether pairs are separated by spaces.

The transitions are printed one after another, separated by spaces if so requested. If the input and output symbols are not equal, they are separated by a colon. If the backslash '\' and colon ':' are part of a symbol print name, they are escaped as "\\" and "\:".

The empty transducer is represented by "\empty_transducer" and the epsilon transducer as "EPS" where EPS is the symbol name for epsilon (pairstring_to_transducer recognizes "" as the epsilon transducer, but "EPS" is a more user-friendly notation). If the symbol name for epsilon is not defined, "\epsilon" is returned.

See also:
pairstring_to_transducer

Definition at line 4582 of file hofst.C.

void write_runtime_transducer ( TransducerHandle  t,
KeyTable *  kt,
FILE *  output_file 
)

Write a transducer t with key table kt into file output_file. Write its symbols into the file with name symbol_file_name.

Definition at line 2136 of file hofst.C.

void write_symbol_table ( KeyTable *  T,
ostream &  os,
bool  binary = false 
)

Transform the key table T to a symbol table and write it to ostream os. binary defines whether the symbol table is written in binary or text format.

See also:
read_symbol_table

void write_transducer ( TransducerHandle  t,
KeyTable *  T,
ostream &  os = std::cout,
bool  backwards_compatibility = false 
)

Write t in binary form to output stream os. Key table T is stored with the transducer.

Parameters:
t Transducer to be written
T Key table that is stored with the transducer
os Where transducer is written
backwards_compatibility Whether the transducer is written in SFST/OpenFst compatible format.

Definition at line 2736 of file hofst.C.


Generated on Tue Sep 29 11:43:34 2009 for Helsinki Finite-State Transducer Technology (HFST) interface by  doxygen 1.5.8