Ideally, to find a given form of a new member of the same class, one substitutes the inflectional stem of the new word in place of the stem of the paradigm word and reads off the resulting form. In grammars intended for human consumption, the relation between the paradigm and its representatives may be subject to simple morphophonological rules.
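The substitution step can be illustrated with a minimal sketch. It assumes a paradigm is stored as a mapping from taggings to forms and that the stem is a plain prefix of every form; the function name and the example word auto (assumed here to inflect like talo) are illustrative and are not part of Murf.

```python
def inflect_by_analogy(paradigm, model_stem, new_stem):
    """Swap the model word's stem for new_stem in every form of the paradigm."""
    forms = {}
    for tagging, form in paradigm.items():
        if not form.startswith(model_stem):
            raise ValueError(f"{form!r} does not start with stem {model_stem!r}")
        forms[tagging] = new_stem + form[len(model_stem):]
    return forms


# Three attested forms of the paradigm word talo.
TALO = {"N SG INE": "talossa", "N PL INE": "taloissa", "N SG ESS": "talona"}

# Hypothetical new member of the same class.
AUTO = inflect_by_analogy(TALO, "talo", "auto")
print(AUTO["N PL INE"])  # autoissa
```

Pure prefix substitution of this kind ignores the morphophonological rules mentioned above, which is why it only covers the ideal case.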
Compared to concatenative (IA, Item and Arrangement) morphology, the WP model does not define any correspondence between individual tags and morphs. For instance, the plural genitive of Latin nouns in -us ends in -orum, which is not divided further into a plural morph and a genitive morph. Compared to rule (IP, Item and Process) morphology, there is no explicit treatment of morphophonology.
Paradigms in a WP morphology can be identified by arbitrary labels (declension or conjugation number) or by a set of thematic forms which suffice to identify the paradigm. For instance, the Latin verb amo belongs to the first conjugation, identified by the series amo, amavi, amatum, amare.
The initial idea is quite simple. Murf reads a set of tagged forms and tries to place each form in a finite state network so as to maximise the match between the new form and the existing network. The new form is matched against the existing network at both ends of the net. The match which leaves the least unmatched residue is chosen, and the missing part is added to the net as a new arc.
Given, for instance, a paradigm

    Form      Tagging
    talossa   talo 1 N SG INE
    taloissa  talo 1 N PL INE
    talona    talo 1 N SG ESS

Murf correctly infers that the plural essive form is taloina:
    0: talo talo 1 N 4
    4 SG 5
    5 ssa INE 1.
    5 na ESS 1.
    4 i PL 5
    5

(The number following the base form identifies the base as a member of a given paradigm.) As the net shows, Murf is able to infer a segmentation of the forms into morphs and tags the morphs appropriately. As a side effect of entering the attested form in the network, new, unattested forms may get generated through re-entrances in the net. Call such forms side effects.
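The least-residue matching can be sketched on plain strings. This is a simplification under stated assumptions: it compares the new form with each known form separately instead of walking a real finite state network, and the function names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def split_against(new_form, known_form):
    """Match new_form at both ends; return (head, residue, tail)."""
    p = shared_prefix_len(new_form, known_form)
    new_rest, known_rest = new_form[p:], known_form[p:]
    # Match the tail only on what the head left over, so the two cannot overlap.
    s = shared_prefix_len(new_rest[::-1], known_rest[::-1])
    cut = len(new_rest) - s
    return new_form[:p], new_rest[:cut], new_rest[cut:]


def best_match(new_form, known_forms):
    """Pick the known form that leaves the shortest unmatched residue."""
    candidates = [(known, split_against(new_form, known)) for known in known_forms]
    return min(candidates, key=lambda c: len(c[1][1]))


known, (head, residue, tail) = best_match("taloissa", ["talossa", "talona"])
print(known, head, residue, tail)  # talossa talo i ssa
```

Matched against talossa, the form taloissa leaves only the residue i, which corresponds to the new arc carrying the PL tag in the net above; matching against talona would leave the longer residue iss.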
The initial idea needs a number of refinements to capture familiar morphological phenomena in real data. They include morphotax, complementary distribution, free variation, blocking, defective paradigms and productivity.
Morphotax concerns the admissible orders of tags in a well-formed word. The heuristic Murf follows here is that a proposed match of a new word is not allowed to produce unattested taggings. To guarantee this, Murf first builds a separate morphotax network from the taggings it has encountered. When a new form is considered for entry at a given place in the net, its side effects are checked against the morphotax network.
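A morphotax check of this kind might look as follows. The text does not specify how the morphotax network generalises over attested taggings, so the sketch assumes the crudest possible approximation: one slot per tag position, each holding the tags attested in that position. The function names are hypothetical, and the base form and paradigm number are left out of the taggings.

```python
def build_morphotax(taggings):
    """Collect, for each position, the set of tags attested there (assumed model)."""
    slots = []
    for tagging in taggings:
        for i, tag in enumerate(tagging.split()):
            if i == len(slots):
                slots.append(set())
            slots[i].add(tag)
    return slots


def obeys_morphotax(tagging, slots):
    """True if every tag occurs in a position where it has been attested."""
    tags = tagging.split()
    return len(tags) <= len(slots) and all(t in slots[i] for i, t in enumerate(tags))


SLOTS = build_morphotax(["N SG INE", "N PL INE", "N SG ESS"])
print(obeys_morphotax("N PL ESS", SLOTS))  # True
print(obeys_morphotax("N ESS PL", SLOTS))  # False
```

Under this assumption the side effect tagged N PL ESS passes the check, while a side effect with case before number is rejected.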
Complementary distribution is present when any given tagging is realised by just one form, although tags occurring in it have more than one allomorph. For instance, the Finnish partitive endings tA and A are in complementary distribution: the former occurs after heavy syllables and the latter after light ones. Identically tagged forms, by contrast, are in free variation. Murf implements a complementary distribution check which prevents production of free variants as a side effect of insertion.
To allow genuine free variation past the complementary distribution check, it suffices to tag the variants as different. For instance, the Finnish third person possessive suffix has two forms, nsA and Vn, which are in free variation after light open syllables. They are tagged as P3/A and P3, respectively.
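The complementary distribution check, and the way distinct tags let free variants through, can be sketched as follows. The data layout (a dictionary keyed by base and tagging) and the function name are assumptions made for illustration, not Murf's internal representation.

```python
def breaks_complementary_distribution(realised, side_effects):
    """True if a side effect would give an already realised tagging a second form."""
    for form, base, tagging in side_effects:
        existing = realised.get((base, tagging))
        if existing is not None and existing != form:
            return True
    return False


# Forms already in the net, keyed by (base, tagging).
REALISED = {("talo", "N SG INE P3"): "talossaan"}

# Same tagging, different form: this would be a free variant, so it is rejected.
print(breaks_complementary_distribution(
    REALISED, [("talossansa", "talo", "N SG INE P3")]))    # True

# Tagged P3/A instead of P3: the variant passes the check.
print(breaks_complementary_distribution(
    REALISED, [("talossansa", "talo", "N SG INE P3/A")]))  # False
```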
Another distributional gap is that nouns do not occur in the comitative plural without a possessive suffix (adjectives do). To record such gaps, Murf allows separate networks to be defined for exceptions. For instance, the entry

    *- - N PL_COM

disallows a noun ending in plural comitative.
Blocking is the phenomenon whereby a lexicalised exception pre-empts a productive, regular rule. For instance, the Finnish nominative plural is talot, not taloi, as one might be led to expect from the previous data. Murf accounts for blocking in the following way. When a paradigm is read in, all forms in it are put on a waiting list. Whenever a form is inserted, forms on the waiting list are checked for blocking. An insertion is not allowed if it would produce a side effect blocked by a form on the waiting list.
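The waiting-list check can be sketched as a filter over the side effects of a proposed insertion. The tuple layout (form, base, tagging) and the function name are assumptions; the NOM tag is taken from the tag inventory used elsewhere in the text.

```python
def blocked_side_effects(side_effects, waiting_list):
    """Return the side effects that clash with an attested form still waiting."""
    pending = {(base, tagging): form for form, base, tagging in waiting_list}
    return [
        (form, base, tagging)
        for form, base, tagging in side_effects
        # Blocked when the waiting list has a different form for the same tagging.
        if pending.get((base, tagging), form) != form
    ]


WAITING = [("talot", "talo", "N PL NOM")]
PROPOSED = [("taloi", "talo", "N PL NOM"), ("taloina", "talo", "N PL ESS")]

clashes = blocked_side_effects(PROPOSED, WAITING)
print(clashes)  # [('taloi', 'talo', 'N PL NOM')]
```

An insertion would be accepted only when the returned list is empty; here the attested talot on the waiting list blocks the regular taloi, while taloina is unaffected.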
Some paradigms are defective in that some forms are missing from an expected cross classification. For instance, Finnish comitative and instrumental (instructive) cases only have one number (plural). From a combinatorial point of view, case and number form in these cases a portmanteau morph instead of two independent morphs. The most straightforward way of recording this gap in distribution is to make the tag combination PL_COM a tag on its own.
Productivity refers to the fact that certain forms by default generalise to new words, while others are by default restricted to a closed set of forms. (This fact is one of the main motivations of paradigm morphology in the first place.) For instance, Finnish nominals have productive vowel stems and less productive consonant stems. A new base form pokemon will automatically go in the productive vowel stem paradigm. Murf allows marking a variant as a nonproductive one as follows:
    tienoisiin  tienoo 24 N PL ILL
    tienoihin   tienoo 24 N PL_ILL/h!
    tienoiden   tienoo 24 N PL GEN
    tienoitten  tienoo 24 N PL_GEN/tt!
Nonproductive variants marked with ! will not be generalised into paradigms where they have not been specifically licensed by attested forms.
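Reading such entries and separating productive from nonproductive variants might be done as in the sketch below. The line layout (form, base form, paradigm number, then tags, with ! closing the last tag) is inferred from the example entries above; the function and field names are hypothetical, and derivational entries are not handled.

```python
def parse_entry(line):
    """Split one inflectional entry into its parts and read off the ! marker."""
    form, base, paradigm, *tags = line.split()
    productive = not tags[-1].endswith("!")
    tags[-1] = tags[-1].rstrip("!")
    return {
        "form": form,
        "base": base,
        "paradigm": int(paradigm),
        "tags": tags,
        "productive": productive,
    }


LINES = [
    "tienoisiin tienoo 24 N PL ILL",
    "tienoihin tienoo 24 N PL_ILL/h!",
    "tienoiden tienoo 24 N PL GEN",
    "tienoitten tienoo 24 N PL_GEN/tt!",
]
ENTRIES = [parse_entry(line) for line in LINES]

# Only these variants would be offered to new words by default.
print([e["form"] for e in ENTRIES if e["productive"]])  # ['tienoisiin', 'tienoiden']
```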
Murf allows constraining derivational endings with a categorial grammar style tag format X\Y:

    onneton  onni 8 N tOn N\A 57 A SG NOM

This constrains tOn to combine with nouns and produce adjectives. (Formally, X\Y is analogous to a portmanteau tag in that it constrains variation at a point in the net.)
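The effect of an X\Y tag can be sketched as a small function from the category of the stem to the category of the derived word. This illustrates the reading given above (X is consumed, Y is produced) and is not Murf's own implementation; the function name is hypothetical.

```python
def apply_categorial(tag, category):
    """Apply a tag of the form X\\Y to a stem of the given category."""
    takes, yields = tag.split("\\")
    if takes != category:
        raise ValueError(f"{tag} cannot combine with category {category}")
    return yields


# tOn carries the tag N\A: it attaches to a noun and yields an adjective.
print(apply_categorial(r"N\A", "N"))  # A
```

Applying the same tag to a stem of category A raises an error, which mirrors the constraint that tOn does not combine with adjectives.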
With little space/time optimisation done so far, adding new paradigms gets slow toward the end of the process. Adding new words to existing paradigms can be made faster. It is also possible to use the net to guess the paradigms of unknown words on the basis of thematic forms.