
Chapter 4


Speech Engines: javax.speech
 

This chapter introduces the javax.speech package. This package defines the behavior of all speech engines (speech recognizers and synthesizers). The topics covered include what a speech engine is, the properties of a speech engine, locating, selecting and creating engines, engine states, speech events, and other engine functions.

 


 

4.1     What is a Speech Engine?

The javax.speech package of the Java Speech API defines an abstract software representation of a speech engine. "Speech engine" is the generic term for a system designed to deal with either speech input or speech output. Speech synthesizers and speech recognizers are both speech engine instances. Speaker verification systems and speaker identification systems are also speech engines but are not currently supported through the Java Speech API.

The javax.speech package defines classes and interfaces that define the basic functionality of an engine. The javax.speech.synthesis package and javax.speech.recognition package extend and augment the basic functionality to define the specific capabilities of speech synthesizers and speech recognizers.

The Java Speech API makes only one assumption about the implementation of a JSAPI engine: that it provides a true implementation of the Java classes and interfaces defined by the API. In supporting those classes and interfaces, an engine may be completely software-based or may be a combination of software and hardware. The engine may be local to the client computer or may operate remotely on a server. The engine may be written entirely as Java software or may be a combination of Java software and native code.

The basic processes for using a speech engine in an application are as follows.

  1. Identify the application's functional requirements for an engine (e.g., language or dictation capability).
  2. Locate and create an engine that meets those functional requirements.
  3. Allocate the resources for the engine.
  4. Set up the engine.
  5. Begin operation of the engine - technically, resume it.
  6. Use the engine.
  7. Deallocate the resources of the engine.

Steps 4 and 6 in this process operate differently for the two types of speech engine - recognizer or synthesizer. The other steps apply to all speech engines and are described in the remainder of this chapter.

The "Hello World!" code example for speech synthesis and the "Hello World!" code example for speech recognition both illustrate the 7 steps described above. They also show that simple speech applications are simple to write with the Java Speech API - writing your first speech application should not be too hard.

 


 

4.2     Properties of a Speech Engine

Applications are responsible for determining their functional requirements for a speech synthesizer and/or speech recognizer. For example, an application might determine that it needs a dictation recognizer for the local language or a speech synthesizer for Korean with a female voice. Applications are also responsible for determining behavior when there is no speech engine available with the required features. Based on specific functional requirements, a speech engine can be selected, created, and started. This section explains how the features of a speech engine are used in engine selection, and how those features are handled in Java software.

Functional requirements are handled in applications as engine selection properties. Each installed speech synthesizer and speech recognizer is defined by a set of properties. An installed engine may have one or many modes of operation, each defined by a unique set of properties, and encapsulated in a mode descriptor object.

The basic engine properties are defined in the EngineModeDesc class. Additional specific properties for speech recognizers and synthesizers are defined by the RecognizerModeDesc and SynthesizerModeDesc classes that are contained in the javax.speech.recognition and javax.speech.synthesis packages respectively.

In addition to mode descriptor objects provided by speech engines to describe their capabilities, an application can create its own mode descriptor objects to indicate its functional requirements. The same Java classes are used for both purposes. An engine-provided mode descriptor describes an actual mode of operation whereas an application-defined mode descriptor defines a preferred or desired mode of operation. (Locating, Selecting and Creating Engines describes the use of a mode descriptor.)

The basic properties defined for all speech engines are listed in Table 4-1.

Table 4-1 Basic engine selection properties: EngineModeDesc
Property Name  Description  
EngineName  A String that defines the name of the speech engine, e.g., "Acme Dictation System".
ModeName  A String that defines a specific mode of operation of the speech engine, e.g., "Acme Spanish Dictator".
Locale  A java.util.Locale object that indicates the language supported by the speech engine, and optionally, a country and a variant. The Locale class uses standard ISO 639 language codes and ISO 3166 country codes. For example, Locale("fr", "ca") represents a Canadian French locale, and Locale("en", "") represents English (the language).  
Running  A Boolean object that is TRUE for engines which are already running on a platform, otherwise FALSE. Selecting a running engine allows for sharing of resources and may also allow for fast creation of a speech engine object.  

The one additional property defined by the SynthesizerModeDesc class for speech synthesizers is shown in Table 4-2.

Table 4-2 Synthesizer selection properties: SynthesizerModeDesc
Property Name  Description  
List of voices  An array of voices that the synthesizer is capable of producing. Each voice is defined by an instance of the Voice class which encapsulates voice name, gender, age and speaking style.  

The two additional properties defined by the RecognizerModeDesc class for speech recognizers are shown in Table 4-3.

Table 4-3 Recognizer selection properties: RecognizerModeDesc
Property Name  Description  
Dictation supported  A Boolean value indicating whether this mode of operation of the recognizer supports a dictation grammar.  
Speaker profiles  A list of SpeakerProfile objects for speakers who have trained the recognizer. Recognizers that do not support training return a null list.  

All three mode descriptor classes (EngineModeDesc, SynthesizerModeDesc and RecognizerModeDesc) use the get and set property patterns for JavaBeans™. For example, the Locale property has get and set methods of the form:

Locale getLocale();
void setLocale(Locale l);

Furthermore, all the properties are defined by class objects, never by primitives (primitives in the Java programming language include boolean, int etc.). With this design, a null value always represents "don't care" and is used by applications to indicate that a particular property is unimportant to its functionality. For instance, a null value for the "dictation supported" property indicates that dictation is not relevant to engine selection. Since that property is represented by the Boolean class, a value of TRUE indicates that dictation is required and FALSE indicates explicitly that dictation should not be provided.
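For example, a minimal sketch of the three possible settings of the dictation property on a RecognizerModeDesc looks like this (the descriptor variable is illustrative):

RecognizerModeDesc desc = new RecognizerModeDesc();

desc.setDictationGrammarSupported(null);           // "don't care" (the default)
desc.setDictationGrammarSupported(Boolean.TRUE);   // dictation is required
desc.setDictationGrammarSupported(Boolean.FALSE);  // dictation must not be provided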

 


 

4.3     Locating, Selecting and Creating Engines

4.3.1     Default Engine Creation

The simplest way to create a speech engine is to request a default engine. This is appropriate when an application wants an engine for the default locale (specifically for the local language) and does not have any special functional requirements for the engine. The Central class in the javax.speech package is used for locating and creating engines. Default engine creation uses two static methods of the Central class.

Synthesizer Central.createSynthesizer(EngineModeDesc mode);
Recognizer Central.createRecognizer(EngineModeDesc mode);

The following code creates a default Recognizer and Synthesizer.


import javax.speech.*;
import javax.speech.synthesis.*;
import javax.speech.recognition.*;

{
    // Get a synthesizer for the default locale
    Synthesizer synth = Central.createSynthesizer(null);

    // Get a recognizer for the default locale
    Recognizer rec = Central.createRecognizer(null);
}

For both createSynthesizer and createRecognizer, the null parameter indicates that the application doesn't care about the properties of the synthesizer or recognizer. However, both creation methods have an implicit selection policy. Since the application did not specify the language of the engine, the language from the system's default locale returned by java.util.Locale.getDefault() is used. In all cases of creating a speech engine, the Java Speech API forces language to be considered since it is fundamental to correct engine operation.

If more than one engine supports the default language, the Central class then gives preference to an engine that is running (its running property is TRUE), and then to an engine that supports the country defined in the default locale.

If the example above is performed in the US locale, a recognizer and synthesizer for the English language will be returned if one is available. Furthermore, if engines are installed for both British and US English, the US English engine would be created.

4.3.2     Simple Engine Creation

The next easiest way to create an engine is to create a mode descriptor, define desired engine properties and pass the descriptor to the appropriate engine creation method of the Central class. When the mode descriptor passed to the createSynthesizer or createRecognizer methods is non-null, an engine is created which matches all of the properties defined in the descriptor. If no suitable engine is available, the methods return null.

The list of properties is described in Properties of a Speech Engine (Section 4.2). All the properties in EngineModeDesc and its sub-classes RecognizerModeDesc and SynthesizerModeDesc default to null to indicate "don't care".

The following code sample shows a method that creates a dictation-capable recognizer for the default locale. It returns null if no suitable engine is available.


/** Get a dictation recognizer for the default locale */
Recognizer createDictationRecognizer() {
    // Create a mode descriptor with all required features
    RecognizerModeDesc required = new RecognizerModeDesc();
    required.setDictationGrammarSupported(Boolean.TRUE);
    return Central.createRecognizer(required);
}

Since the required object provided to the createRecognizer method does not have a specified locale (it is not set, so it is null) the Central class again enforces a policy of selecting an engine for the language specified in the system's default locale. The Central class will also give preference to running engines and then to engines that support the country defined in the default locale.

In the next example we create a Synthesizer for Spanish with a male voice.


/**
 * Return a speech synthesizer for Spanish.
 * Return null if no such engine is available.
 */
Synthesizer createSpanishSynthesizer() {
    // Create a mode descriptor with all required features
    // "es" is the ISO 639 language code for "Spanish"
    SynthesizerModeDesc required = new SynthesizerModeDesc();
    required.setLocale(new Locale("es", null));
    required.addVoice(new Voice(
        null, GENDER_MALE, AGE_DONT_CARE, null));
    return Central.createSynthesizer(required);
}

Again, the method returns null if no matching synthesizer is found and the application is responsible for determining how to handle the situation.

4.3.3     Advanced Engine Selection

This section explains more advanced mechanisms for locating and creating speech engines. Most applications do not need to use these mechanisms. Readers may choose to skip this section.

In addition to performing engine creation, the Central class can provide lists of available recognizers and synthesizers from two static methods.

EngineList availableSynthesizers(EngineModeDesc mode);
EngineList availableRecognizers(EngineModeDesc mode);

If the mode passed to either method is null, then all known speech recognizers or synthesizers are returned. Unlike the createRecognizer and createSynthesizer methods, there is no policy that restricts the list to the default locale or to running engines - in advanced selection such decisions are the responsibility of the application.

Both availableSynthesizers and availableRecognizers return an EngineList object, a sub-class of Vector. If there are no available engines, or no engines that match the properties defined in the mode descriptor, the list is zero length (not null) and its isEmpty method returns true. Otherwise the list contains a set of SynthesizerModeDesc or RecognizerModeDesc objects each defining a mode of operation of an engine. These mode descriptors are engine-defined so all their features are defined (non-null) and applications can test these features to refine the engine selection.

Because EngineList is a sub-class of Vector, each element it contains is a Java Object. Thus, when accessing the elements applications need to cast the objects to EngineModeDesc, SynthesizerModeDesc or RecognizerModeDesc.
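For example, the following sketch lists the engine name and mode name of every available synthesizer; the cast is needed because each element is returned as a plain Object (the loop variable names are illustrative):

EngineList list = Central.availableSynthesizers(null);

for (int i = 0; i < list.size(); i++) {
    SynthesizerModeDesc desc = (SynthesizerModeDesc) list.elementAt(i);
    System.out.println(desc.getEngineName() + ": " + desc.getModeName());
}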

The following code shows how an application can obtain a list of speech synthesizers with a female voice for German. All other parameters of the mode descriptor remain null for "don't care" (engine name, mode name etc.).


import javax.speech.*;
import javax.speech.synthesis.*;

// Define the set of required properties in a mode descriptor
SynthesizerModeDesc required = new SynthesizerModeDesc();
required.setLocale(new Locale("de", ""));
required.addVoice(new Voice(
    null, GENDER_FEMALE, AGE_DONT_CARE, null));

// Get the list of matching engine modes
EngineList list = Central.availableSynthesizers(required);

// Test whether the list is empty - any suitable synthesizers?
if (list.isEmpty()) ...

If the application specifically wanted Swiss German and a running engine it would add the following before calling availableSynthesizers:

required.setLocale(new Locale("de", "CH"));
required.setRunning(Boolean.TRUE);

To create a speech engine from a mode descriptor obtained through the availableSynthesizers and availableRecognizers methods, an application simply calls the createSynthesizer or createRecognizer method. Because the engine created the mode descriptor and because it provided values for all the properties, it has sufficient information to create the engine directly. An example later in this section illustrates the creation of a Recognizer from an engine-provided mode descriptor.

Although applications do not normally care, engine-provided mode descriptors are special in two other ways. First, all engine-provided mode descriptors are required to implement the EngineCreate interface which includes a single createEngine method. The Central class uses this interface to perform the creation. Second, engine-provided mode descriptors may extend the SynthesizerModeDesc and RecognizerModeDesc classes to encapsulate additional features and information. Applications should not access that information if they want to be portable, but engines will use that information when creating a running Synthesizer or Recognizer.

4.3.3.1     Refining an Engine List

If more than one engine matches the required properties provided to availableSynthesizers or availableRecognizers then the list will have more than one entry and the application must choose from amongst them.

In the simplest case, applications simply select the first in the list which is obtained using the EngineList.first method. For example:


EngineModeDesc required;
...
EngineList list = Central.availableRecognizers(required);

if (!list.isEmpty()) {
    EngineModeDesc desc = (EngineModeDesc)(list.first());
    Recognizer rec = Central.createRecognizer(desc);
}

More sophisticated selection algorithms may test additional properties of the available engine. For example, an application may give precedence to a synthesizer mode that has a voice called "Victoria".
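A sketch of such a test might scan the voices of each mode in the list (here list is assumed to hold SynthesizerModeDesc objects returned by availableSynthesizers):

// Give precedence to a mode that offers a voice named "Victoria"
SynthesizerModeDesc preferred = null;

for (int i = 0; i < list.size() && preferred == null; i++) {
    SynthesizerModeDesc desc = (SynthesizerModeDesc) list.elementAt(i);
    Voice[] voices = desc.getVoices();
    for (int j = 0; voices != null && j < voices.length; j++) {
        if ("Victoria".equals(voices[j].getName())) {
            preferred = desc;
            break;
        }
    }
}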

The list manipulation methods of the EngineList class are convenience methods for advanced engine selection.

The following code shows how to use these methods to obtain a Spanish dictation recognizer with preference given to a recognizer that has been trained for a specified speaker passed as an input parameter.


import javax.speech.*;
import javax.speech.recognition.*;
import java.util.Locale;

Recognizer getSpanishDictation(String name) {
    RecognizerModeDesc required = new RecognizerModeDesc();
    required.setLocale(new Locale("es", ""));
    required.setDictationGrammarSupported(Boolean.TRUE);

    // Get a list of Spanish dictation recognizers
    EngineList list = Central.availableRecognizers(required);
    if (list.isEmpty()) return null;     // nothing available

    // Create a description for an engine trained for the speaker
    SpeakerProfile profile = new SpeakerProfile(null, name, null);
    RecognizerModeDesc requireSpeaker = new RecognizerModeDesc();
    requireSpeaker.addSpeakerProfile(profile);

    // Prune list if any recognizers have been trained for speaker
    if (list.anyMatch(requireSpeaker))
        list.requireMatch(requireSpeaker);

    // Now try to create the recognizer
    RecognizerModeDesc first = (RecognizerModeDesc)(list.firstElement());
    try {
        return Central.createRecognizer(first);
    } catch (SpeechException e) {
        return null;
    }
}

 


 

4.4     Engine States

4.4.1     State systems

The Engine interface includes a set of methods that define a generalized state system manager. Here we consider the operation of those methods. In the following sections we consider the two core state systems implemented by all speech engines: the allocation state system and the pause-resume state system. In Chapter 5, the state system for synthesizer queue management is described. In Chapter 6, the state systems for recognizer focus and for recognition activity are described.

A state defines a particular mode of operation of a speech engine. For example, the output queue moves between the QUEUE_EMPTY and QUEUE_NOT_EMPTY states. The following are the basics of state management.

The getEngineState method of the Engine interface returns the current engine state. The engine state is represented by a long value (64-bit value). Specified bits of the state represent the engine being in specific states. This bit-wise representation is used because an engine can be in more than one state at a time, and usually is during normal operation.

Every speech engine must be in one and only one of the four allocation states (described in detail in Section 4.4.2). These states are DEALLOCATED, ALLOCATED, ALLOCATING_RESOURCES and DEALLOCATING_RESOURCES. The ALLOCATED state has multiple sub-states. Any ALLOCATED engine must be in either the PAUSED or the RESUMED state (described in detail in Section 4.4.4).

Synthesizers have a separate sub-state system for queue status. Like the paused/resumed state system, the QUEUE_EMPTY and QUEUE_NOT_EMPTY states are both sub-states of the ALLOCATED state. Furthermore, the queue status and the paused/resumed status are independent.

Recognizers have three independent sub-state systems to the ALLOCATED state (the PAUSED/RESUMED system plus two others). The LISTENING, PROCESSING and SUSPENDED states indicate the current activity of the recognition process. The FOCUS_ON and FOCUS_OFF states indicate whether the recognizer currently has speech focus. For a recognizer, all three sub-state systems of the ALLOCATED state operate independently (with some exceptions that are discussed in the recognition chapter).

Each of these state names is represented by a static long in which a single unique bit is set. The & and | operators of the Java programming language are used to manipulate these state bits. For example, the state of an allocated, resumed synthesizer with an empty speech output queue is defined by:

(Engine.ALLOCATED | Engine.RESUMED | Synthesizer.QUEUE_EMPTY)

To test whether an engine is resumed, we use the test:

if ((engine.getEngineState() & Engine.RESUMED) != 0) ...

For convenience, the Engine interface defines two additional methods for handling engine states. The testEngineState method is passed a state value and returns true if all the state bits in that value are currently set for the engine. Again, to test whether an engine is resumed, we use the test:

if (engine.testEngineState(Engine.RESUMED)) ...

Technically, the testEngineState(state) method is equivalent to:

if ((engine.getEngineState() & state) == state)...

The final state method is waitEngineState. This method blocks the calling thread until the engine reaches the defined state. For example, to wait until a synthesizer stops speaking because its queue is empty we use:

engine.waitEngineState(Synthesizer.QUEUE_EMPTY);

In addition to method calls, applications can monitor state through the event system. Every state transition is marked by an EngineEvent being issued to each EngineListener attached to the Engine. The EngineEvent class is extended by the SynthesizerEvent and RecognizerEvent classes for state transitions that are specific to those engines. For example, the RECOGNIZER_PROCESSING RecognizerEvent indicates a transition from the LISTENING state to the PROCESSING state (which indicates that the recognizer has detected speech and is producing a result).
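For example, a minimal sketch that reports pause and resume transitions attaches an EngineAdapter (the empty implementation of EngineListener) to the engine:

engine.addEngineListener(new EngineAdapter() {
    public void enginePaused(EngineEvent e) {
        System.out.println("Engine paused");
    }
    public void engineResumed(EngineEvent e) {
        System.out.println("Engine resumed");
    }
});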

4.4.2     Allocation State System

Engine allocation is the process in which the resources required by a speech recognizer or synthesizer are obtained. Engines are not automatically allocated when created because speech engines can require substantial resources (CPU, memory and disk space) and because they may need exclusive access to an audio resource (e.g. microphone input or speaker output). Furthermore, allocation can be a slow procedure for some engines (perhaps a few seconds or over a minute).

The allocate method of the Engine interface requests the engine to perform allocation and is usually one of the first calls made to a created speech engine. A newly created engine is always in the DEALLOCATED state. A call to the allocate method is, technically speaking, a request to the engine to transition to the ALLOCATED state. During the transition, the engine is in a temporary ALLOCATING_RESOURCES state.

The deallocate method of the Engine interface requests the engine to perform deallocation of its resources. All well-behaved applications call deallocate once they have finished using an engine so that its resources are freed up for other applications. The deallocate method returns the engine to the DEALLOCATED state. During the transition, the engine is in a temporary DEALLOCATING_RESOURCES state.

Figure 4-1 shows the state diagram for the allocation state system.

Each block represents a state of the engine. An engine must always be in one of the four specified states. As the engine transitions between states, the event labelled on the transition arc is issued to the EngineListeners attached to the engine.

The normal operational state of an engine is ALLOCATED. The paused-resumed state of an engine is described in the next section. The sub-state systems of ALLOCATED synthesizers and recognizers are described in Chapter 5 and Chapter 6 respectively.

4.4.3     Allocated States and Call Blocking

For advanced applications, it is often desirable to start up the allocation of a speech engine in a background thread while other parts of the application are being initialized. This can be achieved by calling the allocate method in a separate thread. The following code shows an example of this using an inner class implementation of the Runnable interface. To determine when the allocation method is complete, we check later in the code for the engine being in the ALLOCATED state.


Engine engine;

{
    engine = Central.createRecognizer(null);

    new Thread(new Runnable() {
        public void run() {
            try {
                engine.allocate();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }).start();

    // Do other stuff while allocation takes place
    ...

    // Now wait until allocation is complete
    engine.waitEngineState(Engine.ALLOCATED);
}

A full implementation of an application that uses this approach to engine allocation needs to consider the possibility that the allocation fails. In that case, the allocate method throws an EngineException and the engine returns to the DEALLOCATED state.
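For a simpler, synchronous allocation, that handling might be sketched as follows (engine is an already-created Engine):

try {
    engine.allocate();
    engine.waitEngineState(Engine.ALLOCATED);
} catch (EngineException e) {
    // Allocation failed and the engine is back in the DEALLOCATED state
    System.err.println("Engine allocation failed: " + e);
} catch (InterruptedException e) {
    // The wait for the ALLOCATED state was interrupted
}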

Another issue advanced applications need to consider is call blocking. Most methods of the Engine, Recognizer and Synthesizer interfaces are defined for normal operation in the ALLOCATED state. What if they are called for an engine in another allocation state? For most methods, the operation is defined as follows: if the engine is in the ALLOCATING_RESOURCES state, the call blocks (waits) until the engine reaches the ALLOCATED state and then performs its action; if the engine is in the DEALLOCATED or DEALLOCATING_RESOURCES state, the call throws an EngineStateError.

A small subset of engine methods will operate correctly in all engine states. The getEngineProperties method always allows runtime engine properties to be set and tested (although properties only take effect in the ALLOCATED state). The getEngineModeDesc method can always return the mode descriptor for the engine. Finally, the three engine state methods - getEngineState, testEngineState and waitEngineState - always operate as defined.

4.4.4     Pause - Resume State System

All ALLOCATED speech engines have PAUSED and RESUMED states. Once an engine reaches the ALLOCATED state, it enters either the PAUSED or the RESUMED state. The factors that affect the initial PAUSED/RESUMED state are described below.

The PAUSED/RESUMED state indicates whether the audio input or output of the engine is on or off. A resumed recognizer is receiving audio input. A paused recognizer is ignoring audio input. A resumed synthesizer produces audio output as it speaks. A paused synthesizer is not producing audio output.

As part of the engine state system, the Engine interface provides several methods to test PAUSED/RESUMED state. The general state system is described previously in Section 4.4.

An application controls an engine's PAUSED/RESUMED state with the pause and resume methods. An application may pause or resume an engine indefinitely. Each time the PAUSED/RESUMED state changes, an ENGINE_PAUSED or ENGINE_RESUMED type of EngineEvent is issued to each EngineListener attached to the Engine.
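For example, a minimal sketch of a microphone on/off toggle for a recognizer might look like this (the method name is illustrative):

void toggleAudioInput(Recognizer rec) throws AudioException {
    if (rec.testEngineState(Engine.PAUSED)) {
        rec.resume();    // turn audio input back on
    } else {
        rec.pause();     // ignore audio input
    }
}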

Figure 4-2 shows the basic pause and resume diagram for a speech engine. As a sub-state system of the ALLOCATED state, the pause and resume states are represented within the ALLOCATED state as shown in Figure 4-1.

As with Figure 4-1, Figure 4-2 represents states as labelled blocks, and the engine events as labelled arcs between those blocks. In this diagram the large block is the ALLOCATED state which contains both the PAUSED and RESUMED states.

4.4.5     State Sharing

The PAUSED/RESUMED state of a speech engine may, in many situations, be shared by multiple applications. Here we must make a distinction between the Java object that represents a Recognizer or Synthesizer and the underlying engine that may have multiple Java and non-Java applications connected to it. For example, in personal computing systems (e.g., desktops and laptops), there is typically a single engine running and connected to microphone input or speaker/headphone output, and all applications share that resource.

When a Recognizer or Synthesizer (the Java software object) is paused and resumed the shared underlying engine is paused and resumed and all applications connected to that engine are affected.

There are three key implications from this architecture:

4.4.6     Synthesizer Pause

For a speech synthesizer - a speech output device - pause immediately stops the audio output of synthesized speech. Resume recommences speech output from the point at which the pause took effect. This is analogous to pause and resume on a tape player or CD player.

Chapter 5 describes an additional state system of synthesizers. An ALLOCATED Synthesizer has sub-states for QUEUE_EMPTY and QUEUE_NOT_EMPTY. This represents whether there is text on the speech output queue of the synthesizer that is being spoken or waiting to be spoken. The queue state and pause/resume state are independent. It is possible, for example, for a RESUMED synthesizer to have an empty output queue (QUEUE_EMPTY state). In this case, the synthesizer is silent because it has nothing to say. If any text is provided to be spoken, speech output will start immediately because the synthesizer is RESUMED.
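For example, the following hedged test checks both conditions before queueing new text (synth is assumed to be an allocated Synthesizer):

// A RESUMED synthesizer with an empty queue is silent only because it
// has nothing to say; text queued now will be spoken immediately.
if (synth.testEngineState(Engine.RESUMED | Synthesizer.QUEUE_EMPTY)) {
    synth.speakPlainText("Ready.", null);
}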

4.4.7     Recognizer Pause

For a recognizer, pausing and resuming turns audio input off and on and is analogous to switching the microphone off and on. When audio input is off the audio is lost. Unlike a synthesizer, for which a resume continues speech output from the point at which it was paused, resuming a recognizer restarts the processing of audio input from the time at which resume is called.

Under normal circumstances, pausing a recognizer will stop the recognizer's internal processes that match audio against grammars. If the user was in the middle of speaking at the instant at which the recognizer was paused, the recognizer is forced to finalize its recognition process. This is because a recognizer cannot assume that the audio received just before pausing is in any way linked to the audio data that it will receive after being resumed. Technically speaking, pausing introduces a discontinuity into the audio input stream.

One complexity for pausing and resuming a recognizer (not relevant to synthesizers) is the role of internal buffering. For various reasons, described in Chapter 6, a recognizer has a buffer for audio input which mediates between the audio device and the internal components of the recognizer which perform the match of the audio against the grammars. If the recognizer is performing in real time, the buffer is empty or nearly empty. If the recognizer is temporarily suspended or operates slower than real time, then the buffer may contain seconds of audio or more.

When a recognizer is paused, the pause takes effect on the input end of the buffer; i.e., the recognizer stops putting data into the buffer. At the other end of the buffer - where the actual recognition is performed - the recognizer continues to process audio data until the buffer is empty. This means that the recognizer can continue to produce recognition results for a limited period of time even after it has been paused. (A Recognizer also provides a forceFinalize method with an option to flush the audio input buffer.)
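For example, a sketch that pauses a recognizer and discards whatever audio remains in the buffer (assuming the boolean argument to forceFinalize requests the flush, as described above):

rec.pause();               // stop putting new audio into the buffer
rec.forceFinalize(true);   // finalize any pending result and flush the buffer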

Chapter 6 describes an additional state system of recognizers. An ALLOCATED Recognizer has a separate sub-state system for LISTENING, PROCESSING and SUSPENDED. These states indicate the current activity of the internal recognition process. These states are largely decoupled from the PAUSED and RESUMED states except that, as described in detail in Chapter 6, a paused recognizer eventually returns to the LISTENING state when it runs out of audio input (the LISTENING state indicates that the recognizer is listening to background silence, not to speech).

The SUSPENDED state of a Recognizer is superficially similar to the PAUSED state. In the SUSPENDED state the recognizer is not processing audio input from the buffer, but is temporarily halted while an application updates its grammars. A key distinction between the PAUSED state and the SUSPENDED state is that in the SUSPENDED state audio input can still be coming into the audio input buffer. When the recognizer leaves the SUSPENDED state the audio is processed. The SUSPENDED state allows a user to continue talking to the recognizer even while the recognizer is temporarily SUSPENDED. Furthermore, by updating grammars in the SUSPENDED state, an application can apply multiple grammar changes instantaneously with respect to the audio input stream.
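A sketch of that pattern, assuming two grammars have already been loaded into the recognizer, might be:

void switchGrammars(Recognizer rec, Grammar grammarA, Grammar grammarB)
        throws GrammarException {
    rec.suspend();               // audio keeps flowing into the input buffer
    grammarA.setEnabled(false);  // deactivate one grammar
    grammarB.setEnabled(true);   // activate another
    rec.commitChanges();         // apply both changes atomically
}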

 


 

4.5     Speech Events

Speech engines, both recognizers and synthesizers, generate many types of events. Applications are not required to handle all events; however, some events are particularly important for implementing speech applications. For example, some result events must be processed to receive recognized text from a recognizer.

Java Speech API events follow the JavaBeans event model. Events are issued to a listener attached to an object involved in generating that event. All the speech events are derived from the SpeechEvent class in the javax.speech package.

The events of the javax.speech package are listed in Table 4-4.

Table 4-4 Speech events: javax.speech package
Name  Description  
SpeechEvent  Parent class of all speech events.  
EngineEvent  Indicates a change in speech engine state.  
AudioEvent  Indicates an audio input or output event.  
EngineErrorEvent  Sub-class of EngineEvent that indicates an asynchronous problem has occurred in the engine.

The events of the javax.speech.synthesis package are listed in Table 4-5.

Table 4-5 Speech events: javax.speech.synthesis package
Name  Description  
SynthesizerEvent  Extends the EngineEvent for the specialized events of a Synthesizer.  
SpeakableEvent  Indicates the progress in output of synthesized text.  

The events of the javax.speech.recognition package are listed in Table 4-6.

Table 4-6 Speech events: javax.speech.recognition package
Name  Description  
RecognizerEvent  Extends the EngineEvent for the specialized events of a Recognizer.  
GrammarEvent  Indicates an update of or a status change of a recognition grammar.  
ResultEvent  Indicates status and data changes of recognition results.  
RecognizerAudioEvent  Extends AudioEvent with events for start and stop of speech and audio level updates.  

4.5.1     Event Synchronization

A speech engine is required to provide all its events in synchronization with the AWT event queue whenever possible. The reason for this constraint is that it simplifies the integration of speech events with AWT events and the Java Foundation Classes events (e.g., keyboard, mouse and focus events). This constraint does not adversely affect applications that do not provide graphical interfaces.

Synchronization with the AWT event queue means that the AWT event queue is not issuing another event when the speech event is being issued. To implement this, speech engines need to place speech events onto the AWT event queue. The queue is obtained through the AWT Toolkit:

EventQueue q = Toolkit.getDefaultToolkit().getSystemEventQueue();

The EventQueue runs a separate thread for event dispatch. Speech engines are not required to issue the events through that thread, but should ensure that thread is blocked while the speech event is issued.

Note that SpeechEvent is not a sub-class of AWTEvent, and that speech events are not actually placed directly on the AWT event queue. Instead, a speech engine is performing internal activities to keep its internal speech event queue synchronized with the AWT event queue to make an application developer's life easier.

 


 

4.6     Other Engine Functions

4.6.1     Runtime Engine Properties

Speech engines each have a set of properties that can be changed while the engine is running. The EngineProperties interface defined in the javax.speech package is the root interface for accessing runtime properties. It is extended by the SynthesizerProperties interface defined in the javax.speech.synthesis package, and the RecognizerProperties interface defined in the javax.speech.recognition package.

For any engine, the EngineProperties object is obtained by calling the getEngineProperties method defined in the Engine interface. To avoid casting the return object, the getSynthesizerProperties method of the Synthesizer interface and the getRecognizerProperties method of the Recognizer interface are also provided to return the appropriate type. For example:


{
    Recognizer rec = ...;
    RecognizerProperties props = rec.getRecognizerProperties();
}

The EngineProperties interface provides three types of functionality: methods to attach and remove PropertyChangeListener objects so that an application is notified when a property changes, a getControlComponent method that can return an engine-provided AWT Component for customizing the engine, and a reset method that returns all the engine's properties to their default values.

The SynthesizerProperties and RecognizerProperties interfaces define the sets of runtime features of those engine types. The specific properties defined by these interfaces are described in Chapter 5 and Chapter 6 respectively.

For each property there is a get and a set method, both using the JavaBeans property patterns. For example, the methods for handling a synthesizer's speaking volume are:

float getVolume();
void setVolume(float volume) throws PropertyVetoException;

The get method returns the current setting. The set method attempts to set a new volume. A set method throws an exception if it fails. Typically, this is because the engine rejects the set value. In the case of volume, the legal range is 0.0 to 1.0. Values outside of this range cause an exception.

The set methods of the SynthesizerProperties and RecognizerProperties interfaces are asynchronous - they may return before the property change takes effect. For example, a change in the voice of a synthesizer may be deferred until the end of the current word, the current sentence or even the current document. So that an application knows when a change occurs, a PropertyChangeEvent is issued to each PropertyChangeListener attached to the properties object.

A property change event may also be issued because another application has changed a property, because changing one property affects another (e.g., changing a synthesizer's voice from male to female will usually cause an increase in the pitch setting), or because the property values have been reset.
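For example, a hedged sketch that attaches a java.beans.PropertyChangeListener to a synthesizer's properties and then requests a new volume (the value 0.8f is illustrative):

SynthesizerProperties props = synth.getSynthesizerProperties();

props.addPropertyChangeListener(new PropertyChangeListener() {
    public void propertyChange(PropertyChangeEvent e) {
        System.out.println(e.getPropertyName() + " is now " + e.getNewValue());
    }
});

try {
    props.setVolume(0.8f);  // may take effect later; the listener reports when it does
} catch (PropertyVetoException e) {
    // The engine rejected the requested value
}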

4.6.2     Audio Management

The AudioManager of a speech engine is provided for management of the engine's speech input or output. For the Java Speech API Version 1.0 specification, the AudioManager interface is minimal. As the audio streaming interfaces for the Java platform are established, the AudioManager interface will be enhanced for more advanced functionality.

For this release, the AudioManager interface defines the ability to attach and remove AudioListener objects. For this release, the AudioListener interface is simple: it is empty. However, the RecognizerAudioListener interface extends the AudioListener interface to receive three audio event types (SPEECH_STARTED, SPEECH_STOPPED and AUDIO_LEVEL events). These events are described in detail in Chapter 6. As a type of AudioListener, a RecognizerAudioListener is attached and removed through the AudioManager.
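For example, a sketch that attaches a RecognizerAudioAdapter (the empty adapter implementation of RecognizerAudioListener) to report the audio level:

rec.getAudioManager().addAudioListener(new RecognizerAudioAdapter() {
    public void audioLevel(RecognizerAudioEvent e) {
        System.out.println("Audio level: " + e.getAudioLevel());
    }
});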

4.6.3     Vocabulary Management

An engine can optionally provide a VocabManager for control of the pronunciation of words and other vocabulary. This manager is obtained by calling the getVocabManager method of a Recognizer or Synthesizer (it is a method of the Engine interface). If the engine does not support vocabulary management, the method returns null.

The manager defines a list of Word objects. Words can be added to the VocabManager, removed from the VocabManager, and searched through the VocabManager.

The Word class is defined in the javax.speech package. Each Word is defined by the following features: a written form, an optional spoken form, optional pronunciations, and grammatical categories.




Java™ Speech API Programmer's Guide
Copyright © 1997-1998 Sun Microsystems, Inc. All rights reserved
Send comments or corrections to javaspeech-comments@sun.com