Speech recognition in consumer electronics generally
takes one of two forms: speaker-independent or speaker-dependent
recognition.
Devices utilizing the speaker-independent type can be used
by any consumer, "right out of the box", so to speak. Speaker-
independent patterns are generated by taking a large number of
samples across a target demographic for the device (young girls,
yuppies, or even dogs!) and then averaging the patterns to create
a somewhat "universal" template. Speaker-independent devices tend
to have a limited vocabulary, both because of storage constraints
and because of the time and resources required to collect a good
representative sample for every word. Usually, recordings of about
500 different people saying the same word are required to produce a
worthwhile speaker-independent template.
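As a rough illustration of the averaging step, the sketch below
assumes each recording has already been reduced to a fixed-length
list of numeric features. That representation and the name
average_template are assumptions made for the example; a real device
would use its own feature format and far more than three recordings.

```python
def average_template(samples):
    """Average equal-length feature vectors element by element."""
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same length")
    count = len(samples)
    # zip(*samples) groups the first feature of every sample, then the
    # second feature of every sample, and so on.
    return [sum(column) / count for column in zip(*samples)]


# Three made-up recordings of the same word (hypothetical feature values).
samples = [
    [1.0, 4.0, 2.0],
    [3.0, 2.0, 2.0],
    [2.0, 3.0, 5.0],
]
template = average_template(samples)
print(template)  # [2.0, 3.0, 3.0]
```

No single speaker has to match the resulting template exactly; it
sits in the middle of the group, which is what lets the device work
for people it has never heard.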
Speaker-dependent recognition has the advantage of a flexible
vocabulary, because the device learns its words from the user. For
example, a "password" journal utilizing speech recognition ought to
have the ability to learn any particular password that the consumer
wants to use. Speaker-dependent templates are generally compressed
on the fly and stored in flash memory.
Speaker-dependent technology is ideal for applications where security
is of primary concern, such as the aforementioned password journal.
A journal might open when Little Susie says the password, "bacon fat",
but if her brother Billy tries to say "bacon fat" into the microphone,
the journal won't open--unless, of course, he is adept enough at
impersonating his sister!
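The journal example can be sketched in a few lines. This is only a
toy model under stated assumptions: the class PasswordJournal, its
max_distance setting, and the three-number "voice features" are all
hypothetical, and real firmware would compress the template and keep
it in flash rather than in a Python list.

```python
import math


class PasswordJournal:
    """Toy speaker-dependent lock: one owner, one stored template."""

    def __init__(self, max_distance):
        # max_distance is a hypothetical tuning value, not a real spec.
        self.max_distance = max_distance
        self.template = None

    def enroll(self, features):
        """Learn whatever password the owner chooses to say."""
        self.template = list(features)

    def try_open(self, features):
        """Open only if the attempt is close enough to the template."""
        if self.template is None or len(features) != len(self.template):
            return False
        return math.dist(self.template, features) <= self.max_distance


journal = PasswordJournal(max_distance=1.0)
journal.enroll([0.9, 0.1, 0.4])        # Susie records "bacon fat"

susie_again = [0.8, 0.2, 0.4]          # same voice, slightly different
billy = [0.1, 2.0, 1.5]                # same words, different voice

print(journal.try_open(susie_again))   # True
print(journal.try_open(billy))         # False
```

If Billy's impersonation brought his features within max_distance
of the stored template, try_open would return True for him as well,
which is exactly the loophole described above.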
For both speaker-dependent and speaker-independent devices, a
"threshold" is often coded into the firmware. This threshold sets
how closely a spoken word must match the stored pattern, leaving
room for such variables as ambient room noise and variations in
the human voice (due to such things as colds). A speech
recognition device is said to have robust performance if it is
able to pick out the pattern it is looking for despite the presence
of noise.
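To see why the threshold value matters, here is a small sketch that
scores three utterances against one template at three different
thresholds. The feature values, the use of Euclidean distance, and
the name matches are assumptions chosen for illustration, not a
description of any particular chip.

```python
import math


def matches(template, utterance, threshold):
    """Accept the utterance if its distance from the template is small."""
    return math.dist(template, utterance) <= threshold


template = [2.0, 3.0, 3.0]

utterances = {
    "clean": [2.0, 3.0, 3.0],       # distance 0.0 from the template
    "noisy": [2.0, 3.0, 4.0],       # right word, noisy room: distance 1.0
    "wrong word": [5.0, 7.0, 3.0],  # distance 5.0
}

for threshold in (0.5, 2.0, 6.0):
    results = {name: matches(template, features, threshold)
               for name, features in utterances.items()}
    print(threshold, results)

# 0.5 {'clean': True, 'noisy': False, 'wrong word': False}
# 2.0 {'clean': True, 'noisy': True, 'wrong word': False}
# 6.0 {'clean': True, 'noisy': True, 'wrong word': True}
```

A threshold that is too tight rejects the right word whenever the
room is noisy, and one that is too loose accepts the wrong word. The
middle setting is the "robust" case: it tolerates the noise without
letting the wrong pattern through.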