Text to speech is the process of converting written text into audio of a humanlike voice. It typically consists of a number of stages:
  • The text is normalised: abbreviations, numbers and symbols are expanded into the words that would be spoken, e.g. "CRC $14.20 me@here" becomes "see are see fourteen dollars twenty me at here".
  • By analysing the punctuation and the grammar of the sentence, an appropriate intonation is worked out so that the speech does not sound monotonous.
  • The words are converted into phonemes with intonation information.
  • The phonemes are synthesised into audio.
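
As a rough illustration of the first stage only, the sketch below normalises the example string from the list above. It is a minimal toy, not how any particular product does it: the function names (normalise, normalise_token, number_to_words) and the LETTER_NAMES table are hypothetical, the table covers only the letters this example needs, and it assumes dollar amounts below 100 written with two decimal places.

```python
import re

# Hypothetical table of spoken letter names; only the letters needed here.
LETTER_NAMES = {"C": "see", "R": "are"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalise_token(token: str) -> str:
    """Expand one whitespace-separated token into speakable words."""
    # Dollar amounts such as $14.20 (assumed: under 100 dollars, two decimals).
    match = re.fullmatch(r"\$(\d{1,2})\.(\d{2})", token)
    if match:
        dollars, cents = int(match.group(1)), int(match.group(2))
        words = f"{number_to_words(dollars)} dollars"
        if cents:
            words += f" {number_to_words(cents)}"
        return words
    # All-capital abbreviations are spelled out letter by letter.
    if token.isalpha() and token.isupper():
        return " ".join(LETTER_NAMES.get(ch, ch.lower()) for ch in token)
    # The at sign is read aloud as the word "at".
    if "@" in token:
        return token.replace("@", " at ")
    return token


def normalise(text: str) -> str:
    """Normalise a sentence token by token."""
    return " ".join(normalise_token(t) for t in text.split())


print(normalise("CRC $14.20 me@here"))
```

A real normaliser needs far more rules than this (dates, ordinals, units, context-dependent abbreviations), which is why it is treated as a stage of its own.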

Within the field, "text to speech" is the name of the whole process, while "speech synthesis" usually refers only to the last stage: assembling the phonemes into audio.

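Currently there are two popular methods of speech synthesis: concatenative, which joins together short recorded segments of speech, and parametric, which generates the audio from a model of the voice's acoustic parameters.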

The top products in the field today are probably:

