After the electrical revolution took place in the early twentieth century, an understanding of circuitry and electrical engineering became commonplace. With that knowledge came an explosion of creative possibility, in some cases using electrical devices to model (and thus replace) their mechanical counterparts, and in others using the new techniques for completely novel purposes with no previous examples. As this happened, systems that used mechanical means to model speech fell out of favor, replaced by systems that used a completely electrical analog pathway, from sound source through modulation to output speaker.
While still primitive and rather difficult to make out, speech from these machines was more understandable than that of the best of their mechanical brethren. Methods of long-term storage and amplification made the technology more versatile as well, in some cases removing the need for a human attendant to press keys for each individual sound. The public was more interested too, after seeing and reading about legitimate scientists and engineers working on something that could be so eerily impressive to experience.
Some of the devices and techniques that came about during this period were:
The Voder: It's probably best to think of this machine as the direct ancestor of the vocoder, both in name and technology. Driving the sound generation were two sources, one sending out white noise, and the other a relaxation oscillator producing a sort of vocal cord buzz. A foot pedal varied the pitch of the relaxation oscillator so its intonation could be made less monotone, and a wrist bar -- the "Energy Switch" -- switched between noise and oscillation. From this source the sound went through a bank of bandpass filters, each of which passed a different portion of the speech frequency range. These filters were controlled by a ten-key keyboard, opening up when keys were depressed and staying closed and silent otherwise. Alongside the keyboard were three more keys, which excited transient frequencies in the filter bank to produce t/d, p/b, or k/g stops, depending on whether the sound was noise (unvoiced) or oscillation (voiced). Finally, there was a key that turned the amplification stage on or off, to quell any transients or noise that echoed through the filter bank for longer than would sound natural.
Invented by H. W. Dudley of Bell Labs in 1939, this machine (a particular model nicknamed Pedro, the Voder, in fact) debuted at the World's Fairs of 1939 in New York and 1940 in San Francisco. Operators had to train for up to a year (!) to play the machines well enough for the speech to be intelligible. Later this same idea -- modulation of a sound source with a filter bank -- would be used to create the vocoder, which instead of using a keyboard (mostly) had the modulation done by another source. If this source was speech, the vocoder would sound like a distorted voice, and so forth.
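The Voder's signal path described above (choose a buzz or a hiss, then open some of ten bandpass filters) can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of Dudley's circuit: the sample rate, the sawtooth standing in for the relaxation oscillator, the filter sharpness, and the ten band centers are all assumed values, and the stop keys and amplifier key are left out.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed, chosen only to keep the demo small

# Hypothetical band centers in Hz; the real Voder's bands are not given here.
BAND_CENTERS = [200, 400, 600, 900, 1200, 1600, 2000, 2500, 3000, 3500]


def buzz(pitch_hz, n):
    """Sawtooth wave standing in for the relaxation oscillator (voiced source)."""
    return [2.0 * ((i * pitch_hz / SAMPLE_RATE) % 1.0) - 1.0 for i in range(n)]


def hiss(n, seed=0):
    """White noise (unvoiced source)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def bandpass(signal, center_hz, q=5.0):
    """Second-order (biquad) bandpass filter with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha  # the b1 coefficient is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def voder(keys, voiced, pitch_hz=110.0, n=800):
    """keys: ten gains in 0..1, one per filter (0 = key up, filter closed).
    voiced: the wrist bar, True for buzz and False for hiss.
    pitch_hz: the foot pedal."""
    if len(keys) != len(BAND_CENTERS):
        raise ValueError("expected one key value per band")
    source = buzz(pitch_hz, n) if voiced else hiss(n)
    out = [0.0] * n
    for gain, center in zip(keys, BAND_CENTERS):
        if gain == 0.0:
            continue  # closed filter contributes nothing
        for i, s in enumerate(bandpass(source, center)):
            out[i] += gain * s
    return out
```

Calling voder([1.0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0], voiced=True) yields a low, hummed sound, while the same keys with voiced=False yield a filtered hiss. Playing speech on the real machine meant changing all of these inputs by hand, many times per second.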
Pattern Playback: An altogether different approach was taken in the late 1940s by Frank Cooper at Haskins Laboratories. Spectrography used a sound's frequency content to modulate a light filter, so that as the sound changed, a changing image was exposed onto moving photographic film. Intrigued by this still-new invention, Cooper set up a system whereby light passed through a spectrographic print and was processed to become the sound recorded on the print. This method was not only a new way to save recorded sound for an indefinite period of time, but also made it possible to synthesize speech wholesale by drawing in light on film. Hand-made "voiceprints" paved the way for psychological speech perception experimentation throughout the 1950s, which discovered many of the vocal traits and cues important in human speech processing.
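The idea of turning a drawn spectrogram back into sound can be sketched digitally. In the rough Python illustration below, each row of a hand-made grid switches on one sine tone and each column is one slice of time. The grid layout, the 120 Hz spacing between tones, the frame length, and the sample rate are assumptions made for the sketch, and the optics of the real machine are not modeled at all.

```python
import math

SAMPLE_RATE = 16000     # Hz; assumed
FRAME_LEN = 160         # samples per pattern column (10 ms at 16 kHz); assumed
FUNDAMENTAL_HZ = 120.0  # spacing between tones; assumed for this sketch


def play_pattern(pattern):
    """Turn a drawn pattern into samples.

    pattern[row][col] is a brightness from 0 to 1. Row k drives a sine tone
    at (k + 1) * FUNDAMENTAL_HZ, and column col covers one time frame.
    All rows are assumed to have the same number of columns.
    """
    n_frames = len(pattern[0])
    out = []
    for i in range(n_frames * FRAME_LEN):
        col = i // FRAME_LEN
        t = i / SAMPLE_RATE
        sample = 0.0
        for k, row in enumerate(pattern):
            if row[col]:
                freq = (k + 1) * FUNDAMENTAL_HZ
                sample += row[col] * math.sin(2.0 * math.pi * freq * t)
        out.append(sample)
    return out


# A tiny hand-drawn "voiceprint": a low band that stays on, and a higher
# band that steps upward over three frames.
drawing = [
    [1, 1, 1],  # 120 Hz
    [0, 0, 0],  # 240 Hz
    [1, 0, 0],  # 360 Hz
    [0, 1, 0],  # 480 Hz
    [0, 0, 1],  # 600 Hz
]
samples = play_pattern(drawing)
```

Editing the drawing and listening again is the digital equivalent of the perception experiments mentioned above: paint a band in, take a band out, and hear what changes.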
Electrical Vocal Tract: Much closer conceptually to the mechanical speech synthesis machines, these devices were circuits set up in such a way as to model the shape and resonance of the human vocal tract. A researcher named H.K. Dunn created such a model in which an oscillator provided the basic sound, which was passed through thirteen inductance coils, each of which was tuned to resonate like the open area of one part of the vocal tract. One of the coils acted as the tongue, opposing resonance rather than increasing it. This coil had a dial to select vertical tongue position, and could be slid back and forth on the machine to simulate horizontal position. There was another variable inductor at the end of the chain, which acted as the lips and could be turned all the way to silence. Having a cumbersome interface and no noise generator, this machine could not produce actual speech, though its vowels were said to be uncannily accurate.
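To see why a chain of coils can stand in for a throat, it helps to use the textbook acoustic-to-electrical analogy, in which a short piece of tube behaves like a series inductor (the mass of the air in it) plus a shunt capacitor (the springiness of that air). The Python sketch below chains thirteen such sections and scans for the first resonance. It is not Dunn's circuit: the component formulas, the 17.5 cm tract length, the air constants, the lossless sections, and the ideal open lips are all assumptions, and the special tongue coil is not modeled.

```python
import math

RHO = 1.2        # air density in kg/m^3 (approximate)
C_SOUND = 350.0  # speed of sound in m/s (approximate, warm air)


def section_lc(length_m, area_m2):
    """Electrical analog of one short tube section."""
    inductance = RHO * length_m / area_m2                      # air mass
    capacitance = length_m * area_m2 / (RHO * C_SOUND ** 2)    # air spring
    return inductance, capacitance


def chain_d_term(freq_hz, areas, tract_length_m=0.175):
    """D entry of the chain (ABCD) matrix for a ladder of T-sections.

    With the lips modeled as an ideal open end (zero pressure), flow at the
    glottis equals D times flow at the lips, so D = 0 marks a resonance.
    """
    w = 2.0 * math.pi * freq_hz
    length = tract_length_m / len(areas)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # start from the identity matrix
    for area in areas:
        ind, cap = section_lc(length, area)
        z = 1j * w * ind  # series impedance of the section
        y = 1j * w * cap  # shunt admittance of the section
        sa = 1.0 + z * y / 2.0
        sb = z * (1.0 + z * y / 4.0)
        sc = y
        sd = sa
        a, b, c, d = a * sa + b * sc, a * sb + b * sd, c * sa + d * sc, c * sb + d * sd
    return d.real  # D is purely real for lossless sections


def first_resonance(areas, lo_hz=50, hi_hz=3000):
    """Scan upward in 1 Hz steps for the first sign change of D."""
    prev = chain_d_term(lo_hz, areas)
    for f in range(lo_hz + 1, hi_hz):
        cur = chain_d_term(f, areas)
        if (prev > 0.0) != (cur > 0.0):
            return f
        prev = cur
    return None


# Thirteen equal sections of 4 cm^2 each: a plain, neutral tube.
neutral = [4e-4] * 13
```

For the neutral tube, first_resonance(neutral) lands close to the quarter-wavelength value of a 17.5 cm pipe closed at one end, 350 / (4 * 0.175) = 500 Hz. Narrowing a few entries in the list, as a tongue would, moves the resonances, which is roughly what turning and sliding a coil accomplished on the hardware.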
Formant Synthesis: Formant synthesis, unlike the topics above, is still an active subject of research and really deserves a whole node of its own. It falls somewhere in between all three of the above approaches, taking a little bit of technique from each. Essentially, formant synthesis pushes either a periodic tone or noise through a filter bank, rather like the Voder and its vocoder cousins. Unlike the Voder, however, the filter bank isn't directly controlled by an operator, but by various oscillators and other signal generators. Each signal generator modulates the filter bank in such a way as to produce the sound of a formant, one of the energy bands that appear on spectrographic recordings of speech. These generators can themselves be modulated by different means to model the movement of a human vocal tract during their given formant, making the synthesized sound more realistic. In this way speech can be synthesized by triggering different formants in the order in which they make up a phoneme, and then stringing the phonemes together to create speech.
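A minimal digital sketch of the idea: a pulse train plays the part of the voice source, and each formant is a two-pole resonator whose center frequency glides from one target to another, standing in for the moving vocal tract. The resonator formula is the common second-order digital one; the sample rate, the cascade arrangement, the straight-line glides, and the formant targets are rough assumptions for illustration rather than values from any particular synthesizer.

```python
import math

SAMPLE_RATE = 10000  # Hz; assumed


def resonator_coeffs(freq_hz, bandwidth_hz):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / SAMPLE_RATE
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    return a, b, c


def pulse_train(pitch_hz, n):
    """One unit pulse per pitch period: a crude vocal cord buzz."""
    period = int(SAMPLE_RATE / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize(formant_tracks, source):
    """Run the source through one gliding resonator per formant, in series.

    formant_tracks: list of (start_hz, end_hz, bandwidth_hz) tuples.
    """
    signal = list(source)
    n = len(signal)
    for start_hz, end_hz, bandwidth_hz in formant_tracks:
        y1 = y2 = 0.0
        out = []
        for i, x in enumerate(signal):
            # Straight-line glide from the start target to the end target.
            freq = start_hz + (end_hz - start_hz) * i / max(n - 1, 1)
            a, b, c = resonator_coeffs(freq, bandwidth_hz)
            y = a * x + b * y1 + c * y2
            y2, y1 = y1, y
            out.append(y)
        signal = out
    return signal


# Very rough targets for a glide from an "ah"-like to an "ee"-like vowel.
VOWEL_GLIDE = [
    (700.0, 300.0, 60.0),     # first formant
    (1200.0, 2300.0, 90.0),   # second formant
    (2500.0, 3000.0, 150.0),  # third formant
]
samples = synthesize(VOWEL_GLIDE, pulse_train(100.0, 3000))
```

Swapping the pulse train for random noise gives a whispered version of the same glide, which mirrors the choice between tone and noise described above.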