Google’s Text-to-Speech System Nearly Mimics Human Voice

Tacotron 2, Google’s second-generation text-to-speech synthesizer, sounds almost like a human being. That’s according to a research paper Google published in December 2017, which has not yet been peer reviewed.

At a high level, the system consists of two neural networks. The first converts text into a spectrogram, a visual representation of how sound frequencies vary over time. The second, called WaveNet, reads the spectrogram and produces a near-human voice. WaveNet was developed by DeepMind, the artificial intelligence lab owned by Google’s parent company, Alphabet.
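To make the intermediate representation concrete, here is a minimal sketch of a spectrogram, the frequency-versus-time picture of sound that sits between the two networks. It is not Tacotron 2 code: it runs in the opposite direction, deriving a spectrogram from existing audio with a plain short-time Fourier transform, whereas the first network predicts one from text. The test tone, the 8 kHz sample rate, the frame sizes, and the function name `spectrogram` are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a simple short-time Fourier transform.

    Returns an array of shape (n_frames, frame_len // 2 + 1):
    one row per time step, one column per frequency bin.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Illustrative input (an assumption, not real speech): one second of audio
# that plays a 440 Hz tone, then switches to 880 Hz halfway through.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))

spec = spectrogram(signal)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # centre frequency of each bin

# The strongest frequency in an early frame and in a late frame.
peak_early = freqs[spec[5].argmax()]
peak_late = freqs[spec[-5].argmax()]
print(spec.shape, peak_early, peak_late)
```

Each row of `spec` describes which frequencies are present during one short slice of time, so the jump from the lower tone to the higher one shows up as the peak moving to a higher column. A vocoder such as WaveNet has the harder job of going the other way, from rows like these back to an audio waveform.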

