Neuroengineers at Columbia University have successfully converted human neural activity into synthesized speech.
Led by Dr. Nima Mesgarani, the team trained a deep-learning algorithm to interpret and synthesize the neural patterns of test subjects. In doing so, they were able to “reconstruct the words a person hears with unprecedented clarity.”
The test subjects, a group of five people undergoing brain surgery for epilepsy, were asked to listen to audio of people speaking for 30 minutes while researchers measured their brain activity. These recordings were used to train the algorithm.
Afterwards, the researchers played audio of the numbers zero to nine being spoken while again recording the subjects’ brain activity. The algorithm then translated the neural signals and attempted to synthesize them into speech.
“The end result was a robotic-sounding voice reciting a sequence of numbers.”
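The two-phase procedure described above (train on brain activity recorded during listening, then reconstruct audio from new brain activity) can be illustrated with a toy sketch. This is not the researchers’ method: it swaps their deep-learning algorithm for a plain ridge-regression decoder, and it runs on synthetic data rather than real recordings. Every name here (`fit_linear_decoder`, `simulate_recording`, the feature and channel counts) is a hypothetical choice made for illustration.

```python
import numpy as np

def fit_linear_decoder(neural, audio_features, ridge=1.0):
    """Fit ridge-regression weights mapping neural channels to audio features.

    Assumption: a linear decoder stands in for the study's deep-learning model.
    """
    n_channels = neural.shape[1]
    gram = neural.T @ neural + ridge * np.eye(n_channels)
    return np.linalg.solve(gram, neural.T @ audio_features)

def reconstruct(neural, weights):
    """Predict audio features from neural activity with the fitted weights."""
    return neural @ weights

def simulate_recording(audio_features, mixing, rng, noise=0.1):
    """Hypothetical stand-in for a brain recording: a noisy linear mix of the audio."""
    n_samples, n_channels = audio_features.shape[0], mixing.shape[1]
    return audio_features @ mixing + noise * rng.standard_normal((n_samples, n_channels))

rng = np.random.default_rng(0)
n_audio, n_channels = 8, 32  # made-up sizes, not taken from the study
mixing = rng.standard_normal((n_audio, n_channels))

# Phase 1: "listening" data used to train the decoder.
train_audio = rng.standard_normal((2000, n_audio))
train_neural = simulate_recording(train_audio, mixing, rng)
weights = fit_linear_decoder(train_neural, train_audio)

# Phase 2: new audio (the spoken digits in the study) decoded from new recordings.
test_audio = rng.standard_normal((200, n_audio))
test_neural = simulate_recording(test_audio, mixing, rng)
predicted = reconstruct(test_neural, weights)

# Compare the reconstruction with the audio that was actually played.
correlation = np.corrcoef(predicted.ravel(), test_audio.ravel())[0, 1]
print(f"reconstruction correlation: {correlation:.2f}")
```

The point of the sketch is the split between the two phases: the decoder never sees the test audio during training, just as the algorithm in the study was trained on one set of recordings and then asked to reconstruct a different one.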
According to Dr. Mesgarani, “people could understand and repeat the sounds about 75% of the time, which is well above and beyond any previous attempts.”
You can also listen to the “robotic” recordings, which were created using four different methods. Note that what the neuroengineers accomplished here wasn’t direct thought-to-speech, but rather the interpretation of brain activity related to speech the subjects were listening to.
However, it’s not difficult to see how this same deep-learning algorithm may one day be used to allow true thought-to-speech. That’s exactly what these neuroengineers intend to work on next: They hope to eventually create a wearable implant.
“In this scenario,” said Dr. Mesgarani, “if the wearer thinks ‘I need a glass of water,’ our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech.”
You can read the full study in the Nature-family journal Scientific Reports.