For The First Time, AI Can Teach Itself Any Language On Earth
By Elena Boaghi | December 2, 2017
To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world–in order, Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese. Google’s system uses human-supervised neural networks that compare parallel texts–books and articles that have previously been translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, and so acquires the ability to translate quickly between them. Sometimes the translations are funny or don’t really capture the original meaning but, in general, they are functional and, over time, they’re getting better and better.
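To make the idea of learning from parallel texts concrete, here is a minimal sketch in Python. It is not Google's method, which relies on neural networks; it swaps in a much simpler counting approach (the Dice coefficient over sentence pairs) purely to show why human-translated pairs are enough to recover word equivalences. The function name `learn_dictionary` and the four-sentence English-Spanish corpus are made up for illustration.

```python
from collections import Counter
from itertools import product


def learn_dictionary(parallel_pairs):
    """Guess word equivalences from sentence pairs translated by humans.

    A count-based stand-in for a real system: each source word is paired
    with the target word it co-occurs with most consistently.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in parallel_pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word "meets" every target word in the same pair.
        pair_count.update(product(src_words, tgt_words))

    def dice(s, t):
        # 1.0 means the two words always appear together and never apart.
        return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}


# Hypothetical toy corpus: each pair is a sentence and its human translation.
toy_corpus = [
    ("the cat sleeps", "el gato duerme"),
    ("the dog sleeps", "el perro duerme"),
    ("the cat eats", "el gato come"),
    ("a dog eats", "un perro come"),
]

learned = learn_dictionary(toy_corpus)
print(learned)
```

With this toy corpus, every English word ends up paired with its Spanish counterpart. The catch is the input itself: the approach only works when someone has already translated the sentences.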
Google’s approach is good, and it works. But unfortunately, it’s not universally functional. That’s partly because supervised training requires a very long time and a lot of supervisors–so many that Google actually uses crowdsourcing–and partly because there just aren’t enough parallel texts translated between all the languages in the world. Consider this: According to the Ethnologue catalog of world languages, there are 6,909 living languages on Earth, and 414 of those account for 94% of humanity. Since Google Translate covers 103, that leaves 6,806 languages without automated translation–311 of them with more than one million speakers. In total, at least eight hundred million people can’t enjoy the benefits of automated translation.
The two new systems–which can translate words and sentences between any two languages–don’t learn by comparing large amounts of parallel texts translated by humans. They also don’t need supervision. Instead, they use unsupervised machine learning and compare unrelated texts in different languages. How does that work? Since languages group words in similar ways, the systems guess what the word equivalences are and build translation dictionaries from those guesses. From there, they figure out the sentence structure, evaluating their guesses by translating back and forth between the languages.
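The two steps described above, guessing word equivalences and then checking them by translating back and forth, can be sketched in a few lines of Python. This is a toy illustration under heavy assumptions, not either research team's implementation: the word vectors are invented by hand and assumed to sit in a shared space already, whereas the real systems have to learn that alignment from raw text. The names `guess_dictionary` and `round_trip_score` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def guess_dictionary(src_vecs, tgt_vecs):
    """For each source word, pick the target word with the closest vector."""
    return {
        s: max(tgt_vecs, key=lambda t: cosine(sv, tgt_vecs[t]))
        for s, sv in src_vecs.items()
    }


def round_trip_score(sentence, forward, backward):
    """Fraction of words that survive translating there and back again."""
    words = sentence.split()
    if not words:
        return 0.0
    there = [forward.get(w, "<unk>") for w in words]
    back = [backward.get(w, "<unk>") for w in there]
    return sum(a == b for a, b in zip(words, back)) / len(words)


# Hand-made vectors (an assumption): words used in similar ways get
# similar coordinates, in both languages.
english = {
    "cat": (0.90, 0.10, 0.00),
    "dog": (0.80, 0.30, 0.10),
    "sleeps": (0.00, 0.20, 0.90),
}
spanish = {
    "gato": (0.88, 0.12, 0.02),
    "perro": (0.79, 0.33, 0.08),
    "duerme": (0.05, 0.18, 0.92),
}

en_to_es = guess_dictionary(english, spanish)
es_to_en = guess_dictionary(spanish, english)
score = round_trip_score("cat sleeps", en_to_es, es_to_en)
print(en_to_es, score)
```

Here the guessed dictionary pairs "cat" with "gato", "dog" with "perro" and "sleeps" with "duerme", and the round trip returns the original sentence. A wrong guess would lower the score, and that is the signal a system can use to correct itself without any human-translated text.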
As UPV researcher Mikel Artetxe describes it: “Imagine that you give one person lots of Chinese books and lots of Arabic books–none of them overlapping–and the person has to learn to translate Chinese to Arabic. That seems impossible, right?” In fact, it seemed so impossible that Microsoft AI expert Di He–who inspired these two research projects–told Science that he was shocked to learn that “the computer could learn to translate even without human supervision.”
One caveat? The systems are not as accurate as current deep learning systems trained on parallel texts–but the fact that a computer can guess all this without any human guidance is, as Di He points out, nothing short of incredible. We’re just scratching the surface of this new learning method. It seems very likely that sometime soon, a true universal translator that allows us to talk to anyone in their native tongue won’t just be the stuff of sci-fi.