A group of researchers from the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT) has developed an artificial intelligence (AI) system capable of translating ‘dead’ languages.
The system, developed with the support of the Intelligence Advanced Research Projects Activity (IARPA) in a project led by MIT professor Regina Barzilay, aims to help linguists decipher languages that have been lost to history.
According to recent research, most of the languages that have existed are no longer spoken and dozens of dead languages are lost or have not been deciphered, as not enough is known about their grammar, vocabulary or syntax.
The AI system developed by the MIT researchers is capable of automatically deciphering a dead language without prior knowledge of its relationship to other languages.
The system can also determine the relationships between various languages on its own and has been used, among other things, to corroborate recent studies suggesting that the Iberian language is not related to Basque, according to an article from MIT.
This system is based on knowledge of historical linguistics, such as the fact that languages tend to evolve in certain predictable ways.
Although a given language rarely adds or removes an entire sound, certain sound substitutions are likely to occur. For example, a word with a “p” can change to a “b” in a descendant language, but it is less likely to change to a “k”.
The algorithm developed by Barzilay and MIT PhD student Jiaming Luo learns to embed the sounds of a language in a multidimensional space, where differences in pronunciation are reflected in the distance between the corresponding vectors. This makes it possible to find patterns of language change, segment the words of a dead language, and match them to words in a known language.
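The idea of distances between sound vectors can be illustrated with a minimal Python sketch. It is not the researchers' model: their system learns its embeddings, whereas the hand-written feature values below (place of articulation and voicing) and the function names `distance` and `likely_substitutions` are assumptions made up for this illustration.

```python
import math

# Hypothetical, hand-picked features for a few consonants (illustration only).
# First value: place of articulation (0 = lips, 1 = tongue tip, 2 = back of mouth).
# Second value: voicing (0 = voiceless, 0.5 = voiced), weighted lower than place.
SOUND_VECTORS = {
    "p": (0.0, 0.0),
    "b": (0.0, 0.5),
    "t": (1.0, 0.0),
    "d": (1.0, 0.5),
    "k": (2.0, 0.0),
    "g": (2.0, 0.5),
}


def distance(a: str, b: str) -> float:
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(SOUND_VECTORS[a], SOUND_VECTORS[b])


def likely_substitutions(sound: str) -> list[str]:
    """Other sounds ordered from closest (most plausible change) to farthest."""
    others = [s for s in SOUND_VECTORS if s != sound]
    return sorted(others, key=lambda s: distance(sound, s))


if __name__ == "__main__":
    print(distance("p", "b"))         # small: "p" and "b" differ only in voicing
    print(distance("p", "k"))         # larger: "p" and "k" differ in place
    print(likely_substitutions("p"))  # "b" comes first under these toy features
```

With these toy values, “b” sits closer to “p” than “k” does, which mirrors the observation that a “p” is more likely to become a “b” than a “k” in a descendant language.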
The team of researchers seeks to expand their efforts beyond connecting texts with related words in a known language and focus on identifying the semantic meaning of words even without knowing how they are read.
“These ‘entity recognition’ methods are commonly used in various word processing applications today and are highly accurate, but the key research question is whether the task is feasible without training data in the old language,” Barzilay said.