According to the research team, it is up to 23% more accurate than existing systems at speech-to-speech translation
A research team at the American technology company Meta has created an artificial intelligence model that can translate speech directly from one language into another.
Most existing machine translation systems are text-based or rely on a multi-step pipeline: speech recognition, then text-to-text translation, then text-to-speech conversion. In addition, existing speech-to-speech models cover far fewer languages than text-to-text models.
The new model, called SEAMLESSM4T, was designed to address these limitations. It can translate across up to 101 languages and may pave the way for fast, near-instantaneous translation, according to the study published in the journal Nature.
Specifically, it supports speech-to-speech translation from 101 source languages into 36 target languages, speech-to-text translation (from 101 languages into 96), text-to-speech translation (from 96 languages into 36), text-to-text translation (96 languages) and automatic speech recognition (96 languages). According to the research team, SEAMLESSM4T's speech-to-speech translations are up to 23% more accurate than those of existing systems.
In a companion article commenting on the research in the same journal, Tanel Alumäe, an associate professor at Tallinn University of Technology in Estonia, notes that the model's biggest virtue is that all the data and code needed to run and optimize the technology are publicly available. However, he points out that some obstacles remain, such as the limited number of languages it can translate into and the difficulty of translating conversations in noisy places or between people with strong accents, situations that human translators handle more easily.
Allison Koenecke, an assistant professor in Cornell University's Department of Computer Science, said it was noteworthy that the researchers quantified the toxic, harmful or offensive language that translations might introduce and checked for any gender bias the model might produce in its translations. "While speech technologies may be more efficient and cost-effective at transcribing and translating than humans (who are also prone to biases and errors), it is imperative to understand the ways in which these technologies fail, disproportionately so for certain demographic groups," she notes.