Machine Translation and the use of Machine Learning, Artificial Intelligence and Language Processing technologies have received a lot of attention in the last couple of years. The most recent hype cycle of note started with Google’s 2016 research paper about its Neural Machine Translation (NMT) system and the subsequent announcement that its flagship product, Google Translate, was switching to an NMT engine. The marked improvement in translation quality sparked a wave of media coverage and announcements from other players in the industry that they too were moving to NMT. These include Andovar’s Machine Translation (MT) partner Omniscien, which has not only built its own NMT engines but also created a set of tools to prepare content for MT and further improve the output.
Let’s start with a short definition of Neural Machine Translation courtesy of Wikipedia:
Neural machine translation (NMT) is an approach to machine translation that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.
Understood? Good. Not understood? Even better. There’s a whole ocean of misinformation waiting for you.
Media coverage and discussion of MT are extremely polarized. On one hand, there are the MT providers, along with the mainstream media outlets that tend to repeat the providers’ most exaggerated claims to grab readers’ attention. You can recognize those articles by their repeated use of catch-phrases such as "Babel Fish", "singularity" or "Star Trek’s Universal Translator".
On the other hand, one doesn't need to search long to find a range of attitudes to MT among translators. These typically fall into one of the following three categories:
So is an MT revolution just around the corner? Will it render all translators unemployed, obliterate the need to learn languages and bring about world peace? Or is it all just snake oil for the gullible?
Anyone with any real knowledge of MT will reply that the truth lies in the middle.
Even the most fervent proponents of MT agree that it is not perfect all of the time, and even its most zealous opponents admit that it does surprisingly well some of the time. Machine Translation, whether Neural or not, works well with some types of content and in some language pairs. Despite this, the middle ground remains elusive.
Instead of arguing whether MT is all good or all bad, let’s admit that it’s a bit of both (with the good parts being particularly good). What this means in practice is that it should be considered in all translation projects and either accepted or dismissed based on the results rather than preconceptions.
Here are some guidelines to consider for the application of MT:
Less suitable | More suitable |
---|---|
Non-structured | Structured |
Living language | Controlled language |
Long sentences | Short sentences |
Literature, marketing content | Technical content |
Context-dependent | Context-independent |
Ambiguous | Straightforward |
User-generated content | Professionally written |
Informal | Formal |
High-risk (medical, legal, etc.) | Low-risk |
Overarching all of these is the purpose of the translation and the quality expectations that go with it. In some cases, an imperfect translation may be better than no translation, but in others (legal texts, for example) only the highest human quality will do.
All of this applies not only to whole projects, but also to their individual sections. Andovar recommends triaging the content: consider sending the legalese to professional translators, the product descriptions to MT with post-editing, and the structured technical documentation to MT only. After all, some MT output really is excellent.
MT's main drawback is that it cannot discern cultural, regional and social differences, so its output often needs human post-editing. Depending on the nuances of the language, even an engine like Google Translate can still produce inaccurate translations. At its core, MT is not as intuitive as some would believe, but it is still incredibly useful for turning large volumes of text into fairly comprehensible sentences.
Computer-aided translation (CAT) is a system that blends MT with human review. With CAT, human review is mandatory, which makes it the approach with the highest success rate in terms of accuracy. Where MT lacks intuition, human reviewers can handle the nuances of language, ideally yielding natural-sounding communication and text. It is this blended approach that keeps humans involved and garners the best results, and it is one that we here at Andovar strongly believe in.
To read more about the various types of translation technologies available today, feel free to check out our Ultimate Guide to Translation Automation Technologies.
In addition, to hear more about Machine Translation and its role in the localization industry, follow this link to watch a video presentation on the topic by Andovar's Chief Executive Officer, Conor Bracken.
At Andovar, we are as interested as you are in the evolutionary steps of the internet and the world at large. We want to be right there at the forefront of technology disruptions, integrating new system updates as they become available.
Our staff have become experts in the best systems available, and we are ready to help you access the best hardware and software and make a smooth transition to them in the future. Andovar’s Language Technology Tools are here to make your life a little easier.
Please feel free to get in touch if you have any questions about translation technologies or to see how we can help you with your next localization project!