Machine Translation: Where We Came From, Where We Are, and Why It’s Still Not Perfect
As a marketing specialist or product manager, you’ve got a lot on your mind and your desk: product development. Marketing strategies. Competitor analysis. Ad campaigns. Social media content. Documentation/product literature (manuals, quick guides, safety instructions). And yes, when everything else is said and done, translation (you can’t translate what isn’t there, which is the reason that translators are always the last ones to get an invitation to the party/product release workflow).
Now, as someone that busy, you probably don’t translate yourself. (If you do: congratulations! You are a polyglot and a hardworking person!) You probably don’t even do it in-house (unless you work for a well-staffed medium-sized company or enterprise). So the nitty-gritty of the translation process may not be the first thing on your mind when you face a tight deadline for a safety instructions booklet that needs to be done by Monday, or the cost of a white paper that you’d like to use in the Italian and Spanish markets without really knowing if more than three people will read it. So, where do you start?
Well, maybe by reading this article.
1. A Perfect World: No Languages, No Translation (And, Unfortunately, No Cultures)
Unless you are a true polyglot (or have never left your city or country), you have probably been at a loss for words more than once in your life. We live in Babylon. People speak different languages. Heck, you may live in a country where it’s almost impossible to understand the weird dialect that those people speak up North / down South; it might as well be Martian. And when you stand unhappily before a shop sign in Taiwan, or try to work your way through a verbose menu in an upper-class French restaurant, you may secretly wish for a world without languages (and, while we’re at it, without borders and customs).
However, deep down, we all know that a uniform, single-language world isn’t really desirable. Planet Earth is such a wonderful, confusing, and fun place because we come from different cultures, see things differently, and use different words for different things. Languages aren’t hard-coded systems with one-to-one equivalents for every object and sentiment under the sun; they represent history and cultures. Fifty words for snow aren’t of much use in the desert, and you’ll probably agree that the German “Fisimatenten” (which probably stems from the French “Visitez ma tente”) has a different, shall we say, vibe than the English “Poppycock”.
So, if we accept that languages are important and that we should try to understand each other (while avoiding the Babel fish [1] dilemma), we’ll need someone or something to help.
Which brings us to humans.
2. A Brief History of Translation
Human Translation
When we think about translation, we often imagine a bilingual human with a good understanding of the subject matter, carefully reading a sentence in one language and rendering it faithfully in another, maybe consulting a dictionary or encyclopedia when encountering an odd word or concept.
But what happens when we try to automate this process? Why isn’t it as simple as stringing together words from a dictionary or feeding a document into a “magic” machine?
In the following section, we’ll briefly explore the history of machine translation (MT), its current capabilities and limitations, and why even the most advanced systems still need human guidance or careful preparation if you insist on top-notch quality.
We’ll also touch on why machine translation can be incredibly cheap per word — and yet why the total cost of high-quality translated content can still be significant.
Why Word-by-Word Translation Doesn’t Work
The most basic approach to translation, word by word, quickly falls apart. Languages don’t match up one-to-one. Even a simple English sentence like “I run a business” cannot be translated into Italian or Korean by simply replacing each word with its dictionary equivalent, as grammatical structure, idioms, and context all differ.
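To make the problem concrete, here is a minimal sketch of word-by-word substitution. The tiny English-to-Italian dictionary is made up for illustration; a real dictionary would offer several candidates per word, which only makes the choice harder.

```python
# Toy bilingual dictionary (hypothetical entries, chosen for illustration only).
EN_TO_IT = {
    "i": "io",
    "run": "correre",  # "run" in the sense of moving fast on foot
    "a": "un",
    "business": "azienda",
}


def translate_word_by_word(sentence: str, dictionary: dict[str, str]) -> str:
    """Replace each word with its dictionary entry, keeping unknown words as-is."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


naive = translate_word_by_word("I run a business", EN_TO_IT)
print(naive)  # io correre un azienda
```

The result uses the wrong sense of “run”, leaves the verb unconjugated, and ignores the gender of the noun. An Italian speaker would more likely write something like “Gestisco un’azienda”.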
Rule-Based Translation: Another Dead End
As early as the 1950s, researchers realized that translation is not just about substituting words, but about understanding meaning, grammar, and cultural context. Early efforts focused on simple rule-based systems, which required huge sets of manually defined grammar rules and dictionaries. While theoretically sound, these approaches were slow to build and limited in flexibility due to the immense complexity of human languages.
Statistical Machine Translation: The First Big Leap
Around the 1990s, we saw the rise of statistical machine translation (SMT). The basic idea was to analyze massive bilingual corpora (parallel texts) and statistically model how words and phrases in one language tend to correspond to those in another.
Systems like the early versions of Google Translate were based on statistical machine translation. These systems “learned” probabilities of phrases, allowing them to produce more natural output than pure rule-based systems.
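The core idea can be sketched in a few lines. This is a heavily simplified illustration, not how any particular engine was built: the hand-aligned phrase pairs are hypothetical and stand in for what a real system would extract from millions of sentence pairs.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-aligned phrase pairs (English, German).
ALIGNED_PHRASES = [
    ("play", "spielen"),
    ("play", "spielen"),
    ("play", "Wiedergabe"),
    ("table", "Tisch"),
    ("table", "Tabelle"),
    ("table", "Tabelle"),
    ("table", "Tabelle"),
]


def phrase_probabilities(pairs):
    """Estimate P(target | source) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }


probs = phrase_probabilities(ALIGNED_PHRASES)
best_for_table = max(probs["table"], key=probs["table"].get)
print(best_for_table, probs["table"][best_for_table])  # Tabelle 0.75
```

Nothing in this sketch looks at meaning: the most frequent option in the training data simply wins.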
However, statistical machine translation had significant drawbacks:
- Surface-level pattern recognition: Statistical machine translation doesn’t understand meaning. It is simply based on the statistical likelihoods of word sequences.
- Fragmentation: Longer, complex sentences often resulted in awkward or even incomprehensible translations.
- Language coverage: Good statistical machine translation results depended heavily on having large, high-quality bilingual corpora — which are not available for many language pairs or specialized domains.
By the mid-2010s, statistical machine translation had essentially reached a dead end. Incremental improvements required massive data and computing power but yielded diminishing returns.
Neural Machine Translation and the Rise of Language Models
The next breakthrough came with neural machine translation (NMT). Here, instead of handling words and phrases as isolated units, the system uses artificial neural networks to model entire sentences and contexts holistically.
NMT fundamentally changed how machine translation works by “encoding” a sentence’s meaning before generating a new sentence in the target language. The resulting translations became much more fluent and natural, especially in major language pairs such as English and French.
Today’s cutting-edge machine translation engines are even more advanced, using large language models (LLMs) similar to the models that power interactive tools like Claude and ChatGPT. These machine translation engines can capture subtler nuances and handle some context beyond individual sentences.
3. Where We Are Now: Impressive, but Not Infallible
Modern machine translation engines, like DeepL, Google Translate, or Microsoft Translator, can achieve excellent results for technical manuals, legal texts, medical instructions, and other structured, non-fiction content. These are use cases where language tends to be precise and consistent, and where context is usually local to each sentence or paragraph.
However, several challenges remain:
Segment-Level Translation
Most machine translation systems still translate sentence by sentence (or segment by segment). They do not “see” the entire document, meaning they may miss subtle links, repeated terminology, or implied meanings that span multiple sections.
As a simple example, if sentence B refers to the subject of sentence A as “it”, every human reader and translator will know what “it” refers to. A machine may or may not, leading to wrong translations. In German, for example, a machine (“Maschine”) is grammatically feminine, but a device (“Gerät”) is neuter, so “it” has to become “sie” in one case and “es” in the other. Getting this wrong leads to translations that look sloppy or, well, machine-translated.
Better Results for Long Segments
Paradoxically, the longer and more complete a sentence or segment, the better machine translation systems tend to perform. Short fragments — headings, UI labels, table cells — are harder. For example, the English word “Table” could refer to a piece of furniture or a data grid. Without context, a machine translation engine may choose the wrong meaning, leading to embarrassing errors.
Ambiguity and Idioms
Generic phrases or ambiguous words often make machine translation engines stumble. A simple heading like “Play” might be translated as “Spielen” in German — which is fine for a game, but completely wrong if it refers to a playback button on a device.
4. “A Day’s Work” versus “Too Cheap to Meter”
A human translator needs to make a living and can only process a certain number of words per day, which varies depending on the complexity of the source material and other factors. This leads to “hard numbers” that will often surprise clients.
A typical translator may charge €0.10 to €0.20 per word (sometimes more for specialized content), reflecting their expertise and time. As a rule of thumb, in pre-machine-translation times, translators used to translate between 2000 and 3000 words per day. Now that machine pretranslation is usually part of most translators’ workflow, that number has increased to 5000-6000 words per day – one of the many cases where technology hasn’t made people “free” (or useless), but raised the bar. Needless to say, many translators are not happy about this development, because most clients are aware of this technological progress and require increased output (or lowered prices for translation post-editing instead of “real” translation work).
In contrast, using machine translation costs a fraction of a (US Dollar or Euro) cent per word — often around €0.0001 or even lower when scaled.
This is because machine translation engines running on modern hardware process vast volumes almost instantly and without ongoing labor costs. Once the system is set up, the marginal cost of translating one more word is close to zero.
However, while raw machine translation output is cheap, ensuring it is correct and fit for publishing — especially in critical contexts — introduces new costs: editing, checking, or preparing the source content more carefully.
5. Prepare and Polish: What Still Needs to Be Done
Even the best-prepared machine translation output is rarely perfect straight out of the engine. Companies using machine translation effectively typically focus on source optimization to reduce errors:
Avoiding Ambiguity
Technical writers should not use the English word “Play” as a heading in a product manual, even if that is the name of the button or menu item they are describing. Instead, they should write “The PLAY button” so it is clear this heading refers to a specific control element, not an activity. This will give the machine translation engine the required context.
Tagging and Markup
Brand names, product names, codes, or certain commands should usually not be translated. To ensure these terms are protected, structured formats such as HTML and XML allow users to “protect” these elements by wrapping them in tags like <span class="notranslate">...</span>. In principle, this ensures the machine translation engine will skip these terms. However, this is not a bulletproof solution, as it may be necessary to adapt such a term grammatically (inflection).
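As a rough sketch of how such tagging could be automated, the helper below wraps a list of protected terms in the span shown above. The function name and the sample product and command names are made up for illustration, and whether a given engine actually honors this class is something to check in its documentation.

```python
import html
import re


def protect_terms(text: str, terms: list[str]) -> str:
    """Escape text for HTML and wrap each protected term in a notranslate span."""
    escaped = html.escape(text)
    # Longest terms first so "AcmePlayer Pro" wins over "AcmePlayer".
    ordered = sorted((html.escape(t) for t in terms), key=len, reverse=True)
    if not ordered:
        return escaped
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(
        lambda m: f'<span class="notranslate">{m.group(0)}</span>', escaped
    )


# "AcmePlayer Pro" and "SET-UP" are made-up names for illustration.
marked = protect_terms(
    "Press SET-UP on the AcmePlayer Pro.",
    ["AcmePlayer", "AcmePlayer Pro", "SET-UP"],
)
print(marked)
```

This is plain string matching, so it shares the limitation described above: it cannot decide whether a protected term would need an inflected form in the target language.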
Consistent Use of Terminology
Technical writers should use the same terms consistently throughout a document. While this can result in drier and possibly more boring prose, it will help both human and machine translators produce consistent results.
Glossaries
Many advanced machine translation systems allow the use of glossaries or “term bases” that force certain words or phrases to be translated in a specific way. However, such a glossary will be particular to a specific business, company, or even product range. These glossaries should be curated by humans – which is non-trivial, especially when translating to many target languages.
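A term base is also useful after the fact, as a simple quality check. The sketch below does not show how an engine applies a glossary internally; it only flags segments where a required target term is missing from the output. The glossary entries and the sample sentences are hypothetical.

```python
# Hypothetical English-to-German term base for one product line.
GLOSSARY_EN_DE = {
    "play button": "Wiedergabetaste",
    "device": "Gerät",
}


def find_glossary_violations(
    source: str, target: str, glossary: dict[str, str]
) -> list[str]:
    """Return source terms whose required translation is missing from the target."""
    source_lower = source.lower()
    target_lower = target.lower()
    return [
        term
        for term, required in glossary.items()
        if term in source_lower and required.lower() not in target_lower
    ]


violations = find_glossary_violations(
    "Press the PLAY button on the device.",
    "Drücken Sie die Spielen-Taste an der Maschine.",
    GLOSSARY_EN_DE,
)
print(violations)  # ['play button', 'device']
```

Because this is a plain substring check, it will raise false alarms when a required term appears in an inflected form, so its output is best treated as a list of segments for a human to look at.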
Only a Solid Source Ensures Solid (Machine) Translations
Generally speaking, machine translation can only work with what it has been given. Here, like in many other disciplines, the “garbage in, garbage out” principle applies. Brands and writers should not expect an algorithm to “fill in the blanks”, as it may or may not work.
Getting Close to “Perfect” … But Never 100%
Even with careful source preparation, thorough tagging, and access to high-quality machine translations, small errors can still slip through. This is similar to optical character recognition (OCR) for scanned documents: no matter how good the technology, there’s always a small margin of error (often referred to as “99.x% accuracy”).
In machine translation, these final fractions of a percent can matter a lot, especially for safety-critical instructions or customer-facing marketing materials. That’s why final human review, even if minimal, is often recommended before publication.
6. Examples of Translation Prices: Human, Raw MT, and MT with Post-Editing
The examples in the following table are based on the rates below, which are realistic for European languages. Feel free to do your own research here.
- Human translator: €0.15 per word
  (Some technical texts or creative marketing texts can be higher, but this is a reasonable average.)
- Raw machine translation: €0.0005 per word
  (Many cloud-based machine translation APIs cost around €20 per million characters. At roughly six characters per word, that works out to about €0.0001 per word, so €0.0005 is a conservative figure.)
- Machine translation with human post-editing (MTPE): €0.07 per word
  (Light human post-editing, where the target text is checked for consistency and errors, is usually available at this rate.)
| Document Type | Human Translation (circa €0.15/word) | Raw Machine Translation (circa €0.0005/word) | Machine Translation + Post-editing (circa €0.07/word) |
|---|---|---|---|
| Product quick guide and safety instructions (1,000 words) | €150 | €0.50 | €70.50 |
| Product manual (5,000 words) | €750 | €2.50 | €352.50 |
| Large product knowledge base (100,000 words) | €15,000 | €50 | €7,050 |
How to interpret these figures:
- Human translation ensures the highest linguistic and stylistic quality. It is most suitable for marketing-heavy content, legally critical information, or brand-sensitive messaging.
- Raw MT is extremely cheap and fast but should be used only when perfect accuracy is not required, or for purely internal understanding (so-called “gisting”).
- MT with post-editing provides a strong middle ground: most of the cost savings of MT, but with human review to fix critical errors and polish readability.
Please note that while raw machine translation costs “almost nothing” per word, the real costs may come later: misunderstandings, customer dissatisfaction, or liability risks if important details are mistranslated. Post-editing mitigates this while still providing significant savings.
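If you would like to run the numbers for your own word counts, the figures in the table above can be reproduced with a short script. This is a minimal sketch using the per-word rates from this section; the function and variable names are made up, and the rates are rough averages rather than quotes.

```python
from decimal import Decimal

# Per-word rates in euros, as assumed in this section.
HUMAN_RATE = Decimal("0.15")
RAW_MT_RATE = Decimal("0.0005")
POST_EDIT_RATE = Decimal("0.07")


def estimate_costs(word_count: int) -> dict[str, Decimal]:
    """Return the estimated cost in euros for each translation approach."""
    raw_mt = RAW_MT_RATE * word_count
    return {
        "human": HUMAN_RATE * word_count,
        "raw_mt": raw_mt,
        # Post-editing is paid on top of the raw machine translation.
        "mt_post_edit": POST_EDIT_RATE * word_count + raw_mt,
    }


for words in (1_000, 5_000, 100_000):
    costs = estimate_costs(words)
    print(words, {name: f"{value:.2f}" for name, value in costs.items()})
```

Note that the post-editing column includes the raw machine translation cost, which is why 1,000 words come to €70.50 rather than €70.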
Conclusion: Machine Translation Is a Powerful Tool, but Not a Magic Wand
Machine translation has come a long way since the days of simple dictionaries and statistical word juggling. Neural approaches and language models have brought us much closer to natural, fluent translations at a fraction of the traditional cost.
However, using machine translation effectively requires more than just hitting “translate.” Source texts must be optimized, ambiguity minimized, and special cases handled carefully.
For businesses, the good news is that machine translation can drastically reduce costs and speed up turnaround times, making it possible to reach global markets faster. The key is to combine automation with thoughtful preparation — and to understand that while machine translation is a fantastic tool, it is not (yet) a perfect replacement for human translators and subject-matter experts.
For more information on translation strategies, workflows, and solutions for your business, please get in touch.
[1] From Douglas Adams’ “The Hitchhiker’s Guide to the Galaxy”: “The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. […] Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”
↻ 2025-11-20