What’s in a Paragraph? Why Humans Outdo Auto-Translators

This week I spent time translating a document for a friend whose grandfather fought in World War I. The document, issued by the French government, recognizes members of a United States Marine Regiment that helped defeat the Germans near the end of the war. There’s only one paragraph to translate. A simple task, one would think. Yet it’s not something you can easily feed into a translation program and quickly receive an adequate result. In fact, most translations, if done well, rely heavily on the human brain’s ability to consider myriad factors that might affect interpretation, factors that programs like Google Translate currently have no way of determining.

In this post, I share some of the challenges and discoveries I made while translating the text. I’m hoping that the history buffs and native French speakers who read along will enrich the story further in the comments.

Military commendation issued by France after World War I.

Readability of Source Text

Above is a photo of the certificate that was awarded to my friend’s grandfather. I had to read through it several times to make out all of the words. The initial challenge was simply identifying each character properly. The original was handwritten, so the letters aren’t perfectly consistent. Surprisingly, several words are missing accents, which, if taken literally as written, can alter their interpretation.

So for starters, I wonder if there is a text recognition system out there that is capable of producing a spell-corrected version of the original French.
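One small piece of that wish, restoring missing accents, can be sketched in code. The snippet below is a minimal Python sketch, not a real text recognition system. It assumes the handwriting has already been transcribed into plain text, and its tiny lexicon is a hypothetical stand-in for a full French word list.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with its accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical mini-lexicon; a real tool would load a complete French word list.
LEXICON = ["opérations", "matériel", "kilomètres", "a", "à"]

# Map each accent-free spelling to every correctly accented form that shares it.
INDEX: dict[str, list[str]] = {}
for entry in LEXICON:
    INDEX.setdefault(strip_accents(entry), []).append(entry)


def restore_accents(word: str) -> str:
    """Return the accented spelling when exactly one lexicon entry matches.

    Unknown words and ambiguous ones (such as "a" versus "à") are returned
    unchanged, because choosing between them requires context.
    """
    candidates = INDEX.get(strip_accents(word.lower()), [])
    if len(candidates) != 1:
        return word
    restored = candidates[0]
    # Preserve an initial capital letter from the transcription.
    return restored.capitalize() if word[:1].isupper() else restored


if __name__ == "__main__":
    for token in ["operations", "kilométres", "materiel", "a", "Marine"]:
        print(token, "->", restore_accents(token))
```

Notice that the sketch gives up on "a" versus "à". Deciding between the two takes an understanding of the sentence, which is exactly where a human reader still has the edge.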

My French and English Interpretations

Below is a zoomed-in photo of the document’s text followed by my interpretation of the French alongside my English translation.

Original French text of WWI commendation.

Citation à l’ordre de l’armée

Commendation at the Behest of the Military

Le 5e Régiments de Marine Américain :
(Sous les ordres du Colonel Logan Feland)

5th Regiment of the United States Marines:
(Under the command of Colonel Logan Feland)

“A pris une part glorieuse aux operations engagées par la 4e Armée en Champagne, en octobre 1918. Le 3 octobre 1918, a participé à l’attaque des positions allemandes fortement retranchées entre le Blanc-Mont et la ferme Médéah, et poussant de l’avant jusqu’au abords de St Étienne-à-Arnes, a réalisé une avance de 6 kilométres. A fait plusieurs milliers de prisonniers, capturé des canons, des mitrailleuses et un important materiel de guerre. Cette attaque combinée avec celle des Divisions Françaises a eu pour conséquence l’évacuation des deux rives de la Suippe et du Massif de Notre Dame des Champs.”

“Has taken glorious part in operations deployed by the 4th Army in Champagne, in October 1918. On October 3, 1918, participated in the attack of deeply entrenched German positions between Le Blanc Mont and the Médéah farm, and pushing forward to the outskirts of St Étienne-à-Arnes, achieved an advance of 6 kilometers. Took several thousand prisoners, captured cannons, machine guns and substantial military equipment. This attack combined with that of French Divisions resulted in the evacuation of both shores of the Suippe and of the Massif de Notre Dame des Champs.”

(Ordre no 14742 “D” ~ 21 Mars 1919)

(Order no 14742 “D” ~ March 21, 1919)

Au Grand Quartier Général
Le Maréchal
Commandant en chef les Armées de l’Est.

Pétain

At Supreme Headquarters
Marshal
Eastern Armies Commander-in-Chief

Pétain

“v” resembles “y”

Perhaps the most glaring difference between the script used in the document and modern-day English handwriting is the letter “v,” which has a descending tail. Thus, the word avant looks like ayant. In the context of the document, however, my brain almost subconsciously made the needed adjustment.

“Médéah”

The letter that gave me the greatest difficulty is the “M” in front of the word Médéah. It doesn’t resemble any of the other “M”s in the document and to my eye, looks more like an “R” or perhaps a combination of two characters. Google searches for a farm with a similar name near St Étienne-à-Arnes, or Le Blanc Mont, or the Massif de Notre Dame des Champs, proved fruitless. Luckily, I could call upon my French friend Mijo, who quickly recognized the ornate symbol as an M.

An Infamous Signature

The document is signed by Maréchal Pétain. I wondered if this was the same national hero who was adored by France after World War I, then went on to collaborate with the Nazis during World War II. After the second war, Pétain was tried for treason and sentenced to death. Because of his age and prior service to France, the sentence was commuted to life imprisonment. A quick online image search for the infamous leader’s signature confirmed that this is the same Pétain who authorized the commendation.

Signature of Maréchal Philippe Pétain

Understanding the Setting

I suppose a professional translator who is paid by the word might stop at this point, collect a fee, and move on. But the exercise raised several questions in my mind that I wanted to track down. First and foremost, where exactly did this confrontation take place? On my initial reading of the text, I’d incorrectly assumed that the fighting was somewhere near Mont Blanc, France’s highest peak in the Alps, which extends into portions of Switzerland and Italy.

Looking at a map, however, I realized that this was unlikely, as most of the fighting in France during World War I took place in the north. Indeed, Le Blanc Mont (as stated in the document) is different from Mont Blanc. According to Google Maps, there are six Le Blanc Monts in France, and most are in the north! The one I was looking for was located near the Médéah farm, but that farm no longer exists. There were other clues, however, and eventually I came to believe that the Le Blanc Mont mentioned in the document had to be the one closest to the Suippe river and St Étienne-à-Arnes.

Below is a screenshot from Google Maps, showing the distance between Le Blanc Mont and St Étienne-à-Arnes. (Google Maps annoyingly removed the label of the SEAA endpoint.) My limited knowledge of WWI was enough to recall the Battle of the Marne and Battle of Ardennes. Here we see the boundary line between Ardennes and Marne (two departments of France) stretching across the very location under consideration.

Distance between St Étienne-à-Arnes and Le Blanc Mont.

History Comes to Life

In another view on Google Maps, I noticed an American WWI monument lying practically on top of the as-the-crow-flies trajectory from my screen capture. Zeroing in on the monument brought up the following photograph, taken last February. The monument commemorates “the achievements of the American units that served in combat with the French Fourth Army during the summer and fall of 1918.”

The World War I Sommepy American Monument. Photo by Adrien Wallerant.

Visitors to this site and the surrounding area can still find many vestiges from a war that took place more than one hundred years ago: trenches, dugouts, gun emplacements, fortified bunkers, and vast cemeteries. Here are a few photos I found online.

An Introduction to More

Above are the highlights of what I uncovered. A dump of my browser history shows far more dead ends than fruitful pages. One of my searches turned up a tourist route in northern France that leads you across the gently rolling and now peaceful battlegrounds. However, if you’d prefer to remain in the comfort of your home, today’s technology offers up a host of resources for learning more about what an ancestor might have endured.

My research barely skims the surface of what lies behind the honorary paragraph that sent me looking for more. But it gives my friend a better starting point for how to proceed than a paragraph digested by one of today’s auto-translators. Some people might argue that expert systems and machine-learning algorithms are quickly catching up to human-level functioning. For now, however, nothing comes close.

Perhaps you can enrich the story further by adding your own insights in the comments.

About Carol A. Seidl

Serial software entrepreneur, writer, translator, and mother of 3. Avid follower of French media, culture, history, and language. Lover of books, travel, history, art, cooking, fitness, and nature. Cultivating connections with francophiles and francophones.

30 Comments

  1. Well done. I was surprised at the signature and checked it online as you did. It does look like Pétain’s signature. He had won the battle at Verdun, turning the war around. But Chief of staff in 1918 was Foch who truly won the war. (Pétain apparently refused to obey Foch’s orders… tsss. Egos) And Pétain’s actions during WWII fully warranted the death penalty that he got. He had actually given de Gaulle the death penalty “in absentia” in 1940. But de Gaulle commuted Pétain’s condemnation to life.

    • I didn’t know the part about Pétain issuing the death penalty for de Gaulle. That seems pretty egregious! De Gaulle definitely proved to be the bigger man.

      • He did not do it directly of course. But he set up a special tribunal that gave de Gaulle the death penalty on August 4th, 1940. Pétain had surrendered in June. Only a few weeks to issue the verdict. Yes, de Gaulle was the bigger man. (Sorely lacking these days…)

        • Sorry for the late reply. Your comment somehow slipped through the cracks. I thought the coverage was good. I watched Macron and Biden deliver their address live and I was so glad that they didn’t have an English interpreter talking over Macron. You heard him speak in French, then heard the English interpreter when he had finished. Likewise, the French interpretation of Biden’s English was delivered in full.

          To be honest, I haven’t been watching as much news of late so I don’t know how well the rest of Macron’s tour was covered. I just learned this morning, for example, that Macron visited New Orleans. Whether this was covered by U.S. media, I don’t know.

          • That is a good interpretation, allows one to follow both the original and the interpretation.
            New Orleans was inexplicable to me. Want to make the best of your US trip, go to California and speak to the top 5 high tech guys/women there… But “N’ar-leens”? Looks like tourism to me. He mentioned “Francophonie” as an excuse, but no-one has spoken French there in centuries. “Cajuns”? (The old Acadiens?)
            It seems to me Macron is already out of this world. Makes no sense. (I must be gettin’ old. )

          • Haha! Well, NO may still be the US city with the most French influence. But you’re right, not a lot of French is spoken there today, and that which remains is barely recognizable. I visited for the first time in 2019 and was delighted to find a used bookstore there with all French titles. That is extremely rare these days, even in much bigger North American cities. Couldn’t even find one in Toronto!

          • Only in 2019? I like NO. It’s a nice place. Must have been badly hit. I hope they could restore the city. I don’t know what happened.
            About French books, it is a bit far away to go to NO… LOL. Toronto should. They’re supposed to be bilingual… Re-LOL.

      • PS. What was the coverage of Macron’s visit to the US?

  2. Great job reading the script. Even I have trouble with early 20th century writing. I agree with the M. That was a fancy one.
    Now I “disagree” on one thing only. The village is Saint-Étienne-à-Arnes. It’s an N. Not an M. I actually looked the place up, it seems to be in the vicinity.

    • Great job on the fact-checking! I should have caught the Armes/Arnes error because I looked the place up as well and had it in Google Maps for hours on my laptop. When I first transcribed the French, I mistook the “n” for an “m”. Thereafter, I cut and pasted the village name to avoid typos. Bad strategy. Ha!

      • haha! You did fine. Especially since that is not even close to an American script. I learned to write like that, with a “plume sergent-major” dipped in ink. Did pages and pages of writing. Using the lines. You got graded for that in first grade. (First and last year I went to school for a long time, but that’s another story.)
        Au revoir.

        • Writing with a pen you must dip into ink sounds pretty brutal for a first grader–and not too fun for a parent charged with removing ink stains from clothing. I’m very glad my son and I weren’t subjected to that!

          • haha! We had blouses. But our hands did come out a bit blue… Fortunately I never went back to school after that first year until senior high.

          • You dodged a bullet. I’m highly critical of preordained curriculums but sent all my kids to public school anyway. Thankfully, they seem to have overcome the mind-numbing repetition they were subjected to.

          • Yeah. Many bullets. One for each year of home schooling. Haha! Enjoyed it. On the contrary Senior High was a shock. I didn’t even know how to take notes. Wrote too slow. Had to borrow friends’ notes. Not to mention an idiot Math teacher who almost made me miss my Baccalauréat… (But I got it)
            I’m sure your kids are very smart. Are they happy in College?

          • They’re pretty happy but I am superstitious when it comes to characterizing their experience. As soon as I say one thing, the opposite seems to happen. Ha! Hopefully, they have the tools they need to cope with life’s curve balls.

          • I understand that kind of superstition when it comes to kids… I “do” the same with my daughters and grandkids… One is never too cautious… LOL.
            And I’m sure they have most of the tools. The rest, they will have to learn by themselves.
            We will probably be in touch still in December, but… Joyeux Noël et bonne année à toi et toute ta famille. Hugs.

          • Merci Brieuc. À toi et tes proches aussi!

  3. Translation, to give good results, must include the translator understanding the meaning of the text and expressing the equivalent meaning in the target language. Understanding is a process that requires a conscious mind. So until machines have conscious minds (that is, until they effectively become human), they cannot understand things, and no algorithm can substitute for that.

    Understanding also involves using knowledge and experience of the real world that humans have but machines do not — such as knowing what part of France the fighting of World War I took place in, or that the place name of a small entity such as a farm might no longer be in use after a century. This is why translation from ancient languages like Sumerian presents special problems — the equivalent cultural and circumstantial knowledge is lost to us, long gone.

    Then of course there are problems presented by things like idiosyncratic handwriting (as exemplified here). Since ayant is actually a word in French, would a machine interpret the word that way and try to translate the sentence to make it fit? Realizing that avant makes more sense in context seems like distinctly human intuition.

    And of course most texts contain some minor errors. It seems odd that a French native speaker would omit accent marks, especially in a formal text, but even educated English speakers often make minor errors of spelling or punctuation.

    Finally, languages differ greatly in what elements of meaning they express explicitly. This doesn’t vary much among European languages, but the farther afield you go the more noticeable it becomes. Japanese sentences often don’t explicitly show the grammatical subject, neither by a pronoun nor by verb ending; it’s inferred from context. But Japanese verbs do have a system of endings which show how polite the sentence is. I suspect Japanese-to-English machine translation would have trouble getting grammatical subjects right, while English-to-Japanese would have trouble getting the politeness endings right. It takes human awareness to deduce information that isn’t explicitly given in the original text.

    I suppose a machine could produce a translation of this document which would be roughly correct, and adequate to get the general idea across. But it wouldn’t pick up on the nuances and details that you did.

    • You’re right Infidel about the best translators having a background in the subject matter. That’s why most translators specialize in certain subjects like law, science, business, or politics. The more they know, the better. And over time their vocabulary in the given domain continues to grow.

      The Japanese example is excellent. That does seem like a near-impossible task for a machine, even one that employs learning algorithms that allow it to adjust and expand upon its skills over time.

      For kicks, I ran the text through 3 optical character readers that convert image text into raw text. The first was in Evernote. Evernote is generally pretty good at reading business cards or flyers, for example. Even when the fonts are artsy. It also isn’t bad with handwriting. But it couldn’t figure out much in this case. It got the word “Grand” correctly and “le” (in most cases) and “des” but that was about it. I was surprised that it didn’t recognize more of the easy small words, like “une”.

      Things went downhill from there. iPhone has an OCR that will create a note from a scanned image. It did terribly. It transcribed “Regiment” to “Megiments” and that was about as good as it got. The worst was “Divisions” to “Bijssions”. Major fail there. Haha.

      Then I found a free OCR online that got my hopes up when I had to specify a source language (and French was one of the options) before running the analysis. Here’s an example: “Le 5e Régiments de Marine Américain :” was transformed into “reg une. rd, d e Marine nzeuca.trz .”. I may have that wrong. It was hard to tell where lines began and ended but you get the idea. At least it picked off “Marine”.

      Surely, there are better scanners that one has to pay for, but I still think they would have difficulty with this document.

      • It’s not difficult to find more examples of different elements of meaning being made explicit in different languages. Chinese has a bunch of sentence-ending particles which express things like how the speaker feels about what he just stated, whether the action described is completed or ongoing, etc. Leaving those particles out results in a statement which is understandable but unnatural, something only a non-native speaker would say. I can’t imagine how English-to-Chinese machine translation would handle that. Many native Indian languages in the Amazon basin have what’s called “evidentiary grammar”, in which verbs have different endings depending on how the speaker knows what he’s saying. So a sentence like “the monkey killed the rat” would change depending on whether the speaker actually saw the monkey kill the rat, or just saw the monkey leaving the dead rat with blood on its teeth, or any of half-a-dozen other possibilities. Because those are minor languages, we rarely need to translate anything into them, but it’s easy to see the problems a machine would have when doing so.

        Those optical character reader results are really bad, which is surprising given that most of this text is actually quite neatly written. My own study of French is limited to two college quarters forty years ago, and I could recognize more than half the words in the text, and of course recognize the letters in the rest of them. God knows what those systems would make of, say, the wildly distorted version of the Arabic alphabet often used in Pakistani languages, or the hopeless scribble you sometimes get with handwritten Chinese.

        A human can infer a great deal of missing or unclear information from what is known about what is there. For example, the only reason you didn’t immediately recognize that weird M in Médéah is that Médéah is an unfamiliar word, so you had nothing to guide you in identifying the odd letter. In fact, if you look at 21 Mars 1919 in parentheses below the main block of text, the M in Mars looks the same as the M in Médéah. It probably didn’t even register to you as looking strange because Mars is a common word and you identified it without effort despite the odd form of the M.

        • Wow! You are too dang smart! I thought I checked all the other M’s in the document but you’re right, the M in Mars is an obvious clue. I don’t find Médéah unrecognizable per se but it’s weird in a French context. The accents aren’t apparent but after learning that the word started with M, subsequent Google searches were fruitful so that I could get the original spelling.

          Love the example of the Native American language. Reminds me a bit of the French subjunctive. Plus there’s a helpful tip in there. Next time someone accuses my cat of killing a song bird, I’ll ask if they saw blood on its teeth.

          All joking aside, thanks much for these excellent examples of tough translation problems.

  4. I certainly found your insights to be true when I was helping Ukrainian media earlier this year. You might put something in Google Translate just to get the gist of it and do some fact checking to make the work go faster, but the translation itself was clunky or wrong.

  5. In older days (well, also when I was a kid!), we had super fancy ways of handwriting capital letters, hence this M. Which for me is clearly an M, but that must be because of the many pages I did as a young child with capital letters, lol

  6. Glorious reminder of why one translates. Merci.
