Crowdsourced Monolingual Translation

A recent paper by a researcher at the University of Maryland explores the use of crowdsourcing in text translation.  Instead of requiring bilingual translators to convert books from one language to another, the approach uses Google Translate for the bulk of the translation, with crowdsourced monolingual speakers on either side to verify the text and add corrections.

The primary motivation behind monolingual translation is its lower cost.  Bilingual translators can be expensive due to their relatively rare skills.  Literate monolingual people, on the other hand, are easy to crowdsource and put to work translating books and content.

Although the research project still has a long way to go (the translated content is better than Google Translate's output, yet still far from perfect), the crowdsourced monolingual translation technique offers potential for simpler translations and breaks exciting ground in using the “wisdom of the crowds” to help make content more widely accessible.

Crowdsourced Monolingual Translation (CMT) translates each phrase in six primary steps:

  1. The original content is first converted to the target language using Google Translate.
  2. A crowdsourced reviewer then reads the translated text and marks suspect words and phrases.
  3. A native speaker then provides graphic / picture annotations to help clarify the meaning of the words.
  4. The reviewer of the translated text then reviews the graphic and makes suggested changes to the text.
  5. Google Translate is then used to convert the translated text back to the original language.
  6. The native speaker then reviews the reverse-translated text, and upon approval, the phrase is committed to the database.
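
To make the flow of the six steps easier to follow, here is a minimal Python sketch of a single pass.  It is an illustration only, not the paper's implementation: the names Candidate and cmt_pass are hypothetical, and the machine translation and human review steps are passed in as plain functions rather than calling any real Google Translate API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical record of one phrase moving through a single CMT pass."""
    source: str                                            # original phrase
    target: str = ""                                       # current translation
    flagged: List[str] = field(default_factory=list)       # step 2 output
    annotations: List[str] = field(default_factory=list)   # step 3 output
    back_translation: str = ""                             # step 5 output
    committed: bool = False                                # step 6 outcome


def cmt_pass(
    source: str,
    forward: Callable[[str], str],                  # machine translation, source -> target
    backward: Callable[[str], str],                 # machine translation, target -> source
    flag: Callable[[str], List[str]],               # target-side reviewer marks suspect words
    annotate: Callable[[str, List[str]], List[str]],  # source-side speaker adds pictures
    revise: Callable[[str, List[str]], str],        # target-side reviewer edits the text
    approve: Callable[[str, str], bool],            # source-side speaker accepts or rejects
) -> Candidate:
    """Run the six steps once. Every callable is a stand-in (an assumption)."""
    c = Candidate(source=source)
    c.target = forward(source)                          # step 1
    c.flagged = flag(c.target)                          # step 2
    c.annotations = annotate(source, c.flagged)         # step 3
    c.target = revise(c.target, c.annotations)          # step 4
    c.back_translation = backward(c.target)             # step 5
    c.committed = approve(source, c.back_translation)   # step 6
    return c
```

Passing the human steps in as functions keeps the sketch neutral about how the crowd is actually recruited or how its input is collected, which the steps above do not specify.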

If this seems like a lot of work, it is.  In addition, each sentence may need to go through the process several times to verify the translation whenever a difference of opinion is detected.  Complex sentences may need to be translated upwards of thirty times before a consensus is reached.
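
The repetition can be pictured as a simple retry loop.  The Python sketch below is an assumption about how such a loop might look, not a description of the actual system: translate_until_consensus and run_pass are hypothetical names, "consensus" is reduced to a single approved flag, and the cap of 30 rounds simply echoes the figure mentioned above.

```python
from typing import Callable, Optional, Tuple


def translate_until_consensus(
    source: str,
    run_pass: Callable[[str, int], Tuple[str, bool]],
    max_rounds: int = 30,  # assumed cap, echoing the "thirty times" figure
) -> Tuple[Optional[str], int]:
    """Repeat a translation pass until one is approved.

    run_pass is a hypothetical stand-in for one full pass: it takes the
    source sentence and the round number, and returns the candidate
    translation plus whether the reviewers approved it.
    Returns (approved translation or None, number of rounds used).
    """
    for round_number in range(1, max_rounds + 1):
        translation, approved = run_pass(source, round_number)
        if approved:
            return translation, round_number
    # No pass was approved within the cap; the sentence stays uncommitted.
    return None, max_rounds
```

A sentence that never reaches approval comes back as None, which is one way to represent the content that the process cannot yet handle.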

Another significant problem with CMT is the limited scope of content it can translate.  While it works satisfactorily for children’s books and Twitter dialogue, the system does not scale as well to the multi-layered or allegorical sentences that are the hallmark of most literature.

Still, with children’s books, CMT resulted in an impressive improvement in the percentage of high-quality sentences.  Children’s books with pictures had the best results – CMT was judged to have 68% high-quality content, as opposed to Google Translate’s 10%.  An alternative version of CMT that focused on sentence-level translation resulted in an approximate 50% reduction in grammar, stylistic, and content errors.  Yet, as anyone who has worked with Google Translate can attest, a 50% reduction in errors is still a far cry from publishable material.

While CMT has made interesting progress in translation, it still has far to go.  Perhaps by combining CMT with machine learning, or by using crowdsourced back-propagation in the Google Translate algorithm, new peaks could be achieved in automated translation.  Until then, unless you are translating a children’s picture book, it’s still best to hire a professional.

Written by Andrew Palczewski

About the Author
Andrew Palczewski is CEO of apHarmony, a Chicago software development company. He holds a Master's degree in Computer Engineering from the University of Illinois at Urbana-Champaign and has over ten years' experience in managing development of software projects.