Rabbit Hole #1

Machine Translation: Why aren’t museums using it?

Why aren’t museums using machine translation? This article explores how machine translation works and how it came to be in its current state, as well as how we evaluate state-of-the-art machine translation technology. Finally, this article examines how improvements in machine translation could disrupt and transform the museum industry.

Anyone who has needed some quick help in another language knows the drill – just open up Google Translate. While we know it won’t be perfect, it’s better than nothing and has helped countless people communicate across linguistic and cultural divides. Despite its usefulness, there isn’t widespread adoption of machine translation technology across the museum industry.  

In a world where Siri can set alarms, give us directions, and look things up, shouldn’t machine translation be better by now? How does machine translation work? Why are museums still using human translation? And most importantly, what happens when machine translation is good enough that museums and other arts enterprises can use it without human oversight? 

How does machine translation work?

Underpinning machine translation is a field called natural language processing, or NLP for short. This area of study combines the tools of computational linguistics with modern advances in computer science in order to “learn, understand, and produce human language content.” While NLP has progressed greatly since its inception, one foundational technique still used in some form today is called “bag of words”, in which all the words in a document are counted and each word’s frequency is calculated. This concept illustrates many of the techniques used in NLP – counting words and calculating their relationships to other words within the same document or within a corpus of documents.
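
The bag-of-words idea can be sketched in a few lines of Python. This is a minimal illustration of the concept, not any particular library’s implementation:

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Count how often each word appears in a document,
    ignoring word order entirely."""
    words = document.lower().split()
    return Counter(words)

counts = bag_of_words("the cat sat on the mat")
# counts["the"] == 2, counts["cat"] == 1
```

Real systems add steps such as stripping punctuation and normalizing word forms, but the core remains this simple tally of frequencies.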

Photo by Amador Loureiro on Unsplash

Two other concepts to highlight within NLP are part-of-speech tagging and named entity recognition. Part-of-speech tagging is a computational linguistics tool in which descriptors (the tags) are assigned to each word’s role within a sentence. This allows for semantic understanding of the role each word plays, which is critical to grokking the full meaning of a sentence. Another important concept in NLP is named entity recognition, which is when a word or phrase that contains important information is identified and categorized. We need to be able to differentiate a name or location as distinct from an ordinary noun or pronoun to understand a sentence, and named entity recognition allows for that.
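
A toy example can make these two ideas concrete. The hand-built lexicons below are purely illustrative; real taggers (in libraries such as spaCy or NLTK) learn their tags from large annotated corpora rather than looking them up:

```python
# Hypothetical hand-built lexicons, for illustration only.
POS_LEXICON = {
    "monet": "NOUN", "painted": "VERB",
    "gardens": "NOUN", "in": "ADP", "paris": "NOUN",
}
ENTITY_LEXICON = {"monet": "PERSON", "paris": "LOCATION"}

def tag_sentence(sentence):
    """Return (part-of-speech tags, named entities) for a sentence."""
    tokens = sentence.lower().rstrip(".").split()
    pos_tags = [(t, POS_LEXICON.get(t, "UNKNOWN")) for t in tokens]
    entities = [(t, ENTITY_LEXICON[t]) for t in tokens if t in ENTITY_LEXICON]
    return pos_tags, entities

pos, entities = tag_sentence("Monet painted gardens in Paris.")
# entities == [("monet", "PERSON"), ("paris", "LOCATION")]
```

Distinguishing “Paris” the location from an ordinary noun is exactly the kind of decision a translation system must get right.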

The techniques described above offer a window into how the field of natural language processing works and are illustrative of concepts integral to machine translation. Machine translation historically relied on statistical learning techniques similar to those described above. Launched in April 2006, Google Translate used a corpus of billions of words in both the source and target languages to help craft computer-generated translations. While it is not the only company offering cutting-edge translation technology, Google has been an industry leader for many years.

With improvements in computing hardware as well as machine learning technology, Google was able to improve its translation service. In 2016, it launched a new version of its machine translation service that used neural machine translation instead of statistical models. Artificial neural networks are a type of machine learning algorithm that has quickly become useful in many industries. A neural machine translation model is made up of many layers of simple units, each computing a weighted sum of its inputs and passing the result through a nonlinear activation function. By stacking these simple units in layers, the model loosely mimics the way a human brain sends, transforms, and receives signals (hence the name “neural networks”).
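
A single layer of such a network can be sketched in pure Python: each unit computes a weighted sum of its inputs and squashes the result with a nonlinearity (here, the sigmoid). The weights below are arbitrary placeholders; in a trained model they are learned from data:

```python
import math

def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of the inputs,
    then applies the sigmoid activation."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid squashes to (0, 1)
    return outputs

# Two stacked layers: one layer's output is the next layer's input.
hidden = layer([0.5, -1.2], weights=[[0.3, 0.8], [-0.6, 0.1]], biases=[0.0, 0.1])
output = layer(hidden, weights=[[1.0, -1.0]], biases=[0.0])
```

Stacking many such layers, with weights learned from billions of sentence pairs, is what gives neural machine translation its power.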

Image from IBM’s “What Are Neural Networks?”

Google wasn’t done in 2016; although its current translation model still uses neural machine translation, the company has made strides in other areas. Previously, Google needed copious amounts of data in every language it wanted to translate, but its neural models can now perform better on low-resource languages, meaning they perform well even when trained on relatively little data for a given language. In 2019, Google released a massively multilingual, massive neural machine translation (M4) model that was pretrained on over 25 billion sentence pairs and, as a result, performed even better on low-resource languages. The company has continued to improve this low-resource capability in order to expand its machine translation offerings, releasing an updated model in 2020.

How do we know machine translation is improving?

Photo by Markus Winkler on Unsplash

So, machine translation works by taking natural language processing techniques, training a complex model with many layers of simpler models, and then inputting new text on which the model uses what it has learned. But how do researchers know that machine translation is improving? 

There are a few metrics for evaluating machine translation, and Google uses one called BLEU, which stands for “bilingual evaluation understudy”. The paper establishing the method was published in 2002, but despite its age the metric is still used to evaluate cutting-edge language models. BLEU scores a candidate translation by comparing it against one or more reference translations: it measures how many of the candidate’s words and short phrases (n-grams) also appear in the references, and applies a penalty to candidates that are shorter than the references. The score is reported as a number between 0 and 1, with 1 being the highest score possible. Google has continued to evaluate its models using BLEU so that updated models can be easily compared to their predecessors.
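
The flavor of the metric can be conveyed with a simplified, unigram-only version. The full BLEU (Papineni et al. 2002) combines clipped precisions for 1- to 4-grams via a geometric mean; this sketch keeps only single-word precision and the brevity penalty:

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str) -> float:
    """Unigram-only BLEU sketch: clipped word precision
    multiplied by a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    ref_counts = Counter(ref)
    # Count candidate words that appear in the reference, clipped
    # so a repeated word can't be credited more often than it occurs.
    overlap = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    precision = overlap / len(cand)
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * precision

score = simple_bleu("the cat is on the mat", "the cat sat on the mat")
# 5 of 6 candidate words match the reference, so score ≈ 0.83
```

Matching single words alone would reward fluent-sounding word salads, which is why the full metric also weighs longer n-grams that capture word order.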

Despite this progress, it is instructive to compare machine translation capabilities with the natural language processing tools that we use in English every day. As mentioned in the introduction, people take for granted that they can use speech-to-text on their phone, or that they can interface with a chatbot on a website. Notably, a company in Canada called Cohere just raised $125 million to perform similar NLP tasks, including content moderation, conversational artificial intelligence, and search support. Another example of the expansion of English language model tools comes in the form of a startup called Forefront. Two weeks after the launch of an open-source large language model called GPT-NeoX-20B, Forefront announced that it was offering fine-tuning services to make GPT-NeoX-20B more accessible. All these markers of progress demonstrate that NLP in English is rapidly advancing and expanding.

Why isn’t machine translation good enough?

Photo by Eunice Lituañas on Unsplash

If AI can write papers and chat with us, why can’t it fluently translate my words into any language without errors? One of the shortcomings of large language models is that they require copious amounts of data to be effective. While there has been considerable effort to train machine translation models, there simply isn’t as much training data available in many languages as there is in English. Despite Google’s best efforts to train its translation models to perform well on low-resource languages, they still aren’t nearly as good as the models that use English. And not only are the models not as good, they are also insufficient for professional use even in high-resource languages. As researcher Aihua Zhu states, current machine translation technology fails to meet the needs of professional use.

Despite the clear advantage that English has over other languages, English large language models still have numerous shortcomings. Because many of them were trained on text from the internet, they tend to exhibit biases, such as stereotypes about people of minority gender or ethnic identities. A recent illustration of this is that YouTube’s automated captioning service was spotted inserting profanity into captions of videos for young children. So while language models in English perform well enough to be useful to large swathes of society, they still have limitations. Comparatively, the limitations of multilingual language models are greater and do not allow for as much integration into high-level societal tasks, although they are very useful for many people (in 2016, Google stated that Google Translate had 500 million users and translated over 100 billion words a day).

How are museums translating their material?

Turning back to the museum industry, we consider the current state of the translation resources available to it. How are museums currently offering material in multiple languages? Many employ professional translation services. For instance, the Field Museum in Chicago uses a company called Multilingual Connections, and the Denver Botanic Gardens, the South Florida Science Center, and the Metropolitan Museum of Art use a company called Eriksen Translations. That is, museums must pay professional human translators in order to offer material in multiple languages. When the Children’s Discovery Museum of San Jose decided in 2015 to translate materials so that it could engage more with Latino visitors, it chose to use human translation exclusively rather than machine translation, because it wanted to ensure that its materials were true to the “spirit of the words,” not merely that they offered visitors the gist of what was going on.

While all of these are museums in the United States, it also seems that countries in Europe use a similar model of hiring professional translators – two contract bidding announcements by the Museum of the Quai Branly – Jacques Chirac and the Museum of Bastia in France demonstrate that human translation is still the industry standard in Europe as well. 

A great illustration of the current state of machine translation as useful but not quite good enough comes from an announcement for a translation sprint. Europeana Pro is an organization funded by the European Union that is dedicated to preserving and promoting cultural heritage. In its 2020 invitation to join a sprint to translate foundational documents into more languages, the organizers instructed participants to use a number of machine translation tools only “as long as you review the resulting text.” This instruction underlines the themes explored in this article: machine translation is good, but not yet good enough for professional use without human review.

How will museums be affected when machine translation achieves parity with human translation?

Photo by Yi Liu on Unsplash

So machine translation clearly is not yet good enough – but how could things change once the technology catches up to the need? User experience is naturally a priority for museums, and translation is an integral part of that. Many museums in America could better serve their non-English-speaking populations by expanding their offerings. And it isn’t just about serving visitors; museums could also benefit greatly from better machine translation for their collections. A team of researchers from Beijing Jiaotong University used machine translation to evaluate ancient Chinese texts and translate them into English, demonstrating the promise of machine translation as a way to preserve and expand collection offerings. A research group based at the University of California, Los Angeles (UCLA) has a program dedicated to using automated translation to analyze and understand the cuneiform languages on a series of tablets from southern Mesopotamia. Axiell, a widely used museum database software vendor, embeds multilingual fields in its data structures and offers an automated translation tool so that curators have some idea of what they are looking at even if it is in a language they don’t understand.

Another team of researchers commented on translation in museums as a narrative and a means by which to convey identity. They explore how bad translation can be seen as a failure of the museum and argue that the quality of the translation not only has an impact on the museum, but also on the message that the museum wants to convey. A professor and researcher from the United Kingdom synthesized the benefits of translation for museums into two central themes: economic value and social inclusivity. However, given that a professional translation service is often required, that is currently not always an option for museums. 

While professional translation services are the norm, some museums are starting to explore machine translation as a potential tool. The Computer History Museum conducted an experiment with artificial intelligence during which they used machine translation on a number of their audio and video files. While it is unclear if they have put it into effect so far, they underscored many of the points made in this article by commenting that machine translation promises to make museums and collections “more accessible to speakers of languages other than English.” 

All of this evidence points to the disruption and change that could occur for museums once machine translation becomes a ubiquitous option for professional-level translation. Beyond enabling American museums to better serve their communities, the international museum industry also thrives on the ability to offer people a window into culture and identity that isn’t possible without accurate translations. The significant cost reductions that improved machine translation would bring would allow museums to broaden their offerings and expand the visitor experience beyond their current capabilities.


Axiell. n.d. “Axiell – Bringing Culture and Knowledge to Life.” Axiell. Accessed March 2, 2022a.

Axiell. n.d. “Interface Functionality: Editing or Translating Adlib Interface Texts.” Accessed March 2, 2022b.

Axiell. n.d. “Translations Manager: Introduction.” Accessed March 2, 2022c.

Bapna, Ankur, and Orhan Firat. 2019. “Exploring Massively Multilingual, Massive Neural Machine Translation.” Google AI Blog (blog). October 11, 2019.

Brock, David C. 2022. “A Museum’s Experience With AI.” CHM. February 3, 2022.

Chen, Chia-Li, and Min-Hsiu Liao. 2017. “National Identity, International Visitors: Narration and Translation of the Taipei 228 Memorial Museum.” Museum and Society 15 (1): 56–68.

“Cohere.” n.d. Cohere. Accessed March 2, 2022.

Eriksen Translations Inc. 2019. “Museum Audience Engagement: Translation Strategies to Promote Diversity.” Eriksen Translations Inc. September 19, 2019.

Eriksen Translations Inc. n.d. “Translating and Typesetting the Met Guides into 6 Languages.” Eriksen Translations Inc. Accessed March 2, 2022a.

Eriksen Translations Inc. n.d. “Translating Exhibition Materials for the South Florida Science Center.” Eriksen Translations Inc. Accessed March 2, 2022b.

Eriksen Translations Inc. n.d. “Translating the Garden Tool Mobile Website for Denver Botanic Gardens.” Eriksen Translations Inc. Accessed March 2, 2022c.

Eriksen Translations Inc. n.d. “Translation & Localization Services NYC | Eriksen Translations.” Eriksen Translations Inc. Accessed March 2, 2022d.

“Forefront: Fine-Tune and Deploy GPT-J, GPT-13B, and GPT-NeoX.” n.d. Accessed March 2, 2022.

Field Museum. 2018. “About.” Text. Field Museum. May 14, 2018.

Furui, S., T. Kikuchi, Y. Shinnaka, and C. Hori. 2004. “Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech.” IEEE Transactions on Speech and Audio Processing 12 (4): 401–8.

Goel, Aman. 2018. “How Does Siri Work? The Science Behind Siri.” Magoosh Data Science Blog (blog). February 3, 2018.

Hirschberg, Julia, and Christopher D. Manning. 2015. “Advances in Natural Language Processing.” Science 349 (6245): 261–66.

IBM Cloud Education. 2021. “What Are Neural Networks?” August 3, 2021.

Lalwani, Tarun, Shashank Bhalotia, Ashish Pal, Vasundhara Rathod, and Shreya Bisen. 2018. “Implementation of a Chatbot System Using AI and NLP.” SSRN Scholarly Paper ID 3531782. Rochester, NY: Social Science Research Network.

Leahy, Connor. 2022. “Announcing GPT-NeoX-20B.” EleutherAI Blog. February 2, 2022.

Liao, Min-Hsiu. 2018. “Museums and Creative Industries: The Contribution of Translation Studies.” January 2018.

Marshall, Christopher. 2020. “What Is Named Entity Recognition (NER) and How Can I Use It?” Super.AI (blog). June 2, 2020.

Martin, Jenni, and Marilee Jennings. 2015. “Tomorrow’s Museum: Multilingual Audiences and the Learning Institution.” Museums & Social Issues 10 (1): 83–94.

Matas, Ariadna. 2020. “OpenGLAM Translation Sprint at Europeana 2020.” Europeana Pro. September 2020.

Menon, Yasmin, and Will Lach. 2021. “Creating Accessibility in Museums with High Quality Translations.” Eriksen Translations Inc. November 19, 2021.

Mingyu, Lu, and Si Xianzhu. 2010. “Application of Machine Translation to Chinese-English Translation of Relic Texts in Museum.” In 2010 International Conference on Intelligent System Design and Engineering Application, 1:355–58.

Mitkov, Ruslan. 2004. The Oxford Handbook of Computational Linguistics. OUP Oxford.

MTAAC Team. n.d. “Machine Translation and Automated Analysis of Cuneiform Languages.” Machine Translation and Automated Analysis of Cuneiform Languages. Accessed March 2, 2022.

Multilingual Connections. n.d. “Museum Translation Services.” Multilingual Connections. Accessed March 2, 2022.

Nadeem, Moin, Anna Bethke, and Siva Reddy. 2020. “StereoSet: Measuring Stereotypical Bias in Pretrained Language Models.” ArXiv:2004.09456 [Cs], April.

Och, Franz. n.d. “Statistical Machine Translation Live.” Google AI Blog (blog). Accessed March 2, 2022.

Palferro, Melisa. 2018. “Different Approaches to Museum Translation.” #ucreatewetranslate (blog). March 8, 2018.

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “Bleu: A Method for Automatic Evaluation of Machine Translation.” In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–18. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics.

Scott, Josh. 2022. “Cohere Closes $125 Million USD Series B Round Led by Tiger Global.” BetaKit (blog). February 15, 2022.

Siddhant, Aditya, Melvin Johnson, Henry Tsai, Naveen Ari, Jason Riesa, Ankur Bapna, Orhan Firat, and Karthik Raman. 2020. “Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (05): 8854–61.

Simonite, Tom. 2022. “YouTube’s Captions Insert Explicit Language in Kids’ Videos.” Wired, February 24, 2022.

Turovsky, Barak. 2016a. “Ten Years of Google Translate.” Google. April 28, 2016.

Turovsky, Barak. 2016b. “Found in Translation: More Accurate, Fluent Sentences in Google Translate.” Google. November 15, 2016.

Zhu, Aihua. 2021. “Man-Machine Translation—Future of Computer-Assisted Translation.” Journal of Physics: Conference Series 1861 (1).
