Language Preservation and Artificial Intelligence

How would the modern world function without language? Languages are living entities in constant flux. They hold keys to our historical heritage and develop as we acquire new vocabulary. So what would happen if we lost them? Language extinction is not new; however, the pace at which languages are disappearing today is unprecedented and alarming.

There are over 7,000 languages in the world. This diversity of languages, cultures, and thoughts is a healthy sign; a monoculture would leave the world poorer. About 3,000 languages are in danger of becoming extinct. If we let these languages go, we risk losing our culture and part of human history, with devastating consequences. With every lost language, we lose an enormous cultural heritage; an understanding of how humans relate to the world around us; scientific, medical, and botanical knowledge; and, most importantly, the expression of communities’ humor, love, and life.

There are many reasons why languages go extinct. In this article, we focus on technological advances, on biases in the data collected for Artificial Intelligence, and on how those biases might lead to rarer languages being neglected.

A majority of the information on the internet is in English, and English has become the unofficial global language of business. As artificial intelligence takes the reins of more and more Internet queries, data collection, and analysis, it learns the subtleties of widely spoken languages like English. The challenge for national languages is to balance using English for international trade and as an instrument for progress while still retaining the unique elements of one’s own culture.

One of the initiatives supporting this mission is the Endangered Languages Project, which puts technology at the service of the organizations and individuals working to confront language endangerment by documenting, preserving, and teaching endangered languages. Through its website, users can access the most up-to-date information on endangered languages and can play an active role in putting their languages online by submitting information or samples. This data is vital: AI can only help preserve a language if there is material in that language to collect and learn from.

From the Endangered Languages Project: “The Star-Spangled Banner” sung in Navajo.

Glossika, a linguistics company, says it uses artificial intelligence (AI) to speed up language learning by increasing or decreasing the learner’s exposure to patterns depending on their feedback. In this approach, the learning process is based on repeating phrase patterns that show the relationships between words, which replaces traditional vocabulary and grammar drills. In early December 2017, two AI systems were reported to be able to teach themselves any language known to man. In the same month, Capiche, an AI translation and tracing application, was created to improve communication with and from refugees in Europe and to help them overcome obstacles caused by a lack of language skills.
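To make the idea of feedback-driven exposure concrete, here is a minimal Python sketch. It is not Glossika’s actual algorithm, which the company has not described here; the PatternState class, the update_exposure function, and the doubling and halving rule are all assumptions chosen for illustration. The sketch simply waits longer before repeating a pattern the learner finds easy and repeats a hard one sooner.

```python
from dataclasses import dataclass


@dataclass
class PatternState:
    """Scheduling state for one sentence pattern (hypothetical structure)."""
    pattern: str
    interval: float = 1.0  # sessions to wait before showing the pattern again


def update_exposure(state: PatternState, was_easy: bool) -> PatternState:
    """Lengthen the gap after 'easy' feedback, shorten it after 'hard' feedback."""
    if was_easy:
        # The learner needs less exposure: wait twice as long next time.
        new_interval = state.interval * 2.0
    else:
        # The learner needs more exposure: halve the wait, but never below 1.
        new_interval = max(1.0, state.interval / 2.0)
    return PatternState(state.pattern, new_interval)


# Illustrative run: two 'easy' answers followed by one 'hard' answer.
state = PatternState("Where is the ...?")
for feedback in [True, True, False]:
    state = update_exposure(state, feedback)
print(state.interval)
```

A real system would likely track many more signals (response time, audio quality, time since last review), but the same increase-or-decrease loop sits at the center.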

But How Does AI Work?

According to the National Artificial Intelligence Initiative Act of 2020, AI is a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. Datapine predicts that you won’t be able to escape the word AI in 2022, according to its report Top 10 IT & Technology Buzzwords You Won’t Be Able To Avoid In 2022.

“Artificial Intelligence systems will start to become virtually invisible, as we rely on AI for more and more tasks, they will rapidly become as familiar as all the other technologies we interact with every day. This growing reliance on AI will lead to artificial intelligence being accepted as natural,”

Dr. Joseph Reger, Fujitsu Fellow, and Chief Technical Officer EMEIA

Artificial Intelligence has many possibilities to offer. For example, machine learning can pick up patterns in data much faster than the human brain. As a result, AI technologies deliver remarkable results in highly specialized fields, and it is estimated that the AI market will grow from $16.06 billion in 2017 to $190.61 billion by 2025.
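To put that market estimate in perspective, the short Python sketch below works out the compound annual growth rate implied by the two figures quoted above. The cagr helper is an illustrative name, and the calculation assumes steady year-over-year growth across the eight years, which the original estimate does not necessarily claim.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value, and span."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
rate = cagr(16.06, 190.61, 2025 - 2017)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the forecast corresponds to growth of roughly 36% per year.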

Virtual assistants such as Alexa and Cortana have taken the consumer market by storm. Google Now and Siri have become ubiquitous, thanks mainly to voice recognition software becoming much more capable over the past year. Siri, Cortana, and Alexa all play similar roles, bringing us one step closer to the futuristic notion of AI virtual assistants that can do anything we need on a whim.

According to Adi Agashe, a program director at Microsoft, Alexa is built using Natural Language Processing. First, it records your words, then it sends them to Amazon to be analyzed. Amazon breaks your words down into individual sounds, then consults a database containing the pronunciations of various words to find which words most closely correspond to that combination of sounds. Finally, it sends the result back to the device, which carries out the command. See the picture below.
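The matching step described above, finding the word whose stored pronunciation is closest to the sounds that were heard, can be sketched in a few lines of Python. This is a toy illustration and not Amazon’s implementation: the three-word PRONUNCIATIONS dictionary, the ARPAbet-like sound symbols, and the use of edit distance as the measure of closeness are all assumptions made for the example.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two sequences of sounds."""
    prev = list(range(len(b) + 1))
    for i, sound_a in enumerate(a, start=1):
        curr = [i]
        for j, sound_b in enumerate(b, start=1):
            cost = 0 if sound_a == sound_b else 1
            curr.append(min(prev[j] + 1,           # drop a sound from a
                            curr[j - 1] + 1,       # add a sound from b
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]


# Toy pronunciation dictionary (illustrative entries only).
PRONUNCIATIONS = {
    "play": ["P", "L", "EY"],
    "pause": ["P", "AO", "Z"],
    "stop": ["S", "T", "AA", "P"],
}


def closest_word(sounds: list[str]) -> str:
    """Return the dictionary word whose pronunciation is nearest to the input."""
    return min(PRONUNCIATIONS,
               key=lambda word: edit_distance(sounds, PRONUNCIATIONS[word]))


# A slightly mis-heard "play": the final vowel differs from the dictionary entry.
print(closest_word(["P", "L", "EH"]))
```

The sketch also hints at why under-resourced languages are at a disadvantage: if a language has no pronunciation database, or only a small one, there is nothing for the heard sounds to be matched against.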


The Downsides to AI

Even though AI has provided us with technological advances, it has its downsides. First, it is pervasive, as it enables many of our daily routines. That pervasiveness brings threats to privacy, dignity, and agency, the danger of mass surveillance, and increased use of unreliable AI technologies in law enforcement. Gender and ethnic bias are prevalent in voice recognition. Disparities exist because of the way we’ve structured our data analysis, databases, and machine learning. As a result, speech recognition struggles with breathier and higher-pitched voices. The underlying reason may be that databases contain a lot of data from white male speakers and less from female and minority voices. For example, speech scientists frequently analyze TED Talks, and 70% of TED speakers are male. AI is therefore set up to fail.

In response to these concerns, the United Nations Educational, Scientific and Cultural Organization (UNESCO) launched a project to give the world an ethical framework for using artificial intelligence. Thanks to the mobilization of experts from around the globe and intense international negotiations, the 193 UNESCO member nations adopted the first-ever ethical framework for AI. It includes four parts: protecting data, banning social scoring and mass surveillance, helping to monitor and evaluate, and protecting the environment.

A Case Study of the Icelandic Language

On the North Atlantic island of Iceland live about 350,000 people whose native language has stayed almost the same since the Norse settlers made the island their home. Because the language remained isolated from the rest of the world, Icelandic has retained its uniqueness and complexity. Known to be challenging to learn, Icelandic is now in danger of dying out because digital technologies are increasingly tailored towards the language of business: English.

“Linguistics experts, studying the future of a language spoken by fewer than 400,000 people in an increasingly globalized world, wonder if this is the beginning of the end for the Icelandic tongue. Linguistic experts fear that the Icelandic language is threatened with extinction because of the widespread use of English in new technologies and the country’s reliance on English-speaking mass tourism.”

Street view in Reykjavik, Iceland

Former President and UNESCO goodwill ambassador Vigdis Finnbogadottir told The Associated Press that Iceland must take steps to protect its language. She is particularly concerned that programs be developed so the language can be easily used in digital technology.

“The less useful Icelandic becomes in people’s daily life, the closer we as a nation get to the threshold of giving up its use.”

Eirikur Rognvaldsson, a language professor at the University of Iceland.

Teachers are already sensing a change among students in the scope of their Icelandic vocabulary and reading comprehension. Anna Jonsdottir, a teaching consultant, said she often hears teenagers speak English among themselves when she visits schools in Reykjavik, the capital. This is mainly because most modern technologies are not tailored to or trained in Icelandic; Icelandic is one of the least-supported languages digitally. According to Iceland’s Ministry of Education, one of the practical steps considered is the allocation of about USD 8.8 million to fund an open-access Icelandic database that would facilitate the work of tech developers and make Icelandic a language option. This project is currently under way, with data being collected to develop AI that uses the Icelandic language.


It is essential to use emerging technology to bridge past and present. AI has many possibilities, and it is exciting to see where the future takes us, but it is just as important to be aware of the biases and threats mentioned above.

What sets Iceland apart is that it is a well-developed country with a high GDP and an educated population, which is not the case in many countries where languages are endangered. Also, Icelandic is not as far gone as other languages on the list of endangered languages.
