Do robots love language? Bias and Google Translate

Translate Tongan? You’ll have to ask him–Google Translate can’t help.

I tend not to follow the mainstream. I study languages that others don’t, and I often gravitate towards marginal dialects. When I speak Arabic, I try to throw in a little Moroccan. Speaking Russian, I might add a bit of a Ukrainian accent. Right now, I’m learning Swiss German, which I’m afraid will irritate my standard German-speaking friends.

Google Translate follows the mainstream. It is a tool developed by a savvy business filling a commercial need. People who have and spend money need an application to conduct their business more easily. I addressed the relative value of languages in an earlier post.

Unfortunately, Google Translate reflects the mainstream. It offers the languages of the powerful, and it translates in the language of the status quo, following how things are usually said rather than what is good or right. For those who use language the way most powerful people do, Google Translate works well; those of us who seek out the margins and buck the trend of “standard” speech see the language and gender bias of our world reflected in the software’s limitations.

Which languages are most important?

I don’t know how I missed it, but I just saw this week that Google Translate expanded into African languages a few months ago.

80 languages at Google Translate

You can see that it now includes five more African languages: Somali (how did I miss that?!) and Zulu, plus the three most widely spoken languages of Nigeria (Igbo, Yoruba, and Hausa). The only African languages offered previously were Swahili and Afrikaans, which were added in 2009.

The service follows the power structure of the Internet. You can see the stages of growth of the software in this article. Here is the general process of expansion. The first languages were all EU languages, and they were quickly joined by ones from East Asia. After Arabic and Russian appeared, eastern European and Southeast Asian languages came next. Other Southeast Asian and Central Asian languages arrived, until the first American (Haitian Creole) and African languages were incorporated (including Afrikaans, which some may call a European language). Even though Hindi was one of the earlier languages, other Indian languages surprisingly only came at a late stage–after Latin!–and, finally, in the last stage, a group of African languages and the first Oceanic language, Maori, made it in. No indigenous languages of North or South America are represented yet.

I don’t believe Google would have a policy to include or exclude languages. As a successful business, they would naturally gravitate to the languages that bring the most sets of eyes to their site. Also, once they have worked out one language well enough to add it, adding a closely related language wouldn’t take much additional effort. For example, adding Danish, Swedish, and Norwegian in the same release makes sense, and once Spanish is well-established, Catalan probably takes minimal effort.

Their stages of development reflect a reality of the internet and the commercial value of languages. Europe and East Asia are the most important, then the rest of Asia and Southeast Asia, and finally Africa. The indigenous peoples of Oceania and the Americas are insignificant. I noticed some anomalies. The mother tongues of hundreds of millions of Indians were left till quite late. I think there’s an assumption that Indians can just use English. At the same time, Welsh and Irish were added much earlier, in spite of having very few monolingual speakers. For some reason, Western European languages received a preference that Indian languages did not. I don’t think this is racism, however; Google reflects an economic reality, the amoral inequality of wealth and poverty.

Is Google Translate sexist?

At one time, some people accused Google Translate of gender bias. They noted that phrases with ambiguous gender sometimes came back with a gender attached. Some people were scandalized because the translations reflected an unwanted stereotype. For example, this article describes gender bias manifested in German. In German, Lehrer can mean a male or female teacher, while Lehrerin is a female teacher. I translated “physics teacher” and “math teacher,” and both came back with Lehrer, while “French teacher” and “cooking teacher” came back with Lehrerin, imposing a gender bias on certain areas of specialization.

I ran another experiment. In Arabic, as in many other languages, there is no “it,” so one uses a masculine or feminine pronoun based on the grammatical gender of the noun. So “door” is “he,” while “car” is “she,” for example. I translated “He fixed the car” into Arabic, translated it back, and got the same: “He fixed the car.” When I translated “She fixed the car” into Arabic and back, Google served up, “It fixed the car.” Maybe it is more easily imaginable that a wrench would fix a car than that a woman would.

These results reflect the methodology of the translation, which is to draw from a large corpus of instances of real usage. The author of this article interviewed an engineer working on the software who said, “Statistical patterns were used to allow the tool to determine what gender was being referred to. Should the text include the word ‘dice’, which is Spanish for ‘says’, the algorithm will not only assess the frequency that this is historically used to refer to a male or female speaker, but also the other words in the inputted text.” The software reflects how the phrase is used. It is a robot reflecting the real use of human language, stereotypes, biases, and all.
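To make “statistical patterns” concrete, here is a toy sketch in Python. It is not Google’s code, and I have no idea what their system looks like inside. The tiny corpus, its counts, the function name, and the masculine fallback are all made up for illustration. It only shows the general idea of picking whichever gendered form shows up most often with a given word.

```python
from collections import Counter

# Hypothetical toy corpus: (school subject, German form seen with it).
# These counts are invented for illustration, not measured from real text.
TOY_CORPUS = [
    ("physics", "Lehrer"), ("physics", "Lehrer"), ("physics", "Lehrerin"),
    ("French", "Lehrerin"), ("French", "Lehrerin"), ("French", "Lehrer"),
]


def most_frequent_form(subject, corpus, default="Lehrer"):
    """Return the form seen most often with `subject` in `corpus`.

    Falls back to `default` when the subject never appears. The masculine
    default is an assumption of this sketch, not a documented behavior.
    """
    counts = Counter(form for subj, form in corpus if subj == subject)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


print(most_frequent_form("physics", TOY_CORPUS))
print(most_frequent_form("French", TOY_CORPUS))
print(most_frequent_form("cooking", TOY_CORPUS))
```

If the text the tool learns from pairs “French” with Lehrerin more often than with Lehrer, a chooser like this will hand back Lehrerin every time. No one had to write the stereotype in; it falls out of the counts.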

We can’t really blame the software for the bias–we can only blame our own biases. The software has no ability to understand the pragmatics of the situation. Modern Hebrew marks the gender of the subject on present-tense verbs. When I translated “I am nursing the baby” or “I am giving birth,” the gender was masculine. It seems that when there is little evidence, the software defaults to masculine, even if that can’t make sense in real life. When a real bias comes out of the language, the software presents that as what you, as a “typical” speaker of the language, were “probably” getting at. Simply put, people talk more about women as French teachers than as physics teachers in German. Google Translate reflects our world.

Our tool in our world

I love all languages. I think we can use language to lift people up. We don’t have to marginalize languages or individuals with what and how we speak.

But our world is what it is: biased. You can make more ad revenue with some languages than with others. We tend to find fewer women working in math and science than with children. Google reflects this right back at us.

Languages rise and fall and adapt more quickly than our software. Humans can see trends coming that computers can’t. People feel right about speaking one way instead of another.

I buck the trend, though. I want to speak languages that are not money-makers. I want to find ways to focus on the marginalized rather than keep them on the margins. If I want to change the status quo, I can’t rely on Google Translate. I have to learn to speak for myself, with my own words.

Be sure to “Like” if you support the margins, those people and languages that don’t follow the trend.

Photo credit: Light Knight / Foter / Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)

 

Intermediate language-learning: Beyond the basics

This is the Linguistadores logo for Dutch — one of several languages offered

At this point, the language-learning market is saturated with on-line tools. They tend to fit in two categories: 1) very basic vocabulary and exercises (e.g., Transparent Language) and 2) social networks for language exchanges (e.g., iTalki). Very little exists, unfortunately, for more intermediate learning. What do you do if you have the basics of the language down fairly well (e.g., verb tenses, noun declensions, 200+ vocabulary words), but want to move on? You don’t know enough for, say, movies without subtitles or podcasts. Conversations with native speakers can’t last very long yet. Linguistadores has imagined the next step by helping your learning through native-language content, geared to your level.

Your choice of media

This platform offers access to real pop culture items, but broken down for language learners. I tried out Dutch as the language I was learning and English as my native language. First, you have to input your language ability level. Then, the application will serve up material for your level. Materials come in three categories: written pieces, videos, and music. The written pieces are articles from popular periodicals.

From a music video, you can look up a word from the lyrics and add it to your list.

The videos are popular TV shows or movies hosted on another site (e.g., YouTube), and the music consists of videos of pop songs. The pop songs play with the words of the song next to the video, but I couldn’t find subtitles for the non-music videos. You can easily look up words from the articles and songs.

You can save and collect words into a list to create flashcards.

Linguistadores also offers you a way to keep track of new words. As you run into unfamiliar words, you can click on them and save them. You can use these lists as flash cards for memorizing the words.

The site is in its early days, so I hope that it will grow in a few areas. First, I hope they come up with a mobile platform very soon. I do all my language study on the go. If I’m on a computer, I’m at work. (And I better be working!) On my iOS and Android devices, I could only watch videos and scroll through the songs’ texts.

A representative of Linguistadores has already let me know (they were very responsive to me on Twitter) that they are working on a mobile platform. I will be giving them my ideas and suggestions, and I’m looking forward to the results. I’m hoping that the word lookup function and the videos will be available in the mobile version.

Second, I hope the language offerings are expanded. Right now, the choices are English, Dutch, German, French, and Spanish. I know these languages fairly well, and I would prefer to spend my time getting my weaker languages up to a higher level. I think it will take some time to expand the offerings, however, as the quality and quantity of the language materials are very high. It takes a lot of effort to keep things at this level. (How long till they get to Farsi and Somali? LOL)

Third, I wonder about the future of the material they have. How do they plan to keep the offerings fresh? There are only so many music videos, for example. I’m afraid I could possibly get bored if I have to watch the same ones too many times. Also, several of the videos I tried to watch were taken down by the original owner, which is bound to happen down the line.

Nevertheless, I believe that on-line language learning has to go in the direction that Linguistadores has laid out. As a kid, I stepped up my native language by looking up new words in the dictionary. I also spent a lot of time reading the lyrics to songs I liked, which gave me an ear for how people enunciate in music. I want to get to a point where I can learn on my own from native content, and Linguistadores offers a wonderful stepping-stone.

What are the on-line tools you’re using for language-learning? What do you love about them?