Do robots love language? Bias and Google Translate

Translate Tongan? You'll have to ask him--Google Translate can't help.

I tend not to follow the mainstream. I study languages that others don’t, and I’ll often gravitate towards marginal dialects when I can. When I speak Arabic, I try to throw in a little Moroccan. Speaking Russian, I might add a bit of a Ukrainian accent. Right now, I’m learning Swiss German, which I’m afraid will irritate my friends who speak standard German.

Google Translate follows the mainstream. It is a tool developed by a savvy business filling a commercial need. People who have and spend money need an application to conduct their business more easily. I addressed the relative value of languages in an earlier post.

Unfortunately, Google Translate also reflects the mainstream. It offers the languages of the powerful and translates according to the status quo, with no regard for what is good or right apart from how things are usually done. For using language the way most powerful people do, Google Translate works well; those of us who seek out the margins and buck the trend of “standard” speech see clear limitations in it, because the language and gender biases of our world are reflected in this software.

Which languages are most important?

I don’t know how I missed it, but I just saw this week that Google Translate expanded into African languages a few months ago.

80 languages at Google Translate

You can see that it now includes five new African languages: Somali (how did I miss that?!) and Zulu, plus the three most widely spoken languages of Nigeria (Igbo, Yoruba, and Hausa). The only African languages offered before these were Swahili and Afrikaans, from 2009.

The service follows the power structure of the Internet. You can see the stages of growth of the software in this article. Here is the general pattern of expansion. The first languages were all EU languages, and they were soon joined by languages from East Asia. After Arabic and Russian appeared, eastern European and Southeast Asian languages came next. Other Southeast Asian and Central Asian languages followed, until the first language of the Americas (Haitian Creole) and the first African languages were incorporated (including Afrikaans, which some may call a European language). Even though Hindi was one of the earlier languages, other Indian languages surprisingly came only at a late stage (after Latin!). Finally, in the last stage, a group of African languages and the first Oceanic language, Maori, made it in. No indigenous languages of North or South America are represented yet.

I don’t believe Google has a policy of including or excluding particular languages. As a successful business, they would naturally gravitate to the languages that would bring the most sets of eyes to their site. Also, once they had figured out a language well enough to add it, adding a closely related one wouldn’t take much additional effort. For example, adding Danish, Swedish, and Norwegian in the same release makes sense, and once Spanish is well established, Catalan probably takes minimal effort.

Their stages of development reflect the reality of the internet and the commercial value of languages. Europe and East Asia come first, then the rest of Asia, and finally Africa. The indigenous peoples of Oceania and the Americas are treated as insignificant. I noticed some odd anomalies. The mother tongues of hundreds of millions of Indians were left until quite late; I think there’s an assumption that Indians can just use English. At the same time, Welsh and Irish were added much earlier, in spite of having very few monolingual speakers. For some reason, Western European languages received a preference that Indian languages did not. I don’t think this is racism, however; Google reflects an economic reality in its amoral inequality of wealth and poverty.

Is Google Translate sexist?

At one time, some people accused Google Translate of gender bias. They noted that phrases with ambiguous gender sometimes came back with a specific gender. Some people were scandalized because the translations reflected an unwanted stereotype. For example, this article describes gender bias manifested in German. In German, Lehrer can mean a male or female teacher, while Lehrerin is specifically a female teacher. I translated “physics teacher” and “math teacher” and both came back with Lehrer, while “French teacher” and “cooking teacher” came back with Lehrerin, imposing a gender stereotype on certain subjects.

I ran another experiment. Arabic, like many other languages, has no neuter “it,” so one uses a masculine or feminine pronoun based on the grammatical gender of the noun. So “door” is “he,” while “car” is “she,” for example. I translated “He fixed the car” into Arabic and back again and got the same sentence: “He fixed the car.” When I translated “She fixed the car” into Arabic and back, Google served up “It fixed the car.” Maybe it is easier to imagine a wrench fixing a car than a woman doing it.

These results reflect the methodology of the translation, which is to draw on a large corpus of examples. The author of this article interviewed an engineer working on the software who said, “Statistical patterns were used to allow the tool to determine what gender was being referred to. Should the text include the word ‘dice’, which is Spanish for ‘says’, the algorithm will not only assess the frequency that this is historically used to refer to a male or female speaker, but also the other words in the inputted text.” The software reflects how the phrase is actually used. It is a robot reflecting the real use of human language, stereotypes, biases, and all.

We can’t really blame the software for its bias; we can only blame our own biases. The software has no ability to understand the pragmatics of the situation. Modern Hebrew marks the gender of the subject in present-tense verbs. When I translated “I am nursing the baby” or “I am giving birth,” the verb came out masculine. It seems that when there is little evidence, the software defaults to masculine, even when that makes no sense in real life. When a real bias comes out of the language, the software presents it as what you, as a “typical” speaker of the language, were “probably” getting at. Simply put, in German people talk more about women as French teachers than as physics teachers. Google Translate reflects our world.
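To make the statistical idea concrete, here is a toy Python sketch. It is not how Google Translate actually works; it only illustrates the general principle described above: pick whichever gendered form appears more often with a phrase in some corpus, and fall back to the masculine form when there is no evidence. The counts, the phrase “chess teacher,” and the names `HYPOTHETICAL_COUNTS` and `choose_form` are all invented for illustration.

```python
from collections import Counter

# Invented counts (NOT real data) of how often each English phrase was
# rendered with masculine "Lehrer" vs. feminine "Lehrerin" in a
# hypothetical parallel corpus.
HYPOTHETICAL_COUNTS = {
    "physics teacher": Counter(Lehrer=120, Lehrerin=30),
    "French teacher": Counter(Lehrer=40, Lehrerin=90),
}


def choose_form(phrase: str) -> str:
    """Return the gendered form seen most often with `phrase`.

    If the phrase never appears in the counts, or the counts are tied,
    fall back to the masculine form, mimicking the masculine default
    described in the post.
    """
    # A missing phrase gets an empty Counter, whose lookups return 0.
    counts = HYPOTHETICAL_COUNTS.get(phrase, Counter())
    if counts["Lehrerin"] > counts["Lehrer"]:
        return "Lehrerin"
    return "Lehrer"


for phrase in ["physics teacher", "French teacher", "chess teacher"]:
    print(phrase, "->", choose_form(phrase))
```

Run as written, this maps “physics teacher” to Lehrer and “French teacher” to Lehrerin because of the invented counts, and “chess teacher” to Lehrer only because the sketch has no data for it. The point is that nothing in the logic knows anything about teachers or women; the output simply mirrors whatever skew is in the counts.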

Our tool in our world

I love all languages. I think we can use language to lift people up. We don’t have to marginalize languages or individuals with what and how we speak.

But our world is what it is: biased. You can make more ad revenue with some languages than with others. We tend to find fewer women working in math and science than working with children. Google reflects this right back at us.

Languages rise and fall and adapt more quickly than our software. Humans can see trends coming that computers can’t. People feel right about speaking one way instead of another.

I buck the trend, though. I want to speak languages that are not money-makers. I want to find ways to focus on the marginalized rather than keep them on the margins. If I want to change the status quo, I can’t rely on Google Translate. I have to learn to speak for myself, with my own words.

Be sure to “Like” if you support the margins, those people and languages who don’t follow the trend.

Photo credit: Light Knight / Foter / Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)

 

104 thoughts on “Do robots love language? Bias and Google Translate”

  1. Yeah… I’m still waiting for Gaidhlig (Scottish Gaelic). You’d think after Irish it wouldn’t take too long, by your reasoning. Or, even better, Kriol. Unlike Gaidhlig and te reo Maori, there are monolingual Kriol speakers out there (or, at least, Kriol-speakers who don’t speak one of Google Translate’s other languages). The same goes for Tok Pisin, for that matter…

    As far as I’m concerned, though, online translators can’t be trusted as more than a dictionary (put one word in and it’s wonderful: it gives you a whole handful of translations to choose from!). I still tell people about the time I put in “Pourriez-vous….?” and it came out with “Wilt thou…?”!

    1. Robot translation is pretty funny. That’s a great example.

      I have no idea how Irish and Welsh got in, but Scottish Gaelic didn’t. Why Basque but not Breton?

      But Oceania is absent except for Maori, and that sounded like a force of will, if you look at the Google Translate blog. They counted on a crowd of Maori volunteers for that.

      Of course, the Maori project intrigued me. Could you do that for other languages? How about widely-spoken languages like Oromo (30 million speakers) or Amharic (20 million)? How about the official languages of states of the US, like Hawaiian or Tlingit?

      1. My guess is that it’s because Irish and Welsh are both spoken widely/ taught in schools in Ireland and Wales, while Gaidhlig is still limited to pretty much the upper and westernmost edge of Scotland and a few scattered communities in major cities.

        Actually, you’re right, it’s about official status: Gaidhlig and Breton aren’t official, Irish and Welsh are, and Basque is used as a weapon for independence. My Welsh doesn’t extend beyond a few phrases, but I’d be interested to know which dialect they use. Probably the southern one; that’s where all the major cities are.

        That still doesn’t explain why Esperanto and Yiddish are on there, and various important other languages aren’t. (Are there any monolingual Yiddish speakers out there? If you’re going to include Yiddish, why not Pennsylvania Dutch?)

        Now we’ve got Maori, I’d like to see Pitjantjatjara. But if Maori was such a force of will by Maori volunteers, that won’t happen – all the best Pitjantjatjara translators will be tied up with the Old Testament for the next twenty years.

        At least they haven’t included Klingon (yet).

  2. Fascinating, especially re. gender bias.
    I use GT a lot (and with great caution), mainly in languages I know fairly well, so I’m aware that it makes horrendous mistakes, even sometimes with simple sentences in MAJOR languages. Also, it keeps giving me Brazilian Portuguese when what I want is European – can they not offer both options?! I dread to think how it’s going to screw up minority languages when it can’t even get Spanish or German right.

  3. I like the prior two comments. I use Google Translate mainly as a dictionary, and when I have greater proficiency in a language I see the errors. Many are due to gender, which in my opinion has to do with English (American) lacking masculine and feminine forms, and with the richness of other languages compared to English. Sometimes it is hard to offer a translation even if you are bilingual; some things just don’t translate, with Google or otherwise. But when conversing with another person you can get a lot closer than by just typing a sentence into a box.

  4. Pingback: Do you love the languages around you? | Loving Language

  5. With regard to the much later development of Indian languages, I don’t think it’s because there’s an assumption that Indian people can “just use English.” I think it has more to do with the assumption that Indian people can “just speak Hindi.”

    Source: A neighbor from college who was an engineering student from Nepal. He said that he and everyone in that general area of the world (in the engineering circles at least) spoke Hindi because of the necessity of communicating with Indian engineers.

    Granted that’s just one story from one person, but he (and his wife) were the only people in the neighborhood who could talk to the Indian engineering students who moved in next door to me. In Hindi of course.

    1. That could very well be the case. I can add my own anecdote to the story. A friend of mine from Pune told me that Southern Indians are known for a high level of education. Some of them, especially citizens of Tamil Nadu state, don’t speak Hindi. Hindi is much easier for Northerners than for Southerners.

      Like you say, it’s one story from one person. Maybe it’s worth more study?

      1. I certainly think so! I think it’s interesting which languages are easier / more intuitive to which native speakers. Although that won’t always be universal, either. For me, Mandarin is a pretty easy language to learn to speak, because I can hear the tones just fine and also because (my favorite) of the grammatical simplicity. But when I tell people that they look at me like I’ve grown a second head. 🙂

      2. Love the tones! When I tell people that languages like Chinese, Vietnamese, etc., have super easy syntax, I get that look too.

        I posted on “Why Somali is harder than your language”, which addresses this issue of “what’s harder” for whom.

  6. If software engineers program Google translate based on statistical usage of male vs. female nouns for verbs/actions, then clearly it’s a great example how Google Translate cannot help us change language use to reflect societal changes in equity, human relationships and new human activities.

    1. That would be a very cool experiment. For example, do sentences with female pronouns and verbs show more or fewer errors? Are “the crowds” spending as much time working on those errors as others?

      As far as politics and language, this would be a great project.

  7. I wouldn’t blame Google for the gender bias or things like that because most of the time it’s not Google doing the translation, it’s someone familiar with that language. For example, we have seen instances where typing in son of a b—– translates to the name of a local politician. Can we blame Google for that? Not really. Google just followed the algorithms. An algorithm that learned from us.

    1. I agree with you. I think our own human gender bias is to blame.

      I believe that Google is doing the translation based on statistical analysis. It takes crowd-sourced suggestions, too. In my examples of Hebrew, above, I showed that the tool is based on statistics over pragmatics. Hence, “I nursed the baby” comes up masculine, since there are more masculine verbs in Hebrew, not because the sentence actually makes sense.

  8. I share your feelings… but I only got seven (including Swiss German, which has at least 25 different dialects and certainly another 25 sub-dialects, if I consider that in the Italian-speaking part, South Switzerland, every valley has its own dialect)…
    Still on the waiting list are Japanese (I only had one year of lessons, and you need to practice or else you forget it), Arabic, Russian and Chinese.
    Serenity Claudine

  9. You’re missing the technical angle: the programmers are not sexist, and they don’t care which languages are covered, because the goal is to include them all. From a usability perspective, if you’re using Google Translate to get a perfect translation, you’re barking up the wrong tree, because robots cannot replace the poetic selection of words.

  10. You can always tell people’s biases when, all things being equal, they are forced to make a decision. When there are 6,000 languages to choose from, which will they pick? Significantly, Google has not chosen by number of speakers. I’m exploring here what hermeneutic emerges from the languages they picked. The languages are skewed towards richer and more influential people in the tech field. This is a fact, and I see why they would choose that way.

    I’m using the Google Translate tool to reveal biases in our culture. The tool is, technically, unaware of what you call the poetic selection of words. It just looks at trends among humans. And the human trends are skewed towards the rich, the powerful, and the masculine.

    To overcome that problem, we have to look to us humans, not the robots that reflect us.

  11. Great take on the subject. I realized much of what you said while in university: the language offerings were very poor, and instruction was another thing. Language is a powerful tool, and we are limited in that, especially here in the States.

    1. Thank you. As an undergrad, I saw what you mention, but at the U of Wisconsin, I saw a lot of options. It depends on the place. However, I think the number of languages offered is falling as the humanities lose funding.

      I say: Go out to your community and learn the languages they speak there. What do they speak where you live?

  12. Thanks for a good article. My mother tongue is Korean, and when I use Google Translate with it, the result is funny and doesn’t work well. But when I translate Spanish-English, it is quite accurate. I think popular languages work better.

      1. Zyriacus

        I’m under the impression that Google Translator uses English as an intermediate language. When I translate from German into French and back into German, a “Sängerin” (female singer) is turned into a “singer” and appears in the German retranslation as a “Sänger”.

    1. On the other hand, English and Spanish are much closer historically, grammatically, and with vocabulary, since they’re both Indo-European languages and English had so much influence from French, which is, of course, *very* close to Spanish. Korean, meanwhile, is a completely unrelated language with a very different vocabulary and grammar… No translation is ever going to be completely accurate, even with a person.

      (But I do agree with you. My aunt and one of my cousins are Korean, and I once put something through Google Translate for them; my aunt said she couldn’t understand a thing, it made no sense! Likewise when translating comments from Korean to English.)

      1. Your explanation makes sense, Rachel.

        I was looking at the explanation of the algorithm, and some languages go through other languages. So if you’re German translating Korean, Google might go through English. It’s complicated!

      2. I think I’ve heard or read somewhere that a lot of language translators put everything through a con-lang such as Esperanto, to iron out grammar irregularities etc. I’m not sure how true that is. But it makes sense that if Korean-English and German-English are common on Google Translate, it would have algorithms for that but not for a less common translation such as German-Korean, and would then translate through English. That just opens up that much more room for mistakes, though…

  13. Great post. Google recently failed me when I wanted a basic translation from English to Persian. I needed the pronunciation and/or letters in English that I could at least struggle with. However, no luck with Google; I was reduced to turning to YouTube, where thankfully the answer was. A definite bias on Google’s part, as it has never failed me this way for Mandarin, Swedish, French, or Portuguese. Cheers, and congrats on a well-deserved FP!

  14. I do agree with you guys. Certainly, popular languages are easy to translate and to understand, even for a robot using language. Of course, the robot will more easily be programmed with the popular languages.

  15. I think that if you want to learn an unusual language, you should have a look at Italian and its vernaculars. Here in Italy, if you just move to a town near yours, you already see a difference, so it is really difficult not to have an odd accent.

      1. Surely there are, but most are in Italian; the English ones are quite expensive. For example, I found one by Martin Maiden. I really don’t know what learning from them would be like, because I learned the dialect I speak by myself and from the people around me. The only things I have read in these vernaculars are some novels by Verga, the sonnets of Frederick II’s medieval school (Sicilian), and a book by Dario Fo (Lombard and Venetian). A good synthesis of the Italian dialects is found in Manzoni’s works, but they are quite slow and difficult to read.

  16. It’s amusing to use Google Translate on a paragraph through a couple of different languages and then change back to English. In many cases, the translation is totally different from what the paragraph originally said! Though it’s great to have Google Translate to quickly translate things we need to know, I still feel like there won’t ever be software that can communicate ideas like a person can.

  17. Pingback: Google Translate Hacks | Citizen Sociolinguistics

  18. Pingback: The World’s Languages Have A Significant Bias Towards Happiness – Business Wolf

  19. solarey

    Very interesting post. Food for thought. Last I checked, Silicon Valley was 80% male and 20% female. I think because it’s male-dominated, it’s natural that most of the foreign languages would be masculine. As you stated before, it’s not done intentionally. But I must admit, I do like Google Translate a lot. I see it as a free digital Rosetta Stone. I have an interest in Sumerian texts; I’m sure they will eventually get to that too. 😉

    1. Thanks for your input. I think that the language reflects broader, global use, not just use in Silicon Valley. At the same time, I think the gender imbalance there does trickle through to their products without their realizing it.

      1. solarey

        I agree, but I’m sure that in time this will be resolved. Granted, it’s not perfect, but keep in mind this is relatively new and free! 😆 As for gender imbalance, especially in language, we need more bilingual, tech-savvy females.

      2. I don’t know if robots can solve this one in the end. If one language combines in one word what another language expresses in two different words, the translator has to make a choice. Humans are better at these choices. If we translate “teacher” into German, we have to know the gender of the teacher, whether it’s “Lehrer” or “Lehrerin.” One has to understand the tendencies of the world to make this choice.

      3. solarey

        Humm… maybe voice recognition? The male voice is deeper and a woman’s voice is higher-pitched; this might be the language solution. Sound

  20. Suddenly there was Zulu in there. I was surprised. Some things do not happen on their own. Somebody speaks up and makes the effort to make a change. Thank you for an interesting article. I write poetry at mine should you ever get the chance to visit. 🙂

  21. So, where was I? Yes, and suddenly Zulu appeared in the mix. Some things do not happen on their own. We are uplifted by those who speak up and make an effort for us all. Thank you for the interesting article. I write poetry at mine should you get the chance to visit. 🙂

  22. Considering it’s EnglishLanguageDay today, I’m really happy to have stumbled upon your article. You have some great thoughts! And if you need any help with your Swiss German, don’t hesitate to contact me 🙂

  23. Pingback: Good news! 50,000 views! | Loving Language

  24. Pingback: Microsoft is killing languages with Skype Translator (and so is Google Conversation Mode) | Loving Language

  25. Some very interesting issues raised. I agree that our own bias is why we have these problems. It is shocking to think that our social bias can be programmed into ‘robots’. I may be taking this elsewhere, but with faulty human traits being installed into computing, who knows what the future has in store? However, mainstream will always be mainstream, ultimately driven by hierarchy and money. A detailed overview of all languages, out of love for language, would be lovely. Yet this isn’t an ideal world, and everyone’s motives aren’t pure.

  26. Pingback: Assessment | mentalfreedomblog

  27. Hi! Did you see that Google Translate expanded again today? I’m a bit excited because Gaelic’s there… not sure how I feel about that. On one hand, *finally*, but on the other hand, we’ve been enjoying being able to use Facebook without people “understanding” by means of Google Translate. Also, it’s been billed as “Scots Gaelic”, which is a misnomer: “Scots” is a completely different language, and calling it “Scots Gaelic” only heightens the confusion. It’s “Scottish Gaelic”, as in “Gaelic of Scotland”. Anyway…

    I’m also a little concerned that some languages like Corsican, Frisian, and Luxembourgish have been added while other languages less similar to pre-existing ones are still waiting. On the other hand, I’m glad to see languages like Pashto, Amharic, Kyrgyz, Xhosa, and Samoan, although I didn’t expect that the last would take long after Maori arrived.

    I’ve run a few texts through the translator and it’s about as accurate as ever. I’m wondering which dictionary they used…

  28. Pingback: Google Translate and Gaelic | Rachel's Ramblings
