Microsoft Translator: Making Knowledge Access Equitable

Appen data helps Microsoft promote equitable knowledge access

The Project

In the early days of online translation, the software was clunky and translated text word for word, often missing the nuances of language and causing serious misunderstandings. Microsoft Translator has made translation easier, faster, and more accurate, and has made synchronous multi-language communication possible.

Microsoft Translator started with the world’s most frequently spoken languages. Today, less common languages are added regularly and are used to teach younger generations, to preserve languages that are disappearing, and to make knowledge access equitable, no matter what language you speak.

The Challenge

Microsoft Translator, powered by Azure Cognitive Services, uses AI technology to parse language and translate it into another language. To do this, they need a large, accurately annotated training dataset to prepare the translator model for each language.

Microsoft Translator struggled to get datasets of the size they needed for some of the less frequently spoken or cataloged languages. Creating a dataset takes time, knowledge, and resources: you must find fluent speakers, collect data points, annotate each data point, and run quality assurance tests to ensure accuracy. Translating into languages that use a different alphabet also requires phonetic similarity matching and transliteration first, a step that expert staff and linguists can perform.

To speed up their time to market, Microsoft reached out to outside sources to collect and prepare the data they needed.

The Solution

Microsoft Translator chose Appen as its vendor for this language project. We provided the expertise, resources, and creative solutions needed to create translated datasets from rare languages and run the necessary quality checks.

Our process included working with local resources to source translations from fluent speakers. We collected data, annotated it by transcribing and translating each data point, and evaluated the model outputs for quality and accuracy. We also developed a service that allows Microsoft to generate multiple translations for gender-ambiguous source sentences, addressing bias in translation.

Our work for Microsoft Translator encompassed three stages of the data for AI lifecycle: data sourcing, data preparation, and model evaluation by humans. By completing this work, we helped Microsoft Translator get the data they needed at the highest possible quality, and on time.

The Result

As a result of our partnership, Microsoft Translator now has 110 languages available for consumers to use for translating and working in other languages. Appen supported the data gathering process for 108 of those 110 languages.

Of the 110 available languages, some of the newer and less commonly spoken ones include:

The links lead to Microsoft blog posts that go in-depth about each language and the process of adding it to the Microsoft Translator AI.

No matter the client or the size of the project, we’re proud to create the highest quality data possible and to be part of the solution for making AI better. Representative data is how we make AI more ethical. Our work with Microsoft Translator to represent all languages, not just those with the most speakers, is part of that goal.


Andrew Ettinger | Chief Revenue Officer

Andrew Ettinger joined Appen as Chief Revenue Officer in May 2023, overseeing the company's revenue strategies and driving growth in the field of AI. He joined Appen with more than 25 years of experience in sales and services in the technology industry.

Andrew's expertise extends to harnessing the power of data to drive insights and optimize processes. As the Chief Revenue Officer at Astronomer, he successfully grew the adoption of their open-source data solution, leading to a remarkable increase in monthly downloads and revenue. His strategic initiatives resulted in a 600% growth in customer count and a 75% win rate. 

Prior to joining Astronomer, he served as the VP of Sales at Pivotal Software, where he helped grow the business from zero to $100 million in annual recurring revenue in a single year leading up to Pivotal’s initial public offering, and up to $500 million thereafter. Under his leadership, the company achieved three consecutive years of 50% revenue growth, fueling digital transformations for Fortune 500 companies in various sectors. 

Andrew holds a Bachelor of Science in Business Marketing from The Ohio State University.