Results in short
In our tests, Microsoft's Editor was unreliable both at detecting biased wording and at suggesting inclusive language. This is all the more problematic because its "inclusivity configurations" lead users to believe that the Editor can do both.
Understanding the Tools
Both Witty and Microsoft Editor are browser plugins that help you write more clearly and concisely. We compared how well each tool detects bias, suggests inclusive language, and produces output in line with the EU AI Act, the US AI directives, and anti-discrimination laws in both regions.
What is Microsoft Editor?
Microsoft Editor is a browser plugin that checks your spelling and grammar and helps you write more clearly and concisely. If you configure it to do so, it also offers refinements for inclusiveness, sensitive geopolitical references, and clarity.
What is Witty?
Witty is an AI-based browser plugin that focuses on detecting bias in language. It identifies biased terms across more than 50 diversity dimensions, including gender, race, age, and mental health. Witty suggests bias-free alternatives, which helps companies avoid legal trouble under anti-discrimination laws or AI regulation.
Witty is trained to detect several thousand biased terms and phrases in English, German, and French. The vocabulary is built on studies, input from specialized linguists, and associations connected to social movements such as Black Lives Matter, #MeToo, LGBTQIA+, and people with disabilities.
Our comparative studies
We ran two studies comparing Microsoft Editor and Witty. For both studies, we set all configuration options in Microsoft Editor and Witty to the maximum level available, in English and German.
Study 1: Imaginary test sentences
In this study, we wrote imaginary test sentences and reviewed them on two aspects:
- Diversity Dimensions: We tested how each tool detected bias across various diversity dimensions. The terms tested came from Witty's library of biased and discriminatory wording. For each term, we created test sentences in English and German that could occur in a business setting.
- Handling Male-Generic Language: We examined how each tool approached gendered language in German. In today’s business world in Germany, organizations are expected to use gender-neutral language. This can mean using special gender-neutral forms, using both the male and female forms, or using a gender sign (like a star) to also include non-binary people.
We entered each test sentence into Microsoft Editor and checked whether it underlined the biased term and whether it proposed an inclusive alternative.
For Witty, we pasted each sentence into our Witty Editor, which analyzed it automatically.
In each test case, we evaluated two main criteria:
- Did the tool detect the biased wording or phrase?
- Did the tool suggest an inclusive alternative?
For Witty, the answer to both questions is always ‘yes’ by construction, since the terms tested come from our own library of biased and discriminatory wording.
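The two criteria above can be recorded per test case and turned into a detection rate and a suggestion rate. The following is a minimal sketch of that bookkeeping, not the tooling used in the study: the `CheckResult` type, the `rates` helper, and the sample outcomes are all hypothetical and serve only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of checking one test sentence in one tool (illustrative fields)."""
    term: str        # biased term contained in the test sentence
    language: str    # "en" or "de"
    detected: bool   # criterion 1: did the tool flag the term?
    suggested: bool  # criterion 2: did it propose an inclusive alternative?


def rates(results, language):
    """Return (detection rate, suggestion rate) in percent for one language."""
    subset = [r for r in results if r.language == language]
    if not subset:
        raise ValueError(f"no results for language {language!r}")
    detected = sum(r.detected for r in subset)
    suggested = sum(r.suggested for r in subset)
    return 100 * detected / len(subset), 100 * suggested / len(subset)


# Made-up outcomes for illustration only; these are NOT the study's results.
example_results = [
    CheckResult("man-hour", "en", detected=True, suggested=True),
    CheckResult("digital native", "en", detected=False, suggested=False),
    CheckResult("illegal immigrant", "en", detected=True, suggested=False),
    CheckResult("confined to a wheelchair", "en", detected=False, suggested=False),
]

detection_rate, suggestion_rate = rates(example_results, "en")
print(f"detected: {detection_rate:.0f}%, suggested: {suggestion_rate:.0f}%")
# prints "detected: 50%, suggested: 25%"
```

Keeping detection and suggestion as separate flags matters because a tool can underline a term without offering an alternative, as in the third made-up entry.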
Let’s look at the two aspects we tested and examples for each.
1. Detecting Bias Across Diversity Dimensions
A primary distinction between Witty and the Editor lies in the depth of diversity coverage. Witty’s AI is specifically trained across more than 50 diversity dimensions, ensuring that even subtle biases are flagged.
Here are some example sentences from different categories.
| Diversity dimension/bias | Tested word (from Witty's database and its network of orgs) | Test sentence |
|---|---|---|
| Migration | illegal immigrant | The competitor’s company was fined for employing illegal immigrants in their factories. |
| Mobility | confined to a wheelchair | Since the accident, our colleague is confined to a wheelchair. |
| Gender cue | man-hour | How many man-hours will this project take? |
| Age, against 50+ | digital native | Our next lead in communications should be a digital native. |
| Mental image | engineer | We need an engineer who can handle complex projects and deliver on tight deadlines. |
We tested 89 terms from 44 diversity dimensions in English and 94 terms from 46 diversity dimensions in German.
This is the text we used to test the male generic in German. The words underlined in yellow and grey are male-generic terms.
Quantitative Results
Since the test phrases were sourced from Witty's extensive database, Witty detected 100% of the biased terms in all cases. In contrast, Microsoft Editor showed significant limitations:
The Microsoft Editor detects 19% of biased wording in English, 10% in German and none of the male generic terms we tested.
These numbers point to three conclusions:
- Even with all ‘inclusivity options’ activated, the Microsoft Editor missed most of the biased wording: 81% of the English terms went undetected.
- In German, the situation is worse: 90% of the tested terms went undetected.
- The Microsoft Editor did not catch any male generics in German in our tests.
These findings suggest that, while the Editor can assist with general grammar improvements, its detection and handling of bias is unreliable. After the test, it is not clear to us what the Editor actually checks when it comes to inclusion. This makes for a poor user experience: with all inclusivity configurations turned on, users are likely to assume that the Editor checks reliably for bias, when in our tests it caught very little. It falls far short of Witty’s targeted approach.
Study 2: Real third-party texts
In our second study, we analyzed five real-world texts from well-known companies: Amazon’s LinkedIn About page, Fiege Logistics’ About page, Kühne & Nagel’s website, a Google employee's social media post, and a Patagonia job advertisement.
Here is one of the five sample texts:
The results below compare how Microsoft Editor and Witty performed in detecting biased terms and suggesting inclusive alternatives, with the terms flagged by Witty serving as the baseline. They show a stark difference between the two tools:
- English:
  - Microsoft Editor detected only 1% of the terms flagged by Witty.
  - It proposed inclusive alternatives for just 1% of the terms where Witty offered suggestions.
- German:
  - Microsoft Editor identified 4% of the biased terms detected by Witty.
  - It provided alternatives for 2% of the terms where Witty suggested inclusive options.
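The percentages above are relative figures: the terms flagged by Witty form the baseline, and Microsoft Editor's share is the part of that baseline it also flagged. Here is a minimal sketch of that calculation, assuming each tool's findings for a text are available as a list of terms. The function name `share_of_baseline` and the sample terms are hypothetical and not taken from the study.

```python
def share_of_baseline(baseline_terms, candidate_terms):
    """Percentage of baseline terms that the candidate tool also flagged.

    Terms are compared case-insensitively. Terms flagged only by the
    candidate are ignored, because the baseline defines the denominator.
    """
    baseline = {term.lower() for term in baseline_terms}
    if not baseline:
        raise ValueError("baseline must contain at least one term")
    overlap = baseline & {term.lower() for term in candidate_terms}
    return 100 * len(overlap) / len(baseline)


# Hypothetical findings for one text; NOT data from the study.
flagged_by_baseline = ["chairman", "manpower", "young and dynamic", "native speaker"]
flagged_by_candidate = ["Chairman"]

print(share_of_baseline(flagged_by_baseline, flagged_by_candidate))  # prints 25.0
```

A relative measure like this says nothing about terms that neither tool flags, so it describes coverage of the baseline tool's findings rather than accuracy in an absolute sense.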
These findings emphasize that Witty significantly outperforms Microsoft Editor in both identifying bias and offering actionable alternatives, making it a more reliable tool for fostering inclusive language.
Conclusion
Microsoft Editor and Witty cater to different needs in communication tools. The Editor enhances productivity by addressing grammar, spelling, and clarity, but in our tests it was unreliable for inclusive language. Witty, on the other hand, is specifically designed to promote inclusion, with bias detection across 50+ diversity dimensions and culturally sensitive alternatives. Witty also includes a basic grammar and spell checker.
This gap in the Editor’s functionality is not trivial: it poses serious risks in light of regulations such as the EU AI Act, US AI directives, and anti-discrimination laws. Bias in public-facing texts could lead to legal challenges and damage a company’s reputation. Witty helps mitigate these risks by bringing your communication in line with both legal standards and modern expectations of inclusiveness.