Ethics Board 2025 Update

2024 gave us the opportunity to apply our ethics process in practice as we refined our features and explored the ethical use of large language models (LLMs).

Looking back, the theme for the first three quarters of 2024 was iterating on and improving what we already had. We also brought Witty to Microsoft Office and made our solution more accessible. In Q4 we then introduced some larger new features using large language models (LLMs) that required a deeper look. Throughout the year we continued to apply the ethics process we developed in 2023.

Focus on feature refinement

In 2023, we introduced a feature that allows teams and users to fully customize which of the diversity dimensions they want Witty to highlight. In Q1 we iterated on this feature based on user feedback to make it possible for users to exceed the minimum team settings. The aim is to allow individual users to challenge themselves more if they wish to do so. We also wanted to enable team admins to experiment with higher settings so that they can prepare and gain the confidence to raise the minimum settings for the entire team.
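The relationship between team minimums and individual choices can be sketched as follows. This is a minimal illustration, not Witty's actual implementation: the dimension names and the function `effective_settings` are assumptions made for the example.

```python
# Hypothetical dimension names; the real list of diversity dimensions
# is not given in this article.
DIMENSIONS = ("gender", "age", "disability")


def effective_settings(team_minimum: dict, user_choice: dict) -> dict:
    """Combine team and user settings.

    A dimension is highlighted if the team requires it OR the user opted in,
    so a user can exceed the team minimum but never go below it.
    """
    return {
        dim: bool(team_minimum.get(dim, False) or user_choice.get(dim, False))
        for dim in DIMENSIONS
    }


team = {"gender": True}                 # team minimum: only gender is required
user = {"gender": False, "age": True}   # user tries to drop gender, adds age
print(effective_settings(team, user))
```

In this sketch the user's attempt to switch off a team-required dimension has no effect, while the additional dimension they opted into is enabled.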

Expanding accessibility

In Q2 the biggest step was the release of the Microsoft Word Add-in. In the browser extension we automatically check text on those websites that are allowed via the privacy settings. This gives sufficient “context” to ensure that only desired content is reviewed. This pattern cannot be applied to Word, which is why we introduced a dedicated button to trigger a check either across the entire document or on a selected portion of text. In Q3 we then updated the Word Add-in and Dashboard to be more accessible, in collaboration with Zenyth Group. We also applied some of what we learned to the browser extension, but we still need to add keyboard shortcuts. We are also aware that some of our color choices need to be revisited for better contrast.

Another step we took in Q2 was the introduction of the “HR add-on”. In the HR context there is a specific power dynamic that makes certain words more problematic. As part of this feature we reviewed all of our existing rules to determine whether they should only apply to the HR case. We also introduced some new rules that we had previously felt would be confusing or excessive.

Adding French language support

In Q4 we made several major feature announcements that required attention from the ethics board. One of the biggest was the release of French language support. To ensure we cover a similar range of topics, we used OpenAI ChatGPT to translate rules from English to French. We did this as part of a master’s thesis with the UZH to ensure that these are cultural rather than literal translations. We then worked with an expert in French inclusive language to review these translations and to expand the rules with additional data sources (e.g. a list of professions in French) and other guides for inclusive language in French. We also developed an algorithm to handle the generic masculine inclusively using the point médian.
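To give a feel for what handling the generic masculine with the point médian involves, here is a deliberately tiny sketch. It is not Witty's algorithm: the three-word lexicon, the function names, and the choice to write plurals with a second point médian (“étudiant·e·s” rather than “étudiant·es”) are all assumptions made for illustration.

```python
import re

MIDDLE_DOT = "\u00b7"  # the point médian

# Hypothetical mini-lexicon: masculine noun -> feminine ending that is
# appended after the point médian. A real system needs far more data.
FEMININE_ENDINGS = {
    "étudiant": "e",
    "employé": "e",
    "directeur": "rice",
}


def inclusive_form(word: str) -> str:
    """Return the point médian form of a known masculine noun, else the word unchanged."""
    lower = word.lower()
    # Treat a trailing "s" as a plural marker only if the singular is in the lexicon.
    plural = lower.endswith("s") and lower[:-1] in FEMININE_ENDINGS
    stem = word[:-1] if plural else word
    ending = FEMININE_ENDINGS.get(stem.lower())
    if ending is None:
        return word
    form = f"{stem}{MIDDLE_DOT}{ending}"
    return f"{form}{MIDDLE_DOT}s" if plural else form


def rewrite(text: str) -> str:
    """Apply inclusive_form to every word in the text."""
    return re.sub(r"\w+", lambda m: inclusive_form(m.group(0)), text)


print(rewrite("Les étudiants et le directeur"))
```

The sketch only touches the nouns themselves. Articles, adjectives and participles that agree with the noun are left as they are, which is one reason a production approach needs more than a lookup table.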

Launching Witty GPT

We released Witty GPT in Q4. We have done numerous internal studies on bias in large language models (LLMs) like ChatGPT and Microsoft Copilot and have found them to be biased even when explicitly prompted for inclusive language. Yet when talking with prospective customers we have frequently heard that they wish to harness LLMs for productivity gains despite these risks, even in products like Microsoft Copilot. So we decided to build an integration that ensures best practices for prompting for inclusive language are always applied, and then takes things a step further: we analyze the response with our Witty algorithm and generate a follow-up prompt asking the LLM to improve the original response based on this feedback. We have found that this provides a much more consistent result. There are a few things to consider here:

  • The final response is still generated by an LLM to keep a consistent style and tonality, so there is a risk that it either ignores or incorrectly applies our feedback.
  • We only iterate once to ensure the response isn’t significantly slower, so issues that Witty would reliably identify might still remain in the final output.
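The flow described above, including the single iteration, can be sketched roughly as follows. This is an illustration under stated assumptions, not the Witty GPT implementation: `llm` and `analyze` are hypothetical stand-ins for any LLM client and for the Witty analysis, and the prompt wording is a placeholder.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: `llm` maps a prompt to a response,
# `analyze` returns a list of feedback items for a text.
LLM = Callable[[str], str]
Analyzer = Callable[[str], List[str]]

# Placeholder for the "best practice" inclusive-language instruction.
INCLUSIVE_PREFIX = "Use inclusive language in your answer.\n\n"


def guided_response(user_prompt: str, llm: LLM, analyze: Analyzer) -> Tuple[str, List[str]]:
    """Return the final response and the feedback that was sent to the LLM."""
    first = llm(INCLUSIVE_PREFIX + user_prompt)
    feedback = analyze(first)
    if not feedback:
        return first, feedback
    follow_up = (
        "Revise the text below so that it addresses the feedback, "
        "keeping style and tone unchanged.\n\n"
        f"Text:\n{first}\n\nFeedback:\n"
        + "\n".join(f"- {item}" for item in feedback)
    )
    # Single iteration only: the revised text is returned without a second
    # check, so it may still ignore or misapply the feedback.
    return llm(follow_up), feedback
```

Returning the feedback alongside the final text is what would allow a caller to show the user which issues were raised and which edits followed from them.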

The design decisions here were guided by the fact that users are already using LLMs in their raw form. If we made the responses too slow, they would continue using the raw form of these LLMs. This is why in our marketing material we use phrases like “guide the LLM” to clarify that in the end the LLM is in control and that the result is not deterministic.

By showing an overview of our feedback and the edits that were generated based on this feedback, we provide transparency and keep the educational side of Witty alive inside Witty GPT. While Witty GPT is LLM-agnostic and is intended to be integrated with LLMs customized by our customers, we use Anthropic Claude Haiku 3.0 for our reference implementation, because Anthropic offers a relatively high degree of transparency on AI safety and its models are available within the EU and have been found to be less biased.

Grammatically correct alternatives

The other LLM-supported feature we launched in Q4 uses Anthropic Claude Haiku to automatically make our alternatives grammatically correct within the specific sentence context. For English and German we have done this with a rule-based approach, but it has limitations and is cumbersome to maintain. For French, we decided to use LLMs for this from the start, building on the work we began in the summer of 2024 for inspiration rephrasings.

We use a prompt that gives very detailed instructions to ensure the LLM only makes changes that are directly related to the alternative the user has accepted. For example, by default LLMs will try to fix spelling mistakes even when not prompted to, and if the same word appears multiple times, they will change every occurrence. This also helps us avoid introducing the LLM’s bias into the rephrased text. As part of the feature we also give the user a preview of the changes so that they remain in control of what gets added. Currently, this feature is only available in the Word Add-in in French. In Q1 it should also become available in the browser extension, as well as in English and German.
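As a rough illustration of the two ideas in this paragraph, a narrowly scoped prompt and a preview of the resulting changes, consider the sketch below. The prompt wording and both function names are assumptions for the example; the actual prompt is not reproduced in this article. The preview uses Python's standard `difflib` module to compare the sentence word by word.

```python
import difflib
from typing import List, Tuple


def build_prompt(sentence: str, original: str, alternative: str) -> str:
    """Build a narrowly scoped instruction (placeholder wording)."""
    return (
        f"In the sentence below, the term '{original}' is replaced by '{alternative}'.\n"
        "Adjust only the words that must change for the sentence to stay grammatical.\n"
        "Do not fix spelling, do not rephrase anything else, "
        "and change only this one occurrence.\n\n"
        f"Sentence: {sentence}"
    )


def preview_changes(before: str, after: str) -> List[Tuple[str, str]]:
    """Return (old, new) word groups that differ, for the user to confirm."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# English example for readability; the text above describes the French case.
print(preview_changes(
    "The chairman gave his speech",
    "The chairperson gave their speech",
))
# [('chairman', 'chairperson'), ('his', 'their')]
```

A preview like this also makes unwanted edits visible: if the LLM had silently corrected an unrelated spelling mistake, it would show up as an extra pair in the list.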

For both Witty GPT and the grammatically correct alternatives via LLM, we introduced a new team-level privacy setting with which these features can be disabled for the entire team. There we also added explanations of the prompts we use. We also mention that while the model is hosted with AWS, AWS promises not to use any of the prompts or responses for its own training purposes.

Takeaways from the ethics board

As mentioned at the beginning of the article, we continued to use the ethics process we developed in 2023. It served as a constant reminder that we need to continuously evaluate developments. Thanks to its structure, we can apply it rigorously to identify which aspects require deeper inspection. For 2024 we specifically did deeper reviews of the features leveraging LLMs. Throughout the year we remained in contact with Anna Mätzener, the outside member of our ethics board.

Lukas Kahwe Smith

Lukas Smith (he/him) is Co-Founder and CTO at Witty Works. Previously he was a partner at the digital agency Liip, where he supported customers as a system architect while leading various internal initiatives like the ISO 27001 certification. As a well-known open source contributor, he was release manager for PHP 5.3 and helped shape the current release process. He was also a key contributor to many PHP-based projects like Symfony and the Doctrine project. He also acted as the Symfony Diversity lead.
