The Missing Piece in AI Governance: Fighting Bias In, Bias Out
If you listened to my recent podcast (Navigating Innovation and Trust in the Age of AI) with Kim Basile, Kyndryl’s Chief Information Officer, you would know that I like to work with acronyms. When looking at the exploding world of AI, enterprises and executives are experiencing FOMO – the Fear Of Missing Out. As Kyndryl is the world’s largest provider of IT infrastructure services, Kim definitely recognized that FOMO is real – for Kyndryl and its customers. The perception (reality?) is that if you’re not in the AI game, then your business is falling behind.
I also talked with Kim about a second acronym as it relates to AI – FOMU – the Fear Of Messing Up. I believe this is an even more important element in successfully launching AI initiatives. Kim talked at length about the governance required to properly manage AI projects and to build cross-functional trust in them. Governance isn’t the responsibility of one person; it takes an oversight team that ensures the proper guardrails are in place and that AI projects are managed with the same rigor as any other effort within the enterprise.
If you’re on LinkedIn and have connections in tech, then you’ve almost certainly seen posts referring to the MIT study where “95% of organizations found zero return despite enterprise investment of $30 billion to $40 billion into GenAI” from articles like this (AI investment led to zero returns for 95% of companies in MIT study). Given all the hype, and the associated investment, that’s a scary statistic. But why is that the case?
I’m certain that FOMO – diving into AI initiatives without sufficient planning and governance – is part of the problem. I also suspect that not paying enough attention to FOMU kept many of these projects from succeeding. But I believe there is another acronym contributing to AI falling short of expectations – BIBO – Bias In Bias Out. From the data sources selected for model training to the prompts used to get results from the models, bias anywhere in the system causes issues that lead to failures.
What can be done to minimize BIBO, to strive for bias-free AI systems?
First and foremost is understanding the wide variety of biases that can be introduced into AI systems. This article (AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry – PMC) does a great job of identifying the main types of bias:
- Historical Bias
- Representation Bias
- Measurement Bias
- Evaluation Bias
- Simpson’s Paradox (Subgroup Bias)
- Sampling Bias
- Content Production Bias
- Algorithmic Bias
Establishing datasets that aren’t skewed toward a predetermined set of results is essential. Building the right data foundation starts with thorough audits of training datasets to identify representation gaps, historical inequities, and skewed samples before model development begins. The goal is simple: eliminate bias from the start. Implement diverse data sourcing strategies that actively seek out underrepresented perspectives and use cases rather than relying on easily accessible or convenient datasets.
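To make the idea of a representation audit concrete, here is a minimal sketch in Python. It assumes candidate records in a pandas DataFrame with a hypothetical `gender` column and a benchmark distribution you consider representative; the column names, toy data, and 10-point threshold are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of a training-data representation audit (illustrative only).
# Assumes a pandas DataFrame of records with a hypothetical "gender" column
# and a benchmark distribution you consider representative of the population.
import pandas as pd

def representation_report(df: pd.DataFrame, column: str, benchmark: dict) -> pd.DataFrame:
    """Compare each group's share of the dataset to its benchmark share."""
    observed = df[column].value_counts(normalize=True)
    rows = []
    for group, expected_share in benchmark.items():
        observed_share = observed.get(group, 0.0)
        rows.append({
            "group": group,
            "observed_share": round(observed_share, 3),
            "expected_share": expected_share,
            "gap": round(observed_share - expected_share, 3),
        })
    return pd.DataFrame(rows)

# Toy example: flag any group under-represented by more than 10 points.
data = pd.DataFrame({"gender": ["M"] * 80 + ["F"] * 20})
report = representation_report(data, "gender", benchmark={"M": 0.5, "F": 0.5})
print(report[report["gap"] < -0.10])
```

A real audit would look at many more attributes (and their intersections), but even a simple report like this surfaces the kind of skew that sank Amazon’s recruiting model.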
From the article referenced above, “Algorithms rely on data, and their outcomes tend to be as good as the data provided and labelled and the way the mathematical formulations are devised. Even in an unsupervised ML model working with raw data, the machine might find discriminatory societal patterns and replicate them.”
The article mentions an example of Representation Bias from the earlier days of widespread AI adoption. Amazon built an AI model to automatically review, analyze and grade the backgrounds of individuals who applied to the company. After using the system for about a year, the company realized it was rating men far higher than women (Insight – Amazon scraps secret AI recruiting tool that showed bias against women | Reuters). The model had been trained on the previous 10 years of hiring data – which was overwhelmingly male-dominated. Essentially, the model taught itself that male candidates were “better”, and it couldn’t keep pace with tech roles and a workforce in which women were becoming far more prevalent.
There is real legal and financial risk to corporations when bias appears in AI-driven actions and results. HR platform Workday is being sued over claims that its applicant tracking system (ATS) discriminated against older applicants (https://styledispatch.com/the-hidden-ageism-in-ai-hiring-tools/). AI models can penalize resume gaps, outdated terminology and graduation dates (from which age can be inferred), putting more experienced (aka older) candidates at a disadvantage.
As Kim mentioned in our podcast, governing AI initiatives with cross-functional experts brings in different perspectives and reduces the opportunity for bias to be introduced. Facilitate exercises where team members specifically challenge assumptions and look for blind spots in model design, data acquisition and implementation. While some stakeholders might hesitate to participate because AI seems too technical, input from non-technical team members is often essential for spotting biased datasets and problematic prompts.
Rigorous testing, validation and ongoing governance will be critical in establishing and then maintaining bias-free AI systems. Develop bias detection protocols that test model performance across different demographic groups, use cases, and edge conditions before deployment. Implement continuous monitoring systems that track model performance disparities in production, facilitating regular reviews by governance teams.
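As one concrete way to think about such a protocol, here is a minimal sketch that compares a model’s error rate across demographic subgroups before deployment. The `group`, `label`, and `pred` column names and the 10-point disparity threshold are assumptions for illustration; a real protocol would cover multiple metrics, use cases, and intersections of groups.

```python
# Minimal sketch of a pre-deployment bias check: compare a model's error rate
# across demographic subgroups and flag disparities above a chosen threshold.
import pandas as pd

def subgroup_error_rates(df: pd.DataFrame, group_col: str,
                         label_col: str, pred_col: str) -> pd.Series:
    """Error rate (share of wrong predictions) per subgroup."""
    return (df[label_col] != df[pred_col]).groupby(df[group_col]).mean()

def flag_disparity(error_rates: pd.Series, max_gap: float = 0.10) -> bool:
    """True if the gap between best- and worst-served groups exceeds the threshold."""
    return (error_rates.max() - error_rates.min()) > max_gap

# Toy example: predictions scored against ground truth for two groups.
results = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "label": [1, 0, 1, 1, 0, 1],
    "pred":  [1, 0, 1, 0, 1, 0],
})
rates = subgroup_error_rates(results, "group", "label", "pred")
print(rates)
print("Disparity flagged:", flag_disparity(rates))
```

Running the same check on production traffic at a regular cadence gives the governance team a standing agenda item rather than a one-time gate.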
It’s all about that prompt, prompt, prompt….optimization
As end users look to utilize specially trained LLMs through natural language interfaces similar to ChatGPT, what you ask and how you word your prompts can yield significantly different results. I recently had an internal debate with a colleague about the naming of a particular product. I had gotten some outside feedback that the name we selected could be improved. My co-worker went to ChatGPT and got results highlighting why the name being used was a good one. I went to Claude and tried to make as neutral a prompt as possible, setting the stage for the question, giving some product details and our two choices for names, and asking it to pick the best one for the market. Claude recommended we use the other name we were considering.
Neither result was “right” or “wrong”. Just completely different results based on prompts. Did I really make as neutral a prompt as possible? Not exactly. I failed to include some product attributes that support the current product naming. I went back to Claude, included the original prompt, added these very relevant product details, and the result came back that we could choose either name – and some pros/cons for each.
Don’t “lead the witness”. We need to train users on how unconscious bias in prompt design can skew results, providing guidelines for neutral, inclusive language. As shown in my product naming example, if you lead the AI engine in a certain direction, like “why is the current name good?”, the engine will do just that. AI teams need to build prompt templates and guardrails that help users avoid leading questions or assumptions that could perpetuate stereotypes or unintentionally skew results.
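To illustrate what such a template might look like, here is a minimal sketch of a neutral-comparison prompt for the product-naming scenario above. The wording, field names, and shuffling step are assumptions I’m using for illustration, not a prescribed standard.

```python
# Minimal sketch of a neutral-comparison prompt template, as one way an AI team
# might keep users from "leading the witness". Field names and wording are
# illustrative assumptions.
import random

NEUTRAL_COMPARISON_PROMPT = """\
You are helping evaluate product names for {market}.

Product details:
{product_details}

Candidate names (in no particular order):
{candidates}

Compare the candidates on clarity, memorability, and fit for the market.
List pros and cons for each, then state which you would recommend and why.
Do not assume any candidate is the current or preferred choice."""

def build_naming_prompt(market: str, product_details: str, candidates: list[str]) -> str:
    """Fill the template, shuffling candidate order to avoid position bias."""
    shuffled = random.sample(candidates, k=len(candidates))
    return NEUTRAL_COMPARISON_PROMPT.format(
        market=market,
        product_details=product_details,
        candidates="\n".join(f"- {name}" for name in shuffled),
    )

print(build_naming_prompt(
    market="enterprise IT operations",
    product_details="An AI-assisted incident triage tool for hybrid cloud estates.",
    candidates=["Name A", "Name B"],
))
```

Note what the template deliberately leaves out: any hint of which name is the incumbent. That single omission is the difference between asking “which name is better?” and asking “why is the current name good?”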
Moving forward.
The path to AI success isn’t just about avoiding FOMO or managing FOMU—it’s about confronting the hidden third factor that’s certainly part of the 95% of AI investment failures: BIBO, or Bias In Bias Out. From Amazon’s male-biased recruiting algorithm to the subtle ways our prompts can skew results, bias can infiltrate AI systems at every level, turning promising initiatives into expensive failures. The solution requires the same rigorous governance Kim Basile advocates for, but with a laser focus on diverse data sourcing, cross-functional bias detection, and training users to craft neutral prompts that don’t “lead the witness.” Organizations that master BIBO won’t just avoid becoming part of that sobering 95% failure statistic—they’ll unlock AI’s true potential while their competitors struggle with systems that perpetuate the very problems they were designed to solve.
