How is fairness in AI calculated?
What is AI fairness and why is it important?
Fairness is a subjective term that is difficult to define in general. The concept of fairness is usually understood as:
“an agreement between two or more parties about what constitutes acceptable behavior or the degree of impartiality that exists between two or more parties.”
When used in relation to AI systems, fairness concerns the potential impact of an AI system’s decisions on society, and who decides how those decisions are made.
There are many definitions for what fairness means, but the most important one for AI is that it should not discriminate against any individual or group of people based on their race, gender, sexual orientation, age, etc.
When creating or building AI systems, fairness is an important aspect to be taken into consideration, because:
- AI algorithms should be designed so that they are not biased towards any particular group of people. This can be achieved by making sure that the data used to train the algorithm is diverse and representative of the population the technology will serve;
- A good fairness foundation helps build public trust, which is necessary for the adoption and use of AI in society;
- Building the system on fair and responsible principles prepares it for compliance with regulations and laws.
How is fairness calculated?
AI systems are designed to learn from data and their environment so they can make more accurate decisions. This means that the system might also learn biases. The question then becomes: how do we mitigate these biases to make the system fairer? One way is to use fairness metrics.
A fairness metric measures how well an AI system is able to make decisions in a fair way. The calculation of such a metric can be based on two factors:
- The number of false positives and false negatives that the AI system has made in relation to its original purpose
- The number of false positives and false negatives that an average person would have made if they had taken the same decision as the AI system
The first factor measures how many times the AI system has mistakenly classified a person (for example, as guilty when they are innocent, or vice versa), while the second measures how many times an average human would have made the same mistake.
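The two factors above boil down to counting each decision-maker’s false positives and false negatives against the same ground truth. A minimal sketch, with hypothetical labels and predictions (this is not KOSA’s implementation):

```python
# Illustrative sketch: counting false positives/negatives for a model
# and a human baseline on the same cases. All data is hypothetical.

def confusion_counts(y_true, y_pred):
    """Return (true_pos, false_pos, true_neg, false_neg) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true     = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth
model_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # the AI system's decisions
human_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # an average person's decisions

_, fp_m, _, fn_m = confusion_counts(y_true, model_pred)
_, fp_h, _, fn_h = confusion_counts(y_true, human_pred)
print(f"model: FP={fp_m}, FN={fn_m}")  # model: FP=1, FN=1
print(f"human: FP={fp_h}, FN={fn_h}")  # human: FP=1, FN=1
```

Comparing the two count pairs shows whether the system makes a given kind of mistake more often than a human baseline would.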
To narrow down the most relevant definition for each scenario, we need to identify the following:
- for which underserved group we want fairness;
- if we intend to achieve parity or set a preference;
- and whether we want fairness to be applied in treatment or in the overall results.
There are plenty of fairness principles and methods to consider when deciding which metrics to use, as these metrics vary in their complexity, computational cost, and reliability. Companies choose the ones that suit them based on their organizational values, aims, and goals.
At KOSA we consider these characteristics to ensure continued fairness in the AI systems we work with:
- displaying a summary of different metrics that describe the outcomes of the AI system evaluation
- including balanced data
- including 5 responsible AI principles: Accountability, Fairness, Transparency, Safety and Robustness.
With this, KOSA’s software can help companies test for fairness and verify that any potential or present bias is uncovered and mitigated as per the pre-established responsible AI principles.
An example of fairness metrics used
KOSA uses fairness metrics that quantify the amount of unwanted bias in datasets and models. We report five fundamental examples, but you can find more in Gajane’s and Verma’s papers:
- Group Fairness: this metric establishes that a predictor should always predict a particular outcome with (almost) equal probability for individuals across different groups. The fundamental idea here is that all individuals should have similar chances of obtaining a positive outcome, irrespective of gender, race, or age. Its best-known applications are cases of affirmative action, but since it must preserve parity, its applicability is limited to a few circumstances.
- Individual Fairness: also known as fairness through awareness. According to this metric, classifiers are “fair” if they predict similar outcomes for similar individuals, regardless of sensitive attributes. Individual fairness relies heavily on a heuristic “distance-metric”, which measures the divide among individuals; therefore, its applicability is limited to those areas where a reliable and non-discriminatory distance-metric is available.
- Fairness through Unawareness: in this context, fairness is present if a predictor does not make explicit use of sensitive attributes in the predictive process. This condition would be met for any predictor that is not group-conditional. It is a particularly useful approach if it is not possible to specify any sensitive attributes and there is no other background knowledge available. Overall, it roughly corresponds to being “blind” to counter discrimination.
- Equal Opportunity: it posits that all demographics should have equivalent true positive rates; in other words, positive outcomes should manifest at a similar rate in each group. This idea has an affinity with disparate mistreatment, a metric that asks for equivalence of misclassification across groups.
- Equalized Odds: adding to the previous metric, here fairness is achieved if all groups present similar true positive rates and false positive rates. The notion supporting this metric is that individuals are evaluated meritocratically, regardless of the status of their sensitive attributes (e.g., gender or race).
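Three of the metrics above can be read directly off per-group rates: group fairness compares selection rates, equal opportunity compares true positive rates, and equalized odds additionally compares false positive rates. A minimal sketch with hypothetical data (not KOSA’s implementation):

```python
# Illustrative sketch: per-group rates behind group fairness,
# equal opportunity, and equalized odds. All data is hypothetical.

def rates_by_group(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and false positive rate."""
    out = {}
    for g in set(groups):
        idx = [i for i, gr in enumerate(groups) if gr == g]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        pos = [pi for ti, pi in zip(t, p) if ti == 1]  # truly positive cases
        neg = [pi for ti, pi in zip(t, p) if ti == 0]  # truly negative cases
        out[g] = {
            "selection_rate": sum(p) / len(p),               # group fairness
            "tpr": sum(pos) / len(pos) if pos else None,     # equal opportunity
            "fpr": sum(neg) / len(neg) if neg else None,     # equalized odds adds FPR
        }
    return out

y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for g, m in sorted(rates_by_group(y_true, y_pred, groups).items()):
    print(g, m)
```

Here group A is selected 75% of the time versus 25% for group B, and the true positive rates also diverge (1.0 versus 0.5), so this toy classifier would fail both group fairness and equal opportunity checks.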
The complexity of the world where AI systems are used is still a barrier to measuring AI fairness. The real world is far more complex than the controlled environments that we can use for testing the fairness of the system, making it important for companies creating and using AI systems to verify and measure outcomes continuously.
Fairness as a collaborative effort
The idea of fairness in machine learning algorithms is to ensure that these algorithms do not discriminate against any particular group. This can be done by adding datasets from different groups and backgrounds to the training data. Ensuring true AI fairness also starts with involving all relevant stakeholders throughout the development process.
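One basic check for the diversity of training data is to measure how strongly each group is represented. A minimal sketch, with hypothetical group labels:

```python
# Illustrative sketch: measuring group representation in training data.
# The group labels are hypothetical.
from collections import Counter

def group_representation(groups):
    """Fraction of training examples belonging to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {g: c / total for g, c in counts.items()}

training_groups = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
print(group_representation(training_groups))  # {'A': 0.7, 'B': 0.2, 'C': 0.1}
```

A share far below a group’s real-world prevalence flags under-representation and a candidate source of learned bias.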
At KOSA, we propose a tailored dashboard with specific steps for each of the company's teams, initiating close collaboration as a foundation for better and ‘fairer’ outcomes.
An example scenario would be:
- The Responsible AI/Compliance team sets out the regulatory requirements and maintains accurate overviews and documentation in the system of the important principles to be followed and why.
- The Technical team takes responsibility for collecting data, auditing and testing the system for fairness, and planning follow-up activities.
- The Executive team ensures that all the necessary steps are taken from the dashboard and the outcomes are equitable.
What can be done as a first step?
The concept of fairness is gaining an increasingly central role in the development of those AI models affecting public and private life. Since defining ‘fairness’ is such a complex issue, finding good circumstantial definitions and appropriate metrics to measure fairness becomes a pivotal matter for companies using AI at scale – and the main challenge of building responsible intelligent systems.
At KOSA AI, we integrate a variety of fairness metrics in our system to help you evaluate datasets and models. A first step towards embedding ‘fairness’ into your AI is defining the responsible AI principles and establishing the groundwork for mitigation of unwanted bias and issues. You can take KOSA’s Responsible AI Self-Assessment test as a quick start.