Types of Bias
The basic explanation
At a high level, there are three types of bias: pre-existing, technical, and emergent.
Pre-existing bias
Pre-existing bias has its roots in social institutions, practices, and attitudes. This is the bias from the “real world” that seeps its way into AI. Sometimes it is called societal bias.
Technical bias
Technical bias arises from technical constraints or considerations, and is sometimes known as statistical bias. This type of bias is introduced into AI by the modeling decisions that data scientists make.
Emergent bias
Emergent bias arises in a context of use, and it can overlap with societal bias. It is introduced when a machine learning model is used inappropriately or outside of the context for which it was intended.
The more complicated explanation
Data scientists have come to refine these ideas and add to them. More recently defined types of bias include the following:
1 — Historical bias
Historical bias arises when there is a misalignment between the world as it is and the values or objectives to be encoded and propagated in a model. It is a normative concern with the state of the world. It exists even with perfect sampling and feature selection.
2 — Representation bias
Representation bias arises while defining and sampling a population on which you will train a model. It occurs when the training population under-represents some part of the use population, and the model consequently fails to generalize well for that group.
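As a rough sketch of this idea, the toy simulation below uses an entirely hypothetical data-generating rule: the label threshold differs between two groups, but group B makes up only 5% of the training sample, so a simple classifier that matches the majority group performs much worse on group B.

```python
import random

random.seed(0)

# Hypothetical illustration: labels follow different rules per group,
# so a model fit mostly to group A generalizes poorly to group B.
def make_example(group):
    x = random.gauss(0, 1)
    # Assumed data-generating rule: the label threshold differs by group.
    y = int(x > 0) if group == "A" else int(x > 1)
    return group, x, y

# Skewed training sample: 95% group A, 5% group B.
train = [make_example("A") for _ in range(950)] + \
        [make_example("B") for _ in range(50)]

# A classifier tuned on the skewed sample effectively learns the
# majority group's threshold (x > 0).
predict = lambda x: int(x > 0)

test_a = [make_example("A") for _ in range(1000)]
test_b = [make_example("B") for _ in range(1000)]

acc_a = sum(predict(x) == y for _, x, y in test_a) / len(test_a)
acc_b = sum(predict(x) == y for _, x, y in test_b) / len(test_b)
```

Here `acc_a` is near perfect while `acc_b` drops sharply, even though group B examples were present in training: they were simply too few to shift the learned decision rule.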
3 — Measurement bias
Measurement bias arises when choosing and measuring features and labels to use. Variables themselves are often proxies for the desired quantities, not a pure measurement of a construct. For example, because we cannot measure depression directly, we may use the responses from a medical questionnaire to approximate depression. The chosen set of features and labels may leave out important factors or introduce group or input-dependent noise that leads to differential performance.
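The group-dependent noise mentioned above can be sketched in a small simulation. The setup is hypothetical: we want a construct ("true severity") but can only observe a proxy (for example, a questionnaire score), and for one group the proxy is assumed to be noisier, so a uniform decision rule applied to the proxy is less accurate for that group.

```python
import random

random.seed(1)

def observe(group):
    # The construct we actually care about (unobservable in practice).
    true_severity = random.gauss(0, 1)
    # Assumed group-dependent proxy noise: group B's proxy is noisier.
    noise_sd = 0.2 if group == "A" else 1.0
    proxy = true_severity + random.gauss(0, noise_sd)
    return true_severity, proxy

def error_rate(group, n=5000):
    # How often a decision rule applied uniformly to the proxy
    # disagrees with the same rule applied to the true construct.
    wrong = 0
    for _ in range(n):
        true_sev, proxy = observe(group)
        wrong += (proxy > 0) != (true_sev > 0)
    return wrong / n

err_a = error_rate("A")
err_b = error_rate("B")
```

Even though the same decision rule is applied to everyone, the noisier proxy produces a higher error rate for group B, which is the differential performance the definition describes.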
4 — Aggregation bias
Aggregation bias arises during model construction, when distinct populations are inappropriately combined. For example, a predictive policing tool may have been trained on a sample of Iraqi citizens but then refined with data from low-income communities in Los Angeles. In many applications, the population of interest is heterogeneous and a single model is unlikely to suit all subgroups.
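A minimal sketch, using made-up data: two subgroups with opposite relationships between a feature and the outcome. A single rule fit to the pooled data is right for one subgroup and wrong for the other, while per-group rules fit both.

```python
# Hypothetical subgroups with opposite feature-outcome relationships.
group_a = [(x, int(x > 5)) for x in range(10)]   # outcome rises with x
group_b = [(x, int(x <= 5)) for x in range(10)]  # outcome falls with x

def accuracy(rule, data):
    return sum(rule(x) == y for x, y in data) / len(data)

# A single model fit to the pooled data matches group A's pattern...
pooled_rule = lambda x: int(x > 5)
acc_pooled = accuracy(pooled_rule, group_a + group_b)  # 0.5 overall

# ...while per-group rules capture each subgroup exactly.
acc_a = accuracy(lambda x: int(x > 5), group_a)   # 1.0
acc_b = accuracy(lambda x: int(x <= 5), group_b)  # 1.0
```

The pooled rule scores 50% overall (perfect on one subgroup, zero on the other), which is exactly the failure mode of treating a heterogeneous population as a single one.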
5 — Evaluation bias
Evaluation bias occurs during model iteration and evaluation. It can arise when the testing or external benchmark populations do not equally represent the various parts of the user population. Evaluation bias can also arise from the use of performance metrics that are not appropriate for the way in which the model will be used.
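A small arithmetic sketch shows how a skewed benchmark hides a performance gap. The per-group accuracies below are invented for illustration: the model is accurate for group A but not group B, and a benchmark dominated by group A reports a high overall score that a balanced benchmark would not.

```python
# Assumed per-group accuracies of some model (hypothetical values).
per_group_accuracy = {"A": 0.95, "B": 0.60}

def benchmark_score(composition):
    """Overall accuracy on a benchmark with the given group fractions."""
    return sum(frac * per_group_accuracy[g]
               for g, frac in composition.items())

skewed = benchmark_score({"A": 0.95, "B": 0.05})    # 0.9325
balanced = benchmark_score({"A": 0.50, "B": 0.50})  # 0.775
```

The skewed benchmark reports roughly 93% accuracy, masking the fact that the model is only 60% accurate for group B; a balanced benchmark surfaces the gap.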
6 — Deployment bias
Deployment bias occurs after model deployment, when a system is used or interpreted in inappropriate ways.