In 1936, Literary Digest, an influential American general-interest magazine, published a poll on American voting intentions for that year's presidential election. In it, the magazine predicted a landslide victory for Alf Landon (the Republican candidate) over Franklin D. Roosevelt.
In the event, Franklin D. Roosevelt was elected with 61% of the vote. The failed prediction precipitated the decline of Literary Digest and the subsequent closure of the magazine.
For its survey, Literary Digest used lists of car owners and telephone subscribers. Here we have a clear example of sampling bias: in those years, owning a car or a telephone indicated an economic level above that of the average American, so the prediction model was skewed in favor of the Republican vote (the wealthy tend to vote more for the Republicans).
In the same way, machine learning algorithms depend on the training data they are fed. As a result, these algorithms can reflect, or even amplify, the biases present in that data.
If an algorithm is trained on racist or sexist data, the resulting predictions will reflect it. That is why designers try to keep obviously problematic variables, such as race, out of their algorithms: no bank wants to be accused of denying mortgages to a whole group simply because of the color of their skin. However, even if we exclude race from the algorithm, other variables can produce the same biased result. The postal code is one such variable: if it is used to set better or worse credit conditions, we could be excluding from access to credit entire neighborhoods that are predominantly poor and home to a high concentration of people of the same race.
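The mechanism can be illustrated with a minimal sketch. The data below is entirely invented, and the "model" is deliberately simplistic (it just approves where the historical approval rate was high), but it shows the key point: race is never given to the model, yet the postal code carries it in anyway.

```python
# Hypothetical records: (postal_code, race, approved). Race correlates with
# postal code, and the historical decisions are biased; the model only ever
# sees the postal code and the outcome.
historical = [
    ("28001", "A", True), ("28001", "A", True), ("28001", "A", True),
    ("28001", "B", True),
    ("28099", "B", False), ("28099", "B", False), ("28099", "B", False),
    ("28099", "A", False),
]

def train(records):
    """Toy 'model': approve a postal code if its historical approval rate > 50%."""
    stats = {}
    for postal_code, _race, approved in records:  # race deliberately ignored
        ok, total = stats.get(postal_code, (0, 0))
        stats[postal_code] = (ok + approved, total + 1)
    return {code: ok / total > 0.5 for code, (ok, total) in stats.items()}

model = train(historical)
print(model)  # {'28001': True, '28099': False}
```

Group B, concentrated in postal code 28099, is still mostly denied: the postal code has acted as a proxy for the excluded variable.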
Determining which variables act as surrogates (also called proxies) for such information, and deciding when a prediction or decision is affected by algorithmic bias, may not be easy, since:
- If the machine learning algorithm is based on a complex neural network, it may make very good predictions, but it cannot explain why it made a particular one, so it can be almost impossible to understand why, or even how, the algorithm reaches its judgment.
- On the other hand, learning methods based on decision trees or Bayesian networks are much more transparent to inspection, but companies rarely allow their algorithms to be examined publicly.
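The contrast in transparency can be sketched concretely. The rules and thresholds below are invented stand-ins for what a learned decision tree might contain; the point is that a tree-like model can report the exact path of conditions behind each decision, something a neural network's weights do not offer.

```python
# Hypothetical hand-written stand-in for a learned credit decision tree.
# Every decision returns the explicit chain of tests that produced it.
def credit_decision(income, debt_ratio):
    path = []
    if income >= 30000:
        path.append("income >= 30000")
        if debt_ratio < 0.4:
            path.append("debt_ratio < 0.4")
            return "approve", path
        path.append("debt_ratio >= 0.4")
        return "review", path
    path.append("income < 30000")
    return "deny", path

decision, explanation = credit_decision(45000, 0.25)
print(decision, explanation)
# approve ['income >= 30000', 'debt_ratio < 0.4']
```

An auditor (or an affected customer) can read the returned path directly, which is exactly the kind of inspection that a black-box model cannot support.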
How can we balance the need for more accurate algorithms with the need for transparency toward the people affected by possible biases? Are we willing, if necessary, to sacrifice accuracy for transparency?
Linked to an algorithm's accuracy is another important social question: identifying the person or entity responsible for the model's "errors". That is, can we determine who is responsible when a system fails at its assigned task?
Consider the first fatal pedestrian accident involving a driverless car, which happened in March 2018. Who should bear the responsibility? The company that owns the car? Are the programmers and data scientists to blame? Perhaps the end users? Would knowing that, whatever happens, it is not our responsibility influence our decision to buy one car or another? And what if it were our responsibility?
We humans do not even know our own true motives for acting as we do. Who knows for certain what decision they would make, faced with crashing or not crashing into something, or someone, with their own life and their family's at stake? Should we demand that machines be better at this than we really are?
The challenge, therefore, is to build algorithms that produce ethical behavior (or behavior that humans, in general terms, can accept as ethical).
These new requirements in the design of artificial intelligence call for appropriate legislation in this field, and before that, for public ethical debate. Tying ethical considerations to the development of new technologies is, without doubt, a clear need in every sector where we want to apply new artificial intelligence models.
Rocío Pérez is a specialist in Big Data solutions at Oracle Ibérica and a professor of Data Science at MIOTI