Risk of Bias and Error From Data Sets Used for Dermatologic AI

In this issue of JAMA Dermatology, Daneshjou et al report on bias of medical data sets used for artificial intelligence (AI) and underreporting of relevant metainformation. Their findings are in line with those of other reports, showing that most current data sets for machine learning are biased in various ways. Biased data sets may render a machine learning model unfit for practical use. This is because data sets are not simply a small part of the machine learning pipeline, but the essence of it. Machine learning models are not “intelligent” in the broad human sense; rather, they learn ways of processing known training cases to build a representation map that can be used to map unknown test cases (Figure, A). An unknown test case is then classified according to the distribution of known diagnoses of similar training cases in the adjacent area of the representation map. If this distribution favors the correct diagnosis, we regard this as a successful prediction (Figure, B). This is an oversimplification of supervised machine learning, but it should help in understanding 2 important problems surrounding biased data sets.
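The simplified picture above, in which a test case takes the diagnosis that dominates among nearby training cases, can be sketched as a k-nearest-neighbor vote. This is only an illustration of that oversimplification, not the method of any model discussed in the article. The 2-dimensional coordinates, the diagnosis labels, the function name `classify`, and the choice of `k=3` are all hypothetical.

```python
import math
from collections import Counter


def classify(test_point, training_cases, k=3):
    """Return the majority diagnosis among the k nearest training cases,
    together with the counts of diagnoses in that neighborhood."""
    # Sort known cases by Euclidean distance to the test case on the map.
    nearest = sorted(
        training_cases,
        key=lambda case: math.dist(case[0], test_point),
    )[:k]
    # Distribution of known diagnoses in the adjacent area of the map.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], dict(votes)


# Hypothetical representation map: (coordinates, known diagnosis).
training_cases = [
    ((1.0, 1.0), "nevus"),
    ((1.2, 0.9), "nevus"),
    ((0.8, 1.1), "nevus"),
    ((4.0, 4.0), "melanoma"),
    ((4.1, 3.8), "melanoma"),
    ((3.9, 4.2), "melanoma"),
]

# An unknown test case that lands near the melanoma cluster.
prediction, votes = classify((3.8, 4.0), training_cases)
print(prediction, votes)
```

In this sketch the prediction depends entirely on which training cases happen to lie near the test case, which is why the composition of the data set matters so much.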
Source: JAMA Dermatology - Category: Dermatology Source Type: research