A recent study from MIT and Stanford University reveals significant biases in commercially available facial-analysis programs developed by major technology companies. These biases, particularly along lines of skin type and gender, raise serious questions about the fairness and accuracy of artificial-intelligence systems already deployed in commercial applications.
The research, set to be presented at the Conference on Fairness, Accountability, and Transparency, highlights a stark disparity in error rates when these programs analyze faces. While the programs demonstrated exceptional accuracy in identifying the gender of light-skinned men, with error rates never worse than 0.8 percent, performance plummeted dramatically for darker-skinned women: error rates exceeded 20 percent in one case and surpassed 34 percent in others. This discrepancy underscores a critical flaw in how these neural networks are trained and evaluated.
The study also sheds light on the problematic nature of the data used to build and assess these systems. The researchers point out that one major U.S. technology company, for example, claimed an accuracy rate of more than 97 percent for its face-recognition system. However, the dataset used to evaluate that system was heavily skewed: more than 77 percent male and more than 83 percent white. This lack of diversity in the data used to train and benchmark such systems contributes directly to the observed biases and calls into question their reliability when applied to diverse populations.
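To make the kind of skew described above concrete, the short sketch below tallies the demographic composition of an evaluation set. The field names ("gender", "skin_type") and the sample records are illustrative assumptions, not the companies' actual benchmark data.

```python
# Hypothetical sketch: auditing the demographic composition of an evaluation set.
from collections import Counter

def composition(records, field):
    """Return the share of each value of `field` across the records."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Illustrative records only; a real audit would read the benchmark's metadata.
evaluation_set = [
    {"gender": "male", "skin_type": "lighter"},
    {"gender": "male", "skin_type": "lighter"},
    {"gender": "male", "skin_type": "darker"},
    {"gender": "female", "skin_type": "lighter"},
]

print(composition(evaluation_set, "gender"))     # {'male': 0.75, 'female': 0.25}
print(composition(evaluation_set, "skin_type"))  # {'lighter': 0.75, 'darker': 0.25}
```

A skew like the 77 percent male, 83 percent white split cited above would show up immediately in such a tally, before any accuracy numbers are even computed.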
Joy Buolamwini, a researcher at the MIT Media Lab’s Civic Media group and the lead author of the paper, emphasizes the broader implications of these findings. “What’s really important here is the method and how that method applies to other applications,” she says. “The same data-centric techniques that can be used to try to determine somebody’s gender are also used to identify a person when you’re looking for a criminal suspect or to unlock your phone. And it’s not just about computer vision. I’m really hopeful that this will spur more work into looking at [other] disparities.” Her concerns extend well beyond gender classification: similar biases could permeate any application that relies on facial analysis.
The paper is co-authored by Timnit Gebru, who contributed to the research as a graduate student at Stanford and is now a postdoc at Microsoft Research.
The investigation began with a chance observation of bias in face-tracking software. Buolamwini’s initial project, Upbeat Walls, an interactive art installation, utilized a commercial facial-analysis program for user movement tracking. During public demonstrations, the team, despite its diversity, found that the system struggled to reliably track darker-skinned users, forcing them to rely on lighter-skinned team members for demonstrations. This anecdotal evidence prompted Buolamwini to conduct a more systematic investigation. She began by testing commercial facial-recognition programs with her own photos, revealing consistent failures to recognize her face as human or frequent misclassification of her gender.
To rigorously quantify these biases, Buolamwini assembled a diverse image dataset that intentionally over-represented women and individuals with darker skin tones, groups typically underrepresented in standard face-analysis evaluation datasets. This new dataset comprised over 1,200 images. Collaborating with a dermatologic surgeon, she categorized these images using the Fitzpatrick scale, a six-level scale of skin tones originally designed to assess sunburn risk.
Applying three different commercial facial-analysis systems from major tech companies to this dataset revealed consistent patterns of bias. Across all three systems, gender-classification error rates were consistently higher for women than for men and for darker-skinned individuals than for lighter-skinned individuals. For darker-skinned women (Fitzpatrick types IV, V, or VI), error rates ranged from 20.8 percent to a staggering 34.7 percent. Alarmingly, for women with the darkest skin tones (type VI), two of the systems exhibited error rates nearing 47 percent, meaning they performed little better than random guessing for this group. This level of inaccuracy in commercial systems, on a binary classification task like gender, raises serious ethical concerns about their deployment and their potential for discriminatory impact.
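The evaluation the study describes amounts to disaggregating error rates by intersectional group rather than reporting a single overall accuracy. The sketch below illustrates that idea under stated assumptions: each benchmark record carries a ground-truth gender and a Fitzpatrick type (1–6), and `classify_gender` stands in for a call to a commercial API; the field names and helper are hypothetical, not the study's actual code.

```python
# Minimal sketch of a disaggregated gender-classification audit.
from collections import defaultdict

def error_rates_by_group(samples, classify_gender):
    """samples: iterable of dicts with 'image', 'gender', and 'fitzpatrick' (1-6)."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for s in samples:
        # Fitzpatrick IV-VI treated as darker-skinned, I-III as lighter-skinned.
        group = (s["gender"], "darker" if s["fitzpatrick"] >= 4 else "lighter")
        totals[group] += 1
        if classify_gender(s["image"]) != s["gender"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Usage with a stand-in classifier wrapper:
# rates = error_rates_by_group(benchmark, my_api_wrapper)
# print(rates[("female", "darker")], rates[("male", "lighter")])
```

Reporting the per-group rates side by side is what exposes gaps, such as under 1 percent for lighter-skinned men versus over 34 percent for darker-skinned women, that a single aggregate accuracy figure would hide.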
Buolamwini questions whether such failure rates would be considered acceptable given how unevenly they fall across demographic subgroups. She emphasizes that the benchmarks currently used to measure success in these systems can be misleading, creating a “false sense of progress” when both training data and evaluation methods are themselves biased.
Ruchir Puri, chief architect of IBM’s Watson AI system, acknowledges the influence of datasets on model outcomes. He points to IBM’s development of a new model trained on a more balanced dataset of half a million images, aiming for improved accuracy across diverse demographics. While not a direct response to Buolamwini’s paper, Puri’s comments highlight IBM’s ongoing efforts to address these issues and reflect a growing industry awareness that unbiased data is essential to the responsible development and deployment of facial-analysis technology.
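One simple way to move toward the kind of balanced dataset Puri describes is stratified downsampling across demographic subgroups. The sketch below is a minimal illustration of that general idea, assuming each record carries gender and skin-type labels; it is not a description of IBM's actual data pipeline.

```python
# Minimal sketch: downsample every (gender, skin_type) subgroup to the size of the
# smallest one, so each subgroup is equally represented in the training set.
import random
from collections import defaultdict

def balance_by_group(records, key=lambda r: (r["gender"], r["skin_type"]), seed=0):
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    smallest = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))
    return balanced
```

Downsampling discards data from over-represented groups; in practice, teams may instead collect additional images for under-represented groups or reweight examples, but the goal is the same: evaluation and training sets whose composition does not quietly favor one demographic.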