Search engines aren't neutral. They're influenced by the same white cis-male patriarchy that permeates American society and government. Dr. Noble's book Algorithms of Oppression revealed how typing "black girls" into Google returned search results for pornographic sites. Google changed the results for this search term in light of Dr. Noble's findings, but the problem persists for search terms related to other minority women, such as "Asian girls." It's also no accident that the Library of Congress subject headings classify Native American sand art as "arts and crafts" (W. Wiegand, personal communication, Oct 10, 2018). Systematic oppression is occurring behind the computer screen. "Algorithms can reproduce and support the same negative stereotyping that occurs in society" (Farkas, 2017). Algorithms don't create themselves. Data scientists write the code for them, and, like all human beings, they carry their own biases, for better or worse.
This library guide examines the biases present in those algorithms, explains how to navigate them, and suggests ways to fight bias. The topics covered include Library of Congress subject headings, biased training data, and the trade-off between fairness and accuracy.
Subject headings are derived from mainstream articles and books because they can't realistically encompass everything ever published. This becomes problematic because marginalized groups often have difficulty publishing their works, so mainstream literature frequently excludes them. Subject headings also tend to follow what's considered socially acceptable in mainstream culture at the time, which pushes non-dominant cultures to the side and leads to unrelated search terms becoming associated with marginalized groups. For example, Strottman (2007) documented several headings used to categorize Native Americans that were inaccurate and clearly biased.
On the flip side, the LC subject headings don't include white privilege at all because the Library of Congress chooses not to "include specific headings for groups discriminated against" (Subject Authority Cooperative Program, 2016). Unless these headings are contested by librarians or the general public, the bias remains. It's important to note that librarians have successfully lobbied for changes to these headings in the past, but there's still a lot more work to be done.
Algorithms have to be fed data in order to make decisions and judgments, similar to how people need evidence and information to come to conclusions. There's one key difference between people and algorithms: unless adjusted by a programmer, an algorithm will make the same decision 100% of the time, while people are prone to making exceptions or getting distracted by other variables. Therefore, if an algorithm is biased, it will consistently make biased choices, which can cause even more damage than a biased human being. This isn't just a hypothetical scenario. Algorithms are regularly fed biased data because they rely on historic data. For example, a technology company looking to automate its hiring process may use an algorithm to screen applicants' resumes. Technology has historically been male-dominated, so the data the company feeds the algorithm will be male-dominated. As a result, the algorithm will learn that bias and filter out resumes that weren't written by male applicants, as the sketch below illustrates.
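Below is a minimal sketch in Python of the scenario just described. The data, feature names, and logistic-regression model are all hypothetical illustrations, not any real company's system; the point is only that a model trained on male-dominated hiring records reproduces that bias, and does so every time it runs.

```python
# A hypothetical illustration of a resume-screening model learning bias
# from male-dominated historical hiring data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Made-up historical data: gender = 1 for male applicants, 0 otherwise.
gender = rng.integers(0, 2, size=n)
skill = rng.normal(size=n)  # a genuinely job-relevant signal

# Past hiring decisions were biased: gender influenced the outcome,
# not just skill. This is the "historic data" the algorithm is fed.
hired = (0.5 * skill + 1.5 * gender + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

# The model faithfully learns the pattern in the data.
model = LogisticRegression().fit(np.column_stack([skill, gender]), hired)

# Two applicants with identical skill, differing only in gender:
candidates = np.array([[0.8, 1.0], [0.8, 0.0]])
print(model.predict_proba(candidates)[:, 1])
# The male applicant receives a much higher "hire" probability, and the
# algorithm will repeat that same biased judgment consistently.
```

Because the learned weights are fixed, the two candidates above receive the same unequal scores on every run; that consistency is exactly what makes the baked-in bias so damaging.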
One may think the solution is simply to remove gender from the algorithm's decision-making process altogether. However, removing data, biased or not, will decrease the algorithm's accuracy. This trade-off is where the fairness-versus-accuracy curve that many computer programmers rely on comes from. The ideal scenario is to create an algorithm that's as fair and as accurate as possible, as the comparison below illustrates.
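Continuing the same hypothetical data, the sketch below compares a model that uses the gender feature with a "gender-blind" one. The fairness measure (the gap in positive-prediction rates between groups) and all of the numbers are assumptions made for illustration, not a standard protocol; the point is the shape of the trade-off.

```python
# A hypothetical comparison: accuracy and a simple fairness gap,
# with and without the sensitive gender feature.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, size=n)
skill = rng.normal(size=n)
hired = (0.5 * skill + 1.5 * gender + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

X_full = np.column_stack([skill, gender])   # keeps the sensitive attribute
X_blind = skill.reshape(-1, 1)              # "gender-blind" version

for name, X in [("with gender", X_full), ("without gender", X_blind)]:
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, hired, gender, test_size=0.3, random_state=0)
    preds = LogisticRegression().fit(X_tr, y_tr).predict(X_te)
    accuracy = (preds == y_te).mean()
    # Simple fairness measure: gap in positive-prediction rates by group.
    gap = abs(preds[g_te == 1].mean() - preds[g_te == 0].mean())
    print(f"{name:15s} accuracy={accuracy:.2f}  selection-rate gap={gap:.2f}")

# On this toy data, dropping the gender feature shrinks the selection-rate gap
# but lowers accuracy against the biased historical labels: the trade-off the
# fairness-versus-accuracy curve describes.
```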