A March 12th article by Olivia Solon for NBC News reports that IBM has developed its so-called Diversity in Faces dataset, a face database of approximately one million images intended to help researchers develop less biased facial recognition algorithms. The article states:
Academics often appeal to the noncommercial nature of their work to bypass questions of copyright. Flickr became an appealing resource for facial recognition researchers because many users published their images under “Creative Commons” licenses, which means that others can reuse their pictures without paying license fees. Some of these licenses allow commercial use.
To build its Diversity in Faces dataset, IBM says it drew upon a collection of 100 million images published with Creative Commons licenses that Flickr’s owner, Yahoo, released as a batch for researchers to download in 2014. IBM narrowed that dataset down to about 1 million photos…
The article provides a tool: enter your Flickr user name, and it will tell you whether any of your images have been included in the dataset and, if so, how many. It also shows a sample image. I have 23 images in the dataset. The sample image provided by the NBC app is a shot I took during the 2010 Toronto Pride Parade, a head-on shot of a young Caucasian participant wearing large orange-tinted sunglasses. I posted it under an Attribution-NonCommercial-ShareAlike Creative Commons license. That means anyone is free to use my image provided that:
- they offer appropriate attribution,
- they use it for non-commercial purposes, and
- they reshare it under the same terms.
I have a number of concerns:
1. The Sample Image
The first and, to my mind, most obvious concern is the simple observation that while the ostensible purpose of creating the dataset is to help train facial recognition algorithms to be more even-handed in their identification of racialized faces, the sample image returned in my search is of a Caucasian face. I wonder if they’ve ever developed an algorithm to identify irony.
2. Licensing Terms
Does IBM’s use adhere to my licensing terms? Clearly, Yahoo thinks so, or it would not have gone out of its way to make its original set of 100 million images available to researchers, IBM included. In terms of attribution, the Flickr version of my photo carries a watermark bearing a domain I still own but no longer use: nouspique.com. Does this count as appropriate attribution? As for non-commercial use, IBM tries to weasel its way around the issue by claiming that it has produced the dataset for the use of academics and researchers. I’m not convinced. In the end, the fruits of that research will find their way into the commercial side of IBM’s operations, rolled out as surveillance tools for military or police groups, retail tools to enhance the shopping experience, and so on. It seems a bit disingenuous to suggest that this use has no commercial component. Nevertheless, strictly speaking, IBM is probably correct insofar as the actual use of the photos in question is restricted to the research side of the process; my images will not be embedded in any commercial product. If that is true, then the obligation to reshare under the same terms does not apply.
3. Privacy
One of the article’s chief concerns is that the subjects of these photographs may be suffering a violation of their privacy. Most of the photos I post to Flickr (or any social media platform) fall within the genre of street photography. I shoot in public spaces, principally in the Greater Toronto Area, where the law is fairly clear: in public spaces, people have no reasonable expectation of privacy; therefore, I am free to photograph them without permission. By extension, IBM does not violate their privacy by including their images in its dataset. However, I cannot speak for the circumstances in which other photographers captured their images. What if these images came from family birthday parties in a private backyard? Or private functions like weddings or retirement parties? Or from jurisdictions where a different treatment of privacy applies?
4. Perverse Incentives
While, on its face, the production of a less racially biased dataset sounds laudable, its practical outcome is likely to be a more effective tool for oppressing racialized people. This is an outcome I neither intended nor even contemplated when I first assigned these images to the Creative Commons. My motives were relatively idealistic. Equipped with a law degree and inspired by the work of Lawrence Lessig, I viewed the Creative Commons as an antidote to formally legislated intellectual property regimes, which increasingly favour large (corporate) IP owners through measures like longer patent and copyright terms and narrower educational and fair use exemptions. Even a country like Canada, with its saner copyright regime, is bending to the winds of global neoliberal ideology. I had hoped that by participating in the development of a repository for visual, literary, and musical artists, I would be supporting individual creators who lacked the resources of larger enterprises. They would be able to incorporate my work through citation, quotation, mashup, whatever, without encountering the usual barriers that block all but the largest publishing and film-making concerns from engaging in this kind of activity.
I suppose I should have expected this. It is capital’s way to insinuate itself even where it was never meant to go.
There are two obvious responses. One is to stop assigning works to the Creative Commons. The other is to customize the terms of the license, perhaps by redefining the term “commercial activity.” The point is this: when large corporations like Yahoo and IBM abuse the Creative Commons regime (and it is abuse), they give ordinary creators an incentive to flee the commons. That would be a shame, as it would impoverish the cultural field as a whole.
5. What to do?
Is there anything I can do in this specific situation? According to the article, I can request that IBM remove my photos from its dataset. However, there are limitations to this.
- It cannot remove the images from copies of the dataset that have already been distributed to academics and researchers.
- It will not remove images without their exact URLs. Since IBM will not provide an index of its dataset, it is impossible to supply those URLs. Effectively, IBM says to photographers: we honour take-down requests; it is impossible to make take-down requests. This is the sort of Catch-22 Joseph Heller might have imagined.
Another option is to close my Flickr account to prevent this sort of thing from happening again.
6. Technochauvinism
For my final observation, I draw on Meredith Broussard’s recent book, Artificial Unintelligence: How Computers Misunderstand the World (Cambridge: The MIT Press, 2018). She challenges the confidence technochauvinists have in their beloved AI technology. Her challenge is two-pronged. The first prong is her observation that technochauvinists wield their AI technology like a hammer and suppose that virtually every human problem is a nail. But the simple fact is: some human problems are NOT nails. I suspect facial recognition is one such “problem.” Some of the parameters that drive facial recognition involve categories that are subjective and emotionally charged. Race, for example. How does one train an algorithm to be sensitive to the constructedness of race? Do most coders even understand this question? How would an algorithm categorize my wife, who is mixed race? What about queerness? Is there a queerness index? My sample image is from a Pride Parade. Will algorithms try to categorize people along the lines of gender and sexuality? Looks like a nail to me. Let’s hammer it.
The task is one prong; the coder is the other. I leave you with Broussard in her own words:
[W]e have a small elite group of men who tend to overestimate their mathematical abilities, who have systematically excluded women and people of color in favor of machines for centuries, who tend to want to make science fiction real, who have little regard for social convention, who don’t believe that social norms or rules apply to them, who have unused piles of government money sitting around, and who have adopted the ideological rhetoric of far-right libertarian anarcho-capitalists.
What could possibly go wrong?