As more and more of our cultural heritage becomes available in digital formats, humanities scholars are increasingly adding artificial intelligence (AI) and other computational techniques to their research methods. The question, however, is just how valuable the insights gained from these tools are. It proves surprisingly difficult to assess whether a finding reflects a meaningful and interesting trend or merely an error or bias in the tools and data used. In her PhD thesis, CWI researcher Myriam Traub explores ways to better understand such limitations.
The difficulty of grasping these limitations stems partly from well-known quality issues in data, such as errors in optical character recognition (OCR). These errors are easy for scholars to spot and are widely recognized as a problem in the community. But even for such obvious problems, little is known about how the errors affect AI methods used for research further ‘downstream’. It is entirely unclear how the outcome of culturally oriented research projects is affected when research methods are given erroneous or biased data as input.
Lesser-known sources of bias
There are also sources of bias of which only a few users are aware. One example is algorithmic bias in full-text search. This has been studied for more than a decade, yet there is still little awareness of it when it comes to using search tools in a non-commercial digital library. For these lesser-known sources of tool bias, it is of key importance to measure the amount of bias. Only then can researchers assess its impact on the research conducted with these tools.
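To make "measuring the amount of bias" concrete, here is a minimal sketch of one approach known from the information retrieval literature. It counts how often each document appears in the top results across a set of queries, then summarizes the inequality of those counts with a Gini coefficient. This is an illustration under stated assumptions, not necessarily the procedure used in the thesis. The function names, the rank cutoff and the toy data are all hypothetical.

```python
from collections import Counter


def retrievability_scores(results_per_query, doc_ids, cutoff=10):
    """For each document, count the queries that return it within the top `cutoff` ranks.

    `results_per_query` is a list of ranked result lists (one per query).
    Documents that are never retrieved get a score of 0.
    """
    counts = Counter()
    for ranked in results_per_query:
        for doc in ranked[:cutoff]:
            counts[doc] += 1
    return [counts[d] for d in doc_ids]


def gini(values):
    """Gini coefficient of non-negative values.

    0 means every document is equally retrievable; values close to 1 mean
    that a few documents receive almost all of the exposure.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy example (hypothetical data): four documents, three queries, top-2 cutoff.
docs = ["a", "b", "c", "d"]
results = [["a", "b"], ["a", "c"], ["a", "b"]]
scores = retrievability_scores(results, docs, cutoff=2)
bias = gini(scores)
```

In this toy data, document "a" is returned for every query and document "d" is never returned, so the Gini coefficient comes out well above zero. On a real archive, the choice of query set and cutoff strongly influences the result, so both would need to be reported alongside the score.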
Examining digital method use
Myriam Traub explored sources of bias in data and tools used by humanities scholars. She addresses a number of these in her PhD thesis, which she defends today at Utrecht University. For her research, Traub interviewed humanities scholars on their use of digital methods and the role of these methods in the overall research process. She studied retrievability bias in the search engine of the Dutch historical newspaper archive, the impact of partially fixing OCR errors through human computation, and the potential of crowdsourcing for difficult tasks that are traditionally seen as limited to domain experts.
In particular, Traub shows that digital humanities should not only strive for better-performing tools and higher-quality data, but also pursue better techniques to measure limitations in tools and data. Traub also points out that better techniques are needed for conveying the results of these computational measures to humanities scholars interested in the historical artefacts or events expressed in the data.
Traub calls for more intense, multidisciplinary collaboration between humanities scholars, data custodians and tool developers to better understand each other’s assumptions, approaches and requirements. This could help build not only the technical research infrastructure humanities scholars need, but also the human infrastructure in which scholars are trained in the skills necessary to routinely make critical assessments of the fitness of the digital data and tools available in the technical infrastructure.
Traub performed her research at CWI within the research project SealincMedia, which is part of the national COMMIT/ program. Research partners included the Dutch National Library and the Rijksmuseum.
Source: CWI, 11 May 2020