MESA Banner
Crafting Arabic: The Politics and Ethics of Making Language into Data in Amman’s Tech Sector
Abstract
In May 2023, Sam Altman, the CEO of OpenAI, spoke at the Xpand Technology Conference in Amman, Jordan, about data, artificial intelligence (AI), and ChatGPT. In his opening remarks, tech mogul and moderator Fouad Jeryes emphasized the significance of the event taking place in Jordan: “We remain a powerhouse here in Jordan for tech…we are the creators of the overwhelming majority of [Arabic] content on the Internet.” Discourses about Jordan’s outsized production of digital content have circulated in news media and policy reports for at least a decade. However, such claims have taken on greater significance as chatbots, large language models, and other AI-enabled technologies, all built on massive linguistic datasets, have entered public consciousness. Today, workers in Amman’s tech sector, a space in which English has historically been privileged, are not only accumulating socio-economic capital through their Arabic competencies; through their everyday labor, they are also grappling with the politics and ethics of producing Arabic-language data and digital technologies. This paper draws on over a year of ethnographic fieldwork in Amman’s tech sector, including participant observation at two tech startups and semi-structured interviews, to ask: (1) What kinds of labor and decision-making are involved in transforming Arabic content into data for technological production? (2) How do varied linguistic histories, political commitments, and conceptions of value shape the experiences of those working at the intersection of language and technology? These questions aim to illuminate the political and ethical stakes of tasks like digitizing, collecting, annotating, and categorizing linguistic data, as well as the ways that data-driven technologies and AI are differentially experienced across linguistic landscapes and political economies.
Taking Jordan as a case study, this paper argues that making language into data is not a simple conversion but rather a process thoroughly shaped by the language ideologies, social relations, and political milieus in which it is embedded. Moreover, it advances recent anthropological scholarship on AI and algorithmic systems, which has largely focused on the US and Western Europe, by asking how concepts like data and machine learning are being differently conceptualized in the Arabic-speaking world. Lastly, this paper builds on the work of scholars in science and technology studies and critical data studies who center tech workers in the Global South not merely as victims of economic exploitation but as political subjects who actively shape the languages and digital technologies with which they labor.
Discipline
Anthropology
Geographic Area
Jordan
Sub Area
None