South African Journal of Science
On-line version ISSN 1996-7489
Print version ISSN 0038-2353
S. Afr. j. sci. vol.120 n.3-4 Pretoria Mar./Apr. 2024
http://dx.doi.org/10.17159/sajs.2024/17008
COMMENTARY
Without access to social media platform data, we risk being left in the dark
Douglas A. Parry
Department of Information Science, Stellenbosch University, Stellenbosch, South Africa
ABSTRACT
SIGNIFICANCE:
Social media data are essential for studying human behaviour and understanding potential systemic risks. Social media platforms have, however, begun to remove access to these data. In response, other countries and regions have implemented legislation that compels platforms to provide researchers with data access. In South Africa, we have lagged behind the Global North when it comes to using platform data in our research and, given the recent access restrictions, we risk being left behind. In this Commentary, I call attention to this critical issue and initiate a conversation about access to social media data in South Africa.
Keywords: social media data, digital trace data, data access, Digital Services Act, DSA
Subatomic particle physics has CERN. Astronomy has the Hubble telescope. Social science has the Internet, smartphones, email, social media, satellites, and a myriad of other ways to follow human behavior. The gods of the information age have produced a whole panoply of technologies for social research along the journey to other destinations. - David Lazer1
For scientists interested in understanding the character and effects of individual and collective behaviour, the digital age has brought about new opportunities to observe and measure human interaction at scale.2 The Internet, and social media platforms in particular, provides us with unprecedented (albeit socially and algorithmically biased)2,3 access to robust data on these interactions.4 Facebook, Instagram, Twitter/X, TikTok and other social media platforms have not only fundamentally altered the ways in which we communicate with others but, by scaffolding how we interact with and engage with individuals, ideas, and organisations, they have reconfigured the public sphere itself.
Throughout our everyday use of these platforms, we leave behind digital traces of our behaviour as by-products of our interactions with other users and entities.5 These digital traces hold tremendous value for studying the nature and consequences of (mediated) human behaviour at individual, group, and population levels. For example, through social media platform data we can investigate, among other things, misinformation campaigns, election interference, the spread of infectious diseases, polarisation, and the effects of particular interaction patterns on users' mental health and well-being.6-8 Collectively, such data provide an important lens that enables both fundamental research on human behaviour and research on the potential systemic societal and economic risks posed by social media platforms. This importance is reflected in the growing prominence of computational social science across academic disciplines.9
Despite the considerable opportunities that these data hold, and the widespread use of computational social science in the Global North, research in South Africa, and the Global South more generally, has thus far made only limited use of digital trace data.10,11 While there are likely many factors underlying this disparity, and numerous counterexamples of interesting studies in the Global South leveraging digital trace data, unpacking these factors falls outside the scope of this Commentary. For now, it is sufficient to acknowledge that the disparity exists. Despite only limited use of digital trace data collected from social media platforms in South Africa, it is no less important here than it is in the Global North that we understand how these platforms impact, for better or worse, ordinary South Africans, our society at large, and potentially our hard-fought democracy.
"The way is shut" - restrictions on access to social media data
Unfortunately, while social media platforms continue to collect and analyse user data, in the wake of various scandals, fears surrounding unfettered data use for the training of large language models, and a general reluctance to face external scrutiny, most social media platforms have restricted and/or entirely removed access to the application programming interfaces (APIs) that researchers used to access platform data.12-14 The shuttering of these APIs has brought large swathes of social media research to a halt and, in doing so, severely jeopardised the extent to which we can learn about behaviour online and the very real consequences that this behaviour can have for individuals and societies.12,15
While alternative approaches to collecting social media data exist (e.g. self-reports, web scraping, browser extensions, data donations)5, and some researchers have partnered directly with social media platforms to access data, for various reasons these techniques fall short of the high-quality individual-level data available directly from platforms4,5,14, or in the case of direct partnerships, are only available to a select privileged few (who are almost exclusively based at institutions in the Global North). For this reason, we require direct, equitable access to social media data, without which we will be left in the dark when it comes to important individual and societal questions.
"A light from the shadows shall spring" - regulations to compel data access
In response to these restrictions and the broader recognition of the power that online platforms hold, the European Commission has implemented several key regulations as part of a broader set of legislation designed to govern the digital sphere in the European Union (EU). With the aim of fostering a safer online environment, the Digital Services Act (DSA) will, among other things, compel online platforms operating within the EU to prevent and remove posts containing illegal content, to ban certain types of targeted advertising (e.g. advertising based on sexual orientation or religion), and to provide greater transparency into how their content algorithms work. For "very large online platforms" (i.e. those with at least 45 million monthly users in the EU, which covers most popular social media platforms and many other more general services like Booking.com and Google Maps), the act also requires that platforms (1) enable users to opt out of recommendation systems, (2) be subject to external and independent audits, and (3) share data with researchers and other vetted independent 'watchdog' organisations.
This latter requirement, described in Article 40 of the DSA (see https://www.eu-digital-services-act.com/Digital_Services_Act_Article_40.html for the full text), is particularly important given the stranglehold that platforms have over access to data. Article 40 of the DSA compels very large online platforms to provide vetted researchers with access to data "for the sole purpose of conducting research that contributes to the detection, identification and understanding of systemic risks in the Union" and sets out the rules governing this access. While the DSA has been in effect since 25 August 2023, the code of conduct for practical compliance with the regulations remains a work in progress, and the full implementation will occur only in 2024 (see Klinger and Ohme16 for a set of recommendations for what this should look like). Indeed, managing vetted researcher access to platform data poses substantial challenges for the protection of personal user data and privacy, but also for the design, implementation, and running of the necessary infrastructure to securely manage, store, and process researcher data access. Despite these challenges, the DSA, and Article 40 in particular, provides an important example for other countries seeking to break the stranglehold that social media platforms have over their data, and enable robust and equitable access to critical data for research purposes.
Although research using digital trace data collected from social media platforms in South Africa has lagged behind research conducted in the Global North, given the rapidly growing adoption of mobile Internet-connected communications technologies (i.e. smartphones), the increasing roll-out of 5G bandwidth, and the general digitalisation of our economy, it is likely that there exists a similar potential for systemic societal risks posed by social media platforms here as there is in other countries. Unfortunately, given the ongoing restrictions and closing of platform APIs, South African researchers are left without any reliable, legal means of accessing data from social media platforms. Without access to social media data, not only do we risk getting left behind when it comes to research on important topics (e.g. misinformation, online political interference, digital well-being) but, perhaps more importantly, we are left without any means of developing insight at scale into online behaviour and the potential risks that it poses for South Africans and South African society.
"All we have to decide is what to do with the time that is given to us" - an example of how to move forward
To ensure free and objective research on behaviour on social media platforms and the potential systemic risks that actions on these platforms can pose for individuals and society, we require reliable access to platform data. The EU has set an example for how this access can be achieved. While it would be naïve to assume that South Africa holds the same degree of economic clout to impose similar regulations on platforms with any chance of them being effective, we should not be afraid to explore other creative possibilities to enable research with digital trace data to flourish and contribute to our collective understanding of the character and risks of online behaviour. These possibilities could include the development of infrastructure for data donation at scale or the establishment of large-scale consortia to leverage current best practices to collect and integrate available data (see, for example, the European Digital Media Observatory project). Alongside these possibilities, I propose that we embrace the 'Brussels effect' (i.e. the de facto though not necessarily de jure externalising of EU regulations outside of the region's borders due to various market mechanisms) and seek to leverage the DSA by (1) lobbying the South African government to implement similar regulations 'piggybacking' on the DSA, (2) coordinating with relevant stakeholders in the EU to ensure that the code of conduct makes provision for broader access (see Klinger and Ohme16), and (3) in lieu of formal policy, collaborating with researchers in the EU who will have access to social media platform data.
My second and third propositions extend from the fact that the DSA is implemented on the basis of the 'market location principle' (lex loci solutionis), which holds that, irrespective of the location at which the platform was established, because the platform offers services in the EU/to EU citizens, non-European data also fall within the scope of the regulations. This implies that data from South African social media users will be available to researchers within the EU (but not presently researchers in South Africa). Notably, Article 40 does not necessarily preclude access to individuals based outside of the EU. Article 40(8)(a) of the DSA read together with Article 2, point 1 of the Digital Single Market Directive suggests that to be eligible to access platform data one should be affiliated with a non-profit entity located in any country devoted to scientific research. The regulation does, however, restrict the geographic scope of the research foci for which access to platform data will be provided. Both Article 40(4) and Article 40(12) suggest that the research must study "systemic risks in the Union." In a broad interpretation of this restriction, Husovec17 argues that "research focusing on risks in the Union needs to study non-EU countries to be scientifically sound." This interpretation suggests that, notwithstanding the data privacy concerns associated with cross-border data sharing and the practicalities of data access outside the EU, with careful justification, we can leverage this legislation to access data on South African social media users. Regardless, our 'first prize' would be the production of similar legislation in South Africa (proposition 1) so that we can avoid being reliant on partnerships with researchers in the EU. In doing so, we would not be alone; recognising the growing prominence of the digital/online environment, many other countries are developing legislation in this regard. 
The UK Online Safety Bill, for example, contains similar provisions to the DSA.
With this Commentary I have aimed to call attention to this critical issue, highlight the need for increased use of digital trace data in South African research, raise concerns about our rapidly disappearing access to platform data, and initiate a conversation about the need for policy enabling research with social media data in South Africa. While the EU has provided an example for how we can achieve increased transparency and access to social media data, developing and implementing the relevant legislation fit for our economy will not be an easy task. We will require input from experts in many disciplines (e.g. law, economics, computer science, and the social sciences at large) and the establishment of resource-intensive data intermediaries who can steward (collect, maintain, share) access to platform data as well as the development of infrastructure to enable and manage these procedures (data clean rooms, virtual laboratory environments, etc.), all while being mindful of the substantial ethical and privacy risks posed by increased data access (see de Vreese and Tromble14 for a discussion of how research data access can co-exist with data privacy regulations, and point five in Klinger and Ohme16 for recommendations on how data sensitivity can be managed). Despite these challenges, I believe that, without proactive effort, we risk being left behind without any robust means of studying the potential systemic risks at play.
Competing interests
I have no competing interests to declare.
References
1. Lazer D. Social science, today. Science. 2018;359(6371):42. https://doi.org/10.1126/science.aaq0679
2. Lazer D, Hargittai E, Freelon D, Gonzalez-Bailon S, Munger K, Ognyanova K, et al. Meaningful measures of human society in the twenty-first century. Nature. 2021;595(7866):189-196. https://doi.org/10.1038/s41586-021-03660-7
3. Wagner C, Strohmaier M, Olteanu A, Kiciman E, Contractor N, Eliassi-Rad T. Measuring algorithmically infused societies. Nature. 2021;595(7866):197-204. https://doi.org/10.1038/s41586-021-03666-1
4. Parry D, Davidson BI, Sewall CJR, Fisher JT, Mieczkowski H, Quintana DS. A systematic review and meta-analysis of discrepancies between logged and self-reported digital media use. Nat Hum Behav. 2021;5:1535-1547. https://doi.org/10.1038/s41562-021-01117-5
5. Ohme J, Araujo T, Boeschoten L, Freelon D, Ram N, Reeves BB, et al. Digital trace data collection for social media effects research: APIs, data donation, and (screen) tracking. Commun Methods Meas. Forthcoming 2023. https://doi.org/10.1080/19312458.2023.2181319
6. Guess AM, Malhotra N, Pan J, Barberá P, Allcott H, Brown T, et al. How do social media feed algorithms affect attitudes and behavior in an election campaign? Science. 2023;381(6656):398-404. https://doi.org/10.1126/science.abp9364
7. Ruths D, Pfeffer J. Social media for large studies of behavior. Science. 2014;346(6213):1063-1064. https://doi.org/10.1126/science.346.6213.1063
8. Van Bavel JJ, Rathje S, Harris E, Robertson C, Sternisko A. How social media shapes polarization. Trends Cogn Sci. 2021;25(11):913-916. https://doi.org/10.1016/j.tics.2021.07.013
9. Lazer DMJ, Pentland A, Watts DJ, Aral S, Athey S, Contractor N, et al. Computational social science: Obstacles and opportunities. Science. 2020;369(6507):1060-1062. https://doi.org/10.1126/science.aaz8170
10. Becerra G, Ratovicius C. Social sciences and humanities on big data: A bibliometric analysis. J Syst Inf Technol Man. 2022;19. https://doi.org/10.4301/S1807-1775202219011
11. Ghai S, Fassi L, Awadh F, Orben A. Lack of sample diversity in research on adolescent depression and social media use: A scoping review and meta-analysis. Clin Psychol Sci. 2023;11(5):759-772. https://doi.org/10.1177/21677026221114859
12. Freelon D. Computational research in the post-API age. Polit Commun. 2018;35(4):665-668. https://doi.org/10.1080/10584609.2018.1477506
13. Davidson BI, Wischerath D, Racek D, Parry DA, Godwin E, Hinds J, et al. Platform-controlled social media APIs threaten open science. Nat Hum Behav. 2023;7(12):2054-2057. https://doi.org/10.1038/s41562-023-01750-2
14. De Vreese C, Tromble R. The data abyss: How lack of data access leaves research and society in the dark. Polit Commun. 2023;40(3):356-360. https://doi.org/10.1080/10584609.2023.2207488
15. Bruns A. After the 'APIcalypse': Social media platforms and their fight against critical scholarly research. Inf Commun Soc. 2019;22(11):1544-1566. https://doi.org/10.1080/1369118X.2019.1637447
16. Klinger U, Ohme J. What the scientific community needs from data access under Art. 40 DSA: 20 Points on infrastructures, participation, transparency, and funding. Weizenbaum Policy Paper 8. Berlin: Weizenbaum Institute for the Networked Society - The German Internet Institute; 2023. https://doi.org/10.34669/WI.WPP/8.2
17. Husovec M. How to facilitate data access under the Digital Services Act. SSRN; 2023 May 19. Available from: https://ssrn.com/abstract=4452940
Correspondence:
Douglas Parry
Email: dougaparry@sun.ac.za
Published: 27 March 2024