A large chunk of the world’s largest collection of data – information about the political influence of online advertising on many of the planet’s 2.85 billion Facebook users – is available to researchers, but there’s a catch. Getting access to the massive trove of Facebook data requires navigating a labyrinth of contract minutiae, legal reviews, technical restrictions, and privacy agreements with the social media giant.
Facebook promised to release data to researchers more than three years ago. But few researchers have actually been able to study the 1-billion-gigabyte dataset (an exabyte, or enough data to fill roughly 250 million DVDs) released in early 2020, and findings about how social media has affected the political process have been limited.
When Facebook has released data, researchers argue that the results haven’t shed sufficient light because important variables have been left out. Although its publicly available advertising data has been useful, the information doesn’t include all potential sources of misinformation and has been described as “the tip of the iceberg in terms of where and how propaganda gets spread on Facebook” by Sam Woolley, a University of Texas researcher.
The information also hasn’t been shared with all researchers who have requested access, causing hard feelings and suspicions that Facebook is limiting the data releases to people whose research is unlikely to be damaging to the social media giant.
“They only give permission to people who are tech-friendly and whose answers are likely to be tech-friendly,” said Luigi Zingales, director of the University of Chicago Stigler Center for the Study of the Economy and the State. “For research, you need data. And the only people with data is them. So at some point, my fear is that we become part of the propaganda machine. The only articles we can write are the ones that big tech companies like.”
And even when agreements have been signed between researchers and Facebook, the platform hasn’t always followed through, often citing privacy concerns or invoking “differential privacy,” a controversial form of masking being considered by the U.S. Census Bureau and challenged by states that fear it will corrupt data. The technique involves injecting carefully calibrated statistical noise into data or query results – enough to hide whether any one person’s information is present, while preserving aggregate patterns.
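The core idea can be sketched in a few lines of Python. This is a minimal illustration of a differentially private count using Laplace noise, not Facebook’s or the Census Bureau’s actual implementation; the function name and parameters are ours:

```python
import math
import random

def dp_count(true_count, epsilon=1.0):
    """Return a noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing any single
    person changes the true count by at most 1, so Laplace noise with
    scale 1/epsilon is enough to mask any individual's presence.
    """
    scale = 1.0 / epsilon
    # Draw Laplace(0, scale) noise via inverse transform sampling.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

The trade-off critics point to is visible in the `epsilon` parameter: a smaller value gives stronger privacy but noisier – and potentially corrupted – statistics.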
Facebook also has prevented researchers from “scraping,” or using computer scripts to download potentially useful data, citing privacy concerns. The company paid a record $5 billion fine in 2019 for its role in the Cambridge Analytica scandal, which involved secretly sharing the private information of tens of millions of users with a company that used the information to profile and target potential voters. And while Facebook contends it doesn’t approve or reject research, it does reserve the right “to remove any confidential or personally identifiable information.”
Even when information is provided to researchers, it’s tightly controlled. One restriction on the data, which is made available through the Facebook Open Research and Transparency platform, or FORT, excludes advertisements that received fewer than 100 impressions. That floor rules out discovering the extent of “microtargeting,” a crucial use of Facebook advertising that can push false information to extremely specific groups, which then spread falsehoods further through other means, such as individual posts or group pages.
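To see why the threshold matters, consider a toy filter in Python. The records and field names here are hypothetical, not FORT’s actual schema:

```python
# Hypothetical ad records (fields are illustrative only).
ads = [
    {"id": "statewide_ad", "impressions": 48_000},
    {"id": "microtargeted_ad", "impressions": 72},  # shown to a tiny audience
]

# Applying a 100-impression floor, as FORT reportedly does, drops
# exactly the narrowly targeted ads researchers would want to study.
visible = [ad["id"] for ad in ads if ad["impressions"] >= 100]
```

After the filter, only `statewide_ad` remains; the microtargeted ad vanishes from the dataset entirely.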
Facebook also places technical limits on researchers. They cannot download the data; instead, they must analyze it through a Facebook-approved website. The tools are limited, too: while Facebook permits some popular analysis languages (R, SQL, and Python), more sophisticated software is restricted.
While there are ways to study Facebook’s influence without jumping through hoops that include a mind-numbing 17-page contract – using publicly available data, creating a study-specific Facebook page, or collecting Facebook data from volunteers – those options provide far-from-complete pictures of its influence on the political process. The company has often disabled third-party scraping tools, such as a political-advertisement collector built by the news outlet ProPublica, and ordered a New York University researcher to shut down a browser extension that collected data on the targets of Facebook political advertising.
In the absence of greater transparency from social media giants, Congress can play a role by passing legislation that sets standards for how large tech companies share data with researchers. Ultimately, greater access to data from Facebook and other large platforms for research and evaluation will help diagnose and fix problems while holding the platforms accountable.