Implications of Government Release of Large Datasets

In 2014–15, the Berkeley Center for Law & Technology (BCLT) conducted a project to examine the civil rights, human rights, security, and privacy issues that arise from initiatives to release large datasets of government information to the public for analysis and reuse.
 
Funded by Microsoft, this research aimed to understand the impact of releasing online large amounts of data generated by citizens’ interactions with government. It was also meant to help ground public policy discussions and drive the development of a framework that avoids potential abuses of this data while encouraging greater engagement and innovation. A public request for proposals was issued, and six projects were funded.
 
In April 2015, results from these projects were presented at the BCLT/Berkeley Technology Law Journal (BTLJ) Annual Symposium. Final papers were published in the BTLJ.

In November 2015, BCLT co-sponsored a follow-on conference at New York University on responsible use of open data.


BACKGROUND

Governments at all levels are releasing large datasets, known as “Open Data,” for analysis by anyone for any purpose. Using Open Data, entrepreneurs may create new products and services, and citizens may gain insight into government programs and activities. Many time-saving and useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival times for public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets specifically to encourage the development of applications that no one has yet imagined. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.

Data held by the government is often implicitly or explicitly about individuals acting in a variety of roles: roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; roles that require special protection, such as victim of, witness to, or suspect in a crime; the role of businessperson submitting proprietary information to a regulator or obtaining a business license; and the role of ordinary citizen. While open government is often presented as an unqualified good, Open Data can sometimes identify individuals or groups, leading to a more transparent citizenry. Citizens who foresee this growing transparency may be less willing to engage with government, since their transactions may be documented and released in a dataset that anyone can use, forever, for any imaginable purpose, including deanonymizing the database. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, Open Data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data,” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a significant concern in a 2014 White House policy review.