Request for Proposals: Exploring the Implications of Government Release of Large Datasets
The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse. This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:
- Gain knowledge of the impact of the online release of large amounts of data generated by citizens' interactions with government
- Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
- Begin building a body of research that addresses these issues
Governments at all levels are releasing large datasets for analysis by anyone for any purpose, a practice known as “Open Data.” Using Open Data, entrepreneurs may create new products and services, and citizens may gain insight into the government. A plethora of time-saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival information for public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets in order to encourage the development of unimagined new applications. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals: acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role of businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen. While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry. The citizen who foresees this growing transparency may be less willing to engage with government, as these transactions may be documented and released in a dataset for anyone to use for any imaginable purpose, including to deanonymize the database, forever. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, Open Data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data,” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000. Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium. All grantees’ papers will be published in a dedicated monograph. Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a follow-up workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not. We are open to proposals that address the following difficult questions. We are also open to a variety of methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.
- To what extent does existing Open Data made available by city and state governments affect individual profiling? Do the effects change depending on the level of aggregation (neighborhood vs. cities)? What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
- Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
- Is there something fundamentally different about government services and the government’s collection of citizens’ data for basic needs in modern society, such as power and water, that requires governments to exercise greater due care than commercial entities?
- Companies have legal and practical mechanisms to shield data submitted to government from public release. What mechanisms do individuals have, or should they have, to address misuse of Open Data? Could developments in the constitutional right to information privacy as articulated in Whalen and Westinghouse Electric Co address Open Data privacy issues?
- Collecting data costs money, and its release could affect civil liberties. Yet it is being given away freely, sometimes to immensely profitable firms. Should governments license data for a fee and/or impose limits on its use, given its value?
- The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis. Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles? What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue? A move from collection restrictions to use regulation raises a number of related issues, detailed below.
- Are use restrictions efficacious in creating accountability? Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability. How could use regulations be implemented in the context of Open Data efficaciously? Can a self-learning algorithm honor data use restrictions?
- If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses? How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff? Will plaintiffs ever be able to beat First Amendment defenses?
- The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data. Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues. What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
- Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis. When analysts are not involved in the collection of data, they are less likely to understand its context and limitations. How do we ensure that this knowledge is maintained in a use regulation system?
- Former President William Clinton was admitted under a pseudonym for a procedure at a New York hospital in 2004. The hospital detected 1,500 attempts by its own employees to access the President’s records. With snooping such a tempting activity, how could incentives be crafted to cause self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
- It is clear that data privacy regulation could hamper some big data efforts. However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections. What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
- In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements. Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment. Is the answer to problems of distrust in government even more Open Data?
- What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect? Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?
All proposals received by the submission deadline and in compliance with the eligibility criteria will be peer-reviewed by a panel of subject-matter experts listed below. Experts may submit their own proposals, but are prohibited from reviewing or submitting feedback on any proposals they submit.
BCLT will select the most worthy proposals for funding. Although BCLT will use the feedback from the panel of subject-matter experts, all funding decisions are solely within BCLT’s discretion. No individual feedback will be provided on proposals that are not funded.
All proposals will be evaluated based on the following criteria:
- Addresses an important research question that, if answered, has the potential to have a significant impact on the public’s understanding of the implications of open government
- Where possible, an interdisciplinary approach, including technical works that incorporate legal analysis
- Potential for wide dissemination and use of knowledge, including specific plans for scholarly publications, public presentations, and white papers
- Ability to complete the project including adequacy of resources available, reasonableness of timelines, and qualifications of identified contributors
- Qualifications of the principal investigator, including previous history of work in the area, successful completion of previously funded projects, research or teaching awards, and books published
- Proposal submission deadline: September 25, 2014
- Notification of results (25% of award): October 30, 2014
- Workshop: April 16-17, 2015
- Draft paper submission (50% of award): May 1, 2015
- Final paper submission (25% of award): June 1, 2015
Note: Dates are subject to change.
Project proposal: The proposal contains full details of the proposed project in a maximum of 10 pages. The project proposal will be made available for peer review by BCLT, Microsoft and experts and scholars in the field. The project proposal should include:
- Project description: What set of questions will be addressed? How will they be addressed? How will answering these questions help advance what is known about civic tech?
- Approach: What methodological and theoretical approach will the researchers take? Exactly how will the researchers go about answering the question?
- Related research: Briefly summarize related research, including references where appropriate.
- Researchers’ roles: Describe the role of individual researchers on the project and how their skills and knowledge enable them to address the question proposed.
- Schedule: What milestones will be used to measure progress of the project during the year and when will they be completed? If the project described is part of a larger ongoing research program, estimate the time for completion of this project only.
- Use of funds: Provide a budget (in U.S. dollars) describing how the award will be used. The budget should be presented as a table with the total budget request clearly indicated. The budget may include honorariums.
- Other support: Include other contributions to this project (cash, goods, and services), if any, but do not include the use of university facilities that are otherwise provided on an ongoing basis. Note: Authors of winning proposals will be required to submit an original letter on department letterhead certifying the commitment of any additional or matching support described in the proposal.
To be eligible for this RFP, your institution and proposal must meet the following requirements:
- Institutions must have access to the knowledge, resources, and skills necessary to carry out the proposed research.
- Institutions must be either an accredited degree-granting university with a non-profit status or a research institution with non-profit status. Researchers from international institutions are eligible to apply.
- Proposals that are incomplete or request funds in excess of the maximum award will be excluded from the selection process.
- The receiving institution must agree that awards are made as unrestricted gifts and will not be subject to indirect costs or overhead charges; such costs may not be included in the budget for the proposed project.
Additionally, BCLT assumes that the final papers submitted are available for publication and dissemination. It is the responsibility of the authors, not BCLT, to determine whether publication of the papers requires the prior consent of other parties and, if so, to obtain it.
To submit a proposal, visit the Conference Management Toolkit (CMT) here.
Once you have created a profile, the site will allow you to submit your proposal.
If you have questions, please contact Chris Hoofnagle, principal investigator on this project.
- Legislative Counsel, Privacy, American Civil Liberties Union
- Principal, Microsoft Research
- Professor, NYU | Director, Information Law Institute
- Principal, Robinson & Yu LLC
- Alfred B. Engelberg Professor of Law, NYU School of Law
- Senior Staff Attorney, Electronic Frontier Foundation
- Research Director, Institute for the Future | Author of
- Senior Counsel and Managing Policy Director, The Leadership Conference on Civil and Human Rights
Funding decisions will be made exclusively by BCLT's Professor Paul Schwartz and Chris Hoofnagle.
- Director, Information Privacy Programs, BCLT | Senior Staff Attorney, Samuelson Law, Technology and Public Policy Clinic
- Jefferson E Peyser Professor of Law | Co-Director, BCLT