Data Release
Despite federal mandates requiring open data, some government agencies may still be reluctant to share their data freely. This reluctance may stem from concerns about data quality or about how the data may be interpreted. However, input from the data user community suggests that data users generally value access even to imperfect data. That suggests several best practices:
Release both raw data and improved data with transparency about accuracy, quality, and provenance. Raw data is more likely to have quality problems but is also timelier than data that has been vetted and cleaned. Despite its imperfections, raw data may lead to innovative insights and uses. Releasing both raw and improved data can give data users more options, as long as the provenance and limitations of the datasets are clear, and as long as the “authoritative” dataset is clearly identified.
An effective strategy is to release both raw data and improved versions as they are developed, and to keep both versions available. This strategy meets data users’ needs for both timeliness and quality. Datasets do not have to be perfect to be usable, but users need to know a dataset’s strengths and weaknesses in order to use it well. All data should be released together with transparent information about its accuracy, quality, and provenance. Conducting pre-release testing to assess data quality can be an additional safeguard. This could include developing an experimental space where users can work with the data and test it before it is made public.
Employ user-focused communication strategies to encourage data dissemination and use. The case for open data becomes more compelling if there are clear examples that show its value to users. Federal data stewards, as well as independent organizations, can collect and publish case studies that show how open government data is being applied, and can help inspire others to release their data as well. Agencies can develop tutorials, customized user interfaces, and other tools to make their data easier to use. Hackathons and challenges can engage the open data community and others to find new ways to put data to use.
Release individuals’ data back to them where possible. In the U.S., as well as the UK and some other countries, the open data movement has included a growing focus on giving individuals access to data about themselves, such as health or financial data. This “MyData” movement is not about pure open data: no one is suggesting that personal data like this be released to the public at large. But it is a form of opening up data that is proving its value through initiatives like Green Button and Blue Button.