DC’s open data directive adopts the mistakes made by the White House

Earlier today DC’s mayor issued a Transparency, Open Government and Open Data Directive (readable thanks to Alex Howard here). Much of it was adapted from the White House’s open government memoranda, including those memoranda’s faults.

Overview

There are many things to like about the Directive, including the mention of a potential new Chief Data Officer position, the use of open formats, and the goal of promoting reuse. The framing in terms of transparency, participation, and collaboration — lifted from Obama’s 2009 open government memo and adopted in the Mayor’s 2011 memorandum on transparency and open government — is good. (Though not great. The White House never managed to actually execute the collaboration part.)

But much of it is also undercut by a new notion of conditional access to government data that is becoming the norm.

Having their cake and eating it too

What I mean is that while the directive explicitly and clearly states that there will be

no restrictions on copying, publishing, further distributing, modifying or using the data [in DC’s data catalog]

it also explicitly describes a number of restrictions that will or may apply to use of the data. (DC clearly copied language, mistakes included, from the White House’s 2013 open data memo, “M-13-13,” which I’ve blogged about before here and here.)

“No restrictions” is what we want. It is, by community consensus, a core and defining quality of open government data.

If there are capricious rules around reusing the data, it’s not open government data. Period. Restrictions serve only to create a legal lever the government can use to pressure people over things it doesn’t like. Imagine if the DC government took legal action against Greater Greater Washington to stop an unflattering story on the grounds that GGW didn’t properly cite the DC government for the data used in the story. That is what the future of open data in DC looks like when there are restrictions on reuse.

Okay, so specifically:

“Open license” does not mean “no restrictions”

So first, the directive says that the data catalog will accomplish this goal of “no restrictions” by making the data available through an “open license.” But “open license,” as the term is usually used, does not mean “no restrictions.” Most open licenses, including open source licenses and Creative Commons licenses, grant some privileges but not others, and the privileges often come with new requirements, such as the GPL’s virality clause or a requirement that users attribute the work to its author. Under the Open Definition, “open” means reusable but potentially subject to certain terms.

In guidance I co-wrote with Eric Mill, Jonathan Gray, and others called Best-Practices Language for Making Data “License-Free”, we addressed what governments should do if they really want to create “no restrictions.” They should use CC0, a copyright waiver. This is really the only way to achieve “no restrictions.”

(This was one of the confusions in M-13-13 as well. It’s clear the directive took the open licensing language from M-13-13.)

“Open license” presumes the work is copyrighted

Facts cannot be copyrighted. To the extent that DC’s data catalog contains facts about the District, about government operations, and so on, the data files in the catalog are likely not subject to copyright protections. (What is and isn’t copyrightable is murky.) Open licensing, as normally understood, presumes the work is copyrighted. If the work isn’t copyrighted, an open license simply doesn’t apply. You can’t license what you don’t own.

(This was another one of the confusions in M-13-13. Unlike the federal government, though, the DC government probably can copyright things it produces, just probably not data files.)

Data users must agree to a contract first

The data “shall be subject to Terms of Use developed by OCTO,” DC’s Office of the Chief Technology Officer. This means that DC residents will have to agree to a contract before getting the data. What will the contract say? More on that later. Whatever it says, a contract is by its nature a restriction on use.

Imagine if data provided in response to a Freedom of Information Act request came with a contract. They’ll fulfill the FOIA request but only if — let’s say hypothetically — you agree to not sue the government using the information you get. Well, duh, that defeats the point. Just as a Terms of Use agreement undermines “no restrictions.”

The directive indicates that the Terms of Use will include a “disclaimer of liability or indemnification provision”. These are complex legal provisions that could involve waiving rights or compensating the DC government if there is a lawsuit. These are serious things to consider before using government data.

(This was not a problem in M-13-13. The License-Free Best Practices did address this though.)

Attribution and explanation requirements

The directive also gives us a clue about what else will be in the Terms of Use:

Nothing in this Order shall be deemed to prohibit OCTO or any agency … from adopting or implementing measures necessary or appropriate to . . . (v) require a third party providing the District’s public data (or applications based on public data) to the public to explicitly identify the source and version of the public dataset, and describe any modifications made to the public dataset.

This is an attribution requirement, plus a requirement for data users to explain themselves.

To be sure, and as Alex Howard pointed out to me on Twitter, these are hypotheticals that the directive leaves open, not something it mandates. But the fact that they are mentioned at all strongly suggests that OCTO or other agencies want to impose these sorts of terms and will if they can.

And, as you might guess I would say, requirements to attribute the government for data and to explain what you did with data are restrictions on use, which like the others create a lever by which the DC government might put pressure on things it doesn’t like.

(This was also a problem in M-13-13, but in this case it doesn’t appear that the DC directive specifically copied the problem from M-13-13.)

Conclusion

There is a strong American tradition, or at least a core American value, that the government does not get in the way of the dissemination of ideas. We don’t always live up to that ideal, but we strive for it. Conditioning access to information about the government on restrictions on what we can say when we use it (e.g. attribution and explanation), on a waiver of rights, or on a commitment to indemnify is anathema to accountability, transparency, and respect for the public.

If and when these new terms go up, I will encourage users to FOIA for the same information rather than get it from the DC data catalog.

I’m tracking the White House with persistent cookies

ProPublica reported this morning that WhiteHouse.gov is, albeit accidentally, using a new method for tracking individual visitors to the website. This reminded me that for the last 6 months I’ve been tracking the White House.

Methodology

On Jan 17 the President made his first major speech regarding reforms to the NSA’s massive surveillance programs revealed last year. I thought that morning that he would announce new mandatory data retention policies for internet and telephone service providers. He didn’t. But by the time the speech began I had already started tracking the White House.

About 5% of traffic to my website GovTrack.us comes from the government. Most IP addresses belong to the major broadband providers like Verizon, Comcast, and so on, but some government IP addresses come from special IP address blocks registered specifically to the office that reserved them.

Three blocks were of interest to me: the blocks for the Executive Office of the President (“EOP”, about 60 page views on GovTrack per day), the United States Senate (about 300 page views/day), and the House of Representatives (about 600 page views/day). I don’t know where the computers using these IP addresses are, but I expect that EOP IP addresses would include the White House and West Wing, and perhaps the Eisenhower Executive Office Building (more on what the EOP is). To the best of my knowledge, the House and Senate IP addresses are used in the Capitol and the seven congressional office buildings, including in non-political offices and on the guest WiFi network.

On Jan 17 I began uniquely identifying the users of these IP blocks by placing a persistent cookie with a unique identifier in their web browsers when they visited GovTrack, and logging each of their page views. Persistent cookies are lost when users clear their browser cookies, but they’re a useful first approximation for identifying users.
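For readers curious how this kind of tracking works mechanically, here is a minimal sketch in Python. It is not GovTrack’s actual code: the network blocks are RFC 5737 documentation ranges standing in for the real EOP, Senate, and House blocks, and the cookie name `tracker_id`, the one-year lifetime, and the `track_request` function are all hypothetical. Visitors outside the watched blocks get no cookie and no log entry.

```python
import ipaddress
import uuid
from http.cookies import SimpleCookie

# PLACEHOLDER blocks: RFC 5737 documentation ranges, not the real
# EOP/Senate/House allocations.
WATCHED_NETWORKS = {
    "EOP": ipaddress.ip_network("192.0.2.0/24"),
    "Senate": ipaddress.ip_network("198.51.100.0/24"),
    "House": ipaddress.ip_network("203.0.113.0/24"),
}
COOKIE_NAME = "tracker_id"        # hypothetical cookie name
COOKIE_MAX_AGE = 365 * 24 * 3600  # hypothetical one-year lifetime, in seconds


def classify_ip(remote_addr):
    """Return the name of the watched network containing remote_addr, or None."""
    addr = ipaddress.ip_address(remote_addr)
    for name, network in WATCHED_NETWORKS.items():
        if addr in network:  # IPv6 addresses never match these IPv4 blocks
            return name
    return None


def track_request(remote_addr, cookie_header, path, log):
    """Log a page view from a watched network, issuing an ID cookie if needed.

    Returns a Set-Cookie header value when a new identifier is issued, else None.
    """
    network = classify_ip(remote_addr)
    if network is None:
        return None  # outside the watched blocks: no cookie, no log entry

    incoming = SimpleCookie(cookie_header or "")
    set_cookie = None
    if COOKIE_NAME in incoming:
        tracker_id = incoming[COOKIE_NAME].value  # returning browser
    else:
        tracker_id = uuid.uuid4().hex  # random ID, says nothing about the visitor
        outgoing = SimpleCookie()
        outgoing[COOKIE_NAME] = tracker_id
        outgoing[COOKIE_NAME]["max-age"] = COOKIE_MAX_AGE
        outgoing[COOKIE_NAME]["path"] = "/"
        set_cookie = outgoing[COOKIE_NAME].OutputString()

    log.append((tracker_id, network, path))
    return set_cookie
```

A web framework would call something like `track_request` once per request, send back the returned value as a `Set-Cookie` header when there is one, and otherwise leave the response alone. Keying everything on a random UUID means the identifier only links page views together; it reveals nothing about who the visitor is.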

Summary Results

So far, 324,705 hits on GovTrack have been logged from 19,131 unique tracking cookies:

Network   Hits      Unique cookies
EOP       12,512    1,161
Senate    92,917    7,572
House     219,276   10,771

(Interestingly, 373 unique cookies appeared on more than one of the three networks, probably laptops that moved from one building to another.)
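The table and the 373 figure fit together: the per-network unique counts sum to 19,504 (1,161 + 7,572 + 10,771), which is exactly 373 more than the 19,131 overall, so each of those 373 cookies must have shown up on exactly two networks rather than all three. Here is a minimal sketch of the tally, assuming the log is a list of hypothetical `(tracker_id, network, path)` tuples, one per page view; the function name `summarize` is likewise illustrative.

```python
from collections import defaultdict


def summarize(log):
    """Tally (tracker_id, network, path) page-view records.

    Returns (per_network, total_hits, total_uniques, multi_network), where
    per_network maps network -> (hits, unique cookies seen on that network).
    """
    hits = defaultdict(int)
    cookies_by_network = defaultdict(set)
    networks_by_cookie = defaultdict(set)
    for tracker_id, network, _path in log:
        hits[network] += 1
        cookies_by_network[network].add(tracker_id)
        networks_by_cookie[tracker_id].add(network)

    per_network = {n: (hits[n], len(cookies_by_network[n])) for n in hits}
    total_hits = sum(hits.values())
    total_uniques = len(networks_by_cookie)  # each cookie counted once overall
    multi_network = sum(1 for nets in networks_by_cookie.values() if len(nets) > 1)
    return per_network, total_hits, total_uniques, multi_network


# Consistency checks on the published figures.
assert 12_512 + 92_917 + 219_276 == 324_705
assert (1_161 + 7_572 + 10_771) - 19_131 == 373
```

Counting uniques per network and overall separately is what makes the overlap visible: a cookie seen on two networks adds one to each network’s count but only one to the overall count.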

The longest recorded session is for one tracking cookie on the House network that made 1,590 page views, almost all on Feb 21 but also some in January and March, to pages for various representatives. My guess is this was a lobbyist on the guest WiFi doing research before a meeting.

From the EOP, the longest recorded session is 901 page views between March 18 and July 18. This user mostly looked at my congressional district maps and a few bills on a variety of subjects. There was no discernible pattern, except that this person’s job probably involves looking up people’s congressional districts. Maybe the person processes incoming mail to the President.

This is all I’ll look into right now, but I may post more about it if I find anything interesting.

I’d be glad to share the data on request.

We the People is 10% a Sham

We the People, one of the White House’s cornerstone open government initiatives, is 10% a sham. The site promises to respond to petitions posted by users if the petition reaches a certain number of signatures within 30 days. Nextgov reported earlier in the year that the White House was not keeping its end of the pledge. It’s true. More than 10% of petitions that deserve an answer go unanswered.

The threshold for a White House response has gone up steadily as the popularity of We the People has increased. In 2012 the threshold was 25,000 signatures within 30 days; now it is 100,000. Except when it isn’t: of the 217 petitions that gathered enough signatures for a response, 29 (about 13%) have gone unanswered.
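Whether a petition is owed a response depends on which threshold applied when it was created. Here is a minimal sketch of that check in Python. It assumes that the threshold in force on a petition’s creation date is the one that counts and that reaching the number exactly is enough; the switch dates in `THRESHOLDS` (launch on Sept 22, 2011, 25,000 from Oct 3, 2011, 100,000 from Jan 15, 2013) are assumptions for illustration, not figures from the White House’s data.

```python
from datetime import date

# ASSUMED schedule: (first creation date the threshold applies to, signatures).
THRESHOLDS = [
    (date(2011, 9, 22), 5_000),    # assumed launch date; "early" threshold
    (date(2011, 10, 3), 25_000),   # assumed date of first increase
    (date(2013, 1, 15), 100_000),  # assumed date of second increase
]


def threshold_for(created: date) -> int:
    """Return the signature threshold in force when a petition was created."""
    applicable = [count for start, count in THRESHOLDS if created >= start]
    if not applicable:
        raise ValueError(f"{created} predates the site")
    return applicable[-1]  # THRESHOLDS is sorted, so the last match is current


def owed_response(created: date, signatures_in_30_days: int) -> bool:
    """True if the petition met its threshold within its 30-day window."""
    return signatures_in_30_days >= threshold_for(created)


# A few rows from the list of unanswered petitions.
examples = [
    (date(2011, 9, 23), 8_747),   # GMO labeling
    (date(2013, 1, 12), 25_717),  # Fire Assistant U.S. Attorney Steve Heymann
    (date(2013, 7, 7), 181_479),  # Muslim Brotherhood
]
assert all(owed_response(c, s) for c, s in examples)

print(f"{29 / 217:.1%} of owed petitions unanswered")  # prints "13.4% of owed petitions unanswered"
```

Under these assumptions, every unanswered petition in the list clears the threshold that applied on its creation date.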

The full list of petitions that the White House owes a response to is below.

Two of those petitions hit close to home for the open government movement. In early 2013, Aaron Swartz, an early leader of the open government movement, committed suicide while under investigation for downloading research papers without permission. Two petitions (1, 2) were submitted shortly after his death calling for the firing of attorneys believed to be over-zealous in their prosecution of Swartz. The petitions each gathered more than the then-threshold of 25,000 signatures, but more than one year and 89,733 signatures later there has been no response.

The earliest of the petitions still “pending response,” according to the White House’s own data, is a petition about GMO food labeling, which has now gathered more than ten times the (early) 5,000-signature threshold. Another petition, created in July 2013, asked the White House to declare the Muslim Brotherhood party in Egypt a terrorist organization. It has 211,925 signatures today (181,479 in its first 30 days) and also remains unanswered.

The White House has posted 156 responses to 225 petitions since the site launched in September 2011. 2,916 petitions have been created in all. The most successful petition, by number of signatures, was one created in December 2012 asking the White House to recognize the Westboro Baptist Church as a hate group. It gathered 367,180 signatures and in July 2013 received a response to the effect that the White House does not maintain a list of hate groups — demonstrating that, of course, getting a response does not mean getting the response the petitioners wanted.

h/t and thanks to @konklone for mentioning this to me a long long while ago.

Okay, here’s the list, in order of number of signatures gathered. The date before each is the date the petition was created; the petition reached its threshold within the 30 days that followed.

07/07/13: Declare Muslim Brotherhood organization as a terrorist group (211,925 signatures; 181,479 in 30 days)

06/09/13: Pardon Edward Snowden (161,395 signatures; 129,312 in 30 days)

05/03/13: Investigate and deport Jasmine Sun who was the main suspect of a famous Thallium poison murder case (victim: Zhu Lin) in China (151,169 signatures; 148,285 in 30 days)

05/13/14: put sanctions on China for invading Vietnam territory with the deployment of oil rig Haiyang 981. (139,216 signatures; 138,878 in 30 days)

06/05/13: allow Tesla Motors to sell directly to consumers in all 50 states. (138,379 signatures; 110,384 in 30 days)

12/11/13: Remove offensive statue in Glendale, CA public park (129,170 signatures; 123,629 in 30 days)

05/01/14: Demand Release of U.S.M.C. Sgt. Tahmooressi Suffering with PTSD from Mexico Imprisonment (128,770 signatures; 115,889 in 30 days)

04/25/14: Urge S. Korean Government & Press to Stop the Attack Against Church in the Aftermath of Ferry Tragedy (119,369 signatures; 117,749 in 30 days)

08/22/13: Stop SOPA 2013 (118,905 signatures; 106,486 in 30 days)

05/15/13: Provide necessary assistance to prevent Taiwanese people from being murdered by Philippines and rebuild friendship. (115,676 signatures; 113,330 in 30 days)

02/17/14: Stop SOPA 2014. (112,293 signatures; 104,184 in 30 days)

11/12/13: Reform ECPA: Tell the Government to Get a Warrant (112,087 signatures; 105,236 in 30 days)

04/12/12: Support mandatory labeling of genetically engineered foods (GMOs). (110,784 signatures; 30,740 in 30 days)

02/26/14: Allow Ukrainian Citizens 90 day entrance into the USA on passport, without Visa. (107,909 signatures; 103,037 in 30 days)

01/04/14: Please Protect The Peace Monument in Glendale Central Library (106,751 signatures; 105,390 in 30 days)

02/27/14: Urge the FDA to Say YES to Accelerated Approval for safe, effective therapies for children with Duchenne. (106,734 signatures; 105,036 in 30 days)

04/23/14: Designate Russia as “State Sponsor of Terrorism” (104,914 signatures; 103,722 in 30 days)

03/21/14: Legally Recognize Non-Binary Genders (103,166 signatures; 101,494 in 30 days)

09/23/11: Require all Genetically Modified Foods to be labeled as such. (64,311 signatures; 8,747 in 30 days)

01/12/13: Remove United States District Attorney Carmen Ortiz from office for overreach in the case of Aaron Swartz. (60,881 signatures; 52,466 in 30 days)

05/10/12: Remove the monument and not to support any international harassment related to this issue against the people of Japan. (47,477 signatures; 31,473 in 30 days)

06/21/12: Repeal the House of Representatives Resolution 121 to stop aggravating int’l harassment by Korean propaganda & lies! (46,012 signatures; 27,623 in 30 days)

09/01/12: Persuade South Korea (the ROK) to accept Japan’s proposal on territorial dispute over islets. (42,015 signatures; 30,213 in 30 days)

12/28/12: To award the Medal of Freedom to the 4 Firefighters who were ambushed in West Webster New York on Christmas Eve 2012 (34,067 signatures; 29,322 in 30 days)

12/02/12: Investigate and publicly condemn organ harvesting from Falun Gong believers in China (33,733 signatures; 28,624 in 30 days)

01/08/13: Invite Neal Boortz, the author of The FairTax Book, to spend one hour talking with the President about tax reform. (32,155 signatures; 28,191 in 30 days)

12/11/12: oppose the petition created by “Hisa A” on Japan’s proposal to take Japan’s claim over Dokdo (or Takeshima) to the ICJ. (31,609 signatures; 28,959 in 30 days)

01/12/13: Fire Assistant U.S. Attorney Steve Heymann. (28,854 signatures; 25,717 in 30 days)

12/29/12: There are election rigging made by Progressive Program that have been used in the 18th Presidential Election of S. KOREA (26,797 signatures; 25,467 in 30 days)