Background: Do I need a lawyer to hack?
Back in September I asked “Do I need a lawyer to hack in DC?” on the Code for DC blog. I had discovered that in exchange for access to the District’s data, civic hackers (including myself) were agreeing to very odd terms including not taking any legal action against the District. Imagine if the data reveals actual injustice. We’d have given up the right to use the legal system to make things right! See the Code for DC post for more on why I think these terms were bad policy, but in short: data isn’t “open” if it can only be used on capricious terms. Open government data must be license-free.
What’s been revised since then
The District’s Office of the Chief Technology Officer (OCTO) immediately engaged with me, Code for DC, and others in the open government community to fix these problems. To their credit, several OCTO staff members spent several hours talking through these issues with me on multiple occasions. They have really been putting in the effort to get this all right.
Little more than a week after my blog post, DC posted its first update to the terms, which Alex Howard covered here. That update removed two of the clauses that I noted were problematic:
- the agreement not to take legal action against the District
- the indemnification clause
The removal of those two clauses was a major improvement. But the rest of the updated terms, in the parts I cared about, were incoherent. They had intended to retain a requirement to attribute the District in all uses of District data, they explained to me, but the legal language they used to say it made no sense.
In a new update to the terms posted today, which followed additional conversations with OCTO, there were two more great improvements. These terms were finally dropped:
- agreeing to follow all “rules”, a very ambiguous term
- the requirement to attribute the data to the District in all uses of the data (it’s now merely a suggestion)
The removal of these two requirements, in combination with the two removed in September, makes this a very important step forward.
One of my original concerns remains, however, and that is that the District has not granted anyone a copyright license to use District datasets. Data per se isn’t protected by copyright law, but the way a dataset is presented may be. The District has claimed copyright over its things before, and it remains risky to use District datasets without a copyright license. Both the September update and today’s update attempted to address this concern but each created more confusion than there was before.
Although today’s update mentions the CC0 public domain dedication, which would be the correct way to make the District data available, it also explicitly says that the District retains copyright:
- The terms say, at the top, that they “apply only to . . . non-copyrightable information.” The whole point is that we need a license to use the aspects of the datasets that are copyrighted by the District.
- Later on, the terms read: “Any copyrighted or trademarked content included on these Sites retains that copyright or trademark protection.” Again, this says that the District retains copyright.
- And: “You must secure permission for reuse of copyrighted … content,” which, as written (but probably not intended), seems to say that to the extent the District datasets are copyrighted, data users must seek permission before using them. (This has other problems too, like side-stepping “fair use” in copyright law.)
With respect to the copyright question, the new terms document is a step backward because it may confuse data users into thinking the datasets have been dedicated to the public domain when in fact they haven’t been.