The new world of government transparency through technology

The big news lately is that the Center for Responsive Politics opened up their large database of normalized campaign contribution records under a Creative Commons license. I think this is more significant to the world of government transparency & technology than it might appear. Just around five years ago this world was quite different. Organizations like CRP were very much using technology to bring new insight to civics. That hasn’t changed. But organizations saw themselves as solitary entities whose primary mission was to provide a new direct-to-citizen service to the public. A web application, for instance. There’s no need for me to list off other examples — every advocacy and government transparency website was like that, to the best of my recollection. (Except maybe IMSP who seemed to be ahead of the pack.)

All that has changed, and I wish I could pinpoint exactly how that happened. The combination of “Web 2.0” as a buzz-word and grassroots digital campaigning in 2004 probably had a lot to do with it. The Howard Dean presidential campaign got a boost (at least in terms of publicity if not poll numbers) from developers coming together to specialize the Drupal open source CMS for political campaigning (“CivicSpace”). That sent a message, even if no one quite recognized it at the time, that developers have a role to play in the world of civics and that cooperation was a viable model for getting things done. Not to say that the CivicSpace project invented this — I had been working on GovTrack for a few years by that point, and across the pond Tom Steinberg and the mySociety group had been thinking about open source civics for even longer. But I suspect, even in my own thinking, that CivicSpace crystallized some vague earlier notions of civic hacking.

The story isn’t over yet, though, because I don’t think any of this alone would have brought us to where we are today. Unfortunately, from this point forward I run the risk of giving too much credit to the things I know about and not enough credit elsewhere. Still, here’s how I see it. Four more things had to happen, independently. First, entrepreneur Mike Klein had to make a lot a lot a lot of money. Second, Dan Newman and David Moore had to build MAPLight.org and OpenCongress, respectively. These are now, and especially were at the start, leading examples of how you can do really cool new things by mixing data sources (MAPLight mixing my GovTrack legislation data with campaign contribution data from CRP) or re-mixing data sources (OpenCongress giving my legislation data a more social spin). Third, John Wonderlich had to start, quite by accident, the Open House Project — a crucial step in bridging the technology world with staffers for congressmen, especially with Speaker Pelosi’s office. The fourth bit was that Ellen Miller and Micah Sifry had to put it all together and form the Sunlight Foundation: funding from Mike going to two great technology projects (IMO these are Sunlight’s most important grantees) and a policy arm with teeth because of its pragmatic approach to connecting with policymakers.
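To make the data-mixing idea concrete, here is a toy sketch in Python — with invented record shapes and member IDs, not MAPLight’s actual pipeline or the real GovTrack/CRP schemas — of joining legislation records to contribution records by legislator:

```python
# Toy sketch of mixing two transparency datasets, in the spirit of
# MAPLight: join bill support records (GovTrack-style) with campaign
# contribution records (CRP-style) by legislator. All record shapes
# and IDs below are hypothetical, for illustration only.

from collections import defaultdict

# Hypothetical legislation data: who supported which bill.
bill_positions = [
    {"bill": "H.R. 1105", "legislator": "A000001", "position": "aye"},
    {"bill": "H.R. 1105", "legislator": "B000002", "position": "nay"},
]

# Hypothetical contribution data: amounts received per legislator.
contributions = [
    {"legislator": "A000001", "industry": "Finance", "amount": 12000},
    {"legislator": "A000001", "industry": "Energy", "amount": 5000},
    {"legislator": "B000002", "industry": "Finance", "amount": 3000},
]

def money_by_position(bill, positions, contribs):
    """Total contributions received by supporters vs. opponents of a bill."""
    totals_by_member = defaultdict(int)
    for c in contribs:
        totals_by_member[c["legislator"]] += c["amount"]
    result = defaultdict(int)
    for p in positions:
        if p["bill"] == bill:
            result[p["position"]] += totals_by_member[p["legislator"]]
    return dict(result)

print(money_by_position("H.R. 1105", bill_positions, contributions))
# {'aye': 17000, 'nay': 3000}
```

The interesting part is just the join key: once two independently collected datasets share an identifier for legislators, questions neither dataset can answer alone fall out of a simple aggregation.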

That’s pretty much it, because from there things just make sense. Sunlight recruited great staff and steamrolled through the open government world stamping out the idea that each open government group should be in its own little world — by funding interaction, in a sense.

The expectations for government transparency advocacy changed. Groups had to walk the walk a bit more by sharing and collaborating. So now besides CRP’s data being opened up for anyone to remix we have the Taxpayers for Common Sense earmark data, the Sunlight Labs API, the MAPLight API, and probably several more databases. The New York Times API probably owes some of its inspiration to these changing expectations too. So it’s a whole new world now of not just open government, and not even just open government data, but open government transparency advocacy data. (Is there a catchier name for that?)

Update on bulk data from Congress

One of the Open House Project’s recommendations was that Congress share its legislative data with the public in bulk, and I’ve had a long history of posts on the subject. Over at the Free Gov info blog (link), Bob Tapella, Public Printer at the Government Printing Office, tells us that they are responding to this recommendation. He writes in a comment (presumably it is really him):

We have recently been called upon by Congress in the joint explanatory statement on the H.R. 1105, to work with the Library of Congress, including the Congressional Research Service, and the Law Library of Congress, to discuss access to bulk data. Specifically, the language is as follows:

[JT: omitted — I’ve posted it before here]

To address this request, a Legislative branch task force has been assembled consisting of representatives from the offices of the Secretary of the Senate, the Clerk of the House, the Library of Congress, Congressional Research Service, the Law Library of Congress, and GPO. This task force has already met and is working to develop a position on access to bulk data. We will look to this work and the review by Congress to help guide our work on making bulk data accessible.


Check me out: My talk at Berkeley’s Free Culture Conference last year

Watch a video of my talk at the Free Culture Conference last year on Civic Hacking. (Text and slides here.) It was my best talk yet. I’ve got another good one (if I do say so myself) coming up at CITP’s Studying Society in a Digital World conference in a few weeks at Princeton.


An OLE DB provider for SPARQL

Andy Gueritz announced on the mailing list for my SemWeb RDF library for .NET that he has created an OLE DB provider for a SPARQL endpoint that is usable in Microsoft Excel. He wrote,

In a moment of insanity (but a great learning experience), I gave myself the challenge of writing an OLE DB provider for SPARQL. It is built on top of the SemWeb library which has saved a substantial amount of effort and also brings some powerful functionality to the table very quickly (Thanks, Joshua!)

The provider as constructed implements a read-only OLE DB provider that supports all four SPARQL query types and interfaces to SemWeb through a COM-Callable Wrapper. It is not extensively tested yet but seems to work with most of the queries I have now put through it, and of course being built on SemWeb it is able to read both local and remote SPARQL sources.

Moral of the story: populate Excel tables with SPARQL queries.
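The core of that tabular mapping can be sketched quickly — here in Python rather than the provider’s actual C# code, and with a made-up example.org dataset — by flattening a SPARQL SELECT result, in the standard application/sparql-results+json serialization, into a header row plus data rows a spreadsheet could consume:

```python
# Sketch of the flattening an OLE DB provider for SPARQL must perform:
# a SELECT result's variable bindings become a rectangular table.
# The query result below is hand-written sample data in the W3C
# SPARQL Query Results JSON format; the URIs are hypothetical.

import json

sample = json.loads("""
{
  "head": {"vars": ["bill", "title"]},
  "results": {"bindings": [
    {"bill": {"type": "uri", "value": "http://example.org/bill/hr1105"},
     "title": {"type": "literal", "value": "Omnibus Appropriations Act"}},
    {"bill": {"type": "uri", "value": "http://example.org/bill/hr1"}}
  ]}
}
""")

def to_table(results):
    """Turn SPARQL SELECT JSON results into (header, rows).

    SPARQL allows a variable to be unbound in any given solution,
    so missing cells are filled with empty strings to keep the
    table rectangular.
    """
    header = results["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in header]
        for binding in results["results"]["bindings"]
    ]
    return header, rows

header, rows = to_table(sample)
print(header)   # ['bill', 'title']
print(rows[1])  # ['http://example.org/bill/hr1', '']
```

The only real design decision is what to do with unbound variables; padding with empty cells (as above) is the natural choice for a read-only table consumer like Excel.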

More here.

Try hacking for government transparency in GSoC

Does the thought of “hacking Congress” entice you? I don’t mean breaking into U.S. Capitol servers, of course, but putting your l33t hacking skillz to use to improve government transparency and civic engagement. The Sunlight Foundation (I have no affiliation) is a mentoring organization in Google Summer of Code 2009. Check it out.

Shameless plug: