The legislative data dance is a song that never ends

The House Appropriations committee passed up another chance to advance core transparency practices in Congress. In a draft report published this morning for FY2014 appropriations, the committee makes no mention of legislative data. And in the Bulk Data Task Force’s finally-released recommendations, the Library of Congress gets all worked up over something no one has been asking for.

Here’s the short of it. Can we get a spreadsheet simply listing all bills in Congress? Is that so hard? I guess so.

After last year’s legislative branch appropriations bill report said the committee was “concerned” that the public would misuse any bulk data downloads, The Washington Post covered how the public uses this sort of data for good, and House leadership formed a Bulk Data Task Force to consider if and how to make bulk legislative data available. That task force submitted its recommendations to the House Appropriations committee last December, but they were only made available to the public last week (see this, page 679).

In the recommendations, the task force noted that it had begun several new transparency projects. One is the Bill Summaries project, in which the Library of Congress will begin to publish the summaries of House bills written by the Congressional Research Service (CRS) in some structured way. The Library of Congress’s report to the task force has some choice quotes:

“some groups may try to leverage this action to drive demand for public dissemination of CRS reports”  (Note that “CRS reports” are different from “CRS summaries.” That’s a whole other can of worms.)

“CRS could find itself . . . needing to clarify misrepresentations made by non-congressional actors”

“if there is an obligation to inform the general public to the risks of non-authoritative versions of the information, it has not been included in the estimates”

These CRS summaries have already been widely distributed… on GovTrack… for nearly a decade. (And, I’m sorry, but what risks am I causing?) And while I wouldn’t mind having the summaries easier to get from the Library, I certainly am not gunning for them. I want data like the list of cosponsors, what activities bills have gone through, or just a simple list of bills. If the Library thought this wasn’t a great place to start with bulk data, well, I couldn’t agree more!
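To make "a simple list of bills" concrete, here is a minimal sketch of the kind of flat file I have in mind. The column names and the sample rows are placeholders invented for illustration; they are not real bills, and nothing here reflects an official schema from the Library or anyone else.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
FIELDS = ["congress", "bill_type", "number", "title",
          "sponsor", "cosponsors", "latest_action"]

# Placeholder rows invented for this sketch; these are not real bills.
SAMPLE_BILLS = [
    {"congress": 113, "bill_type": "hr", "number": 1,
     "title": "Example Act A", "sponsor": "Member X",
     "cosponsors": ["Member Y", "Member Z"],
     "latest_action": "Referred to committee"},
    {"congress": 113, "bill_type": "s", "number": 2,
     "title": "Example Act B", "sponsor": "Member W",
     "cosponsors": [],
     "latest_action": "Passed Senate"},
]


def bills_to_csv(bills):
    """Return the bills as CSV text, one row per bill."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for bill in bills:
        row = dict(bill)
        # Flatten the cosponsor list into a single spreadsheet cell.
        row["cosponsors"] = "; ".join(bill["cosponsors"])
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(bills_to_csv(SAMPLE_BILLS), end="")
```

One row per bill, with cosponsors and the latest action alongside it, is all it would take to open the file in a spreadsheet.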

Some of the other projects mentioned in the recommendations are indeed very useful (some of which I wrote about here). Others, however, touted bulk data success without making any new data available. In the recommendations’ meeting minutes in the appendix, the task force wrote that it discussed “what data is available on GovTrack compared to what would be available through the proposed GPO project.” Quite a bit! That proposed GPO project turned into the one that made no new data available. In their next meeting they met with me and folks from other groups (Sunlight, Cornell LII, and so on), but I don’t recall them asking me the question they posed the week before, oddly.

The other projects mentioned in the bulk data task force recommendations are:

  • Congress.gov, THOMAS’s upgrade, which is explicitly not providing any bulk data (except perhaps through the new Bill Summaries Project)
  • Member Data Update: The Clerk’s list of Members of the House now includes Bioguide IDs, which is fantastic and very helpful.
  • A new House History website launched or will launch. See, I don’t even know. Again, not bulk data.
  • Docs.House.Gov: Committee schedules and documents have been added. (Great! I’m using that data on GovTrack already.)
  • New XML data for House floor activity. (This is pretty interesting but a little disorganized. I would rather scrape THOMAS than use this XML data.)
  • The Clerk is launching a Twitter account. (No data here.)
  • HouseLive speaker search. (Searching videos. Data? Who knows.)
  • Stock Act public data disclosure.
  • Legislative Data Dashboard (not quite sure what this is).
  • Converting the United States Code to XML. (This is a big and commendable project.)
  • A contest to get the public to convert bills to the Akoma Ntoso XML data format. (Does not count as open government data if the public has to do the work.)
  • Replacing MicroComp (an old bill/report text drafting tool?).
  • Positive Law Codification (when did that become in scope for this task force?).
  • Editorial Updating System (no idea what this is).

So while the recommendations support the use of legislative data generally, they set no long-term goals for broad access to the legislative data on THOMAS. And as for the only data in motion now, the Library of Congress appears not to be happy about making it widely available.

The committee report for the annual legislative branch appropriations bill, which kicked off the task force last year, has been an important document for legislative transparency in the past. Besides last year’s step backwards, in 2009 the report indicated the House supported “bulk data downloads” for the bill status information on THOMAS.gov, though nothing came of it. This year the committee said nothing, so, well, I guess nothing will come of it either.

3 thoughts on “The legislative data dance is a song that never ends”

  1. Josh:

    Some of this is a bigger deal than it looks, although I hasten to add that we’ve had an XML version of the US Code since 2001, and a *good* XML version since 2004 ;). The stuff about positive-law codification and editorial support is in there for internal consumption and because it’s part of the same contract that is providing for the XML conversion. I would expect a few beneficial side effects from it, if only because the data architecture needed for it will, properly done, provide a foundation for a future point-in-time system.

    As for abandoning Microcomp, sometimes known as “locator code” or “bell code” data, it’s a very big deal indeed from our perspective, if only because of what will replace it. With some luck, it will be a combination of XML and XSL/FO. An insane amount of stuff is currently produced in Microcomp (I’m looking at *you*, Congressional Record) and there is thus the potential for publishing that data in a much more open standard.

    As to all the stuff that ain’t happening and the language about “misuse”, I suspect that just boils down to the idea that somebody might use voting data for opposition research in a primary fight, and on that subject I have only one thing to say:

    t.


  2. Joshua,

    1. Regarding the CRS report summaries, I am puzzled about that ominous-sounding language, “misrepresentations made by non-congressional actors”! As an inconsequential, utterly politically (and otherwise; I am an impoverished Jewish widow) unattached member of the public citizenry, I am happy to get access to anything that I can from the CRS. I am curious about this:

    “These CRS summaries have already been widely distributed… on GovTrack… for nearly a decade… what risks am I causing?”

    I am not concerned that you are causing any risks! However, I do not understand why you and your website would be the sole online vessel to make Congressional Research Service report summaries available to the public. No, I don’t think you are hauling in millions of dollars of Google AdSense or Double Click revenue with banner display advertising! It just seems like a service that should be provided directly by the federal government, or through you as a contractor, presumably with a competitive bidding process and periodic renewals over the past ten years. Maybe that is how it does work.

    2. “Stock Act public data disclosure” Lol! TechDirt et al. should have fun with that. Current status in those parts is that those records are only available as hard copy, stored in the basement of a D.C. building, and require appointment arrangements to view. I refer to current records, not archives.

    3. “A contest to get the public to convert bills to the Akoma Ntoso XML data format” I saw that, when it was announced! WHY do we need to convert U.S. bills to that format? There were three individuals running the contest. One was a female economics professor in Italy (Padua maybe? it was nice to see a woman as an economist AND data contest judge!), another was Italian, or maybe from Uganda.

    I read up on the Akoma Ntoso XML data format, via their website and so forth. I couldn’t fathom the rationale. Interesting minor point: I saw reference to the format via the EPA reg Twitter account, just a few days ago!

    I wanted to tell you about another XML U.S. open data source, but this comment is too lengthy as is. If it isn’t tossed in the spam bin, I’ll return with that URL.

    Ellie K

    P.S. Do you accept HTML tags for unstylish women like me? It would be appreciated.


  3. Hi, Ellie. I’ll try to respond to some of your points:

    1. CRS reports and CRS summaries are actually two separate things. The reports are non-public. I don’t have them. I’m agnostic about whether they should be public.

    The CRS summaries are bill summaries, and they can be found on Congress.gov. They’re public. Many websites including GovTrack copy and distribute them. What’s new is that the Library is working on publishing them in a structured data format, which will make reuse easier for everyone. That will be all public too. I’m not asking for any special access to information. Just the opposite.

    So what I was saying was that the Library was worried about the risk of providing the public with public information that they’ve been providing and I’ve been re-providing for a long time. If they think what I’m doing is a public hazard of some sort and that I should turn off GovTrack, they should come out and say it. That’s what I was complaining about.

    3. I don’t know either. I think Akoma Ntoso is something of an interesting research project for them. I don’t think this is getting in the way of open data, but it just looks funny.

    Thanks for writing.

