July 2007 – Joshua Tauberer's Archived Blog

S. 1 passes House

S. 1: Commission to Strengthen Confidence in Congress Act of 2007 passed the House today by a ridiculously wide margin, and so I’ll eat my hat a bit for all of the “Congress isn’t doing anything”-type complaining I’ve done recently. The bill previously passed in the Senate, also by a wide margin.

Either the Times or I am a bit confused about what happens next. The Times says the bill moves to the Senate next. It goes back to the Senate only if the House made any changes, and I don’t see that there were any amendments as of yesterday. The text of the bill as passed by the House is also not available yet from GPO (it should be up tomorrow morning), so we’ll have to wait to see just what exactly the House voted on.

I’ll try to post a synopsis of the bill once I see the text tomorrow.

Congressional video from the trenches

While much of our report was written by people who, at least currently, are in the game of spreading information in an issue-neutral way (that is, information for the sake of information), it’s always nice to hear that those who are in the game of policy come to (at least some of) the same conclusions we did about congressional websites. Last week I asked my good friend Aliza, who is interning at the Community Food Security Coalition, what she thought of committee websites:

today is the mark up in the House of the farm bill, and my boss and i have noticed something frustrating about this sort of thing, at least for the agriculture committee, which is that they don’t archive the webcasts (or audio) of mark up and hearings- you can watch them live, which is great, and we do, but they should be archived.
…
it’s probably good, also, to have more of a searchable record of things that have been said.

As we mentioned in our report (both in the committees and video sections, iirc), the availability of webcasts is highly variable from committee to committee.

For the record, after my post a while back about Glenn Beck’s position on removing video cameras from the Capitol, some of the responses on the mailing list made me realize that there are two ways you could go. Suppose the current level of cameras in committees induces political posturing at the expense of real legislating, because cameras that broadcast to a wide audience are relatively rare and politicians want to take advantage of the air time. Congress could then either remove cameras completely, as Glenn Beck suggested, or put cameras in all open meetings, as we suggested in our report, properly streamed and archived on the web. With every meeting a chance to posture, the incentive to take advantage of any one particular meeting for posturing is reduced.

Leg. database: Some progress

A couple of weeks ago I was fortunate to get in touch with three people at the Library of Congress, including the LOC’s director of web services (if I’m getting the title right), who took some time to talk to me about the information flow in the LOC that gets legislative information into the website THOMAS and the Capitol-internal Legislative Information System (LIS). (Regular readers will know that my interest here is the recommendation in the first chapter of our report: having the LOC publish a public, raw database of legislative information as a basis for other websites to present new views into the legislative process.)

The purpose of the conference call was for me to get a better understanding of the feasibility of our recommendation. One thing I learned is that the legislative information that powers THOMAS and LIS is currently in a regular (Oracle) database (no surprise there, of course), but that they have been in the process of converting their database to XML, which is a good, modern way of storing this type of information. The switchover internally to XML has been going on for a few years and is virtually finished. (Now knowing this, I have to note a correction for our report, in which I believe we presupposed the existence of a (finished) XML database based on the information we had.) From a general public-interest perspective, it’s nice to see the LOC upgrading to the latest and greatest way of doing things technologically. From the perspective of our report, the fact that the LOC will have an internal XML database of legislative information would seem to confirm that creating a public database of legislative information is, from a purely technological perspective, entirely straightforward for them.

The primary concern at the LOC in terms of providing new services like a public XML database, I was told, is ensuring the accuracy of the information they make available. So, it’s really a non-technological issue that is the most prominent (as we probably should expect). As far as I could tell, this was the only major concern about providing an XML database to the public.

I learned various other things:

In my last post here I wrote about the bill XML files that the House publishes publicly. I was told that the Senate is drafting some 90% of their bills in XML, though that was given as a rough guess. By my own count, the House is drafting around 97% of their bills in XML, and the Senate could, for all I know, be reaching that level too. The Senate XML files are not public yet, apparently (as far as the LOC guys knew) because the Senate folks were not completely sure yet whether the XML files were rendering correctly in an official, print-ready format (when used with the official stylesheet). The LOC guys said that, as a hopeful guess, Senate XML files could become public in the next year.

The LOC is looking into adding RSS feeds to THOMAS, a feature that I’ve seen on people’s wish-lists a few times. I don’t know how comprehensive the feeds will be, and I don’t believe the LOC has decided that yet either. They’re looking into what the infrastructural requirements will be (such as the impact of the additional load on their servers — which I know from the feeds on GovTrack is not a trivial concern).

One other thing I learned was roughly how many people work on THOMAS. Because the people who work on THOMAS are also working on other things for the LOC, there’s no exact number that could be given. People in a variety of roles are involved in THOMAS, such as design, development, management, quality assurance, and sys. administration. I got the initial impression that if you had to express it as a hypothetical number of people working full time solely on THOMAS, it would be somewhere in the ballpark of 2 to 5. But that’s my number, not the LOC’s, and I mention it only to relay my sense of how many resources THOMAS represents now.

I don’t have much in the way of concluding remarks for this post, so take all of that for what it’s worth. (And thanks again to the guys at the LOC for talking to me.)

Legislative XML: What we have and what we’re seeking

John asked me to clarify a bit what legislative information exists in XML and what more would be a good idea for Congress to provide (this is the subject of the first chapter of our report, Legislative Databases).

What exists now, publicly, is an XML markup of the text of some legislation. First the counts, and then I’ll explain what’s in these files. The House has revised its bill drafting process, and, by my count, currently 97% of House bills (3481 out of the 3558 so far this year) are prepared and published publicly as XML. The Senate, I am told, is well into the process of using a similar (or the same?) system, but they are not yet ready to make their XML files available to the public, so there are no such files for Senate bills. (Thus, XML bill files are available for 63% of the bills introduced this year so far.) Also, since this process is relatively new, the availability of XML for bills only goes back to 2003. The files are available on THOMAS (here or by clicking the XML Display link on bill text pages, where available), and are described at http://xml.house.gov.
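To make the counts above concrete, here is a small arithmetic sketch in Python. The two House counts come straight from the paragraph above. The implied total is only a rough back-of-the-envelope figure: it assumes that no Senate bills have XML and that the 63% figure is exact rather than rounded, neither of which I can guarantee.

```python
# Counts quoted above (House bills so far this year).
house_with_xml = 3481
house_total = 3558

# Share of House bills published as XML.
house_share = house_with_xml / house_total
print(f"House XML coverage: {house_share:.1%}")

# Rough estimate only: if 63% of ALL bills have XML and every XML file
# is a House bill, the total number of bills introduced is about this.
overall_share = 0.63  # quoted figure, treated as exact (an assumption)
implied_total = house_with_xml / overall_share
print(f"Implied total bills introduced: about {implied_total:.0f}")
```

The House share works out to a little under 98%, which is consistent with the "97%" quoted above if that figure was rounded down.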

XML is a type of structured data format that our report urges using in a few sections, such as for publishing committee schedules (e.g. in RSS, a flavor of XML). But the potential uses of XML depend wholly on what information you encode in the XML.
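As an illustration of what a committee schedule in RSS might look like, here is a minimal Python sketch that builds an RSS 2.0 feed with the standard library. The meeting titles, dates, and example.org URLs are made up for the example, and the function name `build_schedule_feed` is my own. No committee publishes exactly this.

```python
import xml.etree.ElementTree as ET

def build_schedule_feed(meetings):
    """Return an RSS 2.0 document (as a string) listing committee meetings."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = "Example Committee Schedule"
    ET.SubElement(channel, "link").text = "http://example.org/committee/schedule"
    ET.SubElement(channel, "description").text = "Upcoming hearings and markups"
    for meeting in meetings:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = meeting["title"]
        ET.SubElement(item, "link").text = meeting["link"]
        ET.SubElement(item, "pubDate").text = meeting["date"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical data, for illustration only.
meetings = [
    {"title": "Markup: example farm bill",
     "link": "http://example.org/committee/meetings/1",
     "date": "Tue, 17 Jul 2007 10:00:00 -0400"},
    {"title": "Hearing: example oversight topic",
     "link": "http://example.org/committee/meetings/2",
     "date": "Thu, 19 Jul 2007 14:00:00 -0400"},
]

feed_xml = build_schedule_feed(meetings)
print(feed_xml)
```

Any feed reader could subscribe to a file like this, which is the point: the schedule is encoded once and reused by whatever software the public prefers.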

These bill XML files are markups of the text of the bill, in a structured data format, which means that the organization of the bill into titles, sections, paragraphs, etc. and other formatting considerations like quoted text are explicitly represented. This is useful when applications want to control how the text of a bill is formatted, rather than using the GPO’s PDF (i.e. display-exactly-as-it-prints) or text-only versions (i.e. no formatting allowed), which don’t look nice if you try to embed them in a web page. For instance, the markup in bill XML files is probably how Sunlight’s LOUIS renders the text of bills in a visually pleasing way. Marking up the text this way also makes it more readily possible to write applications that tag certain sections with annotations, like an “earmark guide” of some sort. Additionally, references to Members of Congress, like those in the list of a bill’s sponsors, and references to existing law (by name and U.S.C. references) are marked up in the XML files. This means that sites like LOUIS can (and they do) make those words hyperlinks to relevant information about the people or laws, something you can’t get from the PDFs or text form of bills. The benefit of the existing files to the public has primarily to do with making a user-friendly display of the text, as well as indexing and searching the text.
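Here is a rough Python sketch of how an application might pull the section outline and the marked-up references out of such a file. The sample document is invented, and its element and attribute names (`section`, `enum`, `header`, `sponsor`, `name-id`, `external-xref`, `parsable-cite`) are only loosely modeled on the House format. Treat them as assumptions and check the documentation at http://xml.house.gov before relying on them.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumptions, not the official schema.
SAMPLE_BILL = """\
<bill>
  <form>
    <action>
      <action-desc><sponsor name-id="X000001">Mr. Example</sponsor> introduced the following bill</action-desc>
    </action>
  </form>
  <legis-body>
    <section>
      <enum>1.</enum>
      <header>Short title</header>
      <text>This Act may be cited as the <quote>Example Act</quote>.</text>
    </section>
    <section>
      <enum>2.</enum>
      <header>Amendment to existing law</header>
      <text>Section 552 of <external-xref legal-doc="usc" parsable-cite="usc/5/552">title 5, United States Code</external-xref>, is amended.</text>
    </section>
  </legis-body>
</bill>
"""

def outline(bill_root):
    """List (number, heading) for every section, in document order."""
    return [(s.findtext("enum"), s.findtext("header"))
            for s in bill_root.iter("section")]

def references(bill_root):
    """Collect marked-up sponsors and citations to existing law."""
    sponsors = [(el.get("name-id"), el.text) for el in bill_root.iter("sponsor")]
    cites = [el.get("parsable-cite") for el in bill_root.iter("external-xref")]
    return sponsors, cites

root = ET.fromstring(SAMPLE_BILL)
print(outline(root))
print(references(root))
```

A site could turn each sponsor identifier or citation string into a hyperlink, which is the kind of thing the PDF and plain-text versions don't allow.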

What the House has done for bill XML is useful and important, as can be seen from LOUIS’s use of it to make reading bills easier (than it could have done without XML). However, there is much more information about legislation than the text, and that information is also very important to the public. That is what our report urges the House to make available in a structured data format.

This additional information — what is called bill summary and status at the Library of Congress — is made available to the public through the THOMAS website (administered by the LOC). THOMAS has this information going back to 1989, for every bill. However, that information is not made available in a structured data format, limiting the ability of the public to reuse, transform, and mix it to create new views into the Congress. And that is what we’re asking for.

That information includes (besides what is in the existing bill XML files) CRS summaries of bills, a list of every action taken on each bill (votes, motions, referrals, etc.), a list of all titles a bill goes by, committee assignments, a list of related bills, a list of amendments on the bills (incl. title, sponsor, and legislative activity on them), a list of (LIV) subject terms assigned to the bill by CRS (which is very helpful for the public), and related committee documents.
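To give a sense of what a structured representation of that list could look like, here is a hypothetical Python sketch. The class and field names are mine, not the LOC’s, and the sample record is invented. It only shows that once the information is structured, questions like "what was the last action on this bill?" become one-liners.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Action:
    date: str  # ISO date, e.g. "2007-07-17", so string order is date order
    kind: str  # e.g. "referral", "vote", "motion"
    text: str

@dataclass
class Amendment:
    title: str
    sponsor: str
    actions: List[Action] = field(default_factory=list)

@dataclass
class BillStatus:
    number: str
    titles: List[str]
    crs_summary: str = ""
    actions: List[Action] = field(default_factory=list)
    committees: List[str] = field(default_factory=list)
    related_bills: List[str] = field(default_factory=list)
    amendments: List[Amendment] = field(default_factory=list)
    subject_terms: List[str] = field(default_factory=list)
    committee_documents: List[str] = field(default_factory=list)

def latest_action(bill: BillStatus) -> Optional[Action]:
    """Return the most recent action, or None if there are none."""
    return max(bill.actions, key=lambda a: a.date) if bill.actions else None

def bills_with_subject(bills: List[BillStatus], term: str) -> List[str]:
    """Return the numbers of bills tagged with the given subject term."""
    return [b.number for b in bills if term in b.subject_terms]

# Invented record, for illustration only.
example = BillStatus(
    number="H.R. 0000",
    titles=["Example Act of 2007"],
    actions=[
        Action("2007-01-10", "referral", "Referred to an example committee."),
        Action("2007-03-02", "vote", "Passed by recorded vote."),
    ],
    committees=["Example Committee"],
    subject_terms=["Agriculture"],
)

print(latest_action(example))
print(bills_with_subject([example], "Agriculture"))
```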

The information on THOMAS is really crucial for tracking legislation, especially the list of legislative actions (like votes) and amendments. Without it, that is, with just the text of legislation, the public gets a very static view of the process, and is left in the hands of the media to be told whether a bill passed or what committees are responsible for it.

Our recommendation in the report was thus that to the extent such a database already exists in the Library of Congress (it does — it’s how the THOMAS website manages to exist at all), giving the public a structured data representation of the database should be easy, relatively noncontroversial, and a big tip of the hat to transparency.
