We’re typing away

In case you’re wondering why it’s been a little quiet on the blog and the mail list recently, we’ve begun drafting our report. The group has split into smaller teams, each working on a different section. I’ve been working on the section that will recommend using structured data formats to publish the status of legislation, and it’s coming along nicely thanks to help from several others. We should finish our section in the next few days, and I’m looking forward to seeing what the other sections look like.
This probably puts us a little behind schedule for issuing the report this month, but I thought that target was too optimistic all along. The extra time that has gone into gathering background facts and carefully hashing out the arguments for and against each recommendation has been incredibly useful.

Committee Feeds: Some Recommendations

When creating an RSS feed, you should keep two audiences in mind: first, the human readers who will subscribe with their news readers; second, the computers that will mix your feed with others and transform it into other formats (iCal feeds, content embedded in a website, etc.). For committees in Congress, here are some dos and don’ts with regard to the second audience.

For announcing committee hearings, here’s the least information-rich way to publish a feed. (Apologies for using square brackets instead of angle brackets for the XML tags; I can’t seem to get those to work with WordPress.)

[pubDate](date that the hearing was announced)[/pubDate]

[description]The committee on whatever will meet next Tuesday in our usual meeting location to discuss the terrorism prevention bill.[/description]

A computer can get almost nothing out of that: the hearing’s date, location, and subject are all buried in free text, so it would be nearly impossible to transform that feed for any other use. Better would be this:


[name]Committee on …[/name]

[date](date & time of the hearing in a standard date/time encoding)[/date]
[description](same as description field above)[/description]
[location]Longworth whatever[/location]
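[subject]Terrorism bills[/subject]
[related-bill](an unambiguous bill identifier that includes the Congress, not just the bill number)[/related-bill]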

This structure separates out all of the details into distinct fields with clear formats. Converting this feed into iCal format so that someone could integrate the events into their calendaring program would be straightforward.
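To make the iCal point concrete, here is a minimal Python sketch of that conversion. It assumes the feed uses real angle-bracket XML with the element names from the example above, wrapped in hypothetical feed and hearing elements, and that the date is ISO 8601 with a UTC offset. The committee name, room number, and the example.invalid UID domain are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical feed: element names follow the example above; the <feed> and
# <hearing> wrappers, committee name, and room number are made up.
SAMPLE = """<feed>
  <name>Committee on Example Affairs</name>
  <hearing>
    <date>2007-03-20T10:00:00-04:00</date>
    <description>The committee will meet to discuss the terrorism prevention bill.</description>
    <location>Longworth, Room 1234</location>
    <subject>Terrorism bills</subject>
  </hearing>
</feed>"""


def ical_escape(text: str) -> str:
    """Escape an iCalendar TEXT value (backslash, semicolon, comma, newline)."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def ical_utc(dt: datetime) -> str:
    """Render an offset-aware datetime as an iCalendar UTC DATE-TIME."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def fold(line: str, limit: int = 75) -> str:
    """Fold a content line to at most `limit` characters per physical line
    (RFC 5545 counts octets; this sketch assumes ASCII text)."""
    parts, rest = [line[:limit]], line[limit:]
    while rest:
        parts.append(" " + rest[:limit - 1])
        rest = rest[limit - 1:]
    return "\r\n".join(parts)


def feed_to_ical(feed_xml: str, stamp: datetime) -> str:
    """Turn each <hearing> in the feed into a VEVENT in one VCALENDAR."""
    root = ET.fromstring(feed_xml)
    committee = root.findtext("name", default="").strip()
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hearing-feed-sketch//EN"]
    for hearing in root.iter("hearing"):
        start = datetime.fromisoformat(hearing.findtext("date").strip())
        subject = hearing.findtext("subject", default="").strip()
        # A stable UID derived from committee, start time, and subject.
        uid_source = f"{committee}|{start.isoformat()}|{subject}"
        uid = hashlib.sha1(uid_source.encode("utf-8")).hexdigest()
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uid}@example.invalid",
            f"DTSTAMP:{ical_utc(stamp)}",
            f"DTSTART:{ical_utc(start)}",
            f"SUMMARY:{ical_escape(committee + ': ' + subject)}",
            f"LOCATION:{ical_escape(hearing.findtext('location', default='').strip())}",
            f"DESCRIPTION:{ical_escape(hearing.findtext('description', default='').strip())}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(fold(line) for line in lines) + "\r\n"


print(feed_to_ical(SAMPLE, datetime(2007, 3, 13, tzinfo=timezone.utc)))
```

Each separate field maps directly onto one iCalendar property, which is the whole point: with only the free-text description from the first example, there would be nothing reliable to put in DTSTART or LOCATION. A production converter would probably lean on an existing iCalendar library rather than writing the text by hand.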

Note especially the related-bill item: the bill is referred to not by name (which is ambiguous), nor even by number (like H.R. 1234, which is still ambiguous because bill numbers start over in each Congress), but in a precise format that is entirely unambiguous. That will let websites like mine cross-reference events in the feed with the bills they relate to, which makes it easier for people to follow the committee schedule.
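Why the Congress has to be part of the identifier is easy to show in code. The sketch below assumes a hypothetical format of bill type, number, and Congress (for example "hr1234-110"); that format is only an illustration, not an official or GovTrack-defined scheme.

```python
import re
from dataclasses import dataclass

# Hypothetical identifier format "<type><number>-<congress>", e.g. "hr1234-110".
# This is an illustrative assumption, not an official scheme.
BILL_ID = re.compile(r"^(hr|s|hres|sres|hjres|sjres|hconres|sconres)(\d+)-(\d+)$")


@dataclass(frozen=True)
class BillId:
    bill_type: str
    number: int
    congress: int

    @classmethod
    def parse(cls, text: str) -> "BillId":
        """Parse an identifier; reject anything that lacks the Congress."""
        m = BILL_ID.match(text.strip().lower())
        if m is None:
            raise ValueError(f"not a bill identifier: {text!r}")
        return cls(m.group(1), int(m.group(2)), int(m.group(3)))

    def __str__(self) -> str:
        return f"{self.bill_type}{self.number}-{self.congress}"


# H.R. 1234 from two different Congresses: same number, different bills.
old, new = BillId.parse("hr1234-109"), BillId.parse("hr1234-110")
print(old == new)  # False
```

Because the identifier is frozen and hashable, a site can use it directly as a dictionary key when matching feed events to its own bill records; a bare "H.R. 1234" is rejected rather than silently matched to the wrong Congress.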

Also, including metadata at the top like the actual name of the committee (not, mind you, the title of the feed, like “Hearing Schedule for the Committee….”) is not a bad idea either.

How To Take i-Transparency Seriously: Create An Office For It

You’ve read on this blog about how the House can use the Internet better to foster transparency. What the suggestions come down to is making use of the latest Internet technologies, and the new low-cost distribution network they provide, to bring as much information as possible to the public in an organized way.

With the number of separate data sets that exist (or we want to exist) about Congress, organizing them so that they are interlinked, and easily meshable, is difficult, takes real work, and needs to have some degree of coordination.

Further, as technology changes, we want Congress to continually update how it uses the Internet. Congress must make a lasting commitment to technology & transparency.

One idea to toss out there: the House should establish an “Office of Technology and Transparency” (“OTT”) within the purview of the Speaker, and put the weight of the Speaker behind it. The office should have three functions:

  1. The OTT should serve as a source of guidance for the webmasters of the various House webpages (committee websites, member pages, etc.) on how the latest Web standards, from RSS and XML to RDF, tags, trackbacks, and OPML, can be put to use. Congress should be a leader in adopting standards that foster transparency, and the OTT can help implement those standards in Congress.
  2. The OTT should serve as a liaison between the Speaker, the Library of Congress, and the House Clerk to bring entire datasets into the public realm using the latest technology standards. For example, the OTT should work with the Congressional Research Service (in the Library of Congress) to make the Legislative Information System database (which powers THOMAS) public to as great a degree as is reasonable. The OTT should also work with the maintainers of the new lobbying disclosure databases (presumably in the Clerk’s office) to ensure the databases are available to the public through a comprehensive searchable website, are highly interlinked with other data sets, and are provided as a raw database download in a modern format. In this role, the OTT serves to promote or coordinate the House’s Internet public library of data.
  3. The OTT should act as a coordinating body between distinct data sets maintained within the government to promote consistency and standards-reuse between the data sets, ensuring the data sets are maximally meshable. For instance, legislative, lobbying, and election data sets should all use a consistent identification scheme for Members of Congress.
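Here is a small Python sketch of what a consistent identification scheme buys. The three record lists, their field names, and the member IDs ("M001", "M002") are all hypothetical; the only assumption that matters is that every data set uses the same ID for the same Member, so combining them is a plain keyed merge with no fuzzy name matching.

```python
from collections import defaultdict

# Hypothetical records from three separately maintained data sets.
# Field names and IDs are illustrative; what matters is the shared member_id.
votes = [
    {"member_id": "M001", "bill": "example-bill-1", "vote": "Aye"},
    {"member_id": "M002", "bill": "example-bill-1", "vote": "No"},
]
lobbying_contacts = [
    {"member_id": "M001", "registrant": "Example Lobbying LLC"},
]
contributions = [
    {"member_id": "M001", "amount": 1000},
    {"member_id": "M001", "amount": 500},
    {"member_id": "M002", "amount": 250},
]


def mesh(votes, contacts, contributions):
    """Combine the three data sets into one profile per Member, keyed by shared ID."""
    profiles = defaultdict(
        lambda: {"votes": [], "registrants": set(), "total_contributions": 0})
    for v in votes:
        profiles[v["member_id"]]["votes"].append((v["bill"], v["vote"]))
    for c in contacts:
        profiles[c["member_id"]]["registrants"].add(c["registrant"])
    for c in contributions:
        profiles[c["member_id"]]["total_contributions"] += c["amount"]
    return dict(profiles)


profiles = mesh(votes, lobbying_contacts, contributions)
print(profiles["M001"]["total_contributions"])  # 1500
```

If each data set instead identified Members by name spelled its own way (“Rep. Smith”, “Smith, John”, “John Q. Smith”), the same merge would need error-prone name matching, which is exactly the inconsistency the OTT’s coordinating role would prevent.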

(This thought stems from a suggestion by Gary Bass of OMB Watch on the TOHP mail list to have a Congressional Internet Library.)

States leading the way: take 2

After writing my previous post about states leading the way in publishing raw database dumps of legislative activity (making the data truly reusable, beyond providing a nice web interface), I checked in with Tim Rice, Executive Director of Illinois’ Legislative Information System, who said:

The XML data is the result of a total rewrite of our systems that was completed in 2003. Since we had the data in XML, it made sense to provide it. Putting it on our FTP site was the best means to do that. It made the data available in a useful format, and it provided access to data that kept our site from being crawled constantly by those wanting that data.

There really haven’t been any problems with maintaining the FTP site. The data is moved there as part of our regularly scheduled processes, so it doesn’t require special attention.

So that’s the word from IL: publishing the data they already had is easy, and it even cuts back on the number of “spiders” crawling their site (the way GovTrack crawls THOMAS). (Thanks to Tim Rice for the info.)