A couple of weeks ago I was fortunate to get in touch with three people at the Library of Congress, including the LOC’s director of web services (if I’m getting the title right), who took some time to talk to me about the information flow in the LOC that gets legislative information into the website THOMAS and the Capitol-internal Legislative Information System (LIS). (Regular readers will know that my interest here is the recommendation in the first chapter of our report: having the LOC publish a public, raw database of legislative information as a basis for other websites to present new views into the legislative process.)
The purpose of the conference call was for me to get a better understanding of the feasibility of our recommendation. One thing I learned is that the legislative information that powers THOMAS and LIS is currently in a regular (Oracle) database (no surprise there, of course), but that they have been in the process of converting their database to XML, which is a good, modern way of storing this type of information. The internal switchover to XML has been going on for a few years and is virtually finished. (Now knowing this, I have to note a correction for our report, in which I believe we presupposed the existence of a (finished) XML database based on the information we had.) From a general public-interest perspective, it’s nice to see the LOC upgrading to the latest and greatest way of doing things technologically. From the perspective of our report, the fact that the LOC will have an internal XML database of legislative information seems to me to confirm that creating a public database of legislative information is, from a purely technological perspective, entirely straightforward.
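To make the "straightforward" point a little more concrete, here is a minimal sketch of how an outside website might read a public XML file of legislative information. The element and attribute names (`bills`, `bill`, `title`, `action`, `date`) and the sample data are entirely made up for illustration; I have no knowledge of the LOC's actual internal schema, so treat this as an assumption-laden example rather than a description of their system.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the element and attribute names below are invented
# for illustration and do not reflect any actual LOC schema or real bills.
SAMPLE = """
<bills>
  <bill type="hr" number="1">
    <title>Example Act</title>
    <action date="2007-01-05">Introduced in House</action>
    <action date="2007-01-09">Passed House</action>
  </bill>
  <bill type="s" number="2">
    <title>Another Example Act</title>
    <action date="2007-01-04">Introduced in Senate</action>
  </bill>
</bills>
"""

def latest_actions(xml_text):
    """Map each bill id to (title, date of latest action, action text)."""
    root = ET.fromstring(xml_text)
    result = {}
    for bill in root.findall("bill"):
        bill_id = bill.get("type") + bill.get("number")
        title = bill.findtext("title")
        # ISO-style dates (YYYY-MM-DD) sort correctly as plain strings.
        last = max(bill.findall("action"), key=lambda a: a.get("date"))
        result[bill_id] = (title, last.get("date"), last.text)
    return result

summary = latest_actions(SAMPLE)
for bill_id, (title, date, text) in summary.items():
    print(bill_id, title, date, text)
```

The point is only that once the data exists as XML, a few lines of standard-library code are enough for a third-party site to build its own view of it.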
The primary concern at the LOC in terms of providing new services like a public XML database, I was told, is ensuring the accuracy of the information they make available. So, it’s really a non-technological issue that is the most prominent (as we probably should expect). As far as I could tell, this was the only major concern about providing an XML database to the public.
I learned various other things:
In my last post here I wrote about the bill XML files that the House publishes publicly. I was told that the Senate is drafting some 90% of their bills in XML, though that was given as a rough guess. By my own count, the House is drafting around 97% of their bills in XML, and the Senate could, for all I know, be reaching that level too. The Senate XML files are not public yet, apparently (as far as the LOC guys knew) because the Senate folks were not yet completely sure whether the XML files were rendering correctly in an official, print-ready format (when used with the official stylesheet). The LOC guys said that, as a hopeful guess, Senate XML files could become public in the next year.
The LOC is looking into adding RSS feeds to THOMAS, a feature that I’ve seen on people’s wish-lists a few times. I don’t know how comprehensive the feeds will be, and I don’t believe the LOC has decided that yet either. They’re looking into what the infrastructural requirements will be (such as the impact of the additional load on their servers — which I know from the feeds on GovTrack is not a trivial concern).
One other thing I learned was roughly how many people work on THOMAS. Because the people who work on THOMAS are also working on other things for the LOC, there’s no exact number that could be given. A variety of roles are involved in THOMAS, such as design, development, management, quality assurance, and system administration. I got the initial impression that if you had to compare it to a hypothetical number of full-time people working solely on THOMAS, it would be somewhere in the ballpark of 2 to 5. But that’s my number, not the LOC’s, and I mention it only to relay my sense of the level of resources THOMAS represents now.
I don’t have much in the way of concluding remarks for this post, so take all of that for what it’s worth. (And thanks again to the guys at the LOC for talking to me.)