Mash-ups for government transparency

A few years ago I launched GovTrack. I didn’t think of it this way at the time, but these days you might call it a mash-up of data about the U.S. Congress. At the time, all I had in mind was collecting information about Congress from various sources (THOMAS, the Senate website, and the House website) and cross-referencing and hyperlinking the data in a way that no one had done yet. In fact, it was the huge amount of public data on the status of legislation made available through THOMAS (as I understand it, thanks to the Republican take-over in 1994) that inspired me to try to put the data to new uses. It started with daily email updates on what your congressmen were up to, generated automatically by grabbing data from THOMAS and transforming it into a customized update for anyone who wanted one.

The trouble with building GovTrack is that one has to do a bit of friendly reverse-engineering. The information is all “out there”, meant for public consumption, but it’s not out there in a form that is easy to transform into other formats for other uses, like the email updates, RSS feeds, and cross-referenced pages. The problem is this: while people can easily browse and search THOMAS (for instance) for the information they need, it is very difficult to make computers do the same thing automatically. To take an example, if I want my computer to automatically fetch a list of all bills that were acted on the previous day (and in fact this is something GovTrack does), I would write a program that fetches the Daily Digest in the Congressional Record from THOMAS, which has bullets like this:

“Eleven bills and one resolution were introduced, as follows: S. 360-370 and S. Res. 37.”

I have no trouble understanding that. But, speaking as someone studying linguistics and natural language processing, computers are a long way from understanding English prose as well as people do, nay, as well as three-year-olds do. Was the bill S. 365 introduced yesterday? Yes, of course, even though it was not mentioned explicitly (it’s merely in the range 360-370), and that’s just the first problem for a computer trying to make heads or tails of this information. So what’s a programmer to do?
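
For this one sentence, pattern matching gets you part of the way. The following is a minimal sketch in Python, not necessarily how GovTrack does it: the function name `expand_bill_citations` and the pattern are hypothetical, made up for illustration, and the pattern only covers the citation forms that appear in the bullet above.

```python
import re

# Hypothetical pattern: a bill type, a number, and an optional "-NNN" range end.
# It covers only the forms in the example sentence, not every citation style
# that might appear in the Daily Digest.
BILL_PATTERN = re.compile(
    r"\b(S\. Res\.|H\. Res\.|S\.|H\.R\.)\s*(\d+)(?:\s*-\s*(\d+))?"
)

def expand_bill_citations(text):
    """Return one (bill_type, number) pair per bill, expanding ranges."""
    bills = []
    for bill_type, start, end in BILL_PATTERN.findall(text):
        first = int(start)
        last = int(end) if end else first  # no range: a single bill
        for number in range(first, last + 1):
            bills.append((bill_type, number))
    return bills

sentence = ("Eleven bills and one resolution were introduced, as follows: "
            "S. 360-370 and S. Res. 37.")
introduced = expand_bill_citations(sentence)
```

With this, `("S.", 365) in introduced` is true, and the list has twelve entries, matching the sentence's eleven bills and one resolution. The catch is that the pattern would have to be extended for every other way the Digest might phrase things, which is the fragile part of reverse-engineering prose.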

Let’s go back to the goal of all this. Certainly I don’t think it’s the government’s job to provide email updates, RSS feeds, Google Calendar integration of events, and whatever the latest technology hit happens to be. There are a million and one things that one can do with information about the status of legislation, and someone will want each of them. So the question is this: How can the government, and Congress in particular, publish information about what it is doing in a way that makes it easy for others to put the information to new uses?

To be concrete again, because it’s always good to be concrete: How can THOMAS publish a list of bills that were acted on in a purpose-neutral way, a way that makes it easy for programmers to go and write applications to take the information and do anything with it that someone might want?

This is a question that I’ll probably blog about more than once on this site in the next few months. The answer is what’s called structured (or “machine-readable”) data, and it comes down to publishing information twice: once for humans clicking away at links, and once in boring, explicit tables meant for computer applications to transform into different formats. But more on that later.
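
To give a rough idea of what those boring, explicit tables could look like, here is a sketch in Python. The table format and column names are hypothetical, invented for illustration, not something THOMAS publishes, and only four of the twelve rows from the bullet quoted earlier are shown.

```python
import csv
import io

# Hypothetical machine-readable version of the Daily Digest bullet: one
# explicit row per bill. Column names are invented for illustration, and
# only a few of the rows are listed here.
DIGEST_TABLE = """\
bill_type,number,action
S.,360,introduced
S.,361,introduced
S.,365,introduced
S. Res.,37,introduced
"""

def read_actions(table_text):
    """Return one dict per row, with the bill number converted to an int."""
    rows = []
    for row in csv.DictReader(io.StringIO(table_text)):
        row["number"] = int(row["number"])
        rows.append(row)
    return rows

actions = read_actions(DIGEST_TABLE)
introduced = [(r["bill_type"], r["number"]) for r in actions
              if r["action"] == "introduced"]
```

No range expansion or guessing about English is needed here: each bill has its own row, so checking `("S.", 365) in introduced` is a plain lookup.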

“State of the Union” Is The Title Of This Post

There’s something funny about the title of this post, and it’s what happened at the start of the State of the Union tonight.  (By the way, kudos to MSNBC for posting the transcript, as spoken, immediately after the speech ended.)

Thank you very much. And tonight I have the high privilege and distinct honor of my own as the first president to begin the State of the Union message with these words: “Madame Speaker.”

Another lie?  Self-referential lies aren’t as bad as lies about weapons of mass destruction, but they’re more interesting to linguists, at least.  Why?  The first words of the State of the Union are “Thank you very much.”  They are not “Madame Speaker” as he claimed.  It’s funny, of course, because the very utterance in which he makes a claim about what he said falsifies the claim.  The President ought to have said the following:

Thank you very much. And tonight I have the high privilege and distinct honor of my own as the first president to end the first paragraph of the State of the Union message with these words: “Madame Speaker.”

But that’s not as elegant.  To preserve a different aspect of the meaning, he might have said:

“Madame Speaker” are words that tonight I have had the high privilege and distinct honor of my own of being the first president to begin the State of the Union message with.  Thank you very much.

(Not that I think he really should have said either of those, but it is, indeed, what he might have said if his speech writer were a stickler for precise, silly details.)

Maybe I haven’t given him enough benefit of the doubt.  Let’s call that paragraph meta-speech and not technically a part of the State of the Union.  Like, he gets to say it but we don’t count it as a part of the actual State of the Union. Because, if we consider the next two paragraphs as meta-speech also:

In his day, the late Congressman Thomas D’Alesandro, Jr. . . . Congratulations, Madame Speaker! Congratulations.

Two members of the House . . . Tim Johnson and Congressman Charlie Norwood.

Then finally we get to a point where he does seem to start up the “real” speech, starting in the traditional way, and with the words “Madam Speaker”.

Madam Speaker, Vice President Cheney, Members of Congress, distinguished guests, and fellow citizens . . .

But, ah-ha!  You may have noticed that the spelling of Madam changed from the first paragraph.  That was MSNBC’s doing, which seems like a subversive way of ensuring the President did not start with “Madame Speaker” after all.  It’s the left-wing media at work.

This of course all reminds me of Gödel, Escher, Bach, which I finished reading recently.  In it, Achilles says someone keeps crank calling him on the phone and shouting:

“Is false when preceded by its negation!  Is false when preceded by its negation!”

The President may have inadvertently proved the incompleteness of number theory.

Meaningful Reform

About a year ago, following a few scandals, the House and Senate saw a flurry of Congressional reform legislation get introduced… and then promptly ignored. Finally, however, we may see meaningful reform. Senate Majority Leader Harry Reid has introduced S. 1: Commission to Strengthen Confidence in Congress Act of 2007. The bill would make two incredibly important advances:

(Sec 103) It shall not be in order to consider any Senate bill or Senate amendment or conference [without] a list of– (1) all earmarks in such measure; (2) an identification of the Member or Members who proposed the earmark; and (3) an explanation of the essential governmental purpose for the earmark is available … to all Members and made available on the Internet to the general public for at least 48 hours before its consideration.

(Sec 104) It shall not be in order to consider a conference report unless such report is available to all Members and made available to the general public by means of the Internet for at least 48 hours before its consideration.

Strangely, the bill does not require that bills (!) be available on the Internet for 48 hours before being voted on. Just conference reports. After a bill has been passed by both the House and the Senate, it’s often the case that the second chamber to get the bill has made amendments that the first chamber hasn’t yet had a chance to see. In that case, a conference committee is formed to get the two chambers back in sync, and the final version of the bill comes out in a conference report.

Since it’s been introduced by Reid, I think it’s almost certainly going to get through the Senate. The House seems to be off in its own world, so I’m not sure whether we’ll see this bill ever become law, but it’s got a good shot.