Political text analysis: The Times counts debate words

The New York Times has an interesting Flash application that breaks down the text of yesterday’s Democratic debate (there was a debate? UPDATE: And it was in my own city??) by speaker and visually shows who spoke when throughout the debate. I mention it here because it’s one of these data transformations very much in the spirit of what I keep pushing here. They took the transcript, made it visual and interactive, and the result is a vastly different view of the debate than anyone had before. It uses the same transcript as everyone else, but adds something new and informative.

One can’t help but notice that the candidates are not getting the same amount of speaking time. Clinton spoke more than 3.5 times as many words as Biden, and had more than 3.5 times his speaking time as well. For that matter, so, basically, did the moderator, who held the floor longer than anyone but Clinton. It’s no wonder that Clinton is considered “the Democrat to beat,” considering she’s in our faces more.

If the numbers weren’t so vastly different between the candidates, we’d chalk it up to the random variation that happens from debate to debate. But judging from the numbers, the speaking times are clearly planned. It’s so clear that I feel like maybe I missed something. Is it common knowledge that debates apportion time to the candidates based on their poll numbers (or something equivalent)? It’s not just that the front-runners are getting more time. The correlation is ridiculously high (speaking time versus the FOX News/Opinion Dynamics Poll of Oct. 23-24: r = .96). That is, the debate organizers are basically using this formula to determine how much time each candidate gets:

Speaking Time = 8 minutes 26 seconds + 25 seconds × Latest Poll Number (%)

Of course, debate organizers can’t control exactly how long each candidate talks, but the candidates deviated from the formula by at most two minutes and twenty seconds (Biden, who spoke less, and Edwards, who spoke more. CORRECTED: this originally said Dodd instead of Edwards.)

So now I’m getting off topic a bit, but in any case: transformations on data can be very revealing!

Steve King introduces a new bill with a bit of Internet transparency thrown in

Steve King, a Republican from Iowa, has introduced a new bill that has a clause specifically about Internet-based transparency. (We know King from his bill H.R. 170: Sunlight Act of 2007, parts of which I think were integrated into the passed ethics reform bill. One part that wasn’t integrated was a provision to have bills posted online for 48 hours before their consideration.) His new bill is H. Res. 776: Amending the Rules of the House of Representatives to require that rescission bills always be considered under open rules every year, and for other purposes.

This bill, like most of the 12 others he has introduced this year, takes a classic conservative position, here trying to reduce government spending. The real point of the bill is expressed best in one of its findings clauses:

Whereas a rescissions bill, which would cut Federal spending, should be brought to the House floor at the beginning of every fiscal quarter to give Congress the opportunity to cut and cancel unnecessary, wasteful, and bloated government spending to eliminate the deficit;

But the interesting part for us is:

Whereas the process of cutting spending should be open to the public, by posting this spending cutting bill and its amendments on the Internet, so that Americans can exercise their right to contact their Members of Congress and make their views known

It has a variant of the 48-hour language from his other bill, applied specifically to rescission bills.

Committee Votes: That’s The Deal

I happened to check on the list of cosponsors to H. Res. 231: Amending the Rules of the House of Representatives to require all committees post record votes on their web sites within 48 hours of such votes — the number is growing. It now has 131 cosponsors, with 27 added in the last two months. That’s some good work on the Hill for whoever has been rounding up support for the bill.

All of the cosponsors are Republican. Does anyone know why that would be? Do Members not bother to seek out support across the aisle, do Members not listen to Dear Colleague letters from across the aisle, or are the Democrats not interested in actual transparency reform?

Committee Votes: What’s the deal?

For a few years now I’ve wanted to look into integrating committee actions into GovTrack. Along with full roll call votes, it would be nice to be able to see how committee members voted in committee on various issues. Finally I took a look at a report PDF from the House Armed Services committee on the defense appropriations bill to see how they include committee votes. The report PDF is, as far as I know, the only way to find out this information besides personally going to a committee office or, maybe, making a phone call. With all congressional data, there’s never an easy way to get it, but some programming magic (screen scraping) is usually enough to extract the info out of wherever it is.

Not so for committee votes. Reminiscent of the type-print-scan-print-mail-scan-print-type financial disclosure methods in the Senate, committee votes were included in this PDF as an image. That is, the vote was typed up, and then probably printed, scanned, and then imported as an image in the final report. Because it is an image, and not text, it is infeasible to extract this information automatically.

I’ll give the committee the benefit of the doubt that this just happens to be the way they’ve always done it, and change is tough.

But come on. This isn’t transparency.

The newest advocacy org.: the Oversight committee

Advocacy organizations are, to some degree, defined by mobilizing a community to take some action. One thing they tend to do is end a message with something like “To take action, call your congressperson at [phone number].” Sometimes I find that kind of off-putting, because it seems like all they want to accomplish is what the tech world calls a distributed denial of service attack, in which someone tries to take a service (in this case a congressional office) offline with an attack from many sources (in this case constituents), somehow independently organized. (Ok, maybe that’s a bit too negative.)

This just in from the Committee on Oversight and Government Reform RSS feed:

On October 8, 2007, the American Spectator printed a fictitious story alleging that Congressman Waxman and the House Oversight Committee were investigating conservative and Republican talk show radio programs….

The American Spectator should immediately retract its report and apologize for the confusion its fictitious report has caused. Moreover, anyone concerned about the false reporting should contact the American Spectator at (703)807-2011 to register your views.

Since when was the Oversight committee in the business of mobilizing a group to take action?

Communication: Authentication Part II

The problem of authentication is basically this: how can we off-load it onto someone else who’s already doing authentication? In my last post I suggested charging credit cards through some payment service that happens to verify billing addresses too (and, as Oxa pointed out in the comments, that’s fairly disenfranchising, although to be honest I don’t mind; Internet communication is already disenfranchising). Two more methods to consider are off-loading the verification to the postal service, or to the individuals themselves.

Sending postcards to verify addresses: the recipient has to type in a random code printed on the postcard to show that he got the postcard, i.e. that he’s at that address. (Oxa mentioned this, and I’ve seen it elsewhere.) I didn’t mention it before because I assumed it would be too costly. Actually, that may not be true. I’m just ballparking, but if the overhead of a credit card purchase is around 10 cents, and it costs 41 cents to mail a postcard, that’s not soooo different. But mailing a postcard has some additional overhead: printing the postcard (automagically) and manually schlepping postcards from a printer tray to an outgoing USPS mailbox. I also found a service that will verify phone number-address pairs, which is actually pretty close to what is needed, at around 40 cents per verification.
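The postcard round trip is simple to sketch. This is a hypothetical Python version, not any real service: the code alphabet (digits and capitals, minus look-alikes like 0/O and 1/I), the 8-character length, and the in-memory `pending` table are all my assumptions. Codes come from the `secrets` module so they can’t be guessed, and each one works only once.

```python
import hmac
import secrets

# Hypothetical code alphabet: digits and capitals minus look-alikes (0/O, 1/I).
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"
CODE_LENGTH = 8  # 32**8, about 1.1 trillion possible codes

# user id -> code printed on the postcard mailed to that user's claimed address
pending: dict = {}


def normalize(code: str) -> str:
    """Ignore case, spaces, and dashes that people add when typing a code."""
    return code.strip().upper().replace(" ", "").replace("-", "")


def mail_postcard(user_id: str) -> str:
    """Generate a fresh code for a user; the caller sends it to the printer."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
    pending[user_id] = code
    return code


def verify(user_id: str, typed: str) -> bool:
    """Check a typed code; a correct code is consumed so it can't be reused."""
    expected = pending.get(user_id)
    if expected is None:
        return False
    # Compare as bytes in constant time so timing doesn't leak the code.
    ok = hmac.compare_digest(expected.encode(), normalize(typed).encode())
    if ok:
        del pending[user_id]
    return ok
```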

However, even these methods don’t get you all the way, because in fact we need more than address verification. We need verification or at least assurance that the person hasn’t verified before. You could limit the number of verifications per address, but there are some technical problems with that. The credit card method has the advantage that an individual can only verify as many times as the number of credit cards that he has, and that’s usually pretty limited.
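Here is roughly what a per-address cap could look like, as a hypothetical Python sketch; the normalization rules and the default cap of 4 are placeholders I made up. It also shows where the technical problems come in: the crude normalization will miss many ways of writing the same address, and any fixed cap is wrong for dorms and large households.

```python
import re
from collections import Counter

# Hypothetical normalization table; real address standardization is far bigger.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def normalize_address(address: str) -> str:
    """Crude normalization: lowercase, strip punctuation, abbreviate a few words."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


class AddressLimiter:
    """Allow at most `cap` successful verifications per normalized address."""

    def __init__(self, cap: int = 4) -> None:  # 4 is an arbitrary placeholder
        self.cap = cap
        self.counts = Counter()

    def try_verify(self, address: str) -> bool:
        key = normalize_address(address)
        if self.counts[key] >= self.cap:
            return False  # this address has used up its verifications
        self.counts[key] += 1
        return True
```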

There’s another route to consider, though as far as I’m aware it has been tried before without success. You can off-load the authentication problem to the users by creating a web of trust: user A does the work of authenticating users B, C, and D, user B authenticates E, F, and G, etc. Then you only have to worry about how much you trust a small number of root users, rather than the whole community. But I don’t know if this has ever been a practical solution to anything.
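At its core, a web of trust is a graph search from the root users. In this hypothetical Python sketch, `vouches` maps each user to the users they have authenticated, and anyone reachable from a root within `max_hops` steps counts as trusted; the hop limit is my assumption, standing in for “how much you trust” a long chain. Real webs of trust, like PGP’s, use more nuanced trust levels than a simple hop count.

```python
from collections import deque


def trusted_users(vouches: dict, roots: set, max_hops: int = 3) -> dict:
    """Breadth-first search from the root users along "X authenticated Y" edges.

    Returns each trusted user mapped to the length of the shortest vouching
    chain from a root. Users more than `max_hops` steps away are not trusted.
    """
    distance = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        user = queue.popleft()
        if distance[user] == max_hops:
            continue  # don't extend chains past the hop limit
        for vouched in vouches.get(user, []):
            if vouched not in distance:
                distance[vouched] = distance[user] + 1
                queue.append(vouched)
    return distance


# The post's example (A vouches for B, C, D; B for E, F, G), plus a
# hypothetical longer chain G -> H -> I.
vouches = {"A": ["B", "C", "D"], "B": ["E", "F", "G"], "G": ["H"], "H": ["I"]}
print(sorted(trusted_users(vouches, {"A"})))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```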

Technical Challenges of Communication: Authentication

I guess this is going to be a series of blog posts on this subject. For me, this is a lot of thinking out loud and trying to figure out whether there’s something in here for me to tackle (with my nonexistent spare time), so I appreciate the comments.

As commenters Oxa and Chris (in the post before last) note, OpenID is one of these emerging protocols that would seem to be helpful here. Sort of. Here’s the technical side of the problem we face: When a citizen signs a letter (or joins into one of these many-to-one communications), how does the congressional office know that that signature is legit? Currently, the only authentication in the process is citizens providing at least seemingly-real addresses, but as one staffer at the CMF conference noted, there are people (maybe not many, but at least one) who are using other people’s names and addresses when submitting letters to Congress.

A technical solution here would be for congressional offices to implement some form of authentication (whatever it might be), and someone at the conference (apologies, I forget who) mentioned conceivably using the e-Authentication system under development at GSA (iirc). That would possibly authenticate people against bank accounts. (And someone else at the conference raised the question of whether that was fair to all.)

The problem gets a little bit worse if someone wants to implement one of these communications methods outside of the Capitol. In this case, not only does one have to do the authentication as above (and probably without the GSA’s help), but one has to then be able to convince congressional offices that the signatures being relayed are legit. It’s one thing to authenticate at the time of signature, and quite another to be able to prove to someone else that you did the authenticating. (Well, proving may not be necessary. Trust is another solution.)

Of course, these issues have been completely solved at the lowest technical level in the world of encryption. The issue here is a matter of how to implement it so it’s not limited to geeks with PGP keys and congressional offices with geeky staffers who can verify PGP signatures.
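For the relay problem, here is a minimal Python sketch of the “trust” variant using only the standard library. It swaps PGP-style public-key signatures for an HMAC-SHA256 tag over the verified signature record, which assumes the relaying site and the congressional office share a secret key (a hypothetical arrangement). The office can then check that a record came from the relay and wasn’t altered in transit, but it still has to trust that the relay really did the authenticating. The record fields are made up for illustration.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize the record deterministically so both sides hash identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def attest(record: dict, key: bytes) -> str:
    """Relay side: tag a signature record it has authenticated."""
    return hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()


def check_attestation(record: dict, tag: str, key: bytes) -> bool:
    """Office side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(attest(record, key).encode(), tag.encode())
```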

Now, as for OpenID in particular: it actually doesn’t solve the problem, because there is no way to tie an OpenID to a real-world name and home address, which is what we really need. OpenID, for readers who haven’t seen it yet, is a sort of global login identifier that you can use to log in at any website, rather than having a different username and password for each website you use. It’s a great idea, most interestingly because it is a completely decentralized system and an open standard.

OpenID is certainly a good place to start if you want to build a system that is going to have broad applicability (i.e. “open use”?) beyond verifying signatures on letters to Congress. How to co-opt OpenID for this is an open question, as far as I know. (I’ve talked about it ever so briefly with Andrew Lee at Fantasy Congress. I also noticed that the idea of authentication was listed on the Gateway to Gov wiki some time ago. And I know people in the OpenID and FOAF communities have thought about issues like this, but I don’t believe anyone has tackled it head-on.)

To do the actual authenticating, really the only practical way that I know of is using credit card billing addresses — charging users a nominal fee to authenticate, and then returning the money (or not).

So here’s the bottom line as I see it now: An authentication system is the primary thing we need if we’re going to have new forms of congressional communication. Building the core of this system based on credit card billing addresses should take about a week. I would do it myself except that the system must process credit cards and possibly needs to hold onto some personal information (certainly not the credit card number, but a name, home address, and an encryption key, for instance), which makes the site a huge liability and responsibility.

Landscape of constituent communication

Oxa Koba asks in a comment on my last post what other types of many-to-one communication would make sense for Congress, in addition to the individually sent letter. I don’t know what would make sense for Congress, but here are some things I had in mind.

The petition, which I mentioned last post but am including here for completeness’ sake: a letter with a number of signatories. Questions: How can Congress authenticate the signatories? Is a petition too easy to sign to be meaningful to representatives?

The collaboratively written letter. I have a new interest in this following the discussion of the “C-Wiki” on the OHP list. This differs from the petition in that the signers actively participate in crafting the letter, meaning that each signatory’s effort is roughly the same as for an individually written letter. The question is: How does one do a collaboratively written letter on a large scale? Something like a wiki would be involved, but one could imagine tweaking the wiki process to make a better system for writing consensus-driven letters. On a small scale, this is really not much different from the individual letter (if sent multiple times), so the question is a technological one: how to do it effectively and usefully at scale.

A petition to answer a question. This is something like a Digg for “ask your rep”, or the Slashdot-style interview, and I may have first seen this for politics on the Gateway-to-Gov wiki. Vote up questions you want your rep to address, the top questions get sent, and the rep sends back a single reply to all those who participated in voting. I’m separating this from the usual petition in that the emphasis here is on the response.
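The mechanics of a question petition are straightforward. In this hypothetical Python sketch, each voter gets at most one vote per question, the top `n` questions by vote count get sent (ties broken alphabetically, an arbitrary choice of mine), and everyone who voted on any question receives the rep’s single reply.

```python
from collections import defaultdict


class QuestionPetition:
    """Digg-style voting on questions for a representative (hypothetical)."""

    def __init__(self) -> None:
        self.voters = defaultdict(set)  # question -> set of voter ids

    def vote(self, voter_id: str, question: str) -> None:
        # A set means repeat votes from the same person don't count twice.
        self.voters[question].add(voter_id)

    def top_questions(self, n: int) -> list:
        """The n most-voted questions; ties broken alphabetically."""
        ranked = sorted(self.voters, key=lambda q: (-len(self.voters[q]), q))
        return ranked[:n]

    def reply_recipients(self) -> set:
        """Everyone who voted on any question gets the single reply."""
        return set().union(*self.voters.values())
```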

The town forum. Once relegated to physical spaces, this can now be done in a few ways on the Internet: a text-based chat room (this must have been done before, but I don’t know of any examples), a virtual video-based chat room of some sort, or streaming a video recording of an actual (in-person) town hall over the web. (I met someone on Monday who has some experience with that, and I hope to talk with them more about it once I get the chance.) In a town forum, questions may be authored by individuals (solitarily), but the method of disseminating the response is much more lively and interesting than a letter or press release.

The delegated voice. (Ok, I’m making up that name.) This is where members of the community elect someone to voice their thoughts and communicate one-on-one with someone higher up. Think of this as your district’s delegate to your representative on Net Neutrality issues. Backed by a large constituency, the delegate arranges meetings with the congressperson, relays views, reports back, and perhaps establishes a long-lasting relationship with both the community and the member of Congress.

[Update 10/6/07: Commenter Chris on my previous post notes blog comments and forums as two other communications methods. Definitely.]

That’s all I can think of for now. Leave comments if you have other ideas.

The interesting thing to me is that each of these methods has technological issues that could be resolved with some elbow grease, which might make them practical (whereas without technology, most are perhaps difficult or impossible to do effectively). (Thanks Oxa for the comment!)

Communicating with Congress Conference

One aspect of transparency that we didn’t touch on in our report was the ability of the public to contact Members of Congress. Yesterday the well-respected Congressional Management Foundation hosted a conference on Communicating with Congress, and some OHP regulars were in attendance (John Wonderlich, Rob Pierson, and Daniel Bennett were among the panelists — btw, thanks John and esp. Daniel for the plugs for GovTrack). I was pretty sick yesterday, esp. by the end of the conference, and was probably fairly incoherent to those I talked to after.

Two things that I learned stood out:

First, congressional offices are ridiculously overloaded with communication from the public. 313 million emails came into Congress in 2006 (iirc), which if you do the math (I forget whether anyone gave the exact number) is in the ballpark of 300-2000 emails per office per day. And given that current office budgets allow for just a few people (in the House) to be dedicated to handling communications like that, there is no way for them to respond to all communications, as passionate as they are about it (which also became quite evident, both from the staffer panelists and from those in the audience). As a result, what we see from the outside (web forms, sometimes CAPTCHAs, limiting communication to constituents, and other barriers) is a means for them to triage the bombardment of letters they get. If they can’t deal with it all, they prioritize the letters that the writer took the most effort to create. That seems very reasonable to me.
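For the curious, here is one way to do that math, assuming 535 offices (435 Representatives plus 100 Senators, ignoring delegates and committee offices) and counting every calendar day:

```latex
\[
\frac{313{,}000{,}000\ \text{emails/year}}{535\ \text{offices} \times 365\ \text{days/year}}
= \frac{313{,}000{,}000}{195{,}275}
\approx 1{,}600\ \text{emails per office per day}
\]
```

An average like this says nothing about how unevenly the mail is spread across offices, so treat it only as a rough sanity check on the range above.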

However, what was not reasonable was this: if Members sincerely want to respond to every incoming letter (one staffer told a story of how Sen. Frist asked his staff to reply to every letter, and the staffers looked back in puzzlement), and more staff is needed to do that (staff sizes haven’t increased in 30 years), then Members should be writing resolutions to increase their budgets to make that possible. Congress can’t blame the need to triage on budgetary restrictions; they decide the budget, after all.

The second thing was that, as panelist Alan Rosenblatt presented, the method of triage has unintended ramifications: barriers to entry can be seen as an insult to those who care but don’t have time to write a carefully crafted letter themselves, and who instead rely on the research of advocacy organizations to make their point (by joining in on a letter-writing campaign with a pre-written letter). He put the point quite well: Members of Congress rely on their staffers to do research and craft public statements, and in the same way, Americans rely on advocacy groups to do research and craft letters to politicians. There’s nothing wrong, he said, with sending a pre-written letter. And as another panelist showed, less than 10% of those who participate in a letter-writing campaign modify the pre-written letter (he later guessed 20%+, but the numbers on the slide indicated otherwise).

I got in under the wire with the last question of the day, which went effectively unanswered (though Daniel tried). I should have started with this: There seem to be three ways to deal with the problem of overloaded communications staffers (“LC”s?). One way is to raise the barriers to communication so they get fewer letters, eliminating the least important ones (as they see it). Another way is to streamline the process, along the lines of what Rob suggested: a computerized, standardized (XML) letter submission format. The third way, which is what I suggested, is to look at other forms of communication entirely, complementing individual letter writing, that deal with more constituents at once. Clearly, to the extent that it makes any sense at all, dealing with communications sent collectively by citizens is more efficient than dealing with the same letter sent individually. Currently, petitions (a basic form of aggregated communication) carry no weight with Members, according to one staffer I asked. Presumably this is because (1) it is too easy to sign a petition for it to be meaningful (again, as they see it), and (2) it is impossible for Member offices to verify who signed the petition. At the least, (2) is solvable with technology. But there are many other forms of many-to-one, aggregated communication, and I would sincerely like to know more about what Members think of those methods and whether the problems with those methods are technologically addressable.
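To make Rob’s idea a little more concrete, here is a hypothetical Python sketch of a standardized XML letter built with `xml.etree.ElementTree`. The element names (`constituentLetter`, `sender`, `topic`, `campaign`, `body`) are made up for illustration and are not from any actual standard. The optional `campaign` element shows how such a format could let offices sort letter-writing-campaign mail automatically.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SENDER_FIELDS = ("name", "street", "city", "state", "zip", "email")


def letter_to_xml(sender: dict, topic: str, body: str, campaign: Optional[str] = None) -> str:
    """Build a standardized letter document; element names are hypothetical."""
    root = ET.Element("constituentLetter")
    sender_el = ET.SubElement(root, "sender")
    for field in SENDER_FIELDS:
        ET.SubElement(sender_el, field).text = sender.get(field, "")
    ET.SubElement(root, "topic").text = topic
    if campaign is not None:
        # Marks letters from a letter-writing campaign so offices can group them.
        ET.SubElement(root, "campaign").text = campaign
    ET.SubElement(root, "body").text = body
    # ElementTree escapes characters like & and < in text automatically.
    return ET.tostring(root, encoding="unicode")
```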