Debate Time Follow-up: The CNN debate

Apparently there was another Democratic debate last night. Based on the transcript analysis by the New York Times and the latest Fox News/Opinion Dynamics poll numbers, I’ve run the numbers again. Last debate, as I blogged, I found that each candidate’s speaking time was ridiculously closely correlated with their latest poll numbers at the time, to the extent that it was impossible to believe it was not planned. (For stats people, r > .95.) That is, MSNBC is skewing the elections and endowing polls (i.e. an easy news source) with more importance by giving more free exposure to the leading candidates.

Yesterday’s debate, a CNN debate, did not show quite as high a correlation (r = .73) with the latest poll numbers at the time of the debate. That’s still quite high. Obama spoke the most, although Clinton still leads by quite a bit in the polls. On the other hand, Obama spoke for more than 3 times as long as Kucinich, the candidate who spoke the least. The correlation is still implausibly high if we believe the speaking time was intended to be allocated evenly, but perhaps it’s not so high that we must believe CNN used a formula based on poll numbers to decide speaking time for each candidate (as I believe MSNBC did).
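For anyone who wants to check this kind of number themselves, here is a minimal sketch of the calculation in Python, assuming that r above is the ordinary Pearson correlation coefficient. The figures in it are made-up placeholders, not the actual poll numbers or speaking times from either debate, and the function name `pearson_r` is just a label chosen for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER data for four hypothetical candidates. These are NOT the real
# poll numbers or speaking times from any debate.
poll_percent = [40.0, 25.0, 12.0, 3.0]
speaking_minutes = [20.0, 22.0, 11.0, 7.0]

r = pearson_r(poll_percent, speaking_minutes)
print(f"r = {r:.2f}")
```

With only seven or eight candidates on stage, r is computed from very few points, so it is worth treating any single value with some caution.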

Next time MSNBC and CNN have debates, we will start to see whether we can tell from the numbers that MSNBC and CNN have different policies for how they allocate speaking time.

Better late than never: GPO responds to my question 1 year later

On November 19, 2006, I inquired with the GPO regarding how they decided on charging the public $8,000 for documents they produce in their normal course of printing bills. These documents are not for the end-user, but would be useful for sites like GovTrack. The U.S. Code requires most GPO documents be sold to the public at their marginal cost to distribute, and the marginal cost of distributing the documents I wanted (the “Daily Bills” product with “GPO Locator Codes”) couldn’t be more than $100 a year, and is probably closer to $1.00. Either they aren’t complying with the law, or they don’t consider the documents among those covered by that rule. I wanted to know which.

Today I got an email from GPO’s Lead Customer Service Representative:

Dear Customer: I was updating my database files and notice that your incident was still pending.

Yeah, I would say so. Better late than never.

Senate Voting Records: Use XML

(This is written in the style of a letter to the Senate… because hopefully it will turn into just that. Comments on its persuasiveness are welcome.)

Summary: The Senate’s current position on publishing voting records online is analogous to a reference library that has no copy machine. I explain below why the Senate website should publish its roll call vote records in “XML format”, to facilitate educating the public and strengthening transparency, and why any reluctance there may be should be reevaluated in light of the experience from the House’s use of XML for roll call votes and the presence today of unauthoritative XML for Senate votes. Current Senate website policy should be revised to encourage the use of this “structured data format”.

Though everyone believes an electorate must be informed to make wise decisions at the polls, the complexities of what happens in the Congress are indeed difficult to distill and share with the public. Roll call voting records are of crucial importance to the public for obvious reasons, but at the same time fail to capture the nuances of each situation that may have played a central role in a Senator’s decision making. How voting records, which are easy to convey but oversimplify the big picture, should be responsibly shared with the public is a question for debate. I suggest below that the Senate website publish its roll call vote records in “XML format” (in addition to what is currently available) to help keep the public informed, and that any fears about how the information in XML may be used are not strong enough reasons to avoid this technology.

The Senate’s current position on publishing voting records online is analogous to a reference library that has no copy machine. In a reference library without a copy machine, the information in the stacks is certainly made available, but library members can’t easily share the information with others. They can instruct others how to find the information in the library (i.e. a link), and they can copy the information by hand and make copies at Kinkos, but library members are unable to use the latest technology to help them share the information outside the library. In such a world, the library members’ response is likely to be to haul their own copy machines into the library. This is exactly what has happened with Senate voting records.

Leaving the metaphor, long ago the Senate took the important step of publishing voting records on its website. Though the votes webpages themselves cannot capture all of the nuances of each vote, these webpages complement what exists elsewhere on the web. For instance, the websites of newspapers, which do try to explain the back-story of legislative issues to present a larger picture, often link to the Senate’s roll call webpages as, in a sense, an extension of their own reporting, that is, so they can provide not just the big picture but also the crucial details. The roll call webpages thus have an important role in educating the electorate and promoting transparency.

The metaphorical copy machine represents what is called structured data, for example “XML.” XML allows computers to more easily process information, and for voting records would help that information be disseminated more widely and in novel ways to the public. While structured data is a part of today’s so-called “Web 2.0”, the current policy understood to be coming from Senate Administration is that the Senate website is not to publish structured data for roll call votes, with the reason understood to be that Senators prefer to have their votes be published not as isolated factoids, where they could be misrepresented, but rather only as part of a larger picture.

This policy warrants review on two accounts. On the one hand, even such isolated facts have a crucial role of complementing the larger picture presented elsewhere, as do the existing Senate webpages for votes, as explained above. But further, for several years the House has published its voting records in XML. The New York Times, for instance, makes use of these files to enhance its own coverage of legislation by including visual representations of votes along with its articles: the big picture and the crucial details. XML made the voting information more easily transformed into visual form, a form that has educational value to the public, and so using XML is in this respect in the public interest. The Senate does not publish XML, and while, as with the metaphorical reference library, this does not prevent wholesale access to the information, it is holding back on technology that facilitates educating others. The Senate should adopt a policy similar to the House’s to encourage the dissemination of voting information, knowing from the experience of the House that it will be used often to complement reporting of the nuances and the big picture.

Because the Senate does not publish votes in XML, the public has hauled in its own copy machine, and the effect is that Senate vote XML files are available to the public, Senate rules notwithstanding. The independent website GovTrack.us publishes its own XML files for Senate votes, and these are used by several other websites to enhance the public’s understanding of the Congress. Any fears Senators might have had for a future with XML can thus be evaluated today. However, this unauthoritative source for voting information is not an optimal solution, because on rare occasions it disseminates incorrect information to some hundreds of thousands of monthly visitors of the websites using these XML files. An authoritative source of roll call vote XML files from the Senate directly would rectify this problem.

As there is virtually no cost to publishing XML files for roll call votes, and in light of the experience that can be gathered from the House’s use of XML and the presence today of (unauthoritative) XML for Senate votes, the current policy regarding the use of structured data on the Senate website should be reevaluated. The use of structured data should be encouraged for all public information on the Senate website, especially starting with roll call votes, and would signal a renewed commitment to using technology to promote transparency.
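(For programmers, outside the letter itself: the claim that XML lets computers process votes more easily can be sketched in a few lines of Python. The XML layout below is invented purely for illustration. It is not the House’s format, not GovTrack’s, and not anything the Senate has proposed, and the element names, attribute names, and the `tally_by_party` function are all hypothetical.)

```python
import xml.etree.ElementTree as ET
from collections import Counter

# HYPOTHETICAL roll call record. The element and attribute names are made up
# for this example and do not match any real House, Senate, or GovTrack file.
SAMPLE_VOTE_XML = """
<roll_call number="1">
  <question>On Passage of the Bill</question>
  <vote senator="Senator A" party="D" cast="Yea"/>
  <vote senator="Senator B" party="D" cast="Yea"/>
  <vote senator="Senator C" party="R" cast="Nay"/>
  <vote senator="Senator D" party="R" cast="Yea"/>
  <vote senator="Senator E" party="I" cast="Not Voting"/>
</roll_call>
"""


def tally_by_party(xml_text):
    """Count how many votes of each kind were cast by each party."""
    root = ET.fromstring(xml_text)
    totals = Counter()
    # Every <vote> element carries the party and the vote cast as attributes,
    # so no screen-scraping of a human-oriented webpage is needed.
    for vote in root.iter("vote"):
        totals[(vote.get("party"), vote.get("cast"))] += 1
    return totals


totals = tally_by_party(SAMPLE_VOTE_XML)
for (party, cast), count in sorted(totals.items()):
    print(f"{party} {cast}: {count}")
```

The same tallies could feed a chart or a per-Senator history, which is the kind of reuse that is tedious and error-prone when the only source is a webpage meant for reading.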

The cynical take on the debate speaking times

I can’t help but take this a step further. Last post I noted that in Tuesday’s MSNBC Democratic presidential debate, the amount of time spoken by each candidate was correlated ridiculously well with their latest poll numbers, to the extent that it is impossible to believe this was not planned. I don’t know who planned it, but it would seem to me that it is [MS]NBC that had the most to gain. (If the candidates voted on the rules, certainly a majority would not have agreed to such a distribution of time.)

NBC’s (presumed) choice to distribute time is no less than a judgment about who should be president. (And it’s ironic that this would fuel the pundits who come on later to ask “who won” the debate. They should just ask their corporate buddies who they decided to give more screen time to.) Proportioning time is different from cutting out candidates entirely. Not everyone can reasonably fit on a stage or within 2 hours, and a debate with 20 candidates isn’t going to be of particular use to the public. But, given a fixed number of candidates to include, and assuming the public benefits equally from hearing from each, then distributing the time grossly unevenly among the candidates doesn’t serve anyone except those that have something to gain through the election of one candidate or another, and it’s highly presumptuous.

So why would NBC do that? Before the obvious answer, there are two possibilities. The most generous is that the executives believe that the candidates who are stronger, or more likely to win, have some claim to more time. Why waste TV time on a candidate who won’t win? But this doesn’t explain the situation. John Edwards is not without hope, but NBC still gave 1.5 times more time to Clinton than to him.

The second possibility is that NBC believes this distribution will get higher ratings for the debate. Actually this isn’t an unreasonable idea. If it’s true that people watch what they want to hear, then people could prefer a debate in which their preferred candidate speaks more. Then, proportioning out the time by each candidate’s number of supporters could, in principle, make economic sense. (It’s not obvious that mathematically it does make sense, but you could make up an economic story to make it work.)

The third, cynical possibility is that NBC executives are being swayed by their own personal situations. By limiting the majority of the debate time to a few candidates, they increase the influence of their own campaign contributions to those candidates. I don’t know whether NBC execs contribute particularly differently from the population at large, but here are the numbers from CRP. Looking at donations from self-reported NBC executives of $500 or more to Democrats in the debate, $14,500 went to Obama, $7,600 to Clinton, $2,300 to Dodd, and nothing to anyone else. These numbers are no explanation for the time proportioning (if they were, we would expect Obama, not Clinton, to have received the most time), but they do show us that the NBC executives have a personal stake in the top candidates, just like everyone else. And if I were them, I certainly wouldn’t want the candidate I contributed to to be out-debated by an opponent who later goes on to win the White House. Who wants to contribute to a loser?

With either of the last two reasons, there is a large conflict of interest. It’s impossible to get out of it: Time was probably proportioned either to bolster ratings (i.e. playing with politics for money), or to bolster particular candidates (i.e. playing with politics for control).