July 2013 – Joshua Tauberer's Archived Blog

This was updated twice since first posting, as indicated below.

In a Wired article yesterday, “Lawmakers Who Upheld NSA Phone Spying Received Double the Defense Industry Cash,” the author said that, based on an analysis by MAPLight, “defense cash was a better predictor of a member’s vote on the Amash amendment than party affiliation.” That suggests there’s evidence defense cash had something to do with the vote. There isn’t much.

Everyone who’s been following the Amash vote already knows that the vote was not along party lines in the least. Take a look at the seating chart diagram on the GovTrack vote page:

Liberal Democrats and conservative Republicans happened to form a coalition in opposition to NSA data collection (an “Aye” vote), while moderates in both parties voted to reject the amendment. (The seating chart arranges representatives by their GovTrack ideology score.) So, first, the fact that defense cash was a better predictor than party is not very interesting.

A better question is whether defense cash is a better predictor than a legislator’s pre-existing personal convictions, as measured by our ideology score.

It isn’t.

Defense cash’s prediction

To make this quantitative, let’s make the prediction like this. Since we know the vote was 205 Ayes to 217 Noes, let’s put the 217 legislators who received the most defense cash into one group (predicted No) and the bottom 205 legislators into another group (predicted Aye). How well do those groups match the vote outcome? Here’s the breakdown by counts:

	Less$	More$
Aye	123	82
No	82	135

In other words, this prediction is right for 123+135 = 258 legislators, or just 61% of the time.
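To make the arithmetic explicit, here is a small Python sketch (the post's own analysis was done in R) that scores a grouping against a 2x2 table of counts. The counts come from the table above; the dictionary layout and the `accuracy` helper are illustrative choices, not part of the original analysis.

```python
# Counts from the defense-cash table: keys are (actual vote, group).
# The 'Less$' group is predicted to vote Aye, the 'More$' group No.
counts = {
    ("Aye", "Less$"): 123,
    ("Aye", "More$"): 82,
    ("No", "Less$"): 82,
    ("No", "More$"): 135,
}

# The vote each group is predicted to cast.
predicted_vote = {"Less$": "Aye", "More$": "No"}


def accuracy(table, prediction):
    """Return (number correct, total, fraction correct) for a 2x2 table."""
    total = sum(table.values())
    correct = sum(n for (vote, group), n in table.items()
                  if prediction[group] == vote)
    return correct, total, correct / total


correct, total, frac = accuracy(counts, predicted_vote)
print(f"{correct} of {total} correct ({frac:.0%})")  # 258 of 422 correct (61%)
```

The same helper scores the ideology grouping if you swap in its counts and map 'Extreme' to Aye and 'Moderate' to No.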

Ideology’s prediction

We can do a similar analysis based on the ideology score. The idea is that the further from the center a legislator is, the more likely he or she was to vote for the amendment. So let’s make groups for the 205 legislators with scores furthest from the median ideology score (“extreme”) and the 217 closest (“moderate”). Does that match the vote?

A little better.

	Extreme	Moderate
Aye	131	74
No	74	143

This prediction is right for 131+143 = 274 legislators, or 65% of the time. That’s a little better than defense cash, but let’s call it a draw.
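A minimal Python sketch of the 'Extreme'/'Moderate' split, assuming ideology scores keyed by a legislator ID. The `split_by_extremity` helper and the toy scores are hypothetical, for illustration only; in the real analysis `k` would be 205, the number of Aye votes.

```python
from statistics import median


def split_by_extremity(scores, k):
    """Label the k scores farthest from the median 'Extreme', the rest 'Moderate'.

    scores: dict mapping a (hypothetical) legislator ID to an ideology score.
    Ties in distance keep their original order (Python's sort is stable),
    which can differ from how R's rank() treats ties.
    """
    center = median(scores.values())
    by_distance = sorted(scores, key=lambda who: abs(scores[who] - center),
                         reverse=True)
    extreme = set(by_distance[:k])
    return {who: ("Extreme" if who in extreme else "Moderate") for who in scores}


# Toy data: two legislators sit far from the center, three near it.
scores = {"a": 0.1, "b": 0.45, "c": 0.5, "d": 0.55, "e": 0.95}
groups = split_by_extremity(scores, 2)
print(groups)
```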

[update: added 7/29/2013]

We have two predictors for the vote — personal conviction and campaign contributions — that are about equally good, and both are equally plausible. In the absence of other data, there’s no reason to prefer one explanation of the vote over the other.

Better together?

Most votes fall largely along party lines. That is, vote and party are usually very highly correlated. That also means that to the extent money is highly correlated with votes, it is necessarily highly correlated with party affiliation too. That makes it very difficult, or impossible, to separate the influences of party and money.

But the Amash vote presents a uniquely interesting case because ideology (distance from the center) and defense dollars are not really correlated at all (r=-.05). That means ideology is good at predicting 60ish% of the votes and defense dollars are good at predicting a slightly different 60ish%. Maybe we can put them together to predict more than either can predict alone?
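For reference, the r here is Pearson's correlation coefficient, which is what R's `cor()` computes by default. The sketch below spells it out in plain Python; the `pearson_r` name and the toy inputs are hypothetical and just show the range from perfectly correlated to uncorrelated.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized (the n's cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0: perfectly correlated
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))    # -1.0: perfectly anti-correlated
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0: no linear relationship
```

An r near zero, like the -.05 between defense dollars and distance from the center, means knowing one tells you almost nothing about the other, which is why the two predictors can be right about different legislators.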

Let’s start with the predictions from the ideology score. We know we got 35%, or 148, of the votes wrong. So let’s swap the 74 representatives in the ‘extreme’ group with the most defense cash (call them the A group) with the 74 representatives in the ‘moderate’ group with the least defense cash (call them the B group). If money has any effect, we’d predict these to be the representatives most likely to be affected. Here’s how those representatives voted:

	A	B
Aye	35	38
No	39	36

Note that by ideology alone, we predicted the As to be Aye voters and the Bs to be No voters, which was right 35+36=71 times. After the swap, we make the reverse predictions, which are right 39+38=77 times. The swap improves our predictions for 6 votes, or 1.4% (6 out of 422 Aye and No votes).
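The swap arithmetic, including how the gain splits by direction, can be checked with a few lines of Python. The counts are from the A/B table above; the variable names are illustrative.

```python
# Vote counts for the two swapped groups, from the A/B table.
a_aye, a_no = 35, 39   # A: 'extreme' but high defense cash
b_aye, b_no = 38, 36   # B: 'moderate' but low defense cash
total_votes = 422      # Aye plus No votes

# Ideology alone predicts A -> Aye and B -> No.
before = a_aye + b_no
# After the swap the predictions flip: A -> No and B -> Aye.
after = a_no + b_aye
gain = after - before

# Split the gain by direction of the vote.
extra_no = a_no - a_aye    # net No votes gained in group A
extra_aye = b_aye - b_no   # net Aye votes gained in group B

print(before, after, gain, extra_no, extra_aye, f"{gain / total_votes:.1%}")
# 71 77 6 4 2 1.4%
```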

The predictors are better together. That means there is room for an influence of defense dollars on the vote, even for a skeptic like me who prefers an explanation in terms of ideology first. But it’s a small effect in absolute terms. And this effect goes both ways. The 6 extra votes are split between 4 additional no-votes due to defense dollars and 2 additional aye-votes due to a lack of defense dollars.

So let’s boil this down to one number. Out of the 422 votes, maybe about 4 no-votes were due to the influence of defense contractor campaign contributions. Even in a tight vote like this, that wouldn’t have affected the outcome: had those 4 members voted Aye instead, the amendment would still have failed, 209 to 213. And it’s still a big maybe. This is a minuscule correlation that is probably due more to random chance than to any actual influence of money.

(In a linear regression model, the adjusted r-squared roughly doubles when we put the factors together.)
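For readers who want to reproduce the regression comparison: adjusted R-squared penalizes R-squared for the number of predictors. The post does not give the R-squared values or the exact model, so the sketch below assumes a linear model of the vote (coded 0/1) with one or two predictors; the `adjusted_r2` helper is hypothetical and any inputs you feed it are your own.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a linear model with n observations and
    p predictors (not counting the intercept):
        1 - (1 - r2) * (n - 1) / (n - p - 1)
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# With n = 422 votes the penalty for a second predictor is tiny, so a
# doubling of adjusted R-squared reflects a real gain in fit, not an artifact.
print(adjusted_r2(0.1, 422, 1), adjusted_r2(0.1, 422, 2))
```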

[end of update]

What does it mean?

Since we have two predictors that are about equally good, and one has nothing to do either with defense or money, there’s no reason to think that defense cash had anything directly to do with the outcome of this vote.

Campaign cash obviously plays a role in our political system. In particular, only candidates who can raise cash can run for office. I’ve written about that in my book if you want to know what I think in more detail.

But if you want to relate industry cash to a particular vote, you’re going to have to at least beat other explanations that aren’t based on that industry’s cash.

So here’s the thing: it’s important that we actually tell truthful stories, not just ones that we can spin to match our beliefs.

[update: added 8/19/2013] Ben Klemens, a statistician, has turned this data into an interesting logit model that does a better job of quantifying the effect of money on the vote: post 1, post 2. [end of update]

Analysis details

After merging the vote and ideology data from GovTrack with the campaign contributions aggregated by MAPLight into a single table (download), I ran the following script in R:

data = read.table("table.csv", header=TRUE, sep=",")
attach(data)

# There were 205 Aye-votes.
num_ayes = sum(vote=='Aye')

# Group legislators by how much defense contractor money they received.
# Call the bottom 205 legislators the 'Less$' group, and the remaining 217
# the 'More$' group.
defense_dollars = ifelse(rank(contribs) <= num_ayes, 'Less$', 'More$')

# Group legislators by how far their GovTrack ideology score is from
# the House median. Call the most extreme 205 legislators the 'Extreme'
# group, and the remaining 217 the 'Moderate' group.
distance_from_center = abs(ideology - median(ideology))
is_extreme = ifelse(rank(-distance_from_center) <= num_ayes, 'Extreme', 'Moderate')

# Cross-tabulate each grouping against the actual vote. print() is needed
# because top-level results are not auto-printed when run as a script.
print(table(vote, defense_dollars))
print(table(vote, is_extreme))

# Correlation between the two predictors.
cat("cor(contribs, distance_from_center) =", cor(contribs, distance_from_center), "\n")

# Swap test: 'A' = the 74 Extreme legislators with the most defense cash,
# 'B' = the 74 Moderate legislators with the least. Everyone else keeps
# a placeholder label ('0' for Extreme, 'Z' for Moderate).
swap_size = 74
group = ifelse(is_extreme=='Extreme', '0', 'Z')
group[is_extreme=='Extreme'][rank(-contribs[is_extreme=='Extreme']) <= swap_size] = 'A'
group[is_extreme!='Extreme'][rank(contribs[is_extreme!='Extreme']) <= swap_size] = 'B'
print(table(vote, group))
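For readers who don't use R, here is a rough Python equivalent of the swap-group labeling at the end of the script. It is a sketch, not the original code: the `swap_groups` function and its list-based inputs are hypothetical, and ties in contributions are broken by list order, whereas R's `rank()` averages tied ranks by default.

```python
def swap_groups(is_extreme, contribs, swap_size):
    """Label legislators for the swap test, mirroring the R script.

    is_extreme: list of booleans (True for the 'Extreme' group).
    contribs: defense-contractor contribution totals, in the same order.
    Returns labels: 'A' = the swap_size Extreme legislators with the most
    money, 'B' = the swap_size Moderate legislators with the least money,
    '0' / 'Z' = everyone else in Extreme / Moderate.
    """
    labels = ["0" if ext else "Z" for ext in is_extreme]
    extreme_idx = [i for i, ext in enumerate(is_extreme) if ext]
    moderate_idx = [i for i, ext in enumerate(is_extreme) if not ext]
    # Most money first within the Extreme group.
    for i in sorted(extreme_idx, key=lambda i: contribs[i], reverse=True)[:swap_size]:
        labels[i] = "A"
    # Least money first within the Moderate group.
    for i in sorted(moderate_idx, key=lambda i: contribs[i])[:swap_size]:
        labels[i] = "B"
    return labels


# Toy example with made-up numbers and a swap size of 1.
labels = swap_groups([True, True, True, False, False, False],
                     [100, 500, 300, 50, 400, 10], 1)
print(labels)  # ['0', 'A', '0', 'Z', 'Z', 'B']
```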

The House Appropriations committee passed up another chance to advance core transparency practices in Congress. In a draft report published this morning for FY2014 appropriations, the committee makes no mention of legislative data. And in the Bulk Data Task Force’s finally-released recommendations, the Library of Congress gets all worked up over something no one has been asking for.

Here’s the short of it. Can we get a spreadsheet simply listing all bills in Congress? Is that so hard? I guess so.

After last year’s legislative branch appropriations bill report said the committee was “concerned” that the public would misuse any bulk data downloads, The Washington Post covered how the public uses this sort of data for good, and House leadership formed a Bulk Data Task Force to consider if and how to make bulk legislative data available. That task force submitted recommendations to the House Appropriations committee last December, but they were only made available to the public last week (see this, page 679).

In the recommendations, the task force noted that it had begun several new transparency projects. One is the Bill Summaries project, in which the Library of Congress will begin to publish the summaries of House bills written by the Congressional Research Service (CRS) in some structured way. The Library of Congress’s report to the task force has some choice quotes:

“some groups may try to leverage this action to drive demand for public dissemination of CRS reports” (Note that “CRS reports” are different from “CRS summaries.” That’s a whole other can of worms.)

“CRS could find itself . . . needing to clarify misrepresentations made by non-congressional actors”

“if there is an obligation to inform the general public to the risks of non-authoritative versions of the information, it has not been included in the estimates”

These CRS summaries have already been widely distributed… on GovTrack… for nearly a decade. (And, I’m sorry, but what risks am I causing?) And while I wouldn’t mind having the summaries easier to get from the Library, I certainly am not gunning for them. I want data like the list of cosponsors, what activities bills have gone through, or just a simple list of bills. If the Library thought this wasn’t a great place to start with bulk data, well, I couldn’t agree more!

Some of the other projects mentioned in the recommendations are indeed very useful (some of which I wrote about here). Others, however, touted bulk data success without making any new data available. In the meeting minutes in the recommendations’ appendix, the task force wrote that it discussed “what data is available on GovTrack compared to what would be available through the proposed GPO project.” Quite a bit! That proposed GPO project turned into the one that made no new data available. At their next meeting they met with me and folks from other groups (Sunlight, Cornell LII, and so on), but, oddly, I don’t recall them asking me the question they had posed the week before.

The other projects mentioned in the bulk data task force recommendations are:

Congress.gov, THOMAS’s upgrade, which is explicitly not providing any bulk data (except perhaps through the new Bill Summaries Project)
Member Data Update: The Clerk’s list of Members of the House now includes Bioguide IDs, which is fantastic and very helpful.
A new House History website has launched or will launch. See, I don’t even know. Again, not bulk data.
Docs.House.Gov: Committee schedules and documents have been added. (Great! I’m using that data on GovTrack already.)
New XML data for House floor activity. (This is pretty interesting but a little disorganized. I would rather scrape THOMAS than use this XML data.)
The Clerk is launching a Twitter account. (No data here.)
HouseLive speaker search. (Searching videos. Data? Who knows.)
STOCK Act public data disclosure.
Legislative Data Dashboard (not quite sure what this is).
Converting the United States Code to XML. (This is a big and commendable project.)
A contest to get the public to convert bills to the Akoma Ntoso XML data format. (Does not count as open government data if the public has to do the work.)
Replacing MicroComp (an old bill/report text drafting tool?).
Positive Law Codification (when did that become in scope for this task force?).
Editorial Updating System (no idea what this is).

So while the recommendations support the use of legislative data generally, they set no long-term goals for broad access to the legislative data on THOMAS. And as for the only data in motion now, the Library of Congress appears not to be happy about making it widely available.

The committee report for the annual legislative branch appropriations bill, which kicked off the task force last year, has been an important document for legislative transparency in the past. Besides last year’s step backwards, in 2009 the report indicated the House supported “bulk data downloads” for the bill status information on THOMAS.gov, though nothing came of it. This year the committee said nothing, so, well, I guess nothing will come of it either.


[Updated] Defense dollars aren’t a better predictor of the Amash vote

The legislative data dance is a song that never ends