My goals

I get this question a lot so…

What led you to develop this tool? (2001)

I began working on GovTrack in early 2001 after taking a college class that introduced me to the realities of the political system: Policy isn’t developed meritocratically, that is, on the merits, but instead is the result of a process in which groups have unequal (and diverse types of) power. Information asymmetry (groups have different access to information) was a sort of power imbalance that I could attempt to fix, by making information about the U.S. Congress more readily accessible (and actionable) to a wider audience. GovTrack launched in 2004.

POPVOX (2010)

In 2010 I became interested in how Americans share their opinions and knowledge with Congress. Folk wisdom was that signing a petition was an ineffective means of advocacy. What would work better? With two other cofounders, I launched POPVOX to a) aggregate and quantify constituent sentiment for bills, and b) improve the way constituents share their rich knowledge about how policy would affect them. I quit the company in 2012.

Jonathan Zucker and I began exploring a possible venture in 2014 and launched it in early 2015. After more than 10 years of GovTrack, and several years at a second civic technology startup, I was motivated to find a way to make money less influential in policy-making through a means other than campaign finance reform. (I don’t oppose reform necessarily, but I am not a policy guy.) The venture attempts to shift the balance of power away from moneyed interests by making small-dollar donors more powerful in elections, by making their contributions more targeted and better aggregated.

What goals do you have with the tool?

My goal here is simple: People come to the site seeking information about what is happening in the U.S. Congress, and I try to provide a comprehensive, contextual, and clear answer. Each person who learns something from the website is a success story. I hope to make Americans better equipped to deal with government, or, for users who are legislative professionals, I hope to provide better information, at a lower price (namely, free), than they could get from a paid service.

See above. Our goal with this website is to change the way politics is done in America.

What were the challenges encountered and the results so far?

There has only ever been one challenge, and that has been identifying the right product to develop (i.e. the right problem to tackle). There are no easy problems in the civic space, and it takes years to learn how the world works well enough to have a realistic picture of how a new product could fit into that world and make a change.

The results here have been phenomenal. GovTrack serves around 10 million individuals each year, either through the website directly or through one of the many websites and apps that reuse GovTrack’s open data. GovTrack is regularly cited in the media. And in a recent survey, users reported GovTrack makes them more confident about approaching government.

It is too soon to know what the results have been so far.

Pew asked Americans what they think about open data

TL;DR: Pew’s numbers are probably way off, they’re presented in a completely misleading context, and I don’t know why anyone would want to survey the American public for familiarity with technical jargon.

Yesterday Pew published a survey of how Americans feel about open government data. Don’t believe their lies. (This is a Memento reference…)

Pew says “relatively few” when they mean “OMG actually a lot!”

Pew’s report on their survey suffered from the very flaw usually attributed to open data itself: lack of context. Their numbers are grossly (and negligently) misleading without context. So if anyone thinks Pew is in some high-up position to come and judge open data, I’d rethink that. For instance, Pew said:

Relatively few Americans reported using government data sources . . . 20% have used government sources to find information about student or teacher performance.

They think 20% is a little. But only about half of adult-aged Americans (to match their sample) are students or parents of students. So of the people who actually might care about student/teacher performance, something like 40%, approaching half, used government data. To me that’s huge! If Pew is going to slip in a judgment about whether this is a lot or a little, they ought to substantiate it and not pass it off as if it were a finding.
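The back-of-the-envelope arithmetic behind that claim can be sketched as follows. This is only an illustration: the 20% figure is Pew's, the "about half" figure is the rough estimate given above, and the variable names are made up for the example. It also assumes that essentially everyone who looked up student or teacher performance is a student or a parent of one.

```python
# Pew's reported figure: share of ALL adults who used government sources
# to find information about student or teacher performance.
share_of_all_adults = 0.20

# Rough estimate from the text (not from Pew): share of adults who are
# students or parents of students, i.e. the people likely to care.
share_likely_to_care = 0.50

# Assumption: nearly all of those users come from the group that cares,
# so the usage rate inside that group is the overall rate divided by
# the group's share of the population.
share_within_group = share_of_all_adults / share_likely_to_care

print(f"{share_within_group:.0%} of the people likely to care used government data")
```

Under those assumptions the rate within the relevant group comes out to roughly two in five, which is a very different picture from "relatively few."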

Pew asked questions we know Americans don’t know the answer to

Pew asked their panelists how often they made use of government data. We know Americans don’t always know when the services they are using are government services, so there’s no reason to think Pew’s panelists had any idea how often they made use of government data.

I think there were a few surveys about this a few years ago, but in one covered here, almost half of those who took Pell grants, unemployment insurance, and other forms of government assistance believed they had not ever used a government social program.

Why should Americans know our jargon?

The survey is like asking Americans how they think TCP/IP will affect government services. (What do you think the results of that survey would be?) TCP/IP is the protocol that underlies the whole Internet — it’s super important. But there’s no reason to think the American public would be, or should be, familiar with our technical jargon.

TCP/IP, like government data, is technical jargon that refers to a means, and not an end. The open data community has the unfortunate habit of talking about open data as if it were an end in itself. It’s not. It’s in the service of other goals (better government service delivery, for instance).  Do Americans like free weather reports? Then they probably like government data even if they don’t know that that’s what we call it.

The survey tells us about Americans’ familiarity with our technical jargon. If I’m doing my job right in informing and empowering Americans, then they won’t know my technical jargon and will just get to be informed and empowered. And, so, Pew’s survey doesn’t tell us anything about whether Americans use government data or what they think about its importance.

#Hack4Congress: An event where citizens can make Congress better

Self-governance is hard — and it is getting harder. When Congress first convened in 1789, the nation entrusted its lawmaking powers to just 79 people. Today Americans elect 541 federal lawmakers who then hire tens of thousands of staff members to help them write law and connect with constituents, lobbyists, and campaign supporters. The laws they write are hundreds or thousands of pages of unintelligible instructions to the nation’s codifiers and check-writers.

It is a hot mess. But it’s our mess.

My goal is to enable Americans, including congressional staff, to more effectively carry out our self-government responsibilities. As Mark Schmitt recently asked, “how do we reform American politics so that [our] pluralistic vision . . . might actually describe reality?”

On April 30-May 1, join me, The OpenGov Foundation, Harvard’s Ash Center, and other colleagues for #Hack4Congress in DC where we’ll try to make our mess of self-governance just a little bit tidier.

We’re going to problem-solve how we can make self-governance better. That includes both issues we face as citizens keeping Congress accountable as well as issues faced by congressional staff as they do their best to represent their constituents.

Who should come? Anyone with a passion for Congress is welcome. If you like to imagine and design products, research wonky but very real problems, translate techno-speak, or develop software, you will be welcome. You can be a hobbyist or a professional.

The event culminates with a presentation session before a panel of judges (I’ll be one) who are practitioners, scholars and others active in the civic tech and data space. Finalists will present their solutions to high-level congressional representatives this spring.

This is the third event in the #Hack4Congress series; the previous two were in Boston and San Francisco.

Why the return-on-investment of open data is the wrong question

Based on brief remarks I gave at Open Data Day DC 2015.

Open data is a set of practices. It is a community around those practices. And it is a set of values that we bring to problems that we’re tasked to solve.

Open data is a lot like voting. On election day, voting is a messy process, and not everyone wants to do it. It’s expensive to buy and maintain all of those voting machines, to take the day off from work, to do recounts when something goes wrong. It’s confusing. There are a lot of local positions I’m not familiar with and I need the help of experts to effectively participate.

But if someone walked up to you at 10pm election night and asked you to demonstrate the return on investment of all of the day’s efforts, I think you’d say that that’s not the right question. You have to look back, first, at the history of how we got to vote. And then you have to be patient and look forward for change and evolution in government and the new policies that might be enacted years if not decades later, to know whether the vote was “successful.”

So it goes for open data. We should invest in learning and perfecting the methods of open data — how you publish it, get it, analyze it, and so on — and about the values of open data. But always keep in mind that these skills and ideas are in the service of the problems that brought us to use open data in the first place: government corruption, consumer choice in the marketplace, more effectively telling a story, widening access to justice, and so on. Those are big problems, and when we bring open data to the table we must remember to evaluate it in the context of playing the long game for specific social change or other goals.

Campaign finance reading list

Thomas Stratmann. 2005. Some talk: Money in politics. A (partial) review of the literature. In Public Choice, volume 124.
Decades of academic research, and copious amounts of data, have failed to find any widespread influence of campaign contributions on the outcomes of roll call votes.

Joshua L. Kalla and David E. Broockman. Forthcoming. Campaign Contributions Facilitate Access to Congressional Officials: A Randomized Field Experiment.
A field experiment showed that campaign contributors get greater access to policymakers. “[T]he first randomized field experiment on the effects of campaign contributions on access to policymakers. In the experiment, a political organization attempted to schedule meetings between 191 Congressional offices and active campaign donors in their districts. . . . When informed prospective attendees were political donors, senior policymakers made themselves available between three and four times more often.”

Caitlin Macneal. June 9, 2014. GOP Rep. Acknowledges That Members Expect Donations For Votes. In Talking Points Memo Livewire.
It’s an open secret that large donors make tactical contributions. Macneal reports on an open admission of how this works. “McAllister told the crowd that an unnamed colleague told him on the House floor that if he voted ‘no’ on the bill, he would receive a contribution from Heritage, a conservative think tank. ‘I played dumb and asked him, “How would you vote?” ‘ McAllister said. ‘He told me, “Vote no and you will get a $1,200 check from the Heritage [Action]. If you vote yes, you will get a $1,000 check from some environmental impact group.” ‘ ”

Lee Jared Drutman. 2010. The Business of America is Lobbying: The Expansion of Corporate Political Activity and the Future of American Pluralism. Doctoral dissertation, U.C. Berkeley.
In a survey of lobbyists by Lee Drutman, the importance of fundraiser events was ranked near the bottom among 21 lobbying tactics. Drutman also reported that of businesses with a lobbying presence in Washington, D.C., just 24% maintain a PAC, the sort of organization they would need to make campaign contributions. (Of course, as Drutman pointed out, the sensitivity of admitting that fundraisers are a component of lobbying may have reduced their apparent importance.) (pages 11, 39)

Damon M. Cann. 2009. Sharing the Wealth: Member Contributions and the Exchange Theory of Party Influence in the U.S. House of Representatives.
Cann performed a thorough analysis of how transfers of money between congressional campaigns influenced committee chair assignments. Cann compared seniority, party unity, contributions to other candidates’ campaigns and other factors against who won and who lost of those House members seeking chair positions. On the bright side, it hasn’t always been about money. In the 104th Congress, the Speaker (Newt Gingrich) relied primarily on committee seniority when choosing his new set of committee chairs, following long-standing precedent. Chair selection in the 105th and 106th Congresses (under Gingrich and then Dennis Hastert) began to be influenced by campaign contributions to the party. An extra $30,000 could catapult the second senior Republican member into the chair. By 2001 and the 107th Congress, the seniority system had been abandoned. By the numbers, Hastert’s chair assignments from the 107th to the 109th Congress could be explained almost entirely by who had given the most to Hastert’s party and whether they had in the past voted in unity with the party. A similar but slightly less certain picture unfolded for the selection of the chairs of the Appropriations subcommittees.

Lynn Vavreck. Oct. 7, 2014. A Campaign Dollar’s Power Is More Valuable to a Challenger. In The New York Times / Upshot.
A dollar spent may be worth more to challengers than to incumbents. “[T]o earn one additional vote, the incumbent member of Congress had to spend roughly $200, while the mayoral challenger had to spend only $30 . . . Caps on money probably hurt challengers in both parties more than they hurt either individual party. A large amount of money in campaigns, often deplored, may actually hurt incumbents by helping challengers compete effectively.”

Eleanor Neff Powell and Justin Grimmer. 2014. Money in Exile: Campaign Contributions and Committee Access.
Some contributions are shown to be tied to whether a member of Congress holds a particular committee position. That is, some contributors are trying to shape the make-up of committees. “[W]e exploit committee exile—the involuntary removal of committee members after a party loses a sizable number of seats . . . We use exile to show that . . . [i]ndustries overseen by the committee decrease contributions to exiled legislators, and instead direct their contributions to new committee members from the opposite party.”

Phil Mattingly. August 28, 2014. The Super PAC Workaround: How Candidates Quietly, Legally Communicate. In Bloomberg Businessweek.
Candidates cannot coordinate their expenditures with other PACs that support them. This article shows how candidates are skirting the rules to communicate with Super PACs.

Ray La Raja. January 7, 2015. Campaign finance laws that make small donations public may lead to fewer people contributing and to smaller donations. In the London School of Economics and Political Science blog.
Donors, at least small donors, are reluctant to divulge personal information and put their contribution in the public record. Disclosure of personal information can decrease small money donations by half and can lead to donors making smaller donations to stay beneath reporting requirements.

Author anonymous. February 5, 2015. Confessions of a congressman: 9 secrets from the inside. In Vox.
“Campaigns are so expensive that the average member needs a million-dollar war chest every two years and spends 50 percent to 75 percent of their term in office raising money. Think about that. You’re paying us to do a job and we’re spending that time you’re paying us asking rich people and corporations to give us money so we can run ads convincing you to keep paying us to do this job.” “If a member of Congress doesn’t vote with his or her party 99 percent of the time, he’s considered unreliable and excluded from party decision-making.”

My 13-year campaign for legislative data finally comes to a successful end

Yesterday at a small meeting the Senate announced that it would be making its legislative data available to the public. This has been a long time coming.

The what & why

No legislative branch agency makes available a spreadsheet that lists every bill introduced in Congress. The issue is that simple. We’re finally going to get a list of bills in a useful data format, and, hopefully, a lot more information on top of it, some time next year.

I first asked the Library of Congress for access to its database of legislation in 2001 when I began building GovTrack. They said no, under orders from the House and Senate, and so I began “screen scraping”, or reverse engineering, their public website for the same information and making that data freely available to others. The data is what you need to create large-scale visualization, analysis, and tools, such as the ideology and leadership scores, bill prognosis, email updates, legislator report cards, bill text paragraph permalinks, maps of congressional districts, advanced search, and much much more that I built on GovTrack.

And my data on GovTrack, rather than anything Congress produces, quickly became the authoritative source for legislative information. Endless apps have been built on top of the data I made available. Even Congress comes to me for data. Representatives embed the maps on GovTrack on their websites and ask me, from time to time, for their own voting statistics. The House Democrats use GovTrack’s data to keep their caucus informed, and many Senate offices load GovTrack data into their back-office systems.

The data is now collected in a community project on GitHub (which began in 2012 and was spearheaded by Eric Mill at the Sunlight Foundation, Derek Willis, and myself), but the right place for this data is Congress. I never wanted to be the linchpin of congressional information (except in so far as it provided me with a career, so… thank you Congress). Once the Senate begins actually making its data available, planned for next year some time, I hope to see Congress become the authoritative source for its own information.

The history

Advocacy around legislative data began in 2007. At the request of Speaker Nancy Pelosi, who was looking for ways to reform the House, a group of government transparency advocates issued The Open House Project report, co-written by myself and others and spearheaded by the new Sunlight Foundation. The report called for the House to make available the legislative data I had been asking for, among several other transparency recommendations. It was just seven years ago that “data” was something totally new to Congress. I surveyed the state of legislative data in 2008: there was not much. At that time the Senate had not yet even started publishing its voting records as data (in XML). Following the report, many of us worked with Senate staff to explain why making vote data available to the public was a good thing, and only in 2009 did they start making that available (see also 2007). In 2009 we also secured favorable language in the FY 2009 omnibus appropriations bill (see also 2008), but Congress’s support agencies largely ignored the directive to make data available.

John Wonderlich at the Sunlight Foundation, who had started The Open House Project, kept the advocacy going over the next several years. But the House, under Pelosi, was not very responsive to requests for more transparency during this time. Some headway was made, but not in legislative data.

The Republican take-over of the House in 2011 marked a major shift toward transparency. They began making much more data available and promised data about bills. When one representative strangely tried to put the kibosh on data in 2012, The Washington Post ran a story about it (and about me, which was flattering), which lit the fire under House leadership and led to the formation of the House Bulk Data Task Force. Advocates formed a new Congressional Data Coalition in 2014, spearheaded by Daniel Schuman at CREW, and we secured favorable language in the FY2015 legislative branch appropriations bill to keep the pressure on. The House task force during this time made some progress, but without cooperation from the Senate it wasn’t able to actually do much.

That’s what changed yesterday: the Senate is on board. This closes out what has been, for me, a 13-year campaign.

Daniel wrote more about the news here.

DC updates its open data terms of use: Round 2

Over the last few months DC has worked with the open data community to revise its outdated terms of use agreement. Here’s where we stand today, after DC’s second revision posted earlier today.

Background: Do I need a lawyer to hack?

Back in September I asked Do I need a lawyer to hack in DC? on the Code for DC blog. I had discovered that in exchange for access to the District’s data, civic hackers (including myself) were agreeing to very odd terms including not taking any legal action against the District. Imagine if the data reveals actual injustice. We’d have given up the right to use the legal system to make things right! See the Code for DC post for more on why I think these terms were bad policy, but in short: data isn’t “open” if it can only be used on capricious terms. Open government data must be license-free.

What’s been revised since then

The District’s Office of the Chief Technology Officer (OCTO) immediately engaged with me, Code for DC, and others in the open government community to fix these problems. To their credit, several OCTO staff members spent several hours talking through these issues with me on multiple occasions. They have really been putting in the effort to get this all right.

Little more than a week after my blog post, DC posted its first update to the terms, which Alex Howard covered here. That update removed two of the clauses that I noted were problematic:

  • the agreement not to take legal action against the District
  • the indemnification clause

The removal of those two clauses was a major improvement. But the rest of the updated terms, in the parts I cared about, were incoherent. They had intended to retain a requirement to attribute the District in all uses of District data, they explained to me, but the legal language they used to say it made no sense.

In a new update to the terms posted today, which followed additional conversations with OCTO, there were two more great improvements. These terms were finally dropped:

  • agreeing to follow all “rules”, a very ambiguous term
  • the requirement to attribute the data to the District in all uses of the data (it’s now merely a suggestion)

The removal of these two requirements, in combination with the two removed in September, makes this a very important step forward.

One of my original concerns remains, however, and that is that the District has not granted anyone a copyright license to use District datasets. Data per se isn’t protected by copyright law, but the way a dataset is presented may be. The District has claimed copyright over its things before, and it remains risky to use District datasets without a copyright license. Both the September update and today’s update attempted to address this concern but each created more confusion than there was before.

Although today’s update mentions the CC0 public domain dedication, which would be the correct way to make the District data available, it also explicitly says that the District retains copyright:

  • The terms say, at the top, that they “apply only to . . . non-copyrightable information.” The whole point is that we need a license to use the aspects of the datasets that are copyrighted by the District.
  • Later on, the terms read: “Any copyrighted or trademarked content included on these Sites retains that copyright or trademark protection.” Again, this says that the District retains copyright.
  • And: “You must secure permission for reuse of copyrighted … content,” which, as written (but probably not intended), seems to say that to the extent the District datasets are copyrighted, data users must seek permission before using them. (Among other problems, like side-stepping “fair use” in copyright law.)

With respect to the copyright question, the new terms document is a step backward because it may confuse data users into thinking the datasets have been dedicated to the public domain when in fact they haven’t been.