Congressional Disbursements Data Release Evaluation

This week the House of Representatives began publishing disbursements online. Disbursements include how much congressmen and their staffs are paid, what kinds of expenses they have, and whom they pay for these services. This is a great case study in how to do transparency: there are a lot of wins here and several points to learn from.

The best thing I see so far is the documentation provided on the disbursements website. There is a nice explanation of the reporting process, a FAQ, and a glossary. There is also a table of the transaction codes found in the document. All of these are crucial for anyone reading or analyzing the information. This is one of the best examples of documentation I’ve seen for government data of this kind.

Here’s an evaluation of the disclosure based on standards I’ve drawn from others and outlined here.

In summary, many of the goals are met. The important ones not met are machine processability, public input, and public review. Machine processability matters most, and failing to meet it seriously undermines much of the reason for publishing the information in the first place.

Information is not meaningfully public if it is not available on the Internet for free.


Data Should Be Primary. Primary data is data as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.

We can evaluate this because the documentation actually describes how the House Clerk receives the information. It talks about some degree of aggregation taking place, such as a $10,000 travel record not necessarily being for one trip. But by and large we’re seeing the level of detail that I think is expected.



The Statement of Disbursements (SOD) is published quarterly, and hopefully this will include the online/electronic version going forward. I don’t think we can expect much more than that right now. Ten years from now perhaps I would like to see real-time expense reporting, but not today.

— TO BE SEEN (if the electronic version is published as often as the print version in the future)

Accessible. Data are available to the widest range of users for the widest range of purposes.

This goal from the “8 Principles” refers to:

Data Format: PDF is an open standard, and therefore a good choice among the data formats for the purpose of publishing a document. (See more below.)

Bulk Data: The 3,400-page document is provided in a single PDF file, rather than hundreds or thousands of separate downloads. This satisfies the goal of bulk data.

Documentation: Documentation is excellent. There is an explanation of the reporting process, a FAQ, a table of codes, and a glossary.


Machine processable: Data are reasonably structured to allow automated processing.

This is the first goal that is not addressed at all by the data release. While PDF is good for documents, it is bad for tabular information. It does not support sorting, transforming, or other analysis, and it only marginally supports search. A spreadsheet format of any sort would be useful here, some formats better than others.

Considering the size of this data set, the information is far less useful than it could be without the help of computers to process it. Being given a barely searchable file of more than 3,000 pages is only a small step up from being mailed several reams of paper.

Furthermore, there is no indication on the disbursements website that this will be considered in the future.
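To make concrete what a spreadsheet-style release would enable, here is a minimal sketch in Python. The column names and every row are invented for illustration. They are not taken from the actual document, and I am only assuming a release could be laid out as one transaction per row.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows: offices, payees, categories, and amounts are all invented.
SAMPLE_CSV = """office,payee,category,amount
Office A,Example Travel Co,TRAVEL,1250.00
Office A,Example Supplies Inc,SUPPLIES,310.45
Office B,Example Travel Co,TRAVEL,980.10
Office B,Example Printing LLC,PRINTING,2200.00
"""


def load_rows(text):
    """Parse CSV text into dicts, converting each amount to a Decimal."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amount"] = Decimal(row["amount"])
        rows.append(row)
    return rows


def total_by(rows, key):
    """Sum the amounts grouped by the given column."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


def largest(rows, n):
    """Return the n rows with the largest amounts, biggest first."""
    return sorted(rows, key=lambda r: r["amount"], reverse=True)[:n]


rows = load_rows(SAMPLE_CSV)
totals_by_office = total_by(rows, "office")
top_two = largest(rows, 2)
```

Grouping and sorting like this takes a few lines once the data is tabular. With a PDF, the same questions first require extracting and cleaning the text of every page.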


Non-discriminatory: Data are available to anyone, with no requirement of registration.
Non-proprietary: Data are available in a format over which no entity has exclusive control.
License-free. Dissemination of the data is not limited by intellectual property law or other terms.


Promote analysis: Data published by the government should be in formats and approaches that promote analysis and reuse of that data.

This goal (and the next two) comes from the Association for Computing Machinery’s Recommendation on Open Government and is similar to the Machine Processable goal above. So, see above.


Safe file formats: Government bodies publishing data online should always seek to publish using data formats that do not include executable content.

PDF is, relatively speaking, a safe file format.


Provenance and trust: Published content should be digitally signed or include attestation of publication/creation date, authenticity, and integrity.

According to the disbursements website, the files are digitally signed. I haven’t verified that the signature process was done correctly.
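A simpler check than validating the embedded signature is comparing a file's hash against a digest the publisher posts separately. The sketch below shows that integrity check only. It is not PDF signature verification, and I am assuming a published SHA-256 digest exists, which the disbursements website does not say. The bytes used here are a placeholder, not the real file.

```python
import hashlib
import hmac
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(stream, published_hex):
    """Compare the stream's SHA-256 with a digest published elsewhere (assumed to exist)."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_stream(stream), expected)


# In-memory placeholder bytes standing in for the downloaded PDF.
placeholder = b"%PDF-1.4 placeholder bytes, not a real document"
digest = sha256_of_stream(io.BytesIO(placeholder))
```

For a file on disk, the same functions work on `open(path, "rb")`. A matching hash only shows that the file is the one the digest was computed from. It says nothing about who signed it.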


Public input: The public is in the best position to determine what information technologies will be best suited for the applications the public intends to create for itself.

I am sure members of our community have been in touch with the Clerk’s office. However, there was no public discussion of how these files ought to have been made available, and therefore I am not going to count this goal as having been met.


Public review: There should be a means for the public to interact with the data publisher both while the data is being created and after it has been published. The public may have questions or may find errors. The process of creating the data should also be transparent.

This is a goal rarely given any attention. The documentation gives significant insight into this process, but no contact person for this data set is made known to the public.


Interagency coordination: Interoperability makes data more valuable by making it easier to derive new uses from combinations of data. To the extent two data sets refer to the same kinds of things, the creators of the data sets should strive to make them interoperable.

There is a potential to link the names of Members of Congress to their ID numbers provided in, say, the XML voting records.
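As a rough sketch of what that linking could look like, the Python below normalizes a name as it might appear in a disbursement record and looks it up in a table of ID numbers. The names, the ID values, the "HON." prefix, and the record layout are all hypothetical. I have not checked how names are written in either data set.

```python
import unicodedata

# Hypothetical lookup table keyed by normalized "last, first" names.
MEMBER_IDS = {
    "doe, jane": "X000001",
    "roe, richard": "X000002",
}


def normalize_name(raw):
    """Strip accents, case, extra spaces, and an assumed 'Hon.' prefix; return 'last, first'."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = " ".join(text.casefold().split())
    if text.startswith("hon. "):
        text = text[len("hon. "):]
    if "," not in text and " " in text:
        first, _, last = text.rpartition(" ")
        text = f"{last}, {first}"
    return text


def link_records(records, member_ids):
    """Attach a member_id to each record that matches; collect the rest separately."""
    linked, unmatched = [], []
    for record in records:
        member_id = member_ids.get(normalize_name(record["member"]))
        if member_id is None:
            unmatched.append(record)
        else:
            linked.append({**record, "member_id": member_id})
    return linked, unmatched


records = [
    {"member": "HON. Jane  Doe", "amount": "1250.00"},
    {"member": "Roe, Richard", "amount": "980.10"},
    {"member": "Hon. Sam Unknown", "amount": "15.00"},
]
linked, unmatched = link_records(records, MEMBER_IDS)
```

Matching on names is fragile: two members can share a name, and spellings drift. That is exactly why the publisher including the ID numbers directly would be the better fix.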


Permanent Web Address: The file should have a stable location.

— GOAL ACHIEVED (provided it is kept there)

Globally Unique Identifiers: This concept, important on the World Wide Web, is that any document, resource, data record, or entity mentioned in a database (or, some might say, every paragraph in a document) should have a unique identifier that others can use to point to or cite it elsewhere.


Linked Open Data: This is a method for publishing databases in a standard format for interconnectivity with other databases without the expense of wide agreement on unified inter-agency or global data standards.

