Printable Congressional District Maps: Behind The Scenes

Today I’m releasing print-quality maps of congressional districts, with street-level detail and county border lines. Judging by the emails I’ve received over the last four years or so, this has been one of the most sought-after resources, and I don’t think you can find it anywhere else. (At least not comprehensively for the whole nation. State and local clerks’ offices may have them. NationalAtlas.gov has maps, but without much detail.)

This was a solid 2-day project with fewer than 300 lines of code, and it’s something that only recently became this easy to do. I used Amazon Web Services (AWS), Census TIGER/Line cartographic data in an AWS public data snapshot, OpenStreetMap for the street detail in an AWS snapshot prepared by MapBox.com, Mapnik to render the maps (pre-installed on an AWS machine image prepared by MapBox), and the Python modules osgeo (for OGR) and PIL.

Here’s what I did. This took a lot of trial and error, but in the end the steps were relatively simple.

Setting up the EC2 instance and the OpenStreetMap (OSM) planet data:

  • Start up a new Amazon EC2 Linux instance using the AWS machine image (AMI) prepared by MapBox linked above.
  • Create Amazon Elastic Block Store (EBS) volumes from the snapshots of the two data sets (OpenStreetMap and Census TIGER/Line) in the same availability zone as the EC2 instance. If you do it in the AWS console, you’ll just need to search for the snapshots by ID or name (see the links above).
  • Attach the two volumes to the running EC2 instance as /dev/sdf (OSM) and /dev/sdg (TIGER).
  • Log into the EC2 instance with SSH.
  • Mount the two volumes: mkdir /mnt/osm; mount -t ext3 /dev/sdf /mnt/osm; mkdir /mnt/tiger; mount -t ext3 /dev/sdg /mnt/tiger
  • Following the MapBox instructions, attach the OSM data to Postgres, change the Postgres configuration to remove password protection, and restart Postgres.

To set up Mapnik, I followed the OpenStreetMap wiki, which shows how to reuse their map styling. Most of the steps can be skipped because the data has already been set up in Postgres by MapBox. What remained was:

  • Getting the OSM Mapnik files from their SVN repository.
  • Downloading some extraneous boundary information.
  • Creating a new style definition, based on the OSM defaults, that controls how map features are rendered.
  • Editing the defaults a) so it actually works, and b) so it looks good at high DPI for printing (increasing font sizes, removing some icons). This took a lot of trial and error since I didn’t understand what was going on and regenerating a map takes some time.

The last step was to write a Python script that invokes Mapnik for each congressional district and generates a high-resolution map image.

  • The Census’s TIGER/Line cartographic data has a Shapefile-format file for each state containing the congressional districts in the state. The osgeo/OGR Python module can load the file and tell you the latitude/longitude bounds of the congressional district (among other things).
  • Then the Mapnik Python bindings are used to create a new map with the given size, loading in the OSM street data.
  • Additional layers are added from the TIGER/Line data for place names (CDPs and county subdivisions if you’re familiar with Census data), county names and borders, state borders (and shading of other states), and the boundaries of the congressional district itself and shading of other congressional districts.
  • After rendering the map, which takes ~30 seconds, I used the Python Imaging Library module to add header and footer text with a nice translucent effect.
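
The bounding-box step in the list above can be sketched without any GIS libraries. This is a rough illustration, not the actual script: it takes a district’s latitude/longitude bounds (the kind of extent OGR reports), pads them by a margin, and works out pixel dimensions for a given print width and DPI. The function names, the 5% margin, the 10-inch width, and the 300 DPI default are all assumptions made up for the example, and scaling longitude by the cosine of the latitude is only a crude stand-in for a real map projection.

```python
import math


def pad_bounds(min_lon, min_lat, max_lon, max_lat, margin=0.05):
    """Grow a lon/lat bounding box by `margin` (a fraction) on every side."""
    dx = (max_lon - min_lon) * margin
    dy = (max_lat - min_lat) * margin
    return (min_lon - dx, min_lat - dy, max_lon + dx, max_lat + dy)


def image_size(bounds, width_inches=10.0, dpi=300):
    """Return (width_px, height_px) matching the box's approximate aspect ratio."""
    min_lon, min_lat, max_lon, max_lat = bounds
    mid_lat = math.radians((min_lat + max_lat) / 2.0)
    # A degree of longitude covers less ground than a degree of latitude,
    # by roughly a factor of cos(latitude).
    ground_width = (max_lon - min_lon) * math.cos(mid_lat)
    ground_height = max_lat - min_lat
    width_px = int(round(width_inches * dpi))
    height_px = int(round(width_px * ground_height / ground_width))
    return width_px, height_px


if __name__ == "__main__":
    # Hypothetical district extent, for illustration only.
    padded = pad_bounds(-77.2, 38.8, -76.9, 39.0)
    print(padded, image_size(padded))
```

The resulting width and height are what you would hand to the renderer when creating the map, and the padded bounds are what you would zoom it to.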

Generating the maps at three resolutions for all of the congressional districts (except at-large districts) took several hours, so I let it run overnight. They’re stored on Amazon S3 (the s3cmd tool is really useful for that).
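
The overnight batch boils down to a loop over districts and resolutions. Here is a minimal sketch of how the job list might be planned, under stated assumptions: the three pixel widths are hypothetical (the actual sizes aren’t given here), the file-naming scheme is invented, and the district codes treated as “no map needed” are an assumption about how TIGER/Line labels at-large seats and delegates.

```python
# Assumption: TIGER/Line marks at-large seats with district code "00" and
# non-voting delegates with "98"; check the data dictionary for your vintage.
SKIP_CODES = {"00", "98"}

# Hypothetical output widths in pixels; the real three sizes are not stated.
WIDTHS_PX = (1024, 2048, 4096)


def plan_jobs(districts, widths_px=WIDTHS_PX):
    """Return one (state, district, width_px, filename) job per map to render.

    `districts` is an iterable of (state_abbreviation, district_code) pairs.
    """
    jobs = []
    for state, code in districts:
        if code in SKIP_CODES:
            continue  # a statewide district gets no district map
        for width in widths_px:
            filename = "{}{}_{}.png".format(state, code, width)
            jobs.append((state, code, width, filename))
    return jobs


if __name__ == "__main__":
    for job in plan_jobs([("NY", "01"), ("VT", "00")]):
        print(job)
```

Planning the jobs up front makes it easy to restart a run that dies partway through: skip any job whose output file already exists.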

There’s still room for a lot of improvement. After playing with the style instructions I ended up with too much local road detail, which in some places just ruins the whole map at low resolution. And in many places the county names aren’t showing up, maybe because there’s too much detail. It’ll take some more trial and error to fix.

The source code (which includes all of the preparation steps in detail) is posted here.

Who’s been visiting our Assistant Deputy CTO for Open Government?

The White House began publishing its visitor logs — with sensitive information removed.

Honestly, I don’t really get what the big hubbub is over this information. First, are corrupting influences on the administration really going to stop corrupting because of this? And, as a corollary, who exactly is in a position to be reading over the records to make sure nothing bad is going on? Who are these visitors anyway?

But I always enjoy playing with data all the same. To do it with a little levity, I thought I would profile Robynn Sturm’s visitors. I met Robynn recently and certainly got the feeling that, of all the people who could hold the title of Assistant Deputy Chief Technology Officer for Open Government for the United States of America, she was a good person for the job.

Anyway, in September-October 2009, she had 35 visits. I only have the names and can’t be sure of who they are, but I’ll do my best to give Google search results that might be reasonable. Text comes from the pages I’ve linked to.

Ellen Alberding and Gretchen Sims are the president and education program manager, respectively, of the Joyce Foundation, which supports efforts to protect the natural environment of the Great Lakes, to reduce poverty and violence in the region, and to ensure that its people have access to good schools, decent jobs, and a diverse and thriving culture. Sims donated to the Democrats in the last two presidential elections. (I wouldn’t have mentioned it except that that’s how I found out where she worked.)

Ethan Batraski (@ethanjb): startup co-founder, mathematician, machine learning researcher, techanista, sharing thoughts on product management, startups, venture funding & semantic web

Marc Berejka worked in senior government affairs roles at Microsoft, including eight years as a lobbyist for the high-tech giant. Says Politico: “Opponents of the Obama administration’s position on patent reform say that David Kappos and Marc Berejka, who recently took top jobs in the Commerce Department, are wielding too much influence over a policy that stands to benefit both of their former companies.”

Lawrence Brandt, a co-editor of Digital Government, is a program manager within NSF.

Gerard Fiala is the staff director in the Senate HELP committee’s subcommittee on employment and workplace safety.

Seena Jon Ghaznavi (@sjgood) is a young actor in the movie Death of a President.

Michael Harding – This name is too popular.

Greg Horowitt and Victor Hwang are co-founders and Managing Directors of T2 Venture Capital, a venture fund focused on breakthrough technology spinning out of government and academia. Horowitt is also Director and Co-Founder of the Global CONNECT program based at the University of California, San Diego, and is a key thought leader in the field of ‘innovation systems’, and their relevant applications for sustainable regional economic development through technology commercialization.

Ester Lee might be the Ester Lee that works for AT&T. But maybe not.

Joseph Mancio – another somewhat popular name.

Dominic Mauro (@mynameisdom) is a TA for Internet Law at NY Law School (which, for reference, is the school that Beth Noveck comes from; Beth is Robynn’s boss).

Sara Mirsky is co-president of the NYLS chapter of the American Constitution Society. (See the note above about NYLS.)

Courtney Patterson (@cnpatterson) is an obsessive-compulsive law student in NYC. (We probably know at which school.)

Gina Wells is another common name.

Phillip Wickham is President and CEO of the Kauffman Fellows Program at the Center for Venture Education in Palo Alto, CA. The mission of the Kauffman Fellows Program is to develop the next generation of leaders in venture capital.

John Bell is way too common a name.

Pamela Frugoli works for the Department of Labor.

Daniel Gomez might be a lawyer.

Melissa Sperry is too common a name.

Meredith Stewart is another popular name.

Haley van Dyck was part of the Obama ’08 campaign team, according to the LinkedIn page of her step-mom, who, btw, is proud of her.

Jing Vivatrat is either an M&A businesswoman or an FCC director, or both.