A friend of a friend asked me what I thought of the different options for raising money for a for-profit startup. In case this is helpful to others, here’s a quick rundown:
An angel investor, meaning a rich person you can convince to give you money. This is usually the best route if you can make it work, because if it works at all there is typically close alignment between what the investor wants and what the company wants.
Venture capital firm. This is for the very high risk / very high reward route. VCs expect that one out of forty of their investments will pay off, and so when it does it has to pay off big. But it turns into a work hard/play hard environment, and Silicon Valley culture is just completely awful. Once you get VC funding, they have a lot of leverage, and as you need more money you have to do more of what they say — even if it doesn’t agree with your vision.
Academic funding. I don’t have any experience here, but I expect it’s exactly the opposite of VCs – low risk/low reward. If that option is available, it might be great.
Foundations. The big foundations most often fund nonprofits, but some like the Knight Foundation do investments in for-profit companies too. It’s a little like working with VCs, but I think the culture is a lot better when you work with a foundation.
Crowdfunding (like Kickstarter). Crowdfunding is actually the closest to free money, because you have so many supporters/investors that none of them have any power. If some of them are your friends, that’s even better because you have an added incentive to spend the money wisely. But compared to angels/VCs, if you go this route you’re sort of on your own. An angel/VC is like an added team member that you don’t get with crowdfunding.
There is really nothing as good as bootstrapping (self-funding), meaning you don’t take any money and just make things work. As soon as you take someone else’s money, you give them control and ownership in your company. The longer you can put that off, the more ownership and control you retain — which makes for a happier life and possibly more money in the long term. The more you can do on your own while you’re waiting for other options to work out, the more you have to show when you start asking people for money. Bootstrapping usually means you have a good, short-term business model — like selling something to someone.
Depending on what you’re doing, there’s also government grants and contracts (a pain in the neck to apply for) and sometimes government contests (e.g. challenge.gov), or other sorts of contests. (All of these would fall under bootstrapping.)
A lot of funding comes down to who you know and developing relationships and a track record over time that eventually lead to something working out. Often you don’t really have a choice of funding method because you have to be lucky for anything to work out.
Disclaimers: GovTrack was bootstrapped, but it took 8 years to get to sustainability, so on balance it still may not have made a profit yet. POPVOX was angel-backed. if.then.fund is (currently) bootstrapped/self-funded.
In a study published this week, Facebook manipulated many of their U.S. users’ News Feeds by omitting 0-90% of posts containing either positive or negative content over the course of a week in 2012. They reported that those users wrote fewer positive and negative words (respectively) in their own posts, concluding that Facebook is a medium on which emotions spread, a case of “emotional contagion” to use their technical term.
Here’s what you need to know:
On average, no emotion actually spread
The number of positive words in their average user’s posts decreased from 6 words to… 6 words.
The first major omission in the study is the lack of individual-level statistics. While they reported aggregate numbers, such as having analyzed “over 3 million posts” totaling “122 million words” written by their “N = 689,003” users, and claimed implications for “hundreds of thousands of emotion expressions,” they omitted any discussion of whether and how individuals were affected in any meaningful way.
From their numbers, the average user wrote 4.5-5 posts totaling 177 words during the experimental week. Only 3.6% of those words — so about 6 words — were “emotional,” and they found that by omitting about half of emotional posts from users’ News Feeds that percentage would go down by 0.1 percentage points or less. A 0.1 percentage-point change is about 2/10ths of a word.
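The per-user figures fall straight out of the paper’s aggregate numbers; here’s the quick arithmetic as a Python sketch:

users = 689003.0
words = 122000000.0                        # "122 million words"

words_per_user = words / users             # about 177 words per user for the week
emotional_words = words_per_user * 0.036   # about 6.4 "positive" words
change = words_per_user * 0.001            # a 0.1 percentage-point shift: about 0.18 words

print(words_per_user, emotional_words, change)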
For most of their users, there was not even close to a measurable effect.
(The study did mention the Cohen’s d statistic of ‘0.02’ which is another way to say that there was an aggregate effect but basically no individual-level effect.)
The study has no test for external validity (was it about emotions at all?)
An important part of every study is checking that what you’re measuring actually relates to the phenomenon you’re interested in. This is called external validity. The authors of the Facebook study boasted that they didn’t think of this.
The paper quixotically mentions that “no text was seen by the researchers” in order to comply with Facebook’s agreement with its users about how it will use their data.
They didn’t look at all?
That’s kind of a problem. How do you perform a study on 122 million words and not look at any of them?
Are the posts even original, expressive content? The users might be sharing posts less (sharing is sort of like retweeting) or referring less to the emotional states of friends (“John sounds sad!”). The words in a post may reflect the emotions of someone besides the poster!
To classify words as “positive” or “negative” the study consulted a pre-existing list of positive and negative words used throughout these sorts of social science research studies. This comes with some limitations: sarcasm, quotation, or even simple negation completely cuts the legs out from under this approach. I actually think these problems tend to wash out in aggregate, but only when you have a large effect size.
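To make the limitation concrete, here’s a toy sketch with made-up word lists (not the lists the study actually used): a naive word count misreads negation, reported emotion, and sarcasm.

import re

POSITIVE = {"happy", "great", "love"}
NEGATIVE = {"sad", "awful", "hate"}

def score(post):
    words = re.findall(r"[a-z']+", post.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos, neg

print(score("I am not happy about this"))   # (1, 0): the negation is counted as positive
print(score("John sounds sad!"))            # (0, 1): the emotion is John's, not the poster's
print(score("Great, just great."))          # (2, 0): sarcasm reads as doubly positive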
The whole of Facebook’s reported effect on emotion could be due to one of the many limitations of using word lists as a proxy for emotion. They needed to demonstrate it wasn’t.
Methodological concerns
This study is not reproducible. While most research isn’t ever reproduced, the fact that it could be provides a check against the fabrication of results (and sometimes that’s how fabricators are caught). Facebook provides the only access to a network of this size and shape. It is unlikely they would provide access to research that might discredit the study.
The study also uses a strange analysis. Their experimental design was 2 X 9-ish (control or experiment X 10-90% of posts hidden), but they plugged the two variables into their linear regression in two ways. The first became a binary (“dummy”) variable in the regression, which is right, but the second became a weight on the data points rather than a predictor. That’s an odd choice. Do the results come out differently if the percentage of posts hidden is properly included in the regression model? Did they choose the analysis that gave the results they wanted to see? (This is why I say “about half of emotional posts” above, since the analysis is over a weighted range.)
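To illustrate the distinction, here’s a sketch with made-up data and column names (not the paper’s data or exact model) showing the two specifications in statsmodels: the dosage used only as a regression weight versus included as a predictor with an interaction.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-user data: condition (0 = control, 1 = positivity-reduced),
# pct_hidden (the "dosage": percent of qualifying posts omitted), and the outcome.
df = pd.DataFrame({
    "condition":          [0, 0, 0, 0, 1, 1, 1, 1],
    "pct_hidden":         [10, 30, 60, 90, 10, 30, 60, 90],
    "pct_positive_words": [5.2, 5.3, 5.25, 5.3, 5.1, 5.05, 5.0, 4.9],
})

# Roughly the paper's approach: condition enters as a dummy predictor and the
# dosage enters only as a weight on each observation.
weighted = smf.wls("pct_positive_words ~ condition", data=df,
                   weights=df["pct_hidden"]).fit()

# The alternative: treat the dosage as a predictor (with an interaction), so the
# model estimates whether hiding more posts actually produces a larger effect.
modeled = smf.ols("pct_positive_words ~ condition * pct_hidden", data=df).fit()

print(weighted.params)
print(modeled.params)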
Informed consent
Finally, there’s the problem of informed consent. It is unethical to run experiments on people without it. The paper addresses legal consent, in the sense that the users agreed to various things as a pre-condition for using Facebook, though being manipulated was probably not one of them (I don’t know what Facebook’s terms of service were in early 2012, unfortunately).
Certainly the consent didn’t reach the level of informed consent, in which participants have a cogent sense of what is at stake. There’s a great discussion of this at Slate by Katy Waldman.
Facebook’s users have a right to be outraged over this.
Keep in mind though that there are different ethical obligations for research versus developing a product. It could be ethical for Facebook to manipulate News Feeds to figure out how to increase engagement while at the same time being unethical for a research journal to publish a paper about it.
Last week I noticed that the sunset aligned unusually well with my cross-street, Newton St NW, and it made me wonder if we have any Manhattanhenge-like events in DC. DC can one-up Manhattan — we’ve got a double-henge, if you’ll let me coin a phrase.
The Double-henge
Here in Columbia Heights we have a unique street pattern. Two roads — Park Rd and Monroe St. — come to an apex on 14th St. They go north both to the east and west of 14th St. On a few days a year — centered on May 15 and July 29 — the roads point east toward sunrise and west toward sunset. Click the links to see on suncalc.net. (The alignment isn’t exact, so the effect spans a few days.)
All the henges
Like Manhattan’s, DC’s grid lines up with sunrise and sunset, but here it happens on the equinoxes, so we get a boring double-henge on those days too.
Some of the state avenues are kind of close to the solar azimuths on the solstices, but the peak days are a few days off. In the summer the alignment falls on the same days as the Columbia Heights Double-henge. On those days the avenues parallel to New York Avenue line up with sunrise and the avenues parallel to Pennsylvania Avenue line up with sunset. Around the winter solstice — Nov 5 and Feb 6 — the avenues parallel to Pennsylvania Avenue line up with sunrise and the avenues parallel to New York Avenue line up with sunset.
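As a rough sketch of the calculation behind these dates, you can estimate the sunrise azimuth at DC’s latitude for each day (a standard no-refraction approximation) and find the day that best matches a street’s compass bearing; the bearing below is a made-up placeholder, and suncalc.net does this more carefully.

from math import radians, degrees, sin, cos, acos
from datetime import date, timedelta

LATITUDE = 38.93  # Columbia Heights, DC

def sunrise_azimuth(day):
    # Approximate solar declination for this day of the year, in degrees.
    n = day.timetuple().tm_yday
    decl = -23.44 * cos(radians(360.0 / 365.0 * (n + 10)))
    # At sunrise, cos(azimuth east of north) = sin(declination) / cos(latitude);
    # the sunset azimuth is the mirror image (360 degrees minus this).
    return degrees(acos(sin(radians(decl)) / cos(radians(LATITUDE))))

street_bearing = 75.0  # hypothetical compass bearing of a street, degrees east of north
start = date(2014, 1, 1)
best = min((abs(sunrise_azimuth(start + timedelta(d)) - street_bearing), start + timedelta(d))
           for d in range(365))
print(best)  # the day whose sunrise best lines up with that street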
I wondered, for each day of the year, which DC road best aligns with sunrise and sunset. If you’re driving, these would also be the roads to avoid (h/t @knowtheory). Here’s a table for the next year. The links show exactly where each alignment is:
Over the last year I’ve had the opportunity to work with the DC Council on improving public access to DC’s laws. Today I join DC officials and the OpenGov Foundation on the Kojo Nnamdi radio show here in DC to talk about it, and in preparation for that I wrote this up as some notes for myself.
Civic hacking is a term for creative, often technological approaches to solving problems in our civic lives. It’s often about improving our interaction with government, so building an app to get more people to register to vote would be an example of civic hacking. You might be surprised that that’s what it means. “Hacking” is a homonym: a word that has multiple meanings. We’re all familiar with how it can mean cyber crime. But just as words like mouse, gay, and fluke each have totally unrelated meanings, so does hacking. The two meanings of hacking each have their own distinct communities. In my hacking community, we have organizations like Code for America and Code for DC trying to solve problems.
Codification is the process of compiling many statutes into an orderly, compact rendition of the law as it is today. Codification of laws began in 6th Century BC Athens. It wasn’t civic hacking. It was elites trying to protect their property. The Visigothic Code, written in Western Europe around 650 AD, directed “bishops and priests” to give a copy of the Code to the Jews to educate them about their heresy. So it goes. Actually it wasn’t all bad. The Visigothic Code also set a maximum price that the Code itself could be sold for (four hundred solidi, maybe $100,000 or more today), which perhaps was a form of ensuring wider access to it. Modern open records laws began in 18th Century China, where public announcements of promotions and government spending were common. Sweden enacted the first law creating a right to government records in 1766. And lay citizens have indeed long been users of the law. According to Olson (1992), “Pennsylvanians annoyed with what they thought to be unfair practices on the part of flour inspectors in the 1760s confronted the inspectors with copies of the laws.” (more history in my book)
The most important reason governments make the law available to the public is that ignorance of the law is not an excuse, and without access to the law one cannot properly defend oneself in court. Governments have an ethical obligation to promulgate the law.
But that is by no means the only reason why promulgating the law is important and useful. As the Law.Gov authors wrote, there are these other reasons: broader use of legal materials in education (e.g. to train better lawyers and better citizens with respect to how they interact with government) and in research (e.g. to better understand how government works so that we, as elected officials and advocates, can make our government operate better); “innovation in the legal information market by reducing barriers to entry”; “savings in the government’s own cost of providing these materials”; reducing the cost of legal compliance for small businesses; and “increased foreign trade by making it easier for our foreign partners to understand our laws.”
There are many dimensions to access. Access isn’t meaningful without understanding. There are a lot of reasons why one might not understand the law even if we have access to read the words. And that’s a hard problem. But it is not a reason to not provide access to it in the first place. Users of the law can’t learn how to understand it if they can’t see it, and it would be mighty paternalistic to write off any citizen as unable to learn how to understand it. We should promote understanding, but in the meanwhile we must still provide access.
An aspect of understanding is whether we are able to be taught by others, or, inversely, if we may only teach ourselves. Surprisingly, there are many reasons why it might be illegal to share the law with others to teach them about it. The two most common causes of this are website terms of service and copyright:
The only electronic source of the DC Code in early 2013 was a website run by the (foreign-owned) company Westlaw. Westlaw was under contract with DC to help with the actual codification process as well as providing electronic public access. But through its website’s terms of service agreement, anyone reading the law on the public website was granted access in exchange for giving up rights. The terms of service included: “[Y]ou will not reproduce, duplicate, copy, download, store, further transmit, disseminate, transfer, or otherwise exploit this website, or any portion hereof . . . [Y]ou will not use any robot, spider, other automatic software or device, or manual process to monitor or copy our website or the content, information, or services on this website.” (accessed Apr. 26, 2013)
Reproducing the law, copying and pasting it into an email, is a crucial tool for being able to understand the law. Terms of service are contracts. Violating a contract could normally result in a lawsuit and a civil penalty, typically in proportion to the harm done. Violations of website terms of service agreements in particular, though, can be a felony and lead to jail time under the Computer Fraud and Abuse Act. Copying DC’s laws could lead to jail time. That’s not a good thing. And that problem exists in many other jurisdictions.
DC has solved this problem by making the Code available to the public without terms of service.
Copyright is also a problem. Some states assert that they have copyright over their laws. Georgia, Idaho, and Mississippi have demanded that the nonprofit Public.Resource.Org take down its electronic copies of the official laws of those states. (There is some disagreement over whether so-called annotations to the law are law or are copyrighted.) Public.Resource.Org is fighting a similar argument with nonprofit standards-writing bodies — i.e. the bodies that write public safety codes and building construction standards — because they claim copyright over standards that have been incorporated into law. Violations of copyright law come with stiff fines. There should be no copyright over law, and court cases have addressed this, but some states have taken a particularly narrow and short-sighted view on this.
DC has historically claimed copyright over the DC Code as well, but apparently in a defensive posture to prevent its contractors (West, Lexis) from claiming copyright over the law themselves. DC has now solved this problem by making the Code available to the public with a copyright waiver called Creative Commons Zero (CC0). DC no longer claims copyright over the code. (I’ll note again that there are a number of court cases that say that edicts of government, i.e. the law, cannot be copyrighted. But no one wants to have to go to court to fight over this.)
Understanding of the law is magnified if we use tools. The Code of the District of Columbia has almost 20,000 sections. Search is crucial. So is good typography: think about access for the visually impaired, the older people among us, and anyone who doesn’t want to get a headache from the way the law is printed. For companies concerned about legal compliance, the ability to be alerted to how the law has changed — with “track changes”-style highlighting — is incredibly useful. So not only is access important, but electronic access is even more important so that we can use tools to help us understand it.
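As a tiny illustration of the “track changes” idea (the section text here is invented, not the real DC Code), Python’s difflib is enough to show what changed between two versions of a section:

import difflib

old = [
    "Sec. 1-101. Definitions.",
    "The term 'agency' means any office of the District government.",
]
new = [
    "Sec. 1-101. Definitions.",
    "The term 'agency' means any office, department, or instrumentality of the District government.",
]

# Print a unified diff showing what was added and removed between the versions.
for line in difflib.unified_diff(old, new, fromfile="Code (2013)", tofile="Code (2014)", lineterm=""):
    print(line)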
Lawyers, citizens, students, and other users of the law have different needs when it comes to reading it. Government bodies should create a website to provide public access to the law, but it is a shame if they provide the only access to the law. The law should be provided to the public in data formats that promote reuse so that the public — companies, citizen hackers, and so on — can build on it and create new public access websites that are tuned for a wider range of access. These websites might provide functionality that a government website could not, such as analysis (e.g. written by lawyers), inline definitions, links to related resources (for instance related laws in other jurisdictions), translations into other languages, email alerts to changes, and a place where citizens can propose changes to the law.
There have been some interesting negative reactions to open data lately. I’m all for skepticism, but skepticism should be backed up with facts. For instance, when Michael Gurstein talks about the digital divide he refers to concrete examples of this coming true — although I disagree with some of his analysis (more on that in my book). Andrea Di Maio’s critique of open government data, on the other hand, is too vague to be instructive.
I readily admit that “open government” and “open data” are vague terms already. Justin Grimes, Harlan Yu, and I held a panel at Transparency Camp a few weeks back about how vague the terms are and how that can lead to trouble when we aren’t clear about what we mean. Di Maio runs right into this problem, placing the burden of a successful open data movement on “mythical ‘application developers'” (sorry, I don’t exist?). Transparency is only one of a dozen or more reasons why open government data is a good thing, and these reasons cannot all be judged by the same rubric.
Focusing on open data for transparency, Di Maio argues that “[t]he more the data, . . . the more specialized are the skills and resources required to process that data,” or in other words that open data can actually exacerbate a digital divide rather than close it. I’ll be one of the first to say that open data doesn’t necessarily mean better government (see the link to my book above), but the statement of Di Maio’s that I quoted can easily be seen to be simply wrong.
One has to evaluate open gov data with everything else held equal. In other words, if there is open government data in the hypothetical, the comparison to make is to another world where the same government processes exist — they’re just not published in a machine-processable format. Now, you tell me, which world requires more skill to understand the data? Clearly the second, because every skill you need in the open-data-enabled world you also need in the open-data-denied world, plus some other skills just to get the information in the second world.
There are certainly unintended consequences of open data. In my book I discuss two cases where legislative data affected the behaviors of the legislators in a way that’s probably not good. But let’s stick to facts and, for that matter, to logic.
2010 was a big year for posturing. We saw introduced in Congress H.R. 4983: Transparency in Government Act of 2010 (Quigley), H.R. 6289: To direct the Librarian of Congress to make available to the public the bulk legislative… (Foster), and H.R. 4858: The Public Online Information Act of 2010. The Congressional Transparency Caucus was created (Quigley/Issa). An open data law was passed in San Francisco, and bills were introduced in New York City and New York State.
We owe Sunlight a lot of credit for pushing many of these things forward.
Besides all of this, there have been a number of “contests” lately (http://challenge.gov/), some offering prizes for using government data. Clay Johnson posted two new ones to the sunlightlabs mailing list this week. The only thing I’ll say here about the strategy of the open government movement is that we haven’t taken the challenges seriously, and I think it’s a missed opportunity to show why open data matters. But it’s just one of many things to do.
As a web developer who likes to produce infographics, I’ve often run into the problem of choosing good color palettes for charts and, in the harder case, a smooth color spectrum. Colors should be aesthetically pleasing and able to convey differences in accordance with our perceptual abilities.
UPDATE 1: Jan 9, 2011: I made a number of mistakes the first time around, including having the diagrams flipped.
There are a number of types of color blindness (see Wikipedia), but the most common involve the absence or dysfunction of the “red”, “blue”, or “green” cones in the retina. Deficiencies of the “red” and “green” cones affect up to 10% of men; deficiencies of the “blue” cones, and any cone deficiency in women, are much rarer. When a cone is absent, the individual can’t make distinctions among colors that vary only in the quantity of that color, if you think about colors as being a mix of red, green, and blue. (Actually the cones don’t respond to prototypical red, green, and blue but instead have a distribution of response over a range of color wavelengths not necessarily particularly close to the prototypical wavelengths.)
Until today I didn’t really understand the mechanics of color blindness and so it was difficult to understand how to choose good color spectra. Worse, the only simple guide I could find through Googling gave some examples of good color palettes to use without explanation and without relation to the various types of color blindness.
Here’s what I’ve learned this afternoon.
The CIE 1931 color space is a mathematical model for our perception of color based on the activity of the three types of cones. Matching the three cone types of “normal” vision, the CIE 1931 model has three dimensions of color. And it has two powerful implications: first, it gives us a coordinate system that covers the gamut of colors that can be perceived. Second, it gives a mathematical model that can be used to understand what happens for color blind people. In particular, in the flattened two-dimensional CIE 1931 color space, color blindness is represented by radial lines emanating from a red, green, or blue “copunctal point”. Color blind individuals (of each type) cannot distinguish two colors if they fall on the same radial line. These are called confusion lines. (Also, this is an interesting way of understanding the reduction of one dimension of perception.)
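As a small sketch of what “the same radial line” means in practice, the code below converts two sRGB colors to CIE 1931 xy chromaticity (using the standard sRGB matrix) and compares their angles around the protan copunctal point; the closer the angles, the harder the pair is for a protanope to tell apart. (The copunctal coordinates are the same ones used in the script at the end of this post.)

from math import atan2

def srgb_to_xy(r, g, b):
    # Linearize 0-1 sRGB values, then apply the standard sRGB-to-XYZ (D65) matrix.
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in (r, g, b)]
    X = 0.4124 * lin[0] + 0.3576 * lin[1] + 0.1805 * lin[2]
    Y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]
    Z = 0.0193 * lin[0] + 0.1192 * lin[1] + 0.9505 * lin[2]
    s = X + Y + Z
    return X / s, Y / s

PROTAN_COPUNCTAL = (0.7635, 0.2365)  # CIE 1931 x, y

def confusion_angle(r, g, b):
    x, y = srgb_to_xy(r, g, b)
    return atan2(y - PROTAN_COPUNCTAL[1], x - PROTAN_COPUNCTAL[0])

# The closer these two angles, the closer the colors sit to a shared confusion line.
print(confusion_angle(0.8, 0.2, 0.2))  # a red
print(confusion_angle(0.3, 0.5, 0.2))  # a green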
Now let’s make this practical. For a web designer concerned about accessibility, avoid following radial lines! More on this in a moment.
Backing up from color blindness, it’s important that the colors in a spectrum are spaced to correspond with our ability to distinguish nearby colors. One drawback of the CIE 1931 model is that, for instance, green gets an unfairly large region compared to other colors — that whole area up top just looks like the same green to me. A newer alternative color space called CIE 1976 (L*, a*, b*) (a.k.a. CIE LAB) is a transformation of the older CIE 1931 space so that equal distances in the color space represent equal amounts of perceptual distance. A color spectrum for a chart should be chosen by drawing a line or curve through this type of color space. (In the images below, the color space extends beyond the colored region; the black areas cannot be shown on computer screens because they fall outside the RGB gamut.)
Now we have to put these two together. The CIE 1931 color space gives a model for how to choose accessible colors: avoid the confusion lines. The best way to avoid a confusion line is to go perpendicular to confusion lines. But we want to go perpendicularly in CIE LAB space so that we make the most perceptually distinct step. (I’m assuming that perceptual distinctiveness between two points in CIE LAB space is unaffected by color blindness. It’s probably wrong but good enough.)
Let’s start with protanopia, the lack of “red” cones, as an example. The image below plots in the dark dotted lines the confusion lines for protanopia on the CIE LAB color space. Note that because CIE LAB space is a distortion of the CIE 1931 space, the confusion lines no longer appear to radiate from a point. (This color space has a third dimension for lightness, L in 0-100, not shown. Here I’m choosing L=50.) For a “normal”-sighted individual, any path through the color space will be perceptually useful for a chart. For a protanope, only paths that go perpendicular to the confusion lines will have maximal perceptible differences. If you follow confusion lines, the protanope will not be able to tell the difference. As you can tell, going from red to green is not a good idea since it follows a confusion line. The perpendiculars are indicated by white arrows. Good gradients follow the perpendiculars.
Protan Spectrum Lines in CIE LAB L=50
Here’s the full set of images for protanopia (red, left), deuteranopia (green, middle), and tritanopia (blue, right), for different values of lightness:
As you can see, “red” and “green” cone color blindness are similar. “Blue” cone color blindness is totally different; in fact, it’s practically a 90-degree rotation of the other two, making it impossible to follow a line that is maximally perceptible to everyone.
Since the first two are similar, and tritanopia and tritanomaly are considerably rarer than the other two, if we put them aside (for now!) and design for the other two, we might be able to choose a single color spectrum that at least works reasonably well for those cases. A good color spectrum to use will be a vertical line that stays within the RGB boundary, either orange to blue or red to purple. That said, if we vary from the perpendiculars a little bit we might be able to satisfy everyone a little. Orange to turquoise and green to pink go diagonally across the color space and so might cover everyone.
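As a sketch of what such a vertical (constant a*) line looks like in practice, here’s a small orange-to-blue gradient sampled with the same grapefruit library used in the script below; the particular L and a* values are just rough choices, and out-of-gamut points are clamped.

from grapefruit import Color

steps = 7
for i in range(steps):
    # Walk b* from orange-ish (positive) to blue-ish (negative) at fixed L and a*.
    b = 0.6 - 1.2 * i / float(steps - 1)
    r, g, bl = Color.NewFromLab(60, 0.05, b).rgb
    r, g, bl = [min(max(c, 0.0), 1.0) for c in (r, g, bl)]  # clamp to the sRGB gamut
    print("#%02x%02x%02x" % (int(r * 255), int(g * 255), int(bl * 255)))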
That said, this is all theoretical. I’m not color blind so I don’t have any intuitions about whether this is right. Also, this is my first time getting into the math of colors so… maybe I got it wrong somewhere. In fact, in my first version of this I had numbers backwards and perpendicular lines that weren’t. Hopefully this is closer to the truth now. (And I appreciate the great explanations given by Daniel Flück at his blog.)
Finally, apparently everyone can see lightness, so the most accessible spectrum is just varying the lightness (and the color doesn’t matter).
These images were created with a Python script and the grapefruit, numpy, and matplotlib libraries. Here is the code:
# Usage: python plot.py L protan|deutan|tritan
#
# e.g. python plot.py 50 protan
#
# To generate all of the images at a bash shell:
# for L in {25,50,75}; do for b in {protan,deutan,tritan}; do echo $L $b; python plot.py $L $b; done; done
###########################################
import sys
from math import sqrt, atan2
from grapefruit import Color
import matplotlib.pyplot as plt
import numpy
w, h = (480, 480)
L = float(sys.argv[1])
bt = sys.argv[2] # blindness type
# According to http://www.colblindor.com/2009/01/19/colorblind-colors-of-confusion/
# These are points for each type of color blindness around which the dimensionality
# of the color space is reduced, in CIE 1931 color space.
copunctal_points = {
    "protan": (0.7635, 0.2365),
    "deutan": (1.4000, -0.4000),
    "tritan": (0.1748, 0.0000)
}
# Draw the color space.
colorspace = [[(0.0,0.0,0.0) for x in xrange(0, w)] for y in xrange(0, h)]
for x in xrange(0, w):
    for y in xrange(0, h):
        # Compute the CIE L*, a*, b* coordinates (easy, since our x,y coordinates
        # are just a translation and scaling of the LAB coordinates).
        a = 2.0*x/(w-1) - 1.0
        b = 1.0 - 2.0*y/(h-1)
        # Convert this into sRGB so we can plot the color, and plot it.
        clr = Color.NewFromLab(L, a, b)
        r, g, b = clr.rgb
        if r < 0 or g < 0 or b < 0 or r > 1 or g > 1 or b > 1:
            continue
        colorspace[y][x] = (r,g,b)
# Draw the confusion line or spectrum line gradient.
csegs = 15
contourpoints = {
    "x": [[0 for x in xrange(0, csegs)] for y in xrange(0, csegs)],
    "y": [[0 for x in xrange(0, csegs)] for y in xrange(0, csegs)],
    "spectrum": [[0 for x in xrange(0, csegs)] for y in xrange(0, csegs)],
    "confusion": [[0 for x in xrange(0, csegs)] for y in xrange(0, csegs)]
}
for xi in xrange(0, csegs):
    for yi in xrange(0, csegs):
        # Compute pixel coordinate from grid coordinate.
        x = xi/float(csegs-1) * (w-1)
        y = yi/float(csegs-1) * (h-1)
        # Compute the CIE L*, a*, b* coordinates (easy).
        a = 2.0*x/(w-1) - 1.0
        b = 2.0*y/(h-1) - 1.0
        # Compute the corresponding CIE 1931 X, Y, Z coordinates.
        X, Y, Z = Color.LabToXyz(L, a, b)
        # Convert CIE 1931 X, Y, Z to CIE 1931 x, y (but we'll keep capital
        # letters for the variable names). The copunctal point is in CIE 1931
        # x, y coordinates.
        X, Y = (X / (X + Y + Z), Y / (X + Y + Z))
        contourpoints["x"][yi][xi] = x
        contourpoints["y"][yi][xi] = y
        # To compute the confusion lines, we plot a contour diagram where
        # the value at each point is the point's angle relative to the copunctal
        # point. Two points on the same confusion line will have the same angle,
        # and contour plots connect points of the same value.
        dY, dX = Y - copunctal_points[bt][1], X - copunctal_points[bt][0]
        contourpoints["confusion"][yi][xi] = atan2(dY, dX) # yields confusion lines
# To compute the spectrum lines, we want lines perpendicular to the
# confusion lines. In my first attempt at this, I computed perpendiculars
# in the CIE 1931 space by choosing the contour plot value at a point to
# be the *distance* from the point to the copunctal point. This plotted
# the concentric circles around the copunctal point, transformed to
# CIE LAB space.
#
# However this is wrong, because the perpendiculars should be computed
# in CIE LAB space, which is the perceptual space. To compute the
# perpendiculars, we compute the gradient of the matrix that underlies
# the confusion lines. Then the gradient is plotted with the quiver plot type.
contourpoints["spectrum"] = numpy.gradient(numpy.array(contourpoints["confusion"], dtype=numpy.float))
for xi in xrange(0, csegs): # normalize!
    for yi in xrange(0, csegs):
        d = sqrt(contourpoints["spectrum"][0][yi][xi]**2 + contourpoints["spectrum"][1][yi][xi]**2)
        contourpoints["spectrum"][0][yi][xi] /= d
        contourpoints["spectrum"][1][yi][xi] /= d
# Draw it.
plt.figure()
plt.axes(frame_on=False)
plt.xticks([])
plt.yticks([])
plt.imshow(colorspace, extent=(0, w, 0, h))
plt.quiver(contourpoints["x"], contourpoints["y"], contourpoints["spectrum"][1], contourpoints["spectrum"][0], color="white", alpha=.5)
plt.contour(contourpoints["x"], contourpoints["y"], contourpoints["confusion"], w/15, colors="black", linestyles="dotted", alpha=.25)
plt.text(0, 0, "~".join(sys.argv[2:]) + "; L=" + sys.argv[1], color="white")
plt.savefig("colorspace_" + "_".join(sys.argv[1:]) + ".png", bbox_inches="tight", pad_inches=0)
I was curious today what screen resolutions people are using these days. Google Analytics reports the screen resolutions of your visitors but doesn’t give it to you in a way that is useful. It lists each unique screen resolution e.g. 1152×864 and how many visitors came with that resolution. But what you want to know is, how many people have a horizontal resolution of 1152 or more? That calls for a cumulative histogram.
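Here’s a rough sketch of the computation, assuming a CSV export of the Google Analytics report (the file name and column names are hypothetical; adjust them to your export):

import csv
from collections import Counter

widths = Counter()
with open("screen_resolutions.csv") as f:  # columns: "Screen Resolution", "Visits"
    for row in csv.DictReader(f):
        if "x" not in row["Screen Resolution"]:
            continue  # skip "(not set)" and similar rows
        w, h = row["Screen Resolution"].split("x")
        widths[int(w)] += int(row["Visits"].replace(",", ""))

total = float(sum(widths.values()))
# For each observed width, the share of visitors whose screen is at least that wide.
for width in sorted(widths):
    at_least = sum(count for w, count in widths.items() if w >= width)
    print(width, round(100 * at_least / total, 1))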
Here are histograms for horizontal and vertical resolutions based on visitors to my site GovTrack.us over the last month. The horizontal resolutions show that around 95% of visitors support at least 1024 pixels, but it drops off to only around 70% of visitors supporting a greater horizontal resolution. The 70% hangs out till about 1280 pixels (meaning, should we be designing for 1280 pixels now and make things harder for just the remaining 30%?). Then it drops again to a mere 35% for anything greater than 1280. And as for the standard wide-screen resolution of 1680, it’s just around 15%.
For reference, the iPad’s resolution (in its most popular orientation) is 768×1024.
With 1024 pixels horizontally still the resolution most widely supported, it’s not surprising that 780 pixels vertically is the point of a big drop-off too, from around 95% down to less than 50% supporting anything greater. While 70% of visitors support 1280 pixels horizontally, only around 30% support the commonly paired vertical resolution of 1024 (probably because more people are using widescreens).
(My last post began a long discussion on the OHP list. Here’s some of my follow-up.)
I don’t believe any of my transparency colleagues believe that there is a pervasive systematic problem with Hill staff & Members making literally corrupt decisions on a regular basis. However, I’m sure everyone here recognizes systemic selection biases (who can afford to get elected) and incentives (the revolving door) that are worthy of study.
The net effect of these biases and incentives is not clear, but there is certainly an effect. We know there have been a few bad apples and it is the public’s duty to be on the lookout for more. And we know that the biases and incentives affect policy results, e.g. through who is elected, how committees are assigned, which lobbyists/advocates have access to Congress’s ears, and maybe in some more pernicious ways.
Now, whether the net effect is in some sense good, neutral, or bad is something we’re disagreeing on. Tom is basically defending neutral, while most everyone else would say bad.
Compared to what, and how would you know?
The problem with this discussion is that we can’t make up a hypothetical less-money-obsessed world that we would all agree on. Take away money and some other aspect of the human condition is going to take its place. And even if we could imagine such a world, how would you measure whether that world was better off?
I’ll try to tie this back into the point I initially made:
Discovering bad apples through investigative, data-driven reporting is great for the country. It is actionable information. But while reporting on mere “correlation” establishes a *possible* bias or incentive, it neither indicates an actual effect on policy nor suggests any action we could take that we could be reasonably sure would in fact improve policymaking.
Of course, correlations can be the beginning of an investigative project. Paul Blumenthal’s recent post “Incoming finance committee chairman relies on finance campaign contributions” raises a lot of concern over correlations, but it backs them up with other observations and makes a good case that, at the very least, the media should be keeping a close eye on a Member of Congress who is being tempted by some very strong incentives. Hopefully he’ll resist the temptations.
I finally figured out the “easy” way to tether an Ubuntu laptop to an Android phone over USB without rooting the phone, provided you have remote SSH access to an Ubuntu machine somewhere else connected to the web.
The general idea is to use the phone to create a forwarded port from a laptop to some other machine with access to the web, and then to run a VPN connection over that port.
You will need:
A phone running Android 2 or later (or maybe earlier).
ConnectBot installed on the phone
A “host” machine running Linux, with SSH and OpenVPN installed and on which you have root access, and which has a public IP address.
Your Linux laptop that you want to tether, where you’ll need OpenVPN and root access again, plus the Android SDK development tools (in particular, adb).
Setup on the host machine (i.e. the VPN server):
Install OpenVPN. Start the server with the following command: sudo /usr/sbin/openvpn --proto tcp-server --local localhost --dev tun --ifconfig 192.168.2.1 192.168.2.2
This will set up the server to run and listen for incoming VPN connections from the localhost only, using 192.168.2.1 for itself on the VPN and expecting your laptop to be configured for 192.168.2.2 (in the last step).
The VPN itself is set up unencrypted since the VPN connection will be running over an already-encrypted SSH connection (see below).
Obviously you’ll want to do this before you leave your home, though you also have an opportunity to start it where I’ve noted it below.
On the phone:
In Settings -> Applications -> Development turn on USB Debugging, which allows us to forward TCP connections through the USB cable.
Connect the phone to the laptop using the USB cable.
In ConnectBot, set up a new connection to the host machine with a SOCKS (aka dynamic) port forward on port 1080.
Start the connection and log in to the host machine.
If you haven’t already, start OpenVPN on the host machine at the command prompt.
On the laptop:
Start a port forward from the laptop to the phone using: sudo adb forward tcp:1080 tcp:1080
Set up DNS options in an alternate /etc/resolv.conf.tether file. For instance, put “nameserver IPADDR” into the file where IPADDR is the IP address of a public nameserver (e.g. copy it from /etc/resolv.conf on the host machine).
Start the VPN: sudo openvpn --proto tcp-client --dev tun --socks-proxy localhost --remote localhost --ifconfig 192.168.2.2 192.168.2.1 --route 0.0.0.0 0.0.0.0 --script-security 2 system --route-up "/bin/cp /etc/resolv.conf.tether /etc/resolv.conf"
The VPN connection first connects to the standard SOCKS port of 1080 on localhost, which adb is forwarding to the actual SOCKS proxy server on the phone. Through the proxy, OpenVPN connects to “localhost”; that request is carried over the SSH connection to the host machine, where “localhost” resolves to the host machine itself.
The route parameter sets up a default route to forward traffic on the laptop over the VPN. And the final option sets up DNS from your preconfigured options.
I haven’t tested this really. Just some notes for later.