March 2005 – Joshua Tauberer's Archived Blog

A Programming Project

A Programming Task for Someone Looking to Hack

The biggest thing that has helped me to program better is little programming projects. My first was a simple math tutoring program in GW-BASIC, written with the help of my dad back around third grade. I’ve almost always had a little project to keep me busy since then. Today, it’s creating an RDF library in C#.

I know that often people are looking for ideas for programs to write, so I thought I’d post a routine that someone might want to spend some time hacking. This is a mildly advanced routine, but anyway:

The goal is to parse an RDF/XML document using only XmlReader. That is, extract the RDF statements without loading the entire document into memory as an XmlDocument. As far as I know, this has never been programmed in C#, and it is really critical if semantic web applications are going to be built in .NET.
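To give a feel for the streaming approach, here is a minimal sketch in Python rather than C#, with the standard library's iterparse standing in for XmlReader. It handles only a tiny subset of RDF/XML: rdf:Description elements with an rdf:about attribute, whose children are property elements carrying either an rdf:resource attribute or plain text. The name iter_statements and the sample document are made up for illustration, and everything else in the spec (blank nodes, typed nodes, nested descriptions, containers, datatypes, xml:base, and so on) is left out.

```python
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = f"{{{RDF_NS}}}Description"
ABOUT = f"{{{RDF_NS}}}about"
RESOURCE = f"{{{RDF_NS}}}resource"

# Hypothetical sample input, used only for illustration.
SAMPLE = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Hello</dc:title>
    <dc:creator rdf:resource="http://example.org/me"/>
  </rdf:Description>
</rdf:RDF>"""


def iter_statements(source):
    """Yield (subject, predicate, object) tuples as the parser reads them.

    Assumes the restricted subset described above; anything outside it is
    ignored or mis-read, so this is not a conforming RDF/XML parser.
    """
    subject = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are available as soon as the start tag is seen.
            if elem.tag == DESCRIPTION:
                subject = elem.get(ABOUT)
            continue
        if elem.tag == DESCRIPTION:
            # Finished one node: drop its children so memory stays small.
            subject = None
            elem.clear()
        elif subject is not None:
            # ElementTree spells tags as "{namespace}local"; joining the two
            # parts gives the predicate URI.
            predicate = elem.tag[1:].replace("}", "", 1)
            obj = elem.get(RESOURCE) or (elem.text or "").strip()
            yield (subject, predicate, obj)


for statement in iter_statements(io.BytesIO(SAMPLE)):
    print(statement)
```

Statements come out one at a time as the parser moves through the input, so the caller never needs the whole document in hand. A real implementation would also have to tell literals apart from resources, which this sketch flattens into plain strings.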

Getting the basics going isn’t too difficult a task. Getting the entire spec implemented is more of a challenge. But what’s life without challenges, eh? If you’re interested in taking a stab at this, drop me an email (tauberer@for.net).

A Design Suggestion

When I was riding the train back from D.C. to Philly last week, the speaker in the car I was in wasn’t working, so no one could hear the conductor’s announcements. Probably no Amtrak person noticed the problem.

It made me think that we often build things that don’t notice when they’re not working. Speakers should be built with microphones that notice when the speaker isn’t emitting the sound it should be and, when that happens, send back a signal to… somewhere. Software should do the same thing. Applications should realize when things aren’t working right and, more importantly, send back a useful message that a problem occurred.

Here’s a for instance. I plugged in a printer to my Linux desktop this week, but I couldn’t print a test page. The only message I got back was that I should increase the debugging level and inspect the output. Well, this is not a useful signal. Even with debugging on, the message I got was that the driver couldn’t be loaded. Pretty vague. It turned out the driver wasn’t even present on my system because I didn’t have the RPM installed. This is a condition that the printing system should have been able to detect and inform me of.

The failure here is that there was no mechanism built into the system for passing useful error messages back to the user. If there was a useful message at some point, it was discarded before it reached me. Don’t write software like this.
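One way to keep the useful message from being discarded is to wrap low-level errors instead of replacing them. Below is a minimal sketch in Python of that idea using exception chaining. All of the names (PrintError, DriverMissingError, load_driver, print_test_page, explain) are hypothetical and have nothing to do with any real printing system.

```python
class PrintError(Exception):
    """Base class for errors in this hypothetical printing stack."""


class DriverMissingError(PrintError):
    """Raised when the requested driver is not on the system at all."""


def load_driver(name, installed):
    # Detect the specific condition and say what to do about it.
    if name not in installed:
        raise DriverMissingError(
            f"driver '{name}' is not installed; "
            "install the package that provides it"
        )
    return name


def print_test_page(printer, driver, installed):
    try:
        load_driver(driver, installed)
    except PrintError as exc:
        # Add context, but keep the root cause attached with "from exc".
        raise PrintError(
            f"could not print test page on '{printer}'"
        ) from exc
    return "ok"


def explain(exc):
    """Join an exception's message with those of its chained causes."""
    parts = []
    while exc is not None:
        parts.append(str(exc))
        exc = exc.__cause__
    return ": ".join(parts)


try:
    print_test_page("office", "hp-laserjet", installed=set())
except PrintError as err:
    print(explain(err))
```

The top layer reports what the user was trying to do, the bottom layer reports what actually went wrong, and the message shown to the user contains both.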

My Trip to D.C.

Last night I got back from a two-day trip to D.C. The point of the
trip was to make a presentation about GovTrack and also to start some
collaboration with others on expanding the political information that
is freely and openly available online.

Monday afternoon I presented GovTrack and some ideas about the
semantic web to the people who are responsible for getting some
aspects of legislative information posted online in XML format. Right
now GovTrack gets its information from screen-scraping, which is an
inexact and fragile process of extracting information out of the same
HTML pages that you see when you view web sites. Having data
published also in XML format can greatly improve the accuracy of
getting information. What the people at the clerk of the House have
done to date, in terms of getting bills written in XML and roll call
votes posted in XML, has been a great step forward, although it
hasn’t been that useful for GovTrack. (One reason is the Senate
hasn’t followed suit because, as I understand it, the clerk of the
Senate isn’t authorized by the Senate itself to work on such things.)

I think I’ve now met almost all of the players in the arena of
building a network of political information. Between everyone
involved, we have enough data and enthusiasm to get something
unique and useful started.

(For more details, see my posting on the GovTrack blog.)

Diffing and RDF

If you’re reading this, you’re probably reading this on Monologue, and that means I’ve successfully added myself to Monologue. 🙂

Recently I got a helpful bug report for my Diff library for C# which pointed out that my port of Perl’s Algorithm::Diff wasn’t generating the same diffs as the original module. I fixed the bug and reposted a new version of the library.

In unrelated news, I’m working on building the semantic web for information about the U.S. government. This is a spin-off of my work on GovTrack (which is powered by Mono). To get this web built, I’m in the position of having to convince people that RDF is the right way to approach the problem of distributed information — over, for instance, XML, XML Schema, and XQuery. The problem is that RDF is complicated and often misunderstood, and I hadn’t found a good document explaining what RDF is and why it should be used for this. So, I wrote one. I’m not a master of RDF by any means, so any corrections and suggestions are welcome.

By the way, if you’re interested in building this political semantic web, join the GovTrack mailing list.

Lastly, with my new interest in RDF, I was looking for a good C# library for working with RDF data models. I didn’t find one that I particularly liked (there are a few out there, but for various reasons I just couldn’t see myself using them), so I’m working on my own. I’ll probably post the source in a few weeks.
