So, generally speaking, I’ve typically adhered to the rule that those who develop software should be aware of various classes of algorithms and data structures, but should avoid implementing them if at all possible. The reasoning here is pretty simple, and I think pretty common:
- You’re reinventing the wheel. Stop that, we have enough wheels.
- You’re probably reinventing it badly.
So just go find yourself the appropriate wheel to solve your problem and move on.
Ah, but there’s a gotcha, here: Speaking for myself, I never truly understand an algorithm or data structure, both theoretically (ie, how it works in the abstract, complexity, etc) and practically (ie, how you’d actually implement the thing) until I try to implement it. After all, these things in the abstract can be tricky to grok, and when actually implemented you discover there’s all kinds of details and edge cases that you need to deal with.
Now, I’ve spent a lot of my free time learning about programming languages (the tools of our trade that we use to express our ideas), and about software architecture and design, the “blueprints”, if you will. But if languages are the tools and the architecture and design are the blueprints, algorithms and data structures are akin to the templates carpenters use for building doors, windows, etc. That is, they provide a general framework for solving various classes of problems that we as developers encounter day-to-day.
And, like a framer, day-to-day we may very well make use of various prefabbed components to get our jobs done more quickly and efficiently. But without understanding how and why those components are built the way they are, it can be very easy to misuse or abuse them. Plus, it can’t hurt if, when someone comes along and asks you to show off your mad skillz, you can demonstrate your ability to build one of those components from scratch.
Consequently, I plan to kick off a round of posts wherein I explore various interesting algorithms and data structures that happen to catch my attention. So far I have a couple on the list that look interesting, either because I don’t know them, or because it’s been so long that I’ve forgotten them…
- Skip list
- Fibonacci heap
- Red-Black tree
- Radix/PATRICIA Tries
- Suffix Tries
- Bloom filter
- Various streaming algorithms (computations over read-once streams of data):
- Heavy hitters (finding elements that appear more often than a proscribed freqency)
- Counting distinct elements
- Computing entropy
- Topological sort
And I guarantee there’s more that belong on this list, but this is just an initial roadmap… assuming I follow through, anyway.
So, a couple years back I started doing some subcontracting work for a buddy of mine who runs a little ColdFusion consultancy. As part of that work, I took ownership of one of the projects another sub had built for one of his client, and the experience has been… interesting.
See, like PHP and Perl, ColdFusion has the wonderful property of making it very easy for middling developers to write truly awful code that, ultimately, gets the job done. And so it is with this project. My predecessor was, to be complementary, one of those middling developers. The codebase, itself, is a total mess. Like, if there was a digital version of Hoarders, this code might be on it. But, it does get the job done, and ultimately, when it comes to customers, that’s what matters (well, until the bugs start rolling in).
Of course, as a self-respecting(-ish) developer, this is a nightmare. In the beginning, I dreaded modifying the code. Duplication is rampant, meaning a fix in one place may need to be done in many. Side effects are ubiquitous, so it’s difficult to predict the results of a change. Even simple things like consistent indentation are nowhere to be found. And don’t even dream of anything like automated regression tests.
Worse, feeling no ownership of the code, my strategy was to minimally disturb the code as it existed while implementing new features or bug fixes, which meant the status quo remained. Fortunately, around a year ago I finally got over this last hump and made the decision to gradually start modernizing the code. And that’s where things got fun.
One of the biggest problems with this code is that data access and business logic are littered throughout the code, with absolutely no separation between data and views. And, remember, it’s duplicated. Often. So the first order of business? Build a real data access layer, and do it such that the new code could live beside the old. Of course, this last requirement was fairly easy since there was no pre-existing data access layer to live beside…
So, in the last year, I’ve built at least a dozen CFCs that, slowly but surely, are beginning to encompass large portions of the (thankfully fairly simple) data model and attendant business logic. Then, as I’ve implemented new features or fixed bugs, I’ve migrated old business logic into the new data access layer and then updated old code to use the new object layer. Gradually, the old code is eroding away. Very gradually.
Finally, after a year of this, after chipping away and chipping away, finally, while there’s still loads of legacy code kicking around (including a surprising amount of simply dead code… apparently my predecessor didn’t understand how version control systems work–if you want to remove code, remove it, don’t comment it out!), the tide is slowly starting to turn. More and more often, bugs that need to be fixed are getting fixed in one place. New features are able to leverage the object layer, cutting down development time and bugs. And some major new features coming down the pipe will be substantially easier to build with this new infrastructure in place. It’s really incredibly satisfying, in a god-damn-this-is-how-it-should-be sort of way.
The funny thing is, this kind of approach goes very much against my natural instincts. Conservative by nature, I’m often the last person to start rewriting code. However, if there’s one thing this project has taught me (along with a couple wonderfully excited, eager co-workers), it’s that sometimes you really do have to gut the basement to fix the cracks in the foundation. And sometimes, you just gotta tear the whole house down.
A Simple Problem
So, recently I found myself struggling with a very simple problem: I kept missing our darn garbage pickup date. Yeah, sure, the city sends out a little printed schedule, and it usually makes it to my refrigerator, but given that requires I actually pay attention to the silly thing, it doesn’t end up helping me much.
Now, in the past, I’ve solved this problem by manually punching the pickup dates into Google Calendar. This works, but it’s tedious, error prone, and whenever the new schedule comes out, I have to do it all over again. And if it wasn’t clear, I’m lazy. Real lazy. So last time around I simply didn’t get around to doing it.
And so I miss the pickup dates.
Obviously what I really wanted was an iCalendar formatted file that I could just subscribe to in Google Calendar, at which point I could move on with my life. But, alas, to my knowledge no such resource exists.
An Idea Is Formed
Well, I found myself discussing this with my officemate, Steve, and he pointed out that the pickup schedule is, in fact, available online in raw form as part of the City of Edmonton’s Open Data Catalog. The Open Data Catalogue is a remarkable resource. In it you can find a staggering amount of data about the city, provided in a simple, machine-readable form, and browseable online. It’s really very cool, and it feels like a resource just waiting to be tapped.
Anyway, back to the topic at hand, it turns out the pickup schedule is available right in the catalogue, albeit in a fairly raw form. At this point, the answer seemed obvious: write a tool which could consume this data, and produce an iCal file as output. I could then subscribe to the generated calendar, and voila! No more missed garbage days!
Why Can’t Things Be Easy?
Unfortunately, things got a little tricky once we started digging into the data. It turns out that the city is subdivided into a series of zones. Each of these zones then has one or more garbage pickup days, and so one’s individual pickup schedule varies based on your geographic location in the city. As such, the pickup schedule data is organized as a simple table as follows:
Zone Day Date
So, now we have a problem: how do we determine the zone and day for a given household?
Well, it turns out the city provides the zone/day data as a geographic overlay, in the form of a KML data file. KML is a rather complex XML dialect which can be used to represent geographic data, and is used in, among other placed, Google Maps, Google Earth, and so forth. But, it’s a standard, and there are standard tools for handling it, so a gold star for the city!
Alright, so now we have the geographic data, we just need to parse that file, and then based on household location, identify the appropriate zone and day, and extract the correct schedule information. This should be easy…
The first question was one of language. I knew I needed a few things:
- Access to a decent web framework, so I could quickly slap together a (preferably REST) web service.
- Something to parse the KML data.
- Something to do point-in-polygon tests, so that, given the boundaries of a zone and a household lat/long, I could determine if the household was in that zone.
- A CSV library would be handy (this is the form the schedule data takes).
- A library for generating iCalendar files.
And, in addition, given this was a quick learning project, I figured it’d be nice to choose something I haven’t written much code in. In the end, I settled on Python, and in particular:
- CherryPy - A simple, lightweight web framework with support for REST built in.
- libkml - A KML library with (poorly documented) Python bindings.
- RosettaCode RayCasting Algorithm - Code that implements the standard raycasting point-in-polygon test (I could’ve written this myself, but… why?)
- Python’s standard CSV module.
- vobject - A python library for parsing and generating iCalendar files.
Now, it turns out the hardest bit was in ingesting the KML file. The libkml bindings, while functional, are awful. They aren’t terribly pythonic, they’re horribly documented, and the examples and tests provide no coverage of the API. As a result, I ended up reading a lot of SWIG binding definition files, in order to determine the method calls available in the object model provided. But, eventually, I was able to figure out how to extract both the zone metadata (name, day, etc), and the polygon definitions, and with that, it was fairly easy to write a small library to identify a household zone based on their latitude and longitude.
After that, the rest was really pretty easy: the code determines the user’s zone, downloads the schedule data in CSV form, identifies the rows for that household, and then spits out an iCalendar file containing that schedule data. Voila, done! The service is available here:
Just enter your lat/long, and you should get a valid iCal file back. Nice! And even better, it took me roughly a day to slap the whole thing together.
Upon posting about this on Google+, another friend of mine pointed out that it was rather onerous to expect the user to go look up the lat/long of their household in order to use the service. This got me thinking of alternatives… wouldn’t it be great if the user could just provide their address, and the service could do the rest?
Well, happy days, it turns out Google offers a free REST API for geocoding (the technical term for the process of mapping an address to a latitude/longitude pair (or vice versa)). Just hit the service with a street address, and they’ll hand you back geographic data in the form of a JSON or XML document. Nice!
About ten lines of code later (with the help of Python’s minidom module), I had a service which could take an address, get the lat/long automatically, and return the appropriate iCal file. You can hit this version here:
Just append your address to the URL, and you could get an iCal file back. Thanks Google!
BTW, if you want to take a look at the (quite simple) code, you can check it out on github.
Well, I know I said I was gonna write some posts on the Playbook as I begin developing for it. Unfortunately, there are a number of things which have deeply turned me off of the prospect…
Yes, I understand the device isn’t ready yet. I understand the simulators are in beta, and the SDKs are being completed. But the startling omissions in the current APIs make me seriously wonder about the device and the BB developers:
- No text box control. There’s a text field, but no multi-line control. So much for a note taking application or anything similar.
- No date picker. Seriously. Wow.
- No localization support. This is supposed to be an enterprise-level device, and it doesn’t have a localization API yet??
- No rotation support. Just… unbelievable. This is a tablet, ffs. How can they not have landscape/portrait mode available? Hell, apparently no one has even seen a sample Blackberry app that does portrait mode.
- No webkit engine API. They’re “working on it”, apparently. The device is supposed to be out in a month. I mean, really…
Meanwhile, the simulator doesn’t support things like:
- The camera API
- The multimedia API (you know, the thing that, to quote, is supposed to “differentiate” this device from others on the market).
It’s really quite stunning to me, and makes me wonder what other omissions there are in the application stack.
App World Application Blackholed
I applied for App World three weeks ago. And nothing. Apparently my application is being “reviewed”. Well, I ain’t spending time writing code if I’m not even sure I’ll be able to submit the thing.
Ridiculous App World Fees
Yes, the current submission fees have been waived, but it’ll be $20 per submission to App World once the promotion is over. That means every failed submission, every update, is gonna cost $20. It’s ridiculous.
The webinars BB posted were, frankly, terrible. The BB consultant running them is brutal, the material is superficial at best, his delivery is moronic, repetative, and frankly, boring… they’re just bad. Meanwhile, they’re full of glaring holes, bad examples, and don’t get me started on the marketspeak.
Meanwhile, every other question seems to highlight another gap in the SDK or simulator… the number of times I heard “we’re still working on a story for that” was impressive, to say the least.
Everything I’ve seen suggests this device is half-baked at best. Incomplete APIs, crappy presentations, an application process that seems to have stalled out on me, and a fee structure that seems designed to turn away smaller developers… for a $500 device, it really doesn’t seem to be worth the aggravation.