### Dangers of Abstraction

One of the more impressive things about Pharo/Squeak is the depth of the core libraries, and how those libraries build upon each other to create larger, more complex structures. The Collection hierarchy is a good example: myriad collection types are supported in a deep hierarchy that allows powerful language constructs like

```smalltalk
aCollection select: aPredicateBlock thenCollect: aMappingBlock
```

to work across essentially every type of collection available. Unfortunately, large software constructs like these can make performance and complexity hard to analyze, and in this post I’ll outline one particular case that bit me a few weeks back.

My problems all started while I was still experimenting with Magma. Magma, as you may or may not recall (depending on whether you’ve read anything else I’ve posted… which you probably haven’t), is a pure-Smalltalk object-oriented database whose goal is to give the Smalltalk world a free, powerful, transparent object store.

Now, among Magma’s features is a powerful set of collections that implement the aforementioned collection protocols while also providing a much-needed feature: querying. To use it, any column you wish to query over must have an index defined on it, which is really a glorified hash table on the column¹. Whenever you create one of these indexes on a collection, the index itself is squirreled away in its own file on disk alongside the database. And that’s where the problems come in.
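To make “a glorified hash table on the column” concrete, here is a minimal in-memory sketch in Python. The class `HashIndex`, its methods and the sample games are hypothetical, not Magma’s API, and the sketch ignores the on-disk storage Magma adds; it only shows why an index turns a query into a single lookup rather than a scan of the whole collection.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Set


class HashIndex:
    """Hypothetical in-memory index over one attribute: value -> object ids."""

    def __init__(self) -> None:
        self._buckets: DefaultDict[Hashable, Set[int]] = defaultdict(set)

    def add(self, obj_id: int, values: Iterable[Hashable]) -> None:
        # A multi-valued attribute (such as a game's tags) files the object
        # under every one of its values.
        for value in values:
            self._buckets[value].add(obj_id)

    def query(self, value: Hashable) -> Set[int]:
        # One hash lookup instead of a scan over the whole collection.
        return set(self._buckets.get(value, ()))


# Hypothetical games and their tags.
games = {1: ["joseki", "pro"], 2: ["handicap"], 3: ["pro"]}
tag_index = HashIndex()
for game_id, tags in games.items():
    tag_index.add(game_id, tags)
print(sorted(tag_index.query("pro")))  # [1, 3]
```

Every queryable attribute needs its own structure like this, which is why the number of indexes grows with the number of collections times the number of attributes queried.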

In my application, a Go game repository, I had a fairly large number of collections holding references to Game objects (one per user, plus one per Go player), and I needed to query each of these collections across a number of attributes (not least the tags applied to each game). That meant potentially many thousands of indexes in the system². And that meant thousands of files on disk, one for each index.

Well, when I first hit the site, I found something rather peculiar: initially accessing an individual collection took a very long time, on the order of a few seconds at least. Naturally this dismayed me, so I started profiling the code to pin down the performance problem. And I was, frankly, a little shocked at the outcome.

It turns out that, deep in its index code, Magma uses the FileDirectory class to find the file belonging to each index. Makes sense so far, right? To do that, it asks FileDirectory for the files that follow a specific naming convention, and that code reads the entire directory to find them.

On the face of it, this should be fine.

However, internally, that code does a lot of work to convert each of those file names from Unicode into Squeak’s internal character and string representation. And it turns out that little bit of code isn’t exactly snappy. Multiply that by thousands of files, and voilà: horrible performance.
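A small Python sketch makes the scaling visible by counting name conversions. This is not Squeak’s FileDirectory code: the function names, the `index-NNNNN.idx` naming scheme, and UTF-8 decoding standing in for Squeak’s per-name conversion are all assumptions. The point it illustrates is that each lookup costs one conversion per directory entry, regardless of how many files are actually wanted, so if every index lookup triggers its own full scan, the total grows with lookups times directory size.

```python
from typing import Dict, List, Sequence, Tuple


def lookup_by_scanning(raw_names: Sequence[bytes],
                       stems: Sequence[str]) -> Tuple[List[List[str]], int]:
    """For each index stem, read and decode *every* directory entry, keeping
    the names that match. Returns (matches per stem, total decodes)."""
    decodes = 0
    found = []
    for stem in stems:
        matches = []
        for raw in raw_names:
            decodes += 1
            name = raw.decode("utf-8")  # stands in for the per-name conversion
            if name.startswith(stem + "."):
                matches.append(name)
        found.append(matches)
    return found, decodes


def lookup_with_cache(raw_names: Sequence[bytes],
                      stems: Sequence[str]) -> Tuple[List[List[str]], int]:
    """Decode the directory once, group names by stem, then answer every
    lookup from the dictionary. Returns (matches per stem, total decodes)."""
    decodes = 0
    by_stem: Dict[str, List[str]] = {}
    for raw in raw_names:
        decodes += 1
        name = raw.decode("utf-8")
        by_stem.setdefault(name.split(".", 1)[0], []).append(name)
    return [by_stem.get(stem, []) for stem in stems], decodes
```

With 2,000 index files and 2,000 lookups, the scanning version performs 4,000,000 name conversions and the cached version 2,000. The cache is just one way to avoid repeated scans; it is here to make the contrast countable, not as a description of how Magma or Squeak handles it.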

So, believe it or not, the index performance issues had nothing to do with Magma itself. They were entirely due to inefficiencies deep in the bowels of Squeak. Hence the title of this post. Deep abstraction and code reuse are very good things, don’t get me wrong. But any time you build up what I think of as a “cathedral” of code, rotting foundations can come back to bite you later.

1. That, by itself, is a rather onerous requirement, but that’s really a separate issue.

2. Granted, in retrospect, there may have been a better way to design the database, but for a very small-scale application, this approach sufficed.

### Running in the Rain - Wetter or Drier?

Well, as anyone living in Edmonton knows, the weather in our area has been, well… rather crappy. Cold, rainy, and windy, it feels more like fall than the waning days of summer. Through it all, I’ve persisted in cycle commuting, mostly because it lets me justify (read: excuse) a rather gastronomically decadent lifestyle. Consequently, I’ve been caught in more than a few showers over the last few weeks, resulting in much dampness and, oddly enough, a bit of inspiration.

Now, a favorite show of many folks, myself included, is MythBusters, in which the hosts perform “scientific” experiments to verify or debunk various myths, preconceived notions, and so forth. One of the topics they tackled was: does moving faster in the rain keep you drier, or get you wetter? In their “experiment”, I seem to recall they found little difference between walking slowly and walking quickly, which surprised me a little, and during a recent bike trip I found myself pondering how they could have arrived at that result.

Meanwhile, I’ve also been digging more deeply into the joyous language that is Smalltalk, specifically the Squeak implementation, and a related web application framework called Seaside. However, I’d been at a loss for a small project to hack up that would let me flex my rather atrophied Smalltalk muscles. And so it was that, a couple of days ago, while cycling home in the rain, it occurred to me: why not simulate a person walking through a rainstorm, and determine whether the MythBusters results were accurate?

Now, before I get into the details, I should point out that this is really pretty unscientific. I’m sure there are details I’ve missed that make this simulation completely unrealistic. But it was fun. :)

Now, a bit of explanation about my methodology. First, the simulation is two-dimensional, since I didn’t think the added complexity of a full 3D simulation would produce sufficiently different results. Second, rather than moving my subject through a shower of raindrops at varying speeds, I gave the drops themselves a uniform horizontal velocity, equal and opposite to the walking speed (basically moving the drops instead of the subject… the effect is the same, but the implementation is a lot easier). With that said, the experiment is set up as follows (these parameters are all configurable; the values below are simply what I chose, and they’re entirely arbitrary):

1. The raindrop spawn field is 20 m by 20 m.
2. Raindrops are created at a rate of 80 per second, distributed randomly across the top of the spawn field.
3. Raindrops fall at the terminal velocity of a typical drop, as indicated [here](http://www.grow.arizona.edu/water/raindropvelocity.shtml) (6.25 m/s).
4. The subject is a rectangle approximately 6 feet tall by 6 inches wide (roughly 1.83 m by 0.15 m).
5. The subject’s walking speed varies from 1 to 8 m/s in steps of 0.25 m/s.
6. The subject “walks” a fixed 20 m during each experiment.
7. Each experiment was repeated 10 times and the results averaged (since raindrops are spawned at random positions).
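The setup above is simple enough to sketch. What follows is a minimal Python reconstruction under stated assumptions, not the original Squeak code: it uses the listed parameters but moves the subject through vertically falling rain (the ground frame, which, as noted above, is equivalent to moving the drops), and the time step `dt`, the swept-strip hit test, and all names are assumptions of this sketch. It also pre-populates the field with rain before each walk, the fix described in the update at the end of the post.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

Drop = Tuple[float, float]  # (x, y) position of one raindrop, in metres


@dataclass
class RainParams:
    # Values from the list above (subject size converted to metres);
    # the time step dt is an assumption of this sketch.
    field_width: float = 20.0       # length of the spawn line and of the walk
    field_height: float = 20.0      # drops appear at this height
    drops_per_second: float = 80.0  # spread uniformly along the spawn line
    fall_speed: float = 6.25        # terminal velocity, m/s
    subject_height: float = 1.83    # about 6 ft
    subject_width: float = 0.15     # about 6 in
    dt: float = 0.01                # seconds per simulation step


def rain_step(drops: List[Drop], carry: float, p: RainParams,
              rng: random.Random) -> Tuple[List[Drop], float]:
    """Spawn this step's drops along the top edge, then move every drop down
    by fall_speed * dt, discarding drops that reach the ground."""
    carry += p.drops_per_second * p.dt  # fractional drops carried over
    while carry >= 1.0:
        drops.append((rng.uniform(0.0, p.field_width), p.field_height))
        carry -= 1.0
    fall = p.fall_speed * p.dt
    return [(x, y - fall) for x, y in drops if y - fall >= 0.0], carry


def simulate_walk(speed: float, p: RainParams, rng: random.Random) -> int:
    """Count drops hitting a rectangle that walks p.field_width metres at
    `speed` m/s through vertically falling rain (ground frame)."""
    drops: List[Drop] = []
    carry = 0.0
    # Pre-populate the field before walking: run the rain for as long as a
    # freshly spawned drop takes to reach the ground.
    for _ in range(int(p.field_height / p.fall_speed / p.dt) + 1):
        drops, carry = rain_step(drops, carry, p, rng)

    hits = 0
    front = 0.0  # x position of the subject's leading edge
    while front < p.field_width:
        drops, carry = rain_step(drops, carry, p, rng)
        new_front = front + speed * p.dt
        # A drop below head height anywhere in the strip the rectangle swept
        # during this step counts as a hit and is removed from the field.
        lo, hi = front - p.subject_width, new_front
        remaining: List[Drop] = []
        for x, y in drops:
            if lo <= x <= hi and y <= p.subject_height:
                hits += 1
            else:
                remaining.append((x, y))
        drops = remaining
        front = new_front
    return hits


def run_experiment(speeds: Iterable[float], trials: int = 10, seed: int = 0,
                   p: Optional[RainParams] = None) -> Dict[float, float]:
    """Average hit count over `trials` random walks for each speed."""
    if p is None:
        p = RainParams()
    rng = random.Random(seed)
    return {v: sum(simulate_walk(v, p, rng) for _ in range(trials)) / trials
            for v in speeds}
```

Calling `run_experiment([1.0 + 0.25 * i for i in range(29)])` sweeps the same 1 to 8 m/s range with 10 trials per speed. Working in the ground frame keeps the spawn line fixed at 20 m; in the moving-drops frame, the spawn line would need to be long enough to cover the drops’ horizontal drift so that fast walkers still meet rain at the same density.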

The final tallies can be seen in the graph below:

Granted, it looks a bit noisy, but the general trend indicates that moving faster through a rainstorm helps keep you drier! That said, the advantage does seem to level off (to me it looks like a roughly exponential decay, with the limit at some non-zero value). Remember that, folks… the weather doesn’t look like it’s going to improve. :(
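For comparison, a simple idealized model of this 2D setup predicts the same qualitative trend. Assume (this is a textbook idealization, not the simulation code) rain of uniform density $\rho$ drops per square metre falling straight down at speed $u$, and a rectangular subject of height $h$ and width $w$ walking a distance $D$ at speed $v$. Drops land on the top edge at a rate $\rho u w$ for the $D/v$ seconds the walk lasts, while the front edge sweeps through an area $hD$ of rain no matter how fast it moves:

$$
W(v) = \underbrace{\rho\,u\,w\,\frac{D}{v}}_{\text{top}} + \underbrace{\rho\,h\,D}_{\text{front}} = \rho D\left(h + \frac{w\,u}{v}\right)
$$

With the parameters listed above, 80 drops/s spread over 20 m and falling at 6.25 m/s give $\rho = 80 / (20 \times 6.25) = 0.64$ drops/m². In this model the decay goes as $1/v$ rather than exponentially, but it levels off at a non-zero floor, $\rho h D$, since the front of you still meets all the rain between start and finish.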

Incidentally, working on this in Squeak has been quite enjoyable. The richness of the class library made many tasks far easier than they would be in other languages, and the ability to fix bugs as I go and then continue running the code is, to say the least, incredibly cool. And, frankly, I think Smalltalk is the most elegant programming language I’ve ever worked with. :)

Update:

I found an oversight in my simulation; the graph above now reflects the corrected version. In short, I had to make sure the playfield was populated with raindrops before beginning each walk. Otherwise, at higher speeds the subject could complete the walk before a drop ever fell low enough to hit him!

Update 2:

Woo! I win a gold star!