Wow, it’s been a shockingly long time since I’ve posted an update here… seems like I probably should, particularly since I’m now paying for a hosting service to run this damn thing. Might as well actually make it worth my while.
Which brings me to the first major change, largely transparent to anyone strange enough to still be reading this thing, which is that about a month and a half ago, or so, I finally moved my public websites off to Linode. Why Linode? Mainly because, for a modest fee, I get what amounts to a naked Linux box with IPv4 and IPv6 connectivity, upon which I can run basically anything, and so it really just becomes another server in my arsenal, which is rather handy. With it I’m hosting both this blog and my mom’s business website, all more or less highly available, and not victim to the whims of the power company servicing my house.
As a bonus, the upstream on my Linode kicks the ass of the upstream I get at home, so performance should be much quicker for anyone still visiting this place.
Next up, we have another reason why I haven’t posted things here in a while: I’ve got me a Google+ account, and have been moving a lot of my public communications there. Of course, for updates that are directly relevant to my software projects and so forth, I’d rather post them here, so this blog will live on, as rarely updated as it is.
I’ve also been moving more and more of my actual software over to GitHub. The migration is slow, and I’m still struggling to decide which projects I want to move (I’ve got plenty of old Software Projects that I could move, I just can’t decide if it’s worth the trouble), but so far I’ve been damn happy with the results when it comes to NetHackDS, so it makes sense to migrate other projects that are still seeing some use (such as the inexplicably popular savsender).
So, there we go, mandatory post complete. Maybe I’ll even author another mandatory post some time in the future. Stay tuned!
One of the more impressive things about Pharo/Squeak is the level of depth in the core libraries, and how those libraries build upon each other to create larger, complex structures. One need only look at the Collection hierarchy for an example of this, where myriad collection types are supported in a deep hierarchy that allows for powerful language constructs like:
aCollection select: aPredicateBlock thenCollect: aMappingBlock
to work across essentially every type of collection available. Unfortunately, building these large software constructs can have negative consequences when one attempts to analyze performance or complexity, and in this post I’ll outline one particular case that bit me a few weeks back.
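As a small illustration of that protocol at work, here is a minimal sketch that sends the same message to three different kinds of collection. The block names isEven and square are just placeholders made up for this example.

```smalltalk
| isEven square |
isEven := [:n | n even].
square := [:n | n * n].

"An Interval, an OrderedCollection and a Set all answer the same message."
(1 to: 10) select: isEven thenCollect: square.
(1 to: 10) asOrderedCollection select: isEven thenCollect: square.
(1 to: 10) asSet select: isEven thenCollect: square.
```

Each expression answers a collection holding 4, 16, 36, 64 and 100; what kind of collection you get back depends on the receiver.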
My problems all started while I was still experimenting with Magma. Magma, as you may or may not recall (depending on if you’ve read anything else I’ve posted… which you probably haven’t) is a pure-Smalltalk object-oriented database whose end goal is to provide the Smalltalk world with a free, powerful, transparent object store.
Now, among Magma’s features is a powerful set of collections, which implement the aforementioned collection protocols, while also providing a much-needed feature: querying. In order to make use of this facility, any column that you wish to generate queries over must have an index defined over it, which is really a glorified hash table on the column. Whenever you create one of these indexes on a collection, the index itself is squirreled away in a file on disk alongside the database. And that’s where the problems come in.
In my application, a Go game repository, I had a fairly large number of collections sitting around holding references to Game objects (one per individual user, plus one per Go player), and I needed to be able to query each of these collections across a number of features (not least of which were the tags applied to each game). That meant potentially many thousands of indexes in the system, at least. And that meant thousands of files on disk for each of those indexes.
Well, when I first hit the site, I found something rather peculiar: initially accessing an individual collection took a very long time. On the order of a few seconds, at least. Naturally this dismayed me, and so I started profiling the code, in order to pin down the performance issues. And I was, frankly, a little shocked at the outcome.
It turns out that, deep in the bowels of the Magma index code, Magma makes use of the FileDirectory class to find the index file name for the index itself. Makes sense so far, right? As part of that, it uses some features of the FileDirectory class to identify files with a specific naming convention. And that code reads the entire directory, in order to identify the desired files.
On the face of it, this should be fine.
However, internally, that code does a bunch of work to translate those file names from Unicode to internal Squeak character/strings. And it turns out that little bit of code isn’t exactly snappy. Multiply that by thousands of files, and voila, you get horrible performance.
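To get a feel for how that cost scales, here is a rough sketch using plain collections rather than the real FileDirectory or Magma code. Everything in it is a hypothetical stand-in: names plays the part of the directory listing, convert plays the part of the per-name Unicode conversion, and findByScan is a lookup that converts every name before searching.

```smalltalk
"Hypothetical stand-ins, not the actual FileDirectory or Magma code."
| names conversions convert findByScan |
names := (1 to: 5000) collect: [:i | 'index', i printString, '.idx'].
conversions := 0.

"Count how many times the (pretend) expensive conversion runs."
convert := [:raw | conversions := conversions + 1. raw asString].

"Convert the whole listing, then look for the one name we want."
findByScan := [:wanted |
    (names collect: [:raw | convert value: raw])
        detect: [:name | name = wanted]
        ifNone: [nil]].

findByScan value: 'index42.idx'.
conversions "5000 conversions to locate a single file"
```

Every lookup pays for the whole directory, so the cost of opening one collection grows with the total number of index files, not with the size of that collection.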
So believe it or not, the index performance issues had nothing to do with Magma. It was all due to inefficiencies deep in the bowels of Squeak. And hence the subject of this article. Deep abstraction and code reuse are very good things, don’t get me wrong. But any time you build up what I think of as a “cathedral” of code, it’s possible for rotting foundations to bite you later.
In my many years in the software development industry, not to mention my many years in the software development education industry, I’ve been continually amazed by the tacit acceptance of the fact that many (most?) software developers are terrible writers. The university programmes don’t require anything beyond a simple English 101 class, and companies simply accept the fact that many of their people are, at best, barely literate. It’s a sad, stupid state of affairs, and I figured I’d take a few minutes to explain why I think it’s a detriment to the industry as a whole.
You see, in my mind, at its core, software development is fundamentally an act of communication. Of course, there’s the obvious fact that a developer must take their ideas and communicate them to the computer, which then executes them. But as developers, we must also communicate ideas to our users, through the user interfaces we build. And we must also communicate ideas to other developers through the code itself, not to mention the comments therein (after all, as any developer will tell you, development is as much, if not more, about reading code as it is writing it).
Similarly, writing is, obviously, an act of communication. When a writer writes, their goal is to take amorphous, ephemeral ideas, and turn them into concrete, written words which preserve the essence of those ideas and communicate them to the reader.
Now, in order to communicate complex ideas through written word, one must master some very basic skills:
- The ability to clearly conceptualize an idea and transform it into a more concrete expression.
- The ability to break down that idea into simple parts that can be easily explained.
- The ability to explain those parts in a way the reader can understand.
- The ability to take those parts, now explained, and to synthesize them into a coherent whole.
Does this sound anything at all like software development?
Furthermore, a capable writer pays attention to detail. He is as much concerned with the way an idea is expressed as he is with communicating the idea itself. For example, I could’ve written this entire post in short, terse sentences with no paragraph breaks. But I care as much about how these ideas are communicated as I do about the actual act of communicating them.
Similarly, in the area of software development, while two developers may derive the same solution to a problem, one may choose to write terse, difficult-to-read code that’s poorly formatted and organized, and consequently difficult to maintain, while the other may produce code that’s precisely the opposite.
By now you can probably guess what I’m getting at. I would surmise that you would find a correlation between developers who are skilled writers, and those who produce code that’s clean, readable, and maintainable. Now, that’s not to say there aren’t exceptions. I’m sure there are many, many developers out there who are great writers yet terrible developers, and vice versa. But I would contend that, statistically, you would find a correlation between writing skill and development skill, and at their core, these two disciplines are really very similar.
So why is it that we accept such poor writing skill in the development community? Quite honestly, I’m not sure. I think part of the issue is the fundamental belief that software development is an engineering skill, a process that’s dominated purely by technological problems that must be solved with technological solutions. I suspect it’s also driven by a false dichotomy, the idea that writers are “thinkers” and technologists are “doers”. But I truly believe it needs to change. Meanwhile, the next time I interview someone, I may be tempted to ask them to write a short essay on a topic of my choice…
So, it’s winter here in the northern hemisphere (although, given the weather we’ve been having lately, you wouldn’t know it), and I now have a renewed passion for two of my favorite hobbies: knitting and Nethack.
Anyway, since the start of my “season” I’ve created and killed off a whole host of characters, during which time I’ve often felt the nearly irresistible urge to throw my DS against the wall. And this fact begs an interesting question (to use that phrase colloquially): Why on earth do I do this to myself??
Now, for those not in the know, Nethack is part of a family of games known as Rogue-likes, named after their original progenitor, Dungeon Crawler. Err, I mean, Rogue. Anyway, this family of games all have a few things in common (which is why they’re a family, duh):
First, they’re almost invariably centered around a character the user controls, who is then responsible for exploring a world, encountering bad guys, and eventually progressing to the end goal, whatever that may be. In the case of Nethack, it’s a dungeon, and the player’s goal is to descend to the bottom of that dungeon, retrieve the Amulet of Yendor, and return it to his god, whilst not dying along the way.
Second, most roguelikes involve lots of items (armour, weapons, scrolls, wands, spellbooks, and so forth) that the player can acquire along the way, either by finding them randomly, looting from corpses, or buying or stealing from shops.
Third, those items? They’re unidentified at the outset. For example, in Nethack, you may come across a scroll with a name like “NR 9”, but you’ll have no idea what it actually does. So a large part of the game is focused on various tricks to identify those items. Oh, and of course, items can be good or bad, so that scroll may have been a scroll of enchant armour, or it may have been a scroll of destroy armour. So you can’t just go randomly reading scrolls, zapping wands, and trying things on (unless you plan to die quickly).
Fourth, when you die, you’re dead. No take-backsies. No save points. Nada. You can, of course, save your current game and pick it up later, but if you die, that save state is gone. Toast. Kaput. You’re boned. So you have to be very careful. And avoid stupidity (the YASD, or Yet Another Stupid Death, is a common experience amongst Nethackers).
Fifth, and most importantly, the level layout, the positions of the items and their identities, the enemies, they’re all random. So each game is completely different.
So, back to the question. Why do I do this? YASD after YASD, I still come back for more, and I like it. And it’s that fifth item that, I think, is the key.
You see, gambling works by a pretty simple reward system, combined with the thrill of risk taking. Of course, anyone who’s spent any time in a casino understands what I’m talking about, here. Notice any similarities? Like any other form of gambling, Nethack provides randomized rewards to the players in exchange for risk, and as one progresses in the game, the risk only gets more pronounced (since the player has more and more invested in their character). One game, they may find a wand of wishing on the second level. The next, they might hit a poly trap in the Gnomish Mines, blow out their armour, and get killed by a cockatrice. That kind of randomized reward system plays with the brain in the exact same way that, say, Blackjack does.
So you really have to wonder, are there problem Nethack players out there? Was Rogue really the first Evercrack? I’m betting the answer is ‘yes’… the only difference is, unlike WoW, the Rogue-like family has maintained a relatively low profile, and so you don’t see the kind of widespread addiction we now see in modern MMORPGs.