Posts in category 'smalltalk'

  • Dangers of Abstraction

    One of the more impressive things about Pharo/Squeak is the level of depth in the core libraries, and how those libraries build upon each other to create larger, more complex structures. One need only look at the Collection hierarchy for an example, where myriad collection types are arranged in a deep inheritance tree that allows for powerful language constructs like:

    aCollection select: aPredicateBlock thenCollect: aMappingBlock
    

    to work across essentially every type of collection available. To make that concrete, here's the sort of trivial snippet this enables; evaluated in a workspace, it works unchanged whether the receiver is an Array, an OrderedCollection, a Set, or a Bag:
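    | squaresOfEvens |
    
    squaresOfEvens := #(1 2 3 4 5 6)
        select: [ :n | n even ]
        thenCollect: [ :n | n * n ].
    
    "squaresOfEvens is now #(4 16 36)"
    

    Unfortunately, building these large software constructs can have negative consequences when one attempts to analyze performance or complexity, and in this post I’ll outline one particular case that bit me a few weeks back.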

    My problems all started while I was still experimenting with Magma. Magma, as you may or may not recall (depending on whether you’ve read anything else I’ve posted… which you probably haven’t), is a pure-Smalltalk object-oriented database whose end goal is to provide the Smalltalk world with a free, powerful, transparent object store.

    Now, among Magma’s features is a powerful set of collections, which implement the aforementioned collection protocols, while also providing a much-needed feature: querying. In order to make use of this facility, any column that you wish to generate queries over must have an index defined over it, which is really a glorified hash table on the column¹. Whenever you create one of these indexes on a collection, the index itself is squirreled away in a file on disk alongside the database. And that’s where the problems come in.

    In my application, a Go game repository, I had a fairly large number of collections sitting around holding references to Game objects (one per individual user, plus one per Go player), and I needed to be able to query each of these collections across a number of features (not least of which were the tags applied to each game). That meant potentially many thousands of indexes in the system, at least². And that meant thousands of files on disk, one for each of those indexes.

    Well, when I first hit the site, I found something rather peculiar: initially accessing an individual collection took a very long time. On the order of a few seconds, at least. Naturally this dismayed me, and so I started profiling the code, in order to pin down the performance issues. And I was, frankly, a little shocked at the outcome.

    It turns out that, deep in the bowels of the Magma index code, Magma makes use of the FileDirectory class to find the index file name for the index itself. Makes sense so far, right? As part of that, it uses some features of the FileDirectory class to identify files with a specific naming convention. And that code reads the entire directory, in order to identify the desired files.

    On the face of it, this should be fine.

    However, internally, that code does a bunch of work to translate those file names from Unicode to internal Squeak characters/strings. And it turns out that little bit of code isn’t exactly snappy. Multiply that by thousands of files, and voilà, you get horrible performance.
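    If you want to see this for yourself, a quick way to reproduce it (substituting a directory that actually contains a few thousand files) is to point the profiler at a bare directory scan from a workspace:

    "Profile a full directory scan; with thousands of entries,
    the file-name conversion machinery dominates the tally"
    MessageTally spyOn: [ FileDirectory default fileNames ].
    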

    So believe it or not, the index performance issues had nothing to do with Magma. It was all due to inefficiencies deep in the bowels of Squeak. And hence the subject of this article. Deep abstraction and code reuse are very good things, don’t get me wrong. But any time you build up what I think of as a “cathedral” of code, it’s possible for rotting foundations to bite you later.

    1. That, by itself, is a rather onerous requirement, but that’s really a separate issue. 

    2. Granted, in retrospect, there may have been a better way to design the DB, but for a very small-scale application, this approach sufficed. 

  • Glorp - Early Impressions

    Well, this was meant to be a shorter post, but alas, I’ve failed miserably. Oh well, suck it up. That is, assuming anyone’s out there and actually reading this…

    Anyway, the topic today is… well, it should be evident from the post title: my initial impressions of Glorp. No, Glorp is not just the sound I make in the back of my throat while considering whether or not to ride the kiddie rollercoaster at West Edmonton Mall. It is, in fact, an object-relational mapping package for Smalltalk, which attempts to bridge the rather deep divide between the object-oriented and relational data modeling worlds.

    Now, generally speaking, I tend to be a fan of ORMs. Of course, that’s probably because I’ve never really used one heavily in a production environment. But the idea of describing the relationship between objects and their tables in code, and then having the code do all the work to generate a schema, seems like a really nice thing to me. Of course, the real question then becomes: how hard is it to set up those mappings? And it turns out, in Glorp, the answer is: well, it’s a pain in the ass.

    Okay, to be fair, there’s a reason it’s a pain in the ass: Glorp is meant to be incredibly flexible, and so it’s designed for the general case. Unfortunately, that means added complexity. What kind of complexity, you ask? Well, allow me to demonstrate, using my little toy project as an example: an online Go game record repository. As such, I need to store information about users, games, players, and so forth (well, there’s not much more forth… other than tags, that’s actually it). So, suppose we want to define a Game object and a User object, such that a Game contains a reference to the User that submitted it.

    Now, before I begin, you need to understand that a database is generally represented by a single Repository class of some kind. That Repository class, which must be a subclass of DescriptorSystem, defines the tables in the database schema, their relationships, and how those tables map to the various objects in your system. This information is encapsulated in methods with a standard naming convention (how very Rails-esque), so if some of this looks a tad funny, it’s not me, it’s the naming convention.
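    Concretely, then, the repository class itself is nothing special; it’s an ordinary subclass, something like this (the class and category names here are my own invention):

    DescriptorSystem subclass: #GRDescriptorSystem
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'GoRepository-DB'
    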

    So, let’s begin by defining a User. First, we need to describe the table schema where the User objects will come from:

    tableForUSERS: aTable
    
        aTable 
            createFieldNamed: 'UserID' type: platform sequence;
            createFieldNamed: 'Name' type: platform text;
            createFieldNamed: 'Password' type: platform text.
            
        (aTable fieldNamed: 'UserID') bePrimaryKey.
    

    This code should be pretty self-explanatory (a side-effect of Smalltalk’s lovely syntax). This method takes a blank DatabaseTable instance and populates it with the fields that define the User table. Additionally, it sets the PK for the table to be UserID. Easy peasy. Now, assuming the Users table maps to a class called GRUser, we define the class model that this table will map to.

    classModelGRUser: model
    
        model 
            newAttributeNamed: #userid;
            newAttributeNamed: #name;
            newAttributeNamed: #password;
            newAttributeNamed: #games collectionOf: GRGame.
    

    Also straightforward. This specifies the various attributes that make up the GRUser class. Incidentally, you still need to declare a real GRUser class… all this code does is tell Glorp what attributes it should be aware of, and what they are.
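    In other words, somewhere in the image there’s a perfectly ordinary class definition along these lines (I’m assuming instance variables that match the attribute names):

    Object subclass: #GRUser
        instanceVariableNames: 'userid name password games'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'GoRepository-Model'
    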

    Lastly, we need to define a “descriptor” for the Users -> GRUser mapping. The descriptor basically defines how the various attributes in the model map to fields in the table. Additionally, it defines the relations between the tables. So, here we go:

    descriptorForGRUser: description
    
        | table |
        
        table := self tableNamed: 'Users'.
        
        description table: table.
        
        (description newMapping: DirectMapping)
            from: #userid to: (table fieldNamed: 'UserID').
            
        (description newMapping: DirectMapping)
            from: #name to: (table fieldNamed: 'Name').
            
        (description newMapping: DirectMapping)
            from: #password to: (table fieldNamed: 'Password').
    
        (description newMapping: ToManyMapping) 
            attributeName: #games;
            referenceClass: GRGame;
            collectionType: OrderedCollection;
            orderBy: #additionTime.
    

    So, for each field, we define a mapping. A DirectMapping instance maps an attribute to a field… err… directly. The ToManyMapping, on the other hand, sets up a relation, and maps the #games attribute of the GRUser class to the GRGame class. But how does it figure out how to do the join? That’s in the table and descriptor definitions for the Games table and GRGame object (note, I’m going to leave out the extra junk):

    tableForGAMES: aTable
    
        aTable 
            createFieldNamed: 'GameID' type: platform sequence;
            createFieldNamed: 'UserID' type: platform int4;
            createFieldNamed: 'AdditionTime' type: platform timestamp.
            
        (aTable fieldNamed: 'GameID') bePrimaryKey.
    
        aTable 
            addForeignKeyFrom: (aTable fieldNamed: 'UserID')
            to: ((self tableNamed: 'Users') fieldNamed: 'UserID').
    

    and the corresponding descriptor:

    descriptorForGRGame: description
    
        | table |
        
        table := self tableNamed: 'Games'.
        
        description table: table.
        
        (description newMapping: DirectMapping)
            from: #gameid to: (table fieldNamed: 'GameID').
            
        (description newMapping: DirectMapping)
            from: #additionTime to: (table fieldNamed: 'AdditionTime').
    
        (description newMapping: RelationshipMapping) 
            attributeName: #user;
            referenceClass: GRUser.
    

    So as you can see, in the table definition, we establish a foreign key from the Games table to the Users table, and then in the descriptor, we define a RelationshipMapping (which is a synonym for OneToOneMapping) from GRGame -> GRUser.

    I hope at this point you can see the one big problem with Glorp: it’s really, really complicated. Worse, it’s not particularly well documented, which makes it a challenge to work with, particularly if you want to do something “interesting”. As a quick example, in my schema, the Games table has two references to the Players table, one for the white player, and one for the black player. This greatly confuses Glorp, which meant I had to do a bit of manual work to get the relationships set up. Here’s how the black player relation is established (there may be a better way, but I don’t know what it would be):

        | blackField playerIdField mapping |
    
        blackField := table fieldNamed: 'Black'.
        playerIdField := (self tableNamed: 'Players') fieldNamed: 'PlayerID'.
    
        mapping := (description newMapping: RelationshipMapping)
            attributeName: #black;
            referenceClass: GRPlayer.
        
        mapping join: (
            self 
                joinFor: mapping
                toTables: { self tableNamed: 'Players' }
                fromConstraints: { }
                toConstraints: { ForeignKeyConstraint sourceField: blackField targetField: playerIdField }
        ).
    
    

    And then it’s basically the same thing for the white player. Mmmm… ugly.

    But, all that said, once the mappings are set up, suddenly Glorp can be a real joy to work with. Here’s the code necessary to add a user, and then query him back out:

    | user |
    
    user := GRUser withName: 'shyguy' andPassword: 'secret'.
    
    self session
        inUnitOfWorkDo: [ self session register: user ].
    
    self session 
        readOneOf: GRUser where: [ :each | each name = 'shyguy' ].
    

    The query is of particular interest. That looks an awful lot like a straight select block, but it is, in fact, translated into an SQL query, which is then run against the database. And that is pretty darn cool. It almost looks like a pure object store, à la Magma, and that’s mighty impressive.
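    And it scales up from there: reading a whole collection of matching objects, even filtering across the relationships defined in the descriptors, looks much the same. A sketch, reusing the classes from above:

    | games |
    
    games := self session 
        readManyOf: GRGame 
        where: [ :each | each user name = 'shyguy' ].
    

    That block should translate into a join between the Games and Users tables behind the scenes, which is exactly the sort of drudgery I’m happy to leave to the machine.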