Posts in category 'hacking'

  • Haskell + Data Analysis -> Good Times

    So, as part of my ongoing obsession with toying with unusual programming languages, Haskell has periodically popped on and off my radar. The problem is, it’s rare that I find a problem where I feel like sitting down and figuring out how to solve it in Haskell, particularly since Haskell’s strengths and weaknesses don’t often mesh with the kinds of ad-hoc programming I tend to do (for example, Haskell sucks for text parsing, primarily due to performance constraints, and I find much of the random coding I do involves high-volume text processing).

    But all that has changed due to an interesting problem we’ve been fighting with at work. You see, on one of our production servers, we’re having performance problems. And so the first thing we did was find a way to collect telemetry. Of course, the first cut dumped out raw CSV files, which are a pain in the butt to manipulate in interesting ways, and as a result, I found myself writing a lot of Perl to deal with the data we received. Not fun.

    Finally, after days of this, I decided to write a new tool that collects telemetry as we were doing before, but rather than using CSV, stores the data in an SQLite database, thus making the information a hell of a lot easier to manipulate. “But now you need to analyze that database!”, you say. Ahh yes, you’re quite right, and normally I might turn to Perl to do just that. However, it turns out, Haskell is more or less perfect for that very job.

    See, Haskell just so happens to have HDBC, which is really the Haskell equivalent to Perl’s DBI. And there just happens to be an SQLite HDBC driver available, which provides a nice functional interface to the underlying database. With this combination, querying the database and manipulating its contents becomes exceedingly easy. And in particular, because of Haskell’s laziness, we can do much of our processing in a streaming fashion, rather than bulk loading large amounts of data for processing.

    For example, suppose we have a table as follows:

      ID   Date   Value  

    Where you may have multiple rows for a given date. Now say you want to take that table, and group it so that all the rows for the same date are collected together. Well, in Perl, you’d probably set up a loop, track the previous and next rows, build a list in memory, and output the results as you go, and that would work out just fine. But it’s tedious. Haskell, on the other hand, makes this all remarkably easy.

    First, let’s back up. What we really want to do is take a list of items, and then group them together based on some kind of splitting function. It may be a list of integers, a list of strings, or a list of database rows. But in the end, it’s really all the same thing. Well, you could define a function like that as follows:

    -- Split a list at the first adjacent pair that fails the test.
    splitWhen :: (a -> a -> Bool) -> [a] -> ([a], [a])
    splitWhen _ [] = ([], [])
    splitWhen _ [x] = ([x], [])
    splitWhen func (first:second:rest)
      | func first second = (first:result, remainder)
      | otherwise         = ([first], second:rest)
      where (result, remainder) = splitWhen func (second:rest)

    -- Apply splitWhen repeatedly to break the whole list into groups.
    splitList :: (a -> a -> Bool) -> [a] -> [[a]]
    splitList _ [] = []
    splitList func lst = group : splitList func remainder
      where (group, remainder) = splitWhen func lst
    

    So, first we define splitWhen, which is a function that takes:

    1. A test function.
    2. A list.

    The test function is applied to each adjacent pair of items in the list, starting at the beginning, and the list is split at the first pair for which the function returns false; splitWhen returns the items before that point and the items after it. splitList then applies splitWhen repeatedly to break the whole list into groups. So, for example:

    splitList (\x y -> x < y) [ 1, 2, 1, 3 ]
    

    Returns

    [ [1, 2], [1, 3] ]
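
    One thing worth noting: splitList compares each item with its immediate neighbour, which is not what the standard Data.List.groupBy does (groupBy compares each item with the first item of the current group). Here is a minimal sketch of the difference, reusing the definitions above; the names adjacent and anchored are made up purely for illustration:

```haskell
import Data.List (groupBy)

-- splitWhen and splitList, as defined in the post.
splitWhen :: (a -> a -> Bool) -> [a] -> ([a], [a])
splitWhen _ [] = ([], [])
splitWhen _ [x] = ([x], [])
splitWhen func (first:second:rest)
  | func first second = (first:result, remainder)
  | otherwise         = ([first], second:rest)
  where (result, remainder) = splitWhen func (second:rest)

splitList :: (a -> a -> Bool) -> [a] -> [[a]]
splitList _ [] = []
splitList func lst = group : splitList func remainder
  where (group, remainder) = splitWhen func lst

-- Adjacent comparison: 3 < 2 fails, so the first run ends after 3.
adjacent :: [[Int]]
adjacent = splitList (<) [1, 3, 2, 4]

-- groupBy tests every item against the first item of the group (1),
-- so 3, 2 and 4 all land in the same group.
anchored :: [[Int]]
anchored = groupBy (<) [1, 3, 2, 4]
```

    Here adjacent evaluates to [[1,3],[2,4]] while anchored evaluates to [[1,3,2,4]]. For an equality test on sorted data (like the dates further down) the two approaches should agree, so which one you reach for is mostly a matter of taste.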
    

    But this code has another interesting property that may not be obvious to someone unused to Haskell: these functions are lazy. That means they only do work as elements are requested from the list. For example, given this code:

    take 5 $ splitList (\x y -> x < y) [ sin x | x <- [ 1 .. ] ]
    

    The second part of this expression generates an infinite list of the sin() values of the positive integers, starting from 1. And splitList operates on that list. If this weren’t Haskell, this code would run forever, but because Haskell evaluates expressions lazily, this only computes the first 5 groups, as follows:

    [
      [0.8414709848078965, 0.9092974268256817],
      [0.1411200080598672],
      [-0.7568024953079282],
      [-0.9589242746631385, -0.27941549819892586, 0.6569865987187891, 0.9893582466233818],
      [0.4121184852417566]
    ]
    

    Nice! As an aside, this is one of the more interesting aspects of Haskell: it encourages you to write reusable functions like this.
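
    If you want to convince yourself of the laziness, here is a small sketch that feeds splitList a list whose tail is undefined, plus an infinite list built with cycle. The names firstGroup and groupSizes are mine, for illustration only:

```haskell
-- splitWhen and splitList, as defined in the post.
splitWhen :: (a -> a -> Bool) -> [a] -> ([a], [a])
splitWhen _ [] = ([], [])
splitWhen _ [x] = ([x], [])
splitWhen func (first:second:rest)
  | func first second = (first:result, remainder)
  | otherwise         = ([first], second:rest)
  where (result, remainder) = splitWhen func (second:rest)

splitList :: (a -> a -> Bool) -> [a] -> [[a]]
splitList _ [] = []
splitList func lst = group : splitList func remainder
  where (group, remainder) = splitWhen func lst

-- The list "blows up" after the third element, but the first group
-- is complete once 2 < 0 fails, so the undefined tail is never touched.
firstGroup :: [Int]
firstGroup = head (splitList (<) (1 : 2 : 0 : undefined))

-- An infinite repeating list; only the first three groups are computed.
groupSizes :: [Int]
groupSizes = map length (take 3 (splitList (<) (cycle [1, 2, 3 :: Int])))
```

    firstGroup evaluates to [1,2] and groupSizes to [3,3,3]. Asking for a second group in the first example would hit the undefined, which is exactly the point: work only happens when something demands it.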

    So, let’s apply this to a database query. Well, it turns out, that’s dead simple. You’d just do something like:

    import Control.Monad (liftM)
    import Database.HDBC
    import Database.HDBC.Sqlite3 (connectSqlite3)

    -- Assumes splitList from above is in scope.
    main = do
      conn <- connectSqlite3 "database.db"
      stmt <- prepare conn "SELECT Date, Value FROM theTable ORDER BY Date"
      execute stmt []
      groups <- splitList (\(adate:_) (bdate:_) -> adate == bdate) `liftM` fetchAllRows stmt

      print $ take 5 groups
    

    Yeah, okay, this is a little dense. After the imports, the first few lines of main prepare and execute our query. No big deal there. It’s the line that binds groups where the magic really happens. First, let’s start on the far right. Here we see the function fetchAllRows being called. That function returns the rows generated from the query, but it does so lazily. So rows are only retrieved from the database as they’re needed. We then apply the splitList function to the results, comparing the first column (the date) of each adjacent pair of rows (ignore the liftM, that has to do with Monads, and you probably don’t want to know…). And then we print the first 5 groups from the result. Voila! In a surprisingly small amount of code, a huge chunk of which is nicely generic and reusable, we can do what, in Perl, would likely take dozens of lines of code. Pretty nice!
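
    If you don’t have a database handy, the same grouping can be tried on plain in-memory data. This is a rough sketch, assuming rows are modelled as (date, value) pairs; the Row type, groupByDate, and the sample rows are all hypothetical stand-ins of mine, not anything HDBC provides:

```haskell
import Data.Function (on)

-- Hypothetical stand-in for a database row: (date, value).
type Row = (String, Double)

-- splitWhen and splitList, as defined in the post.
splitWhen :: (a -> a -> Bool) -> [a] -> ([a], [a])
splitWhen _ [] = ([], [])
splitWhen _ [x] = ([x], [])
splitWhen func (first:second:rest)
  | func first second = (first:result, remainder)
  | otherwise         = ([first], second:rest)
  where (result, remainder) = splitWhen func (second:rest)

splitList :: (a -> a -> Bool) -> [a] -> [[a]]
splitList _ [] = []
splitList func lst = group : splitList func remainder
  where (group, remainder) = splitWhen func lst

-- Keep adjacent rows together while their dates are equal.
-- Like the SQL query, this relies on the rows being ordered by date.
groupByDate :: [Row] -> [[Row]]
groupByDate = splitList ((==) `on` fst)

-- Made-up sample data, already sorted by date.
sampleRows :: [Row]
sampleRows =
  [ ("2009-01-01", 1.0)
  , ("2009-01-01", 2.5)
  , ("2009-01-02", 0.5)
  ]
```

    With this, groupByDate sampleRows gives two groups: the two rows for 2009-01-01, followed by the single row for 2009-01-02. The ORDER BY in the real query plays the same role as the pre-sorted sample list here, since rows for the same date only end up together if they arrive next to each other.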

  • AJAX in Seaside

    So, in yet another post in a series about Pharo and Seaside, I thought I’d highlight a great strength in Seaside: its incredibly powerful support for building rich, AJAX-enabled web applications.

    As any web developer today knows, if you’re building rich web apps with complex user interactions, you’d be remiss not to look at AJAX for facilitating some of those interactions. AJAX makes it possible for a rendered web page, in a browser, to interact with the server and perform partial updates of the web page, in situ. This means that full page loads aren’t necessary to, say, update a list of information on the screen, and results in a cleaner, more seamless user experience (Gmail was really an early champion of this technique).

    Now, traditionally, an AJAX workflow involves attaching JavaScript functions to page element event handlers, and then writing those functions so that they call back to the web server using an XMLHttpRequest object, after which the results are inserted into an element on the screen. Of course, doing this in a cross-browser way is pretty complex, given various inconsistencies in the DOM and so forth, and so the web development world birthed libraries like jQuery and Prototype, and higher-level libraries like Script.aculo.us. But in the end, you still have to write JavaScript, create server endpoints by hand, and so forth. Again, we’re back to gritty web development. And that makes me a sad panda.

    Of course, this post wouldn’t exist if Seaside didn’t somehow make this situation a whole lot simpler, and boy does it ever. To illustrate this, I’m going to demonstrate an AJAX-enabled version of the counter program mentioned in my first post on Seaside. So, instead of doing a full page refresh to display the updated counter value, we’re simply going to update the heading each time the value changes. Now, again, imagine what it would take to do this in a more traditional web framework. Then compare it to this:

    renderContentOn: html
    
     | id counter |
    
     counter := 0.
     id := html nextId.
    
     html heading id: id; with: counter.
    
     html anchor
       onClick: (
         html scriptaculous updater
           id: id;
           callback: [ :ajaxHtml | 
             counter := counter + 1. 
             ajaxHtml text: counter.
           ]
       );
       url: '#';
       with: 'Increase'.
       
     html space.
     
     html anchor
       onClick: (
         html scriptaculous updater
           id: id;
           callback: [ :ajaxHtml | 
             counter := counter - 1. 
             ajaxHtml text: counter.
           ]
       );
       url: '#';
       with: 'Decrease'.
    

    That’s it. The full script.

    Now, a little explanation. The script begins with a little preamble, initializing our counter, and allocating an ID, which we then associate with the heading when we first render it. Pretty standard fare so far. The really interesting bit comes in the anchor definition, and in particular the definition of the onClick handler. Of course, this bit bears a little explanation.

    The various tag objects in Seaside respond to selectors that correspond to the standard DOM events. When sending such a message, the parameter is an instance of a JSFunction object, which encapsulates the actual JavaScript that will be rendered into the document. Now, in this particular example, we’re actually using part of the Scriptaculous library wrapper to create an “updater” object, a type of JSFunction, which takes the ID of a page element, and a callback, and when invoked, causes the callback to be triggered. Upon invocation, this callback is passed an HTML canvas, and when the callback terminates, the contents of that canvas are used to replace the contents of the indicated page element. Neat!

    So in this particular case, we have two anchor tags, each of which has an onClick event registered which, when invoked, updates the counter value and then updates the heading on the page.

    By the way, there’s also a little bit of extra magic going on here. You’ll notice the ‘counter’ variable is local, while in the original example it was an instance variable. But this works, here, because those callbacks are actually lexical closures, and so the ‘counter’ variable sticks around, referenced by those closures, even though the function itself has returned, and the variable technically has gone out of scope.
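
    If closures over mutable state are unfamiliar, the same idea can be sketched outside of Seaside. The following is only an analogy, written in Haskell rather than Smalltalk, and makeCounter is a name I made up: two callbacks share one counter that outlives the function that created it.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- Build two callbacks that close over the same mutable counter.
-- The counter itself is never returned; it stays reachable only
-- through the closures, much like the local 'counter' variable.
makeCounter :: IO (IO Int, IO Int)
makeCounter = do
  ref <- newIORef (0 :: Int)
  let bump delta = modifyIORef' ref (+ delta) >> readIORef ref
  return (bump 1, bump (-1))
```

    Calling the first action twice and then the second once yields 1, 2 and then 1 again, even though makeCounter returned long before any of those calls. That is roughly what the two onClick callbacks are doing with ‘counter’.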

    To me, the really amazing thing, here, is that never once do I, as a developer, have to even touch HTML or JavaScript. The entire thing is written in clean, readable Smalltalk, and it’s the underlying infrastructure that translates my high-level ideas into a functional, cross-browser implementation. Once again, Seaside lets me forget about all those annoying, gritty little details. I just write clean, expressive Smalltalk code, and it Just Works, exactly as I would expect it should.

    Update:

    If you want to see the above application running live, you can find it here.