Thursday, March 10, 2011

Beautiful code

I am currently reading the book Beautiful Code, which got me thinking "what is the most beautiful code I've ever written". The answer came up really quickly.
As part of my dissertation I developed an image analysis and statistical data mining and analysis software. In this software, one of the functions was responsible to detect correlation of events between two vectors, where each vector holds the times in which a neuron fired an action potential. This kind of computation is called cross correlation but with the constraint that the correlation is bounded by biological time frame (usually less then 100 ms).

The initial implementation looked as follows, where 'sVec' and 'tVec' are the vectors, and 'before' and 'after' define the biological time frame.

function crossCol = calcCrossCorrelation(sVec, tVec, before, after)
    numOfSourceMaximas = numel(sVec);
    numOfTargetMaximas = numel(tVec);
    crossCol = zeros(before+after+1, 1);
    for i=1:numOfSourceMaximas
        sourceSpikeLocation = sVec(i);
        for j=1:numOfTargetMaximas
            targetSpikeLocation = tVec(j);
            delta = targetSpikeLocation - sourceSpikeLocation;
            if(delta< -before)
            if(delta > after)
            crossCol(abs(-before - delta )+1) =
                             crossCol(abs(-before - delta )+1) +1;

This code basically runs on every combination of any event in the vectors and checks whether the combination occurred within the given time frame. It ran slow, really slow. To analyze data from one imaging session and to complete all the checks of all the cross correlations of all the cells I needed a few days, so I sat down to optimize this code, the final result looked like this:

function crossCol = calcCrossCorrelation(sVec, tVec, before, after)

distMat = bsxfun(@minus,tVec',sVec);
filteredDist= distMat ((distMat  >= -before) & (distMat  <= after));
crossCol = histc(filteredDist, (-before:after));

Thee lines of code.

The first line creates a distance matrix between the two vectors (using Matlab's utility bsxfun, which is very efficient). 
The second line is filtering out this matrix into a vector that holds all the cells whose value was within the biological time frame.
The third line bins all these values to a set of bins for all the values in the biological time frame (the set of values is defined by the 'before' and 'after' parameters and the sampling rate which affects which values would exist in this range).

This change made the same analysis run in a matter of about 30 seconds, as well as being beautiful code on its  own.

The updated code results from a change in the problem solving approach - thinking of the problem in terms of matrices and what can be done with vectors/matrices and arrays instead of thinking about each event on its own. This thinking paradigm is very powerful, especially when using the right tools (in here - Matlab).