I recently put together an A* implementation for the first time. I actually did it twice: once in a naive way just to get a good grasp on how the algorithm works, and then a second time with an eye towards making it perform better. I’ve been meaning to write up my observations on it for a couple of months, so here they are.

Amit’s A* Pages served as my primary reference for how the algorithm works. That’s definitely the place to go if you want to learn about it.

For my first implementation of the algorithm, I did a very straightforward implementation: I had a 2D grid of locations with movements costs, a list of closed off locations, and a list of open locations being considered. The data structures looked like this (with some unimportant details elided):

// This class is just an NSArray of waypoints
@class Path;

@interface PathNode : NSObject
{
    float fValue, gValue, hValue;
    MapCoordinate coordinate;
    PathNode *parent;
}
@end

@interface Pathfinder : NSObject
{
    NSMutableArray *_openList;
    NSMutableArray *_closedList;

    // CollisionMap is just a 2D grid of movement costs with some
    // nice accessor functions
    CollisionMap *_collisionMap;
}

...
- (Path*)pathFromStart:(CGPoint)startPosition toEnd:(CGPoint)endPosition;
@end

There are three important details about this first implementation:

  • The open and closed lists are kept in NSMutableArrays. The open list is kept sorted at insertion time. Checking for membership in one of the lists means scanning through the list.
  • PathNodes are dynamically allocated.
  • This implementation was dog slow.

On an iPad 2 with a moderately sized map, finding a decent length path took 200ms. Totally not acceptable. I expected the first pass to be bad; in particular I expected maintaining the sorted open list to perform terribly. So I fired up Instruments, captured a trace, and the results looked like this:

Slow profiler run

By far the thing taking the most time was looking through the open and closed lists for a previously allocated PathNode (indexOfCoordinate:inList:). Maintaining the open list took almost no time next to that.

I was breaking other rules too: in particular, allocating and deallocating those PathNodes at runtime was a bad idea. So, armed with the profiler results, I took a second shot at the implementation.

struct PathfindingCell
{
    MapCoordinate coordinate;
    float f, g, h;
    struct PathfindingCell *parent;
    struct PathfindingCell *openPrevious, *openNext;
    struct PathfindingCell *closedPrevious, *closedNext;
};

typedef struct PathfindingCell PathfindingCell;

@interface Pathfinder : NSObject
{
    CollisionMap *_collisionMap;

    int _cellsWide;
    int _cellsHigh;
    PathfindingCell *_cells;

    PathfindingCell *_openListHead;
    PathfindingCell *_closedListHead;
}
- (id)initWithCollisionMap:(CollisionMap *)collisionMap;
- (Path*)pathFromStart:(CGPoint)startPosition toEnd:(CGPoint)endPosition;

This time:

  • I keep a 2D array of PathfindingCells with the same dimensions as the input CollisionMap. This is allocated once when the Pathfinder object is created, and never again.
  • Instead of using NSMutableArrays, I now keep list links directly in the PathfindingCell structs.
  • As a result of those two properties, I can now:
    • Directly look up any cell given an x,y location.
    • Determine whether a cell is in the open or closed list without traversing the list.
  • Only one allocation is made for each run, and that’s for the Path object that gets returned from the pathFromStart:toEnd: method.

Importantly, the algorithm didn’t change, but the data layout did. With these changes, a profiler run for the same map and path now looks like this:

Fast profiler run

That’s better. I could probably make it faster; maybe a priority queue would work better than the insertion-sorted list I’m using now, and I could pre-allocate a fixed length Path object or something else, but it’s fast enough for me now, so I stopped at this point.

I posted the source code for the implementation as a Github Gist. It’s just the A* implementation without any drawing code or anything, but hopefully it’ll still be useful to learn from.