James Doig's Blog

Saturday, 19 May 2012

Initial thoughts: Graph Databases

We've recently had some very interesting tech talks given by our talented developers and guest speakers.The first one was regarding graph databases (e.g Neo4J) and how we may use such a tool to store our routes data.

After the talk I was jotting down some of the ideas talked about to help get my head around graph databases.

I really liked the thought of storing not only route data but also pricing and user search metrics within the graph.
For example if a user was to choose Edinburgh as their point of origin a lot of relevant information about their, potential, requirements would be within only a couple of hops of the graphs nodes.

In the image (not at all based on real data) we can see a relationship going from Edinburgh to Spain that represents a large percentage of searches with Edinburgh as an origin have Spain as a destination (we can also see the price and distance/journey time between these relationships).

There is also a relationship going from Edinburgh to Glasgow that shows a large percentage of searches that start off with Edinburgh as an origin often see the user switching to Glasgow airport as their origin (perhaps due to ticket price or availability of flights to a given destination). In conjunction with this we can see that Italy is the main destination searched for when Glasgow is the origin.

Lastly there is a link from Edinburgh to Amsterdam telling us that Amsterdam is the most popular "via" location in a multi-hop journey and that Dubai is the most popular destination from Amsterdam.

I haven't gotten a chance to see how such a database would work in reality... But I do like the idea of being able to take a user origin as soon as its selected, e.g Edinburgh, and be able to immediately come back with information like:

Flights from Edinburgh to Spain from £30 (journey times around 2 hours)
Flights from Edinburgh to Dubai from £432 (journey times around 10 hours)
Flights from Glasgow to Italy from £51 (journey times around 4 hours)
Train from Edinburgh to Glasgow from £18 (journey time < 1 hour)

Adventures in Redis. Part 1

While looking for cool buzz-wordy tech to play around with I investigated an in-memory key/value(or more accurately a key/data structure) store called Redis.

Redis was immediately appealing due to how easy it was to get started, I was able to test the commands from within my web browser due to the fantastic documentation pages. e.g: http://redis.io/commands/hincrby

See the last line in the example that reads 'redis>' ? That's an interactive command prompt!
And installation was a synch:

$ wget http://redis.googlecode.com/files/redis-2.4.13.tar.gz
$ tar xzf redis-2.4.13.tar.gz
$ cd redis-2.4.13
$ make

It didn't take long for me to realize this would be a great fit for a problem a team at work where tackling (and there went my cool side project).
The guys where generating inverted indices over some travel data, sharding it by origin country and storing it in process in a .NET program to be queried via http endpoints. This data would also be updated tens of thousands of times an hour.

Their queries took the shape of a set membership and/or set intersection tests (e.g Where id is in index: 'Origin country: Spain' AND is is in index 'Day of departure: 20121202'). With the goal of finding the cheapest ticket that met all the criteria.

To map this problem to Redis I would take the quote data and also build inverted indices over it using the Redis sorted set, obviously sorting on price.

To check the existence of a ticket to Spain on 2012-12-02 I would ZINTERSTORE (intersect and store) the Spain index and the 20121202 index. This was going to be slow, I knew this as the Redis documentation also gives you the time complexity for all of it's commands.

But the plan was to shard the data into hundreds of shards as we had quite a lot of CPU's at our disposal, my idea was that small data sets intersected in parallel would overcome the speed issue.

Another one of the goals here was to take something out the box that had persistence and replication and apply it as simply as possible.

As a nice side effect of Redis storing the intersection allowed us to build facets against the now reduced set where as the existing solution would build each facet a new.
So we spiked this in just one day, my go to data dude, a python-wizard and my self. During the spike we discovered a couple interesting things about our data:

1) The distribution of our data was horrible, 95%+ of our origin and destinations where in the UK, North America or Russia.

2) The requirements of the project where a bit too optimistic (but I won't go into this)

(chart by Charlie G.)
Point one meant that a query who's origin and destination fell in that 95% head (e.g London to New York) was dealing with almost all of our data.

At the end of the day we where 6-20 times faster than the existing solution (depending on the query) but still not fast enough for production.

Since our spike it has come to light that 50%-90% of our data can be culled and still provide 80%-90% of the functionality. Also the requirements have been loosened up.

With the removal of this excess data, should we revisit the Redis solution we would be able to pre-intersect a LOT of our data (e.g store pre intersected city-to-city, or country-to-country departing-april, etc).
It was a very rewarding experience, working with a couple of really smart guys, learning a lot about our data and a great tool in the form of Redis.

Tuesday, 20 September 2011

Clojure on Heroku

Here's a quick link to my simple "Clojure on Heroku example" Git repository .

To quote the readme:

"A very (very) simple example of Clojure + Compojure project that can run on Heroku's Cedar stack"

Wednesday, 10 August 2011

A simple Clojure web app

I thought I'd start with a simple example showing why I'm coming to love Clojure. This is a small web app that accepts a GET request with an id, hits a CouchDB database, executes some simple business logic then returns the results as json.
What's interesting is that it does all this without using business objects, mappers, etc.

Clojure allows you to deserialize the json response from Couch into a hash-map and access the elements of that hash-map via symbols like so (client :address), no need for strings or accessing by a numerical index. As you can see this allows you to access values in the hash-map like fields on an object.

Clojure also has a thrush (or pipe) operator -> which can be used to pass objects into methods. While this is only syntactic sugar it allows you to use functions as if they where methods. e.g:
(-> client print-name-and-address)

This approach may not meet everyone's needs but I think it's a really great way to make lightweight, easily extensible, data driven web applications.
In the following example all the code is lumped together in 1 file just for the sake simplicity. In production code you would probably want to brake functions out into separate files for those that work on json, documents & vehicles as well as the core routing logic. (Error handling has been omitted to keep the example simple)

Friday, 29 July 2011

A fresh, new, start

Hi all,

Sorry for the lack of updates but it's been a busy year.
Soon after my last F# post I landed a pretty cool job. I've since been learning Python and a bit of Ruby/Rake.

But I have to say the language I'm really psyched about this year is Clojure... It's functional, it's JVM back-end means it's portable but most of all I like the fact Clojure has made Lisp trendy in this day and age; like some ancient, elder thing rising from obscurity to take over the world.

So, as you may have guessed, my next few posts are going to be very Clojure-centric.

Wednesday, 25 August 2010

A* With F#

First of all please accept my apologies if you find this post and my last one rather scant on details and description.
I'm mega-busy at the moment, I'll pop back soon and try to flesh some of these posts out with detailed descriptions.

I've updated my path finding to use a generic version of the A* algorithm. This slows it down (~6 milliseconds on my box, in debug mode) but it does make for more reusable code.

My Tools module has become the home of this new algorithm and is as follows

And my Pathfinding module has shed some weight to look like so:

JD

Tuesday, 24 August 2010

F# Pathfinding

I've recently started to have a play around with F# and for my first non-trivial application I'm thinking of implementing a path finding algorithm as seen in games. Seeing as this is a learning exercise I'm going to try and be as "functional" as possible (trying to stay as "pure" as I can with immutable data).

First things first:
We'll need a world to 'path find' our way around. Lets start off with defining some simple utility functions. Below is my Tools module that contains functions to:

Square ints to a float
Remove items from lists
Load ints from a file (2D arrays of numbers will serve as our tile maps, collision maps, monster & item locations, etc)

The map:
Ok we still need to describe the world we want to navigate: In our world a 0 represents an open space and a 1 represents an area we can not enter. For example a 4 unit long corridor, running left to right, would look like this:

* Wall 1 1 1 1
* Void 0 0 0 0
* Wall 1 1 1 1

Here is my level module that describes:

Map point: an x,y location in our 2D space, plus the value of the tile at that location and a function to calculate distance from another map point
Map: a list of map points and a function to get the neighboring tiles of any given tile

Pathfinding: Now comes the fun part. This contains the PathingNode type (that basically wraps a map location with information that the A* path-finding algorithm needs), some utility functions and the path-finding function itself.

Level 1: Now let's create a file to represent our world. Open up a text editor and and create a delimited 2D array of 0's and 1's with a definite path between 2 points. like so:

* 0; 0; 0; 0; 0;
* 0; 0; 0; 1; 0;
* 0; 0; 0; 1; 0;
* 0; 1; 1; 1; 1;
* 0; 0; 0; 0; 0;
* 1; 1; 1; 1; 0;
* 0; 0; 0; 0; 0;
* 1; 0; 1; 1; 1;
* 0; 0; 0; 0; 0;

(I've highlighted my start tile green and end tile red for reference)
...Save this text file some where on your hard drive for later.
The Results:
The code bellow shows the pathfinding in action: