Infostop

2019

If you have a sequence of raw GPS points collected from an individual (like yourself) over a period of time, a common pre-processing step is to find stop locations. There are many ways to do this, but typically you first group points in time to reduce the number of location points to consider, and then, as a second step, apply some kind of clustering. People often use DBSCAN for this clustering step because it doesn't require you to specify the number of clusters (i.e. stop locations) and because it's pretty easy to understand: points that are closer to each other than some distance belong to the same cluster.

My problem with using DBSCAN for detecting stop locations has always been that if two clusters overlap even slightly, DBSCAN will give them the same label. This is especially annoying when clustering GPS points, because they are inherently a little bit noisy, so GPS points that belong to different locations will often overlap slightly. So I wrote a small piece of code to fix this.
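To make the merging problem concrete, here is a minimal sketch in plain Python. It is not DBSCAN itself: it labels points by chaining together everything closer than eps, which is what DBSCAN reduces to when every point counts as a core point (min_samples=1). The function name chain_clusters and the coordinates (arbitrary planar units, not real GPS data) are made up for illustration.

```python
import math


def chain_clusters(points, eps):
    """Label points by connected components of the 'closer than eps' graph.

    Simplified stand-in for DBSCAN: with min_samples=1 every point is a
    core point, so clusters are exactly these connected components.
    """
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None:
                    dx = points[i][0] - points[j][0]
                    dy = points[i][1] - points[j][1]
                    if math.hypot(dx, dy) <= eps:
                        labels[j] = current
                        stack.append(j)
        current += 1
    return labels


# Two hypothetical stop locations, well separated
home = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cafe = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]
# A few noisy samples that happen to fall between them
noise = [(3.0, 0.5), (5.0, 0.5), (7.0, 0.5), (9.0, 0.5)]

print(len(set(chain_clusters(home + cafe, eps=2.5))))          # 2
print(len(set(chain_clusters(home + cafe + noise, eps=2.5))))  # 1
```

Four stray points are enough to bridge the gap, and the two locations collapse into a single label.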

My solution is called Infostop, and it leverages Infomap to find clusters. The idea is simple. First you group time-consecutive points that are within some distance (and optionally, time) of each other. From each group you keep only the median, to reduce the overall number of points. THEN (and this is the new stuff), you create a network where each median is a node and two nodes are linked if they are within a given distance of each other. Now you run the community detection algorithm Infomap on this network and, lo and behold, the resulting stop locations are just beautiful. Here's an example of what it looks like for some of my data.

[Figure: infostop_example]

Notice that many of the points that are in different clusters are actually really close to each other. DBSCAN would have assigned the same label to these slightly overlapping clusters, but Infomap does not. So with Infostop you can, for example, download your Google location data and quickly and automatically label each GPS sample.
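The steps before the Infomap run can be sketched in plain Python. This is a rough illustration under my own assumptions, not the code in the infostop package: the function names, the parameters r1, r2 and min_size, the choice to compare each point against the first point of its group, and the coordinates are all hypothetical. The Infomap step itself is left out; the edge list at the end is what you would hand to it.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6371000


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stationary_medians(trace, r1, min_size=2):
    """Group time-consecutive points within r1 meters of the group's first
    point (an assumption of this sketch) and return each group's median."""
    if not trace:
        return []
    groups, current = [], [trace[0]]
    for point in trace[1:]:
        if haversine_m(current[0], point) <= r1:
            current.append(point)
        else:
            groups.append(current)
            current = [point]
    groups.append(current)
    # Drop very short groups (e.g. single samples recorded in transit)
    return [
        (median(lat for lat, _ in g), median(lon for _, lon in g))
        for g in groups
        if len(g) >= min_size
    ]


def proximity_edges(nodes, r2):
    """Link every pair of medians that lie within r2 meters of each other."""
    return [
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if haversine_m(nodes[i], nodes[j]) <= r2
    ]


# Hypothetical trace, ordered in time: stop A, transit, stop B, transit, A again
trace = [
    (55.00000, 12.00000), (55.00005, 12.00002), (55.00003, 11.99998),
    (55.00500, 12.00000),
    (55.01000, 12.00000), (55.01004, 12.00003), (55.01002, 12.00001),
    (55.00500, 12.00000),
    (55.00002, 12.00001), (55.00004, 11.99999),
]

medians = stationary_medians(trace, r1=30)
edges = proximity_edges(medians, r2=30)
print(len(medians))  # 3
print(edges)         # [(0, 2)]
```

The two visits to location A become two separate nodes joined by a link, while location B stays isolated. Running a community detection algorithm such as Infomap on this network is what then turns linked nodes into stop-location labels.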

If you want to use this, you can simply install it with pip (pip install infostop). Check out the GitHub repository here. Many thanks to Laura Alessandretti for discussions and contributions that made this little piece of software much better, and thanks to Piotr Sapiezynski for his haversinevec code, which makes computing inter-point distances unfathomably fast.