This week it was revealed that the iPhone stores users’ locations, and this immediately caused a huge firestorm of commentary by tech geeks, panic among privacy advocates, and delight among data geeks like myself. Even better/worse, it seems that the iPhone caches location traces long-term, possibly back to the date the phone was activated.
I ditched my iPhone this past December (good riddance) in favor of the Droid X (Android). I figured that, even on such an open source OS, Google must be doing the same thing. After some surfing through Hacker News, I found out I was right.
Compared to the iPhone though, getting the data on an Android phone is not simple.
Once I had copied the files to my Mac (via scp), I downloaded a handy-dandy parser from packetlss called android-locdump and converted the cache.cell and cache.wifi files into GPX files by passing the --gpx flag. You can also leave off the --gpx flag and parse the output yourself.
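If you would rather stay in R for the whole pipeline, the GPX output can in principle be read without a separate conversion tool. The following is only a minimal sketch: it assumes each point is written as a standard GPX waypoint of the form `<wpt lat="..." lon="...">` with lat before lon, which I have not checked against android-locdump's actual output. The function name `read_gpx_waypoints` and the sample coordinates are made up for illustration.

```r
# Minimal sketch: pull lat/lon out of GPX <wpt> elements using base R only.
# Assumption: waypoints look like <wpt lat="..." lon="...">, lat before lon.
read_gpx_waypoints <- function(lines) {
  text <- paste(lines, collapse = "\n")
  pattern <- "<wpt[^>]*lat=\"([-0-9.]+)\"[^>]*lon=\"([-0-9.]+)\""
  # Extract every opening <wpt ...> fragment that carries both attributes
  tags <- regmatches(text, gregexpr(pattern, text))[[1]]
  data.frame(
    Latitude  = as.numeric(sub(pattern, "\\1", tags)),
    Longitude = as.numeric(sub(pattern, "\\2", tags))
  )
}

# Made-up sample input, not real cache data
sample_gpx <- c(
  "<gpx>",
  "<wpt lat=\"34.0689\" lon=\"-118.4452\"><name>cell</name></wpt>",
  "<wpt lat=\"34.0195\" lon=\"-118.4912\"><name>wifi</name></wpt>",
  "</gpx>"
)
pts <- read_gpx_waypoints(sample_gpx)
print(pts)
```

For a real file you would pass in `readLines("your-file.gpx")` instead of the sample vector. A regular expression is a shortcut here; a proper XML parser would be more robust if the attribute order or formatting differs.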
Then I used GPSBabel to convert the GPX files to CSV files and loaded them into R. While R is great for a static view, the lack of interactive zooming makes working with this type of data more difficult. I then used some code from the RgoogleMaps package vignette, as adapted by Michael Malecki. [Drew Conway has developed stalkR for analyzing iPhone and iPad location data in R.]
library(RgoogleMaps)
Df <- read.csv("CSV file", header=FALSE)
names(Df) <- c("Latitude", "Longitude", "Key")
bb <- qbbox(lat=range(Df$Latitude), lon=range(Df$Longitude))
m <- c(mean(Df$Latitude), mean(Df$Longitude))
zoom <- min(MaxZoom(latrange=bb$latR, lonrange=bb$lonR))
Map <- GetMap.bbox(bb$lonR, bb$latR, zoom=zoom, maptype="mobile", NEWMAP=TRUE, destfile="tempmap.jpg", RETURNIMAGE=TRUE, GRAYSCALE=TRUE)
tmp <- PlotOnStaticMap(lat=Df$Latitude, lon=Df$Longitude, cex=.7, pch=20, col="red", MyMap=Map, NEWMAP=FALSE)
The map clusters my activity into a few familiar categories: work, school (Math Sciences Building actually), home, and my parents’. Android also picked up a dinner outing in Santa Monica, and a trip to the Shopzilla office for the Los Angeles Hadoop User Group meetup, but little else.
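I picked out those clusters by eye, but the same grouping could be approximated numerically. Here is a rough sketch using k-means from base R on synthetic points. The four centers below are invented coordinates in the Los Angeles area, not my actual data, and the choice of four clusters is an assumption. Treating latitude and longitude as plain Euclidean coordinates is also only an approximation, though a tolerable one at city scale.

```r
# Sketch: group location fixes into a handful of places with k-means.
# All coordinates are synthetic and for illustration only.
set.seed(42)
centers <- data.frame(
  Latitude  = c(34.07, 34.02, 34.15, 33.95),
  Longitude = c(-118.44, -118.49, -118.25, -118.40)
)
n_per <- 25  # simulated fixes per place

# Scatter points around each invented center with a little noise
Df <- data.frame(
  Latitude  = rep(centers$Latitude,  each = n_per) + rnorm(4 * n_per, sd = 0.003),
  Longitude = rep(centers$Longitude, each = n_per) + rnorm(4 * n_per, sd = 0.003)
)

# nstart > 1 reruns k-means from several random starts and keeps the best fit
fit <- kmeans(Df[, c("Latitude", "Longitude")], centers = 4, nstart = 10)
Df$Cluster <- fit$cluster

print(round(fit$centers, 3))  # estimated cluster centers
print(table(Df$Cluster))      # number of fixes assigned to each cluster
```

With real cache data you would swap the simulated `Df` for the data frame read from the CSV file. The number of clusters has to be chosen up front, so one-off trips like a dinner outing may get absorbed into the nearest large cluster.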
What I Found
The locations in the cache.cell file come from cell tower triangulation. In addition to this imprecise measure, the Android’s location tracker has several limitations.
I also found that I need to get out more.
Why Would Apple Do Such a Thing?
Earlier iPhone models (up to 2010 apparently) used Skyhook for their geo-location database. Skyhook employees basically drive cars wired with WiFi sensors and GPS and do what is called “wardriving.” They drive around cities recording information about the access points they encounter and where they encounter them. When a user logs onto the web via one of those access points, Skyhook customer sites can cross-reference the access point with its physical location. As of August 2010, Apple dropped Skyhook. Why?
I suspect Apple is using this data to build its own geo-location database, although there is no evidence that the files on the iPhone are actually being transmitted to Apple. If it is true that the location database is copied to the user’s computer, it’s possible that Apple uses this data in Safari to enable geo-location features there.
The investigative side of me says that this could be useful in a missing persons case if the phone is dropped.
Android or iPhone?
Apple and Google pursued different approaches in caching users’ locations. Apple used a standard database file stored on the phone. Although this file is hidden on the phone, it seems to be transmitted to the user’s computer. The user can then open the file and see what Apple is storing about them. Heck, they could even modify it to protect their privacy. The iPhone updates this information very frequently, and keeps it around for a very long time. The file is there, the user knows it is there, and the user can see what is in the file. Unfortunately, this also means that people will overreact.
Google, on the other hand, hid the file deep in the filesystem such that a terminal connection is necessary to reach it, and “rooting” the phone is necessary to see its content. The user has no idea that this file exists, and cannot see what Google is storing about them. This is a bit shady. On the other hand, the information that Google is collecting is very minimal and has questionable use. Data is not updated often, and is not held on disk for very long. It is also possible to clear at least the WiFi location cache file by turning WiFi off and on.
So, what do you think about all of this?