Short Paper (6 pages)
Official IEEE published version: DOI 10.1109/BigData.2018.8621886
Author's version: PDF (528KB)
IP geolocation databases map IP addresses to their geographical locations. These databases are used in a variety of online services to serve local content to users. Here we present methods for extracting locations from the reverse DNS hostnames assigned to IP addresses. We summarize a machine-learning-based approach that, given a hostname, aims to extract and rank potential location candidates, which can then be fused with other geolocation signals. We show that this approach significantly outperforms a state-of-the-art academic baseline and is competitive with, and complementary to, commercial baselines. Since extracting locations from more than a billion reverse DNS hostnames at once poses a significant computational challenge, we develop a distributed version of our algorithm. We perform experiments on a cluster of 2,000 machines to demonstrate that our distributed implementation can scale. We show that, compared to the single-machine version, our distributed approach can achieve a speedup of more than 150X.
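To make the "extract and rank location candidates" step concrete, here is a minimal sketch in Python. It is not the paper's method: the learned ranking is replaced by a fixed lookup weight, and the gazetteer, its weights, the example hostnames, and the function names `tokenize` and `rank_candidates` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy gazetteer: hostname token -> (location, weight).
# A real system would use a large location dictionary and a trained model;
# these entries and weights are illustrative assumptions only.
GAZETTEER = {
    "sea": ("Seattle, WA, US", 0.6),
    "seattle": ("Seattle, WA, US", 1.0),
    "nyc": ("New York, NY, US", 0.8),
    "lon": ("London, GB", 0.5),
    "london": ("London, GB", 1.0),
}


def tokenize(hostname):
    """Return the lowercase alphabetic runs in a hostname.

    Digits, dots, and hyphens act as separators, so "seattle01"
    yields the token "seattle".
    """
    return re.findall(r"[a-z]+", hostname.lower())


def rank_candidates(hostname):
    """Return (location, score) pairs sorted by descending score.

    Each location keeps the highest weight among its matching tokens.
    This heuristic stands in for a learned ranker.
    """
    scores = {}
    for token in tokenize(hostname):
        if token in GAZETTEER:
            location, weight = GAZETTEER[token]
            scores[location] = max(scores.get(location, 0.0), weight)
    # Sort by score (highest first), breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for host in ("ae-1.r02.seattle01.sea.example.net", "lon-gw.nyc3.example.net"):
        print(host, rank_candidates(host))
```

Because each hostname is scored independently in this sketch, the work is easy to partition across machines, which is the property a distributed version would rely on.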
In Proceedings of the 2018 IEEE International Conference on Big Data (BigData), pages 1581-1586, Seattle, WA, December 2018.
Back to Brian Davison's publications