IP Geolocation through Geographic Clicks

Ovidiu Dan, Vaibhav Parikh and Brian D. Davison

Article (22 pages)
Official ACM published version: DOI: 10.1145/3476774
Author's version: PDF

Abstract
IP geolocation databases map IP addresses to their physical locations. They are used to determine the location of online users when their precise location is unavailable. These databases are vital for a number of online services, including search engine personalization, content delivery, local ads, and fraud detection. However, IP geolocation databases are often inaccurate. In this work we present two novel approaches to improving IP geolocation by mining search engine click logs. First, we show that we can derive which URLs have local affinity by clustering clicks from IPs with known locations. We demonstrate that we can further propagate these URL locations to IP addresses with unknown locations. Our approach significantly outperforms two state-of-the-art commercial IP geolocation databases by 25 and 36 percentage points at a distance error of 10 kilometers, respectively. Second, we present an alternative method of assigning locations to URLs when IP location training data is not available, by instead extracting locations from the body of web documents. This second approach also outperforms the baselines by 7 and 17 percentage points, respectively, and has higher coverage than the first method. Finally, we also demonstrate that our two approaches outperform the academic state of the art based on mining query logs.
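
The first approach in the abstract (finding URLs with local affinity from clicks by IPs with known locations, then propagating those URL locations to IPs with unknown locations) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the simple radius-based concentration test standing in for clustering, the thresholds (`radius_km`, `min_share`, `min_clicks`), and the toy data built on reserved documentation IP ranges and `.example` hostnames are all assumptions made for illustration. The last function shows one plausible reading of the "distance error of 10 kilometers" metric as the share of predictions that fall within 10 km of the true location.

```python
from collections import Counter, defaultdict
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def local_urls(clicks, known_ip_locations, radius_km=10.0, min_share=0.8, min_clicks=3):
    """Assign a location to each URL whose known-IP clicks concentrate in one area.

    Hypothetical stand-in for clustering: pick the click location with the most
    neighbours within radius_km, and keep the URL only if those neighbours make
    up at least min_share of its clicks.
    """
    points = defaultdict(list)
    for ip, url in clicks:
        if ip in known_ip_locations:
            points[url].append(known_ip_locations[ip])

    url_locations = {}
    for url, pts in points.items():
        if len(pts) < min_clicks:
            continue
        best = max(pts, key=lambda c: sum(haversine_km(c, p) <= radius_km for p in pts))
        near = sum(haversine_km(best, p) <= radius_km for p in pts)
        if near / len(pts) >= min_share:
            url_locations[url] = best
    return url_locations


def propagate(clicks, url_locations, known_ip_locations):
    """Give each unknown IP the location most often implied by its local-URL clicks."""
    votes = defaultdict(Counter)
    for ip, url in clicks:
        if ip not in known_ip_locations and url in url_locations:
            votes[ip][url_locations[url]] += 1
    return {ip: counts.most_common(1)[0][0] for ip, counts in votes.items()}


def accuracy_within(predicted, truth, max_km=10.0):
    """Fraction of predicted IPs whose location is within max_km of the truth."""
    scored = [ip for ip in predicted if ip in truth]
    if not scored:
        return 0.0
    hits = sum(haversine_km(predicted[ip], truth[ip]) <= max_km for ip in scored)
    return hits / len(scored)


# Toy data (entirely made up): three nearby known IPs and one distant one.
known_ip_locations = {
    "192.0.2.1": (40.6259, -75.3705),
    "192.0.2.2": (40.6300, -75.3700),
    "192.0.2.3": (40.6200, -75.3800),
    "192.0.2.4": (47.6062, -122.3321),
}
clicks = [
    ("192.0.2.1", "local-pizza.example"),
    ("192.0.2.2", "local-pizza.example"),
    ("192.0.2.3", "local-pizza.example"),
    ("192.0.2.1", "news.example"),
    ("192.0.2.3", "news.example"),
    ("192.0.2.4", "news.example"),
    ("198.51.100.7", "local-pizza.example"),  # IP with unknown location
    ("198.51.100.7", "news.example"),
]

url_locations = local_urls(clicks, known_ip_locations)
predicted = propagate(clicks, url_locations, known_ip_locations)
```

In this toy data only `local-pizza.example` passes the concentration test, because the clicks on `news.example` are split between two distant areas, so the unknown IP inherits its location from the local URL alone.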

ACM Transactions on Spatial Algorithms and Systems (TSAS), 8(1), article 2. March 2022.


Last modified: 13 March 2022
Brian D. Davison