问题描述:

I need to write hive query that will pull from a table including columns for lat lon and return results that are within x miles of a given point. Does hive require a toolkit or plugin for this?

I am using Hadoop (3.2.7.844) running on HDInsight

Sorry, You could also just write a big mathematical function in query if I understand you use case correctly. Writing the mathematical function in query would be a headache but from a hive query perspective it should be doable.

```
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
d = R * c (where R is the radius of the Earth).
```

And the in where condition filter based on your x miles. Taking below paths would be obviously more manageable and clean.

Yes, you would need something more than just the function supplied by Hive out of the box.

There are essentially two paths you may take.

- Write you own UDF (Its pretty simple). This could be a scalar UDF which will take x miles and your point in question as arguments and returns the rows of table (lat, long) which are with in x miles for your argument point. In the UDF you can implement a function to calculate the distance between argument point and row of the table provided to UDF. Return if distance is less than x miles.
- Or you could use such UDF/UDTFs already written by other people. Hive being open source has benefited a lot by this. You can take this library ESRI library to do your stuff https://github.com/Esri/gis-tools-for-hadoop. There are also plenty other such libraries available.