There are plenty of geocoders out there, but services come and go, and when you need to get a job done quickly, there's nothing more frustrating than coming across a 404 error page.
I found that a lot of services and libraries do a decent job of geocoding datasets, with one deal-breaking caveat: they drop the associated data. This means a tedious merging job between the original and the geocoded dataset. This scenario essentially defined my core requirements:
- create a geocoder with minimal infrastructure needs (to host), one that non-technical people can set up with reasonable ease
- retain original data attributes
js.geocoder takes a csv file with at least an ID field (to allow easy re-merging) and an Addr (for Address) field and gives you another csv file with:
- the same ID
- the original address
- the address as interpreted by the geocoding service
- Latitude and Longitude
- the response status (OK, error messages, etc.)
- ...and it flags if the interpreted address is outside of Australia (I plan to change this to a user defined region later)
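The output row described above can be sketched roughly as follows. This is a minimal illustration, not js.geocoder's actual code: the output column names (`InterpretedAddr`, `Status`, `OutsideRegion`), the shape of the `result` object, and the bounding box used for Australia are all assumptions made for the example.

```javascript
// Rough bounding box for mainland Australia and Tasmania. This is an
// assumption for illustration; it is not the check js.geocoder performs.
const AUSTRALIA_BBOX = { minLat: -44.0, maxLat: -10.0, minLng: 112.0, maxLng: 154.0 };

// True if the coordinate falls inside the given bounding box.
function isInside(bbox, lat, lng) {
  return lat >= bbox.minLat && lat <= bbox.maxLat &&
         lng >= bbox.minLng && lng <= bbox.maxLng;
}

// Build one output row from an input row ({ ID, Addr }) and a hypothetical
// geocoding result ({ status, formattedAddress, lat, lng }).
// The ID and original address are carried over untouched so the output
// can be re-merged with the source dataset.
function buildOutputRow(inputRow, result, bbox = AUSTRALIA_BBOX) {
  const found = result.status === 'OK';
  return {
    ID: inputRow.ID,
    Addr: inputRow.Addr,
    InterpretedAddr: found ? result.formattedAddress : '',
    Latitude: found ? result.lat : '',
    Longitude: found ? result.lng : '',
    Status: result.status,
    OutsideRegion: found ? !isInside(bbox, result.lat, result.lng) : ''
  };
}
```

Passing the bounding box as a parameter is one way the planned user-defined region could slot in later without touching the rest of the row-building logic.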
- HTML5 File API, i.e. any modern browser (see caniuse.com for compatible browsers)
- third party geocoding service, currently Google
- user defined client / API key
- Nominatim implementation (option to choose between Google and OSM) as geocoding service
- fetch altitude data (3D geocoding FTW!)
Geocoding en masse is reasonably resource intensive and relies on massive geo-databases made available by generous third parties, so these services come with some (fairly reasonable, I should say) usage terms. To ensure that these terms are understood, I've added a little "read the terms before starting" enforcement mechanism to the UI. I'm sorry. I know it's annoying, but I also know that most people just wouldn't care... and would get themselves, or in worse cases their organisation or some good-soul open host, banned from these services.
Because these geocoding services have strict rules on request frequency, I currently impose a 1.2 second interval between requests. This means that geocoding a few thousand entries may take a while. The good news is: you can just leave your browser open and running. It won't time out, thanks to papaparse's awesome streamed parsing mechanism.
Its first release, v0.4, is now up on Bitbucket. Go and download / clone / fork yours.
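To make the throttling idea concrete, here is a minimal sketch of geocoding rows one at a time with a fixed pause between requests. It is not js.geocoder's actual implementation: `geocodeSequentially`, `geocodeOne` and the `{ row, result }` shape are hypothetical names chosen for the example, and only the 1.2 second default comes from the text above.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Geocode rows strictly one after another, pausing `intervalMs` between
// requests. `geocodeOne` is a hypothetical async function (row -> result)
// standing in for the call to the geocoding service.
async function geocodeSequentially(rows, geocodeOne, intervalMs = 1200) {
  const results = [];
  for (let i = 0; i < rows.length; i++) {
    if (i > 0) await sleep(intervalMs); // no pause before the first request
    try {
      results.push({ row: rows[i], result: await geocodeOne(rows[i]) });
    } catch (err) {
      // Record the failure and carry on, so one bad row does not stop the batch.
      results.push({ row: rows[i], result: { status: 'ERROR: ' + err.message } });
    }
  }
  return results;
}
```

Since the pause starts only after the previous request has finished, the gap between requests is never shorter than the interval. At 1.2 seconds per request, 3,000 addresses need at least 3,000 × 1.2 s = 3,600 s, i.e. about an hour plus the time the requests themselves take.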