Script for Coordinates

Sat, 22/10/2011 - 18:38 -- kstobbe

Recently I had a list of addresses that I needed to find the coordinates for, so that I could plot them on a map. One solution would be to get a plug-in for Google Maps that returns the coordinates for an address typed manually into Google Maps. That would probably have taken me about the same amount of time as building an automated solution. The solution ended up being a shell script for my Linux machine that uses the shell tools SED, Grep, and W3M to automatically get coordinates for the entire address list.

Tools

Let me first introduce the tools that I intend to use. First of all, the shell script. So what is the difference between a script and a real application? An application is typically written in a programming language and compiled into an executable file that solves some task; this task could easily be the problem at hand, finding coordinates. A script, on the other hand, is a list of shell commands that are executed sequentially as the script runs, with no compilation needed.

SED is a Stream EDitor for Unix/Linux that can modify a stream of text that is delivered to the program, e.g. from a file. With SED you can for instance replace a certain word in a file, remove six characters from each line, or delete the first 10 lines of a file, and much more.
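The three SED operations just mentioned can be sketched as below. The sample file and its contents are made up for illustration, and because the sample is only three lines long the last command deletes the first two lines rather than ten (the ten-line version would be '1,10d').

```shell
# Create a small sample file to work on (made-up content).
sample=$(mktemp)
printf 'hello world\nsecond line here\nthird line\n' > "$sample"

# Replace a word wherever it first occurs on a line.
replaced=$(sed 's/world/there/' "$sample")

# Remove the first six characters of each line.
trimmed=$(sed 's/^.\{6\}//' "$sample")

# Delete the first two lines of the file.
tail_only=$(sed '1,2d' "$sample")

echo "$replaced"
echo "$trimmed"
echo "$tail_only"
rm -f "$sample"
```

None of these commands change the file itself; without the -i option SED only writes the modified text to standard output.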

Grep is used to extract or search for sentences, words or certain characters in a file.

W3M is a text-based web browser. It can either work as a regular browser or it can dump a web page into a file, either as rendered text or as raw HTML.

Separating the Address

I have my addresses in a text file, so the first step is to separate each address into elements and read them into variables. My address file is divided with newlines between each address element, but it could just as easily have been comma separated.
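For the comma-separated case, a minimal sketch could look like this. It assumes one record per line with exactly three fields; the record and the variable names (csv_name, csv_address, csv_zip_city) are made up for illustration and are not used elsewhere in the script.

```shell
# Hypothetical one-line record; the name is made up.
record='Jens Jensen,Algade 1,9000 Aalborg'

# Split the record on commas into three variables.
IFS=',' read -r csv_name csv_address csv_zip_city <<EOF
$record
EOF

echo "$csv_name"
echo "$csv_address"
echo "$csv_zip_city"
```

This simple split would break if a field itself contained a comma, so it only suits address lists where that cannot happen.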

For the sake of example let's assume that my addresses are in a file named address_list.txt. I know that the first line of the file holds the name of the person who lives at the address. On the second line is the street name and the house number. On the third line is the zip code and the city name. I know that all the addresses I have in the list are in Denmark, so this information is not in the address file. After these three lines comes the name of the next person in the address list.

The first thing I do is to read the name of the person into a variable. I don't need to divide the name into first and last name, so I simply treat the name as one string. I use SED to read the first line in my address file and store that in a variable:

var_name=$(sed -n '1p' address_list.txt)

The operation 1p prints the first line of the file, and -n keeps SED from printing everything else. Then I want to read the street name and house number. Again, I have no need to split the street name and the house number into two separate elements; I just need them as a single string:

var_address=$(sed -n '2p' address_list.txt)

Now I want to get the zip code and the city name. Due to how I need to use the address data, apart from getting the coordinates, I need to split the zip code and the city name into two different variables. Therefore I use SED and grep in combination to extract the first four characters in the string and make sure they are digits - Danish zip codes are four digits:

var_zip=$(sed -n '3p' address_list.txt | grep -o '^[0-9]\{4\}')

In the grep command I use the -o option to have grep print only the matching characters. Normally grep prints the entire line where a match is found. If you read the pattern from right to left it means: return four characters; they have to be digits; and they have to be at the beginning of the line. The backslashes in front of the curly brackets are needed because grep uses basic regular expressions by default, where a repetition count is written \{4\}.
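To see the effect of -o and of the escaped brackets, here is a small sketch run on a sample line instead of the address file. The variable names are only for this illustration.

```shell
line='9000 Aalborg'

# Basic regular expression: the repetition count is written \{4\}.
zip_bre=$(echo "$line" | grep -o '^[0-9]\{4\}')

# Extended regular expression (-E): plain brackets, same match.
zip_ere=$(echo "$line" | grep -oE '^[0-9]{4}')

# Without -o grep prints the whole matching line.
whole=$(echo "$line" | grep '^[0-9]\{4\}')

echo "$zip_bre / $zip_ere / $whole"
```

If the line does not start with four digits, grep prints nothing and the variable ends up empty, which is an easy condition to test for later.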

Then I skip the first five characters in the same line (the zip code and the space after it) and use the rest to get the city name:

var_city=$(sed -n '3p' address_list.txt | sed 's/.....//')

I know that the zip code is four digits followed by a space, so I tell SED that I don't care about the first five characters on the line and would like to delete them.

I now have variables with all the address elements that I need. It is now relatively simple to expand these commands to read the rest of the address file. It only takes a little calculation and a variable to get the 1p, 2p and 3p to match the next person in the file.
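Deleting any five characters works as long as every line really starts with a zip code. A slightly stricter variant, shown here only as a sketch on sample strings, removes the prefix only when it is four digits and a space:

```shell
line='9000 Aalborg'

# Remove exactly four leading digits and the space after them.
city=$(echo "$line" | sed 's/^[0-9]\{4\} //')

# A line without a leading zip code is left unchanged, whereas
# 's/.....//' would cut off its first five characters.
odd=$(echo 'Aalborg' | sed 's/^[0-9]\{4\} //')

echo "$city"
echo "$odd"
```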
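The calculation for reading the rest of the file can be sketched like this. It assumes the file holds exactly three lines per person with no blank lines in between; the sample names and addresses are made up, and a temporary file stands in for address_list.txt.

```shell
# Build a small sample file; the names and addresses are made up.
list=$(mktemp)
printf '%s\n' 'Jens Jensen' 'Algade 1' '9000 Aalborg' \
              'Mette Hansen' 'Vestergade 2' '8000 Aarhus' > "$list"

total=$(wc -l < "$list")
i=1
while [ "$i" -le "$total" ]; do
    # Lines i, i+1 and i+2 belong to the same person.
    var_name=$(sed -n "${i}p" "$list")
    var_address=$(sed -n "$((i + 1))p" "$list")
    var_zip=$(sed -n "$((i + 2))p" "$list" | grep -o '^[0-9]\{4\}')
    var_city=$(sed -n "$((i + 2))p" "$list" | sed 's/.....//')
    echo "$var_name; $var_address; $var_zip; $var_city"
    # Jump to the first line of the next person.
    i=$((i + 3))
done
rm -f "$list"
```

Note the double quotes around the SED expressions: they let the shell insert the line number before SED sees the command.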

Getting the Coordinates

Now we are ready to get the coordinates for the address. Google Maps has a special URL that takes the address as a parameter and returns a page containing nothing but the coordinates for that address:

http://maps.google.com/maps/geo?q=algade+1,+9000,+danmark&output=csv&oe=utf8&sensor=false

The URL above will return the string below. The first two numbers are a status code and the accuracy of the match; the last two are the latitude and the longitude:

200,8,57.0469132,9.9231148

I then put the address elements I need together and replace all the spaces with a plus sign (+). Then I prepare the URL so it is ready for the browser:

var_google_address=$(echo "$var_address, $var_zip, Denmark" | sed 's/ /+/g')

var_url="http://maps.google.com/maps/geo?q=${var_google_address}&output=csv&oe=utf8&sensor=false"
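As a worked example, here are the two steps with the sample address from the URL shown earlier filled in by hand. Only spaces are handled; an address containing letters such as æ, ø or å would presumably also need percent-encoding, which this sketch does not attempt.

```shell
var_address='Algade 1'
var_zip='9000'

# Join the elements and turn every space into a plus sign.
var_google_address=$(echo "$var_address, $var_zip, Denmark" | sed 's/ /+/g')

# Build the request URL around the encoded address.
var_url="http://maps.google.com/maps/geo?q=${var_google_address}&output=csv&oe=utf8&sensor=false"

echo "$var_url"
```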

Then I use the W3M web browser to download the page and store it in a file I call temp.txt:

w3m -dump "$var_url" > temp.txt

In theory I now have all the information that I was out to get, but it still needs a bit of work to get it into a state I like. As seen in the example above, Google returns the coordinates in a certain format that I need to break down. I start out by replacing the commas between the numbers with spaces. With spaces between the numbers I can use the Linux program cut to return a specific "word" in a sentence.

sed -i 's/,/ /g' temp.txt

Now I can extract the information I want:

var_error=$(cat temp.txt | cut -d " " -f1)

var_lat=$(cat temp.txt | cut -d " " -f3)

var_long=$(cat temp.txt | cut -d " " -f4)

The command cat prints the file I give it. In cut I signal with -f1, -f3 and -f4 which word I want and with -d that space is the symbol that divides the words.
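As an alternative sketch, cut can also split on the commas directly, which would make the SED step and the temporary file unnecessary. Here the sample response from earlier is kept in a variable named response, which is an assumption of this example and not part of the script above.

```shell
# Sample response in the format shown earlier.
response='200,8,57.0469132,9.9231148'

# Use the comma itself as the field delimiter.
var_error=$(echo "$response" | cut -d ',' -f1)
var_lat=$(echo "$response" | cut -d ',' -f3)
var_long=$(echo "$response" | cut -d ',' -f4)

echo "$var_error $var_lat $var_long"
```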

At this point you can use var_error to check whether you got a valid result from Google and take action accordingly. A code of 200 means success. As we are working in a script, we still need to generate some output from the data that we have found. You are now free to choose from any of the variables that we have created throughout the script:

echo "$var_name, $var_address, $var_zip $var_city, $var_lat, $var_long"
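The check on var_error mentioned above could look something like this. The function check_status is a hypothetical helper, not part of the original script; it treats 200 as success and everything else as a failure.

```shell
# check_status is a hypothetical helper for this sketch.
check_status() {
    if [ "$1" = "200" ]; then
        echo "ok"
    else
        # Report the problem on standard error and signal failure.
        echo "lookup failed with code $1" >&2
        return 1
    fi
}

check_status 200
```

In the script it could be called as check_status "$var_error" right after the code has been extracted, and the address skipped when the function returns a failure.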

If you would like to see the data plotted onto a map you should only print the coordinates. You can't plot multiple points at the same time directly in Google Maps, but there is an API for it, so you'll have to find a website that has implemented an interface for plotting multiple points onto Google Maps.
