About

methmap: This project takes data about meth lab busts from the DEA, then creates a .kml file so that the data can be visualized in Google Earth.

KMZ of methmap

The source code is available

Getting the links

I first had to make a list of URLs to tell my script where to get the data from. I did that using the script below, which lists all the URLs on a page. The script is from Dive Into Python.

from sgmllib import SGMLParser
 
class URLLister(SGMLParser):
	def reset(self):
		SGMLParser.reset(self)
		self.urls = []
 
	def start_a(self, attrs):
		href = [v for k, v in attrs if k=='href']
		self.urls.extend(href)
 
if __name__ == "__main__":
	import urllib
	usock = urllib.urlopen("http://www.usdoj.gov/dea/seizures/")
	parser = URLLister()
	parser.feed(usock.read())
	parser.close()
	usock.close()
	for url in parser.urls: print url
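
sgmllib was removed from the standard library in Python 3, so the lister above only runs on Python 2. A minimal sketch of the same idea for Python 3, assuming the standard html.parser module as a stand-in. The names LinkLister and list_links and the sample markup are made up for illustration and are not part of the project:

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v is not None)


def list_links(html):
    """Return the hrefs found in an HTML string, in document order."""
    parser = LinkLister()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # hypothetical sample markup, not taken from the DEA site
    sample = '<p><a href="first.html">First</a> <a name="top">x</a> <a href="second.html">Second</a></p>'
    for url in list_links(sample):
        print(url)
```

To run it against a live page, the HTML string could come from urllib.request.urlopen(...).read().decode(), with the page encoding being something to check rather than assume.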

I then selected only the URLs that were state names, saved them to a file called link.txt, and used a little awk script to generate the full URLs.

awk '{print "http://www.usdoj.gov/dea/seizures/"$1}' link.txt > links.txt
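
For anyone without awk at hand, the same prefixing step can be sketched in Python. This is only an illustration: full_urls is a hypothetical helper and the sample entries are made up. Unlike the awk one-liner, it skips blank lines instead of printing the bare base URL for them:

```python
BASE_URL = "http://www.usdoj.gov/dea/seizures/"


def full_urls(lines, base=BASE_URL):
    """Prefix the first whitespace-separated field of each non-blank line with base."""
    urls = []
    for line in lines:
        fields = line.split()
        if fields:  # blank lines are skipped (the awk version would print base alone)
            urls.append(base + fields[0])
    return urls


if __name__ == "__main__":
    # hypothetical entries standing in for the contents of link.txt
    for url in full_urls(["first.html\n", "second.html trailing words\n", "\n"]):
        print(url)
```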

Generating the kml

The script 'parseall.sh' is used to generate the kml file.

#!/bin/bash
 
rm statedata.xml
echo "<states></states>" > statedata.xml
 
for link in `cat links.txt`; do ./htmlout.py $link; done
 
rm final.kml
echo "Generating KML"
./geocode.py

It does two things: it runs htmlout.py once for every link, then runs geocode.py.

htmlout.py

First, it runs htmlout.py, which fetches the page at the URL given as its first argument and appends that state's data to an XML file named statedata.xml.

#! /usr/bin/env python
 
#Grabs the data from the web and create XML doc of that data
#must start with a blank xml doc "<states></states>"
 
import urllib2, re
from BeautifulSoup import BeautifulSoup
import amara
import sys
 
#get the url from an arg
url = sys.argv[1]
 
#open the existing xml file
#note: the first time, this file needs to contain
# "<states></states>"
doc = amara.parse('statedata.xml')
 
f=open('statedata.xml','w')
f.write(doc.xml())
 
print "parsing: %s" % url
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
 
#define a dict that will be used to find meth bust count
d = {}
 
#get the state name
state = soup.find('p', align="center")
state = state.find(text=True).strip(" .")
state = re.split(" -- ", state)
state = state[1]
#create a new state element under states
doc.states.xml_append(doc.xml_create_element(u'state',
											attributes={u'sname' : state}))	
 
 
# set the location for soup to start reading in data
# there is only one node with the text "COUNTY"    
th_row = soup.find(text="COUNTY").findParent("tr")
 
# each subsequent row is an entry in the list.
for td_row in th_row.findNextSiblings("tr"):
	county, city, address, s_date = td_row.findAll("td")
 
	# we only want the text from the other cells, not  the containing markup.        
	county = county.find(text=True).strip(" .")
	city = city.find(text=True).strip(" .") 
	address = address.find(text=True).strip(" .") 
	s_date = s_date.find(text=True).strip(" .") 
 
	#this finds the number of incidents for each county
	#using a dict
	if county in d:
		#already exists
		d[county] += 1
	else:
		#define new pair
		d[county] = 1
 
#print out the dict
for key, value in d.items():
	#print "%s : %s" % (key.capitalize(),value)
	#the int has to be converted to a unicode string
	newvalue = str(value)
	newvalue = newvalue.decode('utf-8')
 
	#add the county info to the last state element that exists
	e = doc.xml_create_element(u'county',
							attributes={u'cname': key.capitalize(),
							u'points' : newvalue})
	doc.states.state[-1].xml_append(e)
 
 
f=open('statedata.xml','w')
f.write(doc.xml())
#print doc.xml()
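
The counting and XML-building steps can also be sketched with only the Python 3 standard library. This assumes collections.Counter and xml.etree.ElementTree in place of the hand-rolled dict and amara. The helper add_state and the sample county names are hypothetical, and sorting the counties is an addition for stable output:

```python
from collections import Counter
import xml.etree.ElementTree as ET


def add_state(states, state_name, counties):
    """Append a <state> element holding one <county> child per distinct county.

    counties is a list with one county name per bust, so the number of
    repeats of a name is the number of incidents in that county.
    """
    counts = Counter(counties)
    state = ET.SubElement(states, "state", sname=state_name)
    for name, n in sorted(counts.items()):
        # attribute values must be strings, hence str(n)
        ET.SubElement(state, "county", cname=name.capitalize(), points=str(n))
    return state


if __name__ == "__main__":
    root = ET.fromstring("<states></states>")
    # made-up rows: two busts in one county, one in another
    add_state(root, "Example State", ["ADAMS", "BAKER", "ADAMS"])
    print(ET.tostring(root, encoding="unicode"))
```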

The statedata.xml file looks like this:

<states>
	<state sname="State Name">
		<county cname="County Name" points="datapoints" />
	</state>
</states>
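
A file in this layout is easy to read back. The sketch below assumes Python 3's xml.etree.ElementTree rather than amara and loads the data into a dictionary, so a county's count can be looked up directly. The names normalize and load_points are illustrative; the normalization (lower-casing, dropping spaces and hyphens) mirrors what geocode.py does before comparing names:

```python
import xml.etree.ElementTree as ET


def normalize(name):
    """Lower-case a name and drop spaces and hyphens for looser matching."""
    return name.lower().replace(" ", "").replace("-", "")


def load_points(xml_text):
    """Return {(state, county): points} with normalized names as keys."""
    table = {}
    for state in ET.fromstring(xml_text).iter("state"):
        for county in state.iter("county"):
            key = (normalize(state.get("sname")), normalize(county.get("cname")))
            table[key] = int(county.get("points"))
    return table


if __name__ == "__main__":
    # made-up sample data in the statedata.xml layout
    sample = (
        '<states><state sname="Example State">'
        '<county cname="Some-County" points="3" />'
        "</state></states>"
    )
    table = load_points(sample)
    # counties without an entry default to 0
    print(table.get((normalize("example state"), normalize("Some County")), 0))
```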

geocode.py

Then geocode.py is run. It loads usdata.xml, an XML file I found on the Keyhole BBS that has all the counties for each state.

The script checks each county for each state and looks at statedata.xml to see if there is any data for that county. It then color codes that county based on the number of meth busts found for that county.

It then generates a file called final.kml, which can be loaded in Google Earth.

#! /usr/bin/env python
 
#US Counties v06.kml manip file
#goes through the entire kml
#adds in the data from the meth busts
 
import amara
 
#location of the kml with county data
doc = amara.parse('usdata.xml')
#location of xml with data points of meth busts
sd =  amara.parse('statedata.xml')
 
#strip unnecessary string data for better string matching
def struni(name):
	name = str(name)
	name = name.lower()
	name = name.replace(' ','')
	name = name.replace('-','')
	return name
 
#find the points for a county for a given state
def pointfind(c,s):
	c = struni(c)
	s = struni(s)
 
	for state in sd.states.state:
		if struni(state.sname) == s:
			for county in state.county:
				if c == struni(county.cname):
					return county.points
	return 0
 
#for each state go through each county and find the data for it			
for states in doc.kml.Document.Folder.Folder.Folder:
	#print states.name
	for counties in states.Placemark:
 
		#find the data point for this county in this state
		dp = int(pointfind(counties.name,states.name))
 
		#set the color style based on the number returned
		#assumption: a count of exactly 20 belongs to the top bucket
		if dp == 1 or dp == 0:
			counties.styleUrl = u'1'
		elif dp == 2:
			counties.styleUrl = u'2'
		elif dp == 3 or dp == 4:
			counties.styleUrl = u'3'
		elif 4 < dp < 20:
			counties.styleUrl = u'4'
		elif dp >= 20:
			counties.styleUrl = u'5'
 
		#add description with the meth bust info
		desc = "<description>%s meth busts</description>" % dp
		counties.xml_append_fragment(desc)
 
#write the file out
f=open('final.kml','w')
f.write(doc.xml())
#print doc.xml()
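
The color thresholds could also be pulled out into a small function, which makes the bucket boundaries easy to check on their own. A sketch in Python 3: style_for is a hypothetical name, and placing a count of exactly 20 in the top bucket is an assumption about the intended boundary:

```python
def style_for(count):
    """Map a meth bust count to one of the style ids '1' to '5'."""
    if count <= 1:
        return "1"
    if count == 2:
        return "2"
    if count <= 4:
        return "3"
    if count < 20:
        return "4"
    return "5"  # 20 or more (assumed boundary)


if __name__ == "__main__":
    for n in (0, 2, 4, 19, 20):
        print(n, style_for(n))
```

Inside the loop, the if chain would then reduce to a single assignment of the returned style id to the placemark's styleUrl.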

File summary

cparse.py – This is a script I used for debugging the output of htmlout.py

htmlout.py – Grabs data from websites and makes xml of data

statedata.xml – data about states generated by htmlout.py

geocode.py – Generates kml and color codes counties based on the xml made by htmlout.py

usdata.xml – xml data that defines every state and county in the US

parseall.sh – this script automates the process of creating the kml

links.txt – a list of links to get data from

methmap.jpg – a screenshot, in case you don't have Google Earth.

zip of source

 
methmap.txt · Last modified: 2010/02/26 10:31 by newacct
 