Earlier this week I started on another pretty cool project that will complement some of the work I have done on the IP Manager. Since an IP address can give us a general geographic area, my manager thought it might be neat to use some of the publicly available census data published by Statistics Canada (StatsCan) and the US Census Bureau. This information can be related to postal/ZIP codes, which can then be related to groups of users of our applications.
So far, I have done a lot of research to learn the layout and terminology of the data collected by both government organizations. One thing that is immediately evident is that the two have very little in common. StatsCan uses something called the Postal Code Conversion File to map its collected data to physical geographic areas. The US Census Bureau has nothing directly comparable; instead, it has what are called ZIP Code Tabulation Areas.
My initial work has been to create a number of Django models to store the different abstractions of geographic/census areas that each government uses. The first piece of information we decided would be useful to store in our database is median household income. For StatsCan, I have written a parser that reads this information into our database and relates it by postal code. The scripts I am writing are basically one-off scripts, each with a single defined purpose. As I continue researching how to gather the US Census Bureau data, I have learned that they actually supply a REST API. I have a feeling this should make my job substantially easier, but I will have to dig deeper to find out how much of a difference it makes.
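To give a rough idea of what one of these one-off parsers might look like, here is a minimal sketch in plain Python. It is not my actual script: the CSV layout, the column names (`postal_code`, `median_household_income`), the `IncomeRecord` class, and the sample values are all made up for illustration, and the real StatsCan files are laid out differently. In the real project the records would go into Django models rather than a dictionary.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class IncomeRecord:
    """Hypothetical stand-in for a Django model row."""
    postal_code: str
    median_household_income: int


def normalize_postal_code(raw: str) -> str:
    """Remove whitespace and upper-case, so 'k1a 0b1' becomes 'K1A0B1'."""
    return "".join(raw.split()).upper()


def parse_income_rows(lines: Iterable[str]) -> Dict[str, IncomeRecord]:
    """Read CSV lines and key each income record by normalized postal code.

    Assumes a header row with the (invented) columns 'postal_code' and
    'median_household_income'. Rows with a missing or non-numeric income
    are skipped rather than guessed at.
    """
    records: Dict[str, IncomeRecord] = {}
    for row in csv.DictReader(lines):
        code = normalize_postal_code(row["postal_code"])
        raw_income = row["median_household_income"].strip()
        if not code or not raw_income:
            continue
        try:
            income = int(raw_income)
        except ValueError:
            continue
        records[code] = IncomeRecord(code, income)
    return records


# Invented sample data, only here to exercise the parser.
SAMPLE = """postal_code,median_household_income
k1a 0b1,70000
M5V2T6,
V6B 1A1,65000
"""

records = parse_income_rows(io.StringIO(SAMPLE))
for code in sorted(records):
    print(code, records[code].median_household_income)
```

Normalizing the postal code before using it as a key is the part I would keep in any version of this, since the same code can show up with or without a space and in either case.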
One particularly interesting thing this project has opened my eyes to is just how much data is available for free. This data is waiting to be used in fascinating ways, and that is unbelievably powerful for small projects and companies alike.