As I alluded to in my previous post, all of my attention has shifted to a new project, dubbed the Ad Aggregator project. Polar Mobile collects, organizes, and stores a lot of data in order to analyze it and improve its products. At the time of writing, the majority of what is collected comes internally from the various native mobile clients. The intent of the Ad Aggregator project is to make use of some of the external resources at our disposal, in this case Mocean and DoubleClick For Publishers. These third-party providers collect a variety of analytics and offer reports, but the format of these reports is not particularly streamlined or easy to use. It was determined that it would be ideal to pull this data into our own format so that internal tools could be used to analyze all of the data in a similar fashion.
Given that I am roughly a week into the project at this point, I will touch on my progress so far and what the next steps are to cross the finish line. A lot of planning was required in the first few days. I did a lot of studying to brief myself on the current database schema used by Polar Mobile. Their analytics databases are organized in what is called a 'snowflake schema': a central fact table foreign keys out to multiple dimension tables, and those dimensions key out to further tables of their own. When drawn out on paper this pattern tends to look like a snowflake, hence the name. I went through several iterations of designs with members of my team before getting the go-ahead to begin implementation. Designing this schema was further complicated by the fact that both ad providers have their own terminology and inherent structure that had to be accommodated.
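To make the snowflake shape a bit more concrete, here is a minimal sketch in SQLite. All table and column names here are hypothetical, not Polar's actual schema: a central fact table keys out to dimension tables, and those dimensions key out further to sub-dimensions of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: dimensions themselves key out to further tables,
-- which is what gives the snowflake its branching shape.
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);

-- Dimension tables hanging off the fact table.
CREATE TABLE publisher (
    id INTEGER PRIMARY KEY,
    name TEXT,
    country_id INTEGER REFERENCES country(id)
);
CREATE TABLE ad_provider (id INTEGER PRIMARY KEY, name TEXT);  -- e.g. Mocean, DFP

-- Central fact table: one row per day/publisher/provider holding the measures.
CREATE TABLE ad_stat (
    id INTEGER PRIMARY KEY,
    stat_date TEXT,
    publisher_id INTEGER REFERENCES publisher(id),
    provider_id INTEGER REFERENCES ad_provider(id),
    impressions INTEGER,
    clicks INTEGER
);
""")
conn.execute("INSERT INTO country VALUES (1, 'Canada')")
conn.execute("INSERT INTO publisher VALUES (1, 'Example Publisher', 1)")
conn.execute("INSERT INTO ad_provider VALUES (1, 'Mocean')")
conn.execute("INSERT INTO ad_stat VALUES (1, '2012-07-01', 1, 1, 1000, 25)")

# A query walks outward from the fact table through the snowflake's joins.
row = conn.execute("""
    SELECT c.name, p.name, a.name, s.impressions
    FROM ad_stat s
    JOIN publisher p ON s.publisher_id = p.id
    JOIN country c ON p.country_id = c.id
    JOIN ad_provider a ON s.provider_id = a.id
""").fetchone()
```

The same structure maps naturally onto Django models, with each `REFERENCES` becoming a `ForeignKey`.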
The implementation and retrieval of the data ended up being very straightforward since a lot of the groundwork was already in place from the existing schema. The Ad Aggregator consists of a single script that accepts a date or date range to pull data for. The intention is that this script will be run daily to retrieve the previous day's statistics.
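The shape of that script looks roughly like the following sketch. The function and flag names are my own invention, and the actual Mocean/DFP retrieval code is stubbed out, but it shows the date-or-date-range interface and the "default to yesterday" behaviour a daily cron run would rely on.

```python
#!/usr/bin/env python
"""Sketch of a daily ad-stats pull script (hypothetical names)."""
import argparse
from datetime import date, datetime, timedelta


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def pull_stats(day):
    # Placeholder: in the real script this would call the Mocean and DFP
    # report APIs and insert rows into the aggregate tables.
    print("pulling stats for %s" % day.isoformat())


def main(argv=None):
    parse_date = lambda s: datetime.strptime(s, "%Y-%m-%d").date()
    parser = argparse.ArgumentParser(
        description="Pull ad stats for a date or date range")
    parser.add_argument("--start", type=parse_date)
    parser.add_argument("--end", type=parse_date)
    args = parser.parse_args(argv)
    # Default to yesterday, matching the intended daily run.
    start = args.start or (date.today() - timedelta(days=1))
    end = args.end or start
    for day in daterange(start, end):
        pull_stats(day)


if __name__ == "__main__":
    main()
```

Run with no arguments for the daily case, or with `--start`/`--end` to backfill a range.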
Now that all of the appropriate information is being filled into Polar's aggregate tables, the next step for this project is to define the new XML schema for a tool called Saiku. Saiku is an ad hoc querying tool that lets users drag and drop different dimensions and measures to see the data in a different light.
So far I have managed to get the system running locally in my development environment for testing, but I have yet to start actually defining the schema. To me this does seem slightly inefficient: it almost seems as though you have to design the schema and implement it twice, once using the appropriate Django models and once using XML so that Saiku can understand how to access the data. If I get a chance I might look into automatically generating the XML schema, but that's for another day!
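As a rough idea of what that generation could look like, here is a sketch that builds a Mondrian-style cube definition (the XML dialect Saiku reads) from a plain description of the fact table. The cube, dimension, and column names are made up, and a real Mondrian schema has more to it than this; in an actual Django project the dimension and measure lists could be derived from `Model._meta` instead of being written out by hand a second time.

```python
import xml.etree.ElementTree as ET


def build_schema(cube_name, fact_table, dimensions, measures):
    """Build a minimal Mondrian-style schema as an XML string.

    dimensions: list of (dimension name, foreign key column) pairs
    measures:   list of (measure name, fact table column) pairs
    """
    schema = ET.Element("Schema", name=cube_name)
    cube = ET.SubElement(schema, "Cube", name=cube_name)
    ET.SubElement(cube, "Table", name=fact_table)
    for dim_name, fk_column in dimensions:
        dim = ET.SubElement(cube, "Dimension", name=dim_name, foreignKey=fk_column)
        hier = ET.SubElement(dim, "Hierarchy", hasAll="true")
        ET.SubElement(hier, "Level", name=dim_name, column=fk_column)
    for measure_name, column in measures:
        # Summing is the typical aggregation for counts like impressions.
        ET.SubElement(cube, "Measure", name=measure_name,
                      column=column, aggregator="sum")
    return ET.tostring(schema, encoding="unicode")


xml_out = build_schema(
    "AdStats", "ad_stat",
    dimensions=[("Publisher", "publisher_id"), ("Provider", "provider_id")],
    measures=[("Impressions", "impressions"), ("Clicks", "clicks")],
)
```

Even if the generated output needed hand-tuning afterwards, it would keep the two schema definitions from drifting apart.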
As a side note, I have really enjoyed getting to work closely with some of the other guys on this project. I not only worked with those on my team but also with the resident Ad Operations fellow, who seemed excited at the prospect of the project. I am certainly learning a lot from them and trying to absorb as much as I can before I go back to school in September.