Time-shifting Packets

At Polar Mobile, individual clients send "packets" to our servers. A packet has been defined internally, but it is essentially a collection of items in JSON format that the client collects and posts to certain API endpoints over HTTP. Each packet includes a UNIX timestamp indicating the date and time that the client submitted it. On the server side, this timestamp is used to classify and group packets so that other processes can analyze packet items over various periods of time, such as individual months and years. Since this system has been in place, we found that some packets arrived with timestamps that made little to no sense, such as dates in 1980 or in future years like 2063. The cause turned out to be clients whose internal clocks had been modified or reset for whatever reason. My first task was to normalize these bizarre dates and determine when the packets were actually sent.
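A packet along these lines might look like the following. The field names and events here are hypothetical, since the actual internal schema isn't public; the key detail is the client-supplied UNIX timestamp riding along with the items:

```python
import json
import time

# Hypothetical packet shape -- illustrative only, not Polar's real schema.
packet = {
    "timestamp": int(time.time()),  # client-side UNIX timestamp at submit time
    "items": [
        {"event": "article_view", "article_id": "abc123"},
        {"event": "share", "article_id": "abc123"},
    ],
}

# The client would POST this JSON body to an API endpoint over HTTP.
body = json.dumps(packet)
```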
To accomplish this, we also take a time reading on the server when a packet is received. We can then compute the delta between the client and server clocks and shift every packet from that client by this delta. With some simple logic written in Python, this was easily accomplished, and all dates coming in from clients are now being normalized successfully. This ultimately makes packet analysis easier for anyone on the Polar team who is interested.
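The delta-shifting idea can be sketched as follows. This is a minimal illustration, not the actual Polar code: it assumes each packet carries a `timestamp` field and that the newest client timestamp in a submission corresponds to the moment the server received it.

```python
def normalize_timestamps(packets, server_time):
    """Shift client timestamps by the client/server clock delta.

    Assumption (hypothetical): the newest packet timestamp reflects the
    client's clock at send time, so (server_time - newest client time)
    approximates how far the client's clock is off.
    """
    client_time = max(p["timestamp"] for p in packets)
    delta = server_time - client_time
    # Apply the same delta to every packet from this client.
    return [dict(p, timestamp=p["timestamp"] + delta) for p in packets]


# Example: a client whose clock was reset to 1980 submits two packets
# one hour apart; the server receives them at a known current time.
packets = [{"timestamp": 315532800}, {"timestamp": 315536400}]
shifted = normalize_timestamps(packets, server_time=1356998400)
```

The relative spacing between a client's packets is preserved; only the absolute offset is corrected.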
Dirty Bucket Manager

The second task I got a chance to work on, after the packet time-shifting project was complete, was a new method for processing all of the packets on our servers. Previously, incoming packets were processed nightly, but only if their dates fell within the previous seven days. While this worked well, we decided a far better approach would be to process each "bucket" of packets only when it is deemed the most important. To do this, we continually score each bucket based on its date and the number of packets it holds. The scores are stored in a Redis sorted set so that we can grab the buckets with the largest scores to process first.
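One way to picture the scoring is below. The weighting formula is purely illustrative (the real scoring logic is internal): it favors buckets that are recent and packet-heavy, and the comments show how the scores would map onto Redis sorted-set commands.

```python
import time

def bucket_score(bucket_date_ts, packet_count, now=None):
    """Score a bucket so recent, packet-heavy buckets rank highest.

    Hypothetical weighting -- the actual formula used at Polar may differ.
    """
    if now is None:
        now = time.time()
    age_days = max((now - bucket_date_ts) / 86400.0, 0.0)
    return packet_count / (1.0 + age_days)


# With redis-py, the scores would be kept in a sorted set, e.g.:
#   r.zadd("dirty_buckets", {bucket_key: score})
#   r.zrevrange("dirty_buckets", 0, 0)  # highest-scoring bucket first

# Example: a day-old bucket with 100 packets vs. a nine-day-old
# bucket with 200 packets.
now = 1356998400
recent_small = bucket_score(now - 1 * 86400, 100, now=now)
old_large = bucket_score(now - 9 * 86400, 200, now=now)
```

With this weighting, recency can outrank raw size, which matches the goal of always processing the most important bucket rather than a fixed seven-day window.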
All in all, these projects were a great way to get re-acquainted with the internals of the analytics projects at Polar. They were challenging enough that I never felt I was being given work just to keep me busy. I had a lot of fun completing them and learned a lot in the process.