A Performance-Based Marketplace Bug

Published in Open Source

To start off, I’m happy to announce that most of the bugs that I was working on for the past couple of weeks have now been merged into master! On one pull request, I addressed a couple of small issues by switching the base class of the form set to use happyforms (another Mozilla project). I also switched part of the test to use a cleaner concatenation via the urlparams module. My other pull request got merged without any direct changes or feedback required which was quite a welcome change.

I began this week by searching for some new bugs on Marketplace since the ones that I had just closed were for the Add-ons project. I came across one which seemed interesting since I had just closed another bug relating to Elasticsearch. This particular bug described a situation in which whenever someone searched on the Marketplace, two requests would be made to the Elasticsearch cluster whenever no results were found. When results were found, only one request would be made. This type of bug is not super critical but at such a large scale, every request counts when it comes to performance.

My first step to solve this problem was to write a test to check how many searches were being performed whenever the search came back with no results. By doing this, I was able to verify that two searches were in fact being performed. Next, I really wanted to see a traceback of the function stack to determine how the search function was being called a second time. To accomplish this, I simply raised an exception in my mocked out test to view the stack trace. By examining the stack trace, I realized that Mozilla actually uses another project to interface between Django models and the Elasticsearch server called elasticutils, which is also a Mozilla project. By digging through that project, I saw that they were using a results cache, which would essentially store any past search and upon subsequent searches it would check the cache to determine if a request to the server should actually be made. When performing a count() query to get the number of results, the cache is checked to see if the count has already been determined. If no results cache exists, the utility must make another request to Elasticsearch to perform the query.

I added some debug statements to the elasticutils code by cd’ing into my virtualenv and navigating to the corresponding source. From what I could tell after adding my statements, it seemed as though the results cache was being lost upon making the count() query. The results cache object was being evaluated to false in the if condition! After a little bit of reading about Python object evaluation, I learned that if a Python object defines a __len__ function, then it will try to evaluate that. In this case, since the search was returning zero results, the __len__ function was returning zero, which of course evaluates to false. At this point, it is believed that the results cache does not exist so an extra request to Elasticsearch is made. To fix this problem, all that needed to be done was to change the way that the results cache was being evaluated. Essentially, we just want to ensure that it is not None.

Once I had the problem figured out, the fix itself was quite trivial. I forked and cloned the elasticutils project and made my small change. I had to figure out how to run the test suite to ensure I was not breaking anything in the process. The maintainers of the project provide a simple script called run_tests.py but in order to use it I first had to create a new virtualenv so that I could install nose, Django and elasticsearch without polluting my global namespace. I was able to run my individual test with:

nosetests elasticutils.tests.test_query:QueryTest.test_count

In the process, I noticed many of the other tests were failing. It turns out that before the test suite is run, the elastic search cluster status must be retrieved. This call was timing out since it waits for the health status of the cluster to become ‘yellow’ but in my case it was staying consistently red. To fix this problem, I was able to delete all of my data with the following command:

curl -XDELETE 'http://localhost:9200/_all

Once the cluster status was green, I added a new unit test to verify the behaviour of my bug fix. I then opened a pull request for this on the elasticutils project. The maintainer of this project recently reviewed and merged my pull request into master. As soon as he releases a new version of the project, I should be able to submit a new pull request on Zamboni to include this change.

Some Recognition

Once I submitted my pull request, I reached out to my usual Marketplace contributor so that he could review it. In the process he started to ask me some questions about why I was contributing so many patches to the project and he indicated that he thought I was doing a great job so it was awesome to get that sort of validation from someone who is so close to the project.

On top of this, a couple of days later, the Engineering Manager of the Marketplace and Add-ons projects reached out to me on IRC just to thank me for working on the projects since apparently he realized I was making several contributions lately. Again, this was quite a nice surprise to receive some small level of recognition for the work that I was doing. Even though I am not working on the most important bugs or features in the projects, I seem to be working on things that more than likely wouldn’t get finished otherwise. Knowing that my patches are not going unnoticed definitely seems to give me more motivation to continue to contribute (hopefully even beyond the scope of my CIS*4900 project). I think this small act is quite a good testament to the community that Mozilla seems to build around their open-source projects. I am getting to the point where I am starting to feel like I am one of the team and that is a great spot to be in. In fact, the main contributor that I communicate with these days has started to recommend bugs for me to work on so I am excited to start tackling those in the coming week!

Written By

Andrew Halligan

Published March 8, 2014