This year a clever algorithm won the day in rankmaniac

I wrote last year about the mini-projects I use in my “flagship” course Networks: Structure & Algorithms.  We recently finished the first of these miniprojects: rankmaniac.

In rankmaniac, students have to build a system (using Amazon Elastic MapReduce) to compute the nodes of a huge web crawl that have the top 20 pageranks, and to do the computation as quickly as possible.  The system gets scored based on the sum of the mean-squared error of ranking of the 20 nodes output x 30sec + the computation time.  Thus, there’s a tradeoff between accuracy and computing time, and the goal is to find (and efficiently implement) a parallel algorithm for computing approximate pageranks very quickly…

The students have one week to spend on this and, each night, their current solutions are run and the scores are posted on a public leaderboard.  Their grades are not impacted by their ranking (only whether they pass two given milestones), but the team at the top of the leaderboard at the end of the week is crowned “rankmaniac” for the year.

This year’s competition

Every year, students try lots of interesting algorithmic approaches to the problem, but most years, it turns out that system optimizations win the day.  For the scale of the problems that I can give them, a clean, simple, efficient, implementation can typically beat a complex algorithm…  But, this year was very exciting because the winning team won because of a clever algorithmic optimization!

In particular, they used a nice idea from a paper of Kamvar et al at WWW: Extrapolation Methods for Accelerating PageRank Computations.   The key idea in the paper is quite simple and effective: perform quadratic extrapolation to minimize the number of iterations computed by the Power Method. Intuitively, one can think of simply replacing computation steps in the Power Method with extrapolation steps every once in a while in order to speed up computation.  You can’t do this too often, or things won’t converge, but you can do it often enough that significant speedups are possible.

In the case of rankmaniac, the extrapolation steps meant that the winning team could run fewer iterations than any of the other teams, which was enough to win the day!