As I mentioned in an earlier post, I’m teaching my “flagship” course this winter, a unique networks course that focuses on the structure and economics of networks using both a theoretical and an applied perspective.
One of the unique things about the course is that we have three fun/challenging mini-project competitions: rankmaniac, pandemaniac, and clickmaniac. The first of these is rankmaniac, and it’s going out in class today…which means that the next week is one of my favorite parts of the class.
What is “rankmaniac”?
Well, the name is egregiously copied from an assignment in a course on the “Science of the Web” that Luis von Ahn taught at Carnegie Mellon a few years ago. In his version, students created a webpage and then engaged in search engine optimization (SEO) to try to be the first link listed on google when the term “rankmaniac” was searched for…hence becoming the rankmaniac!
Well, the year I taught the course, we adopted the same assignment at Caltech in order to see whether the Caltech kids could beat the CMU kids at their own game…they did, as you can see by searching for rankmaniac today.
But, each year, I have adapted the assignment a bit, with the goal of giving a better idea of the algorithmics that go into search engines. Last year (with the help of some excellent TAs), we finally got a system set-up that I think is really fun (and intellectually challenging).
The rules of the game
The current version of rankmaniac has nothing to do with SEO anymore. Now, the goal is to build a system (using Amazon Elastic MapReduce) to compute the nodes of a huge web crawl that have the top 20 pageranks, and to do the computation as quickly as possible.
More concretely, the “score” for a submission is the sum of the mean-squared error of ranking of the 20 nodes output x 30sec + the computation time. Thus, there’s a tradeoff between accuracy and computing time…
So, the goal is to find (and efficiently implement) a parallel algorithm for computing approximate pageranks very quickly…
The students have 1 week to spend on this, and each night, their current solutions are run and the scores are posted on a leaderboard. For their grade, they only have to beat two TA teams, but (of course) the competition within the class keeps people entertained and motivated…
I’m really curious to see what the students come up with this year. Last year, everyone went with some sort of iterative solution, with various stopping and pruning criteria used along the way… I’m hoping that this year, folks might give some of the approximate eigenvector or linear system solvers a whirl to see if they can be competitive. Some folks claim that these algorithms are now able to match iterative solvers in speed for large graphs, but I haven’t seen that in practice yet.
And, finally, good luck to all of the students this year…have fun!