Posts Tagged ‘cmu’


It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier.  I know I’m a few days late on this, though not because I didn’t see when it happened.  I’ve been battling an ear infection all week and it has left me dizzy, in pain, and with no energy when I get home from work.  I hesitated before even posting anything about this, since there is little I can add at this point that hasn’t already been said. I’ll just share a few thoughts and experiences for posterity and leave it at that.  I’m also going to eventually make the point that recommender systems are operating under a false assumption, if you read this all the way through. :)

I competed for the prize for a bit, trying out a few ideas with support vector machines and maximum margin matrix factorization [pdf] that never panned out.  We were getting about a 4% improvement over Cinematch, which put us way down the list.  Going further would mean investing a lot of effort into implementing other algorithms, working out the ensemble, etc., unless we came up with some novel algorithm that bridged the gap.  That didn’t seem likely, so I stopped working on it just after leaving school.  I learned a lot about machine learning, matrix factorization, and scaling thanks to the competition, so it was hardly a net loss for me.

The one thing I regret is that the prize encouraged me and my advisor to spend more effort on the competition than we should have, which in turn meant we didn’t spend more time working on something tangibly productive for research.  Bluntly put, I think if we hadn’t wasted so much time on the competition, we could have worked on a different research problem more likely to produce a paper.  The lack of published research on my CV was the main reason I didn’t move on to get my PhD at CMU (at least, that’s what I was told by those close to the decision).  Hindsight is 20/20, and at the time, the shining glory of winning a million bucks and fame was delicious.  It also seemed like we had ideas that “maybe kinda sorta” were going somewhere.  That turned out to not be the case, but when admissions committees look at research experience, negative results = no results.

Many people have lauded the competition by saying that it has encouraged research in collaborative filtering and brought public attention to the field.  I was one of those people.  Others have criticized it for not focusing more on what people actually care about when using recommender systems — getting something useful and having a good experience!  And yes, Daniel Lemire, I’m thinking of you. :)  But I’m convinced that Daniel is right.  I remember reading in the literature that a 10% improvement is about what’s needed for someone to actually notice a difference in a recommender system.  So maybe people will notice a slight improvement in the Netflix recommendations if these ideas are ever implemented.  Which is another problem — most of the stuff that led to winning the prize is so computationally expensive that it’s not really feasible for production.  Netflix recently released some improvements, and I didn’t notice a damned thing.  They still recommended Daft Punk’s Electroma to me, which was a mind-numbing screen-turd.  And I must have seen every good sci-fi movie ever made, because there are no more recommendations for me in that category.  I have trouble believing that.

The point of a recommender system really shouldn’t be just to guess what I might happen to rate something at a given time.  The fact that introducing time makes such a big difference in improving performance in the competition seems like a ginormous red flag to me.  Sure, I can look back in time and say “on day X, people liked movies about killing terrorists.”  The qualifying set in the competition asked you to predict the rating for a movie by a user on a given date in the past.  Remember what I said about hindsight being 20/20?  How about you predict what I will rate a movie this coming weekend?  See the problem?

I will sound the HCIR trumpets and say that what recommender systems should really be looking at is improving exploration.  When I go looking for a movie to watch, or a pair of shoes to buy, I already know what I like in general.  Let me pick a starting point and then show me useful ways of narrowing down my search to the cool thing I really want.  Clerkdogs is a good first step on this path, though I think we’re going to have to move away from curated knowledge before this is going to catch fire.

Maybe I have this all wrong.  Maybe we need to discard the notion of recommender systems, since they are operating under the wrong premise.  We don’t need a machine to recommend something it thinks we’ll like.  We need a machine that will help us discover something we’ll like.  We need to be making discovery engines.  (Replace “recommender system” with “search engine” in most of what I just said and you’ll find that I have really been sounding the HCIR trumpets.)


This is research I did a while ago and presented on Monday to fulfill the requirements of my Master’s degree.  The presentation only needed to be about 20 minutes, so it was a very short intro.  We have moved on since then, so when I say future work, I really mean future work.  The post is rather lengthy, so I have moved the main content below the jump.


I just completed the final requirements of my Master’s degree today (the details of which I will save for a future post).  It has been a difficult road, and I’m glad it’s done.  I didn’t attend any sort of graduation ceremonies, because I don’t go for that sort of thing — at all.  Until today, it didn’t feel like the weight was off my shoulders.  Now I actually feel like celebrating!  But I won’t, because I’m a nerd.  I’m currently celebrating by working on a programming puzzle.  And surfing the blagoblag.

I still have a couple months of servitude to complete the requirements of my fellowship, but the degree is mine.

Know your audience

Posted: 14 May 2008 in Uncategorized

It is very important to know your audience when marketing your product. This was posted at the bus stop just outside the CS department at CMU.

Cool school

Today is the official opening day of GWAP: Games with a Purpose. This is one of two research projects I have been working on for the past few months, though my involvement with GWAP so far has only been in the form of attending meetings, minor testing, and offering my sage gaming advice (and by sage, I mean the herb). GWAP is the next phase in Luis von Ahn’s human computation project. If you visit and play some games, not only will you be rewarded with a good time, but you’ll be helping science! Science needs you. To play games. Now.

The Idea

Artificial intelligence has come a long way, but humans are still far better than computers at simple, everyday tasks. We can quickly pick out the key points in a photo, we know what words mean and how they are related, we can identify various elements in a piece of music, etc. All of these things are still very difficult for computers. So why not funnel some of the gazillion hours we waste on solitaire into something useful? Luis has already launched a couple of websites that let people play games while solving these problems. Perhaps you’ve noticed the link to Google Image Labeler on Google Image Search? That idea came from his ESP Game (which is now on GWAP).

The Motivation

What researchers need to help them develop better algorithms for computers to do these tasks is data. The more data the better. Statistical machine translation has improved quite a bit over the past few years, in large part due to an increased amount of data. This is the reason why languages that are spoken by few people (even those spoken by as few as several million) still don’t have machine translation tools: there is just not enough data. More data means more food for these algorithms which means better results. And if results don’t improve, then we have learned something else.

The Solution

Multiple billions of hours are spent each year on computer games. If even a small fraction of that time were spent performing some task that computers aren’t yet able to do, we could increase the size of the data sets available to researchers enormously. Luis puts this all a lot better than I can, and fortunately, you can watch him on YouTube (below).

So, check it out already.

I attended some of the final presentations of an undergrad class on Game Programming today with a friend. We went in expecting something more like a poster session, where people are arrayed around a room showing their work off to a few people who managed to crowd around them. The poster session is ideal for brief browsing, because you can skip anything you’re not interested in. Instead, it was a series of PowerPoint presentations followed by an on-screen demo.


I’ve been messing around with recommender systems for the past year and a half, but not using the kNN (k-Nearest Neighbors) algorithm.  However, my current homework assignment for my Information Retrieval class is to implement kNN for a subset of the Netflix Prize data.  The data we are working with is about 800k ratings, slightly smaller than the MovieLens dataset, which was previously the dataset of choice for research on movie recommender systems.  The entire Netflix dataset dwarfs the MovieLens set by a factor of 100, so it is quickly replacing MovieLens in papers.  The Netflix data is also much sparser than MovieLens, which changes things.

kNN is a fairly simple machine learning algorithm to implement.  On a dataset the size of Netflix, though, it’s still easy to do stupid things that cause it to take forever.  Recommender systems typically record each user’s rating for an item on a numeric scale; in the case of Netflix, the scale is 1 (hate) to 5 (love).  For the Netflix Prize, the goal is to guess users’ ratings on a hidden set as accurately as possible, as measured by root mean squared error (RMSE).  One way of approaching the problem is to create a user-item matrix where the rows correspond to users, the columns to items (movies), and the value in each cell is the user’s rating for that item.  If the user has not rated the item, the cell is assigned a zero.  Now we can split this matrix up into vectors, where each row vector represents a user.  kNN seeks to find users similar to a given user (or movies similar to a given movie) according to some metric over these vectors.  I won’t bother going into the metrics in detail, but they include cosine similarity, Euclidean distance, and Pearson correlation.  The ratings from the chosen k users are combined (either by simply averaging or by using some weighted average) to form the prediction for a movie the user has not yet rated.
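
To make that concrete, here is a minimal sketch of user-based kNN prediction with cosine similarity.  It assumes a small, dense NumPy user-item matrix with 0 for unrated cells; the function name and parameters are just for illustration, and the real Netflix data would call for sparse matrices and precomputed similarities rather than this toy version.

import numpy as np

def predict_rating(ratings, target_user, target_item, k=20):
    """User-based kNN prediction with cosine similarity (toy dense version)."""
    # Users who have rated the target item are the candidate neighbors.
    candidates = np.where(ratings[:, target_item] > 0)[0]
    candidates = candidates[candidates != target_user]
    if candidates.size == 0:
        return ratings[ratings > 0].mean()  # fall back to the global mean rating

    # Cosine similarity between the target user's row and each candidate's row.
    target_vec = ratings[target_user]
    cand_vecs = ratings[candidates]
    norms = np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(target_vec)
    sims = cand_vecs @ target_vec / np.maximum(norms, 1e-12)

    # Keep the k most similar users and take a similarity-weighted average
    # of their ratings for the target item.
    order = np.argsort(sims)[::-1][:k]
    neighbors = candidates[order]
    weights = np.maximum(sims[order], 0.0)
    if weights.sum() == 0:
        return ratings[neighbors, target_item].mean()
    return np.average(ratings[neighbors, target_item], weights=weights)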

So on the test dataset for this assignment, I built three models that had the following RMSE scores:

Model 1: 0.9831
Model 2: 1.0371
Model 3: 0.9768

Just guessing the average rating for each movie gives an RMSE of 1.0 in this case, so Models 1 and 3 improve over the baseline, while Model 2 does worse.  The best-performing teams in the Netflix Prize use ensemble methods to combine various models.  The simple way to do this is with a linear combination: given models {m1, m2, m3} and weights {w1, w2, w3}, the ensemble prediction is w1*m1 + w2*m2 + w3*m3 (where w1 + w2 + w3 = 1.0).  This was the first time I had tried ensembles with recommender systems as well, so imagine my surprise when I hit 0.9469 RMSE with my best choice of w’s.  Of course, this is nowhere near the number needed to actually claim the prize, but it was a nice demonstration of the power of ensemble methods.  I recommend checking out the proceedings of last year’s KDD Cup if you’re interested.
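
As a small illustration of that linear blend, here is a sketch of the weighted combination and the RMSE metric.  The array names and the example weights are hypothetical, not the ones I actually used.

import numpy as np

def rmse(predicted, actual):
    """Root mean squared error, the Netflix Prize metric."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def blend(model_predictions, weights):
    """Linear combination of per-model predictions; weights should sum to 1.0."""
    preds = np.asarray(model_predictions)   # shape: (n_models, n_ratings)
    w = np.asarray(weights).reshape(-1, 1)  # shape: (n_models, 1)
    return (w * preds).sum(axis=0)

# Hypothetical usage: p1, p2, p3 hold each model's predictions on the test set,
# and `actual` holds the true ratings.
# ensemble_preds = blend([p1, p2, p3], [0.5, 0.1, 0.4])
# print(rmse(ensemble_preds, actual))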