Posts Tagged ‘cmu’


It looks like some of the top players in the Netflix Prize competition have teamed up and finally broken the 10% improvement barrier.  I know I’m a few days late on this, though not because I didn’t see when it happened.  I’ve been battling an ear infection all week and it has left me dizzy, in pain, and with no energy when I get home from work.  I hesitated before even posting anything about this, since there is little I can add at this point that hasn’t already been said. I’ll just share a few thoughts and experiences for posterity and leave it at that.  I’m also going to eventually make the point that recommender systems are operating under a false assumption, if you read this all the way through. :)

I competed for the prize for a bit, trying out a few ideas with support vector machines and maximum margin matrix factorization [pdf] that never panned out.  We were getting about a 4% improvement over Cinematch, which put us way down the list.  Going further would mean investing a lot of effort into implementing other algorithms, working out the ensemble, etc., unless we came up with some novel algorithm that bridged the gap.  That didn’t seem likely, so I stopped working on it just after leaving school.  I learned a lot about machine learning, matrix factorization, and scaling thanks to the competition, so it was hardly a net loss for me.

The one thing I regret is that the prize encouraged me and my advisor to spend more effort on the competition than we should have, which in turn meant we didn’t spend more time working on something tangibly productive for research.  Bluntly put, I think if we hadn’t wasted so much time on the competition, we could have worked on a different research problem more likely to produce a paper.  The lack of published research on my CV was the main reason I didn’t move on to get my PhD at CMU (at least, that’s what I was told by those close to the decision).  Hindsight is 20/20, and at the time, the shining glory of winning a million bucks and fame was delicious.  It also seemed like we had ideas that “maybe kinda sorta” were going somewhere.  That turned out to not be the case, but when admissions committees look at research experience, negative results = no results.

Many people have lauded the competition by saying that it has encouraged research in collaborative filtering and brought public attention to the field.  I was one of those people.  Others have criticized it for not focusing more on what people actually care about when using recommender systems — getting something useful and having a good experience!  And yes, Daniel Lemire, I’m thinking of you. :)  But I’m convinced that Daniel is right.  I remember reading in the literature that a 10% improvement is about what’s needed for someone to actually be able to notice a difference in recommender systems.  So maybe people will notice a slight improvement in the Netflix recommendations if these ideas are ever implemented.  Which is another problem — most of the stuff that led to winning the prize is so computationally expensive, it’s not really feasible for production.  Netflix recently released some improvements, and I didn’t notice a damned thing.  They still recommended me Daft Punk’s Electroma, which was a mind-numbing screen-turd.  And I must have seen every good sci-fi movie ever made, because there are no more recommendations for me in that category.  I have trouble believing that.

The point of a recommender system really shouldn’t be just to guess what I might happen to rate something at a given time.  The fact that introducing time makes such a big difference in improving performance in the competition seems like a ginormous red flag to me.  Sure I can look back in time and say “on day X, people liked movies about killing terrorists.”  The qualifying set in the competition asked you to predict the rating for a movie by a user on a given date in the past.  Remember what I said about hindsight being 20/20?  How about you predict what I will rate a movie this coming weekend.  See the problem?

I will sound the HCIR trumpets and say that what recommender systems should really be looking at is improving exploration.  When I go looking for a movie to watch, or a pair of shoes to buy, I already know what I like in general.  Let me pick a starting point and then show me useful ways of narrowing down my search to the cool thing I really want.  Clerk dogs is a good first step on this path, though I think we’re going to have to move away from curated knowledge before this is going to catch fire.

Maybe I have this all wrong.  Maybe we need to discard the notion of recommender systems, since they are operating under the wrong premise.  We don’t need a machine to recommend something it thinks we’ll like.  We need a machine that will help us discover something we’ll like.  We need to be making discovery engines.  (Replace recommender system with search engine in most of what I just said and you’ll find that I have really been sounding the HCIR trumpets.)


This is research I did a while ago and presented Monday to fulfill the requirements of my Masters degree.  The presentation only needed to be about 20 minutes, so it was a very short intro.  We have moved on since then, so when I say future work, I really mean future work.  The post is rather lengthy, so I have moved the main content below the jump.


I just completed the final requirements of my Masters degree today (the details of which I will save for a future post).  It has been a difficult road, and I’m glad it’s done.  I didn’t attend any sort of graduation ceremonies, because I don’t go for that sort of thing — at all.  Until today, it didn’t feel like the weight was off my shoulders.  Now I actually feel like celebrating!  But I won’t, because I’m a nerd.  I’m currently celebrating by working on a programming puzzle.  And surfing the blagoblag.

I still have a couple months of servitude to complete the requirements of my fellowship, but the degree is mine.

Know your audience

Posted: 14 May 2008 in Uncategorized

It is very important to know your audience when marketing your product. This was posted at the bus stop just outside the CS department at CMU.

Cool school

Today is the official opening day of GWAP: Games with a Purpose. This is one of two research projects I have been working on for the past few months, though my involvement with GWAP so far has only been in the form of attending meetings, minor testing, and offering my sage gaming advice (and by sage, I mean the herb). GWAP is the next phase in Luis von Ahn’s human computation project. If you visit and play some games, not only will you be rewarded with a good time, but you’ll be helping science! Science needs you. To play games. Now.

The Idea

Artificial intelligence has come a long way, but humans are still far better than computers at simple, everyday tasks. We can quickly pick out the key points in a photo, we know what words mean and how they are related, we can identify various elements in a piece of music, etc. All of these things are still very difficult for computers. So why not funnel some of the gazillion hours we waste on solitaire into something useful? Luis has already launched a couple websites that let people play games while solving these problems. Perhaps you’ve noticed the link to Google Image Labeler on Google Image Search? That idea came from his ESP game (which is now on GWAP).

The Motivation

What researchers need to help them develop better algorithms for computers to do these tasks is data. The more data the better. Statistical machine translation has improved quite a bit over the past few years, in large part due to an increased amount of data. This is why languages with relatively few speakers (even those with several million) still don’t have machine translation tools: there is just not enough data. More data means more food for these algorithms, which means better results. And if results don’t improve, then we have learned something else.

The Solution

Multiple billions of hours are spent each year on computer games. If even a small fraction of that time were spent performing some task that computers aren’t yet able to do, we could increase the size of the data sets available to researchers enormously. Luis puts this all a lot better than I can, and fortunately, you can watch him on YouTube (below).

So, check it out already.

I attended some of the final presentations of an undergrad class on Game Programming today with a friend. We went in expecting something more like a poster session, where people are arrayed around a room showing their work off to the few people who manage to crowd around them. The poster session is ideal for brief browsing, because you can skip anything you’re not interested in. Instead, it was a series of PowerPoint presentations followed by an on-screen demo.


I’ve been messing around with recommender systems for the past year and a half, but not using the kNN (k-Nearest Neighbors) algorithm. However, my current homework assignment for my Information Retrieval class is to implement kNN for a subset of the Netflix Prize data. The subset we are working with is about 800k ratings, slightly smaller than the MovieLens dataset, the previous dataset of choice for research on movie recommender systems. The entire Netflix dataset dwarfs the MovieLens set by a factor of 100, so it is quickly replacing MovieLens in papers. The Netflix data is much sparser than MovieLens, which changes things as well.

kNN is a fairly simple machine learning algorithm to implement.  On a dataset the size of Netflix, it’s still easy to do stupid things that cause it to take forever.  Recommender systems typically match users to movies on a scale, which is the user’s rating for that item.  In the case of Netflix, the scale is 1 (hate) to 5 (love).  For the Netflix Prize, the goal is to guess users’ ratings on a hidden set as accurately as possible (according to root mean squared error (RMSE)).  One way of approaching the problem is to create a user-item matrix where the rows correspond to users, the columns to items (movies), and the value in each cell is the user’s rating for that item.  If the user has not rated the item, it is assigned a zero.  Now we can split this matrix up into vectors, where each row vector represents a user.  kNN seeks to find users similar to other users (or movies similar to other movies) according to some metric over these vectors.  I won’t bother going into the metrics in detail, but they include cosine similarity, Euclidean distance, and Pearson correlation.  The ratings from the chosen k users are combined (either by simply averaging or using some weighted average) to form the prediction for a movie the user has not yet rated.
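As a minimal sketch of that procedure (a toy dense matrix and cosine similarity; the function and data here are my own illustration, not the assignment’s actual code):

```python
import numpy as np

def knn_predict(R, user, item, k=2):
    """Predict R[user, item] from the k most cosine-similar users who rated item.

    R is a dense user-item matrix with 0 for unrated cells -- fine for a toy
    example, though Netflix-scale data needs sparse storage.
    """
    # Users (other than the target) who actually rated this item.
    raters = np.where((R[:, item] > 0) & (np.arange(len(R)) != user))[0]
    if raters.size == 0:
        return R[R > 0].mean()                  # fall back to the global mean

    u = R[user]
    norms = np.linalg.norm(R[raters], axis=1) * np.linalg.norm(u)
    sims = (R[raters] @ u) / np.where(norms == 0, 1, norms)

    order = np.argsort(sims)[::-1][:k]          # the k nearest raters
    weights = np.maximum(sims[order], 0)
    if weights.sum() == 0:
        return R[raters[order], item].mean()    # unweighted fallback
    return float(R[raters[order], item] @ weights / weights.sum())

# Four users, four movies, 0 = unrated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
print(knn_predict(R, user=0, item=2, k=2))
```

User 0’s most similar neighbor disliked movie 2, so the prediction comes out low. With real data you would precompute similarities and use sparse storage; this only shows the logic.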

So on the test dataset for this assignment, I built three models that had the following RMSE scores:

Model 1: 0.9831
Model 2: 1.0371
Model 3: 0.9768

Just guessing the average rating for each movie gives an RMSE of 1.0 in this case, so Models 1 and 3 improve over the baseline, while Model 2 does worse. The best performing teams in the Netflix prize use ensemble methods to combine various models. The simple way to do this is just with a linear combination. So given models {m1, m2, m3} and weights {w1, w2, w3}, the ensemble prediction would be w1m1 + w2m2 + w3m3 (where w1+w2+w3=1.0). This was the first time I had tried ensembles with recommender systems as well, so imagine my surprise when I hit 0.9469 RMSE with my best choice of w’s. Of course, this is nowhere near the number needed to actually claim the prize, but it was a nice demonstration of the power of ensemble methods. I recommend checking out the proceedings of last year’s KDD Cup if you’re interested.
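As a concrete illustration of that linear combination (the predictions, truth values, and weights below are all invented for the example, not the assignment’s actual numbers):

```python
import numpy as np

def rmse(pred, truth):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2)))

truth = np.array([4.0, 3.0, 5.0, 2.0, 4.0])   # hidden ratings (invented)
m1 = np.array([4.4, 2.6, 5.3, 1.6, 4.4])      # model 1's predictions
m2 = np.array([4.8, 2.4, 5.5, 2.6, 3.4])      # model 2's predictions
m3 = np.array([3.6, 3.4, 4.6, 2.4, 3.7])      # model 3's predictions

w1, w2, w3 = 0.45, 0.10, 0.45                 # weights sum to 1.0
ensemble = w1 * m1 + w2 * m2 + w3 * m3

# m1 and m3 err in opposite directions here, so blending cancels much of the error.
print(rmse(m1, truth), rmse(m2, truth), rmse(m3, truth), rmse(ensemble, truth))
```

Because the individual models’ errors are partly decorrelated, the blend scores better than any single model. In practice the weights are fit on a held-out probe set rather than picked by hand.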

OpenEphyra is a question answering (QA) system developed here at the Language Technologies Institute by Nico Schlaefer. He began his work at the University of Karlsruhe in Germany, but has since continued it at CMU and is currently a PhD student here. Since it is a home-grown language technologies package, I decided to check it out and play around. This is the first QA system I have used that wasn’t integrated in a search engine, so this isn’t exactly an expert review.

Getting started in Windows (or Linux or whatever) is pretty easy if you already have Apache Ant and Java installed. Ant isn’t necessary, but I recommend getting it if you don’t have it already. It’s just handy. First, download the OpenEphyra package from SourceForge. The download is about 59 MB, and once it’s done, unpack it in whatever directory you want. Assuming you have Ant installed, all you have to do is type ant to build it, though you may want to issue ant clean first. I had to make one slight change to the build.xml file to get it to run, on line 55: <jvmarg line="-server&#13;-Xms512m&#13;-Xmx1024m"/> had to be changed to <jvmarg line="-server -Xms512m -Xmx1024m"/>. Easy enough. Then to run it, all you have to do is type ant OpenEphyra.

After taking a short bit to load up, you can enter questions on the command line. Based on what I can tell from the output, it begins by normalizing the question (removing morphology, getting rid of punctuation). Then it determines the type of answer it is looking for, like a person’s name or a place, and assigns certain properties to what it expects to find. Next, it automatically creates a list of queries that are sent to the search engine(s). The documentation indicates that the AQUAINT, AQUAINT-2 and BLOG06 corpora are included (at least preprocessing is supported), but there are searchers for Google, Wikipedia, Yahoo and several others. Indri is a search engine that supports structured queries, and from what I saw playing around, OpenEphyra auto-generates some structured queries for it. After generating the queries, they are sent to the various searchers, and the results are obtained and scored. Finally, if you’re lucky, you get an answer to your question.
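To make those first stages concrete, here is a toy sketch of the flow (question normalization, answer typing, query generation). This is my own illustration of the general idea, not OpenEphyra’s actual code or API:

```python
import re

def normalize(question):
    """Lowercase and strip punctuation -- a crude stand-in for real normalization."""
    return re.sub(r"[^\w\s]", "", question.lower()).strip()

def answer_type(question):
    """Guess the expected answer type from surface cues in the question."""
    q = question.lower()
    if q.startswith("who"):
        return "PERSON"
    if q.startswith("where") or "capital of" in q:
        return "LOCATION"
    if q.startswith("how many") or q.startswith("how much"):
        return "NUMBER"
    return "OTHER"

def generate_queries(question):
    """Produce a small list of queries: the normalized question plus a
    keyword-only version with question/function words dropped."""
    norm = normalize(question)
    stop = {"who", "what", "where", "is", "the", "of"}
    keywords = " ".join(w for w in norm.split() if w not in stop)
    return [norm, keywords]

q = "What is the capital of Mongolia?"
print(answer_type(q))        # LOCATION
print(generate_queries(q))
```

A real system would then hand those queries to the searchers and score the returned passages against the expected answer type.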

Here are the results of screwing around with it for a few minutes:

  • Who created OpenEphyra?
    • no answer (sorry, Nico)
  • Who invented the cotton gin?
    • Eli Whitney
  • Who created man?
    • God
  • What is the capital of Mongolia?
    • Ulaanbaatar
  • Who invented the flux capacitor?
    • Doc Brown (awesome!)
  • Who is the author of the Mendicant Bug?
    • Zuckerberg — damn you, Facebook! :(
  • How much wood can a woodchuck chuck?
    • no answer (correct)
  • What is the atomic number of Curium?
    • 96 (also correct)
  • Who killed Lord Voldemort?
    • Harry (correct, but partial)
  • How many rings for elven kings?
    • 3021 (so, so very wrong)

Fun stuff! It’s not anywhere near perfect, but there are definite uses and the thing is ridiculously easy to install and use. Also, it’s in Java, so you can integrate it with your own system with very little effort. Depending on what sort of question you are looking for answers to, you get various levels of results. Factual questions about geography and people tend to do better than questions about numbers and fiction, as you might expect. Also, why-questions are not supported.

Another bonus is the project is open source, so if you’re into QA, you can help develop it.

Spam of the Day 2008-02-07

Posted: 7 February 2008 in Uncategorized

I go through my spam every day to make sure that false positives don’t get deleted. For whatever reason, stuff coming from the Help Desk at CMU gets labeled as spam a lot. I’m not saying it sounds like word salad (*cough*), but it sets off gmail’s spam sensors. The good thing about gmail is a low false negative rate; the bad thing is a fairly high false positive rate.  And if you weren’t already aware, word salad is the name given to the jumble of unrelated, often obscure words that appear in a spam email to throw off spam filters.

The various spam messages I get never fail to amuse me in some way, so why not share them with you, my innocent reader, who would rather never see another spam title again? Ages ago, I was especially amused by two bits of spam that actually had lines from Robert Jordan’s Wheel of Time series as subject lines. I captured an image of the second one, but the first is lost forever and I haven’t noticed one since (click on it if it’s too small to read).

Twice the Dragon, for the price he must pay.

So the inaugural Spam of the Day (SOTD, rhymes with sotted):

“Try the new manpower candy!”

ACM Turing Award Winner

Posted: 4 February 2008 in Uncategorized

Ed Clarke, a professor of Computer Science at CMU, just won the 2007 ACM Turing Award.  The ACM is the Association for Computing Machinery and is the oldest professional group for the computing industry.  I first became a member in 2005 and have maintained that membership since.  The Turing Award is given in honor of Alan Turing, the father of computer science (most would agree).  This award is basically the Nobel prize of computer science (since they don’t give Nobels for CS) and is meant to recognize individuals who have made a lasting and significant contribution to the computing field.

Ed’s work was in conjunction with two other people:  E. Allen Emerson and Joseph Sifakis.   Their work was on model checking, which is a way of automatically determining whether a hardware or software design satisfies (is a model of) a logical specification, typically written in a temporal logic.  So if a structure satisfies the formula, it checks.
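A toy illustration of the idea (explicit-state checking of an invariant by exhaustive reachability; real model checkers handle temporal-logic formulas and enormous state spaces symbolically, so this sketches only the core):

```python
from collections import deque

def check_invariant(initial, transitions, invariant):
    """Toy explicit-state model checker: breadth-first search over all
    reachable states, verifying the invariant holds in each one."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return False, state              # counterexample state
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, None

# Hypothetical mutual-exclusion model: two processes cycle through
# idle -> trying -> critical, but a process may enter critical only if
# the other process is not already there.
def transitions(state):
    for i in (0, 1):
        s = list(state)
        if s[i] == "idle":
            s[i] = "trying"
        elif s[i] == "trying" and state[1 - i] != "critical":
            s[i] = "critical"
        elif s[i] == "critical":
            s[i] = "idle"
        else:
            continue                         # blocked: no move for this process
        yield tuple(s)

# The property to check: the two processes are never both critical.
ok, counterexample = check_invariant(
    ("idle", "idle"), transitions, lambda s: s != ("critical", "critical"))
print(ok)  # True: the entry guard keeps the processes out of each other's way
```

Drop the guard on entering the critical section and the checker returns the offending state as a counterexample, which is exactly how model checkers earn their keep in hardware and protocol verification.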

Clarke joins three other professors at CMU who are Turing recipients.  Raj Reddy was co-awarded it in 1994 for large scale AI systems.  Manuel Blum won it in 1995 for his work on computational complexity theory.  Dana Scott won it in 1976 for non-deterministic finite state machines, something that has a major role in natural language processing (and computational linguistics).