Archive for 30 July 2009

Github just announced their own version of the Netflix Prize.  Instead of predicting movie ratings, Github wants you to suggest repositories for users to watch.  This is different from the Netflix Prize in a number of ways:

  1. a user watching a repo is similar to a user visiting a page from a search engine – both are implicit endorsements (we assume that doing so means the user actually likes the repo)
  2. we are predicting the likelihood of a user wanting to watch a repo (a binary event), rather than how much a user likes a movie
  3. the data set is a lot smaller, and sparsity is a LOT greater (the user-repo matrix is 0.006% filled vs. about 1% for Netflix)
  4. you get multiple tries!  they let you pick 10 repos that a user may watch, and as long as one of them matches, you get credit for it
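
To make the last point concrete, here is a rough sketch of how a score like that could be computed.  It is not the contest's actual scoring code: the function name, the dictionary layout, and the assumption that each test user has a set of hidden repos are all mine, for illustration only.

```python
from typing import Dict, Iterable, Set

# Per the contest description: up to 10 guesses per user.
MAX_GUESSES = 10


def hit_rate(guesses: Dict[int, Iterable[int]],
             held_out: Dict[int, Set[int]]) -> float:
    """Fraction of test users for whom at least one of the first
    MAX_GUESSES guessed repos is in that user's hidden set.

    Both arguments map a user id to repo ids (hypothetical layout).
    """
    if not held_out:
        return 0.0
    hits = 0
    for user, hidden in held_out.items():
        # Guesses past the tenth are ignored; a missing user gets no guesses.
        top = list(guesses.get(user, []))[:MAX_GUESSES]
        if any(repo in hidden for repo in top):
            hits += 1
    return hits / len(held_out)
```

Under this reading, a leaderboard figure like 46.9% would mean that for just under half of the test users, at least one of the ten suggested repos was a match.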

Already there have been many submissions.  The number one place is currently held by Daniel Haran with 46.9% guessed correctly.  Happy hunting, if you decide to compete.

The prizes are a bottle of Pappy van Winkle bourbon and a large Github account for life.  The bottle of Pappy is making me consider competing.

A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby.  Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day.  The result was a bit ugly and unwieldy, like most research code.  A few months later, Todd Fisher came along and discovered a couple of bugs and memory leaks in the C code, for which I am very grateful.  I had been toying with the idea of improving the Ruby code, and embarked on a mission to do so.  The result is a hopefully much cleaner gem that can be used right out of the box with little screwing around.

Unfortunately, I did something I’m ashamed of.  Ruby gems are notorious for breaking backwards compatibility, and I have done just that.  The good news is that your code will almost work, assuming you didn’t dive into the Document and Corpus classes too heavily.  If you did, then you will probably experience a lot of breakage.  The result, I hope, is a more sensible implementation, so maybe you won’t hate me.  Of course, I could be wrong and my implementation may still be crap.  If that’s the case, please let me know what needs to be improved.

To install the gem:

gem sources -a
sudo gem install ealdent-lda-ruby

