Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap some C functions (which is good to know if you need a really fast implementation of something that gets called alot).  Wrapping a C library is slightly harder, but not horribly so.  Probably most of my challenge was the fact that it’s been so long since I wrote anything in C.

Since the code is open source, I decided to release the Ruby wrapper as a gem on GitHub.  I chose GitHub over RubyForge, because it uses Git and freakin’ rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you’re so inclined.

A basic usage example:

require 'lda'
# create an Lda object for training
lda =
corpus ="data/data_file.dat")
lda.corpus = corpus
# run EM algorithm using random starting points
# print the topic 20 words per topic

You can also download the gem from GitHub directly:

gem sources -a
sudo gem install ealdent-lda-ruby

You only need the first line if you haven’t added GitHub to your sources before.


Comments are closed.