Posts Tagged ‘c’

works-on-my-machine-starburstA while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby.  Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day.  The result was a bit ugly and unwieldy, like most research code.  A few months later, Todd Fisher came along and discovered a couple bugs and memory leaks in the C code, for which I am very grateful.  I had been toying with the idea of improving the Ruby code, and embarked on a mission to do so.  The result is a hopefully much cleaner gem that can be used right out of the box with little screwing around.

Unfortunately, I did something I’m ashamed of.  Ruby gems are notorious for breaking backwards compatibility, and I have done just that.  The good news is, your code will almost work, assuming you didn’t start diving into the Document and Corpus classes too heavily.  If you did, then you will probably experience a lot of breakage.  The result, I hope is a more sensical implementation, however, so maybe you won’t hate me.  Of course, I could be wrong and my implementation is still crap.  If that’s the case, please let me know what needs to be improved.

To install the gem:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby

Enjoy!

Reblog this post [with Zemanta]

Advertisements

Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap some C functions (which is good to know if you need a really fast implementation of something that gets called alot).  Wrapping a C library is slightly harder, but not horribly so.  Probably most of my challenge was the fact that it’s been so long since I wrote anything in C.

Since the code is open source, I decided to release the Ruby wrapper as a gem on GitHub.  I chose GitHub over RubyForge, because it uses Git and freakin’ rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you’re so inclined.

A basic usage example:

require 'lda'
# create an Lda object for training
lda = Lda::Lda.new
corpus = Lda::Corpus.new("data/data_file.dat")
lda.corpus = corpus
# run EM algorithm using random starting points
lda.em("random")
lda.load_vocabulary("data/vocab.txt")
# print the topic 20 words per topic
lda.print_topics(20)

You can also download the gem from GitHub directly:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby

You only need the first line if you haven’t added GitHub to your sources before.

Morning Madness

Posted: 2 November 2007 in Uncategorized
Tags: , , , , , , , , , ,

Ever get pissed twice before you’ve really even opened your eyes? This is why I shouldn’t read my RSS feeds so early in the morning. At the top of the list is Bush equating Democrats who oppose the war (as if it could be called opposition, anyway) to those who ignored Hitler and Lenin and then Hillary firing back. Am I mad at Bush for making this analogy? No and I think he’s correct, but not in the way he thinks. I’m more angry at Hillary for firing back and not recognizing her own culpability. The Sheepocrats sat back and did nothing four years ago when this war began and passed the Patriot Act before that. They have endorsed the war at every stage since and even their current so-called opposition is luke-warm and putrid with its weasliness. So yeah, they are like people who ignored the rise of Hitler and Lenin. If she had recognized that and said it publicly, it would have done her credit.

 

Next up, I was reading a few bit twiddling hacks and came across a nice one for branchless absolute value [hat tip]. The hacks are all in the public domain, too, so that’s good. He does list the occasional variation that is patented, an enormously helpful fact if you’re producing commercial software. So here is the patented version of the branchless absolute value:


int v; // we want to find the absolute value of v
int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v ^ mask) - mask;

The last ^ (XOR) – (subtract) combination represents the patent. What works also?

r = (v + mask) ^ mask;

As Sean points out, though, the patent probably could be contested if the holder (none other than Sun Microsystems) ever tried to enforce it. So what ticked me off is that such a thing could be patented. I raise my hands in impotent fury at the ludicrousness of software patents. I don’t blame the inventors for them, it’s something you pretty much have to do these days. I blame the system that makes that true.

Update

Did some benchmarks on the two versions of absolute value given above.  Using a 3.06GHz processor, I could run 4 billion absolute values in 18.916 +/- 0.021 seconds for the patented version and 18.906 +/- 0.026 seconds for the free version.  So no need to even bother with the patented version it looks like.