Posts Tagged ‘code’

I just published the simple-random ruby gem, which is ported from C# code by John D. Cook.  You can view the source on github or install the gem via rubygems:

gem install simple-random

The gem allows you to sample from the following distributions:

  • Beta
  • Cauchy
  • Chi Square
  • Exponential
  • Gamma
  • Inverse Gamma
  • Laplace (double exponential)
  • Normal
  • Student t
  • Uniform
  • Weibull

Simple examples:

require 'rubygems'
require 'simple-random'

r = SimpleRandom.new
r.uniform # => 0.127064087195322
r.normal(5, 1) # => 5.71972152940515

Advertisements

Java maps and sorting

Posted: 1 August 2009 in Uncategorized
Tags: , , , , ,

I’m always a little annoyed I have to implement sorting Map keys by their values myself in Java.  It seems like they should be a part of the standard Collections library or something.  Maybe they are and I just haven’t seen it?  My solution (gist) is based on feedback from Josh in the comments to a previous post. How does that look to you?

Jekyll and Code

Posted: 8 January 2009 in Uncategorized
Tags: , , , , , , ,

Tom Preston-Werner, aka mojombo, rocks.  When GitHub announced GitHub Pages recently, they pointed to a new blog engine, Jekyll.  Jekyll generates the blog as a set of static pages — no database reads, no PHP, just fast HTML.  I was instantly drawn to it, and since I’ve been itching to switch blog engines, I damn near moved this blog.  It would be hosted on GitHub, for free.  And it would be backed up using my favorite version control system.  I would have complete access to all of my content.  If WordPress went belly up, I would lose all of my content.  That bothers me.

Jekyll is still in its infancy.  But for two things, I would switch right now.  First, support for tags is incomplete, so pages on my blog such as http://mendicantbug.com/category/computational-linguistics/ would no longer be supported under Jekyll.  That would play hell with my Google traffic.  I’m willing to make that sacrifice since most of that traffic is from people who don’t care about the main topics I’m interested in.  Second, and this is the killer, Jekyll does not support comments.  Yet.  The good news is, it can be forked and someone may implement comments.  I hope so, but the static nature of Jekyll means handling comments is not very straightforward.  I can imagine how it might be done, so we’ll see.  I suppose I could do it myself, but my plate is so full right now I’m having a hard time getting what I need to get done done.

So what I’m doing instead, for now, is hosting my code there.  Jekyll has code highlighting built-in using Liquid.  Handy!  I put up the source for my post on Bandwidth simulation.  I’ll be adding more soon, which I’ll make note of, if for some reason you’re actually interested.

Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap some C functions (which is good to know if you need a really fast implementation of something that gets called alot).  Wrapping a C library is slightly harder, but not horribly so.  Probably most of my challenge was the fact that it’s been so long since I wrote anything in C.

Since the code is open source, I decided to release the Ruby wrapper as a gem on GitHub.  I chose GitHub over RubyForge, because it uses Git and freakin’ rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you’re so inclined.

A basic usage example:

require 'lda'
# create an Lda object for training
lda = Lda::Lda.new
corpus = Lda::Corpus.new("data/data_file.dat")
lda.corpus = corpus
# run EM algorithm using random starting points
lda.em("random")
lda.load_vocabulary("data/vocab.txt")
# print the topic 20 words per topic
lda.print_topics(20)

You can also download the gem from GitHub directly:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby

You only need the first line if you haven’t added GitHub to your sources before.

A couple of days ago, I wrote a script that would tweet anything you plurked. Thanks to some code from Neville Newey (based on PHP code by Charl van Niekerk), the plurk.py script I wrote has been updated to both plurk your tweets and tweet your plurks. This should work on both windows and linux machines. If you have access to a linux machine, I suggest setting up a cron job to take care of this. As I mentioned in the previous post, if you set up a cron job, be sure to change the path to plurkdb.dat to an absolute path. I have done the most testing on this with python 2.4 in linux.

This code is open source under the Creative Commons 3.0 Attribution license that this blog uses Creative Commons BSD license. Neville’s code appears to be under CC:Attribution 2.5 for South Africa, by what I could glean from his site. I have considered making this an open source project under Google code but have yet to take it all the way. Google sets a lifetime limit of 10 projects, so I will continue to hoard those against future need. If you make modifications to the code, please let me know and I will probably post them here and in the code for future releases, so we all win.

Note that the command line parameters have changed:

plurk.py <twitter username> <twitter password> <plurk username> <plurk password>

And of course, as with all software, use at your own risk.

Tweet your plurks

Posted: 2 June 2008 in Uncategorized
Tags: , , , , , , , ,

If you want to use Plurk, but aren’t ready to leave Twitter, I wrote a little python script you can use to automatically mirror your plurks on Twitter. This will not work for response plurks, but your main plurks will be extracted and posted to your Twitter account with the prefix “plurking:” followed by your plurk.

The resulting tweet looks like this:

sample of what the script outputs in twitter

Download the script and set it up as a cron job (or you could execute it manually). It should work with python 2.4 and later. It stores a plurkdb.dat file (which you should probably assign an absolute path to, depending on the behavior of cron on your system). This file is checked every time it is run to make sure that duplicate plurks aren’t being tweeted. You should pass the following parameters on the command line (or modify the script so they are hardcoded, if you want): <twitter username> <twitter password> <plurk username> <plurk password>. Update: see later post on updated plurk script.  And like with all software, use at your own risk.

Please let me know if you have any problems with it or see room for improvement. I hacked this out in a hurry, so …

Java Properties

Posted: 5 December 2007 in Uncategorized
Tags: , , , , , ,

I discovered the java.util.Properties class a couple weeks ago in the ginormous Java API docs. If you’ve ever created a software project where you have a lot of different settings that change frequently, this is the class for you. In my research, I implement all these different algorithms for various things, find out they don’t work, implement something else, rinse, repeat. Being able to look back at my results from two months ago and then loading the exact same configuration and running the experiment all over again is a must. Enter the Properties class. (more…)