Posts Tagged ‘math’

I just published the simple-random Ruby gem, a port of John D. Cook's C# code. You can view the source on GitHub or install the gem via RubyGems:

gem install simple-random

The gem allows you to sample from the following distributions:

  • Beta
  • Cauchy
  • Chi Square
  • Exponential
  • Gamma
  • Inverse Gamma
  • Laplace (double exponential)
  • Normal
  • Student t
  • Uniform
  • Weibull

Simple examples:

require 'rubygems'
require 'simple-random'

r = SimpleRandom.new
r.uniform # => 0.127064087195322
r.normal(5, 1) # => 5.71972152940515


Fun with charts

Posted: 19 December 2007 in Uncategorized

I just saw a post on Statistical Modeling dealing with one of the worst uses of statistical graphics this year. Be sure to check it out. I’d have to say I agree with that assessment. The case deals with two pictures of a road during the Crimean War. In the first picture, there is a road covered in cannonballs. In the second, the road is clear. Errol Morris challenged his readers to figure out which picture came first. The correct answer is the clear road.

Morris uses pie charts and bar graphs to display the reasons people gave for their decisions. While colorful, these graphs are also meaningless. So given the data, I z-normalized the “on” choices and the “off” choices (rescaled each set of counts so that its distribution had mean 0 and standard deviation 1). I used the same bar graph setup (except horizontal this time). Since I normalized each distribution separately, the actual number of voters on one side or the other no longer really makes a difference. I am just comparing each side's relative preference for a given reason. This assumes that there is some significance to a person not choosing a particular reason, which may be incorrect.
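The z-normalization described above can be sketched in Ruby roughly like this. It is only a minimal illustration: the reason names and counts are made up (they are not Morris's data), `z_normalize` is a hypothetical helper name, and it assumes the population standard deviation (dividing by n), which may not match what was used for the chart.

```ruby
# Z-normalize a list of numbers: subtract the mean, then divide by the
# standard deviation, so the result has mean 0 and standard deviation 1.
# Assumption: population standard deviation (divide by n, not n - 1).
def z_normalize(values)
  n = values.length.to_f
  mean = values.sum / n
  variance = values.sum { |v| (v - mean)**2 } / n
  sd = Math.sqrt(variance)
  raise ArgumentError, "all values are identical" if sd.zero?

  values.map { |v| (v - mean) / sd }
end

# Hypothetical vote counts per reason (illustrative only).
reasons    = ["shadows", "shelling", "balls"]
on_counts  = [10, 20, 30]
off_counts = [30, 20, 10]

on_z  = z_normalize(on_counts)
off_z = z_normalize(off_counts)

# Print each side's relative preference for each reason.
reasons.each_with_index do |reason, i|
  puts format("%-10s on: %6.2f  off: %6.2f", reason, on_z[i], off_z[i])
end
```

After normalizing, a positive score means a reason was chosen more often than that side's average reason, and a negative score means it was chosen less often, regardless of how many people voted on each side.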


Errol Morris discusses data on people’s decisions about two photographs from the Crimean War.

What I think my chart shows is that shadows are the worst feature to rely on for correctly guessing which picture came first. People who focused on either the shelling or characteristics/artistic features were more likely to choose correctly. The most confusing feature is the number and position of the balls. Practical concerns were also confusing. If I were going to train a support vector machine to classify images of this type, I would use three features: shelling, characteristics/artistic and shadows.

So what do you think? Am I way off on trying to normalize these and make this kind of assessment? I am, after all, a statistics amateur.

The PISA (Programme for International Student Assessment) test is administered to 15-year-olds in industrialized countries every three years. The 2006 results were just released and show that US students are ranked 17th out of 30 in science and 24th in math. About 1.3% of students overall reached the highest level on the test, with New Zealand and Finland having the most star pupils at 3.9%. [source (Note: may require free registration)] (more…)

brainfscking set theory

Posted: 3 December 2007 in Uncategorized

I mentioned the esoteric programming language brainfuck a little while back. It consists of 8 operations and was created with the goal of having the smallest compiler in the world (I think the current best is 174 bytes). I was reading a post over on Good Math, Bad Math that defines arithmetic in terms of sets. It's pretty basic if you’ve done anything with set theory, but Mark has a clear way of explaining things, so I usually try to read all of his posts. I’ve been playing catch-up today. It struck me immediately how closely the set form that Mark describes matches the syntax/logical structure of brainfuck. So I decided to play around a little. Read on for more. (more…)
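For readers who haven't seen the language, the 8 operations are `>`, `<`, `+`, `-`, `.`, `,`, `[` and `]`. Here is a minimal interpreter sketch in Ruby. The name `run_bf` is hypothetical, and a few details are assumptions rather than part of any spec: cells wrap at 256, the tape is unbounded in both directions, and reading past the end of input stores 0.

```ruby
# Minimal brainfuck interpreter (a sketch, not tuned for speed).
# Assumptions: 8-bit wrapping cells, unbounded tape, EOF reads as 0.
def run_bf(program, input = "")
  tape = Hash.new(0)          # sparse tape; every cell starts at 0
  ptr = 0                     # data pointer
  pc = 0                      # position in the program
  output = String.new         # binary string collecting output bytes
  input_bytes = input.bytes

  # Precompute the position of each bracket's partner.
  jumps = {}
  stack = []
  program.each_char.with_index do |ch, i|
    if ch == "["
      stack.push(i)
    elsif ch == "]"
      start = stack.pop or raise ArgumentError, "unmatched ] at #{i}"
      jumps[start] = i
      jumps[i] = start
    end
  end
  raise ArgumentError, "unmatched [" unless stack.empty?

  while pc < program.length
    case program[pc]
    when ">" then ptr += 1
    when "<" then ptr -= 1
    when "+" then tape[ptr] = (tape[ptr] + 1) % 256
    when "-" then tape[ptr] = (tape[ptr] - 1) % 256
    when "." then output << tape[ptr].chr
    when "," then tape[ptr] = input_bytes.shift || 0
    when "[" then pc = jumps[pc] if tape[ptr].zero?     # skip the loop
    when "]" then pc = jumps[pc] unless tape[ptr].zero? # repeat the loop
    end
    pc += 1 # any other character is treated as a comment
  end
  output
end

# 8 * 8 + 1 = 65, the byte value of "A".
puts run_bf("++++++++[>++++++++<-]>+.")
```

The loop pair `[` and `]` is the only control flow: `[` jumps past its matching `]` when the current cell is zero, and `]` jumps back to its matching `[` when the cell is nonzero.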

Since I work with recommender systems, I’d hardly be doing my job if I didn’t notice things like Google Reader’s new feed recommendations. From the description of how the recommender works on the Google help page (which is unfortunately not very specific):

Your recommendations list is automatically generated. It takes into account the feeds you’re already subscribed to, as well as information from your Web History, including your location. Aggregated across many users, this information can indicate which feeds are popular among people with similar interests. For instance, if a lot of people subscribe to feeds about both peanut butter and jelly, and you only subscribe to feeds about peanut butter, Reader will recommend that you try some jelly.

This sounds like they are using a hybrid recommender system. When you are recommending items (in this case feeds) to users, you can either consider the qualities of the items themselves (content-based) or the behavior of people similar to you (collaborative filtering). The Netflix Prize is a collaborative filtering case for the most part, though it is possible to add in some amount of content. (more…)
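To make the collaborative filtering side concrete, here is a minimal Ruby sketch of the peanut butter and jelly example. It is not how Google Reader works (the help page doesn't say); the user names, feeds, `recommend` helper and overlap-based scoring are all assumptions made up for illustration.

```ruby
require "set"

# Recommend feeds to `user` based on what similar users subscribe to.
# Each other user who shares at least one feed with `user` votes for
# the feeds `user` lacks, weighted by how many feeds the two share.
def recommend(subscriptions, user)
  mine = subscriptions.fetch(user)
  scores = Hash.new(0)

  subscriptions.each do |other, feeds|
    next if other == user

    overlap = (mine & feeds).size
    next if overlap.zero? # no shared interests, so no vote

    (feeds - mine).each { |feed| scores[feed] += overlap }
  end

  # Highest score first; ties broken alphabetically for stable output.
  scores.sort_by { |feed, score| [-score, feed] }.map(&:first)
end

# Hypothetical subscription data.
subscriptions = {
  "alice" => Set["peanut butter", "jelly"],
  "bob"   => Set["peanut butter", "jelly", "bread"],
  "carol" => Set["gardening"],
  "you"   => Set["peanut butter"]
}

p recommend(subscriptions, "you") # => ["jelly", "bread"]
```

A content-based recommender would instead compare properties of the feeds themselves (topic, keywords, and so on), and a hybrid system would combine both kinds of score.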