Posts Tagged ‘social networking’

A couple of months ago, Daniel Tunkelang posted an algorithm on his blog that attempts to emulate PageRank for Twitter.  I implemented a toy version I dubbed TunkRank, and then suggested that name on his blog.  It got some traction, so I figured what the heck and decided to implement it on TunkRank.com.

Now, there was a little debate on Daniel’s blog about whether it is actually emulating PageRank or something else, but I leave it to you to read the comments on his post if you’re interested. There are also plenty of ideas there on the best way to establish a measure of influence.  I’ll limit the discussion in this post to the basics.

  1. The amount of attention you can give is spread out among all those you follow. The more you follow, the less attention you can give each one.
  2. Your influence depends on the amount of attention your followers can give you.

As a twitterer, your influence does not depend on how many people you follow. However, your usefulness as a follower does. Having higher influence depends on having many followers who follow relatively few people but are followed by many. Followers like that are more likely to pick up on your tweets, act on them, retweet them, whatever. You gain influence through the social graph thanks to their influence.

Therefore, your TunkRank score is a reflection of how much attention your followers can give you, both directly and indirectly through their own followers.
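The two rules above can be sketched in a few lines of Ruby. This is a minimal illustration on a made-up four-user graph, not the code that runs TunkRank.com. It assumes the recurrence from Daniel's post takes the form influence(X) = sum over each follower Y of (1 + p * influence(Y)) / (number of people Y follows), where p is the probability that a follower passes a tweet along. The names `FOLLOWING` and `tunkrank`, the default `p` of 0.05, and the fixed iteration count are all assumptions made for the example.

```ruby
# Hypothetical toy graph: each key follows the users in its array.
FOLLOWING = {
  "alice" => ["bob"],
  "bob"   => ["alice", "carol"],
  "carol" => ["alice"],
  "dave"  => ["alice", "bob", "carol"]
}.freeze

# Iteratively estimate an influence score for every user.
# p is the assumed probability that a follower passes a tweet along.
def tunkrank(following, p: 0.05, iterations: 50)
  users = (following.keys + following.values.flatten).uniq

  # Invert the graph: user => list of that user's followers.
  followers = Hash.new { |h, k| h[k] = [] }
  following.each { |src, dsts| dsts.each { |dst| followers[dst] << src } }

  scores = users.to_h { |u| [u, 0.0] }
  iterations.times do
    scores = users.to_h do |u|
      # Each follower splits their attention among everyone they follow,
      # and contributes a share of their own influence scaled by p.
      total = followers[u].sum do |f|
        (1.0 + p * scores[f]) / following[f].size
      end
      [u, total]
    end
  end
  scores
end

tunkrank(FOLLOWING).sort_by { |_, score| -score }.each do |user, score|
  puts format("%-6s %.3f", user, score)
end
```

In this toy graph dave follows everyone but has no followers, so his score stays at zero, which matches the point that following people does nothing for your own influence.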

I implemented this algorithm in Ruby using Merb, MySQL, Capistrano, nginx, and ActiveRecord (and, of course, Git for version control). While my job involves working on a web app, my role has mostly been on back-end NLP stuff. I’m still quite new to the whole Rails-level-web-app-world. For those who don’t know, Merb is a framework similar to Ruby on Rails. So similar, in fact, that the two are merging and will become Rails 3. ActiveRecord is the object-relational mapper (ORM) that Rails uses. The standard ORM for Merb is DataMapper, but I stuck with something I’m more familiar with to limit the variables in my little project.

There are many aspects of getting a web app up and running that I had only heard about in passing — and many more I’m still lost on. But I figured implementing TunkRank would be an interesting place to start.

Phase I – Data Collection

As I said, I implemented TunkRank as a toy the same night that Daniel posted his algorithm. Things seemed to work out quite nicely and I liked it on theoretical grounds as a measure. When I decided to implement the real version, the task of hammering Twitter millions of times suddenly loomed. I suppose I thought there were maybe about 1 million active accounts on Twitter. I harvested over 2 million before slowing my harvesting down in favor of other development. I have also collected about 40 million edges in the social graph (user A follows user B is one edge). Those 40 million edges cover only 25% of the 2 million users I have encountered. I still haven’t gotten the followers for the remaining 1.5 million. When I do so, I’m sure I’ll discover another million or three users I haven’t seen yet.

I stopped where I did because I was using Ruby’s marshal functionality to dump the social graph to disk. Each dump was weighing in around 250 MB and it was exceeding Marshal’s ability to function. At this point I threw everything into a MySQL database. Bleh! I can’t even describe the pain in the ass that was. If I were to do that again, I would certainly use PostgreSQL, and may still do so. Better yet, I would use some sort of column store database.  But it’s in the MySQL db now and running ok (just ok, not great or even well). MySQL dies quietly and annoyingly at times.  I hate it.
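For anyone who hasn't used it, dumping a graph with Marshal looks roughly like the sketch below. The file name, the helper names `dump_graph` and `load_graph`, and the tiny hash are made up for illustration; the real dumps were a single much larger structure, which is where the trouble started.

```ruby
require "tmpdir"

# Serialize the whole graph object to disk in binary mode.
def dump_graph(graph, path)
  File.open(path, "wb") { |f| Marshal.dump(graph, f) }
end

# Read the file back; Marshal.load rebuilds the original object.
def load_graph(path)
  File.open(path, "rb") { |f| Marshal.load(f) }
end

# Hypothetical in-memory graph: user => array of follower names.
graph = { "alice" => ["bob", "carol"], "bob" => ["alice"] }
path  = File.join(Dir.tmpdir, "social_graph.marshal")

dump_graph(graph, path)
restored = load_graph(path)
puts restored == graph
```

The whole structure is written and read in one shot, so there is no way to load only part of the graph. That is convenient at toy sizes and painful at millions of edges.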

Doing in ActiveRecord the operations I had been doing in memory is mind-bogglingly slow by comparison, as you’d expect. Twitter just released the ability to pull all follower ids in one request, which would have made my life easier, but I can still benefit from it going forward. Also, I should have been storing more information about users than just the Twitter username. Having to go back and collect that was slow and annoying, but it’s done.

Phase II – Implementing the Algorithm

The algorithm is simple to compute. Check out this gist for a version that calculates it using ActiveRecord. I’d post it here, but WordPress.com sucks and I’m stuck with it. The code uses ActiveRecord more than I’d like, so I rewrote it in SQL using twitter ids.  The gist for that is here.  The #{p} and #{self.twitter_id} are Ruby variables.

Phase III – Doing the Web App

The web app itself is both the most important step and the least fun for me. I very much enjoyed putting together the code to collect the Twitter social graph and then computing the TunkRank scores, but all the nuts and bolts of getting a web app up and running are tedious. Some of it is interesting. Merb isn’t so bad, though I feel like the documentation is shitty. There is an open source Merb book that is missing stuff in all the sections I needed the most. The API documentation isn’t bad, but isn’t easy to search for high level things that you would normally find in a tutorial. Nor should it be — it’s API documentation not a tutorial.

Fortunately, most things were easy enough that I could find a solution eventually. The whole deploying step is foreign to me, and I’m an Apache noob, so when it comes to balancing Mongrel instances I’m like wtf?  Fortunately, I found a few tutorials I was able to piece together.

So the final product is hosted on my 1.8 GHz dual core Dell laptop with 2 GB RAM running Ubuntu 8.10. If you check it out, hopefully it won’t overtax my pathetic server and bring the site down. My data is becoming a little stale so if your username isn’t found, please be patient. When a new person is encountered, I queue them for processing.

Final Thoughts

You can also follow @tunkrank on Twitter. I originally had that account acting as a bot that tweets scores when it encounters influential users. Also,  I was having it auto-follow anyone it grades, but upon reflection, it occurred to me these two things were just plain spammy. I chalk it up to a bad decision in the dead of night. Instead I will just have it follow anyone who follows it.  See my twitter philosophy for how the account will be managed.  I will post updates there on changes, fixes, and up/downtime.

The TunkRank score itself can grow quite large, especially for users with a high number of followers. I present percentiles as the measure, so everything falls in the interval [0,100]. That does not properly reflect that someone in the 100th percentile can be almost 1000 times more influential than someone in the 99th. I’m open to suggestions about how better to show this information. Neal Richter had a few good ideas, perhaps I’ll try one of those.  Still, though, I’m left feeling a little dissatisfied by all of the scoring mechanisms (my own included). As Neal pointed out, his ideas are starting points and I’d like to hear what other people would like to see before proceeding with a different scoring method.
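To make the problem concrete, here is a small sketch of a percentile rank over a handful of made-up raw scores. The numbers and the helper name `percentile_rank` are hypothetical and are not how the site computes its display value; the point is only to show how much of the raw spread a percentile hides.

```ruby
# Percentage of scores at or below the given score, in the range 0..100.
# Expects sorted_scores to be sorted in ascending order.
def percentile_rank(sorted_scores, score)
  # bsearch_index returns the index of the first element greater than
  # score, or nil when no element is greater.
  idx = sorted_scores.bsearch_index { |s| s > score } || sorted_scores.size
  100.0 * idx / sorted_scores.size
end

# Hypothetical raw scores with one very influential outlier.
raw = [0.5, 1.2, 3.0, 8.0, 9000.0].sort

top, runner_up = raw[-1], raw[-2]
puts percentile_rank(raw, runner_up)
puts percentile_rank(raw, top)
puts top / runner_up
```

The two percentile values sit a fixed step apart no matter how large the outlier is, while the raw ratio between the same two users is more than a thousand.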

Let me know what you think.

@MarsPhoenix is a Twitter success story.  It’s also a NASA success story.  Oh and also a scientific success for all it has done on Mars.  As six months of night approached, the Phoenix probe was slowly shutting down systems to finish analyses.  A couple of days ago, a dust storm diminished the daytime charging cycle enough that it caused the lander to go into hibernation.  NASA is going to try to revive it this weekend, but the prospects are grim.  Even more grim are the chances that the probe will awake come spring.  Temperatures at the Martian poles go so low in the winter, they fall below the minimum tolerance for electrical circuits.

But back to the Twitter success story.  As of right now, @MarsPhoenix has 37,284 followers.  That makes it one of the most followed users on Twitter.  For the past few months, NASA has been posting updates posing as the probe.  The updates take the form of first-person snippets of information and answers to questions from users.  Overall, it has been great PR, keeping people up-to-date on space exploration in a completely new way.  We can’t exactly have a live feed from Mars, but by personifying the probe and getting people involved, NASA has really done a lot for improving public involvement in the mission.

NASA has expanded their twittering to a whole host of other missions.  Most notable (to me) amongst them are the Cassini probe (which is orbiting Saturn),  the Lunar Reconnaissance Orbiter, and the Spirit and Opportunity rovers.  So if you twitter, they might be worth some of your time.

@MarsPhoenix posted the following earlier today:

I should stay well-preserved in this cold. I’ll be humankind’s monument here for centuries, eons, until future explorers come for me ;-)

In honor of its imminent passing, Wired is running a contest to find the best epitaph for Phoenix.  My current favorite is:  “Every robotic lander dies. Not every robotic lander truly lives.”  I’m getting a little choked up..

After hearing about it for weeks, I caved and decided to check out friendfeed last night [and again, ht @dpn]. In previous posts I mentioned something I like to call the information diaspora. This is the phenomenon created by posting all sorts of personal information about your likes, dislikes, thoughts, opinions, etc all over the web and your subsequent loss of that information because it can’t be managed. I can see friendfeed coming in handy for removing some of this problem. You can attach a number of different social networking sites, flickr, youtube, etc all to your friendfeed account. Whenever you post something new in one of these sites, that information will be updated on friendfeed for all of your friends (and yourself) to be able to view. It’s not the perfect solution, but it is a very big step in the right direction.

Check it out. As usual, my username there is ealdent and feel free to friend me.

Plurk-or-Tweet

Posted: 1 June 2008 in Uncategorized

Is it Hallowe’en already? A fellow NLP blogger (and twitterer) pointed me to Plurk just a few minutes ago. I have been messing with Twitter’s API over the past couple of days, which hasn’t been as easy as you’d think since they are suffering from massive growing pains. Fetching the public timeline takes between 5 and 30 seconds. However, they just got like $15 million in funding, so maybe they’ll be able to address the issue. The even bigger question is whether they can turn this free advertising service (which is what it is partially becoming) into a revenue stream.

Plurk is basically Twitter with a makeover and some extra social features thrown in. It still has the 140 character status update style interface, but includes a function selection for each plurk (what they call qualifiers): you can say, think, ask, wish, etc. You can also add smileys. Rather than appearing as a series of boxes scrolling down the screen, your plurks appear as floating boxes on a side-scrolling timeline. Plurks of friends also appear on this timeline and the result is a more graphical and pleasing (to me) interface. You can reply directly to other plurks in the boxes and conversations are tracked very nicely. This is far superior to twitter, which requires you to visit the other person’s timeline and wade through their tweets to find previous tweets in a thread. With Twitter being slower than a drunken monkey with three broken legs, that’s even harder.

Preview of Plurk

As my esteemed colleague pointed out, however, scaling is an issue for any service like this. Ultimately, you are bound by how fast you can access the database. If Plurk becomes as popular as Twitter (and I have every reason to believe it won’t), it will also become bogged down. Also, Plurk is just getting started and has no discernible API (unless I’m just missing it). Twitter already has quite a few third party apps.

I must say, though, I am sorely tempted to abandon Twitter in favor of Plurk just for the fact that Plurk is accessible. The massive lag of Twitter is getting to me. Of course, if no one is there to listen to my ramblings, what’s the point?

Science fiction author Arthur C. Clarke died yesterday.  He touched many lives through his writing and his ideas had an impact on me at an early age with short stories like “The Nine Billion Names of God” and movies based on his books like 2010 (which I saw in the theater) and later 2001 (which I saw as a young man).   His novel Rendezvous with Rama is being made into a movie and IMDB is quoting 2009 as the release date.  I thought it was interesting to find out he had been living in Sri Lanka for some time.

I visited my family in Ohio this past weekend and my uncle made a few interesting points.  He’s an old-school spring engineer, meaning he learned coming up through the trade rather than by going to school, and he supervises a number of employees at a relatively small spring company.  My grandfather used to own a spring company called, shockingly enough, Adams & Sons Spring Co.  That was later bought out and a number of the employees were moved to a different plant, including my dad and uncle.  So anyhow, my uncle was telling me a story, which I won’t go into, but the heart of it is that you should not wait for people to hand you “what you deserve.”  If you are a leader, regardless of your job title, then lead.  If you see someone who needs help, don’t wait for them to ask you.  Help.  Show that you have the initiative.  That’s probably fairly obvious, I mean we’ve all heard it before, but it came at a particularly important time for me.

I’ve been on twitter for a while now, though I don’t update it super-regularly like some people.  It’s fun and I hope more of my friends start using it, but I’ve noticed an interesting trend.  Just about anything is open to potential spam.  Friendster is sick with it.  MySpace is abominable.  LinkedIn seems fairly immune and I’ve gotten very few spam friend requests from Facebook.  Twitter has so far been very good about it, but there is a new trend that I’ve found interesting.  You can follow people and people can follow you on twitter.  So your status updates are public and potentially seen by thousands of people.  How do you increase the number of people who follow you?  Follow them, of course!  I’m having random people follow me left and right.  It only helps me, since I don’t follow them back, but it’s interesting to note.

Netflix Friends

Posted: 26 January 2008 in Uncategorized

Become my friend on Netflix.  I think it helps that you are actually already on Netflix.  :P

The Roman occupation of Judea (Israel) during the first century AD was disrupted in 66 AD when the Jewish people revolted. Rome, being a kick-ass military power, put down this rebellion, sacking Jerusalem in 70 AD. However, they couldn’t let the Jews get away with this attempt at self-rule, which might encourage other provinces to do the same. The new, crushing occupation and settlement of Judea led to the beginning of another diaspora of the Jewish people (the Jews had been scattered before, read your Old Testament).

I’ve talked about my idea of the new information diaspora a couple times before. We fill up all these different social networking sites and online services with personal information about our hobbies, preferences, friends, etc. This information is separated by incompatibility between platforms. OpenSocial is a move towards removing these boundaries, but so far it hasn’t caught fire.

In Facebook’s terms of service, you are not allowed to scrape Facebook for content. They don’t want you to gather information about your social graph, since that would potentially undermine their service. Ergo, you can import information into Facebook, but can’t export it out. Mark Zuckerberg, the founder of Facebook (though whether it was really his idea or software is disputed), seems to be shaping up to be quite a tyrant in this realm. It’s almost daily that some news about his bungling comes over the blagoblag.

The latest fiasco surrounds Robert Scoble, one of the better tech writers out there (in my opinion). He was using Plaxo Pulse, a service that attempts to solve a small part of the information diaspora problem by consolidating your friends’ activities on different sites. Facebook, however, put down this rebellion by disabling Scoble’s account. Robert’s crime? Trying to get the names, email addresses, and birthdays of the 1800 friends he has on both Facebook and Plaxo.

The Empire never ended.