Posts Tagged ‘blagoblag’

Mendicant Bug Podcast

Posted: 12 July 2009 in Uncategorized

Thanks to Odiogo (via @johndcook), this blog now has a podcast powered by speech synthesis.  I hadn't heard any decent speech synthesis for open-domain text (maybe I'm behind the times here), so I was pretty impressed.  John had a post with a quote from The Agony and the Ecstasy, and Odiogo got it pretty close to right in terms of pronunciation and intonation.  Hopefully it will work as well for my blog.  Let me know if you give it a listen.

Subscribe to the podcast


If you are in search of a blog that will put an end to all of your earthly troubles, look no further than the Noisy Channel.  Aside from being a font of knowledge that will turn you into an AI from a futurist’s dream, there have been reports that regular TNC readers have been cured of certain debilitating illnesses such as halitosis, trichodaganomania, and the often fatal googlemania.

What are you waiting for?

P.S.  Daniel, email me for the address of where you can send the check.

There has been much ballyhoo in the blogosphere touting Google’s so-called foray into semantic search.  The blog post announcing the new feature doesn’t even mention the word semantics, but it does say it looks at associations and concepts related to your query.  I see no mention of tuples or anything of the sort, and the suggested queries are the kind of thing I would expect to come out of a background closer to document/query classification than semantic analysis.


Related search results for much ado about nothing

And the results are pretty meh.  Except for The Taming of the Shrew, those results are no-brainers: query-completion-quality results.  Of course, you can’t judge the whole system by one isolated example.

When PC World and a host of other pop tech media zines started toasting Google's entrance into the semantic arena, I was excited to see some cool stuff.  Imagine my disappointment when I was underwhelmed not only by the quality of the results but also by the lack of novelty.  How long has that feature been there?  Seems like I’ve seen it for ages.  Maybe it got a technological face-lift (I guess that would be a face-lift on the inside), but it looks about the same as I remember it.  Plus, its placement at the bottom of the results page relegates it to search engine hell.

In summary:  boring.  My complaints are first and foremost with those elements of the blagoblag who over-hyped this.  Secondly, I am complaining to Google for not being better.  I am feeling demanding today.

Daniel’s post on it is worth reading.

It is bad journalism when an old news story that has already been debunked continues to be rehashed!  How sloppy!  Shame on you, Houston Chronicle!

Back around 2000, when Palem began thinking about the future of computer chip technology, power consumption wasn’t a big consideration. Only speed mattered.

But today, the energy consumed by information technology – a January news story likened the energy used in just two Google searches to boiling a kettle of tea – has become a major consideration.

Google debunked the results quite quickly after that article ran. Why is it acceptable to cite stories without checking whether they are accurate? Isn’t this what we pay journalists for? I guess it’s too hard to check the facts, so instead we can just say that a news story reported it without making any claims about its correctness. Isn’t that what we have bloggers for?

Since I started blogging almost a year and a half ago, I have been following many blogs. I managed to find some blogs dealing with computational linguistics and natural language processing, but they were few and far between. Since then, I’ve discovered quite a few NLP people that have entered the blagoblag. Here is a non-exhaustive list of the many that I follow.

Many of these bloggers post sporadically and, even then, only occasionally about CL/NLP. I’ve tried to organize the list into those who post exclusively on CL/NLP (at least as far as I have followed them) and those who post on it only sporadically. I would fall into the latter group, since I frequently blog about my dogs, regular computer science-y and programming stuff, and other rants. P.S. I group Information Retrieval in with CL/NLP here, but only the blogs I actually read. I’m sure there are a bazillion I don’t.

If I’ve missed any, please let me know. I’m always on the lookout. Ditto if you think I’ve miscategorized someone.  I’ve excluded a few that haven’t posted in a while.

Jekyll and Code

Posted: 8 January 2009 in Uncategorized

Tom Preston-Werner, aka mojombo, rocks.  When GitHub announced GitHub Pages recently, they pointed to a new blog engine, Jekyll.  Jekyll generates the blog as a set of static pages — no database reads, no PHP, just fast HTML.  I was instantly drawn to it, and since I’ve been itching to switch blog engines, I damn near moved this blog.  It would be hosted on GitHub, for free.  And it would be backed up using my favorite version control system.  I would have complete access to all of my content.  If WordPress went belly up, I would lose all of my content.  That bothers me.

Jekyll is still in its infancy.  But for two things, I would switch right now.  First, support for tags is incomplete, so the tag pages on my blog would no longer be supported under Jekyll.  That would play hell with my Google traffic.  I’m willing to make that sacrifice, since most of that traffic comes from people who don’t care about the main topics I’m interested in.  Second, and this is the killer, Jekyll does not support comments.  Yet.  The good news is that it can be forked, and someone may implement comments.  I hope so, but the static nature of Jekyll means handling comments is not very straightforward.  I can imagine how it might be done, so we’ll see.  I suppose I could do it myself, but my plate is so full right now I’m having a hard time getting what I need to get done done.

So what I’m doing instead, for now, is hosting my code there.  Jekyll has code highlighting built-in using Liquid.  Handy!  I put up the source for my post on Bandwidth simulation.  I’ll be adding more soon, which I’ll make note of, if for some reason you’re actually interested.

Top posts of 2008

Posted: 31 December 2008 in Uncategorized

Looking back over 2008, there have been a lot of changes in my life. Many of those are reflected in my blog, but few are reflected in the posts that have gotten the most traffic. But for the hell of it, here are the top posts anyway.

Post: hits in 2008

Old English Translator: 10,589
Christmas Tree 2007: 4,393
Steampunk Death Star: 1,362
Salad Fingers 8: 1,108
10 Reasons to Use Git for Research: 1,032
Merge sort fun: 777
The Noob’s Guide to Parsing: 774
Java Properties: 759
Ambigrams: 719
Substitution Ciphers: 680

Of all of those posts, the best one is hands down 10 Reasons to Use Git for Research. After that, the Noob’s Guide to Parsing. Some of the posts with the most hits are just link-sharing, where I saw something cool (Salad Fingers, Steampunk Death Star, Ambigrams) and other people happened to find my link first.  One definite change on this blog was a decrease in how often I post.  Around the end of last year, I was posting close to 2 items per day.  Now it has stretched out to about 2 items per week.  Maybe I’ll reflect more on that later.

I’ll leave you with these thoughts.

ReadWriteWeb has a post on Forrester Research’s study about consumer trust of information sources.  It puts corporate and personal blogs at the very bottom (with 16% and 18% trust, respectively), with personal email from a friend coming in at number one (with 77% trust).  Forrester suggests that corporate blogs close up shop unless they are doing a good job of generating goodwill and/or leads.

This study bothers me on many levels.  As Michael Bernstein points out in the comments:

“Trust” is a 4 or a 5 on a 5 point scale, that is, anything above neutral. This means that lots of people could slightly trust a source and it would show up above something which a smaller number of people trust quite a bit and others are neutral on.
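To make the comment's point concrete, here is a small sketch comparing two sources by the "4 or 5 counts as trust" measure (often called a top-two-box score) and by a plain mean rating. The ratings are entirely made up for illustration and have nothing to do with Forrester's actual data; the function and variable names are hypothetical.

```python
# Illustrative only: hypothetical 5-point ratings, NOT Forrester's data.

def top_two_box(ratings):
    """Share of respondents answering 4 or 5, i.e. 'trust' in the study's sense."""
    return sum(1 for r in ratings if r >= 4) / len(ratings)

def mean(ratings):
    """Average rating on the 1-5 scale."""
    return sum(ratings) / len(ratings)

# Source A: lots of people slightly trust it (4), the rest are neutral (3).
source_a = [4] * 60 + [3] * 40
# Source B: a smaller group trusts it quite a bit (5), the rest are neutral (3).
source_b = [5] * 40 + [3] * 60

for name, ratings in [("A", source_a), ("B", source_b)]:
    print(f"{name}: top-two-box={top_two_box(ratings):.0%}, mean={mean(ratings):.2f}")
# A: top-two-box=60%, mean=3.60
# B: top-two-box=40%, mean=3.80
```

Under the "trust" cutoff, A looks more trusted than B, even though B has the higher average rating and is the only one anyone trusts strongly.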

Also, the study compares information sources like email from friends and social networking profiles of friends to corporate and personal blogs. I ranted about this a bit in a comment on The Noisy Channel, which I’ll just reproduce here:

Comparing “personal blog” or some random “corporate blog” to “personal email sent from a friend” is pretty much like comparing “advice from gin-soaked hobo” to “what your mama always said.” The fact that Forrester can get away with presenting something like this and suggesting businesses act on it to shut down their blogs bothers me. It seems to me that 16-18% trustworthiness is not bad when you consider that much of the time you do a Google search for some product you hit a splog. That’s probably the only experience 80% of people have with blogs. Of course, that’s wild speculation, but this straw man study has gotten under my skin. :P And I do acknowledge that there is a huge amount of untrustworthy information in blogs, but I’m not sure that it’s much different from other user-generated content.

I agree that corporate blogs that are just reproductions of press releases (as Daniel Tunkelang at the Noisy Channel points out) are garbage. That is the wrong way to run a corporate blog. Google has a very good approach. They promote the work they are doing by getting employees to blog about their personal projects (at least in the Google blogs I read; there are surely exceptions). It comes across as real and beneficial. The value is that they keep you up to date on what they are doing with actual content. When that changes into shameless promotion and undisguised attempts to drive sales, the blog is going to suck. GitHub’s blog is another good example of a corporate blog done right.

Moving on, Daniel Tunkelang again offers some useful insight:

I think the interesting question for companies is not whether they should publish corporate blogs, but rather whether they should encourage their employees to publish personal blogs that relate to the work the company does. … I think that companies are often too conservative, and incur an enormous opportunity cost in the name of protecting trade secrets. Letting employees blog (and, more generally, publish) not only provides the companies with free marketing, but also provides employees with an avenue for personal development.

My cynicism prevents me from getting my hopes up here, but that would be nice.

100,000 hits

Posted: 20 November 2008 in Uncategorized

Sometime earlier today, this blog passed 100,000 hits!

Blogging Platform

Posted: 7 November 2008 in Uncategorized

I’m considering switching to a different blogging platform.  The inability to use JavaScript in the hosted version of WordPress is annoying.  What do you think?

Update: I have decided to stick with WordPress.  After further evaluation, it seems to be the best choice.  Even though I hate not being able to do some JavaScript stuff, ultimately it just seems to be a better platform.