Paul Payack of the Global Language Monitor is claiming that the millionth English word is coming soon.  He says a new English word is coined every 98 minutes, so the one-million mark will arrive about 15 days hence.  The CBS article that tipped me off is pretty amusing in the quotes it selected from linguists, who resoundingly cried “bullshit.”  But the best quote came from Payack himself:

We believe words can be counted if you define them in the right way. You can count them like anything else in science. You can count how many atoms there are in the ocean.

Let’s think about counting the atoms in the ocean for a moment. What about where rivers flow into the ocean? Where is the boundary line? Salt and fresh water are mingling quite a bit and finding the exact boundary is pretty much impossible. If we draw an arbitrary line, surely we will get too much in one place and too little in another. Also, what about rain and evaporation? Counting the atoms would require an instantaneous snapshot of the entire ocean at the atomic level. It can’t be done.

You run into similar problems counting words.  Compound words blend into single words and words leave the language as well as enter it.  How do you detect this?  You’d need a snapshot of the entire English language as it is spoken, typed, and read all around the world.  What is a word in one dialect isn’t necessarily a word in another dialect.  Where do you draw the line?
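The two numbers in Payack’s claim also imply a startling precision. Here is a minimal sketch of the arithmetic, taking the 98-minute rate and the 15-day horizon at face value. The function names are mine and purely illustrative. Fifteen days is 21,600 minutes, and at one word per 98 minutes that comes to only about 220 words. So the claim requires knowing the running total to within a couple of hundred words out of a million.

```python
# Back-of-the-envelope check of the "one word every 98 minutes" claim.
# The rate and the 15-day horizon come from the claim; the function
# names are illustrative only.

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WORD = 98  # claimed coinage rate
DAYS_REMAINING = 15    # "about 15 days hence"


def words_in_window(days: float, minutes_per_word: float = MINUTES_PER_WORD) -> float:
    """New words implied by a constant coinage rate over a span of days."""
    return days * MINUTES_PER_DAY / minutes_per_word


def days_until(words_needed: float, minutes_per_word: float = MINUTES_PER_WORD) -> float:
    """Days until the count grows by words_needed at a constant rate."""
    return words_needed * minutes_per_word / MINUTES_PER_DAY


print(round(words_in_window(DAYS_REMAINING)))  # 220 words in 15 days
print(round(words_in_window(365)))             # 5363 words per year
```

At that rate the count grows by only about 5,363 words a year. The whole prediction therefore hinges on where the running total stood on the day of the announcement.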

I just completed my first guest post over at mind x the + gap, where I talked about the shared history of language and commerce, as well as some thoughts on how that relationship will continue into the future. Since Mil Joshi’s blog focuses more on psychology and economics, the following is a slight adaptation more in line with my normal content.

Commerce is a human convention deeply entwined with language. Economic motives were among the many reasons ancient (and modern) empires conquered other lands, spreading their languages beyond their native range. Traders traveled to distant lands and encountered speakers of exotic languages. And where two languages meet, words begin to pass back and forth. Where bilingual speakers were few or nonexistent, pidgin languages developed. Pidgins are languages with simplified grammar and vocabulary that are never spoken as a first language; they arise as a means of communication between speakers of different languages for the purpose of trade. When a pidgin is spoken widely enough that children in the community grow up learning it as their first language, it becomes a creole. Creoles have many fascinating characteristics, but the point here is that commerce is a driving factor in their creation. When a conquering empire brings its own language, that language either supplants the native language or heavily influences it. Pidgins, on the other hand, develop because speakers are motivated to communicate in order to trade.

Groups of speakers who remain in constant contact tend to speak the same dialect of a language. When a group breaks off and becomes isolated (contact with the original group is infrequent or limited), the two dialects begin to diverge. Mass communication is changing this landscape, allowing ever-larger groups of people to remain in constant contact. As a result, minority languages are spoken less and less in favor of popular languages. This process is called linguistic homogenization. If we follow the slippery slope to the extreme, eventually a single language will be spoken by all people. That isn’t likely to happen in our lifetimes, and not just because it would require nearly all native speakers of every other language to die out. A far more likely scenario is that a handful of commerce languages will be spoken by the vast majority of people. Commerce languages are the popular languages people use to do business (English, Mandarin, etc.).

There are many factors driving linguistic homogenization, and commerce is certainly one of them. In the modern world of the internet and mass media, attention is the scarce resource people compete for. If you want to capture the attention of others, you need to maximize your reach, and doing so typically means choosing a language of commerce. Minority languages present a barrier to the widest possible dissemination of information (except when the only intended audience is speakers of that language). The attention economy promotes linguistic homogenization.

Machine translation services, such as Google Translate, potentially have the power to change this. As the quality of these services improves, it becomes less and less necessary to publish exclusively in commerce languages. Linguistic homogenization may not be the inexorable force it appears to be today. Of course, the output of machine translation can be pretty abysmal. Will the quality of machine translation improve fast enough, and will the business case for it be strong enough, to turn the tide of linguistic homogenization?  Those betting on machine translation services surely hope so. But there is a catch. For machine translation to truly counteract linguistic homogenization, it has to be freely available (or ridiculously cheap), yet these systems are difficult to build and require great computational resources. The outcome will almost certainly be a matter of economics as well as science.

While the future progress of commerce and language may be uncertain, what is certain is that they will continue to heavily influence each other. And there’s nothing new about that.

This is a subject much larger than the treatment I am about to give it.  Linguistic homogenization occurs in modern states where regional dialects are marginalized and a standard dialect is advanced as the primary method for acceptable public communication.  The powerful favoring a single dialect is nothing new, but now more than ever, states are able to impose this on the wider populace.  European countries encourage one or two primary languages to be taught in school and used in public.  America does something similar with Standard American English.  Speaking a non-standard dialect is often seen as a barrier to employment and movement in higher social circles.  Basically, the snobs keep you down if you don’t talk like they do.

I was reading on Language Log earlier about the Uniformitarian Principle.  Uniformitarianism is simply the idea that things are now as they have always been, so we can learn how things were by learning how they are now.  Language Log describes how modern Europe no longer holds the key to language in prehistoric Europe thanks to the ability of modern states to impose linguistic homogenization.  Think about that for a second.  Modern states, presumably democratic, are so powerful they even tell you how to talk.  Perhaps even how you think.  Is that a paranoid leap?  Am I overstating it?  Even absolute dictators of past centuries didn’t have that kind of power.

But it’s not like one single person is doing this.  Instead they are doing it.  The ineffable they.  But if they are telling us how to think, why do we listen?  We can’t help it, we’re too young when it happens, and then we become them.

Absolute dictators of the past could not do this for many reasons.  They didn’t have the infrastructure to educate the masses, nor did they have popular media to transmit one dialect into every home on a daily basis.  A population too large for all of its parts to remain in constant contact will begin to diverge dialectally.  But educating the masses would have been frowned upon anyway, since giving people too many ideas tends to make them question things like a single all-powerful leader calling all the shots.  So now that we are educated enough to know all-powerful dictators are bad news, we have replaced them with power structures that are more complicated and inscrutable.

A recent post by Daniel Lemire posing a simple mathematical puzzle revealed in stark relief the bars of my mental prison.  So what are the bars of this bigger prison we cannot see like?  Philip K. Dick called it the Black Iron Prison.  I’ve always found that concept appealing.

The North American Computational Linguistics Olympiad (NACLO) is an annual competition open to US high school students that introduces kids to computational linguistics (CL) at a much younger age than people normally hear about it. I didn’t hear about CL until I was three years into my undergrad program. The instant I did hear about it, I knew I wanted to do it. Most people I talk to about it look at me like I’ve just uttered a phrase in Klingon. I suspect most people don’t hear about it at all, or if they do, it’s sometime during their undergrad program rather than at the beginning, when they might be better able to plan their educational path. Also, CL is pretty much a graduate field and rarely taught before then. Granted, much of the math involved is beyond what’s taught to high school students and early undergrads, but the linguistics is not. And thinking about linguistics computationally is not. So NACLO is doing an extremely valuable service, which I support completely. And not just because one of my professors is one of the General Chairs of its organizing committee. She can no longer affect my grade and I have no need to suck up, so this is genuine. How’s that for full disclosure?

One of my Google Alerts turned up a post on a spam blog, which I tracked down to this original post about a lot of young kids doing some great things in science. The post includes an interview with last year’s winner, Adam Hesterberg. He said, “I’d never studied linguistics, and ‘computation’ sounded like boring calculation.” That reminded me that computation might mean something different to most people than it does to scientists. I’m no corpus linguist, so I’m not gonna try to prove it right here. What I suspect is that computation has more of a “hard work” connotation for people outside of science: the “plugging and chugging” sense. Inside science, it’s tacked onto the front of another field’s name to mean anything in that field that can be computed. Computational linguistics deals with the computable aspects of linguistic theories. A very quick search on Wikipedia finds at least a dozen other computational fields.

Is it a good idea to use this name when approaching high school students? What about language technologies? Well, the competition isn’t about language technologies, it’s about critical problem solving in a linguistics setting. And trying to fit that into a competition name isn’t going to work, either. North American Critical Problem Solving about Linguistics Olympiad (NACPSLO)? It makes me think of narcolepsy.

So my proposal is North American Logic and Language Olympiad (NALLO). It’s easy to say (rhymes with hallow) and accurately describes the subject matter. Plus, I think it has broader appeal. A lot of kids are interested in logic, language, or both. It shakes free of the negative connotation of computation and draws kids where they can be introduced to it a little more easily. The downside is that it doesn’t mention linguistics directly, so that might trouble some people who are a little more traditional about their outreach.

What do you think?

Apparently, I run a rather clean shop. Whodathunkit. And probably most of the cussing comes from my posts on brainfuck.


It’s a morning of fun new words! First I hear greenwashing on the Today Show, which Donna likes to watch while she eats brekkie. Then, Language Log delights me with nanoblahblah, henchgoon, and celebufreak. Erin McKean, the Dictionary Evangelist, twitters words of the day, so I also got a nice infusion when I looked through her Twitter feed for the past week or so. A few of her finds that I particularly like: paracosm, yodelumpet, and anthroponymy. And now for the definitions!

  • anthroponymy – the study of the names of human beings [emckean@twitter]
  • celebufreak – a freak with fame (e.g. Kim Kardashian) [Wordlustitude]
  • greenwashing – marketing a product as green when it’s really not [Today show]
  • henchgoon – alternate term for administrative assistant or “assistant of doom” [Wordlustitude]
  • nanoblahblah – very, very tiny nonsense (nanotechnobabble) [Wordlustitude]
  • paracosm – a private imaginary world, esp. made by children to escape harsh circumstances (think Pan’s Labyrinth) [emckean@twitter]
  • yodelumpet – a singing style that combines yodeling and Louis-Armstrong-style trumpet-like sounds [emckean@twitter]

Please note that the Twitter links are permanent, but Twitter isn’t always able to serve up the page. So if at first you get a bizarre message with birds, try again. This also led me to rediscover the most excellent Wordlustitude site. I had seen it a while ago but for whatever reason didn’t subscribe. That has been remedied, and if you like neologisms, I recommend you do the same.

There is nothing unusual about verbing nouns in English.  Despite the fact that your English teacher may have told you not to do this, it is common practice, especially on the intarwebs.  Verbing brand names to mean the primary action performed by the chief product of that brand is less common, but we all know about “googling.”  Just sitting here, trying to drink my morning coffee, I couldn’t come up with another example.

But what got me thinking about this is another example used in today’s User Friendly.  One character says,

“You’re gonna ebay it to goths, aren’t you.” [emphasis mine]

I had never heard the brand name ebay used as a verb meaning to sell something on ebay (the primary function of their chief product).   It is not uncommon, though.  Searching the Google for +”to ebay it”, I found that at least 10% of the top few pages of results used this construction (versus “to ebay.  It …”).  From that I estimate there are about 19,000 uses of ebay as a verb in this context, and no doubt many others in variant forms (e.g. “I ebayed my watch”).
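The 19,000 figure is a proportional extrapolation. You take the share of inspected results that use ebay as a verb and scale it up to the search engine’s reported hit count. Here is a minimal sketch, assuming that was the method. The total hit count isn’t stated here, so the 190,000 below is simply what a 10% share and a 19,000 estimate imply. The 50-result sample is hypothetical.

```python
# Sketch of a proportional estimate from a sampled search.
# Only the 10% share and the 19,000 estimate come from the text;
# the sample size and total hit count are hypothetical.


def estimate_uses(total_hits: int, sample_matches: int, sample_size: int) -> float:
    """Scale the matching share of a sample of results up to the full hit count."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total_hits * sample_matches / sample_size


def implied_total(estimate: float, share: float) -> float:
    """Back out the total hit count that an estimate and a sample share imply."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return estimate / share


# Hypothetical sample: 5 of 50 inspected results use "ebay" as a verb (10%).
print(estimate_uses(190_000, 5, 50))  # 19000.0
print(implied_total(19_000, 0.10))    # roughly 190,000
```

Keep in mind that search engine hit counts are themselves rough estimates. A figure like this indicates an order of magnitude, not a precise tally.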

Another example that just occurred to me, but which is pretty artificial, is to twitter, meaning to post something on Twitter.  I say this is artificial because Twitter openly encourages and suggests this terminology.  It was not an emergent construct, but an imposed one.  It has been adopted by the overwhelming majority of users, though.  [follow me on twitter]

So here is my question:  does this only work for Internet companies?  I’m probably forgetting some obvious brick-and-mortar company whose brand we have verbed, so please tell me if I have.  Or are Internet companies especially conducive to this construction because so many of them start off with a single service and become known by that service?  Google is search, ebay is selling crap through auctions, twitter is … twittering.   If this only works for Internet companies, why did we start doing it in the first place?

And I just came up with a brick-and-mortar example:  hoover.  You can hoover down a plate of food, meaning to suck it up like a champ.  But my classification still holds: sucking things up is the primary function of their chief product (or at least the main product people know them by).  Marketing people have already taken this to heart, I’m sure.  You need an easy name that sounds like English.  Just like with scientific terminology, no one wants to Dinklefwat their dishes.