Paul Payack of the Global Language Monitor is claiming the 1 millionth English word is coming soon.  He says a new English word is coined every 98 minutes, so the 1 million mark will arrive about 15 days hence.  The CBS article that tipped me off to this is pretty amusing in the quotes it selected from linguists, which resoundingly cried “bullshit.”  But the best quote came from Payack himself:

We believe words can be counted if you define them in the right way. You can count them like anything else in science. You can count how many atoms there are in the ocean.

Let’s think about counting the atoms in the ocean for a moment. What about where rivers flow into the ocean? Where is the boundary line? Salt and fresh water are mingling quite a bit and finding the exact boundary is pretty much impossible. If we draw an arbitrary line, surely we will get too much in one place and too little in another. Also, what about rain and evaporation? Counting the atoms would require an instantaneous snapshot of the entire ocean at the atomic level. It can’t be done.

You run into similar problems counting words.  Compound words blend into single words and words leave the language as well as enter it.  How do you detect this?  You’d need a snapshot of the entire English language as it is spoken, typed, and read all around the world.  What is a word in one dialect isn’t necessarily a word in another dialect.  Where do you draw the line?

This is a subject much larger than the treatment I am about to give it.  Linguistic homogenization occurs in modern states where regional dialects are marginalized and a standard dialect is advanced as the primary method for acceptable public communication.  The powerful favoring a single dialect is nothing new, but now more than ever, states are able to impose this on the wider populace.  European countries encourage one or two primary languages to be taught in school and used in public.  America does something similar with Standard American English.  Speaking a non-standard dialect is often seen as a barrier to employment and movement in higher social circles.  Basically, the snobs keep you down if you don’t talk like they do.

I was reading on Language Log earlier about the Uniformitarian Principle.  Uniformitarianism is simply the idea that things are now as they have always been, so we can learn how things were by learning how they are now.  Language Log describes how modern Europe no longer holds the key to language in prehistoric Europe thanks to the ability of modern states to impose linguistic homogenization.  Think about that for a second.  Modern states, presumably democratic, are so powerful they even tell you how to talk.  Perhaps even how you think.  Is that a paranoid leap?  Am I overstating it?  Even absolute dictators of past centuries didn’t have that kind of power.

But it’s not like one single person is doing this.  Instead they are doing it.  The ineffable they.  But if they are telling us how to think, why do we listen?  We can’t help it, we’re too young when it happens, and then we become them.

Absolute dictators of the past could not do this for many reasons.  They didn’t have the infrastructure to educate the masses, nor did they have popular media to transmit one dialect into every home on a daily basis.  A population too large for all of its parts to remain in constant contact will begin to diverge dialectally.  But educating the masses would have been looked down upon anyway since giving people too many ideas tends to make them question things like a single all-powerful leader calling all the shots.  So now that we are educated enough to know all-powerful dictators are bad news, we have replaced them with power structures more complicated and inscrutable.

A recent post by Daniel Lemire posing a simple mathematical puzzle threw the bars of my own mental prison into stark relief.  So what are the bars like in this bigger prison, the one we cannot see?  Philip K. Dick called it the Black Iron Prison.  I’ve always found that concept appealing.

I was asked recently about the motivation for Abney’s DP (determiner phrase) hypothesis. That is, that determiners are not part of English noun phrases but head up their own phrases of which NPs are complements. I couldn’t remember the justification I was given in my Syntax I class, so I went back to the textbook (Syntax: A Generative Introduction by Andrew Carnie). I found the following interesting excerpt:

“… for lack of a better place to put them, we put determiners … in the specifiers of NPs. This, however, violates one of the basic principles underlying X-bar theory: All non-head material must be phrasal. Notice that this principle is a theoretical rather than an empirical requirement (i.e., it is motivated by the elegance of the theory and not by any data), but it is a nice idea from a mathematical point of view, and it would be good if we could show that it has some empirical basis.”

This clashes a bit with my empirical sensibilities. It represents very much the rationalist point of view in linguistics, that we can probe our own understanding of language by judging what we perceive to be grammatical or ungrammatical. The empiricist view would look at it from another angle: does it appear in data? So the theoretical view might be “nice” but if it is not supported by the data, it is crap.

Treebanks don’t use DPs (at least none that I’ve seen), so automatic parsers typically have no concept of them. I wonder if they would add any value?  I’m guessing they would just run into sparsity issues since another set of tags has to be estimated.  But who knows, the extra structure might be helpful in complex situations.
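To make the difference concrete, here is a rough sketch of the phrase “the dog” bracketed both ways. The nested-tuple representation and the helper names `bracket` and `labels` are my own, just for illustration. The first tree uses Penn-Treebank-style tags. The second is a simplified DP analysis with the bar levels left out, so it is not a full X-bar tree.

```python
# Trees as nested tuples: (label, child, child, ...); a leaf is a plain string.

# Treebank-style analysis: the determiner sits inside the NP.
np_analysis = ("NP", ("DT", "the"), ("NN", "dog"))

# DP-style analysis (simplified, no bar levels):
# D heads the phrase and the NP is its complement.
dp_analysis = ("DP", ("D", "the"), ("NP", ("N", "dog")))


def bracket(tree):
    """Render a nested-tuple tree in labeled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


def labels(tree):
    """Collect every node label in the tree, depth first."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    found = [label]
    for child in children:
        found.extend(labels(child))
    return found


print(bracket(np_analysis))  # (NP (DT the) (NN dog))
print(bracket(dp_analysis))  # (DP (D the) (NP (N dog)))
print(labels(dp_analysis))   # ['DP', 'D', 'NP', 'N']
```

Even for a two-word phrase the DP version carries one more node than the NP version, and those extra labels are the additional tags a parser would have to estimate.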


I was just reading a Wired article about the deaths of two AI researchers:  Chris McKinstry and Pushpinder Singh.  Both were working on strong AI (or at least, had the hope of it).  Both committed suicide and did it within a month of each other.  McKinstry claimed that his system, GAC, would be aware in a short time.  If GAC ever became aware, it has vanished into the cloud.  So all very interesting and I recommend the article, though not if you want a serious read about the topics they researched.  What it does present is an interesting narrative of two lives with eerie parallels.

What inspired this post is a minor quibble about a word that many English speakers have surely heard:  Wunderkind.  In German, it literally means “wonder child” and is often applied in English to a child prodigy or a young person whose star is on the rise.  Here is an excerpt from the Wired article:

Push, as everyone called him, had also taught himself to code — first on a VIC-20, then by making computer games for an Amiga and an Apple IIe. His father, Mahender, a topographer and mapmaker who had studied advanced mathematics, encouraged the wüenderkind. Singh was brilliant, ambitious, and strong-willed. In ninth grade, he had created his own sound digitizer and taught it to play a song he was supposed to be practicing for his piano lessons. “I don’t want to learn piano anymore, I want to learn this,” he said. [emphasis mine]

When a German vowel with an umlaut has to be written without the umlaut, as often happens in English text, it is rendered as the vowel + e.  So ü would be written in English as ue.  Wunderkind has no umlaut in German, so this would not be necessary.  Plus, you wouldn’t have to add the e anyway since they already included the umlaut.  Shoddy editorial work, but it made me lol.
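The vowel + e rule is simple enough to sketch in a few lines of Python. This is only an illustration: the table `UMLAUT_FALLBACK` and the function `transliterate` are names I made up, and I am assuming only the three umlauted vowels matter (nothing here handles ß or other characters).

```python
import unicodedata

# Conventional fallback spelling for German umlauts: vowel + e.
UMLAUT_FALLBACK = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def transliterate(word):
    """Replace each umlauted vowel with its vowel + e spelling."""
    # Normalize first so that "u" + combining diaeresis is treated as "ü".
    word = unicodedata.normalize("NFC", word)
    return "".join(UMLAUT_FALLBACK.get(ch, ch) for ch in word)


print(transliterate("über"))         # ueber
print(transliterate("Wunderkind"))   # Wunderkind
print(transliterate("wüenderkind"))  # wueenderkind
```

Wunderkind passes through untouched because there is nothing to replace. Wired’s spelling comes out with a doubled e, which shows that the umlaut and the e were both standing in for the same thing.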

It’s a morning of fun new words! First I hear greenwashing on the Today Show, which Donna likes to watch while she eats brekkie. Then, Language Log delights me with nanoblahblah, henchgoon, and celebufreak. Erin McKean, the Dictionary Evangelist, twitters words of the day so I also got a nice infusion when I examined her twitter feed for the past week or so. A few selections I particularly like that she found: paracosm, yodelumpet, and anthroponymy. And now for the definitions!

  • anthroponymy – the study of the names of human beings [emckean@twitter]
  • celebufreak – a freak with fame (e.g. Kim Kardashian) [Wordlustitude]
  • greenwashing – marketing a product as green when it’s really not [Today Show]
  • henchgoon – alternate term for administrative assistant or “assistant of doom” [Wordlustitude]
  • nanoblahblah – very, very tiny nonsense (nanotechnobabble) [Wordlustitude]
  • paracosm – a private imaginary world, esp. made by children to escape harsh circumstances (think Pan’s Labyrinth) [emckean@twitter]
  • yodelumpet – a singing style that combines yodeling and Louis-Armstrong-style trumpet-like sounds [emckean@twitter]

Please note that the twitter links are stable in terms of link permanence, but are unstable in twitter’s ability to serve up the page. So if at first you get a bizarre message with birds, try again. This has also led to the re-discovery of the most excellent Wordlustitude site. I had seen it a while ago but for whatever reason didn’t subscribe to it. This has been remedied, and if you like neologisms, I recommend you do the same.

There is nothing unusual about verbing nouns in English.  Despite the fact that your English teacher may have told you not to do this, it is common practice, especially on the intarwebs.  Verbing brand names to mean the primary action performed by the chief product of that brand is less common, but we all know about “googling.”  Just sitting here, trying to drink my morning coffee, I couldn’t come up with another example.

But what got me thinking about this is another example used in today’s User Friendly.  One character says,

“You’re gonna ebay it to goths, aren’t you.” [emphasis mine]

I had never heard the brand name ebay used in verb form, meaning to sell something on ebay (the primary function of their chief product).  It is not uncommon, though.  Searching the Google for +“to ebay it”, I found that at least 10% of the results on the top few pages were just this construction (versus “to ebay.  It …”).  From that, I estimate there are about 19,000 uses of ebay as a verb in this context, and no doubt many others in variations (e.g. “I ebayed my watch”).
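For the curious, the estimate above is just a sample rate scaled up to the full hit count. Here is a minimal sketch of that arithmetic. The function name `estimate_verb_uses` is made up, and the total of 190,000 hits is hypothetical: I didn’t record the raw count, so it is simply the figure that a 10% rate and roughly 19,000 uses imply.

```python
def estimate_verb_uses(total_hits, verb_fraction):
    """Scale a reported hit count by the share of sampled results
    that turned out to be genuine verb uses."""
    return round(total_hits * verb_fraction)


# Hypothetical total: back-calculated from the 10% rate and ~19,000 uses,
# not a number recorded from the actual search.
total_hits = 190_000
verb_fraction = 0.10  # "at least 10%" of the top few pages of results

print(estimate_verb_uses(total_hits, verb_fraction))  # 19000
```

Since the 10% is a floor taken from only the first few pages, and search engine hit counts are rough to begin with, this should be read as a ballpark figure and nothing more.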

Another example that just occurred to me, but which is pretty artificial, is to twitter, meaning to post something on Twitter.  I say this is artificial because Twitter openly encourages and suggests this terminology.  It was not an emergent construct, but an imposed one.  It has been adopted by the overwhelming majority of users, though.  [follow me on twitter]

So here is my question:  does this only work for Internet companies?  I’m probably forgetting some obvious brick-and-mortar company for which we have verbed their brand, so please tell me if I have.  Or is it that Internet companies are especially conducive to this construction because so many of them start off with only one service and become known by that service?  Google is search, ebay is selling crap through auctions, twitter is … twittering.  If this only works for Internet companies, why did we start doing it in the first place?

And I just came up with a brick-and-mortar example:  hoover.  You can hoover down a plate of food, meaning to suck something up like a champ.  But my classification still holds:  sucking things up is the primary function of their chief product (or at least the main product that people know them by).  Marketing people have already taken this to heart, I’m sure.  You need an easy name that sounds like English.  Just like with scientific terminology, no one wants to Dinklefwat their dishes.

A couple months ago, I wrote about Richard Hogg dying. He was a professor at the University of Manchester who edited the Cambridge History of the English Language and did a lot of work on Old English morphology. I had corresponded with him briefly a few months before he died about a lab project on computational morphology. I was making a morphological analyzer for Old English verbs. I’m actually still working on it and generalizing it to the rest of the language. Anyhow, as I said before, he was a nice and helpful guy and it was a shame to see him go.

Now, the International Society for the Linguistics of English (ISLE) has set up a scholarship in his honor. Early career scholars who are members of ISLE (membership can be applied for at the time of submission) are eligible. Early career means you either haven’t gotten your PhD yet or got it within the past two years. Master’s and undergraduate applicants are acceptable, but the expected entrant is a PhD candidate or recent recipient. The paper may be on any research-related topic in English or English linguistics and will be judged on originality and the contribution of its results. The prize is £500 and the submission deadline is March 31, 2008.