June 10, 2004

Reconstructing the original migration out of Africa

John McWhorter is restrainedly enthusiastic about a recently published paper linking an isolated language of Nepal to the languages of the Andaman Islands. Now, as I have recently pointed out, I am not terribly specialised in historical linguistics. The last thing I read on the historical linguistics of Papuan languages was Wurm's book from the mid-70s. Even when I read it over a decade ago, the very idea that the Andaman languages had any particular connection to the languages of Papua was considered controversial.

Thus, I am ill-equipped to judge the connecting hypothesis - that if this Nepalese language is related to Andamanese languages and Andamanese languages are related to Papuan languages and Papua has been settled for some 75,000 years, then this link is reconstructing an 80,000 year old linguistic connection. But, it seems to me that I'd need to be convinced of the intermediate steps before considering the basic claim.

McWhorter does appear to be giving due diligence to scepticism about claims for such ancient reconstructions. I can't read the paper or see its bibliography, so I can't track the entire argument back myself and come to my own conclusions.

He puts a lot of stock in the similarity of pronouns as a point in favour of this radical account of the origins of the Papuan languages. However, there is something else that I would want to check up on. I note that the authors are all involved in this project over at the Santa Fe Institute. The intent is to construct a large database of core vocabulary for all - or a large part - of the world's languages and accepted language reconstructions. This would enable them to make large comparisons across the whole database.

Now, while a similarity in pronouns might well be compelling when studying a small group of languages, if you have several hundred languages in your database, you might well find very surprising coincidences between languages that are nonetheless totally spurious. If this previously unknown long-distance similarity was uncovered by such a database search, this argues against giving the coincidence of pronoun similarity any terribly great weight. Given a large enough sample, there is no relationship so unlikely that it can't be a coincidence. A lot of bad science - especially social science - gets published by ignoring this rule.
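
To put rough numbers on that worry, here is a minimal sketch of the arithmetic. It assumes that every pair of languages in the database has the same small, independent chance of showing a pronoun resemblance by accident. The function name `prob_any_chance_match` and the value `p_pair = 0.001` are my own inventions for illustration, not estimates taken from the paper or from any real database.

```python
from math import comb


def prob_any_chance_match(n_languages: int, p_pair: float) -> float:
    """Chance that at least one pair of languages matches by accident.

    Assumes (hypothetically) that each pair matches independently
    with the same probability p_pair.
    """
    n_pairs = comb(n_languages, 2)  # number of distinct language pairs
    return 1.0 - (1.0 - p_pair) ** n_pairs


if __name__ == "__main__":
    p_pair = 0.001  # assumed, purely illustrative per-pair chance of a match
    for n_languages in (2, 10, 100, 500):
        pairs = comb(n_languages, 2)
        chance = prob_any_chance_match(n_languages, p_pair)
        print(f"{n_languages:4d} languages, {pairs:6d} pairs: {chance:.3f}")
```

The number of pairs grows roughly with the square of the number of languages, so a resemblance that would be striking between two languages chosen in advance becomes close to expected somewhere in a database of several hundred. Real languages are not independent of one another, so this is only a way of seeing the shape of the problem, not a measurement of it.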

How you find something out is relevant in determining whether or not it is meaningful. Imagine if instead of an Andamanese language, this Nepalese language used almost exactly the same pronouns as - say - Farsi. We would immediately identify this as a coincidence. But, Farsi is a far closer neighbour geographically to Nepal than the Andamans.

Once again, I haven't read the article, but that is the only thing McWhorter describes that seems very striking to me. This Nepali community that appears to be the focus of this radical hypothesis is, like Andaman islanders, very short and dark. Southern Asia and Australasia have pockets of short, dark people who are physically quite different from their neighbours. They used to be called negritos (in Malaysia, they are nowadays called orang asli, but that name includes other peoples as well) and quite a few people have suggested that they are, in some sense, remnants of the original migration out of Africa in the distant past. However, I don't think there are any particular linguistic similarities between the "negritos" (I know some speak Austronesian languages, for example), and I am unaware of anyone claiming that they have DNA markers that would indicate a particularly common ancestry.

Now, this is not a body of research I keep terribly up-to-date in. These questions may have already been answered somewhere else. But, I wouldn't put much store by an article whose content is "A speculates that B, and C has found intriguing evidence of D, and if B and D and E (which we are proposing based on our search of a database of language etymologies derived from various sources) then possibly F. And, wow, wouldn't F be something!"

Maybe it's all legit and a radical breakthrough has just been made, or is at least plausible. But, this is the kind of thing I'd look out for if I could read the paper in question.

This completely stinks. Regular correspondences, not surface similarities, are the basis of responsible historical linguistics.

Ruhlen claims otherwise, at frankly preposterous time-depths, and is ridiculously far from having established the legitimacy of his methodology.

I don't know what McWhorter is up to, here. His argument is no better than "Sure, all the physicists are always slagging off astrology, but my friend Ted is just _such_ a typical Virgo; there's just got to be _something_ to it."

Posted by: des at June 10, 2004 17:03

That's what I've always understood too, but stuff keeps bubbling up that indicates other people are pursuing alternatives. I just want to say that it's all bullshit, but this isn't my field and I can't be sure. But I do understand how to lie with statistics, and that's what this smells like to me.

Posted by: Scott Martens at June 11, 2004 12:36

McWhorter seemed like such a nice man, too.

Posted by: PF at June 15, 2004 22:40

1. Read the damn articles -- Ruhlen’s and McWhorter’s. They’re easy to find online and will take you 10-15 minutes.
2. Ruhlen makes a plausible argument that Kusunda does not belong in its current classification, and has ended up there only by accident. Thus, it will have to go somewhere. While that is not evidence for anything else, it does leave a problem hanging which will have to be addressed.
3. Ruhlen, and to a lesser extent McWhorter, complain that the refutations offered are all of the same weak variety: "It could be just random..." with no evidence that it in fact is: no similar language situations shown to be random, no identified similarities with other languages besides the ones posited. "Linguistics just doesn’t work that way..." with a description about how linguists establish other types of relationships, but no comment addressing the claim that something else is being examined.* "The time-depth is preposterous..." with not even lip-service to starting from the data without jumping to fearful conclusions about what it might mean. "Ruhlen is a poo-poo head anyway..." without explaining the relevance of that.
4. Gee, that looks exactly like what you guys did here.
5. As an additional fisking, the fact that they all work out of the same institute is a diagnosis before symptoms have been identified. It’s one of those shorthand forms of “They’ve got an agenda.” Irrelevant. Regular correspondences versus surface similarities? That has a nice ring, but it's notoriously elastic. A synonym for "surface similarities" would be "the actual words of the language," after all.
6. It was pre-established that Kusunda was one of three relic forms in SE Asia. If you were looking for something with unexpected features, this is exactly where you would start your search.
7. Some of Ruhlen’s data doesn’t look very strong to me. Some of it does. A trained comparative linguist evaluating the data would be useful to me. Even a clever nonspecialist looking at it would interest me.
8. I don’t like a time-depth of 75,000 years either. But that’s just speculation so far. History is full of odd things happening, and objects showing up in ridiculous places. Kusunda might be legitimately reclassified and suggest an enormous conflict in the data, which may have some other answer. But it can’t be thrown out just because it conflicts with other data.
9. And you might want to read the damn articles.

* I do understand the greater level of assurance one gets when one can lay out a string of relationships between languages: t often changes to z; alienable possession is marked with a separate case; etc. I remain unimpressed that linguists just know that without these, you aint got nuthin’. I have had decades of different schools of psychologists pulling the same stuff in my own field. They’re real good at convincing each other. If you can’t convince an intelligent layman, then it’s just voodoo.

Posted by: Assistant Village Idiot at March 1, 2006 2:49
