May 10, 2005

The PhD proposal - part 1

Real life strikes again. I'm moving forward on the PhD, doing some fixes on the proposal. This is, once again, interrupting my ability to do other things, but that's for the best. I'm not going to flunk Dutch, so the studying can wait. I'm not going to have trouble with Russian - although why it takes five weeks to get an invitation letter through the Foreign Ministry is beyond me. I ought to be getting it in a week.

So, below the fold, part one of the PhD proposal. Critiques solicited. I will probably delete the content below the line after a week or so because I don't really want the search engines finding it. The idea is simple enough, and I really don't want to see someone else doing the same thing in the middle of my doctorate. Call me egotistical, or paranoid. Whatever.

[Removed as originally intended]

"The Chinese writing system, in which spaces are not used": spaces are used to separate syllable-morphemes, not words. "wuzilide dongxi" ("the things in the room") is 2 words but 6 graphs, and the spacing doesn't tell you where the words divide. Writing pinyin, Chinese divide words about as we would.

The Chinese treat all syllables as morphemes, even though some are not. In a moderate number of 2-syllable words, especially animal names, one syllable (or both) is never used independently of the other. Fictitious morpheme definitions are invented for these syllables.

In dongxi "thing", both syllables are real morphemes, but the meaning of the binome is completely unrelated to the meanings of the syllable graphs ("east" + "west"). A true two-syllable word.

You may have known all this, but anyway....

Can't critique your project proposal itself, but as a layman I found your presentation well done.

John, have you ever tried to get two Mandarin speakers to agree on where the word breaks are? Yeah, I know that there are rules about spacing in Pinyin, and I have the strong impression no one has a clue what they are. My point is that the clarity we have in English about where the line falls between syntax and morphology is due to our writing system. The line itself is very poorly motivated from a linguistic standpoint.

Part 2 goes up tonight.

Have to think about it.

The pinyin transcriptions I've seen looked about right. Disputes might come from overattachment to the traditional writing system, like prescriptive grammar.

Perhaps though you should say "spaces are not used to group syllables into words". I did a double-take on "spaces are not used", which I think was true of ancient Greek and Latin. Sort of a nitpick I guess.

Interesting stuff, at least from the viewpoint of this pathetically mono-lingual math dilettante.

My impression is that locating word breaks is relatively difficult for continuous speech recognition systems, too.

