Monday, July 28, 2003

linguistic butchery by online translation programs

I thought about writing a long discourse on the problem of using a translation program like AltaVista's Babel Fish, but then I thought it might be better just to show you what it does to a language like Korean. What follows sounds like some weird prophecy shouted out by an incoherent foreigner (esp. when you read the last line). This was, in fact, a friendly email from my Korean/Hanja teacher. With all the extraneous references to "bedspreads," it sounds more salacious than it is. And what creature, exactly, came home to play with my teacher's nephew?

It came today to the house and with the nephew petty it played and the sleep which is pushed it slept. The life changed and the body did not get better petty. It will be like that and and it slept. And it happens and it eats evening and the petty spirit listens to recently and this mail it sends, well! the example bedspread. Only my talk plentifully? Your bedspread. The truth, the eastern country university saw? The day well will become and the bedspread. And anyway it appears Sunday not to be going to the anger poultry house. India also the hazard thing which wraps the burden which will go does accident, the printed style of writing... The tile it is given the older sister and if, the bedspread. The good season is not easier thought than every Sunday the bedspread which is not. It is like that but India it goes and to come again it wants going certainly. About under us who sprout the possibility of meeting Tuesday be and Joh keyss nine bedspreads, at the time of hour day liaison give.

Strange, eh? There's something eerily poetic about it at times, wouldn't you agree?

But remember to tread carefully: "About under us who sprout the possibility of meeting Tuesday be and Joh keyss nine bedspreads, at the time of hour day liaison give."

I've noticed that all sorts of possibilities sprout when I'm on the toilet. But if the Nine Bedspreads should attack me while I'm shitting, I'll be sure to liaison give.

A better, more human translation of the email excerpt:

I came home today, played a bit with my nephew, then went to sleep. Things are topsy-turvy, so I'm a bit under the weather. I took some medicine. Then I got up, had dinner, and thought to send an email. I've talked a lot about myself, haven't I!? So, did you visit Dongguk University? If you got a job there, that'd be great. Also... it looks like I won't be able to go to Hwagye-sa [Buddhist temple]. I still have to buy some items for my trip to India... have to help my older sis, you know. Going to temple every Sunday is harder than it seems at first blush. All the same, I'll be wanting to go to temple after I'm back from India. Anyway, if we can meet Tuesday, that'll be good. Call if you have time.

Sci-fi writers keep imagining that we'll have universal translators someday. Maybe so, but only if they possess human intelligence, including the absolutely necessary element of social awareness. Words themselves cannot be conceived of purely as discrete, immutable units that retain their meanings in some a priori, context-free manner. To the contrary, language is very context-bound, which is what makes communication both exciting and risky-- not to mention difficult for this first generation of translation programs. Current programs can't read minds: they don't understand intentions and have no clue what sounds "natural." Strangely, even where there are grammar rules that could be followed more or less consistently, these programs violate them on a regular basis. What results is stuff like the quoted Babel Fish paragraph.

I will say that Babel Fish is slightly better with European languages, likely because of structural similarities among them, as well as notional similarities that its designers have built into it, probably unconsciously. Languages like Korean, where so many words are constructed from a limited palette of similar-sounding syllables, require a bit more of an "artistic" brain for understanding, since their sounds and morphemes depend on context for their meaning even more than they do in English. If I say the phoneme "ee" to a Korean, for example, they'll have no idea what I mean unless I put it in a sentence. "Ee" could mean (1) a person's surname (Lee, Yi, Rhee, etc.); (2) lice; (3) the Sino-Korean number "two"; (4) the demonstrative "this," as in the phrase "ee saram," meaning "this person," etc. If I say "ee" to an English speaker, what are my choices? They pretty much boil down to the sound "ee" itself, as when you tell a child, "Say EEEEEEEE!" to make them smile-- or to open their mouth wider for tooth brushing. Or "ee," by itself, can refer to the letter E.

Notice, too, that "Say EEEEE" is also context-dependent. Am I trying to make a child smile (say, for photographs), or am I trying to persuade her to open her mouth to brush her teeth better?

These hurdles are absurdly simple for a human brain to overcome. But for a translation program with no pragmatic awareness, they are the source of amusing errors.

Go to Amritas for a neat linguistic discussion on "Yellow Claw." In the meantime, you might want to think about how this Babel Fish translation butchered the name of a Buddhist temple, Hwagye-sa (Flower + Mountain Stream + Temple, if I'm not mistaken), to give us "Anger Poultry House." I have to admit, I was rolling when I saw that.

One Korean word for anger is indeed "hwa." Vietnamese Thien (Viet. "Zen") monk Thich Nhat Hanh's book Anger is published as Hwa in Korea. Popular seller, too. There is also a Sino-Korean character "gye," meaning chicken or poultry-- as in the Korean word "gye-ran," egg. The syllable "sa" means dozens of different things in Korean, including "temple," "house," "person" (jeon-sa = warrior), etc. The problem is that, in Korean (and in Chinese, too), many Chinese-derived words are pronounced the same, so there are tons of homophones, even though they are written with completely different Chinese characters. When you read Korean text (Korean is written in an alphabet, Hangeul, not in a syllabary like Japanese kana or in logographic characters like Chinese), you'll often see a Korean word followed by a parenthetical containing Chinese characters, to clue the reader in to the actual word being used. Which goes to show that, sometimes, even contextual clues aren't enough to help a reader understand what's being said.

And you wonder why it's taking me so damn long to learn Korean.