Wired 14.12: Me Translate Pretty One Day

Spanish to English? French to Russian? Computers haven't been up to the task. But a New York firm with an ingenious algorithm and a really big dictionary is finally cracking the code.

The brainchild of a quirky former used-car salesman named Eli Abir, the company has been designing the system in secret since just after 9/11. Now the application is ready for public scrutiny, on the heels of a research paper that Carbonell – who is also a professor of computer science at Carnegie Mellon University and head of the school's Language Technologies Institute – presented at a conference this summer. In it, he asserts that the company's software represents not only the most accurate Spanish-to-English translation system ever created but also a major advance in the field of machine translation.

[...]

From its genesis at the post-World War II dawn of computing – when ambitious researchers believed it would take only a few years to crack the language problem – until the late 1980s, machine translation, or MT, consisted almost entirely of what are known as rule-based systems. As the name implies, such translation engines required human linguists to combine grammar and syntax rules with cross-language dictionaries. The simplest rules might state, for example, that in French, adjectives generally follow nouns, while in English, they typically precede them. But given the ambiguity of language and the vast number of exceptions and often contradictory rules, the resulting systems ranged from marginally useful to comically inept.
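
To make the adjective rule concrete, here is a toy sketch in Python of how a rule-based engine might pair a cross-language dictionary with a single reordering rule. The lexicon entries, the tag names, and the function `translate_rule_based` are invented for illustration; real rule-based systems encoded thousands of rules and exceptions.

```python
# Toy French-to-English lexicon: word -> (translation, part of speech).
# Entries are illustrative assumptions, not taken from any real system.
LEXICON = {
    "le": ("the", "DET"),
    "chat": ("cat", "NOUN"),
    "noir": ("black", "ADJ"),
    "dort": ("sleeps", "VERB"),
}


def translate_rule_based(sentence: str) -> str:
    """Translate word by word, applying one reordering rule."""
    # Unknown words pass through untranslated with an UNK tag.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    out = []
    i = 0
    while i < len(tagged):
        # Rule: French NOUN + ADJ becomes English ADJ + NOUN.
        if (
            i + 1 < len(tagged)
            and tagged[i][1] == "NOUN"
            and tagged[i + 1][1] == "ADJ"
        ):
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(translate_rule_based("le chat noir dort"))
```

The single rule handles "chat noir" but would mangle any French adjective that normally precedes its noun, which hints at why the exceptions piled up.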

Over the past decade, however, machine translation has improved dramatically, propelled by the relentless march of Moore's law, a spike in federal funding in the wake of 9/11, and, most important, a new idea. The idea dates from the late 1980s and early 1990s, when researchers at IBM stopped relying on grammar rules and began experimenting with sets of already-translated work known as parallel text. In the most promising method to emerge from the work, called statistical-based MT, algorithms analyze large collections of previous translations, or what are technically called parallel corpora – sessions of the European Union, say, or newswire copy – to divine the statistical probabilities of words and phrases in one language ending up as particular words or phrases in another. A model is then built on those probabilities and used to evaluate new text. A slew of researchers took up IBM's insights, and by the turn of the 21st century the quality of statistical MT research systems had drawn even with five decades of rule-based work.
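
As a rough illustration of how probabilities can be estimated from parallel text, the sketch below follows the word-level approach usually called IBM Model 1, trained with expectation-maximization. It is a minimal version under stated assumptions: the three-sentence Spanish-English corpus is invented, the function names `train_model1` and `best_translation` are hypothetical, and production systems of the period worked on phrases and millions of sentence pairs.

```python
from collections import defaultdict


def train_model1(pairs, iterations=20):
    """Estimate P(target word | source word) from sentence pairs via EM."""
    # Start with equal weight for every co-occurring word pair.
    t = {}
    for src, tgt in pairs:
        for e in tgt:
            for f in src:
                t[(e, f)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) alignments
        total = defaultdict(float)  # expected count of f being aligned
        for src, tgt in pairs:
            for e in tgt:
                # E-step: share each target word among the source words
                # in proportion to the current probabilities.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, source_word):
    """Return the most probable target word for a source word."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)


# Invented toy corpus: (Spanish tokens, English tokens).
corpus = [
    ("la casa".split(), "the house".split()),
    ("la flor".split(), "the flower".split()),
    ("una flor".split(), "a flower".split()),
]
model = train_model1(corpus)
for word in ("la", "casa", "flor", "una"):
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied; the pairings emerge only from which words keep turning up together across the translated sentences.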

Since then, researchers have tweaked their algorithms and the Web has spawned an explosion of available parallel text, turning the competition into a rout. The lopsidedness is best seen in the results from the annual MT evaluation put on by the National Institute of Standards and Technology (NIST), which uses a measurement called the BiLingual Evaluation Understudy (BLEU) scale to assess a system's performance in Chinese and Arabic against human translation. A high-quality human translator will likely score between 0.7 and 0.85 out of a possible 1 on the BLEU scale. In 2005, Google's stat-based system topped the NIST evaluation in both Arabic (at 0.51) and Chinese (at 0.35). Systran, the most prominent rule-based system still in operation, languished at 0.11 for Arabic and 0.15 for Chinese.
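
For readers curious what a BLEU number measures, the following is a simplified sentence-level sketch: clipped n-gram precision for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty. It assumes a single reference translation and no smoothing, so it is not the scoring script NIST used, which works over a whole test set with multiple references. The function names and example sentences are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate against one reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each n-gram's credit at the number of times the reference has it.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "we declare our responsibility for what has happened".split()
candidate = "we declare our responsibility for what happened".split()
print(round(bleu(candidate, reference), 3))
```

Because the score rewards exact word-sequence overlap, even a skilled human translator who phrases things differently from the reference falls short of 1.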

[...]

WHEN MEANINGFUL MACHINES first tested its Spanish-English engine on the BLEU scale in spring 2004, "it came in at 0.37," recalls the company's CEO, Steve Klein. "I was pretty dejected. But Jaime said, 'No, that's pretty good for flipping the switch the first time.'" A few months later, the system had jumped above 0.60 in internal tests, and by the time of Carbonell's presentation in August, the score in blind tests was 0.65 and still climbing. Although the company didn't test the passage with any statistical-based systems, when it tested Systran and another publicly available rule-based system, SDL, on the same data, both scored around 0.56, according to Carbonell's paper. Meaningful Machines was in stealth mode at the time, protecting its ideas. But Carbonell was itching to talk about his results. He didn't just have an engine that he says earned the highest BLEU score ever recorded by a machine. He had an engine that had done it without relying on parallel text.

Instead, the Meaningful Machines system uses a large collection of text in the target language (in the initial case it's 150 Gbytes of English text derived from the Web), a small amount of text in the source language, and a massive bilingual dictionary. Given a passage to translate from Spanish, the system looks at each sentence in consecutive five- to eight-word chunks. The al Qaeda message analysis, for example, might start with "Declaramos nuestra responsabilidad de lo que ha ocurrido." Using the dictionary, the software employs a process called flooding to generate and store all possible English translations for the words in that chunk.
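
The chunking and flooding steps can be pictured with a short Python sketch. Everything here is a guess at the general shape rather than the company's implementation: the article does not say whether chunks overlap (a sliding window is assumed), the dictionary entries are invented, and `chunks` and `flood` are hypothetical names.

```python
import math


def chunks(tokens, size=5):
    """Split a sentence into overlapping windows of `size` words (assumed)."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def flood(chunk, dictionary):
    """Collect every dictionary translation for each word in the chunk."""
    # Words missing from the dictionary are kept as-is.
    return [dictionary.get(word.lower(), [word]) for word in chunk]


# Invented toy entries standing in for the massive bilingual dictionary.
DICTIONARY = {
    "declaramos": ["we declare", "we state"],
    "nuestra": ["our"],
    "responsabilidad": ["responsibility", "liability"],
    "de": ["of", "from", "for"],
    "lo": ["it", "the", "what"],
    "que": ["that", "which", "what"],
    "ha": ["has"],
    "ocurrido": ["happened", "occurred"],
}

sentence = "Declaramos nuestra responsabilidad de lo que ha ocurrido".split()
windows = chunks(sentence, size=5)
options = flood(windows[0], DICTIONARY)

for word, translations in zip(windows[0], options):
    print(word, "->", translations)

# Number of word-by-word combinations the first chunk alone could produce.
print(math.prod(len(translations) for translations in options))
```

Even with this tiny dictionary the first five-word chunk yields dozens of candidate combinations, which suggests why the large collection of target-language text matters for choosing among them.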
