Wednesday, 24 September 2014

Och

A long time ago I did a spell as what is now called an intern at the statistical part of the European Commission (EC), long enough ago that I was overpaid and underworked, an arrangement which I think is reversed these days. And in the course of being underworked I came across the business of machine translation, something in which the EC took a great interest given its large spend on translators and interpreters and in which I have taken a nodding interest ever since.

My guess at the time was that in order to build a decent machine translation system you had to have a machine language sitting in the middle of it. So you translated the Albanian text (say) into the machine language and then translated the machine text into English text. The idea was that you needed to understand what was being said in order to do a decent translation; the bonus was that you hugely reduced the number of things you had to deal with, from roughly k1 times N squared to k2 times N, where N is the number of languages and the k are constants. And that has been my position ever since. And N has grown quite a lot since, enough to make the N squared a bit of a problem.
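
To put some numbers on that saving, here is a minimal sketch in Python. The function names and the sample values of N are mine, picked purely for illustration, and I have assumed one translator per direction (so French to English and English to French count separately), which amounts to taking k1 as about 1 and k2 as 2.

```python
def direct_systems(n: int) -> int:
    """Translators needed if every language is paired directly with every other.

    One translator per ordered pair, so n * (n - 1), which is roughly n squared.
    """
    return n * (n - 1)


def pivot_systems(n: int) -> int:
    """Translators needed if everything goes through one machine language.

    One translator into the machine language and one out of it per language.
    """
    return 2 * n


# Illustrative values of N only, not figures from any real system.
for n in (4, 11, 24):
    print(n, direct_systems(n), pivot_systems(n))
```

With 24 languages that is 552 direct translators against 48 going through the middle, which is the whole attraction of the machine language idea.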

However, the other day, while continuing to move through the odd but interesting singularity book by Kurzweil (see 10th and 4th September), I came across a chap called Franz Josef Och who, it seems, has built a very successful machine translator on entirely different lines, using statistics and computational sledgehammers rather than stilettos.

Och headed up Google's machine translation operation for a while, but now pops up at http://www.humanlongevity.com/, an outfit which relates back to Kurzweil, who is clearly fascinated by his prospect of imminent immortality.

Turning to Google, I now realise that they take machine translation very seriously, a lot more so than I had guessed from their pop-up offers to translate things when one looks at foreign language web sites. They pointed me to a tutorial workbook on statistical machine translation by one Kevin Knight, written way back in April 1999, which I have been trying to get to grips with.

Point one, this machine translation is bilateral, between one real language and another. No machine language sitting in the middle. All you need to translate, say, from French to English, is a large chunk of English text, possibly culled from the web in the course of search engine business, and a large chunk of parallel text in French and English.

Point two, the system is an application of Bayes' theorem, mentioned the other day. The probability P(E|F) of this bit of English E being the proper translation of that bit of French F is proportional to the product of two things: the probability P(E) of this bit of English turning up at all, and the probability P(F|E) of that bit of French being the translation of this bit of English. The divisor P(F) is the same for every candidate E, so it can be ignored when looking for the best one. It seems that this breaking down of the single problem P(E|F) into the other two problems is the key to making it tractable, amenable to statistical sledgehammers.
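
A toy version of that choice, as a sketch only. The French phrase, the candidate English phrases and every probability below are invented by me for illustration; a real system would estimate P(E) from the large chunk of English text and P(F|E) from the parallel text. Since P(F) is the same for every candidate, picking the best E needs only the product P(E) times P(F|E).

```python
# Hypothetical numbers for translating the French phrase F = "la maison".

# P(E): how plausible each candidate is as a piece of English.
language_model = {
    "the house": 0.02,
    "house the": 0.0001,  # right words, but not something English speakers say
    "the home": 0.01,
}

# P(F|E): how likely "la maison" is as the French for each candidate.
translation_model = {
    "the house": 0.4,
    "house the": 0.4,  # word-for-word just as good, order aside
    "the home": 0.2,
}


def best_translation(p_e: dict, p_f_given_e: dict) -> str:
    """Return the candidate E that maximises P(E) * P(F|E)."""
    return max(p_e, key=lambda e: p_e[e] * p_f_given_e.get(e, 0.0))


print(best_translation(language_model, translation_model))
```

The point of the split shows up with 'house the': the translation side cannot tell it from 'the house', but the English side knocks it out.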

Point three, the system appears to be the product of some kind of iteration, with the system getting better each time around. It learns.
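
To get a feel for what such iteration might look like, here is a minimal sketch of one common scheme, in which word translation probabilities start out uniform and are re-estimated on each pass over the parallel text. This is my assumption about the general flavour, not a description of what Och's or Google's system actually does, and the three-word corpus and the function name are made up.

```python
from collections import defaultdict


def train_word_probs(pairs, iterations=10):
    """Estimate t[(f, e)], the probability of French word f given English word e.

    pairs is a list of (french_words, english_words) sentence pairs.
    Each pass shares out every French word among the English words in the
    same sentence according to the current estimates, then renormalises.
    """
    french_vocab = {f for fs, _ in pairs for f in fs}
    # Start with no knowledge: every French word equally likely.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f paired with e
        total = defaultdict(float)  # expected count of e overall
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Invented toy corpus, for illustration only.
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]
t = train_word_probs(corpus)
print(t[("maison", "house")], t[("la", "house")])
```

Nobody tells it that 'maison' goes with 'house'; it works that out because 'la' is already spoken for by 'the', and the estimate firms up with each pass.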

Point four, there are some language twiddly bits which more obviously have to do with the structure of language. Things like word order and the short special words used to structure things. Words like 'may' and 'the' in English.

Point five, there are some mathematical twiddly bits to do with, for example, the manipulation of very small numbers, manipulation which defeats the bog standard arithmetic offered by Excel.
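
The small number problem is easy enough to reproduce. In the sketch below, the figure of 400 probabilities of one in a thousand is mine, chosen only to force the issue. Multiplying them together in ordinary floating point arithmetic gives exactly zero, while adding their logarithms, one standard workaround and not necessarily the one Knight uses, gives a perfectly manageable number.

```python
import math

# Illustrative only: 400 small probabilities, as might arise from a long text.
probs = [1e-3] * 400

# Naive product: the true answer is 1e-1200, far too small for a 64-bit float.
product = 1.0
for p in probs:
    product *= p

# Sum of logarithms: the same information, kept in a comfortable range.
log_total = sum(math.log(p) for p in probs)

print(product)    # underflows to 0.0
print(log_total)  # roughly -2763
```

Comparing two candidate translations then means comparing two sums of logs rather than two products, and the bigger sum wins just as the bigger product would have.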

Not got very far with it all yet, but enough to see that maybe my original idea was wrong. You don't need a machine language sitting in the middle of it all - and maybe the brain doesn't either. It just doesn't work like that, whatever a logician might like to think.

I shall try to persevere.
