My review of the TENV - Interpreting and Translating Congress in Breda on 9th and 10th of March 2018. A call for translators to get or stay active in their business. The title quote by Konstantin Dranch indicates how FUTURE was the main topic throughout the conference. Here is what other speakers said:
- "Now is the time to redesign your business", Jaap van der Meer
- "Things that cannot be digitalized will become scarce and increase in value", Wim de Ridder
- "At its core, the translator's work remains the same", Daniel Prou
- "Companies that grow from buying each other are a threat to the translator, because they streamline their procurement and put pricing engines in place", Konstantin Dranch, Nimdzi
- "A measuring system gives you control and improves your professionalism", Isabella Moore, Comtec
The congress was once again very well organized, and featured high-quality presentations in a delightful location: The Chassé Theatre in Breda, where Dutch and Belgian colleagues could meet halfway. Charming Marijke Roskam presented the keynote speakers, chaired several other presentations and shone by keeping up communication between audience and speakers, adding great value to this conference. From several sides I heard people saying that Marijke was a strong reason to come back to this conference, and I can only endorse that. Thank you, Marijke Roskam, for an entertaining conference! May other conference organizers take it as an example.
The conference in English and Dutch featured three presentational tracks and a workshop track. Interpreting service was supplied by students. Main topics of presentations were the exponential development of our industry and of technology in general; Neural Machine Translation; and Quality Management for interpreting as well as for translation. A couple of sessions also covered software tools and language related topics, such as the workshop "Dutch in the Netherlands and Belgium". Oh, and so important at the very end: The session on the AVG, the Dutch implementation of the new European data protection law. So, on both days I was absolutely able to compile an interesting program for myself.
Day 1 started with keynote speaker and futurologist Willem Peter de Ridder, who presented us with his vision of the digital Darwinism of our time. Everything that can possibly be digitalized will be digitalized, and things that cannot be digitalized will become scarce and increase in value. Since humans tend to think of development in a linear way, he tried to make us visualize the exponential growth of technological innovation and digitalization with a metaphor of Wembley Stadium: The stadium fills with water exponentially. In minute one, 1 drop of water enters; in minute two, 2 more drops; in minute three, 4 more; in minute four, 8 more; and so on. By minute 21 a bucket has been filled, and by minute 46 the whole stadium is flooded. Now the question: Who intervened at minute 21? Who among us even recognized any danger? At that point I asked myself whether we should be running from anything now. Does this metaphor imply a danger that we are running into through exponential growth, which our brains are unable to grasp? But De Ridder clarified his point: As human beings do not develop exponentially, but rather incrementally, we need to stay aware of future developments, visualize future scenarios much more than we do now, and improve data protection. He gave many examples of uses for new technology, from robotizing routines and complex processes (the barista in the coffee shop, the doctor in surgery) to artificial voice imitation and robots with citizenship, like Sophia in Saudi Arabia. How much intervention by artificial intelligence should be allowed, and when and where? Digitalization of knowledge means democratization of knowledge, but how can it be controlled in a democratic way?
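For readers who like to see the numbers, here is a minimal sketch of the stadium metaphor. The doubling rule comes from the talk; the drop size of 0.05 mL and the function names are my own assumptions, so the litre figures are only indicative.

```python
# Assumption (not from the talk): one drop of water is roughly 0.05 mL.
DROP_ML = 0.05


def drops_added(minute: int) -> int:
    """Drops entering during the given minute: 1, 2, 4, 8, ..."""
    return 2 ** (minute - 1)


def total_drops(minute: int) -> int:
    """All drops so far: 1 + 2 + ... + 2**(minute - 1) = 2**minute - 1."""
    return 2 ** minute - 1


def total_litres(minute: int) -> float:
    """Total volume in litres, based on the assumed drop size."""
    return total_drops(minute) * DROP_ML / 1000


if __name__ == "__main__":
    for minute in (1, 2, 3, 4, 21, 45, 46):
        print(
            f"minute {minute:2d}: +{drops_added(minute):,} drops, "
            f"{total_litres(minute):,.1f} L in total"
        )
```

Whatever drop size you assume, the total at minute 45 is only about half of the total at minute 46. That is the uncomfortable part of the metaphor: one step before the end, the stadium still looks half empty.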
Next, I listened to Daniel Prou from the European Commission, where translators have the choice of whether or not to incorporate NMT into their daily translation routine. The fact that most of the translators at the European Commission do use NMT is an indicator of its ability to enhance the work of the translator, even though the gain resulting from the use of machine translation has not yet been measured. However, investigation is underway. "At its core, the translator's work remains the same", said Prou, because the translator has to compare the source text to the results given by the machine. Translating without knowing the source language, made possible by the latest developments in machine translation, is gisting, and this is not what translators do. He explained the differences in output between SMT and NMT, a topic that I covered with numerous examples in my last article about TC39 in London.
The workshop "What will your CAT tool look like in 5 to 10 years" was organized by Juliana van der Lek-Ciudin and Frieda Steurs from KU Leuven. In groups of 5, the participants discussed several topics relating to CAT tool features. I joined the group on revision and QA. We identified typical ways in which a translator is provided with a translation that has to be revised, and agreed that the best way is a side-by-side revision in your CAT tool to compare source and target, followed by a revision of the final document. We complained that you normally have to manually transfer changes made in the final document to the TM. It would be an improvement to be able to easily reimport the revised final document into the tool, but that seems to be complicated by the vast number of file formats currently in use. We discussed the pros and cons of standalone QA tools like QA Distiller, XBench or Lexiqa. It seemed to be a question of personal preference for the interfaces or the workflow of the LSP, since comprehensive CAT tools all offer more or less the same functionality. At the end of the workshop, Juliana suggested joining langtech.wiki, an initiative where translators can report software problems or missing functionalities they encounter in their work. The platform was founded to help translators have a clearer voice as to how their technology is going to develop.
When in January 1954 the breaking news reached the world that a computer had translated Russian into English, the people responsible were convinced that it would take five more years to achieve FAHQT (Fully Automatic High Quality Translation). The statement has become a running gag in the industry: just 5 more years... But now is the time - NUNC EST TEMPO - says Jaap van der Meer in his keynote speech, and draws a picture of a future in which machine learning will change our way of working very soon and very fast. What will change for translators?
Just as we do not exactly understand the human brain, we also don't understand the ambient intelligence of the machine learning that is happening these days. A widely cited example is the computer AlphaGo winning against human champion Ke Jie, and only three months later being defeated by a newer self-learning version of itself. The computer had made thousands of decisions and we do not know how. There is no secret machine; the algorithms are available to everyone and a whole industry tries to bridge the skill gap so datafication becomes available to everyone. How can we deal with the conflict of open algorithms but closed data? NUNC EST TEMPO to redesign your business, says Jaap van der Meer. Get vertical, read the charts and keep measuring. "Everybody needs a dashboard. If you don't have a dashboard you are not a professional."
The second day began with the keynote speech of Robert Etches, who advised us to throw everything we have done thus far overboard, and to start again. “Inspiring”, was what I heard people saying after the speech. However, in my daily translation business in technical documentation, I have to deal with so many cases where I wish that technology were at least up to the standard of today, that I wonder if throwing everything overboard really offers a possible solution (especially with the FAHQT-in-5-years story in mind…)
A very interesting presentation was given by Konstantin Dranch, Russian market researcher at Nimdzi. Nimdzi ranks the largest LSPs in the world, and you can check this information on their website. The market seems to be very fragmented, and it is hard to capture all companies with more than 10 million dollars in revenue. The four biggest segments are: 1. companies that buy other companies (RWS, for example, bought Moravia and rose to the top on the stock market); 2. public contractors (e.g. military organizations); 3. dubbing and media; and 4. remote interpreting and multilingual marketing agencies. Technology companies (like translation platforms) are missing from the list, because their revenue is just above 10 million dollars, even though they have huge traffic on their platforms. The two slides above and below show the reason why relatively small countries hold the bigger share in revenues: the cost of labour.
Konstantin foresees that growth will also come from Europe. "The hottest segment in the market is probably media localization". He warns that companies that grow from buying each other are a threat to the translator, because they streamline their procurement and put pricing engines in place.
Isabella Moore from Comtec spoke about how measuring KPIs can help your business. A measuring system gives you control and improves your professionalism. It not only measures lag indicators like financial results (which tell you how your business did in the past), but also measures lead indicators related to processes, like customer satisfaction, number of complaints, or suppliers you added to your database and actually assigned to a project. You do not even need a very comprehensive tool; a small business could do with a couple of spreadsheets. Isabella told us how she was able to sell their family business, because it had a clear and traceable value, and how she was even able to reacquire the business a couple of years later for a lower price, because of the numbers.
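To show how little tooling this needs, here is a minimal sketch of such a "spreadsheet" in Python. It is my own illustration, not something shown in the talk: the `Project` record, its fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """One row of a hypothetical project spreadsheet."""
    revenue: float      # invoiced amount (feeds the lag indicator)
    complaint: bool     # True if the client filed a complaint
    satisfaction: int   # client rating from 1 (poor) to 5 (excellent)


def kpi_summary(projects: list[Project]) -> dict[str, float]:
    """Compute one lag indicator and two lead indicators."""
    if not projects:
        raise ValueError("need at least one project to compute KPIs")
    n = len(projects)
    return {
        "total_revenue": sum(p.revenue for p in projects),          # lag
        "complaint_rate": sum(p.complaint for p in projects) / n,   # lead
        "avg_satisfaction": sum(p.satisfaction for p in projects) / n,  # lead
    }


# Made-up sample data for one month.
march = [
    Project(400.0, False, 5),
    Project(250.0, True, 3),
    Project(350.0, False, 4),
    Project(200.0, False, 4),
]

if __name__ == "__main__":
    for name, value in kpi_summary(march).items():
        print(f"{name}: {value}")
```

Running the same summary every month gives you a small time series per indicator, which is really all a first dashboard has to be.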
Last but not least I heard the presentation about the new Dutch data protection law (AVG), based on the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679), that will become enforceable on May 25, 2018. A lot of legal talk for a technical translator... The regulation relates to personal data, not to company data. Many translators deal with personal data all the time, and even have to process it, for example in certificates. When accepting work directly from private clients, translators have to take measures to ensure safe handling of this personal data by establishing certain procedures. Even more importantly, permission for processing personal data must be granted in writing by means of a standard form or a framework contract signed by the client. You are not allowed to imply permission; you will need explicit consent. An order agreement seems to count as valid consent. So what is processing personal data? Processing can be collecting, saving, disclosing, copying, anonymizing, translating and a lot more. Data may not be processed unless there is at least one lawful basis to do so. You can find the list of lawful bases on Wikipedia. Breaches must be reported to the authorities or to the responsible person within 72 hours. There are different roles of responsibility and you might even hold more than one role at a time. The citizen has the right of access and the right to erasure. You are also supposed to maintain records of processing activities. The regulation of course contains a lot more and is gaining importance. So it might be a good idea to keep yourself informed. I hope there will be webinars for translators on this topic soon in order to find practicable solutions.
I am definitely looking forward to the next TENV and invite you to join me!
Update on TC39: SMT output vs. NMT output - what is new for translators on a hands-on practical level?
Second update on TC39. There is so much to tell about this conference that I decided to pick out one useful topic for translators to start with and leave the full review for later.
As a translator you have to be aware of the differences between SMT and NMT output when working with either, in order to avoid inadequate quality. Which error types are most common?
Neural is the magic word these days and several speakers addressed NMT as part of their presentations: Judith Klein in her opening talk "The best of 3 worlds: TM, SMT and NMT in Star Transit", Andrzej Zydron from XTM in his presentation "Beyond Neural MT", Emanuelle Esperança Rodier from the Université de Grenoble Alpes in her presentation "Evaluation of NMT and SMT systems: A study on uses and perceptions" (the source of most of the examples below) and Alexander Waibel in his keynote address "A world without language barriers".
After the conference the proceedings normally get published on the conference website. So if you are interested in the detailed results of the research, go ahead and have a look. I will simplify the theoretical background here, shining light only on the practical use for translators.
Everybody is awed by NMT, so first of all some examples of how fluent the output of NMT can be in comparison to SMT:
- Source: Se solicita la sustitución de todos los rodamientos del grupo traseiro.
- SMT: Tausch aller Wälzlagergruppe beantraagt wird.
- NMT: Der Austausch aller Lager der hinteren Gruppe wird angefordert.
- Source: Anmerkungen oder Korrekturen sind keine eingegangen.
- SMT: Any comments or correction are not have died.
- NMT: Note or correction are not received.
Nice, isn't it? NMT uses attention mechanisms. In the human brain, attention is a useful feature for saving computational resources. Not so in NMT systems: unfortunately, the technology has not yet reached the efficiency of the human brain on this point, and a LOT of computing power is involved, which brings us to the limitations:
- NMT is good for units of less than 60 words.
- NMT is good for similar languages. EN>EN would have an LC factor of 1, EN>FR = 0.8, EN>DU = 0.75, EN>GER = 0.6, EN>PL, RU, CS = 0.45, EN>JA = 0.2. Obviously, for now NMT is not a good option for English into Japanese and works pretty well for English into French. It does not do a good job translating from morphologically rich languages into languages with simple morphology (RU>EN).
Doing it "Neural" means using a fuzzy representation of knowledge. NMT systems try to capture the higher-level meaning of the text and are therefore better able to generalize to new sentences than statistical systems.
Here are some typical, tricky error types translators have to be aware of.
- Unknown words
Be aware: sounds nice - is wrong!
NMT output has a considerably higher fluency, which makes it appear more human. Whereas SMT becomes clumsy and falls back on source words when it finds no match, NMT goes fuzzy and can replace or even add words to make the output look nice.
Examples:
- Source: les adolescents japonais aiment les jeux vidéos
- SMT: the adolescents japanese love electronic vidéos
- NMT: japanese teenagers are interested in fashion
Note the word vidéos: SMT could not translate it and left it in the source language, while NMT replaced it with a word that is often used in this context. If overlooked, this is a mistranslation. You will not find source words in NMT output, so these mistakes can be hard to spot.
- Extra words
Related to the first error type (unknown words).
- Source: avez-vous un menu ?
- SMT: do you have a menu?
- NMT: do you have a fixed menu?
- Missing an important content word
Be aware: Fluent, but totally wrong
- Source: c' est le contrat d' achat de mes chèques de voyage.
- SMT: it's the purchase agreement of my checks.
- NMT: it's the seniority wage system.
Some mistakes you will not find in NMT output at all, such as words taken over from the source text. Other mistakes, such as incorrect word forms or word order, appear a lot less often.
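The point about source words can be made concrete with a small sketch. This is my own illustration, not a method presented at the conference: a naive check (the function names are hypothetical) that lists the words an MT output shares verbatim with its source, applied to the video games example from above.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; \\w also matches accented letters."""
    return set(re.findall(r"\w+", text.lower()))


def copied_source_tokens(source: str, output: str) -> set[str]:
    """Words that appear verbatim in both the source and the MT output."""
    return tokens(source) & tokens(output)


source = "les adolescents japonais aiment les jeux vidéos"
smt = "the adolescents japanese love electronic vidéos"
nmt = "japanese teenagers are interested in fashion"

if __name__ == "__main__":
    print("SMT:", sorted(copied_source_tokens(source, smt)))
    print("NMT:", sorted(copied_source_tokens(source, nmt)))
```

For the SMT output the check flags "vidéos" (and also "adolescents", which happens to be a valid English word, so expect false positives with cognates). For the NMT output it flags nothing, although the sentence is wrong. A simple automatic check of this kind therefore helps with SMT, but cannot replace reading the source carefully when post-editing NMT.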
Translator's action: Post-editing might take more or less the same time, but you need to be more alert to hard-to-spot mistakes and read the source carefully to avoid mistranslations.
Three Inspiring Speeches at the Interpreting and Translating Congress 2016 in Hilversum in the Netherlands
You can read this article in German (3 Vorträge voller Denkanstöße), in Dutch (3 indrukwekkende speeches) and in Portuguese (Três apresentações inspiradoras).
My first conference this year!
On the 11th and 12th of March I joined the "Tolk - en Vertaal Congres 2016" in Hilversum in the Netherlands. With more than 750 people registered, the conference was well attended and thematically very broad. It addressed pretty much everybody who works in the translation industry. Under the motto "Getting ahead together" interpreters, translators, students, translation agencies, language technology experts, training institutes and even the public service had gathered. The COA, the Dutch authority for receiving refugees, was also represented.
Henry Liu, President of the International Federation of Translators (FIT), gave the prelude with his inspiring keynote speech in front of a full audience. "What unites us? - Building a collaborative and sustainable translation profession" was a speech full of food for thought. How can we bridge the gap between users, funders and translators and make the translation process transparent? Are we still translators or rather IT professionals? What alternatives are there to the volume-based billing of translations? Nobody advertises with "Slow Translation"; you always see "Fast Translation". Is there something like the "Fair Trade LSP"? The following comparison was amusing: Imagine the interpreter only gets paid 50% for a sentence he had previously interpreted for this customer. He referred to Akerlof’s lemons problem and talked about the discrepancy between translation quality and customer specification: The quality should fulfil the customers’ expectations, no more and no less, so you can agree on an adequate price.
On quality and responsibility Henry Liu reminded us of Toyota and the most expensive recall of all time, where translator and whistleblower Betsy Benjaminson played an important role. Liu found the translator’s situation in volunteering to be problematic. All participants would get outside recognition in one form or another, except for volunteer translators. Is intrinsic motivation enough, or do we need to improve collaboration?
Technology as a positive development, but also as a counterpoint to linguistic diversity was another issue. "If everyone thinks alike, everyone stops thinking." Should we dematerialize translation, rather than dehumanize it?
Jaap van der Meer from TAUS presented us with an overview of the development of language technologies in general and machine translation in particular. At the same time as Jaap van der Meer's speech, Bert Esselink in the other room was giving a talk on "Assuring quality when clients keep demanding more quality at the same rates". This decision was a difficult one for me. Esselink’s classic "A Practical Guide To Localization" is also on my bookshelf and I would have liked to hear him too. But since my focus at the moment is on machine translation, at 11:45 I was sitting in the theatre, eager to hear Jaap van der Meer talk about the beginnings of the language technology industry in the 1980s.
A video of the first text writer reminded us how fast the development in information technology has been. With 100 pages on 20 floppy disks back then, Jaap was able to increase his productivity by 100%! Also in the 80s, Systran, one of the oldest companies in machine translation and today's market leader in this area on the internet, was introduced on the local Paris "internet", Minitel. As for progress in the 90s, Lernout & Hauspie, now Nuance (Dragon Naturally Speaking), was mentioned for its developments in speech recognition.
In the long run, according to Jaap’s experience, the development of new technologies passes through certain phases: 1. Height of expectations, 2. Depths of disillusion, 3. Plateau. The 2000s were the decade of TMs. TAUS began gathering TM data in the Cloud. The idea of the project was to build a human language genome analogous to the Human Genome Project. The disillusion had already begun, says Jaap van der Meer. In 2010 more words had been translated by machines than by humans. Translation has become an application. Translation engines exist as APIs, paid for by the sharing of data. (As a little sideswipe Jaap van der Meer mentions Slate Desktop here, the Cloud-independent exception, but more on that in the next post.) In the future: TAUS invites you to discuss the blog "The future does not need Translators". Everyone can voice their opinion. Jaap frankly shares his insights from a long career in language technology: Keep calm and don't get overly excited - technology is made by humans. He then returns to the impossibility of protecting data in the cloud.
And one last bit of advice for us: START MEASURING!
An apt comparison: Translation is like toilet paper - no one thinks about it until you need it.
Lori Thicke from Translators Without Borders opened the second day of the conference. Lori had always wanted to volunteer, but simply never got to it (like so many of us) until her translation agency got an assignment from Doctors Without Borders in 1993. Her chance had finally come and she offered the translation pro bono. Lori did her own research and realized the huge demand for translations for humanitarian aid.
An important issue in her speech was about mother tongues. If we only communicate in major European languages, we will only talk to ourselves. Lori talked about crisis situations, where critical information was delayed because it was not provided in the regional language. She explained the major advantage that people have when they can study in their first or second mother tongue, a situation many people in Europe totally take for granted during their education. For example, Norwegian, a language with 5 million speakers, has 450,000 Wikipedia pages full of free information. Hausa, with 50 million speakers, has only 1,300 pages listed in Wikipedia. Lori also commented on the problematic nature of volunteer work for translators, who remain invisible and unrecognized. And what is the situation of translators in the most required language combinations for humanitarian aid?
The conference offered many more interesting workshops and lectures. As a technical translator, my main focus was technology. I watched the Cat-Fight, joined the Lilt-Workshop, learned that you can train Dragon with target TMX, learned that Wordbee allows as many target language columns as you like, learned about the SCATE-project, was able to ask my questions about Matecat, made interesting contacts and met a lot of nice colleagues.
Next year I'll be back for sure!
Uta Schulz - www.usermanualtranslation.com
PS: This text was LILTed from German (500 words/h + proofreading).