Terminology

Things have names. That’s simple. Different things have different names, and different names belong to different things. That’s simple too. But this is not the reality. In reality, most words have more than one meaning, and most things have several different names. That’s why you have to specify what you want to call the things you work with. This is terminology, and it plays a fundamental role in localization.

Terminology has to be defined at the earliest possible stage of the localization project. It is very important to use the same terminology throughout your entire project and across all your projects. This way, your users can be sure that when you talk about a given Thing, you really mean that Thing, and not AnotherThing. This is true not only for the translation; indeed, consistent terminology begins with the source text. If you want to talk about Windows Group Policy, you have to call it group policy and not group regulations. If you want to talk about the Unix shell, you have to call it shell, not command line. Of course, you can decide to use the names group regulations and command line, but in this case, make your decision at the beginning of your project, stick to it, and use these names consistently.

The other side is that you have to give different things different names. The video card built into the computer and an external video device that is used separately but connected to the computer must have different names. You could call both of them video device, but when you instruct the user to turn on the video device, the user may desperately try to find the power button on the video card and end up damaging his/her computer.

How clearly different things have to be differentiated depends partly on the type of your text. In marketing material, where technical details receive less emphasis, keeping a clear terminology is less important. In some cases, if you need a nice but not so accurate text, you may get a better style if you don’t adhere dogmatically to your terminology. In an airplane manual, however, you can’t mess up names. I hope it goes without saying that hiring an author without good style and appropriate technical knowledge will degrade overall quality. He/she needs to know when to be accurate and when to be looser.

How to start?

If you already have some kind of terminology, you may want to keep it, use it and sometimes extend it. But if you don’t have one, it is very difficult to build a good terminology from scratch. As a good starting point, you can use the terminology developed by big firms; Microsoft’s terminology, for example, is freely available, and it covers many languages. I think it is a bad idea to start building your own, because it will be long and expensive work, and it will cause inconsistencies between your application, the operating system and the expressions already known to the user.

Suppose you are a software vendor in the USA, but want to localize your product into Hungarian. You need a person with good knowledge of Hungarian and a good professional background to build, check and maintain your terminology. This is a huge problem, because you don’t speak Hungarian, and you can’t judge the quality of the Hungarian translators’ work. The only solution is to find professionals with good references and solid knowledge, let them check each other’s work and expect high quality. Passing this work on to the sales assistant of your Hungarian office is definitely a bad idea.

Your terminology will be recorded in the form of a terminology database. There are different applications for this specific purpose, and CAT software may also contain some functions to handle terminology, but at the beginning, if you don’t need a professional solution, you can use a simple Excel table with source term – target language term pairs, maybe supplemented with some comments.
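To make the "simple table" idea concrete, here is a minimal sketch of such a table stored as CSV and loaded for lookups. The column names and the bracketed target terms are placeholders I made up for illustration; they are not real Hungarian translations and not the format of any particular tool.

```python
import csv
import io

# Hypothetical termbase; in practice this would be a CSV file saved from a spreadsheet.
# The target column holds placeholders, not real translations.
TERMBASE_CSV = """source,target,comment
group policy,[HU term for group policy],Windows feature; never group regulations
shell,[HU term for shell],Unix shell; never command line
"""

def load_termbase(text):
    """Return a dict mapping lower-cased source terms to (target, comment)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source"].lower(): (row["target"], row["comment"]) for row in reader}

def lookup(termbase, term):
    """Case-insensitive exact lookup; returns None when the term is not defined."""
    return termbase.get(term.lower())

termbase = load_termbase(TERMBASE_CSV)
print(lookup(termbase, "Group Policy"))
print(lookup(termbase, "command line"))
```

The second lookup returns nothing, which is the point: a term that is not in the table is a term nobody has decided on yet.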

An example of bad terminology: restore and recovery are both used in the source document, and they have different meanings. Restore means restoring a file or folder, while recovery means returning a computer from a complete failure to a working state. The person who builds the terminology doesn’t know that, and fails to notice that the terminology database contains both words. He/she assigns the same translation to both of them. Then, when the translator receives the sentence “Choose restore or recovery”, he/she has to write a translated sentence that sounds like “Choose restore or restore”. Otherwise, the translator will not adhere to the terminology, and you, when checking the translation, will think the translator did poor work.
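A problem like this can be caught mechanically before the terminology goes out. The following is only a sketch under my own assumptions: the term pairs are plain (source, target) tuples, and the targets are placeholders rather than real translations.

```python
from collections import defaultdict

def find_shared_targets(pairs):
    """Return target terms that are assigned to more than one distinct source term."""
    by_target = defaultdict(set)
    for source, target in pairs:
        by_target[target.strip().lower()].add(source.strip().lower())
    return {t: sorted(sources) for t, sources in by_target.items() if len(sources) > 1}

# Placeholder data: "target-a" stands for whatever single word was used for both terms.
pairs = [
    ("restore", "target-a"),
    ("recovery", "target-a"),
    ("backup", "target-b"),
]

print(find_shared_targets(pairs))
```

A hit is not always an error, since two source words may really be synonyms, but every hit deserves a look from someone who knows the product.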

Handling variances

There are further difficulties when handling your terminology. You may, for example, have an application available for different types of devices: desktop computers, some kind of portable machines with touch screens, and PDAs. When describing how to use the elements of the user interface, you may need to differentiate between these devices; the desktop user has to click the button, the portable machine’s user has to tap the screen, while the PDA’s owner needs to touch it.

Good terminology database software can handle searches based on regular expressions. This is especially important if the terminology database and the help or manual text aren’t fully consistent. For example, if the database contains “Downloading your e-mail”, while the text contains “Downloading your email” (without the hyphen in e-mail), then the translator won’t find the expression. Of course, few translators can write regular expressions (or will spend their time correcting errors coming from the source text), so the ultimate solution is to keep the source text free of these minor variations.
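To show what such a search could look like, here is a small sketch using Python’s standard `re` module. The helper name `variant_pattern` is my own; it simply treats hyphens and spaces inside a term as optional, which is one possible way to tolerate the e-mail/email variation.

```python
import re

def variant_pattern(term):
    """Build a regex in which hyphens and spaces inside the term are optional."""
    parts = re.split(r"[-\s]+", term.strip())
    return re.compile(r"[-\s]?".join(re.escape(p) for p in parts), re.IGNORECASE)

pattern = variant_pattern("Downloading your e-mail")

print(bool(pattern.search("Downloading your email takes a while.")))   # hyphen missing in the text
print(bool(pattern.search("Downloading your e-mail takes a while.")))  # exact spelling
print(bool(pattern.search("Uploading your email takes a while.")))     # a different expression
```

This hides the symptom from the translator, but it doesn’t cure the cause: the source text still contains two spellings.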

Going further: consistency

Maybe you have already noticed that we are talking about consistency. The very goal of building a terminology database is to keep the source and the target side consistent. Reaching this goal, however, does not end with building a database.

Translation memories are a great help in keeping target side consistency (the consistency of the translated text), because the translator can search for parts that have already been translated. But this search will be unsuccessful when the source text is inconsistent. Consistency here means that a given thing has to have the same name in every occurrence, and the user interface, the manual, the help files etc. all have to be consistent with each other.

Imagine that the user interface contains the word “Locked”, while the manual contains “Lock”. This can be a typing error or anything else. The user interface is localized first; then, while working on the manual, the translator looks up “Lock”. The search will be unsuccessful, and consistency will be gone.
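Some tools soften this problem with fuzzy matching. The sketch below uses `difflib.get_close_matches` from Python’s standard library as a stand-in for that idea; the memory is a plain dictionary with placeholder translations, and real CAT tools use their own matching algorithms.

```python
import difflib

# Hypothetical translation memory: source segment -> placeholder translation.
memory = {
    "Locked": "[translation of Locked]",
    "Open file": "[translation of Open file]",
}

def fuzzy_lookup(tm, query, cutoff=0.6):
    """Return (source, translation) pairs whose source is similar to the query."""
    matches = difflib.get_close_matches(query, list(tm), n=3, cutoff=cutoff)
    return [(m, tm[m]) for m in matches]

print(memory.get("Lock"))            # exact lookup finds nothing
print(fuzzy_lookup(memory, "Lock"))  # fuzzy lookup suggests "Locked"
```

The fuzzy match only gives the translator a hint. Whether “Lock” and “Locked” really mean the same thing in the product is still a question for the author of the source text.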

For large software vendors developing complex applications or application suites, it is especially difficult to maintain consistency between different applications, releases, modules, organizational units of developers etc. Further, in today’s integrated world, more interesting problems can arise. I’ll give two examples.

The first is a mobile phone which has a music player application installed. This application is made, branded and localized by another vendor. There’s some kind of business agreement between the vendors, so the application is installed on the phones in the factory to extend their multimedia capabilities. The user interface of the phone is localized into many languages. The phone maker decides to localize the music player application too. Three possibilities exist:

  • The application is localized separately, and will be inconsistent with both the phone’s own software and the multimedia vendor’s other applications. This is the worst case.
  • The application is localized based on the translation memory and terminology of its own developer, but possibly will be inconsistent with the phone’s software.
  • The application is localized based on the translation memory and terminology of the phone vendor, and it will be consistent with the phone’s software, but not with the software vendor’s other products. Maybe this is the best case.

The situation can be more complex when this multimedia application has another component for synchronization and album making that runs on the owner’s desktop computer. As an example, a few years ago I found some interesting inconsistencies between Microsoft products running on the desktop and on a mobile device, where sync, synchronization and copy were used interchangeably.

Windows Group Policy setting dialogue. Some parts are localized, some remained in English. Microsoft desktop operating systems are localized, while server operating systems aren’t.

The second example is a digital camera. It has a simple menu, with simple and short expressions. One of them is “Print number”. Localization work begins with the firmware. The translator receives this expression, thinks it means printing a serial number on the captured image, and translates it accordingly. Later, the work continues, and, while working on the manual, the translator realizes that “Print number” means “How many copies of this image do you want to print?”. This is because the camera is PictBridge-compliant and can be connected to some printers, and when it is, the user has to choose how many prints he/she wants. The translator realizes that the previous translation should be modified. Maybe he/she sends an e-mail to the client, but what if the camera’s firmware has been finalized in the meantime and is already used in production? Nobody will update the firmware to correct such a bug, and the final product will have a misleading text in its menu.

In this special case, the given expression is part of the PictBridge standard. It would be great if the owner of this standard paid a few dollars to localize these expressions, and granted the right to use the localized menu elements together with the standard. The camera maker could avoid errors, and the expressions could be kept consistent across all the products based on the standard, which would help end users further. It would be a win-win situation.

A more interesting problem arises when a device provides some extra functions, such as when a mobile phone can be controlled with voice commands. In this case, you need to localize the voice prompts, find someone (maybe an actor) to pronounce these commands, embed them into the firmware etc. This can be a separate branch of development, independent of the localization; however, it is important to put these expressions into the translation memory or the terminology database, because the manual or the help file may refer to them. If you decide not to localize voice commands, because it would require too much effort compared to the size of the localized market, then you should let the translator know that and instruct him/her to keep these expressions in the original language. (Of course, this is important information for the end user too.)

How to help the translator?

It is important to restate: maintaining the terminology database and the translation memory is not the translator’s task. The translator has to use these information sources, and should, of course, add new original-translation pairs to the translation memory, but approving which updates will be used in further work is not his/her task. You can outsource it to someone, even to the translator himself/herself, but this work won’t be part of the translation or localization work.

When giving out the localization work, you should attach enough background information, and it should be of good quality. Here I want to refer to the section “What to localize?” in my first post. As I mentioned, you should avoid sending internal information to the translator. This is not just about protecting information, but also about keeping the translation memory clean and free of irrelevant entries, which can mislead the translator.

Whether you keep a common translation memory and terminology database for all of your products or separate ones depends on the situation. If the products are completely separate, then you can start with a basic common memory and database, and keep them separate afterwards. But if your products are integrated to some degree, maybe it is better to integrate the translation resources too. When the translator works on your spreadsheet application, and the help file mentions your database engine, the translator will ask for the translation memory and the terminology of the database engine. Or, in the worst case, he/she won’t spend time asking for further resources, and will handle the situation on his/her own.

You should not mix up the functions of the translation memory and the terminology database. Basically, the memory contains original-translation pairs, which are usually complete sentences. The database, however, contains single words or short expressions, such as how the word “recover” or the expression “Open file” should be translated. If you inject all occurrences of “Open file” from the translation memory into the terminology database, then you’ll get a spreadsheet with 100,000 lines, and when the translator wants to find the exact translation of “Open file from disk”, he/she will get 560 matches. You can be sure he/she won’t check each of them, clicking the Find button 560 times.

Practical tips

You can further help the translator by organizing the localization project. Here are some ideas, which are especially useful when there’s no translation memory and terminology to build on, or when the localization project is small:

  • You can group all the expressions related to a given element. For example: when the user right-clicks a file, various commands can be reached, such as Open, Copy, Delete.
  • You can group all the expressions related to a given function. For example, Open file, Open picture, Open video etc.
  • You can group all the elements that can be reached by the end user from the same place. For example, your application has a dialogue, with some buttons, pop-up explanations etc. You can place the dialogue title, the button titles and the explanations in the same group before sending out the document for translation.
  • When localizing help files or documentation, especially when multiple translators are working together, it is wise to start with cross-references, for example with titles, which can be mentioned in many places.
  • Depending on the format of the source text, you can insert explanations, such as what the message means, where it will be displayed and so on. This way, the translation can be made based on the real meaning, not just the words.
  • When using abbreviations, you should include a comment explaining what each abbreviation means. Searching on the Internet, especially without context, is a very boring and difficult task.
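The grouping and commenting ideas above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the string ids of the form "element.part", the notes, and the plain-text output format.

```python
from collections import defaultdict

# Hypothetical export: id is "<element>.<part>", note is a comment for the translator.
STRINGS = [
    {"id": "file_menu.open", "text": "Open", "note": "Context menu shown on right-click"},
    {"id": "save_dialog.title", "text": "Save file", "note": "Dialogue title"},
    {"id": "file_menu.copy", "text": "Copy", "note": "Context menu shown on right-click"},
    {"id": "save_dialog.ok", "text": "Save", "note": "Button; keep it short"},
]

def group_by_element(entries):
    """Collect strings that belong to the same user interface element."""
    groups = defaultdict(list)
    for entry in entries:
        element = entry["id"].split(".", 1)[0]
        groups[element].append(entry)
    return dict(groups)

def render(groups):
    """Produce a plain-text hand-off where each group sits under its own heading."""
    lines = []
    for element, entries in groups.items():
        lines.append(f"# {element}")
        for entry in entries:
            lines.append(f"{entry['text']}    ({entry['note']})")
    return "\n".join(lines)

groups = group_by_element(STRINGS)
print(render(groups))
```

The translator now sees Open and Copy next to each other, with a note saying where they appear, instead of meeting them hundreds of lines apart.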

From this second post you now have a picture of how to lay the foundations of your localization project: it is all about the quality and consistency of the source text and well-used terminology. Naturally, there are many further ways to help the localization work, or to make it difficult. I’ll cover them in my next post.

Someone: So, what’s your work?
Me: I do translations.
Someone: Aha… So your English must be excellent.
Me: Well, not exactly…
Someone: How is it possible?
Me: I do software localization, which is more about background knowledge than excellent language knowledge.
Someone: Aha…

That conversation has happened many times. Why is it so interesting? Well, it isn’t, but it gives a taste of what people think and know about software localization. Sure, it’s some kind of translation, but it is one of its many specialized areas.

What am I talking about?

The input of the software localization process is a piece of software in a given language, usually English, and the output is software with the same functionality, but displaying its user interface in another language. The localization process can be made up of many stages, for example: choosing the languages you want to localize your application into, choosing the right tools for the process, exporting strings to an editable and translatable format, choosing a supplier who is able to do the translation work, then building, testing and finally distributing the localized version of the application. But I won’t cover the whole process; in fact, I won’t fully cover any part of it. Instead, I will give examples of how you can make a mess of the outcome, producing a localized application which irritates the end user, and in the end throwing the money you paid for the whole process out of the window.

What to localize?

Let’s start with the first question. What’s your goal? You need a localized application which can be sold to people who speak the given language. But wait a moment. Sure, you have an application, but what’s the product? Is it simply some kind of software, or are there help files that belong to it? Does it have an installation and administration manual? Does it have a box, a printed start-up poster, a “return to us” registration card? Does it have a support forum, a “Minimum requirements” page on your web site? Maybe all of these things make up your product, and you have to decide what to localize. It has to be a business decision, but it can be a root cause of annoying dependencies and problems. When making your decision, you have to take into account how many copies you plan to sell, the target audience of your application (for example, applications written for professionals do not have to be localized, because you can assume they have an appropriate level of English knowledge), costs, legal requirements (for example, a product cannot be sold without a manual written in the official language of the given country), deadlines etc.

Let’s suppose the decision is made: you want to localize the application itself, the help files and the accompanying documentation. This choice is not obvious; sometimes the software itself remains in English, but the manual has to be localized. This can be the case when you have an application which integrates tightly with some embedded operating system and uses its messages, but the operating system isn’t localized. Or you can localize the user guide, but not the administration guide written for the operators. Finally, in some cases, you can find localized applications without localized documentation.

There is a further marketing-related question: do you want to localize the product name itself? In most, but not all cases, the name remains in English, but sometimes a part of it is localized. For example, if it’s called ABCD Phone Manager, the “Phone Manager” part can be localized.

Tools

The localization work itself is done with software tools, which is why the process is often called CAT, Computer Aided Translation. Many tools exist; their developers have different points of view, and as such, their applications have different features. Some of them target software developers, and treat translators as a “must have” resource. These applications are nice tools for developers, because they require little conversion and editing work, but to some extent they are difficult to work with as a translator. At the other end of the scale you find tools made for general translation work. They are easier for the translator to use, but are unable to handle the formats used by developers. This gap is often narrowed by integration between tools; an obvious example of this is using the spell checker of Microsoft Word, which is available in many languages.

Most of the tools come in more than one version; for example, one for developers, one for localization teams and one for freelance translators working alone. The last one is often free, and for that reason usually only the version made for developers can be used to create new translation projects. The model is simple: you choose and buy the appropriate tool for you, you send the project file to the translator, and he/she can download the translator’s tool for free and do the work. This way, you don’t force the translator to invest in another tool only to work with you, spending all his/her income on something that will never be used for another client.

The basic functions provided by CAT tools to the translators are the following:

  • Text editing. That’s evident: the tool has to display the original text and has to provide some space to type in the translation. It is good to have some spell checking functionality.
  • Translation memory. It is a database, which stores original-translation pairs. It can be used to search for previously translated and repetitive sentences or parts of sentences.
  • Dictionary or terminology database. It can be used to store simple original-translation pairs of words or expressions.

Let’s see some examples. IBM Translation Manager (TM/2) is a very old tool. Previously, it was sold as a CAT product. Currently it is used as an internal tool for IBM.

IBM TM. Source of picture: http://www.vegadata.cz. Notice the dictionary at the lower third of the window.

Trados is an old player too. Its most important feature (to me…) is that it integrates with Microsoft Word, so the translator can use all the editing and spell checking features of Word while doing his/her work. Its developer was recently bought by another player of the CAT market, SDL.

Passolo, also owned by SDL, is a tool with more focus on the needs of software developers.

Passolo. Source of picture: http://www.multilingual.com. Notice how small the input field is, but on the left side, the translator gets the list of similar sentences.

Alchemy Catalyst is similar to Passolo from the translator’s point of view.

Microsoft has some tools too, like Helium and Localization Studio, with many developer-minded functions.

If we draw an axis running from “Specialized tools, made for developers, difficult to use” to “General tools, made for translators, easy to use”, maybe Trados is on the right side, and the others are, to a greater or lesser extent, on the other side. Thanks to the recent acquisitions, integration efforts and developments, they are sliding toward the right side in terms of usability.

Of course, you can find many other tools. It isn’t difficult to write an application with basic functionality, like having a field to type the translation in. The real art of CAT tool making begins with providing various filters for formats like .doc, .pdf, supporting the formats used by development tools, integrating with other applications and spell checkers, importing and exporting translation memories of other CAT software etc.

When deciding to localize your software product, you have to choose which tool to buy and use. Of course, there’s no one-size-fits-all solution, so you may choose two tools: a developer-friendly one to localize the user interface’s strings and a translator-friendly one to translate the documentation. Thanks to the integrations, they may use the same translation memory, which makes everyone’s work easier.

Input to the translator

Suppose you have the right tools, and you have to assemble the project you will send to the translator. As your software product is made up of the application itself and all the documentation, the project will also have at least two parts.

Look at your application. It displays a lot of strings: it has a window title, menu names, error messages, popup messages, dialog boxes, option buttons, links and so on. This is the first part. The other part of the project is the help file and the documentation.

You can differentiate these parts by the length of the strings found within: user interface elements are often made up of a single word or a few words. Error messages, links and some explanatory texts are whole sentences, at most. The documentation, however, has sentences which are parts of a continuous text. This distinction is important, because shorter parts have to be treated differently from continuous text. The latter is easy for the translator to understand, because it carries its own context; the former are difficult to work with, because they have no context, and in many cases it is difficult or impossible to figure out what these strings are talking about. These two parts require very different working styles.

Why? Because all the CAT tools work with segments. A segment is a translation unit. When working with continuous text, a segment can be a sentence, a title, a member of an enumeration etc. When working with user interface terms, a segment is a menu name, a window title, a message – simply anything displayed on the interface. Due to the nature of things, when translating a continuous text, the translator can see the previous (already translated) and the next sentence (segment). But when localizing user interface elements, he/she works with separate units, which, in many cases, are not related to each other, or simply aren’t following each other on the list of terms to translate.
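The difference between the two kinds of segments can be illustrated with a deliberately naive sketch. The splitting rule below (break after a period, exclamation mark or question mark followed by whitespace) is my own simplification; real CAT tools use more elaborate segmentation rules that cope with abbreviations, numbers and so on.

```python
import re

def split_segments(text):
    """Naively split continuous text into sentence-like segments."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def neighbours(segments, index):
    """Return the previous and next segment, i.e. the context a translator can see."""
    prev_seg = segments[index - 1] if index > 0 else None
    next_seg = segments[index + 1] if index + 1 < len(segments) else None
    return prev_seg, next_seg

manual = "Click Open. Choose a file! Is it the right one?"
segments = split_segments(manual)

print(segments)
print(neighbours(segments, 1))  # context around "Choose a file!"

# User interface strings are already separate units with no meaningful neighbours.
ui_strings = ["Open", "Print number", "Locked"]
```

In the manual, the segment “Choose a file!” sits between two sentences that explain it. In the list of user interface strings, “Print number” has no such help.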

Again: What to localize?

So far, the localization work itself seems simple: you need a translator, or a translation office which has translators as employees or subcontractors; you send them your localization project, and you get the localized one back. Development tools support this process: you can use them to export the appropriate strings and import the translated ones. If strings aren’t built into the executables, but attached as XML files, then your work is much simpler; you’ll have general binaries and separate language files. Whichever method you use, you’ll send and receive strings, no problem.
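As a rough illustration of the "separate language files" approach, here is a sketch that reads strings from an XML file and writes translated ones back, using Python’s standard `xml.etree.ElementTree`. The file layout (a `strings` root with `string` elements and `id` attributes) is a hypothetical format invented for this example, not the format of any specific development tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical language file; the element and attribute names are assumptions.
SOURCE_XML = """<strings lang="en">
  <string id="menu.open">Open file</string>
  <string id="menu.quit">Quit</string>
</strings>"""

def export_strings(xml_text):
    """Extract id -> text pairs that can be sent out for translation."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el.text or "" for el in root.iter("string")}

def import_translations(xml_text, translations, lang):
    """Return a new language file; strings without a translation keep the source text."""
    root = ET.fromstring(xml_text)
    root.set("lang", lang)
    for el in root.iter("string"):
        key = el.get("id")
        if key in translations:
            el.text = translations[key]
    return ET.tostring(root, encoding="unicode")

exported = export_strings(SOURCE_XML)
# Placeholder translation, not real Hungarian.
localized_xml = import_translations(SOURCE_XML, {"menu.open": "[HU] Open file"}, "hu")
print(exported)
print(localized_xml)
```

The binaries never change here; only the language file does, which is what makes this arrangement convenient.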

However, remember that you need a product which talks to the end user in his/her own language. Do you really need to send out for translation every exotic error message that is used only during testing and seen only by developers? Do comments made by developers, function names, module names etc. need to be translated? Do comments made by Jane, the reviewer of the manual, inserted in your Word document with a style “comment”, have to be localized? The answer is clearly no. First, they cause disturbances during the translation process. Second, you’ll pay for useless work. Third, sometimes you’ll give out internal information that should be kept inside your organization. Of course, if you want to begin localization work before releasing the final product, you can’t avoid sending out some beta materials, but in every case you have to check what you hand out to outsiders.
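If your strings carry some marker of who is meant to see them, this filtering can be automated. The sketch below assumes a hypothetical `audience` field on each string; your own tools may record this differently, or not at all, in which case someone has to tag the strings first.

```python
# Hypothetical string catalogue; the "audience" field is an assumed convention.
CATALOGUE = [
    {"id": "err.disk_full", "text": "The disk is full.", "audience": "end_user"},
    {"id": "dbg.heap_dump", "text": "Heap dump written to %s", "audience": "developer"},
    {"id": "review.jane_1", "text": "Jane: rephrase this paragraph?", "audience": "reviewer"},
    {"id": "menu.open", "text": "Open file", "audience": "end_user"},
]

def select_for_translation(entries):
    """Keep only non-empty strings that the end user can actually see."""
    return [e for e in entries if e.get("audience") == "end_user" and e["text"].strip()]

outgoing = select_for_translation(CATALOGUE)
print([e["id"] for e in outgoing])
```

Strings without the marker are left out on purpose: it is cheaper to add a forgotten string later than to pay for, and leak, a string nobody should have seen.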

Summary

At this stage you know the aim of your localization project, you have some idea of the translator’s work, and you have chosen the appropriate CAT tool. Next time I’ll talk about the terminology database and keeping consistency during your localization project.