Terminology

Things have names. That’s simple. Different things have different names, and different names belong to different things. That’s simple too. But this is not the reality. In reality, most words have more than one meaning, and most things have several different names. That’s why you have to specify what you want to call the things you work with. This is terminology, and it plays a fundamental role in localization.

Terminology has to be defined at the earliest possible stage of the localization project. It is very important to use the same terminology throughout your entire project and across all your projects. This way, your users can be sure that when you talk about a given Thing, you really mean that Thing, and not AnotherThing. This is true not only for the translation; keeping to your terminology begins with the source text. If you want to talk about Windows group policy, you have to call it group policy and not group regulations. If you want to talk about the Unix shell, you have to call it shell, not command line. Of course, you can decide to use the names group regulations and command line, but in that case, make your decision at the beginning of your project, stick to it, and use these names consistently.

The other side is that you have to give different things different names. The video card built into the computer and an external device that is used separately but connected to the computer must have different names. You could call both of them video device, but when you instruct the user to turn on the video device, the user may desperately try to find the power button on the video card, and end up damaging his/her computer.

How clearly different things have to be distinguished depends partly on the type of your text. In marketing material, where technical details receive less emphasis, keeping a clear terminology is less important. In some cases, if you need a pleasant but less precise text, you may get a better style by not adhering dogmatically to your terminology. In an airplane manual, however, you can’t mix up names. I hope it goes without saying that hiring an author without good style and appropriate technical knowledge will degrade overall quality. He/she needs to know when to be accurate and when to be more relaxed.

How to start?

If you already have some kind of terminology, you may want to keep it, use it and sometimes extend it. But if you don’t have one, it is very difficult to build a good terminology from scratch. As a good starting point, you can use the terminology developed by big firms; Microsoft’s terminology, for example, is freely available, and it covers many languages. I think it is a bad idea to start building your own from nothing, because it will be long and expensive work, and it will cause inconsistencies between your application, the operating system and the expressions the user already knows.

Suppose you are a software vendor in the USA, but want to localize your product into Hungarian. You need a person with a good knowledge of Hungarian and a good professional background to build, check and maintain your terminology. This is a huge problem, because you don’t speak Hungarian, and you can’t judge the quality of the work of Hungarian translators. The only solution is to find professionals with good references and solid knowledge, let them check each other’s work and expect high quality. Passing this work on to the sales assistant of your Hungarian office is definitely a bad idea.

Your terminology will be recorded in the form of a terminology database. There are different applications for this specific purpose, and CAT software may also contain some functions to handle terminology, but at the beginning, if you don’t need a professional solution, you can use a simple Excel table with source term – target language term pairs, maybe supplemented with some comments.
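To show how little is needed at the beginning, here is a minimal sketch that reads such a source term / target term / comment table and looks up an entry. The column names, the `load_termbase` helper and the sample Hungarian entries are assumptions made for this illustration; a table exported from a spreadsheet as CSV would have the same shape.

```python
import csv
import io

# Hypothetical termbase content; in practice this would be a CSV export
# of the spreadsheet described above.
TERMBASE_CSV = """source,target,comment
file,fájl,
folder,mappa,Windows term; not 'directory'
"""

def load_termbase(text):
    """Return a dict mapping lower-cased source terms to (target, comment)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source"].lower(): (row["target"], row["comment"]) for row in reader}

termbase = load_termbase(TERMBASE_CSV)
print(termbase["folder"])
```

Lower-casing the source term on load is a design choice of this sketch; a real termbase may need to keep case where it carries meaning.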

An example of bad terminology: restore and recovery are both used in the source document, and they have different meanings. Restore means restoring a file or folder, while recovery means returning a computer from a complete failure to a working state. The person who builds the terminology doesn’t know that, and fails to notice that the terminology database contains both words. He/she assigns the same translation to both of them. Then, when the translator receives the sentence “Choose restore or recovery”, he/she has to write a translated sentence that reads like “Choose restore or restore”. Otherwise, the translator will not adhere to the terminology, and you, when checking the translation, will think the translator did a poor job.
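A mistake like this can be flagged mechanically before the terminology reaches the translator. The sketch below, with a hypothetical `find_shared_targets` helper and illustrative sample entries, lists every target term that has been assigned to more than one source term.

```python
from collections import defaultdict

def find_shared_targets(termbase):
    """Group source terms by target term and return targets used more than once."""
    by_target = defaultdict(list)
    for source, target in termbase.items():
        by_target[target].append(source)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}

# Hypothetical entries reproducing the restore/recovery mistake.
termbase = {
    "restore": "visszaállítás",
    "recovery": "visszaállítás",
    "file": "fájl",
}
collisions = find_shared_targets(termbase)
print(collisions)
```

A shared translation is not always an error, since two source terms can be true synonyms, so the result is best treated as a list for a human to review.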

Handling variances

There are further difficulties in handling your terminology. You may, for example, have an application available for different types of devices: desktop computers, some kind of portable machine with a touch screen, and PDAs. When talking about using the elements of the user interface, you may need to differentiate between these devices: the desktop user has to click the button, the portable machine’s user has to tap the screen, while the PDA’s owner needs to touch it.

Good terminology database software can handle searches based on regular expressions. This is especially important if the terminology database and the help or manual text aren’t fully consistent. For example, if the database contains “Downloading your e-mail”, while the text contains “Downloading your email” (without the hyphen in e-mail), then the translator won’t find the expression. Of course, few translators can write regular expressions (or will spend their time correcting errors that come from the source text), so the ultimate solution is to keep the source text free of these minor variations.
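For the e-mail example, such a search could look like the following sketch. The `variant_pattern` helper is hypothetical; it turns a database entry into a regular expression that ignores case, treats every hyphen in the entry as optional, and accepts any run of whitespace between words.

```python
import re

def variant_pattern(term):
    """Build a regex for `term` with optional hyphens and flexible whitespace."""
    words = []
    for word in term.split():
        # Escape each piece between hyphens, then rejoin with an optional hyphen.
        words.append("-?".join(re.escape(piece) for piece in word.split("-")))
    return re.compile(r"\s+".join(words), re.IGNORECASE)

pattern = variant_pattern("Downloading your e-mail")
found = pattern.search("See the chapter Downloading your email for details.")
print(found.group(0) if found else "no match")
```

Note that this only works in one direction: it finds “email” when the database says “e-mail”, but not the other way round, which is one more reason to fix the variation in the source text instead.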

Going further: consistency

You may have noticed that we are really talking about consistency. The very goal of building a terminology database is to keep the source and target sides consistent. Achieving this, however, does not end with building a database.

Translation memories are a great help in keeping target side consistency (the consistency of the translated text), because the translator can search for parts that have already been translated. But this search will be unsuccessful when the source text is inconsistent. Consistency here means that a given thing has to have the same name in every occurrence, and the user interface, the manual, the help files etc. all have to be consistent.

Imagine that the user interface contains the word “Locked”, while the manual contains “Lock”. This can be a typing error or anything else. The user interface is localized first; then, while working on the manual, the translator looks up “Lock”. The search will be unsuccessful, and consistency will be gone.
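A tool can soften this problem by falling back to an approximate search when the exact one fails. The sketch below uses `difflib` from the Python standard library; the `lookup` helper, the cutoff value and the sample memory entries are assumptions chosen for illustration.

```python
import difflib

def lookup(term, memory, cutoff=0.6):
    """Return an exact hit if present, otherwise the closest source strings."""
    if term in memory:
        return [term]
    return difflib.get_close_matches(term, list(memory), n=3, cutoff=cutoff)

# Hypothetical memory built from the user interface strings.
ui_memory = {"Locked": "Zárolva", "Open file": "Fájl megnyitása"}
matches = lookup("Lock", ui_memory)
print(matches)
```

A fuzzy hit is only a hint that the translator still has to judge, so it does not replace a consistent source text.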

For large software vendors developing complex applications or application suites, it is especially difficult to maintain consistency between different applications, releases, modules, organizational units of developers etc. Further, in today’s integrated world, more interesting problems can arise. I’ll give two examples.

The first is a mobile phone, which has a music player application installed. This application is made, branded and localized by another vendor. There’s some kind of business agreement between the vendors, so the application is installed on the phones in the factory to extend their multimedia capabilities. The user interface of the phone is localized into many languages. The phone maker decides to localize the music player application too. Three possibilities exist:

  • The application is localized separately, and will be inconsistent with both the phone’s own software and the multimedia vendor’s other applications. This is the worst case.
  • The application is localized based on the translation memory and terminology of its own developer, but may be inconsistent with the phone’s software.
  • The application is localized based on the translation memory and terminology of the phone vendor, and it will be consistent with the phone’s software, but not with the software vendor’s other products. Maybe this is the best case.

The situation can be more complex when this multimedia application has another component, for synchronization and album-making, running on the owner’s desktop computer. As an example, a few years ago I found some interesting inconsistencies between Microsoft products running on the desktop and on a mobile device, where sync, synchronization and copy were used interchangeably.

Windows Group Policy setting dialogue. Some parts are localized, some remain in English. Microsoft desktop operating systems are localized, while server operating systems aren’t.

The second example is a digital camera. It has a simple menu, with simple and short expressions. One of them is “Print number”. Localization work begins with the firmware. The translator receives this expression, thinks it means printing a serial number on the captured image, and translates it accordingly. Later, the work continues, and, while working on the manual, the translator realizes that “Print number” means “How many copies of this image do you want to print?”. This is because the camera is PictBridge-compliant and can be connected to certain printers, and when printing, the user has to choose how many prints he/she wants. The translator realizes that the previous translation should be modified. Maybe he/she sends an e-mail to the client, but what if the camera’s firmware has been finalized in the meantime and is already used in production? Nobody will update the firmware to correct such a bug, and the final product will have a misleading text in its menu.

In this special case, the given expression is part of the PictBridge standard. It would be great if the owner of this standard paid a few dollars to localize these expressions, and granted the right to use the localized menu elements along with the standard. The camera maker could avoid errors, and the expressions could be kept consistent across all the products based on the standard, giving further help to end users. It would be a win-win situation.

A more interesting problem arises when a device provides some extra functions, such as when a mobile phone can be controlled with voice commands. In this case, you need to localize the voice prompts, find someone (maybe an actor) to pronounce these commands, embed them into the firmware etc. This can be a separate branch of development, independent of the localization; however, it is important to put these expressions into the translation memory or the terminology database, because the manual or the help file may refer to them. If you decide not to localize voice commands, because it would require too much effort compared to the size of the localized market, then you should let the translator know that and instruct him/her to keep these expressions in the original language. (Of course, this is important information for the end user too.)

How to help the translator?

It is important to restate: maintaining the terminology database and the translation memory is not the translator’s task. The translator has to use these information sources, and should, of course, add new original-translation pairs to the translation memory, but approving which updates are used in later work is not his/her task. You can outsource it to someone, even to the translator himself/herself, but this work won’t be part of the translation or localization work.

When handing out the localization work, you should attach enough background information, and that information should be of good quality. Here I want to refer to the section “What to localize?” in my first post. As I mentioned there, you should avoid sending internal information to the translator. This is not only to protect information, but also to keep the translation memory clean and free of irrelevant entries, which can mislead the translator.

Whether to keep a common translation memory and terminology database for all of your products or to separate them depends on the situation. If the products are completely separate, you can start with a basic common memory and database, and then keep them separate. But if your products are integrated to some degree, it may be better to integrate the translation resources too. When the translator works on your spreadsheet application, and the help file mentions your database engine, the translator will ask for the translation memory and the terminology of the database engine. Or, in the worst case, he/she won’t spend time asking for further resources, and will handle the situation on his/her own.

You should not mix up the functions of the translation memory and the terminology database. Basically, the memory contains original-translation pairs, which are usually complete sentences. The database, however, contains single words or short expressions, such as how the word “recover” or the expression “Open file” should be translated. If you inject all occurrences of “Open file” from the translation memory into the terminology database, you’ll get a spreadsheet with 100,000 lines, and when the translator wants to find the exact translation of “Open file from disk”, he/she will get 560 matches. You can be sure he/she won’t check each of them by clicking the Find button 560 times.
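The difference between the two resources can be sketched as two separate lookups: the memory is searched for the whole segment, while the terminology database is searched for the short terms that occur inside the segment. The data and the `terms_in_segment` helper below are hypothetical and only illustrate the split.

```python
import re

# Translation memory: whole segments (hypothetical sample).
translation_memory = {
    "Open file from disk": "Fájl megnyitása lemezről",
}

# Terminology database: single words or short expressions (hypothetical sample).
termbase = {
    "open file": "fájl megnyitása",
    "disk": "lemez",
    "recover": "helyreállítás",
}

def terms_in_segment(segment, termbase):
    """Return the termbase entries that occur as whole words in the segment."""
    hits = {}
    for source, target in termbase.items():
        if re.search(r"\b" + re.escape(source) + r"\b", segment, re.IGNORECASE):
            hits[source] = target
    return hits

segment = "Open file from disk"
exact = translation_memory.get(segment)      # whole-sentence match, if any
hits = terms_in_segment(segment, termbase)   # the few terms that apply
print(exact, hits)
```

Kept this way, the terminology database stays short, and the translator gets a handful of relevant terms instead of hundreds of sentence matches.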

Practical tips

You can further help the translator by organizing the localization project. Just some ideas, which can be used, especially when there’s no translation memory and terminology to build on, or the localization project is small:

  • You can group all the expressions related to a given element. For example: when the user right-clicks a file, various commands can be reached, such as Open, Copy, Delete.
  • You can group all the expressions related to a given function. For example, Open file, Open picture, Open video etc.
  • You can group all the elements that can be reached by the end user from the same place. For example, your application has a dialogue, with some buttons, pop-up explanations etc. You can place the dialogue title, the button titles and the explanations in the same group before sending out the document for translation.
  • When localizing help files or documentation, especially when multiple translators are working together, it is wise to start with cross-references, for example with titles, which can be mentioned in many places.
  • Depending on the format of the source text, you can insert explanations, such as what the message means, where it will be displayed and so on. This way, the translation can be made based on the real meaning, not just the words.
  • When using abbreviations, you should include a comment explaining what the abbreviation means. Searching on the Internet, especially without context, is a very tedious and difficult task.
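Several of these tips (grouping, explanations, abbreviation comments) can be combined in the file you send out. The sketch below is one possible layout, not a standard format: the `SourceString` class, the group names and the sample strings are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SourceString:
    text: str
    context: str = ""  # where the string appears and what it means
    abbreviations: dict = field(default_factory=dict)  # abbreviation -> meaning

# Hypothetical groups: strings reached from the same place stay together.
groups = {
    "File context menu": [
        SourceString("Open", "Command shown when the user right-clicks a file"),
        SourceString("Copy", "Command shown when the user right-clicks a file"),
        SourceString("Delete", "Command shown when the user right-clicks a file"),
    ],
    "Print dialogue": [
        SourceString("Print number", "Number of copies to print, not a serial number"),
        SourceString("DPI", "Print resolution setting", {"DPI": "dots per inch"}),
    ],
}

def render_for_translator(groups):
    """Produce a plain-text listing with group headers and context comments."""
    lines = []
    for name, strings in groups.items():
        lines.append(f"[{name}]")
        for s in strings:
            lines.append(f"{s.text}  # {s.context}")
            for abbr, meaning in s.abbreviations.items():
                lines.append(f"    {abbr} = {meaning}")
    return "\n".join(lines)

listing = render_for_translator(groups)
print(listing)
```

With a note like the one attached to “Print number”, the translator can work from the real meaning of the string instead of guessing from two words.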

This second post should have given you a picture of how to lay the foundations of your localization project: it is all about the source text’s quality, consistency and well-used terminology. Naturally, there are many further ways to help the localization work, or to make it difficult. I’ll cover them in my next post.