Localization - Everything2.com

Localization, generally speaking, is the process by which a team of application builders creates different SKUs of a software application for sale outside its native country. The process is often much more than translating a program's strings, bitmaps, and external resources into another language; many issues come up when developing an application that you want to take global. Some of these concerns will eventually be out of date, but as of 2001 they have been valid for at least the last five years of international application development.

By country, here are some of the items you can expect to deal with on the programmatic side alone:

Japan
Japan typically has a number of issues associated with it.
- A large one is that Japanese does not use the same character set as English. This means dealing with multi-byte strings, typically in Unicode. Mixed-mode strings (double-byte English, single-byte English, and double-byte kanji in the same string) are complicated and can be hell to display. The fonts are also on the order of three times as large to store in memory, so your application is going to use considerably more memory (although thankfully Japanese users are aware of this).
- Japanese mixes several scripts: kanji plus the two kana syllabaries, hiragana and katakana. (Furigana is not a separate script; it is the small kana printed beside kanji as a reading aid.) An application that accepts character input may need to handle all of them.
- Ruby notation, a series of small annotations placed next to a piece of text for pronunciation purposes, also needs to be dealt with.
- Applications typically need to be hooked up to an OS-provided, interface-level service known as an IME, or Input Method Editor. This frees the application from the keystroke parsing needed to come up with the correct characters.
- When developing an application that handles text, keep in mind that Japanese text is very commonly set vertically.
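
To make the mixed-mode string problem concrete, here is a minimal Python sketch. The sample string and the helper names `encoded_sizes` and `display_columns` are made up for illustration; the encodings and the `unicodedata.east_asian_width` call are standard-library features. It shows that one short string has a different length in characters, in bytes (depending on the encoding), and in display columns.

```python
import unicodedata

# Hypothetical sample: half-width "ABC", full-width "ABC", then two kanji.
sample = "ABC" + "\uff21\uff22\uff23" + "\u6f22\u5b57"

def encoded_sizes(text):
    """Return the byte length of text in a few encodings used for Japanese."""
    return {name: len(text.encode(name))
            for name in ("shift_jis", "utf-8", "utf-16-le")}

def display_columns(text):
    """Rough column count: full-width and wide characters take two columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
               for ch in text)

sizes = encoded_sizes(sample)
print(len(sample), sizes, display_columns(sample))
```

The column count is only an approximation for monospaced layouts; real text rendering should be left to the OS text services.
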
China
China shares many of the same issues as Japan. The characters are very similar in style (even though the language is different).
Spain/Spanish
Spanish is a fairly easy translation for applications. Mostly it is just string replacement.
France
In France, the numeric separators are different. Where Americans write one million as "1,000,000", the French group thousands with a space ("1 000 000"), and several other European countries use a period ("1.000.000"). The decimal separator is a comma rather than a period (America: "1.5"; France: "1,5" for one and a half). Other than that, it is just a matter of making sure your program can handle accents in both directions (acute and grave) and the few other alternate characters posed by the language. All in all, not a very difficult language to adapt an application to.
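
As a rough sketch of separator handling, the Python below formats a number with per-locale grouping and decimal separators. The `SEPARATORS` table and `format_number` are hypothetical names for this example, and the table is an assumption (French style groups thousands with a space, shown here as a narrow no-break space; a period is common in German). A real application would take these values from the OS or from locale data rather than hard-coding them.

```python
# Hypothetical table: locale name -> (grouping separator, decimal separator).
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
    "fr_FR": ("\u202f", ","),  # assumed: narrow no-break space for grouping
}

def format_number(value, locale_name, decimals=2):
    """Format value US-style first, then swap in the locale's separators."""
    group_sep, decimal_sep = SEPARATORS[locale_name]
    us_text = f"{value:,.{decimals}f}"
    # Use a placeholder so the two replacements do not clobber each other.
    return (us_text.replace(",", "\0")
                   .replace(".", decimal_sep)
                   .replace("\0", group_sep))

print(format_number(1234567.5, "en_US", 1))
print(format_number(1234567.5, "de_DE", 1))
print(format_number(1234567.5, "fr_FR", 1))
```
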
Great Britain / Australia
You'd think this would be just a "ship as is", but that is deceptive.
- In Great Britain and Australia, the day and month in dates are reversed. In America, September 11th is written 9/11/2001. On the other side of the pond, it is written 11/9/2001 (which does not refer to November 9th). This is a large problem for default dates and the way things are displayed, especially when denoting things in the future. There is an international date/time standard, ISO 8601 (the W3C publishes a profile of it), but it is not followed as regularly as the formats people are used to.
- The Metric System. While this usually doesn't come up in an application, keep in mind that the United States is the only major country that doesn't use the metric system. Thus conversions from inches to cm, pounds to kg, and the like should be on hand when displaying values.
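
The date ambiguity and the unit conversions can be sketched in a few lines of Python. The function names and the "US"/"UK"/"ISO" style labels are invented for this example; `datetime.strptime` and `date.isoformat` are standard-library calls, and the conversion factors are the exact definitions of the inch and the pound.

```python
from datetime import date, datetime

def format_date(d, style):
    """Render a date month-first (US), day-first (UK), or as ISO 8601."""
    if style == "US":
        return f"{d.month}/{d.day}/{d.year}"
    if style == "UK":
        return f"{d.day}/{d.month}/{d.year}"
    if style == "ISO":
        return d.isoformat()
    raise ValueError(f"unknown style: {style}")

def parse_date(text, day_first):
    """Parse a slash-separated date; the caller must say which order it is in."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()

CM_PER_INCH = 2.54          # exact by definition
KG_PER_POUND = 0.45359237   # exact by definition

def inches_to_cm(inches):
    return inches * CM_PER_INCH

def pounds_to_kg(pounds):
    return pounds * KG_PER_POUND

sept_11 = date(2001, 9, 11)
print(format_date(sept_11, "US"), format_date(sept_11, "UK"),
      format_date(sept_11, "ISO"))
```

Note that the same string parses to two different dates depending on `day_first`, which is why storing and exchanging dates in the ISO form and converting only for display tends to be the safer design.
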
Greece
Greek uses its own alphabet rather than Latin characters.
- Alternate characters are a large problem in Greek. A large concern is that, as of Mac OS X, Greek is a produced SKU but is not officially supported. Thus an application that you choose to release to the Greek market can (currently) only be released on Windows. This was a decision Apple made for you.
Hebrew
From personal experience, I can tell you that a Hebrew application is the worst to work with, code-wise.
- Hebrew reads right-to-left. This is hell on any coding system, because all of our ideas for parsing and display now have to run backwards. Many OSes provide sneaky tricks to help with coding these sorts of applications (such as text boxes that you can switch to Hebrew if you'd like). Whether or not to support this language is a huge decision; usually only major apps undertake this porting step.
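
One small piece of right-to-left handling is guessing the base direction of a string. The Python sketch below uses a simplified "first strong character" heuristic built on the standard `unicodedata.bidirectional` call; `base_direction` is a name made up for this example, and this is only a fragment of what a full bidirectional text algorithm does, not a substitute for the OS text services.

```python
import unicodedata

# Bidirectional classes that mark strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}

def base_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # assumed default when no strong character is found

hebrew = "\u05e9\u05dc\u05d5\u05dd"   # a Hebrew word
arabic = "\u0633\u0644\u0627\u0645"   # an Arabic word
print(base_direction(hebrew), base_direction(arabic), base_direction("Hello"))
```
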
Russian
Mainly this is another set of font formats. The input isn't terribly bad, although Cyrillic is a little hard to work with.
Arabic
While I haven't worked much with these applications, there are miles of political implications on top of the Arabic character set, which is typically considered very hard to work with.
All of Europe
Even in US applications you have one problem in particular...
- The Euro character is a problem, because it isn't included in a lot of standard character sets. There are several hacks for displaying and printing it (including font replacement, sending bitmaps, etc.). Unicode systems have a character for it, but trying to use one character that is not in your code page often leads programmers to try some shady things.
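
The code page problem is easy to demonstrate. This Python sketch tries to encode the Euro sign in a handful of encodings; `euro_bytes` is a hypothetical helper, while the codec names are standard Python codecs. The sign has no slot in ASCII or ISO 8859-1 (Latin-1), but does in ISO 8859-15, Windows-1252, and UTF-8.

```python
EURO = "\u20ac"  # the Euro sign

def euro_bytes(encoding):
    """Return the Euro sign's bytes in the given encoding, or None if absent."""
    try:
        return EURO.encode(encoding)
    except UnicodeEncodeError:
        return None

for name in ("ascii", "latin-1", "iso8859_15", "cp1252", "utf-8"):
    print(name, euro_bytes(name))
```
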

Other languages you will probably end up worrying about include the myriad Indian languages (difficult), Italian, Korean (similar to Chinese and Japanese), Polish (again, tough due to non-standard character sets), Swedish, German (which has a number of special characters but is, technically speaking, a fairly easy language to port to), and Portuguese, just to name a few.

Along with the many technical issues, there are some serious political issues (especially in the Middle East and Arabic-speaking countries) when releasing a program to other countries.

Be careful to research the proper name of the place to which you are releasing software. If you choose a name for a country, and it happens to be the name given by a faction that is not the ruling party, you could be seen as siding with rebels or dissidents. This is something you want to avoid, and it is a political reality. In one international incident, several Microsoft sales representatives were detained because of some problems in Office.
Images (clip art and logos) need to be researched. For instance, while an owl in our country (the US) is a symbol of wisdom, in many others it is considered a dumb and petulant bird, and it would be an insult to associate an owl with any sort of education. Certain colors, whether something is numbered even or odd, and many personal gestures can take on alternate cultural meanings in other places.

Another large problem lands in the hands of the QA engineer. Systems and software get cross-pollinated a great deal in European countries. Often the US English version of a system is the most stable, so people will use that as their primary OS but run a French or even Japanese application (with something like Apple's JLK) on it. The amount of configuration testing needed to assure that a large-scale application will work in the greatest number of situations is immense. You can't account for all situations, but trying to hit the most likely ones is possible. Have a large variety of OS languages on hand, as each has its own sins.

The main lesson to be learned here is to do your homework. Countries change, but there will be certain technical and political implications in place for many years to come in the software development industry.

Localization is sometimes abbreviated l10n: an "l", then ten letters, then an "n".
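
The abbreviation rule is simple enough to write down. In this Python sketch, `numeronym` is a name made up for the example; it keeps the first and last letters and replaces everything in between with a count.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing in the middle to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("localization"), numeronym("internationalization"))
```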
