4-Minute Read

Linux Locales

Locale is one of the most often ignored aspects of a system, especially if you’re American. On most systems, the locale is chosen during installation. But on distributions like Arch Linux, you need to configure the locale yourself. There are two ways to set it. The first is described on the Arch Linux wiki: edit /etc/locale.gen to enable the locale, run locale-gen to generate it, and finally set $LANG to the locale you want. The second is to use your DE; take KDE for example. You can set the locale in System Settings under Languages or Formats, and unlike locale-gen, this doesn’t need root permission.

Surprisingly, it actually works! I always set the time format in Formats to British while keeping everything else American, and the time format on the system panel did change to British after that. However, after rebooting, some terminal applications complain about invalid locales. Mosh even refuses to work! In situations like this I just set $LC_ALL to C. But clearly something was wrong.
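To make the first method concrete, here is a minimal Python sketch of the "enable the locale" step. It assumes the usual layout of /etc/locale.gen, where every entry is a commented line such as `#en_GB.UTF-8 UTF-8`. The function name enable_locale is made up for this example, and it only transforms text: writing the real file and running locale-gen afterwards still needs root.

```python
import re


def enable_locale(locale_gen_text: str, name: str) -> str:
    """Return locale.gen content with the entry for `name` uncommented.

    Assumes entries look like "#en_GB.UTF-8 UTF-8" (locale name, then charset).
    """
    pattern = re.compile(r"^#\s*(" + re.escape(name) + r"\s+\S+)\s*$")
    out = []
    for line in locale_gen_text.splitlines():
        match = pattern.match(line)
        # Keep every other line untouched, including ordinary comments.
        out.append(match.group(1) if match else line)
    return "\n".join(out) + "\n"


sample = "#en_GB.UTF-8 UTF-8\n#en_US.UTF-8 UTF-8\n"
print(enable_locale(sample, "en_GB.UTF-8"), end="")
```

Only the requested entry loses its leading `#`; everything else is passed through unchanged.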

It wasn’t until recently that I figured out the cause of this weird behavior. I’m working on porting the old Formats system settings module from Qt Widgets to QML so Plasma Mobile can use it. While doing so I did some research into the matter, and it turns out there are two locale systems on most modern Linux desktop environments. One is the “traditional” GNU locale, which is shipped with glibc and can be used with GNU gettext. The other is ICU, which is pretty much everywhere: Qt uses it, browsers use it, and Java uses it too. But the two aren’t compatible.

Differences between glibc locales and ICU

It turns out the locale the Arch Linux wiki describes is the glibc locale. Glibc locales aren’t all enabled by default. On some distributions like Debian, you need to install a language package to get them. On Arch Linux, the locale data is installed by default, but to use a specific locale you still have to generate it. Generation compiles the locale description files into a binary form that glibc can load, and the result is written to a system directory, which is why you need root permission. With ICU, all locales are enabled by default: an application can use any of them as long as it is linked against ICU. Another difference is that glibc and ICU support different sets of locales. ICU supports more, but it doesn’t support charsets other than UTF-8. Glibc doesn’t have as many locales as ICU, but it does support the charsets that a locale used before the Unicode era.
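The charset difference shows up in the locale names themselves. A glibc locale name has the shape language[_territory][.codeset][@modifier], so en_US.UTF-8 and en_US.ISO-8859-1 are two separate glibc locales for the same language and territory. The sketch below splits such a name into its parts; LocaleName, parse_locale_name and without_codeset are names invented for this example, not part of any library.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name: str) -> LocaleName:
    """Split language[_territory][.codeset][@modifier] into its parts."""
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return LocaleName(language, territory or None, codeset or None, modifier or None)


def without_codeset(name: str) -> str:
    """Drop the codeset, keeping the language, territory and modifier."""
    parts = parse_locale_name(name)
    result = parts.language
    if parts.territory:
        result += "_" + parts.territory
    if parts.modifier:
        result += "@" + parts.modifier
    return result


print(parse_locale_name("ca_ES.UTF-8@valencia"))
print(without_codeset("de_DE.ISO-8859-1"))
```

Dropping the codeset is only a rough first step towards an ICU-style identifier. Modifiers such as @latin have no one-to-one textual equivalent, so this sketch leaves them as they are.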

The conflicts

The most cursed thing about ICU and glibc locales is that both are set by the SAME environment variables! That means when I set $LANG to en_US.UTF-8, both applications that use GNU gettext and applications that use ICU will try to use the en_US.UTF-8 locale. That’s totally fine for ICU, but your system may not have the glibc locale generated, or even installed, and then gettext applications start to complain.
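Here is a small Python sketch of how those shared variables are resolved and how a program can tell whether the C library actually has the requested locale. The lookup order (LC_ALL, then the category variable such as LC_TIME, then LANG, with empty values treated as unset) is the POSIX one. The helper names effective_locale and glibc_has_locale are invented for this example. The availability check asks whatever C library Python runs on, so the result is only meaningful on a glibc system.

```python
import locale
import os


def effective_locale(env, category: str) -> str:
    """POSIX lookup order: LC_ALL, then LC_<category>, then LANG, else "C"."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:  # unset and empty both fall through to the next variable
            return value
    return "C"


def glibc_has_locale(name: str) -> bool:
    """Try to switch to `name`, then restore whatever was active before."""
    saved = locale.setlocale(locale.LC_ALL)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    wanted = effective_locale(os.environ, "LC_TIME")
    status = "usable" if glibc_has_locale(wanted) else "missing in the C library"
    print(wanted, "is", status)
```

This also shows why setting $LC_ALL to C quiets the complaints: it overrides every other variable, and the C locale is always available.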

The difficulties

As a DE, KDE Plasma needs to support both locale systems in System Settings. We can’t support only ICU or only glibc, because that would mean half of the applications work totally fine while the other half are in total chaos. So we need to find the intersection of ICU locales and glibc locales. That’s one difficulty. The other difficulty is the generation of glibc locales. As I said before, different distributions have different methods to enable a locale, but Plasma has to work across all distributions. On Arch Linux we only need to run locale-gen, but on Debian we have to install a package, and its installation hook will invoke locale-gen. There is a D-Bus API provided by systemd that can do this sort of work, but I don’t know whether it only generates the locale or also installs the language package on distros like Debian. Even if it does work, Plasma can’t use the API: Plasma doesn’t rely on systemd, and we also need to support distributions like Alpine Linux or Void Linux. So I don’t really know what to do about that. Maybe just show a message that points to a web page with instructions for enabling glibc locales on different systems?
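The intersection problem can be sketched roughly as follows. This is not how Plasma does it. It is a simplified Python illustration under two assumptions: the glibc side is a list of names in the style printed by locale -a (for example en_US.utf8), and the ICU side is a list of plain language_TERRITORY identifiers. The sample lists are made up, and locales with an @modifier are skipped because they don’t map onto ICU identifiers by simple string editing.

```python
def glibc_to_icu_id(name):
    """Return an ICU-style ID for a UTF-8 glibc locale, else None."""
    if "@" in name:
        return None  # e.g. sr_RS@latin needs a real mapping table
    base, dot, codeset = name.partition(".")
    if not dot or codeset.lower().replace("-", "") != "utf8":
        return None  # no codeset given, or a pre-Unicode charset
    return base


def usable_in_both(glibc_names, icu_ids):
    """Map each ICU ID to a generated glibc locale that matches it."""
    icu = set(icu_ids)
    result = {}
    for name in glibc_names:
        icu_id = glibc_to_icu_id(name)
        if icu_id in icu:
            result.setdefault(icu_id, name)
    return result


# Hypothetical sample data, not taken from a real system.
glibc_names = ["C", "en_US.utf8", "en_GB.utf8", "sr_RS@latin", "de_DE.ISO-8859-1"]
icu_ids = ["en_US", "en_GB", "de_DE", "sr_Latn_RS", "yue_Hant_HK"]
print(usable_in_both(glibc_names, icu_ids))
```

With this sample data only en_US and en_GB survive: de_DE is generated with a legacy charset only, and the Serbian and Cantonese entries have no simple counterpart on the other side.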

What Apple did

macOS is also Unix-like, so it’s worth seeing what Apple did about this. The system applications use ICU, but macOS also ships some Unix utilities, so they’re in the same boat as us. I set the system language to English, and locale shows that only $LC_CTYPE is set to en_US; the others are not set. So Apple chose not to support Unix locales. Sounds legit. Thank you, Apple, for giving me tons of tar: invalid locale.

Current situation

I’ve managed to port the Formats settings to QML, and the port is expected to land in Plasma 5.24. The MR resolved many bugs, but the bug described above isn’t solved. We still only support ICU locales, but I’ll try to address the issue in the near future.
