Tuesday, November 27, 2018

Khmer Unicode Problems in newer Linux distros

TL;DR

Type the following command into Terminal:

im-config -n xim

Log out and log back in for the changes to take effect.


The Full Version

I recently noticed that the problems with Khmer vowels have reappeared in newer versions of Linux distros, including the popular Ubuntu and its derivatives such as elementary OS, which I’m currently using. The vowels in question are:

Shift + A ាំ (Srak Arm)
Shift + Semicolon (;) ោះ (Srak Orss)
Shift + V េះ (Srak Ess)
Comma (,) ុំ (Srak Om)
Shift + Comma (,) ុះ (Srak Oss)

The problem is that, instead of the intended vowel, a square block is output, or nothing at all.

A widely-shared workaround to resolve this problem is to add entries for these vowels in /usr/share/X11/locale/$LANG/Compose:

#
# Khmer digraphs
#
<U17ff> : "ាំ"
<U17fe> : "ោះ"
<U17fd> : "េះ"
<U17fc> : "ុំ"
<U17fb> : "ុះ"

NOTE: At the time of writing it seems that most distros already include these bindings.
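If you want to confirm whether your own Compose file already has these entries, something like the following could help. It is only a rough sketch: check_khmer_compose is a helper name I made up for illustration, and it assumes the entries appear one per line in the form shown above.

```shell
# Hypothetical helper: report which of the five Khmer bindings a Compose file defines.
# Usage: check_khmer_compose /path/to/Compose
check_khmer_compose() {
    file=$1
    missing=0
    for key in U17ff U17fe U17fd U17fc U17fb; do
        # Match an uncommented line that starts with the keysym, e.g. "<U17ff> :"
        if grep -qi "^[[:space:]]*<$key>[[:space:]]*:" "$file"; then
            echo "$key: present"
        else
            echo "$key: missing"
            missing=1
        fi
    done
    # Exit status is 0 only when all five bindings were found
    return "$missing"
}

# Example call (path taken from this post; adjust it if your locale directory differs):
# check_khmer_compose "/usr/share/X11/locale/$LANG/Compose"
```

If any line is reported as missing, that is the entry to add by hand.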

In addition, you need to fall back to the legacy input method xim, which respects the aforementioned Compose file, by adding the following line to /etc/environment:

GTK_IM_MODULE=xim

All well and good, right? Unfortunately, no. Most Linux distros now ship with the popular ibus as the default input method, and the current version of ibus (1.5.17) does not support multi-character Compose key output (each of the 5 Khmer vowels is a multi-character sequence). As a result, the previous method no longer works, even with GTK_IM_MODULE=xim in the /etc/environment file.

A solution to this is to force the default input method of the system to xim:

im-config -n xim

I found that this command alone makes the Khmer Unicode Compose key bindings work again in GTK applications, with no need to modify the /etc/environment file.
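After logging back in, it can be useful to see which input method variables your session actually ended up with. The snippet below is a small sketch for that purpose; report_im_env is a name I made up, and I am not claiming which values your distro will set, only showing how to print them.

```shell
# Hypothetical helper: print the input-method variables of the current session.
report_im_env() {
    for name in GTK_IM_MODULE QT_IM_MODULE XMODIFIERS; do
        # printenv succeeds only when the variable is present in the environment
        if value=$(printenv "$name"); then
            echo "$name=$value"
        else
            echo "$name is not set"
        fi
    done
}

report_im_env
```

Run it in a terminal opened after the new login, since variables set at login are not picked up by sessions that were already running.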

Wait, only GTK applications? Yes. Qt applications still don’t seem to work correctly with multi-character Compose key output, at least not the ones I tested:

  • VLC Media Player
  • VirtualBox
  • TeamViewer
  • WPS Office

Strangely, even forcing QT_IM_MODULE=xim doesn’t help.

Please keep in mind that xim is very old and will most likely be removed completely in the future, as other input methods are more stable. That said, until ibus gains support for multi-character Compose key output, xim remains a valid band-aid for this annoying problem. In the meantime, you can also look at other input methods such as uim, which has better legacy support for xim and multi-character Compose key output. I tested uim and it seems to work out of the box with the 5 Khmer vowels. The only drawback is that most distros don’t include it by default, so you have to install it manually.

Hopefully this will make your Linux adventure with Khmer Unicode better, and if you have a different method to make Khmer Unicode support work better on Linux, please let me know.
