Bug 22649

Summary: Permit assigning a codepoint combo to one keycode
Product: xorg
Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: Server/Input/Core
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED INVALID
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: berto.d.sera, millosh, s.cretella
Version: git
Keywords: x12
Hardware: Other
OS: All
Whiteboard: 2011BRB_Reviewed

Description Nicolas Mailhot 2009-07-06 23:35:27 UTC
Many scripts use ligatures or letters with many diacritics. Unfortunately the Unicode Consortium didn't assign precomposed codepoints to all of them (and tends not to assign precomposed codepoints to scripts that didn't grab them while it was possible).

If you're lucky, all the bits your script needs use single codepoints, and you can add all your ligatures and diacritic-ed letters to an xkb keymap. If you're not, you can't.
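
A minimal sketch of the lucky/unlucky distinction, assuming Python's standard `unicodedata` module. The two sample letters and the helper name `nfc_length` are illustrative choices, not taken from the report: one combination has a precomposed codepoint, the other does not and stays a two-codepoint sequence.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


# "e" + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
lucky = "e\u0301"
# "q" + COMBINING TILDE has no precomposed form, so it stays two code points.
unlucky = "q\u0303"

print(nfc_length(lucky))    # 1
print(nfc_length(unlucky))  # 2
```

Only the first kind of letter fits in a slot that accepts a single codepoint.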

This limitation should be removed to make all scripts equal, just like the 255 keycode limit is being removed in bug 11227
Comment 1 James Cloos 2009-07-10 12:23:58 UTC
Note that the current workaround is to define a keysym for each combo
and add that to the compose table with the desired string as output.

Cf the Arabic ligatures in the en_US.UTF-8/Compose table.
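
A rough sketch of what such a table entry looks like, written as a small Python formatter. The line layout follows the Lam-Alef entries in that Compose table; the helper name `compose_line`, the Private Use code point U+E000, and the q-with-tilde output are hypothetical choices for illustration only, and whether a given keysym is actually usable still depends on the keymap.

```python
def compose_line(keysym_code_point: int, output: str, comment: str) -> str:
    """Format one Compose entry: a Unicode keysym name, then the output string."""
    return f'<U{keysym_code_point:04X}> :   "{output}" # {comment}'


# Same shape as an existing Lam-Alef entry in the en_US.UTF-8 Compose table.
print(compose_line(0xFEFB, "\u0644\u0627", "ARABIC LIGATURE LAM WITH ALEF"))

# Hypothetical entry: a Private Use keysym standing in for "q" + COMBINING TILDE.
print(compose_line(0xE000, "q\u0303", "hypothetical combo: q with tilde"))
```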
Comment 2 Berto 'd' Sera 2009-10-22 10:26:46 UTC
+1 This is really a frequent problem; it needs a stable, clear policy, not a geeky workaround. The digital divide is big enough without adding trouble from our side.
Comment 3 Sabine Emmy Eller 2009-10-22 10:50:44 UTC
I have exactly this problem right now. This issue will come up over and over again during the next weeks/months, depending on which less-resourced languages I am/we are going to work with.
Comment 4 Daniel Stone 2009-10-22 20:58:03 UTC
this will hopefully be supported in xkb2.
Comment 5 Dilan Roshani 2009-10-23 02:33:35 UTC
A friend of mine has asked me to share my expertise in computational linguistics with you. Although I am not familiar with the language beyond the information I got from Wiki, I would like to share my experience with Kurdish. We had the same problem back in 1992, with Kurdish written in many non-standard systems. I therefore went for re-codifying and unifying all the systems into one, to free up the writing experience for any age and any level. See more at www.kurdishacademy.org

What I see in relation to Udmurt is a classic issue with a codified language that has too many singular cases that do not appear in any other language, so they cannot be covered by Unicode or any other standard. You will need to look into digraphs and trigraphs to cover the rare cases instead of creating very odd single letters. This will make the codification more standards-friendly, which is better for the future of the language in electronic media.

You always need to see the concept from a 7-year-old kid's perspective; otherwise learning the language will become a sophisticated experience reserved for computer experts. This is important for the survival of the language in the new, fast-moving electronic era.
Comment 6 Berto 'd' Sera 2009-10-24 01:57:47 UTC
I've been reading the comments on the bug and I tried my luck with the
"current workaround" as suggested by James Cloos, but I seem to be too
dumb to get the logic behind it. The proposed example with Arabic
ligatures in the en_US.UTF-8/Compose file sounded promising, but turned
out to be quite cryptic.

On having a look at a Fedora box the only such locale I find is in:
/usr/share/X11/locale
it's already quite a mystery why a handful of locales would be here,
while everything else is under /usr/share/locale (but okay, let's
assume there's a good reason).

In the locale dir I do get a Compose file, authored by David Monniaux
which in turn has a snippet that goes
....
#
# Arabic Lam-Alef ligatures
#

<UFEFB> :   "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> :   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> :   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> :   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE
....
Only, to the best of my understanding these are single Unicode code points,
specified in the Basic Multilingual Plane, within the Arabic
Presentation Forms-B block (see
http://en.wikipedia.org/wiki/Basic_Multilingual_Plane). How would
this explain how to create a char made of a base and a combining
diacritic?
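
One possible reading of the snippet, sketched in Python with the standard `unicodedata` module: the keysym on the left of each entry is a single code point, but the quoted string on the right is a sequence of two. The mapping below is transcribed from the four entries above; the name `LAM_ALEF` and the helper `code_points` are hypothetical.

```python
import unicodedata

# Keysym code point (left-hand side) -> output string (right-hand side),
# transcribed from the four Lam-Alef entries in the snippet.
LAM_ALEF = {
    0xFEFB: "\u0644\u0627",  # LAM + ALEF
    0xFEF7: "\u0644\u0623",  # LAM + ALEF WITH HAMZA ABOVE
    0xFEF9: "\u0644\u0625",  # LAM + ALEF WITH HAMZA BELOW
    0xFEF5: "\u0644\u0622",  # LAM + ALEF WITH MADDA ABOVE
}


def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


for keysym, output in LAM_ALEF.items():
    # The trigger is one presentation-form code point; the output is the
    # two-letter sequence it decomposes to under NFKC.
    assert unicodedata.normalize("NFKC", chr(keysym)) == output
    print(f"U+{keysym:04X} ->", code_points(output))
```

If the right-hand side can carry two code points here, a base letter followed by a combining diacritic would presumably fit in the same place, which seems to be what Comment 1 has in mind.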
Comment 7 Milos Rancic 2009-11-19 18:41:22 UTC
I have found a number of times that this feature is needed. Fortunately, Unicode supports my language better than many others, and I don't need the feature for my own language. However, Unicode has stopped adding support for the rest of the languages, and some features of my language don't have full support in Unicode. Fortunately, again, it is possible to use my language fully by combining regular XKB features with the keymap engines of other OSes. On the other hand, that is obviously not enough for many other languages.
Comment 8 Adam Jackson 2018-06-12 18:44:01 UTC
Mass closure: This bug has been untouched for more than six years, and is not obviously still valid. Please file a new report if you continue to experience issues with a current server.
