Bug 18501

Summary: Missing compose sequences for Cedilla and a
Product: xorg Reporter: Simos Xenitellis <simos.bugzilla>
Component: Lib/Xlib (data) Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: bensberg, hramrach, jkohen
Version: unspecified Keywords: i18n
Hardware: Other   
OS: All   
Whiteboard: 2011BRB_Reviewed

Description Simos Xenitellis 2008-11-12 09:07:51 UTC
From http://bugzilla.gnome.org/show_bug.cgi?id=559909

There are some compose sequences missing,

  <Multi_key> <comma> <a> : "ą"
  <Multi_key> <comma> <A> : "Ą"
  <Multi_key> <cedilla> <a> : "ą"
  <Multi_key> <cedilla> <A> : "Ą"

I wonder if the reporter at the GNOME report can have a look and identify if there are any other cedilla-related compose sequences that are missing.
Comment 1 James Cloos 2008-11-12 09:55:19 UTC
The characters »ą«, »Ą«, »ą« and »Ą« from that report all have OGONEKs,
not CEDILLAs.  They are in the Compose table via <Multi_key> <semicolon>
and <dead_ogonek>.

It seems that <Multi_key> <cedilla> <a> should be set aside for a
putative »a̧« combining sequence, should it be needed.
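
The ogonek/cedilla distinction can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the constant and helper names are only illustrative and are not part of libX11. It also shows that "a" followed by COMBINING CEDILLA has no precomposed form, so it stays a two-code-point combining sequence, while "a" followed by COMBINING OGONEK composes to a single character.

```python
import unicodedata

# Illustrative constants (names are hypothetical, chosen for this sketch).
A_OGONEK = "\u0105"   # the character requested in the report
E_OGONEK = "\u0119"
E_CEDILLA = "\u0229"


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# NFC composes a base letter and a combining mark when a precomposed
# character exists, and leaves the pair untouched otherwise.
a_with_cedilla = unicodedata.normalize("NFC", "a\u0327")  # a + COMBINING CEDILLA
a_with_ogonek = unicodedata.normalize("NFC", "a\u0328")   # a + COMBINING OGONEK

for sample in (A_OGONEK, E_OGONEK, E_CEDILLA, a_with_cedilla, a_with_ogonek):
    print(repr(sample), describe(sample))
```

Printing the names makes it easy to tell the two diacritics apart even when the glyphs look similar in a given font.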

OTOH, the pt_BR.UTF-8 has set the precedent for locale-specific Compose
files which are only slightly different than en_US.UTF-8.  We could
create a pl_PL.UTF-8 dir and add those compose sequences there.
Comment 2 Javier Kohen 2008-11-13 11:37:54 UTC
(In reply to comment #0)
> From http://bugzilla.gnome.org/show_bug.cgi?id=559909
> 
> There are some compose sequences missing,
> 
>   <Multi_key> <comma> <a> : "ą"
>   <Multi_key> <comma> <A> : "Ą"
>   <Multi_key> <cedilla> <a> : "ą"
>   <Multi_key> <cedilla> <A> : "Ą"
> 
> I wonder if the reporter at the GNOME report can have a look and identify if
> there are any other cedilla-related compose sequences that are missing.
> 

I'm the original reporter at the GNOME bugzilla. It's right that those have ogoneks. Unfortunately, US keyboards don't have that character, which is why I think the comma and the cedilla were used. For reference, it's still possible to enter e with ogonek using comma (and I think cedilla, too).

I haven't found other missing combinations for Spanish, German and Polish. However, I still miss the ability to enter the letter first and the diacritic mark later, as in, for example, <a> <comma>.
Comment 3 James Cloos 2008-11-13 13:23:32 UTC
> For reference, it's still possible to enter e with ogonek using comma
> (and I think cedilla, too).

Not using XIM and the xorg Compose table:

:; grep 'E WITH OGONEK' xorg/lib/libX11/nls/en_US.UTF-8/Compose.pre 
<dead_ogonek> <E>                	: "Ę"   U0118 # LATIN CAPITAL LETTER E WITH OGONEK
<Multi_key> <semicolon> <E>      	: "Ę"   U0118 # LATIN CAPITAL LETTER E WITH OGONEK
<dead_ogonek> <e>                	: "ę"   U0119 # LATIN SMALL LETTER E WITH OGONEK
<Multi_key> <semicolon> <e>      	: "ę"   U0119 # LATIN SMALL LETTER E WITH OGONEK

Of the other Compose tables, the pt_BR.UTF-8 table also references U0118
and U0119 with the same sequences as above.

However, some of the ISO-8859 tables (-2, -4 and -13) reference eogonek
and Eogonek—which probably means that the UTF-8 tables should use those
symbols instead of the Uxxxx symbols; I’ll look into that—and they, as
you wrote, support <Multi_key> <comma> <e/E> and <Multi_key> <e/E> <comma>.

The en_US.UTF-8 table tries to normalize the Multi_key-initiated
sequences, using comma for cedillas, semicolon for ogonek, etc.  We can,
however, add locale-specific tables with backwards-compatible support
for UTF-8 locales which derive from ISO-8859 locales.  pt_BR.UTF-8 is
the only current example which has just minimal changes from en_US.UTF-8,
but more can be added if the user community wants them.

Generally speaking, the reversed order compose sequences may or may not
be possible.  Some may alias other sequences.
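
One way to check whether a proposed sequence would alias an existing one is to parse the table and look for identical key sequences with different results, or for one sequence being a prefix of another. The sketch below assumes the simple `<key> <key> : "result"` line format shown in the grep output above; the function names and the sample entries are hypothetical and are not taken from the actual xorg table.

```python
import re

# Matches '<k1> <k2> ... : "result"'; comments and other lines are skipped.
LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')
KEY = re.compile(r"<([^>]+)>")


def parse_compose(text):
    """Return a list of (key_tuple, result) pairs from Compose-style lines."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            entries.append((tuple(KEY.findall(match.group(1))), match.group(2)))
    return entries


def find_conflicts(entries):
    """Return pairs of key sequences that cannot coexist unambiguously."""
    conflicts = []
    for i, (keys_a, out_a) in enumerate(entries):
        for keys_b, out_b in entries[i + 1:]:
            if keys_a == keys_b:
                # Same keys, different result: a direct clash.
                if out_a != out_b:
                    conflicts.append((keys_a, keys_b))
                continue
            shorter, longer = sorted((keys_a, keys_b), key=len)
            # A shorter sequence that prefixes a longer one shadows it.
            if len(shorter) < len(longer) and longer[:len(shorter)] == shorter:
                conflicts.append((shorter, longer))
    return conflicts


# Hypothetical sample data, for illustration only.
EXISTING = '''
<Multi_key> <comma> <e> : "ȩ"
<Multi_key> <semicolon> <e> : "ę"
'''
PROPOSED = '''
<Multi_key> <e> <comma> : "ę"
<Multi_key> <comma> <e> : "ę"
'''

conflicts = find_conflicts(parse_compose(EXISTING + PROPOSED))
for first, second in conflicts:
    print("conflict:", first, "vs", second)
```

With this sample data, the reversed sequence `<Multi_key> <e> <comma>` is accepted, while the proposed `<Multi_key> <comma> <e>` clashes with the existing cedilla entry.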
Comment 4 Javier Kohen 2008-11-13 13:33:25 UTC
My mistake, I didn't notice the difference between ȩ (e with cedilla) and ę (e with ogonek) until a native Polish speaker brought that to my attention a few minutes ago. So I've been using the wrong character because the compose sequence changed.

In any case, I think my knowledge of the technicalities is not sufficient to suggest a course of action. If overrides are possible, that sounds like a nice way to stay compatible. However, if they are rare, they might cause more harm by making things less predictable.
Comment 5 Jeremy Huddleston Sequoia 2011-10-03 09:40:04 UTC
Is this still relevant?
Comment 6 Benno Schulenberg 2013-09-08 15:28:38 UTC
(In reply to comment #5)
> Is this still relevant?

No, it is not: in June 2010 the sequences with <Multi_key> <comma> were added for A, a, E, e, I, i, U and u to produce the corresponding letters with ogoneks.  Then in March 2012 the postfix versions were added.  So effectively this bug is fixed.  Closing.
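
For illustration, entries of the kind described above can be generated in the same layout as the grep output in comment 3. This is only a sketch: the helper name is hypothetical, and the actual lines in the libX11 Compose table may differ in spacing or in the symbol used for the result (for example a named keysym instead of a Uxxxx code).

```python
import unicodedata


def ogonek_lines(letters="AaEeIiUu", postfix=False):
    """Build Compose-style lines mapping comma + letter to the ogonek form.

    With postfix=True the letter comes first and the comma second.
    """
    lines = []
    for letter in letters:
        case = "CAPITAL" if letter.isupper() else "SMALL"
        name = f"LATIN {case} LETTER {letter.upper()} WITH OGONEK"
        char = unicodedata.lookup(name)  # raises KeyError if no such character
        keys = f"<{letter}> <comma>" if postfix else f"<comma> <{letter}>"
        lines.append(f'<Multi_key> {keys} : "{char}" U{ord(char):04X} # {name}')
    return lines


for line in ogonek_lines() + ogonek_lines(postfix=True):
    print(line)
```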

(In reply to comment #2)
> For reference, it's still possible to enter e with ogonek using comma
> (and I think cedilla, too).

Yes, in GTK+ (or at least: in my oldish GTK+) <Multi_key> <e> <comma> produces ę (e with ogonek), and <Multi_key> <comma> <e> gives ȩ (e with cedilla).  That might be considered useful, but is instead rather confusing: all other reversed sequences with comma always produce the same letter.  The current compose table in X is okay.

(In reply to comment #3)
> The en_US.UTF-8 table tries to normalize the Multi_key-initiated
> sequences, using comma for cedillas, semicolon for ogonek, etc.

As noted above, this is no longer true.  The sequences with comma for ogonek were added because some ISO-8859-* files had them too.
