public inbox for linux-kernel@vger.kernel.org
* Improved console UTF-8 support for the Linux kernel?
@ 2004-12-11 17:06 Simos Xenitellis
  2004-12-11 17:30 ` David Gómez
  0 siblings, 1 reply; 15+ messages in thread
From: Simos Xenitellis @ 2004-12-11 17:06 UTC
  To: linux-kernel


Hi All,
The current UTF-8 keyboard input (for the console) of the Linux kernel
does not support "composing", that is, writing characters with accents.
This affects quite a few languages that require accents (French, German,
Danish, Swedish?, Greek, Cyrillic-based?, others?).

In general, UTF-8 console support is good at displaying text in different
character sets, which makes it possible to configure a distribution to use
UTF-8 locales for both the console and Xorg. However, while it used to be
possible to type in German, Spanish, French, etc., that is no longer possible.

While looking into the problem, I noticed that there is work to make
Linux console handle Unicode better.

Two links are of interest:
A. Improved UTF-8 support for the Linux kernel, by Chris Heath
http://chris.heathens.co.nz/linux/utf8.html
B. Notes on the Linux console, by Innocenti Maresin
http://www.comtv.ru/~av95/linux/console/

Discussion of these issues takes place on the linux-utf8 mailing list, archived at
http://groups-beta.google.com/group/nlo.lists.linux-utf8

Chris Heath has a set of incremental patches
(http://chris.heathens.co.nz/linux/utf8.html) to enhance Unicode for the
console.
I noticed that he contacted this list in May 2003
(http://seclists.org/lists/linux-kernel/2003/May/7956.html) but
unfortunately the discussion was diverted to coding style.

Is there interest in resubmitting the mentioned patches for
inclusion in the kernel (yeah, provided coding style is "normalised")?

Simos

p.s.
I am not sending this e-mail on behalf of any of the authors, just
myself.


* Re: Improved console UTF-8 support for the Linux kernel?
@ 2004-12-12 14:02 Simos Xenitellis
  0 siblings, 0 replies; 15+ messages in thread
From: Simos Xenitellis @ 2004-12-12 14:02 UTC
  To: linux-kernel


Jan Engelhardt wrote: 
> >> The current UTF-8 keyboard input (for the console) of the Linux kernel
> >> does not support  "composing" or writing characters with accents.
> 
> That's weird, because "ö" (LATIN O WITH DIAERESIS) -- which clearly lies
> outside the 7-bit range, is working on my system without myself poking the
> kernel. Both hitting the key or using compose mode. This also applies to
> A-with-DIAERESIS, U-with-DIAERESIS, sharp german S, but does not for anything
> else, e.g. compose-'-e to generate E with accent aigu.

I am a bit confused. Could you please comment on the following, as a
common set of test steps?

I am not sure how you wrote the above characters. According to UTF-8,
characters with code points above 0x7F require at least two bytes to be
valid. Is that when you compose "ö" (you press something like ";", then
"o") in the console?
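
As a quick illustration of that boundary (not part of the original test),
here is a minimal Python sketch that prints how many bytes UTF-8 needs for a
few code points around the end of the 7-bit range. The helper name utf8_len
is made up for this example.

```python
# Illustrative check: UTF-8 byte counts around the 7-bit boundary.
def utf8_len(codepoint: int) -> int:
    """Return the number of bytes in the UTF-8 encoding of a code point."""
    return len(chr(codepoint).encode("utf-8"))

# 0x41 and 0x7F are inside the 7-bit range; 0x80, 0xF6 and 0x101 are not.
for cp in (0x41, 0x7F, 0x80, 0xF6, 0x101):
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s): {encoded.hex(' ')}")
```

Code points up to U+007F come out as a single byte, while U+0080 and above
(including "ö", U+00F6) need at least two.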

For simplicity, let's assume you do something like
% loadkeys --unicode
keycode 53 = 0x0d2f
compose '/' 'q' to U+00F6
compose '/' 'w' to U+00F7
compose '/' 'e' to U+00F8
compose '/' 'r' to U+00F9
compose '/' 't' to U+0100
compose '/' 'y' to U+0101
keycode 2 = U+00F6
keycode 3 = U+00F7
keycode 4 = U+00F8
keycode 5 = U+00F9
keycode 6 = U+0100
keycode 7 = U+0101
^D
% 

Dead key (due to "0d") is the character "/" (0x2f).
Keycodes 2-7 are keys for numbers 1-6.
To test, I type
% cat > test.txt
<we try out all key compositions to generate U+00F6-U+0101>
^D

When we try keys 1-6, we get
% od -x test.txt
0000000 b6c3 b7c3 b8c3 b9c3 80c4 81c4 000a
0000015
%
which is correct.

When we try using the dead key "/" and q-y, we get
% od -x test.txt
0000000 f7f6 f9f8 0100 000a
0000007
% 

To get the keyboard back into a sane mode, run "loadkeys --unicode -d".

From here we see there is no conversion to UTF-8 whatsoever.

In the second case, the kernel cannot return the full character when it
is in Unicode mode.
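
Both dumps can be reproduced offline with a minimal Python sketch. It rests
on two assumptions: that the machine is little-endian (which the od -x output
above implies), and that the compose path emits only the low 8 bits of each
code point. The second is a guess that happens to match the dump, not a
confirmed description of the kernel code. The helper od_x is made up for this
example.

```python
def od_x(data: bytes) -> str:
    """Format bytes like the data columns of `od -x` on a little-endian machine."""
    if len(data) % 2:
        data += b"\x00"  # od pads an odd trailing byte with zero
    words = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    return " ".join(f"{w:04x}" for w in words)

# The six code points bound to keys 1-6 and to the "/" compose sequences.
codepoints = [0xF6, 0xF7, 0xF8, 0xF9, 0x100, 0x101]

# Direct key presses: each code point as UTF-8, plus the trailing newline.
expected = "".join(map(chr, codepoints)).encode("utf-8") + b"\n"

# Assumed compose behaviour: only the low byte of each code point is emitted.
truncated = bytes(cp & 0xFF for cp in codepoints) + b"\n"

print(len(expected), od_x(expected))    # 13 b6c3 b7c3 b8c3 b9c3 80c4 81c4 000a
print(len(truncated), od_x(truncated))  # 7 f7f6 f9f8 0100 000a
```

The byte counts 13 and 7 agree with the final od offsets 0000015 and 0000007
(octal). Note that under this assumption U+0100 and U+0101 collapse to the
bytes 0x00 and 0x01.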

> >Yes, i recently find it out when trying to switch all my system to
> >UTF-8. But the patch from Chris you mention below works very well
> >for me (and for anybody that needs to type compose characters for
> >languages based in the latin1 encoding i guess).
> >
> >> Is there an interest for re-submission of mentioned patches for
> >> inclusion in the kernel (yeah, provided coding style is "normalised")?
> >
> >At least, I am _really_ interested :)
> 
> So am I. I have to use xterm for anything fancy now...
> (especially for the even-more fancy stuff that begins at three-byte UTF8
> sequences, such as Japanese :-)

Good. I hope more people raise their hands for this.

Simos

[I am sending this again. It did not make it to the kernel mailing list in the first^Wsecond post for some reason..]


Thread overview: 15+ messages
2004-12-11 17:06 Improved console UTF-8 support for the Linux kernel? Simos Xenitellis
2004-12-11 17:30 ` David Gómez
2004-12-11 19:07   ` Jan Engelhardt
2004-12-11 21:25     ` David Gómez
2004-12-11 21:39       ` Jan Engelhardt
2004-12-11 22:01         ` David Gómez
2004-12-11 22:26         ` Gene Heskett
     [not found]     ` <1102803807.3183.59.camel@kl>
2004-12-12  0:05       ` Jan Engelhardt
2004-12-12  0:38         ` David Gómez
2004-12-12 22:08           ` Simos Xenitellis
2004-12-12 22:14             ` Jan Engelhardt
2004-12-12 23:06             ` David Gómez
2004-12-12 23:52             ` Andries Brouwer
2004-12-12 15:44         ` Lehmann 
  -- strict thread matches above, loose matches on Subject: below --
2004-12-12 14:02 Simos Xenitellis
