From: "Adam Tlałka" <atlka@pg.gda.pl>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [PATCH]console:UTF-8 mode compatibility fixes
Date: Sat, 18 Feb 2006 17:01:10 +0100
Message-ID: <43F744C6.8020209@pg.gda.pl>
In-Reply-To: <20060218025921.7456e168.akpm@osdl.org>
Andrew Morton wrote:
> Adam Tla/lka <atlka@pg.gda.pl> wrote:
>
>>
>>This patch applies to 2.6.15.3 kernel sources to drivers/char/vt.c file.
>>It should work with other versions too.
>>
>>Changed console behaviour so that in UTF-8 mode vt100 alternate character
>>sequences work as described in the terminfo/termcap linux terminal definition.
>>Programs can use the vt100 control sequences - smacs, rmacs and acsc characters -
>>in UTF-8 mode in the same way as in normal mode, so one definition is always
>>valid. Current behaviour makes these sequences not work in UTF-8 mode.
>>
>>Added reporting of malformed UTF-8 sequences as replacement glyphs.
>>I think that a terminal should always display something rather than ignoring
>>this kind of data as it does now. It also follows the Unicode standard,
>>which says that every wrong byte should be reported. It is more human readable
>>too in the case of Latin subsets including ASCII chars.
>>
>>...
>>
>>- } else if (vc->vc_utf) {
>>+ } else if (vc->vc_utf && !vc->vc_disp_ctrl) {
>> /* Combine UTF-8 into Unicode */
>>- /* Incomplete characters silently ignored */
>>+ /* Malformed sequence represented as replacement glyphs */
>>+rescan_last_byte:
>> if(c > 0x7f) {
>>
>>...
>>
>>+ if (vc->vc_npar) {
>>+ c = orig;
>>+ goto rescan_last_byte;
>>+ }
>>
>>...
>>
>>+ }
>>+ vc->vc_utf_count = 0;
>>+ c = orig;
>>+ goto rescan_last_byte;
>>+ }
>> continue;
>> }
>
>
> I spent some time trying to work out why this cannot cause an infinite loop
> and gave up. Can you explain?
1. This code is executed only if vc_utf_count != 0, which means an
incomplete UTF-8 sequence. For a proper UTF-8 sequence, or in normal
mode, vc_utf_count == 0 at these places in the code.
2. vc_npar is not used while a UTF-8 sequence is being completed, so I
used it as a counter of the continuation bytes scanned so far. It is set
to 0 when the start of a UTF-8 sequence is detected, and vc_utf_count is
set to the number of continuation bytes expected.
3. When the replacement glyph cannot be displayed, the bad sequence is
ignored as before, so vc_utf_count and vc_npar are zeroed for a malformed
UTF-8 sequence and there is no loop. Anyway, the replacement glyph
should always be defined IMHO. Otherwise I must change this code, because
it does not seem correct to use c as tc as a last resort: in this case c
is the value of the byte that made the scanned sequence malformed, so it
is of no value to us. Maybe a better way is to use the "?" char as a last
resort instead of the value of c. What do you think?
Maybe I should remember all bytes of the UTF-8 sequence so that their
values can be used as last-resort chars when the sequence is malformed
and 0xfffd is not defined?
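To make the termination argument above easier to check, here is a minimal
standalone sketch in C. It is not the patch and not the code in
drivers/char/vt.c: struct utf8_state, utf8_feed(), utf8_flush_bad() and
utf8_decode() are hypothetical names, and the fields only mirror the roles
of vc_utf_count and vc_npar described in points 1 and 2. It assumes one
replacement code point (0xfffd) is reported per byte of a truncated
sequence, and it does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical state, mirroring the roles described in the mail. */
struct utf8_state {
    int utf_count;      /* continuation bytes still expected */
    int npar;           /* continuation bytes consumed so far */
    uint32_t utf_char;  /* code point being assembled */
};

/* Report a truncated sequence (lead byte plus consumed continuation
 * bytes) and reset the state, so a rescan starts with utf_count == 0. */
static size_t utf8_flush_bad(struct utf8_state *s, uint32_t *out, size_t n)
{
    int i;

    for (i = 0; i < 1 + s->npar; i++)
        out[n++] = REPLACEMENT;
    s->utf_count = 0;
    s->npar = 0;
    return n;
}

/* Feed one byte; writes at most 4 code points to out, returns the count. */
static size_t utf8_feed(struct utf8_state *s, uint8_t c, uint32_t *out)
{
    size_t n = 0;

rescan_last_byte:
    if ((c & 0xc0) == 0x80) {            /* continuation byte */
        if (!s->utf_count) {
            out[n++] = REPLACEMENT;      /* stray continuation byte */
            return n;
        }
        s->utf_char = (s->utf_char << 6) | (c & 0x3f);
        s->npar++;
        if (--s->utf_count == 0) {       /* sequence complete */
            out[n++] = s->utf_char;
            s->npar = 0;
        }
        return n;
    }
    if (s->utf_count) {
        /* Sequence cut short by c. The jump is taken only while
         * utf_count != 0, and the flush zeroes it, so the second
         * pass cannot reach this branch again: no infinite loop. */
        n = utf8_flush_bad(s, out, n);
        goto rescan_last_byte;
    }
    if (c <= 0x7f) {
        out[n++] = c;                    /* plain ASCII */
    } else if ((c & 0xe0) == 0xc0) {
        s->utf_count = 1; s->utf_char = c & 0x1f;
    } else if ((c & 0xf0) == 0xe0) {
        s->utf_count = 2; s->utf_char = c & 0x0f;
    } else if ((c & 0xf8) == 0xf0) {
        s->utf_count = 3; s->utf_char = c & 0x07;
    } else {
        out[n++] = REPLACEMENT;          /* 0xf8..0xff: never a lead byte */
    }
    return n;
}

/* Decode a buffer; out must have room for 4 * len code points. */
static size_t utf8_decode(struct utf8_state *s, const uint8_t *in,
                          size_t len, uint32_t *out)
{
    size_t i, total = 0;

    for (i = 0; i < len; i++)
        total += utf8_feed(s, in[i], out + total);
    return total;
}
```

The key property is that the backward jump happens at most once per input
byte, because it is guarded by the same counter that the flush resets.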
Regards
--
Adam Tlałka mailto:atlka@pg.gda.pl ^v^ ^v^ ^v^
System & Network Administration Group - - - ~~~~~~
Computer Center, Gdańsk University of Technology, Poland
PGP public key: finger atlka@sunrise.pg.gda.pl