From: "Adam Tlałka" <atlka@pg.gda.pl>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [PATCH]console:UTF-8 mode compatibility fixes
Date: Sat, 18 Feb 2006 17:01:10 +0100
Message-ID: <43F744C6.8020209@pg.gda.pl>
In-Reply-To: <20060218025921.7456e168.akpm@osdl.org>
Andrew Morton wrote:
> Adam Tla/lka <atlka@pg.gda.pl> wrote:
>
>>
>>This patch applies to the drivers/char/vt.c file in the 2.6.15.3 kernel sources.
>>It should work with other versions too.
>>
>>Changed console behaviour so that in UTF-8 mode the vt100 alternate character
>>set sequences work as described in the terminfo/termcap linux terminal definition.
>>Programs can use the vt100 control sequences (smacs, rmacs and acsc characters)
>>in UTF-8 mode in the same way as in normal mode, so one definition is always
>>valid; the current behaviour makes these sequences not work in UTF-8 mode.
>>
>>Added reporting of malformed UTF-8 sequences as replacement glyphs.
>>I think that the terminal should always display something rather than ignoring
>>this kind of data as it does now. It also follows the Unicode standard, which
>>says that every invalid byte should be reported. It is also more human
>>readable in the case of Latin subsets that include ASCII characters.
>>
>>...
>>
>>- } else if (vc->vc_utf) {
>>+ } else if (vc->vc_utf && !vc->vc_disp_ctrl) {
>> /* Combine UTF-8 into Unicode */
>>- /* Incomplete characters silently ignored */
>>+ /* Malformed sequence represented as replacement glyphs */
>>+rescan_last_byte:
>> if(c > 0x7f) {
>>
>>...
>>
>>+ if (vc->vc_npar) {
>>+ c = orig;
>>+ goto rescan_last_byte;
>>+ }
>>
>>...
>>
>>+ }
>>+ vc->vc_utf_count = 0;
>>+ c = orig;
>>+ goto rescan_last_byte;
>>+ }
>> continue;
>> }
>
>
> I spent some time trying to work out why this cannot cause an infinite loop
> and gave up. Can you explain?
1. This code is executed only if vc_utf_count != 0, which means an
incomplete UTF-8 sequence: for a well-formed UTF-8 sequence, or in
normal (non-UTF-8) mode, vc_utf_count == 0 at these points in the code.
2. vc_npar is not used while a UTF-8 sequence is being assembled, so I
reused it as a counter of the continuation bytes scanned so far. It is
set to 0 when the start of a UTF-8 sequence is detected, and vc_utf_count
is set to the number of expected continuation bytes.
3. When the replacement glyph cannot be displayed, the bad sequence is
ignored as before: vc_utf_count and vc_npar are both zeroed for a
malformed UTF-8 sequence, so there is no loop. In any case, IMHO the
replacement glyph should always be defined; otherwise I must change this
code, because using c as tc as a last resort does not seem correct: here
c is the byte that made the scanned sequence malformed, so its value is
of no use to us. Maybe a better way is to use the "?" character as a
last resort instead of the value of c. What do you think?
Or maybe I should remember all bytes of the UTF-8 sequence and use their
values as last-resort characters when the sequence is malformed and
U+FFFD is not defined?
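The termination argument can be illustrated outside the kernel with a small
standalone decoder. This is only a sketch under stated assumptions, not the
vt.c code: struct utf8_state, utf8_feed() and the policy of emitting one
U+FFFD per byte of an abandoned sequence are hypothetical; the fields merely
mirror vc_utf_count and vc_npar. The only jump back to rescan_last_byte is
taken from the utf_count != 0 branch, after utf_count has been zeroed, so the
rescanned byte cannot take that jump again and each input byte is rescanned
at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, loosely mirroring vc_utf_count / vc_npar. */
struct utf8_state {
    int utf_count;      /* continuation bytes still expected (cf. vc_utf_count) */
    int npar;           /* continuation bytes already scanned (cf. vc_npar) */
    uint32_t utf_char;  /* code point being assembled */
};

/* Assumed replacement code point; '?' is the fallback discussed above. */
#define REPLACEMENT 0xFFFDu

/*
 * Feed one byte. Emitted code points are written to out[], which must hold
 * at least 5 entries (up to 4 replacements plus the rescanned byte).
 * Returns the number of code points emitted. Overlong forms and surrogates
 * are not checked in this simplified sketch.
 */
static size_t utf8_feed(struct utf8_state *st, unsigned char c, uint32_t *out)
{
    size_t n = 0;

rescan_last_byte:
    if (st->utf_count > 0) {
        if ((c & 0xC0) == 0x80) {
            /* Expected continuation byte: accumulate 6 more bits. */
            st->utf_char = (st->utf_char << 6) | (c & 0x3F);
            st->npar++;
            assert(st->npar <= 3);
            if (--st->utf_count == 0)
                out[n++] = st->utf_char;
            return n;
        }
        /* Malformed: report the lead byte and each scanned continuation. */
        for (int i = 0; i <= st->npar; i++)
            out[n++] = REPLACEMENT;
        st->utf_count = 0;
        st->npar = 0;
        /* utf_count is now 0, so this branch is skipped on the rescan. */
        goto rescan_last_byte;
    }

    if (c < 0x80) {                     /* plain ASCII */
        out[n++] = c;
        return n;
    }
    if ((c & 0xE0) == 0xC0) {           /* 2-byte lead */
        st->utf_count = 1;
        st->utf_char = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {    /* 3-byte lead */
        st->utf_count = 2;
        st->utf_char = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {    /* 4-byte lead */
        st->utf_count = 3;
        st->utf_char = c & 0x07;
    } else {                            /* stray continuation or invalid lead */
        out[n++] = REPLACEMENT;
        return n;
    }
    st->npar = 0;                       /* start of a new sequence */
    return n;
}
```

For example, feeding 0xE2 0x82 'A' yields two replacement code points for
the truncated three-byte sequence followed by 'A' itself, which was rescanned
exactly once. Using '?' instead of U+FFFD when no glyph is available would
only change the REPLACEMENT value, not the control flow.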
Regards
--
Adam Tlałka mailto:atlka@pg.gda.pl ^v^ ^v^ ^v^
System & Network Administration Group - - - ~~~~~~
Computer Center, Gdańsk University of Technology, Poland
PGP public key: finger atlka@sunrise.pg.gda.pl