Re: [PATCH]console:UTF-8 mode compatibility fixes

From: "Adam Tlałka" <atlka@pg.gda.pl>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [PATCH]console:UTF-8 mode compatibility fixes
Date: Sat, 18 Feb 2006 17:01:10 +0100
Message-ID: <43F744C6.8020209@pg.gda.pl>
In-Reply-To: <20060218025921.7456e168.akpm@osdl.org>

Andrew Morton wrote:

> Adam Tla/lka <atlka@pg.gda.pl> wrote:
> 
>>
>>This patch applies to the drivers/char/vt.c file in the 2.6.15.3 kernel sources.
>>It should work with other versions too.
>>
>>Changed console behaviour so that in UTF-8 mode vt100 alternate character
>>sequences work as described in the terminfo/termcap linux terminal definition.
>>Programs can use the vt100 control sequences (smacs, rmacs and acsc characters)
>>in UTF-8 mode in the same way as in normal mode, so one definition is always
>>valid. With the current behaviour these sequences do not work in UTF-8 mode.
>>
>>Added reporting of malformed UTF-8 sequences as replacement glyphs.
>>I think that a terminal should always display something rather than ignoring
>>this kind of data as it does now. It also sticks to the Unicode standard,
>>which says that every wrong byte should be reported. It is more human readable
>>too in the case of Latin subsets including ASCII chars.
>>
>>...
>>
>>-		} else if (vc->vc_utf) {
>>+		} else if (vc->vc_utf && !vc->vc_disp_ctrl) {
>> 		    /* Combine UTF-8 into Unicode */
>>-		    /* Incomplete characters silently ignored */
>>+		    /* Malformed sequence represented as replacement glyphs */
>>+rescan_last_byte:
>> 		    if(c > 0x7f) {
>>
>>...
>>
>>+					if (vc->vc_npar) {
>>+						c = orig;
>>+						goto rescan_last_byte;
>>+					}
>>
>>...
>>
>>+				}
>>+				vc->vc_utf_count = 0;
>>+				c = orig;
>>+				goto rescan_last_byte;
>>+			}
>> 			continue;
>> 		}
> 
> 
> I spent some time trying to work out why this cannot cause an infinite loop
> and gave up.  Can you explain?

1. This code is executed only if vc_utf_count != 0, which means an
incomplete UTF-8 sequence. After a proper UTF-8 sequence, or in normal
mode, vc_utf_count == 0 at these places in the code.

2. vc_npar is not used while a UTF-8 sequence is being completed, so I
used it as a counter of the continuation bytes scanned so far. When the
beginning of a UTF-8 sequence is detected, it is set to 0 and
vc_utf_count is set to the expected number of continuation bytes.

3. When the replacement glyph cannot be displayed, the bad sequence is
ignored as before, so vc_utf_count and vc_npar are zeroed for a malformed
UTF-8 sequence and there is no loop. Anyway, the replacement glyph
should always be defined IMHO. Otherwise I must change this code, because
using c as tc as a last resort does not seem correct: in this case c is
the byte value that made the scanned sequence malformed, so it is of no
value to us. Maybe a better way is to use the "?" char as a last resort
instead of the c value. What do you think?
Or maybe I should remember all the bytes of the UTF-8 sequence and use
their values as last-resort chars when the sequence is malformed and
0xfffd is not defined?
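
To make the termination argument in points 1-3 easier to check, here is a
minimal user-space sketch. It is not the code from the patch. The names
utf8_state, utf8_decode and REPLACEMENT are made up for illustration, and
the fields remaining and seen only play roles similar to the ones described
above for vc_utf_count and vc_npar. The sketch assumes 2- to 4-byte
sequences only and does not check for overlong forms. The point it shows:
the rescan jump is taken only while remaining != 0, and remaining is zeroed
just before the jump, so each input byte is rescanned at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical decoder state (illustration only, not kernel code). */
struct utf8_state {
    unsigned remaining; /* continuation bytes still expected */
    unsigned seen;      /* continuation bytes already consumed */
    uint32_t acc;       /* code point being assembled */
};

/* Decode n bytes from in[] into out[] (capacity cap) and return the number
 * of code points written. Every byte of a broken sequence yields U+FFFD. */
size_t utf8_decode(struct utf8_state *st, const unsigned char *in, size_t n,
                   uint32_t *out, size_t cap)
{
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        unsigned rescans = 0;

rescan_last_byte:
        if (st->remaining) {
            if ((c & 0xc0) == 0x80) {   /* valid continuation byte */
                st->acc = (st->acc << 6) | (c & 0x3f);
                st->seen++;
                if (--st->remaining == 0 && len < cap)
                    out[len++] = st->acc;
                continue;
            }
            /* Sequence interrupted: report the lead byte plus every
             * continuation byte consumed so far, then reset the state. */
            for (unsigned k = 0; k <= st->seen && len < cap; k++)
                out[len++] = REPLACEMENT;
            st->remaining = 0;
            st->seen = 0;
            /* remaining is 0 now, so this branch cannot be entered again
             * for the same byte: at most one rescan per input byte. */
            rescans++;
            assert(rescans == 1);
            goto rescan_last_byte;
        }

        if (c < 0x80) {                 /* plain ASCII */
            if (len < cap)
                out[len++] = c;
        } else {
            /* Number of continuation bytes announced by a lead byte,
             * or 0 for a stray continuation / invalid lead byte. */
            unsigned need = (c & 0xe0) == 0xc0 ? 1 :
                            (c & 0xf0) == 0xe0 ? 2 :
                            (c & 0xf8) == 0xf0 ? 3 : 0;
            if (need) {
                st->remaining = need;
                st->seen = 0;
                st->acc = c & (0x3f >> need);
            } else if (len < cap) {
                out[len++] = REPLACEMENT;
            }
        }
    }
    return len;
}
```

For example, feeding the bytes 0xe2 0x82 'A' to this sketch gives two
replacement code points followed by 'A': the 'A' interrupts the sequence,
the two pending bytes are reported, the state is cleared, and the 'A' is
rescanned exactly once as an ordinary character.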

Regards
-- 
Adam Tlałka       mailto:atlka@pg.gda.pl    ^v^ ^v^ ^v^
System  & Network Administration Group       - - - ~~~~~~
Computer Center,  Gdańsk University of Technology, Poland
PGP public key:   finger atlka@sunrise.pg.gda.pl
