From: Willy Tarreau <w@1wt.eu>
To: Adrian Bunk <bunk@kernel.org>
Cc: Helge Hafting <helge.hafting@aitel.hist.no>,
	"H. Peter Anvin" <hpa@kernel.org>,
	linux-kernel@vger.kernel.org, trivial@kernel.org
Subject: Re: [2.6 patch] UTF-8 fixes in comments
Date: Tue, 29 Apr 2008 13:06:38 +0200
Message-ID: <20080429110638.GG1473@1wt.eu>
In-Reply-To: <20080429104216.GC19269@cs181133002.pp.htv.fi>

On Tue, Apr 29, 2008 at 01:42:16PM +0300, Adrian Bunk wrote:
> On Tue, Apr 29, 2008 at 12:09:34PM +0200, Willy Tarreau wrote:
> > On Tue, Apr 29, 2008 at 11:06:05AM +0200, Helge Hafting wrote:
> > > >Well, I accidentally used a freshly installed laptop running mandriva 2008.
> > > >I was typing in a terminal inside KDE (I don't know the program name, sort
> > > >of an xterm, but with huge borders all around). I made a typo in a word and
> > > >typed in a "é" (e acute). Pressing backspace to fix it showed me that I
> > > >removed more chars than typed. I tried again. Pressing this letter 5 times,
> > > >then 10 times backspace. I removed 5 chars from the prompt. I suspect that
> > > >if I had used some chars with wider encoding (eg 4 bytes), I could have
> > > >removed as many... Clearly those tools are not ready.
> > > >  
> > > So don't use that particular tool
> > 
> > It was not my machine, and had you been there, you would have heard me call
> > it names !
> > 
> > > and/or file a bug with the maintainer. :-)
> > 
> > It's too easy to impose crappy designs on end users and tell them that if
> > that does not work they have to file a bug. There is a minimal set of
> > things that must be tested before shipping. Seeing that the default
> > terminal emulator in KDE on Mandriva 2008 is configured in UTF-8 and does
> > not properly render it simply makes me sick. This is broken by design and
> > even distros trying to get it working for years still can't cope with it.
> > There must be a reason.
> 
> I can reproduce your problem in a plain xterm when setting LANG=en_US
> (most likely the same problem can occur with other non UTF-8 settings).

Possibly they broke it when forcing support for variable-length encodings?

> In this case I'm actually more surprised that the character is displayed 
> correctly than that you have to type backspace twice.

It's not that I *had* to type it twice. But I *could* type it twice, and
the first one removed the character, the second one the prompt.
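
One way such a mismatch can arise is when one layer erases a whole
character per backspace while another erases a single byte. The sketch
below is only an illustration under that assumption; it is not taken from
the terminal in question, and the function names are made up for the
example. "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the new length of buf after erasing its
 * last UTF-8 character. Continuation bytes have the form 10xxxxxx, so
 * we step back over them until we reach a lead byte or the start of
 * the buffer.
 */
size_t utf8_erase_last(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;
    len--;                                  /* drop the final byte */
    while (len > 0 && (buf[len] & 0xC0) == 0x80)
        len--;                              /* it was a continuation byte */
    return len;
}

/* Hypothetical byte-oriented editor: one backspace removes one byte. */
size_t byte_erase_last(size_t len)
{
    return len > 0 ? len - 1 : 0;
}
```

With the 4-byte buffer "$ é", the character-aware version goes from 4
bytes to 2 in one step, while the byte-oriented version needs two
backspaces to get there. If the display and the line buffer disagree on
which of the two rules applies, the cursor and the buffer drift apart.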

> Any kind of charset mixing is highly problematic (which is also why my 
> patch was attached compressed), so if you disable UTF-8 anywhere in a 
> modern distribution problems are somehow expected (it could also be a 
> bug in Mandriva's default settings, but that would really surprise me).

No, it was not disabled at all. I had to type in a command for a
co-worker who had just done a default install the day before, and I made
a typo which I wanted to fix.

> > Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of unicode.
> > That's as silly as if you had to replace your terminals to read
> > native gzip, and expect them as well as all the tools to work
> > properly!
> 
> It's not a compressed encoding, it's a variable-length encoding.
> 
> Besides the size advantages one main advantage of UTF-8 is that ASCII is 
> valid UTF-8. This means that for the ASCII source code in the kernel it 
> doesn't matter whether it's treated as ASCII or UTF-8, and no conversion 
> was needed.
> 
> You can't get this property with a fixed-size Unicode encoding.

I don't agree. If you refuse character-set mixing, there's no problem.
Bit 7 of first char == 1 ? => full text is 32 bit.
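
A minimal sketch of that rule in C, under stated assumptions: the names
below are hypothetical, and how the 32-bit units are laid out so that the
very first byte has bit 7 set is left open here. The code only looks at
that flag and counts characters accordingly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a whole text, with no mixing allowed. */
enum text_kind { TEXT_ASCII, TEXT_WIDE32 };

/*
 * Apply the rule: if bit 7 of the first byte is set, the full text is
 * made of 32-bit units; otherwise it is plain 7-bit ASCII.
 */
enum text_kind classify_text(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len > 0 && (buf[0] & 0x80))
        return TEXT_WIDE32;
    return TEXT_ASCII;
}

/* Character count under this scheme: fixed 4 bytes or fixed 1 byte. */
size_t text_char_count(const unsigned char *buf, size_t len)
{
    return classify_text(buf, len) == TEXT_WIDE32 ? len / 4 : len;
}
```

An all-ASCII source file is classified as TEXT_ASCII and needs no
conversion; a file starting with a byte >= 0x80 is treated as fixed-width
from the first character to the last.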

Willy


