From: Willy Tarreau <w@1wt.eu>
To: Adrian Bunk <bunk@kernel.org>
Cc: Helge Hafting <helge.hafting@aitel.hist.no>,
	"H. Peter Anvin" <hpa@kernel.org>,
	linux-kernel@vger.kernel.org, trivial@kernel.org
Subject: Re: [2.6 patch] UTF-8 fixes in comments
Date: Tue, 29 Apr 2008 13:06:38 +0200
Message-ID: <20080429110638.GG1473@1wt.eu>
In-Reply-To: <20080429104216.GC19269@cs181133002.pp.htv.fi>

On Tue, Apr 29, 2008 at 01:42:16PM +0300, Adrian Bunk wrote:
> On Tue, Apr 29, 2008 at 12:09:34PM +0200, Willy Tarreau wrote:
> > On Tue, Apr 29, 2008 at 11:06:05AM +0200, Helge Hafting wrote:
> > > >Well, I accidentally used a freshly installed laptop running Mandriva 2008.
> > > >I was typing in a terminal inside KDE (I don't know the program name, sort
> > > >of an xterm, but with huge borders all around). I made a typo in a word and
> > > >typed an "é" (e acute). Pressing backspace to fix it showed me that I
> > > >removed more characters than I had typed. I tried again: I pressed this
> > > >letter 5 times, then backspace 10 times, and removed 5 characters from the
> > > >prompt. I suspect that if I had used characters with a wider encoding (e.g.
> > > >4 bytes), I could have removed correspondingly more... Clearly those tools
> > > >are not ready.
> > > >  
> > > So don't use that particular tool
> > 
> > It was not my machine, and had you been there, you would have heard me call
> > it names!
> > 
> > > and/or file a bug with the maintainer. :-)
> > 
> > It's too easy to impose crappy designs on end-users and tell them that if
> > something does not work they have to file a bug. There is a minimal set of
> > things that must be tested before shipping. Seeing that the default
> > terminal emulator in KDE on Mandriva 2008 is configured for UTF-8 and does
> > not handle it properly simply makes me sick. This is broken by design, and
> > even distros that have been trying to get it working for years still can't
> > cope with it. There must be a reason.
> 
> I can reproduce your problem in a plain xterm when setting LANG=en_US
> (most likely the same problem can occur with other non-UTF-8 settings).

Possibly they broke it when forcing support for variable-length encodings?

> In this case I'm actually more surprised that the character is displayed 
> correctly than that you have to type backspace twice.
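A minimal Python sketch of the mismatch Adrian describes, assuming that `LANG=en_US` selects ISO-8859-1 (Latin-1), as it typically does on glibc systems: the two UTF-8 bytes for "é" are reinterpreted one byte per character, so a program running in that locale sees two characters where the user typed one. The helper name `as_seen_by_latin1_locale` is hypothetical.

```python
# Assumption: LANG=en_US maps to ISO-8859-1 (Latin-1), as is typical on glibc.
def as_seen_by_latin1_locale(ch: str) -> str:
    """Encode ch as UTF-8, then reinterpret those bytes as a Latin-1 locale would."""
    return ch.encode("utf-8").decode("latin-1")

utf8_bytes = "é".encode("utf-8")
print(utf8_bytes)                          # b'\xc3\xa9'
print(as_seen_by_latin1_locale("é"))       # Ã©
print(len(as_seen_by_latin1_locale("é")))  # 2 characters, hence two backspaces
```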

It's not that I *had* to press it twice. But I *could* press it twice, and
the first press removed the character, the second one a character of the
prompt.
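One plausible mechanism for this behaviour, sketched in Python under stated assumptions: the input buffer loses one *byte* per backspace, while the echo erases one screen *column* per erase performed. With 5 two-byte "é" characters (10 bytes), 10 presses empty the buffer but also wipe 5 columns of the prompt. The `erase` function and the prompt string are hypothetical; real terminals, the kernel line discipline (compare the Linux `IUTF8` termios input flag, which exists to make character erase UTF-8-aware in cooked mode) and readline each behave in their own ways.

```python
def erase(prompt: str, typed: str, presses: int, utf8_aware: bool):
    """Hypothetical line editor: returns (text left on screen, remaining input bytes)."""
    buf = bytearray(typed.encode("utf-8"))  # what the program actually stores
    screen = list(prompt + typed)           # one glyph per column on the display
    for _ in range(presses):
        if not buf:
            break  # input buffer empty: nothing left to erase
        if utf8_aware:
            # Drop trailing continuation bytes (10xxxxxx), then the lead byte,
            # so one keypress removes one whole character.
            while buf and (buf[-1] & 0xC0) == 0x80:
                buf.pop()
            if buf:
                buf.pop()
        else:
            buf.pop()  # byte-oriented: one byte per keypress
        if screen:
            screen.pop()  # echo erases exactly one column per erase performed

    return "".join(screen), bytes(buf)

PROMPT = "user@host:~$ "  # hypothetical 13-column prompt
print(erase(PROMPT, "é" * 5, 10, utf8_aware=False))  # 5 prompt columns vanish too
print(erase(PROMPT, "é" * 5, 10, utf8_aware=True))   # stops at the prompt
```

With a single "é", the byte-oriented version leaves the lead byte `0xC3` in the buffer after the first press, and the second press then eats a prompt column, which matches the two-press observation above.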

> Any kind of charset mixing is highly problematic (which is also why my 
> patch was attached compressed), so if you disable UTF-8 anywhere in a 
> modern distribution, problems are somewhat expected (it could also be a 
> bug in Mandriva's default settings, but that would really surprise me).

No, it was not disabled at all. I had to type in a command for a
co-worker who had done a default install the day before, and I made a
typo which I wanted to fix.

> > Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of Unicode.
> > That's as silly as replacing your terminals with ones that read
> > gzip natively, and expecting them, as well as all the tools, to
> > work properly!
> 
> It's not a compressed encoding, it's a variable-length encoding.
> 
> Besides the size advantages, one main advantage of UTF-8 is that ASCII is 
> valid UTF-8. This means that for the ASCII source code in the kernel it 
> doesn't matter whether it's treated as ASCII or UTF-8, and no conversion 
> was needed.
> 
> You can't get this property with a fixed-size Unicode encoding.
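A short Python illustration of Adrian's two points: UTF-8 spends 1 to 4 bytes per code point depending on its value (variable length rather than compression), and pure ASCII text is byte-for-byte identical in ASCII and UTF-8. UTF-32 (big-endian, chosen here only for illustration) stands in for "a fixed-size Unicode encoding"; it quadruples ASCII text and fills it with zero bytes.

```python
# Byte length of one code point from each UTF-8 length class.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, é, €, grinning face
    print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s)")

src = "int main(void) { return 0; }"      # pure ASCII, like most kernel source
same_bytes = src.encode("utf-8") == src.encode("ascii")
utf32 = src.encode("utf-32-be")            # fixed 4 bytes per code point, no BOM
print(same_bytes)                          # True: no conversion needed
print(len(src), len(utf32))                # UTF-32 is 4x the size
print(b"\x00" in utf32)                    # True: NUL bytes, which C string functions treat as terminators
```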

I don't agree. If you refuse character-set mixing, there's no problem.
Bit 7 of the first char == 1? => the full text is 32-bit.
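Willy's rule can be turned into a concrete format. The sketch below is one hypothetical reading, not a format specified in the thread: pure 7-bit ASCII text is stored unchanged, and any other text is stored as big-endian 32-bit code units with bit 31 of the first unit set, so bit 7 of the very first byte tells the reader which case applies. The names `encode_tagged`, `decode_tagged` and `TAG` are made up for illustration.

```python
import struct

TAG = 0x80000000  # hypothetical flag: bit 7 of the first byte in big-endian order

def encode_tagged(text: str) -> bytes:
    """Hypothetical format: 7-bit ASCII stays as-is; anything else becomes
    big-endian 32-bit code units with bit 31 of the first unit set."""
    if all(ord(c) < 0x80 for c in text):
        return text.encode("ascii")
    units = [ord(c) for c in text]
    units[0] |= TAG  # code points stop at 0x10FFFF, so bit 31 is free
    return struct.pack(f">{len(units)}I", *units)

def decode_tagged(data: bytes) -> str:
    """Check bit 7 of the first byte, then decode the whole text accordingly."""
    if not data or not (data[0] & 0x80):
        return data.decode("ascii")
    if len(data) % 4:
        raise ValueError("32-bit text length must be a multiple of 4")
    units = list(struct.unpack(f">{len(data) // 4}I", data))
    units[0] &= 0x7FFFFFFF  # clear the flag bit
    return "".join(chr(u) for u in units)

print(encode_tagged("hello"))                    # b'hello'
print(len(encode_tagged("caf\u00e9")))           # 16: 4 bytes per character
print(decode_tagged(encode_tagged("caf\u00e9")))  # café
```

The check only works on whole texts. Concatenating an ASCII file and a tagged file (in either order) yields bytes that `decode_tagged` misreads or rejects, which is the "character-set mixing" the rule has to refuse.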

Willy

