From: Willy Tarreau <w@1wt.eu>
To: Adrian Bunk <bunk@kernel.org>
Cc: Helge Hafting <helge.hafting@aitel.hist.no>,
"H. Peter Anvin" <hpa@kernel.org>,
linux-kernel@vger.kernel.org, trivial@kernel.org
Subject: Re: [2.6 patch] UTF-8 fixes in comments
Date: Tue, 29 Apr 2008 13:06:38 +0200
Message-ID: <20080429110638.GG1473@1wt.eu>
In-Reply-To: <20080429104216.GC19269@cs181133002.pp.htv.fi>

On Tue, Apr 29, 2008 at 01:42:16PM +0300, Adrian Bunk wrote:
> On Tue, Apr 29, 2008 at 12:09:34PM +0200, Willy Tarreau wrote:
> > On Tue, Apr 29, 2008 at 11:06:05AM +0200, Helge Hafting wrote:
> > > >Well, I accidentally used a freshly installed laptop running mandriva 2008.
> > > >I was typing in a terminal inside KDE (I don't know the program name, sort
> > > >of an xterm, but with huge borders all around). I made a typo in a word and
> > > >typed in a "é" (e acute). Pressing backspace to fix it showed me that I
> > > >remove more chars than typed. I tried again. Pressing this letter 5 times,
> > > >then 10 times backspace. I removed 5 chars from the prompt. I suspect that
> > > >if I had used some chars with wider encoding (eg 4 bytes), I could have
> > > >removed as many... Clearly those tools are not ready.
> > > >
> > > So don't use that particular tool
> >
> > It was not my machine, and had you been there, you would have heard me call
> > it names !
> >
> > > and/or file a bug with the maintainer. :-)
> >
> > It's too easy to impose crappy designs to end-users and tell them that if
> > that does not work they have to file a bug. There are a minimal set of
> > things that must be tested before shipping. Seeing that the default
> > terminal emulator in KDE on Mandriva 2008 is configured in UTF-8 and does
> > not properly render it simply makes me sick. This is broken by design and
> > even distros trying to get it working for years still can't cope with it.
> > There must be a reason.
>
> I can reproduce your problem in a plain xterm when setting LANG=en_US
> (most likely the same problem can occur with other non UTF-8 settings).

Possibly they broke it when forcing support for variable-length encodings?

> In this case I'm actually more surprised that the character is displayed
> correctly than that you have to type backspace twice.

It's not that I *had* to press it twice, but I *could*: the first
backspace removed the character, the second one removed a character
from the prompt.
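
One plausible reading of the numbers reported earlier (five "é", then ten
backspaces, five prompt characters gone), offered as an assumption rather
than something established in this thread: the line editor counts bytes
while the screen counts columns. In UTF-8, "é" is two bytes but one column.
A minimal Python sketch of that arithmetic follows; the byte-wise erase
model and all variable names are hypothetical.

```python
# Assumption: each backspace removes one byte from the input buffer
# and wipes one column on screen. This models a byte-oriented line
# editor in general, not any specific terminal program.
typed = "\u00e9" * 5                  # five keystrokes of e-acute (U+00E9)
buffered = typed.encode("utf-8")      # what the line editor would hold

columns_drawn = len(typed)            # one column per character
backspaces_accepted = len(buffered)   # one backspace accepted per byte
columns_erased = backspaces_accepted  # each backspace wipes one column
prompt_columns_lost = columns_erased - columns_drawn

print(columns_drawn, backspaces_accepted, prompt_columns_lost)
```

Under this model the figures match the report: ten backspaces are
accepted, and five of the erased columns belong to the prompt.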
> Any kind of charset mixing is highly problematic (which is also why my
> patch was attached compressed), so if you disable UTF-8 anywhere in a
> modern distribution problems are somehow expected (it could also be a
> bug in Mandrivas default settings, but that would really surprise me).

No, it was not disabled at all. I had to type a command for a
co-worker who had just done a default install the day before, and I
made a typo that I wanted to fix.

> > Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of unicode.
> > That's as silly as if you had to replace your terminals to read
> > native gzip, and expect them as well as all the tools to work
> > properly!
>
> It's not a compressed encoding, it's a variable-length encoding.
>
> Besides the size advantages one main advantage of UTF-8 is that ASCII is
> valid UTF-8. This means that for the ASCII source code in the kernel it
> doesn't matter whether it's treated as ASCII or UTF-8, and no conversion
> was needed.
>
> You can't get this property with a fixed-size Unicode encoding.

I don't agree. If you refuse character-set mixing, there's no problem.
Bit 7 of the first byte == 1? => the full text is 32-bit.
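
A minimal Python sketch of the scheme described above, to make it
concrete. The message specifies only the bit-7 test; the single marker
byte 0x80, the big-endian UTF-32 payload, and the names encode() and
decode() are assumptions made for illustration.

```python
MARKER = 0x80  # hypothetical marker byte; only "bit 7 set" is specified


def encode(text: str) -> bytes:
    """Pure ASCII stays as-is; anything else becomes marker + UTF-32."""
    if text.isascii():
        return text.encode("ascii")
    return bytes([MARKER]) + text.encode("utf-32-be")


def decode(data: bytes) -> str:
    """Choose the decoding by looking at bit 7 of the first byte."""
    if not data or (data[0] & 0x80) == 0:
        # No mixing allowed: a later non-ASCII byte raises UnicodeDecodeError.
        return data.decode("ascii")
    return data[1:].decode("utf-32-be")


assert encode("hello") == b"hello"
assert decode(encode("caf\u00e9")) == "caf\u00e9"
```

With this layout, pure-ASCII text is byte-for-byte unchanged, while a
single non-ASCII character makes every character in the text cost four
bytes.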
Willy