From: <pcg@goof.com ( Marc) (A.) (Lehmann )>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jamie Lokier <jamie@shareable.org>, Marc Lehmann <pcg@schmorp.de>,
viro@parcelfarce.linux.theplanet.co.uk,
Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)
Date: Tue, 17 Feb 2004 08:14:48 +0100 [thread overview]
Message-ID: <20040217071448.GA8846@schmorp.de> (raw)
In-Reply-To: <Pine.LNX.4.58.0402161431260.30742@home.osdl.org>
On Mon, Feb 16, 2004 at 02:40:25PM -0800, Linus Torvalds <torvalds@osdl.org> wrote:
> Try it with a regular C locale. Do a simple
>
> echo > åäö
Just for your info, though. You can't even input these characters in a C
locale, since your libc (and/or xlib) is unable to handle them (lots of SO
C functions will barf on this one). C is 7 bit only.
> Which, if you think about is, is 100% EXACTLY equivalent to what a UTF-8
> program should do when it sees broken UTF-8.
The problem is that the very common C language makes it a pain to use
this in i18n programs. multibyte functions or iconv will no accept
these, so programs wanting to do what you are expecting to do need to
re-implement most if not all of the character handling of your typical
libc.
Yes, it's possible....
> The two cases are 100% equivalent. We've gone through this before. There
> is a bit of pain involved, but it's not something new, or something
> fundamentally impossible. It's very straightforward indeed.
The "bit" is enourmous, as you can't use your libc for text processing
anymore.
Yes, it works in non-i18n programms, but right now most programs get
i18n support, which means they will all fail to properly handle
non-locale characters.
--
-----==- |
----==-- _ |
---==---(_)__ __ ____ __ Marc Lehmann +--
--==---/ / _ \/ // /\ \/ / pcg@goof.com |e|
-=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
The choice of a GNU generation |
|
next prev parent reply other threads:[~2004-02-17 7:15 UTC|newest]
Thread overview: 118+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
2004-02-14 14:27 ` JFS default behavior Nicolas Mailhot
2004-02-14 15:40 ` viro
2004-02-14 17:47 ` Nicolas Mailhot
2004-02-14 17:59 ` Nicolas Mailhot
2004-02-14 23:06 ` Robin Rosenberg
2004-02-14 23:29 ` viro
2004-02-15 0:07 ` Robin Rosenberg
2004-02-15 2:41 ` Linus Torvalds
2004-02-15 3:33 ` Matthias Urlichs
2004-02-15 4:04 ` viro
2004-02-15 9:48 ` Robin Rosenberg
2004-02-15 18:26 ` yodaiken
2004-02-18 2:48 ` Unicode normalization (userspace issue, but what the heck) H. Peter Anvin
2004-02-20 9:48 ` Matthias Urlichs
2004-02-16 15:05 ` stty utf8 Jamie Lokier
2004-02-16 16:10 ` Gerd Knorr
2004-02-16 22:03 ` Jamie Lokier
2004-02-16 22:17 ` Linus Torvalds
2004-02-16 22:04 ` Jamie Lokier
2004-02-16 18:36 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Marc Lehmann
2004-02-16 18:49 ` Linus Torvalds
2004-02-16 19:26 ` UTF-8 practically vs. theoretically in the VFS API Jeff Garzik
2004-02-16 19:48 ` John Bradford
2004-02-16 19:48 ` Linus Torvalds
2004-02-16 20:20 ` Marc Lehmann
2004-02-16 20:26 ` Linus Torvalds
2004-02-18 2:49 ` Rob Landley
2004-02-16 20:21 ` bert hubert
2004-02-16 20:33 ` Marc Lehmann
2004-02-18 2:58 ` H. Peter Anvin
2004-02-18 3:13 ` Linus Torvalds
2004-02-18 3:22 ` H. Peter Anvin
2004-02-18 3:30 ` Linus Torvalds
2004-02-18 5:30 ` H. Peter Anvin
2004-02-18 10:29 ` Robin Rosenberg
2004-02-18 11:49 ` Tomas Szepe
2004-02-18 11:59 ` Robin Rosenberg
2004-02-18 12:05 ` Tomas Szepe
2004-02-18 12:34 ` Robin Rosenberg
2004-02-18 15:35 ` Linus Torvalds
2004-02-18 19:47 ` Tomas Szepe
2004-02-18 20:01 ` H. Peter Anvin
2004-02-18 21:22 ` Robin Rosenberg
2004-02-18 21:42 ` H. Peter Anvin
2004-02-18 11:24 ` Jamie Lokier
2004-02-18 11:33 ` Jamie Lokier
2004-02-18 16:47 ` H. Peter Anvin
2004-02-18 19:59 ` Linus Torvalds
2004-02-18 20:08 ` H. Peter Anvin
2004-02-18 7:25 ` bert hubert
2004-02-16 20:16 ` Marc Lehmann
2004-02-16 20:20 ` Jeff Garzik
2004-02-16 21:10 ` viro
2004-02-17 7:18 ` jw schultz
2004-02-17 7:42 ` Nick Piggin
2004-02-16 20:03 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Marc Lehmann
2004-02-16 20:23 ` Linus Torvalds
2004-02-16 20:58 ` Marc Lehmann
2004-02-17 14:12 ` Dave Kleikamp
2004-02-16 22:26 ` Jamie Lokier
2004-02-16 22:40 ` Linus Torvalds
2004-02-16 22:52 ` Linus Torvalds
2004-02-17 13:15 ` Jamie Lokier
2004-02-17 7:14 ` Lehmann [this message]
2004-02-17 11:20 ` UTF-8 practically vs. theoretically in the VFS API Helge Hafting
2004-02-17 15:56 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Linus Torvalds
[not found] ` <20040217161111.GE8231@schmorp.de>
2004-02-17 16:32 ` Linus Torvalds
2004-02-17 16:46 ` Jamie Lokier
2004-02-17 19:00 ` UTF-8 practically vs. theoretically in the VFS API Måns Rullgård
2004-02-17 20:57 ` Jamie Lokier
2004-02-17 21:06 ` Alex Belits
2004-02-17 21:47 ` Jamie Lokier
2004-02-22 15:32 ` Eric W. Biederman
2004-02-22 16:28 ` Jamie Lokier
2004-02-22 21:53 ` Eric W. Biederman
2004-02-18 7:23 ` Marc Lehmann
2004-02-17 21:23 ` Matthew Kirkwood
2004-02-18 13:11 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Matthew Garrett
2004-02-17 16:52 ` Marc Lehmann
2004-02-17 16:54 ` UTF-8 practically vs. theoretically in the VFS API Stefan Smietanowski
2004-02-18 1:27 ` Hans Reiser
2004-02-18 2:08 ` Robin Rosenberg
2004-02-18 11:06 ` Jamie Lokier
2004-02-17 20:37 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Robin Rosenberg
2004-02-17 16:36 ` Jamie Lokier
2004-02-17 17:52 ` viro
2004-02-17 19:29 ` Jamie Lokier
2004-02-17 19:45 ` Linus Torvalds
2004-02-17 20:30 ` Jamie Lokier
2004-02-17 20:49 ` Linus Torvalds
2004-02-17 21:17 ` Jamie Lokier
2004-02-17 19:51 ` Jamie Lokier
2004-02-17 19:53 ` viro
2004-02-17 20:35 ` John Bradford
2004-02-17 20:40 ` Jamie Lokier
2004-02-17 20:50 ` John Bradford
2004-02-17 21:04 ` Linus Torvalds
2004-02-17 21:16 ` John Bradford
2004-02-17 21:21 ` Linus Torvalds
2004-02-18 0:52 ` John Bradford
2004-02-17 22:50 ` Robin Rosenberg
2004-02-18 6:48 ` Marc Lehmann
2004-02-17 20:47 ` viro
2004-02-17 20:53 ` John Bradford
2004-02-17 20:59 ` Linus Torvalds
2004-02-17 21:06 ` John Bradford
2004-02-17 21:42 ` Alex Belits
2004-02-18 6:56 ` Marc Lehmann
2004-02-18 20:37 ` Alex Belits
2004-02-18 3:11 ` H. Peter Anvin
2004-02-17 20:38 ` Jamie Lokier
2004-02-18 3:07 ` H. Peter Anvin
2004-02-21 13:54 ` Pavel Machek
2004-02-22 20:09 ` H. Peter Anvin
2004-02-17 1:24 ` Alex Belits
2004-02-17 21:09 ` Jamie Lokier
2004-02-17 21:48 ` Linus Torvalds
2004-02-17 22:19 ` Alex Belits
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040217071448.GA8846@schmorp.de \
--to=pcg@goof.com \
--cc=jamie@shareable.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pcg@schmorp.de \
--cc=torvalds@osdl.org \
--cc=viro@parcelfarce.linux.theplanet.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox