From: Marc A. Lehmann <pcg@goof.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Jamie Lokier <jamie@shareable.org>, Marc Lehmann <pcg@schmorp.de>,
	viro@parcelfarce.linux.theplanet.co.uk,
	Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)
Date: Tue, 17 Feb 2004 08:14:48 +0100
Message-ID: <20040217071448.GA8846@schmorp.de>
In-Reply-To: <Pine.LNX.4.58.0402161431260.30742@home.osdl.org>

On Mon, Feb 16, 2004 at 02:40:25PM -0800, Linus Torvalds <torvalds@osdl.org> wrote:
> Try it with a regular C locale. Do a simple
> 
> 	echo > åäö

Just for your info, though: you can't even input these characters in a C
locale, since your libc (and/or Xlib) is unable to handle them (lots of ISO
C functions will barf on this one). The C locale is 7-bit only.

> Which, if you think about it, is 100% EXACTLY equivalent to what a UTF-8
> program should do when it sees broken UTF-8.

The problem is that the very common C language makes it a pain to use
this in i18n programs. The multibyte functions and iconv will not accept
these byte sequences, so programs wanting to do what you expect need to
re-implement most, if not all, of the character handling of a typical
libc.

Yes, it's possible....
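
As a rough illustration of that pain, here is a minimal sketch of what a
program has to do by hand once it wants to survive file names the locale
cannot decode. The helper name count_rejected_bytes is hypothetical, and the
skip-one-byte recovery policy is only an assumption about what such a program
might choose. Which bytes actually get rejected depends on the active locale
and on the libc in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: walk a file name with mbrtowc() and count the bytes
 * that the current locale's multibyte decoder refuses to convert.
 * Rejected bytes are skipped one at a time so the scan always finishes.
 */
size_t count_rejected_bytes(const char *name, size_t len)
{
    mbstate_t st;
    size_t rejected = 0;
    size_t i = 0;

    assert(name != NULL);
    memset(&st, 0, sizeof st);

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, name + i, len - i, &st);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated sequence. The conversion state is
             * unspecified after an error, so reset it and skip one byte. */
            memset(&st, 0, sizeof st);
            rejected++;
            i++;
        } else if (n == 0) {
            /* An embedded NUL byte converts to a null wide character. */
            i++;
        } else {
            /* A complete character of n bytes was decoded. */
            i += n;
        }
    }
    return rejected;
}
```

A caller would typically run setlocale(LC_CTYPE, "") first and then pass the
raw name as returned by readdir(). Every byte counted here is one the program
must escape, display or compare with its own code, because the libc gave up
on it.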

> The two cases are 100% equivalent. We've gone through this before. There 
> is a bit of pain involved, but it's not something new, or something 
> fundamentally impossible. It's very straightforward indeed.

The "bit" is enormous, as you can't use your libc for text processing
anymore.

Yes, it works in non-i18n programs, but right now most programs are getting
i18n support, which means they will all fail to properly handle
non-locale characters.

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |
