public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "H. Peter Anvin" <hpa@zytor.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: UTF-8 practically vs. theoretically in the VFS API
Date: Tue, 17 Feb 2004 21:30:09 -0800	[thread overview]
Message-ID: <4032F861.3080304@zytor.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0402171927520.2686@home.osdl.org>

Linus Torvalds wrote:
> 
> On Tue, 17 Feb 2004, H. Peter Anvin wrote:
> 
>>Well, the reason you'd want an out-of-band mechanism is to be able to
>>display it as some kind of escapes. 
> 
> 
> I'd suggest just doing that when you convert the utf-8 format to printable 
> format _anyway_.  At that point you just make the "printable" 
> representation be the binary escape sequence (which you have to have for 
> other non-printable utf-8 characters anyway).
> 

What does "printable" mean in this context?  Typically you have to 
convert it to UCS-4 first, so you can index into your font tables, then 
you have to create the right composition, apply the bidirectional text 
algorithm, and so forth.

Rendering general Unicode text is complex enough that you really want it 
layered.  What I described what the first step of that -- mostly trying 
to show that "throwing an error" doesn't necessarily mean "produce no 
output."  What you shouldn't do, though, is alias it with legitimate input.

> And if you do things right (ie you allow user input in that same escaped 
> output format), you can allow users to re-create the exact "broken utf-8". 
> Which is actually important just so that the user can fix it up (ie 
> imagine the user noticing that the filename is broken, and now needs to do 
> a "mv broken-name fixed-name" - the user needs some way to re-create the 
> brokenness).

Indeed.  The C language has gone with \x77 for bytes and \u7777 or 
\U77777777 for Unicode characters (4 vs 8 hex digits respectively); I 
think this is a good UI for shells to follow.  The \x representation 
then doesn't stand for characters but for bytes.  It may be desirable to 
disallow encoding of *valid* UTF-8 characters this way, though.

	-hpa

  reply	other threads:[~2004-02-18  5:30 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <04Feb13.163954est.41760@gpu.utcc.utoronto.ca>
2004-02-14 14:27 ` JFS default behavior Nicolas Mailhot
2004-02-14 15:40   ` viro
2004-02-14 17:47     ` Nicolas Mailhot
2004-02-14 17:59       ` Nicolas Mailhot
2004-02-14 23:06     ` Robin Rosenberg
2004-02-14 23:29       ` viro
2004-02-15  0:07         ` Robin Rosenberg
2004-02-15  2:41           ` Linus Torvalds
2004-02-15  3:33             ` Matthias Urlichs
2004-02-15  4:04               ` viro
2004-02-15  9:48                 ` Robin Rosenberg
2004-02-15 18:26                 ` yodaiken
2004-02-18  2:48               ` Unicode normalization (userspace issue, but what the heck) H. Peter Anvin
2004-02-20  9:48                 ` Matthias Urlichs
2004-02-16 15:05             ` stty utf8 Jamie Lokier
2004-02-16 16:10               ` Gerd Knorr
2004-02-16 22:03               ` Jamie Lokier
2004-02-16 22:17                 ` Linus Torvalds
2004-02-16 22:04               ` Jamie Lokier
2004-02-16 18:36             ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Marc Lehmann
2004-02-16 18:49               ` Linus Torvalds
2004-02-16 19:26                 ` UTF-8 practically vs. theoretically in the VFS API Jeff Garzik
2004-02-16 19:48                   ` John Bradford
2004-02-16 19:48                     ` Linus Torvalds
2004-02-16 20:20                       ` Marc Lehmann
2004-02-16 20:26                         ` Linus Torvalds
2004-02-18  2:49                         ` Rob Landley
2004-02-16 20:21                       ` bert hubert
2004-02-16 20:33                         ` Marc Lehmann
2004-02-18  2:58                         ` H. Peter Anvin
2004-02-18  3:13                           ` Linus Torvalds
2004-02-18  3:22                             ` H. Peter Anvin
2004-02-18  3:30                               ` Linus Torvalds
2004-02-18  5:30                                 ` H. Peter Anvin [this message]
2004-02-18 10:29                                   ` Robin Rosenberg
2004-02-18 11:49                                     ` Tomas Szepe
2004-02-18 11:59                                       ` Robin Rosenberg
2004-02-18 12:05                                         ` Tomas Szepe
2004-02-18 12:34                                           ` Robin Rosenberg
2004-02-18 15:35                                   ` Linus Torvalds
2004-02-18 19:47                                     ` Tomas Szepe
2004-02-18 20:01                                       ` H. Peter Anvin
2004-02-18 21:22                                         ` Robin Rosenberg
2004-02-18 21:42                                           ` H. Peter Anvin
2004-02-18 11:24                               ` Jamie Lokier
2004-02-18 11:33                             ` Jamie Lokier
2004-02-18 16:47                               ` H. Peter Anvin
2004-02-18 19:59                               ` Linus Torvalds
2004-02-18 20:08                                 ` H. Peter Anvin
2004-02-18  7:25                           ` bert hubert
2004-02-16 20:16                     ` Marc Lehmann
2004-02-16 20:20                       ` Jeff Garzik
2004-02-16 21:10                       ` viro
2004-02-17  7:18                       ` jw schultz
2004-02-17  7:42                       ` Nick Piggin
2004-02-16 20:03                 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Marc Lehmann
2004-02-16 20:23                   ` Linus Torvalds
2004-02-16 20:58                     ` Marc Lehmann
2004-02-17 14:12                       ` Dave Kleikamp
2004-02-16 22:26                     ` Jamie Lokier
2004-02-16 22:40                       ` Linus Torvalds
2004-02-16 22:52                         ` Linus Torvalds
2004-02-17 13:15                           ` Jamie Lokier
2004-02-17  7:14                         ` Lehmann 
2004-02-17 11:20                           ` UTF-8 practically vs. theoretically in the VFS API Helge Hafting
2004-02-17 15:56                           ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Linus Torvalds
     [not found]                             ` <20040217161111.GE8231@schmorp.de>
2004-02-17 16:32                               ` Linus Torvalds
2004-02-17 16:46                                 ` Jamie Lokier
2004-02-17 19:00                                   ` UTF-8 practically vs. theoretically in the VFS API Måns Rullgård
2004-02-17 20:57                                     ` Jamie Lokier
2004-02-17 21:06                                       ` Alex Belits
2004-02-17 21:47                                         ` Jamie Lokier
2004-02-22 15:32                                           ` Eric W. Biederman
2004-02-22 16:28                                             ` Jamie Lokier
2004-02-22 21:53                                               ` Eric W. Biederman
2004-02-18  7:23                                         ` Marc Lehmann
2004-02-17 21:23                                       ` Matthew Kirkwood
2004-02-18 13:11                                   ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Matthew Garrett
2004-02-17 16:52                                 ` Marc Lehmann
2004-02-17 16:54                                 ` UTF-8 practically vs. theoretically in the VFS API Stefan Smietanowski
2004-02-18  1:27                                   ` Hans Reiser
2004-02-18  2:08                                     ` Robin Rosenberg
2004-02-18 11:06                                       ` Jamie Lokier
2004-02-17 20:37                                 ` UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior) Robin Rosenberg
2004-02-17 16:36                             ` Jamie Lokier
2004-02-17 17:52                               ` viro
2004-02-17 19:29                                 ` Jamie Lokier
2004-02-17 19:45                                   ` Linus Torvalds
2004-02-17 20:30                                     ` Jamie Lokier
2004-02-17 20:49                                       ` Linus Torvalds
2004-02-17 21:17                                         ` Jamie Lokier
2004-02-17 19:51                                   ` Jamie Lokier
2004-02-17 19:53                                   ` viro
2004-02-17 20:35                                     ` John Bradford
2004-02-17 20:40                                       ` Jamie Lokier
2004-02-17 20:50                                         ` John Bradford
2004-02-17 21:04                                           ` Linus Torvalds
2004-02-17 21:16                                             ` John Bradford
2004-02-17 21:21                                               ` Linus Torvalds
2004-02-18  0:52                                                 ` John Bradford
2004-02-17 22:50                                               ` Robin Rosenberg
2004-02-18  6:48                                             ` Marc Lehmann
2004-02-17 20:47                                       ` viro
2004-02-17 20:53                                         ` John Bradford
2004-02-17 20:59                                       ` Linus Torvalds
2004-02-17 21:06                                         ` John Bradford
2004-02-17 21:42                                         ` Alex Belits
2004-02-18  6:56                                           ` Marc Lehmann
2004-02-18 20:37                                             ` Alex Belits
2004-02-18  3:11                                         ` H. Peter Anvin
2004-02-17 20:38                                     ` Jamie Lokier
2004-02-18  3:07                               ` H. Peter Anvin
2004-02-21 13:54                             ` Pavel Machek
2004-02-22 20:09                               ` H. Peter Anvin
2004-02-17  1:24                   ` Alex Belits
2004-02-17 21:09                     ` Jamie Lokier
2004-02-17 21:48                       ` Linus Torvalds
2004-02-17 22:19                       ` Alex Belits
2004-02-23 11:35 UTF-8 practically vs. theoretically in the VFS API Norman Diamond
     [not found] ` <fa.ip45pqg.i26oru@ifi.uio.no>
2004-02-23 19:13   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4032F861.3080304@zytor.com \
    --to=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox