public inbox for linux-kernel@vger.kernel.org
From: Jamie Lokier <jamie@shareable.org>
To: John Bradford <john@grabjohn.com>
Cc: Robin Rosenberg <robin.rosenberg.lists@dewire.com>,
	Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
Date: Fri, 13 Feb 2004 03:15:02 +0000
Message-ID: <20040213031502.GG25499@mail.shareable.org>
In-Reply-To: <200402122113.i1CLDqoB000179@81-2-122-30.bradfords.org.uk>

John Bradford wrote:
> in the real world don't you think that there will be a lot of
> decoders which decode the multi-byte sequence back, rather than
> report an error?

There will be decoders which convert ASCII "a" to "A" too.  We can't
fix broken code, but at least we can make it clear to anyone writing a
decoder what is acceptable, and that being "liberal" in what's decoded
is not acceptable and is considered a security flaw.

An app author only writes the UTF-8 decoder once; it isn't at all hard
to convert non-minimal forms to the replacement char U+FFFD.
(Although that could be a security hole in some cases, it's much
better than allowing non-minimal sequences to decode to NUL or "/" or
".").  Rejecting a non-minimal form outright is often hard, because
the UTF-8 decoder is often used in a place which cannot flag errors.
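
To make the "convert non-minimal forms to U+FFFD" idea concrete, here
is a minimal sketch in Python.  The function name decode_utf8_strict
and the MIN_FOR_LENGTH table are made up for illustration; this is not
taken from any particular library, and a production decoder would want
more care about how many replacement characters it emits per bad
sequence.

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs a sequence of each length.
# Anything below this, encoded at that length, is a non-minimal form.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, mapping every malformed or non-minimal sequence
    to U+FFFD instead of raising an error (hypothetical helper)."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            length, cp = 4, b & 0x07
        else:                             # stray continuation or bad lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        # Collect the continuation bytes (10xxxxxx) that follow the lead.
        j = i + 1
        while j < i + length and j < n and (data[j] & 0xC0) == 0x80:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j != i + length:               # truncated sequence
            out.append(REPLACEMENT)
        elif (cp < MIN_FOR_LENGTH[length]     # non-minimal (overlong) form
              or cp > 0x10FFFF                # beyond Unicode range
              or 0xD800 <= cp <= 0xDFFF):     # surrogate
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i = j
    return "".join(out)
```

The point of the design is that the function never fails and never
yields "A", "/", "." or NUL from anything other than the single byte
that encodes it, so it can be dropped into code paths which have no
way to report an error.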

> Imagine you have two files, with the following filename bytes:
> 
> 11000001 10000001 00000000
> 01000001 00000000
> 
> ..and a _real world_ application, which is not necessarily completely
> UTF-8 conformant, tries to open the file with filename 'A'.  Which one
> is it going to open?

The one which "ls" and other programs show as "A".
The other one will typically show as "?" or a diamond or something.
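
As a quick illustration of that, here is how Python's built-in UTF-8
decoder (used only as an example of a conformant decoder, not as what
"ls" itself does) treats the two byte strings from the quoted example.
The variable names are just for this sketch.

```python
# The two filenames from the example, without the trailing NUL.
overlong = bytes([0b11000001, 0b10000001])   # C1 81: non-minimal "A"
plain = bytes([0b01000001])                  # 41:    ordinary "A"

# The ordinary byte decodes to "A".
shown_plain = plain.decode("utf-8")

# A lenient display path substitutes U+FFFD for the bad bytes.
shown_overlong = overlong.decode("utf-8", errors="replace")

# A strict path refuses the non-minimal form outright.
try:
    overlong.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(repr(shown_plain), repr(shown_overlong), strict_rejects)
```

Only the second file can ever be displayed as "A"; the first shows up
as replacement characters or is rejected.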

> I don't think that the issue with combining characters is likely to be
> an issue, I only mentioned it as an example.  As you pointed out a
> single accented character, and a two character combination are
> distinct, and converting the combination to the corresponding single
> character in a filename would definitely be wrong, in my opinion.
> However, that doesn't mean that software won't do it.

Indeed some software will do it, and worse than that: the two forms
may look the same in an editor or file selector.  (See recent problems
with misleading URLs for why that sort of thing can be a security hole).
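
A small sketch of the combining-character case, using Python's
standard unicodedata module.  The choice of "é" is only an example;
any precomposed character with a canonical decomposition behaves the
same way.

```python
import unicodedata

precomposed = "\u00e9"        # single code point: e with acute accent
combining = "e\u0301"         # "e" followed by combining acute accent

# As filenames these are different byte strings...
bytes_precomposed = precomposed.encode("utf-8")   # c3 a9
bytes_combining = combining.encode("utf-8")       # 65 cc 81
distinct_on_disk = bytes_precomposed != bytes_combining

# ...but software that normalizes (here to NFC) makes them identical.
same_after_nfc = (unicodedata.normalize("NFC", combining)
                  == unicodedata.normalize("NFC", precomposed))

print(bytes_precomposed.hex(), bytes_combining.hex(),
      distinct_on_disk, same_after_nfc)
```

So two names that render identically can refer to two different
files, and a program that normalizes before opening may pick the
wrong one.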

The combining char problem is similar to case folding: some
filesystems and programs treat "a" and "A" as equivalent too.  If the
kernel had an encoding converter, and the filesystem stored iso-8859-1
while userspace was presented with utf-8, it is likely that several
Unicode characters would be mapped to "a", causing similar problems to
automatic case folding in filesystems.
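
To show the kind of collision meant here, below is a deliberately
crude, hypothetical converter (to_latin1_lossy is an invented name,
and no kernel or filesystem is claimed to work this way).  It
decomposes each character and drops whatever iso-8859-1 cannot
represent, which is one plausible way a lossy converter might behave.

```python
import unicodedata


def to_latin1_lossy(name: str) -> bytes:
    """Hypothetical lossy converter: decompose (NFKD), then silently
    drop anything that has no iso-8859-1 encoding."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("latin-1", errors="ignore")


# Four distinct Unicode filenames as seen by userspace.
names = ["a", "\u0101", "\u01ce", "\uff41"]   # a, a-macron, a-caron, fullwidth a

stored = {name: to_latin1_lossy(name) for name in names}
distinct_stored = set(stored.values())

print(stored)
print(len(names), "names in userspace,", len(distinct_stored), "on disk")
```

All four names end up as the same stored byte, which is the same
class of aliasing that case folding introduces.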

In other words, there is no clear solution to this problem.

-- Jamie
