From: Jamie Lokier <jamie@shareable.org>
To: John Bradford <john@grabjohn.com>
Cc: Robin Rosenberg <robin.rosenberg.lists@dewire.com>,
Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
Date: Fri, 13 Feb 2004 03:15:02 +0000
Message-ID: <20040213031502.GG25499@mail.shareable.org>
In-Reply-To: <200402122113.i1CLDqoB000179@81-2-122-30.bradfords.org.uk>
John Bradford wrote:
> in the real world don't you think that there will be a lot of
> decoders which decode the multi-byte sequence back, rather than
> report an error?
There will be decoders which convert ASCII "a" to "A" too. We can't
fix broken code; at least we can make it clear to anyone writing a
decoder what is acceptable, and that being "liberal" in what's decoded
is not acceptable and considered a security flaw.
An app author only writes the UTF-8 decoder once; it isn't at all hard
to convert non-minimal forms to the replacement char U+FFFD.
(Although that could be a security hole in some cases, it's much
better than allowing non-zero characters to decode to NUL or "/" or
"."). Rejecting a non-minimal form is often hard, because the UTF-8
decoder is often used in a place which cannot flag errors.
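
As an illustration of the point above, here is a minimal sketch of a
decoder that maps non-minimal (overlong) forms to U+FFFD instead of
decoding them. It is not taken from any particular library; the names
decode_strict, REPLACEMENT and MIN_FOR_LENGTH are hypothetical, and the
choice to emit one replacement character per overlong sequence is an
assumption (other decoders emit one per offending byte).

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs a sequence of each length.
# Anything below this value, encoded at that length, is a non-minimal form.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_strict(data: bytes) -> str:
    """Decode UTF-8, replacing malformed and non-minimal forms with U+FFFD."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            length, cp = 1, b
        elif 0xC0 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            length, cp = 4, b & 0x07
        else:
            # Stray continuation byte or a byte that can never start a sequence.
            out.append(REPLACEMENT)
            i += 1
            continue

        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((t & 0xC0) != 0x80 for t in tail):
            # Truncated sequence or a bad continuation byte: skip the lead byte.
            out.append(REPLACEMENT)
            i += 1
            continue

        for t in tail:
            cp = (cp << 6) | (t & 0x3F)

        if (cp < MIN_FOR_LENGTH[length]      # non-minimal form
                or cp > 0x10FFFF             # beyond the Unicode range
                or 0xD800 <= cp <= 0xDFFF):  # surrogate code point
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)
```

With this sketch, the two-byte sequence C1 81 (a non-minimal "A") and
C0 AF (a non-minimal "/") both come out as U+FFFD, never as "A" or "/",
and the function never has to signal an error to its caller.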
> Imagine you have two files, with the following filename bytes:
>
> 11000001 10000001 00000000
> 01000001 00000000
>
> ..and a _real world_ application, which is not necessarily completely
> UTF-8 conformant, tries to open the file with filename 'A'. Which one
> is it going to open?
The one which "ls" and other programs show as "A".
The other one will typically show as "?" or a diamond or something.
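
To make the two filenames concrete, here is a short sketch that uses
Python's built-in UTF-8 codec as a stand-in for a conformant decoder.
That "ls" or any given file selector behaves the same way is an
assumption; the variable names are only for illustration.

```python
# The two filenames from the quoted example, without the trailing NUL.
overlong = bytes([0b11000001, 0b10000001])  # C1 81: non-minimal form of "A"
plain = bytes([0b01000001])                 # 41: the ordinary "A"

# The kernel compares filenames as bytes, so these are two different files.
names_differ = overlong != plain

# A conformant decoder shows only the second one as "A".
shown_plain = plain.decode("utf-8", errors="replace")
shown_overlong = overlong.decode("utf-8", errors="replace")

# In strict mode the same codec refuses the non-minimal form outright.
try:
    overlong.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

Here shown_plain is "A", while shown_overlong contains only U+FFFD
replacement characters, which a terminal typically renders as "?" or a
diamond.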
> I don't think that the issue with combining characters is likely to be
> an issue, I only mentioned it as an example. As you pointed out a
> single accented character, and a two character combination are
> distinct, and converting the combination to the corresponding single
> character in a filename would definitely be wrong, in my opinion.
> However, that doesn't mean that software won't do it.
Indeed some software will do it, and worse than that: they may look
the same in an editor or file selector. (See recent problems with
misleading URLs for why that sort of thing can be a security hole).
The combining char problem is similar to case folding: some
filesystems and programs treat "a" and "A" as equivalent too. If the
kernel had an encoding converter, and the filesystem stored iso-8859-1
while userspace was presented with utf-8, it is likely that several
Unicode characters would be mapped to "a", causing similar problems to
automatic case folding in filesystems.
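
A rough sketch of both effects, using Python's unicodedata module. The
converter to_latin1_lossy is hypothetical: it stands in for the kind of
"best effort" conversion described above and is not something the
kernel or any filesystem is known to implement.

```python
import unicodedata

# A precomposed accented letter and the two-character combining form are
# distinct strings, even though they usually render identically.
precomposed = "\u00e1"   # a with acute, one code point
combining = "a\u0301"    # "a" followed by COMBINING ACUTE ACCENT
distinct = precomposed != combining
same_after_nfc = unicodedata.normalize("NFC", combining) == precomposed


def to_latin1_lossy(name: str) -> bytes:
    """Hypothetical lossy converter from Unicode to ISO-8859-1.

    Characters outside ISO-8859-1 are reduced to the first character of
    their compatibility decomposition, or to "?" if that does not fit.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            base = unicodedata.normalize("NFKD", ch)[0]
            out += base.encode("iso-8859-1", errors="replace")
    return bytes(out)


# Four different Unicode names: plain a, a with macron, a with ogonek,
# and fullwidth a. All of them collapse to the same stored filename.
names = ["a", "\u0101", "\u0105", "\uff41"]
stored = {to_latin1_lossy(n) for n in names}
```

After running this, stored holds a single entry, b"a": four names that
userspace sees as different would refer to one file, which is the same
kind of collision that automatic case folding produces.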
In other words, there is no clear solution to this problem.
-- Jamie