From: Jamie Lokier <jamie@shareable.org>
To: John Bradford <john@grabjohn.com>
Cc: Robin Rosenberg <robin.rosenberg.lists@dewire.com>,
Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)
Date: Fri, 13 Feb 2004 03:15:02 +0000
Message-ID: <20040213031502.GG25499@mail.shareable.org>
In-Reply-To: <200402122113.i1CLDqoB000179@81-2-122-30.bradfords.org.uk>
John Bradford wrote:
> in the real world don't you think that there will be a lot of
> decoders which decode the multi-byte sequence back, rather than
> report an error?
There will be decoders which convert ASCII "a" to "A" too. We can't
fix broken code; at least we can make it clear to anyone writing a
decoder what is acceptable, and that being "liberal" in what's decoded
is not acceptable and considered a security flaw.
An app author only writes the UTF-8 decoder once; it isn't at all hard
to convert non-minimal forms to the replacement char U+FFFD.
(Although that could be a security hole in some cases, it's much
better than allowing non-zero characters to decode to NUL or "/" or
"."). Rejecting a non-minimal form outright is often hard, because the
UTF-8 decoder is often used in a place which cannot flag errors.
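As a minimal sketch of the approach described above (not code from this
thread), a decoder can assemble the code point and then compare it against
the smallest value that legitimately needs a sequence of that length;
anything smaller is a non-minimal form and becomes U+FFFD. The names
`decode_utf8_strict` and `MIN_FOR_LENGTH` are illustrative assumptions, and
emitting one replacement character per rejected sequence is only one
possible policy.

```python
REPLACEMENT = "\ufffd"

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, mapping malformed and non-minimal forms to U+FFFD."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, cp = 4, lead & 0x07
        else:
            # Stray continuation byte or a byte that can never start a sequence.
            out.append(REPLACEMENT)
            i += 1
            continue

        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((t & 0xC0) != 0x80 for t in tail):
            # Truncated sequence or a non-continuation byte in the tail.
            out.append(REPLACEMENT)
            i += 1
            continue

        for t in tail:
            cp = (cp << 6) | (t & 0x3F)

        too_small = cp < MIN_FOR_LENGTH[length]   # non-minimal (overlong) form
        surrogate = 0xD800 <= cp <= 0xDFFF
        if too_small or surrogate or cp > 0x10FFFF:
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)
```

With this policy the bytes 0xC1 0x81 decode to U+FFFD rather than "A", and
0xC0 0xAF never decodes to "/". No error path is needed, which suits callers
that cannot flag errors.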
> Imagine you have two files, with the following filename bytes:
>
> 11000001 10000001 00000000
> 01000001 00000000
>
> ..and a _real world_ application, which is not necessarily completely
> UTF-8 conformant, tries to open the file with filename 'A'. Which one
> is it going to open?
The one which "ls" and other programs show as "A".
The other one will typically show as "?" or a diamond or something.
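To make the two quoted filenames concrete, the sketch below feeds their
bytes (without the trailing NUL) to Python's built-in UTF-8 codec, used here
only as one example of a decoder that rejects non-minimal forms. The helper
name `display_name` is an assumption for illustration; real tools such as
"ls" have their own display logic.

```python
# The two filenames from the quoted example, minus the terminating NUL.
overlong_name = bytes([0b11000001, 0b10000001])
plain_name = bytes([0b01000001])


def display_name(raw: bytes) -> str:
    """Decode a filename for display, substituting U+FFFD for invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the strict decoder accepts the byte string."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for name in (plain_name, overlong_name):
    print(name.hex(), is_valid_utf8(name), repr(display_name(name)))
```

Only the single-byte name displays as "A"; the two-byte name is rejected by
the strict decoder and displays as replacement characters.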
> I don't think that the issue with combining characters is likely to be
> an issue, I only mentioned it as an example. As you pointed out a
> single accented character, and a two character combination are
> distinct, and converting the combination to the corresponding single
> character in a filename would definitely be wrong, in my opinion.
> However, that doesn't mean that software won't do it.
Indeed some software will do it, and worse than that: the two forms may
look the same in an editor or file selector. (See recent problems with
misleading URLs for why that sort of thing can be a security hole).
The combining char problem is similar to case folding: some
filesystems and programs treat "a" and "A" as equivalent too. If the
kernel had an encoding converter, and the filesystem stored iso-8859-1
while userspace was presented with utf-8, it is likely that several
Unicode characters would be mapped to "a", causing similar problems to
automatic case folding in filesystems.
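A rough illustration of both points, using Python's standard `unicodedata`
module: a precomposed accented character and its combining-sequence
equivalent are distinct strings until something normalizes them, and a lossy
converter can collapse several Unicode characters onto "a". The function
`to_latin1_lossy` is a hypothetical converter invented for this sketch; it
does not describe any kernel or filesystem code.

```python
import unicodedata

precomposed = "\u00e9"    # single code point: e with acute accent
combining = "e\u0301"     # "e" followed by a combining acute accent

# Distinct as strings (and as filenames), identical after NFC normalization.
print(precomposed == combining)
print(unicodedata.normalize("NFC", combining) == precomposed)


def to_latin1_lossy(name: str) -> bytes:
    """Hypothetical converter: fold characters to an ISO-8859-1 look-alike."""
    out = []
    for ch in name:
        try:
            # Characters that exist in ISO-8859-1 pass through unchanged.
            out.append(ch.encode("iso-8859-1"))
            continue
        except UnicodeEncodeError:
            pass
        # Otherwise decompose, drop combining marks, and keep what is left.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base.encode("iso-8859-1", errors="replace"))
    return b"".join(out)


# Three different Unicode names that all collide on the stored name b"a".
for name in ("a", "\uff41", "\u0101"):
    print(repr(name), to_latin1_lossy(name))
```

Any such many-to-one mapping means two names that userspace considers
different can refer to the same stored file, which is the same hazard as
case folding.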
In other words, there is no clear solution to this problem.
-- Jamie