From: Jamie Lokier <jamie@shareable.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: viro@parcelfarce.linux.theplanet.co.uk, Marc <pcg@goof.com>,
Marc Lehmann <pcg@schmorp.de>,
Linux kernel <linux-kernel@vger.kernel.org>
Subject: Re: UTF-8 practically vs. theoretically in the VFS API (was: Re: JFS default behavior)
Date: Tue, 17 Feb 2004 20:30:56 +0000
Message-ID: <20040217203056.GC24311@mail.shareable.org>
In-Reply-To: <Pine.LNX.4.58.0402171134180.2154@home.osdl.org>
Linus Torvalds wrote:
> And it would be _trivial_ to add a LOOKUP_NODOTDOT and allow user space to
> use it through a O_NODOTDOT thing.
Nope. That wouldn't help for a bundle of libraries that goes:
1. Eliminate "." and ".." components, leaving only leading ".."s.
2. Reject path if it has a leading "..".
3. Shove it in a string with some other text and pass to other library.
Next program does:
4. Extract path from string.
5. open ("/var/public/files/$PATH", ...)
O_NODOTDOT won't protect against that.
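A minimal sketch of one way the five steps above can go wrong. It assumes, hypothetically, that the second program's extraction step runs the bytes through a lenient UTF-8 decoder which accepts overlong two-byte forms, and that it then calls a plain open() with no extra flag. `sanitize` and `lenient_decode` are illustrative names, not functions from any real library:

```python
import posixpath

ROOT = "/var/public/files/"

def sanitize(raw: bytes) -> bytes:
    """Steps 1-2: drop "." and resolve ".." on raw bytes; reject a leading ".."."""
    parts = []
    for comp in raw.split(b"/"):
        if comp in (b"", b"."):
            continue
        if comp == b"..":
            if not parts:
                raise ValueError("leading '..' rejected")
            parts.pop()
        else:
            parts.append(comp)
    return b"/".join(parts)

def lenient_decode(raw: bytes) -> str:
    """Step 4 (hypothetical): a sloppy decoder that accepts overlong 2-byte forms."""
    out, i = [], 0
    while i < len(raw):
        b = raw[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(raw) and 0x80 <= raw[i + 1] <= 0xBF:
            # Combine lead and continuation bits without checking for overlong forms.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)

# 0xC0 0xAE is an overlong (malformed) encoding of "."
evil = b"\xc0\xae\xc0\xae/\xc0\xae\xc0\xae/etc/passwd"
checked = sanitize(evil)            # passes: no ".." byte sequence is present
munged = lenient_decode(checked)    # the decoder turns it into a real ".."
target = posixpath.normpath(ROOT + munged)  # what step 5 would end up opening
```

In this sketch `target` resolves to a location outside `ROOT`, even though the byte-level check in steps 1-2 was correct for the bytes it actually saw.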
> Same goes for O_NOFOLLOW or O_NOMOUNT, to tell the kernel that it
> shouldn't follow symbolic links or cross mount-points - another thing that
> some software might want to use in order to check that you can't "escape"
> your subtree.
( O_NOMOUNT is a good idea. I like O_NOFOLLOW - already use it to
avoid lstat() calls. )
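As a rough illustration of using O_NOFOLLOW in place of a separate lstat() check, here is a sketch using Python's `os.open`. The helper name `open_nofollow` is made up for this example, and the errno values checked are an assumption about the platform (ELOOP on Linux; some BSDs report EMLINK):

```python
import errno
import os

def open_nofollow(path):
    """Open path read-only; return None if the final component is a symlink."""
    try:
        # The kernel refuses to follow a symlink in the last component,
        # so there is no window between a check and the open.
        return os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as e:
        if e.errno in (errno.ELOOP, errno.EMLINK):
            return None
        raise
```

Note that this only covers the final path component; symlinks in earlier components are still followed.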
> But note how my point was that YOU SHOULD NEVER EVER MUNGE A PATHNAME!
>
> It is fundamentally _wrong_ to convert pathnames. You _cannot_ do it
> correctly.
I know. You know. I think everyone else got it the first time too ;)
Have you ever written a script which takes a pathname and puts it in a
text file, and passes it to another to operate on, and just skipped over
the details of what a poorly placed control character would do?
Real applications do exactly that sort of thing. Mostly it works,
occasionally security holes are found. Welcome to the land of dodgy
CGI scripts and program-generated Makefiles.
This is the same.
> - always _always_ work on the "extended UTF-8" format, and never EVER
> convert that to anything else (except when you need to actually print
> it, but then you encode it properly with escape sequences, the way you
> have to _anyway_).
>
> If you follow the above simple rules, you can't get it wrong. And in those
> rules, ".." is the BYTE SEQUENCE in the "extended UTF-8". Nothing more.
Yup. It works right up until you pass your string to a library which
doesn't follow that rule, and which munges malformed UTF-8 because
it's _expecting_ well formed UTF-8. E.g. you pass a path in an XML
document; the XML parser at the other end will either munge your path
(causing a security hole), or reject it (which is good).
The right thing to do on these occasions is check and/or escape
"extended UTF-8" prior to putting it into a text context.
Practically, that means a UTF-8 aware program has to keep track of
which text is "extended UTF-8" (i.e. bytes), and which text is real UTF-8.
Practically, it means every interface where a path may be passed in a
UTF-8 string has to define whether that's an escaped path, which will
be unescaped before being used for a system call, or an unescaped path.
Then you get into what kind of escaping.
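To make "what kind of escaping" concrete, here is a sketch of one possible convention, chosen purely for illustration: percent-escape every byte that is not part of well-formed UTF-8, plus "%" itself and ASCII control characters, so the result is real UTF-8 text that can be reversed to the exact original bytes. The names `escape_path` and `unescape_path` are hypothetical; only `urllib.parse.unquote_to_bytes` is a real library call:

```python
from urllib.parse import unquote_to_bytes

def escape_path(raw: bytes) -> str:
    """Turn arbitrary path bytes into well-formed text, reversibly."""
    out, pos = [], 0
    while pos < len(raw):
        rest = raw[pos:]
        try:
            good, bad = rest.decode("utf-8"), b""
            pos = len(raw)
        except UnicodeDecodeError as e:
            # Everything before e.start is valid; e.start..e.end is malformed.
            good = rest[:e.start].decode("utf-8")
            bad = rest[e.start:e.end]
            pos += e.end
        for ch in good:
            # Escape "%" itself and ASCII control characters.
            if ch == "%" or ord(ch) < 0x20 or ord(ch) == 0x7F:
                out.append("%%%02X" % ord(ch))
            else:
                out.append(ch)
        out.extend("%%%02X" % b for b in bad)
    return "".join(out)

def unescape_path(text: str) -> bytes:
    """Recover the exact original bytes just before the system call."""
    return unquote_to_bytes(text)
```

The escaped form is what would travel through text contexts such as an XML document; every interface that carries it still has to document that the path is escaped, and unescape it exactly once.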
In theory all those checks and escapings will be in the right places.
In theory C programs don't have buffer overflows either.
It is exacerbated because UTF-8 is often passed through middle-layer
programs and libraries that don't know anything about it, so when
assembling a whole system it's all too easy to lose track of where to
put the checks and escapings - and where not to.
Yes, there _is_ a perfectly fine solution: the one you gave.
In practice it is difficult to ensure a whole system where paths are
mixed with text is consistent about that. And that's where we get a
good selection of our Windows worms from.
-- Jamie