From: Al Viro <viro@ZenIV.linux.org.uk>
To: Eric Biederman <ebiederm@xmission.com>
Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>
Subject: [RFC] alloc_pid() breakage
Date: Tue, 3 Dec 2013 02:11:03 +0000 [thread overview]
Message-ID: <20131203021103.GH10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20131201181329.GC10323@ZenIV.linux.org.uk>
On Sun, Dec 01, 2013 at 06:13:29PM +0000, Al Viro wrote:
> AFAICS, pid_ns gets internal procfs instance and it pins the sucker down.
> Which would cause exact same problems, obviously. The trick done there
> is more or less to introduce a "being shut down" state of pid_ns - from
> the moment when we don't have any pids in it to actual destruction.
> Entering that state schedules (yes, it is async and yes, it is ugly)
> dropping the internal procfs vfsmount.
>
> Additional headache, AFAICS, comes from /proc/self/ns/pid - it can be
> opened, passed to somebody in ancestor pidns and then fed by it to
> setns(2). After that fork() by that somebody will trigger alloc_pid() in
> that pid_ns. What happens if it comes just before the (already scheduled)
> pid_ns_release_proc()? AFAICS, nothing good - there's no protection
> against leaks, access to freed vfsmount, double-mntput, etc. Eric, am
> I missing something subtle and relevant in that code?
Egads... I think I see what's going on, but it's convoluted as hell -
you rely on 1 not getting returned more than once by alloc_pidmap(), even
after having been freed, so this
if (unlikely(is_child_reaper(pid))) {
if (pid_ns_prepare_proc(ns))
goto out_free;
}
is essentially "on the first call of alloc_pid() for given pidns". And
upper bit in ->nr_hashed acts as "it's not in rundown state".
OK, so... what happens if I do unshare(CLONE_NEWPID) and the first fork()
attempt fails (e.g. due to failure to allocate a map page when allocating
a number in parent pidns, or OOM-induced failure to mount procfs, whatever).
Sure, that fork() has failed. No pid had been allocated, thus no free_pid()
calls made. After a while the memory becomes less tight and the same process
tries to fork() again. What happens then? pidns with processes in it,
but no reaper and NULL ->proc_mnt? sysctl(2) called in it won't be happy;
neither will exit(2), actually, since it'll hit proc_flush_task_mnt() and
oops on trying to evaluate ->proc_mnt->mnt_root...
Another question: can free_pid() end up scheduling ->proc_work for anything
other than the last level? After all, reaper in parent pidns couldn't have
gotten through the zap_pid_ns_process() yet, let alone getting to its
free_pid(), right?
WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
To: Eric Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Subject: [RFC] alloc_pid() breakage
Date: Tue, 3 Dec 2013 02:11:03 +0000 [thread overview]
Message-ID: <20131203021103.GH10323@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20131201181329.GC10323-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
On Sun, Dec 01, 2013 at 06:13:29PM +0000, Al Viro wrote:
> AFAICS, pid_ns gets internal procfs instance and it pins the sucker down.
> Which would cause exact same problems, obviously. The trick done there
> is more or less to introduce a "being shut down" state of pid_ns - from
> the moment when we don't have any pids in it to actual destruction.
> Entering that state schedules (yes, it is async and yes, it is ugly)
> dropping the internal procfs vfsmount.
>
> Additional headache, AFAICS, comes from /proc/self/ns/pid - it can be
> opened, passed to somebody in ancestor pidns and then fed by it to
> setns(2). After that fork() by that somebody will trigger alloc_pid() in
> that pid_ns. What happens if it comes just before the (already scheduled)
> pid_ns_release_proc()? AFAICS, nothing good - there's no protection
> against leaks, access to freed vfsmount, double-mntput, etc. Eric, am
> I missing something subtle and relevant in that code?
Egads... I think I see what's going on, but it's convoluted as hell -
you rely on 1 not getting returned more than once by alloc_pidmap(), even
after having been freed, so this
if (unlikely(is_child_reaper(pid))) {
if (pid_ns_prepare_proc(ns))
goto out_free;
}
is essentially "on the first call of alloc_pid() for given pidns". And
upper bit in ->nr_hashed acts as "it's not in rundown state".
OK, so... what happens if I do unshare(CLONE_NEWPID) and the first fork()
attempt fails (e.g. due to failure to allocate a map page when allocating
a number in parent pidns, or OOM-induced failure to mount procfs, whatever).
Sure, that fork() has failed. No pid had been allocated, thus no free_pid()
calls made. After a while the memory becomes less tight and the same process
tries to fork() again. What happens then? pidns with processes in it,
but no reaper and NULL ->proc_mnt? sysctl(2) called in it won't be happy;
neither will exit(2), actually, since it'll hit proc_flush_task_mnt() and
oops on trying to evaluate ->proc_mnt->mnt_root...
Another question: can free_pid() end up scheduling ->proc_work for anything
other than the last level? After all, reaper in parent pidns couldn't have
gotten through the zap_pid_ns_process() yet, let alone getting to its
free_pid(), right?
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-12-03 2:11 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-01 13:14 [PATCH 00/11] [RFC] repair net namespace damage to rpc_pipefs Christoph Hellwig
2013-12-01 13:14 ` [PATCH 01/11] sunrpc: allocate pipefs inodes using kmalloc Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 14:26 ` Jeff Layton
2013-12-01 14:26 ` Jeff Layton
2013-12-01 13:14 ` [PATCH 02/11] rpc_pipefs: always mount on net namespace initialization Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 15:24 ` Jeff Layton
2013-12-01 13:14 ` [PATCH 03/11] sunrpc: remove the rpc_clients_block notifier Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 15:25 ` Jeff Layton
2013-12-01 13:14 ` [PATCH 04/11] sunrpc: no need to have a lock or superblock for rpc_unlink Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 05/11] sunprc: add sensible pipe creation and removal helpers Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 06/11] sunrpc: clean up rpc_pipefs client dir creation Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 07/11] sunrpc: make rpc_mkdir_populate net-namespace aware Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 08/11] sunrpc: rpc_get_sb_net, die, die, die Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 09/11] nfs: convert idmapper to rpc_mkpipe_clnt Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 10/11] auth_gss: convert " Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 13:14 ` [PATCH 11/11] sunrpc: rpc_pipefs cleanup Christoph Hellwig
2013-12-01 13:14 ` Christoph Hellwig
2013-12-01 14:36 ` [PATCH 00/11] [RFC] repair net namespace damage to rpc_pipefs Jeff Layton
2013-12-01 14:36 ` Jeff Layton
2013-12-01 15:44 ` Jeff Layton
2013-12-01 15:44 ` Jeff Layton
2013-12-01 15:57 ` Jeff Layton
2013-12-01 18:13 ` Al Viro
2013-12-01 18:13 ` Al Viro
2013-12-02 8:12 ` Christoph Hellwig
2013-12-02 8:12 ` Christoph Hellwig
2013-12-02 13:44 ` Trond Myklebust
2013-12-02 14:24 ` Stanislav Kinsbursky
2013-12-02 14:24 ` Stanislav Kinsbursky
2013-12-02 15:36 ` Christoph Hellwig
2013-12-02 15:36 ` Christoph Hellwig
2013-12-02 15:58 ` Trond Myklebust
2013-12-02 15:58 ` Trond Myklebust
2013-12-03 7:24 ` Stanislav Kinsbursky
2013-12-03 7:24 ` Stanislav Kinsbursky
2013-12-02 15:34 ` Christoph Hellwig
2013-12-02 15:34 ` Christoph Hellwig
2013-12-02 16:00 ` Trond Myklebust
2013-12-02 16:00 ` Trond Myklebust
2013-12-02 16:27 ` Christoph Hellwig
2013-12-02 16:27 ` Christoph Hellwig
2013-12-02 16:46 ` Trond Myklebust
2013-12-02 16:46 ` Trond Myklebust
2013-12-02 16:33 ` Jeff Layton
2013-12-02 16:37 ` J. Bruce Fields
2013-12-02 16:45 ` Jeff Layton
2013-12-02 15:57 ` Al Viro
2013-12-02 15:57 ` Al Viro
2013-12-02 16:04 ` Trond Myklebust
2013-12-02 16:04 ` Trond Myklebust
2013-12-03 2:11 ` Al Viro [this message]
2013-12-03 2:11 ` [RFC] alloc_pid() breakage Al Viro
2013-12-02 7:23 ` [PATCH 00/11] [RFC] repair net namespace damage to rpc_pipefs Stanislav Kinsbursky
2013-12-02 7:23 ` Stanislav Kinsbursky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131203021103.GH10323@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=ebiederm@xmission.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.