From: Al Viro <viro@zeniv.linux.org.uk>
To: David Howells <dhowells@redhat.com>
Cc: avagin@gmail.com, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] cgroup: Fix refcounting
Date: Sat, 5 Jan 2019 06:37:03 +0000 [thread overview]
Message-ID: <20190105063703.GP2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <154655905608.3032.5762393419161551596.stgit@warthog.procyon.org.uk>
On Thu, Jan 03, 2019 at 11:44:16PM +0000, David Howells wrote:
> Fix cgroup refcounting by the following means:
>
> (1) Don't use PERCPU_REF_INIT_DEAD and percpu_ref_reinit(). Using this
> causes a problem should kernfs_get_tree() create a superblock and then
> fail to create a root inode. The superblock destructor will be invoked
> before kernfs_get_tree() returns - and the refcount isn't reinit'd till
> after that.
>
> To this end, cgroup_setup_root() no longer needs a ref_flags argument.
>
> (2) Provide a flag, CSS_CREATING, that is used to prevent concurrent access
> to a cgroup root that is being set up and for which the superblock is
> still being set up. This appears to be necessary to hide the fact that
> the cgroup is accessible before the superblock is fully initialised.
>
> (3) Set CSS_CREATING in cgroup1_get_tree() on a new cgroup_root. This is
> then cleared in cgroup_do_get_tree().
>
> (4) cgroup_get_tree() is made to call cgroup_get() on the root it sets for a
> v2 cgroup. Admittedly, this doesn't do anything because CSS_NO_REF is
> set, but it future proofs it in case this is changed in future.
>
> (5) cgroup_fill_super() is created and passed to kernfs_get_tree() in the
> kernfs_fs_context struct. This takes an extra ref on the root for the
> superblock in the event that a superblock is created.
>
> struct cgroup_fs_context::root then holds a single ref on the root right
> through till it is freed.
>
> Note that new_root is transferred into the cgroup_fs_context as
> is_new_root, though this is probably unnecessary as it's only used to clear
> CSS_CREATING - and no one else can gain access to the root until we've
> cleared the flag.
Umm... It'll need reordering; could you do CSS_CREATING parts on top of
mainline, with the rest rediffed on top of that? We'll need to backport
the fix, obviously...
Re (4) - that's over the top for backporting. Sure, the analogue of the
problem does exist in mainline -
if (IS_ERR(dentry) || !new_sb)
cgroup_put(&root->cgrp);
can't tell kernfs_mount_ns() failing very early from kernfs_fill_super()
failing and triggering ->kill_sb(). The latter case has a reference
dropped by cgroup_kill_sb(); in the former nothing of that sort
happens. Said that, we have the following cases:
1) early failure (anything up to sget_userns() in mainline).
ERR_PTR() is returned, reference not consumed.
2) successful reuse of existing superblock. Normal dentry
returned, new_sb set to false, reference not consumed.
3) new superblock, with failing kernfs_fill_super().
ERR_PTR() returned, new_sb set to true. Reference consumed.
4) new superblock, successful setup. Normal dentry returned,
new_sb set to true. Reference consumed.
Cases (2) and (4) (!IS_ERR(dentry)) are fine - reference is consumed
iff new_sb is true. Case (1) is also fine - reference is not consumed.
Case (3) is where we go wrong - the test comes up with "do cgroup_put()",
which we don't want.
Note that case (1) leaves new_sb uninitialized. And *that* is the
root cause of this shit - if we initialized it (in cgroup_do_mount())
to false, the test would've been simply "if (!new_sb)". In all
four cases.
That's precisely the same kind of bug as the one fixed in
7b745a4e4051 ("unfuck sysfs_mount()") last May...
next prev parent reply other threads:[~2019-01-05 6:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-03 23:44 [PATCH 1/5] cgroup2: Always apply root flags on successful superblock obtainance David Howells
2019-01-03 23:44 ` [PATCH 2/5] kernfs: Provide a fill_super hook for the subclass filesystem David Howells
2019-01-03 23:44 ` [PATCH 3/5] cgroup: Fix refcounting David Howells
2019-01-04 8:16 ` Andrei Vagin
2019-01-05 6:37 ` Al Viro [this message]
2019-01-05 7:21 ` David Howells
2019-01-05 16:05 ` Al Viro
2019-01-03 23:44 ` [PATCH 4/5] cgroup: refcount fixes David Howells
2019-01-03 23:56 ` David Howells
2019-01-03 23:44 ` [PATCH 5/5] percpu: Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190105063703.GP2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=avagin@gmail.com \
--cc=dhowells@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.