From: Al Viro <viro@zeniv.linux.org.uk>
To: David Howells <dhowells@redhat.com>
Cc: avagin@gmail.com, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] cgroup: Fix refcounting
Date: Sat, 5 Jan 2019 06:37:03 +0000 [thread overview]
Message-ID: <20190105063703.GP2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <154655905608.3032.5762393419161551596.stgit@warthog.procyon.org.uk>
On Thu, Jan 03, 2019 at 11:44:16PM +0000, David Howells wrote:
> Fix cgroup refcounting by the following means:
>
> (1) Don't use PERCPU_REF_INIT_DEAD and percpu_ref_reinit(). Using this
> causes a problem should kernfs_get_tree() create a superblock and then
> fail to create a root inode. The superblock destructor will be invoked
> before kernfs_get_tree() returns - and the refcount isn't reinit'd till
> after that.
>
> To this end, cgroup_setup_root() no longer needs a ref_flags argument.
>
> (2) Provide a flag, CSS_CREATING, that is used to prevent concurrent access
> to a cgroup root that is being set up and for which the superblock is
> still being set up. This appears to be necessary to hide the fact that
> the cgroup is accessible before the superblock is fully initialised.
>
> (3) Set CSS_CREATING in cgroup1_get_tree() on a new cgroup_root. This is
> then cleared in cgroup_do_get_tree().
>
> (4) cgroup_get_tree() is made to call cgroup_get() on the root it sets for a
> v2 cgroup. Admittedly, this doesn't do anything because CSS_NO_REF is
> set, but it future proofs it in case this is changed in future.
>
> (5) cgroup_fill_super() is created and passed to kernfs_get_tree() in the
> kernfs_fs_context struct. This takes an extra ref on the root for the
> superblock in the event that a superblock is created.
>
> struct cgroup_fs_context::root then holds a single ref on the root right
> through till it is freed.
>
> Note that new_root is transferred into the cgroup_fs_context as
> is_new_root, though this is probably unnecessary as it's only used to clear
> CSS_CREATING - and no one else can gain access to the root until we've
> cleared the flag.
Umm... It'll need reordering; could you do CSS_CREATING parts on top of
mainline, with the rest rediffed on top of that? We'll need to backport
the fix, obviously...
Re (4) - that's over the top for backporting. Sure, the analogue of the
problem does exist in mainline -
if (IS_ERR(dentry) || !new_sb)
cgroup_put(&root->cgrp);
can't tell kernfs_mount_ns() failing very early from kernfs_fill_super()
failing and triggering ->kill_sb(). The latter case has a reference
dropped by cgroup_kill_sb(); in the former nothing of that sort
happens. Said that, we have the following cases:
1) early failure (anything up to sget_userns() in mainline).
ERR_PTR() is returned, reference not consumed.
2) successful reuse of existing superblock. Normal dentry
returned, new_sb set to false, reference not consumed.
3) new superblock, with failing kernfs_fill_super().
ERR_PTR() returned, new_sb set to true. Reference consumed.
4) new superblock, successful setup. Normal dentry returned,
new_sb set to true. Reference consumed.
Cases (2) and (4) (!IS_ERR(dentry)) are fine - reference is consumed
iff new_sb is true. Case (1) is also fine - reference is not consumed.
Case (3) is where we go wrong - the test comes up with "do cgroup_put()",
which we don't want.
Note that case (1) leaves new_sb uninitialized. And *that* is the
root cause of this shit - if we initialized it (in cgroup_do_mount())
to false, the test would've been simply "if (!new_sb)". In all
four cases.
That's precisely the same kind of bug as the one fixed in
7b745a4e4051 ("unfuck sysfs_mount()") last May...
next prev parent reply other threads:[~2019-01-05 6:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-03 23:44 [PATCH 1/5] cgroup2: Always apply root flags on successful superblock obtainance David Howells
2019-01-03 23:44 ` [PATCH 2/5] kernfs: Provide a fill_super hook for the subclass filesystem David Howells
2019-01-03 23:44 ` [PATCH 3/5] cgroup: Fix refcounting David Howells
2019-01-04 8:16 ` Andrei Vagin
2019-01-05 6:37 ` Al Viro [this message]
2019-01-05 16:05 ` Al Viro
2019-01-05 7:21 ` David Howells
2019-01-03 23:44 ` [PATCH 4/5] cgroup: refcount fixes David Howells
2019-01-03 23:44 ` [PATCH 5/5] percpu: Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() David Howells
2019-01-03 23:56 ` [PATCH 4/5] cgroup: refcount fixes David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190105063703.GP2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=avagin@gmail.com \
--cc=dhowells@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).