From: Al Viro <viro@zeniv.linux.org.uk>
To: David Howells <dhowells@redhat.com>
Cc: avagin@gmail.com, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] cgroup: Fix refcounting
Date: Sat, 5 Jan 2019 16:05:24 +0000 [thread overview]
Message-ID: <20190105160524.GQ2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190105063703.GP2217@ZenIV.linux.org.uk>
On Sat, Jan 05, 2019 at 06:37:03AM +0000, Al Viro wrote:
> Cases (2) and (4) (!IS_ERR(dentry)) are fine - reference is consumed
> iff new_sb is true. Case (1) is also fine - reference is not consumed.
> Case (3) is where we go wrong - the test comes up with "do cgroup_put()",
> which we don't want.
>
> Note that case (1) leaves new_sb uninitialized. And *that* is the
> root cause of this shit - if we initialized it (in cgroup_do_mount())
> to false, the test would've been simply "if (!new_sb)". In all
> four cases.
>
> That's precisely the same kind of bug as the one fixed in
> 7b745a4e4051 ("unfuck sysfs_mount()") last May...
Egads... That's not all there is in there; kernfs_node_dentry() failures
are... not handled well.
To start with, kernfs_node_dentry() leaks. Consider this:
struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
struct super_block *sb)
{
...
dentry = dget(sb->s_root);
/* Check if this is the root kernfs_node */
if (!kn->parent)
return dentry;
OK, it might (and normally will) return an extra reference to a dentry on
the given superblock. However,
knparent = find_next_ancestor(kn, NULL);
if (WARN_ON(!knparent))
return ERR_PTR(-EINVAL);
right after that will return ERR_PTR() *and* acquire an extra reference to
the root dentry of given superblock. Then we have a loop:
do {
struct dentry *dtmp;
struct kernfs_node *kntmp;
if (kn == knparent)
return dentry;
kntmp = find_next_ancestor(kn, knparent);
if (WARN_ON(!kntmp))
return ERR_PTR(-EINVAL);
dtmp = lookup_one_len_unlocked(kntmp->name, dentry,
strlen(kntmp->name));
Get an extra reference to child of dentry or get ERR_PTR()
dput(dentry);
... and drop the parent in either case.
if (IS_ERR(dtmp))
return dtmp;
knparent = kntmp;
dentry = dtmp;
} while (true);
So normally we acquire an return an extra reference to a dentry on our
superblock. In case of error returned by lookup_one_len_unlocked()
we pass that error along; no effect on refcounts. And in case of
that NULL from find_next_ancestor() we return an error *and* leave
behind an extra reference to some dentry on the superblock. With
no way for caller to tell which dentry would that be.
IOW, these two if (WARN_ON(!...)) should do dput(dentry) as well,
but the trouble doesn't end there. These, after all, appear to
be "never can happen" cases; failure of lookup_one_len_unlocked()
is really possible (with -ENOMEM, if nothing else). And the caller
(cgroup_do_mount()) is doing this:
if (!IS_ERR(dentry) && ns != &init_cgroup_ns) {
struct dentry *nsdentry;
struct cgroup *cgrp;
mutex_lock(&cgroup_mutex);
spin_lock_irq(&css_set_lock);
cgrp = cset_cgroup_from_root(ns->root_cset, root);
spin_unlock_irq(&css_set_lock);
mutex_unlock(&cgroup_mutex);
nsdentry = kernfs_node_dentry(cgrp->kn, dentry->d_sb);
dput(dentry);
dentry = nsdentry;
}
Here dentry is equal to the root of our superblock. The reference to
dentry is held, so's an active reference to superblock in question and
so is ->s_umount of that superblock.
In success case we end up acquiring an extra reference to dentry on
the same superblock, dropping the reference we used to hold on the
original and use that new dentry instead.
In failure case, though, we end up with
* no extra references to any dentries on that superblock
* active reference held to the superblock itself
* ->s_umount held on it
... and the error is returned to caller and propagated back to
caller of ->mount(). Which has no way to tell which superblock
have we leaked (locked, at that). IOW, deactivate_locked_super()
was missing not just in David's rewrite - it's needed in the
mainline as well...
next prev parent reply other threads:[~2019-01-05 16:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-03 23:44 [PATCH 1/5] cgroup2: Always apply root flags on successful superblock obtainance David Howells
2019-01-03 23:44 ` [PATCH 2/5] kernfs: Provide a fill_super hook for the subclass filesystem David Howells
2019-01-03 23:44 ` [PATCH 3/5] cgroup: Fix refcounting David Howells
2019-01-04 8:16 ` Andrei Vagin
2019-01-05 6:37 ` Al Viro
2019-01-05 16:05 ` Al Viro [this message]
2019-01-05 7:21 ` David Howells
2019-01-03 23:44 ` [PATCH 4/5] cgroup: refcount fixes David Howells
2019-01-03 23:44 ` [PATCH 5/5] percpu: Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() David Howells
2019-01-03 23:56 ` [PATCH 4/5] cgroup: refcount fixes David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190105160524.GQ2217@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=avagin@gmail.com \
--cc=dhowells@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).