From: Tejun Heo <tj@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
"Michal Koutný" <mkoutny@suse.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Boqun Feng" <boqun.feng@gmail.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Hillf Danton" <hdanton@sina.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Marco Elver" <elver@google.com>,
"Zefan Li" <lizefan.x@bytedance.com>,
tglx@linutronix.de,
syzbot+6ea37e2e6ffccf41a7e6@syzkaller.appspotmail.com
Subject: Re: [PATCH v3] kernfs: Use RCU for kernfs_node::name and ::parent lookup.
Date: Thu, 16 Jan 2025 07:32:08 -1000
Message-ID: <Z4lCmIB_7cPm0Ebv@slm.duckdns.org>
In-Reply-To: <20250116132745.dU941oor@linutronix.de>
Hello,
On Thu, Jan 16, 2025 at 02:27:45PM +0100, Sebastian Andrzej Siewior wrote:
> > Shouldn't this be freed somewhere?
>
> There is
> char *kn_name __free(kfree) = NULL;
>
> at the top. This will kfree(kn_name) once it is out of scope (on return
> from rdtgroup_pseudo_lock_create()).
Ah, makes sense.
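
For readers less familiar with the annotation: the kernel's __free() helper is built on the compiler's cleanup attribute (GCC and Clang). Below is a minimal userspace sketch of the same idea, not the kernel implementation. The names auto_free, free_charp(), free_calls and dup_name_len() are hypothetical and exist only for illustration, and free() stands in for kfree().

```c
#include <stdlib.h>
#include <string.h>

/* Counts cleanup invocations so the behaviour can be observed. */
static int free_calls;

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void free_charp(char **p)
{
	free(*p);		/* free(NULL) is a no-op */
	free_calls++;
}

/* Hypothetical userspace analogue of the kernel's __free(kfree). */
#define auto_free __attribute__((cleanup(free_charp)))

/* Returns the length of a temporary copy of src, or -1 if allocation fails. */
static int dup_name_len(const char *src)
{
	char *name auto_free = NULL;
	size_t len = strlen(src);

	name = malloc(len + 1);
	if (!name)
		return -1;	/* handler still runs and frees NULL */

	memcpy(name, src, len + 1);
	/* The return value is computed first, then the handler frees name. */
	return (int)strlen(name);
}
```

Every return path leaves the scope of name, so the handler runs exactly once per call without an explicit free() at each exit.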
...
> > > @@ -557,16 +568,18 @@ void kernfs_put(struct kernfs_node *kn)
> > >  	if (!kn || !atomic_dec_and_test(&kn->count))
> > >  		return;
> > >  	root = kernfs_root(kn);
> > > +	guard(rcu)();
> > >   repeat:
> > >  	/*
> > >  	 * Moving/renaming is always done while holding reference.
> > >  	 * kn->parent won't change beneath us.
> > >  	 */
> > > -	parent = kn->parent;
> > > +	parent = rcu_dereference(kn->parent);
> >
> > I wonder whether it'd be better to encode the reference count rule (ie. add
> > the condition kn->count == 0 to deref_check) in the kn->parent deref
> > accessor. This function doesn't need RCU read lock and holding it makes it
> > more confusing.
>
> You are saying that we don't need RCU here because if we drop the last
> reference then nobody can rename the node anymore and so parent can't
> change. That sounds right.
> What about using rcu_dereference_protected() instead? Using
> rcu_dereference_check(x, !atomic_read(&kn->count)) looks odd given that we
> established that the counter is 0. Therefore I would suggest
> rcu_access_pointer() but the reference drop might qualify as "locked".
I think it's usually a better form to encode the whole access rule in a
shared accessor for the field so that the deref rules for the field can be
understood and enforced from the shared accessor and the shared accessor
would use rcu_dereference_protected() internally.
> > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > > index 8502ef68459b9..05f7b30283150 100644
> > > --- a/fs/kernfs/file.c
> > > +++ b/fs/kernfs/file.c
> > > @@ -911,9 +911,11 @@ static void kernfs_notify_workfn(struct work_struct *work)
> > >  	/* kick fsnotify */
> > >
> > >  	down_read(&root->kernfs_supers_rwsem);
> > > +	down_read(&root->kernfs_rwsem);
> >
> > Why is this addition necessary? Hmm... was the code previously broken w.r.t.
> > renaming? Can this be RCU?
>
> I *think* it was broken unless you ensure somehow that this can't be
> invoked on nodes which can be renamed.
> This ensures that the later obtained kn_name is not freed after a
> rename.
> This cannot be RCU because ilookup() has wait_on_inode() (might sleep).
If it was broken, let's separate it out to its own patch.
> > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > > index 1358c21837f1a..db71faba3bb53 100644
> > > --- a/fs/kernfs/mount.c
> > > +++ b/fs/kernfs/mount.c
> > > @@ -145,8 +145,10 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
> > >  static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
> > >  {
> > >  	struct kernfs_node *kn = kernfs_dentry_node(child);
> > > +	struct kernfs_root *root = kernfs_root(kn);
> > >
> > > -	return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
> > > +	guard(rwsem_read)(&root->kernfs_rwsem);
> > > +	return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_rcu_get_parent(kn)));
> >
> > Ditto.
>
> kernfs_rcu_get_parent() gets you name from the kn. Can you ensure that
> it won't go away during a rename? If so, I would add the matching
> comment then.
> There is d_obtain_alias() -> __d_obtain_alias() -> d_alloc_anon() which
> makes it impossible to use RCU.
If true, better to put it in its own prep patch, I think.
...
> > > @@ -216,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
> > >  	if (!kn->parent)
> > >  		return dentry;
> > >
> > > +	root = kernfs_root(kn);
> > > +	guard(rwsem_read)(&root->kernfs_rwsem);
> >
> > Here too, it's a bit confusing that it's adding new locking. Was the code
> > broken before? If so, it'd be clearer if the fixes were in their own patch.
>
> It dereferences name (later in lookup_positive_unlocked()). I don't see
> how it is safe against a rename without the lock.
>
> If you agree that all three are bugs, that existed before, then I will
> extract it out of this patch.
Not sure whether they're all correct but let's separate them out.
...
> > I wonder whether it'd be better to rename kn->parent to something like
> > kn->__parent (or maybe some other suffix) to clarify that the field is not
> > to be deref'ed directly and kernfs_parent() helper is made available to the
> > users. That way, users can benefit from the additional conditions in
> > rcu_dereference_check().
>
> sparse should yell at people if they dereference directly. I have no
> problem renaming it to __parent if you say so.
Sparse isn't useless, but it's also often ignored. Probably better to be explicit.
...
> > e.g. Here, it'd be a lot better if kernfs provided helper can be used so
> > that deref condition check can be preserved.
>
> Something like
> | static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn)
> | {
> | 	return rcu_dereference_check(kn->parent, kn->flags & KERNFS_ROOT_INVARIANT_PARENT)->priv;
> | }
Maybe a shorter name?
Thanks.
--
tejun