From: Al Viro <viro@zeniv.linux.org.uk>
To: Imran Khan <imran.f.khan@oracle.com>
Cc: tj@kernel.org, gregkh@linuxfoundation.org,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [RESEND PATCH v7 1/8] kernfs: Introduce interface to access global kernfs_open_file_mutex.
Date: Thu, 17 Mar 2022 21:34:49 +0000 [thread overview]
Message-ID: <YjOpedPDj+3KCJjk@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <20220317072612.163143-2-imran.f.khan@oracle.com>
On Thu, Mar 17, 2022 at 06:26:05PM +1100, Imran Khan wrote:
> @@ -570,9 +571,10 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
> struct kernfs_open_file *of)
> {
> struct kernfs_open_node *on = kn->attr.open;
> + struct mutex *mutex = NULL;
> unsigned long flags;
>
> - mutex_lock(&kernfs_open_file_mutex);
> + mutex = kernfs_open_file_mutex_lock(kn);
> spin_lock_irqsave(&kernfs_open_node_lock, flags);
Can that ever be reached with local interrupts disabled? I mean, what is
that spin_lock_irqsave() about?
> @@ -745,11 +747,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> {
> struct kernfs_node *kn = inode->i_private;
> struct kernfs_open_file *of = kernfs_of(filp);
> + struct mutex *lock = NULL;
>
> if (kn->flags & KERNFS_HAS_RELEASE) {
> - mutex_lock(&kernfs_open_file_mutex);
> + lock = kernfs_open_file_mutex_lock(kn);
> kernfs_release_file(kn, of);
> - mutex_unlock(&kernfs_open_file_mutex);
> + mutex_unlock(lock);
Careful - you are about to remove the existing exclusion between *all*
->release() instances, same node or not.
In particular, if some driver had them manipulate a driver-local list of
some kind, relying upon the kernfs to provide the exclusion, it'd break
as soon as you turn that thing into per-node (or hashed) mutex.
It's _probably_ safe, seeing that the one and only instance of ->release()
in the entire tree (cgroup_file_release()) is rather limited in what
it's doing, and while it calls a submethod (cftype.release()) there's only
a couple of instances of that (cgroup_procs_release() and
cgroup_pressure_release(), both in kernel/cgroup/cgroup.c). Neither
seems to rely upon the global exclusion.
However, that's a change of rules and it needs to be documented as such.
Incidentally, what's the point of having kernfs_open_node->refcnt
atomic_t? All users are under kernfs_open_node_lock... AFAICS,
it's simply "->files is non-empty or something is in
kernfs_drain_open_files() for the node in question", so I'm not
even sure we want a counter there...
Note that kernfs_drain_open_files() can't overlap with
kernfs_fops_open() adding to the list of files (and we seriously
rely upon that - you don't want ops->release() called while in
the middle of ops->open()). kernfs_fops_open() starts with
grabbing an active reference; kernfs_drain_open_files() is
not called until we had
* prevented new active references being grabbed and
* waited for all active references to be dropped.
So kernfs_drain_open_files() can do the following:
1) optimistically check for ->attr.open being NULL;
bugger off if it is. We know that nobody could be currently
trying to add anything to it, mutex or no mutex.
2) grab the mutex
3) recheck ->attr.open; it might have become NULL.
If it had, unlock and bugger off.
4) walk the list, doing unmaps/releases.
5) unlock and bugger off.
The only thing doing removals from the list is
kernfs_put_open_node() and it grabs that mutex.
So it can't get to the "remove from list, free the container
of list head" until we are through.
IOW, there's no reason to hold a reference to kernfs_open_node
in kernfs_drain_open_files() at all. And that makes ->refcnt
completely useless - kernfs_put_open_node() should do
list_del(&of->list);
if (list_empty(&on->files))
kn->attr.open = NULL;
else
on = NULL;
and to hell with refcounting.
As the matter of fact, we can do even better - make freeing
that thing rcu-delayed, use rcu_assign_pointer() for stores,
rcu_dereference() for loads and have kernfs_notify() do
rcu_read_lock();
on = rcu_dereference(kn->attr.open);
if (on) {
atomic_inc(&on->event);
wake_up_interruptible(&on->poll);
}
rcu_read_unlock();
and kernfs_open_node_lock becomes useless - all places that
grab it are under kernfs_open_file_mutex.
next prev parent reply other threads:[~2022-03-17 21:37 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-17 7:26 [RESEND PATCH v7 0/8] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 1/8] " Imran Khan
2022-03-17 21:34 ` Al Viro [this message]
2022-04-05 5:36 ` Imran Khan
2022-04-05 14:24 ` Al Viro
2022-04-06 4:54 ` Imran Khan
2022-04-06 14:54 ` Al Viro
2022-04-06 15:18 ` Tejun Heo
2022-04-14 0:01 ` Imran Khan
2022-03-18 17:10 ` Eric W. Biederman
2022-03-21 0:10 ` Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 2/8] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 3/8] kernfs: Introduce interface to access kernfs_open_node_lock Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 4/8] kernfs: Replace global kernfs_open_node_lock with hashed spinlocks Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 5/8] kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 6/8] kernfs: Introduce interface to access per-fs rwsem Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed rwsems Imran Khan
2022-03-18 0:07 ` Al Viro
2022-03-21 1:57 ` Imran Khan
2022-03-21 7:29 ` Al Viro
2022-03-21 16:46 ` Tejun Heo
2022-03-21 17:55 ` Al Viro
2022-03-21 19:20 ` Tejun Heo
2022-03-22 2:40 ` Al Viro
2022-03-22 17:08 ` Tejun Heo
2022-03-22 20:26 ` Al Viro
2022-03-22 21:20 ` Tejun Heo
2022-03-28 0:15 ` Imran Khan
2022-03-28 17:30 ` Tejun Heo
2022-03-30 2:23 ` Imran Khan
2022-03-17 7:26 ` [RESEND PATCH v7 8/8] kernfs: Add a document to describe hashed locks used in kernfs Imran Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjOpedPDj+3KCJjk@zeniv-ca.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=akpm@linux-foundation.org \
--cc=gregkh@linuxfoundation.org \
--cc=imran.f.khan@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox