All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Imran Khan <imran.f.khan@oracle.com>
Cc: tj@kernel.org, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [RESEND PATCH v7 1/8] kernfs: Introduce interface to access global kernfs_open_file_mutex.
Date: Thu, 17 Mar 2022 21:34:49 +0000	[thread overview]
Message-ID: <YjOpedPDj+3KCJjk@zeniv-ca.linux.org.uk> (raw)
In-Reply-To: <20220317072612.163143-2-imran.f.khan@oracle.com>

On Thu, Mar 17, 2022 at 06:26:05PM +1100, Imran Khan wrote:

> @@ -570,9 +571,10 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
>  				 struct kernfs_open_file *of)
>  {
>  	struct kernfs_open_node *on = kn->attr.open;
> +	struct mutex *mutex = NULL;
>  	unsigned long flags;
>  
> -	mutex_lock(&kernfs_open_file_mutex);
> +	mutex = kernfs_open_file_mutex_lock(kn);
>  	spin_lock_irqsave(&kernfs_open_node_lock, flags);

Can that ever be reached with local interrupts disabled?  I mean, what is
that spin_lock_irqsave() about?

> @@ -745,11 +747,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
>  {
>  	struct kernfs_node *kn = inode->i_private;
>  	struct kernfs_open_file *of = kernfs_of(filp);
> +	struct mutex *lock = NULL;
>  
>  	if (kn->flags & KERNFS_HAS_RELEASE) {
> -		mutex_lock(&kernfs_open_file_mutex);
> +		lock = kernfs_open_file_mutex_lock(kn);
>  		kernfs_release_file(kn, of);
> -		mutex_unlock(&kernfs_open_file_mutex);
> +		mutex_unlock(lock);

Careful - you are about to remove the existing exclusion between *all*
->release() instances, same node or not.

In particular, if some driver had them manipulate a driver-local list of
some kind, relying upon the kernfs to provide the exclusion, it'd break
as soon as you turn that thing into per-node (or hashed) mutex.

It's _probably_ safe, seeing that the one and only instance of ->release()
in the entire tree (cgroup_file_release()) is rather limited in what
it's doing, and while it calls a submethod (cftype.release()) there's only
a couple of instances of that (cgroup_procs_release() and
cgroup_pressure_release(), both in kernel/cgroup/cgroup.c).  Neither
seems to rely upon the global exclusion.

However, that's a change of rules and it needs to be documented as such.

Incidentally, what's the point of having kernfs_open_node->refcnt
atomic_t?  All users are under kernfs_open_node_lock...  AFAICS,
it's simply "->files is non-empty or something is in
kernfs_drain_open_files() for the node in question", so I'm not
even sure we want a counter there...

Note that kernfs_drain_open_files() can't overlap with
kernfs_fops_open() adding to the list of files (and we seriously
rely upon that - you don't want ops->release() called while in
the middle of ops->open()).  kernfs_fops_open() starts with
grabbing an active reference; kernfs_drain_open_files() is
not called until we had
	* prevented new active references being grabbed and
	* waited for all active references to be dropped.

So kernfs_drain_open_files() can do the following:
	1) optimistically check for ->attr.open being NULL;
bugger off if it is.  We know that nobody could be currently
trying to add anything to it, mutex or no mutex.
	2) grab the mutex
	3) recheck ->attr.open; it might have become NULL.
If it had, unlock and bugger off.
	4) walk the list, doing unmaps/releases.
	5) unlock and bugger off.
The only thing doing removals from the list is
kernfs_put_open_node() and it grabs that mutex.
So it can't get to the "remove from list, free the container
of list head" until we are through.

IOW, there's no reason to hold a reference to kernfs_open_node
in kernfs_drain_open_files() at all.  And that makes ->refcnt
completely useless - kernfs_put_open_node() should do
	list_del(&of->list);
	if (list_empty(&on->files))
		kn->attr.open = NULL;
	else
		on = NULL;
and to hell with refcounting.

As the matter of fact, we can do even better - make freeing
that thing rcu-delayed, use rcu_assign_pointer() for stores,
rcu_dereference() for loads and have kernfs_notify() do
	rcu_read_lock();
	on = rcu_dereference(kn->attr.open);
	if (on) {
		atomic_inc(&on->event);
		wake_up_interruptible(&on->poll);
	}
	rcu_read_unlock();
and kernfs_open_node_lock becomes useless - all places that
grab it are under kernfs_open_file_mutex.

  reply	other threads:[~2022-03-17 21:37 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17  7:26 [RESEND PATCH v7 0/8] kernfs: Introduce interface to access global kernfs_open_file_mutex Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 1/8] " Imran Khan
2022-03-17 21:34   ` Al Viro [this message]
2022-04-05  5:36     ` Imran Khan
2022-04-05 14:24       ` Al Viro
2022-04-06  4:54         ` Imran Khan
2022-04-06 14:54           ` Al Viro
2022-04-06 15:18             ` Tejun Heo
2022-04-14  0:01           ` Imran Khan
2022-03-18 17:10   ` Eric W. Biederman
2022-03-21  0:10     ` Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 2/8] kernfs: Replace global kernfs_open_file_mutex with hashed mutexes Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 3/8] kernfs: Introduce interface to access kernfs_open_node_lock Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 4/8] kernfs: Replace global kernfs_open_node_lock with hashed spinlocks Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 5/8] kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 6/8] kernfs: Introduce interface to access per-fs rwsem Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed rwsems Imran Khan
2022-03-18  0:07   ` Al Viro
2022-03-21  1:57     ` Imran Khan
2022-03-21  7:29       ` Al Viro
2022-03-21 16:46         ` Tejun Heo
2022-03-21 17:55           ` Al Viro
2022-03-21 19:20             ` Tejun Heo
2022-03-22  2:40               ` Al Viro
2022-03-22 17:08                 ` Tejun Heo
2022-03-22 20:26                   ` Al Viro
2022-03-22 21:20                     ` Tejun Heo
2022-03-28  0:15                 ` Imran Khan
2022-03-28 17:30                   ` Tejun Heo
2022-03-30  2:23                 ` Imran Khan
2022-03-17  7:26 ` [RESEND PATCH v7 8/8] kernfs: Add a document to describe hashed locks used in kernfs Imran Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjOpedPDj+3KCJjk@zeniv-ca.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=imran.f.khan@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.