linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nicolai Stange <nicstange@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Johannes Berg <johannes@sipsolutions.net>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Date: Mon, 17 Apr 2017 09:01:21 -0700	[thread overview]
Message-ID: <20170417160121.GP3956@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170416095137.2784-10-nicstange@gmail.com>

On Sun, Apr 16, 2017 at 11:51:37AM +0200, Nicolai Stange wrote:
> Currently, a dentry's debugfs_fsdata instance is allocated from
> debugfs_file_get() at first usage, i.e. at first file opening.
> 
> It won't ever get freed though.
> 
> Ideally, these instances would get freed after the last open file handle
> gets closed again, that is from the fops' ->release().
> 
> Unfortunately, we can't hook easily into those ->release()ers of files
> created through debugfs_create_file_unsafe(), i.e. of those proxied through
> debugfs_open_proxy_file_operations rather than through the "full"
> debugfs_full_proxy_file_operations proxy.
> 
> Hence, free unreferenced debugfs_fsdata instances from debugfs_file_put(),
> with the drawback of a potential allocation + deallocation per
> debugfs_file_get() + debugfs_file_put() pair, that is per fops invocation.
> 
> In addition to its former role of tracking pending fops, use the
> ->active_users for reference counting on the debugfs_fsdata instance
> itself. In particular, don't keep a dummy reference to be dropped from
> __debugfs_remove_file(): a d_delete()ed dentry and thus, request for
> completion notification, is now signaled by the d_unlinked() dentry itself.
> 
> Once ->active_users drops to zero (and the dentry is still intact), free
> the debugfs_fsdata instance from debugfs_file_put(). RCU protects any
> concurrent debugfs_file_get() attempts to get a hold of the instance here.
> Likewise for full_proxy_release() which lacks a call to debugfs_file_get().
> 
> Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata pair,
> care must be taken in order to avoid races between debugfs_file_put() and
> debugfs_file_get() as well as __debugfs_remove_file(). Rather than
> introducing a global lock, exploit the fact that there will ever be only a
> single !d_unlinked() -> d_unlinked() transition and add memory barriers
> where needed. Given the lack of proper benchmarking, that debugfs fops
> aren't performance critical and that we've already got a potential
> allocation/deallocation pair anyway, the added code complexity might be
> highly questionable though.
> 
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>

If you have not already done so, please run this with debug enabled,
especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
This is important because there are configurations for which the deadlocks
you saw with SRCU turn into silent failure, including memory corruption.
CONFIG_PROVE_RCU=y will catch many of those situations.

(And yes, kfree_rcu() doesn't have that problem, but...)

Another issue called out inline.

							Thanx, Paul

> ---
>  fs/debugfs/file.c     | 102 ++++++++++++++++++++++++++++++++++++++++----------
>  fs/debugfs/inode.c    |   8 +++-
>  fs/debugfs/internal.h |   1 +
>  3 files changed, 90 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index f4dfd7d0d625..b2cc25d44a39 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -22,6 +22,7 @@
>  #include <linux/slab.h>
>  #include <linux/atomic.h>
>  #include <linux/device.h>
> +#include <linux/rcupdate.h>
>  #include <asm/poll.h>
> 
>  #include "internal.h"
> @@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry)
>  	struct debugfs_fsdata *fsd;
>  	void *d_fsd;
> 
> -	d_fsd = READ_ONCE(dentry->d_fsdata);
> +	rcu_read_lock();
> +retry:
> +	d_fsd = rcu_dereference(dentry->d_fsdata);
>  	if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
> +		/*
> +		 * Paired with the control dependency in
> +		 * debugfs_file_put(): if we saw the debugfs_fsdata
> +		 * instance "restored" there but not the dead dentry,
> +		 * we'd erroneously instantiate a fresh debugfs_fsdata
> +		 * instance below.
> +		 */
> +		smp_rmb();
> +		if (d_unlinked(dentry)) {
> +			rcu_read_unlock();
> +			return -EIO;
> +		}
> +
>  		fsd = d_fsd;
> +		if (!refcount_inc_not_zero(&fsd->active_users)) {
> +			/*
> +			 * A concurrent debugfs_file_put() dropped the
> +			 * count to zero and is about to free the
> +			 * debugfs_fsdata. Help out resetting the
> +			 * ->d_fsdata and retry.
> +			 */
> +			d_fsd = (void *)((unsigned long)fsd->real_fops |
> +					DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> +			RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);

This is an infrequent race, I hope?  If on the other hand there is
a possibility of this branch being taken a huge number of times in
one call, it would be good to exit the RCU read-side critical section
before retrying.

> +			goto retry;
> +		}
> +		rcu_read_unlock();
>  	} else {
> +		rcu_read_unlock();
>  		fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
>  		if (!fsd)
>  			return -ENOMEM;
> @@ -91,25 +121,28 @@ int debugfs_file_get(struct dentry *dentry)
>  		refcount_set(&fsd->active_users, 1);
>  		init_completion(&fsd->active_users_drained);
>  		if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
> +			/*
> +			 * Another debugfs_file_get() has installed a
> +			 * debugfs_fsdata instance concurrently.
> +			 * Free ours and retry to grab a reference on
> +			 * the installed one.
> +			 */
>  			kfree(fsd);
> -			fsd = READ_ONCE(dentry->d_fsdata);
> +			rcu_read_lock();
> +			goto retry;

And given this code path, why not put the retry: label before the
initial rcu_read_lock()?  Same number of lines of code, rcu_read_lock()
and rcu_read_unlock() are very lightweight, the extra executions should
be rare, and you might be avoiding a grace-period-starvation problem.

> +		}
> +		/*
> +		 * In case of a successful cmpxchg() above, this check is
> +		 * strictly necessary and must follow it, see the comment in
> +		 * __debugfs_remove_file().
> +		 */
> +		if (d_unlinked(dentry)) {
> +			if (refcount_dec_and_test(&fsd->active_users))
> +				complete(&fsd->active_users_drained);
> +			return -EIO;
>  		}
>  	}
> 
> -	/*
> -	 * In case of a successful cmpxchg() above, this check is
> -	 * strictly necessary and must follow it, see the comment in
> -	 * __debugfs_remove_file().
> -	 * OTOH, if the cmpxchg() hasn't been executed or wasn't
> -	 * successful, this serves the purpose of not starving
> -	 * removers.
> -	 */
> -	if (d_unlinked(dentry))
> -		return -EIO;
> -
> -	if (!refcount_inc_not_zero(&fsd->active_users))
> -		return -EIO;
> -
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(debugfs_file_get);
> @@ -126,9 +159,29 @@ EXPORT_SYMBOL_GPL(debugfs_file_get);
>  void debugfs_file_put(struct dentry *dentry)
>  {
>  	struct debugfs_fsdata *fsd = READ_ONCE(dentry->d_fsdata);
> +	void *d_fsd;
> 
> -	if (refcount_dec_and_test(&fsd->active_users))
> -		complete(&fsd->active_users_drained);
> +	if (refcount_dec_and_test(&fsd->active_users)) {
> +		d_fsd = (void *)((unsigned long)fsd->real_fops |
> +				DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> +		RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);
> +		/* Paired with smp_mb() in __debugfs_remove_file(). */
> +		smp_mb();
> +		if (d_unlinked(dentry)) {
> +			/*
> +			 * We have a control dependency paired with the
> +			 * smp_rmb() in debugfs_file_get() here.
> +			 *
> +			 * Restore the debugfs_fsdata instance into
> +			 * ->d_fsdata s.t. ->d_release() can free
> +			 * it.
> +			 */
> +			WRITE_ONCE(dentry->d_fsdata, fsd);
> +			complete(&fsd->active_users_drained);
> +		} else {
> +			kfree_rcu(fsd, rcu_head);
> +		}
> +	}
>  }
>  EXPORT_SYMBOL_GPL(debugfs_file_put);
> 
> @@ -221,9 +274,20 @@ static unsigned int full_proxy_poll(struct file *filp,
>  static int full_proxy_release(struct inode *inode, struct file *filp)
>  {
>  	const struct dentry *dentry = F_DENTRY(filp);
> -	const struct file_operations *real_fops = debugfs_real_fops(filp);
>  	const struct file_operations *proxy_fops = filp->f_op;
>  	int r = 0;
> +	void *d_fsd;
> +	const struct file_operations *real_fops;
> +
> +	rcu_read_lock();
> +	d_fsd = rcu_dereference(F_DENTRY(filp)->d_fsdata);
> +	if ((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) {
> +		real_fops = (void *)((unsigned long)d_fsd &
> +				~DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> +	} else {
> +		real_fops = ((struct debugfs_fsdata *)d_fsd)->real_fops;
> +	}
> +	rcu_read_unlock();
> 
>  	/*
>  	 * We must not protect this against removal races here: the
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 2360c17ec00a..bacb4d6bf178 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -27,6 +27,7 @@
>  #include <linux/parser.h>
>  #include <linux/magic.h>
>  #include <linux/slab.h>
> +#include <linux/rcupdate.h>
> 
>  #include "internal.h"
> 
> @@ -636,13 +637,16 @@ static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent)
>  	 * cmpxchg() in debugfs_file_get(): either
>  	 * debugfs_file_get() must see a dead dentry or we must see a
>  	 * debugfs_fsdata instance at ->d_fsdata here (or both).
> +	 *
> +	 * Also paired with the smp_mb() in debugfs_file_put(): if we
> +	 * see a debugfs_fsdata instance here, then debugfs_file_put()
> +	 * must see a dead dentry.
>  	 */
>  	smp_mb();
>  	fsd = READ_ONCE(dentry->d_fsdata);
>  	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
>  		return;
> -	if (!refcount_dec_and_test(&fsd->active_users))
> -		wait_for_completion(&fsd->active_users_drained);
> +	wait_for_completion(&fsd->active_users_drained);
>  }
> 
>  static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
> diff --git a/fs/debugfs/internal.h b/fs/debugfs/internal.h
> index cb1e8139c398..0445bd7d11f2 100644
> --- a/fs/debugfs/internal.h
> +++ b/fs/debugfs/internal.h
> @@ -23,6 +23,7 @@ struct debugfs_fsdata {
>  	const struct file_operations *real_fops;
>  	refcount_t active_users;
>  	struct completion active_users_drained;
> +	struct rcu_head rcu_head;
>  };
> 
>  /*
> -- 
> 2.12.2
> 

  reply	other threads:[~2017-04-17 16:01 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-23 14:54 deadlock in synchronize_srcu() in debugfs? Johannes Berg
2017-03-23 15:29 ` Johannes Berg
2017-03-24  8:56   ` Johannes Berg
2017-03-24  9:24     ` Johannes Berg
2017-03-24 17:45       ` Paul E. McKenney
2017-03-24 18:51         ` Johannes Berg
2017-03-24 19:33           ` Paul E. McKenney
2017-03-24 20:20             ` Paul E. McKenney
2017-03-27 11:18               ` Johannes Berg
2017-03-23 15:36 ` Nicolai Stange
2017-03-23 15:47   ` Johannes Berg
2017-03-27 11:36   ` Johannes Berg
2017-03-30  7:32     ` Nicolai Stange
2017-03-30  7:55       ` Johannes Berg
2017-03-30 10:27         ` Nicolai Stange
2017-03-30 11:11           ` Johannes Berg
2017-03-31  9:03             ` Nicolai Stange
2017-03-31  9:44               ` Johannes Berg
2017-04-16  9:51               ` [RFC PATCH 0/9] debugfs: per-file removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 1/9] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 2/9] debugfs: implement per-file removal protection Nicolai Stange
2017-04-18  2:23                   ` [lkp-robot] [debugfs] f3e7155d08: BUG:unable_to_handle_kernel kernel test robot
2017-04-23 18:37                     ` Nicolai Stange
2017-04-24  6:36                       ` Ye Xiaolong
2017-04-16  9:51                 ` [RFC PATCH 3/9] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 4/9] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 5/9] IB/hfi1: " Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 6/9] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 7/9] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 8/9] debugfs: defer debugfs_fsdata allocation to first usage Nicolai Stange
2017-04-18  9:36                   ` Johannes Berg
2017-05-02 20:05                     ` Nicolai Stange
2017-05-03  5:43                       ` Johannes Berg
2017-04-16  9:51                 ` [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances Nicolai Stange
2017-04-17 16:01                   ` Paul E. McKenney [this message]
2017-04-18  9:39                     ` Johannes Berg
2017-04-18 13:31                       ` Paul E. McKenney
2017-04-18 13:40                         ` Johannes Berg
2017-04-18 15:17                           ` Paul E. McKenney
2017-04-18 15:20                             ` Johannes Berg
2017-04-18 17:19                               ` Paul E. McKenney
2017-03-23 15:37 ` deadlock in synchronize_srcu() in debugfs? Paul E. McKenney
2017-03-23 15:46   ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170417160121.GP3956@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=johannes@sipsolutions.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicstange@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).