From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nicolai Stange <nicstange@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Johannes Berg <johannes@sipsolutions.net>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances
Date: Mon, 17 Apr 2017 09:01:21 -0700 [thread overview]
Message-ID: <20170417160121.GP3956@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170416095137.2784-10-nicstange@gmail.com>
On Sun, Apr 16, 2017 at 11:51:37AM +0200, Nicolai Stange wrote:
> Currently, a dentry's debugfs_fsdata instance is allocated from
> debugfs_file_get() at first usage, i.e. at first file opening.
>
> It won't ever get freed though.
>
> Ideally, these instances would get freed after the last open file handle
> gets closed again, that is from the fops' ->release().
>
> Unfortunately, we can't hook easily into those ->release()ers of files
> created through debugfs_create_file_unsafe(), i.e. of those proxied through
> debugfs_open_proxy_file_operations rather than through the "full"
> debugfs_full_proxy_file_operations proxy.
>
> Hence, free unreferenced debugfs_fsdata instances from debugfs_file_put(),
> with the drawback of a potential allocation + deallocation per
> debugfs_file_get() + debugfs_file_put() pair, that is per fops invocation.
>
> In addition to its former role of tracking pending fops, use the
> ->active_users for reference counting on the debugfs_fsdata instance
> itself. In particular, don't keep a dummy reference to be dropped from
> __debugfs_remove_file(): a d_delete()ed dentry and thus, request for
> completion notification, is now signaled by the d_unlinked() dentry itself.
>
> Once ->active_users drops to zero (and the dentry is still intact), free
> the debugfs_fsdata instance from debugfs_file_put(). RCU protects any
> concurrent debugfs_file_get() attempts to get a hold of the instance here.
> Likewise for full_proxy_release() which lacks a call to debugfs_file_get().
>
> Note that due to non-atomic updates to the d_unlinked() + ->d_fsdata pair,
> care must be taken in order to avoid races between debugfs_file_put() and
> debugfs_file_get() as well as __debugfs_remove_file(). Rather than
> introducing a global lock, exploit the fact that there will ever be only a
> single !d_unlinked() -> d_unlinked() transition and add memory barriers
> where needed. Given the lack of proper benchmarking, that debugfs fops
> aren't performance critical and that we've already got a potential
> allocation/deallocation pair anyway, the added code complexity might be
> highly questionable though.
>
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
If you have not already done so, please run this with debug enabled,
especially CONFIG_PROVE_LOCKING=y (which implies CONFIG_PROVE_RCU=y).
This is important because there are configurations for which the deadlocks
you saw with SRCU turn into silent failure, including memory corruption.
CONFIG_PROVE_RCU=y will catch many of those situations.
(And yes, kfree_rcu() doesn't have that problem, but...)
Another issue called out inline.
Thanx, Paul
> ---
> fs/debugfs/file.c | 102 ++++++++++++++++++++++++++++++++++++++++----------
> fs/debugfs/inode.c | 8 +++-
> fs/debugfs/internal.h | 1 +
> 3 files changed, 90 insertions(+), 21 deletions(-)
>
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index f4dfd7d0d625..b2cc25d44a39 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -22,6 +22,7 @@
> #include <linux/slab.h>
> #include <linux/atomic.h>
> #include <linux/device.h>
> +#include <linux/rcupdate.h>
> #include <asm/poll.h>
>
> #include "internal.h"
> @@ -78,10 +79,39 @@ int debugfs_file_get(struct dentry *dentry)
> struct debugfs_fsdata *fsd;
> void *d_fsd;
>
> - d_fsd = READ_ONCE(dentry->d_fsdata);
> + rcu_read_lock();
> +retry:
> + d_fsd = rcu_dereference(dentry->d_fsdata);
> if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
> + /*
> + * Paired with the control dependency in
> + * debugfs_file_put(): if we saw the debugfs_fsdata
> + * instance "restored" there but not the dead dentry,
> + * we'd erroneously instantiate a fresh debugfs_fsdata
> + * instance below.
> + */
> + smp_rmb();
> + if (d_unlinked(dentry)) {
> + rcu_read_unlock();
> + return -EIO;
> + }
> +
> fsd = d_fsd;
> + if (!refcount_inc_not_zero(&fsd->active_users)) {
> + /*
> + * A concurrent debugfs_file_put() dropped the
> + * count to zero and is about to free the
> + * debugfs_fsdata. Help out resetting the
> + * ->d_fsdata and retry.
> + */
> + d_fsd = (void *)((unsigned long)fsd->real_fops |
> + DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> + RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);
This is an infrequent race, I hope? If on the other hand there is
a possibility of this branch being taken a huge number of times in
one call, it would be good to exit the RCU read-side critical section
before retrying.
> + goto retry;
> + }
> + rcu_read_unlock();
> } else {
> + rcu_read_unlock();
> fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
> if (!fsd)
> return -ENOMEM;
> @@ -91,25 +121,28 @@ int debugfs_file_get(struct dentry *dentry)
> refcount_set(&fsd->active_users, 1);
> init_completion(&fsd->active_users_drained);
> if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
> + /*
> + * Another debugfs_file_get() has installed a
> + * debugfs_fsdata instance concurrently.
> + * Free ours and retry to grab a reference on
> + * the installed one.
> + */
> kfree(fsd);
> - fsd = READ_ONCE(dentry->d_fsdata);
> + rcu_read_lock();
> + goto retry;
And given this code path, why not put the retry: label before the
initial rcu_read_lock()? Same number of lines of code, rcu_read_lock()
and rcu_read_unlock() are very lightweight, the extra executions should
be rare, and you might be avoiding a grace-period-starvation problem.
> + }
> + /*
> + * In case of a successful cmpxchg() above, this check is
> + * strictly necessary and must follow it, see the comment in
> + * __debugfs_remove_file().
> + */
> + if (d_unlinked(dentry)) {
> + if (refcount_dec_and_test(&fsd->active_users))
> + complete(&fsd->active_users_drained);
> + return -EIO;
> }
> }
>
> - /*
> - * In case of a successful cmpxchg() above, this check is
> - * strictly necessary and must follow it, see the comment in
> - * __debugfs_remove_file().
> - * OTOH, if the cmpxchg() hasn't been executed or wasn't
> - * successful, this serves the purpose of not starving
> - * removers.
> - */
> - if (d_unlinked(dentry))
> - return -EIO;
> -
> - if (!refcount_inc_not_zero(&fsd->active_users))
> - return -EIO;
> -
> return 0;
> }
> EXPORT_SYMBOL_GPL(debugfs_file_get);
> @@ -126,9 +159,29 @@ EXPORT_SYMBOL_GPL(debugfs_file_get);
> void debugfs_file_put(struct dentry *dentry)
> {
> struct debugfs_fsdata *fsd = READ_ONCE(dentry->d_fsdata);
> + void *d_fsd;
>
> - if (refcount_dec_and_test(&fsd->active_users))
> - complete(&fsd->active_users_drained);
> + if (refcount_dec_and_test(&fsd->active_users)) {
> + d_fsd = (void *)((unsigned long)fsd->real_fops |
> + DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> + RCU_INIT_POINTER(dentry->d_fsdata, d_fsd);
> + /* Paired with smp_mb() in __debugfs_remove_file(). */
> + smp_mb();
> + if (d_unlinked(dentry)) {
> + /*
> + * We have a control dependency paired with the
> + * smp_rmb() in debugfs_file_get() here.
> + *
> + * Restore the debugfs_fsdata instance into
> + * ->d_fsdata s.t. ->d_release() can free
> + * it.
> + */
> + WRITE_ONCE(dentry->d_fsdata, fsd);
> + complete(&fsd->active_users_drained);
> + } else {
> + kfree_rcu(fsd, rcu_head);
> + }
> + }
> }
> EXPORT_SYMBOL_GPL(debugfs_file_put);
>
> @@ -221,9 +274,20 @@ static unsigned int full_proxy_poll(struct file *filp,
> static int full_proxy_release(struct inode *inode, struct file *filp)
> {
> const struct dentry *dentry = F_DENTRY(filp);
> - const struct file_operations *real_fops = debugfs_real_fops(filp);
> const struct file_operations *proxy_fops = filp->f_op;
> int r = 0;
> + void *d_fsd;
> + const struct file_operations *real_fops;
> +
> + rcu_read_lock();
> + d_fsd = rcu_dereference(F_DENTRY(filp)->d_fsdata);
> + if ((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) {
> + real_fops = (void *)((unsigned long)d_fsd &
> + ~DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
> + } else {
> + real_fops = ((struct debugfs_fsdata *)d_fsd)->real_fops;
> + }
> + rcu_read_unlock();
>
> /*
> * We must not protect this against removal races here: the
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 2360c17ec00a..bacb4d6bf178 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -27,6 +27,7 @@
> #include <linux/parser.h>
> #include <linux/magic.h>
> #include <linux/slab.h>
> +#include <linux/rcupdate.h>
>
> #include "internal.h"
>
> @@ -636,13 +637,16 @@ static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent)
> * cmpxchg() in debugfs_file_get(): either
> * debugfs_file_get() must see a dead dentry or we must see a
> * debugfs_fsdata instance at ->d_fsdata here (or both).
> + *
> + * Also paired with the smp_mb() in debugfs_file_put(): if we
> + * see a debugfs_fsdata instance here, then debugfs_file_put()
> + * must see a dead dentry.
> */
> smp_mb();
> fsd = READ_ONCE(dentry->d_fsdata);
> if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
> return;
> - if (!refcount_dec_and_test(&fsd->active_users))
> - wait_for_completion(&fsd->active_users_drained);
> + wait_for_completion(&fsd->active_users_drained);
> }
>
> static int __debugfs_remove(struct dentry *dentry, struct dentry *parent)
> diff --git a/fs/debugfs/internal.h b/fs/debugfs/internal.h
> index cb1e8139c398..0445bd7d11f2 100644
> --- a/fs/debugfs/internal.h
> +++ b/fs/debugfs/internal.h
> @@ -23,6 +23,7 @@ struct debugfs_fsdata {
> const struct file_operations *real_fops;
> refcount_t active_users;
> struct completion active_users_drained;
> + struct rcu_head rcu_head;
> };
>
> /*
> --
> 2.12.2
>
next prev parent reply other threads:[~2017-04-17 16:01 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-23 14:54 deadlock in synchronize_srcu() in debugfs? Johannes Berg
2017-03-23 15:29 ` Johannes Berg
2017-03-24 8:56 ` Johannes Berg
2017-03-24 9:24 ` Johannes Berg
2017-03-24 17:45 ` Paul E. McKenney
2017-03-24 18:51 ` Johannes Berg
2017-03-24 19:33 ` Paul E. McKenney
2017-03-24 20:20 ` Paul E. McKenney
2017-03-27 11:18 ` Johannes Berg
2017-03-23 15:36 ` Nicolai Stange
2017-03-23 15:47 ` Johannes Berg
2017-03-27 11:36 ` Johannes Berg
2017-03-30 7:32 ` Nicolai Stange
2017-03-30 7:55 ` Johannes Berg
2017-03-30 10:27 ` Nicolai Stange
2017-03-30 11:11 ` Johannes Berg
2017-03-31 9:03 ` Nicolai Stange
2017-03-31 9:44 ` Johannes Berg
2017-04-16 9:51 ` [RFC PATCH 0/9] debugfs: per-file removal protection Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 1/9] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 2/9] debugfs: implement per-file removal protection Nicolai Stange
2017-04-18 2:23 ` [lkp-robot] [debugfs] f3e7155d08: BUG:unable_to_handle_kernel kernel test robot
2017-04-18 2:23 ` kernel test robot
2017-04-23 18:37 ` Nicolai Stange
2017-04-23 18:37 ` Nicolai Stange
2017-04-24 6:36 ` Ye Xiaolong
2017-04-24 6:36 ` Ye Xiaolong
2017-04-16 9:51 ` [RFC PATCH 3/9] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 4/9] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 5/9] IB/hfi1: " Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 6/9] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 7/9] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-04-16 9:51 ` [RFC PATCH 8/9] debugfs: defer debugfs_fsdata allocation to first usage Nicolai Stange
2017-04-18 9:36 ` Johannes Berg
2017-05-02 20:05 ` Nicolai Stange
2017-05-03 5:43 ` Johannes Berg
2017-04-16 9:51 ` [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances Nicolai Stange
2017-04-17 16:01 ` Paul E. McKenney [this message]
2017-04-18 9:39 ` Johannes Berg
2017-04-18 13:31 ` Paul E. McKenney
2017-04-18 13:40 ` Johannes Berg
2017-04-18 15:17 ` Paul E. McKenney
2017-04-18 15:20 ` Johannes Berg
2017-04-18 17:19 ` Paul E. McKenney
2017-03-23 15:37 ` deadlock in synchronize_srcu() in debugfs? Paul E. McKenney
2017-03-23 15:46 ` Johannes Berg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170417160121.GP3956@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=gregkh@linuxfoundation.org \
--cc=johannes@sipsolutions.net \
--cc=linux-kernel@vger.kernel.org \
--cc=nicstange@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.