public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Stange <nicstange@gmail.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>,
	"Paul E.McKenney" <paulmck@linux.vnet.ibm.com>,
	Tyler Hall <tylerwhall@gmail.com>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Doug Ledford <dledford@redhat.com>,
	Sean Hefty <sean.hefty@intel.com>,
	Hal Rosenstock <hal.rosenstock@gmail.com>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Nicolai Stange <nicstange@gmail.com>
Subject: [PATCH v3 8/8] debugfs: defer debugfs_fsdata allocation to first usage
Date: Tue, 31 Oct 2017 00:15:54 +0100	[thread overview]
Message-ID: <20171030231554.6200-9-nicstange@gmail.com> (raw)
In-Reply-To: <20171030231554.6200-1-nicstange@gmail.com>

Currently, __debugfs_create_file allocates one struct debugfs_fsdata
instance for every file created. However, there are potentially many
debugfs file around, most of which are never touched by userspace.

Thus, defer the allocations to the first usage, i.e. to the first
debugfs_file_get().

A dentry's ->d_fsdata starts out to point to the "real", user provided
fops. After a debugfs_fsdata instance has been allocated (and the real
fops pointer has been moved over into its ->real_fops member),
->d_fsdata is changed to point to it from then on. The two cases are
distinguished by setting BIT(0) for the real fops case.

struct debugfs_fsdata's foremost purpose is to track active users and to
make debugfs_remove() block until they are done. Since no debugfs_fsdata
instance means no active users, make debugfs_remove() return immediately
in this case.

Take care of possible races between debugfs_file_get() and
debugfs_remove(): either debugfs_remove() must see a debugfs_fsdata
instance and thus wait for possible active users or debugfs_file_get() must
see a dead dentry and return immediately.

Make a dentry's ->d_release(), i.e. debugfs_release_dentry(), check whether
->d_fsdata is actually a debugfs_fsdata instance before kfree()ing it.

Similarly, make debugfs_real_fops() check whether ->d_fsdata is actually
a debugfs_fsdata instance before returning it, otherwise emit a warning.

The set of possible error codes returned from debugfs_file_get() has grown
from -EIO to -EIO and -ENOMEM. Make open_proxy_open() and full_proxy_open()
pass the -ENOMEM onwards to their callers.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 fs/debugfs/file.c     | 55 ++++++++++++++++++++++++++++++++++++++++++---------
 fs/debugfs/inode.c    | 36 +++++++++++++++++----------------
 fs/debugfs/internal.h |  8 ++++++++
 3 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index bc3549c95574..65872340e301 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -53,6 +53,15 @@ const struct file_operations *debugfs_real_fops(const struct file *filp)
 {
 	struct debugfs_fsdata *fsd = F_DENTRY(filp)->d_fsdata;
 
+	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT) {
+		/*
+		 * Urgh, we've been called w/o a protecting
+		 * debugfs_file_get().
+		 */
+		WARN_ON(1);
+		return NULL;
+	}
+
 	return fsd->real_fops;
 }
 EXPORT_SYMBOL_GPL(debugfs_real_fops);
@@ -74,9 +83,35 @@ EXPORT_SYMBOL_GPL(debugfs_real_fops);
  */
 int debugfs_file_get(struct dentry *dentry)
 {
-	struct debugfs_fsdata *fsd = dentry->d_fsdata;
+	struct debugfs_fsdata *fsd;
+	void *d_fsd;
+
+	d_fsd = READ_ONCE(dentry->d_fsdata);
+	if (!((unsigned long)d_fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)) {
+		fsd = d_fsd;
+	} else {
+		fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
+		if (!fsd)
+			return -ENOMEM;
+
+		fsd->real_fops = (void *)((unsigned long)d_fsd &
+					~DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
+		refcount_set(&fsd->active_users, 1);
+		init_completion(&fsd->active_users_drained);
+		if (cmpxchg(&dentry->d_fsdata, d_fsd, fsd) != d_fsd) {
+			kfree(fsd);
+			fsd = READ_ONCE(dentry->d_fsdata);
+		}
+	}
 
-	/* Avoid starvation of removers. */
+	/*
+	 * In case of a successful cmpxchg() above, this check is
+	 * strictly necessary and must follow it, see the comment in
+	 * __debugfs_remove_file().
+	 * OTOH, if the cmpxchg() hasn't been executed or wasn't
+	 * successful, this serves the purpose of not starving
+	 * removers.
+	 */
 	if (d_unlinked(dentry))
 		return -EIO;
 
@@ -98,7 +133,7 @@ EXPORT_SYMBOL_GPL(debugfs_file_get);
  */
 void debugfs_file_put(struct dentry *dentry)
 {
-	struct debugfs_fsdata *fsd = dentry->d_fsdata;
+	struct debugfs_fsdata *fsd = READ_ONCE(dentry->d_fsdata);
 
 	if (refcount_dec_and_test(&fsd->active_users))
 		complete(&fsd->active_users_drained);
@@ -109,10 +144,11 @@ static int open_proxy_open(struct inode *inode, struct file *filp)
 {
 	struct dentry *dentry = F_DENTRY(filp);
 	const struct file_operations *real_fops = NULL;
-	int r = 0;
+	int r;
 
-	if (debugfs_file_get(dentry))
-		return -ENOENT;
+	r = debugfs_file_get(dentry);
+	if (r)
+		return r == -EIO ? -ENOENT : r;
 
 	real_fops = debugfs_real_fops(filp);
 	real_fops = fops_get(real_fops);
@@ -233,10 +269,11 @@ static int full_proxy_open(struct inode *inode, struct file *filp)
 	struct dentry *dentry = F_DENTRY(filp);
 	const struct file_operations *real_fops = NULL;
 	struct file_operations *proxy_fops = NULL;
-	int r = 0;
+	int r;
 
-	if (debugfs_file_get(dentry))
-		return -ENOENT;
+	r = debugfs_file_get(dentry);
+	if (r)
+		return r == -EIO ? -ENOENT : r;
 
 	real_fops = debugfs_real_fops(filp);
 	real_fops = fops_get(real_fops);
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index f587aded46b5..9dca4da059b3 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -184,7 +184,10 @@ static const struct super_operations debugfs_super_operations = {
 
 static void debugfs_release_dentry(struct dentry *dentry)
 {
-	kfree(dentry->d_fsdata);
+	void *fsd = dentry->d_fsdata;
+
+	if (!((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT))
+		kfree(dentry->d_fsdata);
 }
 
 static struct vfsmount *debugfs_automount(struct path *path)
@@ -344,35 +347,25 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
 {
 	struct dentry *dentry;
 	struct inode *inode;
-	struct debugfs_fsdata *fsd;
-
-	fsd = kmalloc(sizeof(*fsd), GFP_KERNEL);
-	if (!fsd)
-		return NULL;
 
 	if (!(mode & S_IFMT))
 		mode |= S_IFREG;
 	BUG_ON(!S_ISREG(mode));
 	dentry = start_creating(name, parent);
 
-	if (IS_ERR(dentry)) {
-		kfree(fsd);
+	if (IS_ERR(dentry))
 		return NULL;
-	}
 
 	inode = debugfs_get_inode(dentry->d_sb);
-	if (unlikely(!inode)) {
-		kfree(fsd);
+	if (unlikely(!inode))
 		return failed_creating(dentry);
-	}
 
 	inode->i_mode = mode;
 	inode->i_private = data;
 
 	inode->i_fop = proxy_fops;
-	fsd->real_fops = real_fops;
-	refcount_set(&fsd->active_users, 1);
-	dentry->d_fsdata = fsd;
+	dentry->d_fsdata = (void *)((unsigned long)real_fops |
+				DEBUGFS_FSDATA_IS_REAL_FOPS_BIT);
 
 	d_instantiate(dentry, inode);
 	fsnotify_create(d_inode(dentry->d_parent), dentry);
@@ -635,8 +628,17 @@ static void __debugfs_remove_file(struct dentry *dentry, struct dentry *parent)
 
 	simple_unlink(d_inode(parent), dentry);
 	d_delete(dentry);
-	fsd = dentry->d_fsdata;
-	init_completion(&fsd->active_users_drained);
+
+	/*
+	 * Paired with the closing smp_mb() implied by a successful
+	 * cmpxchg() in debugfs_file_get(): either
+	 * debugfs_file_get() must see a dead dentry or we must see a
+	 * debugfs_fsdata instance at ->d_fsdata here (or both).
+	 */
+	smp_mb();
+	fsd = READ_ONCE(dentry->d_fsdata);
+	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
+		return;
 	if (!refcount_dec_and_test(&fsd->active_users))
 		wait_for_completion(&fsd->active_users_drained);
 }
diff --git a/fs/debugfs/internal.h b/fs/debugfs/internal.h
index 0eea99432840..cb1e8139c398 100644
--- a/fs/debugfs/internal.h
+++ b/fs/debugfs/internal.h
@@ -25,4 +25,12 @@ struct debugfs_fsdata {
 	struct completion active_users_drained;
 };
 
+/*
+ * A dentry's ->d_fsdata either points to the real fops or to a
+ * dynamically allocated debugfs_fsdata instance.
+ * In order to distinguish between these two cases, a real fops
+ * pointer gets its lowest bit set.
+ */
+#define DEBUGFS_FSDATA_IS_REAL_FOPS_BIT BIT(0)
+
 #endif /* _DEBUGFS_INTERNAL_H_ */
-- 
2.13.6

      parent reply	other threads:[~2017-10-30 23:17 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-12 20:01 deadlock in debugfs synchronize_srcu() when unplugging USB Tyler Hall
2017-10-12 21:04 ` Paul E. McKenney
2017-10-16  7:51 ` Greg Kroah-Hartman
2017-10-16  8:15   ` Johannes Berg
2017-10-16  8:31     ` Greg Kroah-Hartman
2017-10-16 15:08       ` Tyler Hall
2017-10-30 23:15         ` [PATCH v3 0/8] debugfs: per-file removal protection Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 1/8] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 2/8] debugfs: implement per-file removal protection Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 3/8] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 4/8] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 5/8] IB/hfi1: " Nicolai Stange
2017-11-07 14:29             ` Dennis Dalessandro
2017-10-30 23:15           ` [PATCH v3 6/8] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-10-30 23:15           ` [PATCH v3 7/8] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-10-30 23:15           ` Nicolai Stange [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171030231554.6200-9-nicstange@gmail.com \
    --to=nicstange@gmail.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hal.rosenstock@gmail.com \
    --cc=johannes@sipsolutions.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=sean.hefty@intel.com \
    --cc=tylerwhall@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox