[PATCH v2] Make lockdep happy with configfs

* [PATCH v2] Make lockdep happy with configfs
@ 2009-01-28 18:18 Louis Rilling
  2009-01-28 18:18 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling
  2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
  0 siblings, 2 replies; 11+ messages in thread
From: Louis Rilling @ 2009-01-28 18:18 UTC (permalink / raw)
  To: Joel Becker; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz


Hi Joel,

Here is a revised version of the patchset making lockdep happy with configfs.
I still don't have a good setup to test the second patch beyond compilation, and
I still guess that you have one :)

Louis

Changelog:
- put s_depth logic in separate functions and remove #ifdef LOCKDEP in the
  hooked functions.
- added the following note to explain why configfs_depend_prep() is correct
  when examining attaching items:
+ * Note: items in the middle of attachment start with s_type = 0
+ * (configfs_new_dirent()), and configfs_make_dirent() (called from
+ * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
+ * cases the item is ignored. Since s_type is an int, we rely on the CPU to
+ * atomically update the value, without making configfs_make_dirent() take
+ * configfs_dirent_lock.

- fixed parenthesis on pattern !a & b && c --> !(a & b) && c
- quiet checkpatch
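
The parenthesis fix in the changelog can be illustrated with a small
user-space sketch. This is not code from the patch: the flag values and
function names below are hypothetical and only model the `!a & b && c`
pattern. Since `!` binds tighter than `&`, the unparenthesized form tests
`(!a) & b`, which is always 0 for any flag above bit 0.

```c
#include <stdio.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define MODEL_DIR      0x0002
#define MODEL_DROPPING 0x0100

/*
 * How the compiler parses "!type & MODEL_DROPPING && c":
 * "!type" is 0 or 1, so masking it with 0x0100 always yields 0.
 */
static int model_check_buggy(int type, int c)
{
	return ((!type) & MODEL_DROPPING) && c;
}

/* Intended meaning: the DROPPING bit is clear and c holds. */
static int model_check_fixed(int type, int c)
{
	return !(type & MODEL_DROPPING) && c;
}

/* Print both results side by side for one input. */
static void model_show(int type, int c)
{
	printf("type=0x%04x buggy=%d fixed=%d\n",
	       type, model_check_buggy(type, c), model_check_fixed(type, c));
}
```

Calling model_show(MODEL_DIR, 1) would show the two forms disagreeing on an
item that is not being dropped.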

Louis Rilling (2):
      configfs: Silence lockdep on mkdir() and rmdir()
      configfs: Rework configfs_depend_item() locking and make lockdep happy

 fs/configfs/configfs_internal.h |    3 +
 fs/configfs/dir.c               |  188 ++++++++++++++++++++++++++++-----------
 fs/configfs/inode.c             |   38 ++++++++
 3 files changed, 175 insertions(+), 54 deletions(-)
-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

* [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir()
  2009-01-28 18:18 [PATCH v2] Make lockdep happy with configfs Louis Rilling
@ 2009-01-28 18:18 ` Louis Rilling
  2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
  1 sibling, 0 replies; 11+ messages in thread
From: Louis Rilling @ 2009-01-28 18:18 UTC (permalink / raw)
  To: Joel Becker
  Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz,
	Louis Rilling

When attaching default groups (subdirs) of a new group (in mkdir() or
in configfs_register()), configfs recursively takes inode's mutexes
along the path from the parent of the new group to the default
subdirs. This is needed to ensure that the VFS will not race with
operations on these sub-dirs. This is safe for the following reasons:

- the VFS allows one to lock first an inode and second one of its
  children (the lock subclasses for this pattern are respectively
  I_MUTEX_PARENT and I_MUTEX_CHILD);
- from this rule any inode path can be recursively locked in
  descending order as long as it stays under a single mountpoint and
  does not follow symlinks.

Unfortunately lockdep does not know (yet?) how to handle such
recursion.

I've tried to use Peter Zijlstra's lock_set_subclass() helper to
upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
that we might recursively lock some of their descendants, but this
usage does not seem to fit the purpose of lock_set_subclass() because
it leads to several i_mutexes locked with subclass I_MUTEX_PARENT by
the same task.

From inside configfs it is not possible to serialize this recursive
locking with a top-level lock, because mkdir() and rmdir() are already
called with inodes locked by the VFS. So using
mutex_lock_nest_lock() is not an option.

I am proposing two solutions:
1) one that wraps recursive mutex_lock()s with
   lockdep_off()/lockdep_on().
2) (as suggested earlier by Peter Zijlstra) one that puts the
   i_mutexes recursively locked in different classes based on their
   depth from the top-level config_group created. This
   induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
   nesting of configfs default groups whenever lockdep is activated
   but this limit looks reasonably high. Unfortunately, this also
   isolates VFS operations on configfs default groups from the others
   and thus lowers the chances to detect locking issues.

Nobody likes solution 1), which I can understand.

This patch implements solution 2). However, lockdep is still not happy with
configfs_depend_item(). The next patch reworks the locking of
configfs_depend_item() and finally makes lockdep happy.
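
As a rough illustration of solution 2), the depth bookkeeping can be modeled
in user space as below. This is only a sketch under stated assumptions: the
model_* names are hypothetical, MODEL_MAX_LOCK_DEPTH is assumed to be 48
(consistent with MAX_LOCK_DEPTH - 2 == 46 above), and the returned index
stands in for the per-depth lock class that the patch assigns in inode.c.

```c
#include <stddef.h>

/* Assumption: mirrors MAX_LOCK_DEPTH, inferred from "MAX_LOCK_DEPTH - 2 == 46". */
#define MODEL_MAX_LOCK_DEPTH 48

/* Hypothetical stand-in for the s_depth field of struct configfs_dirent. */
struct model_dirent {
	int s_depth;	/* -1 outside attachment, >= 0 while attaching */
};

/* New dirents start with no depth assigned. */
static void model_init_depth(struct model_dirent *sd)
{
	sd->s_depth = -1;
}

/* A directory created under a group that is attaching is one level deeper. */
static void model_set_child_depth(const struct model_dirent *parent,
				  struct model_dirent *child)
{
	if (parent->s_depth >= 0)
		child->s_depth = parent->s_depth + 1;
}

/* A non-default group about to create default groups becomes depth 0. */
static void model_before_populate(struct model_dirent *sd)
{
	if (sd->s_depth == -1)
		sd->s_depth = 0;
}

/* Once all default groups are attached, the depth is reset. */
static void model_after_populate(struct model_dirent *sd)
{
	sd->s_depth = -1;
}

/*
 * Index of the per-depth lock class to use, or -1 to keep the default
 * class (non-default group, or nesting too deep for the class table).
 */
static int model_lock_class_index(const struct model_dirent *sd)
{
	if (sd->s_depth <= 0 || sd->s_depth > MODEL_MAX_LOCK_DEPTH)
		return -1;
	return sd->s_depth - 1;
}
```

For a non-default group A with default groups A/B, A/C and A/C/D, this model
puts A/B and A/C at index 0 and A/C/D at index 1, while A itself keeps the
default class.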

Changelog:
- put s_depth logic in separate functions and remove #ifdef LOCKDEP in the
  hooked functions.
- quiet checkpatch

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
---
 fs/configfs/configfs_internal.h |    3 +
 fs/configfs/dir.c               |   90 +++++++++++++++++++++++++++++++++++++++
 fs/configfs/inode.c             |   38 ++++++++++++++++
 3 files changed, 131 insertions(+), 0 deletions(-)

diff --git a/fs/configfs/configfs_internal.h b/fs/configfs/configfs_internal.h
index 762d287..da6061a 100644
--- a/fs/configfs/configfs_internal.h
+++ b/fs/configfs/configfs_internal.h
@@ -39,6 +39,9 @@ struct configfs_dirent {
 	umode_t			s_mode;
 	struct dentry		* s_dentry;
 	struct iattr		* s_iattr;
+#ifdef CONFIG_LOCKDEP
+	int			s_depth;
+#endif
 };
 
 #define CONFIGFS_ROOT		0x0001
diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 8e93341..836596b 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -78,6 +78,92 @@ static struct dentry_operations configfs_dentry_ops = {
 	.d_delete	= configfs_d_delete,
 };
 
+#ifdef CONFIG_LOCKDEP
+
+/*
+ * Helpers to make lockdep happy with our recursive locking of default groups'
+ * inodes (see configfs_attach_group() and configfs_detach_group()).
+ * We put default groups i_mutexes in separate classes according to their depth
+ * from the youngest non-default group ancestor.
+ *
+ * For a non-default group A having default groups A/B, A/C, and A/C/D, default
+ * groups A/B and A/C will have their inode's mutex in class
+ * default_group_class[0], and default group A/C/D will be in
+ * default_group_class[1].
+ *
+ * The lock classes are declared and assigned in inode.c, according to the
+ * s_depth value.
+ * The s_depth value is initialized to -1, adjusted to >= 0 when attaching
+ * default groups, and reset to -1 when all default groups are attached. During
+ * attachment, if configfs_create() sees s_depth > 0, the lock class of the new
+ * inode's mutex is set to default_group_class[s_depth - 1].
+ */
+
+static void configfs_init_dirent_depth(struct configfs_dirent *sd)
+{
+	sd->s_depth = -1;
+}
+
+static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd,
+					  struct configfs_dirent *sd)
+{
+	int parent_depth = parent_sd->s_depth;
+
+	if (parent_depth >= 0)
+		sd->s_depth = parent_depth + 1;
+}
+
+static void
+configfs_adjust_dir_dirent_depth_before_populate(struct configfs_dirent *sd)
+{
+	/*
+	 * item's i_mutex class is already setup, so s_depth is now only
+	 * used to set new sub-directories s_depth, which is always done
+	 * with item's i_mutex locked.
+	 */
+	/*
+	 *  sd->s_depth == -1 iff we are a non default group.
+	 *  else (we are a default group) sd->s_depth > 0 (see
+	 *  create_dir()).
+	 */
+	if (sd->s_depth == -1)
+		/*
+		 * We are a non default group and we are going to create
+		 * default groups.
+		 */
+		sd->s_depth = 0;
+}
+
+static void
+configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd)
+{
+	/* We will not create default groups anymore. */
+	sd->s_depth = -1;
+}
+
+#else /* CONFIG_LOCKDEP */
+
+static void configfs_init_dirent_depth(struct configfs_dirent *sd)
+{
+}
+
+static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd,
+					  struct configfs_dirent *sd)
+{
+}
+
+static void
+configfs_adjust_dir_dirent_depth_before_populate(struct configfs_dirent *sd)
+{
+}
+
+static void
+configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd)
+{
+}
+
+#endif /* CONFIG_LOCKDEP */
+
 /*
  * Allocates a new configfs_dirent and links it to the parent configfs_dirent
  */
@@ -94,6 +180,7 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare
 	INIT_LIST_HEAD(&sd->s_links);
 	INIT_LIST_HEAD(&sd->s_children);
 	sd->s_element = element;
+	configfs_init_dirent_depth(sd);
 	spin_lock(&configfs_dirent_lock);
 	if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
 		spin_unlock(&configfs_dirent_lock);
@@ -187,6 +274,7 @@ static int create_dir(struct config_item * k, struct dentry * p,
 		error = configfs_make_dirent(p->d_fsdata, d, k, mode,
 					     CONFIGFS_DIR | CONFIGFS_USET_CREATING);
 	if (!error) {
+		configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata);
 		error = configfs_create(d, mode, init_dir);
 		if (!error) {
 			inc_nlink(p->d_inode);
@@ -789,11 +877,13 @@ static int configfs_attach_group(struct config_item *parent_item,
 		 * error, as rmdir() would.
 		 */
 		mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+		configfs_adjust_dir_dirent_depth_before_populate(sd);
 		ret = populate_groups(to_config_group(item));
 		if (ret) {
 			configfs_detach_item(item);
 			dentry->d_inode->i_flags |= S_DEAD;
 		}
+		configfs_adjust_dir_dirent_depth_after_populate(sd);
 		mutex_unlock(&dentry->d_inode->i_mutex);
 		if (ret)
 			d_delete(dentry);
diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index 5d349d3..4921e74 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -33,10 +33,15 @@
 #include <linux/backing-dev.h>
 #include <linux/capability.h>
 #include <linux/sched.h>
+#include <linux/lockdep.h>
 
 #include <linux/configfs.h>
 #include "configfs_internal.h"
 
+#ifdef CONFIG_LOCKDEP
+static struct lock_class_key default_group_class[MAX_LOCK_DEPTH];
+#endif
+
 extern struct super_block * configfs_sb;
 
 static const struct address_space_operations configfs_aops = {
@@ -150,6 +155,38 @@ struct inode * configfs_new_inode(mode_t mode, struct configfs_dirent * sd)
 	return inode;
 }
 
+#ifdef CONFIG_LOCKDEP
+
+static void configfs_set_inode_lock_class(struct configfs_dirent *sd,
+					  struct inode *inode)
+{
+	int depth = sd->s_depth;
+
+	if (depth > 0) {
+		if (depth <= ARRAY_SIZE(default_group_class)) {
+			lockdep_set_class(&inode->i_mutex,
+					  &default_group_class[depth - 1]);
+		} else {
+			/*
+			 * In practice the maximum level of locking depth is
+			 * already reached. Just inform about possible reasons.
+			 */
+			printk(KERN_INFO "configfs: Too many levels of inodes"
+			       " for the locking correctness validator.\n");
+			printk(KERN_INFO "Spurious warnings may appear.\n");
+		}
+	}
+}
+
+#else /* CONFIG_LOCKDEP */
+
+static void configfs_set_inode_lock_class(struct configfs_dirent *sd,
+					  struct inode *inode)
+{
+}
+
+#endif /* CONFIG_LOCKDEP */
+
 int configfs_create(struct dentry * dentry, int mode, int (*init)(struct inode *))
 {
 	int error = 0;
@@ -162,6 +199,7 @@ int configfs_create(struct dentry * dentry, int mode, int (*init)(struct inode *
 					struct inode *p_inode = dentry->d_parent->d_inode;
 					p_inode->i_mtime = p_inode->i_ctime = CURRENT_TIME;
 				}
+				configfs_set_inode_lock_class(sd, inode);
 				goto Proceed;
 			}
 			else
-- 
1.5.6.5


* [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-01-28 18:18 [PATCH v2] Make lockdep happy with configfs Louis Rilling
  2009-01-28 18:18 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling
@ 2009-01-28 18:18 ` Louis Rilling
  2009-04-29 18:52   ` Joel Becker
  1 sibling, 1 reply; 11+ messages in thread
From: Louis Rilling @ 2009-01-28 18:18 UTC (permalink / raw)
  To: Joel Becker
  Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz,
	Louis Rilling

configfs_depend_item() recursively locks all inode mutexes from the configfs root
to the target item, which makes lockdep unhappy. The purpose of this recursive
locking is to ensure that the item tree can be safely parsed and that the target
item, if found, is not about to leave.

This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
protects tagging of items to be removed, this lock can be used instead of the
inode mutex lock chain.
This requires that the check for dependents be done atomically with
CONFIGFS_USET_DROPPING tagging.

Now lockdep looks happy with configfs.
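
The atomicity requirement can be sketched with a user-space model. This is
an illustration only, not the kernel code: a pthread mutex stands in for the
configfs_dirent_lock spinlock, and the model_* names and the flag value are
hypothetical. The point is that "check for dependents" and "tag as dropping"
share one critical section, and that taking a dependency checks the tag
under the same lock.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical flag value, for illustration only. */
#define MODEL_DROPPING 0x0100

/* Hypothetical stand-in for the relevant configfs_dirent fields. */
struct model_item {
	int type;
	int dependent_count;
};

/* Stands in for configfs_dirent_lock (a spinlock in the kernel). */
static pthread_mutex_t model_dirent_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * rmdir() side: refuse if someone depends on the item, otherwise tag it
 * as dropping. Both steps happen without releasing the lock in between.
 */
static int model_rmdir_prep(struct model_item *it)
{
	int ret = 0;

	pthread_mutex_lock(&model_dirent_lock);
	if (it->dependent_count)
		ret = -EBUSY;
	else
		it->type |= MODEL_DROPPING;
	pthread_mutex_unlock(&model_dirent_lock);
	return ret;
}

/* depend side: an item already tagged as dropping is treated as absent. */
static int model_depend(struct model_item *it)
{
	int ret = 0;

	pthread_mutex_lock(&model_dirent_lock);
	if (it->type & MODEL_DROPPING)
		ret = -ENOENT;
	else
		it->dependent_count += 1;
	pthread_mutex_unlock(&model_dirent_lock);
	return ret;
}

/* undepend side: drop one dependency under the same lock. */
static void model_undepend(struct model_item *it)
{
	pthread_mutex_lock(&model_dirent_lock);
	it->dependent_count -= 1;
	pthread_mutex_unlock(&model_dirent_lock);
}
```

With this arrangement, whichever of model_rmdir_prep() and model_depend()
takes the lock first wins, and the other one sees the result: either -EBUSY
for the removal or -ENOENT for the dependency.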

Changelog:
- added the following note to explain why configfs_depend_prep() is correct
  when examining attaching items:
+ * Note: items in the middle of attachment start with s_type = 0
+ * (configfs_new_dirent()), and configfs_make_dirent() (called from
+ * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
+ * cases the item is ignored. Since s_type is an int, we rely on the CPU to
+ * atomically update the value, without making configfs_make_dirent() take
+ * configfs_dirent_lock.

- fixed parenthesis on pattern !a & b && c --> !(a & b) && c
- quiet checkpatch

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
---
 fs/configfs/dir.c |   98 ++++++++++++++++++++++++-----------------------------
 1 files changed, 44 insertions(+), 54 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 836596b..2739514 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -1006,11 +1006,11 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
  * Note, btw, that this can be called at *any* time, even when a configfs
  * subsystem isn't registered, or when configfs is loading or unloading.
  * Just like configfs_register_subsystem().  So we take the same
- * precautions.  We pin the filesystem.  We lock each i_mutex _in_order_
- * on our way down the tree.  If we can find the target item in the
+ * precautions.  We pin the filesystem.  We lock configfs_dirent_lock.
+ * If we can find the target item in the
  * configfs tree, it must be part of the subsystem tree as well, so we
- * do not need the subsystem semaphore.  Holding the i_mutex chain locks
- * out mkdir() and rmdir(), who might be racing us.
+ * do not need the subsystem semaphore.  Holding configfs_dirent_lock helps
+ * locking out mkdir() and rmdir(), who might be racing us.
  */
 
 /*
@@ -1023,17 +1023,23 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
  * do that so we can unlock it if we find nothing.
  *
  * Here we do a depth-first search of the dentry hierarchy looking for
- * our object.  We take i_mutex on each step of the way down.  IT IS
- * ESSENTIAL THAT i_mutex LOCKING IS ORDERED.  If we come back up a branch,
- * we'll drop the i_mutex.
+ * our object.
+ * We deliberately ignore items tagged as dropping since they are virtually
+ * dead, as well as items in the middle of attachment since they virtually
+ * do not exist yet. This completes the locking out of racing mkdir() and
+ * rmdir().
+ * Note: items in the middle of attachment start with s_type = 0
+ * (configfs_new_dirent()), and configfs_make_dirent() (called from
+ * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
+ * cases the item is ignored. Since s_type is an int, we rely on the CPU to
+ * atomically update the value, without making configfs_make_dirent() take
+ * configfs_dirent_lock.
  *
- * If the target is not found, -ENOENT is bubbled up and we have released
- * all locks.  If the target was found, the locks will be cleared by
- * configfs_depend_rollback().
+ * If the target is not found, -ENOENT is bubbled up.
  *
  * This adds a requirement that all config_items be unique!
  *
- * This is recursive because the locking traversal is tricky.  There isn't
+ * This is recursive.  There isn't
  * much on the stack, though, so folks that need this function - be careful
  * about your stack!  Patches will be accepted to make it iterative.
  */
@@ -1045,13 +1051,13 @@ static int configfs_depend_prep(struct dentry *origin,
 
 	BUG_ON(!origin || !sd);
 
-	/* Lock this guy on the way down */
-	mutex_lock(&sd->s_dentry->d_inode->i_mutex);
 	if (sd->s_element == target)  /* Boo-yah */
 		goto out;
 
 	list_for_each_entry(child_sd, &sd->s_children, s_sibling) {
-		if (child_sd->s_type & CONFIGFS_DIR) {
+		if ((child_sd->s_type & CONFIGFS_DIR) &&
+		    !(child_sd->s_type & CONFIGFS_USET_DROPPING) &&
+		    !(child_sd->s_type & CONFIGFS_USET_CREATING)) {
 			ret = configfs_depend_prep(child_sd->s_dentry,
 						   target);
 			if (!ret)
@@ -1060,33 +1066,12 @@ static int configfs_depend_prep(struct dentry *origin,
 	}
 
 	/* We looped all our children and didn't find target */
-	mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
 	ret = -ENOENT;
 
 out:
 	return ret;
 }
 
-/*
- * This is ONLY called if configfs_depend_prep() did its job.  So we can
- * trust the entire path from item back up to origin.
- *
- * We walk backwards from item, unlocking each i_mutex.  We finish by
- * unlocking origin.
- */
-static void configfs_depend_rollback(struct dentry *origin,
-				     struct config_item *item)
-{
-	struct dentry *dentry = item->ci_dentry;
-
-	while (dentry != origin) {
-		mutex_unlock(&dentry->d_inode->i_mutex);
-		dentry = dentry->d_parent;
-	}
-
-	mutex_unlock(&origin->d_inode->i_mutex);
-}
-
 int configfs_depend_item(struct configfs_subsystem *subsys,
 			 struct config_item *target)
 {
@@ -1127,17 +1112,21 @@ int configfs_depend_item(struct configfs_subsystem *subsys,
 
 	/* Ok, now we can trust subsys/s_item */
 
-	/* Scan the tree, locking i_mutex recursively, return 0 if found */
+	spin_lock(&configfs_dirent_lock);
+	/* Scan the tree, return 0 if found */
 	ret = configfs_depend_prep(subsys_sd->s_dentry, target);
 	if (ret)
-		goto out_unlock_fs;
+		goto out_unlock_dirent_lock;
 
-	/* We hold all i_mutexes from the subsystem down to the target */
+	/*
+	 * We are sure that the item is not about to be removed by rmdir(), and
+	 * not in the middle of attachment by mkdir().
+	 */
 	p = target->ci_dentry->d_fsdata;
 	p->s_dependent_count += 1;
 
-	configfs_depend_rollback(subsys_sd->s_dentry, target);
-
+out_unlock_dirent_lock:
+	spin_unlock(&configfs_dirent_lock);
 out_unlock_fs:
 	mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex);
 
@@ -1162,10 +1151,10 @@ void configfs_undepend_item(struct configfs_subsystem *subsys,
 	struct configfs_dirent *sd;
 
 	/*
-	 * Since we can trust everything is pinned, we just need i_mutex
-	 * on the item.
+	 * Since we can trust everything is pinned, we just need
+	 * configfs_dirent_lock.
 	 */
-	mutex_lock(&target->ci_dentry->d_inode->i_mutex);
+	spin_lock(&configfs_dirent_lock);
 
 	sd = target->ci_dentry->d_fsdata;
 	BUG_ON(sd->s_dependent_count < 1);
@@ -1176,7 +1165,7 @@ void configfs_undepend_item(struct configfs_subsystem *subsys,
 	 * After this unlock, we cannot trust the item to stay alive!
 	 * DO NOT REFERENCE item after this unlock.
 	 */
-	mutex_unlock(&target->ci_dentry->d_inode->i_mutex);
+	spin_unlock(&configfs_dirent_lock);
 }
 EXPORT_SYMBOL(configfs_undepend_item);
 
@@ -1376,13 +1365,6 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry)
 	if (sd->s_type & CONFIGFS_USET_DEFAULT)
 		return -EPERM;
 
-	/*
-	 * Here's where we check for dependents.  We're protected by
-	 * i_mutex.
-	 */
-	if (sd->s_dependent_count)
-		return -EBUSY;
-
 	/* Get a working ref until we have the child */
 	parent_item = configfs_get_config_item(dentry->d_parent);
 	subsys = to_config_group(parent_item)->cg_subsys;
@@ -1406,9 +1388,17 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry)
 
 		mutex_lock(&configfs_symlink_mutex);
 		spin_lock(&configfs_dirent_lock);
-		ret = configfs_detach_prep(dentry, &wait_mutex);
-		if (ret)
-			configfs_detach_rollback(dentry);
+		/*
+		 * Here's where we check for dependents.  We're protected by
+		 * configfs_dirent_lock.
+		 * If no dependent, atomically tag the item as dropping.
+		 */
+		ret = sd->s_dependent_count ? -EBUSY : 0;
+		if (!ret) {
+			ret = configfs_detach_prep(dentry, &wait_mutex);
+			if (ret)
+				configfs_detach_rollback(dentry);
+		}
 		spin_unlock(&configfs_dirent_lock);
 		mutex_unlock(&configfs_symlink_mutex);
 
-- 
1.5.6.5


* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
@ 2009-04-29 18:52   ` Joel Becker
  2009-04-30  9:18     ` Louis Rilling
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Becker @ 2009-04-29 18:52 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz

On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote:
> configfs_depend_item() recursively locks all inodes mutex from configfs root to
> the target item, which makes lockdep unhappy. The purpose of this recursive
> locking is to ensure that the item tree can be safely parsed and that the target
> item, if found, is not about to leave.
> 
> This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
> Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
> protects tagging of items to be removed, this lock can be used instead of the
> inodes mutex lock chain.
> This needs that the check for dependents be done atomically with
> CONFIGFS_USET_DROPPING tagging.

	These patches are now in the 'lockdep' branch of the configfs
tree.  I'm planning to send them in the next merge window.  I've made
one change.

> + * Note: items in the middle of attachment start with s_type = 0
> + * (configfs_new_dirent()), and configfs_make_dirent() (called from
> + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
> + * cases the item is ignored. Since s_type is an int, we rely on the CPU to
> + * atomically update the value, without making configfs_make_dirent() take
> + * configfs_dirent_lock.

	I've added configfs_dirent_lock in configfs_make_dirent(),
because it is not safe at all to rely on the fact that s_type is an int.
It's an atomic set on one CPU, but there's no guarantee that it's seen
correctly on other CPUs.  Plus, there's no real need for speed here.  So
we properly take configfs_dirent_lock around s_type in
configfs_make_dirent(), and that ensures we see things correctly on SMP.
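
A minimal user-space sketch of this point, assuming a pthread mutex in place
of the configfs_dirent_lock spinlock; the model_* names and flag values are
hypothetical. The type is written while the lock is held, and the scanner
reads it while holding the same lock, so the lock's acquire/release ordering
is what makes the value visible across CPUs, not the width of the store.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define MODEL_DIR      0x0002
#define MODEL_CREATING 0x0200

/* Hypothetical, simplified dirent: a type and a singly linked child list. */
struct model_dirent {
	int type;
	struct model_dirent *first_child;
	struct model_dirent *next_sibling;
};

/* Stands in for configfs_dirent_lock. */
static pthread_mutex_t model_dirent_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set the type and link the new dirent to its parent under the lock. */
static void model_new_dirent(struct model_dirent *parent,
			     struct model_dirent *sd, int type)
{
	sd->first_child = NULL;
	pthread_mutex_lock(&model_dirent_lock);
	sd->type = type;
	sd->next_sibling = parent->first_child;
	parent->first_child = sd;
	pthread_mutex_unlock(&model_dirent_lock);
}

/* Attachment is complete: clear the CREATING tag under the lock. */
static void model_finish_creating(struct model_dirent *sd)
{
	pthread_mutex_lock(&model_dirent_lock);
	sd->type &= ~MODEL_CREATING;
	pthread_mutex_unlock(&model_dirent_lock);
}

/* Count child directories that a scan would descend into. */
static int model_count_ready_dirs(struct model_dirent *parent)
{
	struct model_dirent *child;
	int n = 0;

	pthread_mutex_lock(&model_dirent_lock);
	for (child = parent->first_child; child; child = child->next_sibling)
		if ((child->type & MODEL_DIR) &&
		    !(child->type & MODEL_CREATING))
			n++;
	pthread_mutex_unlock(&model_dirent_lock);
	return n;
}
```

A directory linked with MODEL_DIR | MODEL_CREATING is skipped by the scan
until model_finish_creating() has run.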

Joel

-- 

"Three o'clock is always too late or too early for anything you
 want to do."
        - Jean-Paul Sartre

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-04-29 18:52   ` Joel Becker
@ 2009-04-30  9:18     ` Louis Rilling
  2009-04-30 17:20       ` Joel Becker
  0 siblings, 1 reply; 11+ messages in thread
From: Louis Rilling @ 2009-04-30  9:18 UTC (permalink / raw)
  To: linux-kernel, akpm, cluster-devel, swhiteho, peterz

On 29/04/09 11:52 -0700, Joel Becker wrote:
> On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote:
> > configfs_depend_item() recursively locks all inodes mutex from configfs root to
> > the target item, which makes lockdep unhappy. The purpose of this recursive
> > locking is to ensure that the item tree can be safely parsed and that the target
> > item, if found, is not about to leave.
> > 
> > This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
> > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
> > protects tagging of items to be removed, this lock can be used instead of the
> > inodes mutex lock chain.
> > This needs that the check for dependents be done atomically with
> > CONFIGFS_USET_DROPPING tagging.
> 
> 	These patches are now in the 'lockdep' branch of the configfs
> tree.  I'm planning to send them in the next merge window.  I've made
> one change.
> 
> > + * Note: items in the middle of attachment start with s_type = 0
> > + * (configfs_new_dirent()), and configfs_make_dirent() (called from
> > + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
> > + * cases the item is ignored. Since s_type is an int, we rely on the CPU to
> > + * atomically update the value, without making configfs_make_dirent() take
> > + * configfs_dirent_lock.
> 
> 	I've added configfs_dirent_lock in configfs_make_dirent(),
> because it is not safe at all to rely on the fact that s_type is an int.
> It's an atomic set on one CPU, but there's no guarantee that it's seen
> correctly on other CPUs.  Plus, there's no real need for speed here.  So
> we properly take configfs_dirent_lock around s_type in
> configfs_make_dirent(), and that ensures we see things correctly on SMP.

Agreed, I was going to suggest something like this. Actually I'd push the
initialization of s_type down to configfs_new_dirent(), so that s_type either
is always NULL, or always shows the correct type of object.

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-04-30  9:18     ` Louis Rilling
@ 2009-04-30 17:20       ` Joel Becker
  2009-04-30 17:30         ` Joel Becker
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Becker @ 2009-04-30 17:20 UTC (permalink / raw)
  To: linux-kernel, akpm, cluster-devel, swhiteho, peterz

On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote:
> On 29/04/09 11:52 -0700, Joel Becker wrote:
> > On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote:
> > > configfs_depend_item() recursively locks all inodes mutex from configfs root to
> > > the target item, which makes lockdep unhappy. The purpose of this recursive
> > > locking is to ensure that the item tree can be safely parsed and that the target
> > > item, if found, is not about to leave.
> > > 
> > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
> > > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
> > > protects tagging of items to be removed, this lock can be used instead of the
> > > inodes mutex lock chain.
> > > This needs that the check for dependents be done atomically with
> > > CONFIGFS_USET_DROPPING tagging.
> > 
> > 	These patches are now in the 'lockdep' branch of the configfs
> > tree.  I'm planning to send them in the next merge window.  I've made
> > one change.
> > 
> > > + * Note: items in the middle of attachment start with s_type = 0
> > > + * (configfs_new_dirent()), and configfs_make_dirent() (called from
> > > + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
> > > + * cases the item is ignored. Since s_type is an int, we rely on the CPU to
> > > + * atomically update the value, without making configfs_make_dirent() take
> > > + * configfs_dirent_lock.
> > 
> > 	I've added configfs_dirent_lock in configfs_make_dirent(),
> > because it is not safe at all to rely on the fact that s_type is an int.
> > It's an atomic set on one CPU, but there's no guarantee that it's seen
> > correctly on other CPUs.  Plus, there's no real need for speed here.  So
> > we properly take configfs_dirent_lock around s_type in
> > configfs_make_dirent(), and that ensures we see things correctly on SMP.
> 
> Agreed, I was going to suggest something like this. Actually I'd push the
> initialization of s_type down to configfs_new_dirent(), so that s_type either
> is always NULL, or always shows the correct type of object.

	0, not "NULL", but yeah I think that's a good plan.

Joel

-- 

"If at first you don't succeed, cover all traces that you tried."
                                                        -Unknown

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-04-30 17:20       ` Joel Becker
@ 2009-04-30 17:30         ` Joel Becker
  2009-05-04 10:20           ` Louis Rilling
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Becker @ 2009-04-30 17:30 UTC (permalink / raw)
  To: linux-kernel, akpm, cluster-devel, swhiteho, peterz

On Thu, Apr 30, 2009 at 10:20:14AM -0700, Joel Becker wrote:
> On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote:
> > Agreed, I was going to suggest something like this. Actually I'd push the
> > initialization of s_type down to configfs_new_dirent(), so that s_type either
> > is always NULL, or always shows the correct type of object.
> 
> 	0, not "NULL", but yeah I think that's a good plan.

	Like this.  Please review the comment change mostly.


diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 63d8815..8e48b52 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -167,8 +167,8 @@ configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd)
 /*
  * Allocates a new configfs_dirent and links it to the parent configfs_dirent
  */
-static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * parent_sd,
-						void * element)
+static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent *parent_sd,
+						   void *element, int type)
 {
 	struct configfs_dirent * sd;
 
@@ -180,6 +180,7 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare
 	INIT_LIST_HEAD(&sd->s_links);
 	INIT_LIST_HEAD(&sd->s_children);
 	sd->s_element = element;
+	sd->s_type = type;
 	configfs_init_dirent_depth(sd);
 	spin_lock(&configfs_dirent_lock);
 	if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
@@ -225,19 +226,12 @@ int configfs_make_dirent(struct configfs_dirent * parent_sd,
 {
 	struct configfs_dirent * sd;
 
-	sd = configfs_new_dirent(parent_sd, element);
+	sd = configfs_new_dirent(parent_sd, element, type);
 	if (IS_ERR(sd))
 		return PTR_ERR(sd);
 
-	/*
-	 * We need configfs_dirent_lock so that configfs_depend_prep()
-	 * can see s_type accurately on other CPUs.
-	 */
-	spin_lock(&configfs_dirent_lock);
 	sd->s_mode = mode;
-	sd->s_type = type;
 	sd->s_dentry = dentry;
-	spin_unlock(&configfs_dirent_lock);
 	if (dentry) {
 		dentry->d_fsdata = configfs_get(sd);
 		dentry->d_op = &configfs_dentry_ops;
@@ -1034,11 +1028,10 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
  * dead, as well as items in the middle of attachment since they virtually
  * do not exist yet. This completes the locking out of racing mkdir() and
  * rmdir().
- * Note: items in the middle of attachment start with s_type = 0
- * (configfs_new_dirent()), and configfs_make_dirent() (called from
- * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
- * cases the item is ignored.  configfs_make_dirent() is locked out from
- * updating s_type by configfs_dirent_lock.
+ * Note: subdirectories in the middle of attachment start with s_type =
+ * CONFIGFS_DIR|CONFIGFS_USET_CREATING set by create_dir().  When
+ * CONFIGFS_USET_CREATING is set, we ignore the item.  The actual set of
+ * s_type is in configfs_new_dirent(), which has configfs_dirent_lock.
  *
  * If the target is not found, -ENOENT is bubbled up.
  *
@@ -1514,7 +1507,7 @@ static int configfs_dir_open(struct inode *inode, struct file *file)
 	 */
 	err = -ENOENT;
 	if (configfs_dirent_is_ready(parent_sd)) {
-		file->private_data = configfs_new_dirent(parent_sd, NULL);
+		file->private_data = configfs_new_dirent(parent_sd, NULL, 0);
 		if (IS_ERR(file->private_data))
 			err = PTR_ERR(file->private_data);
 		else

-- 

"There is no sincerer love than the love of food."  
         - George Bernard Shaw 

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-04-30 17:30         ` Joel Becker
@ 2009-05-04 10:20           ` Louis Rilling
  0 siblings, 0 replies; 11+ messages in thread
From: Louis Rilling @ 2009-05-04 10:20 UTC (permalink / raw)
  To: linux-kernel, akpm, cluster-devel, swhiteho, peterz


On 30/04/09 10:30 -0700, Joel Becker wrote:
> On Thu, Apr 30, 2009 at 10:20:14AM -0700, Joel Becker wrote:
> > On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote:
> > > Agreed, I was going to suggest something like this. Actually I'd push the
> > > initialization of s_type down to configfs_new_dirent(), so that s_type either
> > > is always NULL, or always shows the correct type of object.
> > 
> > 	0, not "NULL", but yeah I think that's a good plan.
> 
> 	Like this.  Please review the comment change mostly.

Indeed, the code is exactly what I had in mind. Small comments about the comment
change below.

Thanks!

Acked-by: Louis Rilling <louis.rilling@kerlabs.com>

> 
> 
> diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
> index 63d8815..8e48b52 100644
> --- a/fs/configfs/dir.c
> +++ b/fs/configfs/dir.c
> @@ -167,8 +167,8 @@ configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd)
>  /*
>   * Allocates a new configfs_dirent and links it to the parent configfs_dirent
>   */
> -static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * parent_sd,
> -						void * element)
> +static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent *parent_sd,
> +						   void *element, int type)
>  {
>  	struct configfs_dirent * sd;
>  
> @@ -180,6 +180,7 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare
>  	INIT_LIST_HEAD(&sd->s_links);
>  	INIT_LIST_HEAD(&sd->s_children);
>  	sd->s_element = element;
> +	sd->s_type = type;
>  	configfs_init_dirent_depth(sd);
>  	spin_lock(&configfs_dirent_lock);
>  	if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
> @@ -225,19 +226,12 @@ int configfs_make_dirent(struct configfs_dirent * parent_sd,
>  {
>  	struct configfs_dirent * sd;
>  
> -	sd = configfs_new_dirent(parent_sd, element);
> +	sd = configfs_new_dirent(parent_sd, element, type);
>  	if (IS_ERR(sd))
>  		return PTR_ERR(sd);
>  
> -	/*
> -	 * We need configfs_dirent_lock so that configfs_depend_prep()
> -	 * can see s_type accurately on other CPUs.
> -	 */
> -	spin_lock(&configfs_dirent_lock);
>  	sd->s_mode = mode;
> -	sd->s_type = type;
>  	sd->s_dentry = dentry;
> -	spin_unlock(&configfs_dirent_lock);
>  	if (dentry) {
>  		dentry->d_fsdata = configfs_get(sd);
>  		dentry->d_op = &configfs_dentry_ops;
> @@ -1034,11 +1028,10 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
>   * dead, as well as items in the middle of attachment since they virtually
>   * do not exist yet. This completes the locking out of racing mkdir() and
>   * rmdir().
> - * Note: items in the middle of attachment start with s_type = 0
> - * (configfs_new_dirent()), and configfs_make_dirent() (called from
> - * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both
> - * cases the item is ignored.  configfs_make_dirent() is locked out from
> - * updating s_type by configfs_dirent_lock.
> + * Note: subdirectories in the middle of attachment start with s_type =
> + * CONFIGFS_DIR|CONFIGFS_USET_CREATING set by create_dir().  When

I'd say "As long as CONFIGFS_USET_CREATING is set", since, by design, once cleared,
CONFIGFS_USET_CREATING never comes back.

Louis

> + * CONFIGFS_USET_CREATING is set, we ignore the item.  The actual set of
> + * s_type is in configfs_new_dirent(), which has configfs_dirent_lock.
>   *
>   * If the target is not found, -ENOENT is bubbled up.
>   *
> @@ -1514,7 +1507,7 @@ static int configfs_dir_open(struct inode *inode, struct file *file)
>  	 */
>  	err = -ENOENT;
>  	if (configfs_dirent_is_ready(parent_sd)) {
> -		file->private_data = configfs_new_dirent(parent_sd, NULL);
> +		file->private_data = configfs_new_dirent(parent_sd, NULL, 0);
>  		if (IS_ERR(file->private_data))
>  			err = PTR_ERR(file->private_data);
>  		else
> 
> -- 
> 
> "There is no sincerer love than the love of food."  
>          - George Bernard Shaw 
> 
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.becker@oracle.com
> Phone: (650) 506-8127

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()
@ 2008-12-18 11:15 Louis Rilling
  2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
  0 siblings, 1 reply; 11+ messages in thread
From: Louis Rilling @ 2008-12-18 11:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Andrew Morton, cluster-devel, swhiteho


On 18/12/08  1:27 -0800, Joel Becker wrote:
> 
> 	I know it's hard, or I'd have sent you patches :-)  In fact,
> Louis tried to use the subclass bits to make this work to a depth of N
> (where N was probably deep enough in practice).  However, this creates
> subclasses that don't get seen by the regular VFS locking - and the big
> deal here is making sure configfs's use of i_mutex meshes with the VFS.
> That is, his code made the warnings go away, but removed much of
> lockdep's ability to see when we got the locking wrong.
> 
> > The thing is, in practise it turns out that reworking code to not run
> > into these issues often makes the code better - if only for the fact
> > that taking locks is expensive and doing less is better, and holding
> > locks stifles concurrency, so holding less is better (yes, I said
> > _often_, there likely are counter cases but I don't believe configfs is
> > one of them).
> 
> 	This isn't about concurrency or speed.  This is about safety
> while configfs is attaching or (especially) detaching config_items from
> the filesystem view it presents.  When the VFS travels down a path, it
> unlocks the trailing directory.  We can't do that when tearing down
> default groups, because we need to lock that small hunk and tear it out
> atomically.
> 
> > Anyway - I'm against just turning lockdep off, that will make folks
> > complacent and let the stuff rot to bits inside - and I for one will
> > refuse to run anything using it (but since that only seems to be
> > dlm/ocfs and I'm of the belief that centralized cluster stuff sucks
> > rocks anyway that won't be a problem).
> 
> 	Oh, be nice :-)
> 	You are absolutely right that turning off lockdep leaves the
> possibility of complacency and bitrot.  That's precisely why I didn't
> like Louis' subclass solution - again, bitrot might go unnoticed.
> 	Now, I know that I will be paying attention to the locking and
> going over it with a fine-toothed comb.  But I'd much rather have an
> actual working lockdep analysis.  Whether that means we find a way for
> lockdep to describe what's happening here, or we find another way to
> keep folks out of the tree we're removing, I don't care.

Perhaps I didn't explain myself well. Quoting my original post:

<quote>
I am proposing two solutions:
1) one that wraps recursive mutex_lock()s with
   lockdep_off()/lockdep_on().
2) (as suggested earlier by Peter Zijlstra) one that puts the
   i_mutexes recursively locked in different classes based on their
   depth from the top-level config_group created. This
   induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
   nesting of configfs default groups whenever lockdep is activated
   but this limit looks reasonably high. Unfortunately, this also
   isolates VFS operations on configfs default groups from the others
   and thus lowers the chances to detect locking issues.

This patch implements solution 1).

Solution 2) looks better from lockdep's point of view, but fails with
configfs_depend_item(). Making it work requires reworking the locking
scheme of configfs_depend_item() to remove the variable lock recursion
depth, and I think that it's doable thanks to the configfs_dirent_lock.
For now, let's stick to solution 1).
</quote>

Solution 2) does not play with i_mutex sub-classes as I proposed earlier, but
instead puts the default groups' i_mutexes in separate classes (actually one
class per default group depth). This is not worse than putting each run queue
lock in a separate class, as it used to be.

For instance, if a created group A has default groups A/B, A/D, and A/B/C, A's
i_mutex class will be the regular i_mutex class used everywhere else in the VFS,
A/B and A/D will have default_group_class[0], and A/B/C will have
default_group_class[1].

Of course those default_group classes will not benefit from locking schemes seen
by lockdep outside configfs, but they still will interact nicely with the VFS.
Moreover, a default group depth limit of 46 (MAX_LOCK_DEPTH - 2) looks rather
reasonable, doesn't it?
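
As a rough user-space illustration of the class assignment described above (not
the patch itself), the depth of a default group below its nearest user-created
ancestor can select the class index. The struct group type,
default_group_class_index() and the two sentinel values are hypothetical names
for this sketch; MAX_LOCK_DEPTH is assumed to be 48 so that the limit matches
the 46 quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: chosen so that MAX_LOCK_DEPTH - 2 == 46, as quoted above. */
#define MAX_LOCK_DEPTH          48
#define MAX_DEFAULT_GROUP_DEPTH (MAX_LOCK_DEPTH - 2)

/* Hypothetical sentinels, for this sketch only. */
#define REGULAR_I_MUTEX_CLASS   (-1)	/* class shared with the rest of the VFS */
#define DEFAULT_GROUP_TOO_DEEP  (-2)	/* nesting exceeds the lockdep limit */

/* Minimal stand-in for a config_group: only what the sketch needs. */
struct group {
	const struct group *parent;	/* NULL at the top of the modelled tree */
	int is_default;			/* non-zero for a default group */
};

/*
 * Return the index into a per-depth class array for a default group,
 * REGULAR_I_MUTEX_CLASS for a group created by mkdir(), or
 * DEFAULT_GROUP_TOO_DEEP if the nesting exceeds the limit.
 */
int default_group_class_index(const struct group *g)
{
	int depth = 0;

	assert(g != NULL);
	/* Count default groups up to the nearest group created by mkdir(). */
	while (g != NULL && g->is_default) {
		depth++;
		g = g->parent;
	}
	if (depth == 0)
		return REGULAR_I_MUTEX_CLASS;
	if (depth > MAX_DEFAULT_GROUP_DEPTH)
		return DEFAULT_GROUP_TOO_DEEP;
	return depth - 1;
}
```

With A created by mkdir() and B, D and C modelled as default groups, this
returns the regular class for A, index 0 for A/B and A/D, and index 1 for
A/B/C, which is the mapping given in the example above.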

To me the real drawback of this solution is that it requires reworking the
locking in configfs_depend_item(). Peter says this is preferable, and I know how
it could be done, but like any code rework it may bring new bugs. I also realize
that I am spending time explaining this while 1) I don't have much time to just
explain what could be done, and 2) I'd rather spend that time coding what I am
explaining. Let's see if I can show you something today.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


* [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2008-12-18 11:15 [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling
@ 2008-12-18 18:00 ` Louis Rilling
  2009-01-28  4:13   ` Joel Becker
  0 siblings, 1 reply; 11+ messages in thread
From: Louis Rilling @ 2008-12-18 18:00 UTC (permalink / raw)
  To: Joel Becker
  Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz,
	Louis Rilling

configfs_depend_item() recursively locks every inode mutex from the configfs
root to the target item, which makes lockdep unhappy. The purpose of this
recursive locking is to ensure that the item tree can be safely parsed and that
the target item, if found, is not about to leave.

This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
protects tagging of items to be removed, this lock can be used instead of the
inode mutex lock chain.
This requires that the check for dependents be done atomically with
CONFIGFS_USET_DROPPING tagging.

Now lockdep looks happy with configfs.

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
---
 fs/configfs/dir.c |   92 ++++++++++++++++++++++-------------------------------
 1 files changed, 38 insertions(+), 54 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index f21be74..4dea906 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -955,11 +955,11 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
  * Note, btw, that this can be called at *any* time, even when a configfs
  * subsystem isn't registered, or when configfs is loading or unloading.
  * Just like configfs_register_subsystem().  So we take the same
- * precautions.  We pin the filesystem.  We lock each i_mutex _in_order_
- * on our way down the tree.  If we can find the target item in the
+ * precautions.  We pin the filesystem.  We lock configfs_dirent_lock.
+ * If we can find the target item in the
  * configfs tree, it must be part of the subsystem tree as well, so we
- * do not need the subsystem semaphore.  Holding the i_mutex chain locks
- * out mkdir() and rmdir(), who might be racing us.
+ * do not need the subsystem semaphore.  Holding configfs_dirent_lock helps
+ * locking out mkdir() and rmdir(), who might be racing us.
  */
 
 /*
@@ -972,17 +972,17 @@ static int configfs_dump(struct configfs_dirent *sd, int level)
  * do that so we can unlock it if we find nothing.
  *
  * Here we do a depth-first search of the dentry hierarchy looking for
- * our object.  We take i_mutex on each step of the way down.  IT IS
- * ESSENTIAL THAT i_mutex LOCKING IS ORDERED.  If we come back up a branch,
- * we'll drop the i_mutex.
+ * our object.
+ * We deliberately ignore items tagged as dropping since they are virtually
+ * dead, as well as items in the middle of attachment since they virtually
+ * do not exist yet. This completes the locking out of racing mkdir() and
+ * rmdir().
  *
- * If the target is not found, -ENOENT is bubbled up and we have released
- * all locks.  If the target was found, the locks will be cleared by
- * configfs_depend_rollback().
+ * If the target is not found, -ENOENT is bubbled up.
  *
  * This adds a requirement that all config_items be unique!
  *
- * This is recursive because the locking traversal is tricky.  There isn't
+ * This is recursive.  There isn't
  * much on the stack, though, so folks that need this function - be careful
  * about your stack!  Patches will be accepted to make it iterative.
  */
@@ -994,13 +994,13 @@ static int configfs_depend_prep(struct dentry *origin,
 
 	BUG_ON(!origin || !sd);
 
-	/* Lock this guy on the way down */
-	mutex_lock(&sd->s_dentry->d_inode->i_mutex);
 	if (sd->s_element == target)  /* Boo-yah */
 		goto out;
 
 	list_for_each_entry(child_sd, &sd->s_children, s_sibling) {
-		if (child_sd->s_type & CONFIGFS_DIR) {
+		if ((child_sd->s_type & CONFIGFS_DIR) &&
+		    !(child_sd->s_type & CONFIGFS_USET_DROPPING) &&
+		    !(child_sd->s_type & CONFIGFS_USET_CREATING)) {
 			ret = configfs_depend_prep(child_sd->s_dentry,
 						   target);
 			if (!ret)
@@ -1009,33 +1009,12 @@ static int configfs_depend_prep(struct dentry *origin,
 	}
 
 	/* We looped all our children and didn't find target */
-	mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
 	ret = -ENOENT;
 
 out:
 	return ret;
 }
 
-/*
- * This is ONLY called if configfs_depend_prep() did its job.  So we can
- * trust the entire path from item back up to origin.
- *
- * We walk backwards from item, unlocking each i_mutex.  We finish by
- * unlocking origin.
- */
-static void configfs_depend_rollback(struct dentry *origin,
-				     struct config_item *item)
-{
-	struct dentry *dentry = item->ci_dentry;
-
-	while (dentry != origin) {
-		mutex_unlock(&dentry->d_inode->i_mutex);
-		dentry = dentry->d_parent;
-	}
-
-	mutex_unlock(&origin->d_inode->i_mutex);
-}
-
 int configfs_depend_item(struct configfs_subsystem *subsys,
 			 struct config_item *target)
 {
@@ -1076,17 +1055,21 @@ int configfs_depend_item(struct configfs_subsystem *subsys,
 
 	/* Ok, now we can trust subsys/s_item */
 
-	/* Scan the tree, locking i_mutex recursively, return 0 if found */
+	spin_lock(&configfs_dirent_lock);
+	/* Scan the tree, protected by configfs_dirent_lock, return 0 if found */
 	ret = configfs_depend_prep(subsys_sd->s_dentry, target);
 	if (ret)
-		goto out_unlock_fs;
+		goto out_unlock_dirent_lock;
 
-	/* We hold all i_mutexes from the subsystem down to the target */
+	/*
+	 * We are sure that the item is not about to be removed by rmdir(), and
+	 * not in the middle of attachment by mkdir().
+	 */
 	p = target->ci_dentry->d_fsdata;
 	p->s_dependent_count += 1;
 
-	configfs_depend_rollback(subsys_sd->s_dentry, target);
-
+out_unlock_dirent_lock:
+	spin_unlock(&configfs_dirent_lock);
 out_unlock_fs:
 	mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex);
 
@@ -1111,10 +1094,10 @@ void configfs_undepend_item(struct configfs_subsystem *subsys,
 	struct configfs_dirent *sd;
 
 	/*
-	 * Since we can trust everything is pinned, we just need i_mutex
-	 * on the item.
+	 * Since we can trust everything is pinned, we just need
+	 * configfs_dirent_lock.
 	 */
-	mutex_lock(&target->ci_dentry->d_inode->i_mutex);
+	spin_lock(&configfs_dirent_lock);
 
 	sd = target->ci_dentry->d_fsdata;
 	BUG_ON(sd->s_dependent_count < 1);
@@ -1125,7 +1108,7 @@ void configfs_undepend_item(struct configfs_subsystem *subsys,
 	 * After this unlock, we cannot trust the item to stay alive!
 	 * DO NOT REFERENCE item after this unlock.
 	 */
-	mutex_unlock(&target->ci_dentry->d_inode->i_mutex);
+	spin_unlock(&configfs_dirent_lock);
 }
 EXPORT_SYMBOL(configfs_undepend_item);
 
@@ -1325,13 +1308,6 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry)
 	if (sd->s_type & CONFIGFS_USET_DEFAULT)
 		return -EPERM;
 
-	/*
-	 * Here's where we check for dependents.  We're protected by
-	 * i_mutex.
-	 */
-	if (sd->s_dependent_count)
-		return -EBUSY;
-
 	/* Get a working ref until we have the child */
 	parent_item = configfs_get_config_item(dentry->d_parent);
 	subsys = to_config_group(parent_item)->cg_subsys;
@@ -1355,9 +1331,17 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry)
 
 		mutex_lock(&configfs_symlink_mutex);
 		spin_lock(&configfs_dirent_lock);
-		ret = configfs_detach_prep(dentry, &wait_mutex);
-		if (ret)
-			configfs_detach_rollback(dentry);
+		/*
+		 * Here's where we check for dependents.  We're protected by
+		 * configfs_dirent_lock.
+		 * If no dependent, atomically tag the item as dropping.
+		 */
+		ret = sd->s_dependent_count ? -EBUSY : 0;
+		if (!ret) {
+			ret = configfs_detach_prep(dentry, &wait_mutex);
+			if (ret)
+				configfs_detach_rollback(dentry);
+		}
 		spin_unlock(&configfs_dirent_lock);
 		mutex_unlock(&configfs_symlink_mutex);
 
-- 
1.5.6.5



* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
@ 2009-01-28  4:13   ` Joel Becker
  2009-01-28 10:32     ` Louis Rilling
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Becker @ 2009-01-28  4:13 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz

On Thu, Dec 18, 2008 at 07:00:18PM +0100, Louis Rilling wrote:
> configfs_depend_item() recursively locks all inodes mutex from configfs root to
> the target item, which makes lockdep unhappy. The purpose of this recursive
> locking is to ensure that the item tree can be safely parsed and that the target
> item, if found, is not about to leave.
> 
> This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
> Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
> protects tagging of items to be removed, this lock can be used instead of the
> inodes mutex lock chain.
> This needs that the check for dependents be done atomically with
> CONFIGFS_USET_DROPPING tagging.
> 
> Now lockdep looks happy with configfs.

	This looks almost, but not quite right.
	In the create path, we do configfs_new_dirent() before we set
sd->s_type.  But configfs_new_dirent() attaches sd->s_sibling.  So, in
another thread, configfs_depend_prep() can traverse this s_sibling
without CONFIGFS_USET_CREATING being set.  This turns out to be safe
because CONFIGFS_DIR is also not set - but boy I'd like a comment about
that.
	What if we're in mkdir(2) in one thread and another thread is  
trying to pin the parent directory?  That is, we are inside
configfs_mkdir(parent, new_dentry, mode).  The other thread is doing
configfs_depend_item(subsys, parent).  With this patch, the other thread
will not take parent->i_mutex.  It will happily determine that
parent is part of the tree and bump its s_dependent with no locking.  Is
this OK?
	If it is - isn't this patch good without any other reason?  That
is, aside from the issues of lockdep, isn't it better for
configfs_depend_item() to never have to worry about the VFS locks other
than the configfs root?

Joel


-- 

 The zen have a saying:
 "When you learn how to listen, ANYONE can be your teacher."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy
  2009-01-28  4:13   ` Joel Becker
@ 2009-01-28 10:32     ` Louis Rilling
  0 siblings, 0 replies; 11+ messages in thread
From: Louis Rilling @ 2009-01-28 10:32 UTC (permalink / raw)
  To: linux-kernel, akpm, cluster-devel, swhiteho, peterz


On 27/01/09 20:13 -0800, Joel Becker wrote:
> On Thu, Dec 18, 2008 at 07:00:18PM +0100, Louis Rilling wrote:
> > configfs_depend_item() recursively locks all inodes mutex from configfs root to
> > the target item, which makes lockdep unhappy. The purpose of this recursive
> > locking is to ensure that the item tree can be safely parsed and that the target
> > item, if found, is not about to leave.
> > 
> > This patch reworks configfs_depend_item() locking using configfs_dirent_lock.
> > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and
> > protects tagging of items to be removed, this lock can be used instead of the
> > inodes mutex lock chain.
> > This needs that the check for dependents be done atomically with
> > CONFIGFS_USET_DROPPING tagging.
> > 
> > Now lockdep looks happy with configfs.
> 
> 	This looks almost, but not quite right.
> 	In the create path, we do configfs_new_dirent() before we set
> sd->s_type.  But configfs_new_dirent() attaches sd->s_sibling.  So, in
> another thread, configfs_depend_prep() can traverse this s_sibling
> without CONFIGFS_USET_CREATING being set.  This turns out to be safe
> because CONFIGFS_DIR is also not set - but boy I'd like a comment about
> that.

Definitely agreed. I should have written this comment instead of letting you
notice this.

> 	What if we're in mkdir(2) in one thread and another thread is  
> trying to pin the parent directory?  That is, we are inside
> configfs_mkdir(parent, new_dentry, mode).  The other thread is doing
> configfs_depend_item(subsys, parent).  With this patch, the other thread
> will not take parent->i_mutex.  It will happily determine that
> parent is part of the tree and bump its s_dependent with no locking.  Is
> this OK?

Yes, this is the expected impact. It is OK because
1) within a single critical section of configfs_dirent_lock, depend_item()
checks that CONFIGFS_USET_DROPPING is not set and bumps s_dependent_count;
2) within a single critical section of configfs_dirent_lock, configfs_rmdir()
checks s_dependent_count and tries to set CONFIGFS_USET_DROPPING.
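
The argument above can be modelled in user space with a single mutex standing
in for configfs_dirent_lock. This is only a sketch of the reasoning, not the
kernel code: struct dirent_model, the model_*() functions and the flag value
are hypothetical, and a pthread mutex replaces the spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_DROPPING 0x0100	/* illustrative stand-in for CONFIGFS_USET_DROPPING */

/* Only the two fields the argument relies on. */
struct dirent_model {
	int s_type;
	int s_dependent_count;
};

/* Stand-in for configfs_dirent_lock. */
static pthread_mutex_t model_dirent_lock = PTHREAD_MUTEX_INITIALIZER;

/* depend side: test the dropping flag and bump the count in one section. */
int model_depend(struct dirent_model *sd)
{
	int ret = -ENOENT;

	assert(sd != NULL);
	pthread_mutex_lock(&model_dirent_lock);
	if (!(sd->s_type & MODEL_DROPPING)) {
		sd->s_dependent_count++;
		ret = 0;
	}
	pthread_mutex_unlock(&model_dirent_lock);
	return ret;
}

/* undepend side: drop one reference under the same lock. */
void model_undepend(struct dirent_model *sd)
{
	assert(sd != NULL);
	pthread_mutex_lock(&model_dirent_lock);
	assert(sd->s_dependent_count >= 1);
	sd->s_dependent_count--;
	pthread_mutex_unlock(&model_dirent_lock);
}

/* rmdir side: test the count and tag as dropping in one section. */
int model_rmdir_prep(struct dirent_model *sd)
{
	int ret = -EBUSY;

	assert(sd != NULL);
	pthread_mutex_lock(&model_dirent_lock);
	if (sd->s_dependent_count == 0) {
		sd->s_type |= MODEL_DROPPING;
		ret = 0;
	}
	pthread_mutex_unlock(&model_dirent_lock);
	return ret;
}
```

Since each side performs its check and its update inside one critical section,
either the dependency is taken first and the removal sees a non-zero count, or
the removal tags the item first and the dependency attempt sees the flag.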

> 	If it is - isn't this patch good without any other reason?  That
> is, aside from the issues of lockdep, isn't it better for
> configfs_depend_item() to never have to worry about the VFS locks other
> than the configfs root?

Yes, this patch may look like an improvement, independently from lockdep. I
think that locking is simpler this way, and this also removes the need for
configfs_depend_rollback(). Moreover this moves towards the management of
configfs_dirents protected by configfs_dirent_lock only. In the end, it's up to
you to judge if this is a good direction ;)

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


Thread overview: 11+ messages
2009-01-28 18:18 [PATCH v2] Make lockdep happy with configfs Louis Rilling
2009-01-28 18:18 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling
2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
2009-04-29 18:52   ` Joel Becker
2009-04-30  9:18     ` Louis Rilling
2009-04-30 17:20       ` Joel Becker
2009-04-30 17:30         ` Joel Becker
2009-05-04 10:20           ` Louis Rilling
  -- strict thread matches above, loose matches on Subject: below --
2008-12-18 11:15 [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling
2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling
2009-01-28  4:13   ` Joel Becker
2009-01-28 10:32     ` Louis Rilling
