public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
@ 2008-06-16 17:00 Louis Rilling
  2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

Hi,

This patchset fixes the deadlock described below. This second version of the
patchset fixes another long-standing bug in symlinks locking. Please read the
headers of the third and fourth patches for a detailed explanation of the
bugfixes. The last patch fixes a new behavior introduced by the deadlock fix,
and that ones could find undesirable. Whether it is really needed or not is
debatable.

Changelog: 
  - improve comments on the usage of the new configfs_dirent_lock
  - fix a few missing critical sections in the first patch (see the patch header
    for details)
  - include the symlinks locking fix since it depends on configfs_dirent_lock
    and affects rmdir()
  - added a behavior fix that prevents a failing mkdir() from making rmdir()
    fail

The following procedure can trigger a deadlock in configfs (see
http://www.ussg.iu.edu/hypermail/linux/kernel/0806.1/0380.html for a patch
that makes it easier to trigger):

# mkdir /config/cluster/foo
# cd /config/cluster/foo
# mv heartbeat/dead_threshold node/bar

and in another shell, right after having launched test_deadlock:

# rmdir /config/cluster/foo

First, lockdep warns as usual (see below), and after two minutes (standard task
deadlock parameters), we get the dead lock alerts:

<log>

=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc5 #13
---------------------------------------------
rmdir/3997 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa

but task is already holding lock:
 (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

other info that might help us debug this:
2 locks held by rmdir/3997:
 #0:  (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80297c77>] do_rmdir+0x82/0x108
 #1:  (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac

stack backtrace:
Pid: 3997, comm: rmdir Not tainted 2.6.26-rc5 #13

Call Trace:
 [<ffffffff8024aa65>] __lock_acquire+0x8d2/0xc78
 [<ffffffff802495ec>] find_usage_backwards+0x9d/0xbe
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff8024b1de>] lock_acquire+0x51/0x6c
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff80247dad>] debug_mutex_lock_common+0x16/0x23
 [<ffffffff805d63a4>] mutex_lock_nested+0xcd/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: task test_deadlock:3996 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
test_deadlock D 0000000000000001     0  3996   3980
 ffff81007cc93d78 0000000000000046 ffff81007cc93d40 ffffffff808ed280
 ffffffff808ed280 ffff81007cc93d28 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cdcaa10
Call Trace:
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802955e3>] lock_rename+0x11e/0x126
 [<ffffffff80297838>] sys_renameat+0xd7/0x21c
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.
INFO: task rmdir:3997 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir         D 0000000000000000     0  3997   3986
 ffff81007cdb9dd8 0000000000000046 0000000000000000 ffffffff808ed280
 ffffffff808ed280 ffff81007cdb9d88 ffffffff808ed280 ffffffff808ed280
 ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cde0a50
Call Trace:
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
 [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
 [<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
 [<ffffffff80296092>] vfs_rmdir+0x6b/0xac
 [<ffffffff80297cac>] do_rmdir+0xb7/0x108
 [<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
 [<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80

INFO: lockdep is turned off.

</log>

The issue here is that the VFS locks the i_mutex of the source and target
directories of the rename in source -> target order (because none is ascendent
of the other one), while configfs_detach_prep() takes them in default group
order (or reverse order, I'm not sure), following the order specified by the
groups' creator.

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
@ 2008-06-16 17:00 ` Louis Rilling
  2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-introduce-configfs_dirent_lock.patch --]
[-- Type: text/plain, Size: 6878 bytes --]

This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
traversals against linkage mutations (add/del/move). This will allow
configfs_detach_prep() to avoid locking i_mutexes. 

Locking rules for configfs_dirent linkage mutations are the same plus the
requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
either take appropriate i_mutex as before, or take configfs_dirent_lock.

The spinlock could actually be a mutex, but the critical sections are either
O(1) or should not be too long (default groups walking in last patch).

ChangeLog:
  - Clarify the comment on configfs_dirent_lock usage
  - Move sd->s_element init before linking the new dirent
  - In lseek(), do not release configfs_dirent_lock before the dirent is
    relinked.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/configfs_internal.h |    3 +++
 fs/configfs/dir.c               |   28 +++++++++++++++++++++++++++-
 fs/configfs/inode.c             |    2 ++
 fs/configfs/symlink.c           |    2 ++
 4 files changed, 34 insertions(+), 1 deletion(-)

Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h	2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/configfs_internal.h	2008-06-16 18:54:04.000000000 +0200
@@ -26,6 +26,7 @@
 
 #include <linux/slab.h>
 #include <linux/list.h>
+#include <linux/spinlock.h>
 
 struct configfs_dirent {
 	atomic_t		s_count;
@@ -49,6 +50,8 @@ struct configfs_dirent {
 #define CONFIGFS_USET_DROPPING	0x0100
 #define CONFIGFS_NOT_PINNED	(CONFIGFS_ITEM_ATTR)
 
+extern spinlock_t configfs_dirent_lock;
+
 extern struct vfsmount * configfs_mount;
 extern struct kmem_cache *configfs_dir_cachep;
 
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-16 18:54:25.000000000 +0200
@@ -35,6 +35,14 @@
 #include "configfs_internal.h"
 
 DECLARE_RWSEM(configfs_rename_sem);
+/*
+ * Protects mutations of configfs_dirent linkage together with proper i_mutex
+ * Mutators of configfs_dirent linkage must *both* have the proper inode locked
+ * and configfs_dirent_lock locked, in that order.
+ * This allows one to safely traverse configfs_dirent trees without having to
+ * lock inodes.
+ */
+DEFINE_SPINLOCK(configfs_dirent_lock);
 
 static void configfs_d_iput(struct dentry * dentry,
 			    struct inode * inode)
@@ -79,8 +87,10 @@ static struct configfs_dirent *configfs_
 	atomic_set(&sd->s_count, 1);
 	INIT_LIST_HEAD(&sd->s_links);
 	INIT_LIST_HEAD(&sd->s_children);
-	list_add(&sd->s_sibling, &parent_sd->s_children);
 	sd->s_element = element;
+	spin_lock(&configfs_dirent_lock);
+	list_add(&sd->s_sibling, &parent_sd->s_children);
+	spin_unlock(&configfs_dirent_lock);
 
 	return sd;
 }
@@ -173,7 +183,9 @@ static int create_dir(struct config_item
 		} else {
 			struct configfs_dirent *sd = d->d_fsdata;
 			if (sd) {
+				spin_lock(&configfs_dirent_lock);
 				list_del_init(&sd->s_sibling);
+				spin_unlock(&configfs_dirent_lock);
 				configfs_put(sd);
 			}
 		}
@@ -224,7 +236,9 @@ int configfs_create_link(struct configfs
 		else {
 			struct configfs_dirent *sd = dentry->d_fsdata;
 			if (sd) {
+				spin_lock(&configfs_dirent_lock);
 				list_del_init(&sd->s_sibling);
+				spin_unlock(&configfs_dirent_lock);
 				configfs_put(sd);
 			}
 		}
@@ -238,7 +252,9 @@ static void remove_dir(struct dentry * d
 	struct configfs_dirent * sd;
 
 	sd = d->d_fsdata;
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&sd->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	configfs_put(sd);
 	if (d->d_inode)
 		simple_rmdir(parent->d_inode,d);
@@ -410,7 +426,9 @@ static void detach_attrs(struct config_i
 	list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
 		if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
 			continue;
+		spin_lock(&configfs_dirent_lock);
 		list_del_init(&sd->s_sibling);
+		spin_unlock(&configfs_dirent_lock);
 		configfs_drop_dentry(sd, dentry);
 		configfs_put(sd);
 	}
@@ -1268,7 +1286,9 @@ static int configfs_dir_close(struct ino
 	struct configfs_dirent * cursor = file->private_data;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&cursor->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 
 	release_configfs_dirent(cursor);
@@ -1308,7 +1328,9 @@ static int configfs_readdir(struct file 
 			/* fallthrough */
 		default:
 			if (filp->f_pos == 2) {
+				spin_lock(&configfs_dirent_lock);
 				list_move(q, &parent_sd->s_children);
+				spin_unlock(&configfs_dirent_lock);
 			}
 			for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
 				struct configfs_dirent *next;
@@ -1331,7 +1353,9 @@ static int configfs_readdir(struct file 
 						 dt_type(next)) < 0)
 					return 0;
 
+				spin_lock(&configfs_dirent_lock);
 				list_move(q, p);
+				spin_unlock(&configfs_dirent_lock);
 				p = q;
 				filp->f_pos++;
 			}
@@ -1362,6 +1386,7 @@ static loff_t configfs_dir_lseek(struct 
 			struct list_head *p;
 			loff_t n = file->f_pos - 2;
 
+			spin_lock(&configfs_dirent_lock);
 			list_del(&cursor->s_sibling);
 			p = sd->s_children.next;
 			while (n && p != &sd->s_children) {
@@ -1373,6 +1398,7 @@ static loff_t configfs_dir_lseek(struct 
 				p = p->next;
 			}
 			list_add_tail(&cursor->s_sibling, p);
+			spin_unlock(&configfs_dirent_lock);
 		}
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
Index: b/fs/configfs/inode.c
===================================================================
--- a/fs/configfs/inode.c	2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/inode.c	2008-06-16 17:22:13.000000000 +0200
@@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
 		if (!sd->s_element)
 			continue;
 		if (!strcmp(configfs_get_name(sd), name)) {
+			spin_lock(&configfs_dirent_lock);
 			list_del_init(&sd->s_sibling);
+			spin_unlock(&configfs_dirent_lock);
 			configfs_drop_dentry(sd, dir);
 			configfs_put(sd);
 			break;
Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c	2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/symlink.c	2008-06-16 18:54:25.000000000 +0200
@@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
 	parent_item = configfs_get_config_item(dentry->d_parent);
 	type = parent_item->ci_type;
 
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&sd->s_sibling);
+	spin_unlock(&configfs_dirent_lock);
 	configfs_drop_dentry(sd, dentry->d_parent);
 	dput(dentry);
 	configfs_put(sd);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
  2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
@ 2008-06-16 17:00 ` Louis Rilling
  2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-fix-symlink-dirent-linkage-locking.patch --]
[-- Type: text/plain, Size: 2871 bytes --]

Symlinks to a config_item are listed under its configfs_dirent s_links, but the
list mutations are not protected by any common lock.

This patch uses the configfs_dirent_lock spinlock to add the necessary
protection.

Note: we should also protect the list_empty() test in configfs_detach_prep() but
1/ the lock should not be released immediately because nothing would prevent the
list from being filled after a successful list_empty() test, making the problem
tricky,
2/ this will be solved by the rmdir() vs rename() deadlock bugfix.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c     |    5 +++--
 fs/configfs/symlink.c |    8 ++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c	2008-06-16 17:42:31.000000000 +0200
+++ b/fs/configfs/symlink.c	2008-06-16 17:42:43.000000000 +0200
@@ -77,12 +77,15 @@ static int create_link(struct config_ite
 	sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
 	if (sl) {
 		sl->sl_target = config_item_get(item);
-		/* FIXME: needs a lock, I'd bet */
+		spin_lock(&configfs_dirent_lock);
 		list_add(&sl->sl_list, &target_sd->s_links);
+		spin_unlock(&configfs_dirent_lock);
 		ret = configfs_create_link(sl, parent_item->ci_dentry,
 					   dentry);
 		if (ret) {
+			spin_lock(&configfs_dirent_lock);
 			list_del_init(&sl->sl_list);
+			spin_unlock(&configfs_dirent_lock);
 			config_item_put(item);
 			kfree(sl);
 		}
@@ -186,8 +189,9 @@ int configfs_unlink(struct inode *dir, s
 		type->ct_item_ops->drop_link(parent_item,
 					       sl->sl_target);
 
-	/* FIXME: Needs lock */
+	spin_lock(&configfs_dirent_lock);
 	list_del_init(&sl->sl_list);
+	spin_unlock(&configfs_dirent_lock);
 
 	/* Put reference from create_link() */
 	config_item_put(sl->sl_target);
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-16 17:42:31.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-16 18:54:20.000000000 +0200
@@ -37,10 +37,11 @@
 DECLARE_RWSEM(configfs_rename_sem);
 /*
  * Protects mutations of configfs_dirent linkage together with proper i_mutex
+ * Also protects mutations of symlinks linkage to target configfs_dirent
  * Mutators of configfs_dirent linkage must *both* have the proper inode locked
  * and configfs_dirent_lock locked, in that order.
- * This allows one to safely traverse configfs_dirent trees without having to
- * lock inodes.
+ * This allows one to safely traverse configfs_dirent trees and symlinks without
+ * having to lock inodes.
  */
 DEFINE_SPINLOCK(configfs_dirent_lock);
 

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
  2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
  2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
  2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-make-configfs_new_dirent-return-errno.patch --]
[-- Type: text/plain, Size: 1666 bytes --]

This patch makes configfs_new_dirent return negative error code instead of NULL,
which will be useful in the next patch to differentiate ENOMEM from ENOENT.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-16 17:42:43.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-16 18:54:11.000000000 +0200
@@ -30,6 +30,7 @@
 #include <linux/mount.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/err.h>
 
 #include <linux/configfs.h>
 #include "configfs_internal.h"
@@ -83,7 +84,7 @@ static struct configfs_dirent *configfs_
 
 	sd = kmem_cache_zalloc(configfs_dir_cachep, GFP_KERNEL);
 	if (!sd)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	atomic_set(&sd->s_count, 1);
 	INIT_LIST_HEAD(&sd->s_links);
@@ -129,8 +130,8 @@ int configfs_make_dirent(struct configfs
 	struct configfs_dirent * sd;
 
 	sd = configfs_new_dirent(parent_sd, element);
-	if (!sd)
-		return -ENOMEM;
+	if (IS_ERR(sd))
+		return PTR_ERR(sd);
 
 	sd->s_mode = mode;
 	sd->s_type = type;
@@ -1277,7 +1278,7 @@ static int configfs_dir_open(struct inod
 	file->private_data = configfs_new_dirent(parent_sd, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 
-	return file->private_data ? 0 : -ENOMEM;
+	return IS_ERR(file->private_data) ? PTR_ERR(file->private_data) : 0;
 
 }
 

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename()
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
                   ` (2 preceding siblings ...)
  2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
  2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
  2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
  5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-fix-rmdir-vs-rename-deadlock.patch --]
[-- Type: text/plain, Size: 5747 bytes --]

This patch fixes the deadlock between racing sys_rename() and configfs_rmdir().

The idea is to avoid locking i_mutexes of default groups in
configfs_detach_prep(), and rely instead on the new configfs_dirent_lock to
protect against configfs_dirent's linkage mutations. To ensure that an mkdir()
racing with rmdir() will not create new items in a to-be-removed default group,
we make configfs_new_dirent() check for the CONFIGFS_USET_DROPPING flag right
before linking the new dirent, and return error if the flag is set. This makes
racing mkdir()/symlink()/dir_open() fail in places where errors could already
happen, resp. in (attach_item()|attach_group())/create_link()/new_dirent().

configfs_depend() remains safe since it locks all the path from configfs root,
and is thus mutually exclusive with rmdir().

An advantage of this is that now detach_groups() unconditionnaly takes the
default groups i_mutex, which makes it more consistent with populate_groups().

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/dir.c |   45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-16 18:06:41.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-16 18:54:04.000000000 +0200
@@ -43,6 +43,10 @@ DECLARE_RWSEM(configfs_rename_sem);
  * and configfs_dirent_lock locked, in that order.
  * This allows one to safely traverse configfs_dirent trees and symlinks without
  * having to lock inodes.
+ *
+ * Protects setting of CONFIGFS_USET_DROPPING: checking the flag
+ * unlocked is not reliable unless in detach_groups() called from
+ * rmdir()/unregister() and from configfs_attach_group()
  */
 DEFINE_SPINLOCK(configfs_dirent_lock);
 
@@ -91,6 +95,11 @@ static struct configfs_dirent *configfs_
 	INIT_LIST_HEAD(&sd->s_children);
 	sd->s_element = element;
 	spin_lock(&configfs_dirent_lock);
+	if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
+		spin_unlock(&configfs_dirent_lock);
+		kmem_cache_free(configfs_dir_cachep, sd);
+		return ERR_PTR(-ENOENT);
+	}
 	list_add(&sd->s_sibling, &parent_sd->s_children);
 	spin_unlock(&configfs_dirent_lock);
 
@@ -349,11 +358,11 @@ static struct dentry * configfs_lookup(s
 
 /*
  * Only subdirectories count here.  Files (CONFIGFS_NOT_PINNED) are
- * attributes and are removed by rmdir().  We recurse, taking i_mutex
- * on all children that are candidates for default detach.  If the
- * result is clean, then configfs_detach_group() will handle dropping
- * i_mutex.  If there is an error, the caller will clean up the i_mutex
- * holders via configfs_detach_rollback().
+ * attributes and are removed by rmdir().  We recurse, setting
+ * CONFIGFS_USET_DROPPING on all children that are candidates for
+ * default detach.
+ * If there is an error, the caller will reset the flags via
+ * configfs_detach_rollback().
  */
 static int configfs_detach_prep(struct dentry *dentry)
 {
@@ -370,8 +379,7 @@ static int configfs_detach_prep(struct d
 		if (sd->s_type & CONFIGFS_NOT_PINNED)
 			continue;
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
-			mutex_lock(&sd->s_dentry->d_inode->i_mutex);
-			/* Mark that we've taken i_mutex */
+			/* Mark that we're trying to drop the group */
 			sd->s_type |= CONFIGFS_USET_DROPPING;
 
 			/*
@@ -392,7 +400,7 @@ out:
 }
 
 /*
- * Walk the tree, dropping i_mutex wherever CONFIGFS_USET_DROPPING is
+ * Walk the tree, resetting CONFIGFS_USET_DROPPING wherever it was
  * set.
  */
 static void configfs_detach_rollback(struct dentry *dentry)
@@ -403,11 +411,7 @@ static void configfs_detach_rollback(str
 	list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
 			configfs_detach_rollback(sd->s_dentry);
-
-			if (sd->s_type & CONFIGFS_USET_DROPPING) {
-				sd->s_type &= ~CONFIGFS_USET_DROPPING;
-				mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
-			}
+			sd->s_type &= ~CONFIGFS_USET_DROPPING;
 		}
 	}
 }
@@ -486,16 +490,12 @@ static void detach_groups(struct config_
 
 		child = sd->s_dentry;
 
+		mutex_lock(&child->d_inode->i_mutex);
+
 		configfs_detach_group(sd->s_element);
 		child->d_inode->i_flags |= S_DEAD;
 
-		/*
-		 * From rmdir/unregister, a configfs_detach_prep() pass
-		 * has taken our i_mutex for us.  Drop it.
-		 * From mkdir/register cleanup, there is no sem held.
-		 */
-		if (sd->s_type & CONFIGFS_USET_DROPPING)
-			mutex_unlock(&child->d_inode->i_mutex);
+		mutex_unlock(&child->d_inode->i_mutex);
 
 		d_delete(child);
 		dput(child);
@@ -1181,12 +1181,15 @@ static int configfs_rmdir(struct inode *
 		return -EINVAL;
 	}
 
+	spin_lock(&configfs_dirent_lock);
 	ret = configfs_detach_prep(dentry);
 	if (ret) {
 		configfs_detach_rollback(dentry);
+		spin_unlock(&configfs_dirent_lock);
 		config_item_put(parent_item);
 		return ret;
 	}
+	spin_unlock(&configfs_dirent_lock);
 
 	/* Get a working ref for the duration of this function */
 	item = configfs_get_config_item(dentry);
@@ -1476,9 +1479,11 @@ void configfs_unregister_subsystem(struc
 	mutex_lock_nested(&configfs_sb->s_root->d_inode->i_mutex,
 			  I_MUTEX_PARENT);
 	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+	spin_lock(&configfs_dirent_lock);
 	if (configfs_detach_prep(dentry)) {
 		printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
 	}
+	spin_unlock(&configfs_dirent_lock);
 	configfs_detach_group(&group->cg_item);
 	dentry->d_inode->i_flags |= S_DEAD;
 	mutex_unlock(&dentry->d_inode->i_mutex);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
                   ` (3 preceding siblings ...)
  2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
  2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
  5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
  To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel

[-- Attachment #1: configfs-fix-failing-mkdir-making-rmdir-fail.patch --]
[-- Type: text/plain, Size: 5410 bytes --]

When fixing the rename() vs rmdir() deadlock, we stopped locking default groups'
inodes in configfs_detach_prep(), letting racing mkdir() in default groups
proceed concurrently. This enables races like below happen, which leads to a
failing mkdir() making rmdir() fail, despite the group to remove having no
user-created directory under it in the end.

	process A: 			process B:
	/* PWD=A/B */
	mkdir("C")
	  make_item("C")
	  attach_group("C")
					rmdir("A")
					  detach_prep("A")
					    detach_prep("B")
					      error because of "C"
					  return -ENOTEMPTY
	    attach_group("C/D")
	      error (eg -ENOMEM)
	  return -ENOMEM

This patch prevents such scenarii by making rmdir() wait as long as
detach_prep() fails because a racing mkdir() is in the middle of attach_group().
To achieve this, mkdir() sets a flag CONFIGFS_USET_IN_MKDIR in parent's
configfs_dirent before calling attach_group(), and clears the flag once
attach_group() is done. detach_prep() fails with -EAGAIN whenever the flag is
hit and returns the guilty inode's mutex so that rmdir() can wait on it.

Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
 fs/configfs/configfs_internal.h |    1 
 fs/configfs/dir.c               |   53 ++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 10 deletions(-)

Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h	2008-06-16 18:06:27.000000000 +0200
+++ b/fs/configfs/configfs_internal.h	2008-06-16 18:06:54.000000000 +0200
@@ -48,6 +48,7 @@ struct configfs_dirent {
 #define CONFIGFS_USET_DIR	0x0040
 #define CONFIGFS_USET_DEFAULT	0x0080
 #define CONFIGFS_USET_DROPPING	0x0100
+#define CONFIGFS_USET_IN_MKDIR	0x0200
 #define CONFIGFS_NOT_PINNED	(CONFIGFS_ITEM_ATTR)
 
 extern spinlock_t configfs_dirent_lock;
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c	2008-06-16 18:06:50.000000000 +0200
+++ b/fs/configfs/dir.c	2008-06-16 18:52:50.000000000 +0200
@@ -364,7 +364,7 @@ static struct dentry * configfs_lookup(s
  * If there is an error, the caller will reset the flags via
  * configfs_detach_rollback().
  */
-static int configfs_detach_prep(struct dentry *dentry)
+static int configfs_detach_prep(struct dentry *dentry, struct mutex **wait_mutex)
 {
 	struct configfs_dirent *parent_sd = dentry->d_fsdata;
 	struct configfs_dirent *sd;
@@ -379,6 +379,12 @@ static int configfs_detach_prep(struct d
 		if (sd->s_type & CONFIGFS_NOT_PINNED)
 			continue;
 		if (sd->s_type & CONFIGFS_USET_DEFAULT) {
+			/* Abort if racing with mkdir() */
+			if (sd->s_type & CONFIGFS_USET_IN_MKDIR) {
+				if (wait_mutex)
+					*wait_mutex = &sd->s_dentry->d_inode->i_mutex;
+				return -EAGAIN;
+			}
 			/* Mark that we're trying to drop the group */
 			sd->s_type |= CONFIGFS_USET_DROPPING;
 
@@ -386,7 +392,7 @@ static int configfs_detach_prep(struct d
 			 * Yup, recursive.  If there's a problem, blame
 			 * deep nesting of default_groups
 			 */
-			ret = configfs_detach_prep(sd->s_dentry);
+			ret = configfs_detach_prep(sd->s_dentry, wait_mutex);
 			if (!ret)
 				continue;
 		} else
@@ -1113,11 +1119,26 @@ static int configfs_mkdir(struct inode *
 	 */
 	module_got = 1;
 
+	/*
+	 * Make racing rmdir() fail if it did not tag parent with
+	 * CONFIGFS_USET_DROPPING
+	 * Note: if CONFIGFS_USET_DROPPING is already set, attach_group() will
+	 * fail and let rmdir() terminate correctly
+	 */
+	spin_lock(&configfs_dirent_lock);
+	/* This will make configfs_detach_prep() fail */
+	sd->s_type |= CONFIGFS_USET_IN_MKDIR;
+	spin_unlock(&configfs_dirent_lock);
+
 	if (group)
 		ret = configfs_attach_group(parent_item, item, dentry);
 	else
 		ret = configfs_attach_item(parent_item, item, dentry);
 
+	spin_lock(&configfs_dirent_lock);
+	sd->s_type &= ~CONFIGFS_USET_IN_MKDIR;
+	spin_unlock(&configfs_dirent_lock);
+
 out_unlink:
 	if (ret) {
 		/* Tear down everything we built up */
@@ -1182,13 +1203,25 @@ static int configfs_rmdir(struct inode *
 	}
 
 	spin_lock(&configfs_dirent_lock);
-	ret = configfs_detach_prep(dentry);
-	if (ret) {
-		configfs_detach_rollback(dentry);
-		spin_unlock(&configfs_dirent_lock);
-		config_item_put(parent_item);
-		return ret;
-	}
+	do {
+		struct mutex *wait_mutex;
+
+		ret = configfs_detach_prep(dentry, &wait_mutex);
+		if (ret) {
+			configfs_detach_rollback(dentry);
+			spin_unlock(&configfs_dirent_lock);
+			if (ret != -EAGAIN) {
+				config_item_put(parent_item);
+				return ret;
+			}
+
+			/* Wait until the racing operation terminates */
+			mutex_lock(wait_mutex);
+			mutex_unlock(wait_mutex);
+
+			spin_lock(&configfs_dirent_lock);
+		}
+	} while (ret == -EAGAIN);
 	spin_unlock(&configfs_dirent_lock);
 
 	/* Get a working ref for the duration of this function */
@@ -1480,7 +1513,7 @@ void configfs_unregister_subsystem(struc
 			  I_MUTEX_PARENT);
 	mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
 	spin_lock(&configfs_dirent_lock);
-	if (configfs_detach_prep(dentry)) {
+	if (configfs_detach_prep(dentry, NULL)) {
 		printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
 	}
 	spin_unlock(&configfs_dirent_lock);

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
  2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
                   ` (4 preceding siblings ...)
  2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
@ 2008-06-16 23:26 ` Joel Becker
  2008-06-23 22:34   ` Joel Becker
  5 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2008-06-16 23:26 UTC (permalink / raw)
  To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel

On Mon, Jun 16, 2008 at 07:00:57PM +0200, Louis Rilling wrote:
> This patchset fixes the deadlock described below. This second version of the
> patchset fixes another long-standing bug in symlinks locking. Please read the
> headers of the third and fourth patches for a detailed explanation of the
> bugfixes. The last patch fixes a new behavior introduced by the deadlock fix,
> and that ones could find undesirable. Whether it is really needed or not is
> debatable.

	Ok, this is in-tree for the next merge.  Thanks.

Joel

-- 

"There is no sincerer love than the love of food."  
         - George Bernard Shaw 

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
  2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
@ 2008-06-23 22:34   ` Joel Becker
  2008-06-24 13:30     ` Louis Rilling
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2008-06-23 22:34 UTC (permalink / raw)
  To: Louis Rilling, linux-kernel, ocfs2-devel

On Mon, Jun 16, 2008 at 04:26:57PM -0700, Joel Becker wrote:
> 	Ok, this is in-tree for the next merge.  Thanks.

	Btw, we still need to solve the lockdep issue.  I'm leaning
towards turning it off for attach/detach_groups(), because any CHILD+N
level won't mesh with the VFS anyway.
	We also need a comment on detach_group() pointing out its
recursive locking.  It's there in the code, but it's not obvious.

Joel

-- 

"Not everything that can be counted counts, and not everything
 that counts can be counted."
        - Albert Einstein 

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
  2008-06-23 22:34   ` Joel Becker
@ 2008-06-24 13:30     ` Louis Rilling
  0 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-24 13:30 UTC (permalink / raw)
  To: Joel Becker; +Cc: linux-kernel, ocfs2-devel

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

On Mon, Jun 23, 2008 at 03:34:50PM -0700, Joel Becker wrote:
> On Mon, Jun 16, 2008 at 04:26:57PM -0700, Joel Becker wrote:
> > 	Ok, this is in-tree for the next merge.  Thanks.
> 
> 	Btw, we still need to solve the lockdep issue.  I'm leaning
> towards turning it off for attach/detach_groups(), because any CHILD+N
> level won't mesh with the VFS anyway.

Yes, and the same for depend_prep()/depend_rollback().

> 	We also need a comment on detach_group() pointing out its
> recursive locking.  It's there in the code, but it's not obvious.

Will do all of this after we are sure not to change things again (see my
separate email on a possible bug in attach_group()).

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-06-24 13:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
2008-06-23 22:34   ` Joel Becker
2008-06-24 13:30     ` Louis Rilling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox