* [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
@ 2008-06-16 17:00 Louis Rilling
2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
Hi,
This patchset fixes the deadlock described below. This second version of the
patchset fixes another long-standing bug in symlinks locking. Please read the
headers of the third and fourth patches for a detailed explanation of the
bugfixes. The last patch fixes a new behavior introduced by the deadlock fix,
and that ones could find undesirable. Whether it is really needed or not is
debatable.
Changelog:
- improve comments on the usage of the new configfs_dirent_lock
- fix a few missing critical sections in the first patch (see the patch header
for details)
- include the symlinks locking fix since it depends on configfs_dirent_lock
and affects rmdir()
- added a behavior fix that prevents a failing mkdir() from making rmdir()
fail
The following procedure can trigger a deadlock in configfs (see
http://www.ussg.iu.edu/hypermail/linux/kernel/0806.1/0380.html for a patch
that makes it easier to trigger):
# mkdir /config/cluster/foo
# cd /config/cluster/foo
# mv heartbeat/dead_threshold node/bar
and in another shell, right after having launched test_deadlock:
# rmdir /config/cluster/foo
First, lockdep warns as usual (see below), and after two minutes (standard task
deadlock parameters), we get the dead lock alerts:
<log>
=============================================
[ INFO: possible recursive locking detected ]
2.6.26-rc5 #13
---------------------------------------------
rmdir/3997 is trying to acquire lock:
(&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
but task is already holding lock:
(&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac
other info that might help us debug this:
2 locks held by rmdir/3997:
#0: (&sb->s_type->i_mutex_key#3/1){--..}, at: [<ffffffff80297c77>] do_rmdir+0x82/0x108
#1: (&sb->s_type->i_mutex_key#11){--..}, at: [<ffffffff80296070>] vfs_rmdir+0x49/0xac
stack backtrace:
Pid: 3997, comm: rmdir Not tainted 2.6.26-rc5 #13
Call Trace:
[<ffffffff8024aa65>] __lock_acquire+0x8d2/0xc78
[<ffffffff802495ec>] find_usage_backwards+0x9d/0xbe
[<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
[<ffffffff8024b1de>] lock_acquire+0x51/0x6c
[<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
[<ffffffff80247dad>] debug_mutex_lock_common+0x16/0x23
[<ffffffff805d63a4>] mutex_lock_nested+0xcd/0x23b
[<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
[<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
[<ffffffff80296092>] vfs_rmdir+0x6b/0xac
[<ffffffff80297cac>] do_rmdir+0xb7/0x108
[<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
[<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80
INFO: task test_deadlock:3996 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
test_deadlock D 0000000000000001 0 3996 3980
ffff81007cc93d78 0000000000000046 ffff81007cc93d40 ffffffff808ed280
ffffffff808ed280 ffff81007cc93d28 ffffffff808ed280 ffffffff808ed280
ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cdcaa10
Call Trace:
[<ffffffff802955e3>] lock_rename+0x11e/0x126
[<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
[<ffffffff802955e3>] lock_rename+0x11e/0x126
[<ffffffff80297838>] sys_renameat+0xd7/0x21c
[<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
[<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80
INFO: lockdep is turned off.
INFO: task rmdir:3997 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmdir D 0000000000000000 0 3997 3986
ffff81007cdb9dd8 0000000000000046 0000000000000000 ffffffff808ed280
ffffffff808ed280 ffff81007cdb9d88 ffffffff808ed280 ffffffff808ed280
ffffffff808ed280 ffffffff808ea120 ffffffff808ed280 ffff81007cde0a50
Call Trace:
[<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
[<ffffffff805d641e>] mutex_lock_nested+0x147/0x23b
[<ffffffff802d2131>] configfs_detach_prep+0x58/0xaa
[<ffffffff802d327b>] configfs_rmdir+0xb8/0x1c3
[<ffffffff80296092>] vfs_rmdir+0x6b/0xac
[<ffffffff80297cac>] do_rmdir+0xb7/0x108
[<ffffffff80249d1e>] trace_hardirqs_on+0xef/0x113
[<ffffffff805d74c4>] trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8020b0cb>] system_call_after_swapgs+0x7b/0x80
INFO: lockdep is turned off.
</log>
The issue here is that the VFS locks the i_mutex of the source and target
directories of the rename in source -> target order (because none is ascendent
of the other one), while configfs_detach_prep() takes them in default group
order (or reverse order, I'm not sure), following the order specified by the
groups' creator.
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
@ 2008-06-16 17:00 ` Louis Rilling
2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
[-- Attachment #1: configfs-introduce-configfs_dirent_lock.patch --]
[-- Type: text/plain, Size: 6878 bytes --]
This patch introduces configfs_dirent_lock spinlock to protect configfs_dirent
traversals against linkage mutations (add/del/move). This will allow
configfs_detach_prep() to avoid locking i_mutexes.
Locking rules for configfs_dirent linkage mutations are the same plus the
requirement of taking configfs_dirent_lock. For configfs_dirent walking, one can
either take appropriate i_mutex as before, or take configfs_dirent_lock.
The spinlock could actually be a mutex, but the critical sections are either
O(1) or should not be too long (default groups walking in last patch).
ChangeLog:
- Clarify the comment on configfs_dirent_lock usage
- Move sd->s_element init before linking the new dirent
- In lseek(), do not release configfs_dirent_lock before the dirent is
relinked.
Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
fs/configfs/configfs_internal.h | 3 +++
fs/configfs/dir.c | 28 +++++++++++++++++++++++++++-
fs/configfs/inode.c | 2 ++
fs/configfs/symlink.c | 2 ++
4 files changed, 34 insertions(+), 1 deletion(-)
Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h 2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/configfs_internal.h 2008-06-16 18:54:04.000000000 +0200
@@ -26,6 +26,7 @@
#include <linux/slab.h>
#include <linux/list.h>
+#include <linux/spinlock.h>
struct configfs_dirent {
atomic_t s_count;
@@ -49,6 +50,8 @@ struct configfs_dirent {
#define CONFIGFS_USET_DROPPING 0x0100
#define CONFIGFS_NOT_PINNED (CONFIGFS_ITEM_ATTR)
+extern spinlock_t configfs_dirent_lock;
+
extern struct vfsmount * configfs_mount;
extern struct kmem_cache *configfs_dir_cachep;
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-16 18:54:25.000000000 +0200
@@ -35,6 +35,14 @@
#include "configfs_internal.h"
DECLARE_RWSEM(configfs_rename_sem);
+/*
+ * Protects mutations of configfs_dirent linkage together with proper i_mutex
+ * Mutators of configfs_dirent linkage must *both* have the proper inode locked
+ * and configfs_dirent_lock locked, in that order.
+ * This allows one to safely traverse configfs_dirent trees without having to
+ * lock inodes.
+ */
+DEFINE_SPINLOCK(configfs_dirent_lock);
static void configfs_d_iput(struct dentry * dentry,
struct inode * inode)
@@ -79,8 +87,10 @@ static struct configfs_dirent *configfs_
atomic_set(&sd->s_count, 1);
INIT_LIST_HEAD(&sd->s_links);
INIT_LIST_HEAD(&sd->s_children);
- list_add(&sd->s_sibling, &parent_sd->s_children);
sd->s_element = element;
+ spin_lock(&configfs_dirent_lock);
+ list_add(&sd->s_sibling, &parent_sd->s_children);
+ spin_unlock(&configfs_dirent_lock);
return sd;
}
@@ -173,7 +183,9 @@ static int create_dir(struct config_item
} else {
struct configfs_dirent *sd = d->d_fsdata;
if (sd) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
}
}
@@ -224,7 +236,9 @@ int configfs_create_link(struct configfs
else {
struct configfs_dirent *sd = dentry->d_fsdata;
if (sd) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
}
}
@@ -238,7 +252,9 @@ static void remove_dir(struct dentry * d
struct configfs_dirent * sd;
sd = d->d_fsdata;
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_put(sd);
if (d->d_inode)
simple_rmdir(parent->d_inode,d);
@@ -410,7 +426,9 @@ static void detach_attrs(struct config_i
list_for_each_entry_safe(sd, tmp, &parent_sd->s_children, s_sibling) {
if (!sd->s_element || !(sd->s_type & CONFIGFS_NOT_PINNED))
continue;
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dentry);
configfs_put(sd);
}
@@ -1268,7 +1286,9 @@ static int configfs_dir_close(struct ino
struct configfs_dirent * cursor = file->private_data;
mutex_lock(&dentry->d_inode->i_mutex);
+ spin_lock(&configfs_dirent_lock);
list_del_init(&cursor->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
mutex_unlock(&dentry->d_inode->i_mutex);
release_configfs_dirent(cursor);
@@ -1308,7 +1328,9 @@ static int configfs_readdir(struct file
/* fallthrough */
default:
if (filp->f_pos == 2) {
+ spin_lock(&configfs_dirent_lock);
list_move(q, &parent_sd->s_children);
+ spin_unlock(&configfs_dirent_lock);
}
for (p=q->next; p!= &parent_sd->s_children; p=p->next) {
struct configfs_dirent *next;
@@ -1331,7 +1353,9 @@ static int configfs_readdir(struct file
dt_type(next)) < 0)
return 0;
+ spin_lock(&configfs_dirent_lock);
list_move(q, p);
+ spin_unlock(&configfs_dirent_lock);
p = q;
filp->f_pos++;
}
@@ -1362,6 +1386,7 @@ static loff_t configfs_dir_lseek(struct
struct list_head *p;
loff_t n = file->f_pos - 2;
+ spin_lock(&configfs_dirent_lock);
list_del(&cursor->s_sibling);
p = sd->s_children.next;
while (n && p != &sd->s_children) {
@@ -1373,6 +1398,7 @@ static loff_t configfs_dir_lseek(struct
p = p->next;
}
list_add_tail(&cursor->s_sibling, p);
+ spin_unlock(&configfs_dirent_lock);
}
}
mutex_unlock(&dentry->d_inode->i_mutex);
Index: b/fs/configfs/inode.c
===================================================================
--- a/fs/configfs/inode.c 2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/inode.c 2008-06-16 17:22:13.000000000 +0200
@@ -247,7 +247,9 @@ void configfs_hash_and_remove(struct den
if (!sd->s_element)
continue;
if (!strcmp(configfs_get_name(sd), name)) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dir);
configfs_put(sd);
break;
Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c 2008-06-16 15:59:47.000000000 +0200
+++ b/fs/configfs/symlink.c 2008-06-16 18:54:25.000000000 +0200
@@ -169,7 +169,9 @@ int configfs_unlink(struct inode *dir, s
parent_item = configfs_get_config_item(dentry->d_parent);
type = parent_item->ci_type;
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sd->s_sibling);
+ spin_unlock(&configfs_dirent_lock);
configfs_drop_dentry(sd, dentry->d_parent);
dput(dentry);
configfs_put(sd);
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
@ 2008-06-16 17:00 ` Louis Rilling
2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:00 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
[-- Attachment #1: configfs-fix-symlink-dirent-linkage-locking.patch --]
[-- Type: text/plain, Size: 2871 bytes --]
Symlinks to a config_item are listed under its configfs_dirent s_links, but the
list mutations are not protected by any common lock.
This patch uses the configfs_dirent_lock spinlock to add the necessary
protection.
Note: we should also protect the list_empty() test in configfs_detach_prep() but
1/ the lock should not be released immediately because nothing would prevent the
list from being filled after a successful list_empty() test, making the problem
tricky,
2/ this will be solved by the rmdir() vs rename() deadlock bugfix.
Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
fs/configfs/dir.c | 5 +++--
fs/configfs/symlink.c | 8 ++++++--
2 files changed, 9 insertions(+), 4 deletions(-)
Index: b/fs/configfs/symlink.c
===================================================================
--- a/fs/configfs/symlink.c 2008-06-16 17:42:31.000000000 +0200
+++ b/fs/configfs/symlink.c 2008-06-16 17:42:43.000000000 +0200
@@ -77,12 +77,15 @@ static int create_link(struct config_ite
sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
if (sl) {
sl->sl_target = config_item_get(item);
- /* FIXME: needs a lock, I'd bet */
+ spin_lock(&configfs_dirent_lock);
list_add(&sl->sl_list, &target_sd->s_links);
+ spin_unlock(&configfs_dirent_lock);
ret = configfs_create_link(sl, parent_item->ci_dentry,
dentry);
if (ret) {
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sl->sl_list);
+ spin_unlock(&configfs_dirent_lock);
config_item_put(item);
kfree(sl);
}
@@ -186,8 +189,9 @@ int configfs_unlink(struct inode *dir, s
type->ct_item_ops->drop_link(parent_item,
sl->sl_target);
- /* FIXME: Needs lock */
+ spin_lock(&configfs_dirent_lock);
list_del_init(&sl->sl_list);
+ spin_unlock(&configfs_dirent_lock);
/* Put reference from create_link() */
config_item_put(sl->sl_target);
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-16 17:42:31.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-16 18:54:20.000000000 +0200
@@ -37,10 +37,11 @@
DECLARE_RWSEM(configfs_rename_sem);
/*
* Protects mutations of configfs_dirent linkage together with proper i_mutex
+ * Also protects mutations of symlinks linkage to target configfs_dirent
* Mutators of configfs_dirent linkage must *both* have the proper inode locked
* and configfs_dirent_lock locked, in that order.
- * This allows one to safely traverse configfs_dirent trees without having to
- * lock inodes.
+ * This allows one to safely traverse configfs_dirent trees and symlinks without
+ * having to lock inodes.
*/
DEFINE_SPINLOCK(configfs_dirent_lock);
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
[-- Attachment #1: configfs-make-configfs_new_dirent-return-errno.patch --]
[-- Type: text/plain, Size: 1666 bytes --]
This patch makes configfs_new_dirent return negative error code instead of NULL,
which will be useful in the next patch to differentiate ENOMEM from ENOENT.
Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
fs/configfs/dir.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-16 17:42:43.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-16 18:54:11.000000000 +0200
@@ -30,6 +30,7 @@
#include <linux/mount.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <linux/err.h>
#include <linux/configfs.h>
#include "configfs_internal.h"
@@ -83,7 +84,7 @@ static struct configfs_dirent *configfs_
sd = kmem_cache_zalloc(configfs_dir_cachep, GFP_KERNEL);
if (!sd)
- return NULL;
+ return ERR_PTR(-ENOMEM);
atomic_set(&sd->s_count, 1);
INIT_LIST_HEAD(&sd->s_links);
@@ -129,8 +130,8 @@ int configfs_make_dirent(struct configfs
struct configfs_dirent * sd;
sd = configfs_new_dirent(parent_sd, element);
- if (!sd)
- return -ENOMEM;
+ if (IS_ERR(sd))
+ return PTR_ERR(sd);
sd->s_mode = mode;
sd->s_type = type;
@@ -1277,7 +1278,7 @@ static int configfs_dir_open(struct inod
file->private_data = configfs_new_dirent(parent_sd, NULL);
mutex_unlock(&dentry->d_inode->i_mutex);
- return file->private_data ? 0 : -ENOMEM;
+ return IS_ERR(file->private_data) ? PTR_ERR(file->private_data) : 0;
}
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename()
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
` (2 preceding siblings ...)
2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
[-- Attachment #1: configfs-fix-rmdir-vs-rename-deadlock.patch --]
[-- Type: text/plain, Size: 5747 bytes --]
This patch fixes the deadlock between racing sys_rename() and configfs_rmdir().
The idea is to avoid locking i_mutexes of default groups in
configfs_detach_prep(), and rely instead on the new configfs_dirent_lock to
protect against configfs_dirent's linkage mutations. To ensure that an mkdir()
racing with rmdir() will not create new items in a to-be-removed default group,
we make configfs_new_dirent() check for the CONFIGFS_USET_DROPPING flag right
before linking the new dirent, and return error if the flag is set. This makes
racing mkdir()/symlink()/dir_open() fail in places where errors could already
happen, resp. in (attach_item()|attach_group())/create_link()/new_dirent().
configfs_depend() remains safe since it locks all the path from configfs root,
and is thus mutually exclusive with rmdir().
An advantage of this is that now detach_groups() unconditionnaly takes the
default groups i_mutex, which makes it more consistent with populate_groups().
Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
fs/configfs/dir.c | 45 +++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-16 18:06:41.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-16 18:54:04.000000000 +0200
@@ -43,6 +43,10 @@ DECLARE_RWSEM(configfs_rename_sem);
* and configfs_dirent_lock locked, in that order.
* This allows one to safely traverse configfs_dirent trees and symlinks without
* having to lock inodes.
+ *
+ * Protects setting of CONFIGFS_USET_DROPPING: checking the flag
+ * unlocked is not reliable unless in detach_groups() called from
+ * rmdir()/unregister() and from configfs_attach_group()
*/
DEFINE_SPINLOCK(configfs_dirent_lock);
@@ -91,6 +95,11 @@ static struct configfs_dirent *configfs_
INIT_LIST_HEAD(&sd->s_children);
sd->s_element = element;
spin_lock(&configfs_dirent_lock);
+ if (parent_sd->s_type & CONFIGFS_USET_DROPPING) {
+ spin_unlock(&configfs_dirent_lock);
+ kmem_cache_free(configfs_dir_cachep, sd);
+ return ERR_PTR(-ENOENT);
+ }
list_add(&sd->s_sibling, &parent_sd->s_children);
spin_unlock(&configfs_dirent_lock);
@@ -349,11 +358,11 @@ static struct dentry * configfs_lookup(s
/*
* Only subdirectories count here. Files (CONFIGFS_NOT_PINNED) are
- * attributes and are removed by rmdir(). We recurse, taking i_mutex
- * on all children that are candidates for default detach. If the
- * result is clean, then configfs_detach_group() will handle dropping
- * i_mutex. If there is an error, the caller will clean up the i_mutex
- * holders via configfs_detach_rollback().
+ * attributes and are removed by rmdir(). We recurse, setting
+ * CONFIGFS_USET_DROPPING on all children that are candidates for
+ * default detach.
+ * If there is an error, the caller will reset the flags via
+ * configfs_detach_rollback().
*/
static int configfs_detach_prep(struct dentry *dentry)
{
@@ -370,8 +379,7 @@ static int configfs_detach_prep(struct d
if (sd->s_type & CONFIGFS_NOT_PINNED)
continue;
if (sd->s_type & CONFIGFS_USET_DEFAULT) {
- mutex_lock(&sd->s_dentry->d_inode->i_mutex);
- /* Mark that we've taken i_mutex */
+ /* Mark that we're trying to drop the group */
sd->s_type |= CONFIGFS_USET_DROPPING;
/*
@@ -392,7 +400,7 @@ out:
}
/*
- * Walk the tree, dropping i_mutex wherever CONFIGFS_USET_DROPPING is
+ * Walk the tree, resetting CONFIGFS_USET_DROPPING wherever it was
* set.
*/
static void configfs_detach_rollback(struct dentry *dentry)
@@ -403,11 +411,7 @@ static void configfs_detach_rollback(str
list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
if (sd->s_type & CONFIGFS_USET_DEFAULT) {
configfs_detach_rollback(sd->s_dentry);
-
- if (sd->s_type & CONFIGFS_USET_DROPPING) {
- sd->s_type &= ~CONFIGFS_USET_DROPPING;
- mutex_unlock(&sd->s_dentry->d_inode->i_mutex);
- }
+ sd->s_type &= ~CONFIGFS_USET_DROPPING;
}
}
}
@@ -486,16 +490,12 @@ static void detach_groups(struct config_
child = sd->s_dentry;
+ mutex_lock(&child->d_inode->i_mutex);
+
configfs_detach_group(sd->s_element);
child->d_inode->i_flags |= S_DEAD;
- /*
- * From rmdir/unregister, a configfs_detach_prep() pass
- * has taken our i_mutex for us. Drop it.
- * From mkdir/register cleanup, there is no sem held.
- */
- if (sd->s_type & CONFIGFS_USET_DROPPING)
- mutex_unlock(&child->d_inode->i_mutex);
+ mutex_unlock(&child->d_inode->i_mutex);
d_delete(child);
dput(child);
@@ -1181,12 +1181,15 @@ static int configfs_rmdir(struct inode *
return -EINVAL;
}
+ spin_lock(&configfs_dirent_lock);
ret = configfs_detach_prep(dentry);
if (ret) {
configfs_detach_rollback(dentry);
+ spin_unlock(&configfs_dirent_lock);
config_item_put(parent_item);
return ret;
}
+ spin_unlock(&configfs_dirent_lock);
/* Get a working ref for the duration of this function */
item = configfs_get_config_item(dentry);
@@ -1476,9 +1479,11 @@ void configfs_unregister_subsystem(struc
mutex_lock_nested(&configfs_sb->s_root->d_inode->i_mutex,
I_MUTEX_PARENT);
mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+ spin_lock(&configfs_dirent_lock);
if (configfs_detach_prep(dentry)) {
printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
}
+ spin_unlock(&configfs_dirent_lock);
configfs_detach_group(&group->cg_item);
dentry->d_inode->i_flags |= S_DEAD;
mutex_unlock(&dentry->d_inode->i_mutex);
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
` (3 preceding siblings ...)
2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
@ 2008-06-16 17:01 ` Louis Rilling
2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
5 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-16 17:01 UTC (permalink / raw)
To: Joel.Becker; +Cc: Louis.Rilling, linux-kernel, ocfs2-devel
[-- Attachment #1: configfs-fix-failing-mkdir-making-rmdir-fail.patch --]
[-- Type: text/plain, Size: 5410 bytes --]
When fixing the rename() vs rmdir() deadlock, we stopped locking default groups'
inodes in configfs_detach_prep(), letting racing mkdir() in default groups
proceed concurrently. This enables races like below happen, which leads to a
failing mkdir() making rmdir() fail, despite the group to remove having no
user-created directory under it in the end.
process A: process B:
/* PWD=A/B */
mkdir("C")
make_item("C")
attach_group("C")
rmdir("A")
detach_prep("A")
detach_prep("B")
error because of "C"
return -ENOTEMPTY
attach_group("C/D")
error (eg -ENOMEM)
return -ENOMEM
This patch prevents such scenarii by making rmdir() wait as long as
detach_prep() fails because a racing mkdir() is in the middle of attach_group().
To achieve this, mkdir() sets a flag CONFIGFS_USET_IN_MKDIR in parent's
configfs_dirent before calling attach_group(), and clears the flag once
attach_group() is done. detach_prep() fails with -EAGAIN whenever the flag is
hit and returns the guilty inode's mutex so that rmdir() can wait on it.
Signed-off-by: Louis Rilling <Louis.Rilling@kerlabs.com>
---
fs/configfs/configfs_internal.h | 1
fs/configfs/dir.c | 53 ++++++++++++++++++++++++++++++++--------
2 files changed, 44 insertions(+), 10 deletions(-)
Index: b/fs/configfs/configfs_internal.h
===================================================================
--- a/fs/configfs/configfs_internal.h 2008-06-16 18:06:27.000000000 +0200
+++ b/fs/configfs/configfs_internal.h 2008-06-16 18:06:54.000000000 +0200
@@ -48,6 +48,7 @@ struct configfs_dirent {
#define CONFIGFS_USET_DIR 0x0040
#define CONFIGFS_USET_DEFAULT 0x0080
#define CONFIGFS_USET_DROPPING 0x0100
+#define CONFIGFS_USET_IN_MKDIR 0x0200
#define CONFIGFS_NOT_PINNED (CONFIGFS_ITEM_ATTR)
extern spinlock_t configfs_dirent_lock;
Index: b/fs/configfs/dir.c
===================================================================
--- a/fs/configfs/dir.c 2008-06-16 18:06:50.000000000 +0200
+++ b/fs/configfs/dir.c 2008-06-16 18:52:50.000000000 +0200
@@ -364,7 +364,7 @@ static struct dentry * configfs_lookup(s
* If there is an error, the caller will reset the flags via
* configfs_detach_rollback().
*/
-static int configfs_detach_prep(struct dentry *dentry)
+static int configfs_detach_prep(struct dentry *dentry, struct mutex **wait_mutex)
{
struct configfs_dirent *parent_sd = dentry->d_fsdata;
struct configfs_dirent *sd;
@@ -379,6 +379,12 @@ static int configfs_detach_prep(struct d
if (sd->s_type & CONFIGFS_NOT_PINNED)
continue;
if (sd->s_type & CONFIGFS_USET_DEFAULT) {
+ /* Abort if racing with mkdir() */
+ if (sd->s_type & CONFIGFS_USET_IN_MKDIR) {
+ if (wait_mutex)
+ *wait_mutex = &sd->s_dentry->d_inode->i_mutex;
+ return -EAGAIN;
+ }
/* Mark that we're trying to drop the group */
sd->s_type |= CONFIGFS_USET_DROPPING;
@@ -386,7 +392,7 @@ static int configfs_detach_prep(struct d
* Yup, recursive. If there's a problem, blame
* deep nesting of default_groups
*/
- ret = configfs_detach_prep(sd->s_dentry);
+ ret = configfs_detach_prep(sd->s_dentry, wait_mutex);
if (!ret)
continue;
} else
@@ -1113,11 +1119,26 @@ static int configfs_mkdir(struct inode *
*/
module_got = 1;
+ /*
+ * Make racing rmdir() fail if it did not tag parent with
+ * CONFIGFS_USET_DROPPING
+ * Note: if CONFIGFS_USET_DROPPING is already set, attach_group() will
+ * fail and let rmdir() terminate correctly
+ */
+ spin_lock(&configfs_dirent_lock);
+ /* This will make configfs_detach_prep() fail */
+ sd->s_type |= CONFIGFS_USET_IN_MKDIR;
+ spin_unlock(&configfs_dirent_lock);
+
if (group)
ret = configfs_attach_group(parent_item, item, dentry);
else
ret = configfs_attach_item(parent_item, item, dentry);
+ spin_lock(&configfs_dirent_lock);
+ sd->s_type &= ~CONFIGFS_USET_IN_MKDIR;
+ spin_unlock(&configfs_dirent_lock);
+
out_unlink:
if (ret) {
/* Tear down everything we built up */
@@ -1182,13 +1203,25 @@ static int configfs_rmdir(struct inode *
}
spin_lock(&configfs_dirent_lock);
- ret = configfs_detach_prep(dentry);
- if (ret) {
- configfs_detach_rollback(dentry);
- spin_unlock(&configfs_dirent_lock);
- config_item_put(parent_item);
- return ret;
- }
+ do {
+ struct mutex *wait_mutex;
+
+ ret = configfs_detach_prep(dentry, &wait_mutex);
+ if (ret) {
+ configfs_detach_rollback(dentry);
+ spin_unlock(&configfs_dirent_lock);
+ if (ret != -EAGAIN) {
+ config_item_put(parent_item);
+ return ret;
+ }
+
+ /* Wait until the racing operation terminates */
+ mutex_lock(wait_mutex);
+ mutex_unlock(wait_mutex);
+
+ spin_lock(&configfs_dirent_lock);
+ }
+ } while (ret == -EAGAIN);
spin_unlock(&configfs_dirent_lock);
/* Get a working ref for the duration of this function */
@@ -1480,7 +1513,7 @@ void configfs_unregister_subsystem(struc
I_MUTEX_PARENT);
mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD);
spin_lock(&configfs_dirent_lock);
- if (configfs_detach_prep(dentry)) {
+ if (configfs_detach_prep(dentry, NULL)) {
printk(KERN_ERR "configfs: Tried to unregister non-empty subsystem!\n");
}
spin_unlock(&configfs_dirent_lock);
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
` (4 preceding siblings ...)
2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
@ 2008-06-16 23:26 ` Joel Becker
2008-06-23 22:34 ` Joel Becker
5 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2008-06-16 23:26 UTC (permalink / raw)
To: Louis Rilling; +Cc: linux-kernel, ocfs2-devel
On Mon, Jun 16, 2008 at 07:00:57PM +0200, Louis Rilling wrote:
> This patchset fixes the deadlock described below. This second version of the
> patchset fixes another long-standing bug in symlinks locking. Please read the
> headers of the third and fourth patches for a detailed explanation of the
> bugfixes. The last patch fixes a new behavior introduced by the deadlock fix,
> and that ones could find undesirable. Whether it is really needed or not is
> debatable.
Ok, this is in-tree for the next merge. Thanks.
Joel
--
"There is no sincerer love than the love of food."
- George Bernard Shaw
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
@ 2008-06-23 22:34 ` Joel Becker
2008-06-24 13:30 ` Louis Rilling
0 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2008-06-23 22:34 UTC (permalink / raw)
To: Louis Rilling, linux-kernel, ocfs2-devel
On Mon, Jun 16, 2008 at 04:26:57PM -0700, Joel Becker wrote:
> Ok, this is in-tree for the next merge. Thanks.
Btw, we still need to solve the lockdep issue. I'm leaning
towards turning it off for attach/detach_groups(), because any CHILD+N
level won't mesh with the VFS anyway.
We also need a comment on detach_group() pointing out its
recursive locking. It's there in the code, but it's not obvious.
Joel
--
"Not everything that can be counted counts, and not everything
that counts can be counted."
- Albert Einstein
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir()
2008-06-23 22:34 ` Joel Becker
@ 2008-06-24 13:30 ` Louis Rilling
0 siblings, 0 replies; 9+ messages in thread
From: Louis Rilling @ 2008-06-24 13:30 UTC (permalink / raw)
To: Joel Becker; +Cc: linux-kernel, ocfs2-devel
[-- Attachment #1: Type: text/plain, Size: 875 bytes --]
On Mon, Jun 23, 2008 at 03:34:50PM -0700, Joel Becker wrote:
> On Mon, Jun 16, 2008 at 04:26:57PM -0700, Joel Becker wrote:
> > Ok, this is in-tree for the next merge. Thanks.
>
> Btw, we still need to solve the lockdep issue. I'm leaning
> towards turning it off for attach/detach_groups(), because any CHILD+N
> level won't mesh with the VFS anyway.
Yes, and the same for depend_prep()/depend_rollback().
> We also need a comment on detach_group() pointing out its
> recursive locking. It's there in the code, but it's not obvious.
Will do all of this after we are sure not to change things again (see my
separate email on a possible bug in attach_group()).
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-06-24 13:30 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-16 17:00 [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Louis Rilling
2008-06-16 17:00 ` [PATCH 1/5][BUGFIX] configfs: Introduce configfs_dirent_lock Louis Rilling
2008-06-16 17:00 ` [PATCH 2/5][BUGFIX] configfs: Protect configfs_dirent s_links list mutations Louis Rilling
2008-06-16 17:01 ` [PATCH 3/5][BUGFIX] configfs: Make configfs_new_dirent() return error code instead of NULL Louis Rilling
2008-06-16 17:01 ` [PATCH 4/5][BUGFIX] configfs: Fix deadlock with racing rmdir() and rename() Louis Rilling
2008-06-16 17:01 ` [PATCH 5/5][BUGFIX] configfs: Fix failing mkdir() making racing rmdir() fail Louis Rilling
2008-06-16 23:26 ` [PATCH 0/5][BUGFIX] configfs: Fix deadlock of rename() vs rmdir() Joel Becker
2008-06-23 22:34 ` Joel Becker
2008-06-24 13:30 ` Louis Rilling
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox