From: bfields@fieldses.org
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"J. Bruce Fields" <bfields@redhat.com>
Subject: [PATCH 5/5] vfs: change nondirectory i_mutex ordering to fix quota deadlock
Date: Wed, 25 Apr 2012 11:22:09 -0400
Message-ID: <1335367329-929-5-git-send-email-bfields@fieldses.org>
In-Reply-To: <20120418215238.GA11959@fieldses.org>
From: "J. Bruce Fields" <bfields@redhat.com>
A write can take an i_mutex on a quota file while holding the i_mutex on
the file being written to.
And both rename and fs/ext4/move_extent.c:mext_inode_double_lock() can
also take the i_mutex on two regular files.
Either of those could take locks in opposite order from a quota file
write, and end up deadlocked.
Changing the locking order in the quota-update-while-writing case looks
hard. So, instead, change the order in the mext_inode_double_lock()
case so that the i_mutex is always taken on a quota file after being
taken on a file that isn't a quota file.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
Documentation/filesystems/directory-locking | 25 +++++++++++++++----------
fs/inode.c | 13 ++++++++++++-
2 files changed, 27 insertions(+), 11 deletions(-)
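(Not part of the patch.) The ordering rule described above can be tried out in userspace. What follows is a minimal sketch under stated assumptions: struct toy_inode, its noquota flag, and the toy_* helpers are hypothetical stand-ins for struct inode, IS_NOQUOTA(), and lock_two_nondirectories(), and pthread mutexes stand in for i_mutex. It only illustrates that the non-quota lock is taken first whatever order the arguments arrive in.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct inode; not the kernel type. */
struct toy_inode {
	pthread_mutex_t lock;	/* plays the role of ->i_mutex */
	bool noquota;		/* plays the role of IS_NOQUOTA(inode) */
};

/* True if a must be locked before b: non-quota first, then by address. */
static bool toy_ordered(const struct toy_inode *a, const struct toy_inode *b)
{
	if (a->noquota == b->noquota)
		return (uintptr_t)a < (uintptr_t)b;
	return b->noquota;
}

/* Which of the two objects gets locked first; a must be non-NULL. */
static struct toy_inode *toy_first(struct toy_inode *a, struct toy_inode *b)
{
	assert(a != NULL);
	if (b == NULL || a == b)
		return a;
	return toy_ordered(a, b) ? a : b;
}

/* Lock one or two objects, always in the same order for a given pair. */
static void toy_lock_two(struct toy_inode *a, struct toy_inode *b)
{
	struct toy_inode *first = toy_first(a, b);

	pthread_mutex_lock(&first->lock);
	if (b != NULL && a != b)
		pthread_mutex_lock(first == a ? &b->lock : &a->lock);
}

/* Unlock order does not matter for deadlock avoidance. */
static void toy_unlock_two(struct toy_inode *a, struct toy_inode *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b != NULL && a != b)
		pthread_mutex_unlock(&b->lock);
}
```

Since toy_first() depends only on the pair and not on argument order, a toy_lock_two() caller can never take a quota object's lock before a non-quota object's lock, which is the order a quota-updating write already uses.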
diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
index 9e8a629..022d94f 100644
--- a/Documentation/filesystems/directory-locking
+++ b/Documentation/filesystems/directory-locking
@@ -3,8 +3,12 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
(->s_vfs_rename_mutex).
When taking the i_mutex on multiple non-directory objects, we
-always acquire the locks in order by increasing address. We'll call
-that "inode pointer" order in the following.
+always acquire them in the following order (which we'll call "the usual
+order" in the following):
+
+ * non-IS_NOQUOTA inodes before IS_NOQUOTA inodes
+ * within each category, inodes with smaller addresses before
+ inodes with larger addresses
For our purposes all operations fall in 5 classes:
@@ -17,7 +21,7 @@ locks victim and calls the method.
4) rename() that is _not_ cross-directory. Locking rules: caller locks
the parent and finds source and target. If source and target both
-exist, they are locked in inode pointer order. Otherwise lock just
+exist, they are locked in the usual order. Otherwise lock just
source. Then call method.
5) link creation. Locking rules:
@@ -35,8 +39,8 @@ rules:
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
- * If target exists, lock both source and target, in inode
- pointer order. Otherwise lock just source.
+ * If target exists, lock both source and target, in the
+ usual order. Otherwise lock just source.
* call the method.
@@ -63,10 +67,10 @@ objects - A < B iff A is an ancestor of B.
the order until we had acquired all locks).
(3) locks on non-directory objects are acquired only after locks on
- directory objects, and are acquired in inode pointer order.
+ directory objects, and are acquired in the usual order.
(Proof: all operations but renames take lock on at most one
non-directory object, except renames, which take locks on source and
- target in inode pointer order.)
+ target in the usual order.)
Now consider the minimal deadlock. Each process is blocked on
attempt to acquire some lock and already holds at least one lock. Let's
@@ -75,9 +79,10 @@ not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_mutex.
By (3), any process holding a non-directory lock can only be
-waiting on another non-directory lock with a larger address. Therefore
-the process holding the "largest" such lock can always make progress, and
-non-directory objects are not included in the set of contended locks.
+waiting on another non-directory lock that is "larger" in the usual order.
+Therefore the process holding the "largest" such lock can always make
+progress, and non-directory objects are not included in the set of
+contended locks.
Thus link creation can't be a part of deadlock - it can't be
blocked on source and it means that it doesn't hold any locks.
diff --git a/fs/inode.c b/fs/inode.c
index 487c924..13d23b6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -961,6 +961,17 @@ void unlock_new_inode(struct inode *inode)
}
EXPORT_SYMBOL(unlock_new_inode);
+/*
+ * We order !IS_NOQUOTA files before IS_NOQUOTA files, and by pointer
+ * within each category.
+ */
+static bool nondir_mutex_ordered(struct inode *inode1, struct inode *inode2)
+{
+	if (IS_NOQUOTA(inode1) == IS_NOQUOTA(inode2))
+		return inode1 < inode2;
+	return IS_NOQUOTA(inode2);
+}
+
/**
* lock_two_nondirectories - take two i_mutexes on non-directory objects
* @inode1: first inode to lock; must be non-NULL
@@ -970,7 +981,7 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
if (inode1 == inode2 || inode2 == NULL)
mutex_lock(&inode1->i_mutex);
- else if (inode1 < inode2) {
+ else if (nondir_mutex_ordered(inode1, inode2)) {
mutex_lock(&inode1->i_mutex);
mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
--
1.7.5.4
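
(Not part of the patch.) The proof in directory-locking relies on "the usual order" being a strict total order, so that a "largest" held lock always exists. Below is a small brute-force check of that property, as a sketch only: struct obj and both function names are hypothetical, and the comparator mirrors nondir_mutex_ordered() with a plain flag in place of IS_NOQUOTA().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object carrying only the flag the ordering looks at. */
struct obj {
	bool noquota;
};

/* Same shape as nondir_mutex_ordered(): flag first, then address. */
static bool usual_order(const struct obj *a, const struct obj *b)
{
	if (a->noquota == b->noquota)
		return (uintptr_t)a < (uintptr_t)b;
	return b->noquota;
}

/*
 * Check that usual_order() is irreflexive, orders every distinct pair
 * in exactly one direction, and is transitive over v[0..n-1].
 */
static bool is_strict_total_order(const struct obj *v, size_t n)
{
	assert(v != NULL);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			bool ij = usual_order(&v[i], &v[j]);
			bool ji = usual_order(&v[j], &v[i]);

			if (i == j) {
				if (ij)
					return false;	/* not irreflexive */
				continue;
			}
			if (ij == ji)
				return false;	/* both or neither direction */
			for (size_t k = 0; k < n; k++) {
				if (ij && usual_order(&v[j], &v[k]) &&
				    !usual_order(&v[i], &v[k]))
					return false;	/* not transitive */
			}
		}
	}
	return true;
}
```

The check passes for any mix of flags because the comparator is a lexicographic order on the pair (flag, address).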