From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 01/19] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
Date: Tue, 15 Oct 2013 09:17:47 +1100 [thread overview]
Message-ID: <1381789085-21923-2-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1381789085-21923-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
Removing an inode from the namespace involves removing the directory
entry and dropping the link count on the inode. Removing the
directory entry can result in locking an AGF (directory blocks were
freed) and removing a link count can result in placing the inode on
an unlinked list which results in locking an AGI.
The big problem here is that we have an ordering constraint on AGF
and AGI locking - inode allocation locks the AGI, then can allocate
a new extent for new inodes, locking the AGF after the AGI.
Similarly, freeing the inode removes the inode from the unlinked
list, requiring that we lock the AGI first, and then freeing the
inode can result in an inode chunk being freed and hence freeing
disk space requiring that we lock an AGF.
Hence the ordering that is imposed by other parts of the code is AGI
before AGF. This means we cannot remove the directory entry before
we drop the inode reference count and put it on the unlinked list as
this results in a lock order of AGF then AGI, and this can deadlock
against inode allocation and freeing. Therefore we must drop the
link counts before we remove the directory entry.
This is still safe from a transactional point of view - it is not
until we get to xfs_bmap_finish() that we have the possibility of
multiple transactions in this operation. Hence as long as we remove
the directory entry and drop the link count in the first transaction
of the remove operation, there are no transactional constraints on
the ordering here.
Change the ordering of the operations in the xfs_remove() function
to align the ordering of AGI and AGF locking to match that of the
rest of the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 72 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5fce726..6e66b6f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2405,6 +2405,33 @@ xfs_iunpin_wait(
__xfs_iunpin_wait(ip);
}
+/*
+ * Removing an inode from the namespace involves removing the directory entry
+ * and dropping the link count on the inode. Removing the directory entry can
+ * result in locking an AGF (directory blocks were freed) and removing a link
+ * count can result in placing the inode on an unlinked list which results in
+ * locking an AGI.
+ *
+ * The big problem here is that we have an ordering constraint on AGF and AGI
+ * locking - inode allocation locks the AGI, then can allocate a new extent for
+ * new inodes, locking the AGF after the AGI. Similarly, freeing the inode
+ * removes the inode from the unlinked list, requiring that we lock the AGI
+ * first, and then freeing the inode can result in an inode chunk being freed
+ * and hence freeing disk space requiring that we lock an AGF.
+ *
+ * Hence the ordering that is imposed by other parts of the code is AGI before
+ * AGF. This means we cannot remove the directory entry before we drop the inode
+ * reference count and put it on the unlinked list as this results in a lock
+ * order of AGF then AGI, and this can deadlock against inode allocation and
+ * freeing. Therefore we must drop the link counts before we remove the
+ * directory entry.
+ *
+ * This is still safe from a transactional point of view - it is not until we
+ * get to xfs_bmap_finish() that we have the possibility of multiple
+ * transactions in this operation. Hence as long as we remove the directory
+ * entry and drop the link count in the first transaction of the remove
+ * operation, there are no transactional constraints on the ordering here.
+ */
int
xfs_remove(
xfs_inode_t *dp,
@@ -2474,6 +2501,7 @@ xfs_remove(
/*
* If we're removing a directory perform some additional validation.
*/
+ cancel_flags |= XFS_TRANS_ABORT;
if (is_dir) {
ASSERT(ip->i_d.di_nlink >= 2);
if (ip->i_d.di_nlink != 2) {
@@ -2484,31 +2512,16 @@ xfs_remove(
error = XFS_ERROR(ENOTEMPTY);
goto out_trans_cancel;
}
- }
- xfs_bmap_init(&free_list, &first_block);
- error = xfs_dir_removename(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks);
- if (error) {
- ASSERT(error != ENOENT);
- goto out_bmap_cancel;
- }
- xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
- if (is_dir) {
- /*
- * Drop the link from ip's "..".
- */
+ /* Drop the link from ip's "..". */
error = xfs_droplink(tp, dp);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
- /*
- * Drop the "." link from ip to self.
- */
+ /* Drop the "." link from ip to self. */
error = xfs_droplink(tp, ip);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
} else {
/*
* When removing a non-directory we need to log the parent
@@ -2517,20 +2530,24 @@ xfs_remove(
*/
xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
}
+ xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- /*
- * Drop the link from dp to ip.
- */
+ /* Drop the link from dp to ip. */
error = xfs_droplink(tp, ip);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
- /*
- * Determine if this is the last link while
- * we are in the transaction.
- */
+ /* Determine if this is the last link while the inode is locked */
link_zero = (ip->i_d.di_nlink == 0);
+ xfs_bmap_init(&free_list, &first_block);
+ error = xfs_dir_removename(tp, dp, name, ip->i_ino,
+ &first_block, &free_list, resblks);
+ if (error) {
+ ASSERT(error != ENOENT);
+ goto out_bmap_cancel;
+ }
+
/*
* If this is a synchronous mount, make sure that the
* remove transaction goes to disk before returning to
@@ -2560,7 +2577,6 @@ xfs_remove(
out_bmap_cancel:
xfs_bmap_cancel(&free_list);
- cancel_flags |= XFS_TRANS_ABORT;
out_trans_cancel:
xfs_trans_cancel(tp, cancel_flags);
std_return:
--
1.8.4.rc3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-10-14 22:18 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-14 22:17 [PATCH 00/19 V2] xfs: patches for 3.13 Dave Chinner
2013-10-14 22:17 ` Dave Chinner [this message]
2013-10-14 22:17 ` [PATCH 02/19] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
2013-10-14 22:17 ` [PATCH 03/19] xfs: prevent deadlock trying to cover an active log Dave Chinner
2013-10-17 15:54 ` Eric Sandeen
2013-10-17 18:51 ` Ben Myers
2013-10-14 22:17 ` [PATCH 04/19] xfs: create a shared header file for format-related information Dave Chinner
2013-10-22 23:12 ` Ben Myers
2013-10-22 23:36 ` [PATCH 04/19, V2] " Dave Chinner
2013-10-14 22:17 ` [PATCH 05/19] xfs: unify directory/attribute format definitions Dave Chinner
2013-10-21 23:11 ` Ben Myers
2013-10-21 23:33 ` Dave Chinner
2013-10-22 14:57 ` Christoph Hellwig
2013-10-14 22:17 ` [PATCH 06/19] xfs: split dquot buffer operations out Dave Chinner
2013-10-14 22:17 ` [PATCH 07/19] xfs: remove unused transaction callback variables Dave Chinner
2013-10-22 14:25 ` Ben Myers
2013-10-14 22:17 ` [PATCH 08/19] xfs: decouple log and transaction headers Dave Chinner
2013-10-22 17:16 ` Ben Myers
2013-10-22 23:50 ` [PATCH 08/19, V2] " Dave Chinner
2013-10-14 22:17 ` [PATCH 09/19] xfs: decouple inode and bmap btree header files Dave Chinner
2013-10-22 18:29 ` Ben Myers
2013-10-22 23:40 ` [PATCH 09/19, V2] " Dave Chinner
2013-10-22 23:48 ` Dave Chinner
2013-10-22 23:51 ` [PATCH 09/19, V3] " Dave Chinner
2013-10-14 22:17 ` [PATCH 10/19] xfs: split xfs_rtalloc.c for userspace sanity Dave Chinner
2013-10-14 22:17 ` [PATCH 11/19] xfs: abstract the differences in dir2/dir3 via an ops vector Dave Chinner
2013-10-14 22:17 ` [PATCH 12/19] xfs: vectorise remaining shortform dir2 ops Dave Chinner
2013-10-14 22:17 ` [PATCH 13/19] xfs: vectorise directory data operations Dave Chinner
2013-10-24 18:39 ` Ben Myers
2013-10-24 21:31 ` Dave Chinner
2013-10-24 21:41 ` Ben Myers
2013-10-24 22:08 ` Dave Chinner
2013-10-24 22:28 ` Ben Myers
2013-10-27 23:01 ` Dave Chinner
2013-10-14 22:18 ` [PATCH 14/19] xfs: vectorise directory data operations part 2 Dave Chinner
2013-10-14 22:18 ` [PATCH 15/19] xfs: vectorise directory leaf operations Dave Chinner
2013-10-20 23:25 ` [PATCH 15/19, V2] " Dave Chinner
2013-10-14 22:18 ` [PATCH 16/19] xfs: vectorise DA btree operations Dave Chinner
2013-10-25 17:17 ` Ben Myers
2013-10-14 22:18 ` [PATCH 17/19] xfs: vectorise encoding/decoding directory headers Dave Chinner
2013-10-25 18:46 ` Ben Myers
2013-10-25 19:20 ` Ben Myers
2013-10-14 22:18 ` [PATCH 18/19] xfs: vectorise directory leaf operations Dave Chinner
2013-10-25 20:18 ` Ben Myers
2013-10-25 21:51 ` Ben Myers
2013-10-14 22:18 ` [PATCH 19/19] xfs: convert directory vector functions to constants Dave Chinner
2013-10-16 21:16 ` Ben Myers
2013-10-16 22:23 ` Dave Chinner
2013-10-16 22:52 ` Ben Myers
2013-10-16 23:06 ` Ben Myers
2013-10-17 0:17 ` Dave Chinner
2013-10-23 22:23 ` Ben Myers
2013-10-25 21:47 ` Ben Myers
2013-10-23 22:23 ` [PATCH 00/19 V2] xfs: patches for 3.13 Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1381789085-21923-2-git-send-email-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox