From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 01/15] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
Date: Tue, 29 Oct 2013 22:11:44 +1100 [thread overview]
Message-ID: <1383045118-31107-2-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1383045118-31107-1-git-send-email-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
Removing an inode from the namespace involves removing the directory
entry and dropping the link count on the inode. Removing the
directory entry can result in locking an AGF (directory blocks were
freed) and removing a link count can result in placing the inode on
an unlinked list which results in locking an AGI.
The big problem here is that we have an ordering constraint on AGF
and AGI locking - inode allocation locks the AGI, then can allocate
a new extent for new inodes, locking the AGF after the AGI.
Similarly, freeing the inode removes the inode from the unlinked
list, requiring that we lock the AGI first, and then freeing the
inode can result in an inode chunk being freed and hence freeing
disk space requiring that we lock an AGF.
Hence the ordering that is imposed by other parts of the code is AGI
before AGF. This means we cannot remove the directory entry before
we drop the inode reference count and put it on the unlinked list as
this results in a lock order of AGF then AGI, and this can deadlock
against inode allocation and freeing. Therefore we must drop the
link counts before we remove the directory entry.
This is still safe from a transactional point of view - it is not
until we get to xfs_bmap_finish() that we have the possibility of
multiple transactions in this operation. Hence as long as we remove
the directory entry and drop the link count in the first transaction
of the remove operation, there are no transactional constraints on
the ordering here.
Change the ordering of the operations in the xfs_remove() function
to align the ordering of AGI and AGF locking to match that of the
rest of the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 72 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 326b94d..001aa89 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2404,6 +2404,33 @@ xfs_iunpin_wait(
__xfs_iunpin_wait(ip);
}
+/*
+ * Removing an inode from the namespace involves removing the directory entry
+ * and dropping the link count on the inode. Removing the directory entry can
+ * result in locking an AGF (directory blocks were freed) and removing a link
+ * count can result in placing the inode on an unlinked list which results in
+ * locking an AGI.
+ *
+ * The big problem here is that we have an ordering constraint on AGF and AGI
+ * locking - inode allocation locks the AGI, then can allocate a new extent for
+ * new inodes, locking the AGF after the AGI. Similarly, freeing the inode
+ * removes the inode from the unlinked list, requiring that we lock the AGI
+ * first, and then freeing the inode can result in an inode chunk being freed
+ * and hence freeing disk space requiring that we lock an AGF.
+ *
+ * Hence the ordering that is imposed by other parts of the code is AGI before
+ * AGF. This means we cannot remove the directory entry before we drop the inode
+ * reference count and put it on the unlinked list as this results in a lock
+ * order of AGF then AGI, and this can deadlock against inode allocation and
+ * freeing. Therefore we must drop the link counts before we remove the
+ * directory entry.
+ *
+ * This is still safe from a transactional point of view - it is not until we
+ * get to xfs_bmap_finish() that we have the possibility of multiple
+ * transactions in this operation. Hence as long as we remove the directory
+ * entry and drop the link count in the first transaction of the remove
+ * operation, there are no transactional constraints on the ordering here.
+ */
int
xfs_remove(
xfs_inode_t *dp,
@@ -2473,6 +2500,7 @@ xfs_remove(
/*
* If we're removing a directory perform some additional validation.
*/
+ cancel_flags |= XFS_TRANS_ABORT;
if (is_dir) {
ASSERT(ip->i_d.di_nlink >= 2);
if (ip->i_d.di_nlink != 2) {
@@ -2483,31 +2511,16 @@ xfs_remove(
error = XFS_ERROR(ENOTEMPTY);
goto out_trans_cancel;
}
- }
- xfs_bmap_init(&free_list, &first_block);
- error = xfs_dir_removename(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks);
- if (error) {
- ASSERT(error != ENOENT);
- goto out_bmap_cancel;
- }
- xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
- if (is_dir) {
- /*
- * Drop the link from ip's "..".
- */
+ /* Drop the link from ip's "..". */
error = xfs_droplink(tp, dp);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
- /*
- * Drop the "." link from ip to self.
- */
+ /* Drop the "." link from ip to self. */
error = xfs_droplink(tp, ip);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
} else {
/*
* When removing a non-directory we need to log the parent
@@ -2516,20 +2529,24 @@ xfs_remove(
*/
xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
}
+ xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
- /*
- * Drop the link from dp to ip.
- */
+ /* Drop the link from dp to ip. */
error = xfs_droplink(tp, ip);
if (error)
- goto out_bmap_cancel;
+ goto out_trans_cancel;
- /*
- * Determine if this is the last link while
- * we are in the transaction.
- */
+ /* Determine if this is the last link while the inode is locked */
link_zero = (ip->i_d.di_nlink == 0);
+ xfs_bmap_init(&free_list, &first_block);
+ error = xfs_dir_removename(tp, dp, name, ip->i_ino,
+ &first_block, &free_list, resblks);
+ if (error) {
+ ASSERT(error != ENOENT);
+ goto out_bmap_cancel;
+ }
+
/*
* If this is a synchronous mount, make sure that the
* remove transaction goes to disk before returning to
@@ -2559,7 +2576,6 @@ xfs_remove(
out_bmap_cancel:
xfs_bmap_cancel(&free_list);
- cancel_flags |= XFS_TRANS_ABORT;
out_trans_cancel:
xfs_trans_cancel(tp, cancel_flags);
std_return:
--
1.8.4.rc3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-10-29 11:12 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-29 11:11 [PATCH 00/15] xfs: patches for 3.13 Dave Chinner
2013-10-29 11:11 ` Dave Chinner [this message]
2013-10-30 22:39 ` [PATCH 01/15] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Ben Myers
2013-10-30 23:15 ` Dave Chinner
2013-11-04 23:10 ` Ben Myers
2013-10-29 11:11 ` [PATCH 02/15] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
2013-10-29 11:11 ` [PATCH 03/15] xfs: abstract the differences in dir2/dir3 via an ops vector Dave Chinner
2013-10-29 11:11 ` [PATCH 04/15] xfs: vectorise remaining shortform dir2 ops Dave Chinner
2013-10-29 11:11 ` [PATCH 05/15] xfs: vectorise directory data operations Dave Chinner
2013-10-29 11:11 ` [PATCH 06/15] xfs: vectorise directory data operations part 2 Dave Chinner
2013-10-29 11:11 ` [PATCH 07/15] xfs: vectorise directory leaf operations Dave Chinner
2013-10-29 11:11 ` [PATCH 08/15] xfs: vectorise DA btree operations Dave Chinner
2013-10-29 11:11 ` [PATCH 09/15] xfs: vectorise encoding/decoding directory headers Dave Chinner
2013-10-29 19:06 ` Ben Myers
2013-10-29 11:11 ` [PATCH 10/15] xfs: vectorise directory leaf operations Dave Chinner
2013-10-29 19:13 ` Ben Myers
2013-10-29 11:11 ` [PATCH 11/15] xfs: convert directory vector functions to constants Dave Chinner
2013-10-29 19:22 ` Ben Myers
2013-10-29 22:15 ` [PATCH 11/15 V2] " Dave Chinner
2013-10-30 18:09 ` Ben Myers
2013-10-29 11:11 ` [PATCH 12/15] xfs: make dir2 ftype offset pointers explicit Dave Chinner
2013-10-29 20:00 ` Ben Myers
2013-10-29 22:15 ` Dave Chinner
2013-10-30 18:51 ` Ben Myers
2013-10-29 11:11 ` [PATCH 13/15] xfs: validity check the directory block leaf entry count Dave Chinner
2013-10-29 20:43 ` Ben Myers
2013-10-29 11:11 ` [PATCH 14/15] xfs: prevent stack overflows from page cache allocation Dave Chinner
2013-10-30 10:23 ` Christoph Hellwig
2013-10-30 21:40 ` Ben Myers
2013-10-29 11:11 ` [PATCH 15/15] xfs: fix static and extern sparse warnings Dave Chinner
2013-10-29 21:12 ` Ben Myers
2013-10-30 19:22 ` [PATCH 00/15] xfs: patches for 3.13 Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1383045118-31107-2-git-send-email-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox