public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: wenli xie <wlxie7296@gmail.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [Bug report] overlayfs over xfs whiteout operation may cause deadlock
Date: Thu, 17 Dec 2020 13:11:17 -0800
Message-ID: <20201217211117.GF38809@magnolia>
In-Reply-To: <CABRboy35_tyxA3gHN7_=xp0_RVugQjvFOHCRsH4Y4rrivE7HmQ@mail.gmail.com>

On Tue, Dec 15, 2020 at 08:44:27PM +0800, wenli xie wrote:
> I tried upstream kernel 5.10 to do the test, and this issue still can be
> reproduced.

Thanks for the report, I've condensed this down to the following:

#!/bin/bash

# Scratch device and mount point.  Note that mkfs.xfs -f destroys
# everything on /dev/sda.
SCRATCH_MNT=/mnt
LOAD_FACTOR=1
TIME_FACTOR=1

mkfs.xfs -f /dev/sda
mount /dev/sda $SCRATCH_MNT

mkdir $SCRATCH_MNT/lowerdir
mkdir $SCRATCH_MNT/lowerdir1
mkdir $SCRATCH_MNT/lowerdir/etc
mkdir $SCRATCH_MNT/workers
echo salts > $SCRATCH_MNT/lowerdir/etc/access.conf
touch $SCRATCH_MNT/running

stop_workers() {
	test -e $SCRATCH_MNT/running || return
	rm -f $SCRATCH_MNT/running

	while [ "$(ls $SCRATCH_MNT/workers/ | wc -l)" -gt 0 ]; do
		wait
	done
}

worker() {
	local tag="$1"
	local mergedir="$SCRATCH_MNT/merged$tag"
	local l="lowerdir=$SCRATCH_MNT/lowerdir:$SCRATCH_MNT/lowerdir1"
	local u="upperdir=$SCRATCH_MNT/upperdir$tag"
	local w="workdir=$SCRATCH_MNT/workdir$tag"
	local i="index=off"

	touch $SCRATCH_MNT/workers/$tag
	while test -e $SCRATCH_MNT/running; do
		rm -rf $SCRATCH_MNT/merged$tag
		rm -rf $SCRATCH_MNT/upperdir$tag
		rm -rf $SCRATCH_MNT/workdir$tag
		mkdir $SCRATCH_MNT/merged$tag
		mkdir $SCRATCH_MNT/workdir$tag
		mkdir $SCRATCH_MNT/upperdir$tag

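		mount -t overlay overlay -o "$l,$u,$w,$i" $mergedir
		# access.conf also exists in the lower layer, so each rename
		# away from that name needs a whiteout left in its place.  The
		# second mv also replaces the access.conf.bak created by the
		# first one, dropping that file's link count from 1 to 0.
		mv $mergedir/etc/access.conf $mergedir/etc/access.conf.bak
		touch $mergedir/etc/access.conf
		mv $mergedir/etc/access.conf $mergedir/etc/access.conf.bak
		touch $mergedir/etc/access.conf
		umount $mergedir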
	done
	rm -f $SCRATCH_MNT/workers/$tag
}

for i in $(seq 0 $((4 + LOAD_FACTOR)) ); do
	worker $i &
done

sleep $((30 * TIME_FACTOR))
stop_workers

...and I think this is enough to diagnose the deadlock.

This is an ABBA deadlock caused by locking the AGI buffers in the wrong
order.  Specifically, we seem to be calling xfs_dir_rename with a
non-null @wip and a non-null @target_ip.  In the deadlock scenario, @wip
is an inode in AG 2, and @target_ip is an inode in AG 0 with nlink==1.
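To make the ABBA shape concrete: if this rename holds AGI 2 and wants
AGI 0 while some other transaction (presumably another worker in the
script) holds AGI 0 and wants AGI 2, neither can make progress.  The
usual cure is a single global lock order.  What follows is a minimal
userspace sketch, assuming plain pthread mutexes as stand-ins for AGI
buffer locks; agi_lock, lock_agis_ordered and run_opposing_workers are
hypothetical names, not XFS code.  Two threads that name the same pair
of AGs in opposite orders still both finish, because each one always
locks the lower-numbered AG first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AGI buffer locks of a 4-AG filesystem. */
#define NR_AGS 4
static pthread_mutex_t agi_lock[NR_AGS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Lock two AGIs, always taking the lower-numbered AG first. */
static void lock_agis_ordered(uint32_t a, uint32_t b)
{
	if (a > b) {
		uint32_t tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&agi_lock[a]);
	if (b != a)
		pthread_mutex_lock(&agi_lock[b]);
}

static void unlock_agis(uint32_t a, uint32_t b)
{
	pthread_mutex_unlock(&agi_lock[a]);
	if (b != a)
		pthread_mutex_unlock(&agi_lock[b]);
}

struct agi_pair {
	uint32_t first, second;	/* order in which the caller names the AGs */
	int iters;
};

static void *agi_worker(void *arg)
{
	const struct agi_pair *p = arg;

	for (int i = 0; i < p->iters; i++) {
		lock_agis_ordered(p->first, p->second);
		unlock_agis(p->first, p->second);
	}
	return NULL;
}

/*
 * One thread names AGs (2, 0), the other (0, 2): the ABBA shape.
 * Returns 0 once both threads have finished, -1 if a thread could
 * not be created.
 */
static int run_opposing_workers(int iters)
{
	pthread_t t1, t2;
	struct agi_pair p1 = { 2, 0, iters };
	struct agi_pair p2 = { 0, 2, iters };

	if (pthread_create(&t1, NULL, agi_worker, &p1) != 0)
		return -1;
	if (pthread_create(&t2, NULL, agi_worker, &p2) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
```

Build with -pthread.  If the swap in lock_agis_ordered is removed, each
thread takes the locks in the order it names them, and the same run can
then hang in exactly the ABBA pattern described here.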

First we call xfs_iunlink_remove to remove @wip from the unlinked list,
which causes us to lock AGI 2.  Next we replace the directory entry.
Finally, we need to droplink @target_ip.  Since @target_ip has nlink==1,
xfs_droplink will need to put it on AGI 0's unlinked list.
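That sequence takes AGI 2 before AGI 0, which breaks the
increasing-AG-order rule for AGI locks.  As a hypothetical illustration
only (agi_order_tracker and tracker_lock are not XFS code), a tracker
that remembers the highest AGI locked so far in a transaction and
refuses any lower one flags the second step, while the reverse order
passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_AGI_HELD UINT32_MAX

/* Hypothetical per-transaction record of the highest AGI locked so far. */
struct agi_order_tracker {
	uint32_t max_held;	/* NO_AGI_HELD while nothing is locked */
};

static void tracker_init(struct agi_order_tracker *t)
{
	t->max_held = NO_AGI_HELD;
}

/*
 * Return false if locking AGI @agno now would take a lower AG after a
 * higher one (the ABBA risk).  Re-locking the highest AGI already held
 * is treated as allowed, modelling a buffer already joined to the
 * transaction.
 */
static bool tracker_lock(struct agi_order_tracker *t, uint32_t agno)
{
	if (t->max_held != NO_AGI_HELD && agno < t->max_held)
		return false;
	t->max_held = agno;
	return true;
}
```

Replaying the report: tracker_lock(&t, 2) for xfs_iunlink_remove on
@wip succeeds, and tracker_lock(&t, 0) for xfs_droplink on @target_ip
is rejected.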

Unfortunately, the locking rules say that you can only lock AGIs in
increasing order.  This means that we cannot lock AGI 0 after locking
AGI 2 without risking deadlock.
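The way out is to work out every AGI the rename will need before
locking any of them, then lock them in increasing order.  Here is a
standalone model of just that decision, assuming a toy inode-number
layout in which the AG number is the upper 32 bits (the real
XFS_INO_TO_AGNO derives the shift from superblock geometry); toy_inode,
toy_ino_to_agno and rename_agis_to_lock are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t agnumber_t;
#define NULLAG ((agnumber_t)-1)

/* Assumption: toy geometry, AG number = upper 32 bits of the inode number. */
static agnumber_t toy_ino_to_agno(uint64_t ino)
{
	return (agnumber_t)(ino >> 32);
}

/* Hypothetical inode summary: only what the AG choice needs. */
struct toy_inode {
	uint64_t ino;
	unsigned int nlink;
};

/*
 * Decide which AGIs a whiteout rename must lock up front, lowest AG
 * first.  @wip is the whiteout inode or NULL; @target is the inode
 * being replaced or NULL.  Unused slots are set to NULLAG.
 */
static void rename_agis_to_lock(const struct toy_inode *wip,
				const struct toy_inode *target,
				agnumber_t out[2])
{
	out[0] = NULLAG;
	out[1] = NULLAG;

	/* Taking the whiteout off its AG's unlinked list needs that AGI. */
	if (wip)
		out[0] = toy_ino_to_agno(wip->ino);

	/* Dropping the target's last link puts it on its AG's unlinked list. */
	if (target && target->nlink == 1)
		out[1] = toy_ino_to_agno(target->ino);

	/* Only reorder when both slots are in use. */
	if (out[0] != NULLAG && out[1] != NULLAG && out[0] > out[1]) {
		agnumber_t tmp = out[0];
		out[0] = out[1];
		out[1] = tmp;
	}
}
```

For the reported case (whiteout in AG 2, target in AG 0 with nlink 1)
this yields { 0, 2 }, so AGI 0 is locked before AGI 2.  If the target
keeps a link, only the whiteout's AGI is needed.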

Does the attached patch fix the deadlock for you?

--D

From: Darrick J. Wong <darrick.wong@oracle.com>
Subject: [PATCH] xfs: fix an ABBA deadlock in xfs_rename

When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked.  If the file
being unlinked drops from one link to zero, we then have to put the
inode on the unlinked list.

This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (whiteouts are created on the unlinked list) and
to grab the AGI buffer of the file being deleted.  If the whiteout was
created in a higher numbered AG than the file being deleted, we'll lock
the AGIs in the wrong order and can deadlock.

Therefore, grab all the AGI locks we think we'll need ahead of time, and
in the correct order.

Reported-by: wenli xie <wlxie7296@gmail.com>
Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b7352bc4c815..dd419a1bc6ba 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout(
 	return 0;
 }
 
+/*
+ * For the general case of renaming files, lock all the AGI buffers we need to
+ * handle bumping the nlink of the whiteout inode off the unlinked list and to
+ * handle dropping the nlink of the target inode.  We have to do this in
+ * increasing AG order to avoid deadlocks.
+ */
+static int
+xfs_rename_lock_agis(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*wip,
+	struct xfs_inode	*target_ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_buf		*bp;
+	xfs_agnumber_t		agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER };
+	int			error;
+
+	if (wip)
+		agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino);
+
+	if (target_ip && VFS_I(target_ip)->i_nlink == 1)
+		agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino);
+
+	if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER &&
+	    agi_locks[0] > agi_locks[1])
+		swap(agi_locks[0], agi_locks[1]);
+
+	if (agi_locks[0] != NULLAGNUMBER) {
+		error = xfs_read_agi(mp, tp, agi_locks[0], &bp);
+		if (error)
+			return error;
+	}
+
+	if (agi_locks[1] != NULLAGNUMBER) {
+		error = xfs_read_agi(mp, tp, agi_locks[1], &bp);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /*
  * xfs_rename
  */
@@ -3130,6 +3172,10 @@ xfs_rename(
 		}
 	}
 
+	error = xfs_rename_lock_agis(tp, wip, target_ip);
+	if (error)
+		return error;
+
 	/*
 	 * Directory entry creation below may acquire the AGF. Remove
 	 * the whiteout from the unlinked list first to preserve correct

Thread overview: 3+ messages
2020-12-11 23:42 ` [Bug report] overlayfs over xfs whiteout operation may cause deadlock Darrick J. Wong
2020-12-17 21:11     ` Darrick J. Wong [this message]
2020-12-22  1:29       ` wenli xie
