linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiao Ni <xni@redhat.com>
To: song@kernel.org
Cc: yukuai1@huaweicloud.com, bmarzins@redhat.com, heinzm@redhat.com,
	snitzer@kernel.org, ncroxon@redhat.com, neilb@suse.de,
	linux-raid@vger.kernel.org, dm-devel@lists.linux.dev
Subject: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
Date: Tue, 20 Feb 2024 23:30:56 +0800	[thread overview]
Message-ID: <20240220153059.11233-2-xni@redhat.com> (raw)
In-Reply-To: <20240220153059.11233-1-xni@redhat.com>

MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
dmraid stopped sync thread directy by calling md_reap_sync_thread.
After this patch dmraid stops sync thread asynchronously as md does.
This is right. Now the dmraid stop process is like this:

1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
is cleared
2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
3. md thread calls md_check_recovery (This is the place to reap sync
thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
thread)
4. raid_dtr stops/free struct mddev and release dmraid related resources

dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
this bit when stopping the dmraid before stopping sync thread.

But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
cleared before stopping sync thread. It's the reason stop_sync_thread only
wakes up task. If the task isn't running, it still needs to wake up sync
thread too.

This deadlock can be reproduced 100% by these commands:
modprobe brd rd_size=34816 rd_nr=5
while [ 1 ]; do
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff
sleep 1
done

Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/dm-raid.c | 2 ++
 drivers/md/md.c      | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..325767c1140f 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
 	struct raid_set *rs = ti->private;
 
 	if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+		if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+			clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
 		/* Writes have to be stopped before suspending to avoid deadlocks. */
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..54790261254d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
 	 * never happen
 	 */
 	md_wakeup_thread_directly(mddev->sync_thread);
+	md_wakeup_thread(mddev->sync_thread);
 	if (work_pending(&mddev->sync_work))
 		flush_work(&mddev->sync_work);
 
-- 
2.32.0 (Apple Git-132)


  reply	other threads:[~2024-02-20 15:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-20 15:30 [PATCH RFC V2 0/4] Fix regression bugs Xiao Ni
2024-02-20 15:30 ` Xiao Ni [this message]
2024-02-23  3:31   ` [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Yu Kuai
2024-02-23 13:20     ` Xiao Ni
2024-02-26  1:31       ` Yu Kuai
2024-02-26  5:12         ` Xiao Ni
2024-02-26  9:36           ` Yu Kuai
2024-02-27  7:16             ` Xiao Ni
2024-02-27  7:39               ` Yu Kuai
2024-03-01  4:18             ` Xiao Ni
2024-02-23 10:31   ` Yu Kuai
2024-02-23 13:40     ` Xiao Ni
2024-02-20 15:30 ` [PATCH RFC 2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
2024-02-23  3:12   ` Yu Kuai
2024-02-23  3:58     ` Song Liu
2024-02-20 15:30 ` [PATCH RFC 3/4] md: Missing decrease active_io for flush io Xiao Ni
2024-02-23  3:06   ` Yu Kuai
2024-02-23 13:49     ` Xiao Ni
2024-02-20 15:30 ` [PATCH RFC V2 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
2024-02-23  3:08   ` Yu Kuai
2024-02-23 14:00     ` Xiao Ni
2024-02-21  5:45 ` [PATCH RFC V2 0/4] Fix regression bugs Benjamin Marzinski
2024-02-23  2:42   ` Xiao Ni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240220153059.11233-2-xni@redhat.com \
    --to=xni@redhat.com \
    --cc=bmarzins@redhat.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=heinzm@redhat.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=ncroxon@redhat.com \
    --cc=neilb@suse.de \
    --cc=snitzer@kernel.org \
    --cc=song@kernel.org \
    --cc=yukuai1@huaweicloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).