* [PATCH 0/6] Fix dmraid regression bugs
@ 2024-02-29 15:49 Xiao Ni
2024-02-29 15:49 ` [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
Hi all
This patch set tries to fix the dmraid regression problems that were
introduced recently. After talking with Kuai, who also sent a patch set
to fix these dmraid regressions, we decided to use a small patch set to
fix them. This patch set is based on Song's md-6.8 branch.
This patch set has six patches. The first three are reverts. The fourth
and fifth ones resolve deadlock problems; with these two patches, most of
the deadlocks are resolved. The last one fixes the raid5 reshape deadlock
problem.
I have run the lvm2 regression test. There are 4 failed cases:
shell/dmsetup-integrity-keys.sh
shell/lvresize-fs-crypt.sh
shell/pvck-dump.sh
shell/select-report.sh
lvconvert-raid-reshape.sh can also fail sometimes, but it fails on the
6.6 kernel too, so this brings us back to the same state as the 6.6 kernel.
Xiao Ni (6):
Revert "md: Don't register sync_thread for reshape directly"
Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
Revert "md: Don't ignore suspended array in md_check_recovery()"
dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
md: Set MD_RECOVERY_FROZEN before stop sync thread
md/raid5: Don't check crossing reshape when reshape hasn't started
drivers/md/dm-raid.c | 2 ++
drivers/md/md.c | 22 +++++++++----------
drivers/md/raid10.c | 16 ++++++++++++--
drivers/md/raid5.c | 51 ++++++++++++++++++++++++++++++++------------
4 files changed, 63 insertions(+), 28 deletions(-)
--
2.32.0 (Apple Git-132)
* [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly"
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-03-01 2:38 ` Yu Kuai
2024-02-29 15:49 ` [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE" Xiao Ni
` (6 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
This reverts commit ad39c08186f8a0f221337985036ba86731d6aafe.
Function stop_sync_thread only wakes up the sync task directly
(md_wakeup_thread_directly). It also needs to wake up the sync thread
via md_wakeup_thread. This problem will be fixed in a following patch.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 5 +----
drivers/md/raid10.c | 16 ++++++++++++++--
drivers/md/raid5.c | 29 +++++++++++++++++++++++++++--
3 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9e41a9aaba8b..db4743ba7f6c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9376,7 +9376,6 @@ static void md_start_sync(struct work_struct *ws)
struct mddev *mddev = container_of(ws, struct mddev, sync_work);
int spares = 0;
bool suspend = false;
- char *name;
/*
* If reshape is still in progress, spares won't be added or removed
@@ -9414,10 +9413,8 @@ static void md_start_sync(struct work_struct *ws)
if (spares)
md_bitmap_write_all(mddev->bitmap);
- name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
- "reshape" : "resync";
rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, name));
+ md_register_thread(md_do_sync, mddev, "resync"));
if (!mddev->sync_thread) {
pr_warn("%s: could not start resync thread...\n",
mdname(mddev));
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7412066ea22c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4175,7 +4175,11 @@ static int raid10_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+ rcu_assign_pointer(mddev->sync_thread,
+ md_register_thread(md_do_sync, mddev, "reshape"));
+ if (!mddev->sync_thread)
+ goto out_free_conf;
}
return 0;
@@ -4569,8 +4573,16 @@ static int raid10_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+
+ rcu_assign_pointer(mddev->sync_thread,
+ md_register_thread(md_do_sync, mddev, "reshape"));
+ if (!mddev->sync_thread) {
+ ret = -EAGAIN;
+ goto abort;
+ }
conf->reshape_checkpoint = jiffies;
+ md_wakeup_thread(mddev->sync_thread);
md_new_event();
return 0;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6a7a32f7fb91..8497880135ee 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7936,7 +7936,11 @@ static int raid5_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+ rcu_assign_pointer(mddev->sync_thread,
+ md_register_thread(md_do_sync, mddev, "reshape"));
+ if (!mddev->sync_thread)
+ goto abort;
}
/* Ok, everything is just fine now */
@@ -8502,8 +8506,29 @@ static int raid5_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+ rcu_assign_pointer(mddev->sync_thread,
+ md_register_thread(md_do_sync, mddev, "reshape"));
+ if (!mddev->sync_thread) {
+ mddev->recovery = 0;
+ spin_lock_irq(&conf->device_lock);
+ write_seqcount_begin(&conf->gen_lock);
+ mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
+ mddev->new_chunk_sectors =
+ conf->chunk_sectors = conf->prev_chunk_sectors;
+ mddev->new_layout = conf->algorithm = conf->prev_algo;
+ rdev_for_each(rdev, mddev)
+ rdev->new_data_offset = rdev->data_offset;
+ smp_wmb();
+ conf->generation--;
+ conf->reshape_progress = MaxSector;
+ mddev->reshape_position = MaxSector;
+ write_seqcount_end(&conf->gen_lock);
+ spin_unlock_irq(&conf->device_lock);
+ return -EAGAIN;
+ }
conf->reshape_checkpoint = jiffies;
+ md_wakeup_thread(mddev->sync_thread);
md_new_event();
return 0;
}
--
2.32.0 (Apple Git-132)
* [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
2024-02-29 15:49 ` [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-02-29 22:53 ` Song Liu
2024-02-29 15:49 ` [PATCH 3/6] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
` (5 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping the dmraid device.
The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
this problem.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index db4743ba7f6c..6376b1aad4d9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8792,16 +8792,12 @@ void md_do_sync(struct md_thread *thread)
int ret;
/* just incase thread restarts... */
- if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
+ if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
return;
-
- if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
- goto skip;
-
- if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
- !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+ if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- goto skip;
+ return;
}
if (mddev_is_clustered(mddev)) {
--
2.32.0 (Apple Git-132)
* [PATCH 3/6] md: Revert "md: Don't ignore suspended array in md_check_recovery()"
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
2024-02-29 15:49 ` [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
2024-02-29 15:49 ` [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE" Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-02-29 15:49 ` [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
` (4 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
This reverts commit 1baae052cccd08daf9a9d64c3f959d8cdb689757.
For dmraid, no I/O is allowed, including sync I/O, while the array is
suspended. Although the reverted patch is a simple change, supporting it
for dmraid still needs more work. Right now we are trying to fix
regression problems, so let's keep the changes as small as we can. We can
revisit this in the future.
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6376b1aad4d9..79dfc015c322 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9492,6 +9492,9 @@ static void unregister_sync_thread(struct mddev *mddev)
*/
void md_check_recovery(struct mddev *mddev)
{
+ if (READ_ONCE(mddev->suspended))
+ return;
+
if (mddev->bitmap)
md_bitmap_daemon_work(mddev);
--
2.32.0 (Apple Git-132)
* [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
` (2 preceding siblings ...)
2024-02-29 15:49 ` [PATCH 3/6] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-03-01 2:44 ` Yu Kuai
2024-02-29 15:49 ` [PATCH 5/6] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
MD_RECOVERY_WAIT is used by dmraid to delay the reshape process; it was
added by commit 644e2537fdc7 ("dm raid: fix stripe adding reshape
deadlock"). Before commit f52f5c71f3d4 ("md: fix stopping sync thread"),
dmraid stopped the sync thread directly by calling md_reap_sync_thread.
After that patch, dmraid stops the sync thread asynchronously, as md does,
which is the right thing to do. Now the dmraid stop process looks like
this:
1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
stop_sync_thread sets MD_RECOVERY_INTR and waits until MD_RECOVERY_RUNNING
is cleared.
2. md_do_sync finds MD_RECOVERY_WAIT set and returns. (This is the root
cause of the deadlock: we expect md_do_sync to set MD_RECOVERY_DONE.)
3. The md thread calls md_check_recovery (this is the place where the sync
thread is reaped; because MD_RECOVERY_DONE is not set, the md thread can't
reap the sync thread).
4. raid_dtr stops/frees struct mddev and releases dmraid-related resources.
dmraid only sets MD_RECOVERY_WAIT but never clears it. It needs to clear
this bit when stopping the dmraid device, before stopping the sync thread.
But the deadlock can still happen sometimes even when MD_RECOVERY_WAIT is
cleared before stopping the sync thread, because stop_sync_thread only
wakes up the sync task directly; if the task isn't running, it also needs
to wake up the sync thread.
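Roughly, the hang looks like this (a simplified sketch paraphrasing the
steps above; the real md functions contain more logic than shown here):
    /* dm-raid teardown */
    stop_sync_thread()
        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
        md_wakeup_thread_directly(mddev->sync_thread);
        wait_event(resync_wait,
                   !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
        /* never satisfied, see below */
    /* sync thread */
    md_do_sync()
        if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
                return;  /* returns without setting MD_RECOVERY_DONE */
    /* md thread */
    md_check_recovery()
        /* MD_RECOVERY_DONE is not set, so md_reap_sync_thread is never
         * called, MD_RECOVERY_RUNNING is never cleared and
         * stop_sync_thread above waits forever */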
This deadlock can be reproduced 100% by these commands:
modprobe brd rd_size=34816 rd_nr=5
while [ 1 ]; do
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff
sleep 1
done
Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/dm-raid.c | 2 ++
drivers/md/md.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..325767c1140f 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
struct raid_set *rs = ti->private;
if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+ if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+ clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
/* Writes have to be stopped before suspending to avoid deadlocks. */
if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
md_stop_writes(&rs->md);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 79dfc015c322..f264749be28b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4908,6 +4908,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
* never happen
*/
md_wakeup_thread_directly(mddev->sync_thread);
+ md_wakeup_thread(mddev->sync_thread);
if (work_pending(&mddev->sync_work))
flush_work(&mddev->sync_work);
--
2.32.0 (Apple Git-132)
* [PATCH 5/6] md: Set MD_RECOVERY_FROZEN before stop sync thread
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
` (3 preceding siblings ...)
2024-02-29 15:49 ` [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-02-29 15:49 ` [PATCH 6/6] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
` (2 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
After commit f52f5c71f3d4 ("md: fix stopping sync thread"), dmraid stops
the sync thread asynchronously. The call chain is:
dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr
raid_postsuspend does two jobs: first it stops the sync thread, then it
suspends the array. It can now stop the sync thread successfully, but it
no longer sets MD_RECOVERY_FROZEN (this behaviour was introduced by
f52f5c71f3d4), so after raid_postsuspend the sync thread starts again.
raid_dtr can't stop the sync thread because the array is already
suspended.
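Roughly (a simplified sketch of the sequence described above, not the
exact code paths):
    raid_postsuspend()
        md_stop_writes()    /* stops the running sync thread, but
                             * MD_RECOVERY_FROZEN is not set */
        /* nothing prevents md_check_recovery() -> md_start_sync() from
         * registering a new sync thread afterwards */
    raid_dtr()
        /* too late: the array is already suspended, so the new sync
         * thread can't be stopped here */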
This can be reproduced easily by these commands:
while [ 1 ]; do
vgcreate test_vg /dev/loop0 /dev/loop1
lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
lvchange -an test_vg
vgremove test_vg -ff
done
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/md.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f264749be28b..cf15ccf0e27b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6341,6 +6341,7 @@ static void __md_stop_writes(struct mddev *mddev)
void md_stop_writes(struct mddev *mddev)
{
mddev_lock_nointr(mddev);
+ set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
__md_stop_writes(mddev);
mddev_unlock(mddev);
}
--
2.32.0 (Apple Git-132)
* [PATCH 6/6] md/raid5: Don't check crossing reshape when reshape hasn't started
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
` (4 preceding siblings ...)
2024-02-29 15:49 ` [PATCH 5/6] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
@ 2024-02-29 15:49 ` Xiao Ni
2024-02-29 19:39 ` [PATCH 0/6] Fix dmraid regression bugs Christoph Hellwig
2024-03-01 2:12 ` Yu Kuai
7 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-02-29 15:49 UTC (permalink / raw)
To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
stripe_ahead_of_reshape is used to check whether a stripe region crosses
the reshape position. So first, rename the function to
stripe_across_reshape to describe its purpose.
For a backwards reshape, the reshape starts from the end of the array and
conf->reshape_progress is initialized to raid5_size. During reshape, if
previous is true (set in make_stripe_request) and max_sector >=
conf->reshape_progress, I/Os should wait until the reshape window moves
forward. But I/Os don't need to wait if conf->reshape_progress is still
raid5_size, i.e. the reshape window hasn't started moving yet.
Also, put the conditions into the function directly to make the code
easier to understand.
This can be reproduced easily by the lvm2 test
shell/lvconvert-raid-reshape.sh. For a dm raid reshape, the table needs to
be reloaded several times before the sync thread is started. In one of
those reloads, dm raid uses MD_RECOVERY_WAIT to delay the reshape and
doesn't start the sync thread. Then an I/O comes in and waits, because
stripe_ahead_of_reshape returns true: it's a backwards reshape and
max_sector >= conf->reshape_progress. But the reshape hasn't actually
started, so skip this check when reshape_progress is raid5_size.
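Condensed, the check in stripe_across_reshape below boils down to (this
is just a restatement of the diff, not a separate implementation):
    if (mddev->reshape_backwards)
        /* only wait once the reshape window has actually started moving */
        ret = max_sector >= conf->reshape_progress &&
              conf->reshape_progress != raid5_size(mddev, 0, 0);
    else
        ret = min_sector < conf->reshape_progress;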
Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
drivers/md/raid5.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8497880135ee..965991a3104f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5832,17 +5832,12 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
sector >= reshape_sector;
}
-static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
- sector_t max, sector_t reshape_sector)
-{
- return mddev->reshape_backwards ? max < reshape_sector :
- min >= reshape_sector;
-}
-
-static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
+static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
+static bool stripe_across_reshape(struct mddev *mddev, struct r5conf *conf,
struct stripe_head *sh)
{
sector_t max_sector = 0, min_sector = MaxSector;
+ sector_t reshape_pos = 0;
bool ret = false;
int dd_idx;
@@ -5856,9 +5851,12 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
spin_lock_irq(&conf->device_lock);
- if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
- conf->reshape_progress))
- /* mismatch, need to try again */
+ reshape_pos = conf->reshape_progress;
+ if (mddev->reshape_backwards) {
+ if (max_sector >= reshape_pos &&
+ reshape_pos != raid5_size(mddev, 0, 0))
+ ret = true;
+ } else if (min_sector < reshape_pos)
ret = true;
spin_unlock_irq(&conf->device_lock);
@@ -5969,7 +5967,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
}
if (unlikely(previous) &&
- stripe_ahead_of_reshape(mddev, conf, sh)) {
+ stripe_across_reshape(mddev, conf, sh)) {
/*
* Expansion moved on while waiting for a stripe.
* Expansion could still move past after this
--
2.32.0 (Apple Git-132)
* Re: [PATCH 0/6] Fix dmraid regression bugs
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
` (5 preceding siblings ...)
2024-02-29 15:49 ` [PATCH 6/6] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
@ 2024-02-29 19:39 ` Christoph Hellwig
2024-02-29 19:45 ` Song Liu
2024-03-01 2:12 ` Yu Kuai
7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-02-29 19:39 UTC (permalink / raw)
To: Xiao Ni
Cc: song, yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid,
dm-devel
If I run this on the md/md-6.9-for-hch branch, all the hangs I was
previously seeing in the lvm2 test suite are gone. Still a bunch of
failures, though:
### 427 tests: 284 passed, 127 skipped, 0 timed out, 3 warned, 13 failed
* Re: [PATCH 0/6] Fix dmraid regression bugs
2024-02-29 19:39 ` [PATCH 0/6] Fix dmraid regression bugs Christoph Hellwig
@ 2024-02-29 19:45 ` Song Liu
0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2024-02-29 19:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Xiao Ni, yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid,
dm-devel
On Thu, Feb 29, 2024 at 11:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> If I rund this on the md/md-6.9-for-hch branch all the hangs I was
> previously seeing in the lvm2 test suite are gone. Still a bunchof
> failures, though:
>
> ### 427 tests: 284 passed, 127 skipped, 0 timed out, 3 warned, 13 failed
Yes, this set fixes the issues we are seeing with lvm2 tests. However,
it triggers some other issue. I am looking into it.
Thanks,
Song
* Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
2024-02-29 15:49 ` [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE" Xiao Ni
@ 2024-02-29 22:53 ` Song Liu
2024-02-29 23:45 ` Song Liu
0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2024-02-29 22:53 UTC (permalink / raw)
To: Xiao Ni; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
>
> This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
>
> The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> this problem.
>
> Signed-off-by: Xiao Ni <xni@redhat.com>
I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
variation of it. Otherwise, we may hit the following deadlock. The test vm here
has 2 raid arrays: one raid5 with journal, and a raid1.
I pushed other patches in the set to the md-6.9-for-hch branch for
further tests.
Thanks,
Song
[ 250.347646] INFO: task systemd-udevd:546 blocked for more than 122 seconds.
[ 250.348443] Not tainted 6.8.0-rc3+ #479
[ 250.348912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 250.349741] task:systemd-udevd state:D stack:27136 pid:546
tgid:546 ppid:525 flags:0x00000000
[ 250.350740] Call Trace:
[ 250.351043] <TASK>
[ 250.351310] __schedule+0x862/0x19b0
[ 250.351770] ? __pfx___schedule+0x10/0x10
[ 250.352222] ? lock_release+0x250/0x690
[ 250.352657] ? __pfx_lock_release+0x10/0x10
[ 250.353128] ? mark_held_locks+0x62/0x90
[ 250.353604] schedule+0x77/0x200
[ 250.353976] md_handle_request+0x1fe/0x650
[ 250.354459] ? __pfx_md_handle_request+0x10/0x10
[ 250.354957] ? bio_split_to_limits+0x131/0x150
[ 250.355456] ? __pfx_autoremove_wake_function+0x10/0x10
[ 250.356031] ? lock_is_held_type+0xda/0x130
[ 250.356515] __submit_bio+0x99/0xe0
[ 250.356910] submit_bio_noacct_nocheck+0x25a/0x570
[ 250.357510] ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[ 250.358080] ? __might_resched+0x274/0x350
[ 250.358546] ? submit_bio_noacct+0x1b7/0x6c0
[ 250.359067] mpage_readahead+0x25b/0x300
[ 250.359507] ? __pfx_mpage_readahead+0x10/0x10
[ 250.359986] ? __pfx___lock_acquire+0x10/0x10
[ 250.360524] ? __pfx_blkdev_get_block+0x10/0x10
[ 250.361046] ? __pfx_lock_release+0x10/0x10
[ 250.361602] ? __pfx___filemap_add_folio+0x10/0x10
[ 250.362250] ? lock_is_held_type+0xda/0x130
[ 250.362785] read_pages+0xfd/0x650
[ 250.363173] ? __pfx_read_pages+0x10/0x10
[ 250.363685] page_cache_ra_unbounded+0x1df/0x2d0
[ 250.364228] force_page_cache_ra+0x11e/0x150
[ 250.364716] filemap_get_pages+0x6f1/0xbb0
[ 250.365218] ? __pfx_filemap_get_pages+0x10/0x10
[ 250.365735] ? lock_is_held_type+0xda/0x130
[ 250.366266] filemap_read+0x216/0x6a0
[ 250.366679] ? __pfx_mark_lock+0x10/0x10
[ 250.367132] ? __pfx_ptep_set_access_flags+0x10/0x10
[ 250.367765] ? __pfx_filemap_read+0x10/0x10
[ 250.368234] ? __lock_acquire+0x959/0x3540
[ 250.368756] blkdev_read_iter+0xc0/0x230
[ 250.369200] vfs_read+0x38c/0x540
[ 250.369581] ? __pfx_vfs_read+0x10/0x10
[ 250.370038] ? __fget_light+0x96/0xd0
[ 250.370469] ksys_read+0xcb/0x170
[ 250.370839] ? __pfx_ksys_read+0x10/0x10
[ 250.371320] do_syscall_64+0x7a/0x1a0
[ 250.371735] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 250.372367] RIP: 0033:0x7fcb590118b2
[ 250.372865] RSP: 002b:00007ffcdd5f9c18 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 250.373840] RAX: ffffffffffffffda RBX: 0000555885985010 RCX: 00007fcb590118b2
[ 250.374641] RDX: 0000000000000040 RSI: 0000555885985038 RDI: 0000000000000011
[ 250.375437] RBP: 000055588599fd40 R08: 0000555885985010 R09: 000055588596c010
[ 250.376222] R10: 00007fcb58fbfbc0 R11: 0000000000000246 R12: 00000000804f0000
[ 250.376974] R13: 0000000000000040 R14: 000055588599fd90 R15: 0000555885985028
[ 250.377811] </TASK>
[ 250.378073] INFO: task mdadm:562 blocked for more than 122 seconds.
[ 250.378753] Not tainted 6.8.0-rc3+ #479
[ 250.379237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 250.380055] task:mdadm state:D stack:25872 pid:562
tgid:562 ppid:543 flags:0x00004000
[ 250.381071] Call Trace:
[ 250.381369] <TASK>
[ 250.381625] __schedule+0x862/0x19b0
[ 250.382054] ? __pfx___schedule+0x10/0x10
[ 250.382502] ? lock_release+0x250/0x690
[ 250.382943] ? __pfx_lock_release+0x10/0x10
[ 250.383407] ? mark_held_locks+0x24/0x90
[ 250.383851] ? lockdep_hardirqs_on+0x7d/0x100
[ 250.384345] ? preempt_count_sub+0x18/0xd0
[ 250.384806] ? _raw_spin_unlock_irqrestore+0x3f/0x60
[ 250.385358] schedule+0x77/0x200
[ 250.385718] md_ioctl+0x1750/0x1d60
[ 250.386114] ? __pfx_md_ioctl+0x10/0x10
[ 250.386535] ? _raw_spin_unlock_irqrestore+0x34/0x60
[ 250.387063] ? lockdep_hardirqs_on+0x7d/0x100
[ 250.387567] ? preempt_count_sub+0x18/0xd0
[ 250.388024] ? populate_seccomp_data+0x184/0x220
[ 250.388522] ? __pfx_autoremove_wake_function+0x10/0x10
[ 250.389083] ? __seccomp_filter+0x102/0x760
[ 250.389553] blkdev_ioctl+0x1f1/0x3c0
[ 250.389956] ? __pfx_blkdev_ioctl+0x10/0x10
[ 250.390441] __x64_sys_ioctl+0xc6/0x100
[ 250.390880] do_syscall_64+0x7a/0x1a0
[ 250.391313] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 250.391877] RIP: 0033:0x7fd88eef362b
[ 250.392290] RSP: 002b:00007fff8c298438 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 250.393098] RAX: ffffffffffffffda RBX: 000055e1b77a2300 RCX: 00007fd88eef362b
[ 250.393896] RDX: 00007fff8c2985a8 RSI: 0000000040140921 RDI: 0000000000000004
[ 250.394664] RBP: 0000000000000005 R08: 000000000000001e R09: 00007fff8c298197
[ 250.395457] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 250.396223] R13: 000055e1b77a4c70 R14: 00007fff8c2984f8 R15: 000055e1b77a46d0
[ 250.397050] </TASK>
[ 250.397357]
[ 250.397357] Showing all locks held in the system:
[ 250.398092] 1 lock held by khungtaskd/211:
[ 250.398535] #0: ffffffff87f6fea0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x4d/0x230
[ 250.399613] 1 lock held by systemd-journal/499:
[ 250.400124] 1 lock held by systemd-udevd/546:
[ 250.400616] #0: ffff88801461d178
(mapping.invalidate_lock){.+.+}-{3:3}, at:
page_cache_ra_unbounded+0xa4/0x2d0
[ 250.401701]
[ 250.401882] =============================================
[ 250.401882]
[ 250.402618] Kernel panic - not syncing: hung_task: blocked tasks
[ 250.403294] CPU: 2 PID: 211 Comm: khungtaskd Not tainted 6.8.0-rc3+ #479
[ 250.404046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 250.405264] Call Trace:
[ 250.405537] <TASK>
[ 250.405776] dump_stack_lvl+0x4a/0x80
[ 250.406185] panic+0x41c/0x460
[ 250.406592] ? __pfx_panic+0x10/0x10
[ 250.407167] ? lock_release+0x205/0x690
[ 250.407713] ? preempt_count_sub+0x18/0xd0
[ 250.408273] watchdog+0x9af/0x9b0
[ 250.408673] ? __pfx_watchdog+0x10/0x10
[ 250.409097] kthread+0x1b1/0x1f0
[ 250.409476] ? kthread+0xf6/0x1f0
[ 250.409849] ? __pfx_kthread+0x10/0x10
[ 250.410276] ret_from_fork+0x31/0x60
[ 250.410704] ? __pfx_kthread+0x10/0x10
[ 250.411123] ret_from_fork_asm+0x1b/0x30
[ 250.411604] </TASK>
[ 250.412330] Kernel Offset: disabled
[ 250.412802] ---[ end Kernel panic - not syncing: hung_task: blocked
tasks ]---
* Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
2024-02-29 22:53 ` Song Liu
@ 2024-02-29 23:45 ` Song Liu
2024-03-01 0:49 ` Xiao Ni
0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2024-02-29 23:45 UTC (permalink / raw)
To: Xiao Ni; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
>
> On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> >
> > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> >
> > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > this problem.
> >
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> variation of it. Otherwise, we may hit the following deadlock. The test vm here
> has 2 raid arrays: one raid5 with journal, and a raid1.
>
> I pushed other patches in the set to the md-6.9-for-hch branch for
> further tests.
Actually, it appears the md-6.9-for-hch branch still has this problem.
Let me test more...
Song
* Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
2024-02-29 23:45 ` Song Liu
@ 2024-03-01 0:49 ` Xiao Ni
2024-03-01 1:11 ` Song Liu
0 siblings, 1 reply; 19+ messages in thread
From: Xiao Ni @ 2024-03-01 0:49 UTC (permalink / raw)
To: Song Liu
Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@kernel.org> wrote:
>
> On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
> >
> > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> > >
> > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > >
> > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > this problem.
> > >
> > > Signed-off-by: Xiao Ni <xni@redhat.com>
> >
> > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > has 2 raid arrays: one raid5 with journal, and a raid1.
> >
> > I pushed other patches in the set to the md-6.9-for-hch branch for
> > further tests.
>
> Actually, it appears md-6.9-for-hch branch still has this problem. Let me test
> more..
>
> Song
>
Hi Song
What are the commands you use for testing? Can you reproduce it with
the 6.6 kernel?
Regards
Xiao
* Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
2024-03-01 0:49 ` Xiao Ni
@ 2024-03-01 1:11 ` Song Liu
0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2024-03-01 1:11 UTC (permalink / raw)
To: Xiao Ni; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel
On Thu, Feb 29, 2024 at 4:49 PM Xiao Ni <xni@redhat.com> wrote:
>
> On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@kernel.org> wrote:
> >
> > On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> > > >
> > > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > > >
> > > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > > this problem.
> > > >
> > > > Signed-off-by: Xiao Ni <xni@redhat.com>
> > >
> > > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > > has 2 raid arrays: one raid5 with journal, and a raid1.
> > >
> > > I pushed other patches in the set to the md-6.9-for-hch branch for
> > > further tests.
> >
> > Actually, it appears md-6.9-for-hch branch still has this problem. Let me test
> > more..
> >
> > Song
> >
>
> Hi Song
>
> What are the commands you use for testing? Can you reproduce it with
> the 6.6 kernel?
The VM has these two arrays assembled automatically on boot. I can repro
the issue by simply rebooting the VM (which triggers a stop of both
arrays). So the repro is basically rebooting the VM in a loop via ssh.
For this branch,
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch
which has 5 of the 6 patches in this set, I can reproduce the issue. This issue
doesn't happen on commit aee93ec0ec79, which is before this set.
Song
* Re: [PATCH 0/6] Fix dmraid regression bugs
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
` (6 preceding siblings ...)
2024-02-29 19:39 ` [PATCH 0/6] Fix dmraid regression bugs Christoph Hellwig
@ 2024-03-01 2:12 ` Yu Kuai
2024-03-01 2:22 ` Xiao Ni
7 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2024-03-01 2:12 UTC (permalink / raw)
To: Xiao Ni, song
Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
Hi,
On 2024/02/29 23:49, Xiao Ni wrote:
> Hi all
>
> This patch set tries to fix dmraid regression problems when we recently.
> After talking with Kuai who also sent a patch set which is used to fix
> dmraid regression problems, we decide to use a small patch set to fix
> these regression problems. This patch is based on song's md-6.8 branch.
>
> This patch set has six patches. It reverts three patches. The fourth one
> and the fifth one resolve deadlock problems. With these two patches, it
> can resolve most deadlock problem. The last one fixes the raid5 reshape
> deadlock problem.
>
> I have run lvm2 regression test. There are 4 failed cases:
> shell/dmsetup-integrity-keys.sh
> shell/lvresize-fs-crypt.sh
> shell/pvck-dump.sh
> shell/select-report.sh
You might need to run the test suite in a loop to make sure there are no
tests that will fail occasionally.
Thanks,
Kuai
>
> And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
> kernel too. So it can return back to the same state with 6.6 kernel.
>
> Xiao Ni (6):
> Revert "md: Don't register sync_thread for reshape directly"
> Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
> Revert "md: Don't ignore suspended array in md_check_recovery()"
> dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
> md: Set MD_RECOVERY_FROZEN before stop sync thread
> md/raid5: Don't check crossing reshape when reshape hasn't started
>
> drivers/md/dm-raid.c | 2 ++
> drivers/md/md.c | 22 +++++++++----------
> drivers/md/raid10.c | 16 ++++++++++++--
> drivers/md/raid5.c | 51 ++++++++++++++++++++++++++++++++------------
> 4 files changed, 63 insertions(+), 28 deletions(-)
>
* Re: [PATCH 0/6] Fix dmraid regression bugs
2024-03-01 2:12 ` Yu Kuai
@ 2024-03-01 2:22 ` Xiao Ni
0 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-03-01 2:22 UTC (permalink / raw)
To: Yu Kuai
Cc: song, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
On Fri, Mar 1, 2024 at 10:12 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/29 23:49, Xiao Ni 写道:
> > Hi all
> >
> > This patch set tries to fix dmraid regression problems when we recently.
> > After talking with Kuai who also sent a patch set which is used to fix
> > dmraid regression problems, we decide to use a small patch set to fix
> > these regression problems. This patch is based on song's md-6.8 branch.
> >
> > This patch set has six patches. It reverts three patches. The fourth one
> > and the fifth one resolve deadlock problems. With these two patches, it
> > can resolve most deadlock problem. The last one fixes the raid5 reshape
> > deadlock problem.
> >
> > I have run lvm2 regression test. There are 4 failed cases:
> > shell/dmsetup-integrity-keys.sh
> > shell/lvresize-fs-crypt.sh
> > shell/pvck-dump.sh
> > shell/select-report.sh
>
> You might need to run the test suite in a loop to make sure there are no
> tests that will fail occasionally.
I'll let the tests run today to check if there are more errors.
Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
> > kernel too. So it can return back to the same state with 6.6 kernel.
> >
> > Xiao Ni (6):
> > Revert "md: Don't register sync_thread for reshape directly"
> > Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
> > Revert "md: Don't ignore suspended array in md_check_recovery()"
> > dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
> > md: Set MD_RECOVERY_FROZEN before stop sync thread
> > md/raid5: Don't check crossing reshape when reshape hasn't started
> >
> > drivers/md/dm-raid.c | 2 ++
> > drivers/md/md.c | 22 +++++++++----------
> > drivers/md/raid10.c | 16 ++++++++++++--
> > drivers/md/raid5.c | 51 ++++++++++++++++++++++++++++++++------------
> > 4 files changed, 63 insertions(+), 28 deletions(-)
> >
>
* Re: [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly"
2024-02-29 15:49 ` [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
@ 2024-03-01 2:38 ` Yu Kuai
2024-03-01 4:41 ` Xiao Ni
0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2024-03-01 2:38 UTC (permalink / raw)
To: Xiao Ni, song
Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
Hi,
On 2024/02/29 23:49, Xiao Ni wrote:
> This reverts commit ad39c08186f8a0f221337985036ba86731d6aafe.
>
> Function stop_sync_thread only wakes up sync task. It also needs to
> wake up sync thread. This problem will be fixed in the following
> patch.
I don't think so. Unlike mddev->thread, sync_thread will only be executed
once, and it must be executed each time it's registered; the caller must
make sure to wake up the registered sync_thread.
Thanks,
Kuai
>
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> drivers/md/md.c | 5 +----
> drivers/md/raid10.c | 16 ++++++++++++++--
> drivers/md/raid5.c | 29 +++++++++++++++++++++++++++--
> 3 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 9e41a9aaba8b..db4743ba7f6c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9376,7 +9376,6 @@ static void md_start_sync(struct work_struct *ws)
> struct mddev *mddev = container_of(ws, struct mddev, sync_work);
> int spares = 0;
> bool suspend = false;
> - char *name;
>
> /*
> * If reshape is still in progress, spares won't be added or removed
> @@ -9414,10 +9413,8 @@ static void md_start_sync(struct work_struct *ws)
> if (spares)
> md_bitmap_write_all(mddev->bitmap);
>
> - name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
> - "reshape" : "resync";
> rcu_assign_pointer(mddev->sync_thread,
> - md_register_thread(md_do_sync, mddev, name));
> + md_register_thread(md_do_sync, mddev, "resync"));
> if (!mddev->sync_thread) {
> pr_warn("%s: could not start resync thread...\n",
> mdname(mddev));
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a5f8419e2df1..7412066ea22c 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4175,7 +4175,11 @@ static int raid10_run(struct mddev *mddev)
> clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> + rcu_assign_pointer(mddev->sync_thread,
> + md_register_thread(md_do_sync, mddev, "reshape"));
> + if (!mddev->sync_thread)
> + goto out_free_conf;
> }
>
> return 0;
> @@ -4569,8 +4573,16 @@ static int raid10_start_reshape(struct mddev *mddev)
> clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> +
> + rcu_assign_pointer(mddev->sync_thread,
> + md_register_thread(md_do_sync, mddev, "reshape"));
> + if (!mddev->sync_thread) {
> + ret = -EAGAIN;
> + goto abort;
> + }
> conf->reshape_checkpoint = jiffies;
> + md_wakeup_thread(mddev->sync_thread);
> md_new_event();
> return 0;
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 6a7a32f7fb91..8497880135ee 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7936,7 +7936,11 @@ static int raid5_run(struct mddev *mddev)
> clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> + rcu_assign_pointer(mddev->sync_thread,
> + md_register_thread(md_do_sync, mddev, "reshape"));
> + if (!mddev->sync_thread)
> + goto abort;
> }
>
> /* Ok, everything is just fine now */
> @@ -8502,8 +8506,29 @@ static int raid5_start_reshape(struct mddev *mddev)
> clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> + rcu_assign_pointer(mddev->sync_thread,
> + md_register_thread(md_do_sync, mddev, "reshape"));
> + if (!mddev->sync_thread) {
> + mddev->recovery = 0;
> + spin_lock_irq(&conf->device_lock);
> + write_seqcount_begin(&conf->gen_lock);
> + mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
> + mddev->new_chunk_sectors =
> + conf->chunk_sectors = conf->prev_chunk_sectors;
> + mddev->new_layout = conf->algorithm = conf->prev_algo;
> + rdev_for_each(rdev, mddev)
> + rdev->new_data_offset = rdev->data_offset;
> + smp_wmb();
> + conf->generation--;
> + conf->reshape_progress = MaxSector;
> + mddev->reshape_position = MaxSector;
> + write_seqcount_end(&conf->gen_lock);
> + spin_unlock_irq(&conf->device_lock);
> + return -EAGAIN;
> + }
> conf->reshape_checkpoint = jiffies;
> + md_wakeup_thread(mddev->sync_thread);
> md_new_event();
> return 0;
> }
>
* Re: [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
2024-02-29 15:49 ` [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
@ 2024-03-01 2:44 ` Yu Kuai
2024-03-01 4:19 ` Xiao Ni
0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2024-03-01 2:44 UTC (permalink / raw)
To: Xiao Ni, song
Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
Hi,
On 2024/02/29 23:49, Xiao Ni wrote:
> MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> dmraid stopped sync thread directy by calling md_reap_sync_thread.
> After this patch dmraid stops sync thread asynchronously as md does.
> This is right. Now the dmraid stop process is like this:
>
> 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> is cleared
> 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> 3. md thread calls md_check_recovery (This is the place to reap sync
> thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> thread)
> 4. raid_dtr stops/free struct mddev and release dmraid related resources
>
> dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> this bit when stopping the dmraid before stopping sync thread.
>
> But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> cleared before stopping sync thread. It's the reason stop_sync_thread only
> wakes up task. If the task isn't running, it still needs to wake up sync
> thread too.
>
> This deadlock can be reproduced 100% by these commands:
> modprobe brd rd_size=34816 rd_nr=5
> while [ 1 ]; do
> vgcreate test_vg /dev/ram*
> lvcreate --type raid5 -L 16M -n test_lv test_vg
> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> vgremove test_vg -ff
> sleep 1
> done
>
> Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> drivers/md/dm-raid.c | 2 ++
> drivers/md/md.c | 1 +
> 2 files changed, 3 insertions(+)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index eb009d6bb03a..325767c1140f 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> struct raid_set *rs = ti->private;
>
> if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
Like I mentioned in the RFC v2 patch, this really is not safe. Or do you
think I am missing something?
Of course we want the lvm2 tests to behave the same as on v6.6, but we
can't introduce a new issue that is not covered by the lvm2 tests.
Thanks,
Kuai
> /* Writes have to be stopped before suspending to avoid deadlocks. */
> if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> md_stop_writes(&rs->md);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 79dfc015c322..f264749be28b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4908,6 +4908,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> * never happen
> */
> md_wakeup_thread_directly(mddev->sync_thread);
> + md_wakeup_thread(mddev->sync_thread);
> if (work_pending(&mddev->sync_work))
> flush_work(&mddev->sync_work);
>
>
* Re: [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
2024-03-01 2:44 ` Yu Kuai
@ 2024-03-01 4:19 ` Xiao Ni
0 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-03-01 4:19 UTC (permalink / raw)
To: Yu Kuai
Cc: song, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
On Fri, Mar 1, 2024 at 10:45 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/02/29 23:49, Xiao Ni wrote:
> > MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
> > commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
> > Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
> > dmraid stopped sync thread directy by calling md_reap_sync_thread.
> > After this patch dmraid stops sync thread asynchronously as md does.
> > This is right. Now the dmraid stop process is like this:
> >
> > 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
> > stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
> > is cleared
> > 2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
> > root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
> > 3. md thread calls md_check_recovery (This is the place to reap sync
> > thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
> > thread)
> > 4. raid_dtr stops/free struct mddev and release dmraid related resources
> >
> > dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
> > this bit when stopping the dmraid before stopping sync thread.
> >
> > But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
> > cleared before stopping sync thread. It's the reason stop_sync_thread only
> > wakes up task. If the task isn't running, it still needs to wake up sync
> > thread too.
> >
> > This deadlock can be reproduced 100% by these commands:
> > modprobe brd rd_size=34816 rd_nr=5
> > while [ 1 ]; do
> > vgcreate test_vg /dev/ram*
> > lvcreate --type raid5 -L 16M -n test_lv test_vg
> > lvconvert -y --stripes 4 /dev/test_vg/test_lv
> > vgremove test_vg -ff
> > sleep 1
> > done
> >
> > Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
> > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> > drivers/md/dm-raid.c | 2 ++
> > drivers/md/md.c | 1 +
> > 2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> > index eb009d6bb03a..325767c1140f 100644
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
> > struct raid_set *rs = ti->private;
> >
> > if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> > + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> > + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>
> Like I mentioned in the RFC v2 patch, this really is not safe, or do you
> think am I missing something?
Hi Kuai
I replied on the RFC v2 email directly.
Regards
Xiao
>
> Of course we want lvm2 tests behave the same as v6.6, but we can't
> introduce new issue that is not covered by lvm2 tests.
>
> Thanks,
> Kuai
>
> > /* Writes have to be stopped before suspending to avoid deadlocks. */
> > if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
> > md_stop_writes(&rs->md);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 79dfc015c322..f264749be28b 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4908,6 +4908,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
> > * never happen
> > */
> > md_wakeup_thread_directly(mddev->sync_thread);
> > + md_wakeup_thread(mddev->sync_thread);
> > if (work_pending(&mddev->sync_work))
> > flush_work(&mddev->sync_work);
> >
> >
>
* Re: [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly"
2024-03-01 2:38 ` Yu Kuai
@ 2024-03-01 4:41 ` Xiao Ni
0 siblings, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2024-03-01 4:41 UTC (permalink / raw)
To: Yu Kuai
Cc: song, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel,
yukuai (C)
On Fri, Mar 1, 2024 at 10:38 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> > On 2024/02/29 23:49, Xiao Ni wrote:
> > This reverts commit ad39c08186f8a0f221337985036ba86731d6aafe.
> >
> > Function stop_sync_thread only wakes up sync task. It also needs to
> > wake up sync thread. This problem will be fixed in the following
> > patch.
>
> I don't think so, unlike mddev->thread, sync_thread will only be
> executed once and must be executed each time it's registered, and caller
> must make sure to wake up registered sync_thread.
Hi Kuai
I'll modify the comments. But it should also be right to wake up
mddev->sync_thread in stop_sync_thread, shouldn't it? You gave the same
patch yesterday too. I know the caller should wake up the sync thread as
well.
"However, I think the one to register sync_thread is responsible to
wake it up." I put your comments here. If I understand correctly, we
can do something like this?
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7937,6 +7937,7 @@ static int raid5_run(struct mddev *mddev)
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
rcu_assign_pointer(mddev->sync_thread,
md_register_thread(md_do_sync, mddev, "reshape"));
+ md_wakeup_thread(mddev->sync_thread);
if (!mddev->sync_thread)
goto abort;
}
At first, I didn't revert ad39c08186f8a0f221337985036ba86731d6aafe. But
with my patch set, that caused failures in the lvm2 test suite. And the
patch you gave yesterday is part of my patch 01, so I reverted it. Are you
OK with it if I change the comments and add this modification (wake up
the sync thread after registering the reshape thread)?
Best Regards
Xiao
>
> Thanks,
> Kuai
> >
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> > drivers/md/md.c | 5 +----
> > drivers/md/raid10.c | 16 ++++++++++++++--
> > drivers/md/raid5.c | 29 +++++++++++++++++++++++++++--
> > 3 files changed, 42 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 9e41a9aaba8b..db4743ba7f6c 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -9376,7 +9376,6 @@ static void md_start_sync(struct work_struct *ws)
> > struct mddev *mddev = container_of(ws, struct mddev, sync_work);
> > int spares = 0;
> > bool suspend = false;
> > - char *name;
> >
> > /*
> > * If reshape is still in progress, spares won't be added or removed
> > @@ -9414,10 +9413,8 @@ static void md_start_sync(struct work_struct *ws)
> > if (spares)
> > md_bitmap_write_all(mddev->bitmap);
> >
> > - name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
> > - "reshape" : "resync";
> > rcu_assign_pointer(mddev->sync_thread,
> > - md_register_thread(md_do_sync, mddev, name));
> > + md_register_thread(md_do_sync, mddev, "resync"));
> > if (!mddev->sync_thread) {
> > pr_warn("%s: could not start resync thread...\n",
> > mdname(mddev));
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index a5f8419e2df1..7412066ea22c 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -4175,7 +4175,11 @@ static int raid10_run(struct mddev *mddev)
> > clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> > clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> > set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> > - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> > + rcu_assign_pointer(mddev->sync_thread,
> > + md_register_thread(md_do_sync, mddev, "reshape"));
> > + if (!mddev->sync_thread)
> > + goto out_free_conf;
> > }
> >
> > return 0;
> > @@ -4569,8 +4573,16 @@ static int raid10_start_reshape(struct mddev *mddev)
> > clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> > clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> > set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> > - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> > +
> > + rcu_assign_pointer(mddev->sync_thread,
> > + md_register_thread(md_do_sync, mddev, "reshape"));
> > + if (!mddev->sync_thread) {
> > + ret = -EAGAIN;
> > + goto abort;
> > + }
> > conf->reshape_checkpoint = jiffies;
> > + md_wakeup_thread(mddev->sync_thread);
> > md_new_event();
> > return 0;
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 6a7a32f7fb91..8497880135ee 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -7936,7 +7936,11 @@ static int raid5_run(struct mddev *mddev)
> > clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> > clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> > set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> > - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> > + rcu_assign_pointer(mddev->sync_thread,
> > + md_register_thread(md_do_sync, mddev, "reshape"));
> > + if (!mddev->sync_thread)
> > + goto abort;
> > }
> >
> > /* Ok, everything is just fine now */
> > @@ -8502,8 +8506,29 @@ static int raid5_start_reshape(struct mddev *mddev)
> > clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> > clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
> > set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> > - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> > + rcu_assign_pointer(mddev->sync_thread,
> > + md_register_thread(md_do_sync, mddev, "reshape"));
> > + if (!mddev->sync_thread) {
> > + mddev->recovery = 0;
> > + spin_lock_irq(&conf->device_lock);
> > + write_seqcount_begin(&conf->gen_lock);
> > + mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
> > + mddev->new_chunk_sectors =
> > + conf->chunk_sectors = conf->prev_chunk_sectors;
> > + mddev->new_layout = conf->algorithm = conf->prev_algo;
> > + rdev_for_each(rdev, mddev)
> > + rdev->new_data_offset = rdev->data_offset;
> > + smp_wmb();
> > + conf->generation--;
> > + conf->reshape_progress = MaxSector;
> > + mddev->reshape_position = MaxSector;
> > + write_seqcount_end(&conf->gen_lock);
> > + spin_unlock_irq(&conf->device_lock);
> > + return -EAGAIN;
> > + }
> > conf->reshape_checkpoint = jiffies;
> > + md_wakeup_thread(mddev->sync_thread);
> > md_new_event();
> > return 0;
> > }
> >
>
end of thread
Thread overview: 19+ messages
2024-02-29 15:49 [PATCH 0/6] Fix dmraid regression bugs Xiao Ni
2024-02-29 15:49 ` [PATCH 1/6] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
2024-03-01 2:38 ` Yu Kuai
2024-03-01 4:41 ` Xiao Ni
2024-02-29 15:49 ` [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE" Xiao Ni
2024-02-29 22:53 ` Song Liu
2024-02-29 23:45 ` Song Liu
2024-03-01 0:49 ` Xiao Ni
2024-03-01 1:11 ` Song Liu
2024-02-29 15:49 ` [PATCH 3/6] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
2024-02-29 15:49 ` [PATCH 4/6] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid Xiao Ni
2024-03-01 2:44 ` Yu Kuai
2024-03-01 4:19 ` Xiao Ni
2024-02-29 15:49 ` [PATCH 5/6] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
2024-02-29 15:49 ` [PATCH 6/6] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
2024-02-29 19:39 ` [PATCH 0/6] Fix dmraid regression bugs Christoph Hellwig
2024-02-29 19:45 ` Song Liu
2024-03-01 2:12 ` Yu Kuai
2024-03-01 2:22 ` Xiao Ni