linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/4] Fix dmraid regression bugs
@ 2024-03-01 15:21 Xiao Ni
  2024-03-01 15:21 ` [PATCH 1/4] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-01 15:21 UTC (permalink / raw)
  To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

Hi all

This patch set tries to fix dmraid regression problems we face
recently. This patch is based on song's md-6.8 branch. 

This patch set has four patches. The first two patches revert two patches.
The third one and the fourth one resolve deadlock problems.

I have run lvm2 regression test 5 times. There are 4 failed cases:
shell/dmsetup-integrity-keys.sh
shell/lvresize-fs-crypt.sh
shell/pvck-dump.sh
shell/select-report.sh

And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
kernel too. So it can return back to the same state with 6.6 kernel.

V2:
It doesn't revert commit 82ec0ae59d02
("md: Make sure md_do_sync() will set MD_RECOVERY_DONE")
It doesn't clear MD_RECOVERY_WAIT before stopping dmraid
Re-write patch01 comment

Xiao Ni (4):
  md: Revert "md: Don't register sync_thread for reshape directly"
  md: Revert "md: Don't ignore suspended array in md_check_recovery()"
  md: Set MD_RECOVERY_FROZEN before stop sync thread
  md/raid5: Don't check crossing reshape when reshape hasn't started

 drivers/md/md.c     |  9 ++++----
 drivers/md/raid10.c | 16 ++++++++++++--
 drivers/md/raid5.c  | 51 ++++++++++++++++++++++++++++++++-------------
 3 files changed, 56 insertions(+), 20 deletions(-)

-- 
2.32.0 (Apple Git-132)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] md: Revert "md: Don't register sync_thread for reshape directly"
  2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
@ 2024-03-01 15:21 ` Xiao Ni
  2024-03-01 15:21 ` [PATCH 2/4] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-01 15:21 UTC (permalink / raw)
  To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

This reverts commit ad39c08186f8a0f221337985036ba86731d6aafe.

The reverted patch says there is no way to guarantee that md_do_sync
will be executed. Users should choose a sutiable chance to wake up sync
thread after registering sync thread.

And this patch set tries to use a minimal change to fix dmraid regressions.
With patch03 and patch04 and commit 82ec0ae59d02
("md: Make sure md_do_sync() will set MD_RECOVERY_DONE"), all deadlock
problems can be fixed. So revert this one and we can rethink about this
in future.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c     |  5 +----
 drivers/md/raid10.c | 16 ++++++++++++++--
 drivers/md/raid5.c  | 29 +++++++++++++++++++++++++++--
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9e41a9aaba8b..db4743ba7f6c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9376,7 +9376,6 @@ static void md_start_sync(struct work_struct *ws)
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
 	int spares = 0;
 	bool suspend = false;
-	char *name;
 
 	/*
 	 * If reshape is still in progress, spares won't be added or removed
@@ -9414,10 +9413,8 @@ static void md_start_sync(struct work_struct *ws)
 	if (spares)
 		md_bitmap_write_all(mddev->bitmap);
 
-	name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
-			"reshape" : "resync";
 	rcu_assign_pointer(mddev->sync_thread,
-			   md_register_thread(md_do_sync, mddev, name));
+			   md_register_thread(md_do_sync, mddev, "resync"));
 	if (!mddev->sync_thread) {
 		pr_warn("%s: could not start resync thread...\n",
 			mdname(mddev));
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7412066ea22c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4175,7 +4175,11 @@ static int raid10_run(struct mddev *mddev)
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+		rcu_assign_pointer(mddev->sync_thread,
+			md_register_thread(md_do_sync, mddev, "reshape"));
+		if (!mddev->sync_thread)
+			goto out_free_conf;
 	}
 
 	return 0;
@@ -4569,8 +4573,16 @@ static int raid10_start_reshape(struct mddev *mddev)
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+
+	rcu_assign_pointer(mddev->sync_thread,
+			   md_register_thread(md_do_sync, mddev, "reshape"));
+	if (!mddev->sync_thread) {
+		ret = -EAGAIN;
+		goto abort;
+	}
 	conf->reshape_checkpoint = jiffies;
+	md_wakeup_thread(mddev->sync_thread);
 	md_new_event();
 	return 0;
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6a7a32f7fb91..4c1f572cc00f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7936,7 +7936,11 @@ static int raid5_run(struct mddev *mddev)
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+		rcu_assign_pointer(mddev->sync_thread,
+			md_register_thread(md_do_sync, mddev, "reshape"));
+		if (!mddev->sync_thread)
+			goto abort;
 	}
 
 	/* Ok, everything is just fine now */
@@ -8502,8 +8506,29 @@ static int raid5_start_reshape(struct mddev *mddev)
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+	rcu_assign_pointer(mddev->sync_thread,
+			   md_register_thread(md_do_sync, mddev, "reshape"));
+	if (!mddev->sync_thread) {
+		mddev->recovery = 0;
+		spin_lock_irq(&conf->device_lock);
+		write_seqcount_begin(&conf->gen_lock);
+		mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
+		mddev->new_chunk_sectors =
+			conf->chunk_sectors = conf->prev_chunk_sectors;
+		mddev->new_layout = conf->algorithm = conf->prev_algo;
+		rdev_for_each(rdev, mddev)
+			rdev->new_data_offset = rdev->data_offset;
+		smp_wmb();
+		conf->generation--;
+		conf->reshape_progress = MaxSector;
+		mddev->reshape_position = MaxSector;
+		write_seqcount_end(&conf->gen_lock);
+		spin_unlock_irq(&conf->device_lock);
+		return -EAGAIN;
+	}
 	conf->reshape_checkpoint = jiffies;
+	md_wakeup_thread(mddev->sync_thread);
 	md_new_event();
 	return 0;
 }
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] md: Revert "md: Don't ignore suspended array in md_check_recovery()"
  2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
  2024-03-01 15:21 ` [PATCH 1/4] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
@ 2024-03-01 15:21 ` Xiao Ni
  2024-03-01 15:21 ` [PATCH 3/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-01 15:21 UTC (permalink / raw)
  To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

This reverts commit 1baae052cccd08daf9a9d64c3f959d8cdb689757.

For dmraid, it doesn't allow any io including sync io when array is
suspended. Although it's a simple change in this patch, it still needs
more work to support it. Now we're trying to fix regression problems.
So let's keep as small changes as we can. We can rethink about this
in future.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index db4743ba7f6c..c4624814d94c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9496,6 +9496,9 @@ static void unregister_sync_thread(struct mddev *mddev)
  */
 void md_check_recovery(struct mddev *mddev)
 {
+	if (READ_ONCE(mddev->suspended))
+		return;
+
 	if (mddev->bitmap)
 		md_bitmap_daemon_work(mddev);
 
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] md: Set MD_RECOVERY_FROZEN before stop sync thread
  2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
  2024-03-01 15:21 ` [PATCH 1/4] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
  2024-03-01 15:21 ` [PATCH 2/4] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
@ 2024-03-01 15:21 ` Xiao Ni
  2024-03-01 15:21 ` [PATCH 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
  2024-03-01 22:28 ` [PATCH V2 0/4] Fix dmraid regression bugs Song Liu
  4 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-01 15:21 UTC (permalink / raw)
  To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
stops sync thread asynchronously. The calling process is:
dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr

raid_postsuspend does two jobs. First, it stops sync thread. Then it
suspend array. Now it can stop sync thread successfully. But it doesn't
set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
sync thread because the array is already suspended.

This can be reproduced easily by those commands:
while [ 1 ]; do
vgcreate test_vg /dev/loop0 /dev/loop1
lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
lvchange -an test_vg
vgremove test_vg -ff
done

Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c4624814d94c..c96a3bb073c4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6340,6 +6340,7 @@ static void __md_stop_writes(struct mddev *mddev)
 void md_stop_writes(struct mddev *mddev)
 {
 	mddev_lock_nointr(mddev);
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	__md_stop_writes(mddev);
 	mddev_unlock(mddev);
 }
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started
  2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
                   ` (2 preceding siblings ...)
  2024-03-01 15:21 ` [PATCH 3/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
@ 2024-03-01 15:21 ` Xiao Ni
  2024-03-01 22:28 ` [PATCH V2 0/4] Fix dmraid regression bugs Song Liu
  4 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-01 15:21 UTC (permalink / raw)
  To: song; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

stripe_ahead_of_reshape is used to check if a stripe region cross the
reshape position. So first, change the function name to
stripe_across_reshape to describe the usage of this function.

For reshape backwards, it starts reshape from the end of array and conf->
reshape_progress is init to raid5_size. During reshape, if previous is true
(set in make_stripe_request) and max_sector >= conf->reshape_progress, ios
should wait until reshape window moves forward. But ios don't need to wait
if max_sector is raid5_size.

And put the conditions into the function directly to make understand the
codes easily.

This can be reproduced easily by lvm2 test shell/lvconvert-raid-reshape.sh
For dm raid reshape, before starting sync thread, it needs to reload table
some times. In one time dm raid uses MD_RECOVERY_WAIT to delay reshape and
it doesn't start sync thread this time. Then one io comes in and it waits
because stripe_ahead_of_reshape returns true because it's a backward
reshape and max_sectors > conf->reshape_progress. But the reshape hasn't
started. So skip this check when reshape_progress is raid5_size

Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid5.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4c1f572cc00f..8d562c1344f4 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5832,17 +5832,12 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
 					  sector >= reshape_sector;
 }
 
-static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
-				   sector_t max, sector_t reshape_sector)
-{
-	return mddev->reshape_backwards ? max < reshape_sector :
-					  min >= reshape_sector;
-}
-
-static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
+static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
+static bool stripe_across_reshape(struct mddev *mddev, struct r5conf *conf,
 				    struct stripe_head *sh)
 {
 	sector_t max_sector = 0, min_sector = MaxSector;
+	sector_t reshape_pos = 0;
 	bool ret = false;
 	int dd_idx;
 
@@ -5856,9 +5851,12 @@ static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
 
 	spin_lock_irq(&conf->device_lock);
 
-	if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
-				     conf->reshape_progress))
-		/* mismatch, need to try again */
+	reshape_pos = conf->reshape_progress;
+	if (mddev->reshape_backwards) {
+		if (max_sector >= reshape_pos &&
+		    reshape_pos != raid5_size(mddev, 0, 0))
+			ret = true;
+	} else if (min_sector < reshape_pos)
 		ret = true;
 
 	spin_unlock_irq(&conf->device_lock);
@@ -5969,7 +5967,7 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	}
 
 	if (unlikely(previous) &&
-	    stripe_ahead_of_reshape(mddev, conf, sh)) {
+	    stripe_across_reshape(mddev, conf, sh)) {
 		/*
 		 * Expansion moved on while waiting for a stripe.
 		 * Expansion could still move past after this
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 0/4] Fix dmraid regression bugs
  2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
                   ` (3 preceding siblings ...)
  2024-03-01 15:21 ` [PATCH 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
@ 2024-03-01 22:28 ` Song Liu
  2024-03-02  0:41   ` Xiao Ni
  4 siblings, 1 reply; 7+ messages in thread
From: Song Liu @ 2024-03-01 22:28 UTC (permalink / raw)
  To: Xiao Ni; +Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

On Fri, Mar 1, 2024 at 7:21 AM Xiao Ni <xni@redhat.com> wrote:
>
> Hi all
>
> This patch set tries to fix dmraid regression problems we face
> recently. This patch is based on song's md-6.8 branch.
>
> This patch set has four patches. The first two patches revert two patches.
> The third one and the fourth one resolve deadlock problems.
>
> I have run lvm2 regression test 5 times. There are 4 failed cases:
> shell/dmsetup-integrity-keys.sh
> shell/lvresize-fs-crypt.sh
> shell/pvck-dump.sh
> shell/select-report.sh
>
> And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
> kernel too. So it can return back to the same state with 6.6 kernel.
>
> V2:
> It doesn't revert commit 82ec0ae59d02
> ("md: Make sure md_do_sync() will set MD_RECOVERY_DONE")
> It doesn't clear MD_RECOVERY_WAIT before stopping dmraid
> Re-write patch01 comment

Unfortunately, I am still seeing the same deadlock in the reboot tests
with two arrays. OTOH, Yu Kuai's version doesn't have this issue.
I think we will ship that patch set.

Thanks for your kind work on this issue.
Song

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2 0/4] Fix dmraid regression bugs
  2024-03-01 22:28 ` [PATCH V2 0/4] Fix dmraid regression bugs Song Liu
@ 2024-03-02  0:41   ` Xiao Ni
  0 siblings, 0 replies; 7+ messages in thread
From: Xiao Ni @ 2024-03-02  0:41 UTC (permalink / raw)
  To: Song Liu
  Cc: yukuai1, bmarzins, heinzm, snitzer, ncroxon, linux-raid, dm-devel

On Sat, Mar 2, 2024 at 6:28 AM Song Liu <song@kernel.org> wrote:
>
> On Fri, Mar 1, 2024 at 7:21 AM Xiao Ni <xni@redhat.com> wrote:
> >
> > Hi all
> >
> > This patch set tries to fix dmraid regression problems we face
> > recently. This patch is based on song's md-6.8 branch.
> >
> > This patch set has four patches. The first two patches revert two patches.
> > The third one and the fourth one resolve deadlock problems.
> >
> > I have run lvm2 regression test 5 times. There are 4 failed cases:
> > shell/dmsetup-integrity-keys.sh
> > shell/lvresize-fs-crypt.sh
> > shell/pvck-dump.sh
> > shell/select-report.sh
> >
> > And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
> > kernel too. So it can return back to the same state with 6.6 kernel.
> >
> > V2:
> > It doesn't revert commit 82ec0ae59d02
> > ("md: Make sure md_do_sync() will set MD_RECOVERY_DONE")
> > It doesn't clear MD_RECOVERY_WAIT before stopping dmraid
> > Re-write patch01 comment
>
> Unfortunately, I am still seeing the same deadlock in the reboot tests
> with two arrays. OTOH, Yu Kuai's version doesn't have this issue.
> I think we will ship that patch set.
>
> Thanks for your kind work on this issue.
> Song
>

It's a process for me to study :)
I'll have a test based on Yu Kuai's patch set and give the result later.

Best Regards
Xiao


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-03-02  0:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-01 15:21 [PATCH V2 0/4] Fix dmraid regression bugs Xiao Ni
2024-03-01 15:21 ` [PATCH 1/4] md: Revert "md: Don't register sync_thread for reshape directly" Xiao Ni
2024-03-01 15:21 ` [PATCH 2/4] md: Revert "md: Don't ignore suspended array in md_check_recovery()" Xiao Ni
2024-03-01 15:21 ` [PATCH 3/4] md: Set MD_RECOVERY_FROZEN before stop sync thread Xiao Ni
2024-03-01 15:21 ` [PATCH 4/4] md/raid5: Don't check crossing reshape when reshape hasn't started Xiao Ni
2024-03-01 22:28 ` [PATCH V2 0/4] Fix dmraid regression bugs Song Liu
2024-03-02  0:41   ` Xiao Ni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).