linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread
@ 2023-08-03 13:24 Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized' Yu Kuai
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

This is the third patchset to do some preparatory work to synchronize
io with array reconfiguration.

1) The first patchset refactor 'active_io', make sure that mddev_suspend()
will wait for io to be done. [1]

2) The second patchset remove 'quiesce' callback from mddev_suspend(), so
that mddev_suspend() doesn't rely on 'quiesce' callback is registered,
and can be used for all personalites; [2]

3) This patchset make array reconfiguration independent from daemon thread,
and synchronize it with io will be much easier because io may rely on
daemon thread to be done.

patch 1-11 are cleanup and refactor patches, I think they make related
code much easier to read;

More patchset on the way!

[1] https://lore.kernel.org/all/20230621165110.1498313-1-yukuai1@huaweicloud.com/
[2] https://lore.kernel.org/all/20230628012931.88911-2-yukuai1@huaweicloud.com/

Yu Kuai (13):
  md: remove rdev flag 'RemoveSynchronized'
  md: factor out a helper rdev_removeable() from remove_and_add_spares()
  md: factor out a helper rdev_is_spare() from remove_and_add_spares()
  md: factor out a helper rdev_addable() from remove_and_add_spares()
  md: factor out a helper hot_remove_rdev() from remove_and_add_spares()
  md: factor out a helper hot_add_rdev() from remove_and_add_spares()
  md: factor out a helper remove_rdev() from state_store()
  md: convert to use hot_remove_rdev() to hot remove one rdev
  md: convert to use hot_add_rdev() to hot add one rdev
  md: cleanup remove_and_add_spares()
  md: use separate work_struct for md_start_sync()
  md: delay choosing sync direction to md_start_sync()
  md: delay remove_and_add_spares() for read only array to
    md_start_sync()

 drivers/md/md-multipath.c |  15 +-
 drivers/md/md.c           | 413 +++++++++++++++++++++-----------------
 drivers/md/md.h           |  10 +-
 drivers/md/raid1.c        |  15 +-
 drivers/md/raid10.c       |  15 +-
 drivers/md/raid5.c        |  15 +-
 6 files changed, 262 insertions(+), 221 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized'
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-14 10:02   ` Song Liu
  2023-08-03 13:24 ` [PATCH -next 02/13] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

When multiple rdevs are removed from array in remove_and_add_spares(),
which is only possible from daemon thread(user can only remove one rdev
through ioctl/sysfs at one time), the flag is used to avoid calling
synchronize_rcu() multiple times for each rdev.

However, remove rdev from daemon thread really is super cold path, while
removing n rdevs, it doesn't matter sync rcu n times or just one time.
Hence remove this flag and make code simpler.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md-multipath.c | 15 +++++++--------
 drivers/md/md.c           | 26 ++------------------------
 drivers/md/md.h           |  5 -----
 drivers/md/raid1.c        | 15 +++++++--------
 drivers/md/raid10.c       | 15 +++++++--------
 drivers/md/raid5.c        | 15 +++++++--------
 6 files changed, 30 insertions(+), 61 deletions(-)

diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
index d22276870283..9e4ddd5240cd 100644
--- a/drivers/md/md-multipath.c
+++ b/drivers/md/md-multipath.c
@@ -258,14 +258,13 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-			synchronize_rcu();
-			if (atomic_read(&rdev->nr_pending)) {
-				/* lost the race, try later */
-				err = -EBUSY;
-				p->rdev = rdev;
-				goto abort;
-			}
+
+		synchronize_rcu();
+		if (atomic_read(&rdev->nr_pending)) {
+			/* lost the race, try later */
+			err = -EBUSY;
+			p->rdev = rdev;
+			goto abort;
 		}
 		err = md_integrity_register(mddev);
 	}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index a3d98273b295..cd7ac1dee3b8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9155,7 +9155,6 @@ static int remove_and_add_spares(struct mddev *mddev,
 	struct md_rdev *rdev;
 	int spares = 0;
 	int removed = 0;
-	bool remove_some = false;
 
 	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		/* Mustn't remove devices when resync thread is running */
@@ -9165,28 +9164,9 @@ static int remove_and_add_spares(struct mddev *mddev,
 		if ((this == NULL || rdev == this) &&
 		    rdev->raid_disk >= 0 &&
 		    !test_bit(Blocked, &rdev->flags) &&
-		    test_bit(Faulty, &rdev->flags) &&
+		    !test_bit(In_sync, &rdev->flags) &&
+		    !test_bit(Journal, &rdev->flags) &&
 		    atomic_read(&rdev->nr_pending)==0) {
-			/* Faulty non-Blocked devices with nr_pending == 0
-			 * never get nr_pending incremented,
-			 * never get Faulty cleared, and never get Blocked set.
-			 * So we can synchronize_rcu now rather than once per device
-			 */
-			remove_some = true;
-			set_bit(RemoveSynchronized, &rdev->flags);
-		}
-	}
-
-	if (remove_some)
-		synchronize_rcu();
-	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) &&
-		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
-		    ((test_bit(RemoveSynchronized, &rdev->flags) ||
-		     (!test_bit(In_sync, &rdev->flags) &&
-		      !test_bit(Journal, &rdev->flags))) &&
-		    atomic_read(&rdev->nr_pending)==0)) {
 			if (mddev->pers->hot_remove_disk(
 				    mddev, rdev) == 0) {
 				sysfs_unlink_rdev(mddev, rdev);
@@ -9195,8 +9175,6 @@ static int remove_and_add_spares(struct mddev *mddev,
 				removed++;
 			}
 		}
-		if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
-			clear_bit(RemoveSynchronized, &rdev->flags);
 	}
 
 	if (removed && mddev->kobj.sd)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8ae957480976..b25b6d061372 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -189,11 +189,6 @@ enum flag_bits {
 				 * than other devices in the array
 				 */
 	ClusterRemove,
-	RemoveSynchronized,	/* synchronize_rcu() was called after
-				 * this device was known to be faulty,
-				 * so it is safe to remove without
-				 * another synchronize_rcu() call.
-				 */
 	ExternalBbl,            /* External metadata provides bad
 				 * block management for a disk
 				 */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 23d211969565..acb6d6542619 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1859,15 +1859,14 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			goto abort;
 		}
 		p->rdev = NULL;
-		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-			synchronize_rcu();
-			if (atomic_read(&rdev->nr_pending)) {
-				/* lost the race, try later */
-				err = -EBUSY;
-				p->rdev = rdev;
-				goto abort;
-			}
+		synchronize_rcu();
+		if (atomic_read(&rdev->nr_pending)) {
+			/* lost the race, try later */
+			err = -EBUSY;
+			p->rdev = rdev;
+			goto abort;
 		}
+
 		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7704a4c7f469..64dd5cb6133e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2247,15 +2247,14 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			*rdevp = rdev;
-			goto abort;
-		}
+	synchronize_rcu();
+	if (atomic_read(&rdev->nr_pending)) {
+		/* lost the race, try later */
+		err = -EBUSY;
+		*rdevp = rdev;
+		goto abort;
 	}
+
 	if (p->replacement) {
 		/* We must have just cleared 'rdev' */
 		p->rdev = p->replacement;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4cdb35e54251..37d9865b180a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8267,15 +8267,14 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 		goto abort;
 	}
 	*rdevp = NULL;
-	if (!test_bit(RemoveSynchronized, &rdev->flags)) {
-		lockdep_assert_held(&mddev->reconfig_mutex);
-		synchronize_rcu();
-		if (atomic_read(&rdev->nr_pending)) {
-			/* lost the race, try later */
-			err = -EBUSY;
-			rcu_assign_pointer(*rdevp, rdev);
-		}
+	lockdep_assert_held(&mddev->reconfig_mutex);
+	synchronize_rcu();
+	if (atomic_read(&rdev->nr_pending)) {
+		/* lost the race, try later */
+		err = -EBUSY;
+		rcu_assign_pointer(*rdevp, rdev);
 	}
+
 	if (!err) {
 		err = log_modify(conf, rdev, false);
 		if (err)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 02/13] md: factor out a helper rdev_removeable() from remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized' Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 03/13] md: factor out a helper rdev_is_spare() " Yu Kuai
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to refactoer remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index cd7ac1dee3b8..0d84754027ec 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9149,6 +9149,14 @@ void md_do_sync(struct md_thread *thread)
 }
 EXPORT_SYMBOL_GPL(md_do_sync);
 
+static bool rdev_removeable(struct md_rdev *rdev)
+{
+	return rdev->raid_disk >= 0 && !test_bit(Blocked, &rdev->flags) &&
+	       !test_bit(In_sync, &rdev->flags) &&
+	       !test_bit(Journal, &rdev->flags) &&
+	       !atomic_read(&rdev->nr_pending);
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9161,19 +9169,12 @@ static int remove_and_add_spares(struct mddev *mddev,
 		return 0;
 
 	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) &&
-		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
-		    !test_bit(In_sync, &rdev->flags) &&
-		    !test_bit(Journal, &rdev->flags) &&
-		    atomic_read(&rdev->nr_pending)==0) {
-			if (mddev->pers->hot_remove_disk(
-				    mddev, rdev) == 0) {
-				sysfs_unlink_rdev(mddev, rdev);
-				rdev->saved_raid_disk = rdev->raid_disk;
-				rdev->raid_disk = -1;
-				removed++;
-			}
+		if ((this == NULL || rdev == this) && rdev_removeable(rdev) &&
+		    !mddev->pers->hot_remove_disk(mddev, rdev)) {
+			sysfs_unlink_rdev(mddev, rdev);
+			rdev->saved_raid_disk = rdev->raid_disk;
+			rdev->raid_disk = -1;
+			removed++;
 		}
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 03/13] md: factor out a helper rdev_is_spare() from remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized' Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 02/13] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 04/13] md: factor out a helper rdev_addable() " Yu Kuai
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to refactor remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0d84754027ec..0e4967f17115 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9157,6 +9157,14 @@ static bool rdev_removeable(struct md_rdev *rdev)
 	       !atomic_read(&rdev->nr_pending);
 }
 
+static bool rdev_is_spare(struct md_rdev *rdev)
+{
+	return !test_bit(Candidate, &rdev->flags) && rdev->raid_disk >= 0 &&
+	       !test_bit(In_sync, &rdev->flags) &&
+	       !test_bit(Journal, &rdev->flags) &&
+	       !test_bit(Faulty, &rdev->flags);
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9187,13 +9195,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 	rdev_for_each(rdev, mddev) {
 		if (this && this != rdev)
 			continue;
+		if (rdev_is_spare(rdev))
+			spares++;
 		if (test_bit(Candidate, &rdev->flags))
 			continue;
-		if (rdev->raid_disk >= 0 &&
-		    !test_bit(In_sync, &rdev->flags) &&
-		    !test_bit(Journal, &rdev->flags) &&
-		    !test_bit(Faulty, &rdev->flags))
-			spares++;
 		if (rdev->raid_disk >= 0)
 			continue;
 		if (test_bit(Faulty, &rdev->flags))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 04/13] md: factor out a helper rdev_addable() from remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (2 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 03/13] md: factor out a helper rdev_is_spare() " Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 05/13] md: factor out a helper hot_remove_rdev() " Yu Kuai
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to refactoer remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0e4967f17115..8017371a9e53 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9157,6 +9157,20 @@ static bool rdev_removeable(struct md_rdev *rdev)
 	       !atomic_read(&rdev->nr_pending);
 }
 
+static bool rdev_addable(struct md_rdev *rdev)
+{
+	if (test_bit(Candidate, &rdev->flags) || rdev->raid_disk >= 0 ||
+	    test_bit(Faulty, &rdev->flags))
+		return false;
+
+	if (!test_bit(Journal, &rdev->flags) && !md_is_rdwr(rdev->mddev) &&
+	    !(rdev->saved_raid_disk >= 0 &&
+	      !test_bit(Bitmap_sync, &rdev->flags)))
+		return false;
+
+	return true;
+}
+
 static bool rdev_is_spare(struct md_rdev *rdev)
 {
 	return !test_bit(Candidate, &rdev->flags) && rdev->raid_disk >= 0 &&
@@ -9197,20 +9211,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 			continue;
 		if (rdev_is_spare(rdev))
 			spares++;
-		if (test_bit(Candidate, &rdev->flags))
+		if (!rdev_addable(rdev))
 			continue;
-		if (rdev->raid_disk >= 0)
-			continue;
-		if (test_bit(Faulty, &rdev->flags))
-			continue;
-		if (!test_bit(Journal, &rdev->flags)) {
-			if (!md_is_rdwr(mddev) &&
-			    !(rdev->saved_raid_disk >= 0 &&
-			      !test_bit(Bitmap_sync, &rdev->flags)))
-				continue;
-
+		if (!test_bit(Journal, &rdev->flags))
 			rdev->recovery_offset = 0;
-		}
 		if (mddev->pers->hot_add_disk(mddev, rdev) == 0) {
 			/* failure here is OK */
 			sysfs_link_rdev(mddev, rdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 05/13] md: factor out a helper hot_remove_rdev() from remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (3 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 04/13] md: factor out a helper rdev_addable() " Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 06/13] md: factor out a helper hot_add_rdev() " Yu Kuai
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to refactor remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8017371a9e53..74ee31f2706b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9179,6 +9179,23 @@ static bool rdev_is_spare(struct md_rdev *rdev)
 	       !test_bit(Faulty, &rdev->flags);
 }
 
+static bool hot_remove_rdev(struct md_rdev *rdev)
+{
+	struct mddev *mddev;
+
+	if (!rdev_removeable(rdev))
+		return false;
+
+	mddev = rdev->mddev;
+	if (mddev->pers->hot_remove_disk(mddev, rdev))
+		return false;
+
+	sysfs_unlink_rdev(mddev, rdev);
+	rdev->saved_raid_disk = rdev->raid_disk;
+	rdev->raid_disk = -1;
+	return true;
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9190,15 +9207,9 @@ static int remove_and_add_spares(struct mddev *mddev,
 		/* Mustn't remove devices when resync thread is running */
 		return 0;
 
-	rdev_for_each(rdev, mddev) {
-		if ((this == NULL || rdev == this) && rdev_removeable(rdev) &&
-		    !mddev->pers->hot_remove_disk(mddev, rdev)) {
-			sysfs_unlink_rdev(mddev, rdev);
-			rdev->saved_raid_disk = rdev->raid_disk;
-			rdev->raid_disk = -1;
+	rdev_for_each(rdev, mddev)
+		if ((this == NULL || rdev == this) && hot_remove_rdev(rdev))
 			removed++;
-		}
-	}
 
 	if (removed && mddev->kobj.sd)
 		sysfs_notify_dirent_safe(mddev->sysfs_degraded);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 06/13] md: factor out a helper hot_add_rdev() from remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (4 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 05/13] md: factor out a helper hot_remove_rdev() " Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 07/13] md: factor out a helper remove_rdev() from state_store() Yu Kuai
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, just to make the code simpler and
prepare to refactoer remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 74ee31f2706b..77c48f7b605c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9196,6 +9196,26 @@ static bool hot_remove_rdev(struct md_rdev *rdev)
 	return true;
 }
 
+static bool hot_add_rdev(struct md_rdev *rdev)
+{
+	struct mddev *mddev;
+
+	if (!rdev_addable(rdev))
+		return false;
+
+	if (!test_bit(Journal, &rdev->flags))
+		rdev->recovery_offset = 0;
+
+	mddev = rdev->mddev;
+	if (mddev->pers->hot_add_disk(mddev, rdev))
+		return false;
+
+	sysfs_link_rdev(mddev, rdev);
+	md_new_event();
+	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
+	return true;
+}
+
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this)
 {
@@ -9222,18 +9242,8 @@ static int remove_and_add_spares(struct mddev *mddev,
 			continue;
 		if (rdev_is_spare(rdev))
 			spares++;
-		if (!rdev_addable(rdev))
-			continue;
-		if (!test_bit(Journal, &rdev->flags))
-			rdev->recovery_offset = 0;
-		if (mddev->pers->hot_add_disk(mddev, rdev) == 0) {
-			/* failure here is OK */
-			sysfs_link_rdev(mddev, rdev);
-			if (!test_bit(Journal, &rdev->flags))
-				spares++;
-			md_new_event();
-			set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-		}
+		if (hot_add_rdev(rdev) && !test_bit(Journal, &rdev->flags))
+			spares++;
 	}
 no_add:
 	if (removed)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 07/13] md: factor out a helper remove_rdev() from state_store()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (5 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 06/13] md: factor out a helper hot_add_rdev() " Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 08/13] md: convert to use hot_remove_rdev() to hot remove one rdev Yu Kuai
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

state_store() is a very big function already, factor out a helper to
prevent that following changes will make state_store() more complicated.

There are no functional changes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 50 +++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 77c48f7b605c..3903bdfe5293 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2883,6 +2883,35 @@ state_show(struct md_rdev *rdev, char *page)
 	return len+sprintf(page+len, "\n");
 }
 
+static int remove_rdev(struct md_rdev *rdev)
+{
+	struct mddev *mddev = rdev->mddev;
+
+	if (mddev->pers) {
+		clear_bit(Blocked, &rdev->flags);
+		remove_and_add_spares(mddev, rdev);
+	}
+
+	if (rdev->raid_disk >= 0)
+		return -EBUSY;
+
+	if (mddev_is_clustered(mddev)) {
+		int err = md_cluster_ops->remove_disk(mddev, rdev);
+
+		if (err)
+			return err;
+	}
+
+	md_kick_rdev_from_array(rdev);
+	if (mddev->pers) {
+		set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
+		md_wakeup_thread(mddev->thread);
+	}
+	md_new_event();
+
+	return 0;
+}
+
 static ssize_t
 state_store(struct md_rdev *rdev, const char *buf, size_t len)
 {
@@ -2913,26 +2942,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		else
 			err = 0;
 	} else if (cmd_match(buf, "remove")) {
-		if (rdev->mddev->pers) {
-			clear_bit(Blocked, &rdev->flags);
-			remove_and_add_spares(rdev->mddev, rdev);
-		}
-		if (rdev->raid_disk >= 0)
-			err = -EBUSY;
-		else {
-			err = 0;
-			if (mddev_is_clustered(mddev))
-				err = md_cluster_ops->remove_disk(mddev, rdev);
-
-			if (err == 0) {
-				md_kick_rdev_from_array(rdev);
-				if (mddev->pers) {
-					set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-					md_wakeup_thread(mddev->thread);
-				}
-				md_new_event();
-			}
-		}
+		err = remove_rdev(rdev);
 	} else if (cmd_match(buf, "writemostly")) {
 		set_bit(WriteMostly, &rdev->flags);
 		mddev_create_serial_pool(rdev->mddev, rdev, false);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 08/13] md: convert to use hot_remove_rdev() to hot remove one rdev
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (6 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 07/13] md: factor out a helper remove_rdev() from state_store() Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 09/13] md: convert to use hot_add_rdev() to hot add " Yu Kuai
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes, prepare to cleanup
remove_and_add_spares().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3903bdfe5293..de7399769c8d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -85,6 +85,7 @@ static struct workqueue_struct *md_wq;
 static struct workqueue_struct *md_misc_wq;
 struct workqueue_struct *md_bitmap_wq;
 
+static bool hot_remove_rdev(struct md_rdev *rdev);
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
@@ -2887,9 +2888,13 @@ static int remove_rdev(struct md_rdev *rdev)
 {
 	struct mddev *mddev = rdev->mddev;
 
+	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return -EBUSY;
+
 	if (mddev->pers) {
 		clear_bit(Blocked, &rdev->flags);
-		remove_and_add_spares(mddev, rdev);
+		if (!hot_remove_rdev(rdev))
+			return -EBUSY;
 	}
 
 	if (rdev->raid_disk >= 0)
@@ -3133,8 +3138,8 @@ slot_store(struct md_rdev *rdev, const char *buf, size_t len)
 		if (rdev->mddev->pers->hot_remove_disk == NULL)
 			return -EINVAL;
 		clear_bit(Blocked, &rdev->flags);
-		remove_and_add_spares(rdev->mddev, rdev);
-		if (rdev->raid_disk >= 0)
+		if (test_bit(MD_RECOVERY_RUNNING, &rdev->mddev->recovery) ||
+		    !hot_remove_rdev(rdev))
 			return -EBUSY;
 		set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
 		md_wakeup_thread(rdev->mddev->thread);
@@ -6931,6 +6936,9 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev)
 	if (!mddev->pers)
 		return -ENODEV;
 
+	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return -EBUSY;
+
 	rdev = find_rdev(mddev, dev);
 	if (!rdev)
 		return -ENXIO;
@@ -6939,9 +6947,7 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev)
 		goto kick_rdev;
 
 	clear_bit(Blocked, &rdev->flags);
-	remove_and_add_spares(mddev, rdev);
-
-	if (rdev->raid_disk >= 0)
+	if (!hot_remove_rdev(rdev))
 		goto busy;
 
 kick_rdev:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 09/13] md: convert to use hot_add_rdev() to hot add one rdev
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (7 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 08/13] md: convert to use hot_remove_rdev() to hot remove one rdev Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 10/13] md: cleanup remove_and_add_spares() Yu Kuai
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Replace remove_and_add_spares() with hot_add_rdev() in
check_sb_changes(), also handle the case that hot add rdev failed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index de7399769c8d..1dc26bb1e096 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9773,13 +9773,20 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
 			    !(le32_to_cpu(sb->feature_map) &
 			      MD_FEATURE_RESHAPE_ACTIVE)) {
 				rdev2->saved_raid_disk = role;
-				ret = remove_and_add_spares(mddev, rdev2);
-				pr_info("Activated spare: %pg\n",
-					rdev2->bdev);
-				/* wakeup mddev->thread here, so array could
-				 * perform resync with the new activated disk */
-				set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-				md_wakeup_thread(mddev->thread);
+				if (!test_bit(MD_RECOVERY_RUNNING,
+					      &mddev->recovery) &&
+				    hot_add_rdev(rdev2)) {
+					pr_info("Activated spare: %pg\n",
+						rdev2->bdev);
+					/*
+					 * wakeup mddev->thread here, so array
+					 * could perform resync with the new
+					 * activated disk.
+					 */
+					set_bit(MD_RECOVERY_NEEDED,
+						&mddev->recovery);
+					md_wakeup_thread(mddev->thread);
+				}
 			}
 			/* device faulty
 			 * We just want to do the minimum to mark the disk
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 10/13] md: cleanup remove_and_add_spares()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (8 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 09/13] md: convert to use hot_add_rdev() to hot add " Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 11/13] md: use separate work_struct for md_start_sync() Yu Kuai
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that remove_and_add_spares() is only called from daemon thread, and
the second parameter is always NULL, remove the second parameter.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1dc26bb1e096..e64d1d0b4c5c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -86,8 +86,6 @@ static struct workqueue_struct *md_misc_wq;
 struct workqueue_struct *md_bitmap_wq;
 
 static bool hot_remove_rdev(struct md_rdev *rdev);
-static int remove_and_add_spares(struct mddev *mddev,
-				 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
 static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
 static void md_wakeup_thread_directly(struct md_thread __rcu *thread);
@@ -9232,36 +9230,26 @@ static bool hot_add_rdev(struct md_rdev *rdev)
 	return true;
 }
 
-static int remove_and_add_spares(struct mddev *mddev,
-				 struct md_rdev *this)
+static int remove_and_add_spares(struct mddev *mddev)
 {
 	struct md_rdev *rdev;
 	int spares = 0;
 	int removed = 0;
 
-	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
-		/* Mustn't remove devices when resync thread is running */
-		return 0;
-
 	rdev_for_each(rdev, mddev)
-		if ((this == NULL || rdev == this) && hot_remove_rdev(rdev))
+		if (hot_remove_rdev(rdev))
 			removed++;
 
 	if (removed && mddev->kobj.sd)
 		sysfs_notify_dirent_safe(mddev->sysfs_degraded);
 
-	if (this && removed)
-		goto no_add;
-
 	rdev_for_each(rdev, mddev) {
-		if (this && this != rdev)
-			continue;
 		if (rdev_is_spare(rdev))
 			spares++;
 		if (hot_add_rdev(rdev) && !test_bit(Journal, &rdev->flags))
 			spares++;
 	}
-no_add:
+
 	if (removed)
 		set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
 	return spares;
@@ -9381,7 +9369,7 @@ void md_check_recovery(struct mddev *mddev)
 			 * As we only add devices that are already in-sync,
 			 * we can activate the spares immediately.
 			 */
-			remove_and_add_spares(mddev, NULL);
+			remove_and_add_spares(mddev);
 			/* There is no thread, but we need to call
 			 * ->spare_active and clear saved_raid_disk
 			 */
@@ -9462,7 +9450,7 @@ void md_check_recovery(struct mddev *mddev)
 				goto not_running;
 			set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if ((spares = remove_and_add_spares(mddev, NULL))) {
+		} else if ((spares = remove_and_add_spares(mddev))) {
 			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 			clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 11/13] md: use separate work_struct for md_start_sync()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (9 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 10/13] md: cleanup remove_and_add_spares() Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 12/13] md: delay choosing sync direction to md_start_sync() Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 13/13] md: delay remove_and_add_spares() for read only array " Yu Kuai
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

It's a little weird to borrow 'del_work' for md_start_sync(), declare
a new work_struct 'sync_work' for md_start_sync().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 10 ++++++----
 drivers/md/md.h |  5 ++++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e64d1d0b4c5c..8980a41bfe97 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -630,13 +630,13 @@ void mddev_put(struct mddev *mddev)
 		 * flush_workqueue() after mddev_find will succeed in waiting
 		 * for the work to be done.
 		 */
-		INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 		queue_work(md_misc_wq, &mddev->del_work);
 	}
 	spin_unlock(&all_mddevs_lock);
 }
 
 static void md_safemode_timeout(struct timer_list *t);
+static void md_start_sync(struct work_struct *ws);
 
 void mddev_init(struct mddev *mddev)
 {
@@ -661,6 +661,9 @@ void mddev_init(struct mddev *mddev)
 	mddev->resync_min = 0;
 	mddev->resync_max = MaxSector;
 	mddev->level = LEVEL_NONE;
+
+	INIT_WORK(&mddev->sync_work, md_start_sync);
+	INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 }
 EXPORT_SYMBOL_GPL(mddev_init);
 
@@ -9257,7 +9260,7 @@ static int remove_and_add_spares(struct mddev *mddev)
 
 static void md_start_sync(struct work_struct *ws)
 {
-	struct mddev *mddev = container_of(ws, struct mddev, del_work);
+	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
 
 	rcu_assign_pointer(mddev->sync_thread,
 			   md_register_thread(md_do_sync, mddev, "resync"));
@@ -9470,8 +9473,7 @@ void md_check_recovery(struct mddev *mddev)
 				 */
 				md_bitmap_write_all(mddev->bitmap);
 			}
-			INIT_WORK(&mddev->del_work, md_start_sync);
-			queue_work(md_misc_wq, &mddev->del_work);
+			queue_work(md_misc_wq, &mddev->sync_work);
 			goto unlock;
 		}
 	not_running:
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b25b6d061372..0381f2aa6cbb 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -445,7 +445,10 @@ struct mddev {
 	struct kernfs_node		*sysfs_degraded;	/*handle for 'degraded' */
 	struct kernfs_node		*sysfs_level;		/*handle for 'level' */
 
-	struct work_struct del_work;	/* used for delayed sysfs removal */
+	/* used for delayed sysfs removal */
+	struct work_struct del_work;
+	/* used for register new sync thread */
+	struct work_struct sync_work;
 
 	/* "lock" protects:
 	 *   flush_bio transition from NULL to !NULL
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 12/13] md: delay choosing sync direction to md_start_sync()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (10 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 11/13] md: use separate work_struct for md_start_sync() Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  2023-08-03 13:24 ` [PATCH -next 13/13] md: delay remove_and_add_spares() for read only array " Yu Kuai
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Before this patch, for read-write array:

1) md_check_recover() found that something need to be done, and it'll
   try to grab 'reconfig_mutex'. The case that md_check_recover() need
   to do something:
   - array is not suspend;
   - super_block need to be updated;
   - 'MD_RECOVERY_NEEDED' or ''MD_RECOVERY_DONE' is set;
   - unusual case related to safemode;

2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
   md_check_recover() will try to choose a sync direction, and then
   queue a work md_start_sync().

3) md_start_sync() register sync_thread;

After this patch,

1) is the same;
2) if 'MD_RECOVERY_RUNNING' is not set, and 'MD_RECOVERY_NEEDED' is set,
   queue a work md_start_sync() directly;
3) md_start_sync() will try to choose a sync direction, and then
   register sync_thread();

Because 'MD_RECOVERY_RUNNING' is cleared when sync_thread is done, 2)
and 3) is always runned in serial and they can never concurrent, this
change should not introduce any behavior change for now.

Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING'
without protection in error path, which might affect the logical in
md_check_recovery().

The advantage to change this is that array reconfiguration is
independent from daemon now, and it'll be much easier to synchronize it
with io, consider that io may rely on daemon thread to be done.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 126 +++++++++++++++++++++++++-----------------------
 1 file changed, 67 insertions(+), 59 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8980a41bfe97..ef88581d9a39 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9261,6 +9261,44 @@ static int remove_and_add_spares(struct mddev *mddev)
 static void md_start_sync(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, sync_work);
+	int spares = 0;
+
+	mddev_lock_nointr(mddev);
+
+	/*
+	 * No recovery is running, remove any failed drives, then add spares if
+	 * possible. Spares are also removed and re-added, to allow the
+	 * personality to fail the re-add.
+	 */
+	if (mddev->reshape_position != MaxSector) {
+		if (mddev->pers->check_reshape == NULL ||
+		    mddev->pers->check_reshape(mddev) != 0)
+			/* Cannot proceed */
+			goto not_running;
+		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+	} else if ((spares = remove_and_add_spares(mddev))) {
+		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+	} else if (mddev->recovery_cp < MaxSector) {
+		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+	} else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+		/* nothing to be done ... */
+		goto not_running;
+	}
+
+	if (!mddev->pers->sync_request)
+		goto not_running;
+
+	/*
+	 * We are adding a device or devices to an array which has the bitmap
+	 * stored on all devices. So make sure all bitmap pages get written.
+	 */
+	if (spares)
+		md_bitmap_write_all(mddev->bitmap);
 
 	rcu_assign_pointer(mddev->sync_thread,
 			   md_register_thread(md_do_sync, mddev, "resync"));
@@ -9268,20 +9306,27 @@ static void md_start_sync(struct work_struct *ws)
 		pr_warn("%s: could not start resync thread...\n",
 			mdname(mddev));
 		/* leave the spares where they are, it shouldn't hurt */
-		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-		clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-		wake_up(&resync_wait);
-		if (test_and_clear_bit(MD_RECOVERY_RECOVER,
-				       &mddev->recovery))
-			if (mddev->sysfs_action)
-				sysfs_notify_dirent_safe(mddev->sysfs_action);
-	} else
-		md_wakeup_thread(mddev->sync_thread);
+		goto not_running;
+	}
+
+	mddev_unlock(mddev);
+	md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	md_new_event();
+	return;
+
+not_running:
+	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+	clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+	clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+	mddev_unlock(mddev);
+
+	wake_up(&resync_wait);
+	if (test_and_clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
+	    mddev->sysfs_action)
+		sysfs_notify_dirent_safe(mddev->sysfs_action);
 }
 
 /*
@@ -9349,7 +9394,6 @@ void md_check_recovery(struct mddev *mddev)
 		return;
 
 	if (mddev_trylock(mddev)) {
-		int spares = 0;
 		bool try_set_sync = mddev->safemode != 0;
 
 		if (!mddev->external && mddev->safemode == 1)
@@ -9436,56 +9480,20 @@ void md_check_recovery(struct mddev *mddev)
 		clear_bit(MD_RECOVERY_INTR, &mddev->recovery);
 		clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 
-		if (!test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
-		    test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
-			goto not_running;
-		/* no recovery is running.
-		 * remove any failed drives, then
-		 * add spares if possible.
-		 * Spares are also removed and re-added, to allow
-		 * the personality to fail the re-add.
-		 */
-
-		if (mddev->reshape_position != MaxSector) {
-			if (mddev->pers->check_reshape == NULL ||
-			    mddev->pers->check_reshape(mddev) != 0)
-				/* Cannot proceed */
-				goto not_running;
-			set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if ((spares = remove_and_add_spares(mddev))) {
-			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-			clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-			set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if (mddev->recovery_cp < MaxSector) {
-			set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
-		} else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
-			/* nothing to be done ... */
-			goto not_running;
-
-		if (mddev->pers->sync_request) {
-			if (spares) {
-				/* We are adding a device or devices to an array
-				 * which has the bitmap stored on all devices.
-				 * So make sure all bitmap pages get written
-				 */
-				md_bitmap_write_all(mddev->bitmap);
-			}
+		if (test_and_clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
+		    !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
+			/*
+			 * Before this sync thread is done and
+			 * MD_RECOVERY_RUNNING is cleared, new sync_work won't
+			 * be queued.
+			 */
 			queue_work(md_misc_wq, &mddev->sync_work);
-			goto unlock;
-		}
-	not_running:
-		if (!mddev->sync_thread) {
+		} else {
 			clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 			wake_up(&resync_wait);
-			if (test_and_clear_bit(MD_RECOVERY_RECOVER,
-					       &mddev->recovery))
-				if (mddev->sysfs_action)
-					sysfs_notify_dirent_safe(mddev->sysfs_action);
 		}
-	unlock:
+
+unlock:
 		wake_up(&mddev->sb_wait);
 		mddev_unlock(mddev);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH -next 13/13] md: delay remove_and_add_spares() for read only array to md_start_sync()
  2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
                   ` (11 preceding siblings ...)
  2023-08-03 13:24 ` [PATCH -next 12/13] md: delay choosing sync direction to md_start_sync() Yu Kuai
@ 2023-08-03 13:24 ` Yu Kuai
  12 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-03 13:24 UTC (permalink / raw)
  To: song, xni; +Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Before this patch, for read-only array:

md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, then it will
call remove_and_add_spares() directly to try to remove and add rdevs
from array.

After this patch:

1) md_check_recovery() check that 'MD_RECOVERY_NEEDED' is set, and the
   worker 'sync_work' is not pending, and there are rdevs can be added
   or removed, then it will queue new work md_start_sync();
2) md_start_sync() will call remove_and_add_spares() and exist;

This change make sure that array reconfiguration is independent from
daemon, and it'll be much easier to synchronize it with io, consier
that io may rely on daemon thread to be done.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ef88581d9a39..f6e024c15530 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9188,6 +9188,16 @@ static bool rdev_addable(struct md_rdev *rdev)
 	return true;
 }
 
+static bool md_spares_need_change(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+
+	rdev_for_each(rdev, mddev)
+		if (rdev_removeable(rdev) || rdev_addable(rdev))
+			return true;
+	return false;
+}
+
 static bool rdev_is_spare(struct md_rdev *rdev)
 {
 	return !test_bit(Candidate, &rdev->flags) && rdev->raid_disk >= 0 &&
@@ -9265,6 +9275,12 @@ static void md_start_sync(struct work_struct *ws)
 
 	mddev_lock_nointr(mddev);
 
+	if (!md_is_rdwr(mddev)) {
+		remove_and_add_spares(mddev);
+		mddev_unlock(mddev);
+		return;
+	}
+
 	/*
 	 * No recovery is running, remove any failed drives, then add spares if
 	 * possible. Spares are also removed and re-added, to allow the
@@ -9381,7 +9397,8 @@ void md_check_recovery(struct mddev *mddev)
 	}
 
 	if (!md_is_rdwr(mddev) &&
-	    !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+	    (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
+	     work_pending(&mddev->sync_work)))
 		return;
 	if ( ! (
 		(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9409,15 +9426,8 @@ void md_check_recovery(struct mddev *mddev)
 				 */
 				rdev_for_each(rdev, mddev)
 					clear_bit(Blocked, &rdev->flags);
-			/* On a read-only array we can:
-			 * - remove failed devices
-			 * - add already-in_sync devices if the array itself
-			 *   is in-sync.
-			 * As we only add devices that are already in-sync,
-			 * we can activate the spares immediately.
-			 */
-			remove_and_add_spares(mddev);
-			/* There is no thread, but we need to call
+			/*
+			 * There is no thread, but we need to call
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
@@ -9425,6 +9435,13 @@ void md_check_recovery(struct mddev *mddev)
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
+
+			/*
+			 * Let md_start_sync() to remove and add rdevs to the
+			 * array.
+			 */
+			if (md_spares_need_change(mddev))
+				queue_work(md_misc_wq, &mddev->sync_work);
 			goto unlock;
 		}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized'
  2023-08-03 13:24 ` [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized' Yu Kuai
@ 2023-08-14 10:02   ` Song Liu
  2023-08-14 21:36     ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Song Liu @ 2023-08-14 10:02 UTC (permalink / raw)
  To: Yu Kuai, NeilBrown, NeilBrown
  Cc: xni, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun

+ Neil

RemoveSynchronized was added by Neil not too long ago. I wonder whether
we still need it in some cases (large raid10 losing a large set of disks).

Thanks,
Song


On Thu, Aug 3, 2023 at 9:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> When multiple rdevs are removed from array in remove_and_add_spares(),
> which is only possible from daemon thread(user can only remove one rdev
> through ioctl/sysfs at one time), the flag is used to avoid calling
> synchronize_rcu() multiple times for each rdev.
>
> However, remove rdev from daemon thread really is super cold path, while
> removing n rdevs, it doesn't matter sync rcu n times or just one time.
> Hence remove this flag and make code simpler.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md-multipath.c | 15 +++++++--------
>  drivers/md/md.c           | 26 ++------------------------
>  drivers/md/md.h           |  5 -----
>  drivers/md/raid1.c        | 15 +++++++--------
>  drivers/md/raid10.c       | 15 +++++++--------
>  drivers/md/raid5.c        | 15 +++++++--------
>  6 files changed, 30 insertions(+), 61 deletions(-)
>
> diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
> index d22276870283..9e4ddd5240cd 100644
> --- a/drivers/md/md-multipath.c
> +++ b/drivers/md/md-multipath.c
> @@ -258,14 +258,13 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>                         goto abort;
>                 }
>                 p->rdev = NULL;
> -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> -                       synchronize_rcu();
> -                       if (atomic_read(&rdev->nr_pending)) {
> -                               /* lost the race, try later */
> -                               err = -EBUSY;
> -                               p->rdev = rdev;
> -                               goto abort;
> -                       }
> +
> +               synchronize_rcu();
> +               if (atomic_read(&rdev->nr_pending)) {
> +                       /* lost the race, try later */
> +                       err = -EBUSY;
> +                       p->rdev = rdev;
> +                       goto abort;
>                 }
>                 err = md_integrity_register(mddev);
>         }
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index a3d98273b295..cd7ac1dee3b8 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9155,7 +9155,6 @@ static int remove_and_add_spares(struct mddev *mddev,
>         struct md_rdev *rdev;
>         int spares = 0;
>         int removed = 0;
> -       bool remove_some = false;
>
>         if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>                 /* Mustn't remove devices when resync thread is running */
> @@ -9165,28 +9164,9 @@ static int remove_and_add_spares(struct mddev *mddev,
>                 if ((this == NULL || rdev == this) &&
>                     rdev->raid_disk >= 0 &&
>                     !test_bit(Blocked, &rdev->flags) &&
> -                   test_bit(Faulty, &rdev->flags) &&
> +                   !test_bit(In_sync, &rdev->flags) &&
> +                   !test_bit(Journal, &rdev->flags) &&
>                     atomic_read(&rdev->nr_pending)==0) {
> -                       /* Faulty non-Blocked devices with nr_pending == 0
> -                        * never get nr_pending incremented,
> -                        * never get Faulty cleared, and never get Blocked set.
> -                        * So we can synchronize_rcu now rather than once per device
> -                        */
> -                       remove_some = true;
> -                       set_bit(RemoveSynchronized, &rdev->flags);
> -               }
> -       }
> -
> -       if (remove_some)
> -               synchronize_rcu();
> -       rdev_for_each(rdev, mddev) {
> -               if ((this == NULL || rdev == this) &&
> -                   rdev->raid_disk >= 0 &&
> -                   !test_bit(Blocked, &rdev->flags) &&
> -                   ((test_bit(RemoveSynchronized, &rdev->flags) ||
> -                    (!test_bit(In_sync, &rdev->flags) &&
> -                     !test_bit(Journal, &rdev->flags))) &&
> -                   atomic_read(&rdev->nr_pending)==0)) {
>                         if (mddev->pers->hot_remove_disk(
>                                     mddev, rdev) == 0) {
>                                 sysfs_unlink_rdev(mddev, rdev);
> @@ -9195,8 +9175,6 @@ static int remove_and_add_spares(struct mddev *mddev,
>                                 removed++;
>                         }
>                 }
> -               if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
> -                       clear_bit(RemoveSynchronized, &rdev->flags);
>         }
>
>         if (removed && mddev->kobj.sd)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 8ae957480976..b25b6d061372 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -189,11 +189,6 @@ enum flag_bits {
>                                  * than other devices in the array
>                                  */
>         ClusterRemove,
> -       RemoveSynchronized,     /* synchronize_rcu() was called after
> -                                * this device was known to be faulty,
> -                                * so it is safe to remove without
> -                                * another synchronize_rcu() call.
> -                                */
>         ExternalBbl,            /* External metadata provides bad
>                                  * block management for a disk
>                                  */
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 23d211969565..acb6d6542619 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1859,15 +1859,14 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>                         goto abort;
>                 }
>                 p->rdev = NULL;
> -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> -                       synchronize_rcu();
> -                       if (atomic_read(&rdev->nr_pending)) {
> -                               /* lost the race, try later */
> -                               err = -EBUSY;
> -                               p->rdev = rdev;
> -                               goto abort;
> -                       }
> +               synchronize_rcu();
> +               if (atomic_read(&rdev->nr_pending)) {
> +                       /* lost the race, try later */
> +                       err = -EBUSY;
> +                       p->rdev = rdev;
> +                       goto abort;
>                 }
> +
>                 if (conf->mirrors[conf->raid_disks + number].rdev) {
>                         /* We just removed a device that is being replaced.
>                          * Move down the replacement.  We drain all IO before
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 7704a4c7f469..64dd5cb6133e 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2247,15 +2247,14 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>                 goto abort;
>         }
>         *rdevp = NULL;
> -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> -               synchronize_rcu();
> -               if (atomic_read(&rdev->nr_pending)) {
> -                       /* lost the race, try later */
> -                       err = -EBUSY;
> -                       *rdevp = rdev;
> -                       goto abort;
> -               }
> +       synchronize_rcu();
> +       if (atomic_read(&rdev->nr_pending)) {
> +               /* lost the race, try later */
> +               err = -EBUSY;
> +               *rdevp = rdev;
> +               goto abort;
>         }
> +
>         if (p->replacement) {
>                 /* We must have just cleared 'rdev' */
>                 p->rdev = p->replacement;
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4cdb35e54251..37d9865b180a 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8267,15 +8267,14 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>                 goto abort;
>         }
>         *rdevp = NULL;
> -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> -               lockdep_assert_held(&mddev->reconfig_mutex);
> -               synchronize_rcu();
> -               if (atomic_read(&rdev->nr_pending)) {
> -                       /* lost the race, try later */
> -                       err = -EBUSY;
> -                       rcu_assign_pointer(*rdevp, rdev);
> -               }
> +       lockdep_assert_held(&mddev->reconfig_mutex);
> +       synchronize_rcu();
> +       if (atomic_read(&rdev->nr_pending)) {
> +               /* lost the race, try later */
> +               err = -EBUSY;
> +               rcu_assign_pointer(*rdevp, rdev);
>         }
> +
>         if (!err) {
>                 err = log_modify(conf, rdev, false);
>                 if (err)
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized'
  2023-08-14 10:02   ` Song Liu
@ 2023-08-14 21:36     ` NeilBrown
  2023-08-15  1:04       ` Yu Kuai
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2023-08-14 21:36 UTC (permalink / raw)
  To: Song Liu
  Cc: Yu Kuai, xni, linux-raid, linux-kernel, yukuai3, yi.zhang,
	yangerkun

On Mon, 14 Aug 2023, Song Liu wrote:
> + Neil
> 
> RemoveSynchronized was added by Neil not too long ago. I wonder whether
> we still need it in some cases (large raid10 losing a large set of disks).

We have customers with thousands of members in a raid10.  They do
sometimes need to remove half of them.  Dropping RemoveSynchronized
would significantly increase the time it takes to do that.

Removing an rdev is not often a hot path.  But sometimes it can be.

NeilBrown


> 
> Thanks,
> Song
> 
> 
> On Thu, Aug 3, 2023 at 9:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >
> > From: Yu Kuai <yukuai3@huawei.com>
> >
> > When multiple rdevs are removed from array in remove_and_add_spares(),
> > which is only possible from daemon thread(user can only remove one rdev
> > through ioctl/sysfs at one time), the flag is used to avoid calling
> > synchronize_rcu() multiple times for each rdev.
> >
> > However, remove rdev from daemon thread really is super cold path, while
> > removing n rdevs, it doesn't matter sync rcu n times or just one time.
> > Hence remove this flag and make code simpler.
> >
> > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > ---
> >  drivers/md/md-multipath.c | 15 +++++++--------
> >  drivers/md/md.c           | 26 ++------------------------
> >  drivers/md/md.h           |  5 -----
> >  drivers/md/raid1.c        | 15 +++++++--------
> >  drivers/md/raid10.c       | 15 +++++++--------
> >  drivers/md/raid5.c        | 15 +++++++--------
> >  6 files changed, 30 insertions(+), 61 deletions(-)
> >
> > diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
> > index d22276870283..9e4ddd5240cd 100644
> > --- a/drivers/md/md-multipath.c
> > +++ b/drivers/md/md-multipath.c
> > @@ -258,14 +258,13 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
> >                         goto abort;
> >                 }
> >                 p->rdev = NULL;
> > -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> > -                       synchronize_rcu();
> > -                       if (atomic_read(&rdev->nr_pending)) {
> > -                               /* lost the race, try later */
> > -                               err = -EBUSY;
> > -                               p->rdev = rdev;
> > -                               goto abort;
> > -                       }
> > +
> > +               synchronize_rcu();
> > +               if (atomic_read(&rdev->nr_pending)) {
> > +                       /* lost the race, try later */
> > +                       err = -EBUSY;
> > +                       p->rdev = rdev;
> > +                       goto abort;
> >                 }
> >                 err = md_integrity_register(mddev);
> >         }
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index a3d98273b295..cd7ac1dee3b8 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -9155,7 +9155,6 @@ static int remove_and_add_spares(struct mddev *mddev,
> >         struct md_rdev *rdev;
> >         int spares = 0;
> >         int removed = 0;
> > -       bool remove_some = false;
> >
> >         if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> >                 /* Mustn't remove devices when resync thread is running */
> > @@ -9165,28 +9164,9 @@ static int remove_and_add_spares(struct mddev *mddev,
> >                 if ((this == NULL || rdev == this) &&
> >                     rdev->raid_disk >= 0 &&
> >                     !test_bit(Blocked, &rdev->flags) &&
> > -                   test_bit(Faulty, &rdev->flags) &&
> > +                   !test_bit(In_sync, &rdev->flags) &&
> > +                   !test_bit(Journal, &rdev->flags) &&
> >                     atomic_read(&rdev->nr_pending)==0) {
> > -                       /* Faulty non-Blocked devices with nr_pending == 0
> > -                        * never get nr_pending incremented,
> > -                        * never get Faulty cleared, and never get Blocked set.
> > -                        * So we can synchronize_rcu now rather than once per device
> > -                        */
> > -                       remove_some = true;
> > -                       set_bit(RemoveSynchronized, &rdev->flags);
> > -               }
> > -       }
> > -
> > -       if (remove_some)
> > -               synchronize_rcu();
> > -       rdev_for_each(rdev, mddev) {
> > -               if ((this == NULL || rdev == this) &&
> > -                   rdev->raid_disk >= 0 &&
> > -                   !test_bit(Blocked, &rdev->flags) &&
> > -                   ((test_bit(RemoveSynchronized, &rdev->flags) ||
> > -                    (!test_bit(In_sync, &rdev->flags) &&
> > -                     !test_bit(Journal, &rdev->flags))) &&
> > -                   atomic_read(&rdev->nr_pending)==0)) {
> >                         if (mddev->pers->hot_remove_disk(
> >                                     mddev, rdev) == 0) {
> >                                 sysfs_unlink_rdev(mddev, rdev);
> > @@ -9195,8 +9175,6 @@ static int remove_and_add_spares(struct mddev *mddev,
> >                                 removed++;
> >                         }
> >                 }
> > -               if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
> > -                       clear_bit(RemoveSynchronized, &rdev->flags);
> >         }
> >
> >         if (removed && mddev->kobj.sd)
> > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > index 8ae957480976..b25b6d061372 100644
> > --- a/drivers/md/md.h
> > +++ b/drivers/md/md.h
> > @@ -189,11 +189,6 @@ enum flag_bits {
> >                                  * than other devices in the array
> >                                  */
> >         ClusterRemove,
> > -       RemoveSynchronized,     /* synchronize_rcu() was called after
> > -                                * this device was known to be faulty,
> > -                                * so it is safe to remove without
> > -                                * another synchronize_rcu() call.
> > -                                */
> >         ExternalBbl,            /* External metadata provides bad
> >                                  * block management for a disk
> >                                  */
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 23d211969565..acb6d6542619 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -1859,15 +1859,14 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
> >                         goto abort;
> >                 }
> >                 p->rdev = NULL;
> > -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> > -                       synchronize_rcu();
> > -                       if (atomic_read(&rdev->nr_pending)) {
> > -                               /* lost the race, try later */
> > -                               err = -EBUSY;
> > -                               p->rdev = rdev;
> > -                               goto abort;
> > -                       }
> > +               synchronize_rcu();
> > +               if (atomic_read(&rdev->nr_pending)) {
> > +                       /* lost the race, try later */
> > +                       err = -EBUSY;
> > +                       p->rdev = rdev;
> > +                       goto abort;
> >                 }
> > +
> >                 if (conf->mirrors[conf->raid_disks + number].rdev) {
> >                         /* We just removed a device that is being replaced.
> >                          * Move down the replacement.  We drain all IO before
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 7704a4c7f469..64dd5cb6133e 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -2247,15 +2247,14 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
> >                 goto abort;
> >         }
> >         *rdevp = NULL;
> > -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> > -               synchronize_rcu();
> > -               if (atomic_read(&rdev->nr_pending)) {
> > -                       /* lost the race, try later */
> > -                       err = -EBUSY;
> > -                       *rdevp = rdev;
> > -                       goto abort;
> > -               }
> > +       synchronize_rcu();
> > +       if (atomic_read(&rdev->nr_pending)) {
> > +               /* lost the race, try later */
> > +               err = -EBUSY;
> > +               *rdevp = rdev;
> > +               goto abort;
> >         }
> > +
> >         if (p->replacement) {
> >                 /* We must have just cleared 'rdev' */
> >                 p->rdev = p->replacement;
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 4cdb35e54251..37d9865b180a 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -8267,15 +8267,14 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
> >                 goto abort;
> >         }
> >         *rdevp = NULL;
> > -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
> > -               lockdep_assert_held(&mddev->reconfig_mutex);
> > -               synchronize_rcu();
> > -               if (atomic_read(&rdev->nr_pending)) {
> > -                       /* lost the race, try later */
> > -                       err = -EBUSY;
> > -                       rcu_assign_pointer(*rdevp, rdev);
> > -               }
> > +       lockdep_assert_held(&mddev->reconfig_mutex);
> > +       synchronize_rcu();
> > +       if (atomic_read(&rdev->nr_pending)) {
> > +               /* lost the race, try later */
> > +               err = -EBUSY;
> > +               rcu_assign_pointer(*rdevp, rdev);
> >         }
> > +
> >         if (!err) {
> >                 err = log_modify(conf, rdev, false);
> >                 if (err)
> > --
> > 2.39.2
> >
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized'
  2023-08-14 21:36     ` NeilBrown
@ 2023-08-15  1:04       ` Yu Kuai
  0 siblings, 0 replies; 17+ messages in thread
From: Yu Kuai @ 2023-08-15  1:04 UTC (permalink / raw)
  To: NeilBrown, Song Liu
  Cc: Yu Kuai, xni, linux-raid, linux-kernel, yi.zhang, yangerkun,
	yukuai (C)

Hi,

在 2023/08/15 5:36, NeilBrown 写道:
> On Mon, 14 Aug 2023, Song Liu wrote:
>> + Neil
>>
>> RemoveSynchronized was added by Neil not too long ago. I wonder whether
>> we still need it in some cases (large raid10 losing a large set of disks).
> 
> We have customers with thousands of members in a raid10.  They do
> sometimes need to remove half of them.  Dropping RemoveSynchronized
> would significantly increase the time it takes to do that.
> 
> Removing an rdev is not often a hot path.  But sometimes it can be.
> 

Thanks for the explanation, and soory that I didn't mention that in this
commit message, the way that rdev is protected through rcu is not safe,
and refer to the implementation of 'q_usage_counter' in block layer, I'm
planing to refactor the way that rdev is protected and rcu is not needed
anymore:

1. refactor mddev_suspend(), and make sure it's called by all the caller
of remove_and_add_spares().
2. factor out a helper md_array_enter() from md_handle_request, this
helper will grab 'active_io' and wait for mddev_suspend() to be done.
3. in the fastpath, before accessing rdev, make sure md_array_enter() is
called.

Thanks,
Kuai

> NeilBrown
> 
> 
>>
>> Thanks,
>> Song
>>
>>
>> On Thu, Aug 3, 2023 at 9:27 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> When multiple rdevs are removed from array in remove_and_add_spares(),
>>> which is only possible from daemon thread(user can only remove one rdev
>>> through ioctl/sysfs at one time), the flag is used to avoid calling
>>> synchronize_rcu() multiple times for each rdev.
>>>
>>> However, remove rdev from daemon thread really is super cold path, while
>>> removing n rdevs, it doesn't matter sync rcu n times or just one time.
>>> Hence remove this flag and make code simpler.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   drivers/md/md-multipath.c | 15 +++++++--------
>>>   drivers/md/md.c           | 26 ++------------------------
>>>   drivers/md/md.h           |  5 -----
>>>   drivers/md/raid1.c        | 15 +++++++--------
>>>   drivers/md/raid10.c       | 15 +++++++--------
>>>   drivers/md/raid5.c        | 15 +++++++--------
>>>   6 files changed, 30 insertions(+), 61 deletions(-)
>>>
>>> diff --git a/drivers/md/md-multipath.c b/drivers/md/md-multipath.c
>>> index d22276870283..9e4ddd5240cd 100644
>>> --- a/drivers/md/md-multipath.c
>>> +++ b/drivers/md/md-multipath.c
>>> @@ -258,14 +258,13 @@ static int multipath_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>>>                          goto abort;
>>>                  }
>>>                  p->rdev = NULL;
>>> -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
>>> -                       synchronize_rcu();
>>> -                       if (atomic_read(&rdev->nr_pending)) {
>>> -                               /* lost the race, try later */
>>> -                               err = -EBUSY;
>>> -                               p->rdev = rdev;
>>> -                               goto abort;
>>> -                       }
>>> +
>>> +               synchronize_rcu();
>>> +               if (atomic_read(&rdev->nr_pending)) {
>>> +                       /* lost the race, try later */
>>> +                       err = -EBUSY;
>>> +                       p->rdev = rdev;
>>> +                       goto abort;
>>>                  }
>>>                  err = md_integrity_register(mddev);
>>>          }
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index a3d98273b295..cd7ac1dee3b8 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -9155,7 +9155,6 @@ static int remove_and_add_spares(struct mddev *mddev,
>>>          struct md_rdev *rdev;
>>>          int spares = 0;
>>>          int removed = 0;
>>> -       bool remove_some = false;
>>>
>>>          if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>>                  /* Mustn't remove devices when resync thread is running */
>>> @@ -9165,28 +9164,9 @@ static int remove_and_add_spares(struct mddev *mddev,
>>>                  if ((this == NULL || rdev == this) &&
>>>                      rdev->raid_disk >= 0 &&
>>>                      !test_bit(Blocked, &rdev->flags) &&
>>> -                   test_bit(Faulty, &rdev->flags) &&
>>> +                   !test_bit(In_sync, &rdev->flags) &&
>>> +                   !test_bit(Journal, &rdev->flags) &&
>>>                      atomic_read(&rdev->nr_pending)==0) {
>>> -                       /* Faulty non-Blocked devices with nr_pending == 0
>>> -                        * never get nr_pending incremented,
>>> -                        * never get Faulty cleared, and never get Blocked set.
>>> -                        * So we can synchronize_rcu now rather than once per device
>>> -                        */
>>> -                       remove_some = true;
>>> -                       set_bit(RemoveSynchronized, &rdev->flags);
>>> -               }
>>> -       }
>>> -
>>> -       if (remove_some)
>>> -               synchronize_rcu();
>>> -       rdev_for_each(rdev, mddev) {
>>> -               if ((this == NULL || rdev == this) &&
>>> -                   rdev->raid_disk >= 0 &&
>>> -                   !test_bit(Blocked, &rdev->flags) &&
>>> -                   ((test_bit(RemoveSynchronized, &rdev->flags) ||
>>> -                    (!test_bit(In_sync, &rdev->flags) &&
>>> -                     !test_bit(Journal, &rdev->flags))) &&
>>> -                   atomic_read(&rdev->nr_pending)==0)) {
>>>                          if (mddev->pers->hot_remove_disk(
>>>                                      mddev, rdev) == 0) {
>>>                                  sysfs_unlink_rdev(mddev, rdev);
>>> @@ -9195,8 +9175,6 @@ static int remove_and_add_spares(struct mddev *mddev,
>>>                                  removed++;
>>>                          }
>>>                  }
>>> -               if (remove_some && test_bit(RemoveSynchronized, &rdev->flags))
>>> -                       clear_bit(RemoveSynchronized, &rdev->flags);
>>>          }
>>>
>>>          if (removed && mddev->kobj.sd)
>>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>>> index 8ae957480976..b25b6d061372 100644
>>> --- a/drivers/md/md.h
>>> +++ b/drivers/md/md.h
>>> @@ -189,11 +189,6 @@ enum flag_bits {
>>>                                   * than other devices in the array
>>>                                   */
>>>          ClusterRemove,
>>> -       RemoveSynchronized,     /* synchronize_rcu() was called after
>>> -                                * this device was known to be faulty,
>>> -                                * so it is safe to remove without
>>> -                                * another synchronize_rcu() call.
>>> -                                */
>>>          ExternalBbl,            /* External metadata provides bad
>>>                                   * block management for a disk
>>>                                   */
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 23d211969565..acb6d6542619 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1859,15 +1859,14 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>>>                          goto abort;
>>>                  }
>>>                  p->rdev = NULL;
>>> -               if (!test_bit(RemoveSynchronized, &rdev->flags)) {
>>> -                       synchronize_rcu();
>>> -                       if (atomic_read(&rdev->nr_pending)) {
>>> -                               /* lost the race, try later */
>>> -                               err = -EBUSY;
>>> -                               p->rdev = rdev;
>>> -                               goto abort;
>>> -                       }
>>> +               synchronize_rcu();
>>> +               if (atomic_read(&rdev->nr_pending)) {
>>> +                       /* lost the race, try later */
>>> +                       err = -EBUSY;
>>> +                       p->rdev = rdev;
>>> +                       goto abort;
>>>                  }
>>> +
>>>                  if (conf->mirrors[conf->raid_disks + number].rdev) {
>>>                          /* We just removed a device that is being replaced.
>>>                           * Move down the replacement.  We drain all IO before
>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>> index 7704a4c7f469..64dd5cb6133e 100644
>>> --- a/drivers/md/raid10.c
>>> +++ b/drivers/md/raid10.c
>>> @@ -2247,15 +2247,14 @@ static int raid10_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>>>                  goto abort;
>>>          }
>>>          *rdevp = NULL;
>>> -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
>>> -               synchronize_rcu();
>>> -               if (atomic_read(&rdev->nr_pending)) {
>>> -                       /* lost the race, try later */
>>> -                       err = -EBUSY;
>>> -                       *rdevp = rdev;
>>> -                       goto abort;
>>> -               }
>>> +       synchronize_rcu();
>>> +       if (atomic_read(&rdev->nr_pending)) {
>>> +               /* lost the race, try later */
>>> +               err = -EBUSY;
>>> +               *rdevp = rdev;
>>> +               goto abort;
>>>          }
>>> +
>>>          if (p->replacement) {
>>>                  /* We must have just cleared 'rdev' */
>>>                  p->rdev = p->replacement;
>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> index 4cdb35e54251..37d9865b180a 100644
>>> --- a/drivers/md/raid5.c
>>> +++ b/drivers/md/raid5.c
>>> @@ -8267,15 +8267,14 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
>>>                  goto abort;
>>>          }
>>>          *rdevp = NULL;
>>> -       if (!test_bit(RemoveSynchronized, &rdev->flags)) {
>>> -               lockdep_assert_held(&mddev->reconfig_mutex);
>>> -               synchronize_rcu();
>>> -               if (atomic_read(&rdev->nr_pending)) {
>>> -                       /* lost the race, try later */
>>> -                       err = -EBUSY;
>>> -                       rcu_assign_pointer(*rdevp, rdev);
>>> -               }
>>> +       lockdep_assert_held(&mddev->reconfig_mutex);
>>> +       synchronize_rcu();
>>> +       if (atomic_read(&rdev->nr_pending)) {
>>> +               /* lost the race, try later */
>>> +               err = -EBUSY;
>>> +               rcu_assign_pointer(*rdevp, rdev);
>>>          }
>>> +
>>>          if (!err) {
>>>                  err = log_modify(conf, rdev, false);
>>>                  if (err)
>>> --
>>> 2.39.2
>>>
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-15  1:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-03 13:24 [PATCH -next 00/13] md: make rdev addition and removal independent from daemon thread Yu Kuai
2023-08-03 13:24 ` [PATCH -next 01/13] md: remove rdev flag 'RemoveSynchronized' Yu Kuai
2023-08-14 10:02   ` Song Liu
2023-08-14 21:36     ` NeilBrown
2023-08-15  1:04       ` Yu Kuai
2023-08-03 13:24 ` [PATCH -next 02/13] md: factor out a helper rdev_removeable() from remove_and_add_spares() Yu Kuai
2023-08-03 13:24 ` [PATCH -next 03/13] md: factor out a helper rdev_is_spare() " Yu Kuai
2023-08-03 13:24 ` [PATCH -next 04/13] md: factor out a helper rdev_addable() " Yu Kuai
2023-08-03 13:24 ` [PATCH -next 05/13] md: factor out a helper hot_remove_rdev() " Yu Kuai
2023-08-03 13:24 ` [PATCH -next 06/13] md: factor out a helper hot_add_rdev() " Yu Kuai
2023-08-03 13:24 ` [PATCH -next 07/13] md: factor out a helper remove_rdev() from state_store() Yu Kuai
2023-08-03 13:24 ` [PATCH -next 08/13] md: convert to use hot_remove_rdev() to hot remove one rdev Yu Kuai
2023-08-03 13:24 ` [PATCH -next 09/13] md: convert to use hot_add_rdev() to hot add " Yu Kuai
2023-08-03 13:24 ` [PATCH -next 10/13] md: cleanup remove_and_add_spares() Yu Kuai
2023-08-03 13:24 ` [PATCH -next 11/13] md: use separate work_struct for md_start_sync() Yu Kuai
2023-08-03 13:24 ` [PATCH -next 12/13] md: delay choosing sync direction to md_start_sync() Yu Kuai
2023-08-03 13:24 ` [PATCH -next 13/13] md: delay remove_and_add_spares() for read only array " Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).