public inbox for linux-raid@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt
@ 2025-11-24  6:31 Yu Kuai
  2025-11-24  6:31 ` [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

changes in v2:
 - add prep cleanup patches 1-3;
 - add patch 11 to fix abnormal io_opt;

Yu Kuai (11):
  md: merge mddev has_superblock into mddev_flags
  md: merge mddev faillast_dev into mddev_flags
  md: merge mddev serialize_policy into mddev_flags
  md/raid5: use mempool to allocate stripe_request_ctx
  md/raid5: make sure max_sectors is not less than io_opt
  md: support to align bio to limits
  md: add a helper md_config_align_limits()
  md/raid5: align bio to io_opt
  md/raid10: align bio to io_opt
  md/raid0: align bio to io_opt
  md: fix abnormal io_opt from member disks

 drivers/md/md-bitmap.c |   4 +-
 drivers/md/md.c        | 117 +++++++++++++++++++++++++++++++++++------
 drivers/md/md.h        |  32 +++++++++--
 drivers/md/raid0.c     |   6 ++-
 drivers/md/raid1-10.c  |   5 --
 drivers/md/raid1.c     |  13 ++---
 drivers/md/raid10.c    |  10 ++--
 drivers/md/raid5.c     |  91 ++++++++++++++++++++++----------
 drivers/md/raid5.h     |   2 +
 9 files changed, 214 insertions(+), 66 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-12-26  3:04   ` Li Nan
  2025-11-24  6:31 ` [PATCH v2 02/11] md: merge mddev faillast_dev " Yu Kuai
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

There is no need to use a separate field in struct mddev. No functional
changes.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c | 6 +++---
 drivers/md/md.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7b5c5967568f..b49fdee11a03 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6462,7 +6462,7 @@ int md_run(struct mddev *mddev)
 	 * the only valid external interface is through the md
 	 * device.
 	 */
-	mddev->has_superblocks = false;
+	clear_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
 	rdev_for_each(rdev, mddev) {
 		if (test_bit(Faulty, &rdev->flags))
 			continue;
@@ -6475,7 +6475,7 @@ int md_run(struct mddev *mddev)
 		}
 
 		if (rdev->sb_page)
-			mddev->has_superblocks = true;
+			set_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
 
 		/* perform some consistency tests on the device.
 		 * We don't want the data to overlap the metadata,
@@ -9085,7 +9085,7 @@ void md_write_start(struct mddev *mddev, struct bio *bi)
 	rcu_read_unlock();
 	if (did_change)
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
-	if (!mddev->has_superblocks)
+	if (!test_bit(MD_HAS_SUPERBLOCK, &mddev->flags))
 		return;
 	wait_event(mddev->sb_wait,
 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6985f2829bbd..b4c9aa600edd 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -340,6 +340,7 @@ struct md_cluster_operations;
  *		   array is ready yet.
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
  * @MD_DELETED: This device is being deleted
+ * @MD_HAS_SUPERBLOCK: There is a persistent sb in member disks.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -356,6 +357,7 @@ enum mddev_flags {
 	MD_BROKEN,
 	MD_DO_DELETE,
 	MD_DELETED,
+	MD_HAS_SUPERBLOCK,
 };
 
 enum mddev_sb_flags {
@@ -623,7 +625,6 @@ struct mddev {
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
 
-	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
 };
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/11] md: merge mddev faillast_dev into mddev_flags
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
  2025-11-24  6:31 ` [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-12-26  3:46   ` Li Nan
  2025-11-24  6:31 ` [PATCH v2 03/11] md: merge mddev serialize_policy " Yu Kuai
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

There is no need to use a separate field in struct mddev. No functional
changes.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c     | 10 ++++++----
 drivers/md/md.h     |  3 ++-
 drivers/md/raid0.c  |  3 ++-
 drivers/md/raid1.c  |  4 ++--
 drivers/md/raid10.c |  4 ++--
 drivers/md/raid5.c  |  5 ++++-
 6 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b49fdee11a03..5dcfd0371090 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5864,11 +5864,11 @@ __ATTR(consistency_policy, S_IRUGO | S_IWUSR, consistency_policy_show,
 
 static ssize_t fail_last_dev_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%d\n", mddev->fail_last_dev);
+	return sprintf(page, "%d\n", test_bit(MD_FAILLAST_DEV, &mddev->flags));
 }
 
 /*
- * Setting fail_last_dev to true to allow last device to be forcibly removed
+ * Setting MD_FAILLAST_DEV to allow last device to be forcibly removed
  * from RAID1/RAID10.
  */
 static ssize_t
@@ -5881,8 +5881,10 @@ fail_last_dev_store(struct mddev *mddev, const char *buf, size_t len)
 	if (ret)
 		return ret;
 
-	if (value != mddev->fail_last_dev)
-		mddev->fail_last_dev = value;
+	if (value)
+		set_bit(MD_FAILLAST_DEV, &mddev->flags);
+	else
+		clear_bit(MD_FAILLAST_DEV, &mddev->flags);
 
 	return len;
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b4c9aa600edd..297a104fba88 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -341,6 +341,7 @@ struct md_cluster_operations;
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
  * @MD_DELETED: This device is being deleted
  * @MD_HAS_SUPERBLOCK: There is a persistent sb in member disks.
+ * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -358,6 +359,7 @@ enum mddev_flags {
 	MD_DO_DELETE,
 	MD_DELETED,
 	MD_HAS_SUPERBLOCK,
+	MD_FAILLAST_DEV,
 };
 
 enum mddev_sb_flags {
@@ -625,7 +627,6 @@ struct mddev {
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
 
-	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
 };
 
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 47aee1b1d4d1..012d8402af28 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -27,7 +27,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_JOURNAL_CLEAN) |	\
 	 (1L << MD_FAILFAST_SUPPORTED) |\
 	 (1L << MD_HAS_PPL) |		\
-	 (1L << MD_HAS_MULTIPLE_PPLS))
+	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
+	 (1L << MD_FAILLAST_DEV))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 57d50465eed1..98b5c93810bb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1746,7 +1746,7 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
  *	- &mddev->degraded is bumped.
  *
  * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * MD_FAILLAST_DEV is not set.
  */
 static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -1759,7 +1759,7 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 	    (conf->raid_disks - mddev->degraded) == 1) {
 		set_bit(MD_BROKEN, &mddev->flags);
 
-		if (!mddev->fail_last_dev) {
+		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
 			conf->recovery_disabled = mddev->recovery_disabled;
 			spin_unlock_irqrestore(&conf->device_lock, flags);
 			return;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 84be4cc7e873..09328e032f14 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1990,7 +1990,7 @@ static int enough(struct r10conf *conf, int ignore)
  *	- &mddev->degraded is bumped.
  *
  * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * MD_FAILLAST_DEV is not set.
  */
 static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -2002,7 +2002,7 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
 		set_bit(MD_BROKEN, &mddev->flags);
 
-		if (!mddev->fail_last_dev) {
+		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
 			spin_unlock_irqrestore(&conf->device_lock, flags);
 			return;
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cdbc7eba5c54..74f6729864fa 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -56,7 +56,10 @@
 #include "md-bitmap.h"
 #include "raid5-log.h"
 
-#define UNSUPPORTED_MDDEV_FLAGS	(1L << MD_FAILFAST_SUPPORTED)
+#define UNSUPPORTED_MDDEV_FLAGS		\
+	((1L << MD_FAILFAST_SUPPORTED) |	\
+	 (1L << MD_FAILLAST_DEV))
+
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/11] md: merge mddev serialize_policy into mddev_flags
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
  2025-11-24  6:31 ` [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
  2025-11-24  6:31 ` [PATCH v2 02/11] md: merge mddev faillast_dev " Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-12-26  6:33   ` Li Nan
  2025-11-24  6:31 ` [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

There is no need to use a separate field in struct mddev. No functional
changes.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md-bitmap.c |  4 ++--
 drivers/md/md.c        | 20 ++++++++++++--------
 drivers/md/md.h        |  4 ++--
 drivers/md/raid0.c     |  3 ++-
 drivers/md/raid1.c     |  4 ++--
 drivers/md/raid5.c     |  3 ++-
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 84b7e2af6dba..dbe4c4b9a1da 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2085,7 +2085,7 @@ static void bitmap_destroy(struct mddev *mddev)
 		return;
 
 	bitmap_wait_behind_writes(mddev);
-	if (!mddev->serialize_policy)
+	if (!test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 		mddev_destroy_serial_pool(mddev, NULL);
 
 	mutex_lock(&mddev->bitmap_info.mutex);
@@ -2809,7 +2809,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	mddev->bitmap_info.max_write_behind = backlog;
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
-		if (!mddev->serialize_policy)
+		if (!test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5dcfd0371090..5833cbff4acf 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -279,7 +279,8 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 
 		rdev_for_each(temp, mddev) {
 			if (!rdev) {
-				if (!mddev->serialize_policy ||
+				if (!test_bit(MD_SERIALIZE_POLICY,
+					      &mddev->flags) ||
 				    !rdev_need_serial(temp))
 					rdev_uninit_serial(temp);
 				else
@@ -5897,11 +5898,12 @@ static ssize_t serialize_policy_show(struct mddev *mddev, char *page)
 	if (mddev->pers == NULL || (mddev->pers->head.id != ID_RAID1))
 		return sprintf(page, "n/a\n");
 	else
-		return sprintf(page, "%d\n", mddev->serialize_policy);
+		return sprintf(page, "%d\n",
+			       test_bit(MD_SERIALIZE_POLICY, &mddev->flags));
 }
 
 /*
- * Setting serialize_policy to true to enforce write IO is not reordered
+ * Setting MD_SERIALIZE_POLICY enforces that write IO is not reordered
  * for raid1.
  */
 static ssize_t
@@ -5914,7 +5916,7 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	if (err)
 		return err;
 
-	if (value == mddev->serialize_policy)
+	if (value == test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 		return len;
 
 	err = mddev_suspend_and_lock(mddev);
@@ -5926,11 +5928,13 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 		goto unlock;
 	}
 
-	if (value)
+	if (value) {
 		mddev_create_serial_pool(mddev, NULL);
-	else
+		set_bit(MD_SERIALIZE_POLICY, &mddev->flags);
+	} else {
 		mddev_destroy_serial_pool(mddev, NULL);
-	mddev->serialize_policy = value;
+		clear_bit(MD_SERIALIZE_POLICY, &mddev->flags);
+	}
 unlock:
 	mddev_unlock_and_resume(mddev);
 	return err ?: len;
@@ -6827,7 +6831,7 @@ static void __md_stop_writes(struct mddev *mddev)
 		md_update_sb(mddev, 1);
 	}
 	/* disable policy to guarantee rdevs free resources for serialization */
-	mddev->serialize_policy = 0;
+	clear_bit(MD_SERIALIZE_POLICY, &mddev->flags);
 	mddev_destroy_serial_pool(mddev, NULL);
 }
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 297a104fba88..6ee18045f41c 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -342,6 +342,7 @@ struct md_cluster_operations;
  * @MD_DELETED: This device is being deleted
  * @MD_HAS_SUPERBLOCK: There is a persistent sb in member disks.
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
+ * @MD_SERIALIZE_POLICY: Enforce that write IO is not reordered; only used by raid1.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -360,6 +361,7 @@ enum mddev_flags {
 	MD_DELETED,
 	MD_HAS_SUPERBLOCK,
 	MD_FAILLAST_DEV,
+	MD_SERIALIZE_POLICY,
 };
 
 enum mddev_sb_flags {
@@ -626,8 +628,6 @@ struct mddev {
 
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
-
-	bool	serialize_policy:1;
 };
 
 enum recovery_flags {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 012d8402af28..bf1f3ab59c83 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -28,7 +28,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_FAILFAST_SUPPORTED) |\
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
-	 (1L << MD_FAILLAST_DEV))
+	 (1L << MD_FAILLAST_DEV) |	\
+	 (1L << MD_SERIALIZE_POLICY))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 98b5c93810bb..f4c7004888af 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -542,7 +542,7 @@ static void raid1_end_write_request(struct bio *bio)
 				call_bio_endio(r1_bio);
 			}
 		}
-	} else if (rdev->mddev->serialize_policy)
+	} else if (test_bit(MD_SERIALIZE_POLICY, &rdev->mddev->flags))
 		remove_serial(rdev, lo, hi);
 	if (r1_bio->bios[mirror] == NULL)
 		rdev_dec_pending(rdev, conf->mddev);
@@ -1644,7 +1644,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
 					       &mddev->bio_set);
 
-			if (mddev->serialize_policy)
+			if (test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 				wait_for_serialization(rdev, r1_bio);
 		}
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 74f6729864fa..f405ba7b99a7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -58,7 +58,8 @@
 
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
-	 (1L << MD_FAILLAST_DEV))
+	 (1L << MD_FAILLAST_DEV) |		\
+	 (1L << MD_SERIALIZE_POLICY))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (2 preceding siblings ...)
  2025-11-24  6:31 ` [PATCH v2 03/11] md: merge mddev serialize_policy " Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-12-26  8:33   ` Li Nan
  2025-12-30  9:38   ` Li Nan
  2025-11-24  6:31 ` [PATCH v2 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

On the one hand, stripe_request_ctx is 72 bytes, which is a bit large
for a stack variable.

On the other hand, the bitmap sectors_to_do has a fixed size, so
max_hw_sectors_kb of a raid5 array is at most 256 * 4k = 1MiB, which
makes full stripe IO impossible for arrays where chunk_size * data_disks
is bigger. Allocating ctx at runtime makes it possible to get rid of
this limit.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.h       |  4 +++
 drivers/md/raid1-10.c |  5 ----
 drivers/md/raid5.c    | 61 +++++++++++++++++++++++++++----------------
 drivers/md/raid5.h    |  2 ++
 4 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6ee18045f41c..b8c5dec12b62 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -22,6 +22,10 @@
 #include <trace/events/block.h>
 
 #define MaxSector (~(sector_t)0)
+/*
+ * Number of guaranteed raid bios in case of extreme VM load:
+ */
+#define	NR_RAID_BIOS 256
 
 enum md_submodule_type {
 	MD_PERSONALITY = 0,
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 521625756128..c33099925f23 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -3,11 +3,6 @@
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 
-/*
- * Number of guaranteed raid bios in case of extreme VM load:
- */
-#define	NR_RAID_BIOS 256
-
 /* when we get a read error on a read-only array, we redirect to another
  * device without failing the first device, or trying to over-write to
  * correct the read error.  To keep track of bad blocks on a per-bio
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f405ba7b99a7..0080dec4a6ef 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6083,13 +6083,13 @@ static sector_t raid5_bio_lowest_chunk_sector(struct r5conf *conf,
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	bool on_wq;
 	struct r5conf *conf = mddev->private;
-	sector_t logical_sector;
-	struct stripe_request_ctx ctx = {};
 	const int rw = bio_data_dir(bi);
+	struct stripe_request_ctx *ctx;
+	sector_t logical_sector;
 	enum stripe_result res;
 	int s, stripe_cnt;
+	bool on_wq;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
 		int ret = log_handle_flush_request(conf, bi);
@@ -6101,11 +6101,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 				return true;
 		}
 		/* ret == -EAGAIN, fallback */
-		/*
-		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
-		 * we need to flush journal device
-		 */
-		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
 	}
 
 	md_write_start(mddev, bi);
@@ -6128,16 +6123,24 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 
 	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
-	ctx.first_sector = logical_sector;
-	ctx.last_sector = bio_end_sector(bi);
 	bi->bi_next = NULL;
 
-	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx.last_sector - logical_sector,
+	ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO | __GFP_ZERO);
+	ctx->first_sector = logical_sector;
+	ctx->last_sector = bio_end_sector(bi);
+	/*
+	 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
+	 * we need to flush journal device
+	 */
+	if (unlikely(bi->bi_opf & REQ_PREFLUSH))
+		ctx->do_flush = true;
+
+	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx->last_sector - logical_sector,
 					   RAID5_STRIPE_SECTORS(conf));
-	bitmap_set(ctx.sectors_to_do, 0, stripe_cnt);
+	bitmap_set(ctx->sectors_to_do, 0, stripe_cnt);
 
 	pr_debug("raid456: %s, logical %llu to %llu\n", __func__,
-		 bi->bi_iter.bi_sector, ctx.last_sector);
+		 bi->bi_iter.bi_sector, ctx->last_sector);
 
 	/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
 	if ((bi->bi_opf & REQ_NOWAIT) &&
@@ -6145,6 +6148,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		bio_wouldblock_error(bi);
 		if (rw == WRITE)
 			md_write_end(mddev);
+		mempool_free(ctx, conf->ctx_pool);
 		return true;
 	}
 	md_account_bio(mddev, &bi);
@@ -6163,10 +6167,10 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		add_wait_queue(&conf->wait_for_reshape, &wait);
 		on_wq = true;
 	}
-	s = (logical_sector - ctx.first_sector) >> RAID5_STRIPE_SHIFT(conf);
+	s = (logical_sector - ctx->first_sector) >> RAID5_STRIPE_SHIFT(conf);
 
 	while (1) {
-		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
+		res = make_stripe_request(mddev, conf, ctx, logical_sector,
 					  bi);
 		if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
 			break;
@@ -6183,9 +6187,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			 * raid5_activate_delayed() from making progress
 			 * and thus deadlocking.
 			 */
-			if (ctx.batch_last) {
-				raid5_release_stripe(ctx.batch_last);
-				ctx.batch_last = NULL;
+			if (ctx->batch_last) {
+				raid5_release_stripe(ctx->batch_last);
+				ctx->batch_last = NULL;
 			}
 
 			wait_woken(&wait, TASK_UNINTERRUPTIBLE,
@@ -6193,21 +6197,23 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			continue;
 		}
 
-		s = find_next_bit_wrap(ctx.sectors_to_do, stripe_cnt, s);
+		s = find_next_bit_wrap(ctx->sectors_to_do, stripe_cnt, s);
 		if (s == stripe_cnt)
 			break;
 
-		logical_sector = ctx.first_sector +
+		logical_sector = ctx->first_sector +
 			(s << RAID5_STRIPE_SHIFT(conf));
 	}
 	if (unlikely(on_wq))
 		remove_wait_queue(&conf->wait_for_reshape, &wait);
 
-	if (ctx.batch_last)
-		raid5_release_stripe(ctx.batch_last);
+	if (ctx->batch_last)
+		raid5_release_stripe(ctx->batch_last);
 
 	if (rw == WRITE)
 		md_write_end(mddev);
+
+	mempool_free(ctx, conf->ctx_pool);
 	if (res == STRIPE_WAIT_RESHAPE) {
 		md_free_cloned_bio(bi);
 		return false;
@@ -7374,6 +7380,10 @@ static void free_conf(struct r5conf *conf)
 	bioset_exit(&conf->bio_split);
 	kfree(conf->stripe_hashtbl);
 	kfree(conf->pending_data);
+
+	if (conf->ctx_pool)
+		mempool_destroy(conf->ctx_pool);
+
 	kfree(conf);
 }
 
@@ -8057,6 +8067,13 @@ static int raid5_run(struct mddev *mddev)
 			goto abort;
 	}
 
+	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
+					sizeof(struct stripe_request_ctx));
+	if (!conf->ctx_pool) {
+		ret = -ENOMEM;
+		goto abort;
+	}
+
 	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
 		goto abort;
 
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index eafc6e9ed6ee..6e3f07119fa4 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -690,6 +690,8 @@ struct r5conf {
 	struct list_head	pending_list;
 	int			pending_data_cnt;
 	struct r5pending_data	*next_pending_data;
+
+	mempool_t		*ctx_pool;
 };
 
 #if PAGE_SIZE == DEFAULT_STRIPE_SIZE
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/11] md/raid5: make sure max_sectors is not less than io_opt
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (3 preceding siblings ...)
  2025-11-24  6:31 ` [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-11-24  6:31 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

Otherwise, even if users issue IO of io_opt size, such IO will be split
by max_sectors before it is submitted to raid5. As a consequence, full
stripe IO is impossible.

BTW, dm-raid5 is not covered by this patch and still has this problem.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid5.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0080dec4a6ef..cd0eff2f69b4 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -777,14 +777,14 @@ struct stripe_request_ctx {
 	/* last sector in the request */
 	sector_t last_sector;
 
+	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
+	bool do_flush;
+
 	/*
 	 * bitmap to track stripe sectors that have been added to stripes
 	 * add one to account for unaligned requests
 	 */
-	DECLARE_BITMAP(sectors_to_do, RAID5_MAX_REQ_STRIPES + 1);
-
-	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
-	bool do_flush;
+	unsigned long sectors_to_do[];
 };
 
 /*
@@ -7739,6 +7739,24 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
 	return 0;
 }
 
+static int raid5_create_ctx_pool(struct r5conf *conf)
+{
+	struct stripe_request_ctx *ctx;
+	int size;
+
+	if (mddev_is_dm(conf->mddev))
+		size = BITS_TO_LONGS(RAID5_MAX_REQ_STRIPES);
+	else
+		size = BITS_TO_LONGS(
+			queue_max_hw_sectors(conf->mddev->gendisk->queue) >>
+			RAID5_STRIPE_SHIFT(conf));
+
+	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
+			struct_size(ctx, sectors_to_do, size));
+
+	return conf->ctx_pool ? 0 : -ENOMEM;
+}
+
 static int raid5_set_limits(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
@@ -7795,6 +7813,8 @@ static int raid5_set_limits(struct mddev *mddev)
 	 * Limit the max sectors based on this.
 	 */
 	lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
+	if ((lim.max_hw_sectors << 9) < lim.io_opt)
+		lim.max_hw_sectors = lim.io_opt >> 9;
 
 	/* No restrictions on the number of segments in the request */
 	lim.max_segments = USHRT_MAX;
@@ -8067,12 +8087,9 @@ static int raid5_run(struct mddev *mddev)
 			goto abort;
 	}
 
-	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
-					sizeof(struct stripe_request_ctx));
-	if (!conf->ctx_pool) {
-		ret = -ENOMEM;
+	ret = raid5_create_ctx_pool(conf);
+	if (ret)
 		goto abort;
-	}
 
 	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
 		goto abort;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/11] md: support to align bio to limits
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (4 preceding siblings ...)
  2025-11-24  6:31 ` [PATCH v2 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-11-27  0:51   ` kernel test robot
  2025-11-27  7:05   ` kernel test robot
  2025-11-24  6:31 ` [PATCH v2 07/11] md: add a helper md_config_align_limits() Yu Kuai
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

For personalities that report an optimal IO size, this indicates that
users can get the best IO bandwidth if they issue IO of this size.
However, there is also an implicit condition that the IO should be
aligned to the optimal IO size.

Currently, a bio will only be split by limits; if the bio offset is not
aligned to the limits, then none of the split bios will be aligned.
This patch adds a new feature to align the bio to the limits first, and
following patches will enable this for each personality where necessary.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  2 ++
 2 files changed, 48 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5833cbff4acf..db2d950a1449 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -428,6 +428,48 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
 }
 EXPORT_SYMBOL(md_handle_request);
 
+static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
+                                           struct bio *bio)
+{
+	unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
+	sector_t start = bio->bi_iter.bi_sector;
+	sector_t align_start = roundup(start, max_sectors);
+	sector_t end;
+	sector_t align_end;
+
+	/* already aligned */
+	if (align_start == start)
+		return bio;
+
+	end = start + bio_sectors(bio);
+	align_end = rounddown(end, max_sectors);
+
+	/* bio is too small to split */
+	if (align_end <= align_start)
+		return bio;
+
+	return bio_submit_split_bioset(bio, align_start - start,
+				       &mddev->gendisk->bio_split);
+}
+
+static struct bio *md_bio_align_to_limits(struct mddev *mddev, struct bio *bio)
+{
+	if (!test_bit(MD_BIO_ALIGN, &mddev->flags))
+		return bio;
+
+	/* atomic write can't split */
+	if (bio->bi_opf & REQ_ATOMIC)
+		return bio;
+
+	switch (bio_op(bio)) {
+	case REQ_OP_READ:
+	case REQ_OP_WRITE:
+		return __md_bio_align_to_limits(mddev, bio);
+	default:
+		return bio;
+	}
+}
+
 static void md_submit_bio(struct bio *bio)
 {
 	const int rw = bio_data_dir(bio);
@@ -443,6 +485,10 @@ static void md_submit_bio(struct bio *bio)
 		return;
 	}
 
+	bio = md_bio_align_to_limits(mddev, bio);
+	if (!bio)
+		return;
+
 	bio = bio_split_to_limits(bio);
 	if (!bio)
 		return;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8c5dec12b62..e7aba83b708b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -347,6 +347,7 @@ struct md_cluster_operations;
  * @MD_HAS_SUPERBLOCK: There is a persistent sb in member disks.
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  * @MD_SERIALIZE_POLICY: Enforce that write IO is not reordered; only used by raid1.
+ * @MD_BIO_ALIGN: Bios issued to the array are aligned to io_opt before being split.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -366,6 +367,7 @@ enum mddev_flags {
 	MD_HAS_SUPERBLOCK,
 	MD_FAILLAST_DEV,
 	MD_SERIALIZE_POLICY,
+	MD_BIO_ALIGN,
 };
 
 enum mddev_sb_flags {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/11] md: add a helper md_config_align_limits()
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (5 preceding siblings ...)
  2025-11-24  6:31 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
@ 2025-11-24  6:31 ` Yu Kuai
  2025-11-24  6:32 ` [PATCH v2 08/11] md/raid5: align bio to io_opt Yu Kuai
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:31 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

This helper will be used by personalities that want to align bios to
io_opt to get the best IO bandwidth.

Also add the new flag to UNSUPPORTED_MDDEV_FLAGS for now; following
patches will enable this for the personalities.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.h    | 11 +++++++++++
 drivers/md/raid0.c |  3 ++-
 drivers/md/raid1.c |  3 ++-
 drivers/md/raid5.c |  3 ++-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index e7aba83b708b..ddf989f2a139 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -1091,6 +1091,17 @@ static inline bool rdev_blocked(struct md_rdev *rdev)
 	return false;
 }
 
+static inline void md_config_align_limits(struct mddev *mddev,
+					  struct queue_limits *lim)
+{
+	if ((lim->max_hw_sectors << 9) < lim->io_opt)
+		lim->max_hw_sectors = lim->io_opt >> 9;
+	else
+		lim->max_hw_sectors = rounddown(lim->max_hw_sectors,
+						lim->io_opt >> 9);
+	set_bit(MD_BIO_ALIGN, &mddev->flags);
+}
+
 #define mddev_add_trace_msg(mddev, fmt, args...)			\
 do {									\
 	if (!mddev_is_dm(mddev))					\
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index bf1f3ab59c83..01cce0c3eab7 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -29,7 +29,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
 	 (1L << MD_FAILLAST_DEV) |	\
-	 (1L << MD_SERIALIZE_POLICY))
+	 (1L << MD_SERIALIZE_POLICY) |	\
+	 (1L << MD_BIO_ALIGN))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f4c7004888af..1a957dba2640 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -42,7 +42,8 @@
 	((1L << MD_HAS_JOURNAL) |	\
 	 (1L << MD_JOURNAL_CLEAN) |	\
 	 (1L << MD_HAS_PPL) |		\
-	 (1L << MD_HAS_MULTIPLE_PPLS))
+	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
+	 (1L << MD_BIO_ALIGN))
 
 static void allow_barrier(struct r1conf *conf, sector_t sector_nr);
 static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cd0eff2f69b4..0b607aa5963e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -59,7 +59,8 @@
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
 	 (1L << MD_FAILLAST_DEV) |		\
-	 (1L << MD_SERIALIZE_POLICY))
+	 (1L << MD_SERIALIZE_POLICY) |		\
+	 (1L << MD_BIO_ALIGN))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/11] md/raid5: align bio to io_opt
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (6 preceding siblings ...)
  2025-11-24  6:31 ` [PATCH v2 07/11] md: add a helper md_config_align_limits() Yu Kuai
@ 2025-11-24  6:32 ` Yu Kuai
  2025-11-24  6:32 ` [PATCH v2 09/11] md/raid10: " Yu Kuai
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:32 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

The raid5 internal implementation means that if a write bio is aligned
to io_opt, a full-stripe write will be used, which is best for
bandwidth because there is no need to read extra data to build the new
xor data.

Simple test in my VM, 32 disks raid5 with 64kb chunksize:
dd if=/dev/zero of=/dev/md0 bs=100M oflag=direct

Before this patch:  782 MB/s
With this patch:    1.1 GB/s

BTW, there are still other bottlenecks related to the stripe handler
that require further optimization.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0b607aa5963e..bbcb1c06951c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -59,8 +59,7 @@
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
 	 (1L << MD_FAILLAST_DEV) |		\
-	 (1L << MD_SERIALIZE_POLICY) |		\
-	 (1L << MD_BIO_ALIGN))
+	 (1L << MD_SERIALIZE_POLICY))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
@@ -7814,8 +7813,7 @@ static int raid5_set_limits(struct mddev *mddev)
 	 * Limit the max sectors based on this.
 	 */
 	lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
-	if ((lim.max_hw_sectors << 9) < lim.io_opt)
-		lim.max_hw_sectors = lim.io_opt >> 9;
+	md_config_align_limits(mddev, &lim);
 
 	/* No restrictions on the number of segments in the request */
 	lim.max_segments = USHRT_MAX;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/11] md/raid10: align bio to io_opt
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (7 preceding siblings ...)
  2025-11-24  6:32 ` [PATCH v2 08/11] md/raid5: align bio to io_opt Yu Kuai
@ 2025-11-24  6:32 ` Yu Kuai
  2025-11-24  6:32 ` [PATCH v2 10/11] md/raid0: " Yu Kuai
  2025-11-24  6:32 ` [PATCH v2 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:32 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

The impact is not as significant for raid10 as it is for raid5;
however, it's still better to issue IOs evenly to the underlying
disks.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid10.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 09328e032f14..2c6b65b83724 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4008,6 +4008,8 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
+
+	md_config_align_limits(mddev, &lim);
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/11] md/raid0: align bio to io_opt
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (8 preceding siblings ...)
  2025-11-24  6:32 ` [PATCH v2 09/11] md/raid10: " Yu Kuai
@ 2025-11-24  6:32 ` Yu Kuai
  2025-11-24  6:32 ` [PATCH v2 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:32 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

The impact is not as significant for raid0 as it is for raid5;
however, it's still better to issue IOs evenly to the underlying
disks.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid0.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 01cce0c3eab7..c94c6f78767f 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -29,8 +29,7 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
 	 (1L << MD_FAILLAST_DEV) |	\
-	 (1L << MD_SERIALIZE_POLICY) |	\
-	 (1L << MD_BIO_ALIGN))
+	 (1L << MD_SERIALIZE_POLICY))
 
 /*
  * inform the user of the raid configuration
@@ -391,6 +390,8 @@ static int raid0_set_limits(struct mddev *mddev)
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
+
+	md_config_align_limits(mddev, &lim);
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/11] md: fix abnormal io_opt from member disks
  2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
                   ` (9 preceding siblings ...)
  2025-11-24  6:32 ` [PATCH v2 10/11] md/raid0: " Yu Kuai
@ 2025-11-24  6:32 ` Yu Kuai
  10 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-24  6:32 UTC (permalink / raw)
  To: song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

It's reported that mpt3sas can report an abnormal io_opt; as a
consequence, the md array ends up with an abnormal io_opt as well, due
to the lcm_not_zero() in blk_stack_limits().

Some personalities configure an optimal IO size, indicating that users
can get the best IO bandwidth if they issue IO of this size, and we
don't want that io_opt to be overridden by member disks with an
abnormal io_opt.

Fix this problem by adding a new mddev flag, MD_STACK_IO_OPT, to
indicate whether the io_opt from member disks is preferred over the
io_opt configured by the personality.

Reported-by: Filippo Giunchedi <filippo@debian.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006
Reported-by: Coly Li <colyli@fnnas.com>
Closes: https://lore.kernel.org/all/20250817152645.7115-1-colyli@kernel.org/
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c     | 35 ++++++++++++++++++++++++++++++++++-
 drivers/md/md.h     |  5 ++++-
 drivers/md/raid1.c  |  2 +-
 drivers/md/raid10.c |  4 ++--
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index db2d950a1449..7714f367765f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6191,11 +6191,17 @@ static const struct kobj_type md_ktype = {
 
 int mdp_major = 0;
 
+static bool rdev_is_mddev(struct md_rdev *rdev)
+{
+	return rdev->bdev->bd_disk->fops == &md_fops;
+}
+
 /* stack the limit for all rdevs into lim */
 int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		unsigned int flags)
 {
 	struct md_rdev *rdev;
+	unsigned int io_opt = lim->io_opt;
 
 	rdev_for_each(rdev, mddev) {
 		queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
@@ -6203,6 +6209,9 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		if ((flags & MDDEV_STACK_INTEGRITY) &&
 		    !queue_limits_stack_integrity_bdev(lim, rdev->bdev))
 			return -EINVAL;
+
+		if (rdev_is_mddev(rdev))
+			set_bit(MD_STACK_IO_OPT, &mddev->flags);
 	}
 
 	/*
@@ -6216,14 +6225,24 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 	}
 	mddev->logical_block_size = lim->logical_block_size;
 
+	/*
+	 * If no member disks are mdraid arrays, and the personality
+	 * already configures io_opt, keep this io_opt and ignore io_opt
+	 * from member disks.
+	 */
+	if (!test_bit(MD_STACK_IO_OPT, &mddev->flags) && io_opt)
+		lim->io_opt = io_opt;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
 
 /* apply the extra stacking limits from a new rdev into mddev */
-int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
+			 bool io_opt_configured)
 {
 	struct queue_limits lim;
+	unsigned int io_opt = 0;
 
 	if (mddev_is_dm(mddev))
 		return 0;
@@ -6236,6 +6255,18 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 	}
 
 	lim = queue_limits_start_update(mddev->gendisk->queue);
+
+	/*
+	 * Keep the old io_opt if no member disks are md arrays, and
+	 * the personality configures its own io_opt.
+	 */
+	if (!test_bit(MD_STACK_IO_OPT, &mddev->flags)) {
+		if (rdev_is_mddev(rdev))
+			set_bit(MD_STACK_IO_OPT, &mddev->flags);
+		else if (io_opt_configured)
+			io_opt = lim.io_opt;
+	}
+
 	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
 				mddev->gendisk->disk_name);
 
@@ -6246,6 +6277,8 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 		return -ENXIO;
 	}
 
+	if (io_opt)
+		lim.io_opt = io_opt;
 	return queue_limits_commit_update(mddev->gendisk->queue, &lim);
 }
 EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ddf989f2a139..d37076593403 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -348,6 +348,7 @@ struct md_cluster_operations;
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
  * @MD_BIO_ALIGN: Bio issued to the array will align to io_opt before split.
+ * @MD_STACK_IO_OPT: Stack io_opt by member disks.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -368,6 +369,7 @@ enum mddev_flags {
 	MD_FAILLAST_DEV,
 	MD_SERIALIZE_POLICY,
 	MD_BIO_ALIGN,
+	MD_STACK_IO_OPT,
 };
 
 enum mddev_sb_flags {
@@ -1041,7 +1043,8 @@ int do_md_run(struct mddev *mddev);
 #define MDDEV_STACK_INTEGRITY	(1u << 0)
 int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		unsigned int flags);
-int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
+			 bool io_opt_configured);
 void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
 
 extern const struct block_device_operations md_fops;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1a957dba2640..f3f3086f27fa 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1944,7 +1944,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 	for (mirror = first; mirror <= last; mirror++) {
 		p = conf->mirrors + mirror;
 		if (!p->rdev) {
-			err = mddev_stack_new_rdev(mddev, rdev);
+			err = mddev_stack_new_rdev(mddev, rdev, false);
 			if (err)
 				return err;
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2c6b65b83724..a6edc91e7a9a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2139,7 +2139,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			continue;
 		}
 
-		err = mddev_stack_new_rdev(mddev, rdev);
+		err = mddev_stack_new_rdev(mddev, rdev, true);
 		if (err)
 			return err;
 		p->head_position = 0;
@@ -2157,7 +2157,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		clear_bit(In_sync, &rdev->flags);
 		set_bit(Replacement, &rdev->flags);
 		rdev->raid_disk = repl_slot;
-		err = mddev_stack_new_rdev(mddev, rdev);
+		err = mddev_stack_new_rdev(mddev, rdev, true);
 		if (err)
 			return err;
 		conf->fullsync = 1;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/11] md: support to align bio to limits
  2025-11-24  6:31 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
@ 2025-11-27  0:51   ` kernel test robot
  2025-11-30  2:38     ` Yu Kuai
  2025-11-27  7:05   ` kernel test robot
  1 sibling, 1 reply; 25+ messages in thread
From: kernel test robot @ 2025-11-27  0:51 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid
  Cc: oe-kbuild-all, linux-kernel, filippo, colyli, yukuai

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20251121]
[also build test ERROR on v6.18-rc7]
[cannot apply to linus/master song-md/md-next v6.18-rc7 v6.18-rc6 v6.18-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20251124-143826
base:   next-20251121
patch link:    https://lore.kernel.org/r/20251124063203.1692144-7-yukuai%40fnnas.com
patch subject: [PATCH v2 06/11] md: support to align bio to limits
config: sparc-randconfig-002-20251127 (https://download.01.org/0day-ci/archive/20251127/202511270809.hl08JR8y-lkp@intel.com/config)
compiler: sparc-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511270809.hl08JR8y-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511270809.hl08JR8y-lkp@intel.com/

All errors (new ones prefixed by >>):

   sparc-linux-ld: drivers/md/md.o: in function `md_submit_bio':
>> md.c:(.text+0x85d4): undefined reference to `__umoddi3'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/11] md: support to align bio to limits
  2025-11-24  6:31 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
  2025-11-27  0:51   ` kernel test robot
@ 2025-11-27  7:05   ` kernel test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2025-11-27  7:05 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid
  Cc: oe-kbuild-all, linux-kernel, filippo, colyli, yukuai

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20251121]
[also build test ERROR on v6.18-rc7]
[cannot apply to linus/master song-md/md-next v6.18-rc7 v6.18-rc6 v6.18-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20251124-143826
base:   next-20251121
patch link:    https://lore.kernel.org/r/20251124063203.1692144-7-yukuai%40fnnas.com
patch subject: [PATCH v2 06/11] md: support to align bio to limits
config: xtensa-randconfig-001-20251127 (https://download.01.org/0day-ci/archive/20251127/202511271423.CJ3C240z-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511271423.CJ3C240z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511271423.CJ3C240z-lkp@intel.com/

All errors (new ones prefixed by >>):

   cfi_cmdset_0002.c:(.xiptext+0xfe5): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0xff8): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_lock
   cfi_cmdset_0002.c:(.xiptext+0x1006): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text+0x17b0)
   cfi_cmdset_0002.c:(.xiptext+0x1012): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x101c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1024): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_unlock
   cfi_cmdset_0002.c:(.xiptext+0x1030): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x103a): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1049): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1053): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1060): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x107c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1094): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x10ac): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x10c4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x10dc): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x10f2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x113f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x114b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x1154): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x116e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: add_wait_queue
   cfi_cmdset_0002.c:(.xiptext+0x1176): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_unlock
   cfi_cmdset_0002.c:(.xiptext+0x117c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: schedule
   cfi_cmdset_0002.c:(.xiptext+0x1187): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: remove_wait_queue
   cfi_cmdset_0002.c:(.xiptext+0x118f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_lock
   cfi_cmdset_0002.c:(.xiptext+0x1198): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x11aa): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp1
   cfi_cmdset_0002.c:(.xiptext+0x11b2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x11d0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x11e8): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x11f0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x11fc): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text+0x940)
   cfi_cmdset_0002.c:(.xiptext+0x120a): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x1218): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x122e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x1237): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1243): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: _printk
   cfi_cmdset_0002.c:(.xiptext+0x124c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1264): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1273): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_cmdset_0002.c:(.xiptext+0x128e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x12a0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x12b8): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x12ca): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x12d4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text+0x167c)
   cfi_cmdset_0002.c:(.xiptext+0x12dc): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_unlock
   cfi_cmdset_0002.c:(.xiptext+0x12e7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x12fa): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __stack_chk_fail
   cfi_cmdset_0002.c:(.xiptext+0x131b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1334): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_lock
   cfi_cmdset_0002.c:(.xiptext+0x1342): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text+0x17b0)
   cfi_cmdset_0002.c:(.xiptext+0x134e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x1356): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x135e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_unlock
   cfi_cmdset_0002.c:(.xiptext+0x1368): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1376): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x137f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x138b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1396): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x139f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x13b4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x13be): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x13c7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/chips/cfi_cmdset_0002.o: in function `do_erase_oneblock':
   cfi_cmdset_0002.c:(.xiptext+0x13f0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x13f6): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text.unlikely+0x288)
   cfi_cmdset_0002.c:(.xiptext+0x13fc): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x140b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x141c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1427): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1434): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x145c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x1472): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_send_gen_cmd
   cfi_cmdset_0002.c:(.xiptext+0x1480): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_cmdset_0002.c:(.xiptext+0x14a3): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_cmdset_0002.c:(.xiptext+0x14bf): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x14cd): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x14d7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x14e0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x14ec): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x14f6): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x14ff): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1514): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x151f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x152b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x154f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1573): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x158c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1598): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x15a7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_cmdset_0002.c:(.xiptext+0x15d8): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_cmdset_0002.c:(.xiptext+0x15e0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x15f0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1608): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x1615): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_cmdset_0002.c:(.xiptext+0x161f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: (.text+0x167c)
   cfi_cmdset_0002.c:(.xiptext+0x1627): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: mutex_unlock
   cfi_cmdset_0002.c:(.xiptext+0x162d): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   xtensa-linux-ld: drivers/md/md.o: in function `recovery_start_store':
   md.c:(.text+0xa144): undefined reference to `__udivdi3'
>> xtensa-linux-ld: md.c:(.text+0xa148): undefined reference to `__umoddi3'
   xtensa-linux-ld: drivers/md/md.o: in function `md_write_end':
   md.c:(.text+0xa21b): undefined reference to `__udivdi3'
   xtensa-linux-ld: md.c:(.text+0xa284): undefined reference to `__umoddi3'
   drivers/mtd/chips/cfi_util.o:(.xiptext+0x14): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/chips/cfi_util.o:(.xiptext+0x28): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   drivers/mtd/chips/cfi_util.o: in function `cfi_qry_present':
   cfi_util.c:(.xiptext+0x38): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x46): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x7f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0x8d): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x9b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0xa4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0xb2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0xc2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0xcf): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __stack_chk_fail
   cfi_util.c:(.xiptext+0xe7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0xf4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x10c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x123): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x12c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/chips/cfi_util.o: in function `cfi_qry_mode_off':
   cfi_util.c:(.xiptext+0x13b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x144): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x152): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x168): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x177): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x180): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x193): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x1a0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x1cb): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x1ec): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x1f9): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x207): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x21f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x246): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   drivers/mtd/chips/cfi_util.o: in function `cfi_qry_mode_on':
   cfi_util.c:(.xiptext+0x26b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x274): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x282): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x2ae): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x2d3): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x2dc): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x2eb): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x317): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x343): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0x357): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x360): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x36c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x398): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x3bb): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x3c4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x3d2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x3fe): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x428): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0x43b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x444): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x450): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x47a): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: cfi_build_cmd
   cfi_util.c:(.xiptext+0x4a2): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x4ab): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x4c7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x4e0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp2
   cfi_util.c:(.xiptext+0x4ed): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x4fb): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: _printk
   cfi_util.c:(.xiptext+0x506): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __kmalloc_noprof
   cfi_util.c:(.xiptext+0x51a): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x53c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x556): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x568): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x577): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x582): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x58e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x59c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x5a6): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x5b4): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x5c9): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x5d3): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x5e0): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x5ea): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   cfi_util.c:(.xiptext+0x5f3): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/chips/cfi_util.o: in function `cfi_read_pri':
   cfi_util.c:(.xiptext+0x60f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x618): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x632): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x63e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_cmp4
   cfi_util.c:(.xiptext+0x64c): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   cfi_util.c:(.xiptext+0x66b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/maps/map_funcs.o:(.xiptext+0x7): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/maps/map_funcs.o: in function `simple_map_copy_to':
   map_funcs.c:(.xiptext+0x16): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: memcpy_toio
   map_funcs.c:(.xiptext+0x2f): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   map_funcs.c:(.xiptext+0x3b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   drivers/mtd/maps/map_funcs.o: in function `simple_map_copy_from':
   map_funcs.c:(.xiptext+0x47): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: memcpy
   map_funcs.c:(.xiptext+0x50): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   map_funcs.c:(.xiptext+0x5e): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: memcpy_fromio
   map_funcs.c:(.xiptext+0x64): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   map_funcs.c:(.xiptext+0x8b): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc
   map_funcs.c:(.xiptext+0x9a): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_const_cmp4
   map_funcs.c:(.xiptext+0xa8): dangerous relocation: windowed longcall crosses 1GB boundary; return may fail: __sanitizer_cov_trace_pc

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/11] md: support to align bio to limits
  2025-11-27  0:51   ` kernel test robot
@ 2025-11-30  2:38     ` Yu Kuai
  0 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-11-30  2:38 UTC (permalink / raw)
  To: kernel test robot, song, linux-raid, Yu Kuai
  Cc: oe-kbuild-all, linux-kernel, filippo, colyli

Hi,

On 2025/11/27 8:51, kernel test robot wrote:
> Hi Yu,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on next-20251121]
> [also build test ERROR on v6.18-rc7]
> [cannot apply to linus/master song-md/md-next v6.18-rc7 v6.18-rc6 v6.18-rc5]
> [If your patch is applied to the wrong git tree, kindly drop us a note.

This patch set applies cleanly to the mdraid/md-6.19 branch.

Thanks,
Kuai

> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20251124-143826
> base:   next-20251121
> patch link:    https://lore.kernel.org/r/20251124063203.1692144-7-yukuai%40fnnas.com
> patch subject: [PATCH v2 06/11] md: support to align bio to limits
> config: sparc-randconfig-002-20251127 (https://download.01.org/0day-ci/archive/20251127/202511270809.hl08JR8y-lkp@intel.com/config)
> compiler: sparc-linux-gcc (GCC) 11.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511270809.hl08JR8y-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202511270809.hl08JR8y-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>     sparc-linux-ld: drivers/md/md.o: in function `md_submit_bio':
>>> md.c:(.text+0x85d4): undefined reference to `__umoddi3'

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags
  2025-11-24  6:31 ` [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
@ 2025-12-26  3:04   ` Li Nan
  0 siblings, 0 replies; 25+ messages in thread
From: Li Nan @ 2025-12-26  3:04 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid; +Cc: linux-kernel, filippo, colyli



On 2025/11/24 14:31, Yu Kuai wrote:
> There is no need to use a separate field in struct mddev; there are no
> functional changes.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.c | 6 +++---
>   drivers/md/md.h | 3 ++-
>   2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 7b5c5967568f..b49fdee11a03 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6462,7 +6462,7 @@ int md_run(struct mddev *mddev)
>   	 * the only valid external interface is through the md
>   	 * device.
>   	 */
> -	mddev->has_superblocks = false;
> +	clear_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
>   	rdev_for_each(rdev, mddev) {
>   		if (test_bit(Faulty, &rdev->flags))
>   			continue;
> @@ -6475,7 +6475,7 @@ int md_run(struct mddev *mddev)
>   		}
>   
>   		if (rdev->sb_page)
> -			mddev->has_superblocks = true;
> +			set_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
>   
>   		/* perform some consistency tests on the device.
>   		 * We don't want the data to overlap the metadata,
> @@ -9085,7 +9085,7 @@ void md_write_start(struct mddev *mddev, struct bio *bi)
>   	rcu_read_unlock();
>   	if (did_change)
>   		sysfs_notify_dirent_safe(mddev->sysfs_state);
> -	if (!mddev->has_superblocks)
> +	if (!test_bit(MD_HAS_SUPERBLOCK, &mddev->flags))
>   		return;
>   	wait_event(mddev->sb_wait,
>   		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 6985f2829bbd..b4c9aa600edd 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -340,6 +340,7 @@ struct md_cluster_operations;
>    *		   array is ready yet.
>    * @MD_BROKEN: This is used to stop writes and mark array as failed.
>    * @MD_DELETED: This device is being deleted
> + * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
>    *
>    * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
>    */
> @@ -356,6 +357,7 @@ enum mddev_flags {
>   	MD_BROKEN,
>   	MD_DO_DELETE,
>   	MD_DELETED,
> +	MD_HAS_SUPERBLOCK,
>   };
>   
>   enum mddev_sb_flags {
> @@ -623,7 +625,6 @@ struct mddev {
>   	/* The sequence number for sync thread */
>   	atomic_t sync_seq;
>   
> -	bool	has_superblocks:1;
>   	bool	fail_last_dev:1;
>   	bool	serialize_policy:1;
>   };

LGTM

Reviewed-by: Li Nan <linan122@huawei.com>


-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/11] md: merge mddev faillast_dev into mddev_flags
  2025-11-24  6:31 ` [PATCH v2 02/11] md: merge mddev faillast_dev " Yu Kuai
@ 2025-12-26  3:46   ` Li Nan
  0 siblings, 0 replies; 25+ messages in thread
From: Li Nan @ 2025-12-26  3:46 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid; +Cc: linux-kernel, filippo, colyli



On 2025/11/24 14:31, Yu Kuai wrote:
> There is no need to use a separate field in struct mddev; there are no
> functional changes.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.c     | 10 ++++++----
>   drivers/md/md.h     |  3 ++-
>   drivers/md/raid0.c  |  3 ++-
>   drivers/md/raid1.c  |  4 ++--
>   drivers/md/raid10.c |  4 ++--
>   drivers/md/raid5.c  |  5 ++++-
>   6 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b49fdee11a03..5dcfd0371090 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5864,11 +5864,11 @@ __ATTR(consistency_policy, S_IRUGO | S_IWUSR, consistency_policy_show,
>   
>   static ssize_t fail_last_dev_show(struct mddev *mddev, char *page)
>   {
> -	return sprintf(page, "%d\n", mddev->fail_last_dev);
> +	return sprintf(page, "%d\n", test_bit(MD_FAILLAST_DEV, &mddev->flags));
>   }
>   
>   /*
> - * Setting fail_last_dev to true to allow last device to be forcibly removed
> + * Setting MD_FAILLAST_DEV to allow last device to be forcibly removed
>    * from RAID1/RAID10.
>    */
>   static ssize_t
> @@ -5881,8 +5881,10 @@ fail_last_dev_store(struct mddev *mddev, const char *buf, size_t len)
>   	if (ret)
>   		return ret;
>   
> -	if (value != mddev->fail_last_dev)
> -		mddev->fail_last_dev = value;
> +	if (value)
> +		set_bit(MD_FAILLAST_DEV, &mddev->flags);
> +	else
> +		clear_bit(MD_FAILLAST_DEV, &mddev->flags);
>   
>   	return len;
>   }
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b4c9aa600edd..297a104fba88 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -341,6 +341,7 @@ struct md_cluster_operations;
>    * @MD_BROKEN: This is used to stop writes and mark array as failed.
>    * @MD_DELETED: This device is being deleted
>    * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
> + * @MD_FAILLAST_DEV: Allow last rdev to be removed.
>    *
>    * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
>    */
> @@ -358,6 +359,7 @@ enum mddev_flags {
>   	MD_DO_DELETE,
>   	MD_DELETED,
>   	MD_HAS_SUPERBLOCK,
> +	MD_FAILLAST_DEV,
>   };
>   
>   enum mddev_sb_flags {
> @@ -625,7 +627,6 @@ struct mddev {
>   	/* The sequence number for sync thread */
>   	atomic_t sync_seq;
>   
> -	bool	fail_last_dev:1;
>   	bool	serialize_policy:1;
>   };
>   
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 47aee1b1d4d1..012d8402af28 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -27,7 +27,8 @@ module_param(default_layout, int, 0644);
>   	 (1L << MD_JOURNAL_CLEAN) |	\
>   	 (1L << MD_FAILFAST_SUPPORTED) |\
>   	 (1L << MD_HAS_PPL) |		\
> -	 (1L << MD_HAS_MULTIPLE_PPLS))
> +	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
> +	 (1L << MD_FAILLAST_DEV))
>   
>   /*
>    * inform the user of the raid configuration
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 57d50465eed1..98b5c93810bb 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1746,7 +1746,7 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
>    *	- &mddev->degraded is bumped.
>    *
>    * @rdev is marked as &Faulty excluding case when array is failed and
> - * &mddev->fail_last_dev is off.
> + * MD_FAILLAST_DEV is not set.
>    */
>   static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
>   {
> @@ -1759,7 +1759,7 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
>   	    (conf->raid_disks - mddev->degraded) == 1) {
>   		set_bit(MD_BROKEN, &mddev->flags);
>   
> -		if (!mddev->fail_last_dev) {
> +		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
>   			conf->recovery_disabled = mddev->recovery_disabled;
>   			spin_unlock_irqrestore(&conf->device_lock, flags);
>   			return;
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 84be4cc7e873..09328e032f14 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1990,7 +1990,7 @@ static int enough(struct r10conf *conf, int ignore)
>    *	- &mddev->degraded is bumped.
>    *
>    * @rdev is marked as &Faulty excluding case when array is failed and
> - * &mddev->fail_last_dev is off.
> + * MD_FAILLAST_DEV is not set.
>    */
>   static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
>   {
> @@ -2002,7 +2002,7 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
>   	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
>   		set_bit(MD_BROKEN, &mddev->flags);
>   
> -		if (!mddev->fail_last_dev) {
> +		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
>   			spin_unlock_irqrestore(&conf->device_lock, flags);
>   			return;
>   		}
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index cdbc7eba5c54..74f6729864fa 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -56,7 +56,10 @@
>   #include "md-bitmap.h"
>   #include "raid5-log.h"
>   
> -#define UNSUPPORTED_MDDEV_FLAGS	(1L << MD_FAILFAST_SUPPORTED)
> +#define UNSUPPORTED_MDDEV_FLAGS		\
> +	((1L << MD_FAILFAST_SUPPORTED) |	\
> +	 (1L << MD_FAILLAST_DEV))
> +
>   
>   #define cpu_to_group(cpu) cpu_to_node(cpu)
>   #define ANY_GROUP NUMA_NO_NODE
LGTM

Reviewed-by: Li Nan <linan122@huawei.com>

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/11] md: merge mddev serialize_policy into mddev_flags
  2025-11-24  6:31 ` [PATCH v2 03/11] md: merge mddev serialize_policy " Yu Kuai
@ 2025-12-26  6:33   ` Li Nan
  0 siblings, 0 replies; 25+ messages in thread
From: Li Nan @ 2025-12-26  6:33 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid; +Cc: linux-kernel, filippo, colyli



On 2025/11/24 14:31, Yu Kuai wrote:
> There is no need to use a separate field in struct mddev; there are no
> functional changes.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md-bitmap.c |  4 ++--
>   drivers/md/md.c        | 20 ++++++++++++--------
>   drivers/md/md.h        |  4 ++--
>   drivers/md/raid0.c     |  3 ++-
>   drivers/md/raid1.c     |  4 ++--
>   drivers/md/raid5.c     |  3 ++-
>   6 files changed, 22 insertions(+), 16 deletions(-)
> 


LGTM

Reviewed-by: Li Nan <linan122@huawei.com>

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2025-11-24  6:31 ` [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
@ 2025-12-26  8:33   ` Li Nan
  2026-01-03 11:12     ` Yu Kuai
  2025-12-30  9:38   ` Li Nan
  1 sibling, 1 reply; 25+ messages in thread
From: Li Nan @ 2025-12-26  8:33 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid; +Cc: linux-kernel, filippo, colyli



On 2025/11/24 14:31, Yu Kuai wrote:
> On the one hand, stripe_request_ctx is 72 bytes, and it's a bit huge for
> a stack variable.
> 
> On the other hand, the bitmap sectors_to_do has a fixed size, so the
> max_hw_sectors_kb of a raid5 array is at most 256 * 4k = 1MB, which makes
> full-stripe IO impossible for arrays where chunk_size * data_disks is
> bigger. Allocating ctx at runtime makes it possible to get rid of this
> limit.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.h       |  4 +++
>   drivers/md/raid1-10.c |  5 ----
>   drivers/md/raid5.c    | 61 +++++++++++++++++++++++++++----------------
>   drivers/md/raid5.h    |  2 ++
>   4 files changed, 45 insertions(+), 27 deletions(-)
> 

[...]

> @@ -7374,6 +7380,10 @@ static void free_conf(struct r5conf *conf)
>   	bioset_exit(&conf->bio_split);
>   	kfree(conf->stripe_hashtbl);
>   	kfree(conf->pending_data);
> +
> +	if (conf->ctx_pool)
> +		mempool_destroy(conf->ctx_pool);
> +
>   	kfree(conf);
>   }
>   
> @@ -8057,6 +8067,13 @@ static int raid5_run(struct mddev *mddev)
>   			goto abort;
>   	}
>   
> +	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
> +					sizeof(struct stripe_request_ctx));
> +	if (!conf->ctx_pool) {
> +		ret = -ENOMEM;
> +		goto abort;
> +	}
> +

What about moving the mempool creation to setup_conf()? If so,
mempool_destroy() could be called in free_conf() without the NULL check.

>   	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
>   		goto abort;
>   
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index eafc6e9ed6ee..6e3f07119fa4 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -690,6 +690,8 @@ struct r5conf {
>   	struct list_head	pending_list;
>   	int			pending_data_cnt;
>   	struct r5pending_data	*next_pending_data;
> +
> +	mempool_t		*ctx_pool;
>   };
>   
>   #if PAGE_SIZE == DEFAULT_STRIPE_SIZE

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2025-11-24  6:31 ` [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
  2025-12-26  8:33   ` Li Nan
@ 2025-12-30  9:38   ` Li Nan
  2026-01-03 11:13     ` Yu Kuai
  1 sibling, 1 reply; 25+ messages in thread
From: Li Nan @ 2025-12-30  9:38 UTC (permalink / raw)
  To: Yu Kuai, song, linux-raid; +Cc: linux-kernel, filippo, colyli



On 2025/11/24 14:31, Yu Kuai wrote:
> On the one hand, stripe_request_ctx is 72 bytes, and it's a bit huge for
> a stack variable.
> 
> On the other hand, the bitmap sectors_to_do has a fixed size, so the
> max_hw_sectors_kb of a raid5 array is at most 256 * 4k = 1MB, which makes
> full-stripe IO impossible for arrays where chunk_size * data_disks is
> bigger. Allocating ctx at runtime makes it possible to get rid of this
> limit.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.h       |  4 +++
>   drivers/md/raid1-10.c |  5 ----
>   drivers/md/raid5.c    | 61 +++++++++++++++++++++++++++----------------
>   drivers/md/raid5.h    |  2 ++
>   4 files changed, 45 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 6ee18045f41c..b8c5dec12b62 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -22,6 +22,10 @@
>   #include <trace/events/block.h>
>   
>   #define MaxSector (~(sector_t)0)
> +/*
> + * Number of guaranteed raid bios in case of extreme VM load:
> + */
> +#define	NR_RAID_BIOS 256
>   
>   enum md_submodule_type {
>   	MD_PERSONALITY = 0,
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index 521625756128..c33099925f23 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -3,11 +3,6 @@
>   #define RESYNC_BLOCK_SIZE (64*1024)
>   #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>   
> -/*
> - * Number of guaranteed raid bios in case of extreme VM load:
> - */
> -#define	NR_RAID_BIOS 256
> -
>   /* when we get a read error on a read-only array, we redirect to another
>    * device without failing the first device, or trying to over-write to
>    * correct the read error.  To keep track of bad blocks on a per-bio
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index f405ba7b99a7..0080dec4a6ef 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6083,13 +6083,13 @@ static sector_t raid5_bio_lowest_chunk_sector(struct r5conf *conf,
>   static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>   {
>   	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> -	bool on_wq;
>   	struct r5conf *conf = mddev->private;
> -	sector_t logical_sector;
> -	struct stripe_request_ctx ctx = {};
>   	const int rw = bio_data_dir(bi);
> +	struct stripe_request_ctx *ctx;
> +	sector_t logical_sector;
>   	enum stripe_result res;
>   	int s, stripe_cnt;
> +	bool on_wq;
>   
>   	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
>   		int ret = log_handle_flush_request(conf, bi);
> @@ -6101,11 +6101,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>   				return true;
>   		}
>   		/* ret == -EAGAIN, fallback */
> -		/*
> -		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
> -		 * we need to flush journal device
> -		 */
> -		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
>   	}
>   
>   	md_write_start(mddev, bi);
> @@ -6128,16 +6123,24 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>   	}
>   
>   	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
> -	ctx.first_sector = logical_sector;
> -	ctx.last_sector = bio_end_sector(bi);
>   	bi->bi_next = NULL;
>   
> -	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx.last_sector - logical_sector,
> +	ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO | __GFP_ZERO);

In mempool_alloc_noprof():
	VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);

__GFP_ZERO should be removed, and the members should be initialized
explicitly before they are accessed.

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2025-12-26  8:33   ` Li Nan
@ 2026-01-03 11:12     ` Yu Kuai
  0 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2026-01-03 11:12 UTC (permalink / raw)
  To: Li Nan, song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

Hi,

On 2025/12/26 16:33, Li Nan wrote:
>
>
> On 2025/11/24 14:31, Yu Kuai wrote:
>> On the one hand, stripe_request_ctx is 72 bytes, and it's a bit huge for
>> a stack variable.
>>
>> On the other hand, the bitmap sectors_to_do has a fixed size, so the
>> max_hw_sectors_kb of a raid5 array is at most 256 * 4k = 1MB, which
>> makes full-stripe IO impossible for arrays where chunk_size *
>> data_disks is bigger. Allocating ctx at runtime makes it possible to
>> get rid of this limit.
>>
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> ---
>>   drivers/md/md.h       |  4 +++
>>   drivers/md/raid1-10.c |  5 ----
>>   drivers/md/raid5.c    | 61 +++++++++++++++++++++++++++----------------
>>   drivers/md/raid5.h    |  2 ++
>>   4 files changed, 45 insertions(+), 27 deletions(-)
>>
>
> [...]
>
>> @@ -7374,6 +7380,10 @@ static void free_conf(struct r5conf *conf)
>>       bioset_exit(&conf->bio_split);
>>       kfree(conf->stripe_hashtbl);
>>       kfree(conf->pending_data);
>> +
>> +    if (conf->ctx_pool)
>> +        mempool_destroy(conf->ctx_pool);
>> +
>>       kfree(conf);
>>   }
>>   @@ -8057,6 +8067,13 @@ static int raid5_run(struct mddev *mddev)
>>               goto abort;
>>       }
>>   +    conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
>> +                    sizeof(struct stripe_request_ctx));
>> +    if (!conf->ctx_pool) {
>> +        ret = -ENOMEM;
>> +        goto abort;
>> +    }
>> +
>
> What about moving create to setup_conf()? If so, call destroy in
> free_conf() without checks.

No, we can't; this must be done after raid5_set_limits(), which is called
at the end of raid5_run().

>
>>       if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
>>           goto abort;
>>   diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index eafc6e9ed6ee..6e3f07119fa4 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -690,6 +690,8 @@ struct r5conf {
>>       struct list_head    pending_list;
>>       int            pending_data_cnt;
>>       struct r5pending_data    *next_pending_data;
>> +
>> +    mempool_t        *ctx_pool;
>>   };
>>     #if PAGE_SIZE == DEFAULT_STRIPE_SIZE
>
-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2025-12-30  9:38   ` Li Nan
@ 2026-01-03 11:13     ` Yu Kuai
  0 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2026-01-03 11:13 UTC (permalink / raw)
  To: Li Nan, song, linux-raid; +Cc: linux-kernel, filippo, colyli, yukuai

Hi,

On 2025/12/30 17:38, Li Nan wrote:
> In mempool_alloc_noprof():
>     VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
>
> __GFP_ZERO should be removed and ensure init before accessing the members.

Looks correct, will fix it.

-- 
Thanks,
Kuai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 06/11] md: support to align bio to limits
  2026-01-03 15:45 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
@ 2026-01-03 15:45 ` Yu Kuai
  2026-01-06  3:04   ` Xiao Ni
  2026-01-06  6:42   ` Li Nan
  0 siblings, 2 replies; 25+ messages in thread
From: Yu Kuai @ 2026-01-03 15:45 UTC (permalink / raw)
  To: linux-raid; +Cc: yukuai, colyli, linan122

For personalities that report optimal IO size, it's indicate that users
can get the best IO bandwidth if they issue IO with this size. However
there is also an implicit condition that IO should also be aligned to the
optimal IO size.

Currently, bio will only be split by limits, if bio offset is not aligned
to limits, then all split bio will not be aligned. This patch add a new
feature to align bio to limits first, and following patches will support
this for each personality if necessary.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  2 ++
 2 files changed, 48 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 21b0bc3088d2..7292aedef01b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -428,6 +428,48 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
 }
 EXPORT_SYMBOL(md_handle_request);
 
+static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
+                                           struct bio *bio)
+{
+	unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
+	sector_t start = bio->bi_iter.bi_sector;
+	sector_t align_start = roundup(start, max_sectors);
+	sector_t end;
+	sector_t align_end;
+
+	/* already aligned */
+	if (align_start == start)
+		return bio;
+
+	end = start + bio_sectors(bio);
+	align_end = rounddown(end, max_sectors);
+
+	/* bio is too small to split */
+	if (align_end <= align_start)
+		return bio;
+
+	return bio_submit_split_bioset(bio, align_start - start,
+				       &mddev->gendisk->bio_split);
+}
+
+static struct bio *md_bio_align_to_limits(struct mddev *mddev, struct bio *bio)
+{
+	if (!test_bit(MD_BIO_ALIGN, &mddev->flags))
+		return bio;
+
+	/* atomic write can't split */
+	if (bio->bi_opf & REQ_ATOMIC)
+		return bio;
+
+	switch (bio_op(bio)) {
+	case REQ_OP_READ:
+	case REQ_OP_WRITE:
+		return __md_bio_align_to_limits(mddev, bio);
+	default:
+		return bio;
+	}
+}
+
 static void md_submit_bio(struct bio *bio)
 {
 	const int rw = bio_data_dir(bio);
@@ -443,6 +485,10 @@ static void md_submit_bio(struct bio *bio)
 		return;
 	}
 
+	bio = md_bio_align_to_limits(mddev, bio);
+	if (!bio)
+		return;
+
 	bio = bio_split_to_limits(bio);
 	if (!bio)
 		return;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8c5dec12b62..e7aba83b708b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -347,6 +347,7 @@ struct md_cluster_operations;
  * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
+ * @MD_BIO_ALIGN: Bio issued to the array will align to io_opt before split.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -366,6 +367,7 @@ enum mddev_flags {
 	MD_HAS_SUPERBLOCK,
 	MD_FAILLAST_DEV,
 	MD_SERIALIZE_POLICY,
+	MD_BIO_ALIGN,
 };
 
 enum mddev_sb_flags {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/11] md: support to align bio to limits
  2026-01-03 15:45 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
@ 2026-01-06  3:04   ` Xiao Ni
  2026-01-06  6:42   ` Li Nan
  1 sibling, 0 replies; 25+ messages in thread
From: Xiao Ni @ 2026-01-06  3:04 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, colyli, linan122

On Sat, Jan 3, 2026 at 11:46 PM Yu Kuai <yukuai@fnnas.com> wrote:
>
> For personalities that report optimal IO size, it's indicate that users

Hi Kuai

typo here:
s/it's indicate/it indicates/g

Regards
Xiao

> can get the best IO bandwidth if they issue IO with this size. However
> there is also an implicit condition that IO should also be aligned to the
> optimal IO size.
>
> Currently, bio will only be split by limits, if bio offset is not aligned
> to limits, then all split bio will not be aligned. This patch add a new
> feature to align bio to limits first, and following patches will support
> this for each personality if necessary.
>
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  drivers/md/md.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/md/md.h |  2 ++
>  2 files changed, 48 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 21b0bc3088d2..7292aedef01b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -428,6 +428,48 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
>  }
>  EXPORT_SYMBOL(md_handle_request);
>
> +static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
> +                                           struct bio *bio)
> +{
> +       unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
> +       sector_t start = bio->bi_iter.bi_sector;
> +       sector_t align_start = roundup(start, max_sectors);
> +       sector_t end;
> +       sector_t align_end;
> +
> +       /* already aligned */
> +       if (align_start == start)
> +               return bio;
> +
> +       end = start + bio_sectors(bio);
> +       align_end = rounddown(end, max_sectors);
> +
> +       /* bio is too small to split */
> +       if (align_end <= align_start)
> +               return bio;
> +
> +       return bio_submit_split_bioset(bio, align_start - start,
> +                                      &mddev->gendisk->bio_split);
> +}
> +
> +static struct bio *md_bio_align_to_limits(struct mddev *mddev, struct bio *bio)
> +{
> +       if (!test_bit(MD_BIO_ALIGN, &mddev->flags))
> +               return bio;
> +
> +       /* atomic write can't split */
> +       if (bio->bi_opf & REQ_ATOMIC)
> +               return bio;
> +
> +       switch (bio_op(bio)) {
> +       case REQ_OP_READ:
> +       case REQ_OP_WRITE:
> +               return __md_bio_align_to_limits(mddev, bio);
> +       default:
> +               return bio;
> +       }
> +}
> +
>  static void md_submit_bio(struct bio *bio)
>  {
>         const int rw = bio_data_dir(bio);
> @@ -443,6 +485,10 @@ static void md_submit_bio(struct bio *bio)
>                 return;
>         }
>
> +       bio = md_bio_align_to_limits(mddev, bio);
> +       if (!bio)
> +               return;
> +
>         bio = bio_split_to_limits(bio);
>         if (!bio)
>                 return;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b8c5dec12b62..e7aba83b708b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -347,6 +347,7 @@ struct md_cluster_operations;
>   * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
>   * @MD_FAILLAST_DEV: Allow last rdev to be removed.
>   * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
> + * @MD_BIO_ALIGN: Bio issued to the array will align to io_opt before split.
>   *
>   * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
>   */
> @@ -366,6 +367,7 @@ enum mddev_flags {
>         MD_HAS_SUPERBLOCK,
>         MD_FAILLAST_DEV,
>         MD_SERIALIZE_POLICY,
> +       MD_BIO_ALIGN,
>  };
>
>  enum mddev_sb_flags {
> --
> 2.51.0
>
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/11] md: support to align bio to limits
  2026-01-03 15:45 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
  2026-01-06  3:04   ` Xiao Ni
@ 2026-01-06  6:42   ` Li Nan
  1 sibling, 0 replies; 25+ messages in thread
From: Li Nan @ 2026-01-06  6:42 UTC (permalink / raw)
  To: Yu Kuai, linux-raid; +Cc: colyli



On 2026/1/3 23:45, Yu Kuai wrote:
> For personalities that report optimal IO size, it's indicate that users
> can get the best IO bandwidth if they issue IO with this size. However
> there is also an implicit condition that IO should also be aligned to the
> optimal IO size.
> 
> Currently, bio will only be split by limits, if bio offset is not aligned
> to limits, then all split bio will not be aligned. This patch add a new
> feature to align bio to limits first, and following patches will support
> this for each personality if necessary.
> 
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/md/md.h |  2 ++
>   2 files changed, 48 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 21b0bc3088d2..7292aedef01b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -428,6 +428,48 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
>   }
>   EXPORT_SYMBOL(md_handle_request);
>   
> +static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
> +                                           struct bio *bio)
> +{
> +	unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
> +	sector_t start = bio->bi_iter.bi_sector;
> +	sector_t align_start = roundup(start, max_sectors);
> +	sector_t end;
> +	sector_t align_end;
> +
> +	/* already aligned */
> +	if (align_start == start)
> +		return bio;
> +
> +	end = start + bio_sectors(bio);
> +	align_end = rounddown(end, max_sectors);
> +
> +	/* bio is too small to split */
> +	if (align_end <= align_start)
> +		return bio;
> +
> +	return bio_submit_split_bioset(bio, align_start - start,
> +				       &mddev->gendisk->bio_split);
> +}
> +
> +static struct bio *md_bio_align_to_limits(struct mddev *mddev, struct bio *bio)
> +{
> +	if (!test_bit(MD_BIO_ALIGN, &mddev->flags))
> +		return bio;
> +
> +	/* atomic write can't split */
> +	if (bio->bi_opf & REQ_ATOMIC)
> +		return bio;
> +
> +	switch (bio_op(bio)) {
> +	case REQ_OP_READ:
> +	case REQ_OP_WRITE:
> +		return __md_bio_align_to_limits(mddev, bio);
> +	default:
> +		return bio;
> +	}
> +}
> +
>   static void md_submit_bio(struct bio *bio)
>   {
>   	const int rw = bio_data_dir(bio);
> @@ -443,6 +485,10 @@ static void md_submit_bio(struct bio *bio)
>   		return;
>   	}
>   
> +	bio = md_bio_align_to_limits(mddev, bio);
> +	if (!bio)
> +		return;
> +
>   	bio = bio_split_to_limits(bio);
>   	if (!bio)
>   		return;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b8c5dec12b62..e7aba83b708b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -347,6 +347,7 @@ struct md_cluster_operations;
>    * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
>    * @MD_FAILLAST_DEV: Allow last rdev to be removed.
>    * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
> + * @MD_BIO_ALIGN: Bio issued to the array will align to io_opt before split.
>    *
>    * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
>    */
> @@ -366,6 +367,7 @@ enum mddev_flags {
>   	MD_HAS_SUPERBLOCK,
>   	MD_FAILLAST_DEV,
>   	MD_SERIALIZE_POLICY,
> +	MD_BIO_ALIGN,
>   };
>   
>   enum mddev_sb_flags {

LGTM

Reviewed-by: Li Nan <linan122@huawei.com>

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2026-01-06  6:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-24  6:31 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
2025-11-24  6:31 ` [PATCH v2 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
2025-12-26  3:04   ` Li Nan
2025-11-24  6:31 ` [PATCH v2 02/11] md: merge mddev faillast_dev " Yu Kuai
2025-12-26  3:46   ` Li Nan
2025-11-24  6:31 ` [PATCH v2 03/11] md: merge mddev serialize_policy " Yu Kuai
2025-12-26  6:33   ` Li Nan
2025-11-24  6:31 ` [PATCH v2 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
2025-12-26  8:33   ` Li Nan
2026-01-03 11:12     ` Yu Kuai
2025-12-30  9:38   ` Li Nan
2026-01-03 11:13     ` Yu Kuai
2025-11-24  6:31 ` [PATCH v2 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
2025-11-24  6:31 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
2025-11-27  0:51   ` kernel test robot
2025-11-30  2:38     ` Yu Kuai
2025-11-27  7:05   ` kernel test robot
2025-11-24  6:31 ` [PATCH v2 07/11] md: add a helper md_config_align_limits() Yu Kuai
2025-11-24  6:32 ` [PATCH v2 08/11] md/raid5: align bio to io_opt Yu Kuai
2025-11-24  6:32 ` [PATCH v2 09/11] md/raid10: " Yu Kuai
2025-11-24  6:32 ` [PATCH v2 10/11] md/raid0: " Yu Kuai
2025-11-24  6:32 ` [PATCH v2 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  -- strict thread matches above, loose matches on Subject: below --
2026-01-03 15:45 [PATCH v2 00/11] md: align bio to io_opt and fix abnormal io_opt Yu Kuai
2026-01-03 15:45 ` [PATCH v2 06/11] md: support to align bio to limits Yu Kuai
2026-01-06  3:04   ` Xiao Ni
2026-01-06  6:42   ` Li Nan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox