[PATCH v4 00/11] md: align bio to io_opt for better performance


* [PATCH v4 00/11] md: align bio to io_opt for better performance
@ 2026-01-12  4:28 Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
                   ` (10 more replies)
  0 siblings, 11 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

This patchset optimizes MD RAID performance by aligning bios to the
optimal I/O size before splitting. When I/O is aligned to io_opt,
raid5 can perform full stripe writes without needing to read extra
data for parity calculation, significantly improving bandwidth.

Patches 1-3: Cleanup - merge boolean fields into mddev_flags
Patches 4-5: Preparation - use mempool for stripe_request_ctx and
             ensure max_sectors >= io_opt
Patches 6-7: Core - add bio alignment infrastructure
Patches 8-10: Enable bio alignment for raid5, raid10, and raid0
Patch 11: Fix abnormal io_opt from member disks

Performance improvement on a 32-disk raid5 array with 64KB chunks:
  dd if=/dev/zero of=/dev/md0 bs=100M oflag=direct
  Before: 782 MB/s
  After:  1.1 GB/s
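
To make the full-stripe condition concrete, here is a small user-space
sketch (not part of the series). It assumes raid5 has exactly one parity
disk and that io_opt equals the data width of one stripe; the helper
names full_stripe_bytes() and is_full_stripe_write() are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes covered by one full stripe: one chunk on every data disk. */
static uint64_t full_stripe_bytes(unsigned int raid_disks,
				  unsigned int parity_disks,
				  uint64_t chunk_bytes)
{
	assert(raid_disks > parity_disks);
	return (uint64_t)(raid_disks - parity_disks) * chunk_bytes;
}

/*
 * A write can be handled as full stripe writes only if it both starts
 * and ends on a stripe boundary.
 */
static bool is_full_stripe_write(uint64_t offset, uint64_t len,
				 uint64_t stripe_bytes)
{
	assert(stripe_bytes > 0);
	return len > 0 && offset % stripe_bytes == 0 &&
	       len % stripe_bytes == 0;
}
```

For the array above, full_stripe_bytes(32, 1, 64 * 1024) is 2031616
bytes (1984KB). A write of that size starting at offset 0 covers one
full stripe, while the same write starting at offset 4096 touches two
partial stripes.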

Changes in v4:
- Patch 11: Simplify by checking rdev_is_mddev() first, remove
  MD_STACK_IO_OPT flag

Changes in v3:
- Patch 4: Remove unnecessary NULL check before mempool_destroy()
- Patch 6: Use sector_div() instead of roundup()/rounddown() to fix
  64-bit division issue on 32-bit platforms

Changes in v2:
- Fix mempool in patch 4
- Add prep cleanup patches, 1-3
- Add patch 11 to fix abnormal io_opt
- Add Link tags to patches

Yu Kuai (11):
  md: merge mddev has_superblock into mddev_flags
  md: merge mddev faillast_dev into mddev_flags
  md: merge mddev serialize_policy into mddev_flags
  md/raid5: use mempool to allocate stripe_request_ctx
  md/raid5: make sure max_sectors is not less than io_opt
  md: support to align bio to limits
  md: add a helper md_config_align_limits()
  md/raid5: align bio to io_opt
  md/raid10: align bio to io_opt
  md/raid0: align bio to io_opt
  md: fix abnormal io_opt from member disks

 drivers/md/md-bitmap.c |   4 +-
 drivers/md/md.c        | 118 +++++++++++++++++++++++++++++++++++------
 drivers/md/md.h        |  30 +++++++++--
 drivers/md/raid0.c     |   6 ++-
 drivers/md/raid1-10.c  |   5 --
 drivers/md/raid1.c     |  13 ++---
 drivers/md/raid10.c    |  10 ++--
 drivers/md/raid5.c     |  92 ++++++++++++++++++++++----------
 drivers/md/raid5.h     |   3 ++
 9 files changed, 215 insertions(+), 66 deletions(-)

-- 
2.51.0



* [PATCH v4 01/11] md: merge mddev has_superblock into mddev_flags
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 02/11] md: merge mddev faillast_dev " Yu Kuai
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

There is no need to use a separate field in struct mddev. No functional
changes.
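
As a user-space illustration of the pattern (not kernel code), the
sketch below keeps boolean state as a bit in a shared flags word instead
of a dedicated bitfield member. All demo_* names are hypothetical, and
the helpers are plain non-atomic operations, unlike the kernel's
set_bit(), clear_bit() and test_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Each enumerator is a bit number inside demo_dev.flags. */
enum demo_flags {
	DEMO_BROKEN,
	DEMO_DELETED,
	DEMO_HAS_SUPERBLOCK,
	DEMO_NR_FLAGS,
};

struct demo_dev {
	unsigned long flags;
};

static void demo_set_flag(struct demo_dev *dev, enum demo_flags nr)
{
	assert(nr < DEMO_NR_FLAGS);
	dev->flags |= 1UL << nr;
}

static void demo_clear_flag(struct demo_dev *dev, enum demo_flags nr)
{
	assert(nr < DEMO_NR_FLAGS);
	dev->flags &= ~(1UL << nr);
}

static bool demo_test_flag(const struct demo_dev *dev, enum demo_flags nr)
{
	assert(nr < DEMO_NR_FLAGS);
	return (dev->flags >> nr) & 1UL;
}
```

Adding a new state then only needs a new enumerator; the struct layout
does not change.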

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-2-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c | 6 +++---
 drivers/md/md.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e5922a682953..91a30ed6b01e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6463,7 +6463,7 @@ int md_run(struct mddev *mddev)
 	 * the only valid external interface is through the md
 	 * device.
 	 */
-	mddev->has_superblocks = false;
+	clear_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
 	rdev_for_each(rdev, mddev) {
 		if (test_bit(Faulty, &rdev->flags))
 			continue;
@@ -6476,7 +6476,7 @@ int md_run(struct mddev *mddev)
 		}
 
 		if (rdev->sb_page)
-			mddev->has_superblocks = true;
+			set_bit(MD_HAS_SUPERBLOCK, &mddev->flags);
 
 		/* perform some consistency tests on the device.
 		 * We don't want the data to overlap the metadata,
@@ -9086,7 +9086,7 @@ void md_write_start(struct mddev *mddev, struct bio *bi)
 	rcu_read_unlock();
 	if (did_change)
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
-	if (!mddev->has_superblocks)
+	if (!test_bit(MD_HAS_SUPERBLOCK, &mddev->flags))
 		return;
 	wait_event(mddev->sb_wait,
 		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6985f2829bbd..b4c9aa600edd 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -340,6 +340,7 @@ struct md_cluster_operations;
  *		   array is ready yet.
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
  * @MD_DELETED: This device is being deleted
+ * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -356,6 +357,7 @@ enum mddev_flags {
 	MD_BROKEN,
 	MD_DO_DELETE,
 	MD_DELETED,
+	MD_HAS_SUPERBLOCK,
 };
 
 enum mddev_sb_flags {
@@ -623,7 +625,6 @@ struct mddev {
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
 
-	bool	has_superblocks:1;
 	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
 };
-- 
2.51.0



* [PATCH v4 02/11] md: merge mddev faillast_dev into mddev_flags
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 03/11] md: merge mddev serialize_policy " Yu Kuai
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

There is no need to use a separate field in struct mddev. No functional
changes.

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-3-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c     | 10 ++++++----
 drivers/md/md.h     |  3 ++-
 drivers/md/raid0.c  |  3 ++-
 drivers/md/raid1.c  |  4 ++--
 drivers/md/raid10.c |  4 ++--
 drivers/md/raid5.c  |  5 ++++-
 6 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 91a30ed6b01e..be0d33fbf988 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5865,11 +5865,11 @@ __ATTR(consistency_policy, S_IRUGO | S_IWUSR, consistency_policy_show,
 
 static ssize_t fail_last_dev_show(struct mddev *mddev, char *page)
 {
-	return sprintf(page, "%d\n", mddev->fail_last_dev);
+	return sprintf(page, "%d\n", test_bit(MD_FAILLAST_DEV, &mddev->flags));
 }
 
 /*
- * Setting fail_last_dev to true to allow last device to be forcibly removed
+ * Setting MD_FAILLAST_DEV to allow last device to be forcibly removed
  * from RAID1/RAID10.
  */
 static ssize_t
@@ -5882,8 +5882,10 @@ fail_last_dev_store(struct mddev *mddev, const char *buf, size_t len)
 	if (ret)
 		return ret;
 
-	if (value != mddev->fail_last_dev)
-		mddev->fail_last_dev = value;
+	if (value)
+		set_bit(MD_FAILLAST_DEV, &mddev->flags);
+	else
+		clear_bit(MD_FAILLAST_DEV, &mddev->flags);
 
 	return len;
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b4c9aa600edd..297a104fba88 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -341,6 +341,7 @@ struct md_cluster_operations;
  * @MD_BROKEN: This is used to stop writes and mark array as failed.
  * @MD_DELETED: This device is being deleted
  * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
+ * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -358,6 +359,7 @@ enum mddev_flags {
 	MD_DO_DELETE,
 	MD_DELETED,
 	MD_HAS_SUPERBLOCK,
+	MD_FAILLAST_DEV,
 };
 
 enum mddev_sb_flags {
@@ -625,7 +627,6 @@ struct mddev {
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
 
-	bool	fail_last_dev:1;
 	bool	serialize_policy:1;
 };
 
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 985c377356eb..4d567fcf6a7c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -27,7 +27,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_JOURNAL_CLEAN) |	\
 	 (1L << MD_FAILFAST_SUPPORTED) |\
 	 (1L << MD_HAS_PPL) |		\
-	 (1L << MD_HAS_MULTIPLE_PPLS))
+	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
+	 (1L << MD_FAILLAST_DEV))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 57d50465eed1..98b5c93810bb 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1746,7 +1746,7 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
  *	- &mddev->degraded is bumped.
  *
  * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * MD_FAILLAST_DEV is not set.
  */
 static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -1759,7 +1759,7 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 	    (conf->raid_disks - mddev->degraded) == 1) {
 		set_bit(MD_BROKEN, &mddev->flags);
 
-		if (!mddev->fail_last_dev) {
+		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
 			conf->recovery_disabled = mddev->recovery_disabled;
 			spin_unlock_irqrestore(&conf->device_lock, flags);
 			return;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 84be4cc7e873..09328e032f14 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1990,7 +1990,7 @@ static int enough(struct r10conf *conf, int ignore)
  *	- &mddev->degraded is bumped.
  *
  * @rdev is marked as &Faulty excluding case when array is failed and
- * &mddev->fail_last_dev is off.
+ * MD_FAILLAST_DEV is not set.
  */
 static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 {
@@ -2002,7 +2002,7 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
 		set_bit(MD_BROKEN, &mddev->flags);
 
-		if (!mddev->fail_last_dev) {
+		if (!test_bit(MD_FAILLAST_DEV, &mddev->flags)) {
 			spin_unlock_irqrestore(&conf->device_lock, flags);
 			return;
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e57ce3295292..441bc838f250 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -56,7 +56,10 @@
 #include "md-bitmap.h"
 #include "raid5-log.h"
 
-#define UNSUPPORTED_MDDEV_FLAGS	(1L << MD_FAILFAST_SUPPORTED)
+#define UNSUPPORTED_MDDEV_FLAGS		\
+	((1L << MD_FAILFAST_SUPPORTED) |	\
+	 (1L << MD_FAILLAST_DEV))
+
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
-- 
2.51.0



* [PATCH v4 03/11] md: merge mddev serialize_policy into mddev_flags
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 02/11] md: merge mddev faillast_dev " Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

There is no need to use a separate field in struct mddev. No functional
changes.

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-4-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md-bitmap.c |  4 ++--
 drivers/md/md.c        | 20 ++++++++++++--------
 drivers/md/md.h        |  4 ++--
 drivers/md/raid0.c     |  3 ++-
 drivers/md/raid1.c     |  4 ++--
 drivers/md/raid5.c     |  3 ++-
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 84b7e2af6dba..dbe4c4b9a1da 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2085,7 +2085,7 @@ static void bitmap_destroy(struct mddev *mddev)
 		return;
 
 	bitmap_wait_behind_writes(mddev);
-	if (!mddev->serialize_policy)
+	if (!test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 		mddev_destroy_serial_pool(mddev, NULL);
 
 	mutex_lock(&mddev->bitmap_info.mutex);
@@ -2809,7 +2809,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	mddev->bitmap_info.max_write_behind = backlog;
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
-		if (!mddev->serialize_policy)
+		if (!test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
diff --git a/drivers/md/md.c b/drivers/md/md.c
index be0d33fbf988..21b0bc3088d2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -279,7 +279,8 @@ void mddev_destroy_serial_pool(struct mddev *mddev, struct md_rdev *rdev)
 
 		rdev_for_each(temp, mddev) {
 			if (!rdev) {
-				if (!mddev->serialize_policy ||
+				if (!test_bit(MD_SERIALIZE_POLICY,
+					      &mddev->flags) ||
 				    !rdev_need_serial(temp))
 					rdev_uninit_serial(temp);
 				else
@@ -5898,11 +5899,12 @@ static ssize_t serialize_policy_show(struct mddev *mddev, char *page)
 	if (mddev->pers == NULL || (mddev->pers->head.id != ID_RAID1))
 		return sprintf(page, "n/a\n");
 	else
-		return sprintf(page, "%d\n", mddev->serialize_policy);
+		return sprintf(page, "%d\n",
+			       test_bit(MD_SERIALIZE_POLICY, &mddev->flags));
 }
 
 /*
- * Setting serialize_policy to true to enforce write IO is not reordered
+ * Setting MD_SERIALIZE_POLICY enforce write IO is not reordered
  * for raid1.
  */
 static ssize_t
@@ -5915,7 +5917,7 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 	if (err)
 		return err;
 
-	if (value == mddev->serialize_policy)
+	if (value == test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 		return len;
 
 	err = mddev_suspend_and_lock(mddev);
@@ -5927,11 +5929,13 @@ serialize_policy_store(struct mddev *mddev, const char *buf, size_t len)
 		goto unlock;
 	}
 
-	if (value)
+	if (value) {
 		mddev_create_serial_pool(mddev, NULL);
-	else
+		set_bit(MD_SERIALIZE_POLICY, &mddev->flags);
+	} else {
 		mddev_destroy_serial_pool(mddev, NULL);
-	mddev->serialize_policy = value;
+		clear_bit(MD_SERIALIZE_POLICY, &mddev->flags);
+	}
 unlock:
 	mddev_unlock_and_resume(mddev);
 	return err ?: len;
@@ -6828,7 +6832,7 @@ static void __md_stop_writes(struct mddev *mddev)
 		md_update_sb(mddev, 1);
 	}
 	/* disable policy to guarantee rdevs free resources for serialization */
-	mddev->serialize_policy = 0;
+	clear_bit(MD_SERIALIZE_POLICY, &mddev->flags);
 	mddev_destroy_serial_pool(mddev, NULL);
 }
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 297a104fba88..6ee18045f41c 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -342,6 +342,7 @@ struct md_cluster_operations;
  * @MD_DELETED: This device is being deleted
  * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
+ * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -360,6 +361,7 @@ enum mddev_flags {
 	MD_DELETED,
 	MD_HAS_SUPERBLOCK,
 	MD_FAILLAST_DEV,
+	MD_SERIALIZE_POLICY,
 };
 
 enum mddev_sb_flags {
@@ -626,8 +628,6 @@ struct mddev {
 
 	/* The sequence number for sync thread */
 	atomic_t sync_seq;
-
-	bool	serialize_policy:1;
 };
 
 enum recovery_flags {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 4d567fcf6a7c..d83b2b1c0049 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -28,7 +28,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_FAILFAST_SUPPORTED) |\
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
-	 (1L << MD_FAILLAST_DEV))
+	 (1L << MD_FAILLAST_DEV) |	\
+	 (1L << MD_SERIALIZE_POLICY))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 98b5c93810bb..f4c7004888af 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -542,7 +542,7 @@ static void raid1_end_write_request(struct bio *bio)
 				call_bio_endio(r1_bio);
 			}
 		}
-	} else if (rdev->mddev->serialize_policy)
+	} else if (test_bit(MD_SERIALIZE_POLICY, &rdev->mddev->flags))
 		remove_serial(rdev, lo, hi);
 	if (r1_bio->bios[mirror] == NULL)
 		rdev_dec_pending(rdev, conf->mddev);
@@ -1644,7 +1644,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			mbio = bio_alloc_clone(rdev->bdev, bio, GFP_NOIO,
 					       &mddev->bio_set);
 
-			if (mddev->serialize_policy)
+			if (test_bit(MD_SERIALIZE_POLICY, &mddev->flags))
 				wait_for_serialization(rdev, r1_bio);
 		}
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 441bc838f250..2294d00953af 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -58,7 +58,8 @@
 
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
-	 (1L << MD_FAILLAST_DEV))
+	 (1L << MD_FAILLAST_DEV) |		\
+	 (1L << MD_SERIALIZE_POLICY))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
-- 
2.51.0



* [PATCH v4 04/11] md/raid5: use mempool to allocate stripe_request_ctx
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (2 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 03/11] md: merge mddev serialize_policy " Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

First, stripe_request_ctx is 72 bytes, which is rather large for a
stack variable.

Second, the sectors_to_do bitmap has a fixed size, so max_hw_sectors_kb
of a raid5 array is at most 256 * 4k = 1MB. This makes full stripe IO
impossible for arrays where chunk_size * data_disks is larger than that.
Allocating the ctx at runtime makes it possible to get rid of this
limit.
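
The arithmetic behind the 1MB limit can be checked with a short
user-space sketch. The constants mirror the 256 and 4k figures quoted
above; the macro and function names are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_REQ_STRIPES	256u	/* bits in the fixed-size bitmap */
#define DEMO_STRIPE_BYTES	4096u	/* 4k covered by each bit */

/* Largest request the fixed-size bitmap can describe. */
static uint64_t demo_fixed_ctx_max_bytes(void)
{
	return (uint64_t)DEMO_MAX_REQ_STRIPES * DEMO_STRIPE_BYTES;
}

/* Does one full stripe (chunk_size * data_disks) fit in one request? */
static bool demo_full_stripe_fits(uint64_t chunk_bytes,
				  unsigned int data_disks)
{
	assert(data_disks > 0);
	return chunk_bytes * data_disks <= demo_fixed_ctx_max_bytes();
}
```

With 64k chunks, 16 data disks just fit (16 * 64k = 1MB), while the 31
data disks of a 32-disk raid5 array need 1984k and do not.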

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-5-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h       |  4 +++
 drivers/md/raid1-10.c |  5 ----
 drivers/md/raid5.c    | 61 +++++++++++++++++++++++++++----------------
 drivers/md/raid5.h    |  2 ++
 4 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6ee18045f41c..b8c5dec12b62 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -22,6 +22,10 @@
 #include <trace/events/block.h>
 
 #define MaxSector (~(sector_t)0)
+/*
+ * Number of guaranteed raid bios in case of extreme VM load:
+ */
+#define	NR_RAID_BIOS 256
 
 enum md_submodule_type {
 	MD_PERSONALITY = 0,
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 521625756128..c33099925f23 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -3,11 +3,6 @@
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 
-/*
- * Number of guaranteed raid bios in case of extreme VM load:
- */
-#define	NR_RAID_BIOS 256
-
 /* when we get a read error on a read-only array, we redirect to another
  * device without failing the first device, or trying to over-write to
  * correct the read error.  To keep track of bad blocks on a per-bio
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2294d00953af..e92514c91305 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6084,13 +6084,13 @@ static sector_t raid5_bio_lowest_chunk_sector(struct r5conf *conf,
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	bool on_wq;
 	struct r5conf *conf = mddev->private;
-	sector_t logical_sector;
-	struct stripe_request_ctx ctx = {};
 	const int rw = bio_data_dir(bi);
+	struct stripe_request_ctx *ctx;
+	sector_t logical_sector;
 	enum stripe_result res;
 	int s, stripe_cnt;
+	bool on_wq;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
 		int ret = log_handle_flush_request(conf, bi);
@@ -6102,11 +6102,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 				return true;
 		}
 		/* ret == -EAGAIN, fallback */
-		/*
-		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
-		 * we need to flush journal device
-		 */
-		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
 	}
 
 	md_write_start(mddev, bi);
@@ -6129,16 +6124,25 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 
 	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
-	ctx.first_sector = logical_sector;
-	ctx.last_sector = bio_end_sector(bi);
 	bi->bi_next = NULL;
 
-	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx.last_sector - logical_sector,
+	ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO);
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->first_sector = logical_sector;
+	ctx->last_sector = bio_end_sector(bi);
+	/*
+	 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
+	 * we need to flush journal device
+	 */
+	if (unlikely(bi->bi_opf & REQ_PREFLUSH))
+		ctx->do_flush = true;
+
+	stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx->last_sector - logical_sector,
 					   RAID5_STRIPE_SECTORS(conf));
-	bitmap_set(ctx.sectors_to_do, 0, stripe_cnt);
+	bitmap_set(ctx->sectors_to_do, 0, stripe_cnt);
 
 	pr_debug("raid456: %s, logical %llu to %llu\n", __func__,
-		 bi->bi_iter.bi_sector, ctx.last_sector);
+		 bi->bi_iter.bi_sector, ctx->last_sector);
 
 	/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
 	if ((bi->bi_opf & REQ_NOWAIT) &&
@@ -6146,6 +6150,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		bio_wouldblock_error(bi);
 		if (rw == WRITE)
 			md_write_end(mddev);
+		mempool_free(ctx, conf->ctx_pool);
 		return true;
 	}
 	md_account_bio(mddev, &bi);
@@ -6164,10 +6169,10 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		add_wait_queue(&conf->wait_for_reshape, &wait);
 		on_wq = true;
 	}
-	s = (logical_sector - ctx.first_sector) >> RAID5_STRIPE_SHIFT(conf);
+	s = (logical_sector - ctx->first_sector) >> RAID5_STRIPE_SHIFT(conf);
 
 	while (1) {
-		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
+		res = make_stripe_request(mddev, conf, ctx, logical_sector,
 					  bi);
 		if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
 			break;
@@ -6184,9 +6189,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			 * raid5_activate_delayed() from making progress
 			 * and thus deadlocking.
 			 */
-			if (ctx.batch_last) {
-				raid5_release_stripe(ctx.batch_last);
-				ctx.batch_last = NULL;
+			if (ctx->batch_last) {
+				raid5_release_stripe(ctx->batch_last);
+				ctx->batch_last = NULL;
 			}
 
 			wait_woken(&wait, TASK_UNINTERRUPTIBLE,
@@ -6194,21 +6199,23 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			continue;
 		}
 
-		s = find_next_bit_wrap(ctx.sectors_to_do, stripe_cnt, s);
+		s = find_next_bit_wrap(ctx->sectors_to_do, stripe_cnt, s);
 		if (s == stripe_cnt)
 			break;
 
-		logical_sector = ctx.first_sector +
+		logical_sector = ctx->first_sector +
 			(s << RAID5_STRIPE_SHIFT(conf));
 	}
 	if (unlikely(on_wq))
 		remove_wait_queue(&conf->wait_for_reshape, &wait);
 
-	if (ctx.batch_last)
-		raid5_release_stripe(ctx.batch_last);
+	if (ctx->batch_last)
+		raid5_release_stripe(ctx->batch_last);
 
 	if (rw == WRITE)
 		md_write_end(mddev);
+
+	mempool_free(ctx, conf->ctx_pool);
 	if (res == STRIPE_WAIT_RESHAPE) {
 		md_free_cloned_bio(bi);
 		return false;
@@ -7376,6 +7383,9 @@ static void free_conf(struct r5conf *conf)
 	bioset_exit(&conf->bio_split);
 	kfree(conf->stripe_hashtbl);
 	kfree(conf->pending_data);
+
+	mempool_destroy(conf->ctx_pool);
+
 	kfree(conf);
 }
 
@@ -8059,6 +8069,13 @@ static int raid5_run(struct mddev *mddev)
 			goto abort;
 	}
 
+	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
+					sizeof(struct stripe_request_ctx));
+	if (!conf->ctx_pool) {
+		ret = -ENOMEM;
+		goto abort;
+	}
+
 	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
 		goto abort;
 
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index eafc6e9ed6ee..6e3f07119fa4 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -690,6 +690,8 @@ struct r5conf {
 	struct list_head	pending_list;
 	int			pending_data_cnt;
 	struct r5pending_data	*next_pending_data;
+
+	mempool_t		*ctx_pool;
 };
 
 #if PAGE_SIZE == DEFAULT_STRIPE_SIZE
-- 
2.51.0



* [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (3 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 06/11] md: support to align bio to limits Yu Kuai
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

Otherwise, even if the user issues IO sized to io_opt, the IO will be
split by max_sectors before it is submitted to raid5. As a consequence,
full stripe IO is impossible.

BTW, dm-raid5 is not changed by this patch and still has this problem.
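
The adjustment itself is small. The user-space sketch below restates it
with a hypothetical helper, demo_raise_max_hw_sectors(), assuming
512-byte sectors and io_opt expressed in bytes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Return a max_hw_sectors value that is large enough to carry one
 * io_opt-sized request without splitting.
 */
static uint32_t demo_raise_max_hw_sectors(uint32_t max_hw_sectors,
					  uint32_t io_opt_bytes)
{
	assert(max_hw_sectors > 0);
	if (((uint64_t)max_hw_sectors << DEMO_SECTOR_SHIFT) < io_opt_bytes)
		return io_opt_bytes >> DEMO_SECTOR_SHIFT;
	return max_hw_sectors;
}
```

For example, a 1MB limit (2048 sectors) combined with an io_opt of
2031616 bytes is raised to 3968 sectors, while the same limit with an
io_opt of 512KB stays at 2048 sectors.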

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-6-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid5.c | 38 ++++++++++++++++++++++++++++----------
 drivers/md/raid5.h |  1 +
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e92514c91305..af48ad2bc723 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -777,14 +777,14 @@ struct stripe_request_ctx {
 	/* last sector in the request */
 	sector_t last_sector;
 
+	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
+	bool do_flush;
+
 	/*
 	 * bitmap to track stripe sectors that have been added to stripes
 	 * add one to account for unaligned requests
 	 */
-	DECLARE_BITMAP(sectors_to_do, RAID5_MAX_REQ_STRIPES + 1);
-
-	/* the request had REQ_PREFLUSH, cleared after the first stripe_head */
-	bool do_flush;
+	unsigned long sectors_to_do[];
 };
 
 /*
@@ -6127,7 +6127,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	bi->bi_next = NULL;
 
 	ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO);
-	memset(ctx, 0, sizeof(*ctx));
+	memset(ctx, 0, conf->ctx_size);
 	ctx->first_sector = logical_sector;
 	ctx->last_sector = bio_end_sector(bi);
 	/*
@@ -7741,6 +7741,25 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
 	return 0;
 }
 
+static int raid5_create_ctx_pool(struct r5conf *conf)
+{
+	struct stripe_request_ctx *ctx;
+	int size;
+
+	if (mddev_is_dm(conf->mddev))
+		size = BITS_TO_LONGS(RAID5_MAX_REQ_STRIPES);
+	else
+		size = BITS_TO_LONGS(
+			queue_max_hw_sectors(conf->mddev->gendisk->queue) >>
+			RAID5_STRIPE_SHIFT(conf));
+
+	conf->ctx_size = struct_size(ctx, sectors_to_do, size);
+	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
+						     conf->ctx_size);
+
+	return conf->ctx_pool ? 0 : -ENOMEM;
+}
+
 static int raid5_set_limits(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
@@ -7797,6 +7816,8 @@ static int raid5_set_limits(struct mddev *mddev)
 	 * Limit the max sectors based on this.
 	 */
 	lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
+	if ((lim.max_hw_sectors << 9) < lim.io_opt)
+		lim.max_hw_sectors = lim.io_opt >> 9;
 
 	/* No restrictions on the number of segments in the request */
 	lim.max_segments = USHRT_MAX;
@@ -8069,12 +8090,9 @@ static int raid5_run(struct mddev *mddev)
 			goto abort;
 	}
 
-	conf->ctx_pool = mempool_create_kmalloc_pool(NR_RAID_BIOS,
-					sizeof(struct stripe_request_ctx));
-	if (!conf->ctx_pool) {
-		ret = -ENOMEM;
+	ret = raid5_create_ctx_pool(conf);
+	if (ret)
 		goto abort;
-	}
 
 	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
 		goto abort;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 6e3f07119fa4..ddfe65237888 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -692,6 +692,7 @@ struct r5conf {
 	struct r5pending_data	*next_pending_data;
 
 	mempool_t		*ctx_pool;
+	int			ctx_size;
 };
 
 #if PAGE_SIZE == DEFAULT_STRIPE_SIZE
-- 
2.51.0



* [PATCH v4 06/11] md: support to align bio to limits
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (4 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12 11:24   ` Li Nan
  2026-01-12  4:28 ` [PATCH v4 07/11] md: add a helper md_config_align_limits() Yu Kuai
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

When a personality reports an optimal IO size, it indicates that users
get the best IO bandwidth by issuing IO of this size. However, there is
also an implicit condition that the IO must be aligned to the optimal
IO size.

Currently, a bio is only split by limits; if the bio offset is not
aligned to the limits, none of the split bios will be aligned either.
This patch adds a new feature to align the bio to the limits first, and
the following patches will enable it for each personality where
necessary.
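
The split-point calculation can be modelled in user space as follows.
This is only a sketch of the arithmetic: it uses plain 64-bit modulo
where the kernel code uses sector_div(), and demo_align_split_sectors()
is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many sectors to split off the front of a request so that
 * the remainder starts on a max_sectors boundary. Return 0 when the
 * request is already aligned or is too small to contain an aligned
 * max_sectors-sized range.
 */
static uint64_t demo_align_split_sectors(uint64_t start,
					 uint64_t nr_sectors,
					 uint32_t max_sectors)
{
	uint64_t end = start + nr_sectors;
	uint64_t align_start, align_end, rem;

	assert(max_sectors > 0);

	rem = start % max_sectors;
	if (!rem)
		return 0;		/* already aligned */

	align_start = start + max_sectors - rem;	/* round up */
	align_end = end - end % max_sectors;		/* round down */

	if (align_end <= align_start)
		return 0;		/* too small to split */

	return align_start - start;
}
```

With max_sectors = 3968, a request of 8000 sectors starting at sector
100 is split after 3868 sectors, so the second part starts at sector
3968. A request of 4000 sectors from the same start is left alone,
because no aligned 3968-sector range fits inside it.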

Link: https://lore.kernel.org/linux-raid/20260103154543.832844-7-yukuai@fnnas.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  2 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 21b0bc3088d2..731ec800f5cb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -428,6 +428,56 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
 }
 EXPORT_SYMBOL(md_handle_request);
 
+static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
+					     struct bio *bio)
+{
+	unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
+	sector_t start = bio->bi_iter.bi_sector;
+	sector_t end = start + bio_sectors(bio);
+	sector_t align_start;
+	sector_t align_end;
+	u32 rem;
+
+	/* calculate align_start = roundup(start, max_sectors) */
+	align_start = start;
+	rem = sector_div(align_start, max_sectors);
+	/* already aligned */
+	if (!rem)
+		return bio;
+
+	align_start = start + max_sectors - rem;
+
+	/* calculate align_end = rounddown(end, max_sectors) */
+	align_end = end;
+	rem = sector_div(align_end, max_sectors);
+	align_end = end - rem;
+
+	/* bio is too small to split */
+	if (align_end <= align_start)
+		return bio;
+
+	return bio_submit_split_bioset(bio, align_start - start,
+				       &mddev->gendisk->bio_split);
+}
+
+static struct bio *md_bio_align_to_limits(struct mddev *mddev, struct bio *bio)
+{
+	if (!test_bit(MD_BIO_ALIGN, &mddev->flags))
+		return bio;
+
+	/* atomic write can't split */
+	if (bio->bi_opf & REQ_ATOMIC)
+		return bio;
+
+	switch (bio_op(bio)) {
+	case REQ_OP_READ:
+	case REQ_OP_WRITE:
+		return __md_bio_align_to_limits(mddev, bio);
+	default:
+		return bio;
+	}
+}
+
 static void md_submit_bio(struct bio *bio)
 {
 	const int rw = bio_data_dir(bio);
@@ -443,6 +493,10 @@ static void md_submit_bio(struct bio *bio)
 		return;
 	}
 
+	bio = md_bio_align_to_limits(mddev, bio);
+	if (!bio)
+		return;
+
 	bio = bio_split_to_limits(bio);
 	if (!bio)
 		return;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b8c5dec12b62..e7aba83b708b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -347,6 +347,7 @@ struct md_cluster_operations;
  * @MD_HAS_SUPERBLOCK: There is persistence sb in member disks.
  * @MD_FAILLAST_DEV: Allow last rdev to be removed.
  * @MD_SERIALIZE_POLICY: Enforce write IO is not reordered, just used by raid1.
+ * @MD_BIO_ALIGN: Bio issued to the array will align to io_opt before split.
  *
  * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
  */
@@ -366,6 +367,7 @@ enum mddev_flags {
 	MD_HAS_SUPERBLOCK,
 	MD_FAILLAST_DEV,
 	MD_SERIALIZE_POLICY,
+	MD_BIO_ALIGN,
 };
 
 enum mddev_sb_flags {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 07/11] md: add a helper md_config_align_limits()
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (5 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 06/11] md: support to align bio to limits Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 08/11] md/raid5: align bio to io_opt Yu Kuai
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

This helper will be used by personalities that want to align bios to
io_opt to get the best IO bandwidth.

Also add the new flag to UNSUPPORTED_MDDEV_FLAGS for now; following
patches will enable this for personalities.
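
The limit adjustment done by the helper can be sketched in userspace C as
follows. The function name align_max_hw_sectors() is hypothetical, and the
sketch assumes io_opt is at least one 512-byte sector; it also widens the
shift to 64 bits, which the helper in the diff below does not do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not part of the patch).
 * Make max_hw_sectors a non-zero multiple of io_opt:
 *  - raise it to io_opt when it is smaller than io_opt;
 *  - otherwise round it down to a multiple of io_opt.
 */
static uint32_t align_max_hw_sectors(uint32_t max_hw_sectors,
				     uint32_t io_opt_bytes)
{
	uint32_t io_opt_sectors = io_opt_bytes >> 9;

	/* assumption of this sketch: io_opt covers at least one sector */
	assert(io_opt_sectors != 0);

	if (((uint64_t)max_hw_sectors << 9) < io_opt_bytes)
		return io_opt_sectors;

	return max_hw_sectors - (max_hw_sectors % io_opt_sectors);
}
```

With io_opt = 1 MiB (2048 sectors), a max_hw_sectors of 2560 would become
2048, while a max_hw_sectors of 1024 would be raised to 2048.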

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h    | 11 +++++++++++
 drivers/md/raid0.c |  3 ++-
 drivers/md/raid1.c |  3 ++-
 drivers/md/raid5.c |  3 ++-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index e7aba83b708b..ddf989f2a139 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -1091,6 +1091,17 @@ static inline bool rdev_blocked(struct md_rdev *rdev)
 	return false;
 }
 
+static inline void md_config_align_limits(struct mddev *mddev,
+					  struct queue_limits *lim)
+{
+	if ((lim->max_hw_sectors << 9) < lim->io_opt)
+		lim->max_hw_sectors = lim->io_opt >> 9;
+	else
+		lim->max_hw_sectors = rounddown(lim->max_hw_sectors,
+						lim->io_opt >> 9);
+	set_bit(MD_BIO_ALIGN, &mddev->flags);
+}
+
 #define mddev_add_trace_msg(mddev, fmt, args...)			\
 do {									\
 	if (!mddev_is_dm(mddev))					\
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d83b2b1c0049..f3814a69cd13 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -29,7 +29,8 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
 	 (1L << MD_FAILLAST_DEV) |	\
-	 (1L << MD_SERIALIZE_POLICY))
+	 (1L << MD_SERIALIZE_POLICY) |	\
+	 (1L << MD_BIO_ALIGN))
 
 /*
  * inform the user of the raid configuration
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f4c7004888af..1a957dba2640 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -42,7 +42,8 @@
 	((1L << MD_HAS_JOURNAL) |	\
 	 (1L << MD_JOURNAL_CLEAN) |	\
 	 (1L << MD_HAS_PPL) |		\
-	 (1L << MD_HAS_MULTIPLE_PPLS))
+	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
+	 (1L << MD_BIO_ALIGN))
 
 static void allow_barrier(struct r1conf *conf, sector_t sector_nr);
 static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index af48ad2bc723..30a7069cbd0c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -59,7 +59,8 @@
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
 	 (1L << MD_FAILLAST_DEV) |		\
-	 (1L << MD_SERIALIZE_POLICY))
+	 (1L << MD_SERIALIZE_POLICY) |		\
+	 (1L << MD_BIO_ALIGN))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 08/11] md/raid5: align bio to io_opt
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (6 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 07/11] md: add a helper md_config_align_limits() Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 09/11] md/raid10: " Yu Kuai
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

The raid5 internal implementation indicates that if a write bio is aligned
to io_opt, then a full stripe write will be used, which is best for
bandwidth because there is no need to read extra data to build the new
xor data.
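
A rough userspace sketch of why alignment matters, assuming io_opt for
raid5 equals one full stripe of data, i.e. chunk size times the number of
data disks. Both function names are hypothetical and this is not how
raid5 itself tracks stripes.

```c
#include <assert.h>
#include <stdint.h>

/* Data sectors in one full raid5 stripe: one chunk per data disk. */
static uint64_t raid5_full_stripe_sectors(unsigned int raid_disks,
					  unsigned int chunk_sectors)
{
	assert(raid_disks >= 2);
	return (uint64_t)(raid_disks - 1) * chunk_sectors;
}

/*
 * Number of full stripes completely covered by the request
 * [start, start + nr_sectors). Only those can be written without
 * reading old data or parity first.
 */
static uint64_t full_stripes_covered(uint64_t start, uint64_t nr_sectors,
				     uint64_t stripe_sectors)
{
	uint64_t end = start + nr_sectors;
	uint64_t first, last;

	assert(stripe_sectors != 0);

	first = (start + stripe_sectors - 1) / stripe_sectors;
	last = end / stripe_sectors;

	return last > first ? last - first : 0;
}
```

For 32 disks with a 64KB chunk (128 sectors), a full stripe is
31 * 128 = 3968 sectors. A stripe-sized write starting at sector 0 covers
one full stripe, while the same write starting at sector 8 covers none.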

A simple test in my VM, 32-disk raid5 with 64KB chunk size:
dd if=/dev/zero of=/dev/md0 bs=100M oflag=direct

Before this patch:  782 MB/s
With this patch:    1.1 GB/s

BTW, there are still other bottlenecks related to the stripe handler,
which require further optimization.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid5.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 30a7069cbd0c..0160cbed7389 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -59,8 +59,7 @@
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_FAILFAST_SUPPORTED) |	\
 	 (1L << MD_FAILLAST_DEV) |		\
-	 (1L << MD_SERIALIZE_POLICY) |		\
-	 (1L << MD_BIO_ALIGN))
+	 (1L << MD_SERIALIZE_POLICY))
 
 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
@@ -7817,8 +7816,7 @@ static int raid5_set_limits(struct mddev *mddev)
 	 * Limit the max sectors based on this.
 	 */
 	lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
-	if ((lim.max_hw_sectors << 9) < lim.io_opt)
-		lim.max_hw_sectors = lim.io_opt >> 9;
+	md_config_align_limits(mddev, &lim);
 
 	/* No restrictions on the number of segments in the request */
 	lim.max_segments = USHRT_MAX;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 09/11] md/raid10: align bio to io_opt
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (7 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 08/11] md/raid5: align bio to io_opt Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 10/11] md/raid0: " Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

The impact is not so significant for raid10 compared to raid5, however
it's still more appropriate to issue IOs evenly to underlying disks.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 09328e032f14..2c6b65b83724 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4008,6 +4008,8 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
+
+	md_config_align_limits(mddev, &lim);
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 10/11] md/raid0: align bio to io_opt
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (8 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 09/11] md/raid10: " Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  4:28 ` [PATCH v4 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  10 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

The impact is not so significant for raid0 compared to raid5, however
it's still more appropriate to issue IOs evenly to underlying disks.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Reviewed-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid0.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index f3814a69cd13..0ae44e3bfff2 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -29,8 +29,7 @@ module_param(default_layout, int, 0644);
 	 (1L << MD_HAS_PPL) |		\
 	 (1L << MD_HAS_MULTIPLE_PPLS) |	\
 	 (1L << MD_FAILLAST_DEV) |	\
-	 (1L << MD_SERIALIZE_POLICY) |	\
-	 (1L << MD_BIO_ALIGN))
+	 (1L << MD_SERIALIZE_POLICY))
 
 /*
  * inform the user of the raid configuration
@@ -398,6 +397,8 @@ static int raid0_set_limits(struct mddev *mddev)
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
+
+	md_config_align_limits(mddev, &lim);
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 11/11] md: fix abnormal io_opt from member disks
  2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
                   ` (9 preceding siblings ...)
  2026-01-12  4:28 ` [PATCH v4 10/11] md/raid0: " Yu Kuai
@ 2026-01-12  4:28 ` Yu Kuai
  2026-01-12  7:28   ` Li Nan
  2026-01-14  3:15   ` Xiao Ni
  10 siblings, 2 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-12  4:28 UTC (permalink / raw)
  To: linux-raid, linan122; +Cc: yukuai

It's reported that mpt3sas can report an abnormal io_opt. As a
consequence, the md array will end up with an abnormal io_opt as well, due
to the lcm_not_zero() from blk_stack_limits().

Some personalities configure an optimal IO size, which indicates that
users can get the best IO bandwidth if they issue IO with this size, and
we don't want io_opt to be overridden by member disks with abnormal io_opt.

Fix this problem by keeping the io_opt configured by the personality and
ignoring the io_opt from member disks, unless the member disk is itself an
mdraid array.
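
The effect of the lcm stacking, and of keeping the personality's value,
can be sketched in userspace C. All names here are hypothetical, and the
member io_opt value used in the example (16773120 bytes) is only an
illustrative odd-looking value, not a figure taken from the reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* lcm of a and b; when one of them is zero, return the other one */
static uint64_t lcm_not_zero_u64(uint64_t a, uint64_t b)
{
	uint64_t g;

	if (a == 0 || b == 0)
		return a ? a : b;

	g = gcd_u64(a, b);
	assert(a / g <= UINT64_MAX / b);	/* no overflow in this sketch */
	return a / g * b;
}

/*
 * Sketch of the stacking rule: when the personality configured io_opt
 * and the member is not an md array, the member's io_opt is ignored.
 */
static uint64_t stack_io_opt(uint64_t cur, uint64_t member,
			     bool member_is_md, bool configured)
{
	if (!member_is_md && configured)
		return cur;

	return lcm_not_zero_u64(cur, member);
}
```

With a configured io_opt of 131072 and a member reporting 16773120, plain
lcm stacking would yield 536739840, while the rule above keeps 131072.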

Reported-by: Filippo Giunchedi <filippo@debian.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006
Reported-by: Coly Li <colyli@fnnas.com>
Closes: https://lore.kernel.org/all/20250817152645.7115-1-colyli@kernel.org/
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c     | 28 +++++++++++++++++++++++++++-
 drivers/md/md.h     |  3 ++-
 drivers/md/raid1.c  |  2 +-
 drivers/md/raid10.c |  4 ++--
 4 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 731ec800f5cb..6c0fb09c26dc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6200,18 +6200,33 @@ static const struct kobj_type md_ktype = {
 
 int mdp_major = 0;
 
+static bool rdev_is_mddev(struct md_rdev *rdev)
+{
+	return rdev->bdev->bd_disk->fops == &md_fops;
+}
+
 /* stack the limit for all rdevs into lim */
 int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		unsigned int flags)
 {
 	struct md_rdev *rdev;
+	bool io_opt_configured = lim->io_opt;
 
 	rdev_for_each(rdev, mddev) {
+		unsigned int io_opt = lim->io_opt;
+
 		queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
 					mddev->gendisk->disk_name);
 		if ((flags & MDDEV_STACK_INTEGRITY) &&
 		    !queue_limits_stack_integrity_bdev(lim, rdev->bdev))
 			return -EINVAL;
+
+		/*
+		 * If member disk is not mdraid array, keep the io_opt
+		 * from personality and ignore io_opt from member disk.
+		 */
+		if (!rdev_is_mddev(rdev) && io_opt_configured)
+			lim->io_opt = io_opt;
 	}
 
 	/*
@@ -6230,9 +6245,11 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
 
 /* apply the extra stacking limits from a new rdev into mddev */
-int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
+			 bool io_opt_configured)
 {
 	struct queue_limits lim;
+	unsigned int io_opt;
 
 	if (mddev_is_dm(mddev))
 		return 0;
@@ -6245,6 +6262,8 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 	}
 
 	lim = queue_limits_start_update(mddev->gendisk->queue);
+	io_opt = lim.io_opt;
+
 	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
 				mddev->gendisk->disk_name);
 
@@ -6255,6 +6274,13 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 		return -ENXIO;
 	}
 
+	/*
+	 * If member disk is not mdraid array, keep the io_opt from
+	 * personality and ignore io_opt from member disk.
+	 */
+	if (!rdev_is_mddev(rdev) && io_opt_configured)
+		lim.io_opt = io_opt;
+
 	return queue_limits_commit_update(mddev->gendisk->queue, &lim);
 }
 EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ddf989f2a139..80c527b3777d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -1041,7 +1041,8 @@ int do_md_run(struct mddev *mddev);
 #define MDDEV_STACK_INTEGRITY	(1u << 0)
 int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		unsigned int flags);
-int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
+			 bool io_opt_configured);
 void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
 
 extern const struct block_device_operations md_fops;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1a957dba2640..f3f3086f27fa 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1944,7 +1944,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 	for (mirror = first; mirror <= last; mirror++) {
 		p = conf->mirrors + mirror;
 		if (!p->rdev) {
-			err = mddev_stack_new_rdev(mddev, rdev);
+			err = mddev_stack_new_rdev(mddev, rdev, false);
 			if (err)
 				return err;
 
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2c6b65b83724..a6edc91e7a9a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2139,7 +2139,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			continue;
 		}
 
-		err = mddev_stack_new_rdev(mddev, rdev);
+		err = mddev_stack_new_rdev(mddev, rdev, true);
 		if (err)
 			return err;
 		p->head_position = 0;
@@ -2157,7 +2157,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		clear_bit(In_sync, &rdev->flags);
 		set_bit(Replacement, &rdev->flags);
 		rdev->raid_disk = repl_slot;
-		err = mddev_stack_new_rdev(mddev, rdev);
+		err = mddev_stack_new_rdev(mddev, rdev, true);
 		if (err)
 			return err;
 		conf->fullsync = 1;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 11/11] md: fix abnormal io_opt from member disks
  2026-01-12  4:28 ` [PATCH v4 11/11] md: fix abnormal io_opt from member disks Yu Kuai
@ 2026-01-12  7:28   ` Li Nan
  2026-01-14  3:15   ` Xiao Ni
  1 sibling, 0 replies; 19+ messages in thread
From: Li Nan @ 2026-01-12  7:28 UTC (permalink / raw)
  To: Yu Kuai, linux-raid



On 2026/1/12 12:28, Yu Kuai wrote:
> It's reported that mtp3sas can report abnormal io_opt, for consequence,
> md array will end up with abnormal io_opt as well, due to the
> lcm_not_zero() from blk_stack_limits().
> 
> Some personalities will configure optimal IO size, and it's indicate that
> users can get the best IO bandwidth if they issue IO with this size, and
> we don't want io_opt to be covered by member disks with abnormal io_opt.
> 
> Fix this problem by adding a new mddev flags MD_STACK_IO_OPT to indicate
> that io_opt configured by personalities is preferred over member disks
> or not.
> 
> Reported-by: Filippo Giunchedi <filippo@debian.org>
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006
> Reported-by: Coly Li <colyli@fnnas.com>
> Closes: https://lore.kernel.org/all/20250817152645.7115-1-colyli@kernel.org/
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>   drivers/md/md.c     | 28 +++++++++++++++++++++++++++-
>   drivers/md/md.h     |  3 ++-
>   drivers/md/raid1.c  |  2 +-
>   drivers/md/raid10.c |  4 ++--
>   4 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 731ec800f5cb..6c0fb09c26dc 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6200,18 +6200,33 @@ static const struct kobj_type md_ktype = {
>   
>   int mdp_major = 0;
>   
> +static bool rdev_is_mddev(struct md_rdev *rdev)
> +{
> +	return rdev->bdev->bd_disk->fops == &md_fops;
> +}
> +
>   /* stack the limit for all rdevs into lim */
>   int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>   		unsigned int flags)
>   {
>   	struct md_rdev *rdev;
> +	bool io_opt_configured = lim->io_opt;
>   
>   	rdev_for_each(rdev, mddev) {
> +		unsigned int io_opt = lim->io_opt;
> +
>   		queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
>   					mddev->gendisk->disk_name);
>   		if ((flags & MDDEV_STACK_INTEGRITY) &&
>   		    !queue_limits_stack_integrity_bdev(lim, rdev->bdev))
>   			return -EINVAL;
> +
> +		/*
> +		 * If member disk is not mdraid array, keep the io_opt
> +		 * from personality and ignore io_opt from member disk.
> +		 */
> +		if (!rdev_is_mddev(rdev) && io_opt_configured)
> +			lim->io_opt = io_opt;
>   	}
>   
>   	/*
> @@ -6230,9 +6245,11 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>   EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
>   
>   /* apply the extra stacking limits from a new rdev into mddev */
> -int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
> +			 bool io_opt_configured)
>   {
>   	struct queue_limits lim;
> +	unsigned int io_opt;
>   
>   	if (mddev_is_dm(mddev))
>   		return 0;
> @@ -6245,6 +6262,8 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
>   	}
>   
>   	lim = queue_limits_start_update(mddev->gendisk->queue);
> +	io_opt = lim.io_opt;
> +
>   	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
>   				mddev->gendisk->disk_name);
>   
> @@ -6255,6 +6274,13 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
>   		return -ENXIO;
>   	}
>   
> +	/*
> +	 * If member disk is not mdraid array, keep the io_opt from
> +	 * personality and ignore io_opt from member disk.
> +	 */
> +	if (!rdev_is_mddev(rdev) && io_opt_configured)
> +		lim.io_opt = io_opt;
> +
>   	return queue_limits_commit_update(mddev->gendisk->queue, &lim);
>   }
>   EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ddf989f2a139..80c527b3777d 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -1041,7 +1041,8 @@ int do_md_run(struct mddev *mddev);
>   #define MDDEV_STACK_INTEGRITY	(1u << 0)
>   int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>   		unsigned int flags);
> -int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
> +			 bool io_opt_configured);
>   void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
>   
>   extern const struct block_device_operations md_fops;
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 1a957dba2640..f3f3086f27fa 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1944,7 +1944,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>   	for (mirror = first; mirror <= last; mirror++) {
>   		p = conf->mirrors + mirror;
>   		if (!p->rdev) {
> -			err = mddev_stack_new_rdev(mddev, rdev);
> +			err = mddev_stack_new_rdev(mddev, rdev, false);
>   			if (err)
>   				return err;
>   
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 2c6b65b83724..a6edc91e7a9a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2139,7 +2139,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>   			continue;
>   		}
>   
> -		err = mddev_stack_new_rdev(mddev, rdev);
> +		err = mddev_stack_new_rdev(mddev, rdev, true);
>   		if (err)
>   			return err;
>   		p->head_position = 0;
> @@ -2157,7 +2157,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>   		clear_bit(In_sync, &rdev->flags);
>   		set_bit(Replacement, &rdev->flags);
>   		rdev->raid_disk = repl_slot;
> -		err = mddev_stack_new_rdev(mddev, rdev);
> +		err = mddev_stack_new_rdev(mddev, rdev, true);
>   		if (err)
>   			return err;
>   		conf->fullsync = 1;

LGTM

Reviewed-by: Li Nan <linan122@huawei.com>

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 06/11] md: support to align bio to limits
  2026-01-12  4:28 ` [PATCH v4 06/11] md: support to align bio to limits Yu Kuai
@ 2026-01-12 11:24   ` Li Nan
  2026-01-12 11:40     ` Li Nan
  0 siblings, 1 reply; 19+ messages in thread
From: Li Nan @ 2026-01-12 11:24 UTC (permalink / raw)
  To: Yu Kuai, linux-raid



On 2026/1/12 12:28, Yu Kuai wrote:
> For personalities that report optimal IO size, it indicates that users
> can get the best IO bandwidth if they issue IO with this size. However
> there is also an implicit condition that IO should also be aligned to the
> optimal IO size.
> 
> Currently, bio will only be split by limits, if bio offset is not aligned
> to limits, then all split bio will not be aligned. This patch add a new
> feature to align bio to limits first, and following patches will support
> this for each personality if necessary.
> 
> Link: https://lore.kernel.org/linux-raid/20260103154543.832844-7-yukuai@fnnas.com
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> Reviewed-by: Li Nan <linan122@huawei.com>
> ---
>   drivers/md/md.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/md/md.h |  2 ++
>   2 files changed, 56 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 21b0bc3088d2..731ec800f5cb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -428,6 +428,56 @@ bool md_handle_request(struct mddev *mddev, struct bio *bio)
>   }
>   EXPORT_SYMBOL(md_handle_request);
>   
> +static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
> +					     struct bio *bio)
> +{
> +	unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
> +	sector_t start = bio->bi_iter.bi_sector;
> +	sector_t end = start + bio_sectors(bio);
> +	sector_t align_start;
> +	sector_t align_end;
> +	u32 rem;
> +
> +	/* calculate align_start = roundup(start, max_sectors) */

Can we use roundup_u64() here?

> +	align_start = start;
> +	rem = sector_div(align_start, max_sectors);
> +	/* already aligned */
> +	if (!rem)
> +		return bio;
> +
> +	align_start = start + max_sectors - rem;
> +
> +	/* calculate align_end = rounddown(end, max_sectors) */

Using div64_u64_rem() here seems better.

> +	align_end = end;
> +	rem = sector_div(align_end, max_sectors);
> +	align_end = end - rem;
> +
> +	/* bio is too small to split */
> +	if (align_end <= align_start)
> +		return bio;
> +
> +	return bio_submit_split_bioset(bio, align_start - start,
> +				       &mddev->gendisk->bio_split);
> +}
> +
-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 06/11] md: support to align bio to limits
  2026-01-12 11:24   ` Li Nan
@ 2026-01-12 11:40     ` Li Nan
  0 siblings, 0 replies; 19+ messages in thread
From: Li Nan @ 2026-01-12 11:40 UTC (permalink / raw)
  To: Yu Kuai, linux-raid



On 2026/1/12 19:24, Li Nan wrote:
> 
> 
> On 2026/1/12 12:28, Yu Kuai wrote:
>> For personalities that report optimal IO size, it indicates that users
>> can get the best IO bandwidth if they issue IO with this size. However
>> there is also an implicit condition that IO should also be aligned to the
>> optimal IO size.
>>
>> Currently, bio will only be split by limits, if bio offset is not aligned
>> to limits, then all split bio will not be aligned. This patch add a new
>> feature to align bio to limits first, and following patches will support
>> this for each personality if necessary.
>>
>> Link: 
>> https://lore.kernel.org/linux-raid/20260103154543.832844-7-yukuai@fnnas.com
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> Reviewed-by: Li Nan <linan122@huawei.com>
>> ---
>>   drivers/md/md.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/md/md.h |  2 ++
>>   2 files changed, 56 insertions(+)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 21b0bc3088d2..731ec800f5cb 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -428,6 +428,56 @@ bool md_handle_request(struct mddev *mddev, struct 
>> bio *bio)
>>   }
>>   EXPORT_SYMBOL(md_handle_request);
>> +static struct bio *__md_bio_align_to_limits(struct mddev *mddev,
>> +                         struct bio *bio)
>> +{
>> +    unsigned int max_sectors = mddev->gendisk->queue->limits.max_sectors;
>> +    sector_t start = bio->bi_iter.bi_sector;
>> +    sector_t end = start + bio_sectors(bio);
>> +    sector_t align_start;
>> +    sector_t align_end;
>> +    u32 rem;
>> +
>> +    /* calculate align_start = roundup(start, max_sectors) */
> 
> Can we use roundup_u64() here?
> 
>> +    align_start = start;
>> +    rem = sector_div(align_start, max_sectors);
>> +    /* already aligned */
>> +    if (!rem)
>> +        return bio;
>> +
>> +    align_start = start + max_sectors - rem;
>> +
>> +    /* calculate align_end = rounddown(end, max_sectors) */
> 
> Use div64_u64_rem() here seems better.

div64_u64_rem() is the same as sector_div(). Please ignore it.

> 
>> +    align_end = end;
>> +    rem = sector_div(align_end, max_sectors);
>> +    align_end = end - rem;
>> +
>> +    /* bio is too small to split */
>> +    if (align_end <= align_start)
>> +        return bio;
>> +
>> +    return bio_submit_split_bioset(bio, align_start - start,
>> +                       &mddev->gendisk->bio_split);
>> +}
>> +

-- 
Thanks,
Nan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
  2026-01-12  4:28 ` [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
@ 2026-01-13  5:06 ` Dan Carpenter
  -1 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2026-01-12 22:15 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20260112042857.2334264-6-yukuai@fnnas.com>
References: <20260112042857.2334264-6-yukuai@fnnas.com>
TO: Yu Kuai <yukuai@fnnas.com>
TO: linux-raid@vger.kernel.org
TO: linan122@huawei.com
CC: yukuai@fnnas.com

Hi Yu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.19-rc5 next-20260109]
[cannot apply to song-md/md-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20260112-123233
base:   linus/master
patch link:    https://lore.kernel.org/r/20260112042857.2334264-6-yukuai%40fnnas.com
patch subject: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
:::::: branch date: 18 hours ago
:::::: commit date: 18 hours ago
config: i386-randconfig-141-20260113 (https://download.01.org/0day-ci/archive/20260113/202601130531.LGfcZsa4-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
smatch version: v0.5.0-8985-g2614ff1a

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202601130531.LGfcZsa4-lkp@intel.com/

New smatch warnings:
drivers/md/raid5.c:8100 raid5_run() warn: missing error code 'ret'

Old smatch warnings:
drivers/md/raid5.c:2880 raid5_end_write_request() error: uninitialized symbol 'rdev'.
drivers/md/raid5.c:2885 raid5_end_write_request() error: uninitialized symbol 'rdev'.
drivers/md/raid5.c:8580 raid5_start_reshape() warn: mixing irq and irqsave

vim +/ret +8100 drivers/md/raid5.c

16ef510139315a Christoph Hellwig    2020-09-24  7829  
849674e4fb175e Shaohua Li           2016-01-20  7830  static int raid5_run(struct mddev *mddev)
91adb56473febe NeilBrown            2009-03-31  7831  {
d1688a6d5515f1 NeilBrown            2011-10-11  7832  	struct r5conf *conf;
c148ffdcda00b6 NeilBrown            2009-11-13  7833  	int dirty_parity_disks = 0;
3cb03002000f13 NeilBrown            2011-10-11  7834  	struct md_rdev *rdev;
713cf5a63954bd Shaohua Li           2015-08-13  7835  	struct md_rdev *journal_dev = NULL;
c148ffdcda00b6 NeilBrown            2009-11-13  7836  	sector_t reshape_offset = 0;
c567c86b90d471 Yu Kuai              2023-06-22  7837  	int i;
b5254dd5fdd9ab NeilBrown            2012-05-21  7838  	long long min_offset_diff = 0;
b5254dd5fdd9ab NeilBrown            2012-05-21  7839  	int first = 1;
f63f17350e5373 Christoph Hellwig    2024-03-03  7840  	int ret = -EIO;
91adb56473febe NeilBrown            2009-03-31  7841  
907a99c314a5a6 Li Nan               2025-07-22  7842  	if (mddev->resync_offset != MaxSector)
cc6167b4f3b3ca NeilBrown            2016-11-02  7843  		pr_notice("md/raid:%s: not clean -- starting background reconstruction\n",
8c6ac868b107ed Andre Noll           2009-06-18  7844  			  mdname(mddev));
b5254dd5fdd9ab NeilBrown            2012-05-21  7845  
b5254dd5fdd9ab NeilBrown            2012-05-21  7846  	rdev_for_each(rdev, mddev) {
b5254dd5fdd9ab NeilBrown            2012-05-21  7847  		long long diff;
713cf5a63954bd Shaohua Li           2015-08-13  7848  
f2076e7d0643d1 Shaohua Li           2015-10-08  7849  		if (test_bit(Journal, &rdev->flags)) {
713cf5a63954bd Shaohua Li           2015-08-13  7850  			journal_dev = rdev;
f2076e7d0643d1 Shaohua Li           2015-10-08  7851  			continue;
f2076e7d0643d1 Shaohua Li           2015-10-08  7852  		}
b5254dd5fdd9ab NeilBrown            2012-05-21  7853  		if (rdev->raid_disk < 0)
b5254dd5fdd9ab NeilBrown            2012-05-21  7854  			continue;
b5254dd5fdd9ab NeilBrown            2012-05-21  7855  		diff = (rdev->new_data_offset - rdev->data_offset);
b5254dd5fdd9ab NeilBrown            2012-05-21  7856  		if (first) {
b5254dd5fdd9ab NeilBrown            2012-05-21  7857  			min_offset_diff = diff;
b5254dd5fdd9ab NeilBrown            2012-05-21  7858  			first = 0;
b5254dd5fdd9ab NeilBrown            2012-05-21  7859  		} else if (mddev->reshape_backwards &&
b5254dd5fdd9ab NeilBrown            2012-05-21  7860  			 diff < min_offset_diff)
b5254dd5fdd9ab NeilBrown            2012-05-21  7861  			min_offset_diff = diff;
b5254dd5fdd9ab NeilBrown            2012-05-21  7862  		else if (!mddev->reshape_backwards &&
b5254dd5fdd9ab NeilBrown            2012-05-21  7863  			 diff > min_offset_diff)
b5254dd5fdd9ab NeilBrown            2012-05-21  7864  			min_offset_diff = diff;
b5254dd5fdd9ab NeilBrown            2012-05-21  7865  	}
b5254dd5fdd9ab NeilBrown            2012-05-21  7866  
230b55fa8d6400 NeilBrown            2017-10-17  7867  	if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) &&
230b55fa8d6400 NeilBrown            2017-10-17  7868  	    (mddev->bitmap_info.offset || mddev->bitmap_info.file)) {
230b55fa8d6400 NeilBrown            2017-10-17  7869  		pr_notice("md/raid:%s: array cannot have both journal and bitmap\n",
230b55fa8d6400 NeilBrown            2017-10-17  7870  			  mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7871  		return -EINVAL;
230b55fa8d6400 NeilBrown            2017-10-17  7872  	}
230b55fa8d6400 NeilBrown            2017-10-17  7873  
91adb56473febe NeilBrown            2009-03-31  7874  	if (mddev->reshape_position != MaxSector) {
91adb56473febe NeilBrown            2009-03-31  7875  		/* Check that we can continue the reshape.
b5254dd5fdd9ab NeilBrown            2012-05-21  7876  		 * Difficulties arise if the stripe we would write to
b5254dd5fdd9ab NeilBrown            2012-05-21  7877  		 * next is at or after the stripe we would read from next.
b5254dd5fdd9ab NeilBrown            2012-05-21  7878  		 * For a reshape that changes the number of devices, this
b5254dd5fdd9ab NeilBrown            2012-05-21  7879  		 * is only possible for a very short time, and mdadm makes
b5254dd5fdd9ab NeilBrown            2012-05-21  7880  		 * sure that time appears to have past before assembling
b5254dd5fdd9ab NeilBrown            2012-05-21  7881  		 * the array.  So we fail if that time hasn't passed.
b5254dd5fdd9ab NeilBrown            2012-05-21  7882  		 * For a reshape that keeps the number of devices the same
b5254dd5fdd9ab NeilBrown            2012-05-21  7883  		 * mdadm must be monitoring the reshape can keeping the
b5254dd5fdd9ab NeilBrown            2012-05-21  7884  		 * critical areas read-only and backed up.  It will start
b5254dd5fdd9ab NeilBrown            2012-05-21  7885  		 * the array in read-only mode, so we check for that.
91adb56473febe NeilBrown            2009-03-31  7886  		 */
91adb56473febe NeilBrown            2009-03-31  7887  		sector_t here_new, here_old;
91adb56473febe NeilBrown            2009-03-31  7888  		int old_disks;
18b0033491f584 Andre Noll           2009-03-31  7889  		int max_degraded = (mddev->level == 6 ? 2 : 1);
05256d9884d327 NeilBrown            2015-07-15  7890  		int chunk_sectors;
05256d9884d327 NeilBrown            2015-07-15  7891  		int new_data_disks;
91adb56473febe NeilBrown            2009-03-31  7892  
713cf5a63954bd Shaohua Li           2015-08-13  7893  		if (journal_dev) {
cc6167b4f3b3ca NeilBrown            2016-11-02  7894  			pr_warn("md/raid:%s: don't support reshape with journal - aborting.\n",
713cf5a63954bd Shaohua Li           2015-08-13  7895  				mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7896  			return -EINVAL;
713cf5a63954bd Shaohua Li           2015-08-13  7897  		}
713cf5a63954bd Shaohua Li           2015-08-13  7898  
88ce4930e2b803 NeilBrown            2009-03-31  7899  		if (mddev->new_level != mddev->level) {
cc6167b4f3b3ca NeilBrown            2016-11-02  7900  			pr_warn("md/raid:%s: unsupported reshape required - aborting.\n",
91adb56473febe NeilBrown            2009-03-31  7901  				mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7902  			return -EINVAL;
91adb56473febe NeilBrown            2009-03-31  7903  		}
91adb56473febe NeilBrown            2009-03-31  7904  		old_disks = mddev->raid_disks - mddev->delta_disks;
91adb56473febe NeilBrown            2009-03-31  7905  		/* reshape_position must be on a new-stripe boundary, and one
91adb56473febe NeilBrown            2009-03-31  7906  		 * further up in new geometry must map after here in old
91adb56473febe NeilBrown            2009-03-31  7907  		 * geometry.
05256d9884d327 NeilBrown            2015-07-15  7908  		 * If the chunk sizes are different, then as we perform reshape
05256d9884d327 NeilBrown            2015-07-15  7909  		 * in units of the largest of the two, reshape_position needs
05256d9884d327 NeilBrown            2015-07-15  7910  		 * be a multiple of the largest chunk size times new data disks.
91adb56473febe NeilBrown            2009-03-31  7911  		 */
91adb56473febe NeilBrown            2009-03-31  7912  		here_new = mddev->reshape_position;
05256d9884d327 NeilBrown            2015-07-15  7913  		chunk_sectors = max(mddev->chunk_sectors, mddev->new_chunk_sectors);
05256d9884d327 NeilBrown            2015-07-15  7914  		new_data_disks = mddev->raid_disks - max_degraded;
05256d9884d327 NeilBrown            2015-07-15  7915  		if (sector_div(here_new, chunk_sectors * new_data_disks)) {
cc6167b4f3b3ca NeilBrown            2016-11-02  7916  			pr_warn("md/raid:%s: reshape_position not on a stripe boundary\n",
cc6167b4f3b3ca NeilBrown            2016-11-02  7917  				mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7918  			return -EINVAL;
91adb56473febe NeilBrown            2009-03-31  7919  		}
05256d9884d327 NeilBrown            2015-07-15  7920  		reshape_offset = here_new * chunk_sectors;
91adb56473febe NeilBrown            2009-03-31  7921  		/* here_new is the stripe we will write to */
91adb56473febe NeilBrown            2009-03-31  7922  		here_old = mddev->reshape_position;
05256d9884d327 NeilBrown            2015-07-15  7923  		sector_div(here_old, chunk_sectors * (old_disks-max_degraded));
91adb56473febe NeilBrown            2009-03-31  7924  		/* here_old is the first stripe that we might need to read
91adb56473febe NeilBrown            2009-03-31  7925  		 * from */
67ac6011db5d2b NeilBrown            2009-08-13  7926  		if (mddev->delta_disks == 0) {
67ac6011db5d2b NeilBrown            2009-08-13  7927  			/* We cannot be sure it is safe to start an in-place
b5254dd5fdd9ab NeilBrown            2012-05-21  7928  			 * reshape.  It is only safe if user-space is monitoring
67ac6011db5d2b NeilBrown            2009-08-13  7929  			 * and taking constant backups.
67ac6011db5d2b NeilBrown            2009-08-13  7930  			 * mdadm always starts a situation like this in
67ac6011db5d2b NeilBrown            2009-08-13  7931  			 * readonly mode so it can take control before
67ac6011db5d2b NeilBrown            2009-08-13  7932  			 * allowing any writes.  So just check for that.
67ac6011db5d2b NeilBrown            2009-08-13  7933  			 */
b5254dd5fdd9ab NeilBrown            2012-05-21  7934  			if (abs(min_offset_diff) >= mddev->chunk_sectors &&
b5254dd5fdd9ab NeilBrown            2012-05-21  7935  			    abs(min_offset_diff) >= mddev->new_chunk_sectors)
b5254dd5fdd9ab NeilBrown            2012-05-21  7936  				/* not really in-place - so OK */;
b5254dd5fdd9ab NeilBrown            2012-05-21  7937  			else if (mddev->ro == 0) {
cc6167b4f3b3ca NeilBrown            2016-11-02  7938  				pr_warn("md/raid:%s: in-place reshape must be started in read-only mode - aborting\n",
0c55e02259115c NeilBrown            2010-05-03  7939  					mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7940  				return -EINVAL;
67ac6011db5d2b NeilBrown            2009-08-13  7941  			}
2c810cddc44d6f NeilBrown            2012-05-21  7942  		} else if (mddev->reshape_backwards
05256d9884d327 NeilBrown            2015-07-15  7943  		    ? (here_new * chunk_sectors + min_offset_diff <=
05256d9884d327 NeilBrown            2015-07-15  7944  		       here_old * chunk_sectors)
05256d9884d327 NeilBrown            2015-07-15  7945  		    : (here_new * chunk_sectors >=
05256d9884d327 NeilBrown            2015-07-15  7946  		       here_old * chunk_sectors + (-min_offset_diff))) {
91adb56473febe NeilBrown            2009-03-31  7947  			/* Reading from the same stripe as writing to - bad */
cc6167b4f3b3ca NeilBrown            2016-11-02  7948  			pr_warn("md/raid:%s: reshape_position too early for auto-recovery - aborting.\n",
0c55e02259115c NeilBrown            2010-05-03  7949  				mdname(mddev));
c567c86b90d471 Yu Kuai              2023-06-22  7950  			return -EINVAL;
91adb56473febe NeilBrown            2009-03-31  7951  		}
cc6167b4f3b3ca NeilBrown            2016-11-02  7952  		pr_debug("md/raid:%s: reshape will continue\n", mdname(mddev));
91adb56473febe NeilBrown            2009-03-31  7953  		/* OK, we should be able to continue; */
91adb56473febe NeilBrown            2009-03-31  7954  	} else {
91adb56473febe NeilBrown            2009-03-31  7955  		BUG_ON(mddev->level != mddev->new_level);
91adb56473febe NeilBrown            2009-03-31  7956  		BUG_ON(mddev->layout != mddev->new_layout);
664e7c413f1e90 Andre Noll           2009-06-18  7957  		BUG_ON(mddev->chunk_sectors != mddev->new_chunk_sectors);
91adb56473febe NeilBrown            2009-03-31  7958  		BUG_ON(mddev->delta_disks != 0);
91adb56473febe NeilBrown            2009-03-31  7959  	}
245f46c2c221ef NeilBrown            2009-03-31  7960  
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7961  	if (test_bit(MD_HAS_JOURNAL, &mddev->flags) &&
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7962  	    test_bit(MD_HAS_PPL, &mddev->flags)) {
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7963  		pr_warn("md/raid:%s: using journal device and PPL not allowed - disabling PPL\n",
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7964  			mdname(mddev));
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7965  		clear_bit(MD_HAS_PPL, &mddev->flags);
ddc088238cd698 Pawel Baldysiak      2017-08-16  7966  		clear_bit(MD_HAS_MULTIPLE_PPLS, &mddev->flags);
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7967  	}
3418d036c81dcb Artur Paszkiewicz    2017-03-09  7968  
245f46c2c221ef NeilBrown            2009-03-31  7969  	if (mddev->private == NULL)
91adb56473febe NeilBrown            2009-03-31  7970  		conf = setup_conf(mddev);
245f46c2c221ef NeilBrown            2009-03-31  7971  	else
245f46c2c221ef NeilBrown            2009-03-31  7972  		conf = mddev->private;
91adb56473febe NeilBrown            2009-03-31  7973  
c567c86b90d471 Yu Kuai              2023-06-22  7974  	if (IS_ERR(conf))
c567c86b90d471 Yu Kuai              2023-06-22  7975  		return PTR_ERR(conf);
91adb56473febe NeilBrown            2009-03-31  7976  
486b0f7bcd64be Song Liu             2016-08-19  7977  	if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
486b0f7bcd64be Song Liu             2016-08-19  7978  		if (!journal_dev) {
cc6167b4f3b3ca NeilBrown            2016-11-02  7979  			pr_warn("md/raid:%s: journal disk is missing, force array readonly\n",
7dde2ad3c5b4af Shaohua Li           2015-10-08  7980  				mdname(mddev));
7dde2ad3c5b4af Shaohua Li           2015-10-08  7981  			mddev->ro = 1;
7dde2ad3c5b4af Shaohua Li           2015-10-08  7982  			set_disk_ro(mddev->gendisk, 1);
907a99c314a5a6 Li Nan               2025-07-22  7983  		} else if (mddev->resync_offset == MaxSector)
486b0f7bcd64be Song Liu             2016-08-19  7984  			set_bit(MD_JOURNAL_CLEAN, &mddev->flags);
7dde2ad3c5b4af Shaohua Li           2015-10-08  7985  	}
7dde2ad3c5b4af Shaohua Li           2015-10-08  7986  
b5254dd5fdd9ab NeilBrown            2012-05-21  7987  	conf->min_offset_diff = min_offset_diff;
44693154398272 Yu Kuai              2023-05-23  7988  	rcu_assign_pointer(mddev->thread, conf->thread);
44693154398272 Yu Kuai              2023-05-23  7989  	rcu_assign_pointer(conf->thread, NULL);
91adb56473febe NeilBrown            2009-03-31  7990  	mddev->private = conf;
91adb56473febe NeilBrown            2009-03-31  7991  
17045f52ac76d9 NeilBrown            2011-12-23  7992  	for (i = 0; i < conf->raid_disks && conf->previous_raid_disks;
17045f52ac76d9 NeilBrown            2011-12-23  7993  	     i++) {
ad8606702f2689 Yu Kuai              2023-11-25  7994  		rdev = conf->disks[i].rdev;
17045f52ac76d9 NeilBrown            2011-12-23  7995  		if (!rdev)
c148ffdcda00b6 NeilBrown            2009-11-13  7996  			continue;
ad8606702f2689 Yu Kuai              2023-11-25  7997  		if (conf->disks[i].replacement &&
17045f52ac76d9 NeilBrown            2011-12-23  7998  		    conf->reshape_progress != MaxSector) {
17045f52ac76d9 NeilBrown            2011-12-23  7999  			/* replacements and reshape simply do not mix. */
cc6167b4f3b3ca NeilBrown            2016-11-02  8000  			pr_warn("md: cannot handle concurrent replacement and reshape.\n");
17045f52ac76d9 NeilBrown            2011-12-23  8001  			goto abort;
17045f52ac76d9 NeilBrown            2011-12-23  8002  		}
7bc436121e557b Tom Rix              2023-03-27  8003  		if (test_bit(In_sync, &rdev->flags))
2f115882499f3e NeilBrown            2010-06-17  8004  			continue;
c148ffdcda00b6 NeilBrown            2009-11-13  8005  		/* This disc is not fully in-sync.  However if it
c148ffdcda00b6 NeilBrown            2009-11-13  8006  		 * just stored parity (beyond the recovery_offset),
c148ffdcda00b6 NeilBrown            2009-11-13  8007  		 * when we don't need to be concerned about the
c148ffdcda00b6 NeilBrown            2009-11-13  8008  		 * array being dirty.
c148ffdcda00b6 NeilBrown            2009-11-13  8009  		 * When reshape goes 'backwards', we never have
c148ffdcda00b6 NeilBrown            2009-11-13  8010  		 * partially completed devices, so we only need
c148ffdcda00b6 NeilBrown            2009-11-13  8011  		 * to worry about reshape going forwards.
c148ffdcda00b6 NeilBrown            2009-11-13  8012  		 */
c148ffdcda00b6 NeilBrown            2009-11-13  8013  		/* Hack because v0.91 doesn't store recovery_offset properly. */
c148ffdcda00b6 NeilBrown            2009-11-13  8014  		if (mddev->major_version == 0 &&
c148ffdcda00b6 NeilBrown            2009-11-13  8015  		    mddev->minor_version > 90)
c148ffdcda00b6 NeilBrown            2009-11-13  8016  			rdev->recovery_offset = reshape_offset;
c148ffdcda00b6 NeilBrown            2009-11-13  8017  
c148ffdcda00b6 NeilBrown            2009-11-13  8018  		if (rdev->recovery_offset < reshape_offset) {
c148ffdcda00b6 NeilBrown            2009-11-13  8019  			/* We need to check old and new layout */
c148ffdcda00b6 NeilBrown            2009-11-13  8020  			if (!only_parity(rdev->raid_disk,
c148ffdcda00b6 NeilBrown            2009-11-13  8021  					 conf->algorithm,
c148ffdcda00b6 NeilBrown            2009-11-13  8022  					 conf->raid_disks,
c148ffdcda00b6 NeilBrown            2009-11-13  8023  					 conf->max_degraded))
c148ffdcda00b6 NeilBrown            2009-11-13  8024  				continue;
c148ffdcda00b6 NeilBrown            2009-11-13  8025  		}
c148ffdcda00b6 NeilBrown            2009-11-13  8026  		if (!only_parity(rdev->raid_disk,
c148ffdcda00b6 NeilBrown            2009-11-13  8027  				 conf->prev_algo,
c148ffdcda00b6 NeilBrown            2009-11-13  8028  				 conf->previous_raid_disks,
c148ffdcda00b6 NeilBrown            2009-11-13  8029  				 conf->max_degraded))
c148ffdcda00b6 NeilBrown            2009-11-13  8030  			continue;
c148ffdcda00b6 NeilBrown            2009-11-13  8031  		dirty_parity_disks++;
c148ffdcda00b6 NeilBrown            2009-11-13  8032  	}
91adb56473febe NeilBrown            2009-03-31  8033  
17045f52ac76d9 NeilBrown            2011-12-23  8034  	/*
17045f52ac76d9 NeilBrown            2011-12-23  8035  	 * 0 for a fully functional array, 1 or 2 for a degraded array.
17045f52ac76d9 NeilBrown            2011-12-23  8036  	 */
2e38a37f23c98d Song Liu             2017-01-24  8037  	mddev->degraded = raid5_calc_degraded(conf);
91adb56473febe NeilBrown            2009-03-31  8038  
674806d62fb02a NeilBrown            2010-06-16  8039  	if (has_failed(conf)) {
cc6167b4f3b3ca NeilBrown            2016-11-02  8040  		pr_crit("md/raid:%s: not enough operational devices (%d/%d failed)\n",
02c2de8cc83588 NeilBrown            2006-10-03  8041  			mdname(mddev), mddev->degraded, conf->raid_disks);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8042  		goto abort;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8043  	}
^1da177e4c3f41 Linus Torvalds       2005-04-16  8044  
91adb56473febe NeilBrown            2009-03-31  8045  	/* device size must be a multiple of chunk size */
c5eec74f252dfb Guoqing Jiang        2020-12-16  8046  	mddev->dev_sectors &= ~((sector_t)mddev->chunk_sectors - 1);
91adb56473febe NeilBrown            2009-03-31  8047  	mddev->resync_max_sectors = mddev->dev_sectors;
91adb56473febe NeilBrown            2009-03-31  8048  
c148ffdcda00b6 NeilBrown            2009-11-13  8049  	if (mddev->degraded > dirty_parity_disks &&
907a99c314a5a6 Li Nan               2025-07-22  8050  	    mddev->resync_offset != MaxSector) {
4536bf9ba2d034 Artur Paszkiewicz    2017-03-09  8051  		if (test_bit(MD_HAS_PPL, &mddev->flags))
4536bf9ba2d034 Artur Paszkiewicz    2017-03-09  8052  			pr_crit("md/raid:%s: starting dirty degraded array with PPL.\n",
4536bf9ba2d034 Artur Paszkiewicz    2017-03-09  8053  				mdname(mddev));
4536bf9ba2d034 Artur Paszkiewicz    2017-03-09  8054  		else if (mddev->ok_start_degraded)
cc6167b4f3b3ca NeilBrown            2016-11-02  8055  			pr_crit("md/raid:%s: starting dirty degraded array - data corruption possible.\n",
6ff8d8ec06690f NeilBrown            2006-01-06  8056  				mdname(mddev));
6ff8d8ec06690f NeilBrown            2006-01-06  8057  		else {
cc6167b4f3b3ca NeilBrown            2016-11-02  8058  			pr_crit("md/raid:%s: cannot start dirty degraded array.\n",
^1da177e4c3f41 Linus Torvalds       2005-04-16  8059  				mdname(mddev));
^1da177e4c3f41 Linus Torvalds       2005-04-16  8060  			goto abort;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8061  		}
6ff8d8ec06690f NeilBrown            2006-01-06  8062  	}
^1da177e4c3f41 Linus Torvalds       2005-04-16  8063  
cc6167b4f3b3ca NeilBrown            2016-11-02  8064  	pr_info("md/raid:%s: raid level %d active with %d out of %d devices, algorithm %d\n",
cc6167b4f3b3ca NeilBrown            2016-11-02  8065  		mdname(mddev), conf->level,
^1da177e4c3f41 Linus Torvalds       2005-04-16  8066  		mddev->raid_disks-mddev->degraded, mddev->raid_disks,
e183eaedd53807 NeilBrown            2009-03-31  8067  		mddev->new_layout);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8068  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8069  	print_raid5_conf(conf);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8070  
fef9c61fdfabf9 NeilBrown            2009-03-31  8071  	if (conf->reshape_progress != MaxSector) {
fef9c61fdfabf9 NeilBrown            2009-03-31  8072  		conf->reshape_safe = conf->reshape_progress;
f67055780caac6 NeilBrown            2006-03-27  8073  		atomic_set(&conf->reshape_stripes, 0);
f67055780caac6 NeilBrown            2006-03-27  8074  		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8075  		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8076  		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
ad39c08186f8a0 Yu Kuai              2024-02-01  8077  		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8078  	}
f67055780caac6 NeilBrown            2006-03-27  8079  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8080  	/* Ok, everything is just fine now */
a64c876fd35790 NeilBrown            2010-04-14  8081  	if (mddev->to_remove == &raid5_attrs_group)
a64c876fd35790 NeilBrown            2010-04-14  8082  		mddev->to_remove = NULL;
00bcb4ac7ee7e5 NeilBrown            2010-06-01  8083  	else if (mddev->kobj.sd &&
00bcb4ac7ee7e5 NeilBrown            2010-06-01  8084  	    sysfs_create_group(&mddev->kobj, &raid5_attrs_group))
cc6167b4f3b3ca NeilBrown            2016-11-02  8085  		pr_warn("raid5: failed to create sysfs attributes for %s\n",
5e55e2f5fc95b3 NeilBrown            2007-03-26  8086  			mdname(mddev));
4a5add49951e69 NeilBrown            2010-06-01  8087  	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
7a5febe9ffeecd NeilBrown            2005-05-16  8088  
176df894d79741 Christoph Hellwig    2024-03-03  8089  	if (!mddev_is_dm(mddev)) {
f63f17350e5373 Christoph Hellwig    2024-03-03  8090  		ret = raid5_set_limits(mddev);
f63f17350e5373 Christoph Hellwig    2024-03-03  8091  		if (ret)
f63f17350e5373 Christoph Hellwig    2024-03-03  8092  			goto abort;
9f7c2220017771 NeilBrown            2010-07-26  8093  	}
23032a0eb97c8e Raz Ben-Jehuda(caro  2006-12-10  8094) 
585d578974395f Yu Kuai              2026-01-12  8095  	ret = raid5_create_ctx_pool(conf);
585d578974395f Yu Kuai              2026-01-12  8096  	if (ret)
01fce9e38c0e92 Yu Kuai              2026-01-12  8097  		goto abort;
01fce9e38c0e92 Yu Kuai              2026-01-12  8098  
845b9e229fe071 Artur Paszkiewicz    2017-04-04  8099  	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
5aabf7c49d9ebe Song Liu             2016-11-17 @8100  		goto abort;
5c7e81c3de9eb3 Shaohua Li           2015-08-13  8101  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8102  	return 0;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8103  abort:
7eb8ff02c1df27 Li Lingfeng          2023-08-03  8104  	md_unregister_thread(mddev, &mddev->thread);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8105  	print_raid5_conf(conf);
95fc17aac45300 Dan Williams         2009-07-31  8106  	free_conf(conf);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8107  	mddev->private = NULL;
cc6167b4f3b3ca NeilBrown            2016-11-02  8108  	pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
f63f17350e5373 Christoph Hellwig    2024-03-03  8109  	return ret;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8110  }
^1da177e4c3f41 Linus Torvalds       2005-04-16  8111  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
@ 2026-01-13  5:06 ` Dan Carpenter
  0 siblings, 0 replies; 19+ messages in thread
From: Dan Carpenter @ 2026-01-13  5:06 UTC (permalink / raw)
  To: oe-kbuild, Yu Kuai, linux-raid, linan122; +Cc: lkp, oe-kbuild-all, yukuai

Hi Yu,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20260112-123233
base:   linus/master
patch link:    https://lore.kernel.org/r/20260112042857.2334264-6-yukuai%40fnnas.com
patch subject: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
config: i386-randconfig-141-20260113 (https://download.01.org/0day-ci/archive/20260113/202601130531.LGfcZsa4-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
smatch version: v0.5.0-8985-g2614ff1a

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202601130531.LGfcZsa4-lkp@intel.com/

New smatch warnings:
drivers/md/raid5.c:8100 raid5_run() warn: missing error code 'ret'

vim +/ret +8100 drivers/md/raid5.c

cc6167b4f3b3ca NeilBrown            2016-11-02  8064  	pr_info("md/raid:%s: raid level %d active with %d out of %d devices, algorithm %d\n",
cc6167b4f3b3ca NeilBrown            2016-11-02  8065  		mdname(mddev), conf->level,
^1da177e4c3f41 Linus Torvalds       2005-04-16  8066  		mddev->raid_disks-mddev->degraded, mddev->raid_disks,
e183eaedd53807 NeilBrown            2009-03-31  8067  		mddev->new_layout);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8068  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8069  	print_raid5_conf(conf);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8070  
fef9c61fdfabf9 NeilBrown            2009-03-31  8071  	if (conf->reshape_progress != MaxSector) {
fef9c61fdfabf9 NeilBrown            2009-03-31  8072  		conf->reshape_safe = conf->reshape_progress;
f67055780caac6 NeilBrown            2006-03-27  8073  		atomic_set(&conf->reshape_stripes, 0);
f67055780caac6 NeilBrown            2006-03-27  8074  		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8075  		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8076  		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
ad39c08186f8a0 Yu Kuai              2024-02-01  8077  		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
f67055780caac6 NeilBrown            2006-03-27  8078  	}
f67055780caac6 NeilBrown            2006-03-27  8079  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8080  	/* Ok, everything is just fine now */
a64c876fd35790 NeilBrown            2010-04-14  8081  	if (mddev->to_remove == &raid5_attrs_group)
a64c876fd35790 NeilBrown            2010-04-14  8082  		mddev->to_remove = NULL;
00bcb4ac7ee7e5 NeilBrown            2010-06-01  8083  	else if (mddev->kobj.sd &&
00bcb4ac7ee7e5 NeilBrown            2010-06-01  8084  	    sysfs_create_group(&mddev->kobj, &raid5_attrs_group))
cc6167b4f3b3ca NeilBrown            2016-11-02  8085  		pr_warn("raid5: failed to create sysfs attributes for %s\n",
5e55e2f5fc95b3 NeilBrown            2007-03-26  8086  			mdname(mddev));
4a5add49951e69 NeilBrown            2010-06-01  8087  	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
7a5febe9ffeecd NeilBrown            2005-05-16  8088  
176df894d79741 Christoph Hellwig    2024-03-03  8089  	if (!mddev_is_dm(mddev)) {
f63f17350e5373 Christoph Hellwig    2024-03-03  8090  		ret = raid5_set_limits(mddev);
f63f17350e5373 Christoph Hellwig    2024-03-03  8091  		if (ret)
f63f17350e5373 Christoph Hellwig    2024-03-03  8092  			goto abort;
9f7c2220017771 NeilBrown            2010-07-26  8093  	}
23032a0eb97c8e Raz Ben-Jehuda(caro  2006-12-10  8094) 
585d578974395f Yu Kuai              2026-01-12  8095  	ret = raid5_create_ctx_pool(conf);
585d578974395f Yu Kuai              2026-01-12  8096  	if (ret)
01fce9e38c0e92 Yu Kuai              2026-01-12  8097  		goto abort;
01fce9e38c0e92 Yu Kuai              2026-01-12  8098  
845b9e229fe071 Artur Paszkiewicz    2017-04-04  8099  	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
5aabf7c49d9ebe Song Liu             2016-11-17 @8100  		goto abort;

Presumably we should propagate the error code from log_init()?

5c7e81c3de9eb3 Shaohua Li           2015-08-13  8101  
^1da177e4c3f41 Linus Torvalds       2005-04-16  8102  	return 0;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8103  abort:
7eb8ff02c1df27 Li Lingfeng          2023-08-03  8104  	md_unregister_thread(mddev, &mddev->thread);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8105  	print_raid5_conf(conf);
95fc17aac45300 Dan Williams         2009-07-31  8106  	free_conf(conf);
^1da177e4c3f41 Linus Torvalds       2005-04-16  8107  	mddev->private = NULL;
cc6167b4f3b3ca NeilBrown            2016-11-02  8108  	pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
f63f17350e5373 Christoph Hellwig    2024-03-03  8109  	return ret;
^1da177e4c3f41 Linus Torvalds       2005-04-16  8110  }
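
The suggestion above (propagating the error code from log_init()) can be sketched in plain user-space C. This is only an illustration of the pattern smatch flags, not the actual fix: stub_log_init() and both run_*() functions are hypothetical stand-ins, and -ENOMEM is an arbitrary example error.

```c
#include <errno.h>

/* Hypothetical stand-in for log_init(): 0 on success, negative errno on failure. */
static int stub_log_init(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Shape flagged by smatch: the failure is detected but its code is dropped. */
static int run_missing_error_code(int should_fail)
{
	int ret;

	ret = 0;	/* the previous setup step succeeded */

	if (stub_log_init(should_fail))
		goto abort;	/* ret is still 0 here */

	return 0;
abort:
	return ret;	/* reports success although init failed */
}

/* Suggested shape: store the result so the abort path returns it. */
static int run_propagating(int should_fail)
{
	int ret;

	ret = stub_log_init(should_fail);
	if (ret)
		goto abort;

	return 0;
abort:
	return ret;	/* negative errno from the failing step */
}
```

With a failing stub, run_missing_error_code() returns 0 while run_propagating() returns -ENOMEM, which is the difference the warning is pointing at.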

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
  2026-01-13  5:06 ` Dan Carpenter
  (?)
@ 2026-01-13  6:11 ` Yu Kuai
  -1 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2026-01-13  6:11 UTC (permalink / raw)
  To: Dan Carpenter, oe-kbuild, linux-raid, linan122, yukuai; +Cc: lkp, oe-kbuild-all

Hi,

On 2026/1/13 13:06, Dan Carpenter wrote:
> Hi Yu,
>
> kernel test robot noticed the following build warnings:
>
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-merge-mddev-has_superblock-into-mddev_flags/20260112-123233
> base:   linus/master
> patch link:    https://lore.kernel.org/r/20260112042857.2334264-6-yukuai%40fnnas.com
> patch subject: [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt
> config: i386-randconfig-141-20260113 (https://download.01.org/0day-ci/archive/20260113/202601130531.LGfcZsa4-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> smatch version: v0.5.0-8985-g2614ff1a
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202601130531.LGfcZsa4-lkp@intel.com/
>
> New smatch warnings:
> drivers/md/raid5.c:8100 raid5_run() warn: missing error code 'ret'
>
> vim +/ret +8100 drivers/md/raid5.c
>
> cc6167b4f3b3ca NeilBrown            2016-11-02  8064  	pr_info("md/raid:%s: raid level %d active with %d out of %d devices, algorithm %d\n",
> cc6167b4f3b3ca NeilBrown            2016-11-02  8065  		mdname(mddev), conf->level,
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8066  		mddev->raid_disks-mddev->degraded, mddev->raid_disks,
> e183eaedd53807 NeilBrown            2009-03-31  8067  		mddev->new_layout);
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8068
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8069  	print_raid5_conf(conf);
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8070
> fef9c61fdfabf9 NeilBrown            2009-03-31  8071  	if (conf->reshape_progress != MaxSector) {
> fef9c61fdfabf9 NeilBrown            2009-03-31  8072  		conf->reshape_safe = conf->reshape_progress;
> f67055780caac6 NeilBrown            2006-03-27  8073  		atomic_set(&conf->reshape_stripes, 0);
> f67055780caac6 NeilBrown            2006-03-27  8074  		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> f67055780caac6 NeilBrown            2006-03-27  8075  		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> f67055780caac6 NeilBrown            2006-03-27  8076  		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> ad39c08186f8a0 Yu Kuai              2024-02-01  8077  		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> f67055780caac6 NeilBrown            2006-03-27  8078  	}
> f67055780caac6 NeilBrown            2006-03-27  8079
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8080  	/* Ok, everything is just fine now */
> a64c876fd35790 NeilBrown            2010-04-14  8081  	if (mddev->to_remove == &raid5_attrs_group)
> a64c876fd35790 NeilBrown            2010-04-14  8082  		mddev->to_remove = NULL;
> 00bcb4ac7ee7e5 NeilBrown            2010-06-01  8083  	else if (mddev->kobj.sd &&
> 00bcb4ac7ee7e5 NeilBrown            2010-06-01  8084  	    sysfs_create_group(&mddev->kobj, &raid5_attrs_group))
> cc6167b4f3b3ca NeilBrown            2016-11-02  8085  		pr_warn("raid5: failed to create sysfs attributes for %s\n",
> 5e55e2f5fc95b3 NeilBrown            2007-03-26  8086  			mdname(mddev));
> 4a5add49951e69 NeilBrown            2010-06-01  8087  	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
> 7a5febe9ffeecd NeilBrown            2005-05-16  8088
> 176df894d79741 Christoph Hellwig    2024-03-03  8089  	if (!mddev_is_dm(mddev)) {
> f63f17350e5373 Christoph Hellwig    2024-03-03  8090  		ret = raid5_set_limits(mddev);
> f63f17350e5373 Christoph Hellwig    2024-03-03  8091  		if (ret)
> f63f17350e5373 Christoph Hellwig    2024-03-03  8092  			goto abort;
> 9f7c2220017771 NeilBrown            2010-07-26  8093  	}
> 23032a0eb97c8e Raz Ben-Jehuda(caro  2006-12-10  8094)
> 585d578974395f Yu Kuai              2026-01-12  8095  	ret = raid5_create_ctx_pool(conf);
> 585d578974395f Yu Kuai              2026-01-12  8096  	if (ret)
> 01fce9e38c0e92 Yu Kuai              2026-01-12  8097  		goto abort;
> 01fce9e38c0e92 Yu Kuai              2026-01-12  8098
> 845b9e229fe071 Artur Paszkiewicz    2017-04-04  8099  	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
> 5aabf7c49d9ebe Song Liu             2016-11-17 @8100  		goto abort;
>
> Presumably we should propagate the error code from log_init()?

Yes, we should, but this problem appears to have existed before this patch.

>
> 5c7e81c3de9eb3 Shaohua Li           2015-08-13  8101
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8102  	return 0;
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8103  abort:
> 7eb8ff02c1df27 Li Lingfeng          2023-08-03  8104  	md_unregister_thread(mddev, &mddev->thread);
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8105  	print_raid5_conf(conf);
> 95fc17aac45300 Dan Williams         2009-07-31  8106  	free_conf(conf);
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8107  	mddev->private = NULL;
> cc6167b4f3b3ca NeilBrown            2016-11-02  8108  	pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
> f63f17350e5373 Christoph Hellwig    2024-03-03  8109  	return ret;
> ^1da177e4c3f41 Linus Torvalds       2005-04-16  8110  }
>
-- 
Thanks,
Kuai


* Re: [PATCH v4 11/11] md: fix abnormal io_opt from member disks
  2026-01-12  4:28 ` [PATCH v4 11/11] md: fix abnormal io_opt from member disks Yu Kuai
  2026-01-12  7:28   ` Li Nan
@ 2026-01-14  3:15   ` Xiao Ni
  1 sibling, 0 replies; 19+ messages in thread
From: Xiao Ni @ 2026-01-14  3:15 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, linan122

On Mon, Jan 12, 2026 at 12:30 PM Yu Kuai <yukuai@fnnas.com> wrote:
>
> It's reported that mpt3sas can report an abnormal io_opt; as a
> consequence, the md array ends up with an abnormal io_opt as well, due to
> the lcm_not_zero() from blk_stack_limits().
>
> Some personalities configure an optimal IO size, which indicates that
> users get the best IO bandwidth if they issue IO with this size, and we
> don't want that io_opt to be overridden by member disks with abnormal io_opt.
>
> Fix this problem by adding a new mddev flags MD_STACK_IO_OPT to indicate
> that io_opt configured by personalities is preferred over member disks
> or not.

Hi Kuai

v4 no longer uses MD_STACK_IO_OPT, so the commit message needs to be
updated.
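
For readers following the io_opt discussion, here is a minimal userspace sketch of why stacking through lcm_not_zero() can inflate io_opt, and how the save-and-restore in this patch avoids it. gcd_ull(), lcm_not_zero_ull() and stack_io_opt() are hypothetical stand-ins written for illustration, not kernel code, and the member values used with them are assumed examples.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's gcd(). */
static unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
	while (b) {
		unsigned long long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Same idea as lcm_not_zero(): the lcm, or the non-zero argument if one is 0. */
static unsigned long long lcm_not_zero_ull(unsigned long long a,
					   unsigned long long b)
{
	if (a && b)
		return a / gcd_ull(a, b) * b;
	return a ? a : b;
}

/*
 * Simplified model of the loop in mddev_stack_rdev_limits() after this
 * patch: io_opt is stacked with every member, but if the personality
 * configured io_opt and the member is not itself an md array, the value
 * from before the stacking step is restored.
 */
static unsigned int stack_io_opt(unsigned int array_io_opt,
				 const unsigned int *member_io_opt,
				 const bool *member_is_md, int nr)
{
	bool io_opt_configured = array_io_opt != 0;
	unsigned int cur = array_io_opt;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned int saved = cur;

		cur = (unsigned int)lcm_not_zero_ull(cur, member_io_opt[i]);
		if (!member_is_md[i] && io_opt_configured)
			cur = saved;
	}
	return cur;
}
```

With an array io_opt of 0 (nothing configured by the personality) the member values are still combined as before; with a configured value and plain disks as members, the configured value survives regardless of what the members report.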

The patch looks good to me.

Reviewed-by: Xiao Ni <xni@redhat.com>
>
> Reported-by: Filippo Giunchedi <filippo@debian.org>
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121006
> Reported-by: Coly Li <colyli@fnnas.com>
> Closes: https://lore.kernel.org/all/20250817152645.7115-1-colyli@kernel.org/
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  drivers/md/md.c     | 28 +++++++++++++++++++++++++++-
>  drivers/md/md.h     |  3 ++-
>  drivers/md/raid1.c  |  2 +-
>  drivers/md/raid10.c |  4 ++--
>  4 files changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 731ec800f5cb..6c0fb09c26dc 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6200,18 +6200,33 @@ static const struct kobj_type md_ktype = {
>
>  int mdp_major = 0;
>
> +static bool rdev_is_mddev(struct md_rdev *rdev)
> +{
> +       return rdev->bdev->bd_disk->fops == &md_fops;
> +}
> +
>  /* stack the limit for all rdevs into lim */
>  int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>                 unsigned int flags)
>  {
>         struct md_rdev *rdev;
> +       bool io_opt_configured = lim->io_opt;
>
>         rdev_for_each(rdev, mddev) {
> +               unsigned int io_opt = lim->io_opt;
> +
>                 queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
>                                         mddev->gendisk->disk_name);
>                 if ((flags & MDDEV_STACK_INTEGRITY) &&
>                     !queue_limits_stack_integrity_bdev(lim, rdev->bdev))
>                         return -EINVAL;
> +
> +               /*
> +                * If member disk is not mdraid array, keep the io_opt
> +                * from personality and ignore io_opt from member disk.
> +                */
> +               if (!rdev_is_mddev(rdev) && io_opt_configured)
> +                       lim->io_opt = io_opt;
>         }
>
>         /*
> @@ -6230,9 +6245,11 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>  EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
>
>  /* apply the extra stacking limits from a new rdev into mddev */
> -int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
> +                        bool io_opt_configured)
>  {
>         struct queue_limits lim;
> +       unsigned int io_opt;
>
>         if (mddev_is_dm(mddev))
>                 return 0;
> @@ -6245,6 +6262,8 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
>         }
>
>         lim = queue_limits_start_update(mddev->gendisk->queue);
> +       io_opt = lim.io_opt;
> +
>         queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
>                                 mddev->gendisk->disk_name);
>
> @@ -6255,6 +6274,13 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
>                 return -ENXIO;
>         }
>
> +       /*
> +        * If member disk is not mdraid array, keep the io_opt from
> +        * personality and ignore io_opt from member disk.
> +        */
> +       if (!rdev_is_mddev(rdev) && io_opt_configured)
> +               lim.io_opt = io_opt;
> +
>         return queue_limits_commit_update(mddev->gendisk->queue, &lim);
>  }
>  EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ddf989f2a139..80c527b3777d 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -1041,7 +1041,8 @@ int do_md_run(struct mddev *mddev);
>  #define MDDEV_STACK_INTEGRITY  (1u << 0)
>  int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>                 unsigned int flags);
> -int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev,
> +                        bool io_opt_configured);
>  void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
>
>  extern const struct block_device_operations md_fops;
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 1a957dba2640..f3f3086f27fa 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1944,7 +1944,7 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>         for (mirror = first; mirror <= last; mirror++) {
>                 p = conf->mirrors + mirror;
>                 if (!p->rdev) {
> -                       err = mddev_stack_new_rdev(mddev, rdev);
> +                       err = mddev_stack_new_rdev(mddev, rdev, false);
>                         if (err)
>                                 return err;
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 2c6b65b83724..a6edc91e7a9a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2139,7 +2139,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>                         continue;
>                 }
>
> -               err = mddev_stack_new_rdev(mddev, rdev);
> +               err = mddev_stack_new_rdev(mddev, rdev, true);
>                 if (err)
>                         return err;
>                 p->head_position = 0;
> @@ -2157,7 +2157,7 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>                 clear_bit(In_sync, &rdev->flags);
>                 set_bit(Replacement, &rdev->flags);
>                 rdev->raid_disk = repl_slot;
> -               err = mddev_stack_new_rdev(mddev, rdev);
> +               err = mddev_stack_new_rdev(mddev, rdev, true);
>                 if (err)
>                         return err;
>                 conf->fullsync = 1;
> --
> 2.51.0
>
>



end of thread, other threads:[~2026-01-14  3:16 UTC | newest]

Thread overview: 19+ messages
2026-01-12 22:15 [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt kernel test robot
2026-01-13  5:06 ` Dan Carpenter
2026-01-13  6:11 ` Yu Kuai
  -- strict thread matches above, loose matches on Subject: below --
2026-01-12  4:28 [PATCH v4 00/11] md: align bio to io_opt for better performance Yu Kuai
2026-01-12  4:28 ` [PATCH v4 01/11] md: merge mddev has_superblock into mddev_flags Yu Kuai
2026-01-12  4:28 ` [PATCH v4 02/11] md: merge mddev faillast_dev " Yu Kuai
2026-01-12  4:28 ` [PATCH v4 03/11] md: merge mddev serialize_policy " Yu Kuai
2026-01-12  4:28 ` [PATCH v4 04/11] md/raid5: use mempool to allocate stripe_request_ctx Yu Kuai
2026-01-12  4:28 ` [PATCH v4 05/11] md/raid5: make sure max_sectors is not less than io_opt Yu Kuai
2026-01-12  4:28 ` [PATCH v4 06/11] md: support to align bio to limits Yu Kuai
2026-01-12 11:24   ` Li Nan
2026-01-12 11:40     ` Li Nan
2026-01-12  4:28 ` [PATCH v4 07/11] md: add a helper md_config_align_limits() Yu Kuai
2026-01-12  4:28 ` [PATCH v4 08/11] md/raid5: align bio to io_opt Yu Kuai
2026-01-12  4:28 ` [PATCH v4 09/11] md/raid10: " Yu Kuai
2026-01-12  4:28 ` [PATCH v4 10/11] md/raid0: " Yu Kuai
2026-01-12  4:28 ` [PATCH v4 11/11] md: fix abnormal io_opt from member disks Yu Kuai
2026-01-12  7:28   ` Li Nan
2026-01-14  3:15   ` Xiao Ni
