* [PATCH v3 0/8] folio support for sync I/O in RAID
@ 2026-04-16 3:37 linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
This patchset adds folio support to sync operations in raid1/10.
Previously, we used 16 * 4K pages for 64K sync I/O. With this change,
we'll use a single 64K folio instead. Using folios reduces
resync/recovery time by 20% on HDD.
This is the first step towards full folio support in RAID. Going forward,
I will replace the remaining page-based usage with folios.
The patchset was tested with mdadm. Additional fault injection stress
tests were run with file systems on top.
v3:
- In patches 3/4/5, introduce safe_folio_put and use it for tmpfolio.
- Merge the cleanup patch into patch 6.
v2:
- Remove patch "md: use folio for bb_folio". It will be included in
a later patch set
- In patch 5:
1) fix typo
2) rewrite the logic of copying data in process_checks()
3) rename resync_get_all_folio() to resync_get_folio()
4) s/resync_pages *rps/resync_folio *rfs/g in
raid1_alloc_init_r1buf() and raid10_alloc_init_r10buf()
- Subsequent patches: adapt to conflicts caused by patch 5
Li Nan (8):
md/raid1,raid10: clean up of RESYNC_SECTORS
md: introduce sync_folio_io for folio support in RAID
md: introduce safe_put_folio for folio support in RAID
md/raid1: use folio for tmppage
md/raid10: use folio for tmppage
md/raid1,raid10: use folio for sync path IO
md/raid1: fix IO error at logical block size granularity
md/raid10: fix IO error at logical block size granularity
 drivers/md/md.h       |  10 +-
 drivers/md/raid1.h    |   2 +-
 drivers/md/raid10.h   |   2 +-
 drivers/md/md.c       |  17 ++-
 drivers/md/raid1-10.c |  81 ++++-------
 drivers/md/raid1.c    | 233 ++++++++++++++-----------------
 drivers/md/raid10.c   | 312 ++++++++++++++++++++----------------
 7 files changed, 297 insertions(+), 360 deletions(-)
--
2.39.2
^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
  ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Move redundant RESYNC_SECTORS definition from raid1 and raid10
implementations to raid1-10.c. Simplify max_sync assignment in
raid10_sync_request(). No functional changes.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid1-10.c | 1 +
 drivers/md/raid1.c    | 1 -
 drivers/md/raid10.c   | 4 +---
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index c33099925f23..cda531d0720b 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -2,6 +2,7 @@
 /* Maximum size of each resync request */
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 
 /* when we get a read error on a read-only array, we redirect to another
  * device without failing the first device, or trying to over-write to
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 867db18bc3ba..5a73a9f19e0e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -136,7 +136,6 @@ static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
 }
 
 #define RESYNC_DEPTH 32
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
 #define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
 #define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b4892c5d571c..90c1036f6ec4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -113,7 +113,6 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 	return kzalloc(size, gfp_flags);
 }
 
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 /* amount of memory to reserve for resync requests */
 #define RESYNC_WINDOW (1024*1024)
 /* maximum number of concurrent requests, memory permitting */
@@ -3153,7 +3152,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	struct bio *biolist = NULL, *bio;
 	sector_t nr_sectors;
 	int i;
-	int max_sync;
+	int max_sync = RESYNC_SECTORS;
 	sector_t sync_blocks;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
@@ -3266,7 +3265,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	 * end_sync_write if we will want to write.
 	 */
 
-	max_sync = RESYNC_PAGES << (PAGE_SHIFT-9);
 	if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 		/* recovery... the complicated one */
 		int j;
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
  ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Prepare for folio support in RAID by introducing sync_folio_io(),
matching sync_page_io()'s functionality. Differences are:
 - Add new parameter 'off' to prepare for adding a folio to bio in
   segments, e.g. in fix_recovery_read_error()
 - Change return value to bool
 - Replace the success check with 'bio.bi_status == BLK_STS_OK'

sync_page_io() will be removed once full folio support is complete.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h |  4 +++-
 drivers/md/md.c | 15 +++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..914b992a073b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -924,8 +924,10 @@ void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
 		       sector_t sector, int size, struct page *page,
 		       unsigned int offset);
 extern int md_super_wait(struct mddev *mddev);
-extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+extern bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 		struct page *page, blk_opf_t opf, bool metadata_op);
+extern bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size,
+		int off, struct folio *folio, blk_opf_t opf, bool metadata_op);
 extern void md_do_sync(struct md_thread *thread);
 extern void md_new_event(void);
 extern void md_allow_write(struct mddev *mddev);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d9c9fd2839b3..5e83914d5c14 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1166,8 +1166,8 @@ int md_super_wait(struct mddev *mddev)
 	return 0;
 }
 
-int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
-		 struct page *page, blk_opf_t opf, bool metadata_op)
+bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size, int off,
+		struct folio *folio, blk_opf_t opf, bool metadata_op)
 {
 	struct bio bio;
 	struct bio_vec bvec;
@@ -1185,11 +1185,18 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 		bio.bi_iter.bi_sector = sector + rdev->new_data_offset;
 	else
 		bio.bi_iter.bi_sector = sector + rdev->data_offset;
-	__bio_add_page(&bio, page, size, 0);
+	bio_add_folio_nofail(&bio, folio, size, off);
 	submit_bio_wait(&bio);
 
-	return !bio.bi_status;
+	return bio.bi_status == BLK_STS_OK;
+}
+EXPORT_SYMBOL_GPL(sync_folio_io);
+
+bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+		struct page *page, blk_opf_t opf, bool metadata_op)
+{
+	return sync_folio_io(rdev, sector, size, 0, page_folio(page), opf, metadata_op);
 }
 EXPORT_SYMBOL_GPL(sync_page_io);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 3/8] md: introduce safe_put_folio for folio support in RAID
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
  2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
  ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

safe_put_page() will be removed after the last reference to it in RAID5
is removed.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 914b992a073b..7c0c38f09cc3 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -888,6 +888,12 @@ struct md_io_clone {
 		rcu_read_unlock();			\
 	} while (0)
 
+static inline void safe_folio_put(struct folio *folio)
+{
+	if (folio)
+		folio_put(folio);
+}
+
 static inline void safe_put_page(struct page *p)
 {
 	if (p) put_page(p);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 4/8] md/raid1: use folio for tmppage
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (2 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
  ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert tmppage to tmpfolio and use it throughout in raid1.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid1.h |  2 +-
 drivers/md/raid1.c | 18 ++++++++++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c98d43a7ae99..d480b3a8c2c4 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -101,7 +101,7 @@ struct r1conf {
 	/* temporary buffer to synchronous IO when attempting to repair
 	 * a read error.
 	 */
-	struct page		*tmppage;
+	struct folio		*tmpfolio;
 
 	/* When taking over an array from a different personality, we store
 	 * the new thread here until we fully activate the array.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5a73a9f19e0e..a72abdc37a2d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2417,8 +2417,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			      rdev->recovery_offset >= sect + s)) &&
 			    rdev_has_badblock(rdev, sect, s) == 0) {
 				atomic_inc(&rdev->nr_pending);
-				if (sync_page_io(rdev, sect, s<<9,
-					 conf->tmppage, REQ_OP_READ, false))
+				if (sync_folio_io(rdev, sect, s<<9, 0,
+					 conf->tmpfolio, REQ_OP_READ, false))
 					success = 1;
 				rdev_dec_pending(rdev, mddev);
 				if (success)
@@ -2447,7 +2447,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				r1_sync_page_io(rdev, sect, s,
-						conf->tmppage, REQ_OP_WRITE);
+						folio_page(conf->tmpfolio, 0),
+						REQ_OP_WRITE);
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
@@ -2461,7 +2462,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				if (r1_sync_page_io(rdev, sect, s,
-						    conf->tmppage, REQ_OP_READ)) {
+						    folio_page(conf->tmpfolio, 0),
+						    REQ_OP_READ)) {
 					atomic_add(s, &rdev->corrected_errors);
 					pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
 						mdname(mddev), s,
@@ -3099,8 +3101,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->mirrors)
 		goto abort;
 
-	conf->tmppage = alloc_page(GFP_KERNEL);
-	if (!conf->tmppage)
+	conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+	if (!conf->tmpfolio)
 		goto abort;
 
 	r1bio_size = offsetof(struct r1bio, bios[mddev->raid_disks * 2]);
@@ -3175,7 +3177,7 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (conf) {
 		mempool_destroy(conf->r1bio_pool);
 		kfree(conf->mirrors);
-		safe_put_page(conf->tmppage);
+		safe_folio_put(conf->tmpfolio);
 		kfree(conf->nr_pending);
 		kfree(conf->nr_waiting);
 		kfree(conf->nr_queued);
@@ -3290,7 +3292,7 @@ static void raid1_free(struct mddev *mddev, void *priv)
 
 	mempool_destroy(conf->r1bio_pool);
 	kfree(conf->mirrors);
-	safe_put_page(conf->tmppage);
+	safe_folio_put(conf->tmpfolio);
 	kfree(conf->nr_pending);
 	kfree(conf->nr_waiting);
 	kfree(conf->nr_queued);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 5/8] md/raid10: use folio for tmppage
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (3 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
  ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert tmppage to tmpfolio and use it throughout in raid10.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid10.h |  2 +-
 drivers/md/raid10.c | 37 +++++++++++++++++++------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..19f37439a4e2 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -89,7 +89,7 @@ struct r10conf {
 	mempool_t		r10bio_pool;
 	mempool_t		r10buf_pool;
-	struct page		*tmppage;
+	struct folio		*tmpfolio;
 	struct bio_set		bio_split;
 
 	/* When taking over an array from a different personality, we store
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 90c1036f6ec4..26f93040cd13 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2581,13 +2581,13 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	}
 }
 
-static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
-			    int sectors, struct page *page, enum req_op op)
+static int r10_sync_folio_io(struct md_rdev *rdev, sector_t sector,
+			    int sectors, struct folio *folio, enum req_op op)
 {
 	if (rdev_has_badblock(rdev, sector, sectors) &&
 	    (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags)))
 		return -1;
-	if (sync_page_io(rdev, sector, sectors << 9, page, op, false))
+	if (sync_folio_io(rdev, sector, sectors << 9, 0, folio, op, false))
 		/* success */
 		return 1;
 	if (op == REQ_OP_WRITE) {
@@ -2650,12 +2650,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 					r10_bio->devs[sl].addr + sect,
 					s) == 0) {
 				atomic_inc(&rdev->nr_pending);
-				success = sync_page_io(rdev,
-						       r10_bio->devs[sl].addr +
-						       sect,
-						       s<<9,
-						       conf->tmppage,
-						       REQ_OP_READ, false);
+				success = sync_folio_io(rdev,
+							r10_bio->devs[sl].addr +
+							sect,
+							s<<9,
+							0,
+							conf->tmpfolio,
+							REQ_OP_READ, false);
 				rdev_dec_pending(rdev, mddev);
 				if (success)
 					break;
@@ -2698,10 +2699,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 				continue;
 
 			atomic_inc(&rdev->nr_pending);
-			if (r10_sync_page_io(rdev,
-					     r10_bio->devs[sl].addr +
-					     sect,
-					     s, conf->tmppage, REQ_OP_WRITE)
+			if (r10_sync_folio_io(rdev,
+					      r10_bio->devs[sl].addr +
+					      sect,
+					      s, conf->tmpfolio, REQ_OP_WRITE)
 			    == 0) {
 				/* Well, this device is dead */
 				pr_notice("md/raid10:%s: read correction write failed (%d sectors at %llu on %pg)\n",
@@ -2730,10 +2731,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 				continue;
 
 			atomic_inc(&rdev->nr_pending);
-			switch (r10_sync_page_io(rdev,
+			switch (r10_sync_folio_io(rdev,
 					     r10_bio->devs[sl].addr +
 					     sect,
-					     s, conf->tmppage, REQ_OP_READ)) {
+					     s, conf->tmpfolio, REQ_OP_READ)) {
 			case 0:
 				/* Well, this device is dead */
 				pr_notice("md/raid10:%s: unable to read back corrected sectors (%d sectors at %llu on %pg)\n",
@@ -3823,7 +3824,7 @@ static void raid10_free_conf(struct r10conf *conf)
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
-	safe_put_page(conf->tmppage);
+	safe_folio_put(conf->tmpfolio);
 	bioset_exit(&conf->bio_split);
 	kfree(conf);
 }
@@ -3861,8 +3862,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->mirrors)
 		goto out;
 
-	conf->tmppage = alloc_page(GFP_KERNEL);
-	if (!conf->tmppage)
+	conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+	if (!conf->tmpfolio)
 		goto out;
 
 	conf->geo = geo;
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (4 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-30  1:54   ` Xiao Ni
  2026-04-16  3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
  2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity linan666
  7 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert all IO on the sync path to use folios, and rename page-related
identifiers to match folio. Since RESYNC_BLOCK_SIZE (64K) has a higher
allocation failure chance than 4K, retry with lower orders to improve
allocation reliability. A r1/10_bio may have different rf->folio orders,
so use the minimum order for r1/10_bio sectors to prevent exceeding the
size when adding the folio to IO later.

Clean up:
 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
    instead in r1buf_pool_alloc() and r10buf_pool_alloc().
 4. Clean up RESYNC_PAGES.
 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
 6. Remove resync_free_folio(), call folio_put() directly.
 7. Clean up sync IO size calculation in raid1/10_sync_request.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c       |   2 +-
 drivers/md/raid1-10.c |  80 ++++---------
 drivers/md/raid1.c    | 209 +++++++++++++++------------------
 drivers/md/raid10.c   | 254 +++++++++++++++++++++---------------------
 4 files changed, 240 insertions(+), 305 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5e83914d5c14..6554b849ac74 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev)
 {
 	/*
 	 * For raid456, sync IO is stripe(4k) per IO, for other levels, it's
-	 * RESYNC_PAGES(64k) per IO.
+	 * RESYNC_BLOCK_SIZE(64k) per IO.
 	 */
 	return atomic_read(&mddev->recovery_active) <
 		(raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev);
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index cda531d0720b..10200b0a3fd2 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Maximum size of each resync request */
 #define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 
 /* when we get a read error on a read-only array, we redirect to another
@@ -20,9 +19,9 @@
 #define MAX_PLUG_BIO 32
 
 /* for managing resync I/O pages */
-struct resync_pages {
+struct resync_folio {
 	void		*raid_bio;
-	struct page	*pages[RESYNC_PAGES];
+	struct folio	*folio;
 };
 
 struct raid1_plug_cb {
@@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data)
 	kfree(rbio);
 }
 
-static inline int resync_alloc_pages(struct resync_pages *rp,
-				     gfp_t gfp_flags)
+static inline int resync_alloc_folio(struct resync_folio *rf,
+				     gfp_t gfp_flags, int *order)
 {
-	int i;
+	struct folio *folio;
 
-	for (i = 0; i < RESYNC_PAGES; i++) {
-		rp->pages[i] = alloc_page(gfp_flags);
-		if (!rp->pages[i])
-			goto out_free;
-	}
+	do {
+		folio = folio_alloc(gfp_flags, *order);
+		if (folio)
+			break;
+	} while (--(*order) > 0);
+	if (!folio)
+		return -ENOMEM;
+
+	rf->folio = folio;
 	return 0;
-
-out_free:
-	while (--i >= 0)
-		put_page(rp->pages[i]);
-	return -ENOMEM;
-}
-
-static inline void resync_free_pages(struct resync_pages *rp)
-{
-	int i;
-
-	for (i = 0; i < RESYNC_PAGES; i++)
-		put_page(rp->pages[i]);
-}
-
-static inline void resync_get_all_pages(struct resync_pages *rp)
-{
-	int i;
-
-	for (i = 0; i < RESYNC_PAGES; i++)
-		get_page(rp->pages[i]);
-}
-
-static inline struct page *resync_fetch_page(struct resync_pages *rp,
-					     unsigned idx)
-{
-	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
-		return NULL;
-	return rp->pages[idx];
 }
 
 /*
- * 'strct resync_pages' stores actual pages used for doing the resync
+ * 'strct resync_folio' stores actual pages used for doing the resync
  * IO, and it is per-bio, so make .bi_private points to it.
  */
-static inline struct resync_pages *get_resync_pages(struct bio *bio)
+static inline struct resync_folio *get_resync_folio(struct bio *bio)
 {
 	return bio->bi_private;
 }
 
 /* generally called after bio_reset() for reseting bvec */
-static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
+static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
 				      int size)
 {
-	int idx = 0;
-
 	/* initialize bvec table again */
-	do {
-		struct page *page = resync_fetch_page(rp, idx);
-		int len = min_t(int, size, PAGE_SIZE);
-
-		if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
-			bio->bi_status = BLK_STS_RESOURCE;
-			bio_endio(bio);
-			return;
-		}
-
-		size -= len;
-	} while (idx++ < RESYNC_PAGES && size > 0);
+	if (WARN_ON(!bio_add_folio(bio, rf->folio,
+				   min_t(int, size, RESYNC_BLOCK_SIZE),
+				   0))) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		bio_endio(bio);
+	}
 }
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a72abdc37a2d..724fd4f2cc3a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
 
 /*
  * for resync bio, r1bio pointer can be retrieved from the per-bio
- * 'struct resync_pages'.
+ * 'struct resync_folio'.
  */
 static inline struct r1bio *get_resync_r1bio(struct bio *bio)
 {
-	return get_resync_pages(bio)->raid_bio;
+	return get_resync_folio(bio)->raid_bio;
 }
 
 static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
@@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct r1conf *conf = data;
 	struct r1bio *r1_bio;
 	struct bio *bio;
-	int need_pages;
+	int need_folio;
 	int j;
-	struct resync_pages *rps;
+	struct resync_folio *rfs;
+	int order = get_order(RESYNC_BLOCK_SIZE);
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, conf);
 	if (!r1_bio)
 		return NULL;
 
-	rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages),
+	rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio),
 			    gfp_flags);
-	if (!rps)
+	if (!rfs)
 		goto out_free_r1bio;
 
 	/*
	 * Allocate bios : 1 for reading, n-1 for writing
	 */
 	for (j = conf->raid_disks * 2; j-- ; ) {
-		bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
+		bio = bio_kmalloc(1, gfp_flags);
 		if (!bio)
 			goto out_free_bio;
-		bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
+		bio_init_inline(bio, NULL, 1, 0);
 		r1_bio->bios[j] = bio;
 	}
 	/*
-	 * Allocate RESYNC_PAGES data pages and attach them to
-	 * the first bio.
+	 * Allocate data folio and attach it to the first bio.
	 * If this is a user-requested check/repair, allocate
-	 * RESYNC_PAGES for each bio.
+	 * folio for each bio.
	 */
 	if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
-		need_pages = conf->raid_disks * 2;
+		need_folio = conf->raid_disks * 2;
 	else
-		need_pages = 1;
+		need_folio = 1;
 	for (j = 0; j < conf->raid_disks * 2; j++) {
-		struct resync_pages *rp = &rps[j];
+		struct resync_folio *rf = &rfs[j];
 
-		bio = r1_bio->bios[j];
-
-		if (j < need_pages) {
-			if (resync_alloc_pages(rp, gfp_flags))
-				goto out_free_pages;
+		if (j < need_folio) {
+			if (resync_alloc_folio(rf, gfp_flags, &order))
+				goto out_free_folio;
 		} else {
-			memcpy(rp, &rps[0], sizeof(*rp));
-			resync_get_all_pages(rp);
+			memcpy(rf, &rfs[0], sizeof(*rf));
+			folio_get(rf->folio);
 		}
 
-		rp->raid_bio = r1_bio;
-		bio->bi_private = rp;
+		rf->raid_bio = r1_bio;
+		r1_bio->bios[j]->bi_private = rf;
 	}
 
+	r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
 	r1_bio->master_bio = NULL;
 
 	return r1_bio;
 
-out_free_pages:
+out_free_folio:
 	while (--j >= 0)
-		resync_free_pages(&rps[j]);
+		folio_put(rfs[j].folio);
 
 out_free_bio:
 	while (++j < conf->raid_disks * 2) {
 		bio_uninit(r1_bio->bios[j]);
 		kfree(r1_bio->bios[j]);
 	}
-	kfree(rps);
+	kfree(rfs);
 
 out_free_r1bio:
 	rbio_pool_free(r1_bio, data);
@@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
 	struct r1conf *conf = data;
 	int i;
 	struct r1bio *r1bio = __r1_bio;
-	struct resync_pages *rp = NULL;
+	struct resync_folio *rf = NULL;
 
 	for (i = conf->raid_disks * 2; i--; ) {
-		rp = get_resync_pages(r1bio->bios[i]);
-		resync_free_pages(rp);
+		rf = get_resync_folio(r1bio->bios[i]);
+		folio_put(rf->folio);
 		bio_uninit(r1bio->bios[i]);
 		kfree(r1bio->bios[i]);
 	}
 
-	/* resync pages array stored in the 1st bio's .bi_private */
-	kfree(rp);
+	/* resync folio stored in the 1st bio's .bi_private */
+	kfree(rf);
 
 	rbio_pool_free(r1bio, data);
 }
@@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio)
 	put_sync_write_buf(r1_bio);
 }
 
-static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
-			   int sectors, struct page *page, blk_opf_t rw)
+static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors,
+			   int off, struct folio *folio, blk_opf_t rw)
 {
-	if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
+	if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false))
 		/* success */
 		return 1;
 	if (rw == REQ_OP_WRITE) {
@@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
-	struct page **pages = get_resync_pages(bio)->pages;
+	struct folio *folio = get_resync_folio(bio)->folio;
 	sector_t sect = r1_bio->sector;
 	int sectors = r1_bio->sectors;
-	int idx = 0;
+	int off = 0;
 	struct md_rdev *rdev;
 
 	rdev = conf->mirrors[r1_bio->read_disk].rdev;
@@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			 * active, and resync is currently active
 			 */
 			rdev = conf->mirrors[d].rdev;
-			if (sync_page_io(rdev, sect, s<<9,
-					 pages[idx],
-					 REQ_OP_READ, false)) {
+			if (sync_folio_io(rdev, sect, s<<9, off, folio,
+					  REQ_OP_READ, false)) {
 				success = 1;
 				break;
 			}
@@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			/* Try next page */
 			sectors -= s;
 			sect += s;
-			idx++;
+			off += s << 9;
 			continue;
 		}
 
@@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (r1_sync_page_io(rdev, sect, s,
-					    pages[idx],
+			if (r1_sync_folio_io(rdev, sect, s, off, folio,
 					    REQ_OP_WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
@@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (r1_sync_page_io(rdev, sect, s,
-					    pages[idx],
+			if (r1_sync_folio_io(rdev, sect, s, off, folio,
 					    REQ_OP_READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
 		sectors -= s;
 		sect += s;
-		idx ++;
+		off += s << 9;
 	}
 	set_bit(R1BIO_Uptodate, &r1_bio->state);
 	bio->bi_status = 0;
@@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio)
 	struct r1conf *conf = mddev->private;
 	int primary;
 	int i;
-	int vcnt;
 
 	/* Fix variable parts of all bios */
-	vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		blk_status_t status;
 		struct bio *b = r1_bio->bios[i];
-		struct resync_pages *rp = get_resync_pages(b);
+		struct resync_folio *rf = get_resync_folio(b);
 		if (b->bi_end_io != end_sync_read)
 			continue;
 		/* fixup the bio for reuse, but preserve errno */
@@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio)
 		b->bi_iter.bi_sector = r1_bio->sector +
 			conf->mirrors[i].rdev->data_offset;
 		b->bi_end_io = end_sync_read;
-		rp->raid_bio = r1_bio;
-		b->bi_private = rp;
+		rf->raid_bio = r1_bio;
+		b->bi_private = rf;
 
 		/* initialize bvec table again */
-		md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9);
+		md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9);
 	}
 	for (primary = 0; primary < conf->raid_disks * 2; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
@@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio)
 		}
 	r1_bio->read_disk = primary;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
-		int j = 0;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
 		blk_status_t status = sbio->bi_status;
-		struct page **ppages = get_resync_pages(pbio)->pages;
-		struct page **spages = get_resync_pages(sbio)->pages;
-		struct bio_vec *bi;
-		int page_len[RESYNC_PAGES] = { 0 };
-		struct bvec_iter_all iter_all;
+		struct folio *pfolio = get_resync_folio(pbio)->folio;
+		struct folio *sfolio = get_resync_folio(sbio)->folio;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
 		/* Now we can 'fixup' the error value */
 		sbio->bi_status = 0;
 
-		bio_for_each_segment_all(bi, sbio, iter_all)
-			page_len[j++] = bi->bv_len;
-
-		if (!status) {
-			for (j = vcnt; j-- ; ) {
-				if (memcmp(page_address(ppages[j]),
-					   page_address(spages[j]),
-					   page_len[j]))
-					break;
-			}
-		} else
-			j = 0;
-		if (j >= 0)
+		/*
+		 * Copy data and submit write in two cases:
+		 * - IO error (non-zero status)
+		 * - Data inconsistency and not a CHECK operation.
+		 */
+		if (status) {
 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
-		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-			      && !status)) {
-			/* No need to write to this device. */
-			sbio->bi_end_io = NULL;
-			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+			bio_copy_data(sbio, pbio);
 			continue;
+		} else if (memcmp(folio_address(pfolio),
+				  folio_address(sfolio),
+				  r1_bio->sectors << 9)) {
+			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
+			if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
+				bio_copy_data(sbio, pbio);
+				continue;
+			}
 		}
 
-		bio_copy_data(sbio, pbio);
+		/* No need to write to this device. */
+		sbio->bi_end_io = NULL;
+		rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 	}
 }
@@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				r1_sync_page_io(rdev, sect, s,
-						folio_page(conf->tmpfolio, 0),
-						REQ_OP_WRITE);
+				r1_sync_folio_io(rdev, sect, s, 0,
+						conf->tmpfolio, REQ_OP_WRITE);
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
@@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				if (r1_sync_page_io(rdev, sect, s,
-						    folio_page(conf->tmpfolio, 0),
-						    REQ_OP_READ)) {
+				if (r1_sync_folio_io(rdev, sect, s, 0,
+						    conf->tmpfolio, REQ_OP_READ)) {
 					atomic_add(s, &rdev->corrected_errors);
 					pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
 						mdname(mddev), s,
@@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf)
 static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
 {
 	struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO);
-	struct resync_pages *rps;
+	struct resync_folio *rfs;
 	struct bio *bio;
 	int i;
 
 	for (i = conf->raid_disks * 2; i--; ) {
 		bio = r1bio->bios[i];
-		rps = bio->bi_private;
+		rfs = bio->bi_private;
 		bio_reset(bio, NULL, 0);
-		bio->bi_private = rps;
+		bio->bi_private = rfs;
 	}
 	r1bio->master_bio = NULL;
 	return r1bio;
@@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int write_targets = 0, read_targets = 0;
 	sector_t sync_blocks;
 	bool still_degraded = false;
-	int good_sectors = RESYNC_SECTORS;
+	int good_sectors;
 	int min_bad = 0; /* number of sectors that are bad in all devices */
 	int idx = sector_to_idx(sector_nr);
-	int page_idx = 0;
 
 	if (!mempool_initialized(&conf->r1buf_pool))
 		if (init_resync(conf))
@@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	r1_bio->sector = sector_nr;
 	r1_bio->state = 0;
 	set_bit(R1BIO_IsSync, &r1_bio->state);
-	/* make sure good_sectors won't go across barrier unit boundary */
-	good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors);
+	/*
+	 * make sure good_sectors won't go across barrier unit boundary.
+	 * r1_bio->sectors <= RESYNC_SECTORS.
+	 */
+	good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors);
 
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
@@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		max_sector = mddev->resync_max; /* Don't do IO beyond here */
 	if (max_sector > sector_nr + good_sectors)
 		max_sector = sector_nr + good_sectors;
-	nr_sectors = 0;
-	sync_blocks = 0;
 	do {
-		struct page *page;
-		int len = PAGE_SIZE;
-		if (sector_nr + (len>>9) > max_sector)
-			len = (max_sector - sector_nr) << 9;
-		if (len == 0)
+		nr_sectors = max_sector - sector_nr;
+		if (nr_sectors == 0)
 			break;
-		if (sync_blocks == 0) {
-			if (!md_bitmap_start_sync(mddev, sector_nr,
-					&sync_blocks, still_degraded) &&
-			    !conf->fullsync &&
-			    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
-				break;
-			if ((len >> 9) > sync_blocks)
-				len = sync_blocks<<9;
-		}
+		if (!md_bitmap_start_sync(mddev, sector_nr,
+				&sync_blocks, still_degraded) &&
+		    !conf->fullsync &&
+		    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+			break;
+		if (nr_sectors > sync_blocks)
+			nr_sectors = sync_blocks;
 
 		for (i = 0 ; i < conf->raid_disks * 2; i++) {
-			struct resync_pages *rp;
-
 			bio = r1_bio->bios[i];
-			rp = get_resync_pages(bio);
 			if (bio->bi_end_io) {
-				page = resync_fetch_page(rp, page_idx);
+				struct resync_folio *rf = get_resync_folio(bio);
 
-				/*
-				 * won't fail because the vec table is big
-				 * enough to hold all these pages
-				 */
-				__bio_add_page(bio, page, len, 0);
+				bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
 			}
 		}
-		nr_sectors += len>>9;
-		sector_nr += len>>9;
-		sync_blocks -= (len>>9);
-	} while (++page_idx < RESYNC_PAGES);
+		sector_nr += nr_sectors;
+	} while (0);
 
 	r1_bio->sectors = nr_sectors;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 26f93040cd13..3638e00fe420 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
 
 /*
  * for resync bio, r10bio pointer can be retrieved
from the per-bio - * 'struct resync_pages'. + * 'struct resync_folio'. */ static inline struct r10bio *get_resync_r10bio(struct bio *bio) { - return get_resync_pages(bio)->raid_bio; + return get_resync_folio(bio)->raid_bio; } static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) struct r10bio *r10_bio; struct bio *bio; int j; - int nalloc, nalloc_rp; - struct resync_pages *rps; + int nalloc, nalloc_rf; + struct resync_folio *rfs; + int order = get_order(RESYNC_BLOCK_SIZE); r10_bio = r10bio_pool_alloc(gfp_flags, conf); if (!r10_bio) @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) /* allocate once for all bios */ if (!conf->have_replacement) - nalloc_rp = nalloc; + nalloc_rf = nalloc; else - nalloc_rp = nalloc * 2; - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags); - if (!rps) + nalloc_rf = nalloc * 2; + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags); + if (!rfs) goto out_free_r10bio; /* * Allocate bios. */ for (j = nalloc ; j-- ; ) { - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); + bio = bio_kmalloc(1, gfp_flags); if (!bio) goto out_free_bio; - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); + bio_init_inline(bio, NULL, 1, 0); r10_bio->devs[j].bio = bio; if (!conf->have_replacement) continue; - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); + bio = bio_kmalloc(1, gfp_flags); if (!bio) goto out_free_bio; - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); + bio_init_inline(bio, NULL, 1, 0); r10_bio->devs[j].repl_bio = bio; } /* - * Allocate RESYNC_PAGES data pages and attach them - * where needed. + * Allocate data folio and attach it where needed. 
*/ for (j = 0; j < nalloc; j++) { struct bio *rbio = r10_bio->devs[j].repl_bio; - struct resync_pages *rp, *rp_repl; + struct resync_folio *rf, *rf_repl; - rp = &rps[j]; + rf = &rfs[j]; if (rbio) - rp_repl = &rps[nalloc + j]; - - bio = r10_bio->devs[j].bio; + rf_repl = &rfs[nalloc + j]; if (!j || test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery)) { - if (resync_alloc_pages(rp, gfp_flags)) - goto out_free_pages; + if (resync_alloc_folio(rf, gfp_flags, &order)) + goto out_free_folio; } else { - memcpy(rp, &rps[0], sizeof(*rp)); - resync_get_all_pages(rp); + memcpy(rf, &rfs[0], sizeof(*rf)); + folio_get(rf->folio); } - rp->raid_bio = r10_bio; - bio->bi_private = rp; + rf->raid_bio = r10_bio; + r10_bio->devs[j].bio->bi_private = rf; if (rbio) { - memcpy(rp_repl, rp, sizeof(*rp)); - rbio->bi_private = rp_repl; + memcpy(rf_repl, rf, sizeof(*rf)); + rbio->bi_private = rf_repl; } } + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); return r10_bio; -out_free_pages: +out_free_folio: while (--j >= 0) - resync_free_pages(&rps[j]); + folio_put(rfs[j].folio); j = 0; out_free_bio: @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) bio_uninit(r10_bio->devs[j].repl_bio); kfree(r10_bio->devs[j].repl_bio); } - kfree(rps); + kfree(rfs); out_free_r10bio: rbio_pool_free(r10_bio, conf); return NULL; @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) struct r10conf *conf = data; struct r10bio *r10bio = __r10_bio; int j; - struct resync_pages *rp = NULL; + struct resync_folio *rf = NULL; for (j = conf->copies; j--; ) { struct bio *bio = r10bio->devs[j].bio; if (bio) { - rp = get_resync_pages(bio); - resync_free_pages(rp); + rf = get_resync_folio(bio); + folio_put(rf->folio); bio_uninit(bio); kfree(bio); } @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) } /* resync pages array stored in the 1st bio's .bi_private */ - kfree(rp); + kfree(rf); rbio_pool_free(r10bio, conf); } @@ -2342,8 +2341,7 @@ 
static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) struct r10conf *conf = mddev->private; int i, first; struct bio *tbio, *fbio; - int vcnt; - struct page **tpages, **fpages; + struct folio *tfolio, *ffolio; atomic_set(&r10_bio->remaining, 1); @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) fbio = r10_bio->devs[i].bio; fbio->bi_iter.bi_size = r10_bio->sectors << 9; fbio->bi_iter.bi_idx = 0; - fpages = get_resync_pages(fbio)->pages; + ffolio = get_resync_folio(fbio)->folio; - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); /* now find blocks with errors */ for (i=0 ; i < conf->copies ; i++) { - int j, d; + int d; struct md_rdev *rdev; - struct resync_pages *rp; + struct resync_folio *rf; tbio = r10_bio->devs[i].bio; @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) if (i == first) continue; - tpages = get_resync_pages(tbio)->pages; + tfolio = get_resync_folio(tbio)->folio; d = r10_bio->devs[i].devnum; rdev = conf->mirrors[d].rdev; if (!r10_bio->devs[i].bio->bi_status) { /* We know that the bi_io_vec layout is the same for * both 'first' and 'i', so we just compare them. - * All vec entries are PAGE_SIZE; */ - int sectors = r10_bio->sectors; - for (j = 0; j < vcnt; j++) { - int len = PAGE_SIZE; - if (sectors < (len / 512)) - len = sectors * 512; - if (memcmp(page_address(fpages[j]), - page_address(tpages[j]), - len)) - break; - sectors -= len/512; + if (memcmp(folio_address(ffolio), + folio_address(tfolio), + r10_bio->sectors << 9)) { + atomic64_add(r10_bio->sectors, + &mddev->resync_mismatches); + if (test_bit(MD_RECOVERY_CHECK, + &mddev->recovery)) + /* Don't fix anything. */ + continue; } - if (j == vcnt) - continue; - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) - /* Don't fix anything. 
*/ - continue; } else if (test_bit(FailFast, &rdev->flags)) { /* Just give up on this device */ md_error(rdev->mddev, rdev); @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) * First we need to fixup bv_offset, bv_len and * bi_vecs, as the read request might have corrupted these */ - rp = get_resync_pages(tbio); + rf = get_resync_folio(tbio); bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); - rp->raid_bio = r10_bio; - tbio->bi_private = rp; + rf->raid_bio = r10_bio; + tbio->bi_private = rf; tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; tbio->bi_end_io = end_sync_write; @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) struct bio *bio = r10_bio->devs[0].bio; sector_t sect = 0; int sectors = r10_bio->sectors; - int idx = 0; int dr = r10_bio->devs[0].devnum; int dw = r10_bio->devs[1].devnum; - struct page **pages = get_resync_pages(bio)->pages; + struct folio *folio = get_resync_folio(bio)->folio; while (sectors) { int s = sectors; @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) rdev = conf->mirrors[dr].rdev; addr = r10_bio->devs[0].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_READ, false); + ok = sync_folio_io(rdev, + addr, + s << 9, + sect << 9, + folio, + REQ_OP_READ, false); if (ok) { rdev = conf->mirrors[dw].rdev; addr = r10_bio->devs[1].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_WRITE, false); + ok = sync_folio_io(rdev, + addr, + s << 9, + sect << 9, + folio, + REQ_OP_WRITE, false); if (!ok) { set_bit(WriteErrorSeen, &rdev->flags); if (!test_and_set_bit(WantReplacement, @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) sectors -= s; sect += s; - idx++; } } @@ -3050,7 +3039,7 @@ static int init_resync(struct 
r10conf *conf) static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) { struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); - struct rsync_pages *rp; + struct resync_folio *rf; struct bio *bio; int nalloc; int i; @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) for (i = 0; i < nalloc; i++) { bio = r10bio->devs[i].bio; - rp = bio->bi_private; + rf = bio->bi_private; bio_reset(bio, NULL, 0); - bio->bi_private = rp; + bio->bi_private = rf; bio = r10bio->devs[i].repl_bio; if (bio) { - rp = bio->bi_private; + rf = bio->bi_private; bio_reset(bio, NULL, 0); - bio->bi_private = rp; + bio->bi_private = rf; } } return r10bio; @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, int max_sync = RESYNC_SECTORS; sector_t sync_blocks; sector_t chunk_mask = conf->geo.chunk_mask; - int page_idx = 0; /* * Allow skipping a full rebuild for incremental assembly @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, continue; } } + + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in + * resync_alloc_folio(). Fall back to smaller sync + * size if needed. + */ + if (max_sync > r10_bio->sectors) + max_sync = r10_bio->sectors; + any_working = 1; bio = r10_bio->devs[0].bio; bio->bi_next = biolist; @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, } if (sync_blocks < max_sync) max_sync = sync_blocks; + r10_bio = raid10_alloc_init_r10buf(conf); + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). + * Fall back to smaller sync size if needed. 
+ */ + if (max_sync > r10_bio->sectors) + max_sync = r10_bio->sectors; + r10_bio->state = 0; r10_bio->mddev = mddev; @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, } } - nr_sectors = 0; if (sector_nr + max_sync < max_sector) max_sector = sector_nr + max_sync; do { - struct page *page; - int len = PAGE_SIZE; - if (sector_nr + (len>>9) > max_sector) - len = (max_sector - sector_nr) << 9; - if (len == 0) + nr_sectors = max_sector - sector_nr; + + if (nr_sectors == 0) break; for (bio= biolist ; bio ; bio=bio->bi_next) { - struct resync_pages *rp = get_resync_pages(bio); - page = resync_fetch_page(rp, page_idx); - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { + struct resync_folio *rf = get_resync_folio(bio); + + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { bio->bi_status = BLK_STS_RESOURCE; bio_endio(bio); *skipped = 1; - return max_sync; + return nr_sectors << 9; } } - nr_sectors += len>>9; - sector_nr += len>>9; - } while (++page_idx < RESYNC_PAGES); + sector_nr += nr_sectors; + } while (0); r10_bio->sectors = nr_sectors; if (mddev_is_clustered(mddev) && @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *skipped) { /* We simply copy at most one chunk (smallest of old and new) - * at a time, possibly less if that exceeds RESYNC_PAGES, + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, * or we hit a bad block or something. 
* This might mean we pause for normal IO in the middle of * a chunk, but that is not a problem as mddev->reshape_position @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, struct r10bio *r10_bio; sector_t next, safe, last; int max_sectors; - int nr_sectors; int s; struct md_rdev *rdev; int need_flush = 0; struct bio *blist; struct bio *bio, *read_bio; int sectors_done = 0; - struct page **pages; + struct folio *folio; if (sector_nr == 0) { /* If restarting in the middle, skip the initial sectors */ @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, r10_bio->mddev = mddev; r10_bio->sector = sector_nr; set_bit(R10BIO_IsReshape, &r10_bio->state); - r10_bio->sectors = last - sector_nr + 1; + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in + * resync_alloc_folio(). Fall back to smaller sync + * size if needed. + */ + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1); rdev = read_balance(conf, r10_bio, &max_sectors); BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state)); @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, return sectors_done; } - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ, + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ, GFP_KERNEL, &mddev->bio_set); read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr + rdev->data_offset); @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, blist = b; } - /* Now add as many pages as possible to all of these bios. */ + /* Now add folio to all of these bios. 
*/ - nr_sectors = 0; - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) { - struct page *page = pages[s / (PAGE_SIZE >> 9)]; - int len = (max_sectors - s) << 9; - if (len > PAGE_SIZE) - len = PAGE_SIZE; - for (bio = blist; bio ; bio = bio->bi_next) { - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { - bio->bi_status = BLK_STS_RESOURCE; - bio_endio(bio); - return sectors_done; - } + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; + for (bio = blist; bio ; bio = bio->bi_next) { + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) { + bio->bi_status = BLK_STS_RESOURCE; + bio_endio(bio); + return sectors_done; } - sector_nr += len >> 9; - nr_sectors += len >> 9; } - r10_bio->sectors = nr_sectors; + r10_bio->sectors = max_sectors >> 9; /* Now submit the read */ atomic_inc(&r10_bio->remaining); read_bio->bi_next = NULL; submit_bio_noacct(read_bio); - sectors_done += nr_sectors; + sectors_done += max_sectors; if (sector_nr <= last) goto read_more; @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, struct r10conf *conf = mddev->private; struct r10bio *r10b; int slot = 0; - int idx = 0; - struct page **pages; + int sect = 0; + struct folio *folio; r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); if (!r10b) { @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, return -ENOMEM; } - /* reshape IOs share pages from .devs[0].bio */ - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; + /* reshape IOs share folio from .devs[0].bio */ + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; r10b->sector = r10_bio->sector; __raid10_find_phys(&conf->prev, r10b); @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, while (!success) { int d = r10b->devs[slot].devnum; struct md_rdev *rdev = conf->mirrors[d].rdev; - sector_t addr; if (rdev == NULL || test_bit(Faulty, &rdev->flags) || !test_bit(In_sync, 
&rdev->flags)) goto failed; - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; atomic_inc(&rdev->nr_pending); - success = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_READ, false); + success = sync_folio_io(rdev, + r10b->devs[slot].addr + + sect, + s << 9, + sect << 9, + folio, + REQ_OP_READ, false); rdev_dec_pending(rdev, mddev); if (success) break; @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, return -EIO; } sectors -= s; - idx++; + sect += s; } kfree(r10b); return 0; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
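The size bookkeeping running through the patch above hinges on two conversions: `get_order(RESYNC_BLOCK_SIZE)` picks the folio allocation order, and `1 << (order + PAGE_SECTORS_SHIFT)` turns that order back into a sector count for `r1_bio->sectors` / `r10_bio->sectors`. A userspace sketch (not kernel code; it assumes 4K pages, i.e. `PAGE_SHIFT == 12`, and reimplements a minimal `get_order()`):

```c
#include <assert.h>

#define PAGE_SHIFT 12                       /* assumption: 4K pages */
#define PAGE_SECTORS_SHIFT (PAGE_SHIFT - 9) /* pages -> 512B sectors */
#define RESYNC_BLOCK_SIZE (64 * 1024)

/* minimal stand-in for the kernel's get_order(): smallest order such
 * that (1 << order) pages cover 'size' bytes */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}
```

With these assumptions, `get_order(RESYNC_BLOCK_SIZE)` is 4 (a 64K folio is 16 4K pages) and `1 << (4 + PAGE_SECTORS_SHIFT)` is 128 sectors, i.e. the full 64K, which is exactly what the allocators above store into the r1/r10 bio.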
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
@ 2026-04-30  1:54 ` Xiao Ni
  2026-05-07  7:13   ` Li Nan (Magic Li)
  0 siblings, 1 reply; 13+ messages in thread
From: Xiao Ni @ 2026-04-30  1:54 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang

Hi Nan

On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Convert all IO on the sync path to use folios, and rename page-related
> identifiers to match folio.
>
> Since RESYNC_BLOCK_SIZE (64K) has a higher allocation failure chance than
> 4K, retry with lower orders to improve allocation reliability. An r1/10_bio
> may have different rf->folio orders, so use the minimum order for the
> r1/10_bio sectors to avoid exceeding the folio size when adding folios to
> IO later.
>
> Clean up:
> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
> 2. Clean up the redundant while(0) loop in md_bio_reset_resync_folio().
> 3. Clean up the bio variable by directly referencing r10_bio->devs[j].bio
>    instead in r1buf_pool_alloc() and r10buf_pool_alloc().
> 4. Clean up RESYNC_PAGES.
> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
> 6. Remove resync_free_folio(), call folio_put() directly.
> 7. Clean up the sync IO size calculation in raid1/10_sync_request.
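The cover text's point about mixed orders can be made concrete: when the per-device folio allocations fall back to different orders, the r1/10_bio must only be as large as the *smallest* folio, so its sector count is derived from the minimum order. A userspace sketch of that invariant (function name and explicit-minimum form are illustrative; in the patch the same result falls out of the shared, monotonically decreasing `order` variable; 4K pages assumed):

```c
#define PAGE_SECTORS_SHIFT 3 /* assumption: 4K pages, 512B sectors */

/* Size an r1/10_bio from the per-device folio orders: take the minimum
 * order, then mirror r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT). */
static int rbio_sectors_for_orders(const int *orders, int ndisks)
{
	int min_order = orders[0];

	for (int i = 1; i < ndisks; i++)
		if (orders[i] < min_order)
			min_order = orders[i];

	return 1 << (min_order + PAGE_SECTORS_SHIFT);
}
```

For example, two order-4 folios allow a full 128-sector (64K) request, while one device falling back to order 2 caps the request at 32 sectors.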
> > Signed-off-by: Li Nan <linan122@huawei.com> > --- > drivers/md/md.c | 2 +- > drivers/md/raid1-10.c | 80 ++++--------- > drivers/md/raid1.c | 209 +++++++++++++++------------------- > drivers/md/raid10.c | 254 +++++++++++++++++++++--------------------- > 4 files changed, 240 insertions(+), 305 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5e83914d5c14..6554b849ac74 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev) > { > /* > * For raid456, sync IO is stripe(4k) per IO, for other levels, it's > - * RESYNC_PAGES(64k) per IO. > + * RESYNC_BLOCK_SIZE(64k) per IO. > */ > return atomic_read(&mddev->recovery_active) < > (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev); > diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c > index cda531d0720b..10200b0a3fd2 100644 > --- a/drivers/md/raid1-10.c > +++ b/drivers/md/raid1-10.c > @@ -1,7 +1,6 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Maximum size of each resync request */ > #define RESYNC_BLOCK_SIZE (64*1024) > -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) > #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) > > /* when we get a read error on a read-only array, we redirect to another > @@ -20,9 +19,9 @@ > #define MAX_PLUG_BIO 32 > > /* for managing resync I/O pages */ > -struct resync_pages { > +struct resync_folio { > void *raid_bio; > - struct page *pages[RESYNC_PAGES]; > + struct folio *folio; > }; > > struct raid1_plug_cb { > @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data) > kfree(rbio); > } > > -static inline int resync_alloc_pages(struct resync_pages *rp, > - gfp_t gfp_flags) > +static inline int resync_alloc_folio(struct resync_folio *rf, > + gfp_t gfp_flags, int *order) > { > - int i; > + struct folio *folio; > > - for (i = 0; i < RESYNC_PAGES; i++) { > - rp->pages[i] = alloc_page(gfp_flags); > - if (!rp->pages[i]) > - goto out_free; > - } > + do { > + 
folio = folio_alloc(gfp_flags, *order);
> +		if (folio)
> +			break;
> +	} while (--(*order) > 0);

There is a problem here: if a large folio cannot be allocated, the sync
request unit becomes smaller and sync performance may decrease. This can
happen when the system lacks sufficient contiguous memory. The change
itself looks good to me; I just want to raise this problem for open
discussion.

>
> +	if (!folio)
> +		return -ENOMEM;
> +
> +	rf->folio = folio;
>  	return 0;
> -
> -out_free:
> -	while (--i >= 0)
> -		put_page(rp->pages[i]);
> -	return -ENOMEM;
> -}
> -
> -static inline void resync_free_pages(struct resync_pages *rp)
> -{
> -	int i;
> -
> -	for (i = 0; i < RESYNC_PAGES; i++)
> -		put_page(rp->pages[i]);
> -}
> -
> -static inline void resync_get_all_pages(struct resync_pages *rp)
> -{
> -	int i;
> -
> -	for (i = 0; i < RESYNC_PAGES; i++)
> -		get_page(rp->pages[i]);
> -}
> -
> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
> -					     unsigned idx)
> -{
> -	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
> -		return NULL;
> -	return rp->pages[idx];
>  }
>
>  /*
> - * 'struct resync_pages' stores actual pages used for doing the resync
> + * 'struct resync_folio' stores the actual folio used for doing the resync
>   * IO, and it is per-bio, so make .bi_private points to it.
> */ > -static inline struct resync_pages *get_resync_pages(struct bio *bio) > +static inline struct resync_folio *get_resync_folio(struct bio *bio) > { > return bio->bi_private; > } > > /* generally called after bio_reset() for reseting bvec */ > -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp, > +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf, > int size) > { > - int idx = 0; > - > /* initialize bvec table again */ > - do { > - struct page *page = resync_fetch_page(rp, idx); > - int len = min_t(int, size, PAGE_SIZE); > - > - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { > - bio->bi_status = BLK_STS_RESOURCE; > - bio_endio(bio); > - return; > - } > - > - size -= len; > - } while (idx++ < RESYNC_PAGES && size > 0); > + if (WARN_ON(!bio_add_folio(bio, rf->folio, > + min_t(int, size, RESYNC_BLOCK_SIZE), > + 0))) { > + bio->bi_status = BLK_STS_RESOURCE; > + bio_endio(bio); > + } > } > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index a72abdc37a2d..724fd4f2cc3a 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi) > > /* > * for resync bio, r1bio pointer can be retrieved from the per-bio > - * 'struct resync_pages'. > + * 'struct resync_folio'. > */ > static inline struct r1bio *get_resync_r1bio(struct bio *bio) > { > - return get_resync_pages(bio)->raid_bio; > + return get_resync_folio(bio)->raid_bio; > } > > static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf) > @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data) > struct r1conf *conf = data; > struct r1bio *r1_bio; > struct bio *bio; > - int need_pages; > + int need_folio; The name need_folio is confusing. Can we keep the same style as the old version? How about need_folios? 
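The fallback loop discussed above can be exercised in isolation. A userspace sketch (the `alloc_fn` hook and `mock_alloc` fragmentation model are hypothetical stand-ins for `folio_alloc()`, not kernel API):

```c
#include <stddef.h>

/* Hypothetical allocator hook: non-NULL only when the requested order
 * is within what the "system" can provide contiguously. */
typedef void *(*alloc_fn)(int order);

/* Mirrors the resync_alloc_folio() retry loop quoted above: try the
 * requested order, then fall back to smaller orders; *order is updated
 * in place so the caller can size the sync request to what was
 * actually allocated. */
static void *alloc_with_fallback(alloc_fn try_alloc, int *order)
{
	void *p;

	do {
		p = try_alloc(*order);
		if (p)
			break;
	} while (--(*order) > 0);

	return p;
}

/* Mock fragmentation: succeed only at or below max_ok_order. */
static int max_ok_order;
static char mock_buf[1];
static void *mock_alloc(int order)
{
	return order <= max_ok_order ? mock_buf : NULL;
}
```

One detail worth confirming on the real patch: with the loop written this way, after failing at order 1 the condition `--(*order) > 0` becomes false and the function returns -ENOMEM without ever attempting an order-0 (single page) allocation. If a single-page fallback is intended, the loop condition may need adjusting.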
> int j; > - struct resync_pages *rps; > + struct resync_folio *rfs; > + int order = get_order(RESYNC_BLOCK_SIZE); > > r1_bio = r1bio_pool_alloc(gfp_flags, conf); > if (!r1_bio) > return NULL; > > - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages), > + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio), > gfp_flags); > - if (!rps) > + if (!rfs) > goto out_free_r1bio; > > /* > * Allocate bios : 1 for reading, n-1 for writing > */ > for (j = conf->raid_disks * 2; j-- ; ) { > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r1_bio->bios[j] = bio; > } > /* > - * Allocate RESYNC_PAGES data pages and attach them to > - * the first bio. > + * Allocate data folio and attach it to the first bio. > * If this is a user-requested check/repair, allocate > - * RESYNC_PAGES for each bio. > + * folio for each bio. 
> */ > if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery)) > - need_pages = conf->raid_disks * 2; > + need_folio = conf->raid_disks * 2; > else > - need_pages = 1; > + need_folio = 1; > for (j = 0; j < conf->raid_disks * 2; j++) { > - struct resync_pages *rp = &rps[j]; > + struct resync_folio *rf = &rfs[j]; > > - bio = r1_bio->bios[j]; > - > - if (j < need_pages) { > - if (resync_alloc_pages(rp, gfp_flags)) > - goto out_free_pages; > + if (j < need_folio) { > + if (resync_alloc_folio(rf, gfp_flags, &order)) > + goto out_free_folio; > } else { > - memcpy(rp, &rps[0], sizeof(*rp)); > - resync_get_all_pages(rp); > + memcpy(rf, &rfs[0], sizeof(*rf)); > + folio_get(rf->folio); > } > > - rp->raid_bio = r1_bio; > - bio->bi_private = rp; > + rf->raid_bio = r1_bio; > + r1_bio->bios[j]->bi_private = rf; > } > > + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); > r1_bio->master_bio = NULL; > > return r1_bio; > > -out_free_pages: > +out_free_folio: > while (--j >= 0) > - resync_free_pages(&rps[j]); > + folio_put(rfs[j].folio); > > out_free_bio: > while (++j < conf->raid_disks * 2) { > bio_uninit(r1_bio->bios[j]); > kfree(r1_bio->bios[j]); > } > - kfree(rps); > + kfree(rfs); > > out_free_r1bio: > rbio_pool_free(r1_bio, data); > @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data) > struct r1conf *conf = data; > int i; > struct r1bio *r1bio = __r1_bio; > - struct resync_pages *rp = NULL; > + struct resync_folio *rf = NULL; > > for (i = conf->raid_disks * 2; i--; ) { > - rp = get_resync_pages(r1bio->bios[i]); > - resync_free_pages(rp); > + rf = get_resync_folio(r1bio->bios[i]); > + folio_put(rf->folio); > bio_uninit(r1bio->bios[i]); > kfree(r1bio->bios[i]); > } > > - /* resync pages array stored in the 1st bio's .bi_private */ > - kfree(rp); > + /* resync folio stored in the 1st bio's .bi_private */ > + kfree(rf); > > rbio_pool_free(r1bio, data); > } > @@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio) > 
put_sync_write_buf(r1_bio); > } > > -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector, > - int sectors, struct page *page, blk_opf_t rw) > +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors, > + int off, struct folio *folio, blk_opf_t rw) > { > - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false)) > + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false)) > /* success */ > return 1; > if (rw == REQ_OP_WRITE) { > @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > struct mddev *mddev = r1_bio->mddev; > struct r1conf *conf = mddev->private; > struct bio *bio = r1_bio->bios[r1_bio->read_disk]; > - struct page **pages = get_resync_pages(bio)->pages; > + struct folio *folio = get_resync_folio(bio)->folio; > sector_t sect = r1_bio->sector; > int sectors = r1_bio->sectors; > - int idx = 0; > + int off = 0; > struct md_rdev *rdev; > > rdev = conf->mirrors[r1_bio->read_disk].rdev; > @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > * active, and resync is currently active > */ > rdev = conf->mirrors[d].rdev; > - if (sync_page_io(rdev, sect, s<<9, > - pages[idx], > - REQ_OP_READ, false)) { > + if (sync_folio_io(rdev, sect, s<<9, off, folio, > + REQ_OP_READ, false)) { > success = 1; > break; > } > @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > /* Try next page */ > sectors -= s; > sect += s; > - idx++; > + off += s << 9; > continue; > } > > @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (r1_bio->bios[d]->bi_end_io != end_sync_read) > continue; > rdev = conf->mirrors[d].rdev; > - if (r1_sync_page_io(rdev, sect, s, > - pages[idx], > + if (r1_sync_folio_io(rdev, sect, s, off, folio, > REQ_OP_WRITE) == 0) { > r1_bio->bios[d]->bi_end_io = NULL; > rdev_dec_pending(rdev, mddev); > @@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (r1_bio->bios[d]->bi_end_io != 
end_sync_read) > continue; > rdev = conf->mirrors[d].rdev; > - if (r1_sync_page_io(rdev, sect, s, > - pages[idx], > + if (r1_sync_folio_io(rdev, sect, s, off, folio, > REQ_OP_READ) != 0) > atomic_add(s, &rdev->corrected_errors); > } > sectors -= s; > sect += s; > - idx ++; > + off += s << 9; > } > set_bit(R1BIO_Uptodate, &r1_bio->state); > bio->bi_status = 0; > @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio) > struct r1conf *conf = mddev->private; > int primary; > int i; > - int vcnt; > > /* Fix variable parts of all bios */ > - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); > for (i = 0; i < conf->raid_disks * 2; i++) { > blk_status_t status; > struct bio *b = r1_bio->bios[i]; > - struct resync_pages *rp = get_resync_pages(b); > + struct resync_folio *rf = get_resync_folio(b); > if (b->bi_end_io != end_sync_read) > continue; > /* fixup the bio for reuse, but preserve errno */ > @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio) > b->bi_iter.bi_sector = r1_bio->sector + > conf->mirrors[i].rdev->data_offset; > b->bi_end_io = end_sync_read; > - rp->raid_bio = r1_bio; > - b->bi_private = rp; > + rf->raid_bio = r1_bio; > + b->bi_private = rf; > > /* initialize bvec table again */ > - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9); > + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9); > } > for (primary = 0; primary < conf->raid_disks * 2; primary++) > if (r1_bio->bios[primary]->bi_end_io == end_sync_read && > @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio) > } > r1_bio->read_disk = primary; > for (i = 0; i < conf->raid_disks * 2; i++) { > - int j = 0; > struct bio *pbio = r1_bio->bios[primary]; > struct bio *sbio = r1_bio->bios[i]; > blk_status_t status = sbio->bi_status; > - struct page **ppages = get_resync_pages(pbio)->pages; > - struct page **spages = get_resync_pages(sbio)->pages; > - struct bio_vec *bi; > - int page_len[RESYNC_PAGES] = { 0 }; > - struct 
bvec_iter_all iter_all; > + struct folio *pfolio = get_resync_folio(pbio)->folio; > + struct folio *sfolio = get_resync_folio(sbio)->folio; > > if (sbio->bi_end_io != end_sync_read) > continue; > /* Now we can 'fixup' the error value */ > sbio->bi_status = 0; > > - bio_for_each_segment_all(bi, sbio, iter_all) > - page_len[j++] = bi->bv_len; > - > - if (!status) { > - for (j = vcnt; j-- ; ) { > - if (memcmp(page_address(ppages[j]), > - page_address(spages[j]), > - page_len[j])) > - break; > - } > - } else > - j = 0; > - if (j >= 0) > + /* > + * Copy data and submit write in two cases: > + * - IO error (non-zero status) > + * - Data inconsistency and not a CHECK operation. > + */ > + if (status) { > atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); > - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery) > - && !status)) { > - /* No need to write to this device. */ > - sbio->bi_end_io = NULL; > - rdev_dec_pending(conf->mirrors[i].rdev, mddev); > + bio_copy_data(sbio, pbio); > continue; > + } else if (memcmp(folio_address(pfolio), > + folio_address(sfolio), > + r1_bio->sectors << 9)) { > + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); > + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { > + bio_copy_data(sbio, pbio); > + continue; > + } > } > > - bio_copy_data(sbio, pbio); > + /* No need to write to this device. 
*/ > + sbio->bi_end_io = NULL; > + rdev_dec_pending(conf->mirrors[i].rdev, mddev); > } > } > > @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > if (rdev && > !test_bit(Faulty, &rdev->flags)) { > atomic_inc(&rdev->nr_pending); > - r1_sync_page_io(rdev, sect, s, > - folio_page(conf->tmpfolio, 0), > - REQ_OP_WRITE); > + r1_sync_folio_io(rdev, sect, s, 0, > + conf->tmpfolio, REQ_OP_WRITE); > rdev_dec_pending(rdev, mddev); > } > } > @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > if (rdev && > !test_bit(Faulty, &rdev->flags)) { > atomic_inc(&rdev->nr_pending); > - if (r1_sync_page_io(rdev, sect, s, > - folio_page(conf->tmpfolio, 0), > - REQ_OP_READ)) { > + if (r1_sync_folio_io(rdev, sect, s, 0, > + conf->tmpfolio, REQ_OP_READ)) { > atomic_add(s, &rdev->corrected_errors); > pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n", > mdname(mddev), s, > @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf) > static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf) > { > struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO); > - struct resync_pages *rps; > + struct resync_folio *rfs; > struct bio *bio; > int i; > > for (i = conf->raid_disks * 2; i--; ) { > bio = r1bio->bios[i]; > - rps = bio->bi_private; > + rfs = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rps; > + bio->bi_private = rfs; > } > r1bio->master_bio = NULL; > return r1bio; > @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > int write_targets = 0, read_targets = 0; > sector_t sync_blocks; > bool still_degraded = false; > - int good_sectors = RESYNC_SECTORS; > + int good_sectors; > int min_bad = 0; /* number of sectors that are bad in all devices */ > int idx = sector_to_idx(sector_nr); > - int page_idx = 0; > > if (!mempool_initialized(&conf->r1buf_pool)) > if (init_resync(conf)) > @@ -2858,8 +2844,11 
@@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > r1_bio->sector = sector_nr; > r1_bio->state = 0; > set_bit(R1BIO_IsSync, &r1_bio->state); > - /* make sure good_sectors won't go across barrier unit boundary */ > - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors); > + /* > + * make sure good_sectors won't go across barrier unit boundary. > + * r1_bio->sectors <= RESYNC_SECTORS. > + */ > + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors); > > for (i = 0; i < conf->raid_disks * 2; i++) { > struct md_rdev *rdev; > @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > max_sector = mddev->resync_max; /* Don't do IO beyond here */ > if (max_sector > sector_nr + good_sectors) > max_sector = sector_nr + good_sectors; > - nr_sectors = 0; > - sync_blocks = 0; > do { > - struct page *page; > - int len = PAGE_SIZE; > - if (sector_nr + (len>>9) > max_sector) > - len = (max_sector - sector_nr) << 9; > - if (len == 0) > + nr_sectors = max_sector - sector_nr; > + if (nr_sectors == 0) > break; > - if (sync_blocks == 0) { > - if (!md_bitmap_start_sync(mddev, sector_nr, > - &sync_blocks, still_degraded) && > - !conf->fullsync && > - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) > - break; > - if ((len >> 9) > sync_blocks) > - len = sync_blocks<<9; > - } > + if (!md_bitmap_start_sync(mddev, sector_nr, > + &sync_blocks, still_degraded) && > + !conf->fullsync && > + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) > + break; > + if (nr_sectors > sync_blocks) > + nr_sectors = sync_blocks; > > for (i = 0 ; i < conf->raid_disks * 2; i++) { > - struct resync_pages *rp; > - > bio = r1_bio->bios[i]; > - rp = get_resync_pages(bio); > if (bio->bi_end_io) { > - page = resync_fetch_page(rp, page_idx); > + struct resync_folio *rf = get_resync_folio(bio); > > - /* > - * won't fail because the vec table is big > - * enough to hold all these pages > - */ > - __bio_add_page(bio, 
page, len, 0);
> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
> }
> }
> - nr_sectors += len>>9;
> - sector_nr += len>>9;
> - sync_blocks -= (len>>9);
> - } while (++page_idx < RESYNC_PAGES);
> + sector_nr += nr_sectors;
> + } while (0);

Now it can handle all pages in one go via a folio. It's strange to keep
while(0) here.

>
> r1_bio->sectors = nr_sectors;

This patch is a little big. Is it better to split this patch here?

>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 26f93040cd13..3638e00fe420 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
>
> /*
> * for resync bio, r10bio pointer can be retrieved from the per-bio
> - * 'struct resync_pages'.
> + * 'struct resync_folio'.
> */
> static inline struct r10bio *get_resync_r10bio(struct bio *bio)
> {
> - return get_resync_pages(bio)->raid_bio;
> + return get_resync_folio(bio)->raid_bio;
> }
>
> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
> struct r10bio *r10_bio;
> struct bio *bio;
> int j;
> - int nalloc, nalloc_rp;
> - struct resync_pages *rps;
> + int nalloc, nalloc_rf;
> + struct resync_folio *rfs;
> + int order = get_order(RESYNC_BLOCK_SIZE);
>
> r10_bio = r10bio_pool_alloc(gfp_flags, conf);
> if (!r10_bio)
> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>
> /* allocate once for all bios */
> if (!conf->have_replacement)
> - nalloc_rp = nalloc;
> + nalloc_rf = nalloc;
> else
> - nalloc_rp = nalloc * 2;
> + nalloc_rf = nalloc * 2;
> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags);
> - if (!rps)
> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags);
> + if (!rfs)
> goto out_free_r10bio;
>
> /*
> * Allocate bios.
> */ > for (j = nalloc ; j-- ; ) { > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r10_bio->devs[j].bio = bio; > if (!conf->have_replacement) > continue; > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r10_bio->devs[j].repl_bio = bio; > } > /* > - * Allocate RESYNC_PAGES data pages and attach them > - * where needed. > + * Allocate data folio and attach it where needed. > */ > for (j = 0; j < nalloc; j++) { > struct bio *rbio = r10_bio->devs[j].repl_bio; > - struct resync_pages *rp, *rp_repl; > + struct resync_folio *rf, *rf_repl; > > - rp = &rps[j]; > + rf = &rfs[j]; > if (rbio) > - rp_repl = &rps[nalloc + j]; > - > - bio = r10_bio->devs[j].bio; > + rf_repl = &rfs[nalloc + j]; > > if (!j || test_bit(MD_RECOVERY_SYNC, > &conf->mddev->recovery)) { > - if (resync_alloc_pages(rp, gfp_flags)) > - goto out_free_pages; > + if (resync_alloc_folio(rf, gfp_flags, &order)) > + goto out_free_folio; > } else { > - memcpy(rp, &rps[0], sizeof(*rp)); > - resync_get_all_pages(rp); > + memcpy(rf, &rfs[0], sizeof(*rf)); > + folio_get(rf->folio); > } > > - rp->raid_bio = r10_bio; > - bio->bi_private = rp; > + rf->raid_bio = r10_bio; > + r10_bio->devs[j].bio->bi_private = rf; > if (rbio) { > - memcpy(rp_repl, rp, sizeof(*rp)); > - rbio->bi_private = rp_repl; > + memcpy(rf_repl, rf, sizeof(*rf)); > + rbio->bi_private = rf_repl; > } > } > > + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); > return r10_bio; > > -out_free_pages: > +out_free_folio: > while (--j >= 0) > - resync_free_pages(&rps[j]); > + folio_put(rfs[j].folio); > > j = 0; > out_free_bio: > @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) > 
bio_uninit(r10_bio->devs[j].repl_bio); > kfree(r10_bio->devs[j].repl_bio); > } > - kfree(rps); > + kfree(rfs); > out_free_r10bio: > rbio_pool_free(r10_bio, conf); > return NULL; > @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) > struct r10conf *conf = data; > struct r10bio *r10bio = __r10_bio; > int j; > - struct resync_pages *rp = NULL; > + struct resync_folio *rf = NULL; > > for (j = conf->copies; j--; ) { > struct bio *bio = r10bio->devs[j].bio; > > if (bio) { > - rp = get_resync_pages(bio); > - resync_free_pages(rp); > + rf = get_resync_folio(bio); > + folio_put(rf->folio); > bio_uninit(bio); > kfree(bio); > } > @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) > } > > /* resync pages array stored in the 1st bio's .bi_private */ > - kfree(rp); > + kfree(rf); > > rbio_pool_free(r10bio, conf); > } > @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > struct r10conf *conf = mddev->private; > int i, first; > struct bio *tbio, *fbio; > - int vcnt; > - struct page **tpages, **fpages; > + struct folio *tfolio, *ffolio; > > atomic_set(&r10_bio->remaining, 1); > > @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > fbio = r10_bio->devs[i].bio; > fbio->bi_iter.bi_size = r10_bio->sectors << 9; > fbio->bi_iter.bi_idx = 0; > - fpages = get_resync_pages(fbio)->pages; > + ffolio = get_resync_folio(fbio)->folio; > > - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); > /* now find blocks with errors */ > for (i=0 ; i < conf->copies ; i++) { > - int j, d; > + int d; > struct md_rdev *rdev; > - struct resync_pages *rp; > + struct resync_folio *rf; > > tbio = r10_bio->devs[i].bio; > > @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > if (i == first) > continue; > > - tpages = get_resync_pages(tbio)->pages; > + tfolio = get_resync_folio(tbio)->folio; > d 
= r10_bio->devs[i].devnum; > rdev = conf->mirrors[d].rdev; > if (!r10_bio->devs[i].bio->bi_status) { > /* We know that the bi_io_vec layout is the same for > * both 'first' and 'i', so we just compare them. > - * All vec entries are PAGE_SIZE; > */ > - int sectors = r10_bio->sectors; > - for (j = 0; j < vcnt; j++) { > - int len = PAGE_SIZE; > - if (sectors < (len / 512)) > - len = sectors * 512; > - if (memcmp(page_address(fpages[j]), > - page_address(tpages[j]), > - len)) > - break; > - sectors -= len/512; > + if (memcmp(folio_address(ffolio), > + folio_address(tfolio), > + r10_bio->sectors << 9)) { > + atomic64_add(r10_bio->sectors, > + &mddev->resync_mismatches); > + if (test_bit(MD_RECOVERY_CHECK, > + &mddev->recovery)) > + /* Don't fix anything. */ > + continue; > } > - if (j == vcnt) > - continue; > - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); > - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) > - /* Don't fix anything. */ > - continue; > } else if (test_bit(FailFast, &rdev->flags)) { > /* Just give up on this device */ > md_error(rdev->mddev, rdev); > @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > * First we need to fixup bv_offset, bv_len and > * bi_vecs, as the read request might have corrupted these > */ > - rp = get_resync_pages(tbio); > + rf = get_resync_folio(tbio); > bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); > > - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); > + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); > > - rp->raid_bio = r10_bio; > - tbio->bi_private = rp; > + rf->raid_bio = r10_bio; > + tbio->bi_private = rf; > tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; > tbio->bi_end_io = end_sync_write; > > @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > struct bio *bio = r10_bio->devs[0].bio; > sector_t sect = 0; > int sectors = r10_bio->sectors; > - int idx = 0; > int dr = 
r10_bio->devs[0].devnum; > int dw = r10_bio->devs[1].devnum; > - struct page **pages = get_resync_pages(bio)->pages; > + struct folio *folio = get_resync_folio(bio)->folio; > > while (sectors) { > int s = sectors; > @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > > rdev = conf->mirrors[dr].rdev; > addr = r10_bio->devs[0].addr + sect; > - ok = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_READ, false); > + ok = sync_folio_io(rdev, > + addr, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_READ, false); > if (ok) { > rdev = conf->mirrors[dw].rdev; > addr = r10_bio->devs[1].addr + sect; > - ok = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_WRITE, false); > + ok = sync_folio_io(rdev, > + addr, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_WRITE, false); > if (!ok) { > set_bit(WriteErrorSeen, &rdev->flags); > if (!test_and_set_bit(WantReplacement, > @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > > sectors -= s; > sect += s; > - idx++; > } > } > > @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf) > static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) > { > struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); > - struct rsync_pages *rp; > + struct resync_folio *rf; > struct bio *bio; > int nalloc; > int i; > @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) > > for (i = 0; i < nalloc; i++) { > bio = r10bio->devs[i].bio; > - rp = bio->bi_private; > + rf = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rp; > + bio->bi_private = rf; > bio = r10bio->devs[i].repl_bio; > if (bio) { > - rp = bio->bi_private; > + rf = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rp; > + bio->bi_private = rf; > } > } > return r10bio; > @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > int max_sync 
= RESYNC_SECTORS; > sector_t sync_blocks; > sector_t chunk_mask = conf->geo.chunk_mask; > - int page_idx = 0; > > /* > * Allow skipping a full rebuild for incremental assembly > @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > continue; > } > } > + > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in > + * resync_alloc_folio(). Fall back to smaller sync > + * size if needed. > + */ > + if (max_sync > r10_bio->sectors) > + max_sync = r10_bio->sectors; > + > any_working = 1; > bio = r10_bio->devs[0].bio; > bio->bi_next = biolist; > @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > } > if (sync_blocks < max_sync) > max_sync = sync_blocks; > + > r10_bio = raid10_alloc_init_r10buf(conf); > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). > + * Fall back to smaller sync size if needed. > + */ > + if (max_sync > r10_bio->sectors) > + max_sync = r10_bio->sectors; > + > r10_bio->state = 0; > > r10_bio->mddev = mddev; > @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > } > } > > - nr_sectors = 0; > if (sector_nr + max_sync < max_sector) > max_sector = sector_nr + max_sync; > do { > - struct page *page; > - int len = PAGE_SIZE; > - if (sector_nr + (len>>9) > max_sector) > - len = (max_sector - sector_nr) << 9; > - if (len == 0) > + nr_sectors = max_sector - sector_nr; > + > + if (nr_sectors == 0) > break; > for (bio= biolist ; bio ; bio=bio->bi_next) { > - struct resync_pages *rp = get_resync_pages(bio); > - page = resync_fetch_page(rp, page_idx); > - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { > + struct resync_folio *rf = get_resync_folio(bio); > + > + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { > bio->bi_status = BLK_STS_RESOURCE; > bio_endio(bio); > *skipped = 1; > - return max_sync; > + return nr_sectors << 9; > } > } > - nr_sectors += len>>9; > - 
sector_nr += len>>9; > - } while (++page_idx < RESYNC_PAGES); > + sector_nr += nr_sectors; > + } while (0); > r10_bio->sectors = nr_sectors; > > if (mddev_is_clustered(mddev) && > @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > int *skipped) > { > /* We simply copy at most one chunk (smallest of old and new) > - * at a time, possibly less if that exceeds RESYNC_PAGES, > + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, > * or we hit a bad block or something. > * This might mean we pause for normal IO in the middle of > * a chunk, but that is not a problem as mddev->reshape_position > @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > struct r10bio *r10_bio; > sector_t next, safe, last; > int max_sectors; > - int nr_sectors; > int s; > struct md_rdev *rdev; > int need_flush = 0; > struct bio *blist; > struct bio *bio, *read_bio; > int sectors_done = 0; > - struct page **pages; > + struct folio *folio; > > if (sector_nr == 0) { > /* If restarting in the middle, skip the initial sectors */ > @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > r10_bio->mddev = mddev; > r10_bio->sector = sector_nr; > set_bit(R10BIO_IsReshape, &r10_bio->state); > - r10_bio->sectors = last - sector_nr + 1; > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in > + * resync_alloc_folio(). Fall back to smaller sync > + * size if needed. 
> + */
> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1);
> rdev = read_balance(conf, r10_bio, &max_sectors);
> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state));
>
> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> return sectors_done;
> }
>
> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ,
> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ,
> GFP_KERNEL, &mddev->bio_set);
> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
> + rdev->data_offset);
> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> blist = b;
> }
>
> - /* Now add as many pages as possible to all of these bios. */
> + /* Now add folio to all of these bios. */
>
> - nr_sectors = 0;
> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
> - struct page *page = pages[s / (PAGE_SIZE >> 9)];
> - int len = (max_sectors - s) << 9;
> - if (len > PAGE_SIZE)
> - len = PAGE_SIZE;
> - for (bio = blist; bio ; bio = bio->bi_next) {
> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> - bio->bi_status = BLK_STS_RESOURCE;
> - bio_endio(bio);
> - return sectors_done;
> - }
> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
> + for (bio = blist; bio ; bio = bio->bi_next) {
> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return sectors_done;

In fact, the original code doesn't clean up before returning.
bio_add_folio_nofail is used in raid1; can we use it here as well?
> } > - sector_nr += len >> 9; > - nr_sectors += len >> 9; > } > - r10_bio->sectors = nr_sectors; > + r10_bio->sectors = max_sectors >> 9; > > /* Now submit the read */ > atomic_inc(&r10_bio->remaining); > read_bio->bi_next = NULL; > submit_bio_noacct(read_bio); > - sectors_done += nr_sectors; > + sectors_done += max_sectors; > if (sector_nr <= last) > goto read_more; > > @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, > struct r10conf *conf = mddev->private; > struct r10bio *r10b; > int slot = 0; > - int idx = 0; > - struct page **pages; > + int sect = 0; > + struct folio *folio; > > r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); > if (!r10b) { > @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, > return -ENOMEM; > } > > - /* reshape IOs share pages from .devs[0].bio */ > - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; > + /* reshape IOs share folio from .devs[0].bio */ > + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; > > r10b->sector = r10_bio->sector; > __raid10_find_phys(&conf->prev, r10b); > @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, > while (!success) { > int d = r10b->devs[slot].devnum; > struct md_rdev *rdev = conf->mirrors[d].rdev; > - sector_t addr; > if (rdev == NULL || > test_bit(Faulty, &rdev->flags) || > !test_bit(In_sync, &rdev->flags)) > goto failed; > > - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; > atomic_inc(&rdev->nr_pending); > - success = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_READ, false); > + success = sync_folio_io(rdev, > + r10b->devs[slot].addr + > + sect, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_READ, false); > rdev_dec_pending(rdev, mddev); > if (success) > break; > @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, > return -EIO; > } > sectors -= s; > - idx++; > + sect += s; > } > kfree(r10b); > return 0; > -- > 2.39.2 > > Regards 
Xiao

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-30 1:54 ` Xiao Ni
@ 2026-05-07 7:13 ` 李楠 Magic Li
  0 siblings, 0 replies; 13+ messages in thread
From: 李楠 Magic Li @ 2026-05-07 7:13 UTC (permalink / raw)
  To: Xiao Ni, linan666@huaweicloud.com
  Cc: song@kernel.org, yukuai@fnnas.com, linux-raid@vger.kernel.org,
	linux-kernel@vger.kernel.org, yangerkun@huawei.com,
	yi.zhang@huawei.com, 张同浩 Tonghao Zhang

On Thu Apr 30, 2026 at 9:54 AM CST, Xiao Ni wrote:
> Hi Nan
>
> On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> Convert all IO on the sync path to use folios, and rename page-related
>> identifiers to match folio.
>>
>> Since RESYNC_BLOCK_SIZE (64K) has higher allocation failure chance than 4k,
>> retry with lower orders to improve allocation reliability. A r1/10_bio may
>> have different rf->folio orders, so use minimum order as r1/10_bio sectors
>> to prevent exceeding size when adding folio to IO later.
>>
>> Clean up:
>> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
>> 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
>> 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
>> instead in r1buf_pool_alloc() and r10buf_pool_alloc().
>> 4. Clean up RESYNC_PAGES.
>> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
>> 6. Remove resync_free_folio(), call folio_put() directly.
>> 7. Clean up sync IO size calculation in raid1/10_sync_request.
>> >> Signed-off-by: Li Nan <linan122@huawei.com> >> --- >> drivers/md/md.c | 2 +- >> drivers/md/raid1-10.c | 80 ++++--------- >> drivers/md/raid1.c | 209 +++++++++++++++------------------- >> drivers/md/raid10.c | 254 +++++++++++++++++++++--------------------- >> 4 files changed, 240 insertions(+), 305 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 5e83914d5c14..6554b849ac74 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev) >> { >> /* >> * For raid456, sync IO is stripe(4k) per IO, for other levels, it's >> - * RESYNC_PAGES(64k) per IO. >> + * RESYNC_BLOCK_SIZE(64k) per IO. >> */ >> return atomic_read(&mddev->recovery_active) < >> (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev); >> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c >> index cda531d0720b..10200b0a3fd2 100644 >> --- a/drivers/md/raid1-10.c >> +++ b/drivers/md/raid1-10.c >> @@ -1,7 +1,6 @@ >> // SPDX-License-Identifier: GPL-2.0 >> /* Maximum size of each resync request */ >> #define RESYNC_BLOCK_SIZE (64*1024) >> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) >> #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) >> >> /* when we get a read error on a read-only array, we redirect to another >> @@ -20,9 +19,9 @@ >> #define MAX_PLUG_BIO 32 >> >> /* for managing resync I/O pages */ >> -struct resync_pages { >> +struct resync_folio { >> void *raid_bio; >> - struct page *pages[RESYNC_PAGES]; >> + struct folio *folio; >> }; >> >> struct raid1_plug_cb { >> @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data) >> kfree(rbio); >> } >> >> -static inline int resync_alloc_pages(struct resync_pages *rp, >> - gfp_t gfp_flags) >> +static inline int resync_alloc_folio(struct resync_folio *rf, >> + gfp_t gfp_flags, int *order) >> { >> - int i; >> + struct folio *folio; >> >> - for (i = 0; i < RESYNC_PAGES; i++) { >> - rp->pages[i] = alloc_page(gfp_flags); 
>> - if (!rp->pages[i])
>> - goto out_free;
>> - }
>> + do {
>> + folio = folio_alloc(gfp_flags, *order);
>> + if (folio)
>> + break;
>> + } while (--(*order) > 0);
>
> It has a problem here. If it can't allocate a big page, the sync
> request unit will be smaller and sync performance may decrease. This
> can happen when the system lacks sufficient continuous memory. This
> change looks good to me. I just want to throw this problem out for an
> open discussion.
>

Yeah, it can be easily reproduced in qemu. We have a few options:
1. Alloc smaller folio
2. Return -ENOMEM directly
3. Alloc multiple small folios to assemble a larger one. It is not a
good idea, as it will make the code much more complex.
IMO, 1 seems like the best choice.

>>
>> + if (!folio)
>> + return -ENOMEM;
>> +
>> + rf->folio = folio;
>> return 0;
>> -
>> -out_free:
>> - while (--i >= 0)
>> - put_page(rp->pages[i]);
>> - return -ENOMEM;
>> -}
>> -
>> -static inline void resync_free_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - put_page(rp->pages[i]);
>> -}
>> -
>> -static inline void resync_get_all_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - get_page(rp->pages[i]);
>> -}
>> -
>> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
>> - unsigned idx)
>> -{
>> - if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
>> - return NULL;
>> - return rp->pages[idx];
>> }
>>
>> /*
>> - * 'strct resync_pages' stores actual pages used for doing the resync
>> + * 'strct resync_folio' stores actual pages used for doing the resync
>> * IO, and it is per-bio, so make .bi_private points to it.
>> */ >> -static inline struct resync_pages *get_resync_pages(struct bio *bio) >> +static inline struct resync_folio *get_resync_folio(struct bio *bio) >> { >> return bio->bi_private; >> } >> >> /* generally called after bio_reset() for reseting bvec */ >> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp, >> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf, >> int size) >> { >> - int idx = 0; >> - >> /* initialize bvec table again */ >> - do { >> - struct page *page = resync_fetch_page(rp, idx); >> - int len = min_t(int, size, PAGE_SIZE); >> - >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> - bio->bi_status = BLK_STS_RESOURCE; >> - bio_endio(bio); >> - return; >> - } >> - >> - size -= len; >> - } while (idx++ < RESYNC_PAGES && size > 0); >> + if (WARN_ON(!bio_add_folio(bio, rf->folio, >> + min_t(int, size, RESYNC_BLOCK_SIZE), >> + 0))) { >> + bio->bi_status = BLK_STS_RESOURCE; >> + bio_endio(bio); >> + } >> } >> >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index a72abdc37a2d..724fd4f2cc3a 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi) >> >> /* >> * for resync bio, r1bio pointer can be retrieved from the per-bio >> - * 'struct resync_pages'. >> + * 'struct resync_folio'. >> */ >> static inline struct r1bio *get_resync_r1bio(struct bio *bio) >> { >> - return get_resync_pages(bio)->raid_bio; >> + return get_resync_folio(bio)->raid_bio; >> } >> >> static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf) >> @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data) >> struct r1conf *conf = data; >> struct r1bio *r1_bio; >> struct bio *bio; >> - int need_pages; >> + int need_folio; > > The name need_folio is confusing. Can we keep the same style as the > old version? How about need_folios? > Agree, I will rename it in v2. 
>> int j; >> - struct resync_pages *rps; >> + struct resync_folio *rfs; >> + int order = get_order(RESYNC_BLOCK_SIZE); >> >> r1_bio = r1bio_pool_alloc(gfp_flags, conf); >> if (!r1_bio) >> return NULL; >> >> - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages), >> + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio), >> gfp_flags); >> - if (!rps) >> + if (!rfs) >> goto out_free_r1bio; >> >> /* >> * Allocate bios : 1 for reading, n-1 for writing >> */ >> for (j = conf->raid_disks * 2; j-- ; ) { >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r1_bio->bios[j] = bio; >> } >> /* >> - * Allocate RESYNC_PAGES data pages and attach them to >> - * the first bio. >> + * Allocate data folio and attach it to the first bio. >> * If this is a user-requested check/repair, allocate >> - * RESYNC_PAGES for each bio. >> + * folio for each bio. 
>> */ >> if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery)) >> - need_pages = conf->raid_disks * 2; >> + need_folio = conf->raid_disks * 2; >> else >> - need_pages = 1; >> + need_folio = 1; >> for (j = 0; j < conf->raid_disks * 2; j++) { >> - struct resync_pages *rp = &rps[j]; >> + struct resync_folio *rf = &rfs[j]; >> >> - bio = r1_bio->bios[j]; >> - >> - if (j < need_pages) { >> - if (resync_alloc_pages(rp, gfp_flags)) >> - goto out_free_pages; >> + if (j < need_folio) { >> + if (resync_alloc_folio(rf, gfp_flags, &order)) >> + goto out_free_folio; >> } else { >> - memcpy(rp, &rps[0], sizeof(*rp)); >> - resync_get_all_pages(rp); >> + memcpy(rf, &rfs[0], sizeof(*rf)); >> + folio_get(rf->folio); >> } >> >> - rp->raid_bio = r1_bio; >> - bio->bi_private = rp; >> + rf->raid_bio = r1_bio; >> + r1_bio->bios[j]->bi_private = rf; >> } >> >> + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); >> r1_bio->master_bio = NULL; >> >> return r1_bio; >> >> -out_free_pages: >> +out_free_folio: >> while (--j >= 0) >> - resync_free_pages(&rps[j]); >> + folio_put(rfs[j].folio); >> >> out_free_bio: >> while (++j < conf->raid_disks * 2) { >> bio_uninit(r1_bio->bios[j]); >> kfree(r1_bio->bios[j]); >> } >> - kfree(rps); >> + kfree(rfs); >> >> out_free_r1bio: >> rbio_pool_free(r1_bio, data); >> @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data) >> struct r1conf *conf = data; >> int i; >> struct r1bio *r1bio = __r1_bio; >> - struct resync_pages *rp = NULL; >> + struct resync_folio *rf = NULL; >> >> for (i = conf->raid_disks * 2; i--; ) { >> - rp = get_resync_pages(r1bio->bios[i]); >> - resync_free_pages(rp); >> + rf = get_resync_folio(r1bio->bios[i]); >> + folio_put(rf->folio); >> bio_uninit(r1bio->bios[i]); >> kfree(r1bio->bios[i]); >> } >> >> - /* resync pages array stored in the 1st bio's .bi_private */ >> - kfree(rp); >> + /* resync folio stored in the 1st bio's .bi_private */ >> + kfree(rf); >> >> rbio_pool_free(r1bio, data); >> } >> @@ -2095,10 
+2094,10 @@ static void end_sync_write(struct bio *bio) >> put_sync_write_buf(r1_bio); >> } >> >> -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector, >> - int sectors, struct page *page, blk_opf_t rw) >> +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors, >> + int off, struct folio *folio, blk_opf_t rw) >> { >> - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false)) >> + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false)) >> /* success */ >> return 1; >> if (rw == REQ_OP_WRITE) { >> @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> struct mddev *mddev = r1_bio->mddev; >> struct r1conf *conf = mddev->private; >> struct bio *bio = r1_bio->bios[r1_bio->read_disk]; >> - struct page **pages = get_resync_pages(bio)->pages; >> + struct folio *folio = get_resync_folio(bio)->folio; >> sector_t sect = r1_bio->sector; >> int sectors = r1_bio->sectors; >> - int idx = 0; >> + int off = 0; >> struct md_rdev *rdev; >> >> rdev = conf->mirrors[r1_bio->read_disk].rdev; >> @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> * active, and resync is currently active >> */ >> rdev = conf->mirrors[d].rdev; >> - if (sync_page_io(rdev, sect, s<<9, >> - pages[idx], >> - REQ_OP_READ, false)) { >> + if (sync_folio_io(rdev, sect, s<<9, off, folio, >> + REQ_OP_READ, false)) { >> success = 1; >> break; >> } >> @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> /* Try next page */ >> sectors -= s; >> sect += s; >> - idx++; >> + off += s << 9; >> continue; >> } >> >> @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> if (r1_bio->bios[d]->bi_end_io != end_sync_read) >> continue; >> rdev = conf->mirrors[d].rdev; >> - if (r1_sync_page_io(rdev, sect, s, >> - pages[idx], >> + if (r1_sync_folio_io(rdev, sect, s, off, folio, >> REQ_OP_WRITE) == 0) { >> r1_bio->bios[d]->bi_end_io = NULL; >> rdev_dec_pending(rdev, mddev); >> @@ 
-2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> if (r1_bio->bios[d]->bi_end_io != end_sync_read) >> continue; >> rdev = conf->mirrors[d].rdev; >> - if (r1_sync_page_io(rdev, sect, s, >> - pages[idx], >> + if (r1_sync_folio_io(rdev, sect, s, off, folio, >> REQ_OP_READ) != 0) >> atomic_add(s, &rdev->corrected_errors); >> } >> sectors -= s; >> sect += s; >> - idx ++; >> + off += s << 9; >> } >> set_bit(R1BIO_Uptodate, &r1_bio->state); >> bio->bi_status = 0; >> @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio) >> struct r1conf *conf = mddev->private; >> int primary; >> int i; >> - int vcnt; >> >> /* Fix variable parts of all bios */ >> - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); >> for (i = 0; i < conf->raid_disks * 2; i++) { >> blk_status_t status; >> struct bio *b = r1_bio->bios[i]; >> - struct resync_pages *rp = get_resync_pages(b); >> + struct resync_folio *rf = get_resync_folio(b); >> if (b->bi_end_io != end_sync_read) >> continue; >> /* fixup the bio for reuse, but preserve errno */ >> @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio) >> b->bi_iter.bi_sector = r1_bio->sector + >> conf->mirrors[i].rdev->data_offset; >> b->bi_end_io = end_sync_read; >> - rp->raid_bio = r1_bio; >> - b->bi_private = rp; >> + rf->raid_bio = r1_bio; >> + b->bi_private = rf; >> >> /* initialize bvec table again */ >> - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9); >> + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9); >> } >> for (primary = 0; primary < conf->raid_disks * 2; primary++) >> if (r1_bio->bios[primary]->bi_end_io == end_sync_read && >> @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio) >> } >> r1_bio->read_disk = primary; >> for (i = 0; i < conf->raid_disks * 2; i++) { >> - int j = 0; >> struct bio *pbio = r1_bio->bios[primary]; >> struct bio *sbio = r1_bio->bios[i]; >> blk_status_t status = sbio->bi_status; >> - struct page **ppages = 
get_resync_pages(pbio)->pages; >> - struct page **spages = get_resync_pages(sbio)->pages; >> - struct bio_vec *bi; >> - int page_len[RESYNC_PAGES] = { 0 }; >> - struct bvec_iter_all iter_all; >> + struct folio *pfolio = get_resync_folio(pbio)->folio; >> + struct folio *sfolio = get_resync_folio(sbio)->folio; >> >> if (sbio->bi_end_io != end_sync_read) >> continue; >> /* Now we can 'fixup' the error value */ >> sbio->bi_status = 0; >> >> - bio_for_each_segment_all(bi, sbio, iter_all) >> - page_len[j++] = bi->bv_len; >> - >> - if (!status) { >> - for (j = vcnt; j-- ; ) { >> - if (memcmp(page_address(ppages[j]), >> - page_address(spages[j]), >> - page_len[j])) >> - break; >> - } >> - } else >> - j = 0; >> - if (j >= 0) >> + /* >> + * Copy data and submit write in two cases: >> + * - IO error (non-zero status) >> + * - Data inconsistency and not a CHECK operation. >> + */ >> + if (status) { >> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); >> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery) >> - && !status)) { >> - /* No need to write to this device. */ >> - sbio->bi_end_io = NULL; >> - rdev_dec_pending(conf->mirrors[i].rdev, mddev); >> + bio_copy_data(sbio, pbio); >> continue; >> + } else if (memcmp(folio_address(pfolio), >> + folio_address(sfolio), >> + r1_bio->sectors << 9)) { >> + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); >> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { >> + bio_copy_data(sbio, pbio); >> + continue; >> + } >> } >> >> - bio_copy_data(sbio, pbio); >> + /* No need to write to this device. 
*/ >> + sbio->bi_end_io = NULL; >> + rdev_dec_pending(conf->mirrors[i].rdev, mddev); >> } >> } >> >> @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) >> if (rdev && >> !test_bit(Faulty, &rdev->flags)) { >> atomic_inc(&rdev->nr_pending); >> - r1_sync_page_io(rdev, sect, s, >> - folio_page(conf->tmpfolio, 0), >> - REQ_OP_WRITE); >> + r1_sync_folio_io(rdev, sect, s, 0, >> + conf->tmpfolio, REQ_OP_WRITE); >> rdev_dec_pending(rdev, mddev); >> } >> } >> @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) >> if (rdev && >> !test_bit(Faulty, &rdev->flags)) { >> atomic_inc(&rdev->nr_pending); >> - if (r1_sync_page_io(rdev, sect, s, >> - folio_page(conf->tmpfolio, 0), >> - REQ_OP_READ)) { >> + if (r1_sync_folio_io(rdev, sect, s, 0, >> + conf->tmpfolio, REQ_OP_READ)) { >> atomic_add(s, &rdev->corrected_errors); >> pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n", >> mdname(mddev), s, >> @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf) >> static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf) >> { >> struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO); >> - struct resync_pages *rps; >> + struct resync_folio *rfs; >> struct bio *bio; >> int i; >> >> for (i = conf->raid_disks * 2; i--; ) { >> bio = r1bio->bios[i]; >> - rps = bio->bi_private; >> + rfs = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rps; >> + bio->bi_private = rfs; >> } >> r1bio->master_bio = NULL; >> return r1bio; >> @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> int write_targets = 0, read_targets = 0; >> sector_t sync_blocks; >> bool still_degraded = false; >> - int good_sectors = RESYNC_SECTORS; >> + int good_sectors; >> int min_bad = 0; /* number of sectors that are bad in all devices */ >> int idx = sector_to_idx(sector_nr); >> - int page_idx = 0; >> >> if 
(!mempool_initialized(&conf->r1buf_pool)) >> if (init_resync(conf)) >> @@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> r1_bio->sector = sector_nr; >> r1_bio->state = 0; >> set_bit(R1BIO_IsSync, &r1_bio->state); >> - /* make sure good_sectors won't go across barrier unit boundary */ >> - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors); >> + /* >> + * make sure good_sectors won't go across barrier unit boundary. >> + * r1_bio->sectors <= RESYNC_SECTORS. >> + */ >> + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors); >> >> for (i = 0; i < conf->raid_disks * 2; i++) { >> struct md_rdev *rdev; >> @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> max_sector = mddev->resync_max; /* Don't do IO beyond here */ >> if (max_sector > sector_nr + good_sectors) >> max_sector = sector_nr + good_sectors; >> - nr_sectors = 0; >> - sync_blocks = 0; >> do { >> - struct page *page; >> - int len = PAGE_SIZE; >> - if (sector_nr + (len>>9) > max_sector) >> - len = (max_sector - sector_nr) << 9; >> - if (len == 0) >> + nr_sectors = max_sector - sector_nr; >> + if (nr_sectors == 0) >> break; >> - if (sync_blocks == 0) { >> - if (!md_bitmap_start_sync(mddev, sector_nr, >> - &sync_blocks, still_degraded) && >> - !conf->fullsync && >> - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) >> - break; >> - if ((len >> 9) > sync_blocks) >> - len = sync_blocks<<9; >> - } >> + if (!md_bitmap_start_sync(mddev, sector_nr, >> + &sync_blocks, still_degraded) && >> + !conf->fullsync && >> + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) >> + break; >> + if (nr_sectors > sync_blocks) >> + nr_sectors = sync_blocks; >> >> for (i = 0 ; i < conf->raid_disks * 2; i++) { >> - struct resync_pages *rp; >> - >> bio = r1_bio->bios[i]; >> - rp = get_resync_pages(bio); >> if (bio->bi_end_io) { >> - page = resync_fetch_page(rp, page_idx); >> + struct resync_folio *rf = 
get_resync_folio(bio); >> >> - /* >> - * won't fail because the vec table is big >> - * enough to hold all these pages >> - */ >> - __bio_add_page(bio, page, len, 0); >> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0); >> } >> } >> - nr_sectors += len>>9; >> - sector_nr += len>>9; >> - sync_blocks -= (len>>9); >> - } while (++page_idx < RESYNC_PAGES); >> + sector_nr += nr_sectors; >> + } while (0); > > Now it can handle all pages in one go via a folio. It's strange to > keep while(0) here. > I tried cleaning up the while(0), but it made the 'if' and 'break' statements unreadable. So I kept while(0) here. > >> >> r1_bio->sectors = nr_sectors; > > > This patch is a little big. Is it better to split this patch here? > It can't be split. The changes in raid1.c and raid10.c are entirely about resync_pages -> resync_folio. We have to change the declaration and its usage in one patch. >> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >> index 26f93040cd13..3638e00fe420 100644 >> --- a/drivers/md/raid10.c >> +++ b/drivers/md/raid10.c >> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf); >> >> /* >> * for resync bio, r10bio pointer can be retrieved from the per-bio >> - * 'struct resync_pages'. >> + * 'struct resync_folio'. 
>> */ >> static inline struct r10bio *get_resync_r10bio(struct bio *bio) >> { >> - return get_resync_pages(bio)->raid_bio; >> + return get_resync_folio(bio)->raid_bio; >> } >> >> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) >> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> struct r10bio *r10_bio; >> struct bio *bio; >> int j; >> - int nalloc, nalloc_rp; >> - struct resync_pages *rps; >> + int nalloc, nalloc_rf; >> + struct resync_folio *rfs; >> + int order = get_order(RESYNC_BLOCK_SIZE); >> >> r10_bio = r10bio_pool_alloc(gfp_flags, conf); >> if (!r10_bio) >> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> >> /* allocate once for all bios */ >> if (!conf->have_replacement) >> - nalloc_rp = nalloc; >> + nalloc_rf = nalloc; >> else >> - nalloc_rp = nalloc * 2; >> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags); >> - if (!rps) >> + nalloc_rf = nalloc * 2; >> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags); >> + if (!rfs) >> goto out_free_r10bio; >> >> /* >> * Allocate bios. >> */ >> for (j = nalloc ; j-- ; ) { >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r10_bio->devs[j].bio = bio; >> if (!conf->have_replacement) >> continue; >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r10_bio->devs[j].repl_bio = bio; >> } >> /* >> - * Allocate RESYNC_PAGES data pages and attach them >> - * where needed. >> + * Allocate data folio and attach it where needed. 
>> */ >> for (j = 0; j < nalloc; j++) { >> struct bio *rbio = r10_bio->devs[j].repl_bio; >> - struct resync_pages *rp, *rp_repl; >> + struct resync_folio *rf, *rf_repl; >> >> - rp = &rps[j]; >> + rf = &rfs[j]; >> if (rbio) >> - rp_repl = &rps[nalloc + j]; >> - >> - bio = r10_bio->devs[j].bio; >> + rf_repl = &rfs[nalloc + j]; >> >> if (!j || test_bit(MD_RECOVERY_SYNC, >> &conf->mddev->recovery)) { >> - if (resync_alloc_pages(rp, gfp_flags)) >> - goto out_free_pages; >> + if (resync_alloc_folio(rf, gfp_flags, &order)) >> + goto out_free_folio; >> } else { >> - memcpy(rp, &rps[0], sizeof(*rp)); >> - resync_get_all_pages(rp); >> + memcpy(rf, &rfs[0], sizeof(*rf)); >> + folio_get(rf->folio); >> } >> >> - rp->raid_bio = r10_bio; >> - bio->bi_private = rp; >> + rf->raid_bio = r10_bio; >> + r10_bio->devs[j].bio->bi_private = rf; >> if (rbio) { >> - memcpy(rp_repl, rp, sizeof(*rp)); >> - rbio->bi_private = rp_repl; >> + memcpy(rf_repl, rf, sizeof(*rf)); >> + rbio->bi_private = rf_repl; >> } >> } >> >> + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); >> return r10_bio; >> >> -out_free_pages: >> +out_free_folio: >> while (--j >= 0) >> - resync_free_pages(&rps[j]); >> + folio_put(rfs[j].folio); >> >> j = 0; >> out_free_bio: >> @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> bio_uninit(r10_bio->devs[j].repl_bio); >> kfree(r10_bio->devs[j].repl_bio); >> } >> - kfree(rps); >> + kfree(rfs); >> out_free_r10bio: >> rbio_pool_free(r10_bio, conf); >> return NULL; >> @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) >> struct r10conf *conf = data; >> struct r10bio *r10bio = __r10_bio; >> int j; >> - struct resync_pages *rp = NULL; >> + struct resync_folio *rf = NULL; >> >> for (j = conf->copies; j--; ) { >> struct bio *bio = r10bio->devs[j].bio; >> >> if (bio) { >> - rp = get_resync_pages(bio); >> - resync_free_pages(rp); >> + rf = get_resync_folio(bio); >> + folio_put(rf->folio); >> bio_uninit(bio); >> 
kfree(bio); >> } >> @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) >> } >> >> /* resync pages array stored in the 1st bio's .bi_private */ >> - kfree(rp); >> + kfree(rf); >> >> rbio_pool_free(r10bio, conf); >> } >> @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> struct r10conf *conf = mddev->private; >> int i, first; >> struct bio *tbio, *fbio; >> - int vcnt; >> - struct page **tpages, **fpages; >> + struct folio *tfolio, *ffolio; >> >> atomic_set(&r10_bio->remaining, 1); >> >> @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> fbio = r10_bio->devs[i].bio; >> fbio->bi_iter.bi_size = r10_bio->sectors << 9; >> fbio->bi_iter.bi_idx = 0; >> - fpages = get_resync_pages(fbio)->pages; >> + ffolio = get_resync_folio(fbio)->folio; >> >> - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); >> /* now find blocks with errors */ >> for (i=0 ; i < conf->copies ; i++) { >> - int j, d; >> + int d; >> struct md_rdev *rdev; >> - struct resync_pages *rp; >> + struct resync_folio *rf; >> >> tbio = r10_bio->devs[i].bio; >> >> @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> if (i == first) >> continue; >> >> - tpages = get_resync_pages(tbio)->pages; >> + tfolio = get_resync_folio(tbio)->folio; >> d = r10_bio->devs[i].devnum; >> rdev = conf->mirrors[d].rdev; >> if (!r10_bio->devs[i].bio->bi_status) { >> /* We know that the bi_io_vec layout is the same for >> * both 'first' and 'i', so we just compare them. 
>> - * All vec entries are PAGE_SIZE; >> */ >> - int sectors = r10_bio->sectors; >> - for (j = 0; j < vcnt; j++) { >> - int len = PAGE_SIZE; >> - if (sectors < (len / 512)) >> - len = sectors * 512; >> - if (memcmp(page_address(fpages[j]), >> - page_address(tpages[j]), >> - len)) >> - break; >> - sectors -= len/512; >> + if (memcmp(folio_address(ffolio), >> + folio_address(tfolio), >> + r10_bio->sectors << 9)) { >> + atomic64_add(r10_bio->sectors, >> + &mddev->resync_mismatches); >> + if (test_bit(MD_RECOVERY_CHECK, >> + &mddev->recovery)) >> + /* Don't fix anything. */ >> + continue; >> } >> - if (j == vcnt) >> - continue; >> - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); >> - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) >> - /* Don't fix anything. */ >> - continue; >> } else if (test_bit(FailFast, &rdev->flags)) { >> /* Just give up on this device */ >> md_error(rdev->mddev, rdev); >> @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> * First we need to fixup bv_offset, bv_len and >> * bi_vecs, as the read request might have corrupted these >> */ >> - rp = get_resync_pages(tbio); >> + rf = get_resync_folio(tbio); >> bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); >> >> - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); >> + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); >> >> - rp->raid_bio = r10_bio; >> - tbio->bi_private = rp; >> + rf->raid_bio = r10_bio; >> + tbio->bi_private = rf; >> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; >> tbio->bi_end_io = end_sync_write; >> >> @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> struct bio *bio = r10_bio->devs[0].bio; >> sector_t sect = 0; >> int sectors = r10_bio->sectors; >> - int idx = 0; >> int dr = r10_bio->devs[0].devnum; >> int dw = r10_bio->devs[1].devnum; >> - struct page **pages = get_resync_pages(bio)->pages; >> + struct folio *folio = 
get_resync_folio(bio)->folio; >> >> while (sectors) { >> int s = sectors; >> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> >> rdev = conf->mirrors[dr].rdev; >> addr = r10_bio->devs[0].addr + sect; >> - ok = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_READ, false); >> + ok = sync_folio_io(rdev, >> + addr, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_READ, false); >> if (ok) { >> rdev = conf->mirrors[dw].rdev; >> addr = r10_bio->devs[1].addr + sect; >> - ok = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_WRITE, false); >> + ok = sync_folio_io(rdev, >> + addr, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_WRITE, false); >> if (!ok) { >> set_bit(WriteErrorSeen, &rdev->flags); >> if (!test_and_set_bit(WantReplacement, >> @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> >> sectors -= s; >> sect += s; >> - idx++; >> } >> } >> >> @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf) >> static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) >> { >> struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); >> - struct rsync_pages *rp; >> + struct resync_folio *rf; >> struct bio *bio; >> int nalloc; >> int i; >> @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) >> >> for (i = 0; i < nalloc; i++) { >> bio = r10bio->devs[i].bio; >> - rp = bio->bi_private; >> + rf = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rp; >> + bio->bi_private = rf; >> bio = r10bio->devs[i].repl_bio; >> if (bio) { >> - rp = bio->bi_private; >> + rf = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rp; >> + bio->bi_private = rf; >> } >> } >> return r10bio; >> @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> int max_sync = RESYNC_SECTORS; >> sector_t sync_blocks; >> sector_t chunk_mask = 
conf->geo.chunk_mask; >> - int page_idx = 0; >> >> /* >> * Allow skipping a full rebuild for incremental assembly >> @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> continue; >> } >> } >> + >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in >> + * resync_alloc_folio(). Fall back to smaller sync >> + * size if needed. >> + */ >> + if (max_sync > r10_bio->sectors) >> + max_sync = r10_bio->sectors; >> + >> any_working = 1; >> bio = r10_bio->devs[0].bio; >> bio->bi_next = biolist; >> @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> } >> if (sync_blocks < max_sync) >> max_sync = sync_blocks; >> + >> r10_bio = raid10_alloc_init_r10buf(conf); >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). >> + * Fall back to smaller sync size if needed. >> + */ >> + if (max_sync > r10_bio->sectors) >> + max_sync = r10_bio->sectors; >> + >> r10_bio->state = 0; >> >> r10_bio->mddev = mddev; >> @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> } >> } >> >> - nr_sectors = 0; >> if (sector_nr + max_sync < max_sector) >> max_sector = sector_nr + max_sync; >> do { >> - struct page *page; >> - int len = PAGE_SIZE; >> - if (sector_nr + (len>>9) > max_sector) >> - len = (max_sector - sector_nr) << 9; >> - if (len == 0) >> + nr_sectors = max_sector - sector_nr; >> + >> + if (nr_sectors == 0) >> break; >> for (bio= biolist ; bio ; bio=bio->bi_next) { >> - struct resync_pages *rp = get_resync_pages(bio); >> - page = resync_fetch_page(rp, page_idx); >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> + struct resync_folio *rf = get_resync_folio(bio); >> + >> + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { >> bio->bi_status = BLK_STS_RESOURCE; >> bio_endio(bio); >> *skipped = 1; >> - return max_sync; >> + return nr_sectors << 9; >> } >> } >> - nr_sectors += len>>9; >> - 
sector_nr += len>>9; >> - } while (++page_idx < RESYNC_PAGES); >> + sector_nr += nr_sectors; >> + } while (0); >> r10_bio->sectors = nr_sectors; >> >> if (mddev_is_clustered(mddev) && >> @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> int *skipped) >> { >> /* We simply copy at most one chunk (smallest of old and new) >> - * at a time, possibly less if that exceeds RESYNC_PAGES, >> + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, >> * or we hit a bad block or something. >> * This might mean we pause for normal IO in the middle of >> * a chunk, but that is not a problem as mddev->reshape_position >> @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> struct r10bio *r10_bio; >> sector_t next, safe, last; >> int max_sectors; >> - int nr_sectors; >> int s; >> struct md_rdev *rdev; >> int need_flush = 0; >> struct bio *blist; >> struct bio *bio, *read_bio; >> int sectors_done = 0; >> - struct page **pages; >> + struct folio *folio; >> >> if (sector_nr == 0) { >> /* If restarting in the middle, skip the initial sectors */ >> @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> r10_bio->mddev = mddev; >> r10_bio->sector = sector_nr; >> set_bit(R10BIO_IsReshape, &r10_bio->state); >> - r10_bio->sectors = last - sector_nr + 1; >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in >> + * resync_alloc_folio(). Fall back to smaller sync >> + * size if needed. 
>> + */ >> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1); >> rdev = read_balance(conf, r10_bio, &max_sectors); >> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state)); >> >> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> return sectors_done; >> } >> >> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ, >> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ, >> GFP_KERNEL, &mddev->bio_set); >> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr >> + rdev->data_offset); >> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> blist = b; >> } >> >> - /* Now add as many pages as possible to all of these bios. */ >> + /* Now add folio to all of these bios. */ >> >> - nr_sectors = 0; >> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; >> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) { >> - struct page *page = pages[s / (PAGE_SIZE >> 9)]; >> - int len = (max_sectors - s) << 9; >> - if (len > PAGE_SIZE) >> - len = PAGE_SIZE; >> - for (bio = blist; bio ; bio = bio->bi_next) { >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> - bio->bi_status = BLK_STS_RESOURCE; >> - bio_endio(bio); >> - return sectors_done; >> - } >> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; >> + for (bio = blist; bio ; bio = bio->bi_next) { >> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) { >> + bio->bi_status = BLK_STS_RESOURCE; >> + bio_endio(bio); >> + return sectors_done; > > In fact, the original code doesn't clean up before returning. > bio_add_folio_nofail is used in raid1; can we use > bio_add_folio_nofail here as well? > Agree, I will clean it up before this patch. 
>> } >> - sector_nr += len >> 9; >> - nr_sectors += len >> 9; >> } >> - r10_bio->sectors = nr_sectors; >> + r10_bio->sectors = max_sectors >> 9; >> >> /* Now submit the read */ >> atomic_inc(&r10_bio->remaining); >> read_bio->bi_next = NULL; >> submit_bio_noacct(read_bio); >> - sectors_done += nr_sectors; >> + sectors_done += max_sectors; >> if (sector_nr <= last) >> goto read_more; >> >> @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, >> struct r10conf *conf = mddev->private; >> struct r10bio *r10b; >> int slot = 0; >> - int idx = 0; >> - struct page **pages; >> + int sect = 0; >> + struct folio *folio; >> >> r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); >> if (!r10b) { >> @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, >> return -ENOMEM; >> } >> >> - /* reshape IOs share pages from .devs[0].bio */ >> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; >> + /* reshape IOs share folio from .devs[0].bio */ >> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; >> >> r10b->sector = r10_bio->sector; >> __raid10_find_phys(&conf->prev, r10b); >> @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, >> while (!success) { >> int d = r10b->devs[slot].devnum; >> struct md_rdev *rdev = conf->mirrors[d].rdev; >> - sector_t addr; >> if (rdev == NULL || >> test_bit(Faulty, &rdev->flags) || >> !test_bit(In_sync, &rdev->flags)) >> goto failed; >> >> - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; >> atomic_inc(&rdev->nr_pending); >> - success = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_READ, false); >> + success = sync_folio_io(rdev, >> + r10b->devs[slot].addr + >> + sect, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_READ, false); >> rdev_dec_pending(rdev, mddev); >> if (success) >> break; >> @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, >> return -EIO; >> } >> sectors -= s; >> - idx++; 
>> + sect += s; >> } >> kfree(r10b); >> return 0; >> -- >> 2.39.2 >> >> > > Regards > Xiao -- Thanks Nan ^ permalink raw reply [flat|nested] 13+ messages in thread
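The do { ... } while (0) construct debated in the review above can be illustrated outside the kernel. The following is a minimal userspace sketch with made-up names and values (fill_one_chunk is not driver code): a block that executes exactly once, which several early-exit conditions can leave via break instead of goto, mirroring the shape raid1_sync_request() keeps after the per-page loop collapsed into a single folio pass.

```c
/*
 * Illustrative sketch only, not kernel code: the single-pass
 * do { ... } while (0) block that replaces the old per-page loop.
 * Several conditions can bail out early via break; with one folio
 * covering the whole window, the loop body never needs to repeat.
 */
static long fill_one_chunk(long sector_nr, long max_sector, long sync_blocks)
{
	long nr_sectors = 0;

	do {
		long want = max_sector - sector_nr;

		if (want == 0)
			break;		/* window is empty */
		if (sync_blocks == 0)
			break;		/* nothing needs syncing here */
		/* cap the chunk at what the bitmap allows */
		nr_sectors = want < sync_blocks ? want : sync_blocks;
	} while (0);

	return nr_sectors;
}
```

As the reply notes, once all data fits one folio the loop never iterates, but keeping while (0) preserves the break-based control flow without restructuring the if statements.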
* [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity 2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666 ` (5 preceding siblings ...) 2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666 @ 2026-04-16 3:38 ` linan666 2026-04-30 2:22 ` Xiao Ni 2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666 7 siblings, 1 reply; 13+ messages in thread From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw) To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang From: Li Nan <linan122@huawei.com> RAID1 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a smaller granularity can handle more errors, and RAID will support logical block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. Switch the IO error fix granularity to logical block size. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> --- drivers/md/raid1.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 724fd4f2cc3a..de8c964ca11d 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) { /* Try some synchronous reads of other devices to get * good data, much like with normal read errors. Only - * read into the pages we already have so we don't + * read into the block we already have so we don't * need to re-issue the read request. 
* We don't need to freeze the array, because being in an * active sync request, there is no normal IO, and @@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int d = r1_bio->read_disk; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; do { if (r1_bio->bios[d]->bi_end_io == end_sync_read) { /* No rcu protection needed here devices @@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) if (abort) return 0; - /* Try next page */ + /* Try next block */ sectors -= s; sect += s; off += s << 9; @@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int d = read_disk; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - do { rdev = conf->mirrors[d].rdev; if (rdev && -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity 2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666 @ 2026-04-30 2:22 ` Xiao Ni 0 siblings, 0 replies; 13+ messages in thread From: Xiao Ni @ 2026-04-30 2:22 UTC (permalink / raw) To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote: > > From: Li Nan <linan122@huawei.com> > > RAID1 currently fixes IO error at PAGE_SIZE granularity. Fix at smaller > granularity can handle more errors, and RAID will support logical block > sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. > > Switch IO error fix granularity to logical block size. > > Signed-off-by: Li Nan <linan122@huawei.com> > Reviewed-by: Yu Kuai <yukuai@fnnas.com> > --- > drivers/md/raid1.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 724fd4f2cc3a..de8c964ca11d 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > { > /* Try some synchronous reads of other devices to get > * good data, much like with normal read errors. Only > - * read into the pages we already have so we don't > + * read into the block we already have so we don't > * need to re-issue the read request. 
> * We don't need to freeze the array, because being in an > * active sync request, there is no normal IO, and > @@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > } > > while(sectors) { > - int s = sectors; > + int s = min_t(int, sectors, mddev->logical_block_size >> 9); > int d = r1_bio->read_disk; > int success = 0; > int start; > > - if (s > (PAGE_SIZE>>9)) > - s = PAGE_SIZE >> 9; > do { > if (r1_bio->bios[d]->bi_end_io == end_sync_read) { > /* No rcu protection needed here devices > @@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (abort) > return 0; > > - /* Try next page */ > + /* Try next block */ > sectors -= s; > sect += s; > off += s << 9; > @@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > } > > while(sectors) { > - int s = sectors; > + int s = min_t(int, sectors, mddev->logical_block_size >> 9); > int d = read_disk; > int success = 0; > int start; > > - if (s > (PAGE_SIZE>>9)) > - s = PAGE_SIZE >> 9; > - > do { > rdev = conf->mirrors[d].rdev; > if (rdev && > -- > 2.39.2 > > This patch looks good to me. Reviewed-by: Xiao Ni <xni@redhat.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
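The min_t() granularity switch reviewed above can be sanity-checked in isolation. Below is a hedged userspace sketch (chunks_needed is a made-up helper; mddev->logical_block_size is modeled as a plain byte-size parameter, and sectors are 512 bytes): it counts how many independently retryable chunks cover a failed run, which is the quantity the patch improves.

```c
/*
 * Illustrative sketch, not driver code: walk a run of 512-byte
 * sectors in logical-block-size steps, as patches 7/8 do, instead
 * of PAGE_SIZE steps. Each iteration models one retryable sync IO.
 */
static int chunks_needed(int sectors, unsigned int lbs_bytes)
{
	int lbs_sectors = lbs_bytes >> 9;	/* bytes -> 512B sectors */
	int chunks = 0;

	while (sectors) {
		/* mirrors: s = min_t(int, sectors, logical_block_size >> 9) */
		int s = sectors < lbs_sectors ? sectors : lbs_sectors;

		sectors -= s;	/* the driver also does sect += s, off += s << 9 */
		chunks++;
	}
	return chunks;
}
```

For a 64K resync window (128 sectors), a 512-byte logical block size yields 128 retryable chunks versus 16 with 4K pages, which is why the finer granularity can salvage more data around a single bad block, and why it keeps working once the logical block size exceeds PAGE_SIZE.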
* [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity 2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666 ` (6 preceding siblings ...) 2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666 @ 2026-04-16 3:38 ` linan666 2026-04-30 2:23 ` Xiao Ni 7 siblings, 1 reply; 13+ messages in thread From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw) To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang From: Li Nan <linan122@huawei.com> RAID10 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a smaller granularity can handle more errors, and RAID will support logical block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. Switch the IO error fix granularity to logical block size. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> --- drivers/md/raid10.c | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 3638e00fe420..5b4ffd23211a 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) static void fix_recovery_read_error(struct r10bio *r10_bio) { /* We got a read error during recovery. - * We repeat the read in smaller page-sized sections. + * We repeat the read in smaller logical_block_sized sections. * If a read succeeds, write it to the new device or record * a bad block if we cannot. 
* If a read fails, record a bad block on both old and @@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) struct folio *folio = get_resync_folio(bio)->folio; while (sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); struct md_rdev *rdev; sector_t addr; int ok; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - rdev = conf->mirrors[dr].rdev; addr = r10_bio->devs[0].addr + sect; ok = sync_folio_io(rdev, @@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int sl = slot; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - do { d = r10_bio->devs[sl].devnum; rdev = conf->mirrors[d].rdev; @@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev, __raid10_find_phys(&conf->prev, r10b); while (sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int success = 0; int first_slot = slot; - if (s > (PAGE_SIZE >> 9)) - s = PAGE_SIZE >> 9; - while (!success) { int d = r10b->devs[slot].devnum; struct md_rdev *rdev = conf->mirrors[d].rdev; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity
  2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: " linan666
@ 2026-04-30  2:23   ` Xiao Ni
  0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2026-04-30  2:23 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang

On Thu, Apr 16, 2026 at 11:51 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> RAID10 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a
> smaller granularity can handle more errors, and RAID will support logical
> block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO would
> fail.
>
> Switch the IO error fix granularity to the logical block size.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  drivers/md/raid10.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3638e00fe420..5b4ffd23211a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>  static void fix_recovery_read_error(struct r10bio *r10_bio)
>  {
>  	/* We got a read error during recovery.
> -	 * We repeat the read in smaller page-sized sections.
> +	 * We repeat the read in smaller logical_block_sized sections.
> 	 * If a read succeeds, write it to the new device or record
> 	 * a bad block if we cannot.
> 	 * If a read fails, record a bad block on both old and
> @@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>  	struct folio *folio = get_resync_folio(bio)->folio;
>
>  	while (sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		struct md_rdev *rdev;
>  		sector_t addr;
>  		int ok;
>
> -		if (s > (PAGE_SIZE>>9))
> -			s = PAGE_SIZE >> 9;
> -
>  		rdev = conf->mirrors[dr].rdev;
>  		addr = r10_bio->devs[0].addr + sect;
>  		ok = sync_folio_io(rdev,
> @@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>  	}
>
>  	while(sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		int sl = slot;
>  		int success = 0;
>  		int start;
>
> -		if (s > (PAGE_SIZE>>9))
> -			s = PAGE_SIZE >> 9;
> -
>  		do {
>  			d = r10_bio->devs[sl].devnum;
>  			rdev = conf->mirrors[d].rdev;
> @@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
>  	__raid10_find_phys(&conf->prev, r10b);
>
>  	while (sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		int success = 0;
>  		int first_slot = slot;
>
> -		if (s > (PAGE_SIZE >> 9))
> -			s = PAGE_SIZE >> 9;
> -
>  		while (!success) {
>  			int d = r10b->devs[slot].devnum;
>  			struct md_rdev *rdev = conf->mirrors[d].rdev;
> --
> 2.39.2
>
>

This patch looks good to me.

Reviewed-by: Xiao Ni <xni@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-05-07  7:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
2026-04-30  1:54   ` Xiao Ni
2026-05-07  7:13     ` 李楠 Magic Li
2026-04-16  3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
2026-04-30  2:22   ` Xiao Ni
2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: " linan666
2026-04-30  2:23   ` Xiao Ni