* [PATCH v3 0/8] folio support for sync I/O in RAID
@ 2026-04-16 3:37 linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
This patchset adds folio support to sync operations in raid1/10.
Previously, we used 16 * 4K pages for 64K sync I/O. With this change,
we'll use a single 64K folio instead. Using folios reduces
resync/recovery time by 20% on HDD.
This is the first step towards full folio support in RAID. Going forward,
I will replace the remaining page-based usage with folios.
The patchset was tested with mdadm. Additional fault injection stress
tests were run with file systems on top.
v3:
- In patches 3/4/5, introduce safe_folio_put and use it for tmpfolio.
- Merge the cleanup patch into patch 6.
v2:
- Remove patch "md: use folio for bb_folio". It will be included in
a later patch set
- In patch 5:
1) fix typo
2) rewrite the logic of copying data in process_checks()
3) rename resync_get_all_folio() to resync_get_folio()
4) s/resync_pages *rps/resync_folio *rfs/g in
raid1_alloc_init_r1buf() and raid10_alloc_init_r10buf()
- Subsequent patches: adapt to conflicts caused by patch 5
Li Nan (8):
md/raid1,raid10: clean up of RESYNC_SECTORS
md: introduce sync_folio_io for folio support in RAID
md: introduce safe_put_folio for folio support in RAID
md/raid1: use folio for tmppage
md/raid10: use folio for tmppage
md/raid1,raid10: use folio for sync path IO
md/raid1: fix IO error at logical block size granularity
md/raid10: fix IO error at logical block size granularity
 drivers/md/md.h       |  10 +-
 drivers/md/raid1.h    |   2 +-
 drivers/md/raid10.h   |   2 +-
 drivers/md/md.c       |  17 ++-
 drivers/md/raid1-10.c |  81 ++++-------
 drivers/md/raid1.c    | 233 ++++++++++++++-----------------
 drivers/md/raid10.c   | 312 ++++++++++++++++++++----------------
 7 files changed, 297 insertions(+), 360 deletions(-)
--
2.39.2
^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
  ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Move redundant RESYNC_SECTORS definition from raid1 and raid10
implementations to raid1-10.c. Simplify max_sync assignment in
raid10_sync_request(). No functional changes.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/raid1-10.c | 1 +
 drivers/md/raid1.c    | 1 -
 drivers/md/raid10.c   | 4 +---
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index c33099925f23..cda531d0720b 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -2,6 +2,7 @@
 /* Maximum size of each resync request */
 #define RESYNC_BLOCK_SIZE (64*1024)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 
 /* when we get a read error on a read-only array, we redirect to another
  * device without failing the first device, or trying to over-write to
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 867db18bc3ba..5a73a9f19e0e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -136,7 +136,6 @@ static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
 }
 
 #define RESYNC_DEPTH 32
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
 #define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
 #define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b4892c5d571c..90c1036f6ec4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -113,7 +113,6 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 	return kzalloc(size, gfp_flags);
 }
 
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 /* amount of memory to reserve for resync requests */
 #define RESYNC_WINDOW (1024*1024)
 /* maximum number of concurrent requests, memory permitting */
@@ -3153,7 +3152,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	struct bio *biolist = NULL, *bio;
 	sector_t nr_sectors;
 	int i;
-	int max_sync;
+	int max_sync = RESYNC_SECTORS;
 	sector_t sync_blocks;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
@@ -3266,7 +3265,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	 * end_sync_write if we will want to write.
 	 */
 
-	max_sync = RESYNC_PAGES << (PAGE_SHIFT-9);
 	if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 		/* recovery... the complicated one */
 		int j;
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
  ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Prepare for folio support in RAID by introducing sync_folio_io(),
matching sync_page_io()'s functionality. Differences are:
 - Add new parameter 'off' to prepare for adding a folio to bio in
   segments, e.g. in fix_recovery_read_error()
 - Change return value to bool
 - Replace the success check with 'bio.bi_status == BLK_STS_OK'

sync_page_io() will be removed once full folio support is complete.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h |  4 +++-
 drivers/md/md.c | 15 +++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..914b992a073b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -924,8 +924,10 @@ void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
 		       sector_t sector, int size, struct page *page,
 		       unsigned int offset);
 extern int md_super_wait(struct mddev *mddev);
-extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+extern bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 		struct page *page, blk_opf_t opf, bool metadata_op);
+extern bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size,
+		int off, struct folio *folio, blk_opf_t opf, bool metadata_op);
 extern void md_do_sync(struct md_thread *thread);
 extern void md_new_event(void);
 extern void md_allow_write(struct mddev *mddev);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d9c9fd2839b3..5e83914d5c14 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1166,8 +1166,8 @@ int md_super_wait(struct mddev *mddev)
 	return 0;
 }
 
-int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
-		 struct page *page, blk_opf_t opf, bool metadata_op)
+bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size, int off,
+		struct folio *folio, blk_opf_t opf, bool metadata_op)
 {
 	struct bio bio;
 	struct bio_vec bvec;
@@ -1185,11 +1185,18 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 		bio.bi_iter.bi_sector = sector + rdev->new_data_offset;
 	else
 		bio.bi_iter.bi_sector = sector + rdev->data_offset;
-	__bio_add_page(&bio, page, size, 0);
+	bio_add_folio_nofail(&bio, folio, size, off);
 	submit_bio_wait(&bio);
 
-	return !bio.bi_status;
+	return bio.bi_status == BLK_STS_OK;
+}
+EXPORT_SYMBOL_GPL(sync_folio_io);
+
+bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+		struct page *page, blk_opf_t opf, bool metadata_op)
+{
+	return sync_folio_io(rdev, sector, size, 0, page_folio(page), opf, metadata_op);
 }
 EXPORT_SYMBOL_GPL(sync_page_io);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 3/8] md: introduce safe_put_folio for folio support in RAID
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
  2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
  ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

safe_put_page() will be removed after the last reference to it in RAID5
is removed.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 914b992a073b..7c0c38f09cc3 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -888,6 +888,12 @@ struct md_io_clone {
 		rcu_read_unlock();			\
 	} while (0)
 
+static inline void safe_folio_put(struct folio *folio)
+{
+	if (folio)
+		folio_put(folio);
+}
+
 static inline void safe_put_page(struct page *p)
 {
 	if (p) put_page(p);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 4/8] md/raid1: use folio for tmppage
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (2 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
  ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert tmppage to tmpfolio and use it throughout in raid1.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid1.h |  2 +-
 drivers/md/raid1.c | 18 ++++++++++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c98d43a7ae99..d480b3a8c2c4 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -101,7 +101,7 @@ struct r1conf {
 	/* temporary buffer to synchronous IO when attempting to repair
 	 * a read error.
 	 */
-	struct page		*tmppage;
+	struct folio		*tmpfolio;
 
 	/* When taking over an array from a different personality, we store
 	 * the new thread here until we fully activate the array.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5a73a9f19e0e..a72abdc37a2d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2417,8 +2417,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			      rdev->recovery_offset >= sect + s)) &&
 			    rdev_has_badblock(rdev, sect, s) == 0) {
 				atomic_inc(&rdev->nr_pending);
-				if (sync_page_io(rdev, sect, s<<9,
-					 conf->tmppage, REQ_OP_READ, false))
+				if (sync_folio_io(rdev, sect, s<<9, 0,
+					 conf->tmpfolio, REQ_OP_READ, false))
 					success = 1;
 				rdev_dec_pending(rdev, mddev);
 				if (success)
@@ -2447,7 +2447,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				r1_sync_page_io(rdev, sect, s,
-						conf->tmppage, REQ_OP_WRITE);
+						folio_page(conf->tmpfolio, 0),
+						REQ_OP_WRITE);
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
@@ -2461,7 +2462,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				if (r1_sync_page_io(rdev, sect, s,
-						    conf->tmppage, REQ_OP_READ)) {
+						    folio_page(conf->tmpfolio, 0),
+						    REQ_OP_READ)) {
 					atomic_add(s, &rdev->corrected_errors);
 					pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
 						mdname(mddev), s,
@@ -3099,8 +3101,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->mirrors)
 		goto abort;
 
-	conf->tmppage = alloc_page(GFP_KERNEL);
-	if (!conf->tmppage)
+	conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+	if (!conf->tmpfolio)
 		goto abort;
 
 	r1bio_size = offsetof(struct r1bio, bios[mddev->raid_disks * 2]);
@@ -3175,7 +3177,7 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (conf) {
 		mempool_destroy(conf->r1bio_pool);
 		kfree(conf->mirrors);
-		safe_put_page(conf->tmppage);
+		safe_folio_put(conf->tmpfolio);
 		kfree(conf->nr_pending);
 		kfree(conf->nr_waiting);
 		kfree(conf->nr_queued);
@@ -3290,7 +3292,7 @@ static void raid1_free(struct mddev *mddev, void *priv)
 
 	mempool_destroy(conf->r1bio_pool);
 	kfree(conf->mirrors);
-	safe_put_page(conf->tmppage);
+	safe_folio_put(conf->tmpfolio);
 	kfree(conf->nr_pending);
 	kfree(conf->nr_waiting);
 	kfree(conf->nr_queued);
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 5/8] md/raid10: use folio for tmppage
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (3 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
  ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert tmppage to tmpfolio and use it throughout in raid10.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid10.h |  2 +-
 drivers/md/raid10.c | 37 +++++++++++++++++++------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..19f37439a4e2 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -89,7 +89,7 @@ struct r10conf {
 	mempool_t		r10bio_pool;
 	mempool_t		r10buf_pool;
-	struct page		*tmppage;
+	struct folio		*tmpfolio;
 	struct bio_set		bio_split;
 
 	/* When taking over an array from a different personality, we store
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 90c1036f6ec4..26f93040cd13 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2581,13 +2581,13 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	}
 }
 
-static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
-			    int sectors, struct page *page, enum req_op op)
+static int r10_sync_folio_io(struct md_rdev *rdev, sector_t sector,
+			    int sectors, struct folio *folio, enum req_op op)
 {
 	if (rdev_has_badblock(rdev, sector, sectors) &&
 	    (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags)))
 		return -1;
-	if (sync_page_io(rdev, sector, sectors << 9, page, op, false))
+	if (sync_folio_io(rdev, sector, sectors << 9, 0, folio, op, false))
 		/* success */
 		return 1;
 	if (op == REQ_OP_WRITE) {
@@ -2650,12 +2650,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 					r10_bio->devs[sl].addr + sect,
 					s) == 0) {
 				atomic_inc(&rdev->nr_pending);
-				success = sync_page_io(rdev,
-						       r10_bio->devs[sl].addr +
-						       sect,
-						       s<<9,
-						       conf->tmppage,
-						       REQ_OP_READ, false);
+				success = sync_folio_io(rdev,
+							r10_bio->devs[sl].addr +
+							sect,
+							s<<9,
+							0,
+							conf->tmpfolio,
+							REQ_OP_READ, false);
 				rdev_dec_pending(rdev, mddev);
 				if (success)
 					break;
@@ -2698,10 +2699,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 				continue;
 
 			atomic_inc(&rdev->nr_pending);
-			if (r10_sync_page_io(rdev,
-					     r10_bio->devs[sl].addr +
-					     sect,
-					     s, conf->tmppage, REQ_OP_WRITE)
+			if (r10_sync_folio_io(rdev,
+					      r10_bio->devs[sl].addr +
+					      sect,
+					      s, conf->tmpfolio, REQ_OP_WRITE)
 			    == 0) {
 				/* Well, this device is dead */
 				pr_notice("md/raid10:%s: read correction write failed (%d sectors at %llu on %pg)\n",
@@ -2730,10 +2731,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 				continue;
 
 			atomic_inc(&rdev->nr_pending);
-			switch (r10_sync_page_io(rdev,
+			switch (r10_sync_folio_io(rdev,
 					     r10_bio->devs[sl].addr +
 					     sect,
-					     s, conf->tmppage, REQ_OP_READ)) {
+					     s, conf->tmpfolio, REQ_OP_READ)) {
 			case 0:
 				/* Well, this device is dead */
 				pr_notice("md/raid10:%s: unable to read back corrected sectors (%d sectors at %llu on %pg)\n",
@@ -3823,7 +3824,7 @@ static void raid10_free_conf(struct r10conf *conf)
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
-	safe_put_page(conf->tmppage);
+	safe_folio_put(conf->tmpfolio);
 	bioset_exit(&conf->bio_split);
 	kfree(conf);
 }
@@ -3861,8 +3862,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->mirrors)
 		goto out;
 
-	conf->tmppage = alloc_page(GFP_KERNEL);
-	if (!conf->tmppage)
+	conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+	if (!conf->tmpfolio)
 		goto out;
 
 	conf->geo = geo;
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
  ` (4 preceding siblings ...)
  2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
@ 2026-04-16  3:37 ` linan666
  2026-04-30  1:54   ` Xiao Ni
  2026-04-16  3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
  2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity linan666
  7 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2026-04-16  3:37 UTC (permalink / raw)
  To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang

From: Li Nan <linan122@huawei.com>

Convert all IO on the sync path to use folios, and rename page-related
identifiers to match folio. Since RESYNC_BLOCK_SIZE (64K) has a higher
allocation failure chance than 4K, retry with lower orders to improve
allocation reliability. A r1/10_bio may have different rf->folio orders,
so use the minimum order for r1/10_bio sectors to prevent exceeding the
size when adding the folio to IO later.

Clean up:
 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
    instead in r1buf_pool_alloc() and r10buf_pool_alloc().
 4. Clean up RESYNC_PAGES.
 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
 6. Remove resync_free_folio(), call folio_put() directly.
 7. Clean up sync IO size calculation in raid1/10_sync_request.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c       |   2 +-
 drivers/md/raid1-10.c |  80 ++++---------
 drivers/md/raid1.c    | 209 +++++++++++++++------------------
 drivers/md/raid10.c   | 254 +++++++++++++++++++++---------------------
 4 files changed, 240 insertions(+), 305 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5e83914d5c14..6554b849ac74 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev)
 {
 	/*
 	 * For raid456, sync IO is stripe(4k) per IO, for other levels, it's
-	 * RESYNC_PAGES(64k) per IO.
+	 * RESYNC_BLOCK_SIZE(64k) per IO.
 	 */
 	return atomic_read(&mddev->recovery_active) <
 		(raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev);
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index cda531d0720b..10200b0a3fd2 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Maximum size of each resync request */
 #define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 
 /* when we get a read error on a read-only array, we redirect to another
@@ -20,9 +19,9 @@
 #define MAX_PLUG_BIO 32
 
 /* for managing resync I/O pages */
-struct resync_pages {
+struct resync_folio {
 	void		*raid_bio;
-	struct page	*pages[RESYNC_PAGES];
+	struct folio	*folio;
 };
 
 struct raid1_plug_cb {
@@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data)
 	kfree(rbio);
 }
 
-static inline int resync_alloc_pages(struct resync_pages *rp,
-				     gfp_t gfp_flags)
+static inline int resync_alloc_folio(struct resync_folio *rf,
+				     gfp_t gfp_flags, int *order)
 {
-	int i;
+	struct folio *folio;
 
-	for (i = 0; i < RESYNC_PAGES; i++) {
-		rp->pages[i] = alloc_page(gfp_flags);
-		if (!rp->pages[i])
-			goto out_free;
-	}
+	do {
+		folio = folio_alloc(gfp_flags, *order);
+		if (folio)
+			break;
+	} while (--(*order) > 0);
+	if (!folio)
+		return -ENOMEM;
+
+	rf->folio = folio;
 	return 0;
-
-out_free:
-	while (--i >= 0)
-		put_page(rp->pages[i]);
-	return -ENOMEM;
-}
-
-static inline void resync_free_pages(struct resync_pages *rp)
-{
-	int i;
-
-	for (i = 0; i < RESYNC_PAGES; i++)
-		put_page(rp->pages[i]);
-}
-
-static inline void resync_get_all_pages(struct resync_pages *rp)
-{
-	int i;
-
-	for (i = 0; i < RESYNC_PAGES; i++)
-		get_page(rp->pages[i]);
-}
-
-static inline struct page *resync_fetch_page(struct resync_pages *rp,
-					     unsigned idx)
-{
-	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
-		return NULL;
-	return rp->pages[idx];
 }
 
 /*
- * 'strct resync_pages' stores actual pages used for doing the resync
+ * 'strct resync_folio' stores actual pages used for doing the resync
  * IO, and it is per-bio, so make .bi_private points to it.
  */
-static inline struct resync_pages *get_resync_pages(struct bio *bio)
+static inline struct resync_folio *get_resync_folio(struct bio *bio)
 {
 	return bio->bi_private;
 }
 
 /* generally called after bio_reset() for reseting bvec */
-static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
+static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
 				      int size)
 {
-	int idx = 0;
-
 	/* initialize bvec table again */
-	do {
-		struct page *page = resync_fetch_page(rp, idx);
-		int len = min_t(int, size, PAGE_SIZE);
-
-		if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
-			bio->bi_status = BLK_STS_RESOURCE;
-			bio_endio(bio);
-			return;
-		}
-
-		size -= len;
-	} while (idx++ < RESYNC_PAGES && size > 0);
+	if (WARN_ON(!bio_add_folio(bio, rf->folio,
+				   min_t(int, size, RESYNC_BLOCK_SIZE),
+				   0))) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		bio_endio(bio);
+	}
 }
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a72abdc37a2d..724fd4f2cc3a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
 
 /*
  * for resync bio, r1bio pointer can be retrieved from the per-bio
- * 'struct resync_pages'.
+ * 'struct resync_folio'.
  */
 static inline struct r1bio *get_resync_r1bio(struct bio *bio)
 {
-	return get_resync_pages(bio)->raid_bio;
+	return get_resync_folio(bio)->raid_bio;
 }
 
 static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
@@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct r1conf *conf = data;
 	struct r1bio *r1_bio;
 	struct bio *bio;
-	int need_pages;
+	int need_folio;
 	int j;
-	struct resync_pages *rps;
+	struct resync_folio *rfs;
+	int order = get_order(RESYNC_BLOCK_SIZE);
 
 	r1_bio = r1bio_pool_alloc(gfp_flags, conf);
 	if (!r1_bio)
 		return NULL;
 
-	rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages),
+	rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio),
 			    gfp_flags);
-	if (!rps)
+	if (!rfs)
 		goto out_free_r1bio;
 
 	/*
	 * Allocate bios : 1 for reading, n-1 for writing
	 */
 	for (j = conf->raid_disks * 2; j-- ; ) {
-		bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
+		bio = bio_kmalloc(1, gfp_flags);
 		if (!bio)
 			goto out_free_bio;
-		bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
+		bio_init_inline(bio, NULL, 1, 0);
 		r1_bio->bios[j] = bio;
 	}
 	/*
-	 * Allocate RESYNC_PAGES data pages and attach them to
-	 * the first bio.
+	 * Allocate data folio and attach it to the first bio.
	 * If this is a user-requested check/repair, allocate
-	 * RESYNC_PAGES for each bio.
+	 * folio for each bio.
	 */
 	if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
-		need_pages = conf->raid_disks * 2;
+		need_folio = conf->raid_disks * 2;
 	else
-		need_pages = 1;
+		need_folio = 1;
 	for (j = 0; j < conf->raid_disks * 2; j++) {
-		struct resync_pages *rp = &rps[j];
+		struct resync_folio *rf = &rfs[j];
 
-		bio = r1_bio->bios[j];
-
-		if (j < need_pages) {
-			if (resync_alloc_pages(rp, gfp_flags))
-				goto out_free_pages;
+		if (j < need_folio) {
+			if (resync_alloc_folio(rf, gfp_flags, &order))
+				goto out_free_folio;
 		} else {
-			memcpy(rp, &rps[0], sizeof(*rp));
-			resync_get_all_pages(rp);
+			memcpy(rf, &rfs[0], sizeof(*rf));
+			folio_get(rf->folio);
 		}
 
-		rp->raid_bio = r1_bio;
-		bio->bi_private = rp;
+		rf->raid_bio = r1_bio;
+		r1_bio->bios[j]->bi_private = rf;
 	}
 
+	r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
 	r1_bio->master_bio = NULL;
 
 	return r1_bio;
 
-out_free_pages:
+out_free_folio:
 	while (--j >= 0)
-		resync_free_pages(&rps[j]);
+		folio_put(rfs[j].folio);
 
 out_free_bio:
 	while (++j < conf->raid_disks * 2) {
 		bio_uninit(r1_bio->bios[j]);
 		kfree(r1_bio->bios[j]);
 	}
-	kfree(rps);
+	kfree(rfs);
 
 out_free_r1bio:
 	rbio_pool_free(r1_bio, data);
@@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
 	struct r1conf *conf = data;
 	int i;
 	struct r1bio *r1bio = __r1_bio;
-	struct resync_pages *rp = NULL;
+	struct resync_folio *rf = NULL;
 
 	for (i = conf->raid_disks * 2; i--; ) {
-		rp = get_resync_pages(r1bio->bios[i]);
-		resync_free_pages(rp);
+		rf = get_resync_folio(r1bio->bios[i]);
+		folio_put(rf->folio);
 		bio_uninit(r1bio->bios[i]);
 		kfree(r1bio->bios[i]);
 	}
 
-	/* resync pages array stored in the 1st bio's .bi_private */
-	kfree(rp);
+	/* resync folio stored in the 1st bio's .bi_private */
+	kfree(rf);
 
 	rbio_pool_free(r1bio, data);
 }
@@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio)
 	put_sync_write_buf(r1_bio);
 }
 
-static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
-			   int sectors, struct page *page, blk_opf_t rw)
+static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors,
+			   int off, struct folio *folio, blk_opf_t rw)
 {
-	if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
+	if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false))
 		/* success */
 		return 1;
 	if (rw == REQ_OP_WRITE) {
@@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
-	struct page **pages = get_resync_pages(bio)->pages;
+	struct folio *folio = get_resync_folio(bio)->folio;
 	sector_t sect = r1_bio->sector;
 	int sectors = r1_bio->sectors;
-	int idx = 0;
+	int off = 0;
 	struct md_rdev *rdev;
 
 	rdev = conf->mirrors[r1_bio->read_disk].rdev;
@@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			 * active, and resync is currently active
 			 */
 			rdev = conf->mirrors[d].rdev;
-			if (sync_page_io(rdev, sect, s<<9,
-					 pages[idx],
-					 REQ_OP_READ, false)) {
+			if (sync_folio_io(rdev, sect, s<<9, off, folio,
+					  REQ_OP_READ, false)) {
 				success = 1;
 				break;
 			}
@@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			/* Try next page */
 			sectors -= s;
 			sect += s;
-			idx++;
+			off += s << 9;
 			continue;
 		}
 
@@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (r1_sync_page_io(rdev, sect, s,
-					    pages[idx],
+			if (r1_sync_folio_io(rdev, sect, s, off, folio,
 					    REQ_OP_WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
@@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
-			if (r1_sync_page_io(rdev, sect, s,
-					    pages[idx],
+			if (r1_sync_folio_io(rdev, sect, s, off, folio,
 					    REQ_OP_READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
 		sectors -= s;
 		sect += s;
-		idx ++;
+		off += s << 9;
 	}
 	set_bit(R1BIO_Uptodate, &r1_bio->state);
 	bio->bi_status = 0;
@@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio)
 	struct r1conf *conf = mddev->private;
 	int primary;
 	int i;
-	int vcnt;
 
 	/* Fix variable parts of all bios */
-	vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		blk_status_t status;
 		struct bio *b = r1_bio->bios[i];
-		struct resync_pages *rp = get_resync_pages(b);
+		struct resync_folio *rf = get_resync_folio(b);
 		if (b->bi_end_io != end_sync_read)
 			continue;
 		/* fixup the bio for reuse, but preserve errno */
@@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio)
 		b->bi_iter.bi_sector = r1_bio->sector +
 			conf->mirrors[i].rdev->data_offset;
 		b->bi_end_io = end_sync_read;
-		rp->raid_bio = r1_bio;
-		b->bi_private = rp;
+		rf->raid_bio = r1_bio;
+		b->bi_private = rf;
 
 		/* initialize bvec table again */
-		md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9);
+		md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9);
 	}
 	for (primary = 0; primary < conf->raid_disks * 2; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
@@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio)
 		}
 	r1_bio->read_disk = primary;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
-		int j = 0;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
 		blk_status_t status = sbio->bi_status;
-		struct page **ppages = get_resync_pages(pbio)->pages;
-		struct page **spages = get_resync_pages(sbio)->pages;
-		struct bio_vec *bi;
-		int page_len[RESYNC_PAGES] = { 0 };
-		struct bvec_iter_all iter_all;
+		struct folio *pfolio = get_resync_folio(pbio)->folio;
+		struct folio *sfolio = get_resync_folio(sbio)->folio;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
 		/* Now we can 'fixup' the error value */
 		sbio->bi_status = 0;
 
-		bio_for_each_segment_all(bi, sbio, iter_all)
-			page_len[j++] = bi->bv_len;
-
-		if (!status) {
-			for (j = vcnt; j-- ; ) {
-				if (memcmp(page_address(ppages[j]),
-					   page_address(spages[j]),
-					   page_len[j]))
-					break;
-			}
-		} else
-			j = 0;
-		if (j >= 0)
+		/*
+		 * Copy data and submit write in two cases:
+		 * - IO error (non-zero status)
+		 * - Data inconsistency and not a CHECK operation.
+		 */
+		if (status) {
 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
-		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-			      && !status)) {
-			/* No need to write to this device. */
-			sbio->bi_end_io = NULL;
-			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+			bio_copy_data(sbio, pbio);
 			continue;
+		} else if (memcmp(folio_address(pfolio),
+				  folio_address(sfolio),
+				  r1_bio->sectors << 9)) {
+			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
+			if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
+				bio_copy_data(sbio, pbio);
+				continue;
+			}
 		}
 
-		bio_copy_data(sbio, pbio);
+		/* No need to write to this device. */
+		sbio->bi_end_io = NULL;
+		rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 	}
 }
@@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				r1_sync_page_io(rdev, sect, s,
-						folio_page(conf->tmpfolio, 0),
-						REQ_OP_WRITE);
+				r1_sync_folio_io(rdev, sect, s, 0,
+						conf->tmpfolio, REQ_OP_WRITE);
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
@@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
 			if (rdev &&
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
-				if (r1_sync_page_io(rdev, sect, s,
-						    folio_page(conf->tmpfolio, 0),
-						    REQ_OP_READ)) {
+				if (r1_sync_folio_io(rdev, sect, s, 0,
+						    conf->tmpfolio, REQ_OP_READ)) {
 					atomic_add(s, &rdev->corrected_errors);
 					pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
 						mdname(mddev), s,
@@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf)
 static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
 {
 	struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO);
-	struct resync_pages *rps;
+	struct resync_folio *rfs;
 	struct bio *bio;
 	int i;
 
 	for (i = conf->raid_disks * 2; i--; ) {
 		bio = r1bio->bios[i];
-		rps = bio->bi_private;
+		rfs = bio->bi_private;
 		bio_reset(bio, NULL, 0);
-		bio->bi_private = rps;
+		bio->bi_private = rfs;
 	}
 	r1bio->master_bio = NULL;
 	return r1bio;
@@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int write_targets = 0, read_targets = 0;
 	sector_t sync_blocks;
 	bool still_degraded = false;
-	int good_sectors = RESYNC_SECTORS;
+	int good_sectors;
 	int min_bad = 0; /* number of sectors that are bad in all devices */
 	int idx = sector_to_idx(sector_nr);
-	int page_idx = 0;
 
 	if (!mempool_initialized(&conf->r1buf_pool))
 		if (init_resync(conf))
@@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 	r1_bio->sector = sector_nr;
 	r1_bio->state = 0;
 	set_bit(R1BIO_IsSync, &r1_bio->state);
-	/* make sure good_sectors won't go across barrier unit boundary */
-	good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors);
+	/*
+	 * make sure good_sectors won't go across barrier unit boundary.
+	 * r1_bio->sectors <= RESYNC_SECTORS.
+	 */
+	good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors);
 
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		struct md_rdev *rdev;
@@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
 		max_sector = mddev->resync_max; /* Don't do IO beyond here */
 	if (max_sector > sector_nr + good_sectors)
 		max_sector = sector_nr + good_sectors;
-	nr_sectors = 0;
-	sync_blocks = 0;
 	do {
-		struct page *page;
-		int len = PAGE_SIZE;
-		if (sector_nr + (len>>9) > max_sector)
-			len = (max_sector - sector_nr) << 9;
-		if (len == 0)
+		nr_sectors = max_sector - sector_nr;
+		if (nr_sectors == 0)
 			break;
-		if (sync_blocks == 0) {
-			if (!md_bitmap_start_sync(mddev, sector_nr,
-					&sync_blocks, still_degraded) &&
-			    !conf->fullsync &&
-			    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
-				break;
-			if ((len >> 9) > sync_blocks)
-				len = sync_blocks<<9;
-		}
+		if (!md_bitmap_start_sync(mddev, sector_nr,
+				&sync_blocks, still_degraded) &&
+		    !conf->fullsync &&
+		    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+			break;
+		if (nr_sectors > sync_blocks)
+			nr_sectors = sync_blocks;
 
 		for (i = 0 ; i < conf->raid_disks * 2; i++) {
-			struct resync_pages *rp;
-
 			bio = r1_bio->bios[i];
-			rp = get_resync_pages(bio);
 			if (bio->bi_end_io) {
-				page = resync_fetch_page(rp, page_idx);
+				struct resync_folio *rf = get_resync_folio(bio);
 
-				/*
-				 * won't fail because the vec table is big
-				 * enough to hold all these pages
-				 */
-				__bio_add_page(bio, page, len, 0);
+				bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
 			}
 		}
-		nr_sectors += len>>9;
-		sector_nr += len>>9;
-		sync_blocks -= (len>>9);
-	} while (++page_idx < RESYNC_PAGES);
+		sector_nr += nr_sectors;
+	} while (0);
 
 	r1_bio->sectors = nr_sectors;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 26f93040cd13..3638e00fe420 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
 
 /*
  * for resync bio, r10bio pointer can be retrieved
from the per-bio - * 'struct resync_pages'. + * 'struct resync_folio'. */ static inline struct r10bio *get_resync_r10bio(struct bio *bio) { - return get_resync_pages(bio)->raid_bio; + return get_resync_folio(bio)->raid_bio; } static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) struct r10bio *r10_bio; struct bio *bio; int j; - int nalloc, nalloc_rp; - struct resync_pages *rps; + int nalloc, nalloc_rf; + struct resync_folio *rfs; + int order = get_order(RESYNC_BLOCK_SIZE); r10_bio = r10bio_pool_alloc(gfp_flags, conf); if (!r10_bio) @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) /* allocate once for all bios */ if (!conf->have_replacement) - nalloc_rp = nalloc; + nalloc_rf = nalloc; else - nalloc_rp = nalloc * 2; - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags); - if (!rps) + nalloc_rf = nalloc * 2; + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags); + if (!rfs) goto out_free_r10bio; /* * Allocate bios. */ for (j = nalloc ; j-- ; ) { - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); + bio = bio_kmalloc(1, gfp_flags); if (!bio) goto out_free_bio; - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); + bio_init_inline(bio, NULL, 1, 0); r10_bio->devs[j].bio = bio; if (!conf->have_replacement) continue; - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); + bio = bio_kmalloc(1, gfp_flags); if (!bio) goto out_free_bio; - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); + bio_init_inline(bio, NULL, 1, 0); r10_bio->devs[j].repl_bio = bio; } /* - * Allocate RESYNC_PAGES data pages and attach them - * where needed. + * Allocate data folio and attach it where needed. 
*/ for (j = 0; j < nalloc; j++) { struct bio *rbio = r10_bio->devs[j].repl_bio; - struct resync_pages *rp, *rp_repl; + struct resync_folio *rf, *rf_repl; - rp = &rps[j]; + rf = &rfs[j]; if (rbio) - rp_repl = &rps[nalloc + j]; - - bio = r10_bio->devs[j].bio; + rf_repl = &rfs[nalloc + j]; if (!j || test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery)) { - if (resync_alloc_pages(rp, gfp_flags)) - goto out_free_pages; + if (resync_alloc_folio(rf, gfp_flags, &order)) + goto out_free_folio; } else { - memcpy(rp, &rps[0], sizeof(*rp)); - resync_get_all_pages(rp); + memcpy(rf, &rfs[0], sizeof(*rf)); + folio_get(rf->folio); } - rp->raid_bio = r10_bio; - bio->bi_private = rp; + rf->raid_bio = r10_bio; + r10_bio->devs[j].bio->bi_private = rf; if (rbio) { - memcpy(rp_repl, rp, sizeof(*rp)); - rbio->bi_private = rp_repl; + memcpy(rf_repl, rf, sizeof(*rf)); + rbio->bi_private = rf_repl; } } + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); return r10_bio; -out_free_pages: +out_free_folio: while (--j >= 0) - resync_free_pages(&rps[j]); + folio_put(rfs[j].folio); j = 0; out_free_bio: @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) bio_uninit(r10_bio->devs[j].repl_bio); kfree(r10_bio->devs[j].repl_bio); } - kfree(rps); + kfree(rfs); out_free_r10bio: rbio_pool_free(r10_bio, conf); return NULL; @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) struct r10conf *conf = data; struct r10bio *r10bio = __r10_bio; int j; - struct resync_pages *rp = NULL; + struct resync_folio *rf = NULL; for (j = conf->copies; j--; ) { struct bio *bio = r10bio->devs[j].bio; if (bio) { - rp = get_resync_pages(bio); - resync_free_pages(rp); + rf = get_resync_folio(bio); + folio_put(rf->folio); bio_uninit(bio); kfree(bio); } @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) } /* resync pages array stored in the 1st bio's .bi_private */ - kfree(rp); + kfree(rf); rbio_pool_free(r10bio, conf); } @@ -2342,8 +2341,7 @@ 
static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) struct r10conf *conf = mddev->private; int i, first; struct bio *tbio, *fbio; - int vcnt; - struct page **tpages, **fpages; + struct folio *tfolio, *ffolio; atomic_set(&r10_bio->remaining, 1); @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) fbio = r10_bio->devs[i].bio; fbio->bi_iter.bi_size = r10_bio->sectors << 9; fbio->bi_iter.bi_idx = 0; - fpages = get_resync_pages(fbio)->pages; + ffolio = get_resync_folio(fbio)->folio; - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); /* now find blocks with errors */ for (i=0 ; i < conf->copies ; i++) { - int j, d; + int d; struct md_rdev *rdev; - struct resync_pages *rp; + struct resync_folio *rf; tbio = r10_bio->devs[i].bio; @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) if (i == first) continue; - tpages = get_resync_pages(tbio)->pages; + tfolio = get_resync_folio(tbio)->folio; d = r10_bio->devs[i].devnum; rdev = conf->mirrors[d].rdev; if (!r10_bio->devs[i].bio->bi_status) { /* We know that the bi_io_vec layout is the same for * both 'first' and 'i', so we just compare them. - * All vec entries are PAGE_SIZE; */ - int sectors = r10_bio->sectors; - for (j = 0; j < vcnt; j++) { - int len = PAGE_SIZE; - if (sectors < (len / 512)) - len = sectors * 512; - if (memcmp(page_address(fpages[j]), - page_address(tpages[j]), - len)) - break; - sectors -= len/512; + if (memcmp(folio_address(ffolio), + folio_address(tfolio), + r10_bio->sectors << 9)) { + atomic64_add(r10_bio->sectors, + &mddev->resync_mismatches); + if (test_bit(MD_RECOVERY_CHECK, + &mddev->recovery)) + /* Don't fix anything. */ + continue; } - if (j == vcnt) - continue; - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) - /* Don't fix anything. 
*/ - continue; } else if (test_bit(FailFast, &rdev->flags)) { /* Just give up on this device */ md_error(rdev->mddev, rdev); @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) * First we need to fixup bv_offset, bv_len and * bi_vecs, as the read request might have corrupted these */ - rp = get_resync_pages(tbio); + rf = get_resync_folio(tbio); bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); - rp->raid_bio = r10_bio; - tbio->bi_private = rp; + rf->raid_bio = r10_bio; + tbio->bi_private = rf; tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; tbio->bi_end_io = end_sync_write; @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) struct bio *bio = r10_bio->devs[0].bio; sector_t sect = 0; int sectors = r10_bio->sectors; - int idx = 0; int dr = r10_bio->devs[0].devnum; int dw = r10_bio->devs[1].devnum; - struct page **pages = get_resync_pages(bio)->pages; + struct folio *folio = get_resync_folio(bio)->folio; while (sectors) { int s = sectors; @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) rdev = conf->mirrors[dr].rdev; addr = r10_bio->devs[0].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_READ, false); + ok = sync_folio_io(rdev, + addr, + s << 9, + sect << 9, + folio, + REQ_OP_READ, false); if (ok) { rdev = conf->mirrors[dw].rdev; addr = r10_bio->devs[1].addr + sect; - ok = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_WRITE, false); + ok = sync_folio_io(rdev, + addr, + s << 9, + sect << 9, + folio, + REQ_OP_WRITE, false); if (!ok) { set_bit(WriteErrorSeen, &rdev->flags); if (!test_and_set_bit(WantReplacement, @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) sectors -= s; sect += s; - idx++; } } @@ -3050,7 +3039,7 @@ static int init_resync(struct 
r10conf *conf) static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) { struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); - struct rsync_pages *rp; + struct resync_folio *rf; struct bio *bio; int nalloc; int i; @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) for (i = 0; i < nalloc; i++) { bio = r10bio->devs[i].bio; - rp = bio->bi_private; + rf = bio->bi_private; bio_reset(bio, NULL, 0); - bio->bi_private = rp; + bio->bi_private = rf; bio = r10bio->devs[i].repl_bio; if (bio) { - rp = bio->bi_private; + rf = bio->bi_private; bio_reset(bio, NULL, 0); - bio->bi_private = rp; + bio->bi_private = rf; } } return r10bio; @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, int max_sync = RESYNC_SECTORS; sector_t sync_blocks; sector_t chunk_mask = conf->geo.chunk_mask; - int page_idx = 0; /* * Allow skipping a full rebuild for incremental assembly @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, continue; } } + + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in + * resync_alloc_folio(). Fall back to smaller sync + * size if needed. + */ + if (max_sync > r10_bio->sectors) + max_sync = r10_bio->sectors; + any_working = 1; bio = r10_bio->devs[0].bio; bio->bi_next = biolist; @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, } if (sync_blocks < max_sync) max_sync = sync_blocks; + r10_bio = raid10_alloc_init_r10buf(conf); + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). + * Fall back to smaller sync size if needed. 
+ */ + if (max_sync > r10_bio->sectors) + max_sync = r10_bio->sectors; + r10_bio->state = 0; r10_bio->mddev = mddev; @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, } } - nr_sectors = 0; if (sector_nr + max_sync < max_sector) max_sector = sector_nr + max_sync; do { - struct page *page; - int len = PAGE_SIZE; - if (sector_nr + (len>>9) > max_sector) - len = (max_sector - sector_nr) << 9; - if (len == 0) + nr_sectors = max_sector - sector_nr; + + if (nr_sectors == 0) break; for (bio= biolist ; bio ; bio=bio->bi_next) { - struct resync_pages *rp = get_resync_pages(bio); - page = resync_fetch_page(rp, page_idx); - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { + struct resync_folio *rf = get_resync_folio(bio); + + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { bio->bi_status = BLK_STS_RESOURCE; bio_endio(bio); *skipped = 1; - return max_sync; + return nr_sectors << 9; } } - nr_sectors += len>>9; - sector_nr += len>>9; - } while (++page_idx < RESYNC_PAGES); + sector_nr += nr_sectors; + } while (0); r10_bio->sectors = nr_sectors; if (mddev_is_clustered(mddev) && @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *skipped) { /* We simply copy at most one chunk (smallest of old and new) - * at a time, possibly less if that exceeds RESYNC_PAGES, + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, * or we hit a bad block or something. 
* This might mean we pause for normal IO in the middle of * a chunk, but that is not a problem as mddev->reshape_position @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, struct r10bio *r10_bio; sector_t next, safe, last; int max_sectors; - int nr_sectors; int s; struct md_rdev *rdev; int need_flush = 0; struct bio *blist; struct bio *bio, *read_bio; int sectors_done = 0; - struct page **pages; + struct folio *folio; if (sector_nr == 0) { /* If restarting in the middle, skip the initial sectors */ @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, r10_bio->mddev = mddev; r10_bio->sector = sector_nr; set_bit(R10BIO_IsReshape, &r10_bio->state); - r10_bio->sectors = last - sector_nr + 1; + /* + * RESYNC_BLOCK_SIZE folio might alloc failed in + * resync_alloc_folio(). Fall back to smaller sync + * size if needed. + */ + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1); rdev = read_balance(conf, r10_bio, &max_sectors); BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state)); @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, return sectors_done; } - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ, + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ, GFP_KERNEL, &mddev->bio_set); read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr + rdev->data_offset); @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, blist = b; } - /* Now add as many pages as possible to all of these bios. */ + /* Now add folio to all of these bios. 
*/ - nr_sectors = 0; - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) { - struct page *page = pages[s / (PAGE_SIZE >> 9)]; - int len = (max_sectors - s) << 9; - if (len > PAGE_SIZE) - len = PAGE_SIZE; - for (bio = blist; bio ; bio = bio->bi_next) { - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { - bio->bi_status = BLK_STS_RESOURCE; - bio_endio(bio); - return sectors_done; - } + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; + for (bio = blist; bio ; bio = bio->bi_next) { + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) { + bio->bi_status = BLK_STS_RESOURCE; + bio_endio(bio); + return sectors_done; } - sector_nr += len >> 9; - nr_sectors += len >> 9; } - r10_bio->sectors = nr_sectors; + r10_bio->sectors = max_sectors >> 9; /* Now submit the read */ atomic_inc(&r10_bio->remaining); read_bio->bi_next = NULL; submit_bio_noacct(read_bio); - sectors_done += nr_sectors; + sectors_done += max_sectors; if (sector_nr <= last) goto read_more; @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, struct r10conf *conf = mddev->private; struct r10bio *r10b; int slot = 0; - int idx = 0; - struct page **pages; + int sect = 0; + struct folio *folio; r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); if (!r10b) { @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, return -ENOMEM; } - /* reshape IOs share pages from .devs[0].bio */ - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; + /* reshape IOs share folio from .devs[0].bio */ + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; r10b->sector = r10_bio->sector; __raid10_find_phys(&conf->prev, r10b); @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, while (!success) { int d = r10b->devs[slot].devnum; struct md_rdev *rdev = conf->mirrors[d].rdev; - sector_t addr; if (rdev == NULL || test_bit(Faulty, &rdev->flags) || !test_bit(In_sync, 
&rdev->flags)) goto failed; - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; atomic_inc(&rdev->nr_pending); - success = sync_page_io(rdev, - addr, - s << 9, - pages[idx], - REQ_OP_READ, false); + success = sync_folio_io(rdev, + r10b->devs[slot].addr + + sect, + s << 9, + sect << 9, + folio, + REQ_OP_READ, false); rdev_dec_pending(rdev, mddev); if (success) break; @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, return -EIO; } sectors -= s; - idx++; + sect += s; } kfree(r10b); return 0; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
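The size bookkeeping running through the patch above hinges on two conversions: `get_order(RESYNC_BLOCK_SIZE)` picks the folio allocation order, and `1 << (order + PAGE_SECTORS_SHIFT)` turns that order back into a sector count for `r1_bio->sectors` / `r10_bio->sectors`. A userspace sketch (not kernel code; it assumes 4K pages, i.e. `PAGE_SHIFT == 12`, and reimplements a minimal `get_order()`):

```c
#include <assert.h>

#define PAGE_SHIFT 12                       /* assumption: 4K pages */
#define PAGE_SECTORS_SHIFT (PAGE_SHIFT - 9) /* pages -> 512B sectors */
#define RESYNC_BLOCK_SIZE (64 * 1024)

/* minimal stand-in for the kernel's get_order(): smallest order such
 * that (1 << order) pages cover 'size' bytes */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}
```

With these assumptions, `get_order(RESYNC_BLOCK_SIZE)` is 4 (a 64K folio is 16 4K pages) and `1 << (4 + PAGE_SECTORS_SHIFT)` is 128 sectors, i.e. the full 64K, which is exactly what the allocators above store into the r1/r10 bio.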
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
@ 2026-04-30  1:54 ` Xiao Ni
  2026-05-07  7:13   ` Li Nan (Magic Li)
  0 siblings, 1 reply; 13+ messages in thread
From: Xiao Ni @ 2026-04-30  1:54 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang

Hi Nan

On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Convert all IO on the sync path to use folios, and rename page-related
> identifiers to match folio.
>
> Since RESYNC_BLOCK_SIZE (64K) has a higher allocation failure chance than
> 4K, retry with lower orders to improve allocation reliability. An r1/10_bio
> may have different rf->folio orders, so use the minimum order for the
> r1/10_bio sectors to avoid exceeding the folio size when adding folios to
> IO later.
>
> Clean up:
> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
> 2. Clean up the redundant while(0) loop in md_bio_reset_resync_folio().
> 3. Clean up the bio variable by directly referencing r10_bio->devs[j].bio
>    instead in r1buf_pool_alloc() and r10buf_pool_alloc().
> 4. Clean up RESYNC_PAGES.
> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
> 6. Remove resync_free_folio(), call folio_put() directly.
> 7. Clean up the sync IO size calculation in raid1/10_sync_request.
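The cover text's point about mixed orders can be made concrete: when the per-device folio allocations fall back to different orders, the r1/10_bio must only be as large as the *smallest* folio, so its sector count is derived from the minimum order. A userspace sketch of that invariant (function name and explicit-minimum form are illustrative; in the patch the same result falls out of the shared, monotonically decreasing `order` variable; 4K pages assumed):

```c
#define PAGE_SECTORS_SHIFT 3 /* assumption: 4K pages, 512B sectors */

/* Size an r1/10_bio from the per-device folio orders: take the minimum
 * order, then mirror r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT). */
static int rbio_sectors_for_orders(const int *orders, int ndisks)
{
	int min_order = orders[0];

	for (int i = 1; i < ndisks; i++)
		if (orders[i] < min_order)
			min_order = orders[i];

	return 1 << (min_order + PAGE_SECTORS_SHIFT);
}
```

For example, two order-4 folios allow a full 128-sector (64K) request, while one device falling back to order 2 caps the request at 32 sectors.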
> > Signed-off-by: Li Nan <linan122@huawei.com> > --- > drivers/md/md.c | 2 +- > drivers/md/raid1-10.c | 80 ++++--------- > drivers/md/raid1.c | 209 +++++++++++++++------------------- > drivers/md/raid10.c | 254 +++++++++++++++++++++--------------------- > 4 files changed, 240 insertions(+), 305 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5e83914d5c14..6554b849ac74 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev) > { > /* > * For raid456, sync IO is stripe(4k) per IO, for other levels, it's > - * RESYNC_PAGES(64k) per IO. > + * RESYNC_BLOCK_SIZE(64k) per IO. > */ > return atomic_read(&mddev->recovery_active) < > (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev); > diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c > index cda531d0720b..10200b0a3fd2 100644 > --- a/drivers/md/raid1-10.c > +++ b/drivers/md/raid1-10.c > @@ -1,7 +1,6 @@ > // SPDX-License-Identifier: GPL-2.0 > /* Maximum size of each resync request */ > #define RESYNC_BLOCK_SIZE (64*1024) > -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) > #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) > > /* when we get a read error on a read-only array, we redirect to another > @@ -20,9 +19,9 @@ > #define MAX_PLUG_BIO 32 > > /* for managing resync I/O pages */ > -struct resync_pages { > +struct resync_folio { > void *raid_bio; > - struct page *pages[RESYNC_PAGES]; > + struct folio *folio; > }; > > struct raid1_plug_cb { > @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data) > kfree(rbio); > } > > -static inline int resync_alloc_pages(struct resync_pages *rp, > - gfp_t gfp_flags) > +static inline int resync_alloc_folio(struct resync_folio *rf, > + gfp_t gfp_flags, int *order) > { > - int i; > + struct folio *folio; > > - for (i = 0; i < RESYNC_PAGES; i++) { > - rp->pages[i] = alloc_page(gfp_flags); > - if (!rp->pages[i]) > - goto out_free; > - } > + do { > + 
folio = folio_alloc(gfp_flags, *order);
> +		if (folio)
> +			break;
> +	} while (--(*order) > 0);

There is a problem here: if a large folio cannot be allocated, the sync
request unit becomes smaller and sync performance may decrease. This can
happen when the system lacks sufficient contiguous memory. The change
itself looks good to me; I just want to raise this problem for open
discussion.

>
> +	if (!folio)
> +		return -ENOMEM;
> +
> +	rf->folio = folio;
>  	return 0;
> -
> -out_free:
> -	while (--i >= 0)
> -		put_page(rp->pages[i]);
> -	return -ENOMEM;
> -}
> -
> -static inline void resync_free_pages(struct resync_pages *rp)
> -{
> -	int i;
> -
> -	for (i = 0; i < RESYNC_PAGES; i++)
> -		put_page(rp->pages[i]);
> -}
> -
> -static inline void resync_get_all_pages(struct resync_pages *rp)
> -{
> -	int i;
> -
> -	for (i = 0; i < RESYNC_PAGES; i++)
> -		get_page(rp->pages[i]);
> -}
> -
> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
> -					     unsigned idx)
> -{
> -	if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
> -		return NULL;
> -	return rp->pages[idx];
>  }
>
>  /*
> - * 'struct resync_pages' stores actual pages used for doing the resync
> + * 'struct resync_folio' stores the actual folio used for doing the resync
>   * IO, and it is per-bio, so make .bi_private points to it.
> */ > -static inline struct resync_pages *get_resync_pages(struct bio *bio) > +static inline struct resync_folio *get_resync_folio(struct bio *bio) > { > return bio->bi_private; > } > > /* generally called after bio_reset() for reseting bvec */ > -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp, > +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf, > int size) > { > - int idx = 0; > - > /* initialize bvec table again */ > - do { > - struct page *page = resync_fetch_page(rp, idx); > - int len = min_t(int, size, PAGE_SIZE); > - > - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { > - bio->bi_status = BLK_STS_RESOURCE; > - bio_endio(bio); > - return; > - } > - > - size -= len; > - } while (idx++ < RESYNC_PAGES && size > 0); > + if (WARN_ON(!bio_add_folio(bio, rf->folio, > + min_t(int, size, RESYNC_BLOCK_SIZE), > + 0))) { > + bio->bi_status = BLK_STS_RESOURCE; > + bio_endio(bio); > + } > } > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index a72abdc37a2d..724fd4f2cc3a 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi) > > /* > * for resync bio, r1bio pointer can be retrieved from the per-bio > - * 'struct resync_pages'. > + * 'struct resync_folio'. > */ > static inline struct r1bio *get_resync_r1bio(struct bio *bio) > { > - return get_resync_pages(bio)->raid_bio; > + return get_resync_folio(bio)->raid_bio; > } > > static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf) > @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data) > struct r1conf *conf = data; > struct r1bio *r1_bio; > struct bio *bio; > - int need_pages; > + int need_folio; The name need_folio is confusing. Can we keep the same style as the old version? How about need_folios? 
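The fallback loop discussed above can be exercised in isolation. A userspace sketch (the `alloc_fn` hook and `mock_alloc` fragmentation model are hypothetical stand-ins for `folio_alloc()`, not kernel API):

```c
#include <stddef.h>

/* Hypothetical allocator hook: non-NULL only when the requested order
 * is within what the "system" can provide contiguously. */
typedef void *(*alloc_fn)(int order);

/* Mirrors the resync_alloc_folio() retry loop quoted above: try the
 * requested order, then fall back to smaller orders; *order is updated
 * in place so the caller can size the sync request to what was
 * actually allocated. */
static void *alloc_with_fallback(alloc_fn try_alloc, int *order)
{
	void *p;

	do {
		p = try_alloc(*order);
		if (p)
			break;
	} while (--(*order) > 0);

	return p;
}

/* Mock fragmentation: succeed only at or below max_ok_order. */
static int max_ok_order;
static char mock_buf[1];
static void *mock_alloc(int order)
{
	return order <= max_ok_order ? mock_buf : NULL;
}
```

One detail worth confirming on the real patch: with the loop written this way, after failing at order 1 the condition `--(*order) > 0` becomes false and the function returns -ENOMEM without ever attempting an order-0 (single page) allocation. If a single-page fallback is intended, the loop condition may need adjusting.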
> int j; > - struct resync_pages *rps; > + struct resync_folio *rfs; > + int order = get_order(RESYNC_BLOCK_SIZE); > > r1_bio = r1bio_pool_alloc(gfp_flags, conf); > if (!r1_bio) > return NULL; > > - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages), > + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio), > gfp_flags); > - if (!rps) > + if (!rfs) > goto out_free_r1bio; > > /* > * Allocate bios : 1 for reading, n-1 for writing > */ > for (j = conf->raid_disks * 2; j-- ; ) { > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r1_bio->bios[j] = bio; > } > /* > - * Allocate RESYNC_PAGES data pages and attach them to > - * the first bio. > + * Allocate data folio and attach it to the first bio. > * If this is a user-requested check/repair, allocate > - * RESYNC_PAGES for each bio. > + * folio for each bio. 
> */ > if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery)) > - need_pages = conf->raid_disks * 2; > + need_folio = conf->raid_disks * 2; > else > - need_pages = 1; > + need_folio = 1; > for (j = 0; j < conf->raid_disks * 2; j++) { > - struct resync_pages *rp = &rps[j]; > + struct resync_folio *rf = &rfs[j]; > > - bio = r1_bio->bios[j]; > - > - if (j < need_pages) { > - if (resync_alloc_pages(rp, gfp_flags)) > - goto out_free_pages; > + if (j < need_folio) { > + if (resync_alloc_folio(rf, gfp_flags, &order)) > + goto out_free_folio; > } else { > - memcpy(rp, &rps[0], sizeof(*rp)); > - resync_get_all_pages(rp); > + memcpy(rf, &rfs[0], sizeof(*rf)); > + folio_get(rf->folio); > } > > - rp->raid_bio = r1_bio; > - bio->bi_private = rp; > + rf->raid_bio = r1_bio; > + r1_bio->bios[j]->bi_private = rf; > } > > + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); > r1_bio->master_bio = NULL; > > return r1_bio; > > -out_free_pages: > +out_free_folio: > while (--j >= 0) > - resync_free_pages(&rps[j]); > + folio_put(rfs[j].folio); > > out_free_bio: > while (++j < conf->raid_disks * 2) { > bio_uninit(r1_bio->bios[j]); > kfree(r1_bio->bios[j]); > } > - kfree(rps); > + kfree(rfs); > > out_free_r1bio: > rbio_pool_free(r1_bio, data); > @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data) > struct r1conf *conf = data; > int i; > struct r1bio *r1bio = __r1_bio; > - struct resync_pages *rp = NULL; > + struct resync_folio *rf = NULL; > > for (i = conf->raid_disks * 2; i--; ) { > - rp = get_resync_pages(r1bio->bios[i]); > - resync_free_pages(rp); > + rf = get_resync_folio(r1bio->bios[i]); > + folio_put(rf->folio); > bio_uninit(r1bio->bios[i]); > kfree(r1bio->bios[i]); > } > > - /* resync pages array stored in the 1st bio's .bi_private */ > - kfree(rp); > + /* resync folio stored in the 1st bio's .bi_private */ > + kfree(rf); > > rbio_pool_free(r1bio, data); > } > @@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio) > 
put_sync_write_buf(r1_bio); > } > > -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector, > - int sectors, struct page *page, blk_opf_t rw) > +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors, > + int off, struct folio *folio, blk_opf_t rw) > { > - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false)) > + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false)) > /* success */ > return 1; > if (rw == REQ_OP_WRITE) { > @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > struct mddev *mddev = r1_bio->mddev; > struct r1conf *conf = mddev->private; > struct bio *bio = r1_bio->bios[r1_bio->read_disk]; > - struct page **pages = get_resync_pages(bio)->pages; > + struct folio *folio = get_resync_folio(bio)->folio; > sector_t sect = r1_bio->sector; > int sectors = r1_bio->sectors; > - int idx = 0; > + int off = 0; > struct md_rdev *rdev; > > rdev = conf->mirrors[r1_bio->read_disk].rdev; > @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > * active, and resync is currently active > */ > rdev = conf->mirrors[d].rdev; > - if (sync_page_io(rdev, sect, s<<9, > - pages[idx], > - REQ_OP_READ, false)) { > + if (sync_folio_io(rdev, sect, s<<9, off, folio, > + REQ_OP_READ, false)) { > success = 1; > break; > } > @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > /* Try next page */ > sectors -= s; > sect += s; > - idx++; > + off += s << 9; > continue; > } > > @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (r1_bio->bios[d]->bi_end_io != end_sync_read) > continue; > rdev = conf->mirrors[d].rdev; > - if (r1_sync_page_io(rdev, sect, s, > - pages[idx], > + if (r1_sync_folio_io(rdev, sect, s, off, folio, > REQ_OP_WRITE) == 0) { > r1_bio->bios[d]->bi_end_io = NULL; > rdev_dec_pending(rdev, mddev); > @@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (r1_bio->bios[d]->bi_end_io != 
end_sync_read) > continue; > rdev = conf->mirrors[d].rdev; > - if (r1_sync_page_io(rdev, sect, s, > - pages[idx], > + if (r1_sync_folio_io(rdev, sect, s, off, folio, > REQ_OP_READ) != 0) > atomic_add(s, &rdev->corrected_errors); > } > sectors -= s; > sect += s; > - idx ++; > + off += s << 9; > } > set_bit(R1BIO_Uptodate, &r1_bio->state); > bio->bi_status = 0; > @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio) > struct r1conf *conf = mddev->private; > int primary; > int i; > - int vcnt; > > /* Fix variable parts of all bios */ > - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); > for (i = 0; i < conf->raid_disks * 2; i++) { > blk_status_t status; > struct bio *b = r1_bio->bios[i]; > - struct resync_pages *rp = get_resync_pages(b); > + struct resync_folio *rf = get_resync_folio(b); > if (b->bi_end_io != end_sync_read) > continue; > /* fixup the bio for reuse, but preserve errno */ > @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio) > b->bi_iter.bi_sector = r1_bio->sector + > conf->mirrors[i].rdev->data_offset; > b->bi_end_io = end_sync_read; > - rp->raid_bio = r1_bio; > - b->bi_private = rp; > + rf->raid_bio = r1_bio; > + b->bi_private = rf; > > /* initialize bvec table again */ > - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9); > + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9); > } > for (primary = 0; primary < conf->raid_disks * 2; primary++) > if (r1_bio->bios[primary]->bi_end_io == end_sync_read && > @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio) > } > r1_bio->read_disk = primary; > for (i = 0; i < conf->raid_disks * 2; i++) { > - int j = 0; > struct bio *pbio = r1_bio->bios[primary]; > struct bio *sbio = r1_bio->bios[i]; > blk_status_t status = sbio->bi_status; > - struct page **ppages = get_resync_pages(pbio)->pages; > - struct page **spages = get_resync_pages(sbio)->pages; > - struct bio_vec *bi; > - int page_len[RESYNC_PAGES] = { 0 }; > - struct 
bvec_iter_all iter_all; > + struct folio *pfolio = get_resync_folio(pbio)->folio; > + struct folio *sfolio = get_resync_folio(sbio)->folio; > > if (sbio->bi_end_io != end_sync_read) > continue; > /* Now we can 'fixup' the error value */ > sbio->bi_status = 0; > > - bio_for_each_segment_all(bi, sbio, iter_all) > - page_len[j++] = bi->bv_len; > - > - if (!status) { > - for (j = vcnt; j-- ; ) { > - if (memcmp(page_address(ppages[j]), > - page_address(spages[j]), > - page_len[j])) > - break; > - } > - } else > - j = 0; > - if (j >= 0) > + /* > + * Copy data and submit write in two cases: > + * - IO error (non-zero status) > + * - Data inconsistency and not a CHECK operation. > + */ > + if (status) { > atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); > - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery) > - && !status)) { > - /* No need to write to this device. */ > - sbio->bi_end_io = NULL; > - rdev_dec_pending(conf->mirrors[i].rdev, mddev); > + bio_copy_data(sbio, pbio); > continue; > + } else if (memcmp(folio_address(pfolio), > + folio_address(sfolio), > + r1_bio->sectors << 9)) { > + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); > + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { > + bio_copy_data(sbio, pbio); > + continue; > + } > } > > - bio_copy_data(sbio, pbio); > + /* No need to write to this device. 
*/ > + sbio->bi_end_io = NULL; > + rdev_dec_pending(conf->mirrors[i].rdev, mddev); > } > } > > @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > if (rdev && > !test_bit(Faulty, &rdev->flags)) { > atomic_inc(&rdev->nr_pending); > - r1_sync_page_io(rdev, sect, s, > - folio_page(conf->tmpfolio, 0), > - REQ_OP_WRITE); > + r1_sync_folio_io(rdev, sect, s, 0, > + conf->tmpfolio, REQ_OP_WRITE); > rdev_dec_pending(rdev, mddev); > } > } > @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > if (rdev && > !test_bit(Faulty, &rdev->flags)) { > atomic_inc(&rdev->nr_pending); > - if (r1_sync_page_io(rdev, sect, s, > - folio_page(conf->tmpfolio, 0), > - REQ_OP_READ)) { > + if (r1_sync_folio_io(rdev, sect, s, 0, > + conf->tmpfolio, REQ_OP_READ)) { > atomic_add(s, &rdev->corrected_errors); > pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n", > mdname(mddev), s, > @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf) > static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf) > { > struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO); > - struct resync_pages *rps; > + struct resync_folio *rfs; > struct bio *bio; > int i; > > for (i = conf->raid_disks * 2; i--; ) { > bio = r1bio->bios[i]; > - rps = bio->bi_private; > + rfs = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rps; > + bio->bi_private = rfs; > } > r1bio->master_bio = NULL; > return r1bio; > @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > int write_targets = 0, read_targets = 0; > sector_t sync_blocks; > bool still_degraded = false; > - int good_sectors = RESYNC_SECTORS; > + int good_sectors; > int min_bad = 0; /* number of sectors that are bad in all devices */ > int idx = sector_to_idx(sector_nr); > - int page_idx = 0; > > if (!mempool_initialized(&conf->r1buf_pool)) > if (init_resync(conf)) > @@ -2858,8 +2844,11 
@@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > r1_bio->sector = sector_nr; > r1_bio->state = 0; > set_bit(R1BIO_IsSync, &r1_bio->state); > - /* make sure good_sectors won't go across barrier unit boundary */ > - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors); > + /* > + * make sure good_sectors won't go across barrier unit boundary. > + * r1_bio->sectors <= RESYNC_SECTORS. > + */ > + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors); > > for (i = 0; i < conf->raid_disks * 2; i++) { > struct md_rdev *rdev; > @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, > max_sector = mddev->resync_max; /* Don't do IO beyond here */ > if (max_sector > sector_nr + good_sectors) > max_sector = sector_nr + good_sectors; > - nr_sectors = 0; > - sync_blocks = 0; > do { > - struct page *page; > - int len = PAGE_SIZE; > - if (sector_nr + (len>>9) > max_sector) > - len = (max_sector - sector_nr) << 9; > - if (len == 0) > + nr_sectors = max_sector - sector_nr; > + if (nr_sectors == 0) > break; > - if (sync_blocks == 0) { > - if (!md_bitmap_start_sync(mddev, sector_nr, > - &sync_blocks, still_degraded) && > - !conf->fullsync && > - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) > - break; > - if ((len >> 9) > sync_blocks) > - len = sync_blocks<<9; > - } > + if (!md_bitmap_start_sync(mddev, sector_nr, > + &sync_blocks, still_degraded) && > + !conf->fullsync && > + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) > + break; > + if (nr_sectors > sync_blocks) > + nr_sectors = sync_blocks; > > for (i = 0 ; i < conf->raid_disks * 2; i++) { > - struct resync_pages *rp; > - > bio = r1_bio->bios[i]; > - rp = get_resync_pages(bio); > if (bio->bi_end_io) { > - page = resync_fetch_page(rp, page_idx); > + struct resync_folio *rf = get_resync_folio(bio); > > - /* > - * won't fail because the vec table is big > - * enough to hold all these pages > - */ > - __bio_add_page(bio, 
page, len, 0);
> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
> }
> }
> - nr_sectors += len>>9;
> - sector_nr += len>>9;
> - sync_blocks -= (len>>9);
> - } while (++page_idx < RESYNC_PAGES);
> + sector_nr += nr_sectors;
> + } while (0);

Now it can handle all pages in one go via a folio. It's strange to keep
while(0) here.

>
> r1_bio->sectors = nr_sectors;

This patch is a little big. Is it better to split this patch here?

>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 26f93040cd13..3638e00fe420 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
>
> /*
> * for resync bio, r10bio pointer can be retrieved from the per-bio
> - * 'struct resync_pages'.
> + * 'struct resync_folio'.
> */
> static inline struct r10bio *get_resync_r10bio(struct bio *bio)
> {
> - return get_resync_pages(bio)->raid_bio;
> + return get_resync_folio(bio)->raid_bio;
> }
>
> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
> struct r10bio *r10_bio;
> struct bio *bio;
> int j;
> - int nalloc, nalloc_rp;
> - struct resync_pages *rps;
> + int nalloc, nalloc_rf;
> + struct resync_folio *rfs;
> + int order = get_order(RESYNC_BLOCK_SIZE);
>
> r10_bio = r10bio_pool_alloc(gfp_flags, conf);
> if (!r10_bio)
> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>
> /* allocate once for all bios */
> if (!conf->have_replacement)
> - nalloc_rp = nalloc;
> + nalloc_rf = nalloc;
> else
> - nalloc_rp = nalloc * 2;
> + nalloc_rf = nalloc * 2;
> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags);
> - if (!rps)
> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags);
> + if (!rfs)
> goto out_free_r10bio;
>
> /*
> * Allocate bios.
> */ > for (j = nalloc ; j-- ; ) { > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r10_bio->devs[j].bio = bio; > if (!conf->have_replacement) > continue; > - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); > + bio = bio_kmalloc(1, gfp_flags); > if (!bio) > goto out_free_bio; > - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); > + bio_init_inline(bio, NULL, 1, 0); > r10_bio->devs[j].repl_bio = bio; > } > /* > - * Allocate RESYNC_PAGES data pages and attach them > - * where needed. > + * Allocate data folio and attach it where needed. > */ > for (j = 0; j < nalloc; j++) { > struct bio *rbio = r10_bio->devs[j].repl_bio; > - struct resync_pages *rp, *rp_repl; > + struct resync_folio *rf, *rf_repl; > > - rp = &rps[j]; > + rf = &rfs[j]; > if (rbio) > - rp_repl = &rps[nalloc + j]; > - > - bio = r10_bio->devs[j].bio; > + rf_repl = &rfs[nalloc + j]; > > if (!j || test_bit(MD_RECOVERY_SYNC, > &conf->mddev->recovery)) { > - if (resync_alloc_pages(rp, gfp_flags)) > - goto out_free_pages; > + if (resync_alloc_folio(rf, gfp_flags, &order)) > + goto out_free_folio; > } else { > - memcpy(rp, &rps[0], sizeof(*rp)); > - resync_get_all_pages(rp); > + memcpy(rf, &rfs[0], sizeof(*rf)); > + folio_get(rf->folio); > } > > - rp->raid_bio = r10_bio; > - bio->bi_private = rp; > + rf->raid_bio = r10_bio; > + r10_bio->devs[j].bio->bi_private = rf; > if (rbio) { > - memcpy(rp_repl, rp, sizeof(*rp)); > - rbio->bi_private = rp_repl; > + memcpy(rf_repl, rf, sizeof(*rf)); > + rbio->bi_private = rf_repl; > } > } > > + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); > return r10_bio; > > -out_free_pages: > +out_free_folio: > while (--j >= 0) > - resync_free_pages(&rps[j]); > + folio_put(rfs[j].folio); > > j = 0; > out_free_bio: > @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) > 
bio_uninit(r10_bio->devs[j].repl_bio); > kfree(r10_bio->devs[j].repl_bio); > } > - kfree(rps); > + kfree(rfs); > out_free_r10bio: > rbio_pool_free(r10_bio, conf); > return NULL; > @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) > struct r10conf *conf = data; > struct r10bio *r10bio = __r10_bio; > int j; > - struct resync_pages *rp = NULL; > + struct resync_folio *rf = NULL; > > for (j = conf->copies; j--; ) { > struct bio *bio = r10bio->devs[j].bio; > > if (bio) { > - rp = get_resync_pages(bio); > - resync_free_pages(rp); > + rf = get_resync_folio(bio); > + folio_put(rf->folio); > bio_uninit(bio); > kfree(bio); > } > @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) > } > > /* resync pages array stored in the 1st bio's .bi_private */ > - kfree(rp); > + kfree(rf); > > rbio_pool_free(r10bio, conf); > } > @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > struct r10conf *conf = mddev->private; > int i, first; > struct bio *tbio, *fbio; > - int vcnt; > - struct page **tpages, **fpages; > + struct folio *tfolio, *ffolio; > > atomic_set(&r10_bio->remaining, 1); > > @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > fbio = r10_bio->devs[i].bio; > fbio->bi_iter.bi_size = r10_bio->sectors << 9; > fbio->bi_iter.bi_idx = 0; > - fpages = get_resync_pages(fbio)->pages; > + ffolio = get_resync_folio(fbio)->folio; > > - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); > /* now find blocks with errors */ > for (i=0 ; i < conf->copies ; i++) { > - int j, d; > + int d; > struct md_rdev *rdev; > - struct resync_pages *rp; > + struct resync_folio *rf; > > tbio = r10_bio->devs[i].bio; > > @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > if (i == first) > continue; > > - tpages = get_resync_pages(tbio)->pages; > + tfolio = get_resync_folio(tbio)->folio; > d 
= r10_bio->devs[i].devnum; > rdev = conf->mirrors[d].rdev; > if (!r10_bio->devs[i].bio->bi_status) { > /* We know that the bi_io_vec layout is the same for > * both 'first' and 'i', so we just compare them. > - * All vec entries are PAGE_SIZE; > */ > - int sectors = r10_bio->sectors; > - for (j = 0; j < vcnt; j++) { > - int len = PAGE_SIZE; > - if (sectors < (len / 512)) > - len = sectors * 512; > - if (memcmp(page_address(fpages[j]), > - page_address(tpages[j]), > - len)) > - break; > - sectors -= len/512; > + if (memcmp(folio_address(ffolio), > + folio_address(tfolio), > + r10_bio->sectors << 9)) { > + atomic64_add(r10_bio->sectors, > + &mddev->resync_mismatches); > + if (test_bit(MD_RECOVERY_CHECK, > + &mddev->recovery)) > + /* Don't fix anything. */ > + continue; > } > - if (j == vcnt) > - continue; > - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); > - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) > - /* Don't fix anything. */ > - continue; > } else if (test_bit(FailFast, &rdev->flags)) { > /* Just give up on this device */ > md_error(rdev->mddev, rdev); > @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) > * First we need to fixup bv_offset, bv_len and > * bi_vecs, as the read request might have corrupted these > */ > - rp = get_resync_pages(tbio); > + rf = get_resync_folio(tbio); > bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); > > - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); > + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); > > - rp->raid_bio = r10_bio; > - tbio->bi_private = rp; > + rf->raid_bio = r10_bio; > + tbio->bi_private = rf; > tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; > tbio->bi_end_io = end_sync_write; > > @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > struct bio *bio = r10_bio->devs[0].bio; > sector_t sect = 0; > int sectors = r10_bio->sectors; > - int idx = 0; > int dr = 
r10_bio->devs[0].devnum; > int dw = r10_bio->devs[1].devnum; > - struct page **pages = get_resync_pages(bio)->pages; > + struct folio *folio = get_resync_folio(bio)->folio; > > while (sectors) { > int s = sectors; > @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > > rdev = conf->mirrors[dr].rdev; > addr = r10_bio->devs[0].addr + sect; > - ok = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_READ, false); > + ok = sync_folio_io(rdev, > + addr, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_READ, false); > if (ok) { > rdev = conf->mirrors[dw].rdev; > addr = r10_bio->devs[1].addr + sect; > - ok = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_WRITE, false); > + ok = sync_folio_io(rdev, > + addr, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_WRITE, false); > if (!ok) { > set_bit(WriteErrorSeen, &rdev->flags); > if (!test_and_set_bit(WantReplacement, > @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) > > sectors -= s; > sect += s; > - idx++; > } > } > > @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf) > static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) > { > struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); > - struct rsync_pages *rp; > + struct resync_folio *rf; > struct bio *bio; > int nalloc; > int i; > @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) > > for (i = 0; i < nalloc; i++) { > bio = r10bio->devs[i].bio; > - rp = bio->bi_private; > + rf = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rp; > + bio->bi_private = rf; > bio = r10bio->devs[i].repl_bio; > if (bio) { > - rp = bio->bi_private; > + rf = bio->bi_private; > bio_reset(bio, NULL, 0); > - bio->bi_private = rp; > + bio->bi_private = rf; > } > } > return r10bio; > @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > int max_sync 
= RESYNC_SECTORS; > sector_t sync_blocks; > sector_t chunk_mask = conf->geo.chunk_mask; > - int page_idx = 0; > > /* > * Allow skipping a full rebuild for incremental assembly > @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > continue; > } > } > + > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in > + * resync_alloc_folio(). Fall back to smaller sync > + * size if needed. > + */ > + if (max_sync > r10_bio->sectors) > + max_sync = r10_bio->sectors; > + > any_working = 1; > bio = r10_bio->devs[0].bio; > bio->bi_next = biolist; > @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > } > if (sync_blocks < max_sync) > max_sync = sync_blocks; > + > r10_bio = raid10_alloc_init_r10buf(conf); > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). > + * Fall back to smaller sync size if needed. > + */ > + if (max_sync > r10_bio->sectors) > + max_sync = r10_bio->sectors; > + > r10_bio->state = 0; > > r10_bio->mddev = mddev; > @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, > } > } > > - nr_sectors = 0; > if (sector_nr + max_sync < max_sector) > max_sector = sector_nr + max_sync; > do { > - struct page *page; > - int len = PAGE_SIZE; > - if (sector_nr + (len>>9) > max_sector) > - len = (max_sector - sector_nr) << 9; > - if (len == 0) > + nr_sectors = max_sector - sector_nr; > + > + if (nr_sectors == 0) > break; > for (bio= biolist ; bio ; bio=bio->bi_next) { > - struct resync_pages *rp = get_resync_pages(bio); > - page = resync_fetch_page(rp, page_idx); > - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { > + struct resync_folio *rf = get_resync_folio(bio); > + > + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { > bio->bi_status = BLK_STS_RESOURCE; > bio_endio(bio); > *skipped = 1; > - return max_sync; > + return nr_sectors << 9; > } > } > - nr_sectors += len>>9; > - 
sector_nr += len>>9; > - } while (++page_idx < RESYNC_PAGES); > + sector_nr += nr_sectors; > + } while (0); > r10_bio->sectors = nr_sectors; > > if (mddev_is_clustered(mddev) && > @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > int *skipped) > { > /* We simply copy at most one chunk (smallest of old and new) > - * at a time, possibly less if that exceeds RESYNC_PAGES, > + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, > * or we hit a bad block or something. > * This might mean we pause for normal IO in the middle of > * a chunk, but that is not a problem as mddev->reshape_position > @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > struct r10bio *r10_bio; > sector_t next, safe, last; > int max_sectors; > - int nr_sectors; > int s; > struct md_rdev *rdev; > int need_flush = 0; > struct bio *blist; > struct bio *bio, *read_bio; > int sectors_done = 0; > - struct page **pages; > + struct folio *folio; > > if (sector_nr == 0) { > /* If restarting in the middle, skip the initial sectors */ > @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, > r10_bio->mddev = mddev; > r10_bio->sector = sector_nr; > set_bit(R10BIO_IsReshape, &r10_bio->state); > - r10_bio->sectors = last - sector_nr + 1; > + /* > + * RESYNC_BLOCK_SIZE folio might alloc failed in > + * resync_alloc_folio(). Fall back to smaller sync > + * size if needed. 
> + */
> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1);
> rdev = read_balance(conf, r10_bio, &max_sectors);
> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state));
>
> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> return sectors_done;
> }
>
> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ,
> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ,
> GFP_KERNEL, &mddev->bio_set);
> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
> + rdev->data_offset);
> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> blist = b;
> }
>
> - /* Now add as many pages as possible to all of these bios. */
> + /* Now add folio to all of these bios. */
>
> - nr_sectors = 0;
> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
> - struct page *page = pages[s / (PAGE_SIZE >> 9)];
> - int len = (max_sectors - s) << 9;
> - if (len > PAGE_SIZE)
> - len = PAGE_SIZE;
> - for (bio = blist; bio ; bio = bio->bi_next) {
> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> - bio->bi_status = BLK_STS_RESOURCE;
> - bio_endio(bio);
> - return sectors_done;
> - }
> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
> + for (bio = blist; bio ; bio = bio->bi_next) {
> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return sectors_done;

In fact, the original code doesn't clean up before returning.
bio_add_folio_nofail is used in raid1; can we use it here as well?
> } > - sector_nr += len >> 9; > - nr_sectors += len >> 9; > } > - r10_bio->sectors = nr_sectors; > + r10_bio->sectors = max_sectors >> 9; > > /* Now submit the read */ > atomic_inc(&r10_bio->remaining); > read_bio->bi_next = NULL; > submit_bio_noacct(read_bio); > - sectors_done += nr_sectors; > + sectors_done += max_sectors; > if (sector_nr <= last) > goto read_more; > > @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, > struct r10conf *conf = mddev->private; > struct r10bio *r10b; > int slot = 0; > - int idx = 0; > - struct page **pages; > + int sect = 0; > + struct folio *folio; > > r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); > if (!r10b) { > @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, > return -ENOMEM; > } > > - /* reshape IOs share pages from .devs[0].bio */ > - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; > + /* reshape IOs share folio from .devs[0].bio */ > + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; > > r10b->sector = r10_bio->sector; > __raid10_find_phys(&conf->prev, r10b); > @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, > while (!success) { > int d = r10b->devs[slot].devnum; > struct md_rdev *rdev = conf->mirrors[d].rdev; > - sector_t addr; > if (rdev == NULL || > test_bit(Faulty, &rdev->flags) || > !test_bit(In_sync, &rdev->flags)) > goto failed; > > - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; > atomic_inc(&rdev->nr_pending); > - success = sync_page_io(rdev, > - addr, > - s << 9, > - pages[idx], > - REQ_OP_READ, false); > + success = sync_folio_io(rdev, > + r10b->devs[slot].addr + > + sect, > + s << 9, > + sect << 9, > + folio, > + REQ_OP_READ, false); > rdev_dec_pending(rdev, mddev); > if (success) > break; > @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, > return -EIO; > } > sectors -= s; > - idx++; > + sect += s; > } > kfree(r10b); > return 0; > -- > 2.39.2 > > Regards 
Xiao

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
  2026-04-30 1:54 ` Xiao Ni
@ 2026-05-07 7:13 ` 李楠 Magic Li
  0 siblings, 0 replies; 13+ messages in thread
From: 李楠 Magic Li @ 2026-05-07 7:13 UTC (permalink / raw)
  To: Xiao Ni, linan666@huaweicloud.com
  Cc: song@kernel.org, yukuai@fnnas.com, linux-raid@vger.kernel.org,
	linux-kernel@vger.kernel.org, yangerkun@huawei.com,
	yi.zhang@huawei.com, 张同浩 Tonghao Zhang

On Thu Apr 30, 2026 at 9:54 AM CST, Xiao Ni wrote:
> Hi Nan
>
> On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> Convert all IO on the sync path to use folios, and rename page-related
>> identifiers to match folio.
>>
>> Since RESYNC_BLOCK_SIZE (64K) has higher allocation failure chance than 4k,
>> retry with lower orders to improve allocation reliability. A r1/10_bio may
>> have different rf->folio orders, so use minimum order as r1/10_bio sectors
>> to prevent exceeding size when adding folio to IO later.
>>
>> Clean up:
>> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
>> 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
>> 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
>> instead in r1buf_pool_alloc() and r10buf_pool_alloc().
>> 4. Clean up RESYNC_PAGES.
>> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
>> 6. Remove resync_free_folio(), call folio_put() directly.
>> 7. Clean up sync IO size calculation in raid1/10_sync_request.
>> >> Signed-off-by: Li Nan <linan122@huawei.com> >> --- >> drivers/md/md.c | 2 +- >> drivers/md/raid1-10.c | 80 ++++--------- >> drivers/md/raid1.c | 209 +++++++++++++++------------------- >> drivers/md/raid10.c | 254 +++++++++++++++++++++--------------------- >> 4 files changed, 240 insertions(+), 305 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 5e83914d5c14..6554b849ac74 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev) >> { >> /* >> * For raid456, sync IO is stripe(4k) per IO, for other levels, it's >> - * RESYNC_PAGES(64k) per IO. >> + * RESYNC_BLOCK_SIZE(64k) per IO. >> */ >> return atomic_read(&mddev->recovery_active) < >> (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev); >> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c >> index cda531d0720b..10200b0a3fd2 100644 >> --- a/drivers/md/raid1-10.c >> +++ b/drivers/md/raid1-10.c >> @@ -1,7 +1,6 @@ >> // SPDX-License-Identifier: GPL-2.0 >> /* Maximum size of each resync request */ >> #define RESYNC_BLOCK_SIZE (64*1024) >> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE) >> #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9) >> >> /* when we get a read error on a read-only array, we redirect to another >> @@ -20,9 +19,9 @@ >> #define MAX_PLUG_BIO 32 >> >> /* for managing resync I/O pages */ >> -struct resync_pages { >> +struct resync_folio { >> void *raid_bio; >> - struct page *pages[RESYNC_PAGES]; >> + struct folio *folio; >> }; >> >> struct raid1_plug_cb { >> @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data) >> kfree(rbio); >> } >> >> -static inline int resync_alloc_pages(struct resync_pages *rp, >> - gfp_t gfp_flags) >> +static inline int resync_alloc_folio(struct resync_folio *rf, >> + gfp_t gfp_flags, int *order) >> { >> - int i; >> + struct folio *folio; >> >> - for (i = 0; i < RESYNC_PAGES; i++) { >> - rp->pages[i] = alloc_page(gfp_flags); 
>> - if (!rp->pages[i])
>> - goto out_free;
>> - }
>> + do {
>> + folio = folio_alloc(gfp_flags, *order);
>> + if (folio)
>> + break;
>> + } while (--(*order) > 0);
>
> It has a problem here. If it can't allocate a big page, the sync
> request unit will be smaller and sync performance may decrease. This
> can happen when the system lacks sufficient continuous memory. This
> change looks good to me. I just want to throw this problem out for an
> open discussion.
>

Yeah, it can be easily reproduced in qemu. We have a few options:
1. Alloc smaller folio
2. Return -ENOMEM directly
3. Alloc multiple small folios to assemble a larger one. It is not a
good idea, as it will make the code much more complex.
IMO, 1 seems like the best choice.

>>
>> + if (!folio)
>> + return -ENOMEM;
>> +
>> + rf->folio = folio;
>> return 0;
>> -
>> -out_free:
>> - while (--i >= 0)
>> - put_page(rp->pages[i]);
>> - return -ENOMEM;
>> -}
>> -
>> -static inline void resync_free_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - put_page(rp->pages[i]);
>> -}
>> -
>> -static inline void resync_get_all_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - get_page(rp->pages[i]);
>> -}
>> -
>> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
>> - unsigned idx)
>> -{
>> - if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
>> - return NULL;
>> - return rp->pages[idx];
>> }
>>
>> /*
>> - * 'strct resync_pages' stores actual pages used for doing the resync
>> + * 'strct resync_folio' stores actual pages used for doing the resync
>> * IO, and it is per-bio, so make .bi_private points to it.
>> */ >> -static inline struct resync_pages *get_resync_pages(struct bio *bio) >> +static inline struct resync_folio *get_resync_folio(struct bio *bio) >> { >> return bio->bi_private; >> } >> >> /* generally called after bio_reset() for reseting bvec */ >> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp, >> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf, >> int size) >> { >> - int idx = 0; >> - >> /* initialize bvec table again */ >> - do { >> - struct page *page = resync_fetch_page(rp, idx); >> - int len = min_t(int, size, PAGE_SIZE); >> - >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> - bio->bi_status = BLK_STS_RESOURCE; >> - bio_endio(bio); >> - return; >> - } >> - >> - size -= len; >> - } while (idx++ < RESYNC_PAGES && size > 0); >> + if (WARN_ON(!bio_add_folio(bio, rf->folio, >> + min_t(int, size, RESYNC_BLOCK_SIZE), >> + 0))) { >> + bio->bi_status = BLK_STS_RESOURCE; >> + bio_endio(bio); >> + } >> } >> >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index a72abdc37a2d..724fd4f2cc3a 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi) >> >> /* >> * for resync bio, r1bio pointer can be retrieved from the per-bio >> - * 'struct resync_pages'. >> + * 'struct resync_folio'. >> */ >> static inline struct r1bio *get_resync_r1bio(struct bio *bio) >> { >> - return get_resync_pages(bio)->raid_bio; >> + return get_resync_folio(bio)->raid_bio; >> } >> >> static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf) >> @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data) >> struct r1conf *conf = data; >> struct r1bio *r1_bio; >> struct bio *bio; >> - int need_pages; >> + int need_folio; > > The name need_folio is confusing. Can we keep the same style as the > old version? How about need_folios? > Agree, I will rename it in v2. 
>> int j; >> - struct resync_pages *rps; >> + struct resync_folio *rfs; >> + int order = get_order(RESYNC_BLOCK_SIZE); >> >> r1_bio = r1bio_pool_alloc(gfp_flags, conf); >> if (!r1_bio) >> return NULL; >> >> - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages), >> + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio), >> gfp_flags); >> - if (!rps) >> + if (!rfs) >> goto out_free_r1bio; >> >> /* >> * Allocate bios : 1 for reading, n-1 for writing >> */ >> for (j = conf->raid_disks * 2; j-- ; ) { >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r1_bio->bios[j] = bio; >> } >> /* >> - * Allocate RESYNC_PAGES data pages and attach them to >> - * the first bio. >> + * Allocate data folio and attach it to the first bio. >> * If this is a user-requested check/repair, allocate >> - * RESYNC_PAGES for each bio. >> + * folio for each bio. 
>> */ >> if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery)) >> - need_pages = conf->raid_disks * 2; >> + need_folio = conf->raid_disks * 2; >> else >> - need_pages = 1; >> + need_folio = 1; >> for (j = 0; j < conf->raid_disks * 2; j++) { >> - struct resync_pages *rp = &rps[j]; >> + struct resync_folio *rf = &rfs[j]; >> >> - bio = r1_bio->bios[j]; >> - >> - if (j < need_pages) { >> - if (resync_alloc_pages(rp, gfp_flags)) >> - goto out_free_pages; >> + if (j < need_folio) { >> + if (resync_alloc_folio(rf, gfp_flags, &order)) >> + goto out_free_folio; >> } else { >> - memcpy(rp, &rps[0], sizeof(*rp)); >> - resync_get_all_pages(rp); >> + memcpy(rf, &rfs[0], sizeof(*rf)); >> + folio_get(rf->folio); >> } >> >> - rp->raid_bio = r1_bio; >> - bio->bi_private = rp; >> + rf->raid_bio = r1_bio; >> + r1_bio->bios[j]->bi_private = rf; >> } >> >> + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); >> r1_bio->master_bio = NULL; >> >> return r1_bio; >> >> -out_free_pages: >> +out_free_folio: >> while (--j >= 0) >> - resync_free_pages(&rps[j]); >> + folio_put(rfs[j].folio); >> >> out_free_bio: >> while (++j < conf->raid_disks * 2) { >> bio_uninit(r1_bio->bios[j]); >> kfree(r1_bio->bios[j]); >> } >> - kfree(rps); >> + kfree(rfs); >> >> out_free_r1bio: >> rbio_pool_free(r1_bio, data); >> @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data) >> struct r1conf *conf = data; >> int i; >> struct r1bio *r1bio = __r1_bio; >> - struct resync_pages *rp = NULL; >> + struct resync_folio *rf = NULL; >> >> for (i = conf->raid_disks * 2; i--; ) { >> - rp = get_resync_pages(r1bio->bios[i]); >> - resync_free_pages(rp); >> + rf = get_resync_folio(r1bio->bios[i]); >> + folio_put(rf->folio); >> bio_uninit(r1bio->bios[i]); >> kfree(r1bio->bios[i]); >> } >> >> - /* resync pages array stored in the 1st bio's .bi_private */ >> - kfree(rp); >> + /* resync folio stored in the 1st bio's .bi_private */ >> + kfree(rf); >> >> rbio_pool_free(r1bio, data); >> } >> @@ -2095,10 
+2094,10 @@ static void end_sync_write(struct bio *bio) >> put_sync_write_buf(r1_bio); >> } >> >> -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector, >> - int sectors, struct page *page, blk_opf_t rw) >> +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors, >> + int off, struct folio *folio, blk_opf_t rw) >> { >> - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false)) >> + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false)) >> /* success */ >> return 1; >> if (rw == REQ_OP_WRITE) { >> @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> struct mddev *mddev = r1_bio->mddev; >> struct r1conf *conf = mddev->private; >> struct bio *bio = r1_bio->bios[r1_bio->read_disk]; >> - struct page **pages = get_resync_pages(bio)->pages; >> + struct folio *folio = get_resync_folio(bio)->folio; >> sector_t sect = r1_bio->sector; >> int sectors = r1_bio->sectors; >> - int idx = 0; >> + int off = 0; >> struct md_rdev *rdev; >> >> rdev = conf->mirrors[r1_bio->read_disk].rdev; >> @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> * active, and resync is currently active >> */ >> rdev = conf->mirrors[d].rdev; >> - if (sync_page_io(rdev, sect, s<<9, >> - pages[idx], >> - REQ_OP_READ, false)) { >> + if (sync_folio_io(rdev, sect, s<<9, off, folio, >> + REQ_OP_READ, false)) { >> success = 1; >> break; >> } >> @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> /* Try next page */ >> sectors -= s; >> sect += s; >> - idx++; >> + off += s << 9; >> continue; >> } >> >> @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> if (r1_bio->bios[d]->bi_end_io != end_sync_read) >> continue; >> rdev = conf->mirrors[d].rdev; >> - if (r1_sync_page_io(rdev, sect, s, >> - pages[idx], >> + if (r1_sync_folio_io(rdev, sect, s, off, folio, >> REQ_OP_WRITE) == 0) { >> r1_bio->bios[d]->bi_end_io = NULL; >> rdev_dec_pending(rdev, mddev); >> @@ 
-2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> if (r1_bio->bios[d]->bi_end_io != end_sync_read) >> continue; >> rdev = conf->mirrors[d].rdev; >> - if (r1_sync_page_io(rdev, sect, s, >> - pages[idx], >> + if (r1_sync_folio_io(rdev, sect, s, off, folio, >> REQ_OP_READ) != 0) >> atomic_add(s, &rdev->corrected_errors); >> } >> sectors -= s; >> sect += s; >> - idx ++; >> + off += s << 9; >> } >> set_bit(R1BIO_Uptodate, &r1_bio->state); >> bio->bi_status = 0; >> @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio) >> struct r1conf *conf = mddev->private; >> int primary; >> int i; >> - int vcnt; >> >> /* Fix variable parts of all bios */ >> - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); >> for (i = 0; i < conf->raid_disks * 2; i++) { >> blk_status_t status; >> struct bio *b = r1_bio->bios[i]; >> - struct resync_pages *rp = get_resync_pages(b); >> + struct resync_folio *rf = get_resync_folio(b); >> if (b->bi_end_io != end_sync_read) >> continue; >> /* fixup the bio for reuse, but preserve errno */ >> @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio) >> b->bi_iter.bi_sector = r1_bio->sector + >> conf->mirrors[i].rdev->data_offset; >> b->bi_end_io = end_sync_read; >> - rp->raid_bio = r1_bio; >> - b->bi_private = rp; >> + rf->raid_bio = r1_bio; >> + b->bi_private = rf; >> >> /* initialize bvec table again */ >> - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9); >> + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9); >> } >> for (primary = 0; primary < conf->raid_disks * 2; primary++) >> if (r1_bio->bios[primary]->bi_end_io == end_sync_read && >> @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio) >> } >> r1_bio->read_disk = primary; >> for (i = 0; i < conf->raid_disks * 2; i++) { >> - int j = 0; >> struct bio *pbio = r1_bio->bios[primary]; >> struct bio *sbio = r1_bio->bios[i]; >> blk_status_t status = sbio->bi_status; >> - struct page **ppages = 
get_resync_pages(pbio)->pages; >> - struct page **spages = get_resync_pages(sbio)->pages; >> - struct bio_vec *bi; >> - int page_len[RESYNC_PAGES] = { 0 }; >> - struct bvec_iter_all iter_all; >> + struct folio *pfolio = get_resync_folio(pbio)->folio; >> + struct folio *sfolio = get_resync_folio(sbio)->folio; >> >> if (sbio->bi_end_io != end_sync_read) >> continue; >> /* Now we can 'fixup' the error value */ >> sbio->bi_status = 0; >> >> - bio_for_each_segment_all(bi, sbio, iter_all) >> - page_len[j++] = bi->bv_len; >> - >> - if (!status) { >> - for (j = vcnt; j-- ; ) { >> - if (memcmp(page_address(ppages[j]), >> - page_address(spages[j]), >> - page_len[j])) >> - break; >> - } >> - } else >> - j = 0; >> - if (j >= 0) >> + /* >> + * Copy data and submit write in two cases: >> + * - IO error (non-zero status) >> + * - Data inconsistency and not a CHECK operation. >> + */ >> + if (status) { >> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); >> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery) >> - && !status)) { >> - /* No need to write to this device. */ >> - sbio->bi_end_io = NULL; >> - rdev_dec_pending(conf->mirrors[i].rdev, mddev); >> + bio_copy_data(sbio, pbio); >> continue; >> + } else if (memcmp(folio_address(pfolio), >> + folio_address(sfolio), >> + r1_bio->sectors << 9)) { >> + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches); >> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) { >> + bio_copy_data(sbio, pbio); >> + continue; >> + } >> } >> >> - bio_copy_data(sbio, pbio); >> + /* No need to write to this device. 
*/ >> + sbio->bi_end_io = NULL; >> + rdev_dec_pending(conf->mirrors[i].rdev, mddev); >> } >> } >> >> @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) >> if (rdev && >> !test_bit(Faulty, &rdev->flags)) { >> atomic_inc(&rdev->nr_pending); >> - r1_sync_page_io(rdev, sect, s, >> - folio_page(conf->tmpfolio, 0), >> - REQ_OP_WRITE); >> + r1_sync_folio_io(rdev, sect, s, 0, >> + conf->tmpfolio, REQ_OP_WRITE); >> rdev_dec_pending(rdev, mddev); >> } >> } >> @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) >> if (rdev && >> !test_bit(Faulty, &rdev->flags)) { >> atomic_inc(&rdev->nr_pending); >> - if (r1_sync_page_io(rdev, sect, s, >> - folio_page(conf->tmpfolio, 0), >> - REQ_OP_READ)) { >> + if (r1_sync_folio_io(rdev, sect, s, 0, >> + conf->tmpfolio, REQ_OP_READ)) { >> atomic_add(s, &rdev->corrected_errors); >> pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n", >> mdname(mddev), s, >> @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf) >> static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf) >> { >> struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO); >> - struct resync_pages *rps; >> + struct resync_folio *rfs; >> struct bio *bio; >> int i; >> >> for (i = conf->raid_disks * 2; i--; ) { >> bio = r1bio->bios[i]; >> - rps = bio->bi_private; >> + rfs = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rps; >> + bio->bi_private = rfs; >> } >> r1bio->master_bio = NULL; >> return r1bio; >> @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> int write_targets = 0, read_targets = 0; >> sector_t sync_blocks; >> bool still_degraded = false; >> - int good_sectors = RESYNC_SECTORS; >> + int good_sectors; >> int min_bad = 0; /* number of sectors that are bad in all devices */ >> int idx = sector_to_idx(sector_nr); >> - int page_idx = 0; >> >> if 
(!mempool_initialized(&conf->r1buf_pool)) >> if (init_resync(conf)) >> @@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> r1_bio->sector = sector_nr; >> r1_bio->state = 0; >> set_bit(R1BIO_IsSync, &r1_bio->state); >> - /* make sure good_sectors won't go across barrier unit boundary */ >> - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors); >> + /* >> + * make sure good_sectors won't go across barrier unit boundary. >> + * r1_bio->sectors <= RESYNC_SECTORS. >> + */ >> + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors); >> >> for (i = 0; i < conf->raid_disks * 2; i++) { >> struct md_rdev *rdev; >> @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr, >> max_sector = mddev->resync_max; /* Don't do IO beyond here */ >> if (max_sector > sector_nr + good_sectors) >> max_sector = sector_nr + good_sectors; >> - nr_sectors = 0; >> - sync_blocks = 0; >> do { >> - struct page *page; >> - int len = PAGE_SIZE; >> - if (sector_nr + (len>>9) > max_sector) >> - len = (max_sector - sector_nr) << 9; >> - if (len == 0) >> + nr_sectors = max_sector - sector_nr; >> + if (nr_sectors == 0) >> break; >> - if (sync_blocks == 0) { >> - if (!md_bitmap_start_sync(mddev, sector_nr, >> - &sync_blocks, still_degraded) && >> - !conf->fullsync && >> - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) >> - break; >> - if ((len >> 9) > sync_blocks) >> - len = sync_blocks<<9; >> - } >> + if (!md_bitmap_start_sync(mddev, sector_nr, >> + &sync_blocks, still_degraded) && >> + !conf->fullsync && >> + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) >> + break; >> + if (nr_sectors > sync_blocks) >> + nr_sectors = sync_blocks; >> >> for (i = 0 ; i < conf->raid_disks * 2; i++) { >> - struct resync_pages *rp; >> - >> bio = r1_bio->bios[i]; >> - rp = get_resync_pages(bio); >> if (bio->bi_end_io) { >> - page = resync_fetch_page(rp, page_idx); >> + struct resync_folio *rf = 
get_resync_folio(bio); >> >> - /* >> - * won't fail because the vec table is big >> - * enough to hold all these pages >> - */ >> - __bio_add_page(bio, page, len, 0); >> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0); >> } >> } >> - nr_sectors += len>>9; >> - sector_nr += len>>9; >> - sync_blocks -= (len>>9); >> - } while (++page_idx < RESYNC_PAGES); >> + sector_nr += nr_sectors; >> + } while (0); > > Now it can handle all pages in one go via a folio. It's strange to > keep while(0) here. > I tried cleaning up the while(0), but it made the 'if' and 'break' statements unreadable. So I kept while(0) here. > >> >> r1_bio->sectors = nr_sectors; > > > This patch is a little big. Is it better to split this patch here? > It can't be split. The changes in raid1.c and raid10.c are entirely about resync_pages -> resync_folio. We have to change the declaration and its usage in one patch. >> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c >> index 26f93040cd13..3638e00fe420 100644 >> --- a/drivers/md/raid10.c >> +++ b/drivers/md/raid10.c >> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf); >> >> /* >> * for resync bio, r10bio pointer can be retrieved from the per-bio >> - * 'struct resync_pages'. >> + * 'struct resync_folio'. 
>> */ >> static inline struct r10bio *get_resync_r10bio(struct bio *bio) >> { >> - return get_resync_pages(bio)->raid_bio; >> + return get_resync_folio(bio)->raid_bio; >> } >> >> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data) >> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> struct r10bio *r10_bio; >> struct bio *bio; >> int j; >> - int nalloc, nalloc_rp; >> - struct resync_pages *rps; >> + int nalloc, nalloc_rf; >> + struct resync_folio *rfs; >> + int order = get_order(RESYNC_BLOCK_SIZE); >> >> r10_bio = r10bio_pool_alloc(gfp_flags, conf); >> if (!r10_bio) >> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> >> /* allocate once for all bios */ >> if (!conf->have_replacement) >> - nalloc_rp = nalloc; >> + nalloc_rf = nalloc; >> else >> - nalloc_rp = nalloc * 2; >> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags); >> - if (!rps) >> + nalloc_rf = nalloc * 2; >> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags); >> + if (!rfs) >> goto out_free_r10bio; >> >> /* >> * Allocate bios. >> */ >> for (j = nalloc ; j-- ; ) { >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r10_bio->devs[j].bio = bio; >> if (!conf->have_replacement) >> continue; >> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags); >> + bio = bio_kmalloc(1, gfp_flags); >> if (!bio) >> goto out_free_bio; >> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0); >> + bio_init_inline(bio, NULL, 1, 0); >> r10_bio->devs[j].repl_bio = bio; >> } >> /* >> - * Allocate RESYNC_PAGES data pages and attach them >> - * where needed. >> + * Allocate data folio and attach it where needed. 
>> */ >> for (j = 0; j < nalloc; j++) { >> struct bio *rbio = r10_bio->devs[j].repl_bio; >> - struct resync_pages *rp, *rp_repl; >> + struct resync_folio *rf, *rf_repl; >> >> - rp = &rps[j]; >> + rf = &rfs[j]; >> if (rbio) >> - rp_repl = &rps[nalloc + j]; >> - >> - bio = r10_bio->devs[j].bio; >> + rf_repl = &rfs[nalloc + j]; >> >> if (!j || test_bit(MD_RECOVERY_SYNC, >> &conf->mddev->recovery)) { >> - if (resync_alloc_pages(rp, gfp_flags)) >> - goto out_free_pages; >> + if (resync_alloc_folio(rf, gfp_flags, &order)) >> + goto out_free_folio; >> } else { >> - memcpy(rp, &rps[0], sizeof(*rp)); >> - resync_get_all_pages(rp); >> + memcpy(rf, &rfs[0], sizeof(*rf)); >> + folio_get(rf->folio); >> } >> >> - rp->raid_bio = r10_bio; >> - bio->bi_private = rp; >> + rf->raid_bio = r10_bio; >> + r10_bio->devs[j].bio->bi_private = rf; >> if (rbio) { >> - memcpy(rp_repl, rp, sizeof(*rp)); >> - rbio->bi_private = rp_repl; >> + memcpy(rf_repl, rf, sizeof(*rf)); >> + rbio->bi_private = rf_repl; >> } >> } >> >> + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT); >> return r10_bio; >> >> -out_free_pages: >> +out_free_folio: >> while (--j >= 0) >> - resync_free_pages(&rps[j]); >> + folio_put(rfs[j].folio); >> >> j = 0; >> out_free_bio: >> @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data) >> bio_uninit(r10_bio->devs[j].repl_bio); >> kfree(r10_bio->devs[j].repl_bio); >> } >> - kfree(rps); >> + kfree(rfs); >> out_free_r10bio: >> rbio_pool_free(r10_bio, conf); >> return NULL; >> @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data) >> struct r10conf *conf = data; >> struct r10bio *r10bio = __r10_bio; >> int j; >> - struct resync_pages *rp = NULL; >> + struct resync_folio *rf = NULL; >> >> for (j = conf->copies; j--; ) { >> struct bio *bio = r10bio->devs[j].bio; >> >> if (bio) { >> - rp = get_resync_pages(bio); >> - resync_free_pages(rp); >> + rf = get_resync_folio(bio); >> + folio_put(rf->folio); >> bio_uninit(bio); >> 
kfree(bio); >> } >> @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data) >> } >> >> /* resync pages array stored in the 1st bio's .bi_private */ >> - kfree(rp); >> + kfree(rf); >> >> rbio_pool_free(r10bio, conf); >> } >> @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> struct r10conf *conf = mddev->private; >> int i, first; >> struct bio *tbio, *fbio; >> - int vcnt; >> - struct page **tpages, **fpages; >> + struct folio *tfolio, *ffolio; >> >> atomic_set(&r10_bio->remaining, 1); >> >> @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> fbio = r10_bio->devs[i].bio; >> fbio->bi_iter.bi_size = r10_bio->sectors << 9; >> fbio->bi_iter.bi_idx = 0; >> - fpages = get_resync_pages(fbio)->pages; >> + ffolio = get_resync_folio(fbio)->folio; >> >> - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9); >> /* now find blocks with errors */ >> for (i=0 ; i < conf->copies ; i++) { >> - int j, d; >> + int d; >> struct md_rdev *rdev; >> - struct resync_pages *rp; >> + struct resync_folio *rf; >> >> tbio = r10_bio->devs[i].bio; >> >> @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> if (i == first) >> continue; >> >> - tpages = get_resync_pages(tbio)->pages; >> + tfolio = get_resync_folio(tbio)->folio; >> d = r10_bio->devs[i].devnum; >> rdev = conf->mirrors[d].rdev; >> if (!r10_bio->devs[i].bio->bi_status) { >> /* We know that the bi_io_vec layout is the same for >> * both 'first' and 'i', so we just compare them. 
>> - * All vec entries are PAGE_SIZE; >> */ >> - int sectors = r10_bio->sectors; >> - for (j = 0; j < vcnt; j++) { >> - int len = PAGE_SIZE; >> - if (sectors < (len / 512)) >> - len = sectors * 512; >> - if (memcmp(page_address(fpages[j]), >> - page_address(tpages[j]), >> - len)) >> - break; >> - sectors -= len/512; >> + if (memcmp(folio_address(ffolio), >> + folio_address(tfolio), >> + r10_bio->sectors << 9)) { >> + atomic64_add(r10_bio->sectors, >> + &mddev->resync_mismatches); >> + if (test_bit(MD_RECOVERY_CHECK, >> + &mddev->recovery)) >> + /* Don't fix anything. */ >> + continue; >> } >> - if (j == vcnt) >> - continue; >> - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); >> - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) >> - /* Don't fix anything. */ >> - continue; >> } else if (test_bit(FailFast, &rdev->flags)) { >> /* Just give up on this device */ >> md_error(rdev->mddev, rdev); >> @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) >> * First we need to fixup bv_offset, bv_len and >> * bi_vecs, as the read request might have corrupted these >> */ >> - rp = get_resync_pages(tbio); >> + rf = get_resync_folio(tbio); >> bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE); >> >> - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size); >> + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size); >> >> - rp->raid_bio = r10_bio; >> - tbio->bi_private = rp; >> + rf->raid_bio = r10_bio; >> + tbio->bi_private = rf; >> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr; >> tbio->bi_end_io = end_sync_write; >> >> @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> struct bio *bio = r10_bio->devs[0].bio; >> sector_t sect = 0; >> int sectors = r10_bio->sectors; >> - int idx = 0; >> int dr = r10_bio->devs[0].devnum; >> int dw = r10_bio->devs[1].devnum; >> - struct page **pages = get_resync_pages(bio)->pages; >> + struct folio *folio = 
get_resync_folio(bio)->folio; >> >> while (sectors) { >> int s = sectors; >> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> >> rdev = conf->mirrors[dr].rdev; >> addr = r10_bio->devs[0].addr + sect; >> - ok = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_READ, false); >> + ok = sync_folio_io(rdev, >> + addr, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_READ, false); >> if (ok) { >> rdev = conf->mirrors[dw].rdev; >> addr = r10_bio->devs[1].addr + sect; >> - ok = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_WRITE, false); >> + ok = sync_folio_io(rdev, >> + addr, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_WRITE, false); >> if (!ok) { >> set_bit(WriteErrorSeen, &rdev->flags); >> if (!test_and_set_bit(WantReplacement, >> @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) >> >> sectors -= s; >> sect += s; >> - idx++; >> } >> } >> >> @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf) >> static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) >> { >> struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO); >> - struct rsync_pages *rp; >> + struct resync_folio *rf; >> struct bio *bio; >> int nalloc; >> int i; >> @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf) >> >> for (i = 0; i < nalloc; i++) { >> bio = r10bio->devs[i].bio; >> - rp = bio->bi_private; >> + rf = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rp; >> + bio->bi_private = rf; >> bio = r10bio->devs[i].repl_bio; >> if (bio) { >> - rp = bio->bi_private; >> + rf = bio->bi_private; >> bio_reset(bio, NULL, 0); >> - bio->bi_private = rp; >> + bio->bi_private = rf; >> } >> } >> return r10bio; >> @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> int max_sync = RESYNC_SECTORS; >> sector_t sync_blocks; >> sector_t chunk_mask = 
conf->geo.chunk_mask; >> - int page_idx = 0; >> >> /* >> * Allow skipping a full rebuild for incremental assembly >> @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> continue; >> } >> } >> + >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in >> + * resync_alloc_folio(). Fall back to smaller sync >> + * size if needed. >> + */ >> + if (max_sync > r10_bio->sectors) >> + max_sync = r10_bio->sectors; >> + >> any_working = 1; >> bio = r10_bio->devs[0].bio; >> bio->bi_next = biolist; >> @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> } >> if (sync_blocks < max_sync) >> max_sync = sync_blocks; >> + >> r10_bio = raid10_alloc_init_r10buf(conf); >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio(). >> + * Fall back to smaller sync size if needed. >> + */ >> + if (max_sync > r10_bio->sectors) >> + max_sync = r10_bio->sectors; >> + >> r10_bio->state = 0; >> >> r10_bio->mddev = mddev; >> @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr, >> } >> } >> >> - nr_sectors = 0; >> if (sector_nr + max_sync < max_sector) >> max_sector = sector_nr + max_sync; >> do { >> - struct page *page; >> - int len = PAGE_SIZE; >> - if (sector_nr + (len>>9) > max_sector) >> - len = (max_sector - sector_nr) << 9; >> - if (len == 0) >> + nr_sectors = max_sector - sector_nr; >> + >> + if (nr_sectors == 0) >> break; >> for (bio= biolist ; bio ; bio=bio->bi_next) { >> - struct resync_pages *rp = get_resync_pages(bio); >> - page = resync_fetch_page(rp, page_idx); >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> + struct resync_folio *rf = get_resync_folio(bio); >> + >> + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) { >> bio->bi_status = BLK_STS_RESOURCE; >> bio_endio(bio); >> *skipped = 1; >> - return max_sync; >> + return nr_sectors << 9; >> } >> } >> - nr_sectors += len>>9; >> - 
sector_nr += len>>9; >> - } while (++page_idx < RESYNC_PAGES); >> + sector_nr += nr_sectors; >> + } while (0); >> r10_bio->sectors = nr_sectors; >> >> if (mddev_is_clustered(mddev) && >> @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> int *skipped) >> { >> /* We simply copy at most one chunk (smallest of old and new) >> - * at a time, possibly less if that exceeds RESYNC_PAGES, >> + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE, >> * or we hit a bad block or something. >> * This might mean we pause for normal IO in the middle of >> * a chunk, but that is not a problem as mddev->reshape_position >> @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> struct r10bio *r10_bio; >> sector_t next, safe, last; >> int max_sectors; >> - int nr_sectors; >> int s; >> struct md_rdev *rdev; >> int need_flush = 0; >> struct bio *blist; >> struct bio *bio, *read_bio; >> int sectors_done = 0; >> - struct page **pages; >> + struct folio *folio; >> >> if (sector_nr == 0) { >> /* If restarting in the middle, skip the initial sectors */ >> @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> r10_bio->mddev = mddev; >> r10_bio->sector = sector_nr; >> set_bit(R10BIO_IsReshape, &r10_bio->state); >> - r10_bio->sectors = last - sector_nr + 1; >> + /* >> + * RESYNC_BLOCK_SIZE folio might alloc failed in >> + * resync_alloc_folio(). Fall back to smaller sync >> + * size if needed. 
>> + */ >> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1); >> rdev = read_balance(conf, r10_bio, &max_sectors); >> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state)); >> >> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> return sectors_done; >> } >> >> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ, >> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ, >> GFP_KERNEL, &mddev->bio_set); >> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr >> + rdev->data_offset); >> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, >> blist = b; >> } >> >> - /* Now add as many pages as possible to all of these bios. */ >> + /* Now add folio to all of these bios. */ >> >> - nr_sectors = 0; >> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; >> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) { >> - struct page *page = pages[s / (PAGE_SIZE >> 9)]; >> - int len = (max_sectors - s) << 9; >> - if (len > PAGE_SIZE) >> - len = PAGE_SIZE; >> - for (bio = blist; bio ; bio = bio->bi_next) { >> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) { >> - bio->bi_status = BLK_STS_RESOURCE; >> - bio_endio(bio); >> - return sectors_done; >> - } >> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; >> + for (bio = blist; bio ; bio = bio->bi_next) { >> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) { >> + bio->bi_status = BLK_STS_RESOURCE; >> + bio_endio(bio); >> + return sectors_done; > > In fact, the original code doesn't clean up before returning. > bio_add_folio_nofail is used in raid1; can we use > bio_add_folio_nofail here as well? > Agree, I will clean it up before this patch. 
>> } >> - sector_nr += len >> 9; >> - nr_sectors += len >> 9; >> } >> - r10_bio->sectors = nr_sectors; >> + r10_bio->sectors = max_sectors >> 9; >> >> /* Now submit the read */ >> atomic_inc(&r10_bio->remaining); >> read_bio->bi_next = NULL; >> submit_bio_noacct(read_bio); >> - sectors_done += nr_sectors; >> + sectors_done += max_sectors; >> if (sector_nr <= last) >> goto read_more; >> >> @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev, >> struct r10conf *conf = mddev->private; >> struct r10bio *r10b; >> int slot = 0; >> - int idx = 0; >> - struct page **pages; >> + int sect = 0; >> + struct folio *folio; >> >> r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO); >> if (!r10b) { >> @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev, >> return -ENOMEM; >> } >> >> - /* reshape IOs share pages from .devs[0].bio */ >> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages; >> + /* reshape IOs share folio from .devs[0].bio */ >> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio; >> >> r10b->sector = r10_bio->sector; >> __raid10_find_phys(&conf->prev, r10b); >> @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev, >> while (!success) { >> int d = r10b->devs[slot].devnum; >> struct md_rdev *rdev = conf->mirrors[d].rdev; >> - sector_t addr; >> if (rdev == NULL || >> test_bit(Faulty, &rdev->flags) || >> !test_bit(In_sync, &rdev->flags)) >> goto failed; >> >> - addr = r10b->devs[slot].addr + idx * PAGE_SIZE; >> atomic_inc(&rdev->nr_pending); >> - success = sync_page_io(rdev, >> - addr, >> - s << 9, >> - pages[idx], >> - REQ_OP_READ, false); >> + success = sync_folio_io(rdev, >> + r10b->devs[slot].addr + >> + sect, >> + s << 9, >> + sect << 9, >> + folio, >> + REQ_OP_READ, false); >> rdev_dec_pending(rdev, mddev); >> if (success) >> break; >> @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev, >> return -EIO; >> } >> sectors -= s; >> - idx++; 
>> + sect += s; >> } >> kfree(r10b); >> return 0; >> -- >> 2.39.2 >> >> > > Regards > Xiao -- Thanks Nan ^ permalink raw reply [flat|nested] 13+ messages in thread
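The do { ... } while (0) construct debated in the review above can be illustrated outside the kernel. The following is a minimal userspace sketch with made-up names and values (fill_one_chunk is not driver code): a block that executes exactly once, which several early-exit conditions can leave via break instead of goto, mirroring the shape raid1_sync_request() keeps after the per-page loop collapsed into a single folio pass.

```c
/*
 * Illustrative sketch only, not kernel code: the single-pass
 * do { ... } while (0) block that replaces the old per-page loop.
 * Several conditions can bail out early via break; with one folio
 * covering the whole window, the loop body never needs to repeat.
 */
static long fill_one_chunk(long sector_nr, long max_sector, long sync_blocks)
{
	long nr_sectors = 0;

	do {
		long want = max_sector - sector_nr;

		if (want == 0)
			break;		/* window is empty */
		if (sync_blocks == 0)
			break;		/* nothing needs syncing here */
		/* cap the chunk at what the bitmap allows */
		nr_sectors = want < sync_blocks ? want : sync_blocks;
	} while (0);

	return nr_sectors;
}
```

As the reply notes, once all data fits one folio the loop never iterates, but keeping while (0) preserves the break-based control flow without restructuring the if statements.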
* [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity 2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666 ` (5 preceding siblings ...) 2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666 @ 2026-04-16 3:38 ` linan666 2026-04-30 2:22 ` Xiao Ni 2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666 7 siblings, 1 reply; 13+ messages in thread From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw) To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang From: Li Nan <linan122@huawei.com> RAID1 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a smaller granularity can handle more errors, and RAID will support logical block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. Switch the IO error fix granularity to logical block size. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> --- drivers/md/raid1.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 724fd4f2cc3a..de8c964ca11d 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) { /* Try some synchronous reads of other devices to get * good data, much like with normal read errors. Only - * read into the pages we already have so we don't + * read into the block we already have so we don't * need to re-issue the read request. 
* We don't need to freeze the array, because being in an * active sync request, there is no normal IO, and @@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int d = r1_bio->read_disk; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; do { if (r1_bio->bios[d]->bi_end_io == end_sync_read) { /* No rcu protection needed here devices @@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) if (abort) return 0; - /* Try next page */ + /* Try next block */ sectors -= s; sect += s; off += s << 9; @@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int d = read_disk; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - do { rdev = conf->mirrors[d].rdev; if (rdev && -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity 2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666 @ 2026-04-30 2:22 ` Xiao Ni 0 siblings, 0 replies; 13+ messages in thread From: Xiao Ni @ 2026-04-30 2:22 UTC (permalink / raw) To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote: > > From: Li Nan <linan122@huawei.com> > > RAID1 currently fixes IO error at PAGE_SIZE granularity. Fix at smaller > granularity can handle more errors, and RAID will support logical block > sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. > > Switch IO error fix granularity to logical block size. > > Signed-off-by: Li Nan <linan122@huawei.com> > Reviewed-by: Yu Kuai <yukuai@fnnas.com> > --- > drivers/md/raid1.c | 13 ++++--------- > 1 file changed, 4 insertions(+), 9 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 724fd4f2cc3a..de8c964ca11d 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > { > /* Try some synchronous reads of other devices to get > * good data, much like with normal read errors. Only > - * read into the pages we already have so we don't > + * read into the block we already have so we don't > * need to re-issue the read request. 
> * We don't need to freeze the array, because being in an > * active sync request, there is no normal IO, and > @@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > } > > while(sectors) { > - int s = sectors; > + int s = min_t(int, sectors, mddev->logical_block_size >> 9); > int d = r1_bio->read_disk; > int success = 0; > int start; > > - if (s > (PAGE_SIZE>>9)) > - s = PAGE_SIZE >> 9; > do { > if (r1_bio->bios[d]->bi_end_io == end_sync_read) { > /* No rcu protection needed here devices > @@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio) > if (abort) > return 0; > > - /* Try next page */ > + /* Try next block */ > sectors -= s; > sect += s; > off += s << 9; > @@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio) > } > > while(sectors) { > - int s = sectors; > + int s = min_t(int, sectors, mddev->logical_block_size >> 9); > int d = read_disk; > int success = 0; > int start; > > - if (s > (PAGE_SIZE>>9)) > - s = PAGE_SIZE >> 9; > - > do { > rdev = conf->mirrors[d].rdev; > if (rdev && > -- > 2.39.2 > > This patch looks good to me. Reviewed-by: Xiao Ni <xni@redhat.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
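The min_t() granularity switch reviewed above can be sanity-checked in isolation. Below is a hedged userspace sketch (chunks_needed is a made-up helper; mddev->logical_block_size is modeled as a plain byte-size parameter, and sectors are 512 bytes): it counts how many independently retryable chunks cover a failed run, which is the quantity the patch improves.

```c
/*
 * Illustrative sketch, not driver code: walk a run of 512-byte
 * sectors in logical-block-size steps, as patches 7/8 do, instead
 * of PAGE_SIZE steps. Each iteration models one retryable sync IO.
 */
static int chunks_needed(int sectors, unsigned int lbs_bytes)
{
	int lbs_sectors = lbs_bytes >> 9;	/* bytes -> 512B sectors */
	int chunks = 0;

	while (sectors) {
		/* mirrors: s = min_t(int, sectors, logical_block_size >> 9) */
		int s = sectors < lbs_sectors ? sectors : lbs_sectors;

		sectors -= s;	/* the driver also does sect += s, off += s << 9 */
		chunks++;
	}
	return chunks;
}
```

For a 64K resync window (128 sectors), a 512-byte logical block size yields 128 retryable chunks versus 16 with 4K pages, which is why the finer granularity can salvage more data around a single bad block, and why it keeps working once the logical block size exceeds PAGE_SIZE.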
* [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity 2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666 ` (6 preceding siblings ...) 2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666 @ 2026-04-16 3:38 ` linan666 2026-04-30 2:23 ` Xiao Ni 7 siblings, 1 reply; 13+ messages in thread From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw) To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang From: Li Nan <linan122@huawei.com> RAID10 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a smaller granularity can handle more errors, and RAID will support logical block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail. Switch the IO error fix granularity to logical block size. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> --- drivers/md/raid10.c | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 3638e00fe420..5b4ffd23211a 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) static void fix_recovery_read_error(struct r10bio *r10_bio) { /* We got a read error during recovery. - * We repeat the read in smaller page-sized sections. + * We repeat the read in smaller logical_block_sized sections. * If a read succeeds, write it to the new device or record * a bad block if we cannot. 
* If a read fails, record a bad block on both old and @@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio) struct folio *folio = get_resync_folio(bio)->folio; while (sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); struct md_rdev *rdev; sector_t addr; int ok; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - rdev = conf->mirrors[dr].rdev; addr = r10_bio->devs[0].addr + sect; ok = sync_folio_io(rdev, @@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10 } while(sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int sl = slot; int success = 0; int start; - if (s > (PAGE_SIZE>>9)) - s = PAGE_SIZE >> 9; - do { d = r10_bio->devs[sl].devnum; rdev = conf->mirrors[d].rdev; @@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev, __raid10_find_phys(&conf->prev, r10b); while (sectors) { - int s = sectors; + int s = min_t(int, sectors, mddev->logical_block_size >> 9); int success = 0; int first_slot = slot; - if (s > (PAGE_SIZE >> 9)) - s = PAGE_SIZE >> 9; - while (!success) { int d = r10b->devs[slot].devnum; struct md_rdev *rdev = conf->mirrors[d].rdev; -- 2.39.2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity
  2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: " linan666
@ 2026-04-30  2:23   ` Xiao Ni
  0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2026-04-30  2:23 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang

On Thu, Apr 16, 2026 at 11:51 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> RAID10 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a
> smaller granularity can handle more errors, and RAID will support logical
> block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO would
> fail.
>
> Switch the IO error fix granularity to the logical block size.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  drivers/md/raid10.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3638e00fe420..5b4ffd23211a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>  static void fix_recovery_read_error(struct r10bio *r10_bio)
>  {
>  	/* We got a read error during recovery.
> -	 * We repeat the read in smaller page-sized sections.
> +	 * We repeat the read in smaller logical_block_sized sections.
> 	 * If a read succeeds, write it to the new device or record
> 	 * a bad block if we cannot.
> 	 * If a read fails, record a bad block on both old and
> @@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>  	struct folio *folio = get_resync_folio(bio)->folio;
>
>  	while (sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		struct md_rdev *rdev;
>  		sector_t addr;
>  		int ok;
>
> -		if (s > (PAGE_SIZE>>9))
> -			s = PAGE_SIZE >> 9;
> -
>  		rdev = conf->mirrors[dr].rdev;
>  		addr = r10_bio->devs[0].addr + sect;
>  		ok = sync_folio_io(rdev,
> @@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
>  	}
>
>  	while(sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		int sl = slot;
>  		int success = 0;
>  		int start;
>
> -		if (s > (PAGE_SIZE>>9))
> -			s = PAGE_SIZE >> 9;
> -
>  		do {
>  			d = r10_bio->devs[sl].devnum;
>  			rdev = conf->mirrors[d].rdev;
> @@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
>  	__raid10_find_phys(&conf->prev, r10b);
>
>  	while (sectors) {
> -		int s = sectors;
> +		int s = min_t(int, sectors, mddev->logical_block_size >> 9);
>  		int success = 0;
>  		int first_slot = slot;
>
> -		if (s > (PAGE_SIZE >> 9))
> -			s = PAGE_SIZE >> 9;
> -
>  		while (!success) {
>  			int d = r10b->devs[slot].devnum;
>  			struct md_rdev *rdev = conf->mirrors[d].rdev;
> --
> 2.39.2
>
>

This patch looks good to me.

Reviewed-by: Xiao Ni <xni@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-05-07  7:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-16  3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
2026-04-16  3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
2026-04-16  3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
2026-04-16  3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
2026-04-16  3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
2026-04-16  3:37 ` [PATCH v3 5/8] md/raid10: " linan666
2026-04-16  3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
2026-04-30  1:54   ` Xiao Ni
2026-05-07  7:13     ` 李楠 Magic Li
2026-04-16  3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
2026-04-30  2:22   ` Xiao Ni
2026-04-16  3:38 ` [PATCH v3 8/8] md/raid10: " linan666
2026-04-30  2:23   ` Xiao Ni