* [PATCH v3 0/8] folio support for sync I/O in RAID
@ 2026-04-16 3:37 linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
This patchset adds folio support to sync operations in raid1/10.
Previously, we used 16 * 4K pages for 64K sync I/O. With this change,
we'll use a single 64K folio instead. Using folios reduces
resync/recovery time by 20% on HDD.
This is the first step towards full folio support in RAID. Going forward,
I will convert the remaining page-based usage to folios.
The patchset was tested with mdadm, and additional fault-injection stress
tests were run on top of filesystems.
v3:
- In patches 3/4/5, introduce safe_folio_put() and use it for tmpfolio.
- Merge Cleanup patch into patch 6.
v2:
- Remove patch "md: use folio for bb_folio". It will be included in
a later patch set
- In patch 5:
1) fix typo
2) rewrite the logic of copying data in process_checks()
3) rename resync_get_all_folio() to resync_get_folio()
4) s/resync_pages *rps/resync_folio *rfs/g in
raid1_alloc_init_r1buf() and raid10_alloc_init_r10buf()
- Subsequent patches: resolve conflicts caused by patch 5
Li Nan (8):
md/raid1,raid10: clean up of RESYNC_SECTORS
md: introduce sync_folio_io for folio support in RAID
md: introduce safe_put_folio for folio support in RAID
md/raid1: use folio for tmppage
md/raid10: use folio for tmppage
md/raid1,raid10: use folio for sync path IO
md/raid1: fix IO error at logical block size granularity
md/raid10: fix IO error at logical block size granularity
drivers/md/md.h | 10 +-
drivers/md/raid1.h | 2 +-
drivers/md/raid10.h | 2 +-
drivers/md/md.c | 17 ++-
drivers/md/raid1-10.c | 81 ++++-------
drivers/md/raid1.c | 233 ++++++++++++++-----------------
drivers/md/raid10.c | 312 ++++++++++++++++++++----------------------
7 files changed, 297 insertions(+), 360 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
@ 2026-04-16 3:37 ` linan666
2026-04-16 3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Move redundant RESYNC_SECTORS definition from raid1 and raid10
implementations to raid1-10.c.
Simplify max_sync assignment in raid10_sync_request().
No functional changes.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
---
drivers/md/raid1-10.c | 1 +
drivers/md/raid1.c | 1 -
drivers/md/raid10.c | 4 +---
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index c33099925f23..cda531d0720b 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -2,6 +2,7 @@
/* Maximum size of each resync request */
#define RESYNC_BLOCK_SIZE (64*1024)
#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
/* when we get a read error on a read-only array, we redirect to another
* device without failing the first device, or trying to over-write to
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 867db18bc3ba..5a73a9f19e0e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -136,7 +136,6 @@ static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
}
#define RESYNC_DEPTH 32
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
#define RESYNC_WINDOW (RESYNC_BLOCK_SIZE * RESYNC_DEPTH)
#define RESYNC_WINDOW_SECTORS (RESYNC_WINDOW >> 9)
#define CLUSTER_RESYNC_WINDOW (16 * RESYNC_WINDOW)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b4892c5d571c..90c1036f6ec4 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -113,7 +113,6 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
return kzalloc(size, gfp_flags);
}
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
/* amount of memory to reserve for resync requests */
#define RESYNC_WINDOW (1024*1024)
/* maximum number of concurrent requests, memory permitting */
@@ -3153,7 +3152,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
struct bio *biolist = NULL, *bio;
sector_t nr_sectors;
int i;
- int max_sync;
+ int max_sync = RESYNC_SECTORS;
sector_t sync_blocks;
sector_t chunk_mask = conf->geo.chunk_mask;
int page_idx = 0;
@@ -3266,7 +3265,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
* end_sync_write if we will want to write.
*/
- max_sync = RESYNC_PAGES << (PAGE_SHIFT-9);
if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
/* recovery... the complicated one */
int j;
--
2.39.2
* [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
@ 2026-04-16 3:37 ` linan666
2026-04-16 3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Prepare for folio support in RAID by introducing sync_folio_io(), which
matches sync_page_io()'s functionality. The differences are:
- Add a new parameter 'off' to prepare for adding a folio to a bio in
segments, e.g. in fix_recovery_read_error()
- Change the return value to bool
- Check the status explicitly as 'bio.bi_status == BLK_STS_OK'
sync_page_io() will be removed once full folio support is complete.
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/md.h | 4 +++-
drivers/md/md.c | 15 +++++++++++----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd..914b992a073b 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -924,8 +924,10 @@ void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
sector_t sector, int size, struct page *page,
unsigned int offset);
extern int md_super_wait(struct mddev *mddev);
-extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+extern bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
struct page *page, blk_opf_t opf, bool metadata_op);
+extern bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size,
+ int off, struct folio *folio, blk_opf_t opf, bool metadata_op);
extern void md_do_sync(struct md_thread *thread);
extern void md_new_event(void);
extern void md_allow_write(struct mddev *mddev);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d9c9fd2839b3..5e83914d5c14 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1166,8 +1166,8 @@ int md_super_wait(struct mddev *mddev)
return 0;
}
-int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
- struct page *page, blk_opf_t opf, bool metadata_op)
+bool sync_folio_io(struct md_rdev *rdev, sector_t sector, int size, int off,
+ struct folio *folio, blk_opf_t opf, bool metadata_op)
{
struct bio bio;
struct bio_vec bvec;
@@ -1185,11 +1185,18 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
bio.bi_iter.bi_sector = sector + rdev->new_data_offset;
else
bio.bi_iter.bi_sector = sector + rdev->data_offset;
- __bio_add_page(&bio, page, size, 0);
+ bio_add_folio_nofail(&bio, folio, size, off);
submit_bio_wait(&bio);
- return !bio.bi_status;
+ return bio.bi_status == BLK_STS_OK;
+}
+EXPORT_SYMBOL_GPL(sync_folio_io);
+
+bool sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
+ struct page *page, blk_opf_t opf, bool metadata_op)
+{
+ return sync_folio_io(rdev, sector, size, 0, page_folio(page), opf, metadata_op);
}
EXPORT_SYMBOL_GPL(sync_page_io);
--
2.39.2
* [PATCH v3 3/8] md: introduce safe_put_folio for folio support in RAID
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
2026-04-16 3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
@ 2026-04-16 3:37 ` linan666
2026-04-16 3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
` (4 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Introduce safe_folio_put(), the folio counterpart of safe_put_page(): it
calls folio_put() only when the folio is non-NULL. safe_put_page() itself
will be removed once the last reference to it in RAID5 is gone.
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/md.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 914b992a073b..7c0c38f09cc3 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -888,6 +888,12 @@ struct md_io_clone {
rcu_read_unlock(); \
} while (0)
+static inline void safe_folio_put(struct folio *folio)
+{
+ if (folio)
+ folio_put(folio);
+}
+
static inline void safe_put_page(struct page *p)
{
if (p) put_page(p);
--
2.39.2
* [PATCH v3 4/8] md/raid1: use folio for tmppage
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
` (2 preceding siblings ...)
2026-04-16 3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
@ 2026-04-16 3:37 ` linan666
2026-04-16 3:37 ` [PATCH v3 5/8] md/raid10: " linan666
` (3 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Convert tmppage to tmpfolio and use it throughout raid1.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
drivers/md/raid1.h | 2 +-
drivers/md/raid1.c | 18 ++++++++++--------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c98d43a7ae99..d480b3a8c2c4 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -101,7 +101,7 @@ struct r1conf {
/* temporary buffer to synchronous IO when attempting to repair
* a read error.
*/
- struct page *tmppage;
+ struct folio *tmpfolio;
/* When taking over an array from a different personality, we store
* the new thread here until we fully activate the array.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5a73a9f19e0e..a72abdc37a2d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2417,8 +2417,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
rdev->recovery_offset >= sect + s)) &&
rdev_has_badblock(rdev, sect, s) == 0) {
atomic_inc(&rdev->nr_pending);
- if (sync_page_io(rdev, sect, s<<9,
- conf->tmppage, REQ_OP_READ, false))
+ if (sync_folio_io(rdev, sect, s<<9, 0,
+ conf->tmpfolio, REQ_OP_READ, false))
success = 1;
rdev_dec_pending(rdev, mddev);
if (success)
@@ -2447,7 +2447,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
r1_sync_page_io(rdev, sect, s,
- conf->tmppage, REQ_OP_WRITE);
+ folio_page(conf->tmpfolio, 0),
+ REQ_OP_WRITE);
rdev_dec_pending(rdev, mddev);
}
}
@@ -2461,7 +2462,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
if (r1_sync_page_io(rdev, sect, s,
- conf->tmppage, REQ_OP_READ)) {
+ folio_page(conf->tmpfolio, 0),
+ REQ_OP_READ)) {
atomic_add(s, &rdev->corrected_errors);
pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
mdname(mddev), s,
@@ -3099,8 +3101,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
if (!conf->mirrors)
goto abort;
- conf->tmppage = alloc_page(GFP_KERNEL);
- if (!conf->tmppage)
+ conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+ if (!conf->tmpfolio)
goto abort;
r1bio_size = offsetof(struct r1bio, bios[mddev->raid_disks * 2]);
@@ -3175,7 +3177,7 @@ static struct r1conf *setup_conf(struct mddev *mddev)
if (conf) {
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
- safe_put_page(conf->tmppage);
+ safe_folio_put(conf->tmpfolio);
kfree(conf->nr_pending);
kfree(conf->nr_waiting);
kfree(conf->nr_queued);
@@ -3290,7 +3292,7 @@ static void raid1_free(struct mddev *mddev, void *priv)
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
- safe_put_page(conf->tmppage);
+ safe_folio_put(conf->tmpfolio);
kfree(conf->nr_pending);
kfree(conf->nr_waiting);
kfree(conf->nr_queued);
--
2.39.2
* [PATCH v3 5/8] md/raid10: use folio for tmppage
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
` (3 preceding siblings ...)
2026-04-16 3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
@ 2026-04-16 3:37 ` linan666
2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Convert tmppage to tmpfolio and use it throughout raid10.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
---
drivers/md/raid10.h | 2 +-
drivers/md/raid10.c | 37 +++++++++++++++++++------------------
2 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..19f37439a4e2 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -89,7 +89,7 @@ struct r10conf {
mempool_t r10bio_pool;
mempool_t r10buf_pool;
- struct page *tmppage;
+ struct folio *tmpfolio;
struct bio_set bio_split;
/* When taking over an array from a different personality, we store
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 90c1036f6ec4..26f93040cd13 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2581,13 +2581,13 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
}
}
-static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
- int sectors, struct page *page, enum req_op op)
+static int r10_sync_folio_io(struct md_rdev *rdev, sector_t sector,
+ int sectors, struct folio *folio, enum req_op op)
{
if (rdev_has_badblock(rdev, sector, sectors) &&
(op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags)))
return -1;
- if (sync_page_io(rdev, sector, sectors << 9, page, op, false))
+ if (sync_folio_io(rdev, sector, sectors << 9, 0, folio, op, false))
/* success */
return 1;
if (op == REQ_OP_WRITE) {
@@ -2650,12 +2650,13 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
r10_bio->devs[sl].addr + sect,
s) == 0) {
atomic_inc(&rdev->nr_pending);
- success = sync_page_io(rdev,
- r10_bio->devs[sl].addr +
- sect,
- s<<9,
- conf->tmppage,
- REQ_OP_READ, false);
+ success = sync_folio_io(rdev,
+ r10_bio->devs[sl].addr +
+ sect,
+ s<<9,
+ 0,
+ conf->tmpfolio,
+ REQ_OP_READ, false);
rdev_dec_pending(rdev, mddev);
if (success)
break;
@@ -2698,10 +2699,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
continue;
atomic_inc(&rdev->nr_pending);
- if (r10_sync_page_io(rdev,
- r10_bio->devs[sl].addr +
- sect,
- s, conf->tmppage, REQ_OP_WRITE)
+ if (r10_sync_folio_io(rdev,
+ r10_bio->devs[sl].addr +
+ sect,
+ s, conf->tmpfolio, REQ_OP_WRITE)
== 0) {
/* Well, this device is dead */
pr_notice("md/raid10:%s: read correction write failed (%d sectors at %llu on %pg)\n",
@@ -2730,10 +2731,10 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
continue;
atomic_inc(&rdev->nr_pending);
- switch (r10_sync_page_io(rdev,
+ switch (r10_sync_folio_io(rdev,
r10_bio->devs[sl].addr +
sect,
- s, conf->tmppage, REQ_OP_READ)) {
+ s, conf->tmpfolio, REQ_OP_READ)) {
case 0:
/* Well, this device is dead */
pr_notice("md/raid10:%s: unable to read back corrected sectors (%d sectors at %llu on %pg)\n",
@@ -3823,7 +3824,7 @@ static void raid10_free_conf(struct r10conf *conf)
kfree(conf->mirrors);
kfree(conf->mirrors_old);
kfree(conf->mirrors_new);
- safe_put_page(conf->tmppage);
+ safe_folio_put(conf->tmpfolio);
bioset_exit(&conf->bio_split);
kfree(conf);
}
@@ -3861,8 +3862,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
if (!conf->mirrors)
goto out;
- conf->tmppage = alloc_page(GFP_KERNEL);
- if (!conf->tmppage)
+ conf->tmpfolio = folio_alloc(GFP_KERNEL, 0);
+ if (!conf->tmpfolio)
goto out;
conf->geo = geo;
--
2.39.2
* [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
` (4 preceding siblings ...)
2026-04-16 3:37 ` [PATCH v3 5/8] md/raid10: " linan666
@ 2026-04-16 3:37 ` linan666
2026-04-30 1:54 ` Xiao Ni
2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666
7 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2026-04-16 3:37 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
Convert all IO on the sync path to use folios, and rename page-related
identifiers accordingly.
Since a RESYNC_BLOCK_SIZE (64K) allocation is more likely to fail than a
4K one, retry with lower orders to improve allocation reliability. The
rf->folio orders within a single r1/10_bio may therefore differ, so derive
the r1/10_bio sector count from the minimum order, to avoid exceeding a
folio's size when the folio is added to an IO later.
Clean up:
1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
2. Remove the now-redundant do/while loop in md_bio_reset_resync_folio().
3. Drop the local 'bio' variable and reference r1_bio->bios[j] /
   r10_bio->devs[j].bio directly in r1buf_pool_alloc() and
   r10buf_pool_alloc().
4. Remove the now-unused RESYNC_PAGES macro.
5. Remove resync_fetch_folio() and access 'rf->folio' directly.
6. Remove resync_free_folio() and call folio_put() directly.
7. Clean up the sync IO size calculation in raid1/10_sync_request().
Signed-off-by: Li Nan <linan122@huawei.com>
---
drivers/md/md.c | 2 +-
drivers/md/raid1-10.c | 80 ++++---------
drivers/md/raid1.c | 209 +++++++++++++++-------------------
drivers/md/raid10.c | 254 +++++++++++++++++++++---------------------
4 files changed, 240 insertions(+), 305 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5e83914d5c14..6554b849ac74 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev)
{
/*
* For raid456, sync IO is stripe(4k) per IO, for other levels, it's
- * RESYNC_PAGES(64k) per IO.
+ * RESYNC_BLOCK_SIZE(64k) per IO.
*/
return atomic_read(&mddev->recovery_active) <
(raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev);
diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index cda531d0720b..10200b0a3fd2 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* Maximum size of each resync request */
#define RESYNC_BLOCK_SIZE (64*1024)
-#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
/* when we get a read error on a read-only array, we redirect to another
@@ -20,9 +19,9 @@
#define MAX_PLUG_BIO 32
/* for managing resync I/O pages */
-struct resync_pages {
+struct resync_folio {
void *raid_bio;
- struct page *pages[RESYNC_PAGES];
+ struct folio *folio;
};
struct raid1_plug_cb {
@@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data)
kfree(rbio);
}
-static inline int resync_alloc_pages(struct resync_pages *rp,
- gfp_t gfp_flags)
+static inline int resync_alloc_folio(struct resync_folio *rf,
+ gfp_t gfp_flags, int *order)
{
- int i;
+ struct folio *folio;
- for (i = 0; i < RESYNC_PAGES; i++) {
- rp->pages[i] = alloc_page(gfp_flags);
- if (!rp->pages[i])
- goto out_free;
- }
+ do {
+ folio = folio_alloc(gfp_flags, *order);
+ if (folio)
+ break;
+ } while (--(*order) >= 0);
+ if (!folio)
+ return -ENOMEM;
+
+ rf->folio = folio;
return 0;
-
-out_free:
- while (--i >= 0)
- put_page(rp->pages[i]);
- return -ENOMEM;
-}
-
-static inline void resync_free_pages(struct resync_pages *rp)
-{
- int i;
-
- for (i = 0; i < RESYNC_PAGES; i++)
- put_page(rp->pages[i]);
-}
-
-static inline void resync_get_all_pages(struct resync_pages *rp)
-{
- int i;
-
- for (i = 0; i < RESYNC_PAGES; i++)
- get_page(rp->pages[i]);
-}
-
-static inline struct page *resync_fetch_page(struct resync_pages *rp,
- unsigned idx)
-{
- if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
- return NULL;
- return rp->pages[idx];
}
/*
- * 'strct resync_pages' stores actual pages used for doing the resync
+ * 'struct resync_folio' stores the actual folio used for doing the resync
* IO, and it is per-bio, so make .bi_private points to it.
*/
-static inline struct resync_pages *get_resync_pages(struct bio *bio)
+static inline struct resync_folio *get_resync_folio(struct bio *bio)
{
return bio->bi_private;
}
/* generally called after bio_reset() for reseting bvec */
-static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
+static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
int size)
{
- int idx = 0;
-
/* initialize bvec table again */
- do {
- struct page *page = resync_fetch_page(rp, idx);
- int len = min_t(int, size, PAGE_SIZE);
-
- if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
- bio->bi_status = BLK_STS_RESOURCE;
- bio_endio(bio);
- return;
- }
-
- size -= len;
- } while (idx++ < RESYNC_PAGES && size > 0);
+ if (WARN_ON(!bio_add_folio(bio, rf->folio,
+ min_t(int, size, RESYNC_BLOCK_SIZE),
+ 0))) {
+ bio->bi_status = BLK_STS_RESOURCE;
+ bio_endio(bio);
+ }
}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a72abdc37a2d..724fd4f2cc3a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
/*
* for resync bio, r1bio pointer can be retrieved from the per-bio
- * 'struct resync_pages'.
+ * 'struct resync_folio'.
*/
static inline struct r1bio *get_resync_r1bio(struct bio *bio)
{
- return get_resync_pages(bio)->raid_bio;
+ return get_resync_folio(bio)->raid_bio;
}
static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
@@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
struct r1conf *conf = data;
struct r1bio *r1_bio;
struct bio *bio;
- int need_pages;
+ int need_folio;
int j;
- struct resync_pages *rps;
+ struct resync_folio *rfs;
+ int order = get_order(RESYNC_BLOCK_SIZE);
r1_bio = r1bio_pool_alloc(gfp_flags, conf);
if (!r1_bio)
return NULL;
- rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages),
+ rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio),
gfp_flags);
- if (!rps)
+ if (!rfs)
goto out_free_r1bio;
/*
* Allocate bios : 1 for reading, n-1 for writing
*/
for (j = conf->raid_disks * 2; j-- ; ) {
- bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
+ bio = bio_kmalloc(1, gfp_flags);
if (!bio)
goto out_free_bio;
- bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
+ bio_init_inline(bio, NULL, 1, 0);
r1_bio->bios[j] = bio;
}
/*
- * Allocate RESYNC_PAGES data pages and attach them to
- * the first bio.
+ * Allocate data folio and attach it to the first bio.
* If this is a user-requested check/repair, allocate
- * RESYNC_PAGES for each bio.
+ * folio for each bio.
*/
if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
- need_pages = conf->raid_disks * 2;
+ need_folio = conf->raid_disks * 2;
else
- need_pages = 1;
+ need_folio = 1;
for (j = 0; j < conf->raid_disks * 2; j++) {
- struct resync_pages *rp = &rps[j];
+ struct resync_folio *rf = &rfs[j];
- bio = r1_bio->bios[j];
-
- if (j < need_pages) {
- if (resync_alloc_pages(rp, gfp_flags))
- goto out_free_pages;
+ if (j < need_folio) {
+ if (resync_alloc_folio(rf, gfp_flags, &order))
+ goto out_free_folio;
} else {
- memcpy(rp, &rps[0], sizeof(*rp));
- resync_get_all_pages(rp);
+ memcpy(rf, &rfs[0], sizeof(*rf));
+ folio_get(rf->folio);
}
- rp->raid_bio = r1_bio;
- bio->bi_private = rp;
+ rf->raid_bio = r1_bio;
+ r1_bio->bios[j]->bi_private = rf;
}
+ r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
r1_bio->master_bio = NULL;
return r1_bio;
-out_free_pages:
+out_free_folio:
while (--j >= 0)
- resync_free_pages(&rps[j]);
+ folio_put(rfs[j].folio);
out_free_bio:
while (++j < conf->raid_disks * 2) {
bio_uninit(r1_bio->bios[j]);
kfree(r1_bio->bios[j]);
}
- kfree(rps);
+ kfree(rfs);
out_free_r1bio:
rbio_pool_free(r1_bio, data);
@@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
struct r1conf *conf = data;
int i;
struct r1bio *r1bio = __r1_bio;
- struct resync_pages *rp = NULL;
+ struct resync_folio *rf = NULL;
for (i = conf->raid_disks * 2; i--; ) {
- rp = get_resync_pages(r1bio->bios[i]);
- resync_free_pages(rp);
+ rf = get_resync_folio(r1bio->bios[i]);
+ folio_put(rf->folio);
bio_uninit(r1bio->bios[i]);
kfree(r1bio->bios[i]);
}
- /* resync pages array stored in the 1st bio's .bi_private */
- kfree(rp);
+ /* resync folio stored in the 1st bio's .bi_private */
+ kfree(rf);
rbio_pool_free(r1bio, data);
}
@@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio)
put_sync_write_buf(r1_bio);
}
-static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
- int sectors, struct page *page, blk_opf_t rw)
+static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors,
+ int off, struct folio *folio, blk_opf_t rw)
{
- if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
+ if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false))
/* success */
return 1;
if (rw == REQ_OP_WRITE) {
@@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
struct mddev *mddev = r1_bio->mddev;
struct r1conf *conf = mddev->private;
struct bio *bio = r1_bio->bios[r1_bio->read_disk];
- struct page **pages = get_resync_pages(bio)->pages;
+ struct folio *folio = get_resync_folio(bio)->folio;
sector_t sect = r1_bio->sector;
int sectors = r1_bio->sectors;
- int idx = 0;
+ int off = 0;
struct md_rdev *rdev;
rdev = conf->mirrors[r1_bio->read_disk].rdev;
@@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
* active, and resync is currently active
*/
rdev = conf->mirrors[d].rdev;
- if (sync_page_io(rdev, sect, s<<9,
- pages[idx],
- REQ_OP_READ, false)) {
+ if (sync_folio_io(rdev, sect, s<<9, off, folio,
+ REQ_OP_READ, false)) {
success = 1;
break;
}
@@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
/* Try next page */
sectors -= s;
sect += s;
- idx++;
+ off += s << 9;
continue;
}
@@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
if (r1_bio->bios[d]->bi_end_io != end_sync_read)
continue;
rdev = conf->mirrors[d].rdev;
- if (r1_sync_page_io(rdev, sect, s,
- pages[idx],
+ if (r1_sync_folio_io(rdev, sect, s, off, folio,
REQ_OP_WRITE) == 0) {
r1_bio->bios[d]->bi_end_io = NULL;
rdev_dec_pending(rdev, mddev);
@@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
if (r1_bio->bios[d]->bi_end_io != end_sync_read)
continue;
rdev = conf->mirrors[d].rdev;
- if (r1_sync_page_io(rdev, sect, s,
- pages[idx],
+ if (r1_sync_folio_io(rdev, sect, s, off, folio,
REQ_OP_READ) != 0)
atomic_add(s, &rdev->corrected_errors);
}
sectors -= s;
sect += s;
- idx ++;
+ off += s << 9;
}
set_bit(R1BIO_Uptodate, &r1_bio->state);
bio->bi_status = 0;
@@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio)
struct r1conf *conf = mddev->private;
int primary;
int i;
- int vcnt;
/* Fix variable parts of all bios */
- vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
blk_status_t status;
struct bio *b = r1_bio->bios[i];
- struct resync_pages *rp = get_resync_pages(b);
+ struct resync_folio *rf = get_resync_folio(b);
if (b->bi_end_io != end_sync_read)
continue;
/* fixup the bio for reuse, but preserve errno */
@@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio)
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_end_io = end_sync_read;
- rp->raid_bio = r1_bio;
- b->bi_private = rp;
+ rf->raid_bio = r1_bio;
+ b->bi_private = rf;
/* initialize bvec table again */
- md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9);
+ md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9);
}
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
@@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio)
}
r1_bio->read_disk = primary;
for (i = 0; i < conf->raid_disks * 2; i++) {
- int j = 0;
struct bio *pbio = r1_bio->bios[primary];
struct bio *sbio = r1_bio->bios[i];
blk_status_t status = sbio->bi_status;
- struct page **ppages = get_resync_pages(pbio)->pages;
- struct page **spages = get_resync_pages(sbio)->pages;
- struct bio_vec *bi;
- int page_len[RESYNC_PAGES] = { 0 };
- struct bvec_iter_all iter_all;
+ struct folio *pfolio = get_resync_folio(pbio)->folio;
+ struct folio *sfolio = get_resync_folio(sbio)->folio;
if (sbio->bi_end_io != end_sync_read)
continue;
/* Now we can 'fixup' the error value */
sbio->bi_status = 0;
- bio_for_each_segment_all(bi, sbio, iter_all)
- page_len[j++] = bi->bv_len;
-
- if (!status) {
- for (j = vcnt; j-- ; ) {
- if (memcmp(page_address(ppages[j]),
- page_address(spages[j]),
- page_len[j]))
- break;
- }
- } else
- j = 0;
- if (j >= 0)
+ /*
+ * Copy data and submit write in two cases:
+ * - IO error (non-zero status)
+ * - Data inconsistency and not a CHECK operation.
+ */
+ if (status) {
atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
- if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
- && !status)) {
- /* No need to write to this device. */
- sbio->bi_end_io = NULL;
- rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+ bio_copy_data(sbio, pbio);
continue;
+ } else if (memcmp(folio_address(pfolio),
+ folio_address(sfolio),
+ r1_bio->sectors << 9)) {
+ atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
+ if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
+ bio_copy_data(sbio, pbio);
+ continue;
+ }
}
- bio_copy_data(sbio, pbio);
+ /* No need to write to this device. */
+ sbio->bi_end_io = NULL;
+ rdev_dec_pending(conf->mirrors[i].rdev, mddev);
}
}
@@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
if (rdev &&
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
- r1_sync_page_io(rdev, sect, s,
- folio_page(conf->tmpfolio, 0),
- REQ_OP_WRITE);
+ r1_sync_folio_io(rdev, sect, s, 0,
+ conf->tmpfolio, REQ_OP_WRITE);
rdev_dec_pending(rdev, mddev);
}
}
@@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
if (rdev &&
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
- if (r1_sync_page_io(rdev, sect, s,
- folio_page(conf->tmpfolio, 0),
- REQ_OP_READ)) {
+ if (r1_sync_folio_io(rdev, sect, s, 0,
+ conf->tmpfolio, REQ_OP_READ)) {
atomic_add(s, &rdev->corrected_errors);
pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
mdname(mddev), s,
@@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf)
static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
{
struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO);
- struct resync_pages *rps;
+ struct resync_folio *rfs;
struct bio *bio;
int i;
for (i = conf->raid_disks * 2; i--; ) {
bio = r1bio->bios[i];
- rps = bio->bi_private;
+ rfs = bio->bi_private;
bio_reset(bio, NULL, 0);
- bio->bi_private = rps;
+ bio->bi_private = rfs;
}
r1bio->master_bio = NULL;
return r1bio;
@@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
int write_targets = 0, read_targets = 0;
sector_t sync_blocks;
bool still_degraded = false;
- int good_sectors = RESYNC_SECTORS;
+ int good_sectors;
int min_bad = 0; /* number of sectors that are bad in all devices */
int idx = sector_to_idx(sector_nr);
- int page_idx = 0;
if (!mempool_initialized(&conf->r1buf_pool))
if (init_resync(conf))
@@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
r1_bio->sector = sector_nr;
r1_bio->state = 0;
set_bit(R1BIO_IsSync, &r1_bio->state);
- /* make sure good_sectors won't go across barrier unit boundary */
- good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors);
+ /*
+ * make sure good_sectors won't go across barrier unit boundary.
+ * r1_bio->sectors <= RESYNC_SECTORS.
+ */
+ good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors);
for (i = 0; i < conf->raid_disks * 2; i++) {
struct md_rdev *rdev;
@@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
max_sector = mddev->resync_max; /* Don't do IO beyond here */
if (max_sector > sector_nr + good_sectors)
max_sector = sector_nr + good_sectors;
- nr_sectors = 0;
- sync_blocks = 0;
do {
- struct page *page;
- int len = PAGE_SIZE;
- if (sector_nr + (len>>9) > max_sector)
- len = (max_sector - sector_nr) << 9;
- if (len == 0)
+ nr_sectors = max_sector - sector_nr;
+ if (nr_sectors == 0)
break;
- if (sync_blocks == 0) {
- if (!md_bitmap_start_sync(mddev, sector_nr,
- &sync_blocks, still_degraded) &&
- !conf->fullsync &&
- !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
- break;
- if ((len >> 9) > sync_blocks)
- len = sync_blocks<<9;
- }
+ if (!md_bitmap_start_sync(mddev, sector_nr,
+ &sync_blocks, still_degraded) &&
+ !conf->fullsync &&
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+ break;
+ if (nr_sectors > sync_blocks)
+ nr_sectors = sync_blocks;
for (i = 0 ; i < conf->raid_disks * 2; i++) {
- struct resync_pages *rp;
-
bio = r1_bio->bios[i];
- rp = get_resync_pages(bio);
if (bio->bi_end_io) {
- page = resync_fetch_page(rp, page_idx);
+ struct resync_folio *rf = get_resync_folio(bio);
- /*
- * won't fail because the vec table is big
- * enough to hold all these pages
- */
- __bio_add_page(bio, page, len, 0);
+ bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
}
}
- nr_sectors += len>>9;
- sector_nr += len>>9;
- sync_blocks -= (len>>9);
- } while (++page_idx < RESYNC_PAGES);
+ sector_nr += nr_sectors;
+ } while (0);
r1_bio->sectors = nr_sectors;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 26f93040cd13..3638e00fe420 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
/*
* for resync bio, r10bio pointer can be retrieved from the per-bio
- * 'struct resync_pages'.
+ * 'struct resync_folio'.
*/
static inline struct r10bio *get_resync_r10bio(struct bio *bio)
{
- return get_resync_pages(bio)->raid_bio;
+ return get_resync_folio(bio)->raid_bio;
}
static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
struct r10bio *r10_bio;
struct bio *bio;
int j;
- int nalloc, nalloc_rp;
- struct resync_pages *rps;
+ int nalloc, nalloc_rf;
+ struct resync_folio *rfs;
+ int order = get_order(RESYNC_BLOCK_SIZE);
r10_bio = r10bio_pool_alloc(gfp_flags, conf);
if (!r10_bio)
@@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
/* allocate once for all bios */
if (!conf->have_replacement)
- nalloc_rp = nalloc;
+ nalloc_rf = nalloc;
else
- nalloc_rp = nalloc * 2;
- rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags);
- if (!rps)
+ nalloc_rf = nalloc * 2;
+ rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags);
+ if (!rfs)
goto out_free_r10bio;
/*
* Allocate bios.
*/
for (j = nalloc ; j-- ; ) {
- bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
+ bio = bio_kmalloc(1, gfp_flags);
if (!bio)
goto out_free_bio;
- bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
+ bio_init_inline(bio, NULL, 1, 0);
r10_bio->devs[j].bio = bio;
if (!conf->have_replacement)
continue;
- bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
+ bio = bio_kmalloc(1, gfp_flags);
if (!bio)
goto out_free_bio;
- bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
+ bio_init_inline(bio, NULL, 1, 0);
r10_bio->devs[j].repl_bio = bio;
}
/*
- * Allocate RESYNC_PAGES data pages and attach them
- * where needed.
+ * Allocate data folio and attach it where needed.
*/
for (j = 0; j < nalloc; j++) {
struct bio *rbio = r10_bio->devs[j].repl_bio;
- struct resync_pages *rp, *rp_repl;
+ struct resync_folio *rf, *rf_repl;
- rp = &rps[j];
+ rf = &rfs[j];
if (rbio)
- rp_repl = &rps[nalloc + j];
-
- bio = r10_bio->devs[j].bio;
+ rf_repl = &rfs[nalloc + j];
if (!j || test_bit(MD_RECOVERY_SYNC,
&conf->mddev->recovery)) {
- if (resync_alloc_pages(rp, gfp_flags))
- goto out_free_pages;
+ if (resync_alloc_folio(rf, gfp_flags, &order))
+ goto out_free_folio;
} else {
- memcpy(rp, &rps[0], sizeof(*rp));
- resync_get_all_pages(rp);
+ memcpy(rf, &rfs[0], sizeof(*rf));
+ folio_get(rf->folio);
}
- rp->raid_bio = r10_bio;
- bio->bi_private = rp;
+ rf->raid_bio = r10_bio;
+ r10_bio->devs[j].bio->bi_private = rf;
if (rbio) {
- memcpy(rp_repl, rp, sizeof(*rp));
- rbio->bi_private = rp_repl;
+ memcpy(rf_repl, rf, sizeof(*rf));
+ rbio->bi_private = rf_repl;
}
}
+ r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
return r10_bio;
-out_free_pages:
+out_free_folio:
while (--j >= 0)
- resync_free_pages(&rps[j]);
+ folio_put(rfs[j].folio);
j = 0;
out_free_bio:
@@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
bio_uninit(r10_bio->devs[j].repl_bio);
kfree(r10_bio->devs[j].repl_bio);
}
- kfree(rps);
+ kfree(rfs);
out_free_r10bio:
rbio_pool_free(r10_bio, conf);
return NULL;
@@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
struct r10conf *conf = data;
struct r10bio *r10bio = __r10_bio;
int j;
- struct resync_pages *rp = NULL;
+ struct resync_folio *rf = NULL;
for (j = conf->copies; j--; ) {
struct bio *bio = r10bio->devs[j].bio;
if (bio) {
- rp = get_resync_pages(bio);
- resync_free_pages(rp);
+ rf = get_resync_folio(bio);
+ folio_put(rf->folio);
bio_uninit(bio);
kfree(bio);
}
@@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
}
/* resync pages array stored in the 1st bio's .bi_private */
- kfree(rp);
+ kfree(rf);
rbio_pool_free(r10bio, conf);
}
@@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
struct r10conf *conf = mddev->private;
int i, first;
struct bio *tbio, *fbio;
- int vcnt;
- struct page **tpages, **fpages;
+ struct folio *tfolio, *ffolio;
atomic_set(&r10_bio->remaining, 1);
@@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
fbio = r10_bio->devs[i].bio;
fbio->bi_iter.bi_size = r10_bio->sectors << 9;
fbio->bi_iter.bi_idx = 0;
- fpages = get_resync_pages(fbio)->pages;
+ ffolio = get_resync_folio(fbio)->folio;
- vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
/* now find blocks with errors */
for (i=0 ; i < conf->copies ; i++) {
- int j, d;
+ int d;
struct md_rdev *rdev;
- struct resync_pages *rp;
+ struct resync_folio *rf;
tbio = r10_bio->devs[i].bio;
@@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
if (i == first)
continue;
- tpages = get_resync_pages(tbio)->pages;
+ tfolio = get_resync_folio(tbio)->folio;
d = r10_bio->devs[i].devnum;
rdev = conf->mirrors[d].rdev;
if (!r10_bio->devs[i].bio->bi_status) {
/* We know that the bi_io_vec layout is the same for
* both 'first' and 'i', so we just compare them.
- * All vec entries are PAGE_SIZE;
*/
- int sectors = r10_bio->sectors;
- for (j = 0; j < vcnt; j++) {
- int len = PAGE_SIZE;
- if (sectors < (len / 512))
- len = sectors * 512;
- if (memcmp(page_address(fpages[j]),
- page_address(tpages[j]),
- len))
- break;
- sectors -= len/512;
+ if (memcmp(folio_address(ffolio),
+ folio_address(tfolio),
+ r10_bio->sectors << 9)) {
+ atomic64_add(r10_bio->sectors,
+ &mddev->resync_mismatches);
+ if (test_bit(MD_RECOVERY_CHECK,
+ &mddev->recovery))
+ /* Don't fix anything. */
+ continue;
}
- if (j == vcnt)
- continue;
- atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
- if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
- /* Don't fix anything. */
- continue;
} else if (test_bit(FailFast, &rdev->flags)) {
/* Just give up on this device */
md_error(rdev->mddev, rdev);
@@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
* First we need to fixup bv_offset, bv_len and
* bi_vecs, as the read request might have corrupted these
*/
- rp = get_resync_pages(tbio);
+ rf = get_resync_folio(tbio);
bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE);
- md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size);
+ md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size);
- rp->raid_bio = r10_bio;
- tbio->bi_private = rp;
+ rf->raid_bio = r10_bio;
+ tbio->bi_private = rf;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
tbio->bi_end_io = end_sync_write;
@@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
struct bio *bio = r10_bio->devs[0].bio;
sector_t sect = 0;
int sectors = r10_bio->sectors;
- int idx = 0;
int dr = r10_bio->devs[0].devnum;
int dw = r10_bio->devs[1].devnum;
- struct page **pages = get_resync_pages(bio)->pages;
+ struct folio *folio = get_resync_folio(bio)->folio;
while (sectors) {
int s = sectors;
@@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
rdev = conf->mirrors[dr].rdev;
addr = r10_bio->devs[0].addr + sect;
- ok = sync_page_io(rdev,
- addr,
- s << 9,
- pages[idx],
- REQ_OP_READ, false);
+ ok = sync_folio_io(rdev,
+ addr,
+ s << 9,
+ sect << 9,
+ folio,
+ REQ_OP_READ, false);
if (ok) {
rdev = conf->mirrors[dw].rdev;
addr = r10_bio->devs[1].addr + sect;
- ok = sync_page_io(rdev,
- addr,
- s << 9,
- pages[idx],
- REQ_OP_WRITE, false);
+ ok = sync_folio_io(rdev,
+ addr,
+ s << 9,
+ sect << 9,
+ folio,
+ REQ_OP_WRITE, false);
if (!ok) {
set_bit(WriteErrorSeen, &rdev->flags);
if (!test_and_set_bit(WantReplacement,
@@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
sectors -= s;
sect += s;
- idx++;
}
}
@@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf)
static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
{
struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO);
- struct resync_pages *rp;
+ struct resync_folio *rf;
struct bio *bio;
int nalloc;
int i;
@@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
for (i = 0; i < nalloc; i++) {
bio = r10bio->devs[i].bio;
- rp = bio->bi_private;
+ rf = bio->bi_private;
bio_reset(bio, NULL, 0);
- bio->bi_private = rp;
+ bio->bi_private = rf;
bio = r10bio->devs[i].repl_bio;
if (bio) {
- rp = bio->bi_private;
+ rf = bio->bi_private;
bio_reset(bio, NULL, 0);
- bio->bi_private = rp;
+ bio->bi_private = rf;
}
}
return r10bio;
@@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int max_sync = RESYNC_SECTORS;
sector_t sync_blocks;
sector_t chunk_mask = conf->geo.chunk_mask;
- int page_idx = 0;
/*
* Allow skipping a full rebuild for incremental assembly
@@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
continue;
}
}
+
+ /*
+ * Allocation of a RESYNC_BLOCK_SIZE folio might have
+ * failed in resync_alloc_folio(). Fall back to a
+ * smaller sync size if needed.
+ */
+ if (max_sync > r10_bio->sectors)
+ max_sync = r10_bio->sectors;
+
any_working = 1;
bio = r10_bio->devs[0].bio;
bio->bi_next = biolist;
@@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
}
if (sync_blocks < max_sync)
max_sync = sync_blocks;
+
r10_bio = raid10_alloc_init_r10buf(conf);
+ /*
+ * Allocation of a RESYNC_BLOCK_SIZE folio might have failed in
+ * resync_alloc_folio(). Fall back to a smaller sync size if needed.
+ */
+ if (max_sync > r10_bio->sectors)
+ max_sync = r10_bio->sectors;
+
r10_bio->state = 0;
r10_bio->mddev = mddev;
@@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
}
}
- nr_sectors = 0;
if (sector_nr + max_sync < max_sector)
max_sector = sector_nr + max_sync;
do {
- struct page *page;
- int len = PAGE_SIZE;
- if (sector_nr + (len>>9) > max_sector)
- len = (max_sector - sector_nr) << 9;
- if (len == 0)
+ nr_sectors = max_sector - sector_nr;
+
+ if (nr_sectors == 0)
break;
for (bio= biolist ; bio ; bio=bio->bi_next) {
- struct resync_pages *rp = get_resync_pages(bio);
- page = resync_fetch_page(rp, page_idx);
- if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
+ struct resync_folio *rf = get_resync_folio(bio);
+
+ if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) {
bio->bi_status = BLK_STS_RESOURCE;
bio_endio(bio);
*skipped = 1;
- return max_sync;
+ return nr_sectors;
}
}
- nr_sectors += len>>9;
- sector_nr += len>>9;
- } while (++page_idx < RESYNC_PAGES);
+ sector_nr += nr_sectors;
+ } while (0);
r10_bio->sectors = nr_sectors;
if (mddev_is_clustered(mddev) &&
@@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
int *skipped)
{
/* We simply copy at most one chunk (smallest of old and new)
- * at a time, possibly less if that exceeds RESYNC_PAGES,
+ * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE,
* or we hit a bad block or something.
* This might mean we pause for normal IO in the middle of
* a chunk, but that is not a problem as mddev->reshape_position
@@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
struct r10bio *r10_bio;
sector_t next, safe, last;
int max_sectors;
- int nr_sectors;
int s;
struct md_rdev *rdev;
int need_flush = 0;
struct bio *blist;
struct bio *bio, *read_bio;
int sectors_done = 0;
- struct page **pages;
+ struct folio *folio;
if (sector_nr == 0) {
/* If restarting in the middle, skip the initial sectors */
@@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
r10_bio->mddev = mddev;
r10_bio->sector = sector_nr;
set_bit(R10BIO_IsReshape, &r10_bio->state);
- r10_bio->sectors = last - sector_nr + 1;
+ /*
+ * Allocation of a RESYNC_BLOCK_SIZE folio might have
+ * failed in resync_alloc_folio(). Fall back to a
+ * smaller sync size if needed.
+ */
+ r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1);
rdev = read_balance(conf, r10_bio, &max_sectors);
BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state));
@@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
return sectors_done;
}
- read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ,
+ read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ,
GFP_KERNEL, &mddev->bio_set);
read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
+ rdev->data_offset);
@@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
blist = b;
}
- /* Now add as many pages as possible to all of these bios. */
+ /* Now add folio to all of these bios. */
- nr_sectors = 0;
- pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
- for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
- struct page *page = pages[s / (PAGE_SIZE >> 9)];
- int len = (max_sectors - s) << 9;
- if (len > PAGE_SIZE)
- len = PAGE_SIZE;
- for (bio = blist; bio ; bio = bio->bi_next) {
- if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
- bio->bi_status = BLK_STS_RESOURCE;
- bio_endio(bio);
- return sectors_done;
- }
+ folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
+ for (bio = blist; bio ; bio = bio->bi_next) {
+ if (WARN_ON(!bio_add_folio(bio, folio, max_sectors << 9, 0))) {
+ bio->bi_status = BLK_STS_RESOURCE;
+ bio_endio(bio);
+ return sectors_done;
}
- sector_nr += len >> 9;
- nr_sectors += len >> 9;
}
- r10_bio->sectors = nr_sectors;
+ sector_nr += max_sectors;
+ r10_bio->sectors = max_sectors;
/* Now submit the read */
atomic_inc(&r10_bio->remaining);
read_bio->bi_next = NULL;
submit_bio_noacct(read_bio);
- sectors_done += nr_sectors;
+ sectors_done += max_sectors;
if (sector_nr <= last)
goto read_more;
@@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
struct r10conf *conf = mddev->private;
struct r10bio *r10b;
int slot = 0;
- int idx = 0;
- struct page **pages;
+ int sect = 0;
+ struct folio *folio;
r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO);
if (!r10b) {
@@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
return -ENOMEM;
}
- /* reshape IOs share pages from .devs[0].bio */
- pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
+ /* reshape IOs share folio from .devs[0].bio */
+ folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
r10b->sector = r10_bio->sector;
__raid10_find_phys(&conf->prev, r10b);
@@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev,
while (!success) {
int d = r10b->devs[slot].devnum;
struct md_rdev *rdev = conf->mirrors[d].rdev;
- sector_t addr;
if (rdev == NULL ||
test_bit(Faulty, &rdev->flags) ||
!test_bit(In_sync, &rdev->flags))
goto failed;
- addr = r10b->devs[slot].addr + idx * PAGE_SIZE;
atomic_inc(&rdev->nr_pending);
- success = sync_page_io(rdev,
- addr,
- s << 9,
- pages[idx],
- REQ_OP_READ, false);
+ success = sync_folio_io(rdev,
+ r10b->devs[slot].addr +
+ sect,
+ s << 9,
+ sect << 9,
+ folio,
+ REQ_OP_READ, false);
rdev_dec_pending(rdev, mddev);
if (success)
break;
@@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
return -EIO;
}
sectors -= s;
- idx++;
+ sect += s;
}
kfree(r10b);
return 0;
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
` (5 preceding siblings ...)
2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
@ 2026-04-16 3:38 ` linan666
2026-04-30 2:22 ` Xiao Ni
2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666
7 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
RAID1 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a
smaller granularity can handle more errors, and RAID will support logical
block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO would
fail. Switch the IO error fix granularity to the logical block size.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
---
drivers/md/raid1.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 724fd4f2cc3a..de8c964ca11d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
{
/* Try some synchronous reads of other devices to get
* good data, much like with normal read errors. Only
- * read into the pages we already have so we don't
+ * read into the block we already have so we don't
* need to re-issue the read request.
* We don't need to freeze the array, because being in an
* active sync request, there is no normal IO, and
@@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
}
while(sectors) {
- int s = sectors;
+ int s = min_t(int, sectors, mddev->logical_block_size >> 9);
int d = r1_bio->read_disk;
int success = 0;
int start;
- if (s > (PAGE_SIZE>>9))
- s = PAGE_SIZE >> 9;
do {
if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
/* No rcu protection needed here devices
@@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
if (abort)
return 0;
- /* Try next page */
+ /* Try next block */
sectors -= s;
sect += s;
off += s << 9;
@@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
}
while(sectors) {
- int s = sectors;
+ int s = min_t(int, sectors, mddev->logical_block_size >> 9);
int d = read_disk;
int success = 0;
int start;
- if (s > (PAGE_SIZE>>9))
- s = PAGE_SIZE >> 9;
-
do {
rdev = conf->mirrors[d].rdev;
if (rdev &&
--
2.39.2
* [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
` (6 preceding siblings ...)
2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
@ 2026-04-16 3:38 ` linan666
2026-04-30 2:23 ` Xiao Ni
7 siblings, 1 reply; 13+ messages in thread
From: linan666 @ 2026-04-16 3:38 UTC (permalink / raw)
To: song, yukuai; +Cc: linux-raid, linux-kernel, linan666, yangerkun, yi.zhang
From: Li Nan <linan122@huawei.com>
RAID10 currently fixes IO errors at PAGE_SIZE granularity. Fixing at a
smaller granularity can handle more errors, and RAID will support logical
block sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO would
fail. Switch the IO error fix granularity to the logical block size.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
---
drivers/md/raid10.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3638e00fe420..5b4ffd23211a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
static void fix_recovery_read_error(struct r10bio *r10_bio)
{
/* We got a read error during recovery.
- * We repeat the read in smaller page-sized sections.
+ * We repeat the read in smaller logical-block-sized sections.
* If a read succeeds, write it to the new device or record
* a bad block if we cannot.
* If a read fails, record a bad block on both old and
@@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
struct folio *folio = get_resync_folio(bio)->folio;
while (sectors) {
- int s = sectors;
+ int s = min_t(int, sectors, mddev->logical_block_size >> 9);
struct md_rdev *rdev;
sector_t addr;
int ok;
- if (s > (PAGE_SIZE>>9))
- s = PAGE_SIZE >> 9;
-
rdev = conf->mirrors[dr].rdev;
addr = r10_bio->devs[0].addr + sect;
ok = sync_folio_io(rdev,
@@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
}
while(sectors) {
- int s = sectors;
+ int s = min_t(int, sectors, mddev->logical_block_size >> 9);
int sl = slot;
int success = 0;
int start;
- if (s > (PAGE_SIZE>>9))
- s = PAGE_SIZE >> 9;
-
do {
d = r10_bio->devs[sl].devnum;
rdev = conf->mirrors[d].rdev;
@@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
__raid10_find_phys(&conf->prev, r10b);
while (sectors) {
- int s = sectors;
+ int s = min_t(int, sectors, mddev->logical_block_size >> 9);
int success = 0;
int first_slot = slot;
- if (s > (PAGE_SIZE >> 9))
- s = PAGE_SIZE >> 9;
-
while (!success) {
int d = r10b->devs[slot].devnum;
struct md_rdev *rdev = conf->mirrors[d].rdev;
--
2.39.2
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
@ 2026-04-30 1:54 ` Xiao Ni
2026-05-07 7:13 ` 李楠 Magic Li
0 siblings, 1 reply; 13+ messages in thread
From: Xiao Ni @ 2026-04-30 1:54 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang
Hi Nan
On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Convert all IO on the sync path to use folios, and rename page-related
> identifiers to match folio.
>
> Since RESYNC_BLOCK_SIZE (64K) has higher allocation failure chance than 4k,
> retry with lower orders to improve allocation reliability. A r1/10_bio may
> have different rf->folio orders, so use minimum order as r1/10_bio sectors
> to prevent exceeding size when adding folio to IO later.
>
> Clean up:
> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
> 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
> 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
> instead in r1buf_pool_alloc() and r10buf_pool_alloc().
> 4. Clean up RESYNC_PAGES.
> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
> 6. Remove resync_free_folio(), call folio_put() directly.
> 7. clean up sync IO size calculation in raid1/10_sync_request.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
> drivers/md/md.c | 2 +-
> drivers/md/raid1-10.c | 80 ++++---------
> drivers/md/raid1.c | 209 +++++++++++++++-------------------
> drivers/md/raid10.c | 254 +++++++++++++++++++++---------------------
> 4 files changed, 240 insertions(+), 305 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5e83914d5c14..6554b849ac74 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev)
> {
> /*
> * For raid456, sync IO is stripe(4k) per IO, for other levels, it's
> - * RESYNC_PAGES(64k) per IO.
> + * RESYNC_BLOCK_SIZE(64k) per IO.
> */
> return atomic_read(&mddev->recovery_active) <
> (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev);
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index cda531d0720b..10200b0a3fd2 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -1,7 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
> /* Maximum size of each resync request */
> #define RESYNC_BLOCK_SIZE (64*1024)
> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
> #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
>
> /* when we get a read error on a read-only array, we redirect to another
> @@ -20,9 +19,9 @@
> #define MAX_PLUG_BIO 32
>
> /* for managing resync I/O pages */
> -struct resync_pages {
> +struct resync_folio {
> void *raid_bio;
> - struct page *pages[RESYNC_PAGES];
> + struct folio *folio;
> };
>
> struct raid1_plug_cb {
> @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data)
> kfree(rbio);
> }
>
> -static inline int resync_alloc_pages(struct resync_pages *rp,
> - gfp_t gfp_flags)
> +static inline int resync_alloc_folio(struct resync_folio *rf,
> + gfp_t gfp_flags, int *order)
> {
> - int i;
> + struct folio *folio;
>
> - for (i = 0; i < RESYNC_PAGES; i++) {
> - rp->pages[i] = alloc_page(gfp_flags);
> - if (!rp->pages[i])
> - goto out_free;
> - }
> + do {
> + folio = folio_alloc(gfp_flags, *order);
> + if (folio)
> + break;
> + } while (--(*order) > 0);
There is a potential problem here: if a large folio cannot be allocated,
the sync request unit becomes smaller and sync performance may decrease.
This can happen when the system lacks sufficient contiguous memory. The
change itself looks good to me; I just want to raise this point for open
discussion.
>
> + if (!folio)
> + return -ENOMEM;
> +
> + rf->folio = folio;
> return 0;
> -
> -out_free:
> - while (--i >= 0)
> - put_page(rp->pages[i]);
> - return -ENOMEM;
> -}
> -
> -static inline void resync_free_pages(struct resync_pages *rp)
> -{
> - int i;
> -
> - for (i = 0; i < RESYNC_PAGES; i++)
> - put_page(rp->pages[i]);
> -}
> -
> -static inline void resync_get_all_pages(struct resync_pages *rp)
> -{
> - int i;
> -
> - for (i = 0; i < RESYNC_PAGES; i++)
> - get_page(rp->pages[i]);
> -}
> -
> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
> - unsigned idx)
> -{
> - if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
> - return NULL;
> - return rp->pages[idx];
> }
>
> /*
> - * 'strct resync_pages' stores actual pages used for doing the resync
> + * 'struct resync_folio' stores the actual folio used for doing the resync
> * IO, and it is per-bio, so make .bi_private points to it.
> */
> -static inline struct resync_pages *get_resync_pages(struct bio *bio)
> +static inline struct resync_folio *get_resync_folio(struct bio *bio)
> {
> return bio->bi_private;
> }
>
> /* generally called after bio_reset() for reseting bvec */
> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
> int size)
> {
> - int idx = 0;
> -
> /* initialize bvec table again */
> - do {
> - struct page *page = resync_fetch_page(rp, idx);
> - int len = min_t(int, size, PAGE_SIZE);
> -
> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> - bio->bi_status = BLK_STS_RESOURCE;
> - bio_endio(bio);
> - return;
> - }
> -
> - size -= len;
> - } while (idx++ < RESYNC_PAGES && size > 0);
> + if (WARN_ON(!bio_add_folio(bio, rf->folio,
> + min_t(int, size, RESYNC_BLOCK_SIZE),
> + 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + }
> }
>
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a72abdc37a2d..724fd4f2cc3a 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
>
> /*
> * for resync bio, r1bio pointer can be retrieved from the per-bio
> - * 'struct resync_pages'.
> + * 'struct resync_folio'.
> */
> static inline struct r1bio *get_resync_r1bio(struct bio *bio)
> {
> - return get_resync_pages(bio)->raid_bio;
> + return get_resync_folio(bio)->raid_bio;
> }
>
> static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
> @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
> struct r1conf *conf = data;
> struct r1bio *r1_bio;
> struct bio *bio;
> - int need_pages;
> + int need_folio;
The name need_folio is confusing. Can we keep the same style as the
old version? How about need_folios?
> int j;
> - struct resync_pages *rps;
> + struct resync_folio *rfs;
> + int order = get_order(RESYNC_BLOCK_SIZE);
>
> r1_bio = r1bio_pool_alloc(gfp_flags, conf);
> if (!r1_bio)
> return NULL;
>
> - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages),
> + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio),
> gfp_flags);
> - if (!rps)
> + if (!rfs)
> goto out_free_r1bio;
>
> /*
> * Allocate bios : 1 for reading, n-1 for writing
> */
> for (j = conf->raid_disks * 2; j-- ; ) {
> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
> + bio = bio_kmalloc(1, gfp_flags);
> if (!bio)
> goto out_free_bio;
> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
> + bio_init_inline(bio, NULL, 1, 0);
> r1_bio->bios[j] = bio;
> }
> /*
> - * Allocate RESYNC_PAGES data pages and attach them to
> - * the first bio.
> + * Allocate data folio and attach it to the first bio.
> * If this is a user-requested check/repair, allocate
> - * RESYNC_PAGES for each bio.
> + * folio for each bio.
> */
> if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
> - need_pages = conf->raid_disks * 2;
> + need_folio = conf->raid_disks * 2;
> else
> - need_pages = 1;
> + need_folio = 1;
> for (j = 0; j < conf->raid_disks * 2; j++) {
> - struct resync_pages *rp = &rps[j];
> + struct resync_folio *rf = &rfs[j];
>
> - bio = r1_bio->bios[j];
> -
> - if (j < need_pages) {
> - if (resync_alloc_pages(rp, gfp_flags))
> - goto out_free_pages;
> + if (j < need_folio) {
> + if (resync_alloc_folio(rf, gfp_flags, &order))
> + goto out_free_folio;
> } else {
> - memcpy(rp, &rps[0], sizeof(*rp));
> - resync_get_all_pages(rp);
> + memcpy(rf, &rfs[0], sizeof(*rf));
> + folio_get(rf->folio);
> }
>
> - rp->raid_bio = r1_bio;
> - bio->bi_private = rp;
> + rf->raid_bio = r1_bio;
> + r1_bio->bios[j]->bi_private = rf;
> }
>
> + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
> r1_bio->master_bio = NULL;
>
> return r1_bio;
>
> -out_free_pages:
> +out_free_folio:
> while (--j >= 0)
> - resync_free_pages(&rps[j]);
> + folio_put(rfs[j].folio);
>
> out_free_bio:
> while (++j < conf->raid_disks * 2) {
> bio_uninit(r1_bio->bios[j]);
> kfree(r1_bio->bios[j]);
> }
> - kfree(rps);
> + kfree(rfs);
>
> out_free_r1bio:
> rbio_pool_free(r1_bio, data);
> @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
> struct r1conf *conf = data;
> int i;
> struct r1bio *r1bio = __r1_bio;
> - struct resync_pages *rp = NULL;
> + struct resync_folio *rf = NULL;
>
> for (i = conf->raid_disks * 2; i--; ) {
> - rp = get_resync_pages(r1bio->bios[i]);
> - resync_free_pages(rp);
> + rf = get_resync_folio(r1bio->bios[i]);
> + folio_put(rf->folio);
> bio_uninit(r1bio->bios[i]);
> kfree(r1bio->bios[i]);
> }
>
> - /* resync pages array stored in the 1st bio's .bi_private */
> - kfree(rp);
> + /* resync folio stored in the 1st bio's .bi_private */
> + kfree(rf);
>
> rbio_pool_free(r1bio, data);
> }
> @@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio)
> put_sync_write_buf(r1_bio);
> }
>
> -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
> - int sectors, struct page *page, blk_opf_t rw)
> +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors,
> + int off, struct folio *folio, blk_opf_t rw)
> {
> - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
> + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false))
> /* success */
> return 1;
> if (rw == REQ_OP_WRITE) {
> @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> struct mddev *mddev = r1_bio->mddev;
> struct r1conf *conf = mddev->private;
> struct bio *bio = r1_bio->bios[r1_bio->read_disk];
> - struct page **pages = get_resync_pages(bio)->pages;
> + struct folio *folio = get_resync_folio(bio)->folio;
> sector_t sect = r1_bio->sector;
> int sectors = r1_bio->sectors;
> - int idx = 0;
> + int off = 0;
> struct md_rdev *rdev;
>
> rdev = conf->mirrors[r1_bio->read_disk].rdev;
> @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> * active, and resync is currently active
> */
> rdev = conf->mirrors[d].rdev;
> - if (sync_page_io(rdev, sect, s<<9,
> - pages[idx],
> - REQ_OP_READ, false)) {
> + if (sync_folio_io(rdev, sect, s<<9, off, folio,
> + REQ_OP_READ, false)) {
> success = 1;
> break;
> }
> @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> /* Try next page */
> sectors -= s;
> sect += s;
> - idx++;
> + off += s << 9;
> continue;
> }
>
> @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> if (r1_bio->bios[d]->bi_end_io != end_sync_read)
> continue;
> rdev = conf->mirrors[d].rdev;
> - if (r1_sync_page_io(rdev, sect, s,
> - pages[idx],
> + if (r1_sync_folio_io(rdev, sect, s, off, folio,
> REQ_OP_WRITE) == 0) {
> r1_bio->bios[d]->bi_end_io = NULL;
> rdev_dec_pending(rdev, mddev);
> @@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> if (r1_bio->bios[d]->bi_end_io != end_sync_read)
> continue;
> rdev = conf->mirrors[d].rdev;
> - if (r1_sync_page_io(rdev, sect, s,
> - pages[idx],
> + if (r1_sync_folio_io(rdev, sect, s, off, folio,
> REQ_OP_READ) != 0)
> atomic_add(s, &rdev->corrected_errors);
> }
> sectors -= s;
> sect += s;
> - idx ++;
> + off += s << 9;
> }
> set_bit(R1BIO_Uptodate, &r1_bio->state);
> bio->bi_status = 0;
> @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio)
> struct r1conf *conf = mddev->private;
> int primary;
> int i;
> - int vcnt;
>
> /* Fix variable parts of all bios */
> - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> for (i = 0; i < conf->raid_disks * 2; i++) {
> blk_status_t status;
> struct bio *b = r1_bio->bios[i];
> - struct resync_pages *rp = get_resync_pages(b);
> + struct resync_folio *rf = get_resync_folio(b);
> if (b->bi_end_io != end_sync_read)
> continue;
> /* fixup the bio for reuse, but preserve errno */
> @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio)
> b->bi_iter.bi_sector = r1_bio->sector +
> conf->mirrors[i].rdev->data_offset;
> b->bi_end_io = end_sync_read;
> - rp->raid_bio = r1_bio;
> - b->bi_private = rp;
> + rf->raid_bio = r1_bio;
> + b->bi_private = rf;
>
> /* initialize bvec table again */
> - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9);
> + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9);
> }
> for (primary = 0; primary < conf->raid_disks * 2; primary++)
> if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
> @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio)
> }
> r1_bio->read_disk = primary;
> for (i = 0; i < conf->raid_disks * 2; i++) {
> - int j = 0;
> struct bio *pbio = r1_bio->bios[primary];
> struct bio *sbio = r1_bio->bios[i];
> blk_status_t status = sbio->bi_status;
> - struct page **ppages = get_resync_pages(pbio)->pages;
> - struct page **spages = get_resync_pages(sbio)->pages;
> - struct bio_vec *bi;
> - int page_len[RESYNC_PAGES] = { 0 };
> - struct bvec_iter_all iter_all;
> + struct folio *pfolio = get_resync_folio(pbio)->folio;
> + struct folio *sfolio = get_resync_folio(sbio)->folio;
>
> if (sbio->bi_end_io != end_sync_read)
> continue;
> /* Now we can 'fixup' the error value */
> sbio->bi_status = 0;
>
> - bio_for_each_segment_all(bi, sbio, iter_all)
> - page_len[j++] = bi->bv_len;
> -
> - if (!status) {
> - for (j = vcnt; j-- ; ) {
> - if (memcmp(page_address(ppages[j]),
> - page_address(spages[j]),
> - page_len[j]))
> - break;
> - }
> - } else
> - j = 0;
> - if (j >= 0)
> + /*
> + * Copy data and submit write in two cases:
> + * - IO error (non-zero status)
> + * - Data inconsistency and not a CHECK operation.
> + */
> + if (status) {
> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
> - && !status)) {
> - /* No need to write to this device. */
> - sbio->bi_end_io = NULL;
> - rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> + bio_copy_data(sbio, pbio);
> continue;
> + } else if (memcmp(folio_address(pfolio),
> + folio_address(sfolio),
> + r1_bio->sectors << 9)) {
> + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
> + bio_copy_data(sbio, pbio);
> + continue;
> + }
> }
>
> - bio_copy_data(sbio, pbio);
> + /* No need to write to this device. */
> + sbio->bi_end_io = NULL;
> + rdev_dec_pending(conf->mirrors[i].rdev, mddev);
> }
> }
>
> @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> if (rdev &&
> !test_bit(Faulty, &rdev->flags)) {
> atomic_inc(&rdev->nr_pending);
> - r1_sync_page_io(rdev, sect, s,
> - folio_page(conf->tmpfolio, 0),
> - REQ_OP_WRITE);
> + r1_sync_folio_io(rdev, sect, s, 0,
> + conf->tmpfolio, REQ_OP_WRITE);
> rdev_dec_pending(rdev, mddev);
> }
> }
> @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> if (rdev &&
> !test_bit(Faulty, &rdev->flags)) {
> atomic_inc(&rdev->nr_pending);
> - if (r1_sync_page_io(rdev, sect, s,
> - folio_page(conf->tmpfolio, 0),
> - REQ_OP_READ)) {
> + if (r1_sync_folio_io(rdev, sect, s, 0,
> + conf->tmpfolio, REQ_OP_READ)) {
> atomic_add(s, &rdev->corrected_errors);
> pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
> mdname(mddev), s,
> @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf)
> static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
> {
> struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO);
> - struct resync_pages *rps;
> + struct resync_folio *rfs;
> struct bio *bio;
> int i;
>
> for (i = conf->raid_disks * 2; i--; ) {
> bio = r1bio->bios[i];
> - rps = bio->bi_private;
> + rfs = bio->bi_private;
> bio_reset(bio, NULL, 0);
> - bio->bi_private = rps;
> + bio->bi_private = rfs;
> }
> r1bio->master_bio = NULL;
> return r1bio;
> @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> int write_targets = 0, read_targets = 0;
> sector_t sync_blocks;
> bool still_degraded = false;
> - int good_sectors = RESYNC_SECTORS;
> + int good_sectors;
> int min_bad = 0; /* number of sectors that are bad in all devices */
> int idx = sector_to_idx(sector_nr);
> - int page_idx = 0;
>
> if (!mempool_initialized(&conf->r1buf_pool))
> if (init_resync(conf))
> @@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> r1_bio->sector = sector_nr;
> r1_bio->state = 0;
> set_bit(R1BIO_IsSync, &r1_bio->state);
> - /* make sure good_sectors won't go across barrier unit boundary */
> - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors);
> + /*
> + * make sure good_sectors won't go across barrier unit boundary.
> + * r1_bio->sectors <= RESYNC_SECTORS.
> + */
> + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors);
>
> for (i = 0; i < conf->raid_disks * 2; i++) {
> struct md_rdev *rdev;
> @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
> max_sector = mddev->resync_max; /* Don't do IO beyond here */
> if (max_sector > sector_nr + good_sectors)
> max_sector = sector_nr + good_sectors;
> - nr_sectors = 0;
> - sync_blocks = 0;
> do {
> - struct page *page;
> - int len = PAGE_SIZE;
> - if (sector_nr + (len>>9) > max_sector)
> - len = (max_sector - sector_nr) << 9;
> - if (len == 0)
> + nr_sectors = max_sector - sector_nr;
> + if (nr_sectors == 0)
> break;
> - if (sync_blocks == 0) {
> - if (!md_bitmap_start_sync(mddev, sector_nr,
> - &sync_blocks, still_degraded) &&
> - !conf->fullsync &&
> - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> - break;
> - if ((len >> 9) > sync_blocks)
> - len = sync_blocks<<9;
> - }
> + if (!md_bitmap_start_sync(mddev, sector_nr,
> + &sync_blocks, still_degraded) &&
> + !conf->fullsync &&
> + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> + break;
> + if (nr_sectors > sync_blocks)
> + nr_sectors = sync_blocks;
>
> for (i = 0 ; i < conf->raid_disks * 2; i++) {
> - struct resync_pages *rp;
> -
> bio = r1_bio->bios[i];
> - rp = get_resync_pages(bio);
> if (bio->bi_end_io) {
> - page = resync_fetch_page(rp, page_idx);
> + struct resync_folio *rf = get_resync_folio(bio);
>
> - /*
> - * won't fail because the vec table is big
> - * enough to hold all these pages
> - */
> - __bio_add_page(bio, page, len, 0);
> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
> }
> }
> - nr_sectors += len>>9;
> - sector_nr += len>>9;
> - sync_blocks -= (len>>9);
> - } while (++page_idx < RESYNC_PAGES);
> + sector_nr += nr_sectors;
> + } while (0);
Now that a single folio handles all the pages in one go, it seems
strange to keep the do { } while (0) here.
>
> r1_bio->sectors = nr_sectors;
This patch is a little big. Would it be better to split it here?
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 26f93040cd13..3638e00fe420 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
>
> /*
> * for resync bio, r10bio pointer can be retrieved from the per-bio
> - * 'struct resync_pages'.
> + * 'struct resync_folio'.
> */
> static inline struct r10bio *get_resync_r10bio(struct bio *bio)
> {
> - return get_resync_pages(bio)->raid_bio;
> + return get_resync_folio(bio)->raid_bio;
> }
>
> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
> struct r10bio *r10_bio;
> struct bio *bio;
> int j;
> - int nalloc, nalloc_rp;
> - struct resync_pages *rps;
> + int nalloc, nalloc_rf;
> + struct resync_folio *rfs;
> + int order = get_order(RESYNC_BLOCK_SIZE);
>
> r10_bio = r10bio_pool_alloc(gfp_flags, conf);
> if (!r10_bio)
> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>
> /* allocate once for all bios */
> if (!conf->have_replacement)
> - nalloc_rp = nalloc;
> + nalloc_rf = nalloc;
> else
> - nalloc_rp = nalloc * 2;
> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags);
> - if (!rps)
> + nalloc_rf = nalloc * 2;
> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags);
> + if (!rfs)
> goto out_free_r10bio;
>
> /*
> * Allocate bios.
> */
> for (j = nalloc ; j-- ; ) {
> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
> + bio = bio_kmalloc(1, gfp_flags);
> if (!bio)
> goto out_free_bio;
> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
> + bio_init_inline(bio, NULL, 1, 0);
> r10_bio->devs[j].bio = bio;
> if (!conf->have_replacement)
> continue;
> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
> + bio = bio_kmalloc(1, gfp_flags);
> if (!bio)
> goto out_free_bio;
> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
> + bio_init_inline(bio, NULL, 1, 0);
> r10_bio->devs[j].repl_bio = bio;
> }
> /*
> - * Allocate RESYNC_PAGES data pages and attach them
> - * where needed.
> + * Allocate data folio and attach it where needed.
> */
> for (j = 0; j < nalloc; j++) {
> struct bio *rbio = r10_bio->devs[j].repl_bio;
> - struct resync_pages *rp, *rp_repl;
> + struct resync_folio *rf, *rf_repl;
>
> - rp = &rps[j];
> + rf = &rfs[j];
> if (rbio)
> - rp_repl = &rps[nalloc + j];
> -
> - bio = r10_bio->devs[j].bio;
> + rf_repl = &rfs[nalloc + j];
>
> if (!j || test_bit(MD_RECOVERY_SYNC,
> &conf->mddev->recovery)) {
> - if (resync_alloc_pages(rp, gfp_flags))
> - goto out_free_pages;
> + if (resync_alloc_folio(rf, gfp_flags, &order))
> + goto out_free_folio;
> } else {
> - memcpy(rp, &rps[0], sizeof(*rp));
> - resync_get_all_pages(rp);
> + memcpy(rf, &rfs[0], sizeof(*rf));
> + folio_get(rf->folio);
> }
>
> - rp->raid_bio = r10_bio;
> - bio->bi_private = rp;
> + rf->raid_bio = r10_bio;
> + r10_bio->devs[j].bio->bi_private = rf;
> if (rbio) {
> - memcpy(rp_repl, rp, sizeof(*rp));
> - rbio->bi_private = rp_repl;
> + memcpy(rf_repl, rf, sizeof(*rf));
> + rbio->bi_private = rf_repl;
> }
> }
>
> + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
> return r10_bio;
>
> -out_free_pages:
> +out_free_folio:
> while (--j >= 0)
> - resync_free_pages(&rps[j]);
> + folio_put(rfs[j].folio);
>
> j = 0;
> out_free_bio:
> @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
> bio_uninit(r10_bio->devs[j].repl_bio);
> kfree(r10_bio->devs[j].repl_bio);
> }
> - kfree(rps);
> + kfree(rfs);
> out_free_r10bio:
> rbio_pool_free(r10_bio, conf);
> return NULL;
> @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
> struct r10conf *conf = data;
> struct r10bio *r10bio = __r10_bio;
> int j;
> - struct resync_pages *rp = NULL;
> + struct resync_folio *rf = NULL;
>
> for (j = conf->copies; j--; ) {
> struct bio *bio = r10bio->devs[j].bio;
>
> if (bio) {
> - rp = get_resync_pages(bio);
> - resync_free_pages(rp);
> + rf = get_resync_folio(bio);
> + folio_put(rf->folio);
> bio_uninit(bio);
> kfree(bio);
> }
> @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
> }
>
> /* resync pages array stored in the 1st bio's .bi_private */
> - kfree(rp);
> + kfree(rf);
>
> rbio_pool_free(r10bio, conf);
> }
> @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> struct r10conf *conf = mddev->private;
> int i, first;
> struct bio *tbio, *fbio;
> - int vcnt;
> - struct page **tpages, **fpages;
> + struct folio *tfolio, *ffolio;
>
> atomic_set(&r10_bio->remaining, 1);
>
> @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> fbio = r10_bio->devs[i].bio;
> fbio->bi_iter.bi_size = r10_bio->sectors << 9;
> fbio->bi_iter.bi_idx = 0;
> - fpages = get_resync_pages(fbio)->pages;
> + ffolio = get_resync_folio(fbio)->folio;
>
> - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
> /* now find blocks with errors */
> for (i=0 ; i < conf->copies ; i++) {
> - int j, d;
> + int d;
> struct md_rdev *rdev;
> - struct resync_pages *rp;
> + struct resync_folio *rf;
>
> tbio = r10_bio->devs[i].bio;
>
> @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> if (i == first)
> continue;
>
> - tpages = get_resync_pages(tbio)->pages;
> + tfolio = get_resync_folio(tbio)->folio;
> d = r10_bio->devs[i].devnum;
> rdev = conf->mirrors[d].rdev;
> if (!r10_bio->devs[i].bio->bi_status) {
> /* We know that the bi_io_vec layout is the same for
> * both 'first' and 'i', so we just compare them.
> - * All vec entries are PAGE_SIZE;
> */
> - int sectors = r10_bio->sectors;
> - for (j = 0; j < vcnt; j++) {
> - int len = PAGE_SIZE;
> - if (sectors < (len / 512))
> - len = sectors * 512;
> - if (memcmp(page_address(fpages[j]),
> - page_address(tpages[j]),
> - len))
> - break;
> - sectors -= len/512;
> + if (memcmp(folio_address(ffolio),
> + folio_address(tfolio),
> + r10_bio->sectors << 9)) {
> + atomic64_add(r10_bio->sectors,
> + &mddev->resync_mismatches);
> + if (test_bit(MD_RECOVERY_CHECK,
> + &mddev->recovery))
> + /* Don't fix anything. */
> + continue;
> }
> - if (j == vcnt)
> - continue;
> - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
> - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
> - /* Don't fix anything. */
> - continue;
> } else if (test_bit(FailFast, &rdev->flags)) {
> /* Just give up on this device */
> md_error(rdev->mddev, rdev);
> @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> * First we need to fixup bv_offset, bv_len and
> * bi_vecs, as the read request might have corrupted these
> */
> - rp = get_resync_pages(tbio);
> + rf = get_resync_folio(tbio);
> bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE);
>
> - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size);
> + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size);
>
> - rp->raid_bio = r10_bio;
> - tbio->bi_private = rp;
> + rf->raid_bio = r10_bio;
> + tbio->bi_private = rf;
> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
> tbio->bi_end_io = end_sync_write;
>
> @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> struct bio *bio = r10_bio->devs[0].bio;
> sector_t sect = 0;
> int sectors = r10_bio->sectors;
> - int idx = 0;
> int dr = r10_bio->devs[0].devnum;
> int dw = r10_bio->devs[1].devnum;
> - struct page **pages = get_resync_pages(bio)->pages;
> + struct folio *folio = get_resync_folio(bio)->folio;
>
> while (sectors) {
> int s = sectors;
> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>
> rdev = conf->mirrors[dr].rdev;
> addr = r10_bio->devs[0].addr + sect;
> - ok = sync_page_io(rdev,
> - addr,
> - s << 9,
> - pages[idx],
> - REQ_OP_READ, false);
> + ok = sync_folio_io(rdev,
> + addr,
> + s << 9,
> + sect << 9,
> + folio,
> + REQ_OP_READ, false);
> if (ok) {
> rdev = conf->mirrors[dw].rdev;
> addr = r10_bio->devs[1].addr + sect;
> - ok = sync_page_io(rdev,
> - addr,
> - s << 9,
> - pages[idx],
> - REQ_OP_WRITE, false);
> + ok = sync_folio_io(rdev,
> + addr,
> + s << 9,
> + sect << 9,
> + folio,
> + REQ_OP_WRITE, false);
> if (!ok) {
> set_bit(WriteErrorSeen, &rdev->flags);
> if (!test_and_set_bit(WantReplacement,
> @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>
> sectors -= s;
> sect += s;
> - idx++;
> }
> }
>
> @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf)
> static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
> {
> struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO);
> - struct rsync_pages *rp;
> + struct resync_folio *rf;
> struct bio *bio;
> int nalloc;
> int i;
> @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
>
> for (i = 0; i < nalloc; i++) {
> bio = r10bio->devs[i].bio;
> - rp = bio->bi_private;
> + rf = bio->bi_private;
> bio_reset(bio, NULL, 0);
> - bio->bi_private = rp;
> + bio->bi_private = rf;
> bio = r10bio->devs[i].repl_bio;
> if (bio) {
> - rp = bio->bi_private;
> + rf = bio->bi_private;
> bio_reset(bio, NULL, 0);
> - bio->bi_private = rp;
> + bio->bi_private = rf;
> }
> }
> return r10bio;
> @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> int max_sync = RESYNC_SECTORS;
> sector_t sync_blocks;
> sector_t chunk_mask = conf->geo.chunk_mask;
> - int page_idx = 0;
>
> /*
> * Allow skipping a full rebuild for incremental assembly
> @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> continue;
> }
> }
> +
> + /*
> + * RESYNC_BLOCK_SIZE folio might alloc failed in
> + * resync_alloc_folio(). Fall back to smaller sync
> + * size if needed.
> + */
> + if (max_sync > r10_bio->sectors)
> + max_sync = r10_bio->sectors;
> +
> any_working = 1;
> bio = r10_bio->devs[0].bio;
> bio->bi_next = biolist;
> @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> }
> if (sync_blocks < max_sync)
> max_sync = sync_blocks;
> +
> r10_bio = raid10_alloc_init_r10buf(conf);
> + /*
> + * RESYNC_BLOCK_SIZE folio might alloc failed in resync_alloc_folio().
> + * Fall back to smaller sync size if needed.
> + */
> + if (max_sync > r10_bio->sectors)
> + max_sync = r10_bio->sectors;
> +
> r10_bio->state = 0;
>
> r10_bio->mddev = mddev;
> @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> }
> }
>
> - nr_sectors = 0;
> if (sector_nr + max_sync < max_sector)
> max_sector = sector_nr + max_sync;
> do {
> - struct page *page;
> - int len = PAGE_SIZE;
> - if (sector_nr + (len>>9) > max_sector)
> - len = (max_sector - sector_nr) << 9;
> - if (len == 0)
> + nr_sectors = max_sector - sector_nr;
> +
> + if (nr_sectors == 0)
> break;
> for (bio= biolist ; bio ; bio=bio->bi_next) {
> - struct resync_pages *rp = get_resync_pages(bio);
> - page = resync_fetch_page(rp, page_idx);
> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> + struct resync_folio *rf = get_resync_folio(bio);
> +
> + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) {
> bio->bi_status = BLK_STS_RESOURCE;
> bio_endio(bio);
> *skipped = 1;
> - return max_sync;
> + return nr_sectors << 9;
> }
> }
> - nr_sectors += len>>9;
> - sector_nr += len>>9;
> - } while (++page_idx < RESYNC_PAGES);
> + sector_nr += nr_sectors;
> + } while (0);
> r10_bio->sectors = nr_sectors;
>
> if (mddev_is_clustered(mddev) &&
> @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> int *skipped)
> {
> /* We simply copy at most one chunk (smallest of old and new)
> - * at a time, possibly less if that exceeds RESYNC_PAGES,
> + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE,
> * or we hit a bad block or something.
> * This might mean we pause for normal IO in the middle of
> * a chunk, but that is not a problem as mddev->reshape_position
> @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> struct r10bio *r10_bio;
> sector_t next, safe, last;
> int max_sectors;
> - int nr_sectors;
> int s;
> struct md_rdev *rdev;
> int need_flush = 0;
> struct bio *blist;
> struct bio *bio, *read_bio;
> int sectors_done = 0;
> - struct page **pages;
> + struct folio *folio;
>
> if (sector_nr == 0) {
> /* If restarting in the middle, skip the initial sectors */
> @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> r10_bio->mddev = mddev;
> r10_bio->sector = sector_nr;
> set_bit(R10BIO_IsReshape, &r10_bio->state);
> - r10_bio->sectors = last - sector_nr + 1;
> + /*
> + * RESYNC_BLOCK_SIZE folio might alloc failed in
> + * resync_alloc_folio(). Fall back to smaller sync
> + * size if needed.
> + */
> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1);
> rdev = read_balance(conf, r10_bio, &max_sectors);
> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state));
>
> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> return sectors_done;
> }
>
> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ,
> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ,
> GFP_KERNEL, &mddev->bio_set);
> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
> + rdev->data_offset);
> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
> blist = b;
> }
>
> - /* Now add as many pages as possible to all of these bios. */
> + /* Now add folio to all of these bios. */
>
> - nr_sectors = 0;
> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
> - struct page *page = pages[s / (PAGE_SIZE >> 9)];
> - int len = (max_sectors - s) << 9;
> - if (len > PAGE_SIZE)
> - len = PAGE_SIZE;
> - for (bio = blist; bio ; bio = bio->bi_next) {
> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> - bio->bi_status = BLK_STS_RESOURCE;
> - bio_endio(bio);
> - return sectors_done;
> - }
> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
> + for (bio = blist; bio ; bio = bio->bi_next) {
> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return sectors_done;
In fact, the original code doesn't clean up before returning here.
bio_add_folio_nofail is already used in raid1; can we use
bio_add_folio_nofail here as well?
> }
> - sector_nr += len >> 9;
> - nr_sectors += len >> 9;
> }
> - r10_bio->sectors = nr_sectors;
> + r10_bio->sectors = max_sectors >> 9;
>
> /* Now submit the read */
> atomic_inc(&r10_bio->remaining);
> read_bio->bi_next = NULL;
> submit_bio_noacct(read_bio);
> - sectors_done += nr_sectors;
> + sectors_done += max_sectors;
> if (sector_nr <= last)
> goto read_more;
>
> @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
> struct r10conf *conf = mddev->private;
> struct r10bio *r10b;
> int slot = 0;
> - int idx = 0;
> - struct page **pages;
> + int sect = 0;
> + struct folio *folio;
>
> r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO);
> if (!r10b) {
> @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
> return -ENOMEM;
> }
>
> - /* reshape IOs share pages from .devs[0].bio */
> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
> + /* reshape IOs share folio from .devs[0].bio */
> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
>
> r10b->sector = r10_bio->sector;
> __raid10_find_phys(&conf->prev, r10b);
> @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev,
> while (!success) {
> int d = r10b->devs[slot].devnum;
> struct md_rdev *rdev = conf->mirrors[d].rdev;
> - sector_t addr;
> if (rdev == NULL ||
> test_bit(Faulty, &rdev->flags) ||
> !test_bit(In_sync, &rdev->flags))
> goto failed;
>
> - addr = r10b->devs[slot].addr + idx * PAGE_SIZE;
> atomic_inc(&rdev->nr_pending);
> - success = sync_page_io(rdev,
> - addr,
> - s << 9,
> - pages[idx],
> - REQ_OP_READ, false);
> + success = sync_folio_io(rdev,
> + r10b->devs[slot].addr +
> + sect,
> + s << 9,
> + sect << 9,
> + folio,
> + REQ_OP_READ, false);
> rdev_dec_pending(rdev, mddev);
> if (success)
> break;
> @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
> return -EIO;
> }
> sectors -= s;
> - idx++;
> + sect += s;
> }
> kfree(r10b);
> return 0;
> --
> 2.39.2
>
>
Regards
Xiao
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity
2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
@ 2026-04-30 2:22 ` Xiao Ni
0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2026-04-30 2:22 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang
On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> RAID1 currently fixes IO error at PAGE_SIZE granularity. Fix at smaller
> granularity can handle more errors, and RAID will support logical block
> sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail.
>
> Switch IO error fix granularity to logical block size.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai@fnnas.com>
> ---
> drivers/md/raid1.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 724fd4f2cc3a..de8c964ca11d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2116,7 +2116,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> {
> /* Try some synchronous reads of other devices to get
> * good data, much like with normal read errors. Only
> - * read into the pages we already have so we don't
> + * read into the block we already have so we don't
> * need to re-issue the read request.
> * We don't need to freeze the array, because being in an
> * active sync request, there is no normal IO, and
> @@ -2147,13 +2147,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> }
>
> while(sectors) {
> - int s = sectors;
> + int s = min_t(int, sectors, mddev->logical_block_size >> 9);
> int d = r1_bio->read_disk;
> int success = 0;
> int start;
>
> - if (s > (PAGE_SIZE>>9))
> - s = PAGE_SIZE >> 9;
> do {
> if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
> /* No rcu protection needed here devices
> @@ -2192,7 +2190,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> if (abort)
> return 0;
>
> - /* Try next page */
> + /* Try next block */
> sectors -= s;
> sect += s;
> off += s << 9;
> @@ -2390,14 +2388,11 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> }
>
> while(sectors) {
> - int s = sectors;
> + int s = min_t(int, sectors, mddev->logical_block_size >> 9);
> int d = read_disk;
> int success = 0;
> int start;
>
> - if (s > (PAGE_SIZE>>9))
> - s = PAGE_SIZE >> 9;
> -
> do {
> rdev = conf->mirrors[d].rdev;
> if (rdev &&
> --
> 2.39.2
>
>
This patch looks good to me.
Reviewed-by: Xiao Ni <xni@redhat.com>
* Re: [PATCH v3 8/8] md/raid10: fix IO error at logical block size granularity
2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666
@ 2026-04-30 2:23 ` Xiao Ni
0 siblings, 0 replies; 13+ messages in thread
From: Xiao Ni @ 2026-04-30 2:23 UTC (permalink / raw)
To: linan666; +Cc: song, yukuai, linux-raid, linux-kernel, yangerkun, yi.zhang
On Thu, Apr 16, 2026 at 11:51 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> RAID10 currently fixes IO error at PAGE_SIZE granularity. Fix at smaller
> granularity can handle more errors, and RAID will support logical block
> sizes larger than PAGE_SIZE in the future, where PAGE_SIZE IO will fail.
>
> Switch IO error fix granularity to logical block size.
>
> Signed-off-by: Li Nan <linan122@huawei.com>
> Reviewed-by: Yu Kuai <yukuai@fnnas.com>
> ---
> drivers/md/raid10.c | 17 ++++-------------
> 1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3638e00fe420..5b4ffd23211a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2454,7 +2454,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> static void fix_recovery_read_error(struct r10bio *r10_bio)
> {
> /* We got a read error during recovery.
> - * We repeat the read in smaller page-sized sections.
> + * We repeat the read in smaller logical_block_sized sections.
> * If a read succeeds, write it to the new device or record
> * a bad block if we cannot.
> * If a read fails, record a bad block on both old and
> @@ -2470,14 +2470,11 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
> struct folio *folio = get_resync_folio(bio)->folio;
>
> while (sectors) {
> - int s = sectors;
> + int s = min_t(int, sectors, mddev->logical_block_size >> 9);
> struct md_rdev *rdev;
> sector_t addr;
> int ok;
>
> - if (s > (PAGE_SIZE>>9))
> - s = PAGE_SIZE >> 9;
> -
> rdev = conf->mirrors[dr].rdev;
> addr = r10_bio->devs[0].addr + sect;
> ok = sync_folio_io(rdev,
> @@ -2621,14 +2618,11 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
> }
>
> while(sectors) {
> - int s = sectors;
> + int s = min_t(int, sectors, mddev->logical_block_size >> 9);
> int sl = slot;
> int success = 0;
> int start;
>
> - if (s > (PAGE_SIZE>>9))
> - s = PAGE_SIZE >> 9;
> -
> do {
> d = r10_bio->devs[sl].devnum;
> rdev = conf->mirrors[d].rdev;
> @@ -4926,13 +4920,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
> __raid10_find_phys(&conf->prev, r10b);
>
> while (sectors) {
> - int s = sectors;
> + int s = min_t(int, sectors, mddev->logical_block_size >> 9);
> int success = 0;
> int first_slot = slot;
>
> - if (s > (PAGE_SIZE >> 9))
> - s = PAGE_SIZE >> 9;
> -
> while (!success) {
> int d = r10b->devs[slot].devnum;
> struct md_rdev *rdev = conf->mirrors[d].rdev;
> --
> 2.39.2
>
>
This patch looks good to me.
Reviewed-by: Xiao Ni <xni@redhat.com>
* Re: [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO
2026-04-30 1:54 ` Xiao Ni
@ 2026-05-07 7:13 ` 李楠 Magic Li
0 siblings, 0 replies; 13+ messages in thread
From: 李楠 Magic Li @ 2026-05-07 7:13 UTC (permalink / raw)
To: Xiao Ni, linan666@huaweicloud.com
Cc: song@kernel.org, yukuai@fnnas.com, linux-raid@vger.kernel.org,
linux-kernel@vger.kernel.org, yangerkun@huawei.com,
yi.zhang@huawei.com, 张同浩 Tonghao Zhang
On Thu Apr 30, 2026 at 9:54 AM CST, Xiao Ni wrote:
> Hi Nan
>
> On Thu, Apr 16, 2026 at 11:55 AM <linan666@huaweicloud.com> wrote:
>>
>> From: Li Nan <linan122@huawei.com>
>>
>> Convert all IO on the sync path to use folios, and rename page-related
>> identifiers to match folio.
>>
>> Since a RESYNC_BLOCK_SIZE (64K) allocation is more likely to fail than a
>> 4K one, retry with lower orders to improve allocation reliability. An
>> r1/10_bio may have rf->folios of different orders, so use the minimum
>> order for the r1/10_bio sectors to avoid exceeding the folio size when
>> adding it to an IO later.
>>
>> Clean up:
>> 1. Remove resync_get_all_folio() and invoke folio_get() directly instead.
>> 2. Clean up redundant while(0) loop in md_bio_reset_resync_folio().
>> 3. Clean up bio variable by directly referencing r10_bio->devs[j].bio
>> instead in r1buf_pool_alloc() and r10buf_pool_alloc().
>> 4. Clean up RESYNC_PAGES.
>> 5. Remove resync_fetch_folio(), access 'rf->folio' directly.
>> 6. Remove resync_free_folio(), call folio_put() directly.
>> 7. Clean up sync IO size calculation in raid1/10_sync_request.
>>
>> Signed-off-by: Li Nan <linan122@huawei.com>
>> ---
>> drivers/md/md.c | 2 +-
>> drivers/md/raid1-10.c | 80 ++++---------
>> drivers/md/raid1.c | 209 +++++++++++++++-------------------
>> drivers/md/raid10.c | 254 +++++++++++++++++++++---------------------
>> 4 files changed, 240 insertions(+), 305 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5e83914d5c14..6554b849ac74 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -9440,7 +9440,7 @@ static bool sync_io_within_limit(struct mddev *mddev)
>> {
>> /*
>> * For raid456, sync IO is stripe(4k) per IO, for other levels, it's
>> - * RESYNC_PAGES(64k) per IO.
>> + * RESYNC_BLOCK_SIZE(64k) per IO.
>> */
>> return atomic_read(&mddev->recovery_active) <
>> (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev);
>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>> index cda531d0720b..10200b0a3fd2 100644
>> --- a/drivers/md/raid1-10.c
>> +++ b/drivers/md/raid1-10.c
>> @@ -1,7 +1,6 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /* Maximum size of each resync request */
>> #define RESYNC_BLOCK_SIZE (64*1024)
>> -#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>> #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
>>
>> /* when we get a read error on a read-only array, we redirect to another
>> @@ -20,9 +19,9 @@
>> #define MAX_PLUG_BIO 32
>>
>> /* for managing resync I/O pages */
>> -struct resync_pages {
>> +struct resync_folio {
>> void *raid_bio;
>> - struct page *pages[RESYNC_PAGES];
>> + struct folio *folio;
>> };
>>
>> struct raid1_plug_cb {
>> @@ -36,77 +35,44 @@ static void rbio_pool_free(void *rbio, void *data)
>> kfree(rbio);
>> }
>>
>> -static inline int resync_alloc_pages(struct resync_pages *rp,
>> - gfp_t gfp_flags)
>> +static inline int resync_alloc_folio(struct resync_folio *rf,
>> + gfp_t gfp_flags, int *order)
>> {
>> - int i;
>> + struct folio *folio;
>>
>> - for (i = 0; i < RESYNC_PAGES; i++) {
>> - rp->pages[i] = alloc_page(gfp_flags);
>> - if (!rp->pages[i])
>> - goto out_free;
>> - }
>> + do {
>> + folio = folio_alloc(gfp_flags, *order);
>> + if (folio)
>> + break;
>> + } while (--(*order) > 0);
>
> It has a problem here. If it can't allocate a big folio, the sync
> request unit will be smaller and sync performance may decrease. This
> can happen when the system lacks sufficient contiguous memory. This
> change looks good to me. I just want to throw this problem out for an
> open discussion.
>
Yeah, it can be easily reproduced in qemu. We have a few options:
1. Alloc a smaller folio
2. Return -ENOMEM directly
3. Alloc multiple small folios and assemble a larger one. This is not a
good idea, as it would make the code much more complex.
IMO, 1 seems like the best choice.
>>
>> + if (!folio)
>> + return -ENOMEM;
>> +
>> + rf->folio = folio;
>> return 0;
>> -
>> -out_free:
>> - while (--i >= 0)
>> - put_page(rp->pages[i]);
>> - return -ENOMEM;
>> -}
>> -
>> -static inline void resync_free_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - put_page(rp->pages[i]);
>> -}
>> -
>> -static inline void resync_get_all_pages(struct resync_pages *rp)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < RESYNC_PAGES; i++)
>> - get_page(rp->pages[i]);
>> -}
>> -
>> -static inline struct page *resync_fetch_page(struct resync_pages *rp,
>> - unsigned idx)
>> -{
>> - if (WARN_ON_ONCE(idx >= RESYNC_PAGES))
>> - return NULL;
>> - return rp->pages[idx];
>> }
>>
>> /*
>> - * 'strct resync_pages' stores actual pages used for doing the resync
>> + * 'struct resync_folio' stores the actual folio used for doing the resync
>> * IO, and it is per-bio, so make .bi_private points to it.
>> */
>> -static inline struct resync_pages *get_resync_pages(struct bio *bio)
>> +static inline struct resync_folio *get_resync_folio(struct bio *bio)
>> {
>> return bio->bi_private;
>> }
>>
>> /* generally called after bio_reset() for reseting bvec */
>> -static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>> +static void md_bio_reset_resync_folio(struct bio *bio, struct resync_folio *rf,
>> int size)
>> {
>> - int idx = 0;
>> -
>> /* initialize bvec table again */
>> - do {
>> - struct page *page = resync_fetch_page(rp, idx);
>> - int len = min_t(int, size, PAGE_SIZE);
>> -
>> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
>> - bio->bi_status = BLK_STS_RESOURCE;
>> - bio_endio(bio);
>> - return;
>> - }
>> -
>> - size -= len;
>> - } while (idx++ < RESYNC_PAGES && size > 0);
>> + if (WARN_ON(!bio_add_folio(bio, rf->folio,
>> + min_t(int, size, RESYNC_BLOCK_SIZE),
>> + 0))) {
>> + bio->bi_status = BLK_STS_RESOURCE;
>> + bio_endio(bio);
>> + }
>> }
>>
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index a72abdc37a2d..724fd4f2cc3a 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -120,11 +120,11 @@ static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
>>
>> /*
>> * for resync bio, r1bio pointer can be retrieved from the per-bio
>> - * 'struct resync_pages'.
>> + * 'struct resync_folio'.
>> */
>> static inline struct r1bio *get_resync_r1bio(struct bio *bio)
>> {
>> - return get_resync_pages(bio)->raid_bio;
>> + return get_resync_folio(bio)->raid_bio;
>> }
>>
>> static void *r1bio_pool_alloc(gfp_t gfp_flags, struct r1conf *conf)
>> @@ -146,70 +146,69 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
>> struct r1conf *conf = data;
>> struct r1bio *r1_bio;
>> struct bio *bio;
>> - int need_pages;
>> + int need_folio;
>
> The name need_folio is confusing. Can we keep the same style as the
> old version? How about need_folios?
>
Agreed, I will rename it in the next version.
>> int j;
>> - struct resync_pages *rps;
>> + struct resync_folio *rfs;
>> + int order = get_order(RESYNC_BLOCK_SIZE);
>>
>> r1_bio = r1bio_pool_alloc(gfp_flags, conf);
>> if (!r1_bio)
>> return NULL;
>>
>> - rps = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_pages),
>> + rfs = kmalloc_array(conf->raid_disks * 2, sizeof(struct resync_folio),
>> gfp_flags);
>> - if (!rps)
>> + if (!rfs)
>> goto out_free_r1bio;
>>
>> /*
>> * Allocate bios : 1 for reading, n-1 for writing
>> */
>> for (j = conf->raid_disks * 2; j-- ; ) {
>> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
>> + bio = bio_kmalloc(1, gfp_flags);
>> if (!bio)
>> goto out_free_bio;
>> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
>> + bio_init_inline(bio, NULL, 1, 0);
>> r1_bio->bios[j] = bio;
>> }
>> /*
>> - * Allocate RESYNC_PAGES data pages and attach them to
>> - * the first bio.
>> + * Allocate data folio and attach it to the first bio.
>> * If this is a user-requested check/repair, allocate
>> - * RESYNC_PAGES for each bio.
>> + * folio for each bio.
>> */
>> if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
>> - need_pages = conf->raid_disks * 2;
>> + need_folio = conf->raid_disks * 2;
>> else
>> - need_pages = 1;
>> + need_folio = 1;
>> for (j = 0; j < conf->raid_disks * 2; j++) {
>> - struct resync_pages *rp = &rps[j];
>> + struct resync_folio *rf = &rfs[j];
>>
>> - bio = r1_bio->bios[j];
>> -
>> - if (j < need_pages) {
>> - if (resync_alloc_pages(rp, gfp_flags))
>> - goto out_free_pages;
>> + if (j < need_folio) {
>> + if (resync_alloc_folio(rf, gfp_flags, &order))
>> + goto out_free_folio;
>> } else {
>> - memcpy(rp, &rps[0], sizeof(*rp));
>> - resync_get_all_pages(rp);
>> + memcpy(rf, &rfs[0], sizeof(*rf));
>> + folio_get(rf->folio);
>> }
>>
>> - rp->raid_bio = r1_bio;
>> - bio->bi_private = rp;
>> + rf->raid_bio = r1_bio;
>> + r1_bio->bios[j]->bi_private = rf;
>> }
>>
>> + r1_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
>> r1_bio->master_bio = NULL;
>>
>> return r1_bio;
>>
>> -out_free_pages:
>> +out_free_folio:
>> while (--j >= 0)
>> - resync_free_pages(&rps[j]);
>> + folio_put(rfs[j].folio);
>>
>> out_free_bio:
>> while (++j < conf->raid_disks * 2) {
>> bio_uninit(r1_bio->bios[j]);
>> kfree(r1_bio->bios[j]);
>> }
>> - kfree(rps);
>> + kfree(rfs);
>>
>> out_free_r1bio:
>> rbio_pool_free(r1_bio, data);
>> @@ -221,17 +220,17 @@ static void r1buf_pool_free(void *__r1_bio, void *data)
>> struct r1conf *conf = data;
>> int i;
>> struct r1bio *r1bio = __r1_bio;
>> - struct resync_pages *rp = NULL;
>> + struct resync_folio *rf = NULL;
>>
>> for (i = conf->raid_disks * 2; i--; ) {
>> - rp = get_resync_pages(r1bio->bios[i]);
>> - resync_free_pages(rp);
>> + rf = get_resync_folio(r1bio->bios[i]);
>> + folio_put(rf->folio);
>> bio_uninit(r1bio->bios[i]);
>> kfree(r1bio->bios[i]);
>> }
>>
>> - /* resync pages array stored in the 1st bio's .bi_private */
>> - kfree(rp);
>> + /* resync folio stored in the 1st bio's .bi_private */
>> + kfree(rf);
>>
>> rbio_pool_free(r1bio, data);
>> }
>> @@ -2095,10 +2094,10 @@ static void end_sync_write(struct bio *bio)
>> put_sync_write_buf(r1_bio);
>> }
>>
>> -static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
>> - int sectors, struct page *page, blk_opf_t rw)
>> +static int r1_sync_folio_io(struct md_rdev *rdev, sector_t sector, int sectors,
>> + int off, struct folio *folio, blk_opf_t rw)
>> {
>> - if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
>> + if (sync_folio_io(rdev, sector, sectors << 9, off, folio, rw, false))
>> /* success */
>> return 1;
>> if (rw == REQ_OP_WRITE) {
>> @@ -2129,10 +2128,10 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>> struct mddev *mddev = r1_bio->mddev;
>> struct r1conf *conf = mddev->private;
>> struct bio *bio = r1_bio->bios[r1_bio->read_disk];
>> - struct page **pages = get_resync_pages(bio)->pages;
>> + struct folio *folio = get_resync_folio(bio)->folio;
>> sector_t sect = r1_bio->sector;
>> int sectors = r1_bio->sectors;
>> - int idx = 0;
>> + int off = 0;
>> struct md_rdev *rdev;
>>
>> rdev = conf->mirrors[r1_bio->read_disk].rdev;
>> @@ -2162,9 +2161,8 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>> * active, and resync is currently active
>> */
>> rdev = conf->mirrors[d].rdev;
>> - if (sync_page_io(rdev, sect, s<<9,
>> - pages[idx],
>> - REQ_OP_READ, false)) {
>> + if (sync_folio_io(rdev, sect, s<<9, off, folio,
>> + REQ_OP_READ, false)) {
>> success = 1;
>> break;
>> }
>> @@ -2197,7 +2195,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>> /* Try next page */
>> sectors -= s;
>> sect += s;
>> - idx++;
>> + off += s << 9;
>> continue;
>> }
>>
>> @@ -2210,8 +2208,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>> if (r1_bio->bios[d]->bi_end_io != end_sync_read)
>> continue;
>> rdev = conf->mirrors[d].rdev;
>> - if (r1_sync_page_io(rdev, sect, s,
>> - pages[idx],
>> + if (r1_sync_folio_io(rdev, sect, s, off, folio,
>> REQ_OP_WRITE) == 0) {
>> r1_bio->bios[d]->bi_end_io = NULL;
>> rdev_dec_pending(rdev, mddev);
>> @@ -2225,14 +2222,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>> if (r1_bio->bios[d]->bi_end_io != end_sync_read)
>> continue;
>> rdev = conf->mirrors[d].rdev;
>> - if (r1_sync_page_io(rdev, sect, s,
>> - pages[idx],
>> + if (r1_sync_folio_io(rdev, sect, s, off, folio,
>> REQ_OP_READ) != 0)
>> atomic_add(s, &rdev->corrected_errors);
>> }
>> sectors -= s;
>> sect += s;
>> - idx ++;
>> + off += s << 9;
>> }
>> set_bit(R1BIO_Uptodate, &r1_bio->state);
>> bio->bi_status = 0;
>> @@ -2252,14 +2248,12 @@ static void process_checks(struct r1bio *r1_bio)
>> struct r1conf *conf = mddev->private;
>> int primary;
>> int i;
>> - int vcnt;
>>
>> /* Fix variable parts of all bios */
>> - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
>> for (i = 0; i < conf->raid_disks * 2; i++) {
>> blk_status_t status;
>> struct bio *b = r1_bio->bios[i];
>> - struct resync_pages *rp = get_resync_pages(b);
>> + struct resync_folio *rf = get_resync_folio(b);
>> if (b->bi_end_io != end_sync_read)
>> continue;
>> /* fixup the bio for reuse, but preserve errno */
>> @@ -2269,11 +2263,11 @@ static void process_checks(struct r1bio *r1_bio)
>> b->bi_iter.bi_sector = r1_bio->sector +
>> conf->mirrors[i].rdev->data_offset;
>> b->bi_end_io = end_sync_read;
>> - rp->raid_bio = r1_bio;
>> - b->bi_private = rp;
>> + rf->raid_bio = r1_bio;
>> + b->bi_private = rf;
>>
>> /* initialize bvec table again */
>> - md_bio_reset_resync_pages(b, rp, r1_bio->sectors << 9);
>> + md_bio_reset_resync_folio(b, rf, r1_bio->sectors << 9);
>> }
>> for (primary = 0; primary < conf->raid_disks * 2; primary++)
>> if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
>> @@ -2284,44 +2278,39 @@ static void process_checks(struct r1bio *r1_bio)
>> }
>> r1_bio->read_disk = primary;
>> for (i = 0; i < conf->raid_disks * 2; i++) {
>> - int j = 0;
>> struct bio *pbio = r1_bio->bios[primary];
>> struct bio *sbio = r1_bio->bios[i];
>> blk_status_t status = sbio->bi_status;
>> - struct page **ppages = get_resync_pages(pbio)->pages;
>> - struct page **spages = get_resync_pages(sbio)->pages;
>> - struct bio_vec *bi;
>> - int page_len[RESYNC_PAGES] = { 0 };
>> - struct bvec_iter_all iter_all;
>> + struct folio *pfolio = get_resync_folio(pbio)->folio;
>> + struct folio *sfolio = get_resync_folio(sbio)->folio;
>>
>> if (sbio->bi_end_io != end_sync_read)
>> continue;
>> /* Now we can 'fixup' the error value */
>> sbio->bi_status = 0;
>>
>> - bio_for_each_segment_all(bi, sbio, iter_all)
>> - page_len[j++] = bi->bv_len;
>> -
>> - if (!status) {
>> - for (j = vcnt; j-- ; ) {
>> - if (memcmp(page_address(ppages[j]),
>> - page_address(spages[j]),
>> - page_len[j]))
>> - break;
>> - }
>> - } else
>> - j = 0;
>> - if (j >= 0)
>> + /*
>> + * Copy data and submit write in two cases:
>> + * - IO error (non-zero status)
>> + * - Data inconsistency and not a CHECK operation.
>> + */
>> + if (status) {
>> atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>> - if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
>> - && !status)) {
>> - /* No need to write to this device. */
>> - sbio->bi_end_io = NULL;
>> - rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>> + bio_copy_data(sbio, pbio);
>> continue;
>> + } else if (memcmp(folio_address(pfolio),
>> + folio_address(sfolio),
>> + r1_bio->sectors << 9)) {
>> + atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
>> + if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
>> + bio_copy_data(sbio, pbio);
>> + continue;
>> + }
>> }
>>
>> - bio_copy_data(sbio, pbio);
>> + /* No need to write to this device. */
>> + sbio->bi_end_io = NULL;
>> + rdev_dec_pending(conf->mirrors[i].rdev, mddev);
>> }
>> }
>>
>> @@ -2446,9 +2435,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>> if (rdev &&
>> !test_bit(Faulty, &rdev->flags)) {
>> atomic_inc(&rdev->nr_pending);
>> - r1_sync_page_io(rdev, sect, s,
>> - folio_page(conf->tmpfolio, 0),
>> - REQ_OP_WRITE);
>> + r1_sync_folio_io(rdev, sect, s, 0,
>> + conf->tmpfolio, REQ_OP_WRITE);
>> rdev_dec_pending(rdev, mddev);
>> }
>> }
>> @@ -2461,9 +2449,8 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
>> if (rdev &&
>> !test_bit(Faulty, &rdev->flags)) {
>> atomic_inc(&rdev->nr_pending);
>> - if (r1_sync_page_io(rdev, sect, s,
>> - folio_page(conf->tmpfolio, 0),
>> - REQ_OP_READ)) {
>> + if (r1_sync_folio_io(rdev, sect, s, 0,
>> + conf->tmpfolio, REQ_OP_READ)) {
>> atomic_add(s, &rdev->corrected_errors);
>> pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
>> mdname(mddev), s,
>> @@ -2738,15 +2725,15 @@ static int init_resync(struct r1conf *conf)
>> static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
>> {
>> struct r1bio *r1bio = mempool_alloc(&conf->r1buf_pool, GFP_NOIO);
>> - struct resync_pages *rps;
>> + struct resync_folio *rfs;
>> struct bio *bio;
>> int i;
>>
>> for (i = conf->raid_disks * 2; i--; ) {
>> bio = r1bio->bios[i];
>> - rps = bio->bi_private;
>> + rfs = bio->bi_private;
>> bio_reset(bio, NULL, 0);
>> - bio->bi_private = rps;
>> + bio->bi_private = rfs;
>> }
>> r1bio->master_bio = NULL;
>> return r1bio;
>> @@ -2775,10 +2762,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>> int write_targets = 0, read_targets = 0;
>> sector_t sync_blocks;
>> bool still_degraded = false;
>> - int good_sectors = RESYNC_SECTORS;
>> + int good_sectors;
>> int min_bad = 0; /* number of sectors that are bad in all devices */
>> int idx = sector_to_idx(sector_nr);
>> - int page_idx = 0;
>>
>> if (!mempool_initialized(&conf->r1buf_pool))
>> if (init_resync(conf))
>> @@ -2858,8 +2844,11 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>> r1_bio->sector = sector_nr;
>> r1_bio->state = 0;
>> set_bit(R1BIO_IsSync, &r1_bio->state);
>> - /* make sure good_sectors won't go across barrier unit boundary */
>> - good_sectors = align_to_barrier_unit_end(sector_nr, good_sectors);
>> + /*
>> + * make sure good_sectors won't go across barrier unit boundary.
>> + * r1_bio->sectors <= RESYNC_SECTORS.
>> + */
>> + good_sectors = align_to_barrier_unit_end(sector_nr, r1_bio->sectors);
>>
>> for (i = 0; i < conf->raid_disks * 2; i++) {
>> struct md_rdev *rdev;
>> @@ -2979,44 +2968,28 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
>> max_sector = mddev->resync_max; /* Don't do IO beyond here */
>> if (max_sector > sector_nr + good_sectors)
>> max_sector = sector_nr + good_sectors;
>> - nr_sectors = 0;
>> - sync_blocks = 0;
>> do {
>> - struct page *page;
>> - int len = PAGE_SIZE;
>> - if (sector_nr + (len>>9) > max_sector)
>> - len = (max_sector - sector_nr) << 9;
>> - if (len == 0)
>> + nr_sectors = max_sector - sector_nr;
>> + if (nr_sectors == 0)
>> break;
>> - if (sync_blocks == 0) {
>> - if (!md_bitmap_start_sync(mddev, sector_nr,
>> - &sync_blocks, still_degraded) &&
>> - !conf->fullsync &&
>> - !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>> - break;
>> - if ((len >> 9) > sync_blocks)
>> - len = sync_blocks<<9;
>> - }
>> + if (!md_bitmap_start_sync(mddev, sector_nr,
>> + &sync_blocks, still_degraded) &&
>> + !conf->fullsync &&
>> + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>> + break;
>> + if (nr_sectors > sync_blocks)
>> + nr_sectors = sync_blocks;
>>
>> for (i = 0 ; i < conf->raid_disks * 2; i++) {
>> - struct resync_pages *rp;
>> -
>> bio = r1_bio->bios[i];
>> - rp = get_resync_pages(bio);
>> if (bio->bi_end_io) {
>> - page = resync_fetch_page(rp, page_idx);
>> + struct resync_folio *rf = get_resync_folio(bio);
>>
>> - /*
>> - * won't fail because the vec table is big
>> - * enough to hold all these pages
>> - */
>> - __bio_add_page(bio, page, len, 0);
>> + bio_add_folio_nofail(bio, rf->folio, nr_sectors << 9, 0);
>> }
>> }
>> - nr_sectors += len>>9;
>> - sector_nr += len>>9;
>> - sync_blocks -= (len>>9);
>> - } while (++page_idx < RESYNC_PAGES);
>> + sector_nr += nr_sectors;
>> + } while (0);
>
> Now it can handle all pages in one go via a folio. It's strange to
> keep while(0) here.
>
I tried cleaning up the while(0), but it made the 'if' and 'break'
statements unreadable, so I kept it here.
>
>>
>> r1_bio->sectors = nr_sectors;
>
>
> This patch is a little big. Is it better to split this patch here?
>
It can't be split. The changes in raid1.c and raid10.c are entirely about
resync_pages -> resync_folio. We have to change the declaration and its
usage in one patch.
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index 26f93040cd13..3638e00fe420 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -96,11 +96,11 @@ static void end_reshape(struct r10conf *conf);
>>
>> /*
>> * for resync bio, r10bio pointer can be retrieved from the per-bio
>> - * 'struct resync_pages'.
>> + * 'struct resync_folio'.
>> */
>> static inline struct r10bio *get_resync_r10bio(struct bio *bio)
>> {
>> - return get_resync_pages(bio)->raid_bio;
>> + return get_resync_folio(bio)->raid_bio;
>> }
>>
>> static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
>> @@ -133,8 +133,9 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>> struct r10bio *r10_bio;
>> struct bio *bio;
>> int j;
>> - int nalloc, nalloc_rp;
>> - struct resync_pages *rps;
>> + int nalloc, nalloc_rf;
>> + struct resync_folio *rfs;
>> + int order = get_order(RESYNC_BLOCK_SIZE);
>>
>> r10_bio = r10bio_pool_alloc(gfp_flags, conf);
>> if (!r10_bio)
>> @@ -148,66 +149,64 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>>
>> /* allocate once for all bios */
>> if (!conf->have_replacement)
>> - nalloc_rp = nalloc;
>> + nalloc_rf = nalloc;
>> else
>> - nalloc_rp = nalloc * 2;
>> - rps = kmalloc_array(nalloc_rp, sizeof(struct resync_pages), gfp_flags);
>> - if (!rps)
>> + nalloc_rf = nalloc * 2;
>> + rfs = kmalloc_array(nalloc_rf, sizeof(struct resync_folio), gfp_flags);
>> + if (!rfs)
>> goto out_free_r10bio;
>>
>> /*
>> * Allocate bios.
>> */
>> for (j = nalloc ; j-- ; ) {
>> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
>> + bio = bio_kmalloc(1, gfp_flags);
>> if (!bio)
>> goto out_free_bio;
>> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
>> + bio_init_inline(bio, NULL, 1, 0);
>> r10_bio->devs[j].bio = bio;
>> if (!conf->have_replacement)
>> continue;
>> - bio = bio_kmalloc(RESYNC_PAGES, gfp_flags);
>> + bio = bio_kmalloc(1, gfp_flags);
>> if (!bio)
>> goto out_free_bio;
>> - bio_init_inline(bio, NULL, RESYNC_PAGES, 0);
>> + bio_init_inline(bio, NULL, 1, 0);
>> r10_bio->devs[j].repl_bio = bio;
>> }
>> /*
>> - * Allocate RESYNC_PAGES data pages and attach them
>> - * where needed.
>> + * Allocate data folio and attach it where needed.
>> */
>> for (j = 0; j < nalloc; j++) {
>> struct bio *rbio = r10_bio->devs[j].repl_bio;
>> - struct resync_pages *rp, *rp_repl;
>> + struct resync_folio *rf, *rf_repl;
>>
>> - rp = &rps[j];
>> + rf = &rfs[j];
>> if (rbio)
>> - rp_repl = &rps[nalloc + j];
>> -
>> - bio = r10_bio->devs[j].bio;
>> + rf_repl = &rfs[nalloc + j];
>>
>> if (!j || test_bit(MD_RECOVERY_SYNC,
>> &conf->mddev->recovery)) {
>> - if (resync_alloc_pages(rp, gfp_flags))
>> - goto out_free_pages;
>> + if (resync_alloc_folio(rf, gfp_flags, &order))
>> + goto out_free_folio;
>> } else {
>> - memcpy(rp, &rps[0], sizeof(*rp));
>> - resync_get_all_pages(rp);
>> + memcpy(rf, &rfs[0], sizeof(*rf));
>> + folio_get(rf->folio);
>> }
>>
>> - rp->raid_bio = r10_bio;
>> - bio->bi_private = rp;
>> + rf->raid_bio = r10_bio;
>> + r10_bio->devs[j].bio->bi_private = rf;
>> if (rbio) {
>> - memcpy(rp_repl, rp, sizeof(*rp));
>> - rbio->bi_private = rp_repl;
>> + memcpy(rf_repl, rf, sizeof(*rf));
>> + rbio->bi_private = rf_repl;
>> }
>> }
>>
>> + r10_bio->sectors = 1 << (order + PAGE_SECTORS_SHIFT);
>> return r10_bio;
>>
>> -out_free_pages:
>> +out_free_folio:
>> while (--j >= 0)
>> - resync_free_pages(&rps[j]);
>> + folio_put(rfs[j].folio);
>>
>> j = 0;
>> out_free_bio:
>> @@ -219,7 +218,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
>> bio_uninit(r10_bio->devs[j].repl_bio);
>> kfree(r10_bio->devs[j].repl_bio);
>> }
>> - kfree(rps);
>> + kfree(rfs);
>> out_free_r10bio:
>> rbio_pool_free(r10_bio, conf);
>> return NULL;
>> @@ -230,14 +229,14 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
>> struct r10conf *conf = data;
>> struct r10bio *r10bio = __r10_bio;
>> int j;
>> - struct resync_pages *rp = NULL;
>> + struct resync_folio *rf = NULL;
>>
>> for (j = conf->copies; j--; ) {
>> struct bio *bio = r10bio->devs[j].bio;
>>
>> if (bio) {
>> - rp = get_resync_pages(bio);
>> - resync_free_pages(rp);
>> + rf = get_resync_folio(bio);
>> + folio_put(rf->folio);
>> bio_uninit(bio);
>> kfree(bio);
>> }
>> @@ -250,7 +249,7 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
>> }
>>
>> /* resync pages array stored in the 1st bio's .bi_private */
>> - kfree(rp);
>> + kfree(rf);
>>
>> rbio_pool_free(r10bio, conf);
>> }
>> @@ -2342,8 +2341,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>> struct r10conf *conf = mddev->private;
>> int i, first;
>> struct bio *tbio, *fbio;
>> - int vcnt;
>> - struct page **tpages, **fpages;
>> + struct folio *tfolio, *ffolio;
>>
>> atomic_set(&r10_bio->remaining, 1);
>>
>> @@ -2359,14 +2357,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>> fbio = r10_bio->devs[i].bio;
>> fbio->bi_iter.bi_size = r10_bio->sectors << 9;
>> fbio->bi_iter.bi_idx = 0;
>> - fpages = get_resync_pages(fbio)->pages;
>> + ffolio = get_resync_folio(fbio)->folio;
>>
>> - vcnt = (r10_bio->sectors + (PAGE_SIZE >> 9) - 1) >> (PAGE_SHIFT - 9);
>> /* now find blocks with errors */
>> for (i=0 ; i < conf->copies ; i++) {
>> - int j, d;
>> + int d;
>> struct md_rdev *rdev;
>> - struct resync_pages *rp;
>> + struct resync_folio *rf;
>>
>> tbio = r10_bio->devs[i].bio;
>>
>> @@ -2375,31 +2372,23 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>> if (i == first)
>> continue;
>>
>> - tpages = get_resync_pages(tbio)->pages;
>> + tfolio = get_resync_folio(tbio)->folio;
>> d = r10_bio->devs[i].devnum;
>> rdev = conf->mirrors[d].rdev;
>> if (!r10_bio->devs[i].bio->bi_status) {
>> /* We know that the bi_io_vec layout is the same for
>> * both 'first' and 'i', so we just compare them.
>> - * All vec entries are PAGE_SIZE;
>> */
>> - int sectors = r10_bio->sectors;
>> - for (j = 0; j < vcnt; j++) {
>> - int len = PAGE_SIZE;
>> - if (sectors < (len / 512))
>> - len = sectors * 512;
>> - if (memcmp(page_address(fpages[j]),
>> - page_address(tpages[j]),
>> - len))
>> - break;
>> - sectors -= len/512;
>> + if (memcmp(folio_address(ffolio),
>> + folio_address(tfolio),
>> + r10_bio->sectors << 9)) {
>> + atomic64_add(r10_bio->sectors,
>> + &mddev->resync_mismatches);
>> + if (test_bit(MD_RECOVERY_CHECK,
>> + &mddev->recovery))
>> + /* Don't fix anything. */
>> + continue;
>> }
>> - if (j == vcnt)
>> - continue;
>> - atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
>> - if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
>> - /* Don't fix anything. */
>> - continue;
>> } else if (test_bit(FailFast, &rdev->flags)) {
>> /* Just give up on this device */
>> md_error(rdev->mddev, rdev);
>> @@ -2410,13 +2399,13 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
>> * First we need to fixup bv_offset, bv_len and
>> * bi_vecs, as the read request might have corrupted these
>> */
>> - rp = get_resync_pages(tbio);
>> + rf = get_resync_folio(tbio);
>> bio_reset(tbio, conf->mirrors[d].rdev->bdev, REQ_OP_WRITE);
>>
>> - md_bio_reset_resync_pages(tbio, rp, fbio->bi_iter.bi_size);
>> + md_bio_reset_resync_folio(tbio, rf, fbio->bi_iter.bi_size);
>>
>> - rp->raid_bio = r10_bio;
>> - tbio->bi_private = rp;
>> + rf->raid_bio = r10_bio;
>> + tbio->bi_private = rf;
>> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>> tbio->bi_end_io = end_sync_write;
>>
>> @@ -2476,10 +2465,9 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>> struct bio *bio = r10_bio->devs[0].bio;
>> sector_t sect = 0;
>> int sectors = r10_bio->sectors;
>> - int idx = 0;
>> int dr = r10_bio->devs[0].devnum;
>> int dw = r10_bio->devs[1].devnum;
>> - struct page **pages = get_resync_pages(bio)->pages;
>> + struct folio *folio = get_resync_folio(bio)->folio;
>>
>> while (sectors) {
>> int s = sectors;
>> @@ -2492,19 +2480,21 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>>
>> rdev = conf->mirrors[dr].rdev;
>> addr = r10_bio->devs[0].addr + sect;
>> - ok = sync_page_io(rdev,
>> - addr,
>> - s << 9,
>> - pages[idx],
>> - REQ_OP_READ, false);
>> + ok = sync_folio_io(rdev,
>> + addr,
>> + s << 9,
>> + sect << 9,
>> + folio,
>> + REQ_OP_READ, false);
>> if (ok) {
>> rdev = conf->mirrors[dw].rdev;
>> addr = r10_bio->devs[1].addr + sect;
>> - ok = sync_page_io(rdev,
>> - addr,
>> - s << 9,
>> - pages[idx],
>> - REQ_OP_WRITE, false);
>> + ok = sync_folio_io(rdev,
>> + addr,
>> + s << 9,
>> + sect << 9,
>> + folio,
>> + REQ_OP_WRITE, false);
>> if (!ok) {
>> set_bit(WriteErrorSeen, &rdev->flags);
>> if (!test_and_set_bit(WantReplacement,
>> @@ -2539,7 +2529,6 @@ static void fix_recovery_read_error(struct r10bio *r10_bio)
>>
>> sectors -= s;
>> sect += s;
>> - idx++;
>> }
>> }
>>
>> @@ -3050,7 +3039,7 @@ static int init_resync(struct r10conf *conf)
>> static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
>> {
>> struct r10bio *r10bio = mempool_alloc(&conf->r10buf_pool, GFP_NOIO);
>> - struct rsync_pages *rp;
>> + struct resync_folio *rf;
>> struct bio *bio;
>> int nalloc;
>> int i;
>> @@ -3063,14 +3052,14 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
>>
>> for (i = 0; i < nalloc; i++) {
>> bio = r10bio->devs[i].bio;
>> - rp = bio->bi_private;
>> + rf = bio->bi_private;
>> bio_reset(bio, NULL, 0);
>> - bio->bi_private = rp;
>> + bio->bi_private = rf;
>> bio = r10bio->devs[i].repl_bio;
>> if (bio) {
>> - rp = bio->bi_private;
>> + rf = bio->bi_private;
>> bio_reset(bio, NULL, 0);
>> - bio->bi_private = rp;
>> + bio->bi_private = rf;
>> }
>> }
>> return r10bio;
>> @@ -3156,7 +3145,6 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>> int max_sync = RESYNC_SECTORS;
>> sector_t sync_blocks;
>> sector_t chunk_mask = conf->geo.chunk_mask;
>> - int page_idx = 0;
>>
>> /*
>> * Allow skipping a full rebuild for incremental assembly
>> @@ -3376,6 +3364,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>> continue;
>> }
>> }
>> +
>> + /*
>> + * The RESYNC_BLOCK_SIZE folio allocation might have
>> + * failed in resync_alloc_folio(). Fall back to a
>> + * smaller sync size if needed.
>> + */
>> + if (max_sync > r10_bio->sectors)
>> + max_sync = r10_bio->sectors;
>> +
>> any_working = 1;
>> bio = r10_bio->devs[0].bio;
>> bio->bi_next = biolist;
>> @@ -3527,7 +3524,15 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>> }
>> if (sync_blocks < max_sync)
>> max_sync = sync_blocks;
>> +
>> r10_bio = raid10_alloc_init_r10buf(conf);
>> + /*
>> +	 * The RESYNC_BLOCK_SIZE folio allocation might fail in resync_alloc_folio().
>> +	 * Fall back to a smaller sync size if needed.
>> + */
>> + if (max_sync > r10_bio->sectors)
>> + max_sync = r10_bio->sectors;
>> +
>> r10_bio->state = 0;
>>
>> r10_bio->mddev = mddev;
>> @@ -3620,29 +3625,25 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>> }
>> }
>>
>> - nr_sectors = 0;
>> if (sector_nr + max_sync < max_sector)
>> max_sector = sector_nr + max_sync;
>> do {
>> - struct page *page;
>> - int len = PAGE_SIZE;
>> - if (sector_nr + (len>>9) > max_sector)
>> - len = (max_sector - sector_nr) << 9;
>> - if (len == 0)
>> + nr_sectors = max_sector - sector_nr;
>> +
>> + if (nr_sectors == 0)
>> break;
>> for (bio= biolist ; bio ; bio=bio->bi_next) {
>> - struct resync_pages *rp = get_resync_pages(bio);
>> - page = resync_fetch_page(rp, page_idx);
>> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
>> + struct resync_folio *rf = get_resync_folio(bio);
>> +
>> + if (WARN_ON(!bio_add_folio(bio, rf->folio, nr_sectors << 9, 0))) {
>> bio->bi_status = BLK_STS_RESOURCE;
>> bio_endio(bio);
>> *skipped = 1;
>> - return max_sync;
>> + return nr_sectors << 9;
>> }
>> }
>> - nr_sectors += len>>9;
>> - sector_nr += len>>9;
>> - } while (++page_idx < RESYNC_PAGES);
>> + sector_nr += nr_sectors;
>> + } while (0);
>> r10_bio->sectors = nr_sectors;
>>
>> if (mddev_is_clustered(mddev) &&
>> @@ -4560,7 +4561,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>> int *skipped)
>> {
>> /* We simply copy at most one chunk (smallest of old and new)
>> - * at a time, possibly less if that exceeds RESYNC_PAGES,
>> + * at a time, possibly less if that exceeds RESYNC_BLOCK_SIZE,
>> * or we hit a bad block or something.
>> * This might mean we pause for normal IO in the middle of
>> * a chunk, but that is not a problem as mddev->reshape_position
>> @@ -4600,14 +4601,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>> struct r10bio *r10_bio;
>> sector_t next, safe, last;
>> int max_sectors;
>> - int nr_sectors;
>> int s;
>> struct md_rdev *rdev;
>> int need_flush = 0;
>> struct bio *blist;
>> struct bio *bio, *read_bio;
>> int sectors_done = 0;
>> - struct page **pages;
>> + struct folio *folio;
>>
>> if (sector_nr == 0) {
>> /* If restarting in the middle, skip the initial sectors */
>> @@ -4709,7 +4709,12 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>> r10_bio->mddev = mddev;
>> r10_bio->sector = sector_nr;
>> set_bit(R10BIO_IsReshape, &r10_bio->state);
>> - r10_bio->sectors = last - sector_nr + 1;
>> + /*
>> +	 * The RESYNC_BLOCK_SIZE folio allocation might fail
>> +	 * in resync_alloc_folio(). Fall back to a smaller
>> +	 * sync size if needed.
>> + */
>> + r10_bio->sectors = min_t(int, r10_bio->sectors, last - sector_nr + 1);
>> rdev = read_balance(conf, r10_bio, &max_sectors);
>> BUG_ON(!test_bit(R10BIO_Previous, &r10_bio->state));
>>
>> @@ -4723,7 +4728,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>> return sectors_done;
>> }
>>
>> - read_bio = bio_alloc_bioset(rdev->bdev, RESYNC_PAGES, REQ_OP_READ,
>> + read_bio = bio_alloc_bioset(rdev->bdev, 1, REQ_OP_READ,
>> GFP_KERNEL, &mddev->bio_set);
>> read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
>> + rdev->data_offset);
>> @@ -4787,32 +4792,23 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>> blist = b;
>> }
>>
>> - /* Now add as many pages as possible to all of these bios. */
>> + /* Now add folio to all of these bios. */
>>
>> - nr_sectors = 0;
>> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
>> - for (s = 0 ; s < max_sectors; s += PAGE_SIZE >> 9) {
>> - struct page *page = pages[s / (PAGE_SIZE >> 9)];
>> - int len = (max_sectors - s) << 9;
>> - if (len > PAGE_SIZE)
>> - len = PAGE_SIZE;
>> - for (bio = blist; bio ; bio = bio->bi_next) {
>> - if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
>> - bio->bi_status = BLK_STS_RESOURCE;
>> - bio_endio(bio);
>> - return sectors_done;
>> - }
>> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
>> + for (bio = blist; bio ; bio = bio->bi_next) {
>> + if (WARN_ON(!bio_add_folio(bio, folio, max_sectors, 0))) {
>> + bio->bi_status = BLK_STS_RESOURCE;
>> + bio_endio(bio);
>> + return sectors_done;
>
> In fact, the original code doesn't clean up before returning.
> bio_add_folio_nofail is used in raid1; can we use
> bio_add_folio_nofail here as well?
>
Agreed, I will clean it up before this patch.
>> }
>> - sector_nr += len >> 9;
>> - nr_sectors += len >> 9;
>> }
>> - r10_bio->sectors = nr_sectors;
>> + r10_bio->sectors = max_sectors >> 9;
>>
>> /* Now submit the read */
>> atomic_inc(&r10_bio->remaining);
>> read_bio->bi_next = NULL;
>> submit_bio_noacct(read_bio);
>> - sectors_done += nr_sectors;
>> + sectors_done += max_sectors;
>> if (sector_nr <= last)
>> goto read_more;
>>
>> @@ -4914,8 +4910,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
>> struct r10conf *conf = mddev->private;
>> struct r10bio *r10b;
>> int slot = 0;
>> - int idx = 0;
>> - struct page **pages;
>> + int sect = 0;
>> + struct folio *folio;
>>
>> r10b = kmalloc(struct_size(r10b, devs, conf->copies), GFP_NOIO);
>> if (!r10b) {
>> @@ -4923,8 +4919,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
>> return -ENOMEM;
>> }
>>
>> - /* reshape IOs share pages from .devs[0].bio */
>> - pages = get_resync_pages(r10_bio->devs[0].bio)->pages;
>> + /* reshape IOs share folio from .devs[0].bio */
>> + folio = get_resync_folio(r10_bio->devs[0].bio)->folio;
>>
>> r10b->sector = r10_bio->sector;
>> __raid10_find_phys(&conf->prev, r10b);
>> @@ -4940,19 +4936,19 @@ static int handle_reshape_read_error(struct mddev *mddev,
>> while (!success) {
>> int d = r10b->devs[slot].devnum;
>> struct md_rdev *rdev = conf->mirrors[d].rdev;
>> - sector_t addr;
>> if (rdev == NULL ||
>> test_bit(Faulty, &rdev->flags) ||
>> !test_bit(In_sync, &rdev->flags))
>> goto failed;
>>
>> - addr = r10b->devs[slot].addr + idx * PAGE_SIZE;
>> atomic_inc(&rdev->nr_pending);
>> - success = sync_page_io(rdev,
>> - addr,
>> - s << 9,
>> - pages[idx],
>> - REQ_OP_READ, false);
>> + success = sync_folio_io(rdev,
>> + r10b->devs[slot].addr +
>> + sect,
>> + s << 9,
>> + sect << 9,
>> + folio,
>> + REQ_OP_READ, false);
>> rdev_dec_pending(rdev, mddev);
>> if (success)
>> break;
>> @@ -4971,7 +4967,7 @@ static int handle_reshape_read_error(struct mddev *mddev,
>> return -EIO;
>> }
>> sectors -= s;
>> - idx++;
>> + sect += s;
>> }
>> kfree(r10b);
>> return 0;
>> --
>> 2.39.2
>>
>>
>
> Regards
> Xiao
--
Thanks
Nan
Thread overview: 13+ messages
2026-04-16 3:37 [PATCH v3 0/8] folio support for sync I/O in RAID linan666
2026-04-16 3:37 ` [PATCH v3 1/8] md/raid1,raid10: clean up of RESYNC_SECTORS linan666
2026-04-16 3:37 ` [PATCH v3 2/8] md: introduce sync_folio_io for folio support in RAID linan666
2026-04-16 3:37 ` [PATCH v3 3/8] md: introduce safe_put_folio " linan666
2026-04-16 3:37 ` [PATCH v3 4/8] md/raid1: use folio for tmppage linan666
2026-04-16 3:37 ` [PATCH v3 5/8] md/raid10: " linan666
2026-04-16 3:37 ` [PATCH v3 6/8] md/raid1,raid10: use folio for sync path IO linan666
2026-04-30 1:54 ` Xiao Ni
2026-05-07 7:13 ` 李楠 Magic Li
2026-04-16 3:38 ` [PATCH v3 7/8] md/raid1: fix IO error at logical block size granularity linan666
2026-04-30 2:22 ` Xiao Ni
2026-04-16 3:38 ` [PATCH v3 8/8] md/raid10: " linan666
2026-04-30 2:23 ` Xiao Ni