Linux RAID subsystem development
* [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5
@ 2026-06-05  9:15 Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: add exact bitmap mapping and reshape hooks Yu Kuai
                   ` (20 more replies)
  0 siblings, 21 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Hi,

This series adds llbitmap support for online reshape in RAID10 and RAID5.

llbitmap has a different set of constraints from the existing bitmap code:
there is one live bitmap instance, each bit state has richer semantics, and
reshape can change the mapping from logical array ranges to bitmap ranges.
The series therefore adds exact bitmap range mapping hooks, tracks old and
new llbitmap geometry during reshape, remaps checkpointed bits as reshape
progresses, and wires the reshape lifecycle into RAID10 and RAID5.

The main rules are:

1. split bios at the reshape position before bitmap accounting, so one bio
   is never accounted with mixed old/new geometry;
2. do not skip reshape ranges from stale llbitmap state, because reshape
   progress is checkpointed by array metadata;
3. remap llbitmap bits when reshape progress is checkpointed;
4. reject llbitmap reshape if mddev->chunk_sectors shrinks, because the
   effective data range represented by existing bitmap bits can shrink.

The first group of patches prepares generic bitmap and llbitmap
infrastructure.  The second group wires RAID10.  The last group wires
RAID5, including exact old/new stripe mapping.

Validation:
* RAID5 llbitmap test:
  - created 3-disk RAID5 with --bitmap=lockless
  - wrote 96 MiB of random data
  - reshaped to 4 disks
  - llbitmap bits changed from clean=1024 dirty=1024 to
    unwritten=448 clean=1600 dirty=0
  - all sync-related llbitmap counters were zero after reshape
  - data hash was unchanged after reshape
  - replaced one disk, waited for recovery, hash was unchanged
  - failed another old disk and verified degraded reads still matched
* RAID10 llbitmap test:
  - created 4-disk RAID10 n2 with --bitmap=lockless
  - wrote 128 MiB of random data
  - reshaped to 6 disks
  - llbitmap bits changed from clean=2048 dirty=2048 to
    unwritten=2048 clean=4096 dirty=0
  - all sync-related llbitmap counters were zero after reshape
  - data hash was unchanged after reshape
  - replaced one disk, waited for recovery, hash was unchanged
  - failed the rebuilt disk's mirror mate and verified degraded reads still
    matched

Yu Kuai (20):
  md: add exact bitmap mapping and reshape hooks
  md: skip bitmap accounting for empty write ranges
  md: add helper to split bios at reshape offset
  md/md-llbitmap: track bitmap sync_size explicitly
  md/md-llbitmap: allocate page controls independently
  md/md-llbitmap: grow the page cache in place for reshape
  md/md-llbitmap: track target reshape geometry fields
  md/md-llbitmap: finish reshape geometry
  md/md-llbitmap: refuse reshape while llbitmap still needs sync
  md/md-llbitmap: add reshape range mapping helpers
  md/md-llbitmap: don't skip reshape ranges from bitmap state
  md/md-llbitmap: remap checkpointed bits as reshape progresses
  md/md-llbitmap: clamp state-machine walks to tracked bits
  md/raid10: reject llbitmap reshape when md chunk shrinks
  md/raid10: wire llbitmap reshape lifecycle
  md/raid10: split reshape bios before bitmap accounting
  md/raid5: add exact old and new llbitmap mapping helpers
  md/raid5: reject llbitmap reshape when md chunk shrinks
  md/raid5: wire llbitmap reshape lifecycle
  md/raid5: split reshape bios before bitmap accounting

 drivers/md/md-bitmap.c   |   8 +
 drivers/md/md-bitmap.h   |   8 +
 drivers/md/md-llbitmap.c | 616 +++++++++++++++++++++++++++++++++++----
 drivers/md/md.c          |  60 +++-
 drivers/md/md.h          |   8 +
 drivers/md/raid10.c      |  50 +++-
 drivers/md/raid5.c       | 118 ++++++--
 7 files changed, 793 insertions(+), 75 deletions(-)

-- 
2.51.0


* [PATCH] md: add exact bitmap mapping and reshape hooks
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: skip bitmap accounting for empty write ranges Yu Kuai
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Add bitmap mapping and reshape hooks needed by llbitmap reshape
support without teaching md core to account a single bio against
multiple bitmap ranges.

This also adds the old/new bitmap geometry helpers used by
personalities to describe reshape mapping to llbitmap.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-bitmap.c   |  8 ++++++++
 drivers/md/md-bitmap.h   |  8 ++++++++
 drivers/md/md-llbitmap.c |  8 ++++++++
 drivers/md/md.c          | 12 ++++++++----
 drivers/md/md.h          |  4 ++++
 5 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 028b9ca8ce52..e10296788cdd 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1727,10 +1727,17 @@ static void bitmap_start_write(struct mddev *mddev, sector_t offset,
 		else
 			sectors = 0;
 	}
 }
 
+static void bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				 unsigned long *sectors)
+{
+	if (mddev->pers->bitmap_sector)
+		mddev->pers->bitmap_sector(mddev, offset, sectors);
+}
+
 static void bitmap_end_write(struct mddev *mddev, sector_t offset,
 			     unsigned long sectors)
 {
 	struct bitmap *bitmap = mddev->bitmap;
 
@@ -3075,10 +3082,11 @@ static struct bitmap_operations bitmap_ops = {
 	.load			= bitmap_load,
 	.destroy		= bitmap_destroy,
 	.flush			= bitmap_flush,
 	.write_all		= bitmap_write_all,
 	.dirty_bits		= bitmap_dirty_bits,
+	.prepare_range		= bitmap_prepare_range,
 	.unplug			= bitmap_unplug,
 	.daemon_work		= bitmap_daemon_work,
 
 	.start_behind_write	= bitmap_start_behind_write,
 	.end_behind_write	= bitmap_end_behind_write,
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 214f623c7e79..f0136fc02feb 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -91,10 +91,18 @@ struct bitmap_operations {
 	void (*destroy)(struct mddev *mddev);
 	void (*flush)(struct mddev *mddev);
 	void (*write_all)(struct mddev *mddev);
 	void (*dirty_bits)(struct mddev *mddev, unsigned long s,
 			   unsigned long e);
+	/* Prepare a range for this bitmap implementation. */
+	void (*prepare_range)(struct mddev *mddev,
+			      sector_t *offset,
+			      unsigned long *sectors);
+	void (*reshape_finish)(struct mddev *mddev);
+	int (*reshape_can_start)(struct mddev *mddev);
+	void (*reshape_mark)(struct mddev *mddev, sector_t old_pos,
+			     sector_t new_pos);
 	void (*unplug)(struct mddev *mddev, bool sync);
 	void (*daemon_work)(struct mddev *mddev);
 
 	void (*start_behind_write)(struct mddev *mddev);
 	void (*end_behind_write)(struct mddev *mddev);
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 1adc5b117821..bcf34f0c9af6 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1186,10 +1186,17 @@ static void llbitmap_destroy(struct mddev *mddev)
 	llbitmap_free_pages(llbitmap);
 	kfree(llbitmap);
 	mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
+static void llbitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				   unsigned long *sectors)
+{
+	if (mddev->pers->bitmap_sector)
+		mddev->pers->bitmap_sector(mddev, offset, sectors);
+}
+
 static void llbitmap_start_write(struct mddev *mddev, sector_t offset,
 				 unsigned long sectors)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long start = offset >> llbitmap->chunkshift;
@@ -1775,10 +1782,11 @@ static struct bitmap_operations llbitmap_ops = {
 	.cond_end_sync		= llbitmap_cond_end_sync,
 
 	.update_sb		= llbitmap_update_sb,
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
+	.prepare_range		= llbitmap_prepare_range,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
 };
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 096bb64e87bd..08eabc7e5a71 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9359,21 +9359,25 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 	mddev_trace_remap(mddev, discard_bio, bio->bi_iter.bi_sector);
 	submit_bio_noacct(discard_bio);
 }
 EXPORT_SYMBOL_GPL(md_submit_discard_bio);
 
+static void md_bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				    unsigned long *sectors)
+{
+	mddev->bitmap_ops->prepare_range(mddev, offset, sectors);
+}
+
 static void md_bitmap_start(struct mddev *mddev,
 			    struct md_io_clone *md_io_clone)
 {
 	md_bitmap_fn *fn = unlikely(md_io_clone->rw == STAT_DISCARD) ?
 			   mddev->bitmap_ops->start_discard :
 			   mddev->bitmap_ops->start_write;
 
-	if (mddev->pers->bitmap_sector)
-		mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
-					   &md_io_clone->sectors);
-
+	md_bitmap_prepare_range(mddev, &md_io_clone->offset,
+				&md_io_clone->sectors);
 	fn(mddev, md_io_clone->offset, md_io_clone->sectors);
 }
 
 static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
 {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d8daf0f75cbb..110cf0f8b107 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -796,10 +796,14 @@ struct md_personality
 	/* Changes the consistency policy of an active array. */
 	int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
 	/* convert io ranges from array to bitmap */
 	void (*bitmap_sector)(struct mddev *mddev, sector_t *offset,
 			      unsigned long *sectors);
+	void (*bitmap_sector_map)(struct mddev *mddev, sector_t *offset,
+				  unsigned long *sectors, bool previous);
+	sector_t (*bitmap_sync_size)(struct mddev *mddev, bool previous);
+	sector_t (*bitmap_array_sectors)(struct mddev *mddev, bool previous);
 };
 
 struct md_sysfs_entry {
 	struct attribute attr;
 	ssize_t (*show)(struct mddev *, char *);
-- 
2.51.0



* [PATCH] md: skip bitmap accounting for empty write ranges
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: add exact bitmap mapping and reshape hooks Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: add helper to split bios at reshape offset Yu Kuai
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

mkfs.ext4 can submit zero-sector flush/FUA bios. These bios are WRITE
bios for md_write_start() purposes, but they do not cover any data sector
and must not dirty bitmap bits.

md bitmap accounting currently passes such bios to bitmap start_write().
For llbitmap this reaches llbitmap_start_write() with sectors == 0,
which underflows the end chunk calculation.

The new bitmap prepare_range() hook can also turn a non-empty bio into an
empty bitmap range when the requested sectors are outside the active
bitmap geometry. Treat both cases as not started, so the completion path
will not call end_write() for an empty range.
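
The underflow can be reproduced in a small userspace sketch. It assumes
the end chunk is derived as (offset + sectors - 1) >> chunkshift, which
the quoted diff context does not show; end_chunk() and chunks_touched()
are hypothetical names used only for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel's sector_t */

/*
 * Assumed shape of the end chunk calculation: index of the chunk that
 * holds the last sector of the range. With sectors == 0 and offset == 0
 * the subtraction wraps around to the largest 64-bit value.
 */
static uint64_t end_chunk(sector_t offset, uint64_t sectors,
			  unsigned int chunkshift)
{
	return (offset + sectors - 1) >> chunkshift;
}

/* Number of chunks a range touches, with an empty-range guard first. */
static uint64_t chunks_touched(sector_t offset, uint64_t sectors,
			       unsigned int chunkshift)
{
	uint64_t start, end;

	if (!sectors)	/* an empty range touches no chunk at all */
		return 0;

	start = offset >> chunkshift;
	end = end_chunk(offset, sectors, chunkshift);
	assert(end >= start);
	return end - start + 1;
}
```

Checking for an empty range before computing the end chunk, as the patch
does in md_bitmap_start(), keeps the wrapped value from ever being used.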

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08eabc7e5a71..ccc4180d2c1d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9374,10 +9374,12 @@ static void md_bitmap_start(struct mddev *mddev,
 			   mddev->bitmap_ops->start_discard :
 			   mddev->bitmap_ops->start_write;
 
 	md_bitmap_prepare_range(mddev, &md_io_clone->offset,
 				&md_io_clone->sectors);
+	if (!md_io_clone->sectors)
+		return;
 	fn(mddev, md_io_clone->offset, md_io_clone->sectors);
 }
 
 static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
 {
@@ -9394,11 +9396,12 @@ static void md_end_clone_io(struct bio *bio)
 						       bio_clone);
 	struct bio *orig_bio = md_io_clone->orig_bio;
 	struct mddev *mddev = md_io_clone->mddev;
 	struct completion *reshape_completion = bio->bi_private;
 
-	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
+	if (bio_data_dir(orig_bio) == WRITE && md_io_clone->sectors &&
+	    md_bitmap_enabled(mddev, false))
 		md_bitmap_end(mddev, md_io_clone);
 
 	if (bio->bi_status && !orig_bio->bi_status)
 		orig_bio->bi_status = bio->bi_status;
 
@@ -9421,14 +9424,16 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
 		bio_alloc_clone(bdev, *bio, GFP_NOIO, &mddev->io_clone_set);
 
 	md_io_clone = container_of(clone, struct md_io_clone, bio_clone);
 	md_io_clone->orig_bio = *bio;
 	md_io_clone->mddev = mddev;
+	md_io_clone->sectors = 0;
 	if (blk_queue_io_stat(bdev->bd_disk->queue))
 		md_io_clone->start_time = bio_start_io_acct(*bio);
 
-	if (bio_data_dir(*bio) == WRITE && md_bitmap_enabled(mddev, false)) {
+	if (bio_data_dir(*bio) == WRITE && bio_sectors(*bio) &&
+	    md_bitmap_enabled(mddev, false)) {
 		md_io_clone->offset = (*bio)->bi_iter.bi_sector;
 		md_io_clone->sectors = bio_sectors(*bio);
 		md_io_clone->rw = op_stat_group(bio_op(*bio));
 		md_bitmap_start(mddev, md_io_clone);
 	}
-- 
2.51.0



* [PATCH] md: add helper to split bios at reshape offset
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: add exact bitmap mapping and reshape hooks Yu Kuai
  2026-06-05  9:15 ` [PATCH] md: skip bitmap accounting for empty write ranges Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Add mddev_bio_split_at_reshape_offset() so personalities can share
reshape-offset bio splitting instead of open-coding the same logic in
multiple places.
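
The length computation can be sketched in userspace as follows. This
is only an illustration that mirrors the arithmetic in the diff below:
split_len() is a hypothetical name, and the reshaping flag stands in for
the MD_RECOVERY_RESHAPE test on mddev->recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel's sector_t */

/*
 * Return how many sectors of [start, start + len) may be issued as one
 * piece. max_sectors == 0 means "no extra limit".
 */
static unsigned int split_len(bool reshaping, sector_t boundary,
			      sector_t start, unsigned int len,
			      unsigned int max_sectors)
{
	unsigned int split = len;
	sector_t end = start + len;

	if (max_sectors && max_sectors < split)
		split = max_sectors;

	/* Only a range that straddles the boundary needs an extra cut. */
	if (reshaping && start < boundary && end > boundary &&
	    boundary - start < split)
		split = (unsigned int)(boundary - start);

	assert(split <= len);
	return split;
}
```

For example, with the boundary at sector 1000, a 200-sector range that
starts at sector 900 is cut to 100 sectors, so the first piece ends
exactly at the reshape position and the remainder starts on it.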

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md.c | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  4 ++++
 2 files changed, 43 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ccc4180d2c1d..6685e4c53fd9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9359,10 +9359,49 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 	mddev_trace_remap(mddev, discard_bio, bio->bi_iter.bi_sector);
 	submit_bio_noacct(discard_bio);
 }
 EXPORT_SYMBOL_GPL(md_submit_discard_bio);
 
+struct bio *mddev_bio_split_at_reshape_offset(struct mddev *mddev,
+					      struct bio *bio,
+					      unsigned int *max_sectors,
+					      struct bio_set *bs)
+{
+	sector_t boundary;
+	sector_t start;
+	sector_t end;
+	unsigned int split_sectors;
+
+	split_sectors = bio_sectors(bio);
+	if (max_sectors && *max_sectors && *max_sectors < split_sectors)
+		split_sectors = *max_sectors;
+
+	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+		goto split;
+
+	boundary = mddev->reshape_position;
+	start = bio->bi_iter.bi_sector;
+	end = bio_end_sector(bio);
+	if (start >= boundary || end <= boundary)
+		goto split;
+
+	if (boundary - start < split_sectors)
+		split_sectors = boundary - start;
+
+split:
+	if (max_sectors)
+		*max_sectors = split_sectors;
+	if (split_sectors < bio_sectors(bio)) {
+		bio = bio_submit_split_bioset(bio, split_sectors, bs);
+		if (bio)
+			bio->bi_opf |= REQ_NOMERGE;
+	}
+
+	return bio;
+}
+EXPORT_SYMBOL_GPL(mddev_bio_split_at_reshape_offset);
+
 static void md_bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
 				    unsigned long *sectors)
 {
 	mddev->bitmap_ops->prepare_range(mddev, offset, sectors);
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 110cf0f8b107..ebfc6da83161 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -923,10 +923,14 @@ extern void md_done_sync(struct mddev *mddev, int blocks);
 extern void md_sync_error(struct mddev *mddev);
 extern void md_error(struct mddev *mddev, struct md_rdev *rdev);
 extern void md_finish_reshape(struct mddev *mddev);
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 			struct bio *bio, sector_t start, sector_t size);
+struct bio *mddev_bio_split_at_reshape_offset(struct mddev *mddev,
+					      struct bio *bio,
+					      unsigned int *max_sectors,
+					      struct bio_set *bs);
 void md_account_bio(struct mddev *mddev, struct bio **bio);
 
 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
 void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
 		       sector_t sector, int size, struct page *page,
-- 
2.51.0



* [PATCH] md/md-llbitmap: track bitmap sync_size explicitly
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (2 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md: add helper to split bios at reshape offset Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-15 10:48   ` Su Yue
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: allocate page controls independently Yu Kuai
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Track llbitmap's own sync_size instead of always using
mddev->resync_max_sectors directly.

This is the minimal bookkeeping needed before llbitmap can track old
and new reshape geometry independently.
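
A minimal sketch of the bookkeeping, under the assumption that it can be
reduced to two pure functions: effective_sync_size() and chunks_for()
are hypothetical names, and the fallback to the array size when the
superblock stores zero follows the llbitmap_read_sb() hunk below.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel's sector_t */

/* Use the stored sync_size, or the array's resync size if it is 0. */
static sector_t effective_sync_size(sector_t sb_sync_size,
				    sector_t resync_max_sectors)
{
	return sb_sync_size ? sb_sync_size : resync_max_sectors;
}

/* Number of bitmap chunks needed to cover sync_size sectors. */
static uint64_t chunks_for(sector_t sync_size, uint64_t chunksize)
{
	assert(chunksize != 0);
	return (sync_size + chunksize - 1) / chunksize;	/* round up */
}
```

Deriving the chunk count from the bitmap's own sync_size rather than
from mddev->resync_max_sectors is what lets the two values differ.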

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index bcf34f0c9af6..ecf3ed712315 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -285,10 +285,12 @@ struct llbitmap {
 	unsigned long chunkshift;
 	/* size of one chunk in sector */
 	unsigned long chunksize;
 	/* total number of chunks */
 	unsigned long chunks;
+	/* total number of sectors tracked by current bitmap geometry */
+	sector_t sync_size;
 	unsigned long last_end_sync;
 	/*
 	 * time in seconds that dirty bits will be cleared if the page is not
 	 * accessed.
 	 */
@@ -916,10 +918,11 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
+	llbitmap->sync_size = blocks;
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
 	ret = llbitmap_cache_pages(llbitmap);
 	if (ret)
 		return ret;
@@ -936,10 +939,11 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 {
 	struct mddev *mddev = llbitmap->mddev;
 	unsigned long daemon_sleep;
 	unsigned long chunksize;
 	unsigned long events;
+	sector_t sync_size;
 	struct page *sb_page;
 	bitmap_super_t *sb;
 	int ret = -EINVAL;
 
 	if (!mddev->bitmap_info.offset) {
@@ -985,10 +989,13 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	if (test_and_clear_bit(BITMAP_FIRST_USE, &llbitmap->flags)) {
 		ret = llbitmap_init(llbitmap);
 		goto out_put_page;
 	}
 
+	sync_size = le64_to_cpu(sb->sync_size);
+	if (!sync_size)
+		sync_size = mddev->resync_max_sectors;
 	chunksize = le32_to_cpu(sb->chunksize);
 	if (!is_power_of_2(chunksize)) {
 		pr_err("md/llbitmap: %s: chunksize not a power of 2",
 		       mdname(mddev));
 		goto out_put_page;
@@ -1020,12 +1027,13 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	mddev->bitmap_info.chunksize = chunksize;
 	mddev->bitmap_info.daemon_sleep = daemon_sleep;
 
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunksize = chunksize;
-	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors, chunksize);
+	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
+	llbitmap->sync_size = sync_size;
 	ret = llbitmap_cache_pages(llbitmap);
 
 out_put_page:
 	__free_page(sb_page);
 	kunmap_local(sb);
@@ -1151,10 +1159,11 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 	}
 
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
+	llbitmap->sync_size = blocks;
 
 	return 0;
 }
 
 static int llbitmap_load(struct mddev *mddev)
@@ -1524,11 +1533,11 @@ static void llbitmap_update_sb(void *data)
 
 	sb = kmap_local_page(sb_page);
 	sb->events = cpu_to_le64(mddev->events);
 	sb->state = cpu_to_le32(llbitmap->flags);
 	sb->chunksize = cpu_to_le32(llbitmap->chunksize);
-	sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
+	sb->sync_size = cpu_to_le64(llbitmap->sync_size);
 	sb->events_cleared = cpu_to_le64(llbitmap->events_cleared);
 	sb->sectors_reserved = cpu_to_le32(mddev->bitmap_info.space);
 	sb->daemon_sleep = cpu_to_le32(mddev->bitmap_info.daemon_sleep);
 
 	kunmap_local(sb);
@@ -1542,10 +1551,11 @@ static int llbitmap_get_stats(void *data, struct md_bitmap_stats *stats)
 	memset(stats, 0, sizeof(*stats));
 
 	stats->missing_pages = 0;
 	stats->pages = llbitmap->nr_pages;
 	stats->file_pages = llbitmap->nr_pages;
+	stats->sync_size = llbitmap->sync_size;
 
 	stats->behind_writes = atomic_read(&llbitmap->behind_writes);
 	stats->behind_wait = wq_has_sleeper(&llbitmap->behind_wait);
 	stats->events_cleared = llbitmap->events_cleared;
 
-- 
2.51.0



* [PATCH] md/md-llbitmap: allocate page controls independently
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (3 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-15 11:06   ` Su Yue
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Allocate llbitmap page-control objects one at a time, and free each
object individually in the same way.

Let llbitmap_read_page() return a zeroed page without reading disk when
the page index is beyond the current bitmap size, so page-control
allocation no longer needs a separate read_existing flag.

This keeps the llbitmap page-control lifetime self-consistent and
prepares the page-cache code for later in-place growth.
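
The "beyond the current bitmap size" test can be sketched as below. The
page size of 4096 bytes and the data offset of 1024 bytes are
assumptions made for the example, not values taken from this patch, and
used_pages() and page_needs_disk_read() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumed page size */
#define SKETCH_DATA_OFFSET	1024UL	/* assumed BITMAP_DATA_OFFSET */

/* Pages needed to hold the data offset plus one byte per chunk. */
static unsigned long used_pages(unsigned long chunks)
{
	/* Guard against wraparound in this sketch. */
	assert(chunks <= ULONG_MAX - SKETCH_DATA_OFFSET - SKETCH_PAGE_SIZE);
	return (chunks + SKETCH_DATA_OFFSET + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}

/* Pages past the used range hold no on-disk state and start zeroed. */
static bool page_needs_disk_read(unsigned long idx, unsigned long chunks)
{
	return idx < used_pages(chunks);
}
```

Under these assumptions a bitmap with 2048 chunks uses one page, so any
page index from 1 upwards would be handed back zeroed without disk I/O.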

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 99 +++++++++++++++++++++++++---------------
 1 file changed, 62 insertions(+), 37 deletions(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index ecf3ed712315..2f2896fe4d6f 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -510,24 +510,32 @@ static void llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
 		llbitmap_set_page_dirty(llbitmap, idx, bit, true);
 	else if (state == BitNeedSyncUnwritten)
 		llbitmap_set_page_dirty(llbitmap, idx, bit, false);
 }
 
+static unsigned int llbitmap_used_pages(struct llbitmap *llbitmap,
+					unsigned long chunks)
+{
+	return DIV_ROUND_UP(chunks + BITMAP_DATA_OFFSET, PAGE_SIZE);
+}
+
 static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
 {
 	struct mddev *mddev = llbitmap->mddev;
 	struct page *page = NULL;
 	struct md_rdev *rdev;
 
-	if (llbitmap->pctl && llbitmap->pctl[idx])
+	if (llbitmap->pctl && idx < llbitmap->nr_pages && llbitmap->pctl[idx])
 		page = llbitmap->pctl[idx]->page;
 	if (page)
 		return page;
 
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page)
 		return ERR_PTR(-ENOMEM);
+	if (idx >= llbitmap_used_pages(llbitmap, llbitmap->chunks))
+		return page;
 
 	rdev_for_each(rdev, mddev) {
 		sector_t sector;
 
 		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags) ||
@@ -594,65 +602,82 @@ static void llbitmap_free_pages(struct llbitmap *llbitmap)
 		return;
 
 	for (i = 0; i < llbitmap->nr_pages; i++) {
 		struct llbitmap_page_ctl *pctl = llbitmap->pctl[i];
 
-		if (!pctl || !pctl->page)
-			break;
-
-		__free_page(pctl->page);
+		if (!pctl)
+			continue;
+		if (pctl->page)
+			__free_page(pctl->page);
 		percpu_ref_exit(&pctl->active);
+		kfree(pctl);
 	}
 
-	kfree(llbitmap->pctl[0]);
 	kfree(llbitmap->pctl);
 	llbitmap->pctl = NULL;
 }
 
-static int llbitmap_cache_pages(struct llbitmap *llbitmap)
+static struct llbitmap_page_ctl *
+llbitmap_alloc_page_ctl(struct llbitmap *llbitmap, int idx)
 {
 	struct llbitmap_page_ctl *pctl;
-	unsigned int nr_pages = DIV_ROUND_UP(llbitmap->chunks +
-					     BITMAP_DATA_OFFSET, PAGE_SIZE);
+	struct page *page;
 	unsigned int size = struct_size(pctl, dirty, BITS_TO_LONGS(
 						llbitmap->blocks_per_page));
-	int i;
-
-	llbitmap->pctl = kmalloc_array(nr_pages, sizeof(void *),
-				       GFP_KERNEL | __GFP_ZERO);
-	if (!llbitmap->pctl)
-		return -ENOMEM;
 
 	size = round_up(size, cache_line_size());
-	pctl = kmalloc_array(nr_pages, size, GFP_KERNEL | __GFP_ZERO);
-	if (!pctl) {
-		kfree(llbitmap->pctl);
-		return -ENOMEM;
+	pctl = kzalloc(size, GFP_KERNEL);
+	if (!pctl)
+		return ERR_PTR(-ENOMEM);
+
+	page = llbitmap_read_page(llbitmap, idx);
+
+	if (IS_ERR(page)) {
+		kfree(pctl);
+		return ERR_CAST(page);
 	}
 
-	llbitmap->nr_pages = nr_pages;
+	if (percpu_ref_init(&pctl->active, active_release,
+			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
+		__free_page(page);
+		kfree(pctl);
+		return ERR_PTR(-ENOMEM);
+	}
 
-	for (i = 0; i < nr_pages; i++, pctl = (void *)pctl + size) {
-		struct page *page = llbitmap_read_page(llbitmap, i);
+	pctl->page = page;
+	pctl->state = page_address(page);
+	init_waitqueue_head(&pctl->wait);
+	return pctl;
+}
 
-		llbitmap->pctl[i] = pctl;
+static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
+{
+	return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
+			    PAGE_SIZE);
+}
 
-		if (IS_ERR(page)) {
-			llbitmap_free_pages(llbitmap);
-			return PTR_ERR(page);
-		}
+static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
+{
+	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
+	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
+	int i;
+
+	llbitmap->pctl = kcalloc(nr_pages, sizeof(*llbitmap->pctl), GFP_KERNEL);
+	if (!llbitmap->pctl)
+		return -ENOMEM;
 
-		if (percpu_ref_init(&pctl->active, active_release,
-				    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
-			__free_page(page);
+	llbitmap->nr_pages = nr_pages;
+
+	for (i = 0; i < nr_pages; i++) {
+		llbitmap->pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
+		if (IS_ERR(llbitmap->pctl[i])) {
+			int ret = PTR_ERR(llbitmap->pctl[i]);
+
+			llbitmap->pctl[i] = NULL;
 			llbitmap_free_pages(llbitmap);
-			return -ENOMEM;
+			return ret;
 		}
-
-		pctl->page = page;
-		pctl->state = page_address(page);
-		init_waitqueue_head(&pctl->wait);
 	}
 
 	return 0;
 }
 
@@ -921,11 +946,11 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
 	llbitmap->sync_size = blocks;
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
-	ret = llbitmap_cache_pages(llbitmap);
+	ret = llbitmap_alloc_pages(llbitmap);
 	if (ret)
 		return ret;
 
 	llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
 			       BitmapActionInit);
@@ -1030,11 +1055,11 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->sync_size = sync_size;
-	ret = llbitmap_cache_pages(llbitmap);
+	ret = llbitmap_alloc_pages(llbitmap);
 
 out_put_page:
 	__free_page(sb_page);
 	kunmap_local(sb);
 	return ret;
-- 
2.51.0



* [PATCH] md/md-llbitmap: grow the page cache in place for reshape
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (4 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: allocate page controls independently Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-15 11:16   ` Su Yue
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track target reshape geometry fields Yu Kuai
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the page-control helpers to grow llbitmap's cached pages in place
for resize and later reshape preparation, instead of rebuilding the
whole cache.
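
The in-place growth pattern can be sketched in userspace as follows.
This is an illustration of the general technique, not the kernel code:
struct page_ctl, struct cache, alloc_ctl() and cache_expand() are
hypothetical stand-ins, and calloc()/free() replace the kernel
allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct page_ctl {		/* hypothetical stand-in for a page control */
	unsigned int idx;
};

struct cache {
	struct page_ctl **pctl;
	unsigned int nr_pages;
};

static struct page_ctl *alloc_ctl(unsigned int idx)
{
	struct page_ctl *p = calloc(1, sizeof(*p));

	if (p)
		p->idx = idx;
	return p;
}

/* Grow to nr_pages entries; on failure the old cache is left untouched. */
static int cache_expand(struct cache *c, unsigned int nr_pages)
{
	struct page_ctl **pctl;
	unsigned int i;

	assert(c != NULL);
	if (nr_pages <= c->nr_pages)
		return 0;

	pctl = calloc(nr_pages, sizeof(*pctl));
	if (!pctl)
		return -1;

	/* Existing entries are carried over, not rebuilt. */
	if (c->pctl)
		memcpy(pctl, c->pctl, c->nr_pages * sizeof(*pctl));

	for (i = c->nr_pages; i < nr_pages; i++) {
		pctl[i] = alloc_ctl(i);
		if (!pctl[i])
			goto err;
	}

	free(c->pctl);
	c->pctl = pctl;
	c->nr_pages = nr_pages;
	return 0;

err:
	/* Free only the entries created by this call. */
	while (i-- > c->nr_pages)
		free(pctl[i]);
	free(pctl);
	return -1;
}
```

Only the pointer array is replaced; the objects it already pointed to
stay where they are, so references held elsewhere remain valid.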

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 139 +++++++++++++++++++++++++++++++++++----
 1 file changed, 127 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 2f2896fe4d6f..91d3dec43d48 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -414,10 +414,23 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionClearUnwritten]	= BitUnwritten,
 	},
 };
 
 static void __llbitmap_flush(struct mddev *mddev);
+static void llbitmap_flush(struct mddev *mddev);
+static void llbitmap_update_sb(void *data);
+
+static void llbitmap_resize_chunks(struct mddev *mddev, sector_t blocks,
+				   unsigned long *chunksize,
+				   unsigned long *chunks)
+{
+	*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
+	while (*chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
+		*chunksize = *chunksize << 1;
+		*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
+	}
+}
 
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
 	unsigned int idx;
 	unsigned int offset;
@@ -653,10 +666,52 @@ static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
 {
 	return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
 			    PAGE_SIZE);
 }
 
+static int llbitmap_expand_pages(struct llbitmap *llbitmap,
+				 unsigned long chunks)
+{
+	struct llbitmap_page_ctl **pctl;
+	unsigned int old_nr_pages = llbitmap->nr_pages;
+	unsigned int nr_pages = llbitmap_used_pages(llbitmap, chunks);
+	int i;
+	int ret;
+
+	if (nr_pages <= old_nr_pages)
+		return 0;
+
+	pctl = kcalloc(nr_pages, sizeof(*pctl), GFP_KERNEL);
+	if (!pctl)
+		return -ENOMEM;
+
+	if (llbitmap->pctl)
+		memcpy(pctl, llbitmap->pctl,
+		       array_size(old_nr_pages, sizeof(*pctl)));
+
+	for (i = old_nr_pages; i < nr_pages; i++) {
+		pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
+		if (IS_ERR(pctl[i]))
+			goto err_alloc_ptr;
+	}
+
+	kfree(llbitmap->pctl);
+	llbitmap->pctl = pctl;
+	llbitmap->nr_pages = nr_pages;
+	return 0;
+
+err_alloc_ptr:
+	ret = PTR_ERR(pctl[i]);
+	for (i--; i >= (int)old_nr_pages; i--) {
+		__free_page(pctl[i]->page);
+		percpu_ref_exit(&pctl[i]->active);
+		kfree(pctl[i]);
+	}
+	kfree(pctl);
+	return ret;
+}
+
 static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
 {
 	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
 	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
 	int i;
@@ -728,10 +783,38 @@ static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap)
 	}
 
 	return true;
 }
 
+static void llbitmap_mark_range(struct llbitmap *llbitmap,
+				unsigned long start,
+				unsigned long end,
+				enum llbitmap_state state)
+{
+	while (start <= end) {
+		llbitmap_write(llbitmap, state, start);
+		start++;
+	}
+}
+
+static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
+				   unsigned long old_chunks,
+				   unsigned long new_chunks,
+				   unsigned long cache_chunks)
+{
+	int ret;
+
+	llbitmap_flush(llbitmap->mddev);
+	ret = llbitmap_expand_pages(llbitmap, cache_chunks);
+	if (ret)
+		return ret;
+	if (new_chunks > old_chunks)
+		llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
+				    BitUnwritten);
+	return 0;
+}
+
 static void llbitmap_init_state(struct llbitmap *llbitmap)
 {
 	struct mddev *mddev = llbitmap->mddev;
 	enum llbitmap_state state = BitUnwritten;
 	unsigned long i;
@@ -1024,14 +1107,14 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 		pr_err("md/llbitmap: %s: chunksize not a power of 2",
 		       mdname(mddev));
 		goto out_put_page;
 	}
 
-	if (chunksize < DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors,
+	if (chunksize < DIV_ROUND_UP_SECTOR_T(sync_size,
 					      mddev->bitmap_info.space << SECTOR_SHIFT)) {
 		pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu",
-		       mdname(mddev), chunksize, mddev->resync_max_sectors,
+		       mdname(mddev), chunksize, sync_size,
 		       mddev->bitmap_info.space);
 		goto out_put_page;
 	}
 
 	daemon_sleep = le32_to_cpu(sb->daemon_sleep);
@@ -1169,28 +1252,60 @@ static int llbitmap_create(struct mddev *mddev)
 }
 
 static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
+	sector_t old_blocks = llbitmap->sync_size;
+	unsigned long old_chunks = llbitmap->chunks;
 	unsigned long chunks;
+	unsigned long cache_chunks;
+	int ret = 0;
+	unsigned long bitmap_chunksize;
+	bool reshape;
 
 	if (chunksize == 0)
 		chunksize = llbitmap->chunksize;
 
-	/* If there is enough space, leave the chunksize unchanged. */
-	chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
-	while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
-		chunksize = chunksize << 1;
-		chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
-	}
+	bitmap_chunksize = chunksize;
+	llbitmap_resize_chunks(mddev, blocks, &bitmap_chunksize, &chunks);
 
-	llbitmap->chunkshift = ffz(~chunksize);
-	llbitmap->chunksize = chunksize;
-	llbitmap->chunks = chunks;
-	llbitmap->sync_size = blocks;
+	reshape = mddev->delta_disks || mddev->new_level != mddev->level ||
+		mddev->new_layout != mddev->layout ||
+		mddev->new_chunk_sectors != mddev->chunk_sectors;
+	if (!reshape && bitmap_chunksize != llbitmap->chunksize)
+		return -EOPNOTSUPP;
+	if (blocks == old_blocks && chunks == llbitmap->chunks)
+		return 0;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
 
+	cache_chunks = reshape ? max(old_chunks, chunks) : chunks;
+	ret = llbitmap_prepare_resize(llbitmap, old_chunks, chunks, cache_chunks);
+	if (ret)
+		goto out;
+
+	if (reshape) {
+		llbitmap->reshape_sync_size = blocks;
+		llbitmap->reshape_chunksize = bitmap_chunksize;
+		llbitmap->reshape_chunks = chunks;
+		llbitmap->chunks = max(old_chunks, chunks);
+	} else {
+		if (blocks < old_blocks && chunks < old_chunks)
+			llbitmap_mark_range(llbitmap, chunks, old_chunks - 1,
+					    BitUnwritten);
+		mddev->bitmap_info.chunksize = bitmap_chunksize;
+		llbitmap->chunks = chunks;
+		llbitmap->sync_size = blocks;
+		llbitmap_update_sb(llbitmap);
+	}
+	__llbitmap_flush(mddev);
+	mutex_unlock(&mddev->bitmap_info.mutex);
 	return 0;
+
+out:
+	mutex_unlock(&mddev->bitmap_info.mutex);
+	return ret;
 }
 
 static int llbitmap_load(struct mddev *mddev)
 {
 	enum llbitmap_action action = BitmapActionReload;
-- 
2.51.0


* [PATCH] md/md-llbitmap: track target reshape geometry fields
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (5 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: finish reshape geometry Yu Kuai
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Track llbitmap bookkeeping for the target reshape geometry while keeping
a single live bitmap instance.

Add the reshape geometry fields and a refresh helper, and update the load
and resize paths to keep the target geometry in sync.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
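
Reviewer note (not part of the commit): a minimal userspace sketch of how
the staged geometry appears to be derived, assuming the doubling loop in
llbitmap_resize_chunks() and the max() in llbitmap_refresh_reshape(). The
names geom_model, model_resize_chunks and model_refresh_reshape are
hypothetical, and space_bytes stands in for
mddev->bitmap_info.space << SECTOR_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: mirrors a few struct llbitmap fields. */
struct geom_model {
	uint64_t sync_size;         /* sectors tracked by the current geometry */
	uint64_t chunksize;         /* sectors per bit in the current geometry */
	uint64_t chunks;            /* bits that must stay addressable */
	uint64_t reshape_sync_size; /* staged target geometry */
	uint64_t reshape_chunksize;
	uint64_t reshape_chunks;
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Double the chunksize until the number of bits fits in space_bytes.
 * The comparison in the patch suggests one byte of state per bit.
 */
static void model_resize_chunks(uint64_t blocks, uint64_t space_bytes,
				uint64_t *chunksize, uint64_t *chunks)
{
	assert(*chunksize && space_bytes);
	*chunks = div_round_up(blocks, *chunksize);
	while (*chunks > space_bytes) {
		*chunksize <<= 1;
		*chunks = div_round_up(blocks, *chunksize);
	}
}

/* Stage the target geometry and keep enough bits for old and new. */
static void model_refresh_reshape(struct geom_model *g,
				  uint64_t target_blocks,
				  uint64_t space_bytes)
{
	uint64_t old_chunks = div_round_up(g->sync_size, g->chunksize);

	g->reshape_sync_size = target_blocks;
	g->reshape_chunksize = g->chunksize;
	model_resize_chunks(target_blocks, space_bytes,
			    &g->reshape_chunksize, &g->reshape_chunks);
	g->chunks = old_chunks > g->reshape_chunks ?
		    old_chunks : g->reshape_chunks;
}
```

With sync_size = 1024, chunksize = 8, a 4096-sector target and 256 bytes of
space, the model stages chunksize 16 and 256 bits, and keeps 256 live bits.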

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 91d3dec43d48..f79538a4eb79 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -287,10 +287,13 @@ struct llbitmap {
 	unsigned long chunksize;
 	/* total number of chunks */
 	unsigned long chunks;
 	/* total number of sectors tracked by current bitmap geometry */
 	sector_t sync_size;
+	unsigned long reshape_chunksize;
+	unsigned long reshape_chunks;
+	sector_t reshape_sync_size;
 	unsigned long last_end_sync;
 	/*
 	 * time in seconds that dirty bits will be cleared if the page is not
 	 * accessed.
 	 */
@@ -428,10 +431,42 @@ static void llbitmap_resize_chunks(struct mddev *mddev, sector_t blocks,
 		*chunksize = *chunksize << 1;
 		*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
 	}
 }
 
+static bool llbitmap_reshaping(struct llbitmap *llbitmap)
+{
+	return llbitmap->mddev->reshape_position != MaxSector;
+}
+
+static sector_t llbitmap_personality_sync_size(struct llbitmap *llbitmap,
+					       bool previous)
+{
+	struct mddev *mddev = llbitmap->mddev;
+
+	if (!llbitmap_reshaping(llbitmap) || !mddev->private || !mddev->pers ||
+	    !mddev->pers->bitmap_sync_size)
+		return llbitmap->sync_size;
+	return mddev->pers->bitmap_sync_size(mddev, previous);
+}
+
+static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
+{
+	unsigned long old_chunks = DIV_ROUND_UP_SECTOR_T(llbitmap->sync_size,
+						 llbitmap->chunksize);
+	sector_t blocks = llbitmap_personality_sync_size(llbitmap, false);
+	unsigned long chunksize = llbitmap->chunksize;
+	unsigned long chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
+
+	llbitmap->reshape_sync_size = blocks;
+	llbitmap->reshape_chunksize = chunksize;
+	llbitmap->reshape_chunks = chunks;
+	llbitmap_resize_chunks(llbitmap->mddev, blocks, &llbitmap->reshape_chunksize,
+			       &llbitmap->reshape_chunks);
+	llbitmap->chunks = max(old_chunks, llbitmap->reshape_chunks);
+}
+
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
 	unsigned int idx;
 	unsigned int offset;
 
@@ -1027,10 +1062,11 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
 	llbitmap->sync_size = blocks;
+	llbitmap_refresh_reshape(llbitmap);
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
 	ret = llbitmap_alloc_pages(llbitmap);
 	if (ret)
 		return ret;
@@ -1138,10 +1174,11 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->sync_size = sync_size;
+	llbitmap_refresh_reshape(llbitmap);
 	ret = llbitmap_alloc_pages(llbitmap);
 
 out_put_page:
 	__free_page(sb_page);
 	kunmap_local(sb);
@@ -1293,10 +1330,11 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 			llbitmap_mark_range(llbitmap, chunks, old_chunks - 1,
 					    BitUnwritten);
 		mddev->bitmap_info.chunksize = bitmap_chunksize;
 		llbitmap->chunks = chunks;
 		llbitmap->sync_size = blocks;
+		llbitmap_refresh_reshape(llbitmap);
 		llbitmap_update_sb(llbitmap);
 	}
 	__llbitmap_flush(mddev);
 	mutex_unlock(&mddev->bitmap_info.mutex);
 	return 0;
-- 
2.51.0


* [PATCH] md/md-llbitmap: finish reshape geometry
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (6 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track target reshape geometry fields Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Commit the staged llbitmap geometry when reshape finishes.

The reshape staging itself is handled through llbitmap_resize(), so only
the finish step remains in this patch.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
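
Reviewer note (not part of the commit): a minimal sketch of the commit step,
assuming the staged fields simply replace the live ones. It leaves out the
quiesce, flush and superblock update that the patch performs around the
assignment. finish_model, model_chunkshift and model_reshape_finish are
hypothetical names; the shift helper is a portable stand-in for
ffz(~chunksize) on a power-of-two chunksize.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: mirrors a few struct llbitmap fields. */
struct finish_model {
	uint64_t chunksize;
	unsigned int chunkshift;
	uint64_t chunks;
	uint64_t sync_size;
	uint64_t reshape_chunksize;
	uint64_t reshape_chunks;
	uint64_t reshape_sync_size;
};

/* log2 of a power-of-two chunksize (index of its lowest set bit). */
static unsigned int model_chunkshift(uint64_t chunksize)
{
	unsigned int shift = 0;

	assert(chunksize && (chunksize & (chunksize - 1)) == 0);
	while (!(chunksize & 1)) {
		chunksize >>= 1;
		shift++;
	}
	return shift;
}

/* Promote the staged geometry to the live geometry. */
static void model_reshape_finish(struct finish_model *g)
{
	g->chunksize = g->reshape_chunksize;
	g->chunkshift = model_chunkshift(g->chunksize);
	g->chunks = g->reshape_chunks;
	g->sync_size = g->reshape_sync_size;
}
```

After the commit, chunks drops from max(old, new) back to the target count,
so any bits beyond the target geometry are no longer addressed.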

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index f79538a4eb79..52ab45f2a018 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1677,10 +1677,34 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
 				unsigned long e)
 {
 	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
 }
 
+static void llbitmap_reshape_finish(struct mddev *mddev)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+
+	if (mddev->pers->quiesce)
+		mddev->pers->quiesce(mddev, 1);
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap_flush(mddev);
+
+	llbitmap->chunksize = llbitmap->reshape_chunksize;
+	llbitmap->chunkshift = ffz(~llbitmap->chunksize);
+	llbitmap->chunks = llbitmap->reshape_chunks;
+	llbitmap->sync_size = llbitmap->reshape_sync_size;
+	llbitmap_refresh_reshape(llbitmap);
+	mddev->bitmap_info.chunksize = llbitmap->chunksize;
+	llbitmap_update_sb(llbitmap);
+	__llbitmap_flush(mddev);
+	mutex_unlock(&mddev->bitmap_info.mutex);
+
+	if (mddev->pers->quiesce)
+		mddev->pers->quiesce(mddev, 0);
+}
+
 static void llbitmap_write_sb(struct llbitmap *llbitmap)
 {
 	int nr_blocks = DIV_ROUND_UP(BITMAP_DATA_OFFSET, llbitmap->io_size);
 
 	bitmap_fill(llbitmap->pctl[0]->dirty, nr_blocks);
@@ -1971,10 +1995,11 @@ static struct bitmap_operations llbitmap_ops = {
 
 	.update_sb		= llbitmap_update_sb,
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
 	.prepare_range		= llbitmap_prepare_range,
+	.reshape_finish		= llbitmap_reshape_finish,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
 };
 
-- 
2.51.0


* [PATCH] md/md-llbitmap: refuse reshape while llbitmap still needs sync
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (7 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: finish reshape geometry Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Reject reshape when llbitmap still contains NeedSync or Syncing bits.

This keeps reshape from starting until the current llbitmap state has
been reconciled.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
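
Reviewer note (not part of the commit): a minimal sketch of the admission
check, assuming a linear scan over all live bits. The enum model_state and
its values are hypothetical and do not match the kernel's numeric encoding
of enum llbitmap_state; locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative subset of bit states; values are arbitrary. */
enum model_state {
	MODEL_UNWRITTEN,
	MODEL_CLEAN,
	MODEL_DIRTY,
	MODEL_NEED_SYNC,
	MODEL_SYNCING,
};

/* Return 0 if reshape may start, -EBUSY if any bit still needs sync. */
static int model_reshape_can_start(const enum model_state *bits,
				   size_t chunks)
{
	size_t i;

	assert(bits || !chunks);
	for (i = 0; i < chunks; i++) {
		if (bits[i] == MODEL_NEED_SYNC || bits[i] == MODEL_SYNCING)
			return -EBUSY;
	}
	return 0;
}
```

Dirty bits do not block the reshape in this model; only pending or running
sync does.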

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 52ab45f2a018..f45daf3be4d5 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1677,10 +1677,33 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
 				unsigned long e)
 {
 	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
 }
 
+static int llbitmap_reshape_can_start(struct mddev *mddev)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+	unsigned long chunk;
+	int ret = 0;
+
+	if (!llbitmap)
+		return 0;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	for (chunk = 0; chunk < llbitmap->chunks; chunk++) {
+		enum llbitmap_state state = llbitmap_read(llbitmap, chunk);
+
+		if (state == BitNeedSync || state == BitSyncing) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+	mutex_unlock(&mddev->bitmap_info.mutex);
+
+	return ret;
+}
+
 static void llbitmap_reshape_finish(struct mddev *mddev)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 
 	if (mddev->pers->quiesce)
@@ -1996,10 +2019,11 @@ static struct bitmap_operations llbitmap_ops = {
 	.update_sb		= llbitmap_update_sb,
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
 	.prepare_range		= llbitmap_prepare_range,
 	.reshape_finish		= llbitmap_reshape_finish,
+	.reshape_can_start	= llbitmap_reshape_can_start,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
 };
 
-- 
2.51.0


* [PATCH] md/md-llbitmap: add reshape range mapping helpers
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (8 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Teach llbitmap to choose old versus new geometry during reshape and to
encode exact bitmap ranges for the active geometry.

This is the mapping groundwork for checkpoint remapping.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 96 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 4 deletions(-)
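
Reviewer note (not part of the commit): a minimal sketch of the two
decisions this patch introduces, assuming the logic of
llbitmap_map_previous() and llbitmap_encode_range(). All names are
hypothetical and MODEL_MAX_SECTOR stands in for MaxSector. The division
uses the chunksize of the selected geometry, while the shift uses the live
chunkshift, presumably so that a later ">> chunkshift" recovers the bit
indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_SECTOR UINT64_MAX /* stands in for MaxSector */

/* True if @offset still belongs to the old (pre-reshape) geometry. */
static bool model_map_previous(uint64_t reshape_position, bool backwards,
			       uint64_t offset)
{
	if (reshape_position == MODEL_MAX_SECTOR)
		return false; /* no reshape in progress */
	return backwards ? offset < reshape_position :
			   offset >= reshape_position;
}

/*
 * Widen [offset, offset + sectors) to whole bits of the selected
 * geometry, then express the result in units of the live chunkshift.
 */
static void model_encode_range(uint64_t chunksize, unsigned int chunkshift,
			       uint64_t *offset, uint64_t *sectors)
{
	uint64_t start;
	uint64_t end;

	assert(chunksize);
	if (!*sectors) {
		*offset = 0;
		return;
	}
	start = *offset / chunksize;
	end = (*offset + *sectors - 1) / chunksize;
	*offset = start << chunkshift;
	*sectors = (end - start + 1) << chunkshift;
}
```

For a forward reshape at position 1000, sector 999 uses the new geometry and
sector 1000 the old one. The split-at-reshape-position rule from the cover
letter keeps a single range from straddling that boundary.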

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index f45daf3be4d5..76621a9fd645 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -7,10 +7,11 @@
 #include <linux/init.h>
 #include <linux/timer.h>
 #include <linux/sched.h>
 #include <linux/list.h>
 #include <linux/file.h>
+#include <linux/math64.h>
 #include <linux/seq_file.h>
 #include <trace/events/block.h>
 
 #include "md.h"
 #include "md-bitmap.h"
@@ -447,10 +448,20 @@ static sector_t llbitmap_personality_sync_size(struct llbitmap *llbitmap,
 	    !mddev->pers->bitmap_sync_size)
 		return llbitmap->sync_size;
 	return mddev->pers->bitmap_sync_size(mddev, previous);
 }
 
+static sector_t llbitmap_logical_size(struct llbitmap *llbitmap, bool previous)
+{
+	struct mddev *mddev = llbitmap->mddev;
+
+	if (!llbitmap_reshaping(llbitmap) || !mddev->private || !mddev->pers ||
+	    !mddev->pers->bitmap_array_sectors)
+		return llbitmap_personality_sync_size(llbitmap, previous);
+	return mddev->pers->bitmap_array_sectors(mddev, previous);
+}
+
 static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
 {
 	unsigned long old_chunks = DIV_ROUND_UP_SECTOR_T(llbitmap->sync_size,
 						 llbitmap->chunksize);
 	sector_t blocks = llbitmap_personality_sync_size(llbitmap, false);
@@ -463,10 +474,56 @@ static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
 	llbitmap_resize_chunks(llbitmap->mddev, blocks, &llbitmap->reshape_chunksize,
 			       &llbitmap->reshape_chunks);
 	llbitmap->chunks = max(old_chunks, llbitmap->reshape_chunks);
 }
 
+static void llbitmap_map_layout(struct llbitmap *llbitmap, sector_t *offset,
+				unsigned long *sectors, bool previous)
+{
+	sector_t limit = llbitmap_logical_size(llbitmap, previous);
+	sector_t start = *offset;
+	sector_t end = start + *sectors;
+
+	if (start >= limit) {
+		*sectors = 0;
+		return;
+	}
+	if (end > limit)
+		end = limit;
+
+	*offset = start;
+	*sectors = end - start;
+	if (!*sectors)
+		return;
+
+	if (llbitmap->mddev->pers->bitmap_sector_map)
+		llbitmap->mddev->pers->bitmap_sector_map(llbitmap->mddev, offset,
+							 sectors, previous);
+	else if (!previous && llbitmap->mddev->pers->bitmap_sector)
+		llbitmap->mddev->pers->bitmap_sector(llbitmap->mddev, offset,
+							 sectors);
+}
+
+static void llbitmap_encode_range(struct llbitmap *llbitmap, sector_t *offset,
+				  unsigned long *sectors, bool previous)
+{
+	unsigned long chunksize = previous ? llbitmap->chunksize :
+				      llbitmap->reshape_chunksize;
+	u64 start;
+	u64 end;
+
+	if (!*sectors) {
+		*offset = 0;
+		return;
+	}
+
+	start = div64_u64(*offset, chunksize);
+	end = div64_u64(*offset + *sectors - 1, chunksize);
+	*offset = (sector_t)start << llbitmap->chunkshift;
+	*sectors = (end - start + 1) << llbitmap->chunkshift;
+}
+
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
 	unsigned int idx;
 	unsigned int offset;
 
@@ -1373,15 +1430,36 @@ static void llbitmap_destroy(struct mddev *mddev)
 	llbitmap_free_pages(llbitmap);
 	kfree(llbitmap);
 	mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
+static bool llbitmap_map_previous(struct llbitmap *llbitmap, sector_t offset,
+				  unsigned long sectors)
+{
+	struct mddev *mddev = llbitmap->mddev;
+	sector_t boundary = mddev->reshape_position;
+
+	if (!llbitmap_reshaping(llbitmap))
+		return false;
+
+	WARN_ON_ONCE(sectors && offset < boundary && offset + sectors > boundary);
+
+	return mddev->reshape_backwards ? offset < boundary : offset >= boundary;
+}
+
 static void llbitmap_prepare_range(struct mddev *mddev, sector_t *offset,
 				   unsigned long *sectors)
 {
-	if (mddev->pers->bitmap_sector)
-		mddev->pers->bitmap_sector(mddev, offset, sectors);
+	struct llbitmap *llbitmap = mddev->bitmap;
+	bool previous;
+
+	if (!llbitmap)
+		return;
+
+	previous = llbitmap_map_previous(llbitmap, *offset, *sectors);
+	llbitmap_map_layout(llbitmap, offset, sectors, previous);
+	llbitmap_encode_range(llbitmap, offset, sectors, previous);
 }
 
 static void llbitmap_start_write(struct mddev *mddev, sector_t offset,
 				 unsigned long sectors)
 {
@@ -1546,21 +1624,29 @@ static void llbitmap_flush(struct mddev *mddev)
 /* This is used for raid5 lazy initial recovery */
 static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
-	enum llbitmap_state c = llbitmap_read(llbitmap, p);
+	enum llbitmap_state c;
+
+	if (p >= llbitmap->chunks)
+		return false;
+	c = llbitmap_read(llbitmap, p);
 
 	return c == BitClean || c == BitDirty || c == BitCleanUnwritten;
 }
 
 static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
 	int blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
-	enum llbitmap_state c = llbitmap_read(llbitmap, p);
+	enum llbitmap_state c;
+
+	if (p >= llbitmap->chunks)
+		return 0;
+	c = llbitmap_read(llbitmap, p);
 
 	/* always skip unwritten blocks */
 	if (c == BitUnwritten)
 		return blocks;
 
@@ -1601,10 +1687,12 @@ static bool llbitmap_start_sync(struct mddev *mddev, sector_t offset,
 	/*
 	 * Handle one bit at a time, this is much simpler. And it doesn't matter
 	 * if md_do_sync() loop more times.
 	 */
 	*blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
+	if (p >= llbitmap->chunks)
+		return false;
 	state = llbitmap_state_machine(llbitmap, p, p, BitmapActionStartsync);
 	return state == BitSyncing || state == BitSyncingUnwritten;
 }
 
 /* Something is wrong, sync_thread stop at @offset */
-- 
2.51.0


* [PATCH] md/md-llbitmap: don't skip reshape ranges from bitmap state
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (9 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Reshape progress is tracked by array metadata rather than llbitmap.
Do not let llbitmap skip_sync_blocks() suppress reshape ranges based on
stale bitmap state before the corresponding checkpoint is persisted.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 8 ++++++++
 1 file changed, 8 insertions(+)
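
Reviewer note (not part of the commit): a minimal sketch of the ordering
this patch establishes, assuming the reshape test runs before any
state-based skip. The names are hypothetical. Only the unwritten case
visible in the hunk is modelled; every other state is collapsed to "do not
skip", which is a simplification and not the full kernel logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of bit states; values are arbitrary. */
enum skip_state {
	SKIP_MODEL_UNWRITTEN,
	SKIP_MODEL_OTHER,
};

/* Number of sectors the sync thread may skip starting at @offset. */
static uint64_t model_skip_sync_blocks(bool reshape_running,
				       enum skip_state state,
				       uint64_t offset, uint64_t chunksize)
{
	uint64_t blocks;

	assert(chunksize && (chunksize & (chunksize - 1)) == 0);
	blocks = chunksize - (offset & (chunksize - 1));

	/* Reshape progress comes from array metadata: never skip. */
	if (reshape_running)
		return 0;
	if (state == SKIP_MODEL_UNWRITTEN)
		return blocks;
	return 0;
}
```

The same unwritten bit is skipped during an ordinary sync but not while a
reshape is running.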

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 76621a9fd645..67e63d81a824 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1644,10 +1644,18 @@ static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
 
 	if (p >= llbitmap->chunks)
 		return 0;
 	c = llbitmap_read(llbitmap, p);
 
+	/*
+	 * Reshape progress is tracked by array metadata rather than llbitmap.
+	 * Skipping reshape ranges from stale bitmap state can lose data after a
+	 * restart before the corresponding bits are checkpointed to disk.
+	 */
+	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+		return 0;
+
 	/* always skip unwritten blocks */
 	if (c == BitUnwritten)
 		return blocks;
 
 	/* Skip CleanUnwritten - no user data, will be reset after recovery */
-- 
2.51.0


* [PATCH] md/md-llbitmap: remap checkpointed bits as reshape progresses
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (10 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Merge checkpointed old llbitmap state forward as reshape_position
advances and record the checkpoint remap through reshape_mark().

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 172 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)
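
Reviewer note (not part of the commit): a minimal sketch of the two core
pieces of the remap, assuming the precedence in llbitmap_rmerge_state() and
the proportional scaling in llbitmap_reshape_src_range(). All names are
hypothetical, the enum covers only five states with arbitrary values, and
plain 64-bit arithmetic is used on the assumption that the products do not
overflow (the patch uses mul_u64_u64_div_u64() instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of bit states; values are arbitrary. */
enum merge_state {
	MERGE_UNWRITTEN,
	MERGE_CLEAN,
	MERGE_DIRTY,
	MERGE_NEED_SYNC,
	MERGE_SYNCING,
};

/* Precedence: needs-sync > dirty > clean > unwritten. */
static enum merge_state model_merge_state(enum merge_state dst,
					  enum merge_state src)
{
	if (dst == MERGE_NEED_SYNC || dst == MERGE_SYNCING ||
	    src == MERGE_NEED_SYNC || src == MERGE_SYNCING)
		return MERGE_NEED_SYNC;
	if (dst == MERGE_DIRTY || src == MERGE_DIRTY)
		return MERGE_DIRTY;
	if (dst == MERGE_CLEAN || src == MERGE_CLEAN)
		return MERGE_CLEAN;
	return MERGE_UNWRITTEN;
}

/*
 * Scale the destination span [dst_start, dst_end) of the new range onto
 * the old range: round the start down and the end up, then clamp.
 */
static bool model_src_range(uint64_t old_off, uint64_t old_sectors,
			    uint64_t new_off, uint64_t new_sectors,
			    uint64_t dst_start, uint64_t dst_end,
			    uint64_t *src_start, uint64_t *src_end)
{
	uint64_t old_end = old_off + old_sectors;

	if (!old_sectors || !new_sectors)
		return false;
	assert(dst_start >= new_off && dst_end >= dst_start);

	*src_start = old_off +
		(dst_start - new_off) * old_sectors / new_sectors;
	*src_end = old_off +
		((dst_end - new_off) * old_sectors + new_sectors - 1) /
		new_sectors;
	if (*src_end > old_end)
		*src_end = old_end;
	return *src_end > *src_start;
}
```

Rounding outward means a destination bit may pull in one extra source
sector on each side, which errs toward the higher-precedence state.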

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 67e63d81a824..6783f1b3ddf0 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -499,10 +499,18 @@ static void llbitmap_map_layout(struct llbitmap *llbitmap, sector_t *offset,
 		llbitmap->mddev->pers->bitmap_sector_map(llbitmap->mddev, offset,
 							 sectors, previous);
 	else if (!previous && llbitmap->mddev->pers->bitmap_sector)
 		llbitmap->mddev->pers->bitmap_sector(llbitmap->mddev, offset,
 							 sectors);
+
+	limit = llbitmap_personality_sync_size(llbitmap, previous);
+	start = *offset;
+	end = start + *sectors;
+	if (start >= limit)
+		*sectors = 0;
+	else if (end > limit)
+		*sectors = limit - start;
 }
 
 static void llbitmap_encode_range(struct llbitmap *llbitmap, sector_t *offset,
 				  unsigned long *sectors, bool previous)
 {
@@ -903,10 +911,37 @@ static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
 		llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
 				    BitUnwritten);
 	return 0;
 }
 
+static enum llbitmap_state
+llbitmap_rmerge_state(struct llbitmap *llbitmap,
+		      enum llbitmap_state dst,
+		      enum llbitmap_state src)
+{
+	bool level_456 = raid_is_456(llbitmap->mddev);
+
+	if (dst == BitNeedSync || dst == BitSyncing ||
+	    src == BitNeedSync || src == BitSyncing)
+		return BitNeedSync;
+
+	if (dst == BitDirty || src == BitDirty)
+		return BitDirty;
+
+	/*
+	 * Reshape generates valid target parity/data for both already-written
+	 * and not-yet-written regions in the checkpointed range, so a mix of
+	 * clean and unwritten still results in a clean destination bit.
+	 */
+	if (level_456 && ((dst == BitClean && src == BitUnwritten) ||
+			  (src == BitClean && dst == BitUnwritten)))
+		return BitClean;
+	if (dst == BitClean || src == BitClean)
+		return BitClean;
+	return BitUnwritten;
+}
+
 static void llbitmap_init_state(struct llbitmap *llbitmap)
 {
 	struct mddev *mddev = llbitmap->mddev;
 	enum llbitmap_state state = BitUnwritten;
 	unsigned long i;
@@ -1796,10 +1831,124 @@ static int llbitmap_reshape_can_start(struct mddev *mddev)
 	mutex_unlock(&mddev->bitmap_info.mutex);
 
 	return ret;
 }
 
+struct llbitmap_reshape_range {
+	sector_t offset;
+	unsigned long sectors;
+	sector_t start;
+	sector_t end;
+};
+
+static enum llbitmap_state
+llbitmap_reshape_init_dst(struct llbitmap *llbitmap, unsigned long dst,
+			  const struct llbitmap_reshape_range *new)
+{
+	u64 bit_start = (u64)dst * llbitmap->reshape_chunksize;
+	u64 bit_end = bit_start + llbitmap->reshape_chunksize;
+
+	if (!llbitmap->mddev->reshape_backwards)
+		return bit_start < new->offset ? llbitmap_read(llbitmap, dst) :
+		       BitUnwritten;
+	return bit_end > new->end ? llbitmap_read(llbitmap, dst) : BitUnwritten;
+}
+
+static void llbitmap_reshape_dst_range(struct llbitmap *llbitmap,
+				       unsigned long dst,
+				       const struct llbitmap_reshape_range *new,
+				       struct llbitmap_reshape_range *dst_range)
+{
+	sector_t dst_bit_start = (sector_t)dst * llbitmap->reshape_chunksize;
+
+	dst_range->start = max(dst_bit_start, new->offset);
+	dst_range->end = min(dst_bit_start + llbitmap->reshape_chunksize,
+			     new->end);
+	dst_range->offset = dst_range->start;
+	dst_range->sectors = dst_range->end - dst_range->start;
+}
+
+static void llbitmap_reshape_map_range(struct llbitmap *llbitmap,
+				       sector_t lo, sector_t hi,
+				       bool previous,
+				       struct llbitmap_reshape_range *range)
+{
+	range->offset = lo;
+	range->sectors = hi - lo;
+	llbitmap_map_layout(llbitmap, &range->offset, &range->sectors, previous);
+	range->start = range->offset;
+	range->end = range->offset + range->sectors;
+}
+
+static bool llbitmap_reshape_src_range(const struct llbitmap_reshape_range *old,
+				       const struct llbitmap_reshape_range *new,
+				       const struct llbitmap_reshape_range *dst,
+				       struct llbitmap_reshape_range *src)
+{
+	if (!old->sectors)
+		return false;
+
+	src->start = old->offset +
+		mul_u64_u64_div_u64(dst->start - new->offset,
+				    old->sectors, new->sectors);
+	src->end = old->offset +
+		mul_u64_u64_div_u64_roundup(dst->end - new->offset,
+					    old->sectors, new->sectors);
+	if (src->end > old->end)
+		src->end = old->end;
+	src->offset = src->start;
+	src->sectors = src->end - src->start;
+
+	return src->sectors;
+}
+
+static enum llbitmap_state llbitmap_rmerge_src(struct llbitmap *llbitmap,
+					       enum llbitmap_state state,
+					       const struct llbitmap_reshape_range *src)
+{
+	unsigned long bit = div64_u64(src->start, llbitmap->chunksize);
+	unsigned long end = div64_u64(src->end - 1, llbitmap->chunksize);
+
+	while (bit <= end) {
+		enum llbitmap_state src_state = llbitmap_read(llbitmap, bit);
+
+		state = llbitmap_rmerge_state(llbitmap, state, src_state);
+		bit++;
+	}
+
+	return state;
+}
+
+static void llbitmap_reshape_merge(struct llbitmap *llbitmap,
+				   const struct llbitmap_reshape_range *old,
+				   const struct llbitmap_reshape_range *new)
+{
+	unsigned long dst_start;
+	unsigned long dst_end;
+	unsigned long dst;
+
+	if (!new->sectors)
+		return;
+
+	dst_start = div64_u64(new->offset, llbitmap->reshape_chunksize);
+	dst_end = div64_u64(new->end - 1, llbitmap->reshape_chunksize);
+
+	for (dst = dst_start; dst <= dst_end; dst++) {
+		struct llbitmap_reshape_range dst_range;
+		struct llbitmap_reshape_range src;
+		enum llbitmap_state state;
+
+		llbitmap_reshape_dst_range(llbitmap, dst, new, &dst_range);
+		state = llbitmap_reshape_init_dst(llbitmap, dst, new);
+		if (llbitmap_reshape_src_range(old, new, &dst_range, &src))
+			state = llbitmap_rmerge_src(llbitmap, state, &src);
+		else
+			state = llbitmap_rmerge_state(llbitmap, state, BitUnwritten);
+		llbitmap_write(llbitmap, state, dst);
+	}
+}
+
 static void llbitmap_reshape_finish(struct mddev *mddev)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 
 	if (mddev->pers->quiesce)
@@ -1820,10 +1969,32 @@ static void llbitmap_reshape_finish(struct mddev *mddev)
 
 	if (mddev->pers->quiesce)
 		mddev->pers->quiesce(mddev, 0);
 }
 
+static void llbitmap_reshape_mark(struct mddev *mddev, sector_t old_pos,
+				  sector_t new_pos)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+	sector_t lo;
+	sector_t hi;
+	struct llbitmap_reshape_range old;
+	struct llbitmap_reshape_range new;
+
+	if (!llbitmap || old_pos == new_pos)
+		return;
+
+	lo = min(old_pos, new_pos);
+	hi = max(old_pos, new_pos);
+	if (!hi)
+		return;
+
+	llbitmap_reshape_map_range(llbitmap, lo, hi, true, &old);
+	llbitmap_reshape_map_range(llbitmap, lo, hi, false, &new);
+	llbitmap_reshape_merge(llbitmap, &old, &new);
+}
+
 static void llbitmap_write_sb(struct llbitmap *llbitmap)
 {
 	int nr_blocks = DIV_ROUND_UP(BITMAP_DATA_OFFSET, llbitmap->io_size);
 
 	bitmap_fill(llbitmap->pctl[0]->dirty, nr_blocks);
@@ -2116,10 +2287,11 @@ static struct bitmap_operations llbitmap_ops = {
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
 	.prepare_range		= llbitmap_prepare_range,
 	.reshape_finish		= llbitmap_reshape_finish,
 	.reshape_can_start	= llbitmap_reshape_can_start,
+	.reshape_mark		= llbitmap_reshape_mark,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
 };
 
-- 
2.51.0


* [PATCH] md/md-llbitmap: clamp state-machine walks to tracked bits
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (11 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap_state_machine() can be called with an end bit at or beyond
llbitmap->chunks, i.e. past the last tracked bit. In particular,
llbitmap_cond_end_sync() passes sector >> chunkshift, and sector can
reach the tracked boundary exactly.

Clamp the state-machine range to llbitmap->chunks so it cannot walk
past the tracked bitmap.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
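
Reviewer note (not part of the commit): a minimal sketch of the clamp,
assuming valid bit indices are 0 .. chunks - 1. model_clamp_walk is a
hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp an inclusive walk [start, *end] to the tracked bits.
 * Returns false when nothing is left to walk.
 */
static bool model_clamp_walk(uint64_t chunks, uint64_t start, uint64_t *end)
{
	assert(end);
	if (start >= chunks)
		return false;
	if (*end >= chunks)
		*end = chunks - 1;
	return true;
}
```

Because the start check comes first, chunks - 1 is never evaluated when
chunks is zero.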

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 6783f1b3ddf0..65d2fd1979e5 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -980,11 +980,14 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
 
 	if (action == BitmapActionInit) {
 		llbitmap_init_state(llbitmap);
 		return BitNone;
 	}
-
+	if (start >= llbitmap->chunks)
+		return BitNone;
+	if (end >= llbitmap->chunks)
+		end = llbitmap->chunks - 1;
 	while (start <= end) {
 		enum llbitmap_state c = llbitmap_read(llbitmap, start);
 
 		if (c < 0 || c >= BitStateCount) {
 			pr_err("%s: invalid bit %lu state %d action %d, forcing resync\n",
-- 
2.51.0
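
Editorial sketch: the clamp added in the hunk above can be modeled in
isolation. This is a minimal userspace illustration, not the kernel
code; the function name clamp_walk_range() and its bool return are
assumptions made for the example. It only shows why an inclusive end
index equal to the number of tracked bits has to be pulled back to
chunks - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not the kernel function.
 * Tracked bits are 0 .. chunks - 1 and [start, *end] is inclusive.
 * Returns false when no tracked bit is left to walk.
 */
static bool clamp_walk_range(unsigned long chunks, unsigned long start,
			     unsigned long *end)
{
	assert(end != NULL);

	if (start >= chunks)		/* also covers chunks == 0 */
		return false;
	if (*end >= chunks)
		*end = chunks - 1;	/* last tracked bit */
	return start <= *end;
}
```

With 1024 tracked bits, a walk requested as [1000, 1024] is trimmed to
[1000, 1023], and a walk starting at bit 1024 is skipped entirely.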



* [PATCH] md/raid10: reject llbitmap reshape when md chunk shrinks
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (12 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap reshape keeps one live bitmap and cannot safely make an
existing bitmap bit cover a smaller data range.

The llbitmap chunksize itself will not shrink when mddev->chunk_sectors
stays the same or grows. However, shrinking mddev->chunk_sectors can
shrink the effective data range covered by each bit for the RAID10
reshape geometry. Reject that reshape while llbitmap is active.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5bd7698e0a1b..c69ef76c89e1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4269,10 +4269,14 @@ static int raid10_check_reshape(struct mddev *mddev)
 	struct r10conf *conf = mddev->private;
 	struct geom geo;
 
 	if (conf->geo.far_copies != 1 && !conf->geo.far_offset)
 		return -EINVAL;
+	if (mddev->bitmap_id == ID_LLBITMAP &&
+	    mddev->new_chunk_sectors &&
+	    mddev->new_chunk_sectors < mddev->chunk_sectors)
+		return -EOPNOTSUPP;
 
 	if (setup_geo(&geo, mddev, geo_start) != conf->copies)
 		/* mustn't change number of copies */
 		return -EINVAL;
 	if (geo.far_copies > 1 && !geo.far_offset)
-- 
2.51.0
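
Editorial sketch: the guard added to raid10_check_reshape() above
reduces to a small predicate. The version below is a userspace
illustration only; check_chunk_change() and its parameters are
hypothetical names, and treating new_chunk_sectors == 0 as "no chunk
change requested" is an assumption read off the guard in the hunk, not
something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the reshape guard. Returns 0 when the reshape
 * may proceed and -EOPNOTSUPP when llbitmap is active and the md chunk
 * size would shrink.
 */
static int check_chunk_change(bool llbitmap_active,
			      unsigned int chunk_sectors,
			      unsigned int new_chunk_sectors)
{
	assert(chunk_sectors > 0);

	if (llbitmap_active &&
	    new_chunk_sectors &&	/* assumed: 0 means "unchanged" */
	    new_chunk_sectors < chunk_sectors)
		return -EOPNOTSUPP;
	return 0;
}
```

Growing or keeping the chunk size passes, as does any change when
llbitmap is not the active bitmap; only the shrinking case is refused.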



* [PATCH] md/raid10: wire llbitmap reshape lifecycle
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (13 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid10: split reshape bios before bitmap accounting Yu Kuai
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Prepare llbitmap before RAID10 starts growing, checkpoint the bitmap
before advancing reshape_position, finish the llbitmap geometry update
when reshape completes, and export the old and new tracked sizes.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c69ef76c89e1..40000b867136 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4379,10 +4379,16 @@ static int raid10_start_reshape(struct mddev *mddev)
 	int spares = 0;
 	int ret;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_ops->reshape_can_start) {
+		ret = mddev->bitmap_ops->reshape_can_start(mddev);
+		if (ret)
+			return ret;
+	}
 
 	if (setup_geo(&new, mddev, geo_start) != conf->copies)
 		return -EINVAL;
 
 	before_length = ((1 << conf->prev.chunk_shift) *
@@ -4692,10 +4698,17 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 
 	if (need_flush ||
 	    time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
 		/* Need to update reshape_position in metadata */
 		wait_barrier(conf, false);
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		if (mddev->reshape_backwards)
 			mddev->curr_resync_completed = raid10_size(mddev, 0, 0)
 				- conf->reshape_progress;
 		else
@@ -4890,13 +4903,23 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	end_reshape_request(r10_bio);
 }
 
 static void end_reshape(struct r10conf *conf)
 {
+	struct mddev *mddev = conf->mddev;
+
 	if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery))
 		return;
 
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_ops->reshape_mark &&
+	    conf->reshape_safe != conf->reshape_progress) {
+		mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+					       conf->reshape_progress);
+		mddev->bitmap_ops->unplug(mddev, true);
+	}
+
 	spin_lock_irq(&conf->device_lock);
 	conf->prev = conf->geo;
 	md_finish_reshape(conf->mddev);
 	smp_wmb();
 	conf->reshape_progress = MaxSector;
@@ -5024,14 +5047,19 @@ static void end_reshape_request(struct r10bio *r10_bio)
 }
 
 static void raid10_finish_reshape(struct mddev *mddev)
 {
 	struct r10conf *conf = mddev->private;
+	bool llbitmap = mddev->bitmap_id == ID_LLBITMAP &&
+		md_bitmap_enabled(mddev, false);
 
 	if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
 		return;
 
+	if (llbitmap && mddev->bitmap_ops->reshape_finish)
+		mddev->bitmap_ops->reshape_finish(mddev);
+
 	if (mddev->delta_disks > 0) {
 		if (mddev->resync_offset > mddev->resync_max_sectors) {
 			mddev->resync_offset = mddev->resync_max_sectors;
 			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		}
@@ -5054,10 +5082,19 @@ static void raid10_finish_reshape(struct mddev *mddev)
 	mddev->reshape_position = MaxSector;
 	mddev->delta_disks = 0;
 	mddev->reshape_backwards = 0;
 }
 
+static sector_t raid10_bitmap_sync_size(struct mddev *mddev, bool previous)
+{
+	struct r10conf *conf = mddev->private;
+
+	if (previous)
+		return raid10_size(mddev, 0, 0);
+	return raid10_size(mddev, 0, conf->geo.raid_disks);
+}
+
 static struct md_personality raid10_personality =
 {
 	.head = {
 		.type	= MD_PERSONALITY,
 		.id	= ID_RAID10,
@@ -5080,10 +5117,12 @@ static struct md_personality raid10_personality =
 	.takeover	= raid10_takeover,
 	.check_reshape	= raid10_check_reshape,
 	.start_reshape	= raid10_start_reshape,
 	.finish_reshape	= raid10_finish_reshape,
 	.update_reshape_pos = raid10_update_reshape_pos,
+	.bitmap_sync_size = raid10_bitmap_sync_size,
+	.bitmap_array_sectors = raid10_bitmap_sync_size,
 };
 
 static int __init raid10_init(void)
 {
 	return register_md_submodule(&raid10_personality.head);
-- 
2.51.0
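
Editorial sketch: the ordering this patch establishes (mark the bitmap
for the range between reshape_safe and reshape_progress, then advance
reshape_position) can be modeled with a few counters. Everything below
is a hypothetical userspace model: struct model, model_reshape_mark()
and model_checkpoint() are invented for illustration, and the final
update of reshape_safe assumes the metadata write completes
immediately, which the real code has to wait for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model state; field names mirror the diff for readability. */
struct model {
	uint64_t reshape_safe;		/* last checkpointed position */
	uint64_t reshape_progress;	/* how far reshape has actually run */
	uint64_t reshape_position;	/* value recorded in array metadata */
	uint64_t marked_from, marked_to;
	uint64_t position_at_mark;	/* reshape_position seen by the mark */
	int marks;
};

/* Stand-in for the bitmap's reshape_mark hook: just records the call. */
static void model_reshape_mark(struct model *m, uint64_t from, uint64_t to)
{
	m->marked_from = from;
	m->marked_to = to;
	m->position_at_mark = m->reshape_position;
	m->marks++;
}

static void model_checkpoint(struct model *m)
{
	assert(m != NULL);

	/* Remap bitmap state first, and only if reshape moved. */
	if (m->reshape_safe != m->reshape_progress)
		model_reshape_mark(m, m->reshape_safe, m->reshape_progress);

	/* Only then publish the new position. */
	m->reshape_position = m->reshape_progress;
	m->reshape_safe = m->reshape_position;	/* assumed durable at once */
}
```

Starting from reshape_safe = 0 and reshape_progress = 128, one call
marks [0, 128) while reshape_position still reads 0, and a second call
with no further progress does not mark again.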



* [PATCH] md/raid10: split reshape bios before bitmap accounting
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (14 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the shared mddev_bio_split_at_reshape_offset() helper so that every
bio RAID10 accounts in llbitmap lies entirely on one side of the reshape
position.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 40000b867136..cd35d183877c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1898,10 +1898,16 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 			 || conf->prev.near_copies <
 			 conf->prev.raid_disks)))
 		sectors = chunk_sects -
 			(bio->bi_iter.bi_sector &
 			 (chunk_sects - 1));
+
+	bio = mddev_bio_split_at_reshape_offset(mddev, bio, &sectors,
+						&conf->bio_split);
+	if (!bio)
+		return true;
+
 	__make_request(mddev, bio, sectors);
 
 	/* In case raid10d snuck in to freeze_array */
 	wake_up_barrier(conf);
 	return true;
@@ -4273,11 +4279,10 @@ static int raid10_check_reshape(struct mddev *mddev)
 		return -EINVAL;
 	if (mddev->bitmap_id == ID_LLBITMAP &&
 	    mddev->new_chunk_sectors &&
 	    mddev->new_chunk_sectors < mddev->chunk_sectors)
 		return -EOPNOTSUPP;
-
 	if (setup_geo(&geo, mddev, geo_start) != conf->copies)
 		/* mustn't change number of copies */
 		return -EINVAL;
 	if (geo.far_copies > 1 && !geo.far_offset)
 		/* Cannot switch to 'far' mode */
-- 
2.51.0
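
Editorial sketch: the splitting rule from the cover letter (a bio must
not straddle the reshape position when it is accounted) comes down to
one length computation. mddev_bio_split_at_reshape_offset() itself is
defined earlier in the series and is not reproduced here; the function
below, sectors_before_boundary(), is a hypothetical stand-in that only
computes how long the first piece may be.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: for a request covering [start, start + len),
 * return the number of sectors that can be submitted as one piece
 * without crossing 'boundary'. If the request does not straddle the
 * boundary, the whole length is returned.
 */
static uint64_t sectors_before_boundary(uint64_t start, uint64_t len,
					uint64_t boundary)
{
	assert(len > 0);

	/* Written as a subtraction so start + len cannot overflow. */
	if (boundary > start && boundary - start < len)
		return boundary - start;
	return len;
}
```

A 50-sector request at sector 100 with the boundary at 120 yields a
first piece of 20 sectors; the remaining 30 sectors would be handled as
a second request that lies wholly on the other side.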



* [PATCH] md/raid5: add exact old and new llbitmap mapping helpers
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (15 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid10: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Teach RAID5 to export exact old and new llbitmap mappings and the
corresponding sync and array sizes for reshape-aware bitmap users.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 70 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 180ff0660b6a..ff4da82f78fe 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5933,56 +5933,67 @@ static enum reshape_loc get_reshape_loc(struct mddev *mddev,
 	if (ahead_of_reshape(mddev, logical_sector, reshape_safe))
 		return LOC_INSIDE_RESHAPE;
 	return LOC_BEHIND_RESHAPE;
 }
 
-static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
-				unsigned long *sectors)
+static void raid5_bitmap_sector_map(struct mddev *mddev, sector_t *offset,
+				    unsigned long *sectors,
+				    bool previous)
 {
 	struct r5conf *conf = mddev->private;
 	sector_t start = *offset;
 	sector_t end = start + *sectors;
-	sector_t prev_start = start;
-	sector_t prev_end = end;
 	int sectors_per_chunk;
-	enum reshape_loc loc;
 	int dd_idx;
 
-	sectors_per_chunk = conf->chunk_sectors *
-		(conf->raid_disks - conf->max_degraded);
+	if (previous)
+		sectors_per_chunk = conf->prev_chunk_sectors *
+			(conf->previous_raid_disks - conf->max_degraded);
+	else
+		sectors_per_chunk = conf->chunk_sectors *
+			(conf->raid_disks - conf->max_degraded);
 	start = round_down(start, sectors_per_chunk);
 	end = round_up(end, sectors_per_chunk);
 
-	start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL);
-	end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL);
+	start = raid5_compute_sector(conf, start, previous, &dd_idx, NULL);
+	end = raid5_compute_sector(conf, end, previous, &dd_idx, NULL);
+	*offset = start;
+	*sectors = end - start;
+}
+
+static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
+				unsigned long *sectors)
+{
+	struct r5conf *conf = mddev->private;
+	sector_t start = *offset;
+	sector_t end = start + *sectors;
+	sector_t prev_start = start;
+	unsigned long prev_sectors = end - start;
+	enum reshape_loc loc;
+
+	raid5_bitmap_sector_map(mddev, &start, sectors, false);
+	end = start + *sectors;
 
 	/*
 	 * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make
 	 * progress, hence it's the same as LOC_BEHIND_RESHAPE.
 	 */
 	loc = get_reshape_loc(mddev, conf, prev_start);
 	if (likely(loc != LOC_AHEAD_OF_RESHAPE)) {
 		*offset = start;
-		*sectors = end - start;
 		return;
 	}
 
-	sectors_per_chunk = conf->prev_chunk_sectors *
-		(conf->previous_raid_disks - conf->max_degraded);
-	prev_start = round_down(prev_start, sectors_per_chunk);
-	prev_end = round_down(prev_end, sectors_per_chunk);
-
-	prev_start = raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL);
-	prev_end = raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL);
+	raid5_bitmap_sector_map(mddev, &prev_start, &prev_sectors, true);
 
 	/*
 	 * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO
 	 * is handled in make_stripe_request(), we can't know this here hence
 	 * we set bits for both.
 	 */
 	*offset = min(start, prev_start);
-	*sectors = max(end, prev_end) - *offset;
+	*sectors = max(end, prev_start + prev_sectors) - *offset;
 }
 
 static enum stripe_result make_stripe_request(struct mddev *mddev,
 		struct r5conf *conf, struct stripe_request_ctx *ctx,
 		sector_t logical_sector, struct bio *bi)
@@ -9040,10 +9051,24 @@ static void raid5_prepare_suspend(struct mddev *mddev)
 	struct r5conf *conf = mddev->private;
 
 	wake_up(&conf->wait_for_reshape);
 }
 
+static sector_t raid5_bitmap_sync_size(struct mddev *mddev, bool previous)
+{
+	return mddev->dev_sectors;
+}
+
+static sector_t raid5_bitmap_array_sectors(struct mddev *mddev, bool previous)
+{
+	struct r5conf *conf = mddev->private;
+
+	if (previous)
+		return raid5_size(mddev, 0, 0);
+	return raid5_size(mddev, mddev->dev_sectors, conf->raid_disks);
+}
+
 static struct md_personality raid6_personality =
 {
 	.head = {
 		.type	= MD_PERSONALITY,
 		.id	= ID_RAID6,
@@ -9069,10 +9094,13 @@ static struct md_personality raid6_personality =
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid6_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 static struct md_personality raid5_personality =
 {
 	.head = {
 		.type	= MD_PERSONALITY,
@@ -9099,10 +9127,13 @@ static struct md_personality raid5_personality =
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid5_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 
 static struct md_personality raid4_personality =
 {
 	.head = {
@@ -9130,10 +9161,13 @@ static struct md_personality raid4_personality =
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid4_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 
 static int __init raid5_init(void)
 {
 	int ret;
-- 
2.51.0
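
Editorial sketch: the old/new mapping in raid5_bitmap_sector_map() can
be illustrated with plain arithmetic. The model below is a userspace
simplification, not the kernel code: struct geom and map_range() are
hypothetical names, rounding is done by integer division, and it
assumes that a stripe-aligned logical sector maps to the stripe index
times the chunk size on each member device. The 1024-sector chunk in
the note that follows is likewise an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry description for one side of the reshape. */
struct geom {
	unsigned int chunk_sectors;
	unsigned int raid_disks;
	unsigned int max_degraded;	/* 1 for RAID5, 2 for RAID6 */
};

/*
 * Expand a logical range to whole stripes, then express it as a
 * per-device range (offset and length in sectors).
 */
static void map_range(const struct geom *g, uint64_t *offset,
		      uint64_t *sectors)
{
	uint64_t width, first, last;

	assert(g != NULL && offset != NULL && sectors != NULL);
	assert(g->chunk_sectors > 0 && g->raid_disks > g->max_degraded);

	/* Data sectors held by one full stripe. */
	width = (uint64_t)g->chunk_sectors *
		(g->raid_disks - g->max_degraded);

	first = *offset / width;			/* round down */
	last = (*offset + *sectors + width - 1) / width;	/* round up */

	*offset = first * g->chunk_sectors;
	*sectors = (last - first) * g->chunk_sectors;
}
```

For a 3-disk to 4-disk RAID5 reshape with 1024-sector chunks, a
100-sector write at logical sector 5000 maps to device range
[2048, 3072) in the old geometry and [1024, 2048) in the new one, so
the union covering both is [1024, 3072).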



* [PATCH] md/raid5: reject llbitmap reshape when md chunk shrinks
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (16 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap reshape keeps one live bitmap and cannot safely make an
existing bitmap bit cover a smaller data range.

The llbitmap chunksize itself will not shrink when mddev->chunk_sectors
stays the same or grows. However, shrinking mddev->chunk_sectors shrinks
sectors_per_chunk used by raid5_bitmap_sector_map(). That can shrink the
effective data range covered by each bit across the old and new RAID5
geometry. Reject that reshape while llbitmap is active.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ff4da82f78fe..55af1d1eec63 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8497,10 +8497,13 @@ static int check_reshape(struct mddev *mddev)
 	}
 
 	if (!check_stripe_cache(mddev))
 		return -ENOSPC;
 
+	if (mddev->bitmap_id == ID_LLBITMAP &&
+	    mddev->new_chunk_sectors < mddev->chunk_sectors)
+		return -EOPNOTSUPP;
 	if (mddev->new_chunk_sectors > mddev->chunk_sectors ||
 	    mddev->delta_disks > 0)
 		if (resize_chunks(conf,
 				  conf->previous_raid_disks
 				  + max(0, mddev->delta_disks),
-- 
2.51.0



* [PATCH] md/raid5: wire llbitmap reshape lifecycle
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (17 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05  9:15 ` [PATCH] md/raid5: split reshape bios before bitmap accounting Yu Kuai
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  20 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Prepare llbitmap before RAID5 reshape starts, checkpoint the bitmap
before advancing reshape_position, and finish the llbitmap geometry
update when reshape completes.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 55af1d1eec63..6786f30dc59b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6421,10 +6421,17 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 		wait_event(conf->wait_for_reshape,
 			   atomic_read(&conf->reshape_stripes)==0
 			   || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
 		if (atomic_read(&conf->reshape_stripes) != 0)
 			return 0;
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		mddev->curr_resync_completed = sector_nr;
 		if (!mddev->reshape_backwards)
 			/* Can update recovery_offset */
 			rdev_for_each(rdev, mddev)
@@ -6530,10 +6537,17 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 		wait_event(conf->wait_for_reshape,
 			   atomic_read(&conf->reshape_stripes) == 0
 			   || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
 		if (atomic_read(&conf->reshape_stripes) != 0)
 			goto ret;
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		mddev->curr_resync_completed = sector_nr;
 		if (!mddev->reshape_backwards)
 			/* Can update recovery_offset */
 			rdev_for_each(rdev, mddev)
@@ -8562,10 +8576,16 @@ static int raid5_start_reshape(struct mddev *mddev)
 	    < mddev->array_sectors) {
 		pr_warn("md/raid:%s: array size must be reduced before number of disks\n",
 			mdname(mddev));
 		return -EINVAL;
 	}
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_id == ID_LLBITMAP) {
+		i = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors, 0);
+		if (i)
+			return i;
+	}
 
 	atomic_set(&conf->reshape_stripes, 0);
 	spin_lock_irq(&conf->device_lock);
 	write_seqcount_begin(&conf->gen_lock);
 	conf->previous_raid_disks = conf->raid_disks;
@@ -8646,14 +8666,23 @@ static int raid5_start_reshape(struct mddev *mddev)
 /* This is called from the reshape thread and should make any
  * changes needed in 'conf'
  */
 static void end_reshape(struct r5conf *conf)
 {
+	struct mddev *mddev = conf->mddev;
 
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
 		struct md_rdev *rdev;
 
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
+
 		spin_lock_irq(&conf->device_lock);
 		conf->previous_raid_disks = conf->raid_disks;
 		md_finish_reshape(conf->mddev);
 		smp_wmb();
 		conf->reshape_progress = MaxSector;
@@ -8676,12 +8705,20 @@ static void end_reshape(struct r5conf *conf)
  */
 static void raid5_finish_reshape(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 	struct md_rdev *rdev;
+	bool llbitmap = mddev->bitmap_id == ID_LLBITMAP &&
+		md_bitmap_enabled(mddev, false);
 
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+		if (llbitmap && mddev->bitmap_ops->reshape_finish)
+			mddev->bitmap_ops->reshape_finish(mddev);
+		if (llbitmap) {
+			mddev->resync_offset = 0;
+			mddev->resync_max_sectors = mddev->dev_sectors;
+		}
 
 		if (mddev->delta_disks <= 0) {
 			int d;
 			spin_lock_irq(&conf->device_lock);
 			mddev->degraded = raid5_calc_degraded(conf);
-- 
2.51.0



* [PATCH] md/raid5: split reshape bios before bitmap accounting
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (18 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-05  9:15 ` Yu Kuai
  2026-06-05 17:27   ` kernel test robot
  2026-06-06  2:15   ` kernel test robot
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  20 siblings, 2 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-05  9:15 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the shared mddev_bio_split_at_reshape_offset() helper so that every
bio RAID5 accounts in llbitmap lies entirely on one side of the reshape
position.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6786f30dc59b..dde10b5b0ff5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6172,10 +6172,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		make_discard_request(mddev, bi);
 		md_write_end(mddev);
 		return true;
 	}
 
+	bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
+					       &conf->bio_split);
+	if (!bi) {
+		if (rw == WRITE)
+			md_write_end(mddev);
+		return true;
+	}
+
 	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
 	bi->bi_next = NULL;
 
 	ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO);
 	memset(ctx, 0, conf->ctx_size);
-- 
2.51.0



* Re: [PATCH] md/raid5: split reshape bios before bitmap accounting
  2026-06-05  9:15 ` [PATCH] md/raid5: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-05 17:27   ` kernel test robot
  2026-06-06  2:15   ` kernel test robot
  1 sibling, 0 replies; 66+ messages in thread
From: kernel test robot @ 2026-06-05 17:27 UTC (permalink / raw)
  To: Yu Kuai, Song Liu, Yu Kuai
  Cc: llvm, oe-kbuild-all, Li Nan, Xiao Ni, linux-raid, linux-kernel

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.16-rc1 next-20260605]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-raid5-split-reshape-bios-before-bitmap-accounting/20260605-173848
base:   linus/master
patch link:    https://lore.kernel.org/r/20260605091527.2463539-21-yukuai%40kernel.org
patch subject: [PATCH] md/raid5: split reshape bios before bitmap accounting
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260605/202606051933.yVCn4DIj-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project f43d6834093b19baf79beda8c0337ab020ac5f17)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260605/202606051933.yVCn4DIj-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606051933.yVCn4DIj-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid5.c:4221:7: warning: variable 'qread' set but not used [-Wunused-but-set-variable]
    4221 |                 int qread =0;
         |                     ^
>> drivers/md/raid5.c:6126:7: error: call to undeclared function 'mddev_bio_split_at_reshape_offset'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    6126 |         bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
         |              ^
>> drivers/md/raid5.c:6126:5: error: incompatible integer to pointer conversion assigning to 'struct bio *' from 'int' [-Wint-conversion]
    6126 |         bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
         |            ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    6127 |                                                &conf->bio_split);
         |                                                ~~~~~~~~~~~~~~~~~
   1 warning and 2 errors generated.


vim +/mddev_bio_split_at_reshape_offset +6126 drivers/md/raid5.c

  6083	
  6084	static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
  6085	{
  6086		DEFINE_WAIT_FUNC(wait, woken_wake_function);
  6087		struct r5conf *conf = mddev->private;
  6088		const int rw = bio_data_dir(bi);
  6089		struct stripe_request_ctx *ctx;
  6090		sector_t logical_sector;
  6091		enum stripe_result res;
  6092		int s, stripe_cnt;
  6093		bool on_wq;
  6094	
  6095		if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
  6096			int ret = log_handle_flush_request(conf, bi);
  6097	
  6098			if (ret == 0)
  6099				return true;
  6100			if (ret == -ENODEV) {
  6101				if (md_flush_request(mddev, bi))
  6102					return true;
  6103			}
  6104			/* ret == -EAGAIN, fallback */
  6105		}
  6106	
  6107		md_write_start(mddev, bi);
  6108		/*
  6109		 * If array is degraded, better not do chunk aligned read because
  6110		 * later we might have to read it again in order to reconstruct
  6111		 * data on failed drives.
  6112		 */
  6113		if (rw == READ && mddev->degraded == 0 &&
  6114		    mddev->reshape_position == MaxSector) {
  6115			bi = chunk_aligned_read(mddev, bi);
  6116			if (!bi)
  6117				return true;
  6118		}
  6119	
  6120		if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
  6121			make_discard_request(mddev, bi);
  6122			md_write_end(mddev);
  6123			return true;
  6124		}
  6125	
> 6126		bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
  6127						       &conf->bio_split);
  6128		if (!bi) {
  6129			if (rw == WRITE)
  6130				md_write_end(mddev);
  6131			return true;
  6132		}
  6133	
  6134		logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
  6135		bi->bi_next = NULL;
  6136	
  6137		ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO);
  6138		memset(ctx, 0, conf->ctx_size);
  6139		ctx->first_sector = logical_sector;
  6140		ctx->last_sector = bio_end_sector(bi);
  6141		/*
  6142		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
  6143		 * we need to flush journal device
  6144		 */
  6145		if (unlikely(bi->bi_opf & REQ_PREFLUSH))
  6146			ctx->do_flush = true;
  6147	
  6148		stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx->last_sector - logical_sector,
  6149						   RAID5_STRIPE_SECTORS(conf));
  6150		bitmap_set(ctx->sectors_to_do, 0, stripe_cnt);
  6151	
  6152		pr_debug("raid456: %s, logical %llu to %llu\n", __func__,
  6153			 bi->bi_iter.bi_sector, ctx->last_sector);
  6154	
  6155		/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
  6156		if ((bi->bi_opf & REQ_NOWAIT) &&
  6157		    get_reshape_loc(mddev, conf, logical_sector) == LOC_INSIDE_RESHAPE) {
  6158			bio_wouldblock_error(bi);
  6159			if (rw == WRITE)
  6160				md_write_end(mddev);
  6161			mempool_free(ctx, conf->ctx_pool);
  6162			return true;
  6163		}
  6164		md_account_bio(mddev, &bi);
  6165	
  6166		/*
  6167		 * Lets start with the stripe with the lowest chunk offset in the first
  6168		 * chunk. That has the best chances of creating IOs adjacent to
  6169		 * previous IOs in case of sequential IO and thus creates the most
  6170		 * sequential IO pattern. We don't bother with the optimization when
  6171		 * reshaping as the performance benefit is not worth the complexity.
  6172		 */
  6173		if (likely(conf->reshape_progress == MaxSector)) {
  6174			logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);
  6175			on_wq = false;
  6176		} else {
  6177			add_wait_queue(&conf->wait_for_reshape, &wait);
  6178			on_wq = true;
  6179		}
  6180		s = (logical_sector - ctx->first_sector) >> RAID5_STRIPE_SHIFT(conf);
  6181	
  6182		while (1) {
  6183			res = make_stripe_request(mddev, conf, ctx, logical_sector,
  6184						  bi);
  6185			if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
  6186				break;
  6187	
  6188			if (res == STRIPE_RETRY)
  6189				continue;
  6190	
  6191			if (res == STRIPE_SCHEDULE_AND_RETRY) {
  6192				WARN_ON_ONCE(!on_wq);
  6193				/*
  6194				 * Must release the reference to batch_last before
  6195				 * scheduling and waiting for work to be done,
  6196				 * otherwise the batch_last stripe head could prevent
  6197				 * raid5_activate_delayed() from making progress
  6198				 * and thus deadlocking.
  6199				 */
  6200				if (ctx->batch_last) {
  6201					raid5_release_stripe(ctx->batch_last);
  6202					ctx->batch_last = NULL;
  6203				}
  6204	
  6205				wait_woken(&wait, TASK_UNINTERRUPTIBLE,
  6206					   MAX_SCHEDULE_TIMEOUT);
  6207				continue;
  6208			}
  6209	
  6210			s = find_next_bit_wrap(ctx->sectors_to_do, stripe_cnt, s);
  6211			if (s == stripe_cnt)
  6212				break;
  6213	
  6214			logical_sector = ctx->first_sector +
  6215				(s << RAID5_STRIPE_SHIFT(conf));
  6216		}
  6217		if (unlikely(on_wq))
  6218			remove_wait_queue(&conf->wait_for_reshape, &wait);
  6219	
  6220		if (ctx->batch_last)
  6221			raid5_release_stripe(ctx->batch_last);
  6222	
  6223		if (rw == WRITE)
  6224			md_write_end(mddev);
  6225	
  6226		mempool_free(ctx, conf->ctx_pool);
  6227		if (res == STRIPE_WAIT_RESHAPE) {
  6228			DECLARE_COMPLETION_ONSTACK(done);
  6229			WRITE_ONCE(bi->bi_private, &done);
  6230	
  6231			bio_endio(bi);
  6232	
  6233			wait_for_completion(&done);
  6234			return false;
  6235		}
  6236	
  6237		bio_endio(bi);
  6238		return true;
  6239	}
  6240	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] md/raid5: split reshape bios before bitmap accounting
  2026-06-05  9:15 ` [PATCH] md/raid5: split reshape bios before bitmap accounting Yu Kuai
  2026-06-05 17:27   ` kernel test robot
@ 2026-06-06  2:15   ` kernel test robot
  1 sibling, 0 replies; 66+ messages in thread
From: kernel test robot @ 2026-06-06  2:15 UTC (permalink / raw)
  To: Yu Kuai, Song Liu, Yu Kuai
  Cc: oe-kbuild-all, Li Nan, Xiao Ni, linux-raid, linux-kernel

Hi Yu,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v7.1-rc6 next-20260605]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-raid5-split-reshape-bios-before-bitmap-accounting/20260605-214540
base:   linus/master
patch link:    https://lore.kernel.org/r/20260605091527.2463539-21-yukuai%40kernel.org
patch subject: [PATCH] md/raid5: split reshape bios before bitmap accounting
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260606/202606060443.UZIDPAEF-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260606/202606060443.UZIDPAEF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606060443.UZIDPAEF-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/md/raid5.c: In function 'raid5_make_request':
>> drivers/md/raid5.c:6126:14: error: implicit declaration of function 'mddev_bio_split_at_reshape_offset' [-Wimplicit-function-declaration]
    6126 |         bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
         |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/md/raid5.c:6126:12: error: assignment to 'struct bio *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    6126 |         bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
         |            ^
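
For readers unfamiliar with these two diagnostics: GCC 14 treats a call to a function with no visible prototype as an error, and the `int` return type it assumes for such a call is what produces the second `-Wint-conversion` message. One plausible cause (an assumption, not something this report confirms) is that the patch was built without the earlier patch in the series that declares the helper. The sketch below uses hypothetical names (`struct item`, `split_at_offset()`, `submit()`), not the real md API, to show that a prototype visible before the call site is what the caller needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; not the kernel type. */
struct item {
	unsigned long start;
	unsigned long len;
};

/*
 * Prototype visible before the first caller. Without this line, GCC 14
 * rejects the call in submit() with -Wimplicit-function-declaration and
 * then complains about assigning the assumed 'int' result to a pointer.
 */
struct item *split_at_offset(struct item *it, unsigned long offset);

/* Caller: returns the part of 'it' that lies before 'offset'. */
struct item *submit(struct item *it, unsigned long offset)
{
	return split_at_offset(it, offset);
}

/* The definition may come later in the file or from another object file. */
struct item *split_at_offset(struct item *it, unsigned long offset)
{
	assert(it != NULL);
	if (offset <= it->start)
		return NULL;			/* nothing before the offset */
	if (offset < it->start + it->len)
		it->len = offset - it->start;	/* trim to the offset */
	return it;
}
```

In the kernel the prototype would normally come from a shared header such as drivers/md/md.h; where this series actually declares the helper is not shown in the report.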


vim +/mddev_bio_split_at_reshape_offset +6126 drivers/md/raid5.c

  6083	
  6084	static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
  6085	{
  6086		DEFINE_WAIT_FUNC(wait, woken_wake_function);
  6087		struct r5conf *conf = mddev->private;
  6088		const int rw = bio_data_dir(bi);
  6089		struct stripe_request_ctx *ctx;
  6090		sector_t logical_sector;
  6091		enum stripe_result res;
  6092		int s, stripe_cnt;
  6093		bool on_wq;
  6094	
  6095		if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
  6096			int ret = log_handle_flush_request(conf, bi);
  6097	
  6098			if (ret == 0)
  6099				return true;
  6100			if (ret == -ENODEV) {
  6101				if (md_flush_request(mddev, bi))
  6102					return true;
  6103			}
  6104			/* ret == -EAGAIN, fallback */
  6105		}
  6106	
  6107		md_write_start(mddev, bi);
  6108		/*
  6109		 * If array is degraded, better not do chunk aligned read because
  6110		 * later we might have to read it again in order to reconstruct
  6111		 * data on failed drives.
  6112		 */
  6113		if (rw == READ && mddev->degraded == 0 &&
  6114		    mddev->reshape_position == MaxSector) {
  6115			bi = chunk_aligned_read(mddev, bi);
  6116			if (!bi)
  6117				return true;
  6118		}
  6119	
  6120		if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
  6121			make_discard_request(mddev, bi);
  6122			md_write_end(mddev);
  6123			return true;
  6124		}
  6125	
> 6126		bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
  6127						       &conf->bio_split);
  6128		if (!bi) {
  6129			if (rw == WRITE)
  6130				md_write_end(mddev);
  6131			return true;
  6132		}
  6133	
  6134		logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
  6135		bi->bi_next = NULL;
  6136	
  6137		ctx = mempool_alloc(conf->ctx_pool, GFP_NOIO);
  6138		memset(ctx, 0, conf->ctx_size);
  6139		ctx->first_sector = logical_sector;
  6140		ctx->last_sector = bio_end_sector(bi);
  6141		/*
  6142		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
  6143		 * we need to flush journal device
  6144		 */
  6145		if (unlikely(bi->bi_opf & REQ_PREFLUSH))
  6146			ctx->do_flush = true;
  6147	
  6148		stripe_cnt = DIV_ROUND_UP_SECTOR_T(ctx->last_sector - logical_sector,
  6149						   RAID5_STRIPE_SECTORS(conf));
  6150		bitmap_set(ctx->sectors_to_do, 0, stripe_cnt);
  6151	
  6152		pr_debug("raid456: %s, logical %llu to %llu\n", __func__,
  6153			 bi->bi_iter.bi_sector, ctx->last_sector);
  6154	
  6155		/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
  6156		if ((bi->bi_opf & REQ_NOWAIT) &&
  6157		    get_reshape_loc(mddev, conf, logical_sector) == LOC_INSIDE_RESHAPE) {
  6158			bio_wouldblock_error(bi);
  6159			if (rw == WRITE)
  6160				md_write_end(mddev);
  6161			mempool_free(ctx, conf->ctx_pool);
  6162			return true;
  6163		}
  6164		md_account_bio(mddev, &bi);
  6165	
  6166		/*
  6167		 * Lets start with the stripe with the lowest chunk offset in the first
  6168		 * chunk. That has the best chances of creating IOs adjacent to
  6169		 * previous IOs in case of sequential IO and thus creates the most
  6170		 * sequential IO pattern. We don't bother with the optimization when
  6171		 * reshaping as the performance benefit is not worth the complexity.
  6172		 */
  6173		if (likely(conf->reshape_progress == MaxSector)) {
  6174			logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);
  6175			on_wq = false;
  6176		} else {
  6177			add_wait_queue(&conf->wait_for_reshape, &wait);
  6178			on_wq = true;
  6179		}
  6180		s = (logical_sector - ctx->first_sector) >> RAID5_STRIPE_SHIFT(conf);
  6181	
  6182		while (1) {
  6183			res = make_stripe_request(mddev, conf, ctx, logical_sector,
  6184						  bi);
  6185			if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
  6186				break;
  6187	
  6188			if (res == STRIPE_RETRY)
  6189				continue;
  6190	
  6191			if (res == STRIPE_SCHEDULE_AND_RETRY) {
  6192				WARN_ON_ONCE(!on_wq);
  6193				/*
  6194				 * Must release the reference to batch_last before
  6195				 * scheduling and waiting for work to be done,
  6196				 * otherwise the batch_last stripe head could prevent
  6197				 * raid5_activate_delayed() from making progress
  6198				 * and thus deadlocking.
  6199				 */
  6200				if (ctx->batch_last) {
  6201					raid5_release_stripe(ctx->batch_last);
  6202					ctx->batch_last = NULL;
  6203				}
  6204	
  6205				wait_woken(&wait, TASK_UNINTERRUPTIBLE,
  6206					   MAX_SCHEDULE_TIMEOUT);
  6207				continue;
  6208			}
  6209	
  6210			s = find_next_bit_wrap(ctx->sectors_to_do, stripe_cnt, s);
  6211			if (s == stripe_cnt)
  6212				break;
  6213	
  6214			logical_sector = ctx->first_sector +
  6215				(s << RAID5_STRIPE_SHIFT(conf));
  6216		}
  6217		if (unlikely(on_wq))
  6218			remove_wait_queue(&conf->wait_for_reshape, &wait);
  6219	
  6220		if (ctx->batch_last)
  6221			raid5_release_stripe(ctx->batch_last);
  6222	
  6223		if (rw == WRITE)
  6224			md_write_end(mddev);
  6225	
  6226		mempool_free(ctx, conf->ctx_pool);
  6227		if (res == STRIPE_WAIT_RESHAPE) {
  6228			DECLARE_COMPLETION_ONSTACK(done);
  6229			WRITE_ONCE(bi->bi_private, &done);
  6230	
  6231			bio_endio(bi);
  6232	
  6233			wait_for_completion(&done);
  6234			return false;
  6235		}
  6236	
  6237		bio_endio(bi);
  6238		return true;
  6239	}
  6240	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] md/md-llbitmap: track bitmap sync_size explicitly
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
@ 2026-06-15 10:48   ` Su Yue
  0 siblings, 0 replies; 66+ messages in thread
From: Su Yue @ 2026-06-15 10:48 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel

On Fri 05 Jun 2026 at 17:15, Yu Kuai <yukuai@kernel.org> wrote:

> From: Yu Kuai <yukuai@fygo.io>
>
> Track llbitmap's own sync_size instead of always using
> mddev->resync_max_sectors directly.
>
> This is the minimal bookkeeping needed before llbitmap can track old
> and new reshape geometry independently.
>
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
>
Reviewed-by: Su Yue <glass.su@suse.com>
> ---
>  drivers/md/md-llbitmap.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index bcf34f0c9af6..ecf3ed712315 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -285,10 +285,12 @@ struct llbitmap {
>  	unsigned long chunkshift;
>  	/* size of one chunk in sector */
>  	unsigned long chunksize;
>  	/* total number of chunks */
>  	unsigned long chunks;
> +	/* total number of sectors tracked by current bitmap geometry */
> +	sector_t sync_size;
>  	unsigned long last_end_sync;
>  	/*
>  	 * time in seconds that dirty bits will be cleared if the page is not
>  	 * accessed.
>  	 */
> @@ -916,10 +918,11 @@ static int llbitmap_init(struct llbitmap *llbitmap)
>
>  	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
>  	llbitmap->chunkshift = ffz(~chunksize);
>  	llbitmap->chunksize = chunksize;
>  	llbitmap->chunks = chunks;
> +	llbitmap->sync_size = blocks;
>  	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
>
>  	ret = llbitmap_cache_pages(llbitmap);
>  	if (ret)
>  		return ret;
> @@ -936,10 +939,11 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>  {
>  	struct mddev *mddev = llbitmap->mddev;
>  	unsigned long daemon_sleep;
>  	unsigned long chunksize;
>  	unsigned long events;
> +	sector_t sync_size;
>  	struct page *sb_page;
>  	bitmap_super_t *sb;
>  	int ret = -EINVAL;
>
>  	if (!mddev->bitmap_info.offset) {
> @@ -985,10 +989,13 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>  	if (test_and_clear_bit(BITMAP_FIRST_USE, &llbitmap->flags)) {
>  		ret = llbitmap_init(llbitmap);
>  		goto out_put_page;
>  	}
>
> +	sync_size = le64_to_cpu(sb->sync_size);
> +	if (!sync_size)
> +		sync_size = mddev->resync_max_sectors;
>  	chunksize = le32_to_cpu(sb->chunksize);
>  	if (!is_power_of_2(chunksize)) {
>  		pr_err("md/llbitmap: %s: chunksize not a power of 2",
>  		       mdname(mddev));
>  		goto out_put_page;
> @@ -1020,12 +1027,13 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>  	mddev->bitmap_info.chunksize = chunksize;
>  	mddev->bitmap_info.daemon_sleep = daemon_sleep;
>
>  	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
>  	llbitmap->chunksize = chunksize;
> -	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors, chunksize);
> +	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
>  	llbitmap->chunkshift = ffz(~chunksize);
> +	llbitmap->sync_size = sync_size;
>  	ret = llbitmap_cache_pages(llbitmap);
>
>  out_put_page:
>  	__free_page(sb_page);
>  	kunmap_local(sb);
> @@ -1151,10 +1159,11 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
>  	}
>
>  	llbitmap->chunkshift = ffz(~chunksize);
>  	llbitmap->chunksize = chunksize;
>  	llbitmap->chunks = chunks;
> +	llbitmap->sync_size = blocks;
>
>  	return 0;
>  }
>
>  static int llbitmap_load(struct mddev *mddev)
> @@ -1524,11 +1533,11 @@ static void llbitmap_update_sb(void *data)
>
>  	sb = kmap_local_page(sb_page);
>  	sb->events = cpu_to_le64(mddev->events);
>  	sb->state = cpu_to_le32(llbitmap->flags);
>  	sb->chunksize = cpu_to_le32(llbitmap->chunksize);
> -	sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
> +	sb->sync_size = cpu_to_le64(llbitmap->sync_size);
>  	sb->events_cleared = cpu_to_le64(llbitmap->events_cleared);
>  	sb->sectors_reserved = cpu_to_le32(mddev->bitmap_info.space);
>  	sb->daemon_sleep = cpu_to_le32(mddev->bitmap_info.daemon_sleep);
>
>  	kunmap_local(sb);
> @@ -1542,10 +1551,11 @@ static int llbitmap_get_stats(void *data, struct md_bitmap_stats *stats)
>  	memset(stats, 0, sizeof(*stats));
>
>  	stats->missing_pages = 0;
>  	stats->pages = llbitmap->nr_pages;
>  	stats->file_pages = llbitmap->nr_pages;
> +	stats->sync_size = llbitmap->sync_size;
>
>  	stats->behind_writes = atomic_read(&llbitmap->behind_writes);
>  	stats->behind_wait = wq_has_sleeper(&llbitmap->behind_wait);
>  	stats->events_cleared = llbitmap->events_cleared;
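
The load-path change above can be sketched in userspace as follows. This is only an illustration of the fallback in llbitmap_read_sb() (use the on-disk sync_size, and fall back to the array's resync_max_sectors when the on-disk value is zero) and of how the chunk count then follows from it. `pick_sync_size()` and `nr_chunks()` are hypothetical helper names, and `uint64_t` stands in for `sector_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's DIV_ROUND_UP_SECTOR_T(). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Hypothetical helper: prefer the on-disk sync_size, else the array size. */
uint64_t pick_sync_size(uint64_t sb_sync_size, uint64_t resync_max_sectors)
{
	return sb_sync_size ? sb_sync_size : resync_max_sectors;
}

/* Number of bitmap chunks needed to cover sync_size sectors. */
uint64_t nr_chunks(uint64_t sync_size, uint64_t chunksize)
{
	assert(chunksize != 0);
	return div_round_up_u64(sync_size, chunksize);
}
```

With a 128-sector chunk, 1000 sectors round up to 8 chunks, the same count as for exactly 1024 sectors.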


* Re: [PATCH] md/md-llbitmap: allocate page controls independently
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: allocate page controls independently Yu Kuai
@ 2026-06-15 11:06   ` Su Yue
  0 siblings, 0 replies; 66+ messages in thread
From: Su Yue @ 2026-06-15 11:06 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel

On Fri 05 Jun 2026 at 17:15, Yu Kuai <yukuai@kernel.org> wrote:

> From: Yu Kuai <yukuai@fygo.io>
>
> Allocate one llbitmap page-control object at a time and free each
> object through the same model.
>
> Let llbitmap_read_page() return a zeroed page without reading disk when
> the page index is beyond the current bitmap size, so page-control
> allocation no longer needs a separate read_existing flag.
>
> This keeps the llbitmap page-control lifetime self-consistent and
> prepares the page-cache code for later in-place growth.
>
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
>

Straightforward enough.

Reviewed-by: Su Yue <glass.su@suse.com>

> ---
>  drivers/md/md-llbitmap.c | 99 +++++++++++++++++++++++++---------------
>  1 file changed, 62 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index ecf3ed712315..2f2896fe4d6f 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -510,24 +510,32 @@ static void llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
>  		llbitmap_set_page_dirty(llbitmap, idx, bit, true);
>  	else if (state == BitNeedSyncUnwritten)
>  		llbitmap_set_page_dirty(llbitmap, idx, bit, false);
>  }
>
> +static unsigned int llbitmap_used_pages(struct llbitmap *llbitmap,
> +					unsigned long chunks)
> +{
> +	return DIV_ROUND_UP(chunks + BITMAP_DATA_OFFSET, PAGE_SIZE);
> +}
> +
>  static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
>  {
>  	struct mddev *mddev = llbitmap->mddev;
>  	struct page *page = NULL;
>  	struct md_rdev *rdev;
>
> -	if (llbitmap->pctl && llbitmap->pctl[idx])
> +	if (llbitmap->pctl && idx < llbitmap->nr_pages && llbitmap->pctl[idx])
>  		page = llbitmap->pctl[idx]->page;
>  	if (page)
>  		return page;
>
>  	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>  	if (!page)
>  		return ERR_PTR(-ENOMEM);
> +	if (idx >= llbitmap_used_pages(llbitmap, llbitmap->chunks))
> +		return page;
>
>  	rdev_for_each(rdev, mddev) {
>  		sector_t sector;
>
>  		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags) ||
> @@ -594,65 +602,82 @@ static void llbitmap_free_pages(struct llbitmap *llbitmap)
>  		return;
>
>  	for (i = 0; i < llbitmap->nr_pages; i++) {
>  		struct llbitmap_page_ctl *pctl = llbitmap->pctl[i];
>
> -		if (!pctl || !pctl->page)
> -			break;
> -
> -		__free_page(pctl->page);
> +		if (!pctl)
> +			continue;
> +		if (pctl->page)
> +			__free_page(pctl->page);
>  		percpu_ref_exit(&pctl->active);
> +		kfree(pctl);
>  	}
>
> -	kfree(llbitmap->pctl[0]);
>  	kfree(llbitmap->pctl);
>  	llbitmap->pctl = NULL;
>  }
>
> -static int llbitmap_cache_pages(struct llbitmap *llbitmap)
> +static struct llbitmap_page_ctl *
> +llbitmap_alloc_page_ctl(struct llbitmap *llbitmap, int idx)
>  {
>  	struct llbitmap_page_ctl *pctl;
> -	unsigned int nr_pages = DIV_ROUND_UP(llbitmap->chunks +
> -					     BITMAP_DATA_OFFSET, PAGE_SIZE);
> +	struct page *page;
>  	unsigned int size = struct_size(pctl, dirty, BITS_TO_LONGS(
>  						llbitmap->blocks_per_page));
> -	int i;
> -
> -	llbitmap->pctl = kmalloc_array(nr_pages, sizeof(void *),
> -				       GFP_KERNEL | __GFP_ZERO);
> -	if (!llbitmap->pctl)
> -		return -ENOMEM;
>
>  	size = round_up(size, cache_line_size());
> -	pctl = kmalloc_array(nr_pages, size, GFP_KERNEL | __GFP_ZERO);
> -	if (!pctl) {
> -		kfree(llbitmap->pctl);
> -		return -ENOMEM;
> +	pctl = kzalloc(size, GFP_KERNEL);
> +	if (!pctl)
> +		return ERR_PTR(-ENOMEM);
> +
> +	page = llbitmap_read_page(llbitmap, idx);
> +
> +	if (IS_ERR(page)) {
> +		kfree(pctl);
> +		return ERR_CAST(page);
>  	}
>
> -	llbitmap->nr_pages = nr_pages;
> +	if (percpu_ref_init(&pctl->active, active_release,
> +			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
> +		__free_page(page);
> +		kfree(pctl);
> +		return ERR_PTR(-ENOMEM);
> +	}
>
> -	for (i = 0; i < nr_pages; i++, pctl = (void *)pctl + size) {
> -		struct page *page = llbitmap_read_page(llbitmap, i);
> +	pctl->page = page;
> +	pctl->state = page_address(page);
> +	init_waitqueue_head(&pctl->wait);
> +	return pctl;
> +}
>
> -		llbitmap->pctl[i] = pctl;
> +static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
> +{
> +	return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
> +			    PAGE_SIZE);
> +}
>
> -		if (IS_ERR(page)) {
> -			llbitmap_free_pages(llbitmap);
> -			return PTR_ERR(page);
> -		}
> +static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
> +{
> +	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
> +	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
> +	int i;
> +
> +	llbitmap->pctl = kcalloc(nr_pages, sizeof(*llbitmap->pctl), GFP_KERNEL);
> +	if (!llbitmap->pctl)
> +		return -ENOMEM;
>
> -		if (percpu_ref_init(&pctl->active, active_release,
> -				    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
> -			__free_page(page);
> +	llbitmap->nr_pages = nr_pages;
> +
> +	for (i = 0; i < nr_pages; i++) {
> +		llbitmap->pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
> +		if (IS_ERR(llbitmap->pctl[i])) {
> +			int ret = PTR_ERR(llbitmap->pctl[i]);
> +
> +			llbitmap->pctl[i] = NULL;
>  			llbitmap_free_pages(llbitmap);
> -			return -ENOMEM;
> +			return ret;
>  		}
> -
> -		pctl->page = page;
> -		pctl->state = page_address(page);
> -		init_waitqueue_head(&pctl->wait);
>  	}
>
>  	return 0;
>  }
>
> @@ -921,11 +946,11 @@ static int llbitmap_init(struct llbitmap *llbitmap)
>  	llbitmap->chunksize = chunksize;
>  	llbitmap->chunks = chunks;
>  	llbitmap->sync_size = blocks;
>  	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
>
> -	ret = llbitmap_cache_pages(llbitmap);
> +	ret = llbitmap_alloc_pages(llbitmap);
>  	if (ret)
>  		return ret;
>
>  	llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
>  			       BitmapActionInit);
> @@ -1030,11 +1055,11 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>  	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
>  	llbitmap->chunksize = chunksize;
>  	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
>  	llbitmap->chunkshift = ffz(~chunksize);
>  	llbitmap->sync_size = sync_size;
> -	ret = llbitmap_cache_pages(llbitmap);
> +	ret = llbitmap_alloc_pages(llbitmap);
>
>  out_put_page:
>  	__free_page(sb_page);
>  	kunmap_local(sb);
>  	return ret;
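
As a worked illustration of the page accounting in this patch: llbitmap_used_pages() divides `chunks + BITMAP_DATA_OFFSET` by the page size, which implies one byte of state per chunk stored after a reserved header area, and llbitmap_read_page() skips the disk read for any page index at or beyond that count. The constants below (4096-byte pages, a 1024-byte header) are assumptions chosen for the example, and `used_pages()` and `page_needs_disk_read()` are hypothetical userspace names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_DATA_OFFSET	1024UL	/* bytes reserved before the bit data */

static_assert(SKETCH_DATA_OFFSET < SKETCH_PAGE_SIZE,
	      "header must fit in the first page");

/* One byte of state per chunk, placed after the reserved header area. */
unsigned long used_pages(unsigned long chunks)
{
	return (chunks + SKETCH_DATA_OFFSET + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}

/* Pages at or beyond used_pages() hold no on-disk state yet. */
bool page_needs_disk_read(unsigned long idx, unsigned long chunks)
{
	return idx < used_pages(chunks);
}
```

Under these assumed constants, 3072 chunks exactly fill the first page, and one more chunk spills into a second page.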


* Re: [PATCH] md/md-llbitmap: grow the page cache in place for reshape
  2026-06-05  9:15 ` [PATCH] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
@ 2026-06-15 11:16   ` Su Yue
  2026-06-15 16:19     ` yu kuai
  0 siblings, 1 reply; 66+ messages in thread
From: Su Yue @ 2026-06-15 11:16 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Song Liu, Yu Kuai, Li Nan, Xiao Ni, linux-raid, linux-kernel

On Fri 05 Jun 2026 at 17:15, Yu Kuai <yukuai@kernel.org> wrote:

> From: Yu Kuai <yukuai@fygo.io>
>
> Use the page-control helpers to grow llbitmap's cached pages in place
> for resize and later reshape preparation, instead of rebuilding the
> whole cache.
>
> Signed-off-by: Yu Kuai <yukuai@fygo.io>
> ---
>  drivers/md/md-llbitmap.c | 139 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 127 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index 2f2896fe4d6f..91d3dec43d48 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -414,10 +414,23 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
>  		[BitmapActionClearUnwritten]	= BitUnwritten,
>  	},
>  };
>
>  static void __llbitmap_flush(struct mddev *mddev);
> +static void llbitmap_flush(struct mddev *mddev);
> +static void llbitmap_update_sb(void *data);
> +
> +static void llbitmap_resize_chunks(struct mddev *mddev, sector_t blocks,
> +				   unsigned long *chunksize,
> +				   unsigned long *chunks)
>
NIT: I would like to call it llbitmap_calculate_chunks.

> +{
> +	*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
> +	while (*chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
> +		*chunksize = *chunksize << 1;
> +		*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
> +	}
> +}
>
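
For reference, the sizing loop in this helper can be exercised in userspace with the sketch below. It is a simplified model, not the kernel code: `calculate_chunks()` is a hypothetical name, `space_sectors` stands in for `mddev->bitmap_info.space`, and the limit is that space converted to bytes, matching the `<< SECTOR_SHIFT` comparison in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Keep the requested chunksize if the chunk count fits in the reserved
 * bitmap space; otherwise double the chunksize until it does.
 */
void calculate_chunks(uint64_t blocks, unsigned long space_sectors,
		      unsigned long *chunksize, unsigned long *chunks)
{
	uint64_t limit = (uint64_t)space_sectors << SKETCH_SECTOR_SHIFT;

	assert(*chunksize != 0 && limit != 0);
	*chunks = (blocks + *chunksize - 1) / *chunksize;
	while (*chunks > limit) {
		*chunksize <<= 1;
		*chunks = (blocks + *chunksize - 1) / *chunksize;
	}
}
```

For example, 1048576 sectors with a starting chunksize of 128 and a single reserved sector (512 bytes) end up with a chunksize of 2048 and 512 chunks.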
>  static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
>  {
>  	unsigned int idx;
>  	unsigned int offset;
> @@ -653,10 +666,52 @@ static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
>  {
>  	return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
>  			    PAGE_SIZE);
>  }
>
> +static int llbitmap_expand_pages(struct llbitmap *llbitmap,
> +				 unsigned long chunks)
> +{
> +	struct llbitmap_page_ctl **pctl;
> +	unsigned int old_nr_pages = llbitmap->nr_pages;
> +	unsigned int nr_pages = llbitmap_used_pages(llbitmap, chunks);
> +	int i;
> +	int ret;
> +
> +	if (nr_pages <= old_nr_pages)
> +		return 0;
> +
> +	pctl = kcalloc(nr_pages, sizeof(*pctl), GFP_KERNEL);
> +	if (!pctl)
> +		return -ENOMEM;
> +
> +	if (llbitmap->pctl)
> +		memcpy(pctl, llbitmap->pctl,
> +		       array_size(old_nr_pages, sizeof(*pctl)));
> +
> +	for (i = old_nr_pages; i < nr_pages; i++) {
> +		pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
> +		if (IS_ERR(pctl[i]))
> +			goto err_alloc_ptr;
> +	}
> +
> +	kfree(llbitmap->pctl);
> +	llbitmap->pctl = pctl;
> +	llbitmap->nr_pages = nr_pages;
> +	return 0;
> +
> +err_alloc_ptr:
> +	ret = PTR_ERR(pctl[i]);
> +	for (i--; i >= (int)old_nr_pages; i--) {
>
I'm confused: why not just declare i as an unsigned int?


--
Su
> +		__free_page(pctl[i]->page);
> +		percpu_ref_exit(&pctl[i]->active);
> +		kfree(pctl[i]);
> +	}
> +	kfree(pctl);
> +	return ret;
> +}
> +
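
One detail worth keeping in mind if `i` becomes unsigned: a countdown written as `i >= old_nr_pages` can never become false when `old_nr_pages` is 0, because the decrement wraps around instead of going negative. The sketch below shows a common unsigned-safe form of the same unwind loop. It is a userspace illustration only; `unwind_slots()` is a hypothetical name and the `int` array stands in for the `pctl[]` entries being freed.

```c
#include <assert.h>

/*
 * Undo slots [first, failed) in reverse order with an unsigned index.
 * 'i-- > first' compares before decrementing, so the loop ends cleanly
 * even when first == 0, where 'i >= first' would always be true.
 * Returns the number of slots released.
 */
unsigned int unwind_slots(int *slots, unsigned int first, unsigned int failed)
{
	unsigned int i = failed;
	unsigned int released = 0;

	assert(first <= failed);
	while (i-- > first) {
		slots[i] = 0;	/* stand-in for freeing pctl[i] */
		released++;
	}
	return released;
}
```

The slot at index `failed` itself is never touched, which matches the error path above, where the failing entry holds an error pointer rather than an allocation.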
>  static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
>  {
>  	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
>  	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
>  	int i;
> @@ -728,10 +783,38 @@ static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap)
>  	}
>
>  	return true;
>  }
>
> +static void llbitmap_mark_range(struct llbitmap *llbitmap,
> +				unsigned long start,
> +				unsigned long end,
> +				enum llbitmap_state state)
> +{
> +	while (start <= end) {
> +		llbitmap_write(llbitmap, state, start);
> +		start++;
> +	}
> +}
> +
> +static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
> +				   unsigned long old_chunks,
> +				   unsigned long new_chunks,
> +				   unsigned long cache_chunks)
> +{
> +	int ret;
> +
> +	llbitmap_flush(llbitmap->mddev);
> +	ret = llbitmap_expand_pages(llbitmap, cache_chunks);
> +	if (ret)
> +		return ret;
> +	if (new_chunks > old_chunks)
> +		llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
> +				    BitUnwritten);
> +	return 0;
> +}
> +
>  static void llbitmap_init_state(struct llbitmap *llbitmap)
>  {
>  	struct mddev *mddev = llbitmap->mddev;
>  	enum llbitmap_state state = BitUnwritten;
>  	unsigned long i;
> @@ -1024,14 +1107,14 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>  		pr_err("md/llbitmap: %s: chunksize not a power of 2",
>  		       mdname(mddev));
>  		goto out_put_page;
>  	}
>
> -	if (chunksize < DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors,
> +	if (chunksize < DIV_ROUND_UP_SECTOR_T(sync_size,
>  					      mddev->bitmap_info.space << SECTOR_SHIFT)) {
>  		pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu",
> -		       mdname(mddev), chunksize, mddev->resync_max_sectors,
> +		       mdname(mddev), chunksize, sync_size,
>  		       mddev->bitmap_info.space);
>  		goto out_put_page;
>  	}
>
>  	daemon_sleep = le32_to_cpu(sb->daemon_sleep);
> @@ -1169,28 +1252,60 @@ static int llbitmap_create(struct mddev *mddev)
>  }
>
>  static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
>  {
>  	struct llbitmap *llbitmap = mddev->bitmap;
> +	sector_t old_blocks = llbitmap->sync_size;
> +	unsigned long old_chunks = llbitmap->chunks;
>  	unsigned long chunks;
> +	unsigned long cache_chunks;
> +	int ret = 0;
> +	unsigned long bitmap_chunksize;
> +	bool reshape;
>
>  	if (chunksize == 0)
>  		chunksize = llbitmap->chunksize;
>
> -	/* If there is enough space, leave the chunksize unchanged. */
> -	chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
> -	while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
> -		chunksize = chunksize << 1;
> -		chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
> -	}
> +	bitmap_chunksize = chunksize;
> +	llbitmap_resize_chunks(mddev, blocks, &bitmap_chunksize, &chunks);
>
> -	llbitmap->chunkshift = ffz(~chunksize);
> -	llbitmap->chunksize = chunksize;
> -	llbitmap->chunks = chunks;
> -	llbitmap->sync_size = blocks;
> +	reshape = mddev->delta_disks || mddev->new_level != mddev->level ||
> +		mddev->new_layout != mddev->layout ||
> +		mddev->new_chunk_sectors != mddev->chunk_sectors;
> +	if (!reshape && bitmap_chunksize != llbitmap->chunksize)
> +		return -EOPNOTSUPP;
> +	if (blocks == old_blocks && chunks == llbitmap->chunks)
> +		return 0;
> +
> +	mutex_lock(&mddev->bitmap_info.mutex);
>
> +	cache_chunks = reshape ? max(old_chunks, chunks) : chunks;
> +	ret = llbitmap_prepare_resize(llbitmap, old_chunks, chunks, cache_chunks);
> +	if (ret)
> +		goto out;
> +
> +	if (reshape) {
> +		llbitmap->reshape_sync_size = blocks;
> +		llbitmap->reshape_chunksize = bitmap_chunksize;
> +		llbitmap->reshape_chunks = chunks;
> +		llbitmap->chunks = max(old_chunks, chunks);
> +	} else {
> +		if (blocks < old_blocks && chunks < old_chunks)
> +			llbitmap_mark_range(llbitmap, chunks, old_chunks - 1,
> +					    BitUnwritten);
> +		mddev->bitmap_info.chunksize = bitmap_chunksize;
> +		llbitmap->chunks = chunks;
> +		llbitmap->sync_size = blocks;
> +		llbitmap_update_sb(llbitmap);
> +	}
> +	__llbitmap_flush(mddev);
> +	mutex_unlock(&mddev->bitmap_info.mutex);
>  	return 0;
> +
> +out:
> +	mutex_unlock(&mddev->bitmap_info.mutex);
> +	return ret;
>  }
>
>  static int llbitmap_load(struct mddev *mddev)
>  {
>  	enum llbitmap_action action = BitmapActionReload;


* Re: [PATCH] md/md-llbitmap: grow the page cache in place for reshape
  2026-06-15 11:16   ` Su Yue
@ 2026-06-15 16:19     ` yu kuai
  0 siblings, 0 replies; 66+ messages in thread
From: yu kuai @ 2026-06-15 16:19 UTC (permalink / raw)
  To: Su Yue, Yu Kuai
  Cc: Song Liu, Li Nan, Xiao Ni, linux-raid, linux-kernel, yukuai

Hi,

On 2026/6/15 19:16, Su Yue wrote:
> On Fri 05 Jun 2026 at 17:15, Yu Kuai <yukuai@kernel.org> wrote:
>
>> From: Yu Kuai <yukuai@fygo.io>
>>
>> Use the page-control helpers to grow llbitmap's cached pages in place
>> for resize and later reshape preparation, instead of rebuilding the
>> whole cache.
>>
>> Signed-off-by: Yu Kuai <yukuai@fygo.io>
>> ---
>>  drivers/md/md-llbitmap.c | 139  +++++++++++++++++++++++++++++++++++----
>>  1 file changed, 127 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
>> index 2f2896fe4d6f..91d3dec43d48 100644
>> --- a/drivers/md/md-llbitmap.c
>> +++ b/drivers/md/md-llbitmap.c
>> @@ -414,10 +414,23 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
>>          [BitmapActionClearUnwritten]    = BitUnwritten,
>>      },
>>  };
>>
>>  static void __llbitmap_flush(struct mddev *mddev);
>> +static void llbitmap_flush(struct mddev *mddev);
>> +static void llbitmap_update_sb(void *data);
>> +
>> +static void llbitmap_resize_chunks(struct mddev *mddev, sector_t blocks,
>> +                   unsigned long *chunksize,
>> +                   unsigned long *chunks)
>>
> NIT: I would like to call it llbitmap_calculate_chunks.

Agreed this is better, will change it.

>
>> +{
>> +    *chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
>> +    while (*chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
>> +        *chunksize = *chunksize << 1;
>> +        *chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
>> +    }
>> +}
>>
>>  static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
>>  {
>>      unsigned int idx;
>>      unsigned int offset;
>> @@ -653,10 +666,52 @@ static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
>>  {
>>      return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
>>                  PAGE_SIZE);
>>  }
>>
>> +static int llbitmap_expand_pages(struct llbitmap *llbitmap,
>> +                 unsigned long chunks)
>> +{
>> +    struct llbitmap_page_ctl **pctl;
>> +    unsigned int old_nr_pages = llbitmap->nr_pages;
>> +    unsigned int nr_pages = llbitmap_used_pages(llbitmap, chunks);
>> +    int i;
>> +    int ret;
>> +
>> +    if (nr_pages <= old_nr_pages)
>> +        return 0;
>> +
>> +    pctl = kcalloc(nr_pages, sizeof(*pctl), GFP_KERNEL);
>> +    if (!pctl)
>> +        return -ENOMEM;
>> +
>> +    if (llbitmap->pctl)
>> +        memcpy(pctl, llbitmap->pctl,
>> +               array_size(old_nr_pages, sizeof(*pctl)));
>> +
>> +    for (i = old_nr_pages; i < nr_pages; i++) {
>> +        pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
>> +        if (IS_ERR(pctl[i]))
>> +            goto err_alloc_ptr;
>> +    }
>> +
>> +    kfree(llbitmap->pctl);
>> +    llbitmap->pctl = pctl;
>> +    llbitmap->nr_pages = nr_pages;
>> +    return 0;
>> +
>> +err_alloc_ptr:
>> +    ret = PTR_ERR(pctl[i]);
>> +    for (i--; i >= (int)old_nr_pages; i--) {
>>
> I'm confused: why not just declare i as an unsigned int?
>
Yes, this looks better as well.

>
> -- 
> Su
>> +        __free_page(pctl[i]->page);
>> +        percpu_ref_exit(&pctl[i]->active);
>> +        kfree(pctl[i]);
>> +    }
>> +    kfree(pctl);
>> +    return ret;
>> +}
>> +
>>  static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
>>  {
>>      unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
>>      unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
>>      int i;
>> @@ -728,10 +783,38 @@ static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap)
>>      }
>>
>>      return true;
>>  }
>>
>> +static void llbitmap_mark_range(struct llbitmap *llbitmap,
>> +                unsigned long start,
>> +                unsigned long end,
>> +                enum llbitmap_state state)
>> +{
>> +    while (start <= end) {
>> +        llbitmap_write(llbitmap, state, start);
>> +        start++;
>> +    }
>> +}
>> +
>> +static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
>> +                   unsigned long old_chunks,
>> +                   unsigned long new_chunks,
>> +                   unsigned long cache_chunks)
>> +{
>> +    int ret;
>> +
>> +    llbitmap_flush(llbitmap->mddev);
>> +    ret = llbitmap_expand_pages(llbitmap, cache_chunks);
>> +    if (ret)
>> +        return ret;
>> +    if (new_chunks > old_chunks)
>> +        llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
>> +                    BitUnwritten);
>> +    return 0;
>> +}
>> +
>>  static void llbitmap_init_state(struct llbitmap *llbitmap)
>>  {
>>      struct mddev *mddev = llbitmap->mddev;
>>      enum llbitmap_state state = BitUnwritten;
>>      unsigned long i;
>> @@ -1024,14 +1107,14 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
>>          pr_err("md/llbitmap: %s: chunksize not a power of 2",
>>                 mdname(mddev));
>>          goto out_put_page;
>>      }
>>
>> -    if (chunksize < DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors,
>> +    if (chunksize < DIV_ROUND_UP_SECTOR_T(sync_size,
>>                            mddev->bitmap_info.space << SECTOR_SHIFT)) {
>>          pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu",
>> -               mdname(mddev), chunksize, mddev->resync_max_sectors,
>> +               mdname(mddev), chunksize, sync_size,
>>                 mddev->bitmap_info.space);
>>          goto out_put_page;
>>      }
>>
>>      daemon_sleep = le32_to_cpu(sb->daemon_sleep);
>> @@ -1169,28 +1252,60 @@ static int llbitmap_create(struct mddev *mddev)
>>  }
>>
>>  static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
>>  {
>>      struct llbitmap *llbitmap = mddev->bitmap;
>> +    sector_t old_blocks = llbitmap->sync_size;
>> +    unsigned long old_chunks = llbitmap->chunks;
>>      unsigned long chunks;
>> +    unsigned long cache_chunks;
>> +    int ret = 0;
>> +    unsigned long bitmap_chunksize;
>> +    bool reshape;
>>
>>      if (chunksize == 0)
>>          chunksize = llbitmap->chunksize;
>>
>> -    /* If there is enough space, leave the chunksize unchanged. */
>> -    chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
>> -    while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
>> -        chunksize = chunksize << 1;
>> -        chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
>> -    }
>> +    bitmap_chunksize = chunksize;
>> +    llbitmap_resize_chunks(mddev, blocks, &bitmap_chunksize, &chunks);
>>
>> -    llbitmap->chunkshift = ffz(~chunksize);
>> -    llbitmap->chunksize = chunksize;
>> -    llbitmap->chunks = chunks;
>> -    llbitmap->sync_size = blocks;
>> +    reshape = mddev->delta_disks || mddev->new_level != mddev->level ||
>> +        mddev->new_layout != mddev->layout ||
>> +        mddev->new_chunk_sectors != mddev->chunk_sectors;
>> +    if (!reshape && bitmap_chunksize != llbitmap->chunksize)
>> +        return -EOPNOTSUPP;
>> +    if (blocks == old_blocks && chunks == llbitmap->chunks)
>> +        return 0;
>> +
>> +    mutex_lock(&mddev->bitmap_info.mutex);
>>
>> +    cache_chunks = reshape ? max(old_chunks, chunks) : chunks;
>> +    ret = llbitmap_prepare_resize(llbitmap, old_chunks, chunks, cache_chunks);
>> +    if (ret)
>> +        goto out;
>> +
>> +    if (reshape) {
>> +        llbitmap->reshape_sync_size = blocks;
>> +        llbitmap->reshape_chunksize = bitmap_chunksize;
>> +        llbitmap->reshape_chunks = chunks;
>> +        llbitmap->chunks = max(old_chunks, chunks);
>> +    } else {
>> +        if (blocks < old_blocks && chunks < old_chunks)
>> +            llbitmap_mark_range(llbitmap, chunks, old_chunks - 1,
>> +                        BitUnwritten);
>> +        mddev->bitmap_info.chunksize = bitmap_chunksize;
>> +        llbitmap->chunks = chunks;
>> +        llbitmap->sync_size = blocks;
>> +        llbitmap_update_sb(llbitmap);
>> +    }
>> +    __llbitmap_flush(mddev);
>> +    mutex_unlock(&mddev->bitmap_info.mutex);
>>      return 0;
>> +
>> +out:
>> +    mutex_unlock(&mddev->bitmap_info.mutex);
>> +    return ret;
>>  }
>>
>>  static int llbitmap_load(struct mddev *mddev)
>>  {
>>      enum llbitmap_action action = BitmapActionReload;

-- 
Thanks,
Kuai


* [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5
  2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                   ` (19 preceding siblings ...)
  2026-06-05  9:15 ` [PATCH] md/raid5: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-24  6:41 ` Yu Kuai
  2026-06-24  6:41   ` [PATCH v2 01/20] md: add exact bitmap mapping and reshape hooks Yu Kuai
                     ` (19 more replies)
  20 siblings, 20 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:41 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

Hi,

This series adds llbitmap support for online reshape in RAID10 and RAID5.

llbitmap has a different set of constraints from the existing bitmap code:
there is one live bitmap instance, each bit state has richer semantics, and
reshape can change the mapping from logical array ranges to bitmap ranges.
The series therefore adds exact bitmap range mapping hooks, tracks old and
new llbitmap geometry during reshape, remaps checkpointed bits as reshape
progresses, and wires the reshape lifecycle into RAID10 and RAID5.

The main rules are:

1. split bios at the reshape position before bitmap accounting, so one bio
   is never accounted with mixed old/new geometry;
2. do not skip reshape ranges from stale llbitmap state, because reshape
   progress is checkpointed by array metadata;
3. remap llbitmap bits when reshape progress is checkpointed;
4. reject llbitmap reshape if mddev->chunk_sectors shrinks, because the
   effective data range represented by existing bitmap bits can shrink.

The first group of patches prepares generic bitmap and llbitmap
infrastructure. The second group wires RAID10. The last group wires RAID5,
including exact old/new stripe mapping.

Changes since v1:
- Add Reviewed-by tags from Su Yue.
- Rename llbitmap_resize_chunks() to llbitmap_calculate_chunks().
- Use an unsigned index in llbitmap_expand_pages() error cleanup.
- Rebase on mdraid/md-7.2, including the mddev_bio_split_at_reshape_offset()
  declaration needed by the RAID5 build reported by kernel test robot.
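
For readers following the index change: with an unsigned index, a
"for (i--; i >= old_nr_pages; i--)" loop cannot terminate when
old_nr_pages is 0, which is why the cleanup uses the test-then-decrement
form. A minimal userspace sketch of that idiom follows; unwind_slots() and
its arguments are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: free slots[old_nr .. failed_at - 1] in reverse
 * order. slots[failed_at] is the entry whose allocation failed, so it is
 * skipped. Returns the number of entries released.
 */
static unsigned int unwind_slots(void **slots, unsigned int old_nr,
				 unsigned int failed_at)
{
	unsigned int i = failed_at;
	unsigned int freed = 0;

	assert(failed_at >= old_nr);

	/* Compare first, then decrement: stops cleanly even if old_nr == 0. */
	while (i-- > old_nr) {
		free(slots[i]);
		slots[i] = NULL;
		freed++;
	}
	return freed;
}
```

Entries below old_nr are never touched, which matches the intent of
leaving the already-cached pages alone on failure.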

Validation:
* RAID5 llbitmap test:
  - created 3-disk RAID5 with --bitmap=lockless
  - wrote 96 MiB of random data
  - reshaped to 4 disks
  - llbitmap bits changed from clean=1024 dirty=1024 to
    unwritten=448 clean=1600 dirty=0
  - all sync-related llbitmap counters were zero after reshape
  - data hash was unchanged after reshape
  - replaced one disk, waited for recovery, hash was unchanged
  - failed another old disk and verified degraded reads still matched
* RAID10 llbitmap test:
  - created 4-disk RAID10 n2 with --bitmap=lockless
  - wrote 128 MiB of random data
  - reshaped to 6 disks
  - llbitmap bits changed from clean=2048 dirty=2048 to
    unwritten=2048 clean=4096 dirty=0
  - all sync-related llbitmap counters were zero after reshape
  - data hash was unchanged after reshape
  - replaced one disk, waited for recovery, hash was unchanged
  - failed the rebuilt disk's mirror mate and verified degraded reads still
    matched

Yu Kuai (20):
  md: add exact bitmap mapping and reshape hooks
  md: skip bitmap accounting for empty write ranges
  md: add helper to split bios at reshape offset
  md/md-llbitmap: track bitmap sync_size explicitly
  md/md-llbitmap: allocate page controls independently
  md/md-llbitmap: grow the page cache in place for reshape
  md/md-llbitmap: track target reshape geometry fields
  md/md-llbitmap: finish reshape geometry
  md/md-llbitmap: refuse reshape while llbitmap still needs sync
  md/md-llbitmap: add reshape range mapping helpers
  md/md-llbitmap: don't skip reshape ranges from bitmap state
  md/md-llbitmap: remap checkpointed bits as reshape progresses
  md/md-llbitmap: clamp state-machine walks to tracked bits
  md/raid10: reject llbitmap reshape when md chunk shrinks
  md/raid10: wire llbitmap reshape lifecycle
  md/raid10: split reshape bios before bitmap accounting
  md/raid5: add exact old and new llbitmap mapping helpers
  md/raid5: reject llbitmap reshape when md chunk shrinks
  md/raid5: wire llbitmap reshape lifecycle
  md/raid5: split reshape bios before bitmap accounting

 drivers/md/md-bitmap.c   |   8 +
 drivers/md/md-bitmap.h   |   8 +
 drivers/md/md-llbitmap.c | 619 +++++++++++++++++++++++++++++++++++----
 drivers/md/md.c          |  60 +++-
 drivers/md/md.h          |   8 +
 drivers/md/raid10.c      |  49 ++++
 drivers/md/raid5.c       | 118 ++++++--
 7 files changed, 795 insertions(+), 75 deletions(-)


base-commit: 55b77337bdd088c77461588e5ec094421b89911b
-- 
2.51.0


* [PATCH v2 01/20] md: add exact bitmap mapping and reshape hooks
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
@ 2026-06-24  6:41   ` Yu Kuai
  2026-06-24  6:41   ` [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges Yu Kuai
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:41 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Add bitmap mapping and reshape hooks needed by llbitmap reshape
support without teaching md core to account a single bio against
multiple bitmap ranges.

This also adds the old/new bitmap geometry helpers used by
personalities to describe reshape mapping to llbitmap.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-bitmap.c   |  8 ++++++++
 drivers/md/md-bitmap.h   |  8 ++++++++
 drivers/md/md-llbitmap.c |  8 ++++++++
 drivers/md/md.c          | 12 ++++++++----
 drivers/md/md.h          |  4 ++++
 5 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 0f02e2956398..4c31807e6bcc 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1730,6 +1730,13 @@ static void bitmap_start_write(struct mddev *mddev, sector_t offset,
 	}
 }
 
+static void bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				 unsigned long *sectors)
+{
+	if (mddev->pers->bitmap_sector)
+		mddev->pers->bitmap_sector(mddev, offset, sectors);
+}
+
 static void bitmap_end_write(struct mddev *mddev, sector_t offset,
 			     unsigned long sectors)
 {
@@ -3083,6 +3090,7 @@ static struct bitmap_operations bitmap_ops = {
 	.flush			= bitmap_flush,
 	.write_all		= bitmap_write_all,
 	.dirty_bits		= bitmap_dirty_bits,
+	.prepare_range		= bitmap_prepare_range,
 	.unplug			= bitmap_unplug,
 	.daemon_work		= bitmap_daemon_work,
 
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index f46674bdfeb9..26cd07737aab 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -93,6 +93,14 @@ struct bitmap_operations {
 	void (*write_all)(struct mddev *mddev);
 	void (*dirty_bits)(struct mddev *mddev, unsigned long s,
 			   unsigned long e);
+	/* Prepare a range for this bitmap implementation. */
+	void (*prepare_range)(struct mddev *mddev,
+			      sector_t *offset,
+			      unsigned long *sectors);
+	void (*reshape_finish)(struct mddev *mddev);
+	int (*reshape_can_start)(struct mddev *mddev);
+	void (*reshape_mark)(struct mddev *mddev, sector_t old_pos,
+			     sector_t new_pos);
 	void (*unplug)(struct mddev *mddev, bool sync);
 	void (*daemon_work)(struct mddev *mddev);
 
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 5a4e2abaa757..f0c20afa354e 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1188,6 +1188,13 @@ static void llbitmap_destroy(struct mddev *mddev)
 	mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
+static void llbitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				   unsigned long *sectors)
+{
+	if (mddev->pers->bitmap_sector)
+		mddev->pers->bitmap_sector(mddev, offset, sectors);
+}
+
 static void llbitmap_start_write(struct mddev *mddev, sector_t offset,
 				 unsigned long sectors)
 {
@@ -1780,6 +1787,7 @@ static struct bitmap_operations llbitmap_ops = {
 	.update_sb		= llbitmap_update_sb,
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
+	.prepare_range		= llbitmap_prepare_range,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d1465bcd86c8..3b20a57b8c7e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9361,6 +9361,12 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 }
 EXPORT_SYMBOL_GPL(md_submit_discard_bio);
 
+static void md_bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
+				    unsigned long *sectors)
+{
+	mddev->bitmap_ops->prepare_range(mddev, offset, sectors);
+}
+
 static void md_bitmap_start(struct mddev *mddev,
 			    struct md_io_clone *md_io_clone)
 {
@@ -9368,10 +9374,8 @@ static void md_bitmap_start(struct mddev *mddev,
 			   mddev->bitmap_ops->start_discard :
 			   mddev->bitmap_ops->start_write;
 
-	if (mddev->pers->bitmap_sector)
-		mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
-					   &md_io_clone->sectors);
-
+	md_bitmap_prepare_range(mddev, &md_io_clone->offset,
+				&md_io_clone->sectors);
 	fn(mddev, md_io_clone->offset, md_io_clone->sectors);
 }
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d8daf0f75cbb..110cf0f8b107 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -798,6 +798,10 @@ struct md_personality
 	/* convert io ranges from array to bitmap */
 	void (*bitmap_sector)(struct mddev *mddev, sector_t *offset,
 			      unsigned long *sectors);
+	void (*bitmap_sector_map)(struct mddev *mddev, sector_t *offset,
+				  unsigned long *sectors, bool previous);
+	sector_t (*bitmap_sync_size)(struct mddev *mddev, bool previous);
+	sector_t (*bitmap_array_sectors)(struct mddev *mddev, bool previous);
 };
 
 struct md_sysfs_entry {
-- 
2.51.0



* [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  2026-06-24  6:41   ` [PATCH v2 01/20] md: add exact bitmap mapping and reshape hooks Yu Kuai
@ 2026-06-24  6:41   ` Yu Kuai
  2026-06-24  7:04     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 03/20] md: add helper to split bios at reshape offset Yu Kuai
                     ` (17 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:41 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

mkfs.ext4 can submit zero-sector flush/FUA bios. These bios are WRITE
bios for md_write_start() purposes, but they do not cover any data sector
and must not dirty bitmap bits.

md bitmap accounting currently passes such bios to bitmap start_write().
For llbitmap this reaches llbitmap_start_write() with sectors == 0,
which underflows the end chunk calculation.

The new bitmap prepare_range() hook can also turn a non-empty bio into an
empty bitmap range when the requested sectors are outside the active
bitmap geometry. Treat both cases as not started, so the completion path
will not call end_write() for an empty range.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3b20a57b8c7e..0b59c676f7c0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9376,6 +9376,8 @@ static void md_bitmap_start(struct mddev *mddev,
 
 	md_bitmap_prepare_range(mddev, &md_io_clone->offset,
 				&md_io_clone->sectors);
+	if (!md_io_clone->sectors)
+		return;
 	fn(mddev, md_io_clone->offset, md_io_clone->sectors);
 }
 
@@ -9396,7 +9398,8 @@ static void md_end_clone_io(struct bio *bio)
 	struct mddev *mddev = md_io_clone->mddev;
 	struct completion *reshape_completion = bio->bi_private;
 
-	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
+	if (bio_data_dir(orig_bio) == WRITE && md_io_clone->sectors &&
+	    md_bitmap_enabled(mddev, false))
 		md_bitmap_end(mddev, md_io_clone);
 
 	if (bio->bi_status && !orig_bio->bi_status)
@@ -9423,10 +9426,12 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
 	md_io_clone = container_of(clone, struct md_io_clone, bio_clone);
 	md_io_clone->orig_bio = *bio;
 	md_io_clone->mddev = mddev;
+	md_io_clone->sectors = 0;
 	if (blk_queue_io_stat(bdev->bd_disk->queue))
 		md_io_clone->start_time = bio_start_io_acct(*bio);
 
-	if (bio_data_dir(*bio) == WRITE && md_bitmap_enabled(mddev, false)) {
+	if (bio_data_dir(*bio) == WRITE && bio_sectors(*bio) &&
+	    md_bitmap_enabled(mddev, false)) {
 		md_io_clone->offset = (*bio)->bi_iter.bi_sector;
 		md_io_clone->sectors = bio_sectors(*bio);
 		md_io_clone->rw = op_stat_group(bio_op(*bio));
-- 
2.51.0



* [PATCH v2 03/20] md: add helper to split bios at reshape offset
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
  2026-06-24  6:41   ` [PATCH v2 01/20] md: add exact bitmap mapping and reshape hooks Yu Kuai
  2026-06-24  6:41   ` [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:01     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
                     ` (16 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Add mddev_bio_split_at_reshape_offset() so personalities can share
reshape-offset bio splitting instead of open-coding the same logic in
multiple places.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md.c | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  4 ++++
 2 files changed, 43 insertions(+)
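
Illustration only (not part of the patch): a userspace model of just the
length calculation in the helper below, with the bio replaced by a plain
(start, len) pair. split_len() and its parameters are hypothetical names;
max_sectors == 0 stands for "no personality limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns how many sectors of [start, start + len) go out as one piece. */
static uint32_t split_len(uint64_t start, uint32_t len, uint32_t max_sectors,
			  bool reshaping, uint64_t boundary)
{
	uint32_t split = len;
	uint64_t end = start + len;

	assert(end >= start);	/* no sector overflow in this model */

	if (max_sectors && max_sectors < split)
		split = max_sectors;

	/* Only a range that strictly straddles the boundary is cut there. */
	if (reshaping && start < boundary && end > boundary &&
	    boundary - start < split)
		split = (uint32_t)(boundary - start);

	return split;
}
```

A range that merely touches the boundary (start == boundary or
end == boundary) lies entirely on one side, so it is left to the
max_sectors limit alone.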

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0b59c676f7c0..c18a7f3c27e6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9361,6 +9361,45 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 }
 EXPORT_SYMBOL_GPL(md_submit_discard_bio);
 
+struct bio *mddev_bio_split_at_reshape_offset(struct mddev *mddev,
+					      struct bio *bio,
+					      unsigned int *max_sectors,
+					      struct bio_set *bs)
+{
+	sector_t boundary;
+	sector_t start;
+	sector_t end;
+	unsigned int split_sectors;
+
+	split_sectors = bio_sectors(bio);
+	if (max_sectors && *max_sectors && *max_sectors < split_sectors)
+		split_sectors = *max_sectors;
+
+	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+		goto split;
+
+	boundary = mddev->reshape_position;
+	start = bio->bi_iter.bi_sector;
+	end = bio_end_sector(bio);
+	if (start >= boundary || end <= boundary)
+		goto split;
+
+	if (boundary - start < split_sectors)
+		split_sectors = boundary - start;
+
+split:
+	if (max_sectors)
+		*max_sectors = split_sectors;
+	if (split_sectors < bio_sectors(bio)) {
+		bio = bio_submit_split_bioset(bio, split_sectors, bs);
+		if (bio)
+			bio->bi_opf |= REQ_NOMERGE;
+	}
+
+	return bio;
+}
+EXPORT_SYMBOL_GPL(mddev_bio_split_at_reshape_offset);
+
 static void md_bitmap_prepare_range(struct mddev *mddev, sector_t *offset,
 				    unsigned long *sectors)
 {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 110cf0f8b107..ebfc6da83161 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -925,6 +925,10 @@ extern void md_error(struct mddev *mddev, struct md_rdev *rdev);
 extern void md_finish_reshape(struct mddev *mddev);
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 			struct bio *bio, sector_t start, sector_t size);
+struct bio *mddev_bio_split_at_reshape_offset(struct mddev *mddev,
+					      struct bio *bio,
+					      unsigned int *max_sectors,
+					      struct bio_set *bs);
 void md_account_bio(struct mddev *mddev, struct bio **bio);
 
 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
-- 
2.51.0



* [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (2 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 03/20] md: add helper to split bios at reshape offset Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:02     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently Yu Kuai
                     ` (15 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Track llbitmap's own sync_size instead of always using
mddev->resync_max_sectors directly.

This is the minimal bookkeeping needed before llbitmap can track old
and new reshape geometry independently.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Reviewed-by: Su Yue <glass.su@suse.com>
---
 drivers/md/md-llbitmap.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
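
Illustration only (not part of the patch): the superblock fallback and the
chunk count derived from it, modelled in userspace. effective_sync_size()
and nr_chunks() are hypothetical names; the arithmetic follows the
llbitmap_read_sb() hunk below.

```c
#include <assert.h>
#include <stdint.h>

/* A zero sync_size in the superblock falls back to the array's size. */
static uint64_t effective_sync_size(uint64_t sb_sync_size,
				    uint64_t resync_max_sectors)
{
	return sb_sync_size ? sb_sync_size : resync_max_sectors;
}

/* Round-up division, as DIV_ROUND_UP_SECTOR_T() does for sectors. */
static uint64_t nr_chunks(uint64_t sync_size, uint64_t chunksize)
{
	/* chunksize must be a non-zero power of two */
	assert(chunksize && !(chunksize & (chunksize - 1)));
	return (sync_size + chunksize - 1) / chunksize;
}
```

Keeping the value in llbitmap itself means the chunk count no longer
changes implicitly whenever mddev->resync_max_sectors does.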

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index f0c20afa354e..e3500a1a52c0 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -287,6 +287,8 @@ struct llbitmap {
 	unsigned long chunksize;
 	/* total number of chunks */
 	unsigned long chunks;
+	/* total number of sectors tracked by current bitmap geometry */
+	sector_t sync_size;
 	unsigned long last_end_sync;
 	/*
 	 * time in seconds that dirty bits will be cleared if the page is not
@@ -918,6 +920,7 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
+	llbitmap->sync_size = blocks;
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
 	ret = llbitmap_cache_pages(llbitmap);
@@ -938,6 +941,7 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	unsigned long daemon_sleep;
 	unsigned long chunksize;
 	unsigned long events;
+	sector_t sync_size;
 	struct page *sb_page;
 	bitmap_super_t *sb;
 	int ret = -EINVAL;
@@ -987,6 +991,9 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 		goto out_put_page;
 	}
 
+	sync_size = le64_to_cpu(sb->sync_size);
+	if (!sync_size)
+		sync_size = mddev->resync_max_sectors;
 	chunksize = le32_to_cpu(sb->chunksize);
 	if (!is_power_of_2(chunksize)) {
 		pr_err("md/llbitmap: %s: chunksize not a power of 2",
@@ -1022,8 +1029,9 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 
 	llbitmap->barrier_idle = DEFAULT_BARRIER_IDLE;
 	llbitmap->chunksize = chunksize;
-	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors, chunksize);
+	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
+	llbitmap->sync_size = sync_size;
 	ret = llbitmap_cache_pages(llbitmap);
 
 out_put_page:
@@ -1153,6 +1161,7 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
+	llbitmap->sync_size = blocks;
 
 	return 0;
 }
@@ -1526,7 +1535,7 @@ static void llbitmap_update_sb(void *data)
 	sb->events = cpu_to_le64(mddev->events);
 	sb->state = cpu_to_le32(llbitmap->flags);
 	sb->chunksize = cpu_to_le32(llbitmap->chunksize);
-	sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
+	sb->sync_size = cpu_to_le64(llbitmap->sync_size);
 	sb->events_cleared = cpu_to_le64(llbitmap->events_cleared);
 	sb->sectors_reserved = cpu_to_le32(mddev->bitmap_info.space);
 	sb->daemon_sleep = cpu_to_le32(mddev->bitmap_info.daemon_sleep);
@@ -1544,6 +1553,7 @@ static int llbitmap_get_stats(void *data, struct md_bitmap_stats *stats)
 	stats->missing_pages = 0;
 	stats->pages = llbitmap->nr_pages;
 	stats->file_pages = llbitmap->nr_pages;
+	stats->sync_size = llbitmap->sync_size;
 
 	stats->behind_writes = atomic_read(&llbitmap->behind_writes);
 	stats->behind_wait = wq_has_sleeper(&llbitmap->behind_wait);
-- 
2.51.0



* [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (3 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:02     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
                     ` (14 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Allocate each llbitmap page-control object individually and free each one
the same way, instead of carving them all out of a single array allocation.

Let llbitmap_read_page() return a zeroed page without reading disk when
the page index is beyond the current bitmap size, so page-control
allocation no longer needs a separate read_existing flag.

This keeps the llbitmap page-control lifetime self-consistent and
prepares the page-cache code for later in-place growth.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
Reviewed-by: Su Yue <glass.su@suse.com>
---
 drivers/md/md-llbitmap.c | 99 +++++++++++++++++++++++++---------------
 1 file changed, 62 insertions(+), 37 deletions(-)
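
Illustration only (not part of the patch): a userspace model of the page
count formula and of the "beyond the current bitmap size" test. The
constants are assumptions (4 KiB pages and a 1024-byte BITMAP_DATA_OFFSET),
and used_pages() and page_needs_disk_read() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumed PAGE_SIZE */
#define SKETCH_DATA_OFFSET	1024UL	/* assumed BITMAP_DATA_OFFSET */

/* The formula counts one byte of state per chunk, after the data offset. */
static unsigned long used_pages(unsigned long chunks)
{
	return (chunks + SKETCH_DATA_OFFSET + SKETCH_PAGE_SIZE - 1) /
	       SKETCH_PAGE_SIZE;
}

/* Pages past the used range start out zeroed and are not read from disk. */
static bool page_needs_disk_read(unsigned long idx, unsigned long chunks)
{
	assert(SKETCH_PAGE_SIZE > SKETCH_DATA_OFFSET);
	return idx < used_pages(chunks);
}
```

Under these assumed constants the first page holds 3072 chunk states, and
every later page holds 4096.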

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index e3500a1a52c0..4c3fc4b35aff 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -512,13 +512,19 @@ static void llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
 		llbitmap_set_page_dirty(llbitmap, idx, bit, false);
 }
 
+static unsigned int llbitmap_used_pages(struct llbitmap *llbitmap,
+					unsigned long chunks)
+{
+	return DIV_ROUND_UP(chunks + BITMAP_DATA_OFFSET, PAGE_SIZE);
+}
+
 static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
 {
 	struct mddev *mddev = llbitmap->mddev;
 	struct page *page = NULL;
 	struct md_rdev *rdev;
 
-	if (llbitmap->pctl && llbitmap->pctl[idx])
+	if (llbitmap->pctl && idx < llbitmap->nr_pages && llbitmap->pctl[idx])
 		page = llbitmap->pctl[idx]->page;
 	if (page)
 		return page;
@@ -526,6 +532,8 @@ static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page)
 		return ERR_PTR(-ENOMEM);
+	if (idx >= llbitmap_used_pages(llbitmap, llbitmap->chunks))
+		return page;
 
 	rdev_for_each(rdev, mddev) {
 		sector_t sector;
@@ -596,61 +604,78 @@ static void llbitmap_free_pages(struct llbitmap *llbitmap)
 	for (i = 0; i < llbitmap->nr_pages; i++) {
 		struct llbitmap_page_ctl *pctl = llbitmap->pctl[i];
 
-		if (!pctl || !pctl->page)
-			break;
-
-		__free_page(pctl->page);
+		if (!pctl)
+			continue;
+		if (pctl->page)
+			__free_page(pctl->page);
 		percpu_ref_exit(&pctl->active);
+		kfree(pctl);
 	}
 
-	kfree(llbitmap->pctl[0]);
 	kfree(llbitmap->pctl);
 	llbitmap->pctl = NULL;
 }
 
-static int llbitmap_cache_pages(struct llbitmap *llbitmap)
+static struct llbitmap_page_ctl *
+llbitmap_alloc_page_ctl(struct llbitmap *llbitmap, int idx)
 {
 	struct llbitmap_page_ctl *pctl;
-	unsigned int nr_pages = DIV_ROUND_UP(llbitmap->chunks +
-					     BITMAP_DATA_OFFSET, PAGE_SIZE);
+	struct page *page;
 	unsigned int size = struct_size(pctl, dirty, BITS_TO_LONGS(
 						llbitmap->blocks_per_page));
-	int i;
-
-	llbitmap->pctl = kmalloc_array(nr_pages, sizeof(void *),
-				       GFP_KERNEL | __GFP_ZERO);
-	if (!llbitmap->pctl)
-		return -ENOMEM;
 
 	size = round_up(size, cache_line_size());
-	pctl = kmalloc_array(nr_pages, size, GFP_KERNEL | __GFP_ZERO);
-	if (!pctl) {
-		kfree(llbitmap->pctl);
-		return -ENOMEM;
+	pctl = kzalloc(size, GFP_KERNEL);
+	if (!pctl)
+		return ERR_PTR(-ENOMEM);
+
+	page = llbitmap_read_page(llbitmap, idx);
+
+	if (IS_ERR(page)) {
+		kfree(pctl);
+		return ERR_CAST(page);
 	}
 
-	llbitmap->nr_pages = nr_pages;
+	if (percpu_ref_init(&pctl->active, active_release,
+			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
+		__free_page(page);
+		kfree(pctl);
+		return ERR_PTR(-ENOMEM);
+	}
 
-	for (i = 0; i < nr_pages; i++, pctl = (void *)pctl + size) {
-		struct page *page = llbitmap_read_page(llbitmap, i);
+	pctl->page = page;
+	pctl->state = page_address(page);
+	init_waitqueue_head(&pctl->wait);
+	return pctl;
+}
 
-		llbitmap->pctl[i] = pctl;
+static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
+{
+	return DIV_ROUND_UP(llbitmap->mddev->bitmap_info.space << SECTOR_SHIFT,
+			    PAGE_SIZE);
+}
 
-		if (IS_ERR(page)) {
-			llbitmap_free_pages(llbitmap);
-			return PTR_ERR(page);
-		}
+static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
+{
+	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
+	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
+	int i;
+
+	llbitmap->pctl = kcalloc(nr_pages, sizeof(*llbitmap->pctl), GFP_KERNEL);
+	if (!llbitmap->pctl)
+		return -ENOMEM;
 
-		if (percpu_ref_init(&pctl->active, active_release,
-				    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
-			__free_page(page);
+	llbitmap->nr_pages = nr_pages;
+
+	for (i = 0; i < nr_pages; i++) {
+		llbitmap->pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
+		if (IS_ERR(llbitmap->pctl[i])) {
+			int ret = PTR_ERR(llbitmap->pctl[i]);
+
+			llbitmap->pctl[i] = NULL;
 			llbitmap_free_pages(llbitmap);
-			return -ENOMEM;
+			return ret;
 		}
-
-		pctl->page = page;
-		pctl->state = page_address(page);
-		init_waitqueue_head(&pctl->wait);
 	}
 
 	return 0;
@@ -923,7 +948,7 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 	llbitmap->sync_size = blocks;
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
-	ret = llbitmap_cache_pages(llbitmap);
+	ret = llbitmap_alloc_pages(llbitmap);
 	if (ret)
 		return ret;
 
@@ -1032,7 +1057,7 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->sync_size = sync_size;
-	ret = llbitmap_cache_pages(llbitmap);
+	ret = llbitmap_alloc_pages(llbitmap);
 
 out_put_page:
 	__free_page(sb_page);
-- 
2.51.0



* [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (4 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:03     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields Yu Kuai
                     ` (13 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the page-control helpers to grow llbitmap's cached pages in place
for resize and later reshape preparation, instead of rebuilding the
whole cache.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 139 +++++++++++++++++++++++++++++++++++----
 1 file changed, 127 insertions(+), 12 deletions(-)
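
Illustration only (not part of the patch): the chunksize doubling done by
llbitmap_calculate_chunks() below, modelled in userspace. struct geom and
calc_geometry() are hypothetical names, and the 9-bit shift assumes
512-byte sectors for SECTOR_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

struct geom {
	uint64_t chunksize;	/* sectors per chunk */
	uint64_t chunks;	/* number of chunks tracked */
};

/* Double chunksize until the chunk count fits the reserved space. */
static struct geom calc_geometry(uint64_t blocks, uint64_t chunksize,
				 uint64_t space_sectors)
{
	uint64_t limit = space_sectors << 9;	/* reserved space in bytes */
	struct geom g;

	assert(chunksize > 0);
	g.chunksize = chunksize;
	g.chunks = (blocks + g.chunksize - 1) / g.chunksize;
	while (g.chunks > limit) {
		g.chunksize <<= 1;
		g.chunks = (blocks + g.chunksize - 1) / g.chunksize;
	}
	return g;
}
```

Because chunksize only ever doubles, a power-of-two input stays a power of
two, which the superblock check in llbitmap_read_sb() requires.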

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 4c3fc4b35aff..98e7824c7362 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -416,6 +416,19 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 };
 
 static void __llbitmap_flush(struct mddev *mddev);
+static void llbitmap_flush(struct mddev *mddev);
+static void llbitmap_update_sb(void *data);
+
+static void llbitmap_calculate_chunks(struct mddev *mddev, sector_t blocks,
+				      unsigned long *chunksize,
+				      unsigned long *chunks)
+{
+	*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
+	while (*chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
+		*chunksize = *chunksize << 1;
+		*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
+	}
+}
 
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
@@ -655,6 +668,48 @@ static unsigned int llbitmap_reserved_pages(struct llbitmap *llbitmap)
 			    PAGE_SIZE);
 }
 
+static int llbitmap_expand_pages(struct llbitmap *llbitmap,
+				 unsigned long chunks)
+{
+	struct llbitmap_page_ctl **pctl;
+	unsigned int old_nr_pages = llbitmap->nr_pages;
+	unsigned int nr_pages = llbitmap_used_pages(llbitmap, chunks);
+	unsigned int i;
+	int ret;
+
+	if (nr_pages <= old_nr_pages)
+		return 0;
+
+	pctl = kcalloc(nr_pages, sizeof(*pctl), GFP_KERNEL);
+	if (!pctl)
+		return -ENOMEM;
+
+	if (llbitmap->pctl)
+		memcpy(pctl, llbitmap->pctl,
+		       array_size(old_nr_pages, sizeof(*pctl)));
+
+	for (i = old_nr_pages; i < nr_pages; i++) {
+		pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
+		if (IS_ERR(pctl[i]))
+			goto err_alloc_ptr;
+	}
+
+	kfree(llbitmap->pctl);
+	llbitmap->pctl = pctl;
+	llbitmap->nr_pages = nr_pages;
+	return 0;
+
+err_alloc_ptr:
+	ret = PTR_ERR(pctl[i]);
+	while (i-- > old_nr_pages) {
+		__free_page(pctl[i]->page);
+		percpu_ref_exit(&pctl[i]->active);
+		kfree(pctl[i]);
+	}
+	kfree(pctl);
+	return ret;
+}
+
 static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
 {
 	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
@@ -730,6 +785,34 @@ static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap)
 	return true;
 }
 
+static void llbitmap_mark_range(struct llbitmap *llbitmap,
+				unsigned long start,
+				unsigned long end,
+				enum llbitmap_state state)
+{
+	while (start <= end) {
+		llbitmap_write(llbitmap, state, start);
+		start++;
+	}
+}
+
+static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
+				   unsigned long old_chunks,
+				   unsigned long new_chunks,
+				   unsigned long cache_chunks)
+{
+	int ret;
+
+	llbitmap_flush(llbitmap->mddev);
+	ret = llbitmap_expand_pages(llbitmap, cache_chunks);
+	if (ret)
+		return ret;
+	if (new_chunks > old_chunks)
+		llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
+				    BitUnwritten);
+	return 0;
+}
+
 static void llbitmap_init_state(struct llbitmap *llbitmap)
 {
 	struct mddev *mddev = llbitmap->mddev;
@@ -1026,10 +1109,10 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 		goto out_put_page;
 	}
 
-	if (chunksize < DIV_ROUND_UP_SECTOR_T(mddev->resync_max_sectors,
+	if (chunksize < DIV_ROUND_UP_SECTOR_T(sync_size,
 					      mddev->bitmap_info.space << SECTOR_SHIFT)) {
 		pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu",
-		       mdname(mddev), chunksize, mddev->resync_max_sectors,
+		       mdname(mddev), chunksize, sync_size,
 		       mddev->bitmap_info.space);
 		goto out_put_page;
 	}
@@ -1171,24 +1254,56 @@ static int llbitmap_create(struct mddev *mddev)
 static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
+	sector_t old_blocks = llbitmap->sync_size;
+	unsigned long old_chunks = llbitmap->chunks;
 	unsigned long chunks;
+	unsigned long cache_chunks;
+	int ret = 0;
+	unsigned long bitmap_chunksize;
+	bool reshape;
 
 	if (chunksize == 0)
 		chunksize = llbitmap->chunksize;
 
-	/* If there is enough space, leave the chunksize unchanged. */
-	chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
-	while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
-		chunksize = chunksize << 1;
-		chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
-	}
+	bitmap_chunksize = chunksize;
+	llbitmap_calculate_chunks(mddev, blocks, &bitmap_chunksize, &chunks);
 
-	llbitmap->chunkshift = ffz(~chunksize);
-	llbitmap->chunksize = chunksize;
-	llbitmap->chunks = chunks;
-	llbitmap->sync_size = blocks;
+	reshape = mddev->delta_disks || mddev->new_level != mddev->level ||
+		mddev->new_layout != mddev->layout ||
+		mddev->new_chunk_sectors != mddev->chunk_sectors;
+	if (!reshape && bitmap_chunksize != llbitmap->chunksize)
+		return -EOPNOTSUPP;
+	if (blocks == old_blocks && chunks == llbitmap->chunks)
+		return 0;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
 
+	cache_chunks = reshape ? max(old_chunks, chunks) : chunks;
+	ret = llbitmap_prepare_resize(llbitmap, old_chunks, chunks, cache_chunks);
+	if (ret)
+		goto out;
+
+	if (reshape) {
+		llbitmap->reshape_sync_size = blocks;
+		llbitmap->reshape_chunksize = bitmap_chunksize;
+		llbitmap->reshape_chunks = chunks;
+		llbitmap->chunks = max(old_chunks, chunks);
+	} else {
+		if (blocks < old_blocks && chunks < old_chunks)
+			llbitmap_mark_range(llbitmap, chunks, old_chunks - 1,
+					    BitUnwritten);
+		mddev->bitmap_info.chunksize = bitmap_chunksize;
+		llbitmap->chunks = chunks;
+		llbitmap->sync_size = blocks;
+		llbitmap_update_sb(llbitmap);
+	}
+	__llbitmap_flush(mddev);
+	mutex_unlock(&mddev->bitmap_info.mutex);
 	return 0;
+
+out:
+	mutex_unlock(&mddev->bitmap_info.mutex);
+	return ret;
 }
 
 static int llbitmap_load(struct mddev *mddev)
-- 
2.51.0



* [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (5 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:07     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry Yu Kuai
                     ` (12 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Track llbitmap bookkeeping for the target reshape geometry while keeping
a single live bitmap instance.

Add the reshape geometry fields and a refresh helper, and update the load
and resize paths to keep the target geometry in sync.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
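
Reader's note (not part of the patch): a userspace sketch of how the target
geometry could be derived, modelled on llbitmap_calculate_chunks() and
llbitmap_refresh_reshape(). `struct geometry`, `calculate_chunks()` and
`live_chunks()` are hypothetical names used for illustration; the limit of
`space << SECTOR_SHIFT` chunks is taken from the quoted loop.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

struct geometry {
	uint64_t chunksize;	/* sectors covered by one bit */
	uint64_t chunks;	/* number of bits */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Double chunksize until the number of bits fits in the reserved space,
 * as the loop in llbitmap_calculate_chunks() does.
 */
static struct geometry calculate_chunks(uint64_t blocks, uint64_t chunksize,
					uint64_t space_sectors)
{
	struct geometry g;

	assert(chunksize != 0 && space_sectors != 0);
	g.chunksize = chunksize;
	g.chunks = div_round_up(blocks, chunksize);
	while (g.chunks > (space_sectors << SECTOR_SHIFT)) {
		g.chunksize <<= 1;
		g.chunks = div_round_up(blocks, g.chunksize);
	}
	return g;
}

/* While reshaping, the single live bitmap has to cover both geometries. */
static uint64_t live_chunks(uint64_t old_chunks, uint64_t target_chunks)
{
	return old_chunks > target_chunks ? old_chunks : target_chunks;
}
```

For example, 1048576 sectors at a chunksize of 128 need 8192 bits; with
8 sectors of space (4096 bits) the chunksize is doubled once to 256.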

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 98e7824c7362..cdced2126e97 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -289,6 +289,9 @@ struct llbitmap {
 	unsigned long chunks;
 	/* total number of sectors tracked by current bitmap geometry */
 	sector_t sync_size;
+	unsigned long reshape_chunksize;
+	unsigned long reshape_chunks;
+	sector_t reshape_sync_size;
 	unsigned long last_end_sync;
 	/*
 	 * time in seconds that dirty bits will be cleared if the page is not
@@ -430,6 +433,39 @@ static void llbitmap_calculate_chunks(struct mddev *mddev, sector_t blocks,
 	}
 }
 
+static bool llbitmap_reshaping(struct llbitmap *llbitmap)
+{
+	return llbitmap->mddev->reshape_position != MaxSector;
+}
+
+static sector_t llbitmap_personality_sync_size(struct llbitmap *llbitmap,
+					       bool previous)
+{
+	struct mddev *mddev = llbitmap->mddev;
+
+	if (!llbitmap_reshaping(llbitmap) || !mddev->private || !mddev->pers ||
+	    !mddev->pers->bitmap_sync_size)
+		return llbitmap->sync_size;
+	return mddev->pers->bitmap_sync_size(mddev, previous);
+}
+
+static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
+{
+	unsigned long old_chunks = DIV_ROUND_UP_SECTOR_T(llbitmap->sync_size,
+						 llbitmap->chunksize);
+	sector_t blocks = llbitmap_personality_sync_size(llbitmap, false);
+	unsigned long chunksize = llbitmap->chunksize;
+	unsigned long chunks = DIV_ROUND_UP_SECTOR_T(blocks, chunksize);
+
+	llbitmap->reshape_sync_size = blocks;
+	llbitmap->reshape_chunksize = chunksize;
+	llbitmap->reshape_chunks = chunks;
+	llbitmap_calculate_chunks(llbitmap->mddev, blocks,
+				  &llbitmap->reshape_chunksize,
+				  &llbitmap->reshape_chunks);
+	llbitmap->chunks = max(old_chunks, llbitmap->reshape_chunks);
+}
+
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
 	unsigned int idx;
@@ -1029,6 +1065,7 @@ static int llbitmap_init(struct llbitmap *llbitmap)
 	llbitmap->chunksize = chunksize;
 	llbitmap->chunks = chunks;
 	llbitmap->sync_size = blocks;
+	llbitmap_refresh_reshape(llbitmap);
 	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
 
 	ret = llbitmap_alloc_pages(llbitmap);
@@ -1140,6 +1177,7 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
 	llbitmap->chunks = DIV_ROUND_UP_SECTOR_T(sync_size, chunksize);
 	llbitmap->chunkshift = ffz(~chunksize);
 	llbitmap->sync_size = sync_size;
+	llbitmap_refresh_reshape(llbitmap);
 	ret = llbitmap_alloc_pages(llbitmap);
 
 out_put_page:
@@ -1295,6 +1333,7 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
 		mddev->bitmap_info.chunksize = bitmap_chunksize;
 		llbitmap->chunks = chunks;
 		llbitmap->sync_size = blocks;
+		llbitmap_refresh_reshape(llbitmap);
 		llbitmap_update_sb(llbitmap);
 	}
 	__llbitmap_flush(mddev);
-- 
2.51.0



* [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (6 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  9:06     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
                     ` (11 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Commit the staged llbitmap geometry when reshape finishes.

The reshape staging itself is handled through llbitmap_resize(), so only
the finish step remains in this patch.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
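
Reader's note (not part of the patch): the commit step can be pictured as
copying the staged fields over the live ones. The sketch below assumes a
hypothetical `struct bitmap_geo` and omits what the patch does around the
copy (quiesce, bitmap_info.mutex, flushing and the superblock update).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct llbitmap: live and staged geometry. */
struct bitmap_geo {
	uint64_t chunksize;
	uint64_t chunks;
	uint64_t sync_size;
	unsigned int chunkshift;
	uint64_t reshape_chunksize;
	uint64_t reshape_chunks;
	uint64_t reshape_sync_size;
};

/* log2 of a power-of-two chunksize; the patch uses ffz(~chunksize). */
static unsigned int chunk_shift(uint64_t chunksize)
{
	unsigned int shift = 0;

	assert(chunksize && !(chunksize & (chunksize - 1)));
	while (!(chunksize & 1)) {
		chunksize >>= 1;
		shift++;
	}
	return shift;
}

/* Make the staged (target) geometry the live one. */
static void reshape_commit(struct bitmap_geo *g)
{
	g->chunksize = g->reshape_chunksize;
	g->chunkshift = chunk_shift(g->chunksize);
	g->chunks = g->reshape_chunks;
	g->sync_size = g->reshape_sync_size;
}
```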

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index cdced2126e97..ae2e87fd5887 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1680,6 +1680,30 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
 	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
 }
 
+static void llbitmap_reshape_finish(struct mddev *mddev)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+
+	if (mddev->pers->quiesce)
+		mddev->pers->quiesce(mddev, 1);
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap_flush(mddev);
+
+	llbitmap->chunksize = llbitmap->reshape_chunksize;
+	llbitmap->chunkshift = ffz(~llbitmap->chunksize);
+	llbitmap->chunks = llbitmap->reshape_chunks;
+	llbitmap->sync_size = llbitmap->reshape_sync_size;
+	llbitmap_refresh_reshape(llbitmap);
+	mddev->bitmap_info.chunksize = llbitmap->chunksize;
+	llbitmap_update_sb(llbitmap);
+	__llbitmap_flush(mddev);
+	mutex_unlock(&mddev->bitmap_info.mutex);
+
+	if (mddev->pers->quiesce)
+		mddev->pers->quiesce(mddev, 0);
+}
+
 static void llbitmap_write_sb(struct llbitmap *llbitmap)
 {
 	int nr_blocks = DIV_ROUND_UP(BITMAP_DATA_OFFSET, llbitmap->io_size);
@@ -1977,6 +2001,7 @@ static struct bitmap_operations llbitmap_ops = {
 	.get_stats		= llbitmap_get_stats,
 	.dirty_bits		= llbitmap_dirty_bits,
 	.prepare_range		= llbitmap_prepare_range,
+	.reshape_finish		= llbitmap_reshape_finish,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
-- 
2.51.0



* [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (7 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:04     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
                     ` (10 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Reject reshape when llbitmap still contains NeedSync or Syncing bits.

This keeps reshape from starting until the current llbitmap state has
been reconciled.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
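
Reader's note (not part of the patch): the gate amounts to a linear scan over
the bit states. A minimal userspace sketch, assuming a hypothetical
five-state `enum bit_state` (the real enum llbitmap_state has more states)
and a plain array in place of llbitmap_read():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced version of enum llbitmap_state. */
enum bit_state {
	BIT_UNWRITTEN,
	BIT_CLEAN,
	BIT_DIRTY,
	BIT_NEED_SYNC,
	BIT_SYNCING,
};

/* Return -EBUSY if any bit still needs, or is undergoing, a sync. */
static int reshape_can_start(const enum bit_state *bits, size_t chunks)
{
	size_t i;

	assert(bits || !chunks);
	for (i = 0; i < chunks; i++) {
		if (bits[i] == BIT_NEED_SYNC || bits[i] == BIT_SYNCING)
			return -EBUSY;
	}
	return 0;
}
```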

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index ae2e87fd5887..11fede1fc79f 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1680,6 +1680,29 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
 	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
 }
 
+static int llbitmap_reshape_can_start(struct mddev *mddev)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+	unsigned long chunk;
+	int ret = 0;
+
+	if (!llbitmap)
+		return 0;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	for (chunk = 0; chunk < llbitmap->chunks; chunk++) {
+		enum llbitmap_state state = llbitmap_read(llbitmap, chunk);
+
+		if (state == BitNeedSync || state == BitSyncing) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+	mutex_unlock(&mddev->bitmap_info.mutex);
+
+	return ret;
+}
+
 static void llbitmap_reshape_finish(struct mddev *mddev)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
@@ -2002,6 +2025,7 @@ static struct bitmap_operations llbitmap_ops = {
 	.dirty_bits		= llbitmap_dirty_bits,
 	.prepare_range		= llbitmap_prepare_range,
 	.reshape_finish		= llbitmap_reshape_finish,
+	.reshape_can_start	= llbitmap_reshape_can_start,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
-- 
2.51.0



* [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (8 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:08     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
                     ` (9 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Teach llbitmap to choose old versus new geometry during reshape and to
encode exact bitmap ranges for the active geometry.

This is the mapping groundwork for checkpoint remapping.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 96 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 4 deletions(-)
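
Reader's note (not part of the patch): two pieces of the mapping can be shown
in isolation. `use_previous()` mirrors the return expression of
llbitmap_map_previous(), and `encode_range()` shows the outward rounding to
whole bits done by llbitmap_encode_range(). Both names are hypothetical, and
the sketch reports a bit index and bit count, whereas the patch shifts both
back by chunkshift before returning them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when a range starting at offset has not been reshaped yet and so
 * must be accounted with the old geometry. A forward reshape has already
 * converted everything below reshape_pos; a backward one, everything at
 * or above it.
 */
static bool use_previous(uint64_t offset, uint64_t reshape_pos, bool backwards)
{
	return backwards ? offset < reshape_pos : offset >= reshape_pos;
}

/* Round [offset, offset + sectors) outward to whole bitmap bits. */
static void encode_range(uint64_t offset, uint64_t sectors, uint64_t chunksize,
			 uint64_t *first_bit, uint64_t *nr_bits)
{
	assert(chunksize != 0);
	if (!sectors) {
		*first_bit = 0;
		*nr_bits = 0;
		return;
	}
	*first_bit = offset / chunksize;
	*nr_bits = (offset + sectors - 1) / chunksize - *first_bit + 1;
}
```

With a chunksize of 128, the range starting at sector 100 and spanning 200
sectors touches bits 0 through 2, so three bits are reported.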

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 11fede1fc79f..d27993cfb13d 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -9,6 +9,7 @@
 #include <linux/sched.h>
 #include <linux/list.h>
 #include <linux/file.h>
+#include <linux/math64.h>
 #include <linux/seq_file.h>
 #include <trace/events/block.h>
 
@@ -449,6 +450,16 @@ static sector_t llbitmap_personality_sync_size(struct llbitmap *llbitmap,
 	return mddev->pers->bitmap_sync_size(mddev, previous);
 }
 
+static sector_t llbitmap_logical_size(struct llbitmap *llbitmap, bool previous)
+{
+	struct mddev *mddev = llbitmap->mddev;
+
+	if (!llbitmap_reshaping(llbitmap) || !mddev->private || !mddev->pers ||
+	    !mddev->pers->bitmap_array_sectors)
+		return llbitmap_personality_sync_size(llbitmap, previous);
+	return mddev->pers->bitmap_array_sectors(mddev, previous);
+}
+
 static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
 {
 	unsigned long old_chunks = DIV_ROUND_UP_SECTOR_T(llbitmap->sync_size,
@@ -466,6 +477,52 @@ static void llbitmap_refresh_reshape(struct llbitmap *llbitmap)
 	llbitmap->chunks = max(old_chunks, llbitmap->reshape_chunks);
 }
 
+static void llbitmap_map_layout(struct llbitmap *llbitmap, sector_t *offset,
+				unsigned long *sectors, bool previous)
+{
+	sector_t limit = llbitmap_logical_size(llbitmap, previous);
+	sector_t start = *offset;
+	sector_t end = start + *sectors;
+
+	if (start >= limit) {
+		*sectors = 0;
+		return;
+	}
+	if (end > limit)
+		end = limit;
+
+	*offset = start;
+	*sectors = end - start;
+	if (!*sectors)
+		return;
+
+	if (llbitmap->mddev->pers->bitmap_sector_map)
+		llbitmap->mddev->pers->bitmap_sector_map(llbitmap->mddev, offset,
+							 sectors, previous);
+	else if (!previous && llbitmap->mddev->pers->bitmap_sector)
+		llbitmap->mddev->pers->bitmap_sector(llbitmap->mddev, offset,
+							 sectors);
+}
+
+static void llbitmap_encode_range(struct llbitmap *llbitmap, sector_t *offset,
+				  unsigned long *sectors, bool previous)
+{
+	unsigned long chunksize = previous ? llbitmap->chunksize :
+				      llbitmap->reshape_chunksize;
+	u64 start;
+	u64 end;
+
+	if (!*sectors) {
+		*offset = 0;
+		return;
+	}
+
+	start = div64_u64(*offset, chunksize);
+	end = div64_u64(*offset + *sectors - 1, chunksize);
+	*offset = (sector_t)start << llbitmap->chunkshift;
+	*sectors = (end - start + 1) << llbitmap->chunkshift;
+}
+
 static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t pos)
 {
 	unsigned int idx;
@@ -1376,11 +1433,32 @@ static void llbitmap_destroy(struct mddev *mddev)
 	mutex_unlock(&mddev->bitmap_info.mutex);
 }
 
+static bool llbitmap_map_previous(struct llbitmap *llbitmap, sector_t offset,
+				  unsigned long sectors)
+{
+	struct mddev *mddev = llbitmap->mddev;
+	sector_t boundary = mddev->reshape_position;
+
+	if (!llbitmap_reshaping(llbitmap))
+		return false;
+
+	WARN_ON_ONCE(sectors && offset < boundary && offset + sectors > boundary);
+
+	return mddev->reshape_backwards ? offset < boundary : offset >= boundary;
+}
+
 static void llbitmap_prepare_range(struct mddev *mddev, sector_t *offset,
 				   unsigned long *sectors)
 {
-	if (mddev->pers->bitmap_sector)
-		mddev->pers->bitmap_sector(mddev, offset, sectors);
+	struct llbitmap *llbitmap = mddev->bitmap;
+	bool previous;
+
+	if (!llbitmap)
+		return;
+
+	previous = llbitmap_map_previous(llbitmap, *offset, *sectors);
+	llbitmap_map_layout(llbitmap, offset, sectors, previous);
+	llbitmap_encode_range(llbitmap, offset, sectors, previous);
 }
 
 static void llbitmap_start_write(struct mddev *mddev, sector_t offset,
@@ -1549,7 +1627,11 @@ static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
-	enum llbitmap_state c = llbitmap_read(llbitmap, p);
+	enum llbitmap_state c;
+
+	if (p >= llbitmap->chunks)
+		return false;
+	c = llbitmap_read(llbitmap, p);
 
 	return c == BitClean || c == BitDirty || c == BitCleanUnwritten;
 }
@@ -1559,7 +1641,11 @@ static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
 	int blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
-	enum llbitmap_state c = llbitmap_read(llbitmap, p);
+	enum llbitmap_state c;
+
+	if (p >= llbitmap->chunks)
+		return 0;
+	c = llbitmap_read(llbitmap, p);
 
 	/* always skip unwritten blocks */
 	if (c == BitUnwritten)
@@ -1604,6 +1690,8 @@ static bool llbitmap_start_sync(struct mddev *mddev, sector_t offset,
 	 * if md_do_sync() loop more times.
 	 */
 	*blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
+	if (p >= llbitmap->chunks)
+		return false;
 	state = llbitmap_state_machine(llbitmap, p, p, BitmapActionStartsync);
 	return state == BitSyncing || state == BitSyncingUnwritten;
 }
-- 
2.51.0



* [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (9 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  6:58     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
                     ` (8 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Reshape progress is tracked by array metadata rather than llbitmap.
Do not let llbitmap skip_sync_blocks() suppress reshape ranges based on
stale bitmap state before the corresponding checkpoint is persisted.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 8 ++++++++
 1 file changed, 8 insertions(+)
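
Reader's note (not part of the patch): the effect of the new check can be
sketched as below. `skip_sync_blocks()` is a hypothetical, simplified model:
it takes the bit state and a `reshaping` flag as arguments, and it only
covers the branches visible in the quoted hunk (the remaining states are
treated as "do not skip", which is an assumption of this sketch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of enum llbitmap_state. */
enum bit_state {
	BIT_UNWRITTEN,
	BIT_CLEAN,
	BIT_DIRTY,
};

/* Number of sectors the sync thread may skip starting at offset. */
static uint64_t skip_sync_blocks(enum bit_state state, bool reshaping,
				 uint64_t offset, uint64_t chunksize)
{
	uint64_t blocks;

	assert(chunksize && !(chunksize & (chunksize - 1)));
	/* Sectors left in the chunk that contains offset. */
	blocks = chunksize - (offset & (chunksize - 1));

	/* Reshape progress lives in array metadata: never skip. */
	if (reshaping)
		return 0;
	if (state == BIT_UNWRITTEN)
		return blocks;
	return 0;	/* simplification: other states not modelled */
}
```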

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index d27993cfb13d..c62397743d9a 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1647,6 +1647,14 @@ static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
 		return 0;
 	c = llbitmap_read(llbitmap, p);
 
+	/*
+	 * Reshape progress is tracked by array metadata rather than llbitmap.
+	 * Skipping reshape ranges from stale bitmap state can lose data after a
+	 * restart before the corresponding bits are checkpointed to disk.
+	 */
+	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+		return 0;
+
 	/* always skip unwritten blocks */
 	if (c == BitUnwritten)
 		return blocks;
-- 
2.51.0



* [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (10 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:04     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
                     ` (7 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Merge checkpointed old llbitmap state forward as reshape_position
advances and record the checkpoint remap through reshape_mark().

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 172 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)
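
Reader's note (not part of the patch): the merge boils down to a precedence
rule plus a proportional mapping from a destination range back to the old
range. The sketch below is an illustration under assumptions: a reduced
five-state enum, hypothetical names `merge_state()` and `map_src()`, and
plain 64-bit arithmetic where the patch uses mul_u64_u64_div_u64() and
mul_u64_u64_div_u64_roundup(), so the products are assumed not to overflow.
In the quoted hunk both the RAID4/5/6 branch and the generic branch return
BitClean for a clean/unwritten mix, so the sketch folds them into one test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of enum llbitmap_state. */
enum bit_state {
	BIT_UNWRITTEN,
	BIT_CLEAN,
	BIT_DIRTY,
	BIT_NEED_SYNC,
	BIT_SYNCING,
};

/* Precedence: NeedSync/Syncing > Dirty > Clean > Unwritten. */
static enum bit_state merge_state(enum bit_state dst, enum bit_state src)
{
	if (dst == BIT_NEED_SYNC || dst == BIT_SYNCING ||
	    src == BIT_NEED_SYNC || src == BIT_SYNCING)
		return BIT_NEED_SYNC;
	if (dst == BIT_DIRTY || src == BIT_DIRTY)
		return BIT_DIRTY;
	if (dst == BIT_CLEAN || src == BIT_CLEAN)
		return BIT_CLEAN;
	return BIT_UNWRITTEN;
}

/* Half-open sector range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Scale the part of new_r covered by dst back into old_r: round the
 * start down and the end up so no contributing old sector is missed.
 */
static struct range map_src(struct range old_r, struct range new_r,
			    struct range dst)
{
	uint64_t old_len = old_r.end - old_r.start;
	uint64_t new_len = new_r.end - new_r.start;
	struct range src;

	assert(new_len != 0);
	src.start = old_r.start +
		(dst.start - new_r.start) * old_len / new_len;
	src.end = old_r.start +
		((dst.end - new_r.start) * old_len + new_len - 1) / new_len;
	if (src.end > old_r.end)
		src.end = old_r.end;
	return src;
}
```

For an old range of 200 sectors mapped onto a new range of 300 sectors, the
destination slice [128, 256) draws from old sectors [85, 171).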

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index c62397743d9a..00b77ecd600b 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -502,6 +502,14 @@ static void llbitmap_map_layout(struct llbitmap *llbitmap, sector_t *offset,
 	else if (!previous && llbitmap->mddev->pers->bitmap_sector)
 		llbitmap->mddev->pers->bitmap_sector(llbitmap->mddev, offset,
 							 sectors);
+
+	limit = llbitmap_personality_sync_size(llbitmap, previous);
+	start = *offset;
+	end = start + *sectors;
+	if (start >= limit)
+		*sectors = 0;
+	else if (end > limit)
+		*sectors = limit - start;
 }
 
 static void llbitmap_encode_range(struct llbitmap *llbitmap, sector_t *offset,
@@ -906,6 +914,33 @@ static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
 	return 0;
 }
 
+static enum llbitmap_state
+llbitmap_rmerge_state(struct llbitmap *llbitmap,
+		      enum llbitmap_state dst,
+		      enum llbitmap_state src)
+{
+	bool level_456 = raid_is_456(llbitmap->mddev);
+
+	if (dst == BitNeedSync || dst == BitSyncing ||
+	    src == BitNeedSync || src == BitSyncing)
+		return BitNeedSync;
+
+	if (dst == BitDirty || src == BitDirty)
+		return BitDirty;
+
+	/*
+	 * Reshape generates valid target parity/data for both already-written
+	 * and not-yet-written regions in the checkpointed range, so a mix of
+	 * clean and unwritten still results in a clean destination bit.
+	 */
+	if (level_456 && ((dst == BitClean && src == BitUnwritten) ||
+			  (src == BitClean && dst == BitUnwritten)))
+		return BitClean;
+	if (dst == BitClean || src == BitClean)
+		return BitClean;
+	return BitUnwritten;
+}
+
 static void llbitmap_init_state(struct llbitmap *llbitmap)
 {
 	struct mddev *mddev = llbitmap->mddev;
@@ -1799,6 +1834,120 @@ static int llbitmap_reshape_can_start(struct mddev *mddev)
 	return ret;
 }
 
+struct llbitmap_reshape_range {
+	sector_t offset;
+	unsigned long sectors;
+	sector_t start;
+	sector_t end;
+};
+
+static enum llbitmap_state
+llbitmap_reshape_init_dst(struct llbitmap *llbitmap, unsigned long dst,
+			  const struct llbitmap_reshape_range *new)
+{
+	u64 bit_start = (u64)dst * llbitmap->reshape_chunksize;
+	u64 bit_end = bit_start + llbitmap->reshape_chunksize;
+
+	if (!llbitmap->mddev->reshape_backwards)
+		return bit_start < new->offset ? llbitmap_read(llbitmap, dst) :
+		       BitUnwritten;
+	return bit_end > new->end ? llbitmap_read(llbitmap, dst) : BitUnwritten;
+}
+
+static void llbitmap_reshape_dst_range(struct llbitmap *llbitmap,
+				       unsigned long dst,
+				       const struct llbitmap_reshape_range *new,
+				       struct llbitmap_reshape_range *dst_range)
+{
+	sector_t dst_bit_start = (sector_t)dst * llbitmap->reshape_chunksize;
+
+	dst_range->start = max(dst_bit_start, new->offset);
+	dst_range->end = min(dst_bit_start + llbitmap->reshape_chunksize,
+			     new->end);
+	dst_range->offset = dst_range->start;
+	dst_range->sectors = dst_range->end - dst_range->start;
+}
+
+static void llbitmap_reshape_map_range(struct llbitmap *llbitmap,
+				       sector_t lo, sector_t hi,
+				       bool previous,
+				       struct llbitmap_reshape_range *range)
+{
+	range->offset = lo;
+	range->sectors = hi - lo;
+	llbitmap_map_layout(llbitmap, &range->offset, &range->sectors, previous);
+	range->start = range->offset;
+	range->end = range->offset + range->sectors;
+}
+
+static bool llbitmap_reshape_src_range(const struct llbitmap_reshape_range *old,
+				       const struct llbitmap_reshape_range *new,
+				       const struct llbitmap_reshape_range *dst,
+				       struct llbitmap_reshape_range *src)
+{
+	if (!old->sectors)
+		return false;
+
+	src->start = old->offset +
+		mul_u64_u64_div_u64(dst->start - new->offset,
+				    old->sectors, new->sectors);
+	src->end = old->offset +
+		mul_u64_u64_div_u64_roundup(dst->end - new->offset,
+					    old->sectors, new->sectors);
+	if (src->end > old->end)
+		src->end = old->end;
+	src->offset = src->start;
+	src->sectors = src->end - src->start;
+
+	return src->sectors;
+}
+
+static enum llbitmap_state llbitmap_rmerge_src(struct llbitmap *llbitmap,
+					       enum llbitmap_state state,
+					       const struct llbitmap_reshape_range *src)
+{
+	unsigned long bit = div64_u64(src->start, llbitmap->chunksize);
+	unsigned long end = div64_u64(src->end - 1, llbitmap->chunksize);
+
+	while (bit <= end) {
+		enum llbitmap_state src_state = llbitmap_read(llbitmap, bit);
+
+		state = llbitmap_rmerge_state(llbitmap, state, src_state);
+		bit++;
+	}
+
+	return state;
+}
+
+static void llbitmap_reshape_merge(struct llbitmap *llbitmap,
+				   const struct llbitmap_reshape_range *old,
+				   const struct llbitmap_reshape_range *new)
+{
+	unsigned long dst_start;
+	unsigned long dst_end;
+	unsigned long dst;
+
+	if (!new->sectors)
+		return;
+
+	dst_start = div64_u64(new->offset, llbitmap->reshape_chunksize);
+	dst_end = div64_u64(new->end - 1, llbitmap->reshape_chunksize);
+
+	for (dst = dst_start; dst <= dst_end; dst++) {
+		struct llbitmap_reshape_range dst_range;
+		struct llbitmap_reshape_range src;
+		enum llbitmap_state state;
+
+		llbitmap_reshape_dst_range(llbitmap, dst, new, &dst_range);
+		state = llbitmap_reshape_init_dst(llbitmap, dst, new);
+		if (llbitmap_reshape_src_range(old, new, &dst_range, &src))
+			state = llbitmap_rmerge_src(llbitmap, state, &src);
+		else
+			state = llbitmap_rmerge_state(llbitmap, state, BitUnwritten);
+		llbitmap_write(llbitmap, state, dst);
+	}
+}
+
 static void llbitmap_reshape_finish(struct mddev *mddev)
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
@@ -1823,6 +1972,28 @@ static void llbitmap_reshape_finish(struct mddev *mddev)
 		mddev->pers->quiesce(mddev, 0);
 }
 
+static void llbitmap_reshape_mark(struct mddev *mddev, sector_t old_pos,
+				  sector_t new_pos)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+	sector_t lo;
+	sector_t hi;
+	struct llbitmap_reshape_range old;
+	struct llbitmap_reshape_range new;
+
+	if (!llbitmap || old_pos == new_pos)
+		return;
+
+	lo = min(old_pos, new_pos);
+	hi = max(old_pos, new_pos);
+	if (!hi)
+		return;
+
+	llbitmap_reshape_map_range(llbitmap, lo, hi, true, &old);
+	llbitmap_reshape_map_range(llbitmap, lo, hi, false, &new);
+	llbitmap_reshape_merge(llbitmap, &old, &new);
+}
+
 static void llbitmap_write_sb(struct llbitmap *llbitmap)
 {
 	int nr_blocks = DIV_ROUND_UP(BITMAP_DATA_OFFSET, llbitmap->io_size);
@@ -2122,6 +2293,7 @@ static struct bitmap_operations llbitmap_ops = {
 	.prepare_range		= llbitmap_prepare_range,
 	.reshape_finish		= llbitmap_reshape_finish,
 	.reshape_can_start	= llbitmap_reshape_can_start,
+	.reshape_mark		= llbitmap_reshape_mark,
 	.write_all		= llbitmap_write_all,
 
 	.groups			= md_llbitmap_groups,
-- 
2.51.0



* [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (11 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:06     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 14/20] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
                     ` (6 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap_state_machine() can be called with an end bit at or beyond
llbitmap->chunks, i.e. past the last tracked bit. In particular,
llbitmap_cond_end_sync() passes sector >> chunkshift, and sector can
reach the tracked boundary exactly.

Clamp the state-machine range to llbitmap->chunks so it cannot walk
past the tracked bitmap.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/md-llbitmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
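
Reader's note (not part of the patch): the clamp can be sketched as a small
helper. `clamp_range()` is a hypothetical name; it returns false where the
patch returns BitNone, and otherwise trims the inclusive end to the last
tracked bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp the inclusive bit range [*start, *end] to [0, chunks - 1].
 * Returns false when the whole range lies outside the tracked bits.
 */
static bool clamp_range(const uint64_t *start, uint64_t *end, uint64_t chunks)
{
	assert(start && end);
	if (*start >= chunks)
		return false;
	if (*end >= chunks)
		*end = chunks - 1;
	return true;
}
```

With 1024 tracked bits, a walk over [0, 1024] is trimmed to [0, 1023], which
is the boundary case described in the commit message.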

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 00b77ecd600b..581ee9e4a346 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -983,7 +983,10 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
 		llbitmap_init_state(llbitmap);
 		return BitNone;
 	}
-
+	if (start >= llbitmap->chunks)
+		return BitNone;
+	if (end >= llbitmap->chunks)
+		end = llbitmap->chunks - 1;
 	while (start <= end) {
 		enum llbitmap_state c = llbitmap_read(llbitmap, start);
 
-- 
2.51.0



* [PATCH v2 14/20] md/raid10: reject llbitmap reshape when md chunk shrinks
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (12 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  6:42   ` [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap reshape keeps one live bitmap and cannot safely make an
existing bitmap bit cover a smaller data range.

The llbitmap chunksize itself will not shrink when mddev->chunk_sectors
stays the same or grows. However, shrinking mddev->chunk_sectors can
shrink the effective data range covered by each bit for the RAID10
reshape geometry. Reject that reshape while llbitmap is active.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 4 ++++
 1 file changed, 4 insertions(+)
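
As a rough illustration of the rule above, the check can be written as a
pure function. llbitmap_chunk_shrink_check() and its parameters are
hypothetical names; treating new_chunk_sectors == 0 as "no new chunk size
requested" is my reading of the non-zero test in the hunk below, not
something the commit message states.

```c
#include <errno.h>

/*
 * Hypothetical model of the rejection rule (not kernel code):
 * refuse the reshape only when llbitmap is active and a smaller,
 * non-zero chunk size has been requested.
 */
static int llbitmap_chunk_shrink_check(int llbitmap_active,
				       unsigned int chunk_sectors,
				       unsigned int new_chunk_sectors)
{
	if (llbitmap_active && new_chunk_sectors &&
	    new_chunk_sectors < chunk_sectors)
		return -EOPNOTSUPP;	/* each bit would cover less data */
	return 0;			/* same size, growth, or no llbitmap */
}
```

Growing the chunk size, or keeping it unchanged, passes the check in this
model; only a shrink with llbitmap active is refused.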

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0a3cfdd3f5df..1242b8d3bb6b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4284,6 +4284,10 @@ static int raid10_check_reshape(struct mddev *mddev)
 
 	if (conf->geo.far_copies != 1 && !conf->geo.far_offset)
 		return -EINVAL;
+	if (mddev->bitmap_id == ID_LLBITMAP &&
+	    mddev->new_chunk_sectors &&
+	    mddev->new_chunk_sectors < mddev->chunk_sectors)
+		return -EOPNOTSUPP;
 
 	if (setup_geo(&geo, mddev, geo_start) != conf->copies)
 		/* mustn't change number of copies */
-- 
2.51.0



* [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (13 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 14/20] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:22     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting Yu Kuai
                     ` (4 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Prepare llbitmap before RAID10 starts growing, checkpoint the bitmap
before advancing reshape_position, finish the llbitmap geometry update
when reshape completes, and export the old and new tracked sizes.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
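
The ordering "checkpoint the bitmap before advancing reshape_position" can
be sketched with a toy model. Everything below is hypothetical and heavily
simplified: struct reshape_model and the model_*() helpers are not kernel
symbols, the mark helper stands in for bitmap_ops->reshape_mark() followed
by unplug(), and the final update of reshape_safe is assumed to happen
immediately rather than after the metadata write completes.

```c
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for the RAID10 reshape state. */
struct reshape_model {
	uint64_t reshape_safe;		/* trailing edge: last checkpointed position */
	uint64_t reshape_progress;	/* leading edge: how far reshape has advanced */
	uint64_t reshape_position;	/* position recorded in array metadata */
	uint64_t marked_from;		/* arguments of the most recent mark call */
	uint64_t marked_to;
	int mark_calls;
};

/* Stand-in for bitmap_ops->reshape_mark() followed by unplug(). */
static void model_reshape_mark(struct reshape_model *m,
			       uint64_t from, uint64_t to)
{
	m->marked_from = from;
	m->marked_to = to;
	m->mark_calls++;
}

static void model_checkpoint(struct reshape_model *m)
{
	/* 1. Remap bitmap bits for the range finished since the last checkpoint. */
	if (m->reshape_safe != m->reshape_progress)
		model_reshape_mark(m, m->reshape_safe, m->reshape_progress);
	/* 2. Only then advance the position that the metadata will record. */
	m->reshape_position = m->reshape_progress;
	/* 3. Assumption: the safe edge catches up right away in this model. */
	m->reshape_safe = m->reshape_position;
}
```

A second checkpoint with no further progress does not call the mark helper
again, which matches the reshape_safe != reshape_progress guard in the
hunks below.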

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 1242b8d3bb6b..e50e4adee389 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4394,6 +4394,12 @@ static int raid10_start_reshape(struct mddev *mddev)
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_ops->reshape_can_start) {
+		ret = mddev->bitmap_ops->reshape_can_start(mddev);
+		if (ret)
+			return ret;
+	}
 
 	if (setup_geo(&new, mddev, geo_start) != conf->copies)
 		return -EINVAL;
@@ -4707,6 +4713,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 	    time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
 		/* Need to update reshape_position in metadata */
 		wait_barrier(conf, false);
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		if (mddev->reshape_backwards)
 			mddev->curr_resync_completed = raid10_size(mddev, 0, 0)
@@ -4905,9 +4918,19 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 
 static void end_reshape(struct r10conf *conf)
 {
+	struct mddev *mddev = conf->mddev;
+
 	if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery))
 		return;
 
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_ops->reshape_mark &&
+	    conf->reshape_safe != conf->reshape_progress) {
+		mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+					       conf->reshape_progress);
+		mddev->bitmap_ops->unplug(mddev, true);
+	}
+
 	spin_lock_irq(&conf->device_lock);
 	conf->prev = conf->geo;
 	md_finish_reshape(conf->mddev);
@@ -5039,10 +5062,15 @@ static void end_reshape_request(struct r10bio *r10_bio)
 static void raid10_finish_reshape(struct mddev *mddev)
 {
 	struct r10conf *conf = mddev->private;
+	bool llbitmap = mddev->bitmap_id == ID_LLBITMAP &&
+		md_bitmap_enabled(mddev, false);
 
 	if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
 		return;
 
+	if (llbitmap && mddev->bitmap_ops->reshape_finish)
+		mddev->bitmap_ops->reshape_finish(mddev);
+
 	if (mddev->delta_disks > 0) {
 		if (mddev->resync_offset > mddev->resync_max_sectors) {
 			mddev->resync_offset = mddev->resync_max_sectors;
@@ -5069,6 +5097,15 @@ static void raid10_finish_reshape(struct mddev *mddev)
 	mddev->reshape_backwards = 0;
 }
 
+static sector_t raid10_bitmap_sync_size(struct mddev *mddev, bool previous)
+{
+	struct r10conf *conf = mddev->private;
+
+	if (previous)
+		return raid10_size(mddev, 0, 0);
+	return raid10_size(mddev, 0, conf->geo.raid_disks);
+}
+
 static struct md_personality raid10_personality =
 {
 	.head = {
@@ -5095,6 +5132,8 @@ static struct md_personality raid10_personality =
 	.start_reshape	= raid10_start_reshape,
 	.finish_reshape	= raid10_finish_reshape,
 	.update_reshape_pos = raid10_update_reshape_pos,
+	.bitmap_sync_size = raid10_bitmap_sync_size,
+	.bitmap_array_sectors = raid10_bitmap_sync_size,
 };
 
 static int __init raid10_init(void)
-- 
2.51.0



* [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (14 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:20     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
                     ` (3 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the shared mddev_bio_split_at_reshape_offset() helper so that every bio
RAID10 accounts in llbitmap during reshape lies entirely on one side of the
reshape position.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid10.c | 6 ++++++
 1 file changed, 6 insertions(+)
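
The split length chosen by the helper can be illustrated with plain
integers. The sketch below follows the arithmetic of
mddev_bio_split_at_reshape_offset() from patch 03 of this series, but it is
only a model: sectors_before_boundary() is a hypothetical name, and the bio
is reduced to a start sector and a length.

```c
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): how many sectors of a bio that
 * starts at 'start' and spans 'nr_sectors' may be submitted as one piece.
 * 'max_sectors' == 0 means "no caller-imposed limit".
 */
static unsigned int sectors_before_boundary(uint64_t start,
					    unsigned int nr_sectors,
					    unsigned int max_sectors,
					    int reshaping,
					    uint64_t boundary)
{
	unsigned int split = nr_sectors;
	uint64_t end = start + nr_sectors;

	if (max_sectors && max_sectors < split)
		split = max_sectors;		/* honour the caller's limit */
	if (!reshaping)
		return split;			/* no reshape, no boundary */
	if (start >= boundary || end <= boundary)
		return split;			/* bio is entirely on one side */
	if (boundary - start < split)
		split = (unsigned int)(boundary - start);
	return split;				/* stop exactly at the boundary */
}
```

For a 64-sector bio starting at sector 100 with the boundary at 128, the
model returns 28, so the first piece ends exactly at the reshape position.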

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e50e4adee389..fd364a2e2fd9 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1911,6 +1911,12 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 		sectors = chunk_sects -
 			(bio->bi_iter.bi_sector &
 			 (chunk_sects - 1));
+
+	bio = mddev_bio_split_at_reshape_offset(mddev, bio, &sectors,
+						&conf->bio_split);
+	if (!bio)
+		return true;
+
 	if (!__make_request(mddev, bio, sectors))
 		md_write_end(mddev);
 
-- 
2.51.0



* [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (15 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:16     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
                     ` (2 subsequent siblings)
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Teach RAID5 to export exact old and new llbitmap mappings and the
corresponding sync and array sizes for reshape-aware bitmap users.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 70 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 18 deletions(-)
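
The difference between the old and new mapping starts with the rounding
step, which depends on the number of data disks. The sketch below models
only that step, using plain modulo arithmetic; the later translation
through raid5_compute_sector() is omitted. struct raid5_geo_model and
round_to_data_stripes() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical description of one RAID5 geometry (old or new). */
struct raid5_geo_model {
	unsigned int chunk_sectors;
	unsigned int raid_disks;
	unsigned int max_degraded;	/* 1 for RAID4/5, 2 for RAID6 */
};

/* Sectors of array data held by one full stripe across the data disks. */
static uint64_t data_stripe_sectors(const struct raid5_geo_model *g)
{
	return (uint64_t)g->chunk_sectors * (g->raid_disks - g->max_degraded);
}

/* Widen [*offset, *offset + *sectors) to whole data stripes. */
static void round_to_data_stripes(const struct raid5_geo_model *g,
				  uint64_t *offset, uint64_t *sectors)
{
	uint64_t width = data_stripe_sectors(g);
	uint64_t start = *offset - *offset % width;	/* round down */
	uint64_t end = *offset + *sectors;

	if (end % width)
		end += width - end % width;		/* round up */
	*offset = start;
	*sectors = end - start;
}
```

With 8-sector chunks, the range [20, 30) widens to [16, 32) on a 3-disk
array and to [0, 48) on a 4-disk array, which shows why the same logical
range needs separate old and new mappings during reshape.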

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0c5c9fb0606e..1613a42cc25d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5942,25 +5942,43 @@ static enum reshape_loc get_reshape_loc(struct mddev *mddev,
 	return LOC_BEHIND_RESHAPE;
 }
 
-static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
-				unsigned long *sectors)
+static void raid5_bitmap_sector_map(struct mddev *mddev, sector_t *offset,
+				    unsigned long *sectors,
+				    bool previous)
 {
 	struct r5conf *conf = mddev->private;
 	sector_t start = *offset;
 	sector_t end = start + *sectors;
-	sector_t prev_start = start;
-	sector_t prev_end = end;
 	int sectors_per_chunk;
-	enum reshape_loc loc;
 	int dd_idx;
 
-	sectors_per_chunk = conf->chunk_sectors *
-		(conf->raid_disks - conf->max_degraded);
+	if (previous)
+		sectors_per_chunk = conf->prev_chunk_sectors *
+			(conf->previous_raid_disks - conf->max_degraded);
+	else
+		sectors_per_chunk = conf->chunk_sectors *
+			(conf->raid_disks - conf->max_degraded);
 	start = round_down(start, sectors_per_chunk);
 	end = round_up(end, sectors_per_chunk);
 
-	start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL);
-	end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL);
+	start = raid5_compute_sector(conf, start, previous, &dd_idx, NULL);
+	end = raid5_compute_sector(conf, end, previous, &dd_idx, NULL);
+	*offset = start;
+	*sectors = end - start;
+}
+
+static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
+				unsigned long *sectors)
+{
+	struct r5conf *conf = mddev->private;
+	sector_t start = *offset;
+	sector_t end = start + *sectors;
+	sector_t prev_start = start;
+	unsigned long prev_sectors = end - start;
+	enum reshape_loc loc;
+
+	raid5_bitmap_sector_map(mddev, &start, sectors, false);
+	end = start + *sectors;
 
 	/*
 	 * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make
@@ -5969,17 +5987,10 @@ static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
 	loc = get_reshape_loc(mddev, conf, prev_start);
 	if (likely(loc != LOC_AHEAD_OF_RESHAPE)) {
 		*offset = start;
-		*sectors = end - start;
 		return;
 	}
 
-	sectors_per_chunk = conf->prev_chunk_sectors *
-		(conf->previous_raid_disks - conf->max_degraded);
-	prev_start = round_down(prev_start, sectors_per_chunk);
-	prev_end = round_down(prev_end, sectors_per_chunk);
-
-	prev_start = raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL);
-	prev_end = raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL);
+	raid5_bitmap_sector_map(mddev, &prev_start, &prev_sectors, true);
 
 	/*
 	 * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO
@@ -5987,7 +5998,7 @@ static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
 	 * we set bits for both.
 	 */
 	*offset = min(start, prev_start);
-	*sectors = max(end, prev_end) - *offset;
+	*sectors = max(end, prev_start + prev_sectors) - *offset;
 }
 
 static enum stripe_result make_stripe_request(struct mddev *mddev,
@@ -9049,6 +9060,20 @@ static void raid5_prepare_suspend(struct mddev *mddev)
 	wake_up(&conf->wait_for_reshape);
 }
 
+static sector_t raid5_bitmap_sync_size(struct mddev *mddev, bool previous)
+{
+	return mddev->dev_sectors;
+}
+
+static sector_t raid5_bitmap_array_sectors(struct mddev *mddev, bool previous)
+{
+	struct r5conf *conf = mddev->private;
+
+	if (previous)
+		return raid5_size(mddev, 0, 0);
+	return raid5_size(mddev, mddev->dev_sectors, conf->raid_disks);
+}
+
 static struct md_personality raid6_personality =
 {
 	.head = {
@@ -9078,6 +9103,9 @@ static struct md_personality raid6_personality =
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 static struct md_personality raid5_personality =
 {
@@ -9108,6 +9136,9 @@ static struct md_personality raid5_personality =
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 
 static struct md_personality raid4_personality =
@@ -9139,6 +9170,9 @@ static struct md_personality raid4_personality =
 	.change_consistency_policy = raid5_change_consistency_policy,
 	.prepare_suspend = raid5_prepare_suspend,
 	.bitmap_sector	= raid5_bitmap_sector,
+	.bitmap_sector_map = raid5_bitmap_sector_map,
+	.bitmap_sync_size = raid5_bitmap_sync_size,
+	.bitmap_array_sectors = raid5_bitmap_array_sectors,
 };
 
 static int __init raid5_init(void)
-- 
2.51.0



* [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (16 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:24     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
  2026-06-24  6:42   ` [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting Yu Kuai
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

llbitmap reshape keeps one live bitmap and cannot safely make an
existing bitmap bit cover a smaller data range.

The llbitmap chunksize itself will not shrink when mddev->chunk_sectors
stays the same or grows. However, shrinking mddev->chunk_sectors shrinks
sectors_per_chunk used by raid5_bitmap_sector_map(). That can shrink the
effective data range covered by each bit across the old and new RAID5
geometry. Reject that reshape while llbitmap is active.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1613a42cc25d..9de648f75dd0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8506,6 +8506,9 @@ static int check_reshape(struct mddev *mddev)
 	if (!check_stripe_cache(mddev))
 		return -ENOSPC;
 
+	if (mddev->bitmap_id == ID_LLBITMAP &&
+	    mddev->new_chunk_sectors < mddev->chunk_sectors)
+		return -EOPNOTSUPP;
 	if (mddev->new_chunk_sectors > mddev->chunk_sectors ||
 	    mddev->delta_disks > 0)
 		if (resize_chunks(conf,
-- 
2.51.0



* [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (17 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:20     ` sashiko-bot
  2026-06-24  6:42   ` [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting Yu Kuai
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Prepare llbitmap before RAID5 reshape starts, checkpoint the bitmap
before advancing reshape_position, and finish the llbitmap geometry
update when reshape completes.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9de648f75dd0..ac7ea483502f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6430,6 +6430,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 			   || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
 		if (atomic_read(&conf->reshape_stripes) != 0)
 			return 0;
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		mddev->curr_resync_completed = sector_nr;
 		if (!mddev->reshape_backwards)
@@ -6539,6 +6546,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 			   || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
 		if (atomic_read(&conf->reshape_stripes) != 0)
 			goto ret;
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
 		mddev->reshape_position = conf->reshape_progress;
 		mddev->curr_resync_completed = sector_nr;
 		if (!mddev->reshape_backwards)
@@ -8571,6 +8585,12 @@ static int raid5_start_reshape(struct mddev *mddev)
 			mdname(mddev));
 		return -EINVAL;
 	}
+	if (md_bitmap_enabled(mddev, false) &&
+	    mddev->bitmap_id == ID_LLBITMAP) {
+		i = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors, 0);
+		if (i)
+			return i;
+	}
 
 	atomic_set(&conf->reshape_stripes, 0);
 	spin_lock_irq(&conf->device_lock);
@@ -8655,10 +8675,19 @@ static int raid5_start_reshape(struct mddev *mddev)
  */
 static void end_reshape(struct r5conf *conf)
 {
+	struct mddev *mddev = conf->mddev;
 
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
 		struct md_rdev *rdev;
 
+		if (md_bitmap_enabled(mddev, false) &&
+		    mddev->bitmap_ops->reshape_mark &&
+		    conf->reshape_safe != conf->reshape_progress) {
+			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
+						       conf->reshape_progress);
+			mddev->bitmap_ops->unplug(mddev, true);
+		}
+
 		spin_lock_irq(&conf->device_lock);
 		conf->previous_raid_disks = conf->raid_disks;
 		md_finish_reshape(conf->mddev);
@@ -8685,8 +8714,16 @@ static void raid5_finish_reshape(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 	struct md_rdev *rdev;
+	bool llbitmap = mddev->bitmap_id == ID_LLBITMAP &&
+		md_bitmap_enabled(mddev, false);
 
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+		if (llbitmap && mddev->bitmap_ops->reshape_finish)
+			mddev->bitmap_ops->reshape_finish(mddev);
+		if (llbitmap) {
+			mddev->resync_offset = 0;
+			mddev->resync_max_sectors = mddev->dev_sectors;
+		}
 
 		if (mddev->delta_disks <= 0) {
 			int d;
-- 
2.51.0



* [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting
  2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
                     ` (18 preceding siblings ...)
  2026-06-24  6:42   ` [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-24  6:42   ` Yu Kuai
  2026-06-24  7:29     ` sashiko-bot
  19 siblings, 1 reply; 66+ messages in thread
From: Yu Kuai @ 2026-06-24  6:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: Li Nan, Xiao Ni, Su Yue, linux-raid, linux-kernel

From: Yu Kuai <yukuai@fygo.io>

Use the shared mddev_bio_split_at_reshape_offset() helper so that every bio
RAID5 accounts in llbitmap during reshape lies entirely on one side of the
reshape position.

Signed-off-by: Yu Kuai <yukuai@fygo.io>
---
 drivers/md/raid5.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ac7ea483502f..fb346a3c4aa3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6181,6 +6181,14 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		return true;
 	}
 
+	bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
+					       &conf->bio_split);
+	if (!bi) {
+		if (rw == WRITE)
+			md_write_end(mddev);
+		return true;
+	}
+
 	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
 	bi->bi_next = NULL;
 
-- 
2.51.0



* Re: [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state
  2026-06-24  6:42   ` [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
@ 2026-06-24  6:58     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  6:58 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Use-After-Free (UAF) of `llbitmap` memory during dynamic bitmap unregistration.
- [High] Signed integer overflow in `llbitmap_skip_sync_blocks` can cause premature resync termination.
--

commit 008d5398fa7aa1473594e503af861f03ef4f2548
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: don't skip reshape ranges from bitmap state

Reshape progress is tracked by array metadata rather than llbitmap.
Do not let llbitmap skip_sync_blocks() suppress reshape ranges based on
stale bitmap state before the corresponding checkpoint is persisted.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index d27993cfb13da..c62397743d9a0 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -1647,6 +1647,14 @@ static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
>  		return 0;
>  	c = llbitmap_read(llbitmap, p);
>  
> +	/*
> +	 * Reshape progress is tracked by array metadata rather than llbitmap.
> +	 * Skipping reshape ranges from stale bitmap state can lose data after a
> +	 * restart before the corresponding bits are checkpointed to disk.
> +	 */
> +	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
> +		return 0;
> +
>  	/* always skip unwritten blocks */
>  	if (c == BitUnwritten)
>  		return blocks;

[Severity: High]
This is a pre-existing issue, but since we are modifying this function, is
there a risk of signed integer overflow when returning blocks here?

If the array is extremely large and reserved bitmap space is small,
llbitmap->chunksize can be doubled during initialization until it exceeds
INT_MAX.

Because blocks is calculated earlier in this function as an int:

    int blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));

If chunksize exceeds INT_MAX, blocks could become negative.

When this negative value is implicitly converted to the sector_t (unsigned
64-bit) return type, it becomes a massive positive number.

Could this cause md_do_sync() to jump to the end of the array and
prematurely abort the synchronization?

Would it be safer to change the type of blocks to unsigned long or sector_t?
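
The conversion in question can be reproduced in user space. The sketch
below is only an illustration: sector_model_t stands in for sector_t, and
both helper names are hypothetical. It feeds in an already negative int
rather than reproducing the truncation itself, so it relies only on the
well-defined conversion from a negative signed value to an unsigned type.

```c
#include <stdint.h>

typedef uint64_t sector_model_t;	/* stand-in for sector_t */

/* What happens when a negative int is returned as an unsigned 64-bit value. */
static sector_model_t widen_int_blocks(int blocks)
{
	return (sector_model_t)blocks;	/* negative values wrap modulo 2^64 */
}

/* The same distance computed entirely in 64-bit unsigned arithmetic. */
static sector_model_t blocks_to_chunk_end(uint64_t chunksize, uint64_t offset)
{
	/* chunksize is assumed to be a power of two, as in the quoted code. */
	return chunksize - (offset & (chunksize - 1));
}
```

A blocks value of -8 becomes UINT64_MAX - 7 after the conversion, while the
64-bit helper returns the expected distance even for a chunksize of 2^31 or
2^32 sectors.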


[Severity: Critical]
This isn't a bug introduced by this patch, but looking at the teardown
sequence elsewhere in this file for dynamic bitmap unregistration, does
llbitmap_destroy() have a use-after-free vulnerability?

When clearing the bitmap at runtime, llbitmap_destroy() deletes the timer
and flushes the workqueues without setting the LLPageFlush flag:

    timer_delete_sync(&llbitmap->pending_timer);
    flush_workqueue(md_llbitmap_io_wq);
    flush_workqueue(md_llbitmap_unplug_wq);
    
    mddev->bitmap = NULL;
    llbitmap_free_pages(llbitmap);
    kfree(llbitmap);

If the daemon worker (md_llbitmap_daemon_fn) is running on the workqueue,
timer_delete_sync() might do nothing if the timer already fired. The
flush_workqueue() call then waits for the worker to finish.

Inside the worker, because LLPageFlush is not set and idle pages have an
expire of LONG_MAX, it sets restart = true and re-arms the timer:

    if (!test_bit(LLPageFlush, &pctl->flags) &&
        time_before(jiffies, pctl->expire)) {
        restart = true;
        continue;
    }
    ...
    if (restart)
        mod_timer(&llbitmap->pending_timer, ...);

Once the worker finishes, flush_workqueue() returns and llbitmap is freed.
Does this leave the timer armed on freed memory, potentially causing memory
corruption when it fires?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=11


* Re: [PATCH v2 03/20] md: add helper to split bios at reshape offset
  2026-06-24  6:42   ` [PATCH v2 03/20] md: add helper to split bios at reshape offset Yu Kuai
@ 2026-06-24  7:01     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:01 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] Lockless read of 64-bit `mddev->reshape_position` risks torn reads on 32-bit architectures and TOCTOU races between bio splitting and bitmap accounting, leading to silent data corruption.
--

commit 4912480eb23a69a6634d083e53f12568752eafb0
Author: Yu Kuai <yukuai@fygo.io>

md: add helper to split bios at reshape offset

This commit adds the mddev_bio_split_at_reshape_offset() helper function.
It allows MD personalities to share the reshape-offset bio splitting logic
instead of open-coding it in multiple places.

> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0b59c676f7c03..c18a7f3c27e65 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
[ ... ]
> @@ -9361,6 +9361,45 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
>  }
>  EXPORT_SYMBOL_GPL(md_submit_discard_bio);
>  
> +struct bio *mddev_bio_split_at_reshape_offset(struct mddev *mddev,
> +					      struct bio *bio,
> +					      unsigned int *max_sectors,
> +					      struct bio_set *bs)
> +{
> +	sector_t boundary;
> +	sector_t start;
> +	sector_t end;
> +	unsigned int split_sectors;
> +
> +	split_sectors = bio_sectors(bio);
> +	if (max_sectors && *max_sectors && *max_sectors < split_sectors)
> +		split_sectors = *max_sectors;
> +
> +	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
> +		goto split;
> +
> +	boundary = mddev->reshape_position;

[Severity: Critical]
Can this lockless read of mddev->reshape_position cause a torn read on
32-bit architectures since it is a 64-bit value?

Additionally, does reading this without locking introduce a time-of-check to
time-of-use race with bitmap accounting? If mddev->reshape_position
is updated by the reshape thread after this split check, but before bitmap
accounting in llbitmap_map_previous(), an unsplit bio could end up crossing
the new boundary.

This could trigger the WARN_ON_ONCE in llbitmap_map_previous():

    sector_t boundary = mddev->reshape_position;

    if (!llbitmap_reshaping(llbitmap))
        return false;

    WARN_ON_ONCE(sectors && offset < boundary && offset + sectors > boundary);

Could this race result in a bio being mis-accounted to the wrong array
geometry, potentially causing data corruption if a crash occurs during a
reshape?

> +	start = bio->bi_iter.bi_sector;
> +	end = bio_end_sector(bio);
> +	if (start >= boundary || end <= boundary)
> +		goto split;
> +
> +	if (boundary - start < split_sectors)
> +		split_sectors = boundary - start;
> +
> +split:
> +	if (max_sectors)
> +		*max_sectors = split_sectors;
> +	if (split_sectors < bio_sectors(bio)) {
> +		bio = bio_submit_split_bioset(bio, split_sectors, bs);
> +		if (bio)
> +			bio->bi_opf |= REQ_NOMERGE;
> +	}
> +
> +	return bio;
> +}
> +EXPORT_SYMBOL_GPL(mddev_bio_split_at_reshape_offset);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=3


* Re: [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently
  2026-06-24  6:42   ` [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently Yu Kuai
@ 2026-06-24  7:02     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:02 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Eager allocation of physical memory and disk I/O for the entire unused reserved bitmap space causes severe memory consumption and I/O stall regressions.

Pre-existing issues:
- [Critical] The bitmap space validation omits BITMAP_DATA_OFFSET, potentially allowing the bitmap to overflow its physical bounds and corrupt user data.
--

commit 880054f58bccd2442566ac1e584d6d9c8fc01e7e
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: allocate page controls independently

Allocate one llbitmap page-control object at a time and free each
object through the same model.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index e3500a1a52c07..4c3fc4b35aff0 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c

[ ... ]

> +static int llbitmap_alloc_pages(struct llbitmap *llbitmap)
> +{
> +	unsigned int used_pages = llbitmap_used_pages(llbitmap, llbitmap->chunks);
> +	unsigned int nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));
> +	int i;

[Severity: High]
Does this unconditionally eager-allocate memory and write out the entire
reserved space?

By setting nr_pages to the larger of the used and reserved page counts:

    nr_pages = max(used_pages, llbitmap_reserved_pages(llbitmap));

and unconditionally allocating a physical 4KB page for every index up to
this limit in llbitmap_read_page(), this seems to allocate RAM for all
reserved, unused pages.

Furthermore, will functions like __llbitmap_flush() now iterate over all 
these allocated nr_pages, marking them dirty and synchronously writing them 
to disk, potentially causing severe I/O stalls for arrays with large 
reserved bitmap spaces?

> +
> +	llbitmap->pctl = kcalloc(nr_pages, sizeof(*llbitmap->pctl), GFP_KERNEL);
> +	if (!llbitmap->pctl)
> +		return -ENOMEM;

[ ... ]

> @@ -923,7 +948,7 @@ static int llbitmap_init(struct llbitmap *llbitmap)
>  	llbitmap->sync_size = blocks;
>  	mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
>  
> -	ret = llbitmap_cache_pages(llbitmap);
> +	ret = llbitmap_alloc_pages(llbitmap);

[Severity: Critical]
This is a pre-existing issue, but does the bitmap space validation here
omit BITMAP_DATA_OFFSET?

In llbitmap_init() and llbitmap_read_sb(), the validation checks ensure
the total bitmap size (chunks) doesn't exceed the reserved byte length
(space). However, they don't seem to account for the 1024-byte
BITMAP_DATA_OFFSET that is prepended to the bitmap data on disk.

If chunks is close to or exactly equal to space, the total byte size
required (chunks + BITMAP_DATA_OFFSET) could exceed the physical bounds
of the reserved space. During flushes, could llbitmap_write_page() write
across the reserved bitmap boundary and overwrite adjacent user data?
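
One way to picture the missing term is the sketch below. It is a userspace
model only: the sketch_ names and constants are hypothetical, and it assumes
one byte of bitmap data per chunk and the 1024-byte data offset mentioned
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration; 1024 is the offset cited above. */
#define SKETCH_BITMAP_DATA_OFFSET 1024u
#define SKETCH_SECTOR_SHIFT 9

/*
 * Return true when `chunks` bytes of bitmap data plus the data offset fit
 * inside `space_sectors` sectors of reserved space.
 */
static bool sketch_bitmap_fits(uint64_t chunks, uint64_t space_sectors)
{
	uint64_t space_bytes = space_sectors << SKETCH_SECTOR_SHIFT;

	/* The offset alone must fit before any data can. */
	if (space_bytes < SKETCH_BITMAP_DATA_OFFSET)
		return false;
	return chunks <= space_bytes - SKETCH_BITMAP_DATA_OFFSET;
}
```

With 8 reserved sectors (4096 bytes), this accepts up to 3072 chunks and
rejects 4096, whereas a check of chunks against space alone would accept both.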

>  	if (ret)
>  		return ret;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=5

* Re: [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly
  2026-06-24  6:42   ` [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
@ 2026-06-24  7:02     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:02 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Data race on `llbitmap->sync_size` between `llbitmap_resize` and `llbitmap_update_sb` can result in a corrupted superblock.

Pre-existing issues:
- [Critical] Improper teardown order in `llbitmap_destroy()` allows the asynchronous workqueue to unsafely re-arm the timer, leading to a Use-After-Free.
--

commit 4f14b77c13988c93db9a3db0e4720c69a8ac613c
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: track bitmap sync_size explicitly

Track llbitmap's own sync_size instead of always using
mddev->resync_max_sectors directly. This is the minimal bookkeeping
needed before llbitmap can track old and new reshape geometry
independently.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index f0c20afa354eb..e3500a1a52c07 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c

[ ... ]

> @@ -1153,6 +1161,7 @@ static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
>  	llbitmap->chunkshift = ffz(~chunksize);
>  	llbitmap->chunksize = chunksize;
>  	llbitmap->chunks = chunks;
> +	llbitmap->sync_size = blocks;
>  
>  	return 0;
>  }

[ ... ]

> @@ -1526,7 +1535,7 @@ static void llbitmap_update_sb(void *data)
>  	sb->events = cpu_to_le64(mddev->events);
>  	sb->state = cpu_to_le32(llbitmap->flags);
>  	sb->chunksize = cpu_to_le32(llbitmap->chunksize);
> -	sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
> +	sb->sync_size = cpu_to_le64(llbitmap->sync_size);

[Severity: High]
Can this read of llbitmap->sync_size cause a torn read on 32-bit systems?

While llbitmap_resize() updates this 64-bit value while holding
mddev->bitmap_info.mutex and reconfig_mutex, md_update_sb() calls
llbitmap_update_sb() without holding bitmap_info.mutex.

If a torn read occurs during concurrent array resize operations, could the
superblock be written with a corrupted sync_size? This might lead to array
assembly failures or out-of-bounds accesses during subsequent operations.
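
To make the torn-read concern concrete, the sketch below models a 64-bit load
done as two 32-bit halves with an update landing in between. It is a
deterministic userspace illustration with a hypothetical helper name, not the
kernel code path.

```c
#include <stdint.h>

/*
 * Model a 64-bit load performed as two 32-bit loads, where a writer
 * replaces old_val with new_val between the two halves.
 */
static uint64_t sketch_torn_read(uint64_t old_val, uint64_t new_val)
{
	uint32_t lo = (uint32_t)old_val;         /* low half read before the update */
	uint32_t hi = (uint32_t)(new_val >> 32); /* high half read after the update */

	return ((uint64_t)hi << 32) | lo;
}
```

Growing sync_size from 0xFFFFFFFF to 0x100000000 could then be observed as
0x1FFFFFFFF, a value that was never stored.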

[ ... ]

[Severity: Critical]
This is a pre-existing issue, but does the teardown sequence in
llbitmap_destroy() have a use-after-free risk for the timer?

Looking at the current teardown order in llbitmap_destroy():

llbitmap_destroy() {
        ...
        timer_delete_sync(&llbitmap->pending_timer);
        flush_workqueue(md_llbitmap_io_wq);
        ...
}

If md_llbitmap_daemon_fn() is running during the flush_workqueue() call,
it might attempt to re-arm the timer:

md_llbitmap_daemon_fn() {
        ...
        if (restart)
                mod_timer(&llbitmap->pending_timer, jiffies + ...);
}

Could the asynchronous workqueue unsafely re-arm the timer here, allowing
it to resurrect after timer_delete_sync() and execute after the llbitmap
structure has been freed?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=4

* Re: [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape
  2026-06-24  6:42   ` [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
@ 2026-06-24  7:03     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:03 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The `LLPageFlush` bit on existing cached pages is set during `llbitmap_flush()` but never cleared, permanently bypassing the barrier idle expiration timer and causing severe metadata write amplification.
- [High] Reallocating `llbitmap->pctl` locklessly in place causes a Use-After-Free for concurrent I/O paths and the background daemon.

Pre-existing issues:
- [High] An integer overflow in chunk size calculation on 32-bit systems causes a fatal divide-by-zero for extremely large arrays.
--

commit 06efa66cdd24523c6bea3b2de073842d179f3506
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: grow the page cache in place for reshape

Use the page-control helpers to grow llbitmap's cached pages in place
for resize and later reshape preparation, instead of rebuilding the
whole cache.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index 4c3fc4b35aff0..98e7824c7362e 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -416,6 +416,19 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
>  };
>  
>  static void __llbitmap_flush(struct mddev *mddev);
> +static void llbitmap_flush(struct mddev *mddev);
> +static void llbitmap_update_sb(void *data);
> +
> +static void llbitmap_calculate_chunks(struct mddev *mddev, sector_t blocks,
> +				      unsigned long *chunksize,
> +				      unsigned long *chunks)
> +{
> +	*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);
> +	while (*chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
> +		*chunksize = *chunksize << 1;
> +		*chunks = DIV_ROUND_UP_SECTOR_T(blocks, *chunksize);

[Severity: High]
This isn't a bug introduced by this patch, but since `chunksize` is an
`unsigned long` (32 bits on 32-bit architectures) and `blocks` is a 64-bit
`sector_t`, could the left shift push `*chunksize` past 31 bits and overflow
it to 0 for extremely large arrays?

If `*chunksize` becomes 0, does this cause a divide-by-zero panic in the
`DIV_ROUND_UP_SECTOR_T()` macro?
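
A guarded version of the doubling loop might look like the sketch below. It
is a userspace model: uint32_t stands in for unsigned long on a 32-bit
architecture, and the function name, parameters and error convention are
hypothetical.

```c
#include <stdint.h>

/*
 * Double *chunksize until the chunk count fits in max_chunks.
 * Returns 0 on success, -1 if the next doubling would wrap chunksize.
 */
static int sketch_calculate_chunks(uint64_t blocks, uint64_t max_chunks,
				   uint32_t *chunksize, uint64_t *chunks)
{
	if (*chunksize == 0)
		return -1;

	/* Round-up division written to avoid overflowing blocks + chunksize. */
	*chunks = blocks / *chunksize + (blocks % *chunksize != 0);
	while (*chunks > max_chunks) {
		if (*chunksize > UINT32_MAX / 2)
			return -1; /* the shift below would wrap to 0 */
		*chunksize <<= 1;
		*chunks = blocks / *chunksize + (blocks % *chunksize != 0);
	}
	return 0;
}
```

The check before the shift turns the potential divide-by-zero into an error
the caller can report.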

> +	}
> +}

[ ... ]

> +static int llbitmap_expand_pages(struct llbitmap *llbitmap,
> +				 unsigned long chunks)
> +{
> +	struct llbitmap_page_ctl **pctl;
> +	unsigned int old_nr_pages = llbitmap->nr_pages;
> +	unsigned int nr_pages = llbitmap_used_pages(llbitmap, chunks);
> +	unsigned int i;
> +	int ret;
> +
> +	if (nr_pages <= old_nr_pages)
> +		return 0;
> +
> +	pctl = kcalloc(nr_pages, sizeof(*pctl), GFP_KERNEL);
> +	if (!pctl)
> +		return -ENOMEM;
> +
> +	if (llbitmap->pctl)
> +		memcpy(pctl, llbitmap->pctl,
> +		       array_size(old_nr_pages, sizeof(*pctl)));
> +
> +	for (i = old_nr_pages; i < nr_pages; i++) {
> +		pctl[i] = llbitmap_alloc_page_ctl(llbitmap, i);
> +		if (IS_ERR(pctl[i]))
> +			goto err_alloc_ptr;
> +	}
> +
> +	kfree(llbitmap->pctl);
> +	llbitmap->pctl = pctl;
> +	llbitmap->nr_pages = nr_pages;

[Severity: High]
Is it safe to reallocate and update `llbitmap->pctl` and `nr_pages` locklessly
in place like this while the array is processing I/O?

`llbitmap_resize()` holds only `mddev->bitmap_info.mutex`, but concurrent
lockless readers like `md_llbitmap_daemon_fn()` and the I/O fastpath
`llbitmap_start_write()` dereference `llbitmap->pctl[idx]`.

Could this lead to a Use-After-Free if a concurrent reader accesses the old
`llbitmap->pctl` array just as it gets freed?

> +	return 0;
> +
> +err_alloc_ptr:
> +	ret = PTR_ERR(pctl[i]);

[ ... ]

> +static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
> +				   unsigned long old_chunks,
> +				   unsigned long new_chunks,
> +				   unsigned long cache_chunks)
> +{
> +	int ret;
> +
> +	llbitmap_flush(llbitmap->mddev);
> +	ret = llbitmap_expand_pages(llbitmap, cache_chunks);

[Severity: High]
Does this sequence cause the daemon thread to bypass the barrier idle timer and
create severe metadata write amplification?

`llbitmap_flush()` sets the `LLPageFlush` bit on all existing pages. Since
`llbitmap_expand_pages()` reuses the existing `pctl` structures, the active
pages are retained with the `LLPageFlush` bit still set.

Since this bit is never cleared, the check in `md_llbitmap_daemon_fn()`:

	if (!test_bit(LLPageFlush, &pctl->flags) &&
	    time_before(jiffies, pctl->expire)) {
		restart = true;
		continue;
	}

can never be true for these pages, forcing the daemon to flush them
immediately on every run and skip re-arming the pending timer.

> +	if (ret)
> +		return ret;
> +	if (new_chunks > old_chunks)
> +		llbitmap_mark_range(llbitmap, old_chunks, new_chunks - 1,
> +				    BitUnwritten);
> +	return 0;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=6

* Re: [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges
  2026-06-24  6:41   ` [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges Yu Kuai
@ 2026-06-24  7:04     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:04 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Using `md_io_clone->start_time == 0` to check whether IO accounting was started is unsafe because `jiffies` can legitimately be exactly 0, permanently leaking the device's `in_flight` counter.
--

commit ff695222c639e0740f645b5e4a15298c468511f4
Author: Yu Kuai <yukuai@fygo.io>

md: skip bitmap accounting for empty write ranges

This commit updates md bitmap accounting to handle zero-sector flush/FUA bios.
It treats both naturally empty bios and ranges trimmed by the prepare_range
hook as not started, preventing the completion path from calling end_write()
on an empty range.

> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3b20a57b8c7ed..0b59c676f7c03 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c

[ ... ]

> @@ -9423,10 +9426,12 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
>  	md_io_clone = container_of(clone, struct md_io_clone, bio_clone);
>  	md_io_clone->orig_bio = *bio;
>  	md_io_clone->mddev = mddev;
> +	md_io_clone->sectors = 0;
>  	if (blk_queue_io_stat(bdev->bd_disk->queue))
>  		md_io_clone->start_time = bio_start_io_acct(*bio);

[Severity: High]
This isn't a bug introduced by this patch, but is it safe to use
md_io_clone->start_time as a boolean flag to track if IO accounting was
started?

bio_start_io_acct() returns the current jiffies value. If an IO request is
submitted exactly when the jiffies counter wraps to 0, start_time will be set
to 0.

When the IO completes, md_end_clone_io() evaluates the following condition:

    if (md_io_clone->start_time)
        bio_end_io_acct(orig_bio, md_io_clone->start_time);

If start_time happens to be 0, bio_end_io_acct() is skipped and the block
device's in_flight counter is never decremented. This prevents
is_mddev_idle() from ever returning true, which could permanently throttle
background sync operations until the system is rebooted.
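
The difference between a timestamp used as a sentinel and an explicit flag
can be sketched as follows. This is a userspace model; the struct, the
acct_started field and the helper names are hypothetical and not part of
struct md_io_clone.

```c
#include <stdbool.h>

struct sketch_io_clone {
	unsigned long start_time;
	bool acct_started; /* hypothetical explicit flag */
};

/* Record that accounting started at time `now` (which may be 0). */
static void sketch_start_acct(struct sketch_io_clone *c, unsigned long now)
{
	c->start_time = now;
	c->acct_started = true;
}

/* Completion check modeled on the quoted condition: timestamp as a flag. */
static bool sketch_end_acct_by_time(const struct sketch_io_clone *c)
{
	return c->start_time != 0;
}

/* Completion check using the explicit flag instead. */
static bool sketch_end_acct_by_flag(const struct sketch_io_clone *c)
{
	return c->acct_started;
}
```

For an IO started at time 0, the timestamp check skips end accounting while
the flag check does not.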

> -	if (bio_data_dir(*bio) == WRITE && md_bitmap_enabled(mddev, false)) {
> +	if (bio_data_dir(*bio) == WRITE && bio_sectors(*bio) &&
> +	    md_bitmap_enabled(mddev, false)) {
>  		md_io_clone->offset = (*bio)->bi_iter.bi_sector;
>  		md_io_clone->sectors = bio_sectors(*bio);

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=2

* Re: [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync
  2026-06-24  6:42   ` [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
@ 2026-06-24  7:04     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:04 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] llbitmap_reshape_can_start() fails to reject reshape when chunks are in BitNeedSyncUnwritten or BitSyncingUnwritten states.
- [High] The newly introduced reshape_can_start callback is never invoked by the RAID5 subsystem, rendering the reshape refusal logic non-functional for RAID5/6.
--

commit fc17ecb84d03fe66b3e1048dc17b6b8022da7f9b
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: refuse reshape while llbitmap still needs sync

Reject reshape when llbitmap still contains NeedSync or Syncing bits.

This keeps reshape from starting until the current llbitmap state has
been reconciled.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index ae2e87fd58874..11fede1fc79fd 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -1680,6 +1680,29 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
>  	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
>  }
>  
> +static int llbitmap_reshape_can_start(struct mddev *mddev)
> +{
> +	struct llbitmap *llbitmap = mddev->bitmap;
> +	unsigned long chunk;
> +	int ret = 0;
> +
> +	if (!llbitmap)
> +		return 0;
> +
> +	mutex_lock(&mddev->bitmap_info.mutex);
> +	for (chunk = 0; chunk < llbitmap->chunks; chunk++) {
> +		enum llbitmap_state state = llbitmap_read(llbitmap, chunk);
> +
> +		if (state == BitNeedSync || state == BitSyncing) {

[Severity: High]
Should this check also include BitNeedSyncUnwritten and BitSyncingUnwritten
states?

Reshaping an array while it has regions pending or undergoing proactive parity
synchronization could change the array layout while the background sync process
is running, potentially leading to corrupt parity.
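
A predicate covering all four states could be shaped like the sketch below.
The enum is a hypothetical stand-in for the llbitmap states named in this
review, not the real definition, and whether the unwritten variants should
block reshape is the open question above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the llbitmap states named in this review. */
enum sk_sync_state {
	SK_ST_UNWRITTEN,
	SK_ST_CLEAN,
	SK_ST_DIRTY,
	SK_ST_NEED_SYNC,
	SK_ST_SYNCING,
	SK_ST_NEED_SYNC_UNWRITTEN,
	SK_ST_SYNCING_UNWRITTEN,
};

/* True when a chunk in this state would block the start of a reshape. */
static bool sk_state_blocks_reshape(enum sk_sync_state state)
{
	switch (state) {
	case SK_ST_NEED_SYNC:
	case SK_ST_SYNCING:
	case SK_ST_NEED_SYNC_UNWRITTEN:
	case SK_ST_SYNCING_UNWRITTEN:
		return true;
	default:
		return false;
	}
}
```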

> +			ret = -EBUSY;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&mddev->bitmap_info.mutex);
> +
> +	return ret;
> +}
> +

[ ... ]

> @@ -2002,6 +2025,7 @@ static struct bitmap_operations llbitmap_ops = {
>  	.dirty_bits		= llbitmap_dirty_bits,
>  	.prepare_range		= llbitmap_prepare_range,
>  	.reshape_finish		= llbitmap_reshape_finish,
> +	.reshape_can_start	= llbitmap_reshape_can_start,

[Severity: High]
Is the reshape_can_start callback invoked by the RAID5 subsystem when starting
a reshape?

Looking at raid5_start_reshape() in drivers/md/raid5.c:

    if (md_bitmap_enabled(mddev, false) &&
        mddev->bitmap_id == ID_LLBITMAP) {
        i = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors, 0);

It doesn't appear to call mddev->bitmap_ops->reshape_can_start. This could
allow RAID5/6 arrays to start a reshape while the bitmap still contains
pending sync bits, causing the array layout to change before parity is
reconciled.

>  	.write_all		= llbitmap_write_all,
>  
>  	.groups			= md_llbitmap_groups,

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=9

* Re: [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses
  2026-06-24  6:42   ` [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
@ 2026-06-24  7:04     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:04 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] In-place array modification during reshape overwrites un-reshaped source data before it can be read.
- [High] Non-atomic read-modify-write cycle in `llbitmap_reshape_merge` races with concurrent normal I/O, dropping `BitDirty` flags.
- [High] Missing handling for `BitNeedSyncUnwritten` and `BitCleanUnwritten` in `llbitmap_rmerge_state` silently demotes them.
- [High] The `level_456` condition in `llbitmap_rmerge_state` is completely ineffective and subsumed by the subsequent fallback.
--

commit a1307817c41ef916bb59f41a3af5b2d7a709caa2
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: remap checkpointed bits as reshape progresses

Merge checkpointed old llbitmap state forward as reshape_position
advances and record the checkpoint remap through reshape_mark().

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index c62397743d9a0..00b77ecd600be 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c

[ ... ]

> @@ -906,6 +914,33 @@ static int llbitmap_prepare_resize(struct llbitmap *llbitmap,
>  	return 0;
>  }
>  
> +static enum llbitmap_state
> +llbitmap_rmerge_state(struct llbitmap *llbitmap,
> +		      enum llbitmap_state dst,
> +		      enum llbitmap_state src)
> +{
> +	bool level_456 = raid_is_456(llbitmap->mddev);
> +
> +	if (dst == BitNeedSync || dst == BitSyncing ||
> +	    src == BitNeedSync || src == BitSyncing)
> +		return BitNeedSync;

[Severity: High]
Are BitNeedSyncUnwritten and BitCleanUnwritten states meant to be handled
here?

Because they do not match BitNeedSync, BitSyncing, BitDirty, or BitClean,
they fall through these conditions. If paired with BitClean, they
incorrectly evaluate to BitClean; otherwise, they fall through to
BitUnwritten. Both outcomes silently drop the sync requirement.

> +
> +	if (dst == BitDirty || src == BitDirty)
> +		return BitDirty;
> +
> +	/*
> +	 * Reshape generates valid target parity/data for both already-written
> +	 * and not-yet-written regions in the checkpointed range, so a mix of
> +	 * clean and unwritten still results in a clean destination bit.
> +	 */
> +	if (level_456 && ((dst == BitClean && src == BitUnwritten) ||
> +			  (src == BitClean && dst == BitUnwritten)))
> +		return BitClean;
> +	if (dst == BitClean || src == BitClean)
> +		return BitClean;

[Severity: High]
Does the fallback condition "dst == BitClean || src == BitClean" render the
preceding level_456 check dead code?

The fallback unconditionally forces any BitClean mix to BitClean, which
entirely overrides the level_456 restriction. Could this lead to RAID10
incorrectly resolving mixed BitClean and BitUnwritten states to BitClean,
causing skipped resync operations?
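
The redundancy can be checked exhaustively with a small userspace model of
the quoted logic. The enum below is a hypothetical subset of the llbitmap
states (only those the function tests explicitly), and level_456 is passed as
a parameter instead of being read from the mddev.

```c
#include <stdbool.h>

enum sk_state { SK_UNWRITTEN, SK_CLEAN, SK_DIRTY, SK_NEED_SYNC, SK_SYNCING };

/* Same branch structure as the quoted llbitmap_rmerge_state(). */
static enum sk_state sk_rmerge(bool level_456, enum sk_state dst,
			       enum sk_state src)
{
	if (dst == SK_NEED_SYNC || dst == SK_SYNCING ||
	    src == SK_NEED_SYNC || src == SK_SYNCING)
		return SK_NEED_SYNC;
	if (dst == SK_DIRTY || src == SK_DIRTY)
		return SK_DIRTY;
	if (level_456 && ((dst == SK_CLEAN && src == SK_UNWRITTEN) ||
			  (src == SK_CLEAN && dst == SK_UNWRITTEN)))
		return SK_CLEAN;
	if (dst == SK_CLEAN || src == SK_CLEAN)
		return SK_CLEAN;
	return SK_UNWRITTEN;
}

/* True when level_456 never changes the result for any (dst, src) pair. */
static bool sk_level_456_is_dead(void)
{
	for (int d = SK_UNWRITTEN; d <= SK_SYNCING; d++)
		for (int s = SK_UNWRITTEN; s <= SK_SYNCING; s++)
			if (sk_rmerge(true, d, s) != sk_rmerge(false, d, s))
				return false;
	return true;
}
```

In this model sk_level_456_is_dead() returns true, so a clean/unwritten mix
resolves to clean regardless of RAID level.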

> +	return BitUnwritten;
> +}

[ ... ]

> +static void llbitmap_reshape_merge(struct llbitmap *llbitmap,
> +				   const struct llbitmap_reshape_range *old,
> +				   const struct llbitmap_reshape_range *new)
> +{
> +	unsigned long dst_start;
> +	unsigned long dst_end;
> +	unsigned long dst;
> +
> +	if (!new->sectors)
> +		return;
> +
> +	dst_start = div64_u64(new->offset, llbitmap->reshape_chunksize);
> +	dst_end = div64_u64(new->end - 1, llbitmap->reshape_chunksize);
> +
> +	for (dst = dst_start; dst <= dst_end; dst++) {
> +		struct llbitmap_reshape_range dst_range;
> +		struct llbitmap_reshape_range src;
> +		enum llbitmap_state state;
> +
> +		llbitmap_reshape_dst_range(llbitmap, dst, new, &dst_range);
> +		state = llbitmap_reshape_init_dst(llbitmap, dst, new);

[Severity: High]
Does this non-atomic read-modify-write cycle race with concurrent normal I/O?

llbitmap_reshape_init_dst() reads the chunk state, computes the merged state
through llbitmap_rmerge_src(), and writes it back via llbitmap_write() below
without being protected by locks.

Concurrent normal I/O operating on stripes outside the reshape window but
within the same bitmap chunk can set BitDirty. The llbitmap_reshape_merge()
thread will blindly overwrite this BitDirty flag with its locally computed
state, potentially causing silent data corruption due to lost BitDirty
flags.

> +		if (llbitmap_reshape_src_range(old, new, &dst_range, &src))
> +			state = llbitmap_rmerge_src(llbitmap, state, &src);
> +		else
> +			state = llbitmap_rmerge_state(llbitmap, state, BitUnwritten);
> +		llbitmap_write(llbitmap, state, dst);

[Severity: Critical]
Does writing directly to the shared pctl array overwrite un-reshaped source
data before it can be read?

When adding disks to a RAID5 array, new->sectors > old->sectors. This causes
the destination index dst to run ahead of the source index bit read in
llbitmap_rmerge_src() (e.g., dst = 10, bit = 5).

Because llbitmap_write() updates the array in place, the state at index 5
is overwritten before it is read for dst = 10, destroying the original chunk
state.
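
The hazard is the same one memmove() guards against for overlapping copies.
The sketch below uses a deliberately simplified, hypothetical mapping
(destination chunk d is derived from source chunk d / 2) to show how
iteration order matters for an in-place remap; the real mapping depends on
the old and new geometry.

```c
#include <stddef.h>

/* Forward in-place remap: low source entries are overwritten before use. */
static void sk_remap_forward(unsigned char *bits, size_t n)
{
	for (size_t d = 0; d < n; d++)
		bits[d] = bits[d / 2];
}

/* Backward in-place remap: every source entry is read before it is written. */
static void sk_remap_backward(unsigned char *bits, size_t n)
{
	for (size_t d = n; d-- > 0; )
		bits[d] = bits[d / 2];
}
```

Starting from {1, 2, 3, 4, 0, 0, 0, 0}, the forward walk leaves every entry
at 1, while the backward walk produces {1, 1, 2, 2, 3, 3, 4, 4}.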

> +	}
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=12

* Re: [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits
  2026-06-24  6:42   ` [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
@ 2026-06-24  7:06     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:06 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] llbitmap_cond_end_sync prematurely marks partially synced chunks as completed, risking data corruption and skipped resyncs.
--

commit e55ca490604a05215923d84a1b1272a73343ea48
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: clamp state-machine walks to tracked bits

This commit adds bounds checking to llbitmap_state_machine() to prevent
walking past the tracked bitmap chunks. It ensures the state machine range
is clamped when the requested end bit exceeds the tracked boundary.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index 00b77ecd600be..581ee9e4a3467 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -983,7 +983,10 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
>  		llbitmap_init_state(llbitmap);
>  		return BitNone;
>  	}
> -
> +	if (start >= llbitmap->chunks)
> +		return BitNone;
> +	if (end >= llbitmap->chunks)
> +		end = llbitmap->chunks - 1;
>  	while (start <= end) {

[Severity: Critical]
This is a pre-existing issue, but does the inclusive end bound here cause a
chunk to be marked as synced before its resync has finished?

llbitmap_cond_end_sync() passes the current sector's chunk index as the end
bit:
drivers/md/md-llbitmap.c:llbitmap_cond_end_sync() {
    ...
    llbitmap_state_machine(llbitmap, 0, sector >> llbitmap->chunkshift,
                           BitmapActionEndsync);
    ...
}

Because the while loop condition is start <= end, the chunk currently
undergoing resync will be included in BitmapActionEndsync and transition to
BitDirty prematurely.

If start_sync() is subsequently called for the remainder of this chunk,
won't it see BitDirty and return false? This would cause the resync loop
to interpret it as no resync needed, skipping the rest of the chunk's
sectors and leaving them out of sync.

Should the end parameter be treated as exclusive instead, or does
llbitmap_cond_end_sync() need to subtract one from the end chunk when
sector is not aligned to the chunk size?
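
The second option could be sketched as below. This is a userspace model with
a hypothetical helper name; it computes the inclusive index of the last chunk
that lies entirely below the sync position.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the inclusive index of the last chunk that lies entirely below
 * `sector`. Returns false when no chunk is fully covered yet.
 */
static bool sk_last_full_chunk(uint64_t sector, unsigned int chunkshift,
			       uint64_t *last)
{
	uint64_t full = sector >> chunkshift; /* number of complete chunks */

	if (full == 0)
		return false;
	*last = full - 1;
	return true;
}
```

With chunkshift = 10 and sector = 2560, the quoted call would pass end = 2
and include the half-synced chunk 2, while this helper yields 1.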

>  		enum llbitmap_state c = llbitmap_read(llbitmap, start);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=13

* Re: [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields
  2026-06-24  6:42   ` [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields Yu Kuai
@ 2026-06-24  7:07     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:07 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Incomplete teardown of the `pending_timer` and `daemon_work` cycle in `llbitmap_destroy` leads to a Use-After-Free.
- [High] Concurrent array resize causes Use-After-Free on the `llbitmap->pctl` array due to lack of I/O suspension.
--

commit 2a89e33f120722b7687d6b24f4bf95273fa7a498
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: track target reshape geometry fields

Track llbitmap bookkeeping for the target reshape geometry while keeping
a single live bitmap instance.

Add the reshape geometry fields, refresh helper, and update the load and
resize paths to keep the target geometry in sync.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index 98e7824c7362e..cdced2126e975 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c
> @@ -289,6 +289,9 @@ struct llbitmap {
>  	unsigned long chunks;
>  	/* total number of sectors tracked by current bitmap geometry */
>  	sector_t sync_size;
> +	unsigned long reshape_chunksize;
> +	unsigned long reshape_chunks;
> +	sector_t reshape_sync_size;
>  	unsigned long last_end_sync;
>  	/*
>  	 * time in seconds that dirty bits will be cleared if the page is not

[Severity: High]
This isn't a bug introduced by this patch, but is there a use-after-free risk
during llbitmap teardown?

If a user removes the bitmap via the location sysfs node,
llbitmap_destroy() deletes the pending_timer and flushes the I/O workqueue:

drivers/md/md-llbitmap.c:llbitmap_destroy() {
    ...
    timer_delete_sync(&llbitmap->pending_timer);
    flush_workqueue(md_llbitmap_io_wq);
    ...
}

However, it doesn't seem to set a shutdown flag to prevent re-arming.
If the daemon work is executing and finds unexpired dirty pages, it will set
restart = true and unconditionally re-arm the timer:

drivers/md/md-llbitmap.c:md_llbitmap_daemon_fn() {
    ...
    if (restart)
        mod_timer(&llbitmap->pending_timer,
                  jiffies + llbitmap->mddev->bitmap_info.daemon_sleep * HZ);
}

Once the work finishes, the flush_workqueue() call in llbitmap_destroy()
returns and the llbitmap structure is freed. Can the re-armed timer then
fire on the freed llbitmap structure?
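
The distinction between deleting a timer and shutting it down can be modeled
as below. This is a userspace sketch with hypothetical sk_ names; the kernel
provides timer_shutdown_sync() with the "no re-arm afterwards" behavior
modeled here, though whether it fits this teardown path is for the author to
judge.

```c
#include <stdbool.h>

struct sk_timer {
	bool armed;
	bool shutdown;
};

/* Model of re-arming: refused once the timer has been shut down. */
static bool sk_mod_timer(struct sk_timer *t)
{
	if (t->shutdown)
		return false;
	t->armed = true;
	return true;
}

/* Model of plain deletion: disarms, but later re-arming is still allowed. */
static void sk_delete(struct sk_timer *t)
{
	t->armed = false;
}

/* Model of shutdown: disarms and blocks any later re-arming. */
static void sk_shutdown(struct sk_timer *t)
{
	t->armed = false;
	t->shutdown = true;
}
```

After sk_delete(), a late sk_mod_timer() from the daemon work leaves the
timer armed again; after sk_shutdown(), the same call is a no-op.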

[Severity: High]
This is a pre-existing issue, but can concurrent array resizing cause a
use-after-free on the llbitmap->pctl array?

When expanding the array size via sysfs, update_size() invokes the
personality resize hook without suspending the array:

drivers/md/md.c:update_size() {
    ...
    rv = mddev->pers->resize(mddev, num_sectors);
    ...
}

This eventually calls llbitmap_expand_pages(), which allocates a new
pointer array and immediately frees the old one:

drivers/md/md-llbitmap.c:llbitmap_expand_pages() {
    ...
    kfree(llbitmap->pctl);
    llbitmap->pctl = pctl;
    ...
}

Since update_size() only holds the mddev_lock and does not drain in-flight
I/O, new I/O can still be issued.

Could a concurrent I/O path calling llbitmap_raise_barrier() fetch the old
llbitmap->pctl array pointer just before it is freed:

drivers/md/md-llbitmap.c:llbitmap_raise_barrier() {
    struct llbitmap_page_ctl *pctl = llbitmap->pctl[page_idx];
    ...
}

and then dereference it, accessing freed memory?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=7

* Re: [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers
  2026-06-24  6:42   ` [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
@ 2026-06-24  7:08     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:08 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] TOCTOU race on mddev->reshape_position causes bios to be mapped to the wrong geometry.
- [High] The resync threads use raw array sectors and blindly compute the bitmap bit index using the old geometry's chunkshift, completely ignoring reshape_chunksize for sectors residing in the new geometry.
--

commit a481d24ff8d12695efeabb7f320e930174126333
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: add reshape range mapping helpers

Teach llbitmap to choose old versus new geometry during reshape and to
encode exact bitmap ranges for the active geometry.

This is the mapping groundwork for checkpoint remapping.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index 11fede1fc79fd..d27993cfb13da 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c

[ ... ]

> @@ -1376,11 +1433,32 @@ static void llbitmap_destroy(struct mddev *mddev)
>  	mutex_unlock(&mddev->bitmap_info.mutex);
>  }
>  
> +static bool llbitmap_map_previous(struct llbitmap *llbitmap, sector_t offset,
> +				  unsigned long sectors)
> +{
> +	struct mddev *mddev = llbitmap->mddev;
> +	sector_t boundary = mddev->reshape_position;
> +
> +	if (!llbitmap_reshaping(llbitmap))
> +		return false;

[Severity: High]
Can a concurrent reshape thread update mddev->reshape_position from MaxSector to
0 between the read into boundary and the call to llbitmap_reshaping()?

If that happens, boundary holds MaxSector but llbitmap_reshaping() evaluates to
true. For a forward reshape, offset >= boundary is then false for every offset,
mapping the IO to the new geometry before the array has actually reshaped the
sector. Could this race lead to data corruption?

> +
> +	WARN_ON_ONCE(sectors && offset < boundary && offset + sectors > boundary);
> +
> +	return mddev->reshape_backwards ? offset < boundary : offset >= boundary;
> +}

[ ... ]

> @@ -1549,7 +1627,11 @@ static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset)
>  {
>  	struct llbitmap *llbitmap = mddev->bitmap;
>  	unsigned long p = offset >> llbitmap->chunkshift;

[Severity: High]
Does this unconditionally use the old chunkshift to compute the bit index from
the raw sync offset?

During an interrupted reshape recovery, active IO computes the correct bit index
using llbitmap_encode_range() for the new geometry. However, functions like
llbitmap_blocks_synced(), llbitmap_skip_sync_blocks(), and llbitmap_start_sync()
receive raw physical sectors from md_do_sync() and completely bypass
llbitmap_prepare_range().

If reshape_chunksize differs from chunksize, will the sync thread calculate the
wrong bit index, read a clean bit instead of the dirty bit marked by active IO,
and silently skip synchronizing the dirty sectors?
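
A small arithmetic sketch of the mismatch, assuming power-of-two chunk sizes
so that both geometries can be expressed as shifts (the helper name and the
shift values are hypothetical):

```c
#include <stdint.h>

/* Bit index of `sector` for a geometry whose chunk size is 1 << chunkshift. */
static uint64_t sk_bit_index(uint64_t sector, unsigned int chunkshift)
{
	return sector >> chunkshift;
}
```

For sector 1048576, a shift of 10 gives bit 1024 while a shift of 11 gives
bit 512, so a writer and a sync thread using different geometries would look
at different bits for the same sector.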

> -	enum llbitmap_state c = llbitmap_read(llbitmap, p);
> +	enum llbitmap_state c;
> +
> +	if (p >= llbitmap->chunks)
> +		return false;
> +	c = llbitmap_read(llbitmap, p);
>  
>  	return c == BitClean || c == BitDirty || c == BitCleanUnwritten;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=10

* Re: [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers
  2026-06-24  6:42   ` [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
@ 2026-06-24  7:16     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:16 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] raid5_bitmap_array_sectors() incorrectly returns the new, smaller array size for the previous layout during a shrink reshape, leading to truncated bitmap tracking.

Pre-existing issues:
- [High] Invalid use of round_down() and round_up() with a non-power-of-2 sectors_per_chunk multiplier, causing corrupt bitmap mappings.
- [High] Bounding box calculation for mapping ranges between old and new layouts can overflow `unsigned long` on 32-bit systems, truncating dirty regions.
- [High] raid5_bitmap_sector_map() accesses reshape geometry parameters without holding the required sequence lock.
--

commit 7f0559d3994c9371245bbc6a23cf3ce1e8d48872
Author: Yu Kuai <yukuai@fygo.io>

md/raid5: add exact old and new llbitmap mapping helpers

Teach RAID5 to export exact old and new llbitmap mappings and the
corresponding sync and array sizes for reshape-aware bitmap users.

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 0c5c9fb0606ee..1613a42cc25dc 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
[ ... ]
> @@ -5942,25 +5942,43 @@ static enum reshape_loc get_reshape_loc(struct mddev *mddev,
[ ... ]
> -static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
> -				unsigned long *sectors)
> +static void raid5_bitmap_sector_map(struct mddev *mddev, sector_t *offset,
> +				    unsigned long *sectors,
> +				    bool previous)
>  {
>  	struct r5conf *conf = mddev->private;
>  	sector_t start = *offset;
>  	sector_t end = start + *sectors;
> -	sector_t prev_start = start;
> -	sector_t prev_end = end;
>  	int sectors_per_chunk;
> -	enum reshape_loc loc;
>  	int dd_idx;
>  
> -	sectors_per_chunk = conf->chunk_sectors *
> -		(conf->raid_disks - conf->max_degraded);
> +	if (previous)
> +		sectors_per_chunk = conf->prev_chunk_sectors *
> +			(conf->previous_raid_disks - conf->max_degraded);
> +	else
> +		sectors_per_chunk = conf->chunk_sectors *
> +			(conf->raid_disks - conf->max_degraded);

[Severity: High]
This is a pre-existing issue, but does raid5_bitmap_sector_map() access
these reshape geometry parameters safely?

During a concurrent reshape operation (e.g., raid5_start_reshape()), these
configuration fields can be updated in-place. Without being protected by
read_seqcount_begin(&conf->gen_lock), could a concurrent I/O observe torn
state or a mismatch between chunk size and raid disks, leading to an
incorrect bitmap mapping?
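
For illustration only, a userspace sketch of the read-retry pattern that a seqcount provides. It uses C11 atomics rather than the kernel's seqcount API, and the names `struct geometry`, `geometry_write()` and `data_sectors_per_stripe()` are hypothetical, not taken from raid5.c:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the reshape geometry fields. */
struct geometry {
	atomic_uint seq;		/* odd while a writer is mid-update */
	atomic_int chunk_sectors;
	atomic_int raid_disks;
};

/* Writer: bump seq to odd, update both fields, bump seq back to even. */
static void geometry_write(struct geometry *g, int chunk, int disks)
{
	atomic_fetch_add(&g->seq, 1);
	atomic_store(&g->chunk_sectors, chunk);
	atomic_store(&g->raid_disks, disks);
	atomic_fetch_add(&g->seq, 1);
}

/* Reader: retry until both fields were read under one stable, even seq. */
static int data_sectors_per_stripe(struct geometry *g, int max_degraded)
{
	unsigned int start;
	int chunk, disks;

	do {
		start = atomic_load(&g->seq);
		chunk = atomic_load(&g->chunk_sectors);
		disks = atomic_load(&g->raid_disks);
	} while ((start & 1) || atomic_load(&g->seq) != start);

	return chunk * (disks - max_degraded);
}
```

With this shape the reader never combines a chunk size from one geometry with a disk count from another. Whether the bitmap mapping path can afford a retry loop here is a separate question for the author.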

>  	start = round_down(start, sectors_per_chunk);
>  	end = round_up(end, sectors_per_chunk);

[Severity: High]
This is a pre-existing issue, but do round_down() and round_up() work correctly
here when sectors_per_chunk is not a power of 2?

The round_down() and round_up() macros perform a bitwise masking operation
which strictly requires the multiplier to be a power of 2. Because the
number of data disks (raid_disks - max_degraded) is often not a power of 2
(for example, 3 data disks in a 4-disk RAID5), sectors_per_chunk is frequently
not a power of 2.

Will this bitwise mask silently corrupt the logical offsets instead of rounding
to stripe boundaries, leading to incorrect physical regions being tracked in
the bitmap?
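
A small userspace sketch of the difference, assuming simplified 64-bit versions of the mask-based helpers (the kernel macros are type-generic; the function names below are made up for this example):

```c
#include <stdint.h>

/* Mask-based rounding, same shape as round_down()/round_up();
 * only meaningful when step is a power of 2. */
static uint64_t mask_round_down(uint64_t x, uint64_t step)
{
	return x & ~(step - 1);
}

static uint64_t mask_round_up(uint64_t x, uint64_t step)
{
	return ((x - 1) | (step - 1)) + 1;
}

/* Division-based rounding, valid for any non-zero step. */
static uint64_t div_round_down(uint64_t x, uint64_t step)
{
	return x - (x % step);
}

static uint64_t div_round_up(uint64_t x, uint64_t step)
{
	return div_round_down(x + step - 1, step);
}
```

With 128-sector chunks and 3 data disks, step is 384. For x = 1000 the mask-based helpers give 640 and 1024, neither of which is a multiple of 384, while the division-based helpers give 768 and 1152. In kernel code the division on sector_t would presumably need sector_div() or a similar 64-bit-safe helper on 32-bit builds.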

> -	start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL);
> -	end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL);
> +	start = raid5_compute_sector(conf, start, previous, &dd_idx, NULL);
> +	end = raid5_compute_sector(conf, end, previous, &dd_idx, NULL);
> +	*offset = start;
> +	*sectors = end - start;
> +}
> +
> +static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
> +				unsigned long *sectors)
> +{
> +	struct r5conf *conf = mddev->private;
> +	sector_t start = *offset;
> +	sector_t end = start + *sectors;
> +	sector_t prev_start = start;
> +	unsigned long prev_sectors = end - start;
> +	enum reshape_loc loc;
> +
> +	raid5_bitmap_sector_map(mddev, &start, sectors, false);
> +	end = start + *sectors;
[ ... ]
> @@ -5987,7 +5998,7 @@ static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset,
>  	 * we set bits for both.
>  	 */
>  	*offset = min(start, prev_start);
> -	*sectors = max(end, prev_end) - *offset;
> +	*sectors = max(end, prev_start + prev_sectors) - *offset;

[Severity: High]
This isn't a bug introduced by this patch, but can this bounding box
calculation overflow on 32-bit systems?

When calculating the bounding box spanning the old and new physical layout
mappings for an I/O ahead of a reshape, the physical sector offsets diverge
linearly due to differing data disk counts.

For large arrays (e.g., >40TB), the gap between the mapped sectors can
exceed what a 32-bit unsigned long can hold (4,294,967,295 sectors, just
under 2TiB). Because *sectors points to an unsigned long (which is 32-bit
on 32-bit architectures), could the result of this subtraction be silently
truncated on assignment, shrinking the dirty region and potentially
missing necessary bits?
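
A userspace sketch of the truncation, modelling the 32-bit unsigned long with uint32_t. The helper names and the sector values are invented for the example:

```c
#include <stdint.h>

/* Length of the bounding box covering [a_start, a_end) and [b_start, b_end). */
static uint64_t bbox_len(uint64_t a_start, uint64_t a_end,
			 uint64_t b_start, uint64_t b_end)
{
	uint64_t lo = a_start < b_start ? a_start : b_start;
	uint64_t hi = a_end > b_end ? a_end : b_end;

	return hi - lo;
}

/* Models storing the length through a 32-bit unsigned long. */
static uint32_t store_as_ulong32(uint64_t len)
{
	return (uint32_t)len;
}
```

For two 1024-sector ranges starting at sectors 10,000,000,000 and 5,000,000,000, the bounding box is 5,000,001,024 sectors long, but only 705,033,728 sectors survive the 32-bit store.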

>  }
>  
>  static enum stripe_result make_stripe_request(struct mddev *mddev,
[ ... ]
> @@ -9049,6 +9060,20 @@ static void raid5_prepare_suspend(struct mddev *mddev)
>  	wake_up(&conf->wait_for_reshape);
>  }
>  
> +static sector_t raid5_bitmap_sync_size(struct mddev *mddev, bool previous)
> +{
> +	return mddev->dev_sectors;
> +}
> +
> +static sector_t raid5_bitmap_array_sectors(struct mddev *mddev, bool previous)
> +{
> +	struct r5conf *conf = mddev->private;
> +
> +	if (previous)
> +		return raid5_size(mddev, 0, 0);

[Severity: High]
Does passing 0 for raid_disks in raid5_size() incorrectly truncate the array
size for the previous layout?

During an array shrink reshape (e.g., from 5 disks to 4), passing 0 forces
raid5_size() to use the minimum disk count between the old and new layouts.
This would return the new, smaller array size instead of the exact previous
layout's size.

Since reshape-aware bitmap users rely on this limit to clip logical I/O
offsets, could I/Os mapping to the old unreshaped region beyond the new
limit be erroneously discarded, leading to dropped dirty bits?
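
A simplified userspace model of the size calculation, assuming the array size is the chunk-aligned device size times the number of data disks. The helper names are hypothetical, and the real raid5_size() also aligns to the previous chunk size:

```c
#include <stdint.h>

/* Simplified array size: chunk-aligned device size times data disks. */
static uint64_t array_sectors(uint64_t dev_sectors, uint32_t chunk_sectors,
			      int raid_disks, int max_degraded)
{
	dev_sectors &= ~((uint64_t)chunk_sectors - 1);
	return dev_sectors * (uint64_t)(raid_disks - max_degraded);
}

/* Models the fallback used when the caller passes raid_disks == 0. */
static int min_disks(int raid_disks, int previous_raid_disks)
{
	return raid_disks < previous_raid_disks ? raid_disks
						: previous_raid_disks;
}
```

For a 5-to-4 disk shrink with 1,000,000-sector members and 128-sector chunks, the previous layout spans 3,999,744 sectors, while the min-based fallback yields 2,999,808.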

> +	return raid5_size(mddev, mddev->dev_sectors, conf->raid_disks);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=17

* Re: [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle
  2026-06-24  6:42   ` [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-24  7:20     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:20 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Checkpointing the translated llbitmap to disk before the MD superblock is updated creates a crash window that leads to silent data corruption on restart.
- [Critical] The llbitmap page control array (`llbitmap->pctl`) can be accessed after being freed during RAID5 reshape initialization because the bitmap resize happens before the array is quiesced.
--

commit 18675e43f792016c6cde2f177af7a007c667120d
Author: Yu Kuai <yukuai@fygo.io>

md/raid5: wire llbitmap reshape lifecycle

Prepare llbitmap before RAID5 reshape starts, checkpoint the bitmap
before advancing reshape_position, and finish the llbitmap geometry
update when reshape completes.

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 9de648f75dd04..ac7ea483502fd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c

[ ... ]

> @@ -6430,6 +6430,13 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
>  			   || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
>  		if (atomic_read(&conf->reshape_stripes) != 0)
>  			return 0;
> +		if (md_bitmap_enabled(mddev, false) &&
> +		    mddev->bitmap_ops->reshape_mark &&
> +		    conf->reshape_safe != conf->reshape_progress) {
> +			mddev->bitmap_ops->reshape_mark(mddev, conf->reshape_safe,
> +						       conf->reshape_progress);
> +			mddev->bitmap_ops->unplug(mddev, true);
> +		}

[Severity: Critical]
Does checkpointing the translated llbitmap to disk before the MD superblock
is updated create a crash window that leads to silent data corruption on
restart?

During RAID5 reshape, the code translates the bitmap to the new layout via
reshape_mark() and flushes it to disk before asynchronously waking the MD
thread to update the superblock.

If the system crashes after the bitmap flush but before the superblock write
completes, the on-disk metadata will still point to the old reshape position
(reshape_safe), while the on-disk bitmap for the region
[reshape_safe, reshape_progress] will be stored in the new layout.

On reboot, the array will resume the reshape from reshape_safe and
misinterpret the newly mapped bitmap bits as if they were in the old layout.
If the array is degraded during this restart, it might incorrectly assume the
parity for these sectors is in-sync and reconstruct missing data using
corrupted parity.

>  		mddev->reshape_position = conf->reshape_progress;
>  		mddev->curr_resync_completed = sector_nr;
>  		if (!mddev->reshape_backwards)

[ ... ]

> @@ -8571,6 +8585,12 @@ static int raid5_start_reshape(struct mddev *mddev)
>  			mdname(mddev));
>  		return -EINVAL;
>  	}
> +	if (md_bitmap_enabled(mddev, false) &&
> +	    mddev->bitmap_id == ID_LLBITMAP) {
> +		i = mddev->bitmap_ops->resize(mddev, mddev->dev_sectors, 0);
> +		if (i)
> +			return i;
> +	}

[Severity: Critical]
Can this resize operation trigger a use-after-free regression since it
occurs before raid5_quiesce() stops concurrent I/O?

In raid5_start_reshape(), mddev->bitmap_ops->resize() is called to prepare
the bitmap for the new geometry before raid5_quiesce() is invoked. This means
the array is not suspended, and concurrent I/O is actively being handled.

If the resize operation calls llbitmap_expand_pages() and frees the old
pctl array via kfree() without RCU synchronization, concurrent fast-path I/O
threads executing llbitmap_start_write() could locklessly dereference the
freed pointer array in llbitmap_raise_barrier().

>  
>  	atomic_set(&conf->reshape_stripes, 0);
>  	spin_lock_irq(&conf->device_lock);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=19

* Re: [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting
  2026-06-24  6:42   ` [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-24  7:20     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:20 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing `md_write_end()` on `mddev_bio_split_at_reshape_offset()` failure path causes `writes_pending` leak.
- [Medium] Lockless read of `mddev->reshape_position` can result in a torn read on 32-bit architectures, leading to incorrect bio splitting.
--

commit 2a6f82a757276f6bbae0c22cb819c9c6d6b3e4b0
Author: Yu Kuai <yukuai@fygo.io>

md/raid10: split reshape bios before bitmap accounting

Use the shared mddev_bio_split_at_reshape_offset() helper so RAID10
submits only one-side bios to llbitmap during reshape.

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1911,6 +1911,12 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
>  		sectors = chunk_sects -
>  			(bio->bi_iter.bi_sector &
>  			 (chunk_sects - 1));
> +
> +	bio = mddev_bio_split_at_reshape_offset(mddev, bio, &sectors,
> +						&conf->bio_split);

[Severity: Medium]
Can the lockless read of mddev->reshape_position inside this helper
result in a torn read on 32-bit architectures?

In mddev_bio_split_at_reshape_offset(), mddev->reshape_position (a 64-bit
sector_t) is read without READ_ONCE() or a seqcount:

drivers/md/md.c:mddev_bio_split_at_reshape_offset() {
    ...
    boundary = mddev->reshape_position;
    ...
}

On 32-bit systems, could racing with the reshape thread's update (for
example, at the 2TB boundary) tear the 64-bit load and yield a garbage
sector value?

If so, this might cause bio_split to slice the bio at an unaligned offset,
leading to IO errors when the underlying block device rejects it.
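
To make the failure mode concrete, a userspace sketch that composes the value a reader could see if its two 32-bit loads straddle a concurrent 64-bit store. The helper name is hypothetical, and which half is stale depends on the load and store order on the architecture in question:

```c
#include <stdint.h>

/* Combine the high word of one value with the low word of another,
 * as a torn 64-bit read on a 32-bit CPU might do. */
static uint64_t torn_value(uint64_t high_from, uint64_t low_from)
{
	return (high_from & 0xffffffff00000000ULL) |
	       (low_from & 0x00000000ffffffffULL);
}
```

If reshape_position moves from 0xffffffff to 0x100000000 (the 2TB boundary), a torn read yields either 0x1ffffffff or 0, both far from the old and the new position.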

> +	if (!bio)
> +		return true;

[Severity: High]
Does this early return bypass the md_write_end() cleanup required to
balance the earlier md_write_start() for write requests?

If mddev_bio_split_at_reshape_offset() fails and returns NULL, it appears
the writes_pending percpu reference is leaked.

Will this prevent the array from reaching the clean safemode state, causing
it to perpetually appear dirty and trigger a full resync on the next
assembly?

> +
>  	if (!__make_request(mddev, bio, sectors))
>  		md_write_end(mddev);
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=16

* Re: [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle
  2026-06-24  6:42   ` [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
@ 2026-06-24  7:22     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:22 UTC (permalink / raw)
  To: Yu Kuai; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] raid10_start_reshape skips initializing llbitmap's reshape geometry, causing bitmap corruption during and after reshape.
- [Low] Hardcoded check for `ID_LLBITMAP` in `raid10_finish_reshape` breaks the polymorphism of `mddev->bitmap_ops`.
--

commit 84732a3d166ed5b144ce82fb596703bb8b35f73c
Author: Yu Kuai <yukuai@fygo.io>

md/raid10: wire llbitmap reshape lifecycle

Prepare llbitmap before RAID10 starts growing, checkpoint the bitmap
before advancing reshape_position, finish the llbitmap geometry update
when reshape completes, and export the old and new tracked sizes.

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 1242b8d3bb6bd..e50e4adee389f 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4394,6 +4394,12 @@ static int raid10_start_reshape(struct mddev *mddev)
>  
>  	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>  		return -EBUSY;
> +	if (md_bitmap_enabled(mddev, false) &&
> +	    mddev->bitmap_ops->reshape_can_start) {
> +		ret = mddev->bitmap_ops->reshape_can_start(mddev);
> +		if (ret)
> +			return ret;
> +	}

[Severity: High]
Does this fully wire up the llbitmap reshape lifecycle?

Further down in raid10_start_reshape(), the call to mddev->bitmap_ops->resize()
is gated by `if (mddev->delta_disks && mddev->bitmap)`.

If a reshape is started that only grows the chunk size (where
mddev->delta_disks == 0), won't this skip initializing the llbitmap's reshape
geometry?

If resize() is bypassed, llbitmap->reshape_chunksize would remain at the old
chunk size, causing llbitmap_encode_range() for the new layout to map IOs
using the old chunk size. Could this set bits for incorrect regions, leading
to dirty data not being tracked properly and silent data corruption upon
array recovery?

[ ... ]

> @@ -5039,10 +5062,15 @@ static void end_reshape_request(struct r10bio *r10_bio)
>  static void raid10_finish_reshape(struct mddev *mddev)
>  {
>  	struct r10conf *conf = mddev->private;
> +	bool llbitmap = mddev->bitmap_id == ID_LLBITMAP &&
> +		md_bitmap_enabled(mddev, false);
>  
>  	if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>  		return;
>  
> +	if (llbitmap && mddev->bitmap_ops->reshape_finish)
> +		mddev->bitmap_ops->reshape_finish(mddev);

[Severity: Low]
Is this hardcoded check for ID_LLBITMAP necessary?

By explicitly checking mddev->bitmap_id, this seems to defeat the polymorphism
of mddev->bitmap_ops. If another bitmap implementation introduces
reshape_finish(), wouldn't it be silently ignored?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=15

* Re: [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks
  2026-06-24  6:42   ` [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
@ 2026-06-24  7:24     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:24 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] The validation in `check_reshape` does not reject llbitmap reshapes that decrease the number of data disks (`mddev->delta_disks < 0`), which leaves the same bitmap coverage truncation that the patch attempts to fix.

Pre-existing issues:
- [High] Writing 0 or a very small/overflowing value to the `chunk_size` sysfs attribute bypasses chunk size validation, leading to a divide-by-zero kernel panic during reshape.
- [High] Memory leak of `stripe_head` pages in `resize_stripes()` if array reshape initialization fails.
--

commit cd96767f06f730181f5e43d1d52815ff0535ad82
Author: Yu Kuai <yukuai@fygo.io>

md/raid5: reject llbitmap reshape when md chunk shrinks
    
llbitmap reshape keeps one live bitmap and cannot safely make an
existing bitmap bit cover a smaller data range.

The llbitmap chunksize itself will not shrink when mddev->chunk_sectors
stays the same or grows. However, shrinking mddev->chunk_sectors shrinks
sectors_per_chunk used by raid5_bitmap_sector_map(). That can shrink the
effective data range covered by each bit across the old and new RAID5
geometry. Reject that reshape while llbitmap is active.

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 1613a42cc25dc..9de648f75dd04 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -8506,6 +8506,9 @@ static int check_reshape(struct mddev *mddev)
>  	if (!check_stripe_cache(mddev))
>  		return -ENOSPC;
>  
> +	if (mddev->bitmap_id == ID_LLBITMAP &&
> +	    mddev->new_chunk_sectors < mddev->chunk_sectors)
> +		return -EOPNOTSUPP;

[Severity: High]
Does this validation fully address the logical coverage truncation?

The commit message correctly notes that shrinking mddev->chunk_sectors shrinks
sectors_per_chunk, which is unsafe for llbitmap because it shrinks the
effective data range covered by each bit.

However, sectors_per_chunk is also reduced if mddev->delta_disks < 0:

sectors_per_chunk = chunk_sectors * (raid_disks - max_degraded);

Since the validation here only checks whether the new chunk size is smaller,
a reshape that reduces the number of data disks could bypass this check and
hit the same logical coverage truncation.

Could this be updated to also check mddev->delta_disks < 0, or compute the
product to verify sectors_per_chunk does not shrink?
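
One possible shape for the product-based check, as a userspace sketch. The helper names are hypothetical, and whether comparing the products is sufficient for llbitmap is an assumption the author would need to confirm:

```c
#include <stdbool.h>
#include <stdint.h>

/* Data sectors covered by one full stripe. */
static uint64_t stripe_data_sectors(uint32_t chunk_sectors, int raid_disks,
				    int max_degraded)
{
	return (uint64_t)chunk_sectors * (uint64_t)(raid_disks - max_degraded);
}

/* True if the reshape would shrink sectors_per_chunk. */
static bool llbitmap_coverage_shrinks(uint32_t chunk_sectors,
				      uint32_t new_chunk_sectors,
				      int raid_disks, int delta_disks,
				      int max_degraded)
{
	return stripe_data_sectors(new_chunk_sectors,
				   raid_disks + delta_disks, max_degraded) <
	       stripe_data_sectors(chunk_sectors, raid_disks, max_degraded);
}
```

With 128-sector chunks, going from 5 to 4 disks shrinks the product from 512 to 384 even though the chunk size is unchanged, so a check on new_chunk_sectors alone would not catch it.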

>  	if (mddev->new_chunk_sectors > mddev->chunk_sectors ||
>  	    mddev->delta_disks > 0)

[Severity: High]
This is a pre-existing issue, but writing 0 or a very small value to the
chunk_size sysfs attribute appears to bypass chunk size validation entirely,
leading to a divide-by-zero panic during reshape.

In drivers/md/md.c:chunk_size_store(), user input is right-shifted by 9
to compute mddev->new_chunk_sectors. If the user writes 0, it becomes 0.

When raid5_check_reshape() is called:

drivers/md/raid5.c:raid5_check_reshape() {
    ...
	if (new_chunk > 0) {
		if (!is_power_of_2(new_chunk))
			return -EINVAL;
    ...
}

Because the checks are inside if (new_chunk > 0), a value of 0 skips the
sanity bounds. check_reshape() then accepts it.

Later, in raid5_compute_sector(), this chunk size causes a kernel panic:

drivers/md/raid5.c:raid5_compute_sector() {
    ...
	chunk_offset = sector_div(r_sector, sectors_per_chunk);
    ...
}

Should chunk_size_store() or raid5_check_reshape() explicitly prevent a
chunk size of 0?
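
A userspace sketch of the kind of guard this question has in mind. The helper names are hypothetical, and the real check would likely also need the existing page-size and alignment constraints:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the byte-to-sector conversion done on the sysfs input. */
static uint32_t bytes_to_sectors(uint64_t bytes)
{
	return (uint32_t)(bytes >> 9);
}

/* Reject zero explicitly before the power-of-2 test. */
static bool chunk_sectors_valid(uint32_t new_chunk_sectors)
{
	if (new_chunk_sectors == 0)
		return false;
	return (new_chunk_sectors & (new_chunk_sectors - 1)) == 0;
}
```

Any input below 512 bytes becomes 0 sectors after the shift, so the zero case covers "very small" values as well as a literal 0.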

>  		if (resize_chunks(conf,
>  				  conf->previous_raid_disks
>  				  + max(0, mddev->delta_disks),
>  				  max(mddev->new_chunk_sectors,
>  				      mddev->chunk_sectors)
>  			    ) < 0)
>  			return -ENOMEM;
> 
>  	if (conf->previous_raid_disks + mddev->delta_disks <= conf->pool_size)
>  		return 0; /* never bother to shrink */
>  	return resize_stripes(conf, (conf->previous_raid_disks
>  				     + mddev->delta_disks));
>  }

[Severity: High]
This isn't a bug introduced by this patch, but is it possible for
resize_stripes() to leak the stripe_head pages if the array reshape
initialization fails?

If step 3 of resize_stripes() fails under memory pressure (for example,
the ndisks allocation fails), the function still falls through to step 4
instead of aborting:

drivers/md/raid5.c:resize_stripes() {
    ...
		for (i=conf->raid_disks; i < newsize; i++)
			if (nsh->dev[i].page == NULL) {
				struct page *p = alloc_page(GFP_NOIO);
				nsh->dev[i].page = p;
    ...
}

It loops up to newsize and allocates physical pages for the expanded stripe
slots. At the end of the function, because err is set, it skips updating
conf->pool_size:

drivers/md/raid5.c:resize_stripes() {
    ...
	if (!err)
		conf->pool_size = newsize;
    ...
}

Since conf->pool_size is unchanged, any future teardown via shrink_buffers()
will intentionally ignore the newly allocated pages at indices >= pool_size.
When free_stripe() subsequently destroys the slab object without freeing these
pages, they appear to be permanently leaked.

Is there a mechanism that cleans up these orphaned pages if conf->pool_size
is not updated?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=18

* Re: [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting
  2026-06-24  6:42   ` [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting Yu Kuai
@ 2026-06-24  7:29     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  7:29 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Critical] Splitting a bio in `raid5_make_request()` breaks the `STRIPE_WAIT_RESHAPE` retry logic, causing a double-submission of the remainder and a permanent leak of the split front.

Pre-existing issues:
- [High] `make_discard_request()` leaks the `REQ_OP_DISCARD` bio without completion if an array is reshaping or `REQ_NOWAIT` is set.
--

commit fd0cb884709b54fc9c1fe1f70aa2016108b2a998
Author: Yu Kuai <yukuai@fygo.io>

md/raid5: split reshape bios before bitmap accounting

The commit uses mddev_bio_split_at_reshape_offset() in
raid5_make_request() to split bios at the reshape offset. This ensures
that RAID5 submits only one-side bios to llbitmap during reshape.

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index ac7ea483502fd..fb346a3c4aa3b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6181,6 +6181,14 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)

[ ... ]

>  		return true;
>  	}
>  

[Severity: High]
This is a pre-existing issue, but is it possible for the discard bio to be
leaked if the array is reshaping or REQ_NOWAIT is set?

In make_discard_request() which is called just above this context, it returns
early without completing the bio:

    if (WARN_ON_ONCE(bi->bi_opf & REQ_NOWAIT))
        return;

    if (mddev->reshape_position != MaxSector)
        return;

Since raid5_make_request() calls make_discard_request() and then unconditionally
returns true, does this strand the bio and cause indefinite hangs?

> +	bi = mddev_bio_split_at_reshape_offset(mddev, bi, NULL,
> +					       &conf->bio_split);

[Severity: Critical]
Does splitting the bio here break the STRIPE_WAIT_RESHAPE retry logic and cause
a double-submission and memory leak?

When mddev_bio_split_at_reshape_offset() splits the bio, it submits the
remainder via bio_submit_split_bioset() and returns the front half as the
new bi.

Later in raid5_make_request(), if make_stripe_request() hits the reshape window
and returns STRIPE_WAIT_RESHAPE, it handles the retry:

    if (res == STRIPE_WAIT_RESHAPE) {
        DECLARE_COMPLETION_ONSTACK(done);
        WRITE_ONCE(bi->bi_private, &done);
        bio_endio(bi);
        wait_for_completion(&done);
        return false;
    }

When false is returned, md_handle_request() loops and retries the original
bio pointer.

However, the original bio was already submitted as the remainder by the split
operation. Will this result in concurrently submitting the remainder twice?

Additionally, when completing the cloned front half (bi) with the completion
flag, md_end_clone_io() does not complete the original split bio, which
permanently leaks the front half.

> +	if (!bi) {
> +		if (rw == WRITE)
> +			md_write_end(mddev);
> +		return true;
> +	}
> +
>  	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
>  	bi->bi_next = NULL;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=20

* Re: [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry
  2026-06-24  6:42   ` [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry Yu Kuai
@ 2026-06-24  9:06     ` sashiko-bot
  0 siblings, 0 replies; 66+ messages in thread
From: sashiko-bot @ 2026-06-24  9:06 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The entire bitmap is unnecessarily flushed to disk twice during reshape finish.
--

commit ae4a6d71e5ee77405793f20d5d615e089187608f
Author: Yu Kuai <yukuai@fygo.io>

md/md-llbitmap: finish reshape geometry

This commit introduces the final step for reshaping llbitmap arrays by
committing the staged geometry. It handles quiescing the device, updating
internal state, writing the superblock, and flushing the bitmap.

> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
> index cdced2126e975..ae2e87fd58874 100644
> --- a/drivers/md/md-llbitmap.c
> +++ b/drivers/md/md-llbitmap.c

[ ... ]

> @@ -1680,6 +1680,30 @@ static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
>  	llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
>  }
>  
> +static void llbitmap_reshape_finish(struct mddev *mddev)
> +{
> +	struct llbitmap *llbitmap = mddev->bitmap;
> +
> +	if (mddev->pers->quiesce)
> +		mddev->pers->quiesce(mddev, 1);
> +
> +	mutex_lock(&mddev->bitmap_info.mutex);
> +	llbitmap_flush(mddev);
> +
> +	llbitmap->chunksize = llbitmap->reshape_chunksize;
> +	llbitmap->chunkshift = ffz(~llbitmap->chunksize);
> +	llbitmap->chunks = llbitmap->reshape_chunks;
> +	llbitmap->sync_size = llbitmap->reshape_sync_size;
> +	llbitmap_refresh_reshape(llbitmap);
> +	mddev->bitmap_info.chunksize = llbitmap->chunksize;
> +	llbitmap_update_sb(llbitmap);
> +	__llbitmap_flush(mddev);

[Severity: Low]
Is this second flush of the entire bitmap redundant?

The code calls llbitmap_flush() at the beginning of the lock section.
After updating the geometry and writing out the superblock via
llbitmap_update_sb(), __llbitmap_flush() is called again. Since no bitmap
data bits are modified in the interim, does this cause an unnecessary
synchronous I/O spike that could be avoided?

> +	mutex_unlock(&mddev->bitmap_info.mutex);
> +
> +	if (mddev->pers->quiesce)
> +		mddev->pers->quiesce(mddev, 0);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1782282042.git.yukuai@kernel.org?part=8

Thread overview: 66+ messages
2026-06-05  9:15 [PATCH 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
2026-06-05  9:15 ` [PATCH] md: add exact bitmap mapping and reshape hooks Yu Kuai
2026-06-05  9:15 ` [PATCH] md: skip bitmap accounting for empty write ranges Yu Kuai
2026-06-05  9:15 ` [PATCH] md: add helper to split bios at reshape offset Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
2026-06-15 10:48   ` Su Yue
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: allocate page controls independently Yu Kuai
2026-06-15 11:06   ` Su Yue
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
2026-06-15 11:16   ` Su Yue
2026-06-15 16:19     ` yu kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: track target reshape geometry fields Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: finish reshape geometry Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
2026-06-05  9:15 ` [PATCH] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid10: split reshape bios before bitmap accounting Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
2026-06-05  9:15 ` [PATCH] md/raid5: split reshape bios before bitmap accounting Yu Kuai
2026-06-05 17:27   ` kernel test robot
2026-06-06  2:15   ` kernel test robot
2026-06-24  6:41 ` [PATCH v2 00/20] md/md-llbitmap: support reshape for RAID10 and RAID5 Yu Kuai
2026-06-24  6:41   ` [PATCH v2 01/20] md: add exact bitmap mapping and reshape hooks Yu Kuai
2026-06-24  6:41   ` [PATCH v2 02/20] md: skip bitmap accounting for empty write ranges Yu Kuai
2026-06-24  7:04     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 03/20] md: add helper to split bios at reshape offset Yu Kuai
2026-06-24  7:01     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 04/20] md/md-llbitmap: track bitmap sync_size explicitly Yu Kuai
2026-06-24  7:02     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 05/20] md/md-llbitmap: allocate page controls independently Yu Kuai
2026-06-24  7:02     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 06/20] md/md-llbitmap: grow the page cache in place for reshape Yu Kuai
2026-06-24  7:03     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 07/20] md/md-llbitmap: track target reshape geometry fields Yu Kuai
2026-06-24  7:07     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 08/20] md/md-llbitmap: finish reshape geometry Yu Kuai
2026-06-24  9:06     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 09/20] md/md-llbitmap: refuse reshape while llbitmap still needs sync Yu Kuai
2026-06-24  7:04     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 10/20] md/md-llbitmap: add reshape range mapping helpers Yu Kuai
2026-06-24  7:08     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 11/20] md/md-llbitmap: don't skip reshape ranges from bitmap state Yu Kuai
2026-06-24  6:58     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 12/20] md/md-llbitmap: remap checkpointed bits as reshape progresses Yu Kuai
2026-06-24  7:04     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 13/20] md/md-llbitmap: clamp state-machine walks to tracked bits Yu Kuai
2026-06-24  7:06     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 14/20] md/raid10: reject llbitmap reshape when md chunk shrinks Yu Kuai
2026-06-24  6:42   ` [PATCH v2 15/20] md/raid10: wire llbitmap reshape lifecycle Yu Kuai
2026-06-24  7:22     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 16/20] md/raid10: split reshape bios before bitmap accounting Yu Kuai
2026-06-24  7:20     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 17/20] md/raid5: add exact old and new llbitmap mapping helpers Yu Kuai
2026-06-24  7:16     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 18/20] md/raid5: reject llbitmap reshape when md chunk shrinks Yu Kuai
2026-06-24  7:24     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 19/20] md/raid5: wire llbitmap reshape lifecycle Yu Kuai
2026-06-24  7:20     ` sashiko-bot
2026-06-24  6:42   ` [PATCH v2 20/20] md/raid5: split reshape bios before bitmap accounting Yu Kuai
2026-06-24  7:29     ` sashiko-bot
