public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support
@ 2026-02-14  6:10 Yu Kuai
  2026-02-14  6:10 ` [PATCH 1/5] md/md-llbitmap: skip reading rdevs that are not in_sync Yu Kuai
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

This series contains fixes and enhancements for the md-llbitmap (lockless
bitmap) implementation.

Patches 1-2 are bug fixes:
- Patch 1 fixes bitmap data being read from spare disks that are not yet
  in sync, which could lead to incorrect dirty bit tracking.
- Patch 2 fixes a race condition where the state machine could transition
  before the barrier is properly raised.

Patch 3 improves compatibility with older mdadm versions by detecting
on-disk bitmap version and falling back to the correct bitmap_ops when
there's a version mismatch.

Patch 4 adds support for proactive XOR parity building in RAID-456 arrays.
This allows users to pre-build parity for unwritten regions via sysfs
before any user data is written, which can improve write performance for
workloads that will eventually use all storage. New states (CleanUnwritten,
NeedSyncUnwritten, SyncingUnwritten) are added to track these regions
separately from normal dirty/syncing states.

Patch 5 optimizes initial array sync for RAID-456 arrays on devices that
support write_zeroes with unmap. By zeroing all disks upfront, parity is
automatically consistent (0 XOR 0 = 0), allowing the bitmap to be
initialized to BitCleanUnwritten and skipping the initial sync entirely.
This significantly reduces array initialization time on modern NVMe SSDs.

Yu Kuai (5):
  md/md-llbitmap: skip reading rdevs that are not in_sync
  md/md-llbitmap: raise barrier before state machine transition
  md: add fallback to correct bitmap_ops on version mismatch
  md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity
    building
  md/md-llbitmap: optimize initial sync with write_zeroes_unmap support

 drivers/md/md-llbitmap.c | 213 +++++++++++++++++++++++++++++++++++----
 drivers/md/md.c          | 109 +++++++++++++++++++-
 2 files changed, 301 insertions(+), 21 deletions(-)

-- 
2.51.0



* [PATCH 1/5] md/md-llbitmap: skip reading rdevs that are not in_sync
  2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
@ 2026-02-14  6:10 ` Yu Kuai
  2026-02-14  6:10 ` [PATCH 2/5] md/md-llbitmap: raise barrier before state machine transition Yu Kuai
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

When reading bitmap pages from member disks, the code iterates through
all rdevs and attempts to read from the first available one. However,
it only checks for raid_disk assignment and Faulty flag, missing the
In_sync flag check.

This can cause bitmap data to be read from spare disks that are still
being rebuilt and don't have valid bitmap information yet. Reading
stale or uninitialized bitmap data from such disks can lead to
incorrect dirty bit tracking, potentially causing data corruption
during recovery or normal operation.

Add the In_sync flag check to ensure bitmap pages are only read from
fully synchronized member disks that have valid bitmap data.
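
The eligibility rule described above can be sketched in plain userspace C.
This is only an illustration: struct sketch_rdev and the SKETCH_* flag
values are hypothetical stand-ins, not the kernel's struct md_rdev or its
flag bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rdev flags; not the kernel's bit layout. */
enum { SKETCH_FAULTY = 1u << 0, SKETCH_IN_SYNC = 1u << 1 };

struct sketch_rdev {
	int raid_disk;		/* < 0: no slot assigned */
	unsigned int flags;
};

/* Mirrors the patched condition: slot assigned, not Faulty, and In_sync. */
static bool sketch_can_read_bitmap(const struct sketch_rdev *rdev)
{
	assert(rdev != NULL);
	if (rdev->raid_disk < 0 || (rdev->flags & SKETCH_FAULTY) ||
	    !(rdev->flags & SKETCH_IN_SYNC))
		return false;
	return true;
}

/* Index of the first device that may serve bitmap reads, or -1 if none. */
static int sketch_pick_bitmap_source(const struct sketch_rdev *rdevs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (sketch_can_read_bitmap(&rdevs[i]))
			return (int)i;
	return -1;
}
```

A device that has a slot but is still rebuilding (In_sync clear) is skipped,
which is the case the old check let through.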

Cc: stable@vger.kernel.org
Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md-llbitmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index cd713a7dc270..30d7e36b22c4 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -459,7 +459,8 @@ static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
 	rdev_for_each(rdev, mddev) {
 		sector_t sector;
 
-		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags) ||
+		    !test_bit(In_sync, &rdev->flags))
 			continue;
 
 		sector = mddev->bitmap_info.offset +
-- 
2.51.0



* [PATCH 2/5] md/md-llbitmap: raise barrier before state machine transition
  2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
  2026-02-14  6:10 ` [PATCH 1/5] md/md-llbitmap: skip reading rdevs that are not in_sync Yu Kuai
@ 2026-02-14  6:10 ` Yu Kuai
  2026-02-14  6:10 ` [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch Yu Kuai
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

Move the barrier raise operation before calling llbitmap_state_machine()
in both llbitmap_start_write() and llbitmap_start_discard(). This
ensures the barrier is in place before any state transitions occur,
preventing potential race conditions where the state machine could
complete before the barrier is properly raised.
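
The ordering requirement can be modelled with a small single-threaded toy.
It does not reproduce the actual race; it only records whether a transition
ever ran on a page whose barrier had not been raised yet. All names here
(struct sketch_bitmap, sketch_*) are hypothetical and unrelated to the real
llbitmap structures.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGES 4

struct sketch_bitmap {
	int barrier[SKETCH_PAGES];	/* per-page barrier depth */
	bool unprotected_transition;	/* a transition ran with no barrier held */
};

static void sketch_raise_barrier(struct sketch_bitmap *b, int page)
{
	assert(page >= 0 && page < SKETCH_PAGES);
	b->barrier[page]++;
}

/* Stand-in for the state machine: flags any page touched without a barrier. */
static void sketch_state_machine(struct sketch_bitmap *b, int page_start,
				 int page_end)
{
	for (int p = page_start; p <= page_end; p++)
		if (b->barrier[p] == 0)
			b->unprotected_transition = true;
}

/* Patched ordering: raise every barrier first, then run the transitions. */
static void sketch_start_write(struct sketch_bitmap *b, int page_start,
			       int page_end)
{
	for (int p = page_start; p <= page_end; p++)
		sketch_raise_barrier(b, p);
	sketch_state_machine(b, page_start, page_end);
}
```

With the two steps swapped, as in the pre-patch code, the toy state machine
would set unprotected_transition on the first call.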

Cc: stable@vger.kernel.org
Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md-llbitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 30d7e36b22c4..5f9e7004e3e3 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1070,12 +1070,12 @@ static void llbitmap_start_write(struct mddev *mddev, sector_t offset,
 	int page_start = (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT;
 	int page_end = (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT;
 
-	llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite);
-
 	while (page_start <= page_end) {
 		llbitmap_raise_barrier(llbitmap, page_start);
 		page_start++;
 	}
+
+	llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite);
 }
 
 static void llbitmap_end_write(struct mddev *mddev, sector_t offset,
@@ -1102,12 +1102,12 @@ static void llbitmap_start_discard(struct mddev *mddev, sector_t offset,
 	int page_start = (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT;
 	int page_end = (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT;
 
-	llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard);
-
 	while (page_start <= page_end) {
 		llbitmap_raise_barrier(llbitmap, page_start);
 		page_start++;
 	}
+
+	llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard);
 }
 
 static void llbitmap_end_discard(struct mddev *mddev, sector_t offset,
-- 
2.51.0



* [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
  2026-02-14  6:10 ` [PATCH 1/5] md/md-llbitmap: skip reading rdevs that are not in_sync Yu Kuai
  2026-02-14  6:10 ` [PATCH 2/5] md/md-llbitmap: raise barrier before state machine transition Yu Kuai
@ 2026-02-14  6:10 ` Yu Kuai
  2026-02-17  8:54   ` Su Yue
  2026-02-14  6:10 ` [PATCH 4/5] md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity building Yu Kuai
  2026-02-14  6:10 ` [PATCH 5/5] md/md-llbitmap: optimize initial sync with write_zeroes_unmap support Yu Kuai
  4 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

If the default bitmap version and the on-disk version don't match, and
mdadm is too old to set bitmap_type, set bitmap_ops based on the
on-disk version.
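
The fallback decision can be sketched as a pure function. The numeric
version values and the SKETCH_* names below are assumptions made for the
illustration; the kernel headers remain authoritative for BITMAP_MAJOR_*
and enum md_submodule_id.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk major versions, for illustration only. */
enum {
	SKETCH_MAJOR_LO = 3,
	SKETCH_MAJOR_HI = 4,
	SKETCH_MAJOR_CLUSTERED = 5,
	SKETCH_MAJOR_LOCKLESS = 6,
};

enum sketch_bitmap_id { SKETCH_ID_NONE, SKETCH_ID_BITMAP, SKETCH_ID_LLBITMAP };

/* Map the superblock version to the bitmap implementation that owns it. */
static enum sketch_bitmap_id sketch_id_from_version(uint32_t version)
{
	switch (version) {
	case SKETCH_MAJOR_LO:
	case SKETCH_MAJOR_HI:
	case SKETCH_MAJOR_CLUSTERED:
		return SKETCH_ID_BITMAP;
	case SKETCH_MAJOR_LOCKLESS:
		return SKETCH_ID_LLBITMAP;
	default:
		return SKETCH_ID_NONE;	/* unknown version: no fallback */
	}
}

/*
 * Called after create() failed with the original id. Returns the id to
 * retry with, or SKETCH_ID_NONE to keep the original error.
 */
static enum sketch_bitmap_id sketch_fallback_id(enum sketch_bitmap_id orig,
						uint32_t on_disk_version)
{
	enum sketch_bitmap_id sb_id = sketch_id_from_version(on_disk_version);

	assert(orig != SKETCH_ID_NONE);
	if (sb_id == SKETCH_ID_NONE || sb_id == orig)
		return SKETCH_ID_NONE;
	return sb_id;
}
```

A retry only happens when the superblock names a different, known
implementation; in every other case the first error is returned unchanged.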

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 59cd303548de..d2607ed5c2e9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
 
 static int start_dirty_degraded;
 
+/*
+ * Read bitmap superblock and return the bitmap_id based on disk version.
+ * This is used as fallback when default bitmap version and on-disk version
+ * doesn't match, and mdadm is not the latest version to set bitmap_type.
+ */
+static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+	struct page *sb_page;
+	bitmap_super_t *sb;
+	enum md_submodule_id id = ID_BITMAP_NONE;
+	sector_t sector;
+	u32 version;
+
+	if (!mddev->bitmap_info.offset)
+		return ID_BITMAP_NONE;
+
+	sb_page = alloc_page(GFP_KERNEL);
+	if (!sb_page)
+		return ID_BITMAP_NONE;
+
+	sector = mddev->bitmap_info.offset;
+
+	rdev_for_each(rdev, mddev) {
+		u32 iosize;
+
+		if (!test_bit(In_sync, &rdev->flags) ||
+		    test_bit(Faulty, &rdev->flags) ||
+		    test_bit(Bitmap_sync, &rdev->flags))
+			continue;
+
+		iosize = roundup(sizeof(bitmap_super_t),
+				 bdev_logical_block_size(rdev->bdev));
+		if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
+				 true))
+			goto read_ok;
+	}
+	goto out;
+
+read_ok:
+	sb = kmap_local_page(sb_page);
+	if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
+		goto out_unmap;
+
+	version = le32_to_cpu(sb->version);
+	switch (version) {
+	case BITMAP_MAJOR_LO:
+	case BITMAP_MAJOR_HI:
+	case BITMAP_MAJOR_CLUSTERED:
+		id = ID_BITMAP;
+		break;
+	case BITMAP_MAJOR_LOCKLESS:
+		id = ID_LLBITMAP;
+		break;
+	default:
+		pr_warn("md: %s: unknown bitmap version %u\n",
+			mdname(mddev), version);
+		break;
+	}
+
+out_unmap:
+	kunmap_local(sb);
+out:
+	__free_page(sb_page);
+	return id;
+}
+
 static int md_bitmap_create(struct mddev *mddev)
 {
+	enum md_submodule_id orig_id = mddev->bitmap_id;
+	enum md_submodule_id sb_id;
+	int err;
+
 	if (mddev->bitmap_id == ID_BITMAP_NONE)
 		return -EINVAL;
 
 	if (!mddev_set_bitmap_ops(mddev))
 		return -ENOENT;
 
-	return mddev->bitmap_ops->create(mddev);
+	err = mddev->bitmap_ops->create(mddev);
+	if (!err)
+		return 0;
+
+	/*
+	 * Create failed, if default bitmap version and on-disk version
+	 * doesn't match, and mdadm is not the latest version to set
+	 * bitmap_type, set bitmap_ops based on the disk version.
+	 */
+	mddev_clear_bitmap_ops(mddev);
+
+	sb_id = md_bitmap_get_id_from_sb(mddev);
+	if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
+		return err;
+
+	pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
+		mdname(mddev), orig_id, sb_id);
+
+	mddev->bitmap_id = sb_id;
+	if (!mddev_set_bitmap_ops(mddev)) {
+		mddev->bitmap_id = orig_id;
+		return -ENOENT;
+	}
+
+	err = mddev->bitmap_ops->create(mddev);
+	if (err) {
+		mddev_clear_bitmap_ops(mddev);
+		mddev->bitmap_id = orig_id;
+	}
+
+	return err;
 }
 
 static void md_bitmap_destroy(struct mddev *mddev)
-- 
2.51.0



* [PATCH 4/5] md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity building
  2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
                   ` (2 preceding siblings ...)
  2026-02-14  6:10 ` [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch Yu Kuai
@ 2026-02-14  6:10 ` Yu Kuai
  2026-02-14  6:10 ` [PATCH 5/5] md/md-llbitmap: optimize initial sync with write_zeroes_unmap support Yu Kuai
  4 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

Add new states to the llbitmap state machine to support proactive XOR
parity building for RAID-5 arrays. This allows users to pre-build parity
data for unwritten regions before any user data is written.

New states added:
- BitNeedSyncUnwritten: Transitional state when proactive sync is triggered
  via sysfs on Unwritten regions.
- BitSyncingUnwritten: Proactive sync in progress for unwritten region.
- BitCleanUnwritten: XOR parity has been pre-built, but no user data
  written yet. When user writes to this region, it transitions to BitDirty.

New actions added:
- BitmapActionProactiveSync: Trigger for proactive XOR parity building.
- BitmapActionClearUnwritten: Convert CleanUnwritten/NeedSyncUnwritten/
  SyncingUnwritten states back to Unwritten before recovery starts.

State flows:
- Current (lazy): Unwritten -> (write) -> NeedSync -> (sync) -> Dirty -> Clean
- New (proactive): Unwritten -> (sysfs) -> NeedSyncUnwritten -> (sync) -> CleanUnwritten
- On write to CleanUnwritten: CleanUnwritten -> (write) -> Dirty -> Clean
- On disk replacement: CleanUnwritten regions are converted to Unwritten
  before recovery starts, so recovery only rebuilds regions with user data

A new sysfs interface is added at /sys/block/mdX/md/llbitmap/proactive_sync
(write-only) to trigger proactive sync. This only works for RAID-456 arrays.
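
The proactive flow can be traced with a reduced model of the transition
table. The sketch below covers only the three new states plus the
ProactiveSync action on Unwritten, taken from the table in this patch;
every other state/action pair is left unmodelled and reported as "no
transition". The SK_* names are hypothetical.

```c
#include <assert.h>

enum sk_state {
	SK_UNWRITTEN, SK_DIRTY, SK_NEED_SYNC, SK_SYNCING,
	SK_NEED_SYNC_UNWRITTEN, SK_SYNCING_UNWRITTEN, SK_CLEAN_UNWRITTEN,
	SK_NONE = 0xff,		/* no transition */
};

enum sk_action {
	SK_START_WRITE, SK_START_SYNC, SK_END_SYNC,
	SK_PROACTIVE_SYNC, SK_CLEAR_UNWRITTEN,
};

static enum sk_state sk_next(enum sk_state cur, enum sk_action act)
{
	switch (cur) {
	case SK_UNWRITTEN:	/* other actions on Unwritten are not modelled */
		return act == SK_PROACTIVE_SYNC ? SK_NEED_SYNC_UNWRITTEN : SK_NONE;
	case SK_NEED_SYNC_UNWRITTEN:
		switch (act) {
		case SK_START_WRITE:	 return SK_NEED_SYNC;
		case SK_START_SYNC:	 return SK_SYNCING_UNWRITTEN;
		case SK_CLEAR_UNWRITTEN: return SK_UNWRITTEN;
		default:		 return SK_NONE;
		}
	case SK_SYNCING_UNWRITTEN:
		switch (act) {
		case SK_START_WRITE:	 return SK_SYNCING;
		case SK_START_SYNC:	 return SK_SYNCING_UNWRITTEN;
		case SK_END_SYNC:	 return SK_CLEAN_UNWRITTEN;
		case SK_CLEAR_UNWRITTEN: return SK_UNWRITTEN;
		default:		 return SK_NONE;
		}
	case SK_CLEAN_UNWRITTEN:
		switch (act) {
		case SK_START_WRITE:	 return SK_DIRTY;
		case SK_CLEAR_UNWRITTEN: return SK_UNWRITTEN;
		default:		 return SK_NONE;
		}
	default:
		return SK_NONE;
	}
}

/* Apply an action; the state is left unchanged when there is no transition. */
static enum sk_state sk_apply(enum sk_state cur, enum sk_action act)
{
	enum sk_state next;

	assert(cur != SK_NONE);
	next = sk_next(cur, act);
	return next == SK_NONE ? cur : next;
}
```

Feeding ProactiveSync, Startsync and Endsync to an Unwritten bit walks it
to CleanUnwritten; a later write moves it to Dirty, and ClearUnwritten
returns any of the three new states to Unwritten.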

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md-llbitmap.c | 140 +++++++++++++++++++++++++++++++++++----
 drivers/md/md.c          |   6 +-
 2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 5f9e7004e3e3..461050b2771b 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -208,6 +208,20 @@ enum llbitmap_state {
 	BitNeedSync,
 	/* data is synchronizing */
 	BitSyncing,
+	/*
+	 * Proactive sync requested for unwritten region (raid456 only).
+	 * Triggered via sysfs when user wants to pre-build XOR parity
+	 * for regions that have never been written.
+	 */
+	BitNeedSyncUnwritten,
+	/* Proactive sync in progress for unwritten region */
+	BitSyncingUnwritten,
+	/*
+	 * XOR parity has been pre-built for a region that has never had
+	 * user data written. When user writes to this region, it transitions
+	 * to BitDirty.
+	 */
+	BitCleanUnwritten,
 	BitStateCount,
 	BitNone = 0xff,
 };
@@ -232,6 +246,12 @@ enum llbitmap_action {
 	 * BitNeedSync.
 	 */
 	BitmapActionStale,
+	/*
+	 * Proactive sync trigger for raid456 - builds XOR parity for
+	 * Unwritten regions without requiring user data write first.
+	 */
+	BitmapActionProactiveSync,
+	BitmapActionClearUnwritten,
 	BitmapActionCount,
 	/* Init state is BitUnwritten */
 	BitmapActionInit,
@@ -304,6 +324,8 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionDaemon]		= BitNone,
 		[BitmapActionDiscard]		= BitNone,
 		[BitmapActionStale]		= BitNone,
+		[BitmapActionProactiveSync]	= BitNeedSyncUnwritten,
+		[BitmapActionClearUnwritten]	= BitNone,
 	},
 	[BitClean] = {
 		[BitmapActionStartwrite]	= BitDirty,
@@ -314,6 +336,8 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionDaemon]		= BitNone,
 		[BitmapActionDiscard]		= BitUnwritten,
 		[BitmapActionStale]		= BitNeedSync,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitNone,
 	},
 	[BitDirty] = {
 		[BitmapActionStartwrite]	= BitNone,
@@ -324,6 +348,8 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionDaemon]		= BitClean,
 		[BitmapActionDiscard]		= BitUnwritten,
 		[BitmapActionStale]		= BitNeedSync,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitNone,
 	},
 	[BitNeedSync] = {
 		[BitmapActionStartwrite]	= BitNone,
@@ -334,6 +360,8 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionDaemon]		= BitNone,
 		[BitmapActionDiscard]		= BitUnwritten,
 		[BitmapActionStale]		= BitNone,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitNone,
 	},
 	[BitSyncing] = {
 		[BitmapActionStartwrite]	= BitNone,
@@ -344,6 +372,44 @@ static char state_machine[BitStateCount][BitmapActionCount] = {
 		[BitmapActionDaemon]		= BitNone,
 		[BitmapActionDiscard]		= BitUnwritten,
 		[BitmapActionStale]		= BitNeedSync,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitNone,
+	},
+	[BitNeedSyncUnwritten] = {
+		[BitmapActionStartwrite]	= BitNeedSync,
+		[BitmapActionStartsync]		= BitSyncingUnwritten,
+		[BitmapActionEndsync]		= BitNone,
+		[BitmapActionAbortsync]		= BitUnwritten,
+		[BitmapActionReload]		= BitUnwritten,
+		[BitmapActionDaemon]		= BitNone,
+		[BitmapActionDiscard]		= BitUnwritten,
+		[BitmapActionStale]		= BitUnwritten,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitUnwritten,
+	},
+	[BitSyncingUnwritten] = {
+		[BitmapActionStartwrite]	= BitSyncing,
+		[BitmapActionStartsync]		= BitSyncingUnwritten,
+		[BitmapActionEndsync]		= BitCleanUnwritten,
+		[BitmapActionAbortsync]		= BitUnwritten,
+		[BitmapActionReload]		= BitUnwritten,
+		[BitmapActionDaemon]		= BitNone,
+		[BitmapActionDiscard]		= BitUnwritten,
+		[BitmapActionStale]		= BitUnwritten,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitUnwritten,
+	},
+	[BitCleanUnwritten] = {
+		[BitmapActionStartwrite]	= BitDirty,
+		[BitmapActionStartsync]		= BitNone,
+		[BitmapActionEndsync]		= BitNone,
+		[BitmapActionAbortsync]		= BitNone,
+		[BitmapActionReload]		= BitNone,
+		[BitmapActionDaemon]		= BitNone,
+		[BitmapActionDiscard]		= BitUnwritten,
+		[BitmapActionStale]		= BitUnwritten,
+		[BitmapActionProactiveSync]	= BitNone,
+		[BitmapActionClearUnwritten]	= BitUnwritten,
 	},
 };
 
@@ -376,6 +442,7 @@ static void llbitmap_infect_dirty_bits(struct llbitmap *llbitmap,
 			pctl->state[pos] = level_456 ? BitNeedSync : BitDirty;
 			break;
 		case BitClean:
+		case BitCleanUnwritten:
 			pctl->state[pos] = BitDirty;
 			break;
 		}
@@ -383,7 +450,7 @@ static void llbitmap_infect_dirty_bits(struct llbitmap *llbitmap,
 }
 
 static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx,
-				    int offset)
+				    int offset, bool infect)
 {
 	struct llbitmap_page_ctl *pctl = llbitmap->pctl[idx];
 	unsigned int io_size = llbitmap->io_size;
@@ -398,7 +465,7 @@ static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx,
 	 * resync all the dirty bits, hence skip infect new dirty bits to
 	 * prevent resync unnecessary data.
 	 */
-	if (llbitmap->mddev->degraded) {
+	if (llbitmap->mddev->degraded || !infect) {
 		set_bit(block, pctl->dirty);
 		return;
 	}
@@ -438,7 +505,9 @@ static void llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
 
 	llbitmap->pctl[idx]->state[bit] = state;
 	if (state == BitDirty || state == BitNeedSync)
-		llbitmap_set_page_dirty(llbitmap, idx, bit);
+		llbitmap_set_page_dirty(llbitmap, idx, bit, true);
+	else if (state == BitNeedSyncUnwritten)
+		llbitmap_set_page_dirty(llbitmap, idx, bit, false);
 }
 
 static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
@@ -627,11 +696,10 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
 			goto write_bitmap;
 		}
 
-		if (c == BitNeedSync)
+		if (c == BitNeedSync || c == BitNeedSyncUnwritten)
 			need_resync = !mddev->degraded;
 
 		state = state_machine[c][action];
-
 write_bitmap:
 		if (unlikely(mddev->degraded)) {
 			/* For degraded array, mark new data as need sync. */
@@ -658,8 +726,7 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
 		}
 
 		llbitmap_write(llbitmap, state, start);
-
-		if (state == BitNeedSync)
+		if (state == BitNeedSync || state == BitNeedSyncUnwritten)
 			need_resync = !mddev->degraded;
 		else if (state == BitDirty &&
 			 !timer_pending(&llbitmap->pending_timer))
@@ -1229,7 +1296,7 @@ static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset)
 	unsigned long p = offset >> llbitmap->chunkshift;
 	enum llbitmap_state c = llbitmap_read(llbitmap, p);
 
-	return c == BitClean || c == BitDirty;
+	return c == BitClean || c == BitDirty || c == BitCleanUnwritten;
 }
 
 static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
@@ -1243,6 +1310,10 @@ static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
 	if (c == BitUnwritten)
 		return blocks;
 
+	/* Skip CleanUnwritten - no user data, will be reset after recovery */
+	if (c == BitCleanUnwritten)
+		return blocks;
+
 	/* For degraded array, don't skip */
 	if (mddev->degraded)
 		return 0;
@@ -1261,14 +1332,25 @@ static bool llbitmap_start_sync(struct mddev *mddev, sector_t offset,
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
+	enum llbitmap_state state;
+
+	/*
+	 * Before recovery starts, convert CleanUnwritten to Unwritten.
+	 * This ensures the new disk won't have stale parity data.
+	 */
+	if (offset == 0 && test_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
+	    !test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery))
+		llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
+				       BitmapActionClearUnwritten);
+
 
 	/*
 	 * Handle one bit at a time, this is much simpler. And it doesn't matter
 	 * if md_do_sync() loop more times.
 	 */
 	*blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
-	return llbitmap_state_machine(llbitmap, p, p,
-				      BitmapActionStartsync) == BitSyncing;
+	state = llbitmap_state_machine(llbitmap, p, p, BitmapActionStartsync);
+	return state == BitSyncing || state == BitSyncingUnwritten;
 }
 
 /* Something is wrong, sync_thread stop at @offset */
@@ -1474,9 +1556,15 @@ static ssize_t bits_show(struct mddev *mddev, char *page)
 	}
 
 	mutex_unlock(&mddev->bitmap_info.mutex);
-	return sprintf(page, "unwritten %d\nclean %d\ndirty %d\nneed sync %d\nsyncing %d\n",
+	return sprintf(page,
+		       "unwritten %d\nclean %d\ndirty %d\n"
+		       "need sync %d\nsyncing %d\n"
+		       "need sync unwritten %d\nsyncing unwritten %d\n"
+		       "clean unwritten %d\n",
 		       bits[BitUnwritten], bits[BitClean], bits[BitDirty],
-		       bits[BitNeedSync], bits[BitSyncing]);
+		       bits[BitNeedSync], bits[BitSyncing],
+		       bits[BitNeedSyncUnwritten], bits[BitSyncingUnwritten],
+		       bits[BitCleanUnwritten]);
 }
 
 static struct md_sysfs_entry llbitmap_bits = __ATTR_RO(bits);
@@ -1549,11 +1637,39 @@ barrier_idle_store(struct mddev *mddev, const char *buf, size_t len)
 
 static struct md_sysfs_entry llbitmap_barrier_idle = __ATTR_RW(barrier_idle);
 
+static ssize_t
+proactive_sync_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	struct llbitmap *llbitmap;
+
+	/* Only for RAID-456 */
+	if (!raid_is_456(mddev))
+		return -EINVAL;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap = mddev->bitmap;
+	if (!llbitmap || !llbitmap->pctl) {
+		mutex_unlock(&mddev->bitmap_info.mutex);
+		return -ENODEV;
+	}
+
+	/* Trigger proactive sync on all Unwritten regions */
+	llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
+			       BitmapActionProactiveSync);
+
+	mutex_unlock(&mddev->bitmap_info.mutex);
+	return len;
+}
+
+static struct md_sysfs_entry llbitmap_proactive_sync =
+	__ATTR(proactive_sync, 0200, NULL, proactive_sync_store);
+
 static struct attribute *md_llbitmap_attrs[] = {
 	&llbitmap_bits.attr,
 	&llbitmap_metadata.attr,
 	&llbitmap_daemon_sleep.attr,
 	&llbitmap_barrier_idle.attr,
+	&llbitmap_proactive_sync.attr,
 	NULL
 };
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d2607ed5c2e9..270802b8a4fc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9870,8 +9870,10 @@ void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
-				wait_event(mddev->recovery_wait,
-					   !atomic_read(&mddev->recovery_active));
+				wait_event_timeout(
+					mddev->recovery_wait,
+					!atomic_read(&mddev->recovery_active),
+					HZ);
 			}
 		}
 	}
-- 
2.51.0



* [PATCH 5/5] md/md-llbitmap: optimize initial sync with write_zeroes_unmap support
  2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
                   ` (3 preceding siblings ...)
  2026-02-14  6:10 ` [PATCH 4/5] md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity building Yu Kuai
@ 2026-02-14  6:10 ` Yu Kuai
  4 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2026-02-14  6:10 UTC (permalink / raw)
  To: song; +Cc: linan122, xni, colyli, linux-raid, linux-kernel

For RAID-456 arrays with llbitmap, if all underlying disks support
write_zeroes with unmap, issue write_zeroes to zero all disk data
regions and initialize the bitmap to BitCleanUnwritten instead of
BitUnwritten.

This optimization skips the initial XOR parity building because:
1. write_zeroes with unmap guarantees zeroed reads after the operation
2. For RAID-456, when all data is zero, parity is automatically
   consistent (0 XOR 0 XOR ... = 0)
3. BitCleanUnwritten indicates parity is valid but no user data
   has been written
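
Point 2 can be checked with a few lines of userspace C. This sketch covers
XOR (P) parity only and uses a hypothetical contiguous layout of n_data
blocks of len bytes each; it is not how the md raid456 code stores stripes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if stored_parity equals the XOR of all data blocks.
 * Block d occupies data[d * len] .. data[d * len + len - 1].
 */
static bool sketch_stripe_consistent(const uint8_t *data, size_t n_data,
				     size_t len, const uint8_t *stored_parity)
{
	assert(n_data > 0);
	for (size_t i = 0; i < len; i++) {
		uint8_t p = 0;

		for (size_t d = 0; d < n_data; d++)
			p ^= data[d * len + i];
		if (p != stored_parity[i])
			return false;
	}
	return true;
}
```

When every data block and the parity block read back as zeroes, the check
passes without any parity having been computed, which is why the initial
sync can be skipped after a successful zeroing pass.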

The implementation adds two helper functions:
- llbitmap_all_disks_support_wzeroes_unmap(): Checks if all active
  disks support write_zeroes with unmap
- llbitmap_zero_all_disks(): Issues blkdev_issue_zeroout() to each
  rdev's data region to zero all disks

The zeroing and bitmap state setting happens in llbitmap_init_state()
during bitmap initialization. If any disk fails to zero, we fall back
to BitUnwritten and normal lazy recovery.
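
The choice of initial state, including the fallback, can be sketched as
below. The enum, the callback type and the two sample callbacks are
hypothetical; in the patch the zeroing step is llbitmap_zero_all_disks().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_init_state {
	SKETCH_UNWRITTEN,
	SKETCH_CLEAN,
	SKETCH_CLEAN_UNWRITTEN,
};

/* Sample zeroing callbacks standing in for the real disk I/O. */
static bool sketch_zero_ok(void *ctx)   { (void)ctx; return true; }
static bool sketch_zero_fail(void *ctx) { (void)ctx; return false; }

/*
 * Pick the state every bit is initialised to. Zeroing is attempted only
 * for RAID-456 when all disks support write_zeroes with unmap; any
 * failure falls back to Unwritten and normal lazy recovery.
 */
static enum sketch_init_state
sketch_pick_init_state(bool bitmap_clean, bool is_raid456,
		       bool all_wzeroes_unmap,
		       bool (*zero_all_disks)(void *ctx), void *ctx)
{
	assert(zero_all_disks != NULL);
	if (bitmap_clean)
		return SKETCH_CLEAN;
	if (is_raid456 && all_wzeroes_unmap && zero_all_disks(ctx))
		return SKETCH_CLEAN_UNWRITTEN;
	return SKETCH_UNWRITTEN;
}
```

The short-circuit evaluation matters: the zeroing callback is never
invoked for arrays that do not qualify.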

This significantly reduces array initialization time for RAID-456
arrays built on modern NVMe SSDs or other devices that support
write_zeroes with unmap.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
 drivers/md/md-llbitmap.c | 62 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 461050b2771b..48bc6a639edd 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -654,13 +654,73 @@ static int llbitmap_cache_pages(struct llbitmap *llbitmap)
 	return 0;
 }
 
+/*
+ * Check if all underlying disks support write_zeroes with unmap.
+ */
+static bool llbitmap_all_disks_support_wzeroes_unmap(struct llbitmap *llbitmap)
+{
+	struct mddev *mddev = llbitmap->mddev;
+	struct md_rdev *rdev;
+
+	rdev_for_each(rdev, mddev) {
+		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+			continue;
+
+		if (bdev_write_zeroes_unmap_sectors(rdev->bdev) == 0)
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Issue write_zeroes to all underlying disks to zero their data regions.
+ * This ensures parity consistency for RAID-456 (0 XOR 0 = 0).
+ * Returns true if all disks were successfully zeroed.
+ */
+static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap)
+{
+	struct mddev *mddev = llbitmap->mddev;
+	struct md_rdev *rdev;
+	sector_t dev_sectors = mddev->dev_sectors;
+	int ret;
+
+	rdev_for_each(rdev, mddev) {
+		if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+			continue;
+
+		ret = blkdev_issue_zeroout(rdev->bdev,
+					   rdev->data_offset,
+					   dev_sectors,
+					   GFP_KERNEL, 0);
+		if (ret) {
+			pr_warn("md/llbitmap: failed to zero disk %pg: %d\n",
+				rdev->bdev, ret);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static void llbitmap_init_state(struct llbitmap *llbitmap)
 {
+	struct mddev *mddev = llbitmap->mddev;
 	enum llbitmap_state state = BitUnwritten;
 	unsigned long i;
 
-	if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags))
+	if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags)) {
 		state = BitClean;
+	} else if (raid_is_456(mddev) &&
+		   llbitmap_all_disks_support_wzeroes_unmap(llbitmap)) {
+		/*
+		 * All disks support write_zeroes with unmap. Zero all disks
+		 * to ensure parity consistency, then set BitCleanUnwritten
+		 * to skip initial sync.
+		 */
+		if (llbitmap_zero_all_disks(llbitmap))
+			state = BitCleanUnwritten;
+	}
 
 	for (i = 0; i < llbitmap->chunks; i++)
 		llbitmap_write(llbitmap, state, i);
-- 
2.51.0



* Re: [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-02-14  6:10 ` [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch Yu Kuai
@ 2026-02-17  8:54   ` Su Yue
  2026-02-23  2:22     ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Su Yue @ 2026-02-17  8:54 UTC (permalink / raw)
  To: Yu Kuai; +Cc: song, linan122, xni, colyli, linux-raid, linux-kernel

On Sat 14 Feb 2026 at 14:10, Yu Kuai <yukuai@fnnas.com> wrote:

> If default bitmap version and on-disk version doesn't match, and mdadm
> is not the latest version to set bitmap_type, set bitmap_ops based on
> the disk version.
>
Why not just let old mdadm versions fail, since llbitmap is a new
feature?

> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
>  drivers/md/md.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 102 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 59cd303548de..d2607ed5c2e9 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
>
>  static int start_dirty_degraded;
>
> +/*
> + * Read bitmap superblock and return the bitmap_id based on disk version.
> + * This is used as fallback when default bitmap version and on-disk version
> + * doesn't match, and mdadm is not the latest version to set bitmap_type.
> + */
> +static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
> +{
> +	struct md_rdev *rdev;
> +	struct page *sb_page;
> +	bitmap_super_t *sb;
> +	enum md_submodule_id id = ID_BITMAP_NONE;
> +	sector_t sector;
> +	u32 version;
> +
> +	if (!mddev->bitmap_info.offset)
> +		return ID_BITMAP_NONE;
> +
> +	sb_page = alloc_page(GFP_KERNEL);
> +	if (!sb_page)
> +		return ID_BITMAP_NONE;
> +
>
Personally, I don't like treating errors as ID_BITMAP_NONE.
When something goes wrong, everything looks fine: no error code, no
error message.

> +	sector = mddev->bitmap_info.offset;
> +
> +	rdev_for_each(rdev, mddev) {
> +		u32 iosize;
> +
> +		if (!test_bit(In_sync, &rdev->flags) ||
> +		    test_bit(Faulty, &rdev->flags) ||
> +		    test_bit(Bitmap_sync, &rdev->flags))
> +			continue;
> +
> +		iosize = roundup(sizeof(bitmap_super_t),
> +				 bdev_logical_block_size(rdev->bdev));
> +		if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
> +				 true))
> +			goto read_ok;
> +	}
>
And here.

> +	goto out;
> +
> +read_ok:
> +	sb = kmap_local_page(sb_page);
> +	if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
> +		goto out_unmap;
> +
> +	version = le32_to_cpu(sb->version);
> +	switch (version) {
> +	case BITMAP_MAJOR_LO:
> +	case BITMAP_MAJOR_HI:
> +	case BITMAP_MAJOR_CLUSTERED:
>
For BITMAP_MAJOR_CLUSTERED, why not ID_CLUSTER?

--
Su
> +		id = ID_BITMAP;
> +		break;
> +	case BITMAP_MAJOR_LOCKLESS:
> +		id = ID_LLBITMAP;
> +		break;
> +	default:
> +		pr_warn("md: %s: unknown bitmap version %u\n",
> +			mdname(mddev), version);
> +		break;
> +	}
> +
> +out_unmap:
> +	kunmap_local(sb);
> +out:
> +	__free_page(sb_page);
> +	return id;
> +}
> +
>  static int md_bitmap_create(struct mddev *mddev)
>  {
> +	enum md_submodule_id orig_id = mddev->bitmap_id;
> +	enum md_submodule_id sb_id;
> +	int err;
> +
>  	if (mddev->bitmap_id == ID_BITMAP_NONE)
>  		return -EINVAL;
>
>  	if (!mddev_set_bitmap_ops(mddev))
>  		return -ENOENT;
>
> -	return mddev->bitmap_ops->create(mddev);
> +	err = mddev->bitmap_ops->create(mddev);
> +	if (!err)
> +		return 0;
>
> +
> +	/*
> +	 * Create failed, if default bitmap version and on-disk version
> +	 * doesn't match, and mdadm is not the latest version to set
> +	 * bitmap_type, set bitmap_ops based on the disk version.
> +	 */
> +	mddev_clear_bitmap_ops(mddev);
> +
> +	sb_id = md_bitmap_get_id_from_sb(mddev);
> +	if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
> +		return err;
> +
> +	pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
> +		mdname(mddev), orig_id, sb_id);
> +
> +	mddev->bitmap_id = sb_id;
> +	if (!mddev_set_bitmap_ops(mddev)) {
> +		mddev->bitmap_id = orig_id;
> +		return -ENOENT;
> +	}
> +
> +	err = mddev->bitmap_ops->create(mddev);
> +	if (err) {
> +		mddev_clear_bitmap_ops(mddev);
> +		mddev->bitmap_id = orig_id;
> +	}
> +
> +	return err;
>  }
>
>  static void md_bitmap_destroy(struct mddev *mddev)

* Re: [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-02-17  8:54   ` Su Yue
@ 2026-02-23  2:22     ` Yu Kuai
  2026-02-24  1:52       ` Su Yue
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2026-02-23  2:22 UTC (permalink / raw)
  To: Su Yue; +Cc: song, linan122, xni, colyli, linux-raid, linux-kernel, yukuai

Hi,

On 2026/2/17 16:54, Su Yue wrote:
> On Sat 14 Feb 2026 at 14:10, Yu Kuai <yukuai@fnnas.com> wrote:
>
>> If default bitmap version and on-disk version doesn't match, and mdadm
>> is not the latest version to set bitmap_type, set bitmap_ops based on
>> the disk version.
>>
> Why not just let old version mdadm fails  since llbitmap is a new 
> feature.

The original use case is that we found llbitmap arrays fail to assemble in
some corner cases and, to be honest, I'm not very familiar with the mdadm
code, so I think this patch is the best solution for now.

On the other hand, this should also be helpful if we decide to make llbitmap
the default option in the future.

>
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> ---
>>  drivers/md/md.c | 103  +++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 59cd303548de..d2607ed5c2e9 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
>>
>>  static int start_dirty_degraded;
>>
>> +/*
>> + * Read bitmap superblock and return the bitmap_id based on disk version.
>> + * This is used as fallback when default bitmap version and on-disk version
>> + * doesn't match, and mdadm is not the latest version to set bitmap_type.
>> + */
>> +static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
>> +{
>> +    struct md_rdev *rdev;
>> +    struct page *sb_page;
>> +    bitmap_super_t *sb;
>> +    enum md_submodule_id id = ID_BITMAP_NONE;
>> +    sector_t sector;
>> +    u32 version;
>> +
>> +    if (!mddev->bitmap_info.offset)
>> +        return ID_BITMAP_NONE;
>> +
>> +    sb_page = alloc_page(GFP_KERNEL);
>> +    if (!sb_page)
>> +        return ID_BITMAP_NONE;
>> +
>>
> Personally I don't like the way treating error as ID_BITMAP_NONE.
> When wrong things happen everything looks fine, no error code, no 
> error message.

Ok, sounds reasonable.
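
One possible shape for that, as a rough userspace sketch only and not the
actual follow-up patch: return an error code and hand the ID back through an
out-parameter, so that "no bitmap configured" and "could not read the
superblock" stop looking the same. Every sketch_* name and the struct below
are hypothetical and exist only for this example.

```c
#include <errno.h>

/* Hypothetical stand-ins for the submodule IDs; not the kernel enum. */
enum sketch_id { SKETCH_ID_NONE, SKETCH_ID_BITMAP, SKETCH_ID_LLBITMAP };

/* Hypothetical summary of what happened while probing the superblock. */
struct sketch_probe {
	int have_offset;	/* nonzero if a bitmap offset is configured */
	int page_ok;		/* nonzero if the page allocation succeeded */
	int io_ok;		/* nonzero if some in-sync device was readable */
	int magic_ok;		/* nonzero if the superblock magic matched */
	enum sketch_id id;	/* ID derived from the on-disk version */
};

/*
 * Return 0 and set *id on success. A missing bitmap is not an error and
 * yields SKETCH_ID_NONE; every real failure gets its own negative errno.
 */
static int sketch_get_id(const struct sketch_probe *p, enum sketch_id *id)
{
	*id = SKETCH_ID_NONE;

	if (!p->have_offset)
		return 0;
	if (!p->page_ok)
		return -ENOMEM;
	if (!p->io_ok)
		return -EIO;
	if (!p->magic_ok)
		return -EINVAL;

	*id = p->id;
	return 0;
}
```

With something like this the caller could log the errno instead of silently
keeping the original create() error.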

>
>> +    sector = mddev->bitmap_info.offset;
>> +
>> +    rdev_for_each(rdev, mddev) {
>> +        u32 iosize;
>> +
>> +        if (!test_bit(In_sync, &rdev->flags) ||
>> +            test_bit(Faulty, &rdev->flags) ||
>> +            test_bit(Bitmap_sync, &rdev->flags))
>> +            continue;
>> +
>> +        iosize = roundup(sizeof(bitmap_super_t),
>> +                 bdev_logical_block_size(rdev->bdev));
>> +        if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
>> +                 true))
>> +            goto read_ok;
>> +    }
>>
> And here.
>
>> +    goto out;
>> +
>> +read_ok:
>> +    sb = kmap_local_page(sb_page);
>> +    if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
>> +        goto out_unmap;
>> +
>> +    version = le32_to_cpu(sb->version);
>> +    switch (version) {
>> +    case BITMAP_MAJOR_LO:
>> +    case BITMAP_MAJOR_HI:
>> +    case BITMAP_MAJOR_CLUSTERED:
>>
> For BITMAP_MAJOR_CLUSTERED, why not ID_CLUSTER ?

Because md-cluster has no alternative bitmap_ops: it still uses the old
bitmap, and llbitmap does not support md-cluster for now.
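
To illustrate the mapping, here is a small userspace sketch (not the patch
itself). The demo_* names are made up for the example and the numeric version
values are assumptions, not copied from the kernel headers:

```c
#include <stdint.h>

/* Illustrative on-disk major versions; the values are assumptions. */
enum demo_bitmap_major {
	DEMO_MAJOR_LO = 3,
	DEMO_MAJOR_HI = 4,
	DEMO_MAJOR_CLUSTERED = 5,
	DEMO_MAJOR_LOCKLESS = 6,
};

/* Illustrative bitmap implementations that can be selected. */
enum demo_bitmap_id {
	DEMO_ID_NONE,
	DEMO_ID_BITMAP,
	DEMO_ID_LLBITMAP,
};

/*
 * Map an on-disk version to the implementation that handles it. The
 * clustered format shares the classic bitmap implementation, so it maps
 * to DEMO_ID_BITMAP rather than to a separate cluster ID.
 */
static enum demo_bitmap_id demo_id_from_version(uint32_t version)
{
	switch (version) {
	case DEMO_MAJOR_LO:
	case DEMO_MAJOR_HI:
	case DEMO_MAJOR_CLUSTERED:
		return DEMO_ID_BITMAP;
	case DEMO_MAJOR_LOCKLESS:
		return DEMO_ID_LLBITMAP;
	default:
		return DEMO_ID_NONE;	/* unknown version: no fallback */
	}
}
```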

>
> -- 
> Su
>> +        id = ID_BITMAP;
>> +        break;
>> +    case BITMAP_MAJOR_LOCKLESS:
>> +        id = ID_LLBITMAP;
>> +        break;
>> +    default:
>> +        pr_warn("md: %s: unknown bitmap version %u\n",
>> +            mdname(mddev), version);
>> +        break;
>> +    }
>> +
>> +out_unmap:
>> +    kunmap_local(sb);
>> +out:
>> +    __free_page(sb_page);
>> +    return id;
>> +}
>> +
>>  static int md_bitmap_create(struct mddev *mddev)
>>  {
>> +    enum md_submodule_id orig_id = mddev->bitmap_id;
>> +    enum md_submodule_id sb_id;
>> +    int err;
>> +
>>      if (mddev->bitmap_id == ID_BITMAP_NONE)
>>          return -EINVAL;
>>
>>      if (!mddev_set_bitmap_ops(mddev))
>>          return -ENOENT;
>>
>> -    return mddev->bitmap_ops->create(mddev);
>> +    err = mddev->bitmap_ops->create(mddev);
>> +    if (!err)
>> +        return 0;
>>
>> +
>> +    /*
>> +     * Create failed, if default bitmap version and on-disk version
>> +     * doesn't match, and mdadm is not the latest version to set
>> +     * bitmap_type, set bitmap_ops based on the disk version.
>> +     */
>> +    mddev_clear_bitmap_ops(mddev);
>> +
>> +    sb_id = md_bitmap_get_id_from_sb(mddev);
>> +    if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
>> +        return err;
>> +
>> +    pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
>> +        mdname(mddev), orig_id, sb_id);
>> +
>> +    mddev->bitmap_id = sb_id;
>> +    if (!mddev_set_bitmap_ops(mddev)) {
>> +        mddev->bitmap_id = orig_id;
>> +        return -ENOENT;
>> +    }
>> +
>> +    err = mddev->bitmap_ops->create(mddev);
>> +    if (err) {
>> +        mddev_clear_bitmap_ops(mddev);
>> +        mddev->bitmap_id = orig_id;
>> +    }
>> +
>> +    return err;
>>  }
>>
>>  static void md_bitmap_destroy(struct mddev *mddev)

-- 
Thanks,
Kuai

* Re: [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-02-23  2:22     ` Yu Kuai
@ 2026-02-24  1:52       ` Su Yue
  2026-03-10  1:15         ` Xiao Ni
  0 siblings, 1 reply; 11+ messages in thread
From: Su Yue @ 2026-02-24  1:52 UTC (permalink / raw)
  To: Yu Kuai; +Cc: song, linan122, xni, colyli, linux-raid, linux-kernel

On Mon 23 Feb 2026 at 10:22, "Yu Kuai" <yukuai@fnnas.com> wrote:

> Hi,
>
> On 2026/2/17 16:54, Su Yue wrote:
>> On Sat 14 Feb 2026 at 14:10, Yu Kuai <yukuai@fnnas.com> wrote:
>>
>>> If default bitmap version and on-disk version doesn't match, and mdadm
>>> is not the latest version to set bitmap_type, set bitmap_ops based on
>>> the disk version.
>>>
>> Why not just let old version mdadm fails  since llbitmap is a new
>> feature.
>
> The original use case is that we found llbitmap array fails to assemble in
> some corner cases, and with the respect I'm not quite familiar with mdadm
> code, so I think this patch is the best solution for now.
>
Could you please elaborate on the corner cases in which an llbitmap array
fails to assemble? Do they happen with mdadm <= 4.5?

> On the other hand, this should also be helpful if we decide to make llbitmap
> the default option in the future.
>
But that is still far off, right? llbitmap support is still on the way
(mdadm 4.6 is not released yet).

I am not opposed to the patch. It just looks strange to me to change kernel
code so that old userspace works with a *new* feature.
Maybe the mdadm maintainers see this from a different angle?

--
Su
>
>>
>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>>> ---
>>>  drivers/md/md.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 59cd303548de..d2607ed5c2e9 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
>>>
>>>  static int start_dirty_degraded;
>>>
>>> +/*
>>> + * Read bitmap superblock and return the bitmap_id based on disk version.
>>> + * This is used as fallback when default bitmap version and on-disk version
>>> + * doesn't match, and mdadm is not the latest version to set bitmap_type.
>>> + */
>>> +static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
>>> +{
>>> +    struct md_rdev *rdev;
>>> +    struct page *sb_page;
>>> +    bitmap_super_t *sb;
>>> +    enum md_submodule_id id = ID_BITMAP_NONE;
>>> +    sector_t sector;
>>> +    u32 version;
>>> +
>>> +    if (!mddev->bitmap_info.offset)
>>> +        return ID_BITMAP_NONE;
>>> +
>>> +    sb_page = alloc_page(GFP_KERNEL);
>>> +    if (!sb_page)
>>> +        return ID_BITMAP_NONE;
>>> +
>>>
>> Personally I don't like the way treating error as ID_BITMAP_NONE.
>> When wrong things happen everything looks fine, no error code, no
>> error message.
>
> Ok, sounds reasonable.
>
>>
>>> +    sector = mddev->bitmap_info.offset;
>>> +
>>> +    rdev_for_each(rdev, mddev) {
>>> +        u32 iosize;
>>> +
>>> +        if (!test_bit(In_sync, &rdev->flags) ||
>>> +            test_bit(Faulty, &rdev->flags) ||
>>> +            test_bit(Bitmap_sync, &rdev->flags))
>>> +            continue;
>>> +
>>> +        iosize = roundup(sizeof(bitmap_super_t),
>>> +                 bdev_logical_block_size(rdev->bdev));
>>> +        if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
>>> +                 true))
>>> +            goto read_ok;
>>> +    }
>>>
>> And here.
>>
>>> +    goto out;
>>> +
>>> +read_ok:
>>> +    sb = kmap_local_page(sb_page);
>>> +    if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
>>> +        goto out_unmap;
>>> +
>>> +    version = le32_to_cpu(sb->version);
>>> +    switch (version) {
>>> +    case BITMAP_MAJOR_LO:
>>> +    case BITMAP_MAJOR_HI:
>>> +    case BITMAP_MAJOR_CLUSTERED:
>>>
>> For BITMAP_MAJOR_CLUSTERED, why not ID_CLUSTER ?
>
> Because there is no optional bitmap_ops for md-cluster, it's still
> the old bitmap, and llbitmap does not support md-cluster for now.
>
>>
>> --
>> Su
>>> +        id = ID_BITMAP;
>>> +        break;
>>> +    case BITMAP_MAJOR_LOCKLESS:
>>> +        id = ID_LLBITMAP;
>>> +        break;
>>> +    default:
>>> +        pr_warn("md: %s: unknown bitmap version %u\n",
>>> +            mdname(mddev), version);
>>> +        break;
>>> +    }
>>> +
>>> +out_unmap:
>>> +    kunmap_local(sb);
>>> +out:
>>> +    __free_page(sb_page);
>>> +    return id;
>>> +}
>>> +
>>>  static int md_bitmap_create(struct mddev *mddev)
>>>  {
>>> +    enum md_submodule_id orig_id = mddev->bitmap_id;
>>> +    enum md_submodule_id sb_id;
>>> +    int err;
>>> +
>>>      if (mddev->bitmap_id == ID_BITMAP_NONE)
>>>          return -EINVAL;
>>>
>>>      if (!mddev_set_bitmap_ops(mddev))
>>>          return -ENOENT;
>>>
>>> -    return mddev->bitmap_ops->create(mddev);
>>> +    err = mddev->bitmap_ops->create(mddev);
>>> +    if (!err)
>>> +        return 0;
>>>
>>> +
>>> +    /*
>>> +     * Create failed, if default bitmap version and on-disk version
>>> +     * doesn't match, and mdadm is not the latest version to set
>>> +     * bitmap_type, set bitmap_ops based on the disk version.
>>> +     */
>>> +    mddev_clear_bitmap_ops(mddev);
>>> +
>>> +    sb_id = md_bitmap_get_id_from_sb(mddev);
>>> +    if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
>>> +        return err;
>>> +
>>> +    pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
>>> +        mdname(mddev), orig_id, sb_id);
>>> +
>>> +    mddev->bitmap_id = sb_id;
>>> +    if (!mddev_set_bitmap_ops(mddev)) {
>>> +        mddev->bitmap_id = orig_id;
>>> +        return -ENOENT;
>>> +    }
>>> +
>>> +    err = mddev->bitmap_ops->create(mddev);
>>> +    if (err) {
>>> +        mddev_clear_bitmap_ops(mddev);
>>> +        mddev->bitmap_id = orig_id;
>>> +    }
>>> +
>>> +    return err;
>>>  }
>>>
>>>  static void md_bitmap_destroy(struct mddev *mddev)

* Re: [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-02-24  1:52       ` Su Yue
@ 2026-03-10  1:15         ` Xiao Ni
  2026-03-10  5:19           ` Su Yue
  0 siblings, 1 reply; 11+ messages in thread
From: Xiao Ni @ 2026-03-10  1:15 UTC (permalink / raw)
  To: Su Yue, Yu Kuai; +Cc: song, linan122, colyli, linux-raid, linux-kernel


On 2026/2/24 09:52, Su Yue wrote:
> On Mon 23 Feb 2026 at 10:22, "Yu Kuai" <yukuai@fnnas.com> wrote:
>
>> Hi,
>>
>> On 2026/2/17 16:54, Su Yue wrote:
>>> On Sat 14 Feb 2026 at 14:10, Yu Kuai <yukuai@fnnas.com> wrote:
>>>
>>>> If default bitmap version and on-disk version doesn't match, and mdadm
>>>> is not the latest version to set bitmap_type, set bitmap_ops based on
>>>> the disk version.
>>>>
>>> Why not just let old version mdadm fails  since llbitmap is a new
>>> feature.
>>
>> The original use case is that we found llbitmap array fails to assemble in
>> some corner cases, and with the respect I'm not quite familiar with mdadm
>> code, so I think this patch is the best solution for now.
>>
> Would you please elaborate which corner cases that llbitmap array fails to
> assemble in? Do they happen in mdadm <= 4.5?
>
>> On the other hand, this should also be helpful if we decide to make llbitmap
>> the default option in the future.
>>
> But it's so far, right? llbitmap support is still on the way(mdadm 4.6 is
> not released).
>
> I am not opposed to the patch. It just looks strange to me that changing
> kernel code to let old userspace work with *new* feature.
> Maybe the mdadm maintainers have words in another angles?


Yes. Wouldn't it be better to upgrade mdadm to a version that supports llbitmap?


Regards

Xiao

>
> -- 
> Su
>>
>>>
>>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>>>> ---
>>>>  drivers/md/md.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 59cd303548de..d2607ed5c2e9 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
>>>>
>>>>  static int start_dirty_degraded;
>>>>
>>>> +/*
>>>> + * Read bitmap superblock and return the bitmap_id based on disk version.
>>>> + * This is used as fallback when default bitmap version and on-disk version
>>>> + * doesn't match, and mdadm is not the latest version to set bitmap_type.
>>>> + */
>>>> +static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
>>>> +{
>>>> +    struct md_rdev *rdev;
>>>> +    struct page *sb_page;
>>>> +    bitmap_super_t *sb;
>>>> +    enum md_submodule_id id = ID_BITMAP_NONE;
>>>> +    sector_t sector;
>>>> +    u32 version;
>>>> +
>>>> +    if (!mddev->bitmap_info.offset)
>>>> +        return ID_BITMAP_NONE;
>>>> +
>>>> +    sb_page = alloc_page(GFP_KERNEL);
>>>> +    if (!sb_page)
>>>> +        return ID_BITMAP_NONE;
>>>> +
>>>>
>>> Personally I don't like the way treating error as ID_BITMAP_NONE.
>>> When wrong things happen everything looks fine, no error code, no
>>> error message.
>>
>> Ok, sounds reasonable.
>>
>>>
>>>> +    sector = mddev->bitmap_info.offset;
>>>> +
>>>> +    rdev_for_each(rdev, mddev) {
>>>> +        u32 iosize;
>>>> +
>>>> +        if (!test_bit(In_sync, &rdev->flags) ||
>>>> +            test_bit(Faulty, &rdev->flags) ||
>>>> +            test_bit(Bitmap_sync, &rdev->flags))
>>>> +            continue;
>>>> +
>>>> +        iosize = roundup(sizeof(bitmap_super_t),
>>>> +                 bdev_logical_block_size(rdev->bdev));
>>>> +        if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
>>>> +                 true))
>>>> +            goto read_ok;
>>>> +    }
>>>>
>>> And here.
>>>
>>>> +    goto out;
>>>> +
>>>> +read_ok:
>>>> +    sb = kmap_local_page(sb_page);
>>>> +    if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
>>>> +        goto out_unmap;
>>>> +
>>>> +    version = le32_to_cpu(sb->version);
>>>> +    switch (version) {
>>>> +    case BITMAP_MAJOR_LO:
>>>> +    case BITMAP_MAJOR_HI:
>>>> +    case BITMAP_MAJOR_CLUSTERED:
>>>>
>>> For BITMAP_MAJOR_CLUSTERED, why not ID_CLUSTER ?
>>
>> Because there is no optional bitmap_ops for md-cluster, it's still
>> the old bitmap, and llbitmap does not support md-cluster for now.
>>
>>>
>>> -- 
>>> Su
>>>> +        id = ID_BITMAP;
>>>> +        break;
>>>> +    case BITMAP_MAJOR_LOCKLESS:
>>>> +        id = ID_LLBITMAP;
>>>> +        break;
>>>> +    default:
>>>> +        pr_warn("md: %s: unknown bitmap version %u\n",
>>>> +            mdname(mddev), version);
>>>> +        break;
>>>> +    }
>>>> +
>>>> +out_unmap:
>>>> +    kunmap_local(sb);
>>>> +out:
>>>> +    __free_page(sb_page);
>>>> +    return id;
>>>> +}
>>>> +
>>>>  static int md_bitmap_create(struct mddev *mddev)
>>>>  {
>>>> +    enum md_submodule_id orig_id = mddev->bitmap_id;
>>>> +    enum md_submodule_id sb_id;
>>>> +    int err;
>>>> +
>>>>      if (mddev->bitmap_id == ID_BITMAP_NONE)
>>>>          return -EINVAL;
>>>>
>>>>      if (!mddev_set_bitmap_ops(mddev))
>>>>          return -ENOENT;
>>>>
>>>> -    return mddev->bitmap_ops->create(mddev);
>>>> +    err = mddev->bitmap_ops->create(mddev);
>>>> +    if (!err)
>>>> +        return 0;
>>>>
>>>> +
>>>> +    /*
>>>> +     * Create failed, if default bitmap version and on-disk version
>>>> +     * doesn't match, and mdadm is not the latest version to set
>>>> +     * bitmap_type, set bitmap_ops based on the disk version.
>>>> +     */
>>>> +    mddev_clear_bitmap_ops(mddev);
>>>> +
>>>> +    sb_id = md_bitmap_get_id_from_sb(mddev);
>>>> +    if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
>>>> +        return err;
>>>> +
>>>> +    pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
>>>> +        mdname(mddev), orig_id, sb_id);
>>>> +
>>>> +    mddev->bitmap_id = sb_id;
>>>> +    if (!mddev_set_bitmap_ops(mddev)) {
>>>> +        mddev->bitmap_id = orig_id;
>>>> +        return -ENOENT;
>>>> +    }
>>>> +
>>>> +    err = mddev->bitmap_ops->create(mddev);
>>>> +    if (err) {
>>>> +        mddev_clear_bitmap_ops(mddev);
>>>> +        mddev->bitmap_id = orig_id;
>>>> +    }
>>>> +
>>>> +    return err;
>>>>  }
>>>>
>>>>  static void md_bitmap_destroy(struct mddev *mddev)
>


* Re: [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch
  2026-03-10  1:15         ` Xiao Ni
@ 2026-03-10  5:19           ` Su Yue
  0 siblings, 0 replies; 11+ messages in thread
From: Su Yue @ 2026-03-10  5:19 UTC (permalink / raw)
  To: Xiao Ni; +Cc: Yu Kuai, song, linan122, colyli, linux-raid, linux-kernel

On Tue 10 Mar 2026 at 09:15, Xiao Ni <xni@redhat.com> wrote:

> On 2026/2/24 09:52, Su Yue wrote:
>> On Mon 23 Feb 2026 at 10:22, "Yu Kuai" <yukuai@fnnas.com> wrote:
>>
>>> Hi,
>>>
>>> On 2026/2/17 16:54, Su Yue wrote:
>>>> On Sat 14 Feb 2026 at 14:10, Yu Kuai <yukuai@fnnas.com> wrote:
>>>>
>>>>> If default bitmap version and on-disk version doesn't match, and mdadm
>>>>> is not the latest version to set bitmap_type, set bitmap_ops based on
>>>>> the disk version.
>>>>>
>>>> Why not just let old version mdadm fails  since llbitmap is a new
>>>> feature.
>>>
>>> The original use case is that we found llbitmap array fails to assemble in
>>> some corner cases, and with the respect I'm not quite familiar with mdadm
>>> code, so I think this patch is the best solution for now.
>>>
>> Would you please elaborate which corner cases that llbitmap array fails to
>> assemble in? Do they happen in mdadm <= 4.5?
>>
>>> On the other hand, this should also be helpful if we decide to make llbitmap
>>> the default option in the future.
>>>
>> But it's so far, right? llbitmap support is still on the way(mdadm 4.6 is
>> not released).
>>
>> I am not opposed to the patch. It just looks strange to me that changing
>> kernel code to let old userspace work with *new* feature.
>> Maybe the mdadm maintainers have words in another angles?
>
>
> Yes. Is it better to upgrade mdadm to the version which supports llbitmap?
>

Yes, that's what I mean.

--
Su
>
> Regards
>
> Xiao
>
>>
>> -- Su
>>>
>>>>
>>>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>>>>> ---
>>>>>  drivers/md/md.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>  1 file changed, 102 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>>> index 59cd303548de..d2607ed5c2e9 100644
>>>>> --- a/drivers/md/md.c
>>>>> +++ b/drivers/md/md.c
>>>>> @@ -6447,15 +6447,116 @@ static void md_safemode_timeout(struct timer_list *t)
>>>>>
>>>>>  static int start_dirty_degraded;
>>>>>
>>>>> +/*
>>>>> + * Read bitmap superblock and return the bitmap_id based on disk version.
>>>>> + * This is used as fallback when default bitmap version and on-disk version
>>>>> + * doesn't match, and mdadm is not the latest version to set bitmap_type.
>>>>> + */
>>>>> +static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
>>>>> +{
>>>>> +    struct md_rdev *rdev;
>>>>> +    struct page *sb_page;
>>>>> +    bitmap_super_t *sb;
>>>>> +    enum md_submodule_id id = ID_BITMAP_NONE;
>>>>> +    sector_t sector;
>>>>> +    u32 version;
>>>>> +
>>>>> +    if (!mddev->bitmap_info.offset)
>>>>> +        return ID_BITMAP_NONE;
>>>>> +
>>>>> +    sb_page = alloc_page(GFP_KERNEL);
>>>>> +    if (!sb_page)
>>>>> +        return ID_BITMAP_NONE;
>>>>> +
>>>>>
>>>> Personally I don't like the way treating error as ID_BITMAP_NONE.
>>>> When wrong things happen everything looks fine, no error code, no
>>>> error message.
>>>
>>> Ok, sounds reasonable.
>>>
>>>>
>>>>> +    sector = mddev->bitmap_info.offset;
>>>>> +
>>>>> +    rdev_for_each(rdev, mddev) {
>>>>> +        u32 iosize;
>>>>> +
>>>>> +        if (!test_bit(In_sync, &rdev->flags) ||
>>>>> +            test_bit(Faulty, &rdev->flags) ||
>>>>> +            test_bit(Bitmap_sync, &rdev->flags))
>>>>> +            continue;
>>>>> +
>>>>> +        iosize = roundup(sizeof(bitmap_super_t),
>>>>> +                 bdev_logical_block_size(rdev->bdev));
>>>>> +        if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
>>>>> +                 true))
>>>>> +            goto read_ok;
>>>>> +    }
>>>>>
>>>> And here.
>>>>
>>>>> +    goto out;
>>>>> +
>>>>> +read_ok:
>>>>> +    sb = kmap_local_page(sb_page);
>>>>> +    if (sb->magic != cpu_to_le32(BITMAP_MAGIC))
>>>>> +        goto out_unmap;
>>>>> +
>>>>> +    version = le32_to_cpu(sb->version);
>>>>> +    switch (version) {
>>>>> +    case BITMAP_MAJOR_LO:
>>>>> +    case BITMAP_MAJOR_HI:
>>>>> +    case BITMAP_MAJOR_CLUSTERED:
>>>>>
>>>> For BITMAP_MAJOR_CLUSTERED, why not ID_CLUSTER ?
>>>
>>> Because there is no optional bitmap_ops for md-cluster, it's still
>>> the old bitmap, and llbitmap does not support md-cluster for now.
>>>
>>>>
>>>> -- Su
>>>>> +        id = ID_BITMAP;
>>>>> +        break;
>>>>> +    case BITMAP_MAJOR_LOCKLESS:
>>>>> +        id = ID_LLBITMAP;
>>>>> +        break;
>>>>> +    default:
>>>>> +        pr_warn("md: %s: unknown bitmap version %u\n",
>>>>> +            mdname(mddev), version);
>>>>> +        break;
>>>>> +    }
>>>>> +
>>>>> +out_unmap:
>>>>> +    kunmap_local(sb);
>>>>> +out:
>>>>> +    __free_page(sb_page);
>>>>> +    return id;
>>>>> +}
>>>>> +
>>>>>  static int md_bitmap_create(struct mddev *mddev)
>>>>>  {
>>>>> +    enum md_submodule_id orig_id = mddev->bitmap_id;
>>>>> +    enum md_submodule_id sb_id;
>>>>> +    int err;
>>>>> +
>>>>>      if (mddev->bitmap_id == ID_BITMAP_NONE)
>>>>>          return -EINVAL;
>>>>>
>>>>>      if (!mddev_set_bitmap_ops(mddev))
>>>>>          return -ENOENT;
>>>>>
>>>>> -    return mddev->bitmap_ops->create(mddev);
>>>>> +    err = mddev->bitmap_ops->create(mddev);
>>>>> +    if (!err)
>>>>> +        return 0;
>>>>>
>>>>> +
>>>>> +    /*
>>>>> +     * Create failed, if default bitmap version and on-disk version
>>>>> +     * doesn't match, and mdadm is not the latest version to set
>>>>> +     * bitmap_type, set bitmap_ops based on the disk version.
>>>>> +     */
>>>>> +    mddev_clear_bitmap_ops(mddev);
>>>>> +
>>>>> +    sb_id = md_bitmap_get_id_from_sb(mddev);
>>>>> +    if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
>>>>> +        return err;
>>>>> +
>>>>> +    pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
>>>>> +        mdname(mddev), orig_id, sb_id);
>>>>> +
>>>>> +    mddev->bitmap_id = sb_id;
>>>>> +    if (!mddev_set_bitmap_ops(mddev)) {
>>>>> +        mddev->bitmap_id = orig_id;
>>>>> +        return -ENOENT;
>>>>> +    }
>>>>> +
>>>>> +    err = mddev->bitmap_ops->create(mddev);
>>>>> +    if (err) {
>>>>> +        mddev_clear_bitmap_ops(mddev);
>>>>> +        mddev->bitmap_id = orig_id;
>>>>> +    }
>>>>> +
>>>>> +    return err;
>>>>>  }
>>>>>
>>>>>  static void md_bitmap_destroy(struct mddev *mddev)
>>

Thread overview: 11+ messages
2026-02-14  6:10 [PATCH 0/5] md/md-llbitmap: fixes and proactive parity building support Yu Kuai
2026-02-14  6:10 ` [PATCH 1/5] md/md-llbitmap: skip reading rdevs that are not in_sync Yu Kuai
2026-02-14  6:10 ` [PATCH 2/5] md/md-llbitmap: raise barrier before state machine transition Yu Kuai
2026-02-14  6:10 ` [PATCH 3/5] md: add fallback to correct bitmap_ops on version mismatch Yu Kuai
2026-02-17  8:54   ` Su Yue
2026-02-23  2:22     ` Yu Kuai
2026-02-24  1:52       ` Su Yue
2026-03-10  1:15         ` Xiao Ni
2026-03-10  5:19           ` Su Yue
2026-02-14  6:10 ` [PATCH 4/5] md/md-llbitmap: add CleanUnwritten state for RAID-5 proactive parity building Yu Kuai
2026-02-14  6:10 ` [PATCH 5/5] md/md-llbitmap: optimize initial sync with write_zeroes_unmap support Yu Kuai