* [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
@ 2026-02-03 6:12 Zheng Qixing
2026-02-03 6:12 ` [RFC v2 1/5] md: add helpers for requested sync action Zheng Qixing
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Hi,
This is v2 of the series.
# Mechanism
When rectifying badblocks, we issue a single repair write for
the bad range (copying data from a good mirror to the
corresponding LBA range on the bad mirror). Once the write
completes successfully (bi_status == 0), the range is cleared
from the badblocks table. If the media is still bad at that
LBA, a subsequent read or write will fail again and the range
will be marked bad again.
Doing a read-back after every repair would only prove that the
data is readable at that moment; it provides no stronger
guarantee against future internal remapping.
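As a rough illustration of this clear-on-success / re-mark-on-failure
policy, consider the following toy model (hypothetical code, not the
kernel implementation; bi_status is the only name borrowed from the
real bio API):

```c
#include <stdbool.h>

/*
 * Toy model of the policy described above: a single tracked bad
 * range, cleared when the repair write reports success
 * (bi_status == 0) and re-marked when a later access fails.
 */
struct toy_bb {
	bool bad;	/* is the range currently in the badblocks table? */
};

/* Repair write completion: success clears the entry. */
static void repair_write_done(struct toy_bb *bb, int bi_status)
{
	if (bi_status == 0)
		bb->bad = false;
	/* on failure the entry is simply left in place */
}

/* Later read/write completion: a failure marks the range bad again. */
static void io_completed(struct toy_bb *bb, int bi_status)
{
	if (bi_status != 0)
		bb->bad = true;
}
```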
# Why use LBS granularity for bad-block repair?
In our RAID1 bad-block repair (rectify) testing on a device
reporting 512B logical blocks and 4KiB physical blocks, we
issue 512B I/O directly to the md device and inject an I/O
fault.
Since the md badblocks table can only track failures in terms
of host-visible LBA ranges, it is updated at 512B sector
granularity (i.e., it records the failing sector) and does not
attempt to infer or expand the entry to a 4KiB physical-block
boundary.
Given that the OS has no visibility into the device's internal
mapping from LBAs to physical media (or the FTL), using
logical block size for recording and repairing bad blocks is
the most appropriate choice from a correctness standpoint.
If the underlying media failure is actually larger than 512B,
this is typically reflected by subsequent failures on adjacent
LBAs, at which point the recorded bad range will naturally
grow to cover the affected area.
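The way a recorded bad range "naturally grows" can be sketched with a
toy model (hypothetical code; the kernel's real table lives in
block/badblocks.c and packs sector/length/acked state into a u64
array, but the merging idea is the same):

```c
/*
 * Hypothetical, simplified model of one badblocks entry: a single
 * [start, start + len) range of 512B sectors, grown to cover
 * adjacent failing sectors instead of creating new entries.
 */
struct bb_range {
	unsigned long long start;
	unsigned long long len;	/* 0 means the table is empty */
};

/* Record one failing 512B sector; returns 1 if it was recorded. */
static int bb_record_sector(struct bb_range *r, unsigned long long sector)
{
	if (r->len == 0) {			/* empty table: new range */
		r->start = sector;
		r->len = 1;
		return 1;
	}
	if (sector == r->start + r->len) {	/* extend upward */
		r->len++;
		return 1;
	}
	if (sector + 1 == r->start) {		/* extend downward */
		r->start--;
		r->len++;
		return 1;
	}
	if (sector >= r->start && sector < r->start + r->len)
		return 1;			/* already recorded */
	return 0;	/* non-adjacent: would need a new entry */
}
```

If a 4KiB physical block fails, each of its eight 512B sectors is
reported and merged one by one, so the range ends up covering the
whole physical block without the OS ever knowing its boundary.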
# Tests
This feature has been tested on a RAID1 built from two 480GB
system disks. It has also been tested under QEMU with a 4-disk
RAID1 setup, with both memory fault injection and I/O fault
injection enabled.
In addition, we will add a new test (26raid1-rectify-badblocks)
to mdadm/tests to verify that `rectify` can effectively repair
sectors recorded in bad_blocks.
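For reference, the intended workflow on a live array looks roughly
like this (device names, sysfs paths, and the bad_blocks contents are
illustrative):

```shell
# Inspect the acked badblocks recorded for a member device
# ("sector length" pairs, 512B sectors; values are illustrative).
cat /sys/block/md0/md/rd0/bad_blocks        # e.g. "2048 8"

# Trigger the new sync action.
echo rectify > /sys/block/md0/md/sync_action

# Watch progress, then confirm the repaired range was cleared.
cat /proc/mdstat
cat /sys/block/md0/md/rd0/bad_blocks
```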
# TODO
rectify currently supports bad-block repair only for the RAID1
level. We will consider extending it to RAID5/10 in follow-up
work.
Changes in v2:
- Patch 1: Remove non-essential helpers to reduce indirection.
- Patch 2: Split out a bugfix that was previously included in patch 1.
- Patch 3: Rename the /proc/mdstat action from "recovery" to "recover"
to match the naming used by action_store() and action_show().
- Patch 4: Add a brief comment for MAX_RAID_DISKS.
- Patch 5: For rectify, reuse handle_sync_write_finished() to handle
write request completion, removing duplicate completion handling.
Link of v1:
https://lore.kernel.org/all/20251231070952.1233903-1-zhengqixing@huaweicloud.com/
Zheng Qixing (5):
md: add helpers for requested sync action
md: serialize requested sync actions and clear stale request state
md: rename mdstat action "recovery" to "recover"
md: introduce MAX_RAID_DISKS macro to replace magic number
md/raid1: introduce rectify action to repair badblocks
drivers/md/md.c | 152 +++++++++++++++++++------
drivers/md/md.h | 21 ++++
drivers/md/raid1.c | 270 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/md/raid1.h | 1 +
4 files changed, 409 insertions(+), 35 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC v2 1/5] md: add helpers for requested sync action
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
@ 2026-02-03 6:12 ` Zheng Qixing
2026-02-03 6:12 ` [RFC v2 2/5] md: serialize requested sync actions and clear stale request state Zheng Qixing
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Add helpers for handling the requested sync action.
No functional change.
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
drivers/md/md.c | 65 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 46 insertions(+), 19 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5df2220b1bd1..84af578876e2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -665,6 +665,41 @@ void mddev_put(struct mddev *mddev)
spin_unlock(&all_mddevs_lock);
}
+static int handle_requested_sync_action(struct mddev *mddev,
+ enum sync_action action)
+{
+ switch (action) {
+ case ACTION_CHECK:
+ set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+ fallthrough;
+ case ACTION_REPAIR:
+ set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+ set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static enum sync_action get_recovery_sync_action(struct mddev *mddev)
+{
+ if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
+ return ACTION_CHECK;
+ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+ return ACTION_REPAIR;
+ return ACTION_RESYNC;
+}
+
+static void set_requested_position(struct mddev *mddev, sector_t value)
+{
+ mddev->resync_min = value;
+}
+
+static sector_t get_requested_position(struct mddev *mddev)
+{
+ return mddev->resync_min;
+}
+
static void md_safemode_timeout(struct timer_list *t);
static void md_start_sync(struct work_struct *ws);
@@ -5101,17 +5136,9 @@ enum sync_action md_sync_action(struct mddev *mddev)
if (test_bit(MD_RECOVERY_RECOVER, &recovery))
return ACTION_RECOVER;
- if (test_bit(MD_RECOVERY_SYNC, &recovery)) {
- /*
- * MD_RECOVERY_CHECK must be paired with
- * MD_RECOVERY_REQUESTED.
- */
- if (test_bit(MD_RECOVERY_CHECK, &recovery))
- return ACTION_CHECK;
- if (test_bit(MD_RECOVERY_REQUESTED, &recovery))
- return ACTION_REPAIR;
- return ACTION_RESYNC;
- }
+ /* MD_RECOVERY_CHECK must be paired with MD_RECOVERY_REQUESTED. */
+ if (test_bit(MD_RECOVERY_SYNC, &recovery))
+ return get_recovery_sync_action(mddev);
/*
* MD_RECOVERY_NEEDED or MD_RECOVERY_RUNNING is set, however, no
@@ -5300,11 +5327,10 @@ action_store(struct mddev *mddev, const char *page, size_t len)
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
break;
case ACTION_CHECK:
- set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
- fallthrough;
case ACTION_REPAIR:
- set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
- set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+ ret = handle_requested_sync_action(mddev, action);
+ if (ret)
+ goto out;
fallthrough;
case ACTION_RESYNC:
case ACTION_IDLE:
@@ -9370,7 +9396,7 @@ static sector_t md_sync_position(struct mddev *mddev, enum sync_action action)
switch (action) {
case ACTION_CHECK:
case ACTION_REPAIR:
- return mddev->resync_min;
+ return get_requested_position(mddev);
case ACTION_RESYNC:
if (!mddev->bitmap)
return mddev->resync_offset;
@@ -9795,10 +9821,11 @@ void md_do_sync(struct md_thread *thread)
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
/* We completed so min/max setting can be forgotten if used. */
if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
- mddev->resync_min = 0;
+ set_requested_position(mddev, 0);
mddev->resync_max = MaxSector;
- } else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
- mddev->resync_min = mddev->curr_resync_completed;
+ } else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ set_requested_position(mddev, mddev->curr_resync_completed);
+ }
set_bit(MD_RECOVERY_DONE, &mddev->recovery);
mddev->curr_resync = MD_RESYNC_NONE;
spin_unlock(&mddev->lock);
--
2.39.2
* [RFC v2 2/5] md: serialize requested sync actions and clear stale request state
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
2026-02-03 6:12 ` [RFC v2 1/5] md: add helpers for requested sync action Zheng Qixing
@ 2026-02-03 6:12 ` Zheng Qixing
2026-02-03 6:12 ` [RFC v2 3/5] md: rename mdstat action "recovery" to "recover" Zheng Qixing
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
In handle_requested_sync_action(), return -EBUSY when
MD_RECOVERY_REQUESTED is already set. This serializes requested sync
actions (such as check or repair) and avoids a race window where a
second sync request can be issued before MD_RECOVERY_RUNNING is set,
resulting in the later request being neither rejected nor executed.
Additionally, in md_check_recovery(), clear requested-sync related
state bits when no recovery operation is running. This prevents stale
request state from persisting in cases where a sync action is queued
and 'frozen' is written before MD_RECOVERY_RUNNING is set, which would
cause subsequent sync requests to spuriously fail with -EBUSY.
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
drivers/md/md.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 84af578876e2..7fe02ee21d3e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -668,6 +668,9 @@ void mddev_put(struct mddev *mddev)
static int handle_requested_sync_action(struct mddev *mddev,
enum sync_action action)
{
+ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+ return -EBUSY;
+
switch (action) {
case ACTION_CHECK:
set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
@@ -10318,6 +10321,9 @@ void md_check_recovery(struct mddev *mddev)
queue_work(md_misc_wq, &mddev->sync_work);
} else {
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+ clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+ clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+ clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
wake_up(&resync_wait);
}
--
2.39.2
* [RFC v2 3/5] md: rename mdstat action "recovery" to "recover"
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
2026-02-03 6:12 ` [RFC v2 1/5] md: add helpers for requested sync action Zheng Qixing
2026-02-03 6:12 ` [RFC v2 2/5] md: serialize requested sync actions and clear stale request state Zheng Qixing
@ 2026-02-03 6:12 ` Zheng Qixing
2026-02-03 6:12 ` [RFC v2 4/5] md: introduce MAX_RAID_DISKS macro to replace magic number Zheng Qixing
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Simplify the code in status_resync() that prints the progress of sync
actions.
Also rename the /proc/mdstat action from "recovery" to "recover" to
match the naming used by action_store() and action_show().
Note:
The md-raid-utilities/mdadm test suite will need to be updated to expect
"recover" instead of "recovery" in /proc/mdstat.
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
drivers/md/md.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7fe02ee21d3e..f319621c6832 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8669,6 +8669,7 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
sector_t rt, curr_mark_cnt, resync_mark_cnt;
int scale, recovery_active;
unsigned int per_milli;
+ enum sync_action action;
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
@@ -8750,13 +8751,10 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
seq_printf(seq, ".");
seq_printf(seq, "] ");
}
+
+ action = md_sync_action(mddev);
seq_printf(seq, " %s =%3u.%u%% (%llu/%llu)",
- (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)?
- "reshape" :
- (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)?
- "check" :
- (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?
- "resync" : "recovery"))),
+ md_sync_action_name(action),
per_milli/10, per_milli % 10,
(unsigned long long) resync/2,
(unsigned long long) max_sectors/2);
--
2.39.2
* [RFC v2 4/5] md: introduce MAX_RAID_DISKS macro to replace magic number
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
` (2 preceding siblings ...)
2026-02-03 6:12 ` [RFC v2 3/5] md: rename mdstat action "recovery" to "recover" Zheng Qixing
@ 2026-02-03 6:12 ` Zheng Qixing
2026-02-03 6:12 ` [RFC v2 5/5] md/raid1: introduce rectify action to repair badblocks Zheng Qixing
2026-02-03 7:31 ` [RFC v2 0/5] md/raid1: introduce a new sync " Christoph Hellwig
5 siblings, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Per-device state is stored as a __le16 dev_roles[] array (2 bytes per
device) plus a fixed 256-byte header, still within a 4 KiB superblock.
Therefore the theoretical maximum is (4096 - 256) / 2 = 1920 entries.
Define MAX_RAID_DISKS macro for the maximum number of RAID disks. No
functional change.
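The arithmetic in the new macro can be checked in isolation (a
standalone sketch; only the 4096/256/2-byte constants come from the
commit message, and a 2-byte unsigned short stands in for __le16):

```c
#include <stddef.h>

/* v1.x superblock layout from the commit message: one 4 KiB block
 * holding a 256-byte fixed header plus one 2-byte role entry per
 * member device in dev_roles[]. */
#define SB_BLOCK_SIZE	4096
#define SB_HEADER_SIZE	256
#define MAX_RAID_DISKS	((SB_BLOCK_SIZE - SB_HEADER_SIZE) / 2)

/* Bytes consumed by a superblock describing max_dev devices. */
static size_t sb_bytes_used(int max_dev)
{
	return SB_HEADER_SIZE + (size_t)max_dev * sizeof(unsigned short);
}
```

So 1920 entries exactly fill the 4 KiB block, and a 1921st would
overflow it, which is why super_1_load() rejects larger max_dev.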
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
drivers/md/md.c | 4 ++--
drivers/md/md.h | 5 +++++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f319621c6832..aebbdbaa4e0a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1873,7 +1873,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
if (sb->magic != cpu_to_le32(MD_SB_MAGIC) ||
sb->major_version != cpu_to_le32(1) ||
- le32_to_cpu(sb->max_dev) > (4096-256)/2 ||
+ le32_to_cpu(sb->max_dev) > MAX_RAID_DISKS ||
le64_to_cpu(sb->super_offset) != rdev->sb_start ||
(le32_to_cpu(sb->feature_map) & ~MD_FEATURE_ALL) != 0)
return -EINVAL;
@@ -2050,7 +2050,7 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *freshest, struc
mddev->resync_offset = le64_to_cpu(sb->resync_offset);
memcpy(mddev->uuid, sb->set_uuid, 16);
- mddev->max_disks = (4096-256)/2;
+ mddev->max_disks = MAX_RAID_DISKS;
if (!mddev->logical_block_size)
mddev->logical_block_size = le32_to_cpu(sb->logical_block_size);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index a083f37374d0..14f9db38b7c5 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -21,6 +21,11 @@
#include <linux/raid/md_u.h>
#include <trace/events/block.h>
+/*
+ * v1.x superblock occupies one 4 KiB block: 256B header + 2B per device
+ * in dev_roles[].
+ */
+#define MAX_RAID_DISKS ((4096-256)/2)
#define MaxSector (~(sector_t)0)
enum md_submodule_type {
--
2.39.2
* [RFC v2 5/5] md/raid1: introduce rectify action to repair badblocks
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
` (3 preceding siblings ...)
2026-02-03 6:12 ` [RFC v2 4/5] md: introduce MAX_RAID_DISKS macro to replace magic number Zheng Qixing
@ 2026-02-03 6:12 ` Zheng Qixing
2026-02-03 7:31 ` [RFC v2 0/5] md/raid1: introduce a new sync " Christoph Hellwig
5 siblings, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 6:12 UTC (permalink / raw)
To: song, yukuai, linan122
Cc: xni, linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
From: Zheng Qixing <zhengqixing@huawei.com>
Add support for repairing known badblocks in RAID1. When disks
have known badblocks (shown in sysfs bad_blocks), data can be
read from other healthy disks in the array and written back to
repair the badblock areas and clear them from bad_blocks.
echo rectify > sync_action triggers this action.
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
---
drivers/md/md.c | 71 +++++++++++-
drivers/md/md.h | 16 +++
drivers/md/raid1.c | 270 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/md/raid1.h | 1 +
4 files changed, 348 insertions(+), 10 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index aebbdbaa4e0a..9b818fcef666 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -74,6 +74,7 @@ static const char *action_name[NR_SYNC_ACTIONS] = {
[ACTION_RECOVER] = "recover",
[ACTION_CHECK] = "check",
[ACTION_REPAIR] = "repair",
+ [ACTION_RECTIFY] = "rectify",
[ACTION_RESHAPE] = "reshape",
[ACTION_FROZEN] = "frozen",
[ACTION_IDLE] = "idle",
@@ -665,13 +666,47 @@ void mddev_put(struct mddev *mddev)
spin_unlock(&all_mddevs_lock);
}
+static int md_badblocks_precheck(struct mddev *mddev)
+{
+ struct md_rdev *rdev;
+ int valid_disks = 0;
+ int ret = -EINVAL;
+
+ /* rectify is currently supported only for RAID1 */
+ if (mddev->level != 1) {
+ pr_err("md/raid1:%s requires raid1 array\n", mdname(mddev));
+ return -EINVAL;
+ }
+
+ rdev_for_each(rdev, mddev) {
+ if (rdev->raid_disk < 0 ||
+ test_bit(Faulty, &rdev->flags))
+ continue;
+ valid_disks++;
+ }
+ if (valid_disks >= 2)
+ ret = 0;
+
+ return ret;
+}
+
static int handle_requested_sync_action(struct mddev *mddev,
enum sync_action action)
{
+ int ret;
+
if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
return -EBUSY;
switch (action) {
+ case ACTION_RECTIFY:
+ ret = md_badblocks_precheck(mddev);
+ if (ret)
+ return ret;
+ set_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
+ set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+ set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+ return 0;
case ACTION_CHECK:
set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
fallthrough;
@@ -686,6 +721,8 @@ static int handle_requested_sync_action(struct mddev *mddev,
static enum sync_action get_recovery_sync_action(struct mddev *mddev)
{
+ if (test_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery))
+ return ACTION_RECTIFY;
if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
return ACTION_CHECK;
if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
@@ -695,11 +732,16 @@ static enum sync_action get_recovery_sync_action(struct mddev *mddev)
static void set_requested_position(struct mddev *mddev, sector_t value)
{
- mddev->resync_min = value;
+ if (test_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery))
+ mddev->rectify_min = value;
+ else
+ mddev->resync_min = value;
}
static sector_t get_requested_position(struct mddev *mddev)
{
+ if (test_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery))
+ return mddev->rectify_min;
return mddev->resync_min;
}
@@ -820,6 +862,7 @@ int mddev_init(struct mddev *mddev)
mddev->reshape_backwards = 0;
mddev->last_sync_action = ACTION_IDLE;
mddev->resync_min = 0;
+ mddev->rectify_min = 0;
mddev->resync_max = MaxSector;
mddev->level = LEVEL_NONE;
@@ -5139,7 +5182,10 @@ enum sync_action md_sync_action(struct mddev *mddev)
if (test_bit(MD_RECOVERY_RECOVER, &recovery))
return ACTION_RECOVER;
- /* MD_RECOVERY_CHECK must be paired with MD_RECOVERY_REQUESTED. */
+ /*
+ * MD_RECOVERY_CHECK / MD_RECOVERY_BADBLOCKS_RECTIFY must be
+ * paired with MD_RECOVERY_REQUESTED.
+ */
if (test_bit(MD_RECOVERY_SYNC, &recovery))
return get_recovery_sync_action(mddev);
@@ -5304,6 +5350,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
break;
case ACTION_RESHAPE:
case ACTION_RECOVER:
+ case ACTION_RECTIFY:
case ACTION_CHECK:
case ACTION_REPAIR:
case ACTION_RESYNC:
@@ -5329,6 +5376,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
break;
+ case ACTION_RECTIFY:
case ACTION_CHECK:
case ACTION_REPAIR:
ret = handle_requested_sync_action(mddev, action);
@@ -6813,6 +6861,7 @@ static void md_clean(struct mddev *mddev)
mddev->raid_disks = 0;
mddev->resync_offset = 0;
mddev->resync_min = 0;
+ mddev->rectify_min = 0;
mddev->resync_max = MaxSector;
mddev->reshape_position = MaxSector;
/* we still need mddev->external in export_rdev, do not clear it yet */
@@ -9343,6 +9392,7 @@ static sector_t md_sync_max_sectors(struct mddev *mddev,
{
switch (action) {
case ACTION_RESYNC:
+ case ACTION_RECTIFY:
case ACTION_CHECK:
case ACTION_REPAIR:
atomic64_set(&mddev->resync_mismatches, 0);
@@ -9395,6 +9445,7 @@ static sector_t md_sync_position(struct mddev *mddev, enum sync_action action)
struct md_rdev *rdev;
switch (action) {
+ case ACTION_RECTIFY:
case ACTION_CHECK:
case ACTION_REPAIR:
return get_requested_position(mddev);
@@ -10020,6 +10071,7 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+ clear_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
/* Start new recovery. */
set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
@@ -10077,10 +10129,14 @@ static void md_start_sync(struct work_struct *ws)
if (spares && md_bitmap_enabled(mddev, true))
mddev->bitmap_ops->write_all(mddev);
- name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
- "reshape" : "resync";
- rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, name));
+ if (!is_badblocks_recovery_requested(mddev) ||
+ !md_badblocks_precheck(mddev)) {
+ name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
+ "reshape" : "resync";
+ rcu_assign_pointer(mddev->sync_thread,
+ md_register_thread(md_do_sync, mddev, name));
+ }
+
if (!mddev->sync_thread) {
pr_warn("%s: could not start resync thread...\n",
mdname(mddev));
@@ -10108,6 +10164,7 @@ static void md_start_sync(struct work_struct *ws)
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+ clear_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
mddev_unlock(mddev);
/*
* md_start_sync was triggered by MD_RECOVERY_NEEDED, so we should
@@ -10322,6 +10379,7 @@ void md_check_recovery(struct mddev *mddev)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+ clear_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
wake_up(&resync_wait);
}
@@ -10372,6 +10430,7 @@ void md_reap_sync_thread(struct mddev *mddev)
clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+ clear_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery);
/*
* We call mddev->cluster_ops->update_size here because sync_size could
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 14f9db38b7c5..0b9e3487bfed 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -102,6 +102,13 @@ enum sync_action {
* are inconsistent data,
*/
ACTION_REPAIR,
+ /*
+ * Represented by MD_RECOVERY_SYNC | MD_RECOVERY_REQUESTED |
+ * MD_RECOVERY_BADBLOCKS_RECTIFY; starts when the user writes
+ * "rectify" to the sysfs sync_action attribute, used to repair
+ * badblocks acked in the bad-blocks table.
+ */
+ ACTION_RECTIFY,
/*
* Represent by MD_RECOVERY_RESHAPE, start when new member disk is added
* to the conf, notice that this is different from spares or
@@ -528,6 +535,7 @@ struct mddev {
sector_t resync_offset;
sector_t resync_min; /* user requested sync
* starts here */
+ sector_t rectify_min;
sector_t resync_max; /* resync should pause
* when it gets here */
@@ -668,6 +676,8 @@ enum recovery_flags {
MD_RESYNCING_REMOTE,
/* raid456 lazy initial recover */
MD_RECOVERY_LAZY_RECOVER,
+ /* try to repair acked badblocks */
+ MD_RECOVERY_BADBLOCKS_RECTIFY,
};
enum md_ro_state {
@@ -1020,6 +1030,12 @@ static inline void mddev_unlock_and_resume(struct mddev *mddev)
mddev_resume(mddev);
}
+static inline bool is_badblocks_recovery_requested(struct mddev *mddev)
+{
+ return test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
+ test_bit(MD_RECOVERY_BADBLOCKS_RECTIFY, &mddev->recovery);
+}
+
struct mdu_array_info_s;
struct mdu_disk_info_s;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 00120c86c443..90686a0ff9ca 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -176,7 +176,8 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
* If this is a user-requested check/repair, allocate
* RESYNC_PAGES for each bio.
*/
- if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery))
+ if (test_bit(MD_RECOVERY_REQUESTED, &conf->mddev->recovery) &&
+ !is_badblocks_recovery_requested(conf->mddev))
need_pages = conf->raid_disks * 2;
else
need_pages = 1;
@@ -2380,6 +2381,260 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
put_sync_write_buf(r1_bio);
}
+static void end_rectify_read(struct bio *bio)
+{
+ struct r1bio *r1_bio = get_resync_r1bio(bio);
+ struct r1conf *conf = r1_bio->mddev->private;
+ struct md_rdev *rdev;
+ struct bio *next_bio;
+ bool all_fail = true;
+ int i;
+
+ update_head_pos(r1_bio->read_disk, r1_bio);
+
+ if (!bio->bi_status) {
+ set_bit(R1BIO_Uptodate, &r1_bio->state);
+ goto out;
+ }
+
+ for (i = r1_bio->read_disk + 1; i < conf->raid_disks; i++) {
+ rdev = conf->mirrors[i].rdev;
+ if (!rdev || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ next_bio = r1_bio->bios[i];
+ if (next_bio->bi_end_io == end_rectify_read) {
+ r1_bio->read_disk = i;
+ all_fail = false;
+ break;
+ }
+ }
+
+ if (unlikely(all_fail)) {
+ md_done_sync(r1_bio->mddev, r1_bio->sectors);
+ md_sync_error(r1_bio->mddev);
+ put_buf(r1_bio);
+ return;
+ }
+out:
+ reschedule_retry(r1_bio);
+}
+
+static void end_rectify_write(struct bio *bio)
+{
+ struct r1bio *r1_bio = get_resync_r1bio(bio);
+
+ if (atomic_dec_and_test(&r1_bio->remaining)) {
+ /*
+ * Rectify only attempts to clear acked bad
+ * blocks, and it does not set bad blocks in
+ * cases of R1BIO_WriteError.
+ * Here we reuse R1BIO_MadeGood flag, which
+ * does not guarantee that all write I/Os
+ * actually succeeded.
+ */
+ set_bit(R1BIO_MadeGood, &r1_bio->state);
+ reschedule_retry(r1_bio);
+ }
+}
+
+static void submit_rectify_read(struct r1bio *r1_bio)
+{
+ struct bio *bio;
+
+ bio = r1_bio->bios[r1_bio->read_disk];
+ bio->bi_status = 0;
+ submit_bio_noacct(bio);
+}
+
+static void rectify_request_write(struct mddev *mddev, struct r1bio *r1_bio)
+{
+ struct r1conf *conf = mddev->private;
+ struct bio *wbio = NULL;
+ struct md_rdev *rdev;
+ int wcnt = 0;
+ int i;
+
+ if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+ submit_rectify_read(r1_bio);
+ return;
+ }
+
+ atomic_set(&r1_bio->remaining, 0);
+ for (i = 0; i < conf->raid_disks; i++) {
+ rdev = conf->mirrors[i].rdev;
+ if (!rdev || test_bit(Faulty, &rdev->flags))
+ continue;
+ wbio = r1_bio->bios[i];
+ if (wbio->bi_end_io == end_rectify_write) {
+ atomic_inc(&r1_bio->remaining);
+ wcnt++;
+ submit_bio_noacct(wbio);
+ }
+ }
+
+ if (unlikely(!wcnt)) {
+ md_done_sync(r1_bio->mddev, r1_bio->sectors);
+ put_buf(r1_bio);
+ }
+}
+
+static void handle_sync_write(struct mddev *mddev, struct r1bio *r1_bio)
+{
+ if (test_bit(R1BIO_BadBlocksRectify, &r1_bio->state))
+ rectify_request_write(mddev, r1_bio);
+ else
+ sync_request_write(mddev, r1_bio);
+}
+
+static sector_t get_badblocks_sync_sectors(struct mddev *mddev, sector_t sector_nr,
+ int *skipped, unsigned long *bad_disks)
+{
+ struct r1conf *conf = mddev->private;
+ sector_t nr_sectors = mddev->dev_sectors - sector_nr;
+ bool all_faulty = true;
+ struct md_rdev *rdev;
+ bool good = false;
+ int i;
+
+ *skipped = 0;
+ for (i = 0; i < conf->raid_disks; i++) {
+ sector_t first_bad;
+ sector_t bad_sectors;
+
+ rdev = conf->mirrors[i].rdev;
+ if (!rdev || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ all_faulty = false;
+ if (is_badblock(rdev, sector_nr, nr_sectors, &first_bad, &bad_sectors)) {
+ if (first_bad <= sector_nr) {
+ set_bit(i, bad_disks);
+ nr_sectors = min(nr_sectors, first_bad + bad_sectors - sector_nr);
+ } else {
+ good = true;
+ nr_sectors = min(nr_sectors, first_bad - sector_nr);
+ }
+ } else {
+ good = true;
+ }
+ }
+
+ if (all_faulty) {
+ *skipped = 1;
+ return 0;
+ }
+
+ if (!good || !bitmap_weight(bad_disks, conf->raid_disks))
+ *skipped = 1;
+
+ /* make sure nr_sectors won't go across barrier unit boundary */
+ return align_to_barrier_unit_end(sector_nr, nr_sectors);
+}
+
+static sector_t get_next_sync_sector(struct mddev *mddev, sector_t sector_nr,
+ int *skipped, unsigned long *bad_disks)
+{
+ sector_t nr_sectors;
+
+ nr_sectors = get_badblocks_sync_sectors(mddev, sector_nr,
+ skipped, bad_disks);
+ if (!(*skipped) && nr_sectors > RESYNC_PAGES * (PAGE_SIZE >> 9))
+ nr_sectors = RESYNC_PAGES * (PAGE_SIZE >> 9);
+ return nr_sectors;
+}
+
+static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf);
+static struct r1bio *init_sync_badblocks_r1bio(struct mddev *mddev,
+ sector_t sector_nr,
+ sector_t nr_sectors,
+ unsigned long *bad_disks)
+{
+ struct r1conf *conf = mddev->private;
+ struct r1bio *r1_bio;
+ struct md_rdev *rdev;
+ int page_idx = 0;
+ struct bio *bio;
+ int i;
+
+ r1_bio = raid1_alloc_init_r1buf(conf);
+ r1_bio->mddev = mddev;
+ r1_bio->sector = sector_nr;
+ r1_bio->sectors = nr_sectors;
+ r1_bio->state = 0;
+ r1_bio->read_disk = -1;
+ set_bit(R1BIO_IsSync, &r1_bio->state);
+ set_bit(R1BIO_BadBlocksRectify, &r1_bio->state);
+
+ for (i = 0; i < conf->raid_disks; i++) {
+ rdev = conf->mirrors[i].rdev;
+ if (!rdev || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ if (r1_bio->read_disk < 0 && !test_bit(i, bad_disks))
+ r1_bio->read_disk = i;
+
+ bio = r1_bio->bios[i];
+ if (test_bit(i, bad_disks)) {
+ bio->bi_opf = REQ_OP_WRITE;
+ bio->bi_end_io = end_rectify_write;
+ } else {
+ bio->bi_opf = REQ_OP_READ;
+ bio->bi_end_io = end_rectify_read;
+ }
+
+ atomic_inc(&rdev->nr_pending);
+ bio->bi_iter.bi_sector = sector_nr + rdev->data_offset;
+ bio_set_dev(bio, rdev->bdev);
+ }
+
+ if (unlikely(r1_bio->read_disk < 0)) {
+ put_buf(r1_bio);
+ return NULL;
+ }
+
+ while (nr_sectors > 0 && page_idx < RESYNC_PAGES) {
+ int len = nr_sectors << 9 < PAGE_SIZE ?
+ nr_sectors << 9 : PAGE_SIZE;
+ struct resync_pages *rp;
+
+ for (i = 0; i < conf->raid_disks; i++) {
+ bio = r1_bio->bios[i];
+ rp = get_resync_pages(bio);
+ __bio_add_page(bio, resync_fetch_page(rp, page_idx), len, 0);
+ }
+
+ nr_sectors -= len >> 9;
+ page_idx++;
+ }
+
+ return r1_bio;
+}
+
+static sector_t do_sync_badblocks_rectify(struct mddev *mddev,
+ sector_t sector_nr, int *skipped)
+{
+ DECLARE_BITMAP(bad_disks, MAX_RAID_DISKS);
+ struct r1conf *conf = mddev->private;
+ struct r1bio *r1_bio;
+ sector_t nr_sectors;
+
+ bitmap_zero(bad_disks, MAX_RAID_DISKS);
+ nr_sectors = get_next_sync_sector(mddev, sector_nr, skipped, bad_disks);
+ if (*skipped) {
+ lower_barrier(conf, sector_nr);
+ return nr_sectors;
+ }
+
+ r1_bio = init_sync_badblocks_r1bio(mddev, sector_nr,
+ nr_sectors, bad_disks);
+ if (!r1_bio)
+ return 0;
+
+ submit_rectify_read(r1_bio);
+ return nr_sectors;
+}
+
/*
* This is a kernel thread which:
*
@@ -2558,13 +2813,16 @@ static void handle_sync_write_finished(struct r1conf *conf, struct r1bio *r1_bio
{
int m;
int s = r1_bio->sectors;
+ bool is_rectify = test_bit(R1BIO_BadBlocksRectify, &r1_bio->state);
+
for (m = 0; m < conf->raid_disks * 2 ; m++) {
struct md_rdev *rdev = conf->mirrors[m].rdev;
struct bio *bio = r1_bio->bios[m];
if (bio->bi_end_io == NULL)
continue;
if (!bio->bi_status &&
- test_bit(R1BIO_MadeGood, &r1_bio->state))
+ test_bit(R1BIO_MadeGood, &r1_bio->state) &&
+ (!is_rectify || bio->bi_end_io == end_rectify_write))
rdev_clear_badblocks(rdev, r1_bio->sector, s, 0);
if (bio->bi_status &&
test_bit(R1BIO_WriteError, &r1_bio->state))
@@ -2728,7 +2986,7 @@ static void raid1d(struct md_thread *thread)
test_bit(R1BIO_WriteError, &r1_bio->state))
handle_sync_write_finished(conf, r1_bio);
else
- sync_request_write(mddev, r1_bio);
+ handle_sync_write(mddev, r1_bio);
} else if (test_bit(R1BIO_MadeGood, &r1_bio->state) ||
test_bit(R1BIO_WriteError, &r1_bio->state))
handle_write_finished(conf, r1_bio);
@@ -2837,7 +3095,8 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
/* before building a request, check if we can skip these blocks..
* This call the bitmap_start_sync doesn't actually record anything
*/
- if (!md_bitmap_start_sync(mddev, sector_nr, &sync_blocks, true) &&
+ if (!is_badblocks_recovery_requested(mddev) &&
+ !md_bitmap_start_sync(mddev, sector_nr, &sync_blocks, true) &&
!conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* We can skip this block, and probably several more */
*skipped = 1;
@@ -2863,6 +3122,9 @@ static sector_t raid1_sync_request(struct mddev *mddev, sector_t sector_nr,
if (raise_barrier(conf, sector_nr))
return 0;
+ if (is_badblocks_recovery_requested(mddev))
+ return do_sync_badblocks_rectify(mddev, sector_nr, skipped);
+
r1_bio = raid1_alloc_init_r1buf(conf);
/*
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c98d43a7ae99..6ca8bf808d69 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -184,6 +184,7 @@ enum r1bio_state {
R1BIO_MadeGood,
R1BIO_WriteError,
R1BIO_FailFast,
+ R1BIO_BadBlocksRectify,
};
static inline int sector_to_idx(sector_t sector)
--
2.39.2
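[Editor's sketch] For readers outside the md code, the clearing rule added to handle_sync_write_finished() above can be modeled in isolation. This is a userspace approximation: `should_clear_badblocks()`, `enum endio_kind`, and `struct model_bio` are illustrative stand-ins (not kernel structures); only the decision logic mirrors the patch — clear the range only when the write completed without error, the r1bio was flagged MadeGood, and, for a rectify r1bio, the completed bio is the repair write itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the clearing decision in handle_sync_write_finished():
 * reads that complete during a rectify must never clear a bad range; only
 * the rectify *write* endpoint may. All names here are illustrative.
 */
enum endio_kind { ENDIO_NONE, ENDIO_SYNC_WRITE, ENDIO_RECTIFY_WRITE };

struct model_bio {
	int bi_status;          /* 0 on success, nonzero on error */
	enum endio_kind end_io; /* stands in for bio->bi_end_io */
};

static bool should_clear_badblocks(const struct model_bio *bio,
				   bool made_good, bool is_rectify)
{
	if (bio->end_io == ENDIO_NONE)
		return false;   /* bio was never issued for this disk */
	if (bio->bi_status || !made_good)
		return false;   /* failed write, or nothing to clear */
	/* for rectify, only the repair write may clear the range */
	return !is_rectify || bio->end_io == ENDIO_RECTIFY_WRITE;
}
```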
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
` (4 preceding siblings ...)
2026-02-03 6:12 ` [RFC v2 5/5] md/raid1: introduce rectify action to repair badblocks Zheng Qixing
@ 2026-02-03 7:31 ` Christoph Hellwig
2026-02-03 8:08 ` Pascal Hambourg
2026-02-03 8:08 ` Zheng Qixing
5 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2026-02-03 7:31 UTC (permalink / raw)
To: Zheng Qixing
Cc: song, yukuai, linan122, xni, linux-raid, linux-kernel, yi.zhang,
yangerkun, houtao1, zhengqixing
Just curious, but what kind of devices do you see that have
permanent bad blocks at a fixed location that are not fixed by
rewriting the sector?
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 7:31 ` [RFC v2 0/5] md/raid1: introduce a new sync " Christoph Hellwig
@ 2026-02-03 8:08 ` Pascal Hambourg
2026-02-03 16:30 ` Christoph Hellwig
2026-02-03 8:08 ` Zheng Qixing
1 sibling, 1 reply; 15+ messages in thread
From: Pascal Hambourg @ 2026-02-03 8:08 UTC (permalink / raw)
To: Christoph Hellwig, Zheng Qixing
Cc: song, yukuai, linan122, xni, linux-raid, linux-kernel, yi.zhang,
yangerkun, houtao1, zhengqixing
On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?
I have seen this with several hard disk drives of various brands, even
though SMART attribute #5 (reallocated sector count) had not reached the
limit.
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 7:31 ` [RFC v2 0/5] md/raid1: introduce a new sync " Christoph Hellwig
2026-02-03 8:08 ` Pascal Hambourg
@ 2026-02-03 8:08 ` Zheng Qixing
2026-02-03 16:31 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Zheng Qixing @ 2026-02-03 8:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: song, yukuai, linan122, xni, linux-raid, linux-kernel, yi.zhang,
yangerkun, houtao1, Zheng Qixing
Hi,
On 2026/2/3 15:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?
The bad_blocks entries record sectors where I/O failed, which
indicates that the device-internal remapping did not succeed
at that time.
`rectify` does not assume a permanently bad or fixed LBA. Its
purpose is to trigger an additional rewrite, giving the underlying
device (e.g. FTL or firmware) another opportunity to perform its
own remapping.
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 8:08 ` Pascal Hambourg
@ 2026-02-03 16:30 ` Christoph Hellwig
2026-02-03 20:36 ` Pascal Hambourg
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2026-02-03 16:30 UTC (permalink / raw)
To: Pascal Hambourg
Cc: Christoph Hellwig, Zheng Qixing, song, yukuai, linan122, xni,
linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> I have seen this with several hard disk drives of various brands, even
> though SMART attribute #5 (reallocated sector count) had not reached the
> limit.
Weird. Can you share the models? I'm especially curious whether these
are consumer or enterprise drives and of what vintage.
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 8:08 ` Zheng Qixing
@ 2026-02-03 16:31 ` Christoph Hellwig
2026-02-04 9:29 ` Zheng Qixing
2026-02-04 9:32 ` Zheng Qixing
0 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2026-02-03 16:31 UTC (permalink / raw)
To: Zheng Qixing
Cc: Christoph Hellwig, song, yukuai, linan122, xni, linux-raid,
linux-kernel, yi.zhang, yangerkun, houtao1, Zheng Qixing
On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
> Hi,
>
> On 2026/2/3 15:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> The bad_blocks entries record sectors where I/O failed, which
> indicates that the device-internal remapping did not succeed
> at that time.
>
> `rectify` does not assume a permanently bad or fixed LBA. Its
> purpose is to trigger an additional rewrite, giving the underlying
> device (e.g. FTL or firmware) another opportunity to perform its
> own remapping.
Well, what devices do you see where writes fail, but rewrites
fix them?
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 16:30 ` Christoph Hellwig
@ 2026-02-03 20:36 ` Pascal Hambourg
2026-02-04 16:59 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Pascal Hambourg @ 2026-02-03 20:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Zheng Qixing, song, yukuai, linan122, xni, linux-raid,
linux-kernel, yi.zhang, yangerkun, houtao1, zhengqixing
On 03/02/2026 at 17:30, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
>> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>>
>> I have seen this with several hard disk drives of various brands, even
>> though SMART attribute #5 (reallocated sector count) had not reached the
>> limit.
>
> Weird. Can you share the models? I'm especially curious if these
> are consumer of enterprise drives and of what vintage.
I did not keep track of the models and do not remember them; it was a
long time ago. They were mostly hard disk drives from Dell and HP
professional desktop and laptop series, so consumer grade I guess,
manufactured around 2010.
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 16:31 ` Christoph Hellwig
@ 2026-02-04 9:29 ` Zheng Qixing
2026-02-04 9:32 ` Zheng Qixing
1 sibling, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-04 9:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: song, yukuai, linan122, xni, linux-raid, linux-kernel, yi.zhang,
yangerkun, houtao1, Zheng Qixing
On 2026/2/4 0:31, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
>> Hi,
>>
>> On 2026/2/3 15:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>> The bad_blocks entries record sectors where I/O failed, which
>> indicates that the device-internal remapping did not succeed
>> at that time.
>>
>> `rectify` does not assume a permanently bad or fixed LBA. Its
>> purpose is to trigger an additional rewrite, giving the underlying
>> device (e.g. FTL or firmware) another opportunity to perform its
>> own remapping.
> Well, what devices do you see where writes fail, but rewrites
> fix them?
I understand your concerns, but I do not have a concrete example tied
to a specific device model...

The intent here is to provide an additional rewrite opportunity, which
allows the write path to be exercised again and gives the underlying
device or stack a chance to recover or remap the affected range.

For remote storage devices, I/O may fail due to network or transport
issues. If the final attempt fails, MD can record the affected range in
bad_blocks. This behavior does not appear to be tied to a specific
device model.

For local storage, some controllers may have limitations or corner cases
in their remapping mechanisms. In such cases, a sector that could
potentially be recovered may be marked as bad, leaving no opportunity
for a subsequent successful rewrite.
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 16:31 ` Christoph Hellwig
2026-02-04 9:29 ` Zheng Qixing
@ 2026-02-04 9:32 ` Zheng Qixing
1 sibling, 0 replies; 15+ messages in thread
From: Zheng Qixing @ 2026-02-04 9:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: song, yukuai, linan122, xni, linux-raid, linux-kernel, yi.zhang,
yangerkun, houtao1, Zheng Qixing
resend..
On 2026/2/4 0:31, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
>> Hi,
>>
>> On 2026/2/3 15:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>> The bad_blocks entries record sectors where I/O failed, which
>> indicates that the device-internal remapping did not succeed
>> at that time.
>>
>> `rectify` does not assume a permanently bad or fixed LBA. Its
>> purpose is to trigger an additional rewrite, giving the underlying
>> device (e.g. FTL or firmware) another opportunity to perform its
>> own remapping.
> Well, what devices do you see where writes fail, but rewrites
> fix them?
I understand your concerns, but I do not have a concrete example tied
to a specific device model...
The intent here is to provide an additional rewrite opportunity, which
allows the write path to be exercised again and gives the underlying
device or stack a chance to recover or remap the affected range.
For remote storage devices, I/O may fail due to network or transport
issues. If the final attempt fails, MD can record the affected range in
bad_blocks. This behavior does not appear to be tied to a specific
device model.
For local storage, some controllers may have limitations or corner cases
in their remapping mechanisms. In such cases, a sector that could
potentially be recovered may be marked as bad, leaving no opportunity
for a subsequent successful rewrite.
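[Editor's sketch] The retry semantics described in this reply — a failed write records the range in bad_blocks, and a later successful rewrite clears it — can be modeled with a toy single-disk table. Everything below (the `toy_*` names, the per-sector boolean table, the fail-budget device) is a hypothetical userspace illustration, not the kernel badblocks API:

```c
#include <stdbool.h>
#include <string.h>

/* Toy one-bool-per-sector badblocks table for a single disk. */
#define BB_MAX_SECTORS 64

struct toy_badblocks {
	bool bad[BB_MAX_SECTORS];
};

/* Simulated device: each sector fails fail_budget[] times, then succeeds. */
struct toy_disk {
	int fail_budget[BB_MAX_SECTORS]; /* remaining failures per sector */
};

static int toy_write(struct toy_disk *disk, int sector)
{
	if (disk->fail_budget[sector] > 0) {
		disk->fail_budget[sector]--;
		return -1; /* media/transport error */
	}
	return 0;
}

/* Issue one write; record or clear the sector in the table accordingly. */
static void toy_write_and_account(struct toy_disk *disk,
				  struct toy_badblocks *bb, int sector)
{
	if (toy_write(disk, sector))
		bb->bad[sector] = true;   /* record the bad range */
	else
		bb->bad[sector] = false;  /* successful (re)write clears it */
}
```

A transient failure (network blip, firmware hiccup) leaves the sector marked bad after the first write, and a rectify-style rewrite gives the device another chance to take the data and clears the entry on success.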
* Re: [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks
2026-02-03 20:36 ` Pascal Hambourg
@ 2026-02-04 16:59 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2026-02-04 16:59 UTC (permalink / raw)
To: Pascal Hambourg
Cc: Christoph Hellwig, Zheng Qixing, song, yukuai, linan122, xni,
linux-raid, linux-kernel, yi.zhang, yangerkun, houtao1,
zhengqixing
On Tue, Feb 03, 2026 at 09:36:38PM +0100, Pascal Hambourg wrote:
> I did not keep track of the models and do not remember them, it was a long
> time ago. They were mostly hard disk drives from Dell and HP professional
> desktop and laptop series, so consumer grade I guess, manufactured around
> 2010.
OK, for 15-ish-year-old consumer devices I would not be very surprised.
2026-02-03 6:12 [RFC v2 0/5] md/raid1: introduce a new sync action to repair badblocks Zheng Qixing
2026-02-03 6:12 ` [RFC v2 1/5] md: add helpers for requested sync action Zheng Qixing
2026-02-03 6:12 ` [RFC v2 2/5] md: serialize requested sync actions and clear stale request state Zheng Qixing
2026-02-03 6:12 ` [RFC v2 3/5] md: rename mdstat action "recovery" to "recover" Zheng Qixing
2026-02-03 6:12 ` [RFC v2 4/5] md: introduce MAX_RAID_DISKS macro to replace magic number Zheng Qixing
2026-02-03 6:12 ` [RFC v2 5/5] md/raid1: introduce rectify action to repair badblocks Zheng Qixing
2026-02-03 7:31 ` [RFC v2 0/5] md/raid1: introduce a new sync " Christoph Hellwig
2026-02-03 8:08 ` Pascal Hambourg
2026-02-03 16:30 ` Christoph Hellwig
2026-02-03 20:36 ` Pascal Hambourg
2026-02-04 16:59 ` Christoph Hellwig
2026-02-03 8:08 ` Zheng Qixing
2026-02-03 16:31 ` Christoph Hellwig
2026-02-04 9:29 ` Zheng Qixing
2026-02-04 9:32 ` Zheng Qixing