public inbox for linux-raid@vger.kernel.org
From: Kenta Akagi <k@mgml.me>
To: Song Liu <song@kernel.org>, Yu Kuai <yukuai3@huawei.com>,
	Mariusz Tkaczyk <mtkaczyk@kernel.org>, Shaohua Li <shli@fb.com>,
	Guoqing Jiang <jgq516@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	Kenta Akagi <k@mgml.me>
Subject: [PATCH v4 3/9] md: introduce md_bio_failure_error()
Date: Mon, 15 Sep 2025 12:42:04 +0900
Message-ID: <20250915034210.8533-4-k@mgml.me>
In-Reply-To: <20250915034210.8533-1-k@mgml.me>

Add a new helper function, md_bio_failure_error().
It is serialized with md_error() under the same lock and behaves
almost identically, with two differences:

* It takes the failed bio as an argument.
* On RAID1/RAID10, if MD_FAILFAST is set in the bio's bi_opf and the
  target rdev has both FailFast and LastDev set, it does not mark the
  rdev Faulty.

Failfast bios must not break the array, but with the current
implementation they can. MD_BROKEN was introduced for RAID1/RAID10
after failfast, and it is set whenever md_error() is called on an rdev
that the mddev needs in order to keep operating. When failfast was
introduced, md_error() had no such effect.

Earlier patches in this series serialize md_error() and make
RAID1/RAID10 set the LastDev flag on rdevs that a failfast failure
must not mark Faulty.

This patch only adds the helper. The bio error handling paths are
switched over to it in a later patch.

Signed-off-by: Kenta Akagi <k@mgml.me>
---
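Note (below the "---", so not part of the commit message): the skip
condition described above can be modeled in userspace roughly as
follows. This is a minimal sketch, not kernel code. All model_* names
are hypothetical stand-ins for the mddev, rdev and bio state that the
helper reads, and locking is left out entirely.

```c
/* Userspace model of the decision in md_bio_failure_error(); not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; only what the check reads. */
enum model_level { MODEL_RAID1, MODEL_RAID10, MODEL_OTHER };

struct model_rdev {
	bool last_dev;    /* models test_bit(LastDev, &rdev->flags) */
	bool fail_fast;   /* models test_bit(FailFast, &rdev->flags) */
};

struct model_bio {
	bool md_failfast; /* models (bio->bi_opf & MD_FAILFAST) != 0 */
};

/*
 * Returns true when the model would call _md_error(), false when the
 * failure is ignored. has_pers models mddev->pers != NULL.
 */
static bool model_should_fail_rdev(bool has_pers, enum model_level level,
				   const struct model_rdev *rdev,
				   const struct model_bio *bio)
{
	if (!has_pers)
		return true;
	if (level != MODEL_RAID1 && level != MODEL_RAID10)
		return true;
	if (rdev->last_dev && rdev->fail_fast &&
	    bio != NULL && bio->md_failfast)
		return false;
	return true;
}

/* A few cases that exercise the model. */
static void model_examples(void)
{
	struct model_rdev last = { .last_dev = true, .fail_fast = true };
	struct model_rdev other = { .last_dev = false, .fail_fast = true };
	struct model_bio ff = { .md_failfast = true };
	struct model_bio plain = { .md_failfast = false };

	/* Failfast bio on the last device of a RAID1: rdev is left alone. */
	assert(!model_should_fail_rdev(true, MODEL_RAID1, &last, &ff));
	/* Same bio on a non-last device: rdev is failed as usual. */
	assert(model_should_fail_rdev(true, MODEL_RAID1, &other, &ff));
	/* Non-failfast bio on the last device: same as md_error(). */
	assert(model_should_fail_rdev(true, MODEL_RAID1, &last, &plain));
	/* NULL bio, or a personality other than RAID1/RAID10. */
	assert(model_should_fail_rdev(true, MODEL_RAID10, &last, NULL));
	assert(model_should_fail_rdev(true, MODEL_OTHER, &last, &ff));
}
```

In the model, every path other than "RAID1/RAID10, LastDev, FailFast
rdev, MD_FAILFAST bio" falls through to the md_error() behaviour, which
matches the intent that only failfast failures on the last device are
ignored.
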
 drivers/md/md.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  4 +++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5607578a6db9..65fdd9bae8f4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8297,6 +8297,48 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
 }
 EXPORT_SYMBOL(md_error);
 
+/** md_bio_failure_error() - md error handler for MD_FAILFAST bios
+ * @mddev: affected md device.
+ * @rdev: member device to fail.
+ * @bio: bio that triggered the device failure.
+ *
+ * This is almost the same as md_error(). That is, it is serialized at
+ * the same level as md_error, marks the rdev as Faulty, and changes
+ * the mddev status.
+ * However, if all of the following conditions are met, it does nothing.
+ * This is because MD_FAILFAST bios must not stop the array.
+ *  * RAID1 or RAID10
+ *  * LastDev - if rdev becomes Faulty, mddev will stop
+ *  * rdev has FailFast and the failed bio has MD_FAILFAST set
+ *
+ * Returns: true if _md_error() was called, false if not.
+ */
+bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio)
+{
+	bool do_md_error = true;
+
+	spin_lock(&mddev->error_handle_lock);
+	if (mddev->pers) {
+		if (mddev->pers->head.id == ID_RAID1 ||
+		    mddev->pers->head.id == ID_RAID10) {
+			if (test_bit(LastDev, &rdev->flags) &&
+			    test_bit(FailFast, &rdev->flags) &&
+			    bio != NULL && (bio->bi_opf & MD_FAILFAST))
+				do_md_error = false;
+		}
+	}
+
+	if (do_md_error)
+		_md_error(mddev, rdev);
+	else
+		pr_warn_ratelimited("md: %s: %s didn't do anything for %pg\n",
+			mdname(mddev), __func__, rdev->bdev);
+
+	spin_unlock(&mddev->error_handle_lock);
+	return do_md_error;
+}
+EXPORT_SYMBOL(md_bio_failure_error);
+
 /* seq_file implementation /proc/mdstat */
 
 static void status_unused(struct seq_file *seq)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5177cb609e4b..11389ea58431 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -283,7 +283,8 @@ enum flag_bits {
 				 */
 	LastDev,		/* This is the last working rdev.
 				 * so don't use FailFast any more for
-				 * metadata.
+				 * metadata, and don't mark the rdev
+				 * Faulty when a FailFast bio fails.
 				 */
 	CollisionCheck,		/*
 				 * check if there is collision between raid1
@@ -906,6 +907,7 @@ extern void md_write_end(struct mddev *mddev);
 extern void md_done_sync(struct mddev *mddev, int blocks, int ok);
 void _md_error(struct mddev *mddev, struct md_rdev *rdev);
 extern void md_error(struct mddev *mddev, struct md_rdev *rdev);
+extern bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
 extern void md_finish_reshape(struct mddev *mddev);
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 			struct bio *bio, sector_t start, sector_t size);
-- 
2.50.1