public inbox for linux-ext4@vger.kernel.org
From: Diangang Li <diangangli@gmail.com>
To: tytso@mit.edu, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, changfengnan@bytedance.com,
	Diangang Li <lidiangang@bytedance.com>
Subject: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure
Date: Wed, 25 Mar 2026 17:33:49 +0800
Message-ID: <20260325093349.630193-2-diangangli@gmail.com>
In-Reply-To: <20260325093349.630193-1-diangangli@gmail.com>

From: Diangang Li <lidiangang@bytedance.com>

ext4 metadata reads serialize on BH_Lock (lock_buffer). If a read fails,
the buffer remains !Uptodate, so with concurrent callers each waiter
retries the same failing read as soon as the previous holder drops
BH_Lock. The device's retry latency is therefore paid once per waiter,
which can block tasks long enough to trigger hung task warnings.
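The amplification can be modelled in a few lines of userspace C. This is a
hypothetical, single-threaded sketch, not kernel code: struct naive_bh,
naive_submit_read() and the fixed per-read cost are invented for
illustration, and the loop stands in for waiters acquiring BH_Lock one
after another. Because nothing remembers the failure, every waiter
re-submits the read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer_head; not the kernel structure. */
struct naive_bh {
	bool uptodate;		/* models BH_Uptodate */
	int reads_submitted;	/* reads that reached the (simulated) device */
};

/* Hypothetical device: the block is unreadable, so every read fails
 * after the driver has exhausted its own retries. */
static bool naive_submit_read(struct naive_bh *bh)
{
	bh->reads_submitted++;
	return false;
}

/* One waiter's turn while "holding" BH_Lock, with no failure memory. */
static int naive_read_bh_locked(struct naive_bh *bh)
{
	if (bh->uptodate)
		return 0;
	if (naive_submit_read(bh)) {
		bh->uptodate = true;
		return 0;
	}
	return -EIO;	/* buffer stays !Uptodate for the next waiter */
}

/* Waiters take the lock one after another; return the total simulated
 * time spent on device reads, given the cost of one failed read. */
static int naive_serialized_waiters(struct naive_bh *bh, int nr_waiters,
				    int cost_per_failed_read)
{
	int elapsed = 0;

	for (int i = 0; i < nr_waiters; i++) {
		int before = bh->reads_submitted;

		naive_read_bh_locked(bh);
		elapsed += (bh->reads_submitted - before) * cost_per_failed_read;
	}
	return elapsed;
}
```

With 10 waiters and a hypothetical cost of 30 s per failed read, this
model submits the read 10 times and the last waiter is released after
300 s.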

In the normal read path the block driver already performs its own
retries. Once those retries have failed, re-submitting the same metadata
read from the filesystem only adds latency, because every waiter
serialized on BH_Lock repeats it.

Record read failures in a new buffer_head state bit, BH_Read_EIO, and make
ext4 metadata reads fail fast once a buffer has already failed to read.
The bit is neither set nor honoured while BH_Write_EIO is set. Clear it
on successful read or write completion so the buffer can recover. ext4
read-ahead uses ext4_read_bh_nowait(), so it does not set the bit and
remains best-effort.
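A userspace sketch of the fail-fast behaviour, again hypothetical: the
sim_* names, SIM_BUFFER_FNS() (a non-atomic imitation of the kernel's
BUFFER_FNS() accessors) and the device_ok switch are invented for
illustration. sim_read_bh() follows the order of checks in the patched
ext4_read_bh(): the Read_EIO test runs before the Uptodate test, a failed
read sets the bit unless a write error is pending, and a successful read
or write completion clears it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state bits, loosely modelled on enum bh_state_bits. */
enum sim_bh_bits {
	SIM_BH_Uptodate,
	SIM_BH_Write_EIO,
	SIM_BH_Read_EIO,
};

struct sim_bh {
	unsigned long state;
	int reads_submitted;	/* reads that reached the (simulated) device */
	bool device_ok;		/* hypothetical: does the next device read succeed? */
};

/* Accessors in the spirit of BUFFER_FNS(), using plain bit operations. */
#define SIM_BUFFER_FNS(bit, name)					\
	static inline void set_sim_##name(struct sim_bh *bh)		\
	{ bh->state |= 1UL << SIM_BH_##bit; }				\
	static inline void clear_sim_##name(struct sim_bh *bh)		\
	{ bh->state &= ~(1UL << SIM_BH_##bit); }			\
	static inline bool sim_##name(const struct sim_bh *bh)		\
	{ return bh->state & (1UL << SIM_BH_##bit); }

SIM_BUFFER_FNS(Uptodate, uptodate)
SIM_BUFFER_FNS(Write_EIO, write_io_error)
SIM_BUFFER_FNS(Read_EIO, read_io_error)

/* Caller is assumed to hold the (modelled) buffer lock. */
static int sim_read_bh(struct sim_bh *bh)
{
	/* Fail fast: an earlier read failed and no write error is pending. */
	if (!sim_write_io_error(bh) && sim_read_io_error(bh))
		return -EIO;

	if (sim_uptodate(bh)) {
		clear_sim_read_io_error(bh);
		return 0;
	}

	bh->reads_submitted++;
	if (bh->device_ok) {
		set_sim_uptodate(bh);
		clear_sim_read_io_error(bh);
		return 0;
	}
	if (!sim_write_io_error(bh))
		set_sim_read_io_error(bh);
	return -EIO;
}

/* Models a successful synchronous write completion. */
static void sim_write_completed_ok(struct sim_bh *bh)
{
	set_sim_uptodate(bh);
	clear_sim_read_io_error(bh);
}
```

In this model, ten back-to-back reads of an unreadable block reach the
device once instead of ten times, and a later successful write makes the
buffer readable again without another device read.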

Example hung stacks:

  INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.
  Call Trace:
   __schedule
   io_schedule
   __wait_on_bit_lock
   bh_uptodate_or_lock
   __read_extent_tree_block
   ext4_find_extent
   ext4_ext_map_blocks
   ext4_map_blocks
   ext4_getblk
   ext4_bread
   __ext4_read_dirblock
   dx_probe
   ext4_htree_fill_tree
   ext4_readdir
   iterate_dir
   ksys_getdents64

  INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.
  Call Trace:
   __schedule
   io_schedule
   __wait_on_bit_lock
   ext4_read_bh_lock
   ext4_bread
   __ext4_read_dirblock
   htree_dirblock_to_tree
   ext4_htree_fill_tree
   ext4_readdir
   iterate_dir
   ksys_getdents64

Signed-off-by: Diangang Li <lidiangang@bytedance.com>
Reviewed-by: Fengnan Chang <changfengnan@bytedance.com>
---
 fs/buffer.c                 |  2 ++
 fs/ext4/super.c             | 12 +++++++++++-
 include/linux/buffer_head.h |  2 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 2d2e3ecec6b2b..b41d54b8b1f4d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -145,6 +145,7 @@ static void __end_buffer_read_notouch(struct buffer_head *bh, int uptodate)
 {
 	if (uptodate) {
 		set_buffer_uptodate(bh);
+		clear_buffer_read_io_error(bh);
 	} else {
 		/* This happens, due to failed read-ahead attempts. */
 		clear_buffer_uptodate(bh);
@@ -167,6 +168,7 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 {
 	if (uptodate) {
 		set_buffer_uptodate(bh);
+		clear_buffer_read_io_error(bh);
 	} else {
 		buffer_io_error(bh, ", lost sync page write");
 		mark_buffer_write_io_error(bh);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 781c083000c2e..89a99851864a0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -198,7 +198,13 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
 {
 	BUG_ON(!buffer_locked(bh));
 
+	if (!buffer_write_io_error(bh) && buffer_read_io_error(bh)) {
+		unlock_buffer(bh);
+		return -EIO;
+	}
+
 	if (ext4_buffer_uptodate(bh)) {
+		clear_buffer_read_io_error(bh);
 		unlock_buffer(bh);
 		return 0;
 	}
@@ -206,8 +212,12 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
 	__ext4_read_bh(bh, op_flags, end_io, simu_fail);
 
 	wait_on_buffer(bh);
-	if (buffer_uptodate(bh))
+	if (buffer_uptodate(bh)) {
+		clear_buffer_read_io_error(bh);
 		return 0;
+	}
+	if (!buffer_write_io_error(bh))
+		set_buffer_read_io_error(bh);
 	return -EIO;
 }
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index b16b88bfbc3e7..be8bedcde379e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -29,6 +29,7 @@ enum bh_state_bits {
 	BH_Delay,	/* Buffer is not yet allocated on disk */
 	BH_Boundary,	/* Block is followed by a discontiguity */
 	BH_Write_EIO,	/* I/O error on write */
+	BH_Read_EIO,	/* I/O error on read */
 	BH_Unwritten,	/* Buffer is allocated on disk but not written */
 	BH_Quiet,	/* Buffer Error Prinks to be quiet */
 	BH_Meta,	/* Buffer contains metadata */
@@ -132,6 +133,7 @@ BUFFER_FNS(Async_Write, async_write)
 BUFFER_FNS(Delay, delay)
 BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
+BUFFER_FNS(Read_EIO, read_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
-- 
2.39.5

Thread overview: 10+ messages
2026-03-25  9:33 [RFC PATCH 0/1] ext4: fail fast on repeated metadata reads after IO failure Diangang Li
2026-03-25  9:33 ` Diangang Li [this message]
2026-03-25 10:15   ` [RFC 1/1] " Andreas Dilger
2026-03-25 11:13     ` Diangang Li
2026-03-25 14:27       ` Zhang Yi
2026-03-26  2:26         ` changfengnan
2026-03-26  7:42         ` Diangang Li
2026-03-26 11:09           ` Zhang Yi
2026-03-25 15:06     ` Matthew Wilcox
2026-03-26 12:09       ` Diangang Li
