From: "changfengnan" <changfengnan@bytedance.com>
To: "Zhang Yi" <yizhang089@gmail.com>
Cc: "Diangang Li" <lidiangang@bytedance.com>,
"Andreas Dilger" <adilger@dilger.ca>,
"Diangang Li" <diangangli@gmail.com>, <tytso@mit.edu>,
<linux-ext4@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure
Date: Thu, 26 Mar 2026 10:26:16 +0800
Message-ID: <d9210bcdf73fbe1ac8b6ec132865609a3ed68688.b75b68ec.808e.4625.9191.7f725153fe9d@bytedance.com>
In-Reply-To: <e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>

> From: "Zhang Yi"<yizhang089@gmail.com>
> Date: Wed, Mar 25, 2026, 22:27
> Subject: Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure
> To: "Diangang Li"<lidiangang@bytedance.com>, "Andreas Dilger"<adilger@dilger.ca>, "Diangang Li"<diangangli@gmail.com>
> Cc: <tytso@mit.edu>, <linux-ext4@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <changfengnan@bytedance.com>
> Hi, Diangang,
>
> On 3/25/2026 7:13 PM, Diangang Li wrote:
> > Hi Andreas,
> >
> > BH_Read_EIO is cleared on successful read or write.
>
> I think what Andreas means is, since you modified the ext4_read_bh()
> interface, if the bh to be read already has the Read_EIO flag set, then
> subsequent read operations through this interface will directly return
> failure without issuing a read I/O. At the same time, because its state
IMO, we first need to reach a consensus on whether we can expect a
retry to succeed after a read failure, given that current SCSI and NVMe
drivers already perform multiple retries for I/O errors.
IMO, this depends on the specific error. If the block layer returns
BLK_STS_RESOURCE or BLK_STS_AGAIN, we can retry; however, if
it returns BLK_STS_MEDIUM or BLK_STS_IOERR, there is no need to retry.
For scenarios requiring a retry, we should also wait for a certain time
window before retrying.
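A rough userspace sketch of that policy, not kernel code: the enum is a
hypothetical stand-in that only borrows the BLK_STS_* names, and the
backoff numbers (10 ms doubling, capped at 1000 ms) are assumptions
chosen purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's blk_status_t values; only the
 * names are borrowed, the numeric values here are arbitrary. */
enum sketch_blk_status {
	SKETCH_STS_OK,
	SKETCH_STS_RESOURCE,
	SKETCH_STS_AGAIN,
	SKETCH_STS_MEDIUM,
	SKETCH_STS_IOERR,
};

/* Transient conditions are worth retrying; media and generic I/O errors
 * are assumed to have been retried by the driver already, so fail fast.
 * Anything else (including OK) is reported as not retryable. */
static bool sketch_read_error_retryable(enum sketch_blk_status sts)
{
	switch (sts) {
	case SKETCH_STS_RESOURCE:
	case SKETCH_STS_AGAIN:
		return true;
	default:
		return false;
	}
}

/* Illustrative time window: delay in ms before retry number `attempt`
 * (0-based), doubling from 10 ms and capped at 1000 ms. Returns -1 if
 * the status should not be retried at all. */
static int sketch_retry_delay_ms(enum sketch_blk_status sts,
				 unsigned int attempt)
{
	int delay = 10;

	if (!sketch_read_error_retryable(sts))
		return -1;
	while (attempt-- > 0 && delay < 1000)
		delay *= 2;
	return delay > 1000 ? 1000 : delay;
}
```

A caller would consult sketch_retry_delay_ms() after a failed read and
either sleep for the returned window or give up when it returns -1.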
Thanks.
Fengnan.
> is also not uptodate, for an existing block, a write request will not be
> issued either. How can we clear this Read_EIO flag? IIRC, relying solely
> on ext4_read_bh_nowait() doesn't seem sufficient to achieve this.
>
> Thanks,
> Yi.
>
> >
> > In practice bad blocks are typically repaired/remapped on write, so we
> > expect recovery after a successful rewrite. If the block is never
> > rewritten, repeatedly issuing the same failing read does not help.
> >
> > We clear the flag on successful reads so the buffer can recover
> > immediately if the error was transient. Since read-ahead reads are not
> > blocked, a later successful read-ahead will clear the flag and allow
> > subsequent synchronous readers to proceed normally.
> >
> > Best,
> > Diangang
> >
> > On 3/25/26 6:15 PM, Andreas Dilger wrote:
> >> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:
> >>>
> >>> From: Diangang Li <lidiangang@bytedance.com>
> >>>
> >>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,
> >>> the buffer remains !Uptodate. With concurrent callers, each waiter can
> >>> retry the same failing read after the previous holder drops BH_Lock. This
> >>> amplifies device retry latency and may trigger hung tasks.
> >>>
> >>> In the normal read path the block driver already performs its own retries.
> >>> Once the retries keep failing, re-submitting the same metadata read from
> >>> the filesystem just amplifies the latency by serializing waiters on
> >>> BH_Lock.
> >>>
> >>> Remember read failures on buffer_head and fail fast for ext4 metadata reads
> >>> once a buffer has already failed to read. Clear the flag on successful
> >>> read/write completion so the buffer can recover. ext4 read-ahead uses
> >>> ext4_read_bh_nowait(), so it does not set the failure flag and remains
> >>> best-effort.
> >>
> >> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer
> >> and it prevents other tasks from reading that block again, how would the
> >> buffer ever become Uptodate to clear the flag? There isn't enough state
> >> in a 1-bit flag to have any kind of expiry and later retry.
> >>
> >> Cheers, Andreas
> >
>