Re: [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO failure

public inbox for linux-ext4@vger.kernel.org

From: "Diangang Li" <lidiangang@bytedance.com>
To: "Theodore Tso" <tytso@mit.edu>, "Diangang Li" <diangangli@gmail.com>
Cc: <adilger.kernel@dilger.ca>, <linux-ext4@vger.kernel.org>,
	 <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	 <changfengnan@bytedance.com>, <yizhang089@gmail.com>,
	 <willy@infradead.org>
Subject: Re: [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO failure
Date: Thu, 16 Apr 2026 11:54:27 +0800
Message-ID: <03453fb3-0f3e-491d-ba12-e4208fe1c185@bytedance.com>
In-Reply-To: <20260413124703.GA20496@macsyma-wired.lan>

On 4/13/26 8:47 PM, Theodore Tso wrote:
> On Mon, Apr 13, 2026 at 02:24:59PM +0800, Diangang Li wrote:
>> From: Diangang Li <lidiangang@bytedance.com>
>>
>> A production system reported hung tasks blocked for 300s+ in ext4
>> buffer_head paths....
>>
>>    [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,
>>        sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>    [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,
>>        sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>    [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,
>>        sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> 
> I wonder whether the ext4 layer is the right place to handle this
> sort of issue.  For example, it could be handled by having a subsystem
> scanning dmesg (or by wiring up notifications so block device errors
> get sent to a userspace daemon), and when certain criteria are met, the
> machine is automatically sent to hardware operations to run
> diagnostics and (most likely) replace the failing disk.
> 
> It could also be handled in the driver or SCSI layer so the "fail
> fast" semantics are handled there, so that it supports all file
> systems, not just ext4.  The SCSI layer also has more information
> about the type of error; you might want to handle things like media
> errors differently from Fibre Channel or iSCSI timeouts (which might
> be something where "fail fast" is not appropriate).
> 
> By the time the error gets propagated up to the buffer head, we lose a
> lot of detail about why the error took place.  Also, in the long term
> we will hopefully be moving away from using the buffer cache.
> 
>                                       - Ted

Hi Ted,

What about moving the fail-fast check into the buffer-head path
(submit_bh_wbc) so that it is not ext4-specific? We could update a
BH_Read_EIO bit in end_bio_bh_io_sync and add a per-bdev/per-partition
sysfs knob for the retry window. That turns it into a generic guard for
buffer-head users, and it naturally goes away as buffer-head usage shrinks.
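
To make the idea concrete, here is a minimal userspace sketch of a
retry window kept on the buffer itself. It is not the RFC patch and does
not use the kernel's struct buffer_head: struct bh_model, its fields, and
both helper functions are hypothetical names. It also assumes that "retry
window" means reads are failed immediately for window_ms after a failed
read and are retried once the window has passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct buffer_head. */
struct bh_model {
    bool     read_eio;      /* stands in for the proposed BH_Read_EIO bit */
    uint64_t last_fail_ms;  /* time at which the last failed read completed */
};

/* Models what a read-completion handler would record on the buffer. */
static void bh_model_end_read(struct bh_model *bh, bool ok, uint64_t now_ms)
{
    if (ok) {
        bh->read_eio = false;       /* a good read clears the error state */
    } else {
        bh->read_eio = true;
        bh->last_fail_ms = now_ms;  /* (re)start the window */
    }
}

/*
 * Models the submit-time check. Returns true when the read should be
 * failed with EIO without issuing IO. window_ms == 0 disables the check
 * (assumed knob semantics). Assumes now_ms >= bh->last_fail_ms.
 */
static bool bh_model_should_fail_fast(const struct bh_model *bh,
                                      uint64_t now_ms, uint64_t window_ms)
{
    if (window_ms == 0 || !bh->read_eio)
        return false;
    return now_ms - bh->last_fail_ms < window_ms;
}
```

With a 5000 ms window, a read that failed at t=1000 would be failed fast
at t=2000 and retried at t=6000. All state lives in the buffer, so no
lookup structure is needed.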

We did think about doing this in the block layer (submit_bio) or in 
SCSI/NVMe, but a generic solution there seems to need a per-device table 
to cache the error LBAs. With buffer-head, we can keep the error state 
on the bh itself.
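
For comparison, a rough userspace sketch of what a per-device table of
error LBAs might look like. Everything here is hypothetical
(ERR_TABLE_SLOTS, struct dev_err_table, the direct-mapped layout); it only
illustrates the extra state and the eviction question that such a table
brings with it.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_TABLE_SLOTS 64  /* hypothetical fixed capacity per device */

struct err_slot {
    bool     used;
    uint64_t lba;      /* sector that failed */
    uint64_t fail_ms;  /* when the failure was recorded */
};

/* One table per device; must start out zero-initialized. */
struct dev_err_table {
    struct err_slot slots[ERR_TABLE_SLOTS];
};

static unsigned err_slot_index(uint64_t lba)
{
    return (unsigned)(lba % ERR_TABLE_SLOTS);
}

/* Record a failed read; a colliding older entry is simply overwritten. */
static void err_table_record(struct dev_err_table *t, uint64_t lba,
                             uint64_t now_ms)
{
    struct err_slot *s = &t->slots[err_slot_index(lba)];

    s->used = true;
    s->lba = lba;
    s->fail_ms = now_ms;
}

/* True if this LBA failed within the last window_ms milliseconds. */
static bool err_table_hit(const struct dev_err_table *t, uint64_t lba,
                          uint64_t now_ms, uint64_t window_ms)
{
    const struct err_slot *s = &t->slots[err_slot_index(lba)];

    return s->used && s->lba == lba && now_ms - s->fail_ms < window_ms;
}
```

Even this toy version has to pick a capacity and an eviction policy, and a
real one would also need locking and a lookup on every submitted read.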

I also checked f2fs (no buffer-head). It tracks repeated EIOs on
metadata/node pages to avoid infinite retry loops. How do you see that
compared with a buffer-head retry window? Is either of these directions
worth exploring further?
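
As a point of comparison, a count-based guard in the spirit of the f2fs
behaviour described above could be sketched as follows. This is not f2fs
code: struct eio_tracker, the function name, and the max_retries parameter
are hypothetical, and the sketch assumes that only consecutive failures on
the same offset are counted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker; must start out zero-initialized. */
struct eio_tracker {
    bool     valid;     /* false until the first failure is seen */
    uint64_t last_ofs;  /* offset of the most recent failed read */
    unsigned count;     /* consecutive failures on last_ofs */
};

/*
 * Note a failed read at ofs. Returns true once the same offset has
 * failed max_retries times in a row, i.e. the caller should stop
 * retrying. A failure on a different offset restarts the count.
 */
static bool eio_tracker_note_failure(struct eio_tracker *t, uint64_t ofs,
                                     unsigned max_retries)
{
    if (t->valid && t->last_ofs == ofs) {
        t->count++;
    } else {
        t->valid = true;
        t->last_ofs = ofs;
        t->count = 1;
    }
    return t->count >= max_retries;
}
```

The difference from a retry window is that this bounds the number of
attempts rather than the time during which attempts are suppressed.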

Thanks,
Diangang


Thread overview: 14+ messages
2026-03-25  9:33 [RFC PATCH 0/1] ext4: fail fast on repeated metadata reads after IO failure Diangang Li
2026-03-25  9:33 ` [RFC 1/1] " Diangang Li
2026-03-25 10:15   ` Andreas Dilger
2026-03-25 11:13     ` Diangang Li
2026-03-25 14:27       ` Zhang Yi
2026-03-26  2:26         ` changfengnan
2026-03-26  7:42         ` Diangang Li
2026-03-26 11:09           ` Zhang Yi
2026-03-25 15:06     ` Matthew Wilcox
2026-03-26 12:09       ` Diangang Li
2026-04-13  6:24 ` [RFC v2 0/1] ext4: fail fast on repeated buffer_head " Diangang Li
2026-04-13  6:25   ` [RFC v2 1/1] " Diangang Li
2026-04-13 12:47   ` [RFC v2 0/1] " Theodore Tso
2026-04-16  3:54     ` Diangang Li [this message]
