public inbox for linux-fsdevel@vger.kernel.org
From: "Theodore Tso" <tytso@mit.edu>
To: Diangang Li <diangangli@gmail.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	changfengnan@bytedance.com, yizhang089@gmail.com,
	willy@infradead.org, Diangang Li <lidiangang@bytedance.com>
Subject: Re: [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO failure
Date: Mon, 13 Apr 2026 08:47:03 -0400
Message-ID: <20260413124703.GA20496@macsyma-wired.lan>
In-Reply-To: <20260413062500.1380307-1-diangangli@gmail.com>

On Mon, Apr 13, 2026 at 02:24:59PM +0800, Diangang Li wrote:
> From: Diangang Li <lidiangang@bytedance.com>
> 
> A production system reported hung tasks blocked for 300s+ in ext4
> buffer_head paths....
> 
>   [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,
>       sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>   [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,
>       sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>   [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,
>       sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

I wonder whether the ext4 layer is the right place to be handling this
sort of issue.  For example, it could be handled by having a subsystem
scanning dmesg (or by wiring up notifications so block device errors
get sent to a userspace daemon), and when certain criteria are met, the
machine is automatically sent to hardware operations to run
diagnostics and (most likely) replace the failing disk.
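To make the userspace alternative concrete, here is a minimal sketch in Python of the dmesg-scanning approach. It is hypothetical: the class name `FailingDiskDetector`, the threshold of three errors and the five-minute window are placeholders, and the regex assumes single-line `dmesg -T` output shaped like the log excerpt quoted above (the exact message format varies between kernel versions). What happens to a flagged device, such as opening a hardware-operations ticket, is left to the caller.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed format: "[Tue Mar 24 14:16:24 2026] blk_update_request: I/O error,
# dev sdi, sector 10704150288 ..." as printed by `dmesg -T` (C/English locale).
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+blk_update_request: (?P<kind>.+?) error, "
    r"dev (?P<dev>\S+), sector (?P<sector>\d+)"
)


class FailingDiskDetector:
    """Hypothetical monitor: flag a device once it logs `threshold`
    block-layer errors within a sliding time `window`."""

    def __init__(self, threshold=3, window=timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # device -> timestamps of recent errors
        self.flagged = set()              # devices already handed to hardware ops

    def feed(self, line):
        """Process one log line; return the device name the first time it
        crosses the threshold, otherwise None."""
        m = LINE_RE.search(line)
        if m is None:
            return None  # not a block-layer error line
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        dev = m["dev"]
        q = self.events[dev]
        q.append(ts)
        # Forget errors that have fallen out of the sliding window.
        while ts - q[0] > self.window:
            q.popleft()
        if len(q) >= self.threshold and dev not in self.flagged:
            self.flagged.add(dev)
            return dev
        return None
```

Feeding it the three errors quoted above flags `sdi` on the third line. Counting errors per device over a window, rather than reacting to a single error, is one way to avoid pulling a machine for a transient glitch.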

It could also be handled in the driver or SCSI layer, so that the "fail
fast" semantics live there and support all file systems, not just
ext4.  The SCSI layer also has more information about the type of
error; you might want to handle things like media errors differently
from Fibre Channel or iSCSI timeouts (which might be something where
"fail fast" is not appropriate).
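The point about error types can be illustrated with a toy policy, again a hypothetical Python sketch rather than anything the SCSI layer actually implements. It assumes the block layer's error message prefixes ("critical medium", "timeout", "recoverable transport") as the classification input; those strings, the `ErrorClass` buckets and the retry counts are all assumptions. Note that the generic "I/O" prefix in the quoted log lands in the unknown bucket, so a policy keyed only on that information cannot tell a bad sector from a flaky path.

```python
from enum import Enum


class ErrorClass(Enum):
    MEDIUM = "medium"        # e.g. an unreadable sector on the disk itself
    TRANSPORT = "transport"  # e.g. an FC/iSCSI path timeout or link flap
    OTHER = "other"          # cause unknown at this layer


# Assumed mapping from the "<kind> error" prefix in block-layer messages to a
# coarse class; the exact strings are an assumption and may vary by kernel.
KIND_TO_CLASS = {
    "critical medium": ErrorClass.MEDIUM,
    "timeout": ErrorClass.TRANSPORT,
    "recoverable transport": ErrorClass.TRANSPORT,
}


def classify(kind: str) -> ErrorClass:
    return KIND_TO_CLASS.get(kind, ErrorClass.OTHER)


def should_fail_fast(kind: str, prior_failures_same_sector: int) -> bool:
    """Hypothetical policy: fail fast on a repeated medium error for the same
    sector, but keep retrying transport errors, which may clear once the
    path recovers."""
    cls = classify(kind)
    if cls is ErrorClass.MEDIUM:
        return prior_failures_same_sector >= 1
    if cls is ErrorClass.TRANSPORT:
        return False
    # Unknown cause (e.g. a generic "I/O" error): allow a few retries first.
    return prior_failures_same_sector >= 3
```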

By the time the error gets propagated up to the buffer head, we lose a
lot of detail about why the error took place.  Also, in the long term
we will hopefully be moving away from using buffer cache.

   		     	    	      	    - Ted


Thread overview: 13+ messages
2026-03-25  9:33 [RFC PATCH 0/1] ext4: fail fast on repeated metadata reads after IO failure Diangang Li
2026-03-25  9:33 ` [RFC 1/1] " Diangang Li
2026-03-25 10:15   ` Andreas Dilger
2026-03-25 11:13     ` Diangang Li
2026-03-25 14:27       ` Zhang Yi
2026-03-26  2:26         ` changfengnan
2026-03-26  7:42         ` Diangang Li
2026-03-26 11:09           ` Zhang Yi
2026-03-25 15:06     ` Matthew Wilcox
2026-03-26 12:09       ` Diangang Li
2026-04-13  6:24 ` [RFC v2 0/1] ext4: fail fast on repeated buffer_head " Diangang Li
2026-04-13  6:25   ` [RFC v2 1/1] " Diangang Li
2026-04-13 12:47   ` Theodore Tso [this message]
