Re: [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Boris Burkov <boris@bur.io>
To: Anand Jain <anajain.sg@gmail.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors
Date: Mon, 6 Apr 2026 09:12:55 -0700	[thread overview]
Message-ID: <20260406161255.GA1743674@zen.localdomain> (raw)
In-Reply-To: <5790719b-70a6-4d75-813d-5842e77c7b89@gmail.com>

On Mon, Apr 06, 2026 at 09:56:07PM +0800, Anand Jain wrote:
> 
> 
> Disagree. We should only track block-device-specific errors;
> specifically, only the missing MEDIUM ERROR should be added [1].
> 
> Including all error types would capture transport and fabric-related
> errors, which do not justify a device replacement or permanent error state.
> 
> In my experience with storage and transport protocols, transport errors
> and timeouts are often transient and far more frequent than actual block
> device media failures. Including them would introduce unnecessary noise
> into the device statistics.

This is just my mistake. I messed up with git and resent my v1 instead
of the updated version after Christoph's review here
https://lore.kernel.org/linux-btrfs/20260327102253.GA18582@lst.de/

I intended to only log bdev error on IOERR, TARGET, MEDIUM, and
PROTECTION as he suggested.

Sorry for wasting your time with that, Anand.

Boris

> 
> 
> #define BLK_STS_NOTSUPP         ((__force blk_status_t)1)
> #define BLK_STS_TIMEOUT         ((__force blk_status_t)2)
> #define BLK_STS_NOSPC           ((__force blk_status_t)3)
> #define BLK_STS_TRANSPORT       ((__force blk_status_t)4)
> #define BLK_STS_TARGET          ((__force blk_status_t)5)
> #define BLK_STS_RESV_CONFLICT   ((__force blk_status_t)6)
> #define BLK_STS_MEDIUM          ((__force blk_status_t)7)
> #define BLK_STS_PROTECTION      ((__force blk_status_t)8)
> #define BLK_STS_RESOURCE        ((__force blk_status_t)9)
> #define BLK_STS_IOERR           ((__force blk_status_t)10)
> <snap>
> #define BLK_STS_OFFLINE         ((__force blk_status_t)16)
> #define BLK_STS_DURATION_LIMIT  ((__force blk_status_t)17)
> #define BLK_STS_INVAL           ((__force blk_status_t)19)
> 
> 
> 
> [1]
> 
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 2a2a21aec817..d7af38e9ce29 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -352,7 +352,8 @@ static void btrfs_log_dev_io_error(const struct bio
> *bio, struct btrfs_device *d
>  {
>         if (!dev || !dev->bdev)
>                 return;
> -       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
> BLK_STS_TARGET)
> +       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
> BLK_STS_TARGET &&
> +           bio->bi_status != BLK_STS_MEDIUM)
>                 return;
> 
>         if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> 
> 
> 
> Thanks, Anand
> 
> On 4/4/26 23:57, Boris Burkov wrote:
> > As far as I can tell, we never intentionally constrained ourselves to
> > these status codes, and it is misleading and surprising to lack the
> > bdev error logging when we get a different error code from the block
> > layer. This can lead to jumping to a wrong conclusion like "this
> > system didn't see any bio failures but aborted with EIO".
> > 
> > For example on nvme devices, I observe many failures coming back as
> > BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> > BLK_STS_* status values in nvme_error_status().
> > 
> > So handle the known expected errors and make some noise on the rest
> > which we expect won't really happen.
> > 
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > Changelog:
> > v2:
> > - proper bdev err logging for expected block errors
> > - btrfs_warn_rl for all other errors
> > ---
> >  fs/btrfs/bio.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> > index 2a2a21aec817..08b1d62603d0 100644
> > --- a/fs/btrfs/bio.c
> > +++ b/fs/btrfs/bio.c
> > @@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
> >  {
> >  	if (!dev || !dev->bdev)
> >  		return;
> > -	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
> > -		return;
> > +	ASSERT(bio->bi_status);
> >  
> >  	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> >  		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
>

next prev parent reply	other threads:[~2026-04-06 16:13 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 15:57 [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
2026-04-06 13:56 ` Anand Jain
2026-04-06 16:12   ` Boris Burkov [this message]
2026-04-07  6:36     ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260406161255.GA1743674@zen.localdomain \
    --to=boris@bur.io \
    --cc=anajain.sg@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox