[PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors
@ 2026-04-04 15:57 Boris Burkov
  2026-04-06 13:56 ` Anand Jain
  0 siblings, 1 reply; 4+ messages in thread
From: Boris Burkov @ 2026-04-04 15:57 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

As far as I can tell, we never intentionally constrained ourselves to
these status codes, and it is misleading and surprising to lack the
bdev error logging when we get a different error code from the block
layer. This can lead to jumping to a wrong conclusion like "this
system didn't see any bio failures but aborted with EIO".

For example on nvme devices, I observe many failures coming back as
BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
BLK_STS_* status values in nvme_error_status().

So handle the known expected errors and make some noise on the rest
which we expect won't really happen.

Signed-off-by: Boris Burkov <boris@bur.io>
---
Changelog:
v2:
- proper bdev err logging for expected block errors
- btrfs_warn_rl for all other errors
---
 fs/btrfs/bio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2a2a21aec817..08b1d62603d0 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
 {
 	if (!dev || !dev->bdev)
 		return;
-	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
-		return;
+	ASSERT(bio->bi_status);

 	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-04-04 15:57 [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
@ 2026-04-06 13:56 ` Anand Jain
  2026-04-06 16:12   ` Boris Burkov
  0 siblings, 1 reply; 4+ messages in thread
From: Anand Jain @ 2026-04-06 13:56 UTC (permalink / raw)
  To: Boris Burkov, linux-btrfs, kernel-team



Disagree. We should only track block-device-specific errors;
specifically, only the missing MEDIUM ERROR should be added [1].

Including all error types would capture transport and fabric-related
errors, which do not justify a device replacement or permanent error state.

In my experience with storage and transport protocols, transport errors
and timeouts are often transient and far more frequent than actual block
device media failures. Including them would introduce unnecessary noise
into the device statistics.


#define BLK_STS_NOTSUPP         ((__force blk_status_t)1)
#define BLK_STS_TIMEOUT         ((__force blk_status_t)2)
#define BLK_STS_NOSPC           ((__force blk_status_t)3)
#define BLK_STS_TRANSPORT       ((__force blk_status_t)4)
#define BLK_STS_TARGET          ((__force blk_status_t)5)
#define BLK_STS_RESV_CONFLICT   ((__force blk_status_t)6)
#define BLK_STS_MEDIUM          ((__force blk_status_t)7)
#define BLK_STS_PROTECTION      ((__force blk_status_t)8)
#define BLK_STS_RESOURCE        ((__force blk_status_t)9)
#define BLK_STS_IOERR           ((__force blk_status_t)10)
<snap>
#define BLK_STS_OFFLINE         ((__force blk_status_t)16)
#define BLK_STS_DURATION_LIMIT  ((__force blk_status_t)17)
#define BLK_STS_INVAL           ((__force blk_status_t)19)



[1]


diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2a2a21aec817..d7af38e9ce29 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -352,7 +352,8 @@ static void btrfs_log_dev_io_error(const struct bio
*bio, struct btrfs_device *d
 {
        if (!dev || !dev->bdev)
                return;
-       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
BLK_STS_TARGET)
+       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
BLK_STS_TARGET &&
+           bio->bi_status != BLK_STS_MEDIUM)
                return;

        if (btrfs_op(bio) == BTRFS_MAP_WRITE)



Thanks, Anand

On 4/4/26 23:57, Boris Burkov wrote:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
> 
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().
> 
> So handle the known expected errors and make some noise on the rest
> which we expect won't really happen.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> Changelog:
> v2:
> - proper bdev err logging for expected block errors
> - btrfs_warn_rl for all other errors
> ---
>  fs/btrfs/bio.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 2a2a21aec817..08b1d62603d0 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
>  {
>  	if (!dev || !dev->bdev)
>  		return;
> -	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
> -		return;
> +	ASSERT(bio->bi_status);
>  
>  	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
>  		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-04-06 13:56 ` Anand Jain
@ 2026-04-06 16:12   ` Boris Burkov
  2026-04-07  6:36     ` Anand Jain
  0 siblings, 1 reply; 4+ messages in thread
From: Boris Burkov @ 2026-04-06 16:12 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, kernel-team

On Mon, Apr 06, 2026 at 09:56:07PM +0800, Anand Jain wrote:
> 
> 
> Disagree. We should only track block-device-specific errors;
> specifically, only the missing MEDIUM ERROR should be added [1].
> 
> Including all error types would capture transport and fabric-related
> errors, which do not justify a device replacement or permanent error state.
> 
> In my experience with storage and transport protocols, transport errors
> and timeouts are often transient and far more frequent than actual block
> device media failures. Including them would introduce unnecessary noise
> into the device statistics.

This is just my mistake. I messed up with git and resent my v1 instead
of the updated version after Christoph's review here
https://lore.kernel.org/linux-btrfs/20260327102253.GA18582@lst.de/

I intended to only log bdev error on IOERR, TARGET, MEDIUM, and
PROTECTION as he suggested.

Sorry for wasting your time with that, Anand.

Boris

> 
> 
> #define BLK_STS_NOTSUPP         ((__force blk_status_t)1)
> #define BLK_STS_TIMEOUT         ((__force blk_status_t)2)
> #define BLK_STS_NOSPC           ((__force blk_status_t)3)
> #define BLK_STS_TRANSPORT       ((__force blk_status_t)4)
> #define BLK_STS_TARGET          ((__force blk_status_t)5)
> #define BLK_STS_RESV_CONFLICT   ((__force blk_status_t)6)
> #define BLK_STS_MEDIUM          ((__force blk_status_t)7)
> #define BLK_STS_PROTECTION      ((__force blk_status_t)8)
> #define BLK_STS_RESOURCE        ((__force blk_status_t)9)
> #define BLK_STS_IOERR           ((__force blk_status_t)10)
> <snap>
> #define BLK_STS_OFFLINE         ((__force blk_status_t)16)
> #define BLK_STS_DURATION_LIMIT  ((__force blk_status_t)17)
> #define BLK_STS_INVAL           ((__force blk_status_t)19)
> 
> 
> 
> [1]
> 
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 2a2a21aec817..d7af38e9ce29 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -352,7 +352,8 @@ static void btrfs_log_dev_io_error(const struct bio
> *bio, struct btrfs_device *d
>  {
>         if (!dev || !dev->bdev)
>                 return;
> -       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
> BLK_STS_TARGET)
> +       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
> BLK_STS_TARGET &&
> +           bio->bi_status != BLK_STS_MEDIUM)
>                 return;
> 
>         if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> 
> 
> 
> Thanks, Anand
> 
> On 4/4/26 23:57, Boris Burkov wrote:
> > As far as I can tell, we never intentionally constrained ourselves to
> > these status codes, and it is misleading and surprising to lack the
> > bdev error logging when we get a different error code from the block
> > layer. This can lead to jumping to a wrong conclusion like "this
> > system didn't see any bio failures but aborted with EIO".
> > 
> > For example on nvme devices, I observe many failures coming back as
> > BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> > BLK_STS_* status values in nvme_error_status().
> > 
> > So handle the known expected errors and make some noise on the rest
> > which we expect won't really happen.
> > 
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > Changelog:
> > v2:
> > - proper bdev err logging for expected block errors
> > - btrfs_warn_rl for all other errors
> > ---
> >  fs/btrfs/bio.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> > index 2a2a21aec817..08b1d62603d0 100644
> > --- a/fs/btrfs/bio.c
> > +++ b/fs/btrfs/bio.c
> > @@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
> >  {
> >  	if (!dev || !dev->bdev)
> >  		return;
> > -	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
> > -		return;
> > +	ASSERT(bio->bi_status);
> >  
> >  	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> >  		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-04-06 16:12   ` Boris Burkov
@ 2026-04-07  6:36     ` Anand Jain
  0 siblings, 0 replies; 4+ messages in thread
From: Anand Jain @ 2026-04-07  6:36 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On 7/4/26 00:12, Boris Burkov wrote:
> On Mon, Apr 06, 2026 at 09:56:07PM +0800, Anand Jain wrote:
>>
>>
>> Disagree. We should only track block-device-specific errors;
>> specifically, only the missing MEDIUM ERROR should be added [1].
>>
>> Including all error types would capture transport and fabric-related
>> errors, which do not justify a device replacement or permanent error state.
>>
>> In my experience with storage and transport protocols, transport errors
>> and timeouts are often transient and far more frequent than actual block
>> device media failures. Including them would introduce unnecessary noise
>> into the device statistics.
> 
> This is just my mistake. I messed up with git and resent my v1 instead
> of the updated version after Christoph's review here
> https://lore.kernel.org/linux-btrfs/20260327102253.GA18582@lst.de/
> 
> I intended to only log bdev error on IOERR, TARGET, MEDIUM, and
> PROTECTION as he suggested.
> 
> Sorry for wasting your time with that, Anand.

My bad - I should have checked v1. Thanks.


> Boris
> 
>>
>>
>> #define BLK_STS_NOTSUPP         ((__force blk_status_t)1)
>> #define BLK_STS_TIMEOUT         ((__force blk_status_t)2)
>> #define BLK_STS_NOSPC           ((__force blk_status_t)3)
>> #define BLK_STS_TRANSPORT       ((__force blk_status_t)4)
>> #define BLK_STS_TARGET          ((__force blk_status_t)5)
>> #define BLK_STS_RESV_CONFLICT   ((__force blk_status_t)6)
>> #define BLK_STS_MEDIUM          ((__force blk_status_t)7)
>> #define BLK_STS_PROTECTION      ((__force blk_status_t)8)
>> #define BLK_STS_RESOURCE        ((__force blk_status_t)9)
>> #define BLK_STS_IOERR           ((__force blk_status_t)10)
>> <snap>
>> #define BLK_STS_OFFLINE         ((__force blk_status_t)16)
>> #define BLK_STS_DURATION_LIMIT  ((__force blk_status_t)17)
>> #define BLK_STS_INVAL           ((__force blk_status_t)19)
>>
>>
>>
>> [1]
>>
>>
>> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
>> index 2a2a21aec817..d7af38e9ce29 100644
>> --- a/fs/btrfs/bio.c
>> +++ b/fs/btrfs/bio.c
>> @@ -352,7 +352,8 @@ static void btrfs_log_dev_io_error(const struct bio
>> *bio, struct btrfs_device *d
>>  {
>>         if (!dev || !dev->bdev)
>>                 return;
>> -       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
>> BLK_STS_TARGET)
>> +       if (bio->bi_status != BLK_STS_IOERR && bio->bi_status !=
>> BLK_STS_TARGET &&
>> +           bio->bi_status != BLK_STS_MEDIUM)
>>                 return;
>>
>>         if (btrfs_op(bio) == BTRFS_MAP_WRITE)
>>
>>
>>
>> Thanks, Anand
>>
>> On 4/4/26 23:57, Boris Burkov wrote:
>>> As far as I can tell, we never intentionally constrained ourselves to
>>> these status codes, and it is misleading and surprising to lack the
>>> bdev error logging when we get a different error code from the block
>>> layer. This can lead to jumping to a wrong conclusion like "this
>>> system didn't see any bio failures but aborted with EIO".
>>>
>>> For example on nvme devices, I observe many failures coming back as
>>> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
>>> BLK_STS_* status values in nvme_error_status().
>>>
>>> So handle the known expected errors and make some noise on the rest
>>> which we expect won't really happen.
>>>
>>> Signed-off-by: Boris Burkov <boris@bur.io>
>>> ---
>>> Changelog:
>>> v2:
>>> - proper bdev err logging for expected block errors
>>> - btrfs_warn_rl for all other errors
>>> ---
>>>  fs/btrfs/bio.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
>>> index 2a2a21aec817..08b1d62603d0 100644
>>> --- a/fs/btrfs/bio.c
>>> +++ b/fs/btrfs/bio.c
>>> @@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
>>>  {
>>>  	if (!dev || !dev->bdev)
>>>  		return;
>>> -	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
>>> -		return;
>>> +	ASSERT(bio->bi_status);
>>>  
>>>  	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
>>>  		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
>>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-07  6:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-04 15:57 [PATCH v2] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
2026-04-06 13:56 ` Anand Jain
2026-04-06 16:12   ` Boris Burkov
2026-04-07  6:36     ` Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox