public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
@ 2026-03-13 19:26 Boris Burkov
  2026-03-13 21:18 ` Qu Wenruo
  2026-03-16 16:20 ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Boris Burkov @ 2026-03-13 19:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: hch

As far as I can tell, we never intentionally constrained ourselves to
these status codes, and it is misleading and surprising to lack the
bdev error logging when we get a different error code from the block
layer. This can lead to jumping to a wrong conclusion like "this
system didn't see any bio failures but aborted with EIO".

For example on nvme devices, I observe many failures coming back as
BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
BLK_STS_* status values in nvme_error_status().

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/bio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2a2a21aec817..08b1d62603d0 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
 {
 	if (!dev || !dev->bdev)
 		return;
-	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
-		return;
+	ASSERT(bio->bi_status);
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-03-13 19:26 [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
@ 2026-03-13 21:18 ` Qu Wenruo
  2026-03-16 16:20 ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2026-03-13 21:18 UTC (permalink / raw)
  To: Boris Burkov, linux-btrfs, kernel-team; +Cc: hch



在 2026/3/14 05:56, Boris Burkov 写道:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
> 
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().
> 
> Signed-off-by: Boris Burkov <boris@bur.io>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu


> ---
>   fs/btrfs/bio.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 2a2a21aec817..08b1d62603d0 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -352,8 +352,7 @@ static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *d
>   {
>   	if (!dev || !dev->bdev)
>   		return;
> -	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
> -		return;
> +	ASSERT(bio->bi_status);
>   
>   	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
>   		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-03-13 19:26 [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
  2026-03-13 21:18 ` Qu Wenruo
@ 2026-03-16 16:20 ` Christoph Hellwig
  2026-03-16 16:29   ` Boris Burkov
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-03-16 16:20 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team, hch, sbehrens

On Fri, Mar 13, 2026 at 12:26:05PM -0700, Boris Burkov wrote:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
> 
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().

This goes all the way back to adding the error counts in commit
442a4f6308e69.  The block error values are a bit of a mess and it's
on my todo list to sort them out - only a small number of them should
even be seen here.  I'd suggest to leave things as-is for now, and
I'll make sure you're included in the discussion when I get around
to attack the block status vs file system issues, which will hopefully
be very soon.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-03-16 16:20 ` Christoph Hellwig
@ 2026-03-16 16:29   ` Boris Burkov
  2026-03-17 13:59     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Boris Burkov @ 2026-03-16 16:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, kernel-team, sbehrens

On Mon, Mar 16, 2026 at 05:20:20PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 13, 2026 at 12:26:05PM -0700, Boris Burkov wrote:
> > As far as I can tell, we never intentionally constrained ourselves to
> > these status codes, and it is misleading and surprising to lack the
> > bdev error logging when we get a different error code from the block
> > layer. This can lead to jumping to a wrong conclusion like "this
> > system didn't see any bio failures but aborted with EIO".
> > 
> > For example on nvme devices, I observe many failures coming back as
> > BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> > BLK_STS_* status values in nvme_error_status().
> 
> This goes all the way back to adding the error counts in commit
> 442a4f6308e69.  The block error values are a bit of a mess and it's
> on my todo list to sort them out - only a small number of them should
> even be seen here.  I'd suggest to leave things as-is for now, and
> I'll make sure you're included in the discussion when I get around
> to attack the block status vs file system issues, which will hopefully
> be very soon.
> 

Is there a reason not to properly log the errors we are definitely
already seeing? I have thousands of instances with swallowed
BLK_STS_MEDIUM errors that really confused me and messed up my data
while investigating a possible regression.

If you think this will confuse/upset users who aren't watching a giant
fleet with distributed logging and make them throw out good drives
prematurely or something, then I can just carry this patch internally.

If you are just worried about having to aim at a moving target, I am
happy to offer to be available to make sure btrfs does what you want
when you do find time to work on this. It's hard for me to judge since I
don't really know what is wrong with the error signalling as-is.

Put another way, if we are in a situation with technical debt, which is
what I think you are saying, then I would rather have more visibility
rather than less.

Thanks,
Boris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-03-16 16:29   ` Boris Burkov
@ 2026-03-17 13:59     ` Christoph Hellwig
  2026-03-17 16:16       ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2026-03-17 13:59 UTC (permalink / raw)
  To: Boris Burkov; +Cc: Christoph Hellwig, linux-btrfs, kernel-team, sbehrens

On Mon, Mar 16, 2026 at 09:29:51AM -0700, Boris Burkov wrote:
> Is there a reason not to properly log the errors we are definitely
> already seeing?

For example if you get a BLK_STS_AGAIN which is not actually an
error.

> Put another way, if we are in a situation with technical debt, which is
> what I think you are saying, then I would rather have more visibility
> rather than less.

Can you give me a week to give this thing a stab?  If not please just
add STS_MEDIUM to the whitelist for now.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors
  2026-03-17 13:59     ` Christoph Hellwig
@ 2026-03-17 16:16       ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2026-03-17 16:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Boris Burkov, linux-btrfs, kernel-team, sbehrens

On Tue, Mar 17, 2026 at 02:59:42PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2026 at 09:29:51AM -0700, Boris Burkov wrote:
> > Is there a reason not to properly log the errors we are definitely
> > already seeing?
> 
> For example if you get a BLK_STS_AGAIN which is not actually an
> error.
> 
> > Put another way, if we are in a situation with technical debt, which is
> > what I think you are saying, then I would rather have more visibility
> > rather than less.
> 
> Can you give me a week to give this thing a stab?  If not please just
> add STS_MEDIUM to the whitelist for now.

Boris has some trouble sending mail, I'm replying on his behalf. The
patch has been removed from for-next and we'll wait for the BLK_STS_*
changes.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-17 16:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-13 19:26 [PATCH] btrfs: btrfs_log_dev_io_error() on all bio errors Boris Burkov
2026-03-13 21:18 ` Qu Wenruo
2026-03-16 16:20 ` Christoph Hellwig
2026-03-16 16:29   ` Boris Burkov
2026-03-17 13:59     ` Christoph Hellwig
2026-03-17 16:16       ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox