* [PATCH v3] btrfs: btrfs_log_dev_io_error() on all bio errors
@ 2026-04-06 16:15 Boris Burkov
From: Boris Burkov @ 2026-04-06 16:15 UTC (permalink / raw)
To: linux-btrfs, kernel-team
As far as I can tell, we never intentionally constrained ourselves to
these status codes, and it is misleading and surprising to lack the
bdev error logging when we get a different error code from the block
layer. This can lead to jumping to a wrong conclusion like "this
system didn't see any bio failures but aborted with EIO".
For example on nvme devices, I observe many failures coming back as
BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
BLK_STS_* status values in nvme_error_status().
So handle the known expected errors and make some noise on the rest
which we expect won't really happen.
Signed-off-by: Boris Burkov <boris@bur.io>
---
Changelog:
v3:
- actually do the change stated in v2...
v2:
- proper bdev err logging for expected block errors
- btrfs_warn_rl for all other errors
---
fs/btrfs/bio.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2a2a21aec817..9c3cd6978880 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -4,6 +4,7 @@
* Copyright (C) 2022 Christoph Hellwig.
*/
+#include "linux/blk_types.h"
#include <linux/bio.h>
#include "bio.h"
#include "ctree.h"
@@ -350,11 +351,18 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio, struct btrfs_device *de
static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *dev)
{
+ blk_status_t sts = bio->bi_status;
+
if (!dev || !dev->bdev)
return;
- if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
+ if (unlikely(sts == BLK_STS_OK))
return;
-
+ if (unlikely(sts != BLK_STS_IOERR && sts != BLK_STS_TARGET &&
+ sts != BLK_STS_MEDIUM && sts != BLK_STS_PROTECTION)) {
+ btrfs_warn_rl(dev->fs_info, "bdev %s unexpected block io error: %d",
+ btrfs_dev_name(dev), sts);
+ return;
+ }
if (btrfs_op(bio) == BTRFS_MAP_WRITE)
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
else if (!(bio->bi_opf & REQ_RAHEAD))
--
2.53.0
* Re: [PATCH v3] btrfs: btrfs_log_dev_io_error() on all bio errors
From: Christoph Hellwig @ 2026-04-07 5:16 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Mon, Apr 06, 2026 at 09:15:15AM -0700, Boris Burkov wrote:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
>
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().
>
> So handle the known expected errors and make some noise on the rest
> which we expect won't really happen.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
although now that you have an unlikely annotation for the non-ok
slow path I'd use a switch statement for clarity in the second
pass.
* Re: [PATCH v3] btrfs: btrfs_log_dev_io_error() on all bio errors
From: Anand Jain @ 2026-04-07 6:47 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team; +Cc: Christoph Hellwig
On 7/4/26 00:15, Boris Burkov wrote:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
>
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().
BLK_STS_PROTECTION is interpreted differently depending on the device.
In NVMe, it indicates invalid Protection Information (PI), while in
HDDs/SSDs it typically points to a malformed CDB (often a software bug).
This is one of several possible error conditions.
Rather than introducing new BLK_STS_* codes, vendors have reused
existing ones for newer device types to maintain backward compatibility.
There isn't much we can do about this for now. I assume the malformed
CDB-related issues have largely been resolved, so this Btrfs fix is
effectively NVMe-specific.
Reviewed-by: Anand Jain <asj@kernel.org>
Thanks
Anand
> So handle the known expected errors and make some noise on the rest
> which we expect won't really happen.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> Changelog:
> v3:
> - actually do the change stated in v2...
> v2:
> - proper bdev err logging for expected block errors
> - btrfs_warn_rl for all other errors
>
> ---
> fs/btrfs/bio.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 2a2a21aec817..9c3cd6978880 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -4,6 +4,7 @@
> * Copyright (C) 2022 Christoph Hellwig.
> */
>
> +#include "linux/blk_types.h"
> #include <linux/bio.h>
> #include "bio.h"
> #include "ctree.h"
> @@ -350,11 +351,18 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio, struct btrfs_device *de
>
> static void btrfs_log_dev_io_error(const struct bio *bio, struct btrfs_device *dev)
> {
> + blk_status_t sts = bio->bi_status;
> +
> if (!dev || !dev->bdev)
> return;
> - if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
> + if (unlikely(sts == BLK_STS_OK))
> return;
> -
> + if (unlikely(sts != BLK_STS_IOERR && sts != BLK_STS_TARGET &&
> + sts != BLK_STS_MEDIUM && sts != BLK_STS_PROTECTION)) {
> + btrfs_warn_rl(dev->fs_info, "bdev %s unexpected block io error: %d",
> + btrfs_dev_name(dev), sts);
> + return;
> + }
> if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
> else if (!(bio->bi_opf & REQ_RAHEAD))
* Re: [PATCH v3] btrfs: btrfs_log_dev_io_error() on all bio errors
From: Christoph Hellwig @ 2026-04-07 6:50 UTC (permalink / raw)
To: Anand Jain; +Cc: Boris Burkov, linux-btrfs, kernel-team, Christoph Hellwig
On Tue, Apr 07, 2026 at 02:47:37PM +0800, Anand Jain wrote:
> BLK_STS_PROTECTION is interpreted differently depending on the device.
> In NVMe, it indicates invalid Protection Information (PI), while in
> HDDs/SSDs it typically points to a malformed CDB (often a software bug).
> This is one of several possible error conditions.
BLK_STS_PROTECTION should never point to a malformed CDB, and my
quick search of the source tree says we don't do that. Where do
you think that happens?
> Rather than introducing new BLK_STS_* codes, vendors have reused
> existing ones for newer device types to maintain backward compatibility.
???
* Re: [PATCH v3] btrfs: btrfs_log_dev_io_error() on all bio errors
From: David Sterba @ 2026-04-07 18:01 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Mon, Apr 06, 2026 at 09:15:15AM -0700, Boris Burkov wrote:
> As far as I can tell, we never intentionally constrained ourselves to
> these status codes, and it is misleading and surprising to lack the
> bdev error logging when we get a different error code from the block
> layer. This can lead to jumping to a wrong conclusion like "this
> system didn't see any bio failures but aborted with EIO".
>
> For example on nvme devices, I observe many failures coming back as
> BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of
> BLK_STS_* status values in nvme_error_status().
>
> So handle the known expected errors and make some noise on the rest
> which we expect won't really happen.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
I'll add it to for-next so it's in the first 7.1 pull request. Thanks.