* [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
@ 2025-08-21 21:36 Eric Sandeen
2025-08-22 15:21 ` Darrick J. Wong
0 siblings, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2025-08-21 21:36 UTC (permalink / raw)
To: linux-xfs@vger.kernel.org
Cc: Donald Douwsma, Dave Chinner, Darrick J. Wong, Christoph Hellwig,
stable
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
namely, that the requested attribute name could not be found.
However, a medium error from disk may also return ENODATA. At best,
this medium error may escape to userspace as "attribute not found"
when in fact it's an IO (disk) error.
At worst, we may oops in xfs_attr_leaf_get() when we do:
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
xfs_trans_brelse(args->trans, bp);
return error;
}
because an ENODATA/ENOATTR error from disk leaves us with a null bp,
and the xfs_trans_brelse will then null-deref it.
As discussed on the list, we really need to modify the lower level
IO functions to trap all disk errors and ensure that we don't let
unique errors like this leak up into higher xfs functions - many
like this should be remapped to EIO.
However, this patch directly addresses a reported bug in the xattr
code, and should be safe to backport to stable kernels. A larger-scope
patch to handle more unique errors at lower levels can follow later.
(Note, prior to 07120f1abdff we did not oops, but we did return the
wrong error code to userspace.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Cc: <stable@vger.kernel.org> # v5.9+
---
(I get it that sprinkling this around to 3 places might have an ick
factor but I think it's necessary to make a surgical strike on this bug
before we address the general problem.)
Thanks,
-Eric
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index fddb55605e0c..9b54cfb0e13d 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -478,6 +478,12 @@ xfs_attr3_leaf_read(
err = xfs_da_read_buf(tp, dp, bno, 0, bpp, XFS_ATTR_FORK,
&xfs_attr3_leaf_buf_ops);
+ /*
+ * ENODATA from disk implies a disk medium failure; ENODATA for
+ * xattrs means attribute not found, so disambiguate that here.
+ */
+ if (err == -ENODATA)
+ err = -EIO;
if (err || !(*bpp))
return err;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 4c44ce1c8a64..bff3dc226f81 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -435,6 +435,13 @@ xfs_attr_rmtval_get(
0, &bp, &xfs_attr3_rmt_buf_ops);
if (xfs_metadata_is_sick(error))
xfs_dirattr_mark_sick(args->dp, XFS_ATTR_FORK);
+ /*
+ * ENODATA from disk implies a disk medium failure;
+ * ENODATA for xattrs means attribute not found, so
+ * disambiguate that here.
+ */
+ if (error == -ENODATA)
+ error = -EIO;
if (error)
return error;
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 17d9e6154f19..723a0643b838 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2833,6 +2833,12 @@ xfs_da_read_buf(
&bp, ops);
if (xfs_metadata_is_sick(error))
xfs_dirattr_mark_sick(dp, whichfork);
+ /*
+ * ENODATA from disk implies a disk medium failure; ENODATA for
+ * xattrs means attribute not found, so disambiguate that here.
+ */
+ if (error == -ENODATA && whichfork == XFS_ATTR_FORK)
+ error = -EIO;
if (error)
goto out_free;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-21 21:36 [PATCH] xfs: do not propagate ENODATA disk errors into xattr code Eric Sandeen
@ 2025-08-22 15:21 ` Darrick J. Wong
2025-08-25 7:55 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-08-22 15:21 UTC (permalink / raw)
To: Eric Sandeen
Cc: linux-xfs@vger.kernel.org, Donald Douwsma, Dave Chinner,
Christoph Hellwig, stable
On Thu, Aug 21, 2025 at 04:36:10PM -0500, Eric Sandeen wrote:
> ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
> namely, that the requested attribute name could not be found.
>
> However, a medium error from disk may also return ENODATA. At best,
> this medium error may escape to userspace as "attribute not found"
> when in fact it's an IO (disk) error.
>
> At worst, we may oops in xfs_attr_leaf_get() when we do:
>
> error = xfs_attr_leaf_hasname(args, &bp);
> if (error == -ENOATTR) {
> xfs_trans_brelse(args->trans, bp);
> return error;
> }
>
> because an ENODATA/ENOATTR error from disk leaves us with a null bp,
> and the xfs_trans_brelse will then null-deref it.
>
> As discussed on the list, we really need to modify the lower level
> IO functions to trap all disk errors and ensure that we don't let
> unique errors like this leak up into higher xfs functions - many
> like this should be remapped to EIO.
>
> However, this patch directly addresses a reported bug in the xattr
> code, and should be safe to backport to stable kernels. A larger-scope
> patch to handle more unique errors at lower levels can follow later.
>
> (Note, prior to 07120f1abdff we did not oops, but we did return the
> wrong error code to userspace.)
>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
> Cc: <stable@vger.kernel.org> # v5.9+
> ---
>
> (I get it that sprinkling this around to 3 places might have an ick
> factor but I think it's necessary to make a surgical strike on this bug
> before we address the general problem.)
>
> Thanks,
> -Eric
>
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index fddb55605e0c..9b54cfb0e13d 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -478,6 +478,12 @@ xfs_attr3_leaf_read(
>
> err = xfs_da_read_buf(tp, dp, bno, 0, bpp, XFS_ATTR_FORK,
> &xfs_attr3_leaf_buf_ops);
> + /*
> + * ENODATA from disk implies a disk medium failure; ENODATA for
> + * xattrs means attribute not found, so disambiguate that here.
> + */
> + if (err == -ENODATA)
> + err = -EIO;
I think this first chunk isn't needed since you also changed
xfs_da_read_buf to filter the error code, right?
(Shifting towards the giant reconsideration of ENODATA -> EIO filtering)
Do we now also need to adjust the online fsck code to turn ENODATA into
a corruption report? From __xchk_process_error:
case -EFSBADCRC:
case -EFSCORRUPTED:
/* Note the badness but don't abort. */
sc->sm->sm_flags |= errflag;
*error = 0;
fallthrough;
default:
trace_xchk_op_error(sc, agno, bno, *error, ret_ip);
break;
}
We only flag corruptions for these two error codes, but ENODATA from the
block layer means "critical medium error". I take that to mean the
media has permanently lost whatever was persisted there, right?
--D
> if (err || !(*bpp))
> return err;
>
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 4c44ce1c8a64..bff3dc226f81 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -435,6 +435,13 @@ xfs_attr_rmtval_get(
> 0, &bp, &xfs_attr3_rmt_buf_ops);
> if (xfs_metadata_is_sick(error))
> xfs_dirattr_mark_sick(args->dp, XFS_ATTR_FORK);
> + /*
> + * ENODATA from disk implies a disk medium failure;
> + * ENODATA for xattrs means attribute not found, so
> + * disambiguate that here.
> + */
> + if (error == -ENODATA)
> + error = -EIO;
> if (error)
> return error;
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 17d9e6154f19..723a0643b838 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2833,6 +2833,12 @@ xfs_da_read_buf(
> &bp, ops);
> if (xfs_metadata_is_sick(error))
> xfs_dirattr_mark_sick(dp, whichfork);
> + /*
> + * ENODATA from disk implies a disk medium failure; ENODATA for
> + * xattrs means attribute not found, so disambiguate that here.
> + */
> + if (error == -ENODATA && whichfork == XFS_ATTR_FORK)
> + error = -EIO;
> if (error)
> goto out_free;
>
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-22 15:21 ` Darrick J. Wong
@ 2025-08-25 7:55 ` Christoph Hellwig
2025-08-25 15:31 ` Eric Sandeen
2025-08-25 15:34 ` Darrick J. Wong
0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2025-08-25 7:55 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma,
Dave Chinner, Christoph Hellwig, stable
On Fri, Aug 22, 2025 at 08:21:37AM -0700, Darrick J. Wong wrote:
> We only flag corruptions for these two error codes, but ENODATA from the
> block layer means "critical medium error". I take that to mean the
> media has permanently lost whatever was persisted there, right?
It can also be a write error. But yes, it's what EIO indidcates in
general. Which is why I really think we should be doing something like
the patch below. But as I don't have the time to fully shephed this
I'm not trying to block this hack, even if I think the issue will
continue to byte us in the future.
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f9ef3b2a332a..0252faf038aa 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1290,6 +1290,22 @@ xfs_bwrite(
return error;
}
+static int
+xfs_buf_bio_status(
+ struct bio *bio)
+{
+ switch (bio->bi_status) {
+ case BLK_STS_OK:
+ return 0;
+ case BLK_STS_NOSPC:
+ return -ENOSPC;
+ case BLK_STS_OFFLINE:
+ return -ENODEV;
+ default:
+ return -EIO;
+ }
+}
+
static void
xfs_buf_bio_end_io(
struct bio *bio)
@@ -1297,7 +1313,7 @@ xfs_buf_bio_end_io(
struct xfs_buf *bp = bio->bi_private;
if (bio->bi_status)
- xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
+ xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
xfs_buf_ioerror(bp, -EIO);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-25 7:55 ` Christoph Hellwig
@ 2025-08-25 15:31 ` Eric Sandeen
2025-08-25 15:34 ` Darrick J. Wong
1 sibling, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-08-25 15:31 UTC (permalink / raw)
To: Christoph Hellwig, Darrick J. Wong
Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma,
Dave Chinner, stable
On 8/25/25 2:55 AM, Christoph Hellwig wrote:
> On Fri, Aug 22, 2025 at 08:21:37AM -0700, Darrick J. Wong wrote:
>> We only flag corruptions for these two error codes, but ENODATA from the
>> block layer means "critical medium error". I take that to mean the
>> media has permanently lost whatever was persisted there, right?
>
> It can also be a write error. But yes, it's what EIO indidcates in
> general. Which is why I really think we should be doing something like
> the patch below. But as I don't have the time to fully shephed this
> I'm not trying to block this hack, even if I think the issue will
> continue to byte us in the future.
Right, as I said we need to do something like what you suggest, which
is 100% fine by me.
I'd just like a safe, very targeted fix merged first, to handle the
currently observed and reported error. A bigger scope change pushed
back to stable kernels has somewhat more risk, and I'd like to avoid
that.
I'll work on something like this, and send another patch that's
something like "xfs: generalize block device error handling" or
whatever, that effectively reverts the patch proposed by me, and
adds this patch proposed by you.
If it soaks a little and has no regressions, I'd be happy to send the
broader fix back to stable as well.
Thanks,
-Eric
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f9ef3b2a332a..0252faf038aa 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1290,6 +1290,22 @@ xfs_bwrite(
> return error;
> }
>
> +static int
> +xfs_buf_bio_status(
> + struct bio *bio)
> +{
> + switch (bio->bi_status) {
> + case BLK_STS_OK:
> + return 0;
> + case BLK_STS_NOSPC:
> + return -ENOSPC;
> + case BLK_STS_OFFLINE:
> + return -ENODEV;
> + default:
> + return -EIO;
> + }
> +}
> +
> static void
> xfs_buf_bio_end_io(
> struct bio *bio)
> @@ -1297,7 +1313,7 @@ xfs_buf_bio_end_io(
> struct xfs_buf *bp = bio->bi_private;
>
> if (bio->bi_status)
> - xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> + xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
> else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> xfs_buf_ioerror(bp, -EIO);
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-25 7:55 ` Christoph Hellwig
2025-08-25 15:31 ` Eric Sandeen
@ 2025-08-25 15:34 ` Darrick J. Wong
2025-08-27 7:34 ` Christoph Hellwig
1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-08-25 15:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma,
Dave Chinner, stable
On Mon, Aug 25, 2025 at 12:55:06AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 22, 2025 at 08:21:37AM -0700, Darrick J. Wong wrote:
> > We only flag corruptions for these two error codes, but ENODATA from the
> > block layer means "critical medium error". I take that to mean the
> > media has permanently lost whatever was persisted there, right?
>
> It can also be a write error. But yes, it's what EIO indidcates in
> general. Which is why I really think we should be doing something like
> the patch below. But as I don't have the time to fully shephed this
> I'm not trying to block this hack, even if I think the issue will
> continue to byte us in the future.
Yes, it's a bit of a problem, whose issues we'll need to nibble on all
over the place to fix all the weird issues before issuing the customary
patchbomb for gluttinous consumption on fsdevel. <cough>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f9ef3b2a332a..0252faf038aa 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1290,6 +1290,22 @@ xfs_bwrite(
> return error;
> }
>
> +static int
> +xfs_buf_bio_status(
> + struct bio *bio)
> +{
> + switch (bio->bi_status) {
> + case BLK_STS_OK:
> + return 0;
> + case BLK_STS_NOSPC:
> + return -ENOSPC;
> + case BLK_STS_OFFLINE:
> + return -ENODEV;
> + default:
> + return -EIO;
Well as I pointed out earlier, one interesting "quality" of the current
behavior is that online fsck captures the ENODATA and turns that into a
metadata corruption report. I'd like to keep that behavior.
> + }
> +}
> +
> static void
> xfs_buf_bio_end_io(
> struct bio *bio)
> @@ -1297,7 +1313,7 @@ xfs_buf_bio_end_io(
> struct xfs_buf *bp = bio->bi_private;
>
> if (bio->bi_status)
> - xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> + xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
I think you'd also want to wrap all the submit_bio_wait here too, right?
Hrm, only discard bios, log writes, and zonegc use that function. Maybe
not? I think a failed log write takes down the system no matter what
error code, nobody cares about failing discard, and I think zonegc write
failures just lead to the gc ... aborting?
--D
> else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> xfs_buf_ioerror(bp, -EIO);
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-25 15:34 ` Darrick J. Wong
@ 2025-08-27 7:34 ` Christoph Hellwig
2025-08-27 15:56 ` Darrick J. Wong
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2025-08-27 7:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
Donald Douwsma, Dave Chinner, stable
On Mon, Aug 25, 2025 at 08:34:14AM -0700, Darrick J. Wong wrote:
> > + case BLK_STS_NOSPC:
> > + return -ENOSPC;
> > + case BLK_STS_OFFLINE:
> > + return -ENODEV;
> > + default:
> > + return -EIO;
>
> Well as I pointed out earlier, one interesting "quality" of the current
> behavior is that online fsck captures the ENODATA and turns that into a
> metadata corruption report. I'd like to keep that behavior.
-EIO is just as much of a metadata corruption, so if you only catch
ENODATA you're missing most of them.
> > if (bio->bi_status)
> > - xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> > + xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
>
> I think you'd also want to wrap all the submit_bio_wait here too, right?
>
> Hrm, only discard bios, log writes, and zonegc use that function. Maybe
> not? I think a failed log write takes down the system no matter what
> error code, nobody cares about failing discard, and I think zonegc write
> failures just lead to the gc ... aborting?
Yes. In Linux -EIO means an unrecoverable I/O error that the lower
layers gave up retrying. Not much we can do about that.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xfs: do not propagate ENODATA disk errors into xattr code
2025-08-27 7:34 ` Christoph Hellwig
@ 2025-08-27 15:56 ` Darrick J. Wong
0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-08-27 15:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma,
Dave Chinner, stable
On Wed, Aug 27, 2025 at 12:34:44AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 25, 2025 at 08:34:14AM -0700, Darrick J. Wong wrote:
> > > + case BLK_STS_NOSPC:
> > > + return -ENOSPC;
> > > + case BLK_STS_OFFLINE:
> > > + return -ENODEV;
> > > + default:
> > > + return -EIO;
> >
> > Well as I pointed out earlier, one interesting "quality" of the current
> > behavior is that online fsck captures the ENODATA and turns that into a
> > metadata corruption report. I'd like to keep that behavior.
>
> -EIO is just as much of a metadata corruption, so if you only catch
> ENODATA you're missing most of them.
Hrmm, well an EIO (or an ENODATA) coming from the block layer causes the
scrub code to return to userspace with EIO, and xfs_scrub will complain
about the IO error and exit.
It doesn't explicitly mark the data structure as corrupt, but scrub
failing should be enough to conclude that the fs is corrupt.
I could patch the kernel to set the CORRUPT flag on the data structure
and keep going, since the likelihood of random bit errors causing media
errors is pretty high now that we have disks that store more than 1e15
bits.
> > > if (bio->bi_status)
> > > - xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
> > > + xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
> >
> > I think you'd also want to wrap all the submit_bio_wait here too, right?
> >
> > Hrm, only discard bios, log writes, and zonegc use that function. Maybe
> > not? I think a failed log write takes down the system no matter what
> > error code, nobody cares about failing discard, and I think zonegc write
> > failures just lead to the gc ... aborting?
>
> Yes. In Linux -EIO means an unrecoverable I/O error that the lower
> layers gave up retrying. Not much we can do about that.
<nod>
--D
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-08-27 15:56 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-21 21:36 [PATCH] xfs: do not propagate ENODATA disk errors into xattr code Eric Sandeen
2025-08-22 15:21 ` Darrick J. Wong
2025-08-25 7:55 ` Christoph Hellwig
2025-08-25 15:31 ` Eric Sandeen
2025-08-25 15:34 ` Darrick J. Wong
2025-08-27 7:34 ` Christoph Hellwig
2025-08-27 15:56 ` Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).