[PATCH RFC] xfs: remap block layer ENODATA read errors to EIO

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
@ 2025-08-18 20:22 Eric Sandeen
  2025-08-18 20:45 ` Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Eric Sandeen @ 2025-08-18 20:22 UTC (permalink / raw)
  To: linux-xfs@vger.kernel.org; +Cc: Donald Douwsma

We had a report that a failing scsi disk was oopsing XFS when an xattr
read encountered a media error. This is because the media error returned
-ENODATA, which we map in xattr code to -ENOATTR and treat specially.

In this particular case, it looked like:

xfs_attr_leaf_get()
	error = xfs_attr_leaf_hasname(args, &bp);
	// here bp is NULL, error == -ENODATA from disk failure
	// but we define ENOATTR as ENODATA, so ...
	if (error == -ENOATTR)  {
		// whoops, surprise! bp is NULL, OOPS here
		xfs_trans_brelse(args->trans, bp);
		return error;
	} ...

To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
mean in this function?" throughout the xattr code, my first thought is
that we should simply map -ENODATA in lower level read functions back to
-EIO, which is unambiguous, even if we lose the nuance of the underlying
error code. (The block device probably already squawked.) Thoughts?

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f9ef3b2a332a..6ba57ccaa25f 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -747,6 +747,9 @@ xfs_buf_read_map(
 		/* bad CRC means corrupted metadata */
 		if (error == -EFSBADCRC)
 			error = -EFSCORRUPTED;
+		/* ENODATA == ENOATTR which confuses xattr layers */
+		if (error == -ENODATA)
+			error = -EIO;
 		return error;
 	}
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 20:22 [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Eric Sandeen
@ 2025-08-18 20:45 ` Darrick J. Wong
  2025-08-18 21:11   ` Eric Sandeen
  2025-08-21  9:16   ` Donald Douwsma
  2025-08-18 22:09 ` Dave Chinner
  2025-08-19  8:08 ` Christoph Hellwig
  2 siblings, 2 replies; 21+ messages in thread
From: Darrick J. Wong @ 2025-08-18 20:45 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> We had a report that a failing scsi disk was oopsing XFS when an xattr
> read encountered a media error. This is because the media error returned
> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
> 
> In this particular case, it looked like:
> 
> xfs_attr_leaf_get()
> 	error = xfs_attr_leaf_hasname(args, &bp);
> 	// here bp is NULL, error == -ENODATA from disk failure
> 	// but we define ENOATTR as ENODATA, so ...
> 	if (error == -ENOATTR)  {
> 		// whoops, surprise! bp is NULL, OOPS here
> 		xfs_trans_brelse(args->trans, bp);
> 		return error;
> 	} ...
> 
> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
> mean in this function?" throughout the xattr code, my first thought is
> that we should simply map -ENODATA in lower level read functions back to
> -EIO, which is unambiguous, even if we lose the nuance of the underlying
> error code. (The block device probably already squawked.) Thoughts?

Uhhhh where does this ENODATA come from?  Is it the block layer?

$ git grep -w ENODATA block/
block/blk-core.c:146:   [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },

--D

> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f9ef3b2a332a..6ba57ccaa25f 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>  		/* bad CRC means corrupted metadata */
>  		if (error == -EFSBADCRC)
>  			error = -EFSCORRUPTED;
> +		/* ENODATA == ENOATTR which confuses xattr layers */
> +		if (error == -ENODATA)
> +			error = -EIO;
>  		return error;
>  	}
>  
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 20:45 ` Darrick J. Wong
@ 2025-08-18 21:11   ` Eric Sandeen
  2025-08-18 23:04     ` Darrick J. Wong
  2025-08-21  9:16   ` Donald Douwsma
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Sandeen @ 2025-08-18 21:11 UTC (permalink / raw)
  To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On 8/18/25 3:45 PM, Darrick J. Wong wrote:
> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
>> We had a report that a failing scsi disk was oopsing XFS when an xattr
>> read encountered a media error. This is because the media error returned
>> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
>>
>> In this particular case, it looked like:
>>
>> xfs_attr_leaf_get()
>> 	error = xfs_attr_leaf_hasname(args, &bp);
>> 	// here bp is NULL, error == -ENODATA from disk failure
>> 	// but we define ENOATTR as ENODATA, so ...
>> 	if (error == -ENOATTR)  {
>> 		// whoops, surprise! bp is NULL, OOPS here
>> 		xfs_trans_brelse(args->trans, bp);
>> 		return error;
>> 	} ...
>>
>> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
>> mean in this function?" throughout the xattr code, my first thought is
>> that we should simply map -ENODATA in lower level read functions back to
>> -EIO, which is unambiguous, even if we lose the nuance of the underlying
>> error code. (The block device probably already squawked.) Thoughts?
> 
> Uhhhh where does this ENODATA come from?  Is it the block layer?
> 
> $ git grep -w ENODATA block/
> block/blk-core.c:146:   [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },

That, probably, though I don't speak block layer very well. As mentioned, it was a
SCSI disk error, and it appeared in XFS as -ENODATA:

sd 0:0:23:0: [sdad] tag#937 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=2s
sd 0:0:23:0: [sdad] tag#937 Sense Key : Medium Error [current] 
sd 0:0:23:0: [sdad] tag#937 Add. Sense: Read retries exhausted
sd 0:0:23:0: [sdad] tag#937 CDB: Read(16) 88 00 00 00 00 00 9b df 5e 78 00 00 00 08 00 00
critical medium error, dev sdad, sector 2615107192 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2
XFS (sdad1): metadata I/O error in "xfs_da_read_buf+0xe1/0x140 [xfs]" at daddr 0x9bdf5678 len 8 error 61 
(see error 61, ENODATA)

> --D
> 
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index f9ef3b2a332a..6ba57ccaa25f 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>>  		/* bad CRC means corrupted metadata */
>>  		if (error == -EFSBADCRC)
>>  			error = -EFSCORRUPTED;
>> +		/* ENODATA == ENOATTR which confuses xattr layers */
>> +		if (error == -ENODATA)
>> +			error = -EIO;
>>  		return error;
>>  	}
>>  
>>
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 20:22 [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Eric Sandeen
  2025-08-18 20:45 ` Darrick J. Wong
@ 2025-08-18 22:09 ` Dave Chinner
  2025-08-19  2:27   ` Eric Sandeen
  2025-08-19  8:08 ` Christoph Hellwig
  2 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2025-08-18 22:09 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> We had a report that a failing scsi disk was oopsing XFS when an xattr
> read encountered a media error. This is because the media error returned
> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
> 
> In this particular case, it looked like:
> 
> xfs_attr_leaf_get()
> 	error = xfs_attr_leaf_hasname(args, &bp);
> 	// here bp is NULL, error == -ENODATA from disk failure
> 	// but we define ENOATTR as ENODATA, so ...
> 	if (error == -ENOATTR)  {
> 		// whoops, surprise! bp is NULL, OOPS here
> 		xfs_trans_brelse(args->trans, bp);
> 		return error;
> 	} ...
> 
> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
> mean in this function?" throughout the xattr code, my first thought is
> that we should simply map -ENODATA in lower level read functions back to
> -EIO, which is unambiguous, even if we lose the nuance of the underlying
> error code. (The block device probably already squawked.) Thoughts?
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f9ef3b2a332a..6ba57ccaa25f 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>  		/* bad CRC means corrupted metadata */
>  		if (error == -EFSBADCRC)
>  			error = -EFSCORRUPTED;
> +		/* ENODATA == ENOATTR which confuses xattr layers */
> +		if (error == -ENODATA)
> +			error = -EIO;

Not sure this is the right place to map this. It is only relevant to
the XFS xattr layer in the kernel, so mapping it for everything
seems like overkill.

I suspect that this error mapping should really be in
xfs_attr3_leaf_read() and xfs_da_read_buf() so it is done for
xattr fork metadata read IO only...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 21:11   ` Eric Sandeen
@ 2025-08-18 23:04     ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2025-08-18 23:04 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma

On Mon, Aug 18, 2025 at 04:11:41PM -0500, Eric Sandeen wrote:
> On 8/18/25 3:45 PM, Darrick J. Wong wrote:
> > On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> >> We had a report that a failing scsi disk was oopsing XFS when an xattr
> >> read encountered a media error. This is because the media error returned
> >> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
> >>
> >> In this particular case, it looked like:
> >>
> >> xfs_attr_leaf_get()
> >> 	error = xfs_attr_leaf_hasname(args, &bp);
> >> 	// here bp is NULL, error == -ENODATA from disk failure
> >> 	// but we define ENOATTR as ENODATA, so ...
> >> 	if (error == -ENOATTR)  {
> >> 		// whoops, surprise! bp is NULL, OOPS here
> >> 		xfs_trans_brelse(args->trans, bp);
> >> 		return error;
> >> 	} ...
> >>
> >> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
> >> mean in this function?" throughout the xattr code, my first thought is
> >> that we should simply map -ENODATA in lower level read functions back to
> >> -EIO, which is unambiguous, even if we lose the nuance of the underlying
> >> error code. (The block device probably already squawked.) Thoughts?
> > 
> > Uhhhh where does this ENODATA come from?  Is it the block layer?
> > 
> > $ git grep -w ENODATA block/
> > block/blk-core.c:146:   [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
> 
> That, probably, though I don't speak block layer very well. As mentioned, it was a
> SCSI disk error, and it appeared in XFS as -ENODATA:
> 
> sd 0:0:23:0: [sdad] tag#937 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=2s
> sd 0:0:23:0: [sdad] tag#937 Sense Key : Medium Error [current] 
> sd 0:0:23:0: [sdad] tag#937 Add. Sense: Read retries exhausted
> sd 0:0:23:0: [sdad] tag#937 CDB: Read(16) 88 00 00 00 00 00 9b df 5e 78 00 00 00 08 00 00
> critical medium error, dev sdad, sector 2615107192 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2

Ah, yup, critical error, we ran out of retries.

> XFS (sdad1): metadata I/O error in "xfs_da_read_buf+0xe1/0x140 [xfs]" at daddr 0x9bdf5678 len 8 error 61 
> (see error 61, ENODATA)
> 
> > --D
> > 
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >> ---
> >>
> >> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> >> index f9ef3b2a332a..6ba57ccaa25f 100644
> >> --- a/fs/xfs/xfs_buf.c
> >> +++ b/fs/xfs/xfs_buf.c
> >> @@ -747,6 +747,9 @@ xfs_buf_read_map(
> >>  		/* bad CRC means corrupted metadata */
> >>  		if (error == -EFSBADCRC)
> >>  			error = -EFSCORRUPTED;
> >> +		/* ENODATA == ENOATTR which confuses xattr layers */

Can this comment mention that ENODATA comes from the block layer?

		/*
		 * ENODATA means critical medium error, don't let it
		 * get mixed up with the xattr usage
		 */

With that changed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> >> +		if (error == -ENODATA)
> >> +			error = -EIO;
> >>  		return error;
> >>  	}
> >>  
> >>
> >>
> > 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 22:09 ` Dave Chinner
@ 2025-08-19  2:27   ` Eric Sandeen
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Sandeen @ 2025-08-19  2:27 UTC (permalink / raw)
  To: Dave Chinner, Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On 8/18/25 5:09 PM, Dave Chinner wrote:
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index f9ef3b2a332a..6ba57ccaa25f 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>>  		/* bad CRC means corrupted metadata */
>>  		if (error == -EFSBADCRC)
>>  			error = -EFSCORRUPTED;
>> +		/* ENODATA == ENOATTR which confuses xattr layers */
>> +		if (error == -ENODATA)
>> +			error = -EIO;
> Not sure this is the right place to map this. It is only relevant to
> the XFS xattr layer in the kernel, so mapping it for everything
> seems like overkill.
> 
> I suspect that this error mapping should really be in
> xfs_attr3_leaf_read() and xfs_da_read_buf() so it is done for
> xattr fork metadata read IO only...

yeah, that's partly why RFC. The higher up it gets added, the more
risk of adding another caller later that doesn't handle it. I didn't
see much downside to doing it lower, and just saying well, ENODATA
is kinda another way to spell EIO.

So I don't really care, but doing it in one place seemed better
than doing it in several.

Doing it lower might also be more consistent - only remapping ENODATA
for xattr reads is a little strange? But you're right that other
IOs shouldn't need it.  *shrug*

-Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 20:22 [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Eric Sandeen
  2025-08-18 20:45 ` Darrick J. Wong
  2025-08-18 22:09 ` Dave Chinner
@ 2025-08-19  8:08 ` Christoph Hellwig
  2025-08-19 14:34   ` Darrick J. Wong
  2025-08-19 15:14   ` Eric Sandeen
  2 siblings, 2 replies; 21+ messages in thread
From: Christoph Hellwig @ 2025-08-19  8:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index f9ef3b2a332a..6ba57ccaa25f 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>  		/* bad CRC means corrupted metadata */
>  		if (error == -EFSBADCRC)
>  			error = -EFSCORRUPTED;
> +		/* ENODATA == ENOATTR which confuses xattr layers */
> +		if (error == -ENODATA)
> +			error = -EIO;

I think we need to stop passing random errors through here (same
for the write side).  Unless we have an explicit handler (which we
have for a tiny number of errors), passing random stuff we got through
and which higher layers use for their own purpose will cause trouble.

Btw, your patch is timely as I've just seen something that probably
is the same root cause when running XFS on a device with messed up
PI, which is another of those cases where the block layer returns
"odd" error codes.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19  8:08 ` Christoph Hellwig
@ 2025-08-19 14:34   ` Darrick J. Wong
  2025-08-19 14:48     ` Christoph Hellwig
  2025-08-19 15:14   ` Eric Sandeen
  1 sibling, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2025-08-19 14:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma

On Tue, Aug 19, 2025 at 01:08:47AM -0700, Christoph Hellwig wrote:
> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index f9ef3b2a332a..6ba57ccaa25f 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -747,6 +747,9 @@ xfs_buf_read_map(
> >  		/* bad CRC means corrupted metadata */
> >  		if (error == -EFSBADCRC)
> >  			error = -EFSCORRUPTED;
> > +		/* ENODATA == ENOATTR which confuses xattr layers */
> > +		if (error == -ENODATA)
> > +			error = -EIO;
> 
> I think we need to stop passing random errors through here (same
> for the write side).  Unless we have an explicit handler (which we
> have for a tiny number of errors), passing random stuff we got through
> and which higher layers use for their own purpose will cause trouble.
> 
> Btw, your patch is timely as I've just seen something that probably
> is the same root cause when running XFS on a device with messed up
> PI, which is another of those cases where the block layer returns
> "odd" error codes.

Maybe xfs should translate bi_status into whatever error codes it wants
directly instead of relying on blk_status_to_errno, which can change in
subtle ways?

Though how many of those status codes actually need a different error?
BLK_STS_MEDIUM (ENODATA) and BLK_STS_NOTSUPP (EOPNOTSUPP) are the only
ones that look (to me) like they could be confused easily.

--D

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 14:34   ` Darrick J. Wong
@ 2025-08-19 14:48     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2025-08-19 14:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
	Donald Douwsma

On Tue, Aug 19, 2025 at 07:34:43AM -0700, Darrick J. Wong wrote:
> Maybe xfs should translate bi_status into whatever error codes it wants
> directly instead of relying on blk_status_to_errno, which can change in
> subtle ways?

We could do that.  In fact that's what I'm effectively suggesting.
Pick the ones we care about (and document why while we're at it) and
map everything else to -EIO.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19  8:08 ` Christoph Hellwig
  2025-08-19 14:34   ` Darrick J. Wong
@ 2025-08-19 15:14   ` Eric Sandeen
  2025-08-19 15:23     ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Sandeen @ 2025-08-19 15:14 UTC (permalink / raw)
  To: Christoph Hellwig, Eric Sandeen; +Cc: linux-xfs@vger.kernel.org, Donald Douwsma

On 8/19/25 3:08 AM, Christoph Hellwig wrote:
> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index f9ef3b2a332a..6ba57ccaa25f 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -747,6 +747,9 @@ xfs_buf_read_map(
>>  		/* bad CRC means corrupted metadata */
>>  		if (error == -EFSBADCRC)
>>  			error = -EFSCORRUPTED;
>> +		/* ENODATA == ENOATTR which confuses xattr layers */
>> +		if (error == -ENODATA)
>> +			error = -EIO;
> 
> I think we need to stop passing random errors through here (same
> for the write side).  Unless we have an explicit handler (which we
> have for a tiny number of errors), passing random stuff we got through
> and which higher layers use for their own purpose will cause trouble.
> 
> Btw, your patch is timely as I've just seen something that probably
> is the same root cause when running XFS on a device with messed up
> PI, which is another of those cases where the block layer returns
> "odd" error codes.

Ok, yeah, I had thought about just doing basically

	else
		return -EIO;

as well.

Though you and Dave seem to have different visions here, I do think that
for XFS's purposes, a failed IO is -EIO unless we explicitly need something
different. Anything finer grained or distributed to higher layers sounds
like a bit of a maintenance and correctness nightmare, to me.

-Eric



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 15:14   ` Eric Sandeen
@ 2025-08-19 15:23     ` Christoph Hellwig
  2025-08-19 15:38       ` Eric Sandeen
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2025-08-19 15:23 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
	Donald Douwsma

On Tue, Aug 19, 2025 at 10:14:01AM -0500, Eric Sandeen wrote:
> Though you and Dave seem to have different visions here,

We've already clashed on a slightly different twist of this a few weeks
ago :)

> I do think that
> for XFS's purposes, a failed IO is -EIO unless we explicitly need something
> different. Anything finer grained or distributed to higher layers sounds
> like a bit of a maintenance and correctness nightmare, to me.

Mostly, but not entirely yes.  In general I/O errors that bubble up
to the file system are just that: I/O errors that are not retryable
at this point, as otherwise the drivers / midlayers would have already
delt with it.

But there are a few exceptions to that:

The one thing we had a discussion about was ENOSPC, which can happen
with some thing provisioning solutions (and apparently redhat cares
about dm-thin there).  For this we do want retry metadata writes
based on that design, and special casing it would be good, because
an escaping ENOSPC would do the entirely wrong thing in all layers
about the buffer cache.

Another one is EAGAIN for non-blocking I/O.  That's mostly a data
path thing, and we can't really deal with it, but if we make full
use of it, it needs to be special cased.

And then EOPNOTSUP if we want to try optional operations that we
can't query ahead of time.  SCSI WRITE_SAME is one of them, but
we fortunately hide that behind block layer helpers.

For file system directly dealing with persistent reservations
BLK_STS_RESV_CONFLICT might be another one, but I hope we don't
get there :)

If the file system ever directly makes use of Command duration
limits, BLK_STS_DURATION_LIMIT might be another one.

As you see very little of that is actually relevant for XFS,
and even less for the buffer cache.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 15:23     ` Christoph Hellwig
@ 2025-08-19 15:38       ` Eric Sandeen
  2025-08-19 15:41         ` Christoph Hellwig
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Eric Sandeen @ 2025-08-19 15:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, linux-xfs@vger.kernel.org, Donald Douwsma

On 8/19/25 10:23 AM, Christoph Hellwig wrote:

...

> The one thing we had a discussion about was ENOSPC, which can happen
> with some thing provisioning solutions (and apparently redhat cares
> about dm-thin there).  For this we do want retry metadata writes
> based on that design, and special casing it would be good, because
> an escaping ENOSPC would do the entirely wrong thing in all layers
> about the buffer cache.
> 
> Another one is EAGAIN for non-blocking I/O.  That's mostly a data
> path thing, and we can't really deal with it, but if we make full
> use of it, it needs to be special cased.
> 
> And then EOPNOTSUP if we want to try optional operations that we
> can't query ahead of time.  SCSI WRITE_SAME is one of them, but
> we fortunately hide that behind block layer helpers.
> 
> For file system directly dealing with persistent reservations
> BLK_STS_RESV_CONFLICT might be another one, but I hope we don't
> get there :)
> 
> If the file system ever directly makes use of Command duration
> limits, BLK_STS_DURATION_LIMIT might be another one.
> 
> As you see very little of that is actually relevant for XFS,
> and even less for the buffer cache.

Ok, this is getting a little more complex. The ENODATA problem is
very specific, and has (oddly) been reported by users/customers twice
in recent days. Maybe I can send an acceptable fix for that specific,
observed problem (also suitable for -stable etc), then another
one that is more ambitious on top of that.

-Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 15:38       ` Eric Sandeen
@ 2025-08-19 15:41         ` Christoph Hellwig
  2025-08-19 21:45         ` Dave Chinner
  2025-08-25  7:51         ` Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2025-08-19 15:41 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
	Donald Douwsma

On Tue, Aug 19, 2025 at 10:38:54AM -0500, Eric Sandeen wrote:
> Ok, this is getting a little more complex. The ENODATA problem is
> very specific, and has (oddly) been reported by users/customers twice
> in recent days. Maybe I can send an acceptable fix for that specific,
> observed problem (also suitable for -stable etc), then another
> one that is more ambitious on top of that.

The summary of the above is that for xfs_buf concerns any error
leaking out of xfs_buf.c should be turned into -EIO.  We might
want to treat ENOSPC special inside of xfs_buf.c, but that would
be a separate enhancement.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 15:38       ` Eric Sandeen
  2025-08-19 15:41         ` Christoph Hellwig
@ 2025-08-19 21:45         ` Dave Chinner
  2025-08-20  0:16           ` Eric Sandeen
  2025-08-25  7:51         ` Christoph Hellwig
  2 siblings, 1 reply; 21+ messages in thread
From: Dave Chinner @ 2025-08-19 21:45 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
	Donald Douwsma

On Tue, Aug 19, 2025 at 10:38:54AM -0500, Eric Sandeen wrote:
> On 8/19/25 10:23 AM, Christoph Hellwig wrote:
> 
> ...
> 
> > The one thing we had a discussion about was ENOSPC, which can happen
> > with some thing provisioning solutions (and apparently redhat cares
> > about dm-thin there).  For this we do want retry metadata writes
> > based on that design, and special casing it would be good, because
> > an escaping ENOSPC would do the entirely wrong thing in all layers
> > about the buffer cache.
> > 
> > Another one is EAGAIN for non-blocking I/O.  That's mostly a data
> > path thing, and we can't really deal with it, but if we make full
> > use of it, it needs to be special cased.
> > 
> > And then EOPNOTSUP if we want to try optional operations that we
> > can't query ahead of time.  SCSI WRITE_SAME is one of them, but
> > we fortunately hide that behind block layer helpers.
> > 
> > For file system directly dealing with persistent reservations
> > BLK_STS_RESV_CONFLICT might be another one, but I hope we don't
> > get there :)
> > 
> > If the file system ever directly makes use of Command duration
> > limits, BLK_STS_DURATION_LIMIT might be another one.
> > 
> > As you see very little of that is actually relevant for XFS,
> > and even less for the buffer cache.
> 
> Ok, this is getting a little more complex. The ENODATA problem is
> very specific, and has (oddly) been reported by users/customers twice
> in recent days. Maybe I can send an acceptable fix for that specific,
> observed problem (also suitable for -stable etc), then another
> one that is more ambitious on top of that.

Right, the lowest risk, minimal targetted fix for the problem
reported is to remap the error in the attr layers. Nothing else is
then affected (ie. global changes of behaviour have significant
potential for unexpected regressions), but the issue is solved for
the users that are tripping over it.

Then, if someone really wants to completely rearchitect how we
handle IO errors in XFS, that can be done as a separate project,
with it's own justification, design review, planning for
integration/deprecation/removal of existing error handling
infrastructure, etc.

We do not tie acceptance of trivial bug fixes with a requirement to
completely rearchitect fundamental filesystem behaviours that are
only vaguely related to the bug that needs to be fixed.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 21:45         ` Dave Chinner
@ 2025-08-20  0:16           ` Eric Sandeen
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Sandeen @ 2025-08-20  0:16 UTC (permalink / raw)
  To: Dave Chinner, Eric Sandeen
  Cc: Christoph Hellwig, linux-xfs@vger.kernel.org, Donald Douwsma

On 8/19/25 4:45 PM, Dave Chinner wrote:
> On Tue, Aug 19, 2025 at 10:38:54AM -0500, Eric Sandeen wrote:

...

>> Ok, this is getting a little more complex. The ENODATA problem is
>> very specific, and has (oddly) been reported by users/customers twice
>> in recent days. Maybe I can send an acceptable fix for that specific,
>> observed problem (also suitable for -stable etc), then another
>> one that is more ambitious on top of that.
> 
> Right, the lowest risk, minimal targetted fix for the problem
> reported is to remap the error in the attr layers. Nothing else is
> then affected (ie. global changes of behaviour have significant
> potential for unexpected regressions), but the issue is solved for
> the users that are tripping over it.
> 
> Then, if someone really wants to completely rearchitect how we
> handle IO errors in XFS, that can be done as a separate project,
> with it's own justification, design review, planning for
> integration/deprecation/removal of existing error handling
> infrastructure, etc.
> 
> We do not tie acceptance of trivial bug fixes with a requirement to
> completely rearchitect fundamental filesystem behaviours that are
> only vaguely related to the bug that needs to be fixed.

Agree, though I don't think (I hope) that any of this discussion was
a NAK, just a "there might be a bigger problem to solve here, too" and
I agree with that. I do want to push to get the demonstrable bug fixed
in a direct, safe way first, though, and not bog it down with grander
plans.

I'll try to find time to do that, and look at the bigger problem,
if I have the time and ability. :)

Thanks,
-Eric

> -Dave.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-18 20:45 ` Darrick J. Wong
  2025-08-18 21:11   ` Eric Sandeen
@ 2025-08-21  9:16   ` Donald Douwsma
  2025-08-21  9:29     ` [PATCH] xfs: test case for handling io errors when reading extended attributes Donald Douwsma
  2025-08-21 12:52     ` [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Carlos Maiolino
  1 sibling, 2 replies; 21+ messages in thread
From: Donald Douwsma @ 2025-08-21  9:16 UTC (permalink / raw)
  To: Darrick J. Wong, Eric Sandeen; +Cc: linux-xfs@vger.kernel.org



On 19/8/25 06:45, Darrick J. Wong wrote:
> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
>> We had a report that a failing scsi disk was oopsing XFS when an xattr
>> read encountered a media error. This is because the media error returned
>> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
>>
>> In this particular case, it looked like:
>>
>> xfs_attr_leaf_get()
>> 	error = xfs_attr_leaf_hasname(args, &bp);
>> 	// here bp is NULL, error == -ENODATA from disk failure
>> 	// but we define ENOATTR as ENODATA, so ...
>> 	if (error == -ENOATTR)  {
>> 		// whoops, surprise! bp is NULL, OOPS here
>> 		xfs_trans_brelse(args->trans, bp);
>> 		return error;
>> 	} ...
>>
>> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
>> mean in this function?" throughout the xattr code, my first thought is
>> that we should simply map -ENODATA in lower level read functions back to
>> -EIO, which is unambiguous, even if we lose the nuance of the underlying
>> error code. (The block device probably already squawked.) Thoughts?
> 
> Uhhhh where does this ENODATA come from?  Is it the block layer?
> 
> $ git grep -w ENODATA block/
> block/blk-core.c:146:   [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
> 

I had been working on a test case for this based on dmerror, but It
never successfully triggered this, since dmerror returned EIO.

At least it didn't until Eric got creative mapping the the error back to
ENODATA.

I'll was in the process of turning this into an xfstest based on your
tests/xfs/556, I'll reply here with it in case its useful to anyone, but
it would need to be modified to somehow inject ENODATA into the return
path.

Don



It should work with a systemtap to map the error, though I think Eric 
was considering alternatives.






^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] xfs: test case for handling io errors when reading extended attributes
  2025-08-21  9:16   ` Donald Douwsma
@ 2025-08-21  9:29     ` Donald Douwsma
  2025-08-21 12:52     ` [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Carlos Maiolino
  1 sibling, 0 replies; 21+ messages in thread
From: Donald Douwsma @ 2025-08-21  9:29 UTC (permalink / raw)
  To: linux-xfs; +Cc: Darrick J . Wong, Eric Sandeen, Donald Douwsma

We've seen reports from the field panicing in xfs_trans_brelse after
an io error for an attribute block.

sd 0:0:23:0: [sdx] tag#271 CDB: Read(16) 88 00 00 00 00 00 9b df 5e 78 00 00 00 08 00 00
critical medium error, dev sdx, sector 2615107192 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 2
XFS (sdx1): metadata I/O error in "xfs_da_read_buf+0xe1/0x140 [xfs]" at daddr 0x9bdf5678 len 8 error 61
BUG: kernel NULL pointer dereference, address: 00000000000000e0
...
RIP: 0010:xfs_trans_brelse+0xb/0xe0 [xfs]
...
Call Trace:
 <TASK>
...
 ? xfs_trans_brelse+0xb/0xe0 [xfs]
 xfs_attr_leaf_get+0xb6/0xc0 [xfs]
 xfs_attr_get+0xa0/0xd0 [xfs]
 xfs_xattr_get+0x75/0xb0 [xfs]
 __vfs_getxattr+0x53/0x70
 inode_doinit_use_xattr+0x63/0x180
 inode_doinit_with_dentry+0x196/0x510
 security_d_instantiate+0x2f/0x50
 d_splice_alias+0x46/0x2b0
 xfs_vn_lookup+0x8b/0xb0 [xfs]
 __lookup_slow+0x84/0x130
 walk_component+0x158/0x1d0
 path_lookupat+0x6e/0x1c0
 filename_lookup+0xcf/0x1d0
 vfs_statx+0x8d/0x170
 vfs_fstatat+0x54/0x70
 __do_sys_newfstatat+0x26/0x60

This was an unsuccessful attempt to reproduce this using dmerror.

It should provoke the problem if a system tap or custom kernel is used
to return ENODATA as seen above, and not EIO as returned by dmerror.

Signed-off-by: Donald Douwsma <ddouwsma@redhat.com>
---
 tests/xfs/999     | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/999.out |   2 +
 2 files changed, 109 insertions(+)
 create mode 100755 tests/xfs/999
 create mode 100644 tests/xfs/999.out

diff --git a/tests/xfs/999 b/tests/xfs/999
new file mode 100755
index 00000000..627ca14e
--- /dev/null
+++ b/tests/xfs/999
@@ -0,0 +1,107 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 YOUR NAME HERE.  All Rights Reserved.
+#
+# FS QA Test 601
+#
+# what am I here for?
+# Test for panic during ioerror reading xattr blocks
+#
+. ./common/preamble
+_begin_fstest auto
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	_dmerror_cleanup
+}
+
+# Import common functions.
+# . ./common/filter
+. ./common/dmerror
+
+# real QA test starts here
+
+# Modify as appropriate.
+_require_scratch
+_require_dm_target error
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+# TODO: Avoid assumptions about inode size using _xfs_get_inode_size and
+#       _xfs_get_inode_core_bytes, currently assuming 512 byte inodes.
+
+# Shortform
+# Include shortform for completeness, we can only inject errors for attributes
+# stored outside of the inode.
+testfile="$SCRATCH_MNT/attr_shortform"
+touch $testfile
+ls -i $testfile >> $seqres.full
+setfattr -n user.test_shortform -v $(printf "%0.12d" 12) $testfile
+
+# leaf with single block extent
+testfile="$SCRATCH_MNT/attr_leaf_oneblock"
+touch $testfile
+ls -i $testfile >> $seqres.full
+setfattr -n user.test_leaf -v $(printf "%0.512d" 512) $testfile
+
+# leaf with single multiple block extent
+testfile="$SCRATCH_MNT/attr_leaf_twoblocks"
+touch $testfile
+ls -i $testfile >> $seqres.full
+inode=$(ls -i $testfile|awk '{print $1}')
+setfattr -n user.test_leaf -v $(printf "%0.5000d" 5000) $testfile
+
+# Generalise to for multiple attributes accross many blocks
+# size_name=
+# size_val=256 # how dows value>block-size work out?
+# num_attrs=2022
+# testfile="$SCRATCH_MNT/attr_leaf_manyblocks"
+# touch $testfile
+# ls -li $testfile
+# inode=$(ls -i $testfile|awk '{print $1}')
+#
+# for n_attr in $(seq 0 $num_attrs); do
+#	echo $n_attr;
+#	setfattr -n $(printf "user.test_leaf_%04d" $n_attr) \
+#		 -v $(printf "%0.${size_val}d" $size_val) \
+#		 $testfile
+# done
+
+$XFS_IO_PROG -c "bmap -al" $testfile >> $seqres.full
+attrblocks=($($XFS_IO_PROG -c "bmap -al" $testfile | awk 'match($3, /[0-9]+/, a) {print a[0]}'))
+echo Attribute fork at blocks ${attrblocks[*]} >> $seqres.full
+
+_scratch_unmount
+
+echo "Dump inode $inode details with xfs_db" >> $seqres.full
+# _scratch_xfs_db -c "inode $inode" -c "print core.aformat core.naextents a" >> $seqres.full
+_scratch_xfs_db -c "inode $inode" -c print >> $seqres.full
+
+_dmerror_init >> $seqres.full 2>&1
+_dmerror_reset_table >> $seqres.full 2>&1
+_dmerror_mount >> $seqres.full 2>&1
+
+echo "Setup dm-error when reading the second attribute block ${attrblocks[1]}" >> $seqres.full
+_dmerror_mark_range_bad ${attrblocks[1]} 1 $SCRATCH_DEV
+
+# Debug from tests/xfs/556
+cat >> $seqres.full << ENDL
+dmerror after marking bad:
+$DMERROR_TABLE
+$DMERROR_RTTABLE
+<end table>
+ENDL
+
+_dmerror_load_error_table
+
+# Panic here if failure
+echo "Re-read the extended attribute, panics on unandled ioerrors" >> $seqres.full
+getfattr -d -m - $testfile >> $seqres.full
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/999.out b/tests/xfs/999.out
new file mode 100644
index 00000000..e7f4becf
--- /dev/null
+++ b/tests/xfs/999.out
@@ -0,0 +1,2 @@
+QA output created by 999 
+Silence is golden
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-21  9:16   ` Donald Douwsma
  2025-08-21  9:29     ` [PATCH] xfs: test case for handling io errors when reading extended attributes Donald Douwsma
@ 2025-08-21 12:52     ` Carlos Maiolino
  2025-08-21 20:06       ` Eric Sandeen
  1 sibling, 1 reply; 21+ messages in thread
From: Carlos Maiolino @ 2025-08-21 12:52 UTC (permalink / raw)
  To: Donald Douwsma; +Cc: Darrick J. Wong, Eric Sandeen, linux-xfs@vger.kernel.org

On Thu, Aug 21, 2025 at 07:16:49PM +1000, Donald Douwsma wrote:
> 
> 
> On 19/8/25 06:45, Darrick J. Wong wrote:
> > On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
> >> We had a report that a failing scsi disk was oopsing XFS when an xattr
> >> read encountered a media error. This is because the media error returned
> >> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.

scsi_debug can be configured to return a MEDIUM error which if I follow
the discussion properly, would result in the block layer converting it
to ENODATA.

Carlos


P.S.
I'm pretty sure I heard somebody suggesting scsi_debug before,
I just don't remember who and where.

> >>
> >> In this particular case, it looked like:
> >>
> >> xfs_attr_leaf_get()
> >> 	error = xfs_attr_leaf_hasname(args, &bp);
> >> 	// here bp is NULL, error == -ENODATA from disk failure
> >> 	// but we define ENOATTR as ENODATA, so ...
> >> 	if (error == -ENOATTR)  {
> >> 		// whoops, surprise! bp is NULL, OOPS here
> >> 		xfs_trans_brelse(args->trans, bp);
> >> 		return error;
> >> 	} ...
> >>
> >> To avoid whack-a-mole "test for null bp" or "which -ENODATA do we really
> >> mean in this function?" throughout the xattr code, my first thought is
> >> that we should simply map -ENODATA in lower level read functions back to
> >> -EIO, which is unambiguous, even if we lose the nuance of the underlying
> >> error code. (The block device probably already squawked.) Thoughts?
> >
> > Uhhhh where does this ENODATA come from?  Is it the block layer?
> >
> > $ git grep -w ENODATA block/
> > block/blk-core.c:146:   [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
> >
> 
> I had been working on a test case for this based on dmerror, but It
> never successfully triggered this, since dmerror returned EIO.
> 
> At least it didn't until Eric got creative mapping the the error back to
> ENODATA.
> 
> I'll was in the process of turning this into an xfstest based on your
> tests/xfs/556, I'll reply here with it in case its useful to anyone, but
> it would need to be modified to somehow inject ENODATA into the return
> path.
> 
> Don
> 
> 
> 
> It should work with a systemtap to map the error, though I think Eric
> was considering alternatives.
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-21 12:52     ` [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Carlos Maiolino
@ 2025-08-21 20:06       ` Eric Sandeen
  2025-08-22  7:38         ` Donald Douwsma
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Sandeen @ 2025-08-21 20:06 UTC (permalink / raw)
  To: Carlos Maiolino, Donald Douwsma
  Cc: Darrick J. Wong, Eric Sandeen, linux-xfs@vger.kernel.org

On 8/21/25 7:52 AM, Carlos Maiolino wrote:
> On Thu, Aug 21, 2025 at 07:16:49PM +1000, Donald Douwsma wrote:
>>
>>
>> On 19/8/25 06:45, Darrick J. Wong wrote:
>>> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
>>>> We had a report that a failing scsi disk was oopsing XFS when an xattr
>>>> read encountered a media error. This is because the media error returned
>>>> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
> 
> scsi_debug can be configured to return a MEDIUM error which if I follow
> the discussion properly, would result in the block layer converting it
> to ENODATA.

Yep that does work though very old scsi_debug modules can't control
which sector returns ENODATA. But for semi-modern scsi_debug it should
work quite well.

-Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-21 20:06       ` Eric Sandeen
@ 2025-08-22  7:38         ` Donald Douwsma
  0 siblings, 0 replies; 21+ messages in thread
From: Donald Douwsma @ 2025-08-22  7:38 UTC (permalink / raw)
  To: Eric Sandeen, Carlos Maiolino
  Cc: Darrick J. Wong, Eric Sandeen, linux-xfs@vger.kernel.org

On 22/8/25 06:06, Eric Sandeen wrote:
> On 8/21/25 7:52 AM, Carlos Maiolino wrote:
>> On Thu, Aug 21, 2025 at 07:16:49PM +1000, Donald Douwsma wrote:
>>>
>>>
>>> On 19/8/25 06:45, Darrick J. Wong wrote:
>>>> On Mon, Aug 18, 2025 at 03:22:02PM -0500, Eric Sandeen wrote:
>>>>> We had a report that a failing scsi disk was oopsing XFS when an xattr
>>>>> read encountered a media error. This is because the media error returned
>>>>> -ENODATA, which we map in xattr code to -ENOATTR and treat specially.
>>
>> scsi_debug can be configured to return a MEDIUM error which if I follow
>> the discussion properly, would result in the block layer converting it
>> to ENODATA.
> 
> Yep that does work though very old scsi_debug modules can't control
> which sector returns ENODATA. But for semi-modern scsi_debug it should
> work quite well.
> 
> -Eric
> 

Thanks Carlos, Eric, that works well.

I should able to make it work without knowing the sector if I stat the
inode before enabling the error if we need support for older kernels.

I'll file off the rough edges and post it for review.

Don

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO
  2025-08-19 15:38       ` Eric Sandeen
  2025-08-19 15:41         ` Christoph Hellwig
  2025-08-19 21:45         ` Dave Chinner
@ 2025-08-25  7:51         ` Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2025-08-25  7:51 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, Eric Sandeen, linux-xfs@vger.kernel.org,
	Donald Douwsma

On Tue, Aug 19, 2025 at 10:38:54AM -0500, Eric Sandeen wrote:
> Ok, this is getting a little more complex. The ENODATA problem is
> very specific, and has (oddly) been reported by users/customers twice
> in recent days.

That's probably when you started deploying a kernel to them that
contained a backport of

07120f1abdff80f3d1351f733661abe28d609535
Author: Allison Collins <allison.henderson@oracle.com>
Date:   Mon Jul 20 21:47:22 2020 -0700

    xfs: Add xfs_has_attr and subroutines

which added this check.  So anything fixing that should probably
add a Fixes tag, even if that commit only made the issue show up
and did not actually cause it.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-08-25  7:51 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-18 20:22 [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Eric Sandeen
2025-08-18 20:45 ` Darrick J. Wong
2025-08-18 21:11   ` Eric Sandeen
2025-08-18 23:04     ` Darrick J. Wong
2025-08-21  9:16   ` Donald Douwsma
2025-08-21  9:29     ` [PATCH] xfs: test case for handling io errors when reading extended attributes Donald Douwsma
2025-08-21 12:52     ` [PATCH RFC] xfs: remap block layer ENODATA read errors to EIO Carlos Maiolino
2025-08-21 20:06       ` Eric Sandeen
2025-08-22  7:38         ` Donald Douwsma
2025-08-18 22:09 ` Dave Chinner
2025-08-19  2:27   ` Eric Sandeen
2025-08-19  8:08 ` Christoph Hellwig
2025-08-19 14:34   ` Darrick J. Wong
2025-08-19 14:48     ` Christoph Hellwig
2025-08-19 15:14   ` Eric Sandeen
2025-08-19 15:23     ` Christoph Hellwig
2025-08-19 15:38       ` Eric Sandeen
2025-08-19 15:41         ` Christoph Hellwig
2025-08-19 21:45         ` Dave Chinner
2025-08-20  0:16           ` Eric Sandeen
2025-08-25  7:51         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).