* [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
@ 2026-05-18 6:52 Hans Holmberg
2026-05-18 7:48 ` Dave Chinner
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Hans Holmberg @ 2026-05-18 6:52 UTC (permalink / raw)
To: Carlos Maiolino, Darrick J . Wong
Cc: Dave Chinner, Christoph Hellwig, Damien Le Moal, linux-xfs,
Hans Holmberg
Under heavy garbage collection pressure from RocksDB workloads,
filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
xfs_iget() returns -EINVAL for deleted files.
Fix this by handling -EINVAL just like we handle -ENOENT, allowing
zone GC to safely ignore stale mappings.
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
---
v2:
- handle -EINVAL in the caller instead of switching the error code
in xfs_imap_lookup
v1:
https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
fs/xfs/xfs_zone_gc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index fedcc47048af..96f14d811db2 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
/*
* If the inode was already deleted, skip over it.
*/
- if (error == -ENOENT) {
+ if (error == -ENOENT || error == -EINVAL) {
iter->rec_idx++;
goto retry;
}
--
2.34.1
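
The skip-and-retry behaviour the patch describes can be sketched outside the kernel. This is a minimal user-space model, not the code in fs/xfs/xfs_zone_gc.c: `mock_rec`, `mock_iter`, `mock_iget()` and `mock_iter_next()` are hypothetical names invented for illustration, and the rule that maps an owner number to an error code is an arbitrary assumption so the example is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a reverse-mapping record; only the owner matters. */
struct mock_rec {
	unsigned long	owner;
};

/* Hypothetical iterator with a rec_idx cursor, loosely modelled on the patch. */
struct mock_iter {
	const struct mock_rec	*recs;
	size_t			rec_count;
	size_t			rec_idx;
};

/*
 * Mock inode lookup (assumption, not xfs_iget): the owner number picks the
 * result so that each error path can be exercised.
 */
static int mock_iget(unsigned long owner)
{
	switch (owner % 4) {
	case 1:
		return -ENOENT;	/* deleted, reported as "no such inode" */
	case 2:
		return -EINVAL;	/* deleted, reported as "bad inode number" */
	case 3:
		return -EIO;	/* a genuine failure that must not be skipped */
	default:
		return 0;	/* inode is live */
	}
}

/*
 * Store the owner of the next live record and return 1, return 0 when the
 * records are exhausted, or return a negative errno on an unexpected error.
 */
static int mock_iter_next(struct mock_iter *iter, unsigned long *owner)
{
	int error;

	assert(iter->rec_idx <= iter->rec_count);
retry:
	if (iter->rec_idx == iter->rec_count)
		return 0;

	error = mock_iget(iter->recs[iter->rec_idx].owner);
	/* Both codes mean "the inode is gone": skip the stale record. */
	if (error == -ENOENT || error == -EINVAL) {
		iter->rec_idx++;
		goto retry;
	}
	if (error)
		return error;

	*owner = iter->recs[iter->rec_idx].owner;
	iter->rec_idx++;
	return 1;
}
```

In this model only the two "inode is gone" codes advance the cursor; any other error still propagates to the caller, which is the property the one-line change preserves.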
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 6:52 [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec Hans Holmberg
@ 2026-05-18 7:48 ` Dave Chinner
2026-05-18 8:55 ` Hans Holmberg
2026-05-18 12:43 ` Christoph Hellwig
2026-05-19 7:15 ` Christoph Hellwig
2026-05-30 6:35 ` Carlos Maiolino
2 siblings, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2026-05-18 7:48 UTC (permalink / raw)
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs
On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL for deleted files.
>
> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> zone GC to safely ignore stale mappings.
>
> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> ---
>
> v2:
> - handle -EINVAL in the caller instead of switching the error code
> in xfs_imap_lookup
>
> v1:
> https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
>
>
> fs/xfs/xfs_zone_gc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> index fedcc47048af..96f14d811db2 100644
> --- a/fs/xfs/xfs_zone_gc.c
> +++ b/fs/xfs/xfs_zone_gc.c
> @@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
> /*
> * If the inode was already deleted, skip over it.
> */
> - if (error == -ENOENT) {
> + if (error == -ENOENT || error == -EINVAL) {
> iter->rec_idx++;
> goto retry;
> }
Why did you choose to do this? I was expecting the updated fix to be
dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
inode numbers are coming from a trusted source (i.e. the rmapbt) and
this would then return the expected -ENOENT on unlink races instead
of -EINVAL....
-Dave.
--
Dave Chinner
dgc@kernel.org
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 7:48 ` Dave Chinner
@ 2026-05-18 8:55 ` Hans Holmberg
2026-05-18 9:17 ` Damien Le Moal
2026-05-18 9:44 ` Dave Chinner
2026-05-18 12:43 ` Christoph Hellwig
1 sibling, 2 replies; 9+ messages in thread
From: Hans Holmberg @ 2026-05-18 8:55 UTC (permalink / raw)
To: Dave Chinner
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs@vger.kernel.org
On 18/05/2026 09:48, Dave Chinner wrote:
> On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
>> Under heavy garbage collection pressure from RocksDB workloads,
>> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
>> xfs_iget() returns -EINVAL for deleted files.
>>
>> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
>> zone GC to safely ignore stale mappings.
>>
>> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
>> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
>> ---
>>
>> v2:
>> - handle -EINVAL in the caller instead of switching the error code
>> in xfs_imap_lookup
>>
>> v1:
>> https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
>>
>>
>> fs/xfs/xfs_zone_gc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
>> index fedcc47048af..96f14d811db2 100644
>> --- a/fs/xfs/xfs_zone_gc.c
>> +++ b/fs/xfs/xfs_zone_gc.c
>> @@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
>> /*
>> * If the inode was already deleted, skip over it.
>> */
>> - if (error == -ENOENT) {
>> + if (error == -ENOENT || error == -EINVAL) {
>> iter->rec_idx++;
>> goto retry;
>> }
>
> Why did you choose to do this? I was expecting the updated fix to be
> dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
> inode numbers are coming from a trusted source (i.e. the rmapbt) and
> this would then return the expected -ENOENT on unlink races instead
> of -EINVAL....
>
> -Dave.
I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
that we do in xfs_zone_gc_query may not be valid by the time we call
xfs_iget in xfs_zone_gc_iter_irec.
Cheers,
Hans
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 8:55 ` Hans Holmberg
@ 2026-05-18 9:17 ` Damien Le Moal
2026-05-18 9:44 ` Dave Chinner
1 sibling, 0 replies; 9+ messages in thread
From: Damien Le Moal @ 2026-05-18 9:17 UTC (permalink / raw)
To: Hans Holmberg, Dave Chinner
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, linux-xfs@vger.kernel.org
On 2026/05/18 10:55, Hans Holmberg wrote:
> On 18/05/2026 09:48, Dave Chinner wrote:
>> On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
>>> Under heavy garbage collection pressure from RocksDB workloads,
>>> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
>>> xfs_iget() returns -EINVAL for deleted files.
>>>
>>> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
>>> zone GC to safely ignore stale mappings.
>>>
>>> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
>>> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
>>> ---
>>>
>>> v2:
>>> - handle -EINVAL in the caller instead of switching the error code
>>> in xfs_imap_lookup
>>>
>>> v1:
>>> https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
>>>
>>>
>>> fs/xfs/xfs_zone_gc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
>>> index fedcc47048af..96f14d811db2 100644
>>> --- a/fs/xfs/xfs_zone_gc.c
>>> +++ b/fs/xfs/xfs_zone_gc.c
>>> @@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
>>> /*
>>> * If the inode was already deleted, skip over it.
>>> */
>>> - if (error == -ENOENT) {
>>> + if (error == -ENOENT || error == -EINVAL) {
>>> iter->rec_idx++;
>>> goto retry;
>>> }
>>
>> Why did you choose to do this? I was expecting the updated fix to be
>> dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
>> inode numbers are coming from a trusted source (i.e. the rmapbt) and
>> this would then return the expected -ENOENT on unlink races instead
>> of -EINVAL....
>>
>> -Dave.
>
> I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
> that we do in xfs_zone_gc_query may not be valid by the time we call
> xfs_iget in xfs_zone_gc_iter_irec.
Maybe improve the comment above the change to document this?
>
> Cheers,
> Hans
--
Damien Le Moal
Western Digital Research
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 8:55 ` Hans Holmberg
2026-05-18 9:17 ` Damien Le Moal
@ 2026-05-18 9:44 ` Dave Chinner
2026-05-18 12:45 ` Christoph Hellwig
1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2026-05-18 9:44 UTC (permalink / raw)
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs@vger.kernel.org
On Mon, May 18, 2026 at 08:55:58AM +0000, Hans Holmberg wrote:
> On 18/05/2026 09:48, Dave Chinner wrote:
> > On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
> >> Under heavy garbage collection pressure from RocksDB workloads,
> >> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> >> xfs_iget() returns -EINVAL for deleted files.
> >>
> >> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> >> zone GC to safely ignore stale mappings.
> >>
> >> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
> >> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> >> ---
> >>
> >> v2:
> >> - handle -EINVAL in the caller instead of switching the error code
> >> in xfs_imap_lookup
> >>
> >> v1:
> >> https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
> >>
> >>
> >> fs/xfs/xfs_zone_gc.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
> >> index fedcc47048af..96f14d811db2 100644
> >> --- a/fs/xfs/xfs_zone_gc.c
> >> +++ b/fs/xfs/xfs_zone_gc.c
> >> @@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
> >> /*
> >> * If the inode was already deleted, skip over it.
> >> */
> >> - if (error == -ENOENT) {
> >> + if (error == -ENOENT || error == -EINVAL) {
> >> iter->rec_idx++;
> >> goto retry;
> >> }
> >
> > Why did you choose to do this? I was expecting the updated fix to be
> > dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
> > inode numbers are coming from a trusted source (i.e. the rmapbt) and
> > this would then return the expected -ENOENT on unlink races instead
> > of -EINVAL....
>
> I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
> that we do in xfs_zone_gc_query may not be valid by the time we call
> xfs_iget in xfs_zone_gc_iter_irec.
I'll repeat this once more: XFS_IGET_UNTRUSTED tells xfs_iget where
the inode number came from, not whether it *might* be valid or not.
xfs_iget() will detect races with unlink and do the right thing
regardless of whether IGET_UNTRUSTED is set.
IGET_UNTRUSTED is intended to be used when the inode number comes
from an external source. That may be userspace or the network
with file handle decoding, bulkstat (user supplied inode number),
corrupt metadata (e.g. during online repair when the inode clusters
cannot be fully trusted), etc.
In these situations, we need to do expensive validity checks before
attempting to read the inode from disk, and we should not be giving
any information back to the caller as to whether the inode cluster
is allocated or not on failure (e.g.
potentially giving away other valid inode numbers nearby). IOWs, we
don't want to be giving any indication of why the inode number is
considered invalid - it is just a bad inode number that was
provided, and because we don't trust the provider of the inode
number, that's all we will tell it.
However, inode numbers that come from internal metadata (i.e.
trusted sources) are -known to be valid-. They don't get placed into
the internal metadata if they are invalid inode numbers. Hence the
fact that it was in the rmapbt means it is, or very recently was, a
valid inode number, and we can trust it.
If xfs_iget then races with unlink on lookup, that is -just fine-
and it will return -ENOENT. In this case, xfs_iget() is telling the
caller that the inode is no longer allocated, because it is expected
that the caller can use and will do the right thing with this
information.
i.e. XFS_IGET_UNTRUSTED defines whether the inode number source
could be trusted to provide a valid inode number, not whether the
inode number can be guaranteed to be allocated at this specific
lookup instance.
-Dave.
--
Dave Chinner
dgc@kernel.org
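
The distinction drawn in this message, where the flag controls what a failed lookup reveals, can be illustrated with a toy model. This is a sketch under stated assumptions, not the behaviour of xfs_iget(): `MOCK_IGET_UNTRUSTED`, `mock_allocated` and `mock_lookup()` are hypothetical, and the allocation table is a fixed array chosen for the example. The error codes follow the thread: -EINVAL for any failed untrusted lookup, -ENOENT for a trusted lookup of a freed inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_IGET_UNTRUSTED	(1u << 0)	/* hypothetical flag, name only */
#define MOCK_NR_INODES		8u		/* arbitrary size for the model */

/* true = inode currently allocated; inodes 2 and 5 are live in this model. */
static bool mock_allocated[MOCK_NR_INODES] = { [2] = true, [5] = true };

static int mock_lookup(unsigned int ino, unsigned int flags)
{
	if (flags & MOCK_IGET_UNTRUSTED) {
		/*
		 * Untrusted source: validate first and report every failure
		 * as "bad inode number", without saying why.
		 */
		if (ino >= MOCK_NR_INODES || !mock_allocated[ino])
			return -EINVAL;
		return 0;
	}

	/* Trusted source: the caller vouches that the number is well formed. */
	assert(ino < MOCK_NR_INODES);
	if (!mock_allocated[ino])
		return -ENOENT;	/* well formed, but no longer allocated */
	return 0;
}
```

In this model the same freed inode number yields -ENOENT from the trusted path and -EINVAL from the untrusted path, which is why the caller in the patch has to accept both if it keeps the flag.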
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 7:48 ` Dave Chinner
2026-05-18 8:55 ` Hans Holmberg
@ 2026-05-18 12:43 ` Christoph Hellwig
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-05-18 12:43 UTC (permalink / raw)
To: Dave Chinner
Cc: Hans Holmberg, Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs
On Mon, May 18, 2026 at 05:48:22PM +1000, Dave Chinner wrote:
> > - if (error == -ENOENT) {
> > + if (error == -ENOENT || error == -EINVAL) {
> > iter->rec_idx++;
> > goto retry;
> > }
>
> Why did you choose to do this? I was expecting the updated fix to be
> dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
> inode numbers are coming from a trusted source (i.e. the rmapbt) and
> this would then return the expected -ENOENT on unlink races instead
> of -EINVAL....
The inode number is not trusted. It comes from a stale copy of the rmap,
and the inode cluster can have been freed and reallocated. Dropping
XFS_IGET_UNTRUSTED would be highly dangerous as it could cause iget
to interpret something else as an inode structure in this case and
easily cause kernel crashes.
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 9:44 ` Dave Chinner
@ 2026-05-18 12:45 ` Christoph Hellwig
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-05-18 12:45 UTC (permalink / raw)
To: Dave Chinner
Cc: Hans Holmberg, Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs@vger.kernel.org
On Mon, May 18, 2026 at 07:44:36PM +1000, Dave Chinner wrote:
> I'll repeat this once more: XFS_IGET_UNTRUSTED tells xfs_iget where
> the inode number came from, not whether it *might* be valid or not.
> xfs_iget() will detect races with unlink and do the right thing
> regardless of whether IGET_UNTRUSTED is set.
No, xfs_iget will not detect that the inode cluster has been reused
as some other kind of metadata or even data between querying the rmap
btree and trying to read the inode to perform garbage collection.
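
The hazard described here, a cluster being reused between the rmap query and the later read, can be sketched as a toy model. Everything below is hypothetical (`mock_cluster`, `mock_reuse_cluster()`, `mock_read_validated()`, `mock_read_unvalidated()`); it assumes a single cluster whose `kind` field stands in for whatever a validated lookup would check, and it does not reproduce any XFS data structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* What the cluster currently holds in this model. */
enum mock_kind { MOCK_KIND_INODES, MOCK_KIND_OTHER };

struct mock_cluster {
	enum mock_kind	kind;
	unsigned int	payload;	/* inode contents, or unrelated bytes */
};

/* Model the cluster being freed and reallocated for something else. */
static void mock_reuse_cluster(struct mock_cluster *c, unsigned int new_payload)
{
	assert(c != NULL);
	c->kind = MOCK_KIND_OTHER;
	c->payload = new_payload;
}

/* Validated read: checks the current state before touching the payload. */
static int mock_read_validated(const struct mock_cluster *c, unsigned int *out)
{
	if (c->kind != MOCK_KIND_INODES)
		return -EINVAL;
	*out = c->payload;
	return 0;
}

/* Unvalidated read: trusts a stale record and returns whatever is there. */
static int mock_read_unvalidated(const struct mock_cluster *c, unsigned int *out)
{
	*out = c->payload;	/* may no longer be inode data at all */
	return 0;
}
```

After `mock_reuse_cluster()` runs, the validated read fails cleanly while the unvalidated read reports success and hands back unrelated bytes, which is the failure mode this reply warns about.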
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 6:52 [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec Hans Holmberg
2026-05-18 7:48 ` Dave Chinner
@ 2026-05-19 7:15 ` Christoph Hellwig
2026-05-30 6:35 ` Carlos Maiolino
2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-05-19 7:15 UTC (permalink / raw)
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs
On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL for deleted files.
>
> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> zone GC to safely ignore stale mappings.
While I'd prefer -ENOENT returns for inodes that are not malformed but
potentially out of date, this works as well, with a small loss in error
checking:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
2026-05-18 6:52 [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec Hans Holmberg
2026-05-18 7:48 ` Dave Chinner
2026-05-19 7:15 ` Christoph Hellwig
@ 2026-05-30 6:35 ` Carlos Maiolino
2 siblings, 0 replies; 9+ messages in thread
From: Carlos Maiolino @ 2026-05-30 6:35 UTC (permalink / raw)
To: Darrick J . Wong, Hans Holmberg
Cc: Dave Chinner, Christoph Hellwig, Damien Le Moal, linux-xfs
On Mon, 18 May 2026 08:52:24 +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL for deleted files.
>
> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> zone GC to safely ignore stale mappings.
>
> [...]
Applied to for-next, thanks!
[1/1] xfs: handle racing deletions in xfs_zone_gc_iter_irec
commit: bc95fa240a1b8ae64d3dabe87cbe103b912afc45
Best regards,
--
Carlos Maiolino <cem@kernel.org>