* [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Hans Holmberg @ 2026-05-18 6:52 UTC
To: Carlos Maiolino, Darrick J . Wong
Cc: Dave Chinner, Christoph Hellwig, Damien Le Moal, linux-xfs,
    Hans Holmberg

Under heavy garbage collection pressure from RocksDB workloads,
filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
xfs_iget() returns -EINVAL for deleted files.

Fix this by handling -EINVAL just like we handle -ENOENT, allowing
zone GC to safely ignore stale mappings.

Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
---
v2:
- handle -EINVAL in the caller instead of switching the error code
  in xfs_imap_lookup
v1:
https://lore.kernel.org/linux-xfs/20260513063745.8067-1-hans.holmberg@wdc.com/
 fs/xfs/xfs_zone_gc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index fedcc47048af..96f14d811db2 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
 		/*
 		 * If the inode was already deleted, skip over it.
 		 */
-		if (error == -ENOENT) {
+		if (error == -ENOENT || error == -EINVAL) {
 			iter->rec_idx++;
 			goto retry;
 		}
--
2.34.1
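
The skip-and-retry behaviour in the hunk above can be sketched outside the kernel as follows. This is a minimal user-space illustration, not the kernel code: struct gc_rec, struct gc_iter, mock_iget() and gc_iter_next() are hypothetical names invented for the sketch, and mock_iget() merely pretends that some inode numbers were deleted after the records were collected. Unlike the kernel function, the sketch also advances the index on success to keep the loop self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one reverse-mapping record: only the owner. */
struct gc_rec {
	unsigned long owner;
};

/* Hypothetical iterator over a snapshot of records taken earlier. */
struct gc_iter {
	const struct gc_rec *recs;
	size_t rec_count;
	size_t rec_idx;
};

/*
 * Hypothetical lookup used only by this sketch. Returns 0 for a live
 * inode, -ENOENT or -EINVAL for one deleted since the snapshot, and
 * -EIO for a genuine failure.
 */
int mock_iget(unsigned long ino)
{
	if (ino == 0)
		return -EIO;
	if (ino % 3 == 0)
		return -ENOENT;
	if (ino % 5 == 0)
		return -EINVAL;
	return 0;
}

/*
 * Find the next record whose owner still exists. Returns true and stores
 * the owner in *ino, or returns false when the records run out or when a
 * lookup fails with an unexpected error (reported through *error).
 */
bool gc_iter_next(struct gc_iter *iter, unsigned long *ino, int *error)
{
	assert(iter && ino && error);
	*error = 0;

	while (iter->rec_idx < iter->rec_count) {
		unsigned long owner = iter->recs[iter->rec_idx].owner;
		int err = mock_iget(owner);

		iter->rec_idx++;
		if (err == -ENOENT || err == -EINVAL)
			continue;	/* stale mapping: skip it and retry */
		if (err) {
			*error = err;	/* anything else is a real error */
			return false;
		}
		*ino = owner;
		return true;
	}
	return false;
}
```

The point of the sketch is only that both error codes take the same "skip this record" path, while any other error still stops the iteration.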

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Dave Chinner @ 2026-05-18 7:48 UTC
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner, Christoph Hellwig,
    Damien Le Moal, linux-xfs

On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
> @@ -400,7 +400,7 @@ xfs_zone_gc_iter_irec(
>  		/*
>  		 * If the inode was already deleted, skip over it.
>  		 */
> -		if (error == -ENOENT) {
> +		if (error == -ENOENT || error == -EINVAL) {
>  			iter->rec_idx++;
>  			goto retry;
>  		}

Why did you choose to do this? I was expecting the updated fix to be
dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
inode numbers are coming from a trusted source (i.e. the rmapbt) and
this would then return the expected -ENOENT on unlink races instead
of -EINVAL....

-Dave.
-- 
Dave Chinner
dgc@kernel.org

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Hans Holmberg @ 2026-05-18 8:55 UTC
To: Dave Chinner
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner, Christoph Hellwig,
    Damien Le Moal, linux-xfs@vger.kernel.org

On 18/05/2026 09:48, Dave Chinner wrote:
> Why did you choose to do this? I was expecting the updated fix to be
> dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
> inode numbers are coming from a trusted source (i.e. the rmapbt) and
> this would then return the expected -ENOENT on unlink races instead
> of -EINVAL....
>
> -Dave.

I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
that we do in xfs_zone_gc_query may not be valid by the time we call
xfs_iget in xfs_zone_gc_iter_irec.

Cheers,
Hans

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Damien Le Moal @ 2026-05-18 9:17 UTC
To: Hans Holmberg, Dave Chinner
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner, Christoph Hellwig,
    linux-xfs@vger.kernel.org

On 2026/05/18 10:55, Hans Holmberg wrote:
> I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
> that we do in xfs_zone_gc_query may not be valid by the time we call
> xfs_iget in xfs_zone_gc_iter_irec.

Maybe improve the comment above the change to document this?

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Dave Chinner @ 2026-05-18 9:44 UTC
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner, Christoph Hellwig,
    Damien Le Moal, linux-xfs@vger.kernel.org

On Mon, May 18, 2026 at 08:55:58AM +0000, Hans Holmberg wrote:
> I kept the XFS_IGET_UNTRUSTED because the xfs_rmap_query_range lookup
> that we do in xfs_zone_gc_query may not be valid by the time we call
> xfs_iget in xfs_zone_gc_iter_irec.

I'll repeat this once more: XFS_IGET_UNTRUSTED tells xfs_iget where
the inode number came from, not whether it *might* be valid or not.
xfs_iget() will detect races with unlink and do the right thing
regardless of whether IGET_UNTRUSTED is set.

IGET_UNTRUSTED is intended to be used when the inode number comes
from an external source. That may be userspace or the network with
file handle decoding, bulkstat (user supplied inode number), corrupt
metadata (e.g. during online repair when the inode clusters cannot be
fully trusted), etc.

In these situations, we should not be doing expensive validity checks
before attempting to read the inode from disk, and we should not be
giving any information back to the caller as to whether the inode
cluster is allocated or not on failure (e.g. potentially giving away
other valid inode numbers nearby).

IOWs, we don't want to be giving any indication of why the inode
number is considered invalid - it is just a bad inode number that was
provided, and because we don't trust the provider of the inode
number, that's all we will tell it.

However, inode numbers that come from internal metadata (i.e. trusted
sources) are -known to be valid-. They don't get placed into the
internal metadata if they are invalid inode numbers. Hence the fact
that it was in the rmapbt means it is, or very recently was, a valid
inode number, and we can trust it.

If xfs_iget then races with unlink on lookup, that is -just fine- and
it will return -ENOENT. In this case, xfs_iget() is telling the
caller that the inode is no longer allocated, because it is expected
that the caller can use and will do the right thing with this
information.

i.e. XFS_IGET_UNTRUSTED defines whether the inode number source could
be trusted to provide a valid inode number, not whether the inode
number can be guaranteed to be allocated at this specific lookup
instance.

-Dave.
-- 
Dave Chinner
dgc@kernel.org

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Christoph Hellwig @ 2026-05-18 12:45 UTC
To: Dave Chinner
Cc: Hans Holmberg, Carlos Maiolino, Darrick J . Wong, Dave Chinner,
    Christoph Hellwig, Damien Le Moal, linux-xfs@vger.kernel.org

On Mon, May 18, 2026 at 07:44:36PM +1000, Dave Chinner wrote:
> I'll repeat this once more: XFS_IGET_UNTRUSTED tells xfs_iget where
> the inode number came from, not whether it *might* be valid or not.
> xfs_iget() will detect races with unlink and do the right thing
> regardless of whether IGET_UNTRUSTED is set.

No, xfs_iget will not detect that the inode cluster has been reused
as some other kind of metadata or even data between querying the rmap
btree and trying to read the inode to perform garbage collection.

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Christoph Hellwig @ 2026-05-18 12:43 UTC
To: Dave Chinner
Cc: Hans Holmberg, Carlos Maiolino, Darrick J . Wong, Dave Chinner,
    Christoph Hellwig, Damien Le Moal, linux-xfs

On Mon, May 18, 2026 at 05:48:22PM +1000, Dave Chinner wrote:
> > -		if (error == -ENOENT) {
> > +		if (error == -ENOENT || error == -EINVAL) {
> >  			iter->rec_idx++;
> >  			goto retry;
> >  		}
>
> Why did you choose to do this? I was expecting the updated fix to be
> dropping XFS_IGET_UNTRUSTED from the xfs_iget() call because the
> inode numbers are coming from a trusted source (i.e. the rmapbt) and
> this would then return the expected -ENOENT on unlink races instead
> of -EINVAL....

The inode number is not trusted. It comes from a stale copy of the
rmap, and the inode cluster can have been freed and reallocated.
Dropping XFS_IGET_UNTRUSTED would be highly dangerous as it could
cause iget to interpret something else as an inode structure in this
case and easily cause kernel crashes.
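
The disagreement above can be made concrete with a toy user-space model. It is only a sketch of the argument as stated in this thread, not of the real xfs_iget() code paths: struct toy_fs, toy_iget_untrusted() and toy_iget_trusted() are hypothetical, the "allocated" array is an assumed stand-in for whatever allocation metadata an untrusted lookup consults, and one block holds one inode for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NBLOCKS 4

/* What a block currently holds on the toy "disk". */
enum blk_kind { BLK_INODES, BLK_OTHER };

/* Hypothetical toy filesystem: inode number N lives in block N. */
struct toy_fs {
	enum blk_kind kind[TOY_NBLOCKS];  /* current contents of each block */
	bool allocated[TOY_NBLOCKS];      /* assumed allocation index */
	unsigned mode[TOY_NBLOCKS];       /* 0 = free inode, or arbitrary data */
};

/*
 * Untrusted lookup: check the allocation index before touching the block.
 * Anything the index does not vouch for is reported as -EINVAL, with no
 * further detail.
 */
int toy_iget_untrusted(const struct toy_fs *fs, unsigned ino)
{
	assert(fs);
	if (ino >= TOY_NBLOCKS || !fs->allocated[ino])
		return -EINVAL;
	return 0;
}

/*
 * Trusted lookup: go straight to the computed location and interpret it
 * as an inode. A free inode yields -ENOENT, but if the block has been
 * reused for something else, arbitrary bytes are read as an inode;
 * *misread records that case.
 */
int toy_iget_trusted(const struct toy_fs *fs, unsigned ino, bool *misread)
{
	assert(fs && misread && ino < TOY_NBLOCKS);
	*misread = (fs->kind[ino] != BLK_INODES);
	return fs->mode[ino] ? 0 : -ENOENT;
}
```

In this model an unlinked inode whose cluster still exists gives -ENOENT on the trusted path and -EINVAL on the untrusted path, which matches the behaviour described in the patch. A block that was reused after the record was collected is rejected on the untrusted path but misread on the trusted one, which is the hazard described in the reply above.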

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Christoph Hellwig @ 2026-05-19 7:15 UTC
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner, Christoph Hellwig,
    Damien Le Moal, linux-xfs

On Mon, May 18, 2026 at 08:52:24AM +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL for deleted files.
>
> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> zone GC to safely ignore stale mappings.

While I'd prefer -ENOENT returns for not malformed but potentially
out of date inodes, this works as well with a small loss in error
checking:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH v2] xfs: handle racing deletions in xfs_zone_gc_iter_irec
From: Carlos Maiolino @ 2026-05-30 6:35 UTC
To: Darrick J . Wong, Hans Holmberg
Cc: Dave Chinner, Christoph Hellwig, Damien Le Moal, linux-xfs

On Mon, 18 May 2026 08:52:24 +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL for deleted files.
>
> Fix this by handling -EINVAL just like we handle -ENOENT, allowing
> zone GC to safely ignore stale mappings.
>
> [...]

Applied to for-next, thanks!

[1/1] xfs: handle racing deletions in xfs_zone_gc_iter_irec
      commit: bc95fa240a1b8ae64d3dabe87cbe103b912afc45

Best regards,
-- 
Carlos Maiolino <cem@kernel.org>