* [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup
@ 2026-05-13 6:37 Hans Holmberg
2026-05-13 6:57 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Hans Holmberg @ 2026-05-13 6:37 UTC (permalink / raw)
To: Carlos Maiolino, Darrick J . Wong
Cc: Dave Chinner, Christoph Hellwig, Damien Le Moal, linux-xfs,
Hans Holmberg
Under heavy garbage collection pressure from RocksDB workloads,
filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
xfs_iget() returns -EINVAL.
xfs_zone_gc_iter_irec expects -ENOENT when garbage collection races
with file deletion, so that blocks belonging to deleted files can be
skipped gracefully. Returning -EINVAL instead causes the GC code to
treat this as a fatal error and forces a shutdown.
Fix this by returning -ENOENT from xfs_imap_lookup for untrusted inodes
when the inode has been deleted, allowing zone GC to safely ignore stale
mappings.
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
---
We could also fix this up in xfs_zone_gc_iter_irec, and handle -EINVAL
as a non-fatal error, but returning -ENOENT from xfs_imap_lookup seems
like the more right thing to do.
The check "that the returned record contains the required inode" in
xfs_imap_lookup also returns -EINVAL and might be worth switching
over to -ENOENT but I'm leaving that as is I did not hit that case
in my testing.
fs/xfs/libxfs/xfs_ialloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index dcef06ec0a02..702e1e853e45 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2442,7 +2442,7 @@ xfs_imap_lookup(
/* for untrusted inodes check it is allocated first */
if ((flags & XFS_IGET_UNTRUSTED) &&
(rec.ir_free & XFS_INOBT_MASK(agino - rec.ir_startino)))
- return -EINVAL;
+ return -ENOENT;
*chunk_agbno = XFS_AGINO_TO_AGBNO(mp, rec.ir_startino);
*offset_agbno = agbno - *chunk_agbno;
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup
2026-05-13 6:37 [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup Hans Holmberg
@ 2026-05-13 6:57 ` Christoph Hellwig
2026-05-13 7:31 ` Carlos Maiolino
2026-05-13 13:43 ` Dave Chinner
2 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-13 6:57 UTC (permalink / raw)
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs
On Wed, May 13, 2026 at 08:37:45AM +0200, Hans Holmberg wrote:
> as a non-fatal error, but returning -ENOENT from xfs_imap_lookup seems
> like the more right thing to do.
>
> The check "that the returned record contains the required inode" in
> xfs_imap_lookup also returns -EINVAL and might be worth switching
> over to -ENOENT but I'm leaving that as is I did not hit that case
> in my testing.
I guess you already know my opinion, but I really prefer the consistent
-ENOENT return:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup
2026-05-13 6:37 [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup Hans Holmberg
2026-05-13 6:57 ` Christoph Hellwig
@ 2026-05-13 7:31 ` Carlos Maiolino
2026-05-13 13:43 ` Dave Chinner
2 siblings, 0 replies; 4+ messages in thread
From: Carlos Maiolino @ 2026-05-13 7:31 UTC (permalink / raw)
To: Hans Holmberg
Cc: Darrick J . Wong, Dave Chinner, Christoph Hellwig, Damien Le Moal,
linux-xfs
On Wed, May 13, 2026 at 08:37:45AM +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL.
>
> xfs_zone_gc_iter_irec expects -ENOENT when garbage collection races
> with file deletion, so that blocks belonging to deleted files can be
> skipped gracefully. Returning -EINVAL instead causes the GC code to
> treat this as a fatal error and forces a shutdown.
>
> Fix this by returning -ENOENT from xfs_imap_lookup for untrusted inodes
> when the inode has been deleted, allowing zone GC to safely ignore stale
> mappings.
>
> Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>
> We could also fix this up in xfs_zone_gc_iter_irec, and handle -EINVAL
> as a non-fatal error, but returning -ENOENT from xfs_imap_lookup seems
> like the more right thing to do.
>
> The check "that the returned record contains the required inode" in
> xfs_imap_lookup also returns -EINVAL and might be worth switching
> over to -ENOENT but I'm leaving that as is I did not hit that case
> in my testing.
>
> fs/xfs/libxfs/xfs_ialloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index dcef06ec0a02..702e1e853e45 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2442,7 +2442,7 @@ xfs_imap_lookup(
> /* for untrusted inodes check it is allocated first */
> if ((flags & XFS_IGET_UNTRUSTED) &&
> (rec.ir_free & XFS_INOBT_MASK(agino - rec.ir_startino)))
> - return -EINVAL;
> + return -ENOENT;
>
> *chunk_agbno = XFS_AGINO_TO_AGBNO(mp, rec.ir_startino);
> *offset_agbno = agbno - *chunk_agbno;
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup
2026-05-13 6:37 [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup Hans Holmberg
2026-05-13 6:57 ` Christoph Hellwig
2026-05-13 7:31 ` Carlos Maiolino
@ 2026-05-13 13:43 ` Dave Chinner
2 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2026-05-13 13:43 UTC (permalink / raw)
To: Hans Holmberg
Cc: Carlos Maiolino, Darrick J . Wong, Dave Chinner,
Christoph Hellwig, Damien Le Moal, linux-xfs
On Wed, May 13, 2026 at 08:37:45AM +0200, Hans Holmberg wrote:
> Under heavy garbage collection pressure from RocksDB workloads,
> filesystem shutdowns can occur in xfs_zone_gc_iter_irec when
> xfs_iget() returns -EINVAL.
>
> xfs_zone_gc_iter_irec expects -ENOENT when garbage collection races
> with file deletion,
xfs_iget() returns -ENOENT when a lookup races with an unlink on the
cache hot side. When the inode is not in cache, it cannot race with
file deletion.
If we miss the cache, it will go read the inode from disk and then
check the state of it before inserting it into the cache. If the
inode mode is zero on disk, then we raced with unlink and -ENOENT
will be returned.
IOWs, a plain xfs_iget() call will handle races with unlink...
> so that blocks belonging to deleted files can be
> skipped gracefully. Returning -EINVAL instead causes the GC code to
> treat this as a fatal error and forces a shutdown.
If it passes in XFS_IGET_UNTRUSTED to xfs_iget(), then it is saying
the inode number comes from an unknown source and that may be invalid.
Hence we do more rigorous and costly checks on the inode number
(like force a btree lookup) if it is not already validated and in
cache. If any of these validation checks fail we return EINVAL to
indicate it was an invalid inode number.
IOWs, xfs_iget(XFS_IGET_UNTRUSTED) callers need to handle -EINVAL,
because validity checking the inode number is what the caller -asked
it to do-.
If you are using inode number from a trusted source (i.e. internal
filesystem metadata like a directory data block or rmapbt) then you
don't need XFS_IGET_UNTRUSTED. The inode number is known to be good,
modulo races with unlink which xfs_iget() handles anyway...
-Dave.
--
Dave Chinner
dgc@kernel.org
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-05-13 13:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-13 6:37 [PATCH] xfs: return -ENOENT for unallocated inodes in xfs_imap_lookup Hans Holmberg
2026-05-13 6:57 ` Christoph Hellwig
2026-05-13 7:31 ` Carlos Maiolino
2026-05-13 13:43 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox