public inbox for linux-ext4@vger.kernel.org
From: Chao Shi <coshi036@gmail.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	Chao Shi <coshi036@gmail.com>, Sungwoo Kim <iam@sung-woo.kim>,
	Dave Tian <daveti@purdue.edu>, Weidong Zhu <weizhu@fiu.edu>
Subject: [PATCH] ext4: avoid __GFP_NOFAIL in __ext4_get_inode_loc allocation
Date: Mon, 27 Apr 2026 18:23:00 -0400
Message-ID: <20260427222300.1284855-1-coshi036@gmail.com>

When kswapd shrinks the dcache, the last iput() on an ext4 inode can
trigger ext4_orphan_del(), which calls ext4_reserve_inode_write() and
ultimately __ext4_get_inode_loc().  That function calls sb_getblk(),
which wraps __getblk() and implicitly sets __GFP_NOFAIL.  kswapd runs
with PF_MEMALLOC set and so cannot recurse into direct reclaim; a NOFAIL
allocation from that context trips
WARN_ON_ONCE(current->flags & PF_MEMALLOC) in __alloc_pages_slowpath(),
producing a spurious splat even though the allocation could simply fail
and return -ENOMEM to the caller.
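
As a rough illustration of the condition described above, the following
userspace sketch models when the warning would fire.  The MODEL_* bit
values and model_would_warn() are hypothetical names made up for this
sketch, not kernel definitions, and the model assumes the allocator has
already failed its fast path and reached the point where it would
otherwise give up.

```c
#include <stdbool.h>

/* Hypothetical bit values for this sketch; the kernel's real values differ. */
#define MODEL_PF_MEMALLOC 0x1u /* task is itself doing reclaim work (e.g. kswapd) */
#define MODEL_GFP_NOFAIL  0x2u /* caller insists the allocation must not fail */

/*
 * Returns true when a failed slow-path allocation would hit the warning:
 * the request is NOFAIL, yet the task carries PF_MEMALLOC and therefore
 * cannot enter direct reclaim to make progress.
 */
static bool model_would_warn(unsigned int task_flags, unsigned int gfp)
{
	return (gfp & MODEL_GFP_NOFAIL) && (task_flags & MODEL_PF_MEMALLOC);
}
```

With these definitions, a kswapd-like task (MODEL_PF_MEMALLOC set) making
a MODEL_GFP_NOFAIL request is the only combination that warns; dropping
either flag makes the check pass.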

Switch both sb_getblk() call sites in __ext4_get_inode_loc() to
sb_getblk_gfp() with the same flags minus __GFP_NOFAIL
(mapping_gfp_constraint(~__GFP_FS) | __GFP_MOVABLE), computing the gfp
value once and reusing it for the optional bitmap_bh optimisation fetch.
All callers of __ext4_get_inode_loc() -- reached via ext4_get_inode_loc(),
__ext4_get_inode_loc_noinmem(), and ext4_get_fc_inode_loc() -- already
propagate a non-zero return as an error without aborting the filesystem.
The bitmap_bh fetch already falls back to make_io on NULL, so letting
that allocation fail does not change its behaviour.
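
The flag computation can be sketched in userspace as follows.  This is
only a model: the SKETCH_* bit values and helper names are assumptions
chosen for illustration, and sketch_gfp_constraint() mirrors the way
mapping_gfp_constraint() restricts the mapping's mask with a bitwise AND.

```c
/* Hypothetical bit values; the real __GFP_* values live in kernel headers. */
typedef unsigned int sketch_gfp_t;

#define SKETCH_GFP_FS      0x01u /* allocation may call back into the filesystem */
#define SKETCH_GFP_IO      0x02u /* allocation may start I/O */
#define SKETCH_GFP_MOVABLE 0x04u /* page may be migrated */
#define SKETCH_GFP_NOFAIL  0x08u /* allocation must not fail */

/* Models mapping_gfp_constraint(): AND the mapping's mask with a constraint. */
static sketch_gfp_t sketch_gfp_constraint(sketch_gfp_t mapping_mask,
					  sketch_gfp_t constraint)
{
	return mapping_mask & constraint;
}

/*
 * Models the value the patch computes once and passes to both
 * sb_getblk_gfp() calls: filesystem recursion masked out, MOVABLE added,
 * and NOFAIL never added.
 */
static sketch_gfp_t sketch_inode_loc_gfp(sketch_gfp_t mapping_mask)
{
	return sketch_gfp_constraint(mapping_mask, ~SKETCH_GFP_FS) |
	       SKETCH_GFP_MOVABLE;
}
```

For a mapping mask of SKETCH_GFP_FS | SKETCH_GFP_IO, the result keeps
SKETCH_GFP_IO, gains SKETCH_GFP_MOVABLE, and carries neither
SKETCH_GFP_FS nor SKETCH_GFP_NOFAIL, so the allocation is allowed to
return NULL.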

Reproduced with FuzzNvme, a syzkaller+FEMU-based fuzzing tool, on x86_64
QEMU running a kernel based on mainline 894009e2ef10:

  WARNING: CPU: 0 PID: 55 at mm/page_alloc.c:4722
                                    __alloc_pages_slowpath
  Comm: kswapd0 Not tainted 6.19.0+ #14
  Call Trace:
   __alloc_pages_slowpath
   alloc_pages_mpol
   folio_alloc_noprof
   filemap_alloc_folio_noprof
   __filemap_get_folio
   grow_dev_folio
   grow_buffers
   __getblk_slow
   bdev_getblk
   __ext4_get_inode_loc
   ext4_get_inode_loc
   ext4_reserve_inode_write
   ext4_orphan_del
   ext4_evict_inode
   evict
   iput
   dentry_unlink_inode
   __dentry_kill
   shrink_dentry_list
   prune_dcache_sb
   super_cache_scan
   do_shrink_slab
   shrink_slab
   shrink_node
   balance_pgdat
   kswapd
   kthread
   ret_from_fork

Related: see d8b90e6387a ("ext4: add ext4_sb_bread_nofail() helper
function for ext4_free_branches()") for the same strategy applied to
the read path in ext4_free_branches().
Link: https://lore.kernel.org/all/?q=PF_MEMALLOC+nofail+ext4+iput

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
 fs/ext4/inode.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3..1b2a7bd59b8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4859,6 +4859,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	ext4_fsblk_t		block;
 	struct blk_plug		plug;
 	int			inodes_per_block, inode_offset;
+	gfp_t			gfp;
 
 	iloc->bh = NULL;
 	if (ino < EXT4_ROOT_INO ||
@@ -4887,7 +4888,14 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	}
 	block += (inode_offset / inodes_per_block);
 
-	bh = sb_getblk(sb, block);
+	/*
+	 * No __GFP_NOFAIL: this can run from reclaim context (kswapd
+	 * shrinker -> iput -> ext4_orphan_del path) where NOFAIL trips
+	 * WARN_ON_ONCE in __alloc_pages_slowpath().
+	 */
+	gfp = mapping_gfp_constraint(sb->s_bdev->bd_mapping, ~__GFP_FS) |
+		__GFP_MOVABLE;
+	bh = sb_getblk_gfp(sb, block, gfp);
 	if (unlikely(!bh))
 		return -ENOMEM;
 	if (ext4_buffer_uptodate(bh))
@@ -4912,7 +4920,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 		start = inode_offset & ~(inodes_per_block - 1);
 
 		/* Is the inode bitmap in cache? */
-		bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
+		bitmap_bh = sb_getblk_gfp(sb, ext4_inode_bitmap(sb, gdp), gfp);
 		if (unlikely(!bitmap_bh))
 			goto make_io;
 
-- 
2.43.0


