From: Chao Shi <coshi036@gmail.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
Chao Shi <coshi036@gmail.com>, Sungwoo Kim <iam@sung-woo.kim>,
Dave Tian <daveti@purdue.edu>, Weidong Zhu <weizhu@fiu.edu>
Subject: [PATCH] ext4: avoid __GFP_NOFAIL in __ext4_get_inode_loc allocation
Date: Mon, 27 Apr 2026 18:23:00 -0400
Message-ID: <20260427222300.1284855-1-coshi036@gmail.com>

When kswapd shrinks the dcache, the last iput() on an ext4 inode can
trigger ext4_orphan_del(), which calls ext4_reserve_inode_write() and
ultimately __ext4_get_inode_loc(). That function calls sb_getblk(),
which wraps __getblk() and therefore implicitly passes __GFP_NOFAIL.
kswapd runs with PF_MEMALLOC set, so the allocator cannot enter direct
reclaim on its behalf, and a __GFP_NOFAIL request from that context
trips WARN_ON_ONCE(current->flags & PF_MEMALLOC) in
__alloc_pages_slowpath(). The splat is avoidable: the allocation can
simply fail and return -ENOMEM to the caller.

Switch both sb_getblk() call sites in __ext4_get_inode_loc() to
sb_getblk_gfp() with the same flags minus __GFP_NOFAIL, i.e.
mapping_gfp_constraint(sb->s_bdev->bd_mapping, ~__GFP_FS) |
__GFP_MOVABLE. The gfp value is computed once and reused for the
optional bitmap_bh optimisation fetch. That fetch already falls back
to make_io on NULL, so letting it fail needs no new handling.

All callers of __ext4_get_inode_loc() -- reached via
ext4_get_inode_loc(), __ext4_get_inode_loc_noinmem(), and
ext4_get_fc_inode_loc() -- already propagate a non-zero return as an
error without aborting the filesystem.

Reproduced with FuzzNvme, a syzkaller+FEMU-based fuzzing tool, on
x86_64 QEMU, on top of mainline 894009e2ef10:
WARNING: CPU: 0 PID: 55 at mm/page_alloc.c:4722
__alloc_pages_slowpath
Comm: kswapd0 Not tainted 6.19.0+ #14
Call Trace:
__alloc_pages_slowpath
alloc_pages_mpol
folio_alloc_noprof
filemap_alloc_folio_noprof
__filemap_get_folio
grow_dev_folio
grow_buffers
__getblk_slow
bdev_getblk
__ext4_get_inode_loc
ext4_get_inode_loc
ext4_reserve_inode_write
ext4_orphan_del
ext4_evict_inode
evict
iput
dentry_unlink_inode
__dentry_kill
shrink_dentry_list
prune_dcache_sb
super_cache_scan
do_shrink_slab
shrink_slab
shrink_node
balance_pgdat
kswapd
kthread
ret_from_fork
Related: see d8b90e6387a ("ext4: add ext4_sb_bread_nofail() helper
function for ext4_free_branches()") for the same strategy applied to
the read path in ext4_free_branches().
Link: https://lore.kernel.org/all/?q=PF_MEMALLOC+nofail+ext4+iput
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
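Note (not part of the commit): the flag arithmetic in the commit
message can be checked with a small user-space model. This is only a
sketch. Every model_* and MODEL_* name is hypothetical, and the bit
values are made up for illustration; the real definitions live in the
kernel headers. It shows that the new mask equals the old one minus
NOFAIL, and that the modelled warning condition no longer holds for a
PF_MEMALLOC task.

```c
#include <stdbool.h>

/*
 * User-space model only. The bit values below are invented for
 * illustration and do not match the kernel's real GFP/PF values.
 */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_IO		0x01u
#define MODEL_GFP_FS		0x02u
#define MODEL_GFP_MOVABLE	0x04u
#define MODEL_GFP_NOFAIL	0x08u
#define MODEL_PF_MEMALLOC	0x01u

/* Stand-in for mapping_gfp_constraint(): restrict the mapping's mask. */
static model_gfp_t model_constraint(model_gfp_t mapping_gfp,
				    model_gfp_t allowed)
{
	return mapping_gfp & allowed;
}

/* Mask implied by sb_getblk() before the patch (includes NOFAIL). */
static model_gfp_t model_old_gfp(model_gfp_t mapping_gfp)
{
	return model_constraint(mapping_gfp, ~MODEL_GFP_FS) |
	       MODEL_GFP_MOVABLE | MODEL_GFP_NOFAIL;
}

/* Mask the patch passes to sb_getblk_gfp() (same, minus NOFAIL). */
static model_gfp_t model_new_gfp(model_gfp_t mapping_gfp)
{
	return model_constraint(mapping_gfp, ~MODEL_GFP_FS) |
	       MODEL_GFP_MOVABLE;
}

/* Modelled warning condition: NOFAIL request from a PF_MEMALLOC task. */
static bool model_would_warn(model_gfp_t gfp, unsigned int task_flags)
{
	return (gfp & MODEL_GFP_NOFAIL) && (task_flags & MODEL_PF_MEMALLOC);
}
```

With a mapping mask of MODEL_GFP_IO | MODEL_GFP_FS, model_old_gfp()
keeps IO, drops FS, and adds MOVABLE and NOFAIL, while model_new_gfp()
yields the same bits without NOFAIL, so model_would_warn() is false
for it even when MODEL_PF_MEMALLOC is set.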
fs/ext4/inode.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3..1b2a7bd59b8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4859,6 +4859,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
ext4_fsblk_t block;
struct blk_plug plug;
int inodes_per_block, inode_offset;
+ gfp_t gfp;
iloc->bh = NULL;
if (ino < EXT4_ROOT_INO ||
@@ -4887,7 +4888,14 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
}
block += (inode_offset / inodes_per_block);
- bh = sb_getblk(sb, block);
+ /*
+ * No __GFP_NOFAIL: this can run from reclaim context (kswapd
+ * shrinker -> iput -> ext4_orphan_del path) where NOFAIL trips
+ * WARN_ON_ONCE in __alloc_pages_slowpath().
+ */
+ gfp = mapping_gfp_constraint(sb->s_bdev->bd_mapping, ~__GFP_FS) |
+ __GFP_MOVABLE;
+ bh = sb_getblk_gfp(sb, block, gfp);
if (unlikely(!bh))
return -ENOMEM;
if (ext4_buffer_uptodate(bh))
@@ -4912,7 +4920,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
start = inode_offset & ~(inodes_per_block - 1);
/* Is the inode bitmap in cache? */
- bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp));
+ bitmap_bh = sb_getblk_gfp(sb, ext4_inode_bitmap(sb, gdp), gfp);
if (unlikely(!bitmap_bh))
goto make_io;
--
2.43.0