* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-27 0:18 Deepanshu Kartikey
2025-09-27 3:01 ` Theodore Ts'o
0 siblings, 1 reply; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-27 0:18 UTC (permalink / raw)
To: adilger.kernel; +Cc: linux-ext4, linux-kernel
Hi Andreas,
Thank you for the detailed analysis of the tradeoffs.
Looking at the syzbot report (https://syzkaller.appspot.com/bug?extid=fd3f70a4509fca8c265d),
this WARNING appears 4 times, so while not frequent, it's a real issue that occurs
under memory pressure conditions.
Your -EAGAIN suggestion makes sense. The approach would be:
1. During memory reclaim, use GFP_NOFS without __GFP_NOFAIL
2. If allocation fails, return -EAGAIN to let reclaim skip this inode
3. Preallocation cleanup happens later when memory is available
I understand this requires modifying the function signature and updating all call
sites. I'm willing to do this work and properly test each caller's error handling.
Questions on implementation:
- Should callers like ext4_clear_inode() ignore -EAGAIN (leave cleanup for later)?
- Should callers like ext4_truncate() retry or also defer?
- For the unused "int needed" parameter you mentioned - should I remove it in the
same patch or separately?
I'd like to implement this fix properly rather than leaving the WARNING unaddressed.
Could you provide guidance on the preferred error handling for the different caller
contexts?
Best regards,
Deepanshu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
2025-09-27 0:18 [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp Deepanshu Kartikey
@ 2025-09-27 3:01 ` Theodore Ts'o
0 siblings, 0 replies; 9+ messages in thread
From: Theodore Ts'o @ 2025-09-27 3:01 UTC (permalink / raw)
To: Deepanshu Kartikey; +Cc: adilger.kernel, linux-ext4, linux-kernel
On Sat, Sep 27, 2025 at 05:48:15AM +0530, Deepanshu Kartikey wrote:
>
> Your -EAGAIN suggestion makes sense. The approach would be:
> 1. During memory reclaim, use GFP_NOFS without __GFP_NOFAIL
> 2. If allocation fails, return -EAGAIN to let reclaim skip this inode
> 3. Preallocation cleanup happens later when memory is available
>
> I understand this requires modifying the function signature and updating all call
> sites. I'm willing to do this work and properly test each caller's error handling.
>
> Questions on implementation:
> - Should callers like ext4_clear_inode() ignore -EAGAIN (leave cleanup for later)?
It's not so simple. The call path that involes ext4_clear_inode() is
part of evict() which is called when the inode is evicted, and this is
called from prune_icache_sb():
https://syzkaller.appspot.com/bug?extid=fd3f70a4509fca8c265d
The problem is that none of the code paths allow for the inode
eviction to be delayed. And once the inode is evicted, it's *gone*
from memory, so there's no place to store the information needed so we
can "clean up the inode later".
The inode eviction might *take* a while, since there might be pages
that need to be written back first. And in order to do the writeback,
the flusher thread might need to do some block allocations, and that
might require some memory allocations. But since that's happening on
another thread, it doesn't cause any warnings.
This is why in some sense the warning in __alloc_pages_slowpaths() is
a little silly, and it's not really all that bad in practice:
/*
* PF_MEMALLOC request from this context is rather bizarre
* because we cannot reclaim anything and only can loop waiting
* for somebody to do a work for us.
*/
Sure, we might need to loop waiting for someone to work to release
pages. But in the inode eviction path, we are doing that *anyway*:
/*
* Wait for flusher thread to be done with the inode so that filesystem
* does not start destroying it while writeback is still running. Since
* the inode has I_FREEING set, flusher thread won't start new work on
* the inode. We just have to wait for running writeback to finish.
*/
inode_wait_for_writeback(inode);
(And writeback can require memory allocations. Oh, noos!)
In actual practice, it's not that bad, since looping until te memory
can be released is just *fine*. The OOM killer might news to whack
some processes to free memory, but if you're running so close to the
memory exhaustion, it's generally acceptable to have lower priority
jobs get nuked in order to keep the system. And if that's not
acceptable, then don't run the system that close to the edge! :-)
So the only reason why this is a problem is if someone is silly enough
to run with panic on warn enabled. (Don't do that.) Or if you've
gotten sucked int the gamification of syzbot. Personally, if the mm
folks want to insist on putting that WARN_ON_ONCE() there, I'm not
going to worry about the syzbot complaint. :-)
If you *really* want to solve the problem, we probably need to have
some way to that file system set a flag indicating that if you are
trying to prunce the inode cache, and you're in a memory reclaim
context, skip trying to evict this inode. For that matter, if we're
in a memory reclaim context, and the inode has dirty pages which are
being written back --- maybe we should just skip that inode since
there are probably easier and faster inodes to eject if we are so
desperate that we're in memory reclaim mode.
Another potential solution might be to pin the buddy bitmap and bitmap
blocks in meory and don't let them to get evicted if we know that
there might be preallocations pending for that partcular block group.
This will prevent the need to do the memory allocation, and also
avoids needing to do I/O to bring in the bitmap metadata. The
downside is this increases memory usage, all in the name of trying to
avoid that silly mm warning.
Yet another possibility is to have a pool of spare pages *just* for
ext4, and in the case where we are in memory reclaim --- drop the
__GFP_NOFAIL, and use the pool of spare pagese --- and if the pool of
spare pages is exhausted, just wait and do a retry loop on the allocation.
Or we could just drop the __GFP_NOFAIL and just add hard-coded retry
loop. It's what we used to do when the mm people tried to deprecate
__GFP_NOFAIL, and issued a warning whenever someone trid to use
__GFP_NOFAIL --- and so we said, "OK, if you really hate the
GFP_NOFAIL, we'll just open code the retry in the file system, since
data loss is unacceptable, and if that means the system might become
unresponsive for a bit when the system is under heavy memory pressure,
that's an acceptable tradeoff." But then the mm folks said, no wait,
if you do the retry loop in the caller we won't know that the memory
allocation Really Really Needs to Suceed. And so they dropped the
__GFP_NOFAIL warning, and we dropped the hard-coded retry loop.
> I'd like to implement this fix properly rather than leaving the WARNING unaddressed.
> Could you provide guidance on the preferred error handling for the different caller
> contexts?
Quite frankly, I'mm not sure any of these choices are worth it
compared to just leaving the WARNING unaddressed.
If it were up to me, I'd have a WARN_DONT_PANIC() message which
doesn't actually print the word "WARNING" so it doesn't trigger
syzbot, and which does't force a panic on warn. That it satisfies the
mm folks who have never liked __GFP_NOFAIL, since they can let the
kernel whine; it satifies the people who think this is a "securty
problem" because there are people who are silly enough to do panic on
warn; it satisfies the people who buy into the syzbot gamification;
and it avoids file system corruption, which keeps the file system
people happy.
Or we could just drop this particular warning altogether. Silly
warning. :-)
Cheers,
- Ted
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-27 0:19 Deepanshu Kartikey
0 siblings, 0 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-27 0:19 UTC (permalink / raw)
To: adilger.kernel, tytso; +Cc: linux-ext4, linux-kernel
Hi Andreas,
Thank you for the detailed analysis of the tradeoffs.
Looking at the syzbot report (https://syzkaller.appspot.com/bug?extid=fd3f70a4509fca8c265d),
this WARNING appears 4 times, so while not frequent, it's a real issue that occurs
under memory pressure conditions.
Your -EAGAIN suggestion makes sense. The approach would be:
1. During memory reclaim, use GFP_NOFS without __GFP_NOFAIL
2. If allocation fails, return -EAGAIN to let reclaim skip this inode
3. Preallocation cleanup happens later when memory is available
I understand this requires modifying the function signature and updating all call
sites. I'm willing to do this work and properly test each caller's error handling.
Questions on implementation:
- Should callers like ext4_clear_inode() ignore -EAGAIN (leave cleanup for later)?
- Should callers like ext4_truncate() retry or also defer?
- For the unused "int needed" parameter you mentioned - should I remove it in the
same patch or separately?
I'd like to implement this fix properly rather than leaving the WARNING unaddressed.
Could you provide guidance on the preferred error handling for the different caller
contexts?
Best regards,
Deepanshu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-25 2:06 Deepanshu Kartikey
2025-09-26 19:17 ` Andreas Dilger
0 siblings, 1 reply; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-25 2:06 UTC (permalink / raw)
To: adilger.kernel, tytso; +Cc: linux-ext4, linux-kernel
Hi Andreas,
Thank you for pointing out the fundamental issue with my approach. You're right that removing __GFP_NOFAIL creates a worse problem by potentially triggering filesystem errors.
I understand your suggestion about allowing the function to return errors so the caller can retry, but I need more specific guidance on the implementation approach.
Questions:
1. **Function signature change**: Should ext4_discard_preallocations() be changed from void to int to return error codes? This would require updating all 13+ callers I found.
2. **Caller modifications**: How should the various callers (ext4_truncate, ext4_clear_inode, ext4_release_file, etc.) handle allocation failures during memory pressure? Should they:
- Retry the operation later?
- Skip preallocation cleanup temporarily?
- Handle it differently based on the calling context?
3. **Memory pressure detection**: Is checking (current->flags & PF_MEMALLOC) the right approach to detect when we're in memory reclaim context?
4. **Scope of changes**: Would you prefer:
- A minimal fix that just handles the allocation failure gracefully?
- A more comprehensive rework of the error handling throughout the preallocation discard path?
I want to make sure I understand the preferred approach before submitting v2, especially since this affects multiple call sites throughout the ext4 codebase.
Best regards,
Deepanshu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
2025-09-25 2:06 Deepanshu Kartikey
@ 2025-09-26 19:17 ` Andreas Dilger
0 siblings, 0 replies; 9+ messages in thread
From: Andreas Dilger @ 2025-09-26 19:17 UTC (permalink / raw)
To: Deepanshu Kartikey; +Cc: Theodore Ts'o, linux-ext4, LKML
[-- Attachment #1: Type: text/plain, Size: 3143 bytes --]
On Sep 24, 2025, at 8:06 PM, Deepanshu Kartikey <kartikey406@gmail.com> wrote:
>
>
> Hi Andreas,
>
> Thank you for pointing out the fundamental issue with my approach. You're right that removing __GFP_NOFAIL creates a worse problem by potentially triggering filesystem errors.
>
> I understand your suggestion about allowing the function to return errors so the caller can retry, but I need more specific guidance on the implementation approach.
>
> Questions:
>
> 1. **Function signature change**: Should ext4_discard_preallocations() be changed from void to int to return error codes? This would require updating all 13+ callers I found.
Changing internal function signatures is never a problem for Linux, except
userland syscall APIs. This is even less of a concern if all of the callers
are inside ext4.
I notice that ext4_discard_preallocations() also has an unused "int needed"
argument that could be removed.
> 2. **Caller modifications**: How should the various callers (ext4_truncate, ext4_clear_inode, ext4_release_file, etc.) handle allocation failures during memory pressure? Should they:
> - Retry the operation later?
> - Skip preallocation cleanup temporarily?
> - Handle it differently based on the calling context?
Looking at this more closely, it appears that ext4_discard_preallocations()
should not fail outright, since this would leak space in the filesystem.
I guess this goes back to a question of whether a warning message on the console
when the kernel is totally out of memory is a bad thing? The whole point of
__GFP_NOFAIL was to put the retry in the control of the MM layer, instead of
having the caller loop and doing the same thing.
> 3. **Memory pressure detection**: Is checking (current->flags & PF_MEMALLOC) the right approach to detect when we're in memory reclaim context?
>
> 4. **Scope of changes**: Would you prefer:
> - A minimal fix that just handles the allocation failure gracefully?
> - A more comprehensive rework of the error handling throughout the preallocation discard path?
I was thinking one option might be to have reserved memory to handle this
specific case, so that *one* preallocation can make progress at a time.
However, it isn't clear if this one allocation would be enough to guarantee
forward progress, or if there needs to be a pool at every step along the way.
Just freeing some other buddy bitmap by ext4 and retrying is not guaranteed
to make progress, since OOM means it is likely that *other* threads are also
trying hard to allocate memory.
> I want to make sure I understand the preferred approach before submitting v2, especially since this affects multiple call sites throughout the ext4 codebase.
I think the important question is what the impact is of the original problem
that was being fixed?
Another option would be to return -EAGAIN or -ENOMEM (or similar) from this
function and then the whole "flush THIS inode from cache" would be dropped,
and memory reclaim should progress to some other inode that doesn't have
preallocated blocks, or some other memory.
Cheers, Andreas
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-24 1:16 Deepanshu Kartikey
2025-09-24 23:33 ` Andreas Dilger
0 siblings, 1 reply; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-24 1:16 UTC (permalink / raw)
To: tytso
Cc: adilger.kernel, linux-ext4, linux-kernel, Deepanshu Kartikey,
syzbot+fd3f70a4509fca8c265d
Fix WARNING in __alloc_pages_slowpath() when ext4_discard_preallocations()
is called during memory pressure.
The issue occurs when __GFP_NOFAIL is used during memory reclaim context,
which can lead to allocation warnings. Avoid using __GFP_NOFAIL when
the current process is already in memory allocation context to prevent
potential deadlocks and warnings.
Reported-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Tested-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
---
fs/ext4/mballoc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5898d92ba19f..61ee009717f1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5656,9 +5656,11 @@ void ext4_discard_preallocations(struct inode *inode)
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
BUG_ON(pa->pa_type != MB_INODE_PA);
group = ext4_get_group_number(sb, pa->pa_pstart);
+ gfp_t flags = GFP_NOFS;
+ if (!(current->flags & PF_MEMALLOC))
+ flags |= __GFP_NOFAIL;
- err = ext4_mb_load_buddy_gfp(sb, group, &e4b,
- GFP_NOFS|__GFP_NOFAIL);
+ err = ext4_mb_load_buddy_gfp(sb, group, &e4b, flags);
if (err) {
ext4_error_err(sb, -err, "Error %d loading buddy information for %u",
err, group);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
2025-09-24 1:16 Deepanshu Kartikey
@ 2025-09-24 23:33 ` Andreas Dilger
0 siblings, 0 replies; 9+ messages in thread
From: Andreas Dilger @ 2025-09-24 23:33 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: Theodore Ts'o, linux-ext4, LKML, syzbot+fd3f70a4509fca8c265d
[-- Attachment #1: Type: text/plain, Size: 2166 bytes --]
On Sep 23, 2025, at 7:16 PM, Deepanshu Kartikey <kartikey406@gmail.com> wrote:
>
> Fix WARNING in __alloc_pages_slowpath() when ext4_discard_preallocations()
> is called during memory pressure.
>
> The issue occurs when __GFP_NOFAIL is used during memory reclaim context,
> which can lead to allocation warnings. Avoid using __GFP_NOFAIL when
> the current process is already in memory allocation context to prevent
> potential deadlocks and warnings.
This quiets the memory allocation warning, but will result in a filesystem
error being generated (read-only or panic) if the allocation fails, if you
follow the code a few lines further down. That is not good error handling
for a memory allocation failure during cache cleanup.
When __GFP_NOFAIL was *always* passed, then the error could never be hit,
which is why it was put there in the first place.
It looks like this function can return an error and the caller will retry,
so that would be preferable to causing the filesystem to abort in this case.
Cheers, Andreas
>
> Reported-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> Tested-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
> ---
> fs/ext4/mballoc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5898d92ba19f..61ee009717f1 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -5656,9 +5656,11 @@ void ext4_discard_preallocations(struct inode *inode)
> list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
> BUG_ON(pa->pa_type != MB_INODE_PA);
> group = ext4_get_group_number(sb, pa->pa_pstart);
> + gfp_t flags = GFP_NOFS;
> + if (!(current->flags & PF_MEMALLOC))
> + flags |= __GFP_NOFAIL;
>
> - err = ext4_mb_load_buddy_gfp(sb, group, &e4b,
> - GFP_NOFS|__GFP_NOFAIL);
> + err = ext4_mb_load_buddy_gfp(sb, group, &e4b, flags);
> if (err) {
> ext4_error_err(sb, -err, "Error %d loading buddy information for %u",
> err, group);
> --
> 2.43.0
>
Cheers, Andreas
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-19 2:14 Deepanshu Kartikey
0 siblings, 0 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-19 2:14 UTC (permalink / raw)
To: skhan, david.hunter.linux
Cc: linux-ext4, Deepanshu Kartikey, syzbot+fd3f70a4509fca8c265d
Fix WARNING in __alloc_pages_slowpath() when ext4_discard_preallocations()
is called during memory pressure.
The issue occurs when __GFP_NOFAIL is used during memory reclaim context,
which can lead to allocation warnings. Avoid using __GFP_NOFAIL when
the current process is already in memory allocation context to prevent
potential deadlocks and warnings.
Reported-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Tested-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
---
fs/ext4/mballoc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5898d92ba19f..61ee009717f1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5656,9 +5656,11 @@ void ext4_discard_preallocations(struct inode *inode)
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
BUG_ON(pa->pa_type != MB_INODE_PA);
group = ext4_get_group_number(sb, pa->pa_pstart);
+ gfp_t flags = GFP_NOFS;
+ if (!(current->flags & PF_MEMALLOC))
+ flags |= __GFP_NOFAIL;
- err = ext4_mb_load_buddy_gfp(sb, group, &e4b,
- GFP_NOFS|__GFP_NOFAIL);
+ err = ext4_mb_load_buddy_gfp(sb, group, &e4b, flags);
if (err) {
ext4_error_err(sb, -err, "Error %d loading buddy information for %u",
err, group);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
@ 2025-09-19 1:32 Deepanshu Kartikey
0 siblings, 0 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-09-19 1:32 UTC (permalink / raw)
To: syzbot+fd3f70a4509fca8c265d
Cc: syzkaller-bugs, linux-ext4, Deepanshu Kartikey
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
Fix WARNING in __alloc_pages_slowpath() when ext4_discard_preallocations()
is called during memory pressure.
The issue occurs when __GFP_NOFAIL is used during memory reclaim context,
which can lead to allocation warnings. Avoid using __GFP_NOFAIL when
the current process is already in memory allocation context to prevent
potential deadlocks and warnings.
Reported-by: syzbot+fd3f70a4509fca8c265d@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
fs/ext4/mballoc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5898d92ba19f..61ee009717f1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5656,9 +5656,11 @@ void ext4_discard_preallocations(struct inode *inode)
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
BUG_ON(pa->pa_type != MB_INODE_PA);
group = ext4_get_group_number(sb, pa->pa_pstart);
+ gfp_t flags = GFP_NOFS;
+ if (!(current->flags & PF_MEMALLOC))
+ flags |= __GFP_NOFAIL;
- err = ext4_mb_load_buddy_gfp(sb, group, &e4b,
- GFP_NOFS|__GFP_NOFAIL);
+ err = ext4_mb_load_buddy_gfp(sb, group, &e4b, flags);
if (err) {
ext4_error_err(sb, -err, "Error %d loading buddy information for %u",
err, group);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-09-27 3:01 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-27 0:18 [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp Deepanshu Kartikey
2025-09-27 3:01 ` Theodore Ts'o
-- strict thread matches above, loose matches on Subject: below --
2025-09-27 0:19 Deepanshu Kartikey
2025-09-25 2:06 Deepanshu Kartikey
2025-09-26 19:17 ` Andreas Dilger
2025-09-24 1:16 Deepanshu Kartikey
2025-09-24 23:33 ` Andreas Dilger
2025-09-19 2:14 Deepanshu Kartikey
2025-09-19 1:32 Deepanshu Kartikey
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).