Linux-mm Archive on lore.kernel.org
* [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
@ 2026-05-20 12:22 Dmitry Ilvokhin
From: Dmitry Ilvokhin @ 2026-05-20 12:22 UTC
  To: Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan
  Cc: linux-mm, linux-kernel, kernel-team, Dmitry Ilvokhin

When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
migratetype fallbacks and keep pageblocks clean. The allocator relies on
reclaim and compaction to free pages of the correct type before allowing
fallback as a last resort.
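
As a rough illustration of the ordering described above, here is a toy user-space model in C. It assumes one freelist for the requested migratetype, one for all other migratetypes, and an opaque reclaim/compaction step summarized as a page count. The toy_* names are hypothetical, and none of this is kernel code.

```c
#include <assert.h>

/* Hypothetical stages at which a toy allocation can be satisfied. */
enum toy_stage {
	TOY_PREFERRED,     /* requested migratetype, ALLOC_NOFRAGMENT honoured */
	TOY_AFTER_RECLAIM, /* requested migratetype after reclaim/compaction */
	TOY_FALLBACK,      /* steal from another migratetype (last resort) */
	TOY_FAIL,          /* nothing available anywhere */
};

/*
 * Toy model of defrag_mode's intended ordering (not kernel code).
 * preferred_free: free pages of the requested migratetype.
 * other_free:     free pages in other migratetype freelists.
 * reclaimable:    pages of the requested type that reclaim/compaction
 *                 could produce (assumed known up front in this toy).
 */
enum toy_stage toy_defrag_alloc(int preferred_free, int other_free,
				int reclaimable)
{
	assert(preferred_free >= 0 && other_free >= 0 && reclaimable >= 0);

	if (preferred_free > 0)
		return TOY_PREFERRED;
	if (reclaimable > 0)
		return TOY_AFTER_RECLAIM;
	if (other_free > 0)
		return TOY_FALLBACK;
	return TOY_FAIL;
}
```

For example, toy_defrag_alloc(0, 5, 3) returns TOY_AFTER_RECLAIM: reclaim is preferred over stealing from another migratetype, even though other freelists have pages.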

However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
direct reclaim or compaction. With defrag_mode=1, these allocations hit
the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.

This causes a large number of SLUB allocation failures for
skbuff_head_cache under network-heavy workloads, despite free memory
being available in other migratetype freelists.
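
The failure mode can be modelled in the same toy style. This sketch assumes the slowpath reduces to one freelist attempt followed by the !can_direct_reclaim check. The can_direct_reclaim parameter mirrors the local of that name in __alloc_pages_slowpath(); the toy_* names, the page counts and the simplified reclaim outcome are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the toy slowpath. */
enum toy_result { TOY_OK, TOY_NOPAGE };

/*
 * Toy model of the pre-patch behaviour (not kernel code).
 * With nofragment set, only the requested migratetype's freelist is used.
 */
enum toy_result toy_slowpath_prepatch(bool nofragment, bool can_direct_reclaim,
				      int preferred_free, int other_free)
{
	assert(preferred_free >= 0 && other_free >= 0);

	/* Freelist attempt: other migratetypes only without NOFRAGMENT. */
	if (preferred_free > 0 || (!nofragment && other_free > 0))
		return TOY_OK;

	/* Pre-patch: bail out with NOFRAGMENT still set; fallback never tried. */
	if (!can_direct_reclaim)
		return TOY_NOPAGE;

	/*
	 * Assume reclaim/compaction freed nothing of the requested type, so
	 * falling back to another migratetype is the last resort.
	 */
	return other_free > 0 ? TOY_OK : TOY_NOPAGE;
}
```

With nofragment set and no direct reclaim, toy_slowpath_prepatch(true, false, 0, 1000) returns TOY_NOPAGE even though 1000 pages sit in other freelists, which matches the GFP_ATOMIC situation described above.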

Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
reclaim but cannot do direct reclaim themselves (GFP_ATOMIC).  Purely
speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
__GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
fallbacks and should not cause fragmentation.
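
The condition the patch adds under !can_direct_reclaim can be isolated as a predicate. The sketch below uses made-up bit values (TOY_*) so that it compiles in user space. The kernel's real __GFP_* and ALLOC_* definitions differ, and GFP_TRANSHUGE_LIGHT carries other bits that are irrelevant here, so it is modelled only by the absence of reclaim bits.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bit values for illustration only; the kernel's real
 * __GFP_* and ALLOC_* definitions use different values.
 */
#define TOY_GFP_HIGH           0x1u
#define TOY_GFP_KSWAPD_RECLAIM 0x2u
#define TOY_GFP_DIRECT_RECLAIM 0x4u
#define TOY_ALLOC_NOFRAGMENT   0x100u

/* GFP_ATOMIC-like: may wake kswapd, may not reclaim directly. */
#define TOY_GFP_ATOMIC     (TOY_GFP_HIGH | TOY_GFP_KSWAPD_RECLAIM)
/* GFP_TRANSHUGE_LIGHT-like: neither reclaim bit set. */
#define TOY_GFP_THP_LIGHT  0x0u

static_assert((TOY_GFP_ATOMIC & TOY_GFP_DIRECT_RECLAIM) == 0,
	      "toy GFP_ATOMIC must not allow direct reclaim");

/*
 * Mirrors the patched check: drop NOFRAGMENT and retry only if
 * defrag_mode is on, NOFRAGMENT is still set, and the caller asked
 * for kswapd reclaim.
 */
bool toy_should_retry_without_nofragment(bool defrag_mode,
					 unsigned int alloc_flags,
					 unsigned int gfp_mask)
{
	return defrag_mode && (alloc_flags & TOY_ALLOC_NOFRAGMENT) &&
	       (gfp_mask & TOY_GFP_KSWAPD_RECLAIM);
}
```

Once the caller clears TOY_ALLOC_NOFRAGMENT, the predicate returns false for the same request, so in this toy a second pass falls through to the failure path instead of looping.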

Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
Changes in v2:

- Add check for __GFP_KSWAPD_RECLAIM.
- Picked up Johannes' Acked-by tag.

v1: https://lore.kernel.org/all/20260518163736.173910-1-d@ilvokhin.com/

 mm/page_alloc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 227d58dc3de6..c5a077de1be0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4811,8 +4811,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
-	if (!can_direct_reclaim)
+	if (!can_direct_reclaim) {
+		/*
+		 * Reclaim/compaction cannot run, so defrag_mode's strategy
+		 * of enforcing ALLOC_NOFRAGMENT cannot be fulfilled. Allow
+		 * fallbacks rather than failing the allocation outright.
+		 */
+		if (defrag_mode && (alloc_flags & ALLOC_NOFRAGMENT) &&
+		    (gfp_mask & __GFP_KSWAPD_RECLAIM)) {
+			alloc_flags &= ~ALLOC_NOFRAGMENT;
+			goto retry;
+		}
 		goto nopage;
+	}
 
 	/* Avoid recursion of direct reclaim */
 	if (current->flags & PF_MEMALLOC)
-- 
2.53.0-Meta




* Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
  2026-05-20 12:22 [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations Dmitry Ilvokhin
@ 2026-05-21 23:59 ` Andrew Morton
From: Andrew Morton @ 2026-05-21 23:59 UTC
  To: Dmitry Ilvokhin
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
	kernel-team

On Wed, 20 May 2026 12:22:28 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
> migratetype fallbacks and keep pageblocks clean. The allocator relies on
> reclaim and compaction to free pages of the correct type before allowing
> fallback as a last resort.
> 
> However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
> direct reclaim or compaction. With defrag_mode=1, these allocations hit
> the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
> ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.
> 
> This causes a large number of SLUB allocation failures for
> skbuff_head_cache under network-heavy workloads, despite free memory
> being available in other migratetype freelists.

That sounds painful.

> Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
> reclaim but cannot do direct reclaim themselves (GFP_ATOMIC).  Purely
> speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
> __GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
> fallbacks and should not cause fragmentation.

How serious is this to our users when running real-world workloads?

> Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
> 
> Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>




* Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
  2026-05-21 23:59 ` Andrew Morton
@ 2026-05-22 13:05   ` Dmitry Ilvokhin
From: Dmitry Ilvokhin @ 2026-05-22 13:05 UTC
  To: Andrew Morton
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
	kernel-team

On Thu, May 21, 2026 at 04:59:10PM -0700, Andrew Morton wrote:
> On Wed, 20 May 2026 12:22:28 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> 
> > When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
> > migratetype fallbacks and keep pageblocks clean. The allocator relies on
> > reclaim and compaction to free pages of the correct type before allowing
> > fallback as a last resort.
> > 
> > However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
> > direct reclaim or compaction. With defrag_mode=1, these allocations hit
> > the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
> > ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.
> > 
> > This causes a large number of SLUB allocation failures for
> > skbuff_head_cache under network-heavy workloads, despite free memory
> > being available in other migratetype freelists.
> 
> That sounds painful.
> 
> > Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
> > reclaim but cannot do direct reclaim themselves (GFP_ATOMIC).  Purely
> > speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
> > __GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
> > fallbacks and should not cause fragmentation.
> 
> How serious is this to our users when running real-world workloads?

We observed it on a few of the Meta workloads that adopted
defrag_mode=1.

For the service under load there were 85509 SLUB allocation failure
messages in dmesg within 2 hours. All of them were GFP_ATOMIC allocations
for skbuff_head_cache, despite free pages being available in other
migratetype freelists (~13 GB free).

Since this is the networking path, in practical terms it means dropped
packets, failed RPC requests, tail latency spikes and overall service
degradation.

> 
> > Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
> > 
> > Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 



* Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
  2026-05-22 13:05   ` Dmitry Ilvokhin
@ 2026-05-23  2:54     ` Andrew Morton
From: Andrew Morton @ 2026-05-23  2:54 UTC
  To: Dmitry Ilvokhin
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
	kernel-team

On Fri, 22 May 2026 13:05:36 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> > How serious is this to our users when running real-world workloads?
> 
> We observed it on a few of the Meta workloads that adopted
> defrag_mode=1.
> 
> For the service under load there were 85509 SLUB allocation failure
> messages in dmesg within 2 hours. All of them were GFP_ATOMIC allocations
> for skbuff_head_cache, despite free pages being available in other
> migratetype freelists (~13 GB free).

For a single machine, I assume.

> Since this is the networking path, in practical terms it means dropped
> packets, failed RPC requests, tail latency spikes and overall service
> degradation.

OK, thanks.   I assume 12 failures per second isn't a disaster, and that
there's no need to fast-track this into 7.1?
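
The "12 failures per second" figure is the reported message count averaged over the observation window: 85509 messages over 2 hours (7200 s) comes to roughly 11.9 per second. A trivial helper (hypothetical, for illustration only) makes the arithmetic explicit.

```c
#include <assert.h>

/* Average event rate over an observation window given in hours. */
double failures_per_second(long failures, double hours)
{
	assert(hours > 0.0);
	return (double)failures / (hours * 3600.0);
}
```

failures_per_second(85509, 2.0) evaluates to about 11.88, which rounds to the 12 per second quoted above.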



* Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
  2026-05-23  2:54     ` Andrew Morton
@ 2026-05-23 13:50       ` Dmitry Ilvokhin
From: Dmitry Ilvokhin @ 2026-05-23 13:50 UTC
  To: Andrew Morton
  Cc: Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, Zi Yan, linux-mm, linux-kernel,
	kernel-team

On Fri, May 22, 2026 at 07:54:26PM -0700, Andrew Morton wrote:
> On Fri, 22 May 2026 13:05:36 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> 
> > > How serious is this to our users when running real-world workloads?
> > 
> > We observed it on a few of the Meta workloads that adopted
> > defrag_mode=1.
> > 
> > For the service under load there were 85509 SLUB allocation failure
> > messages in dmesg within 2 hours. All of them were GFP_ATOMIC allocations
> > for skbuff_head_cache, despite free pages being available in other
> > migratetype freelists (~13 GB free).
> 
> For a single machine, I assume.

Yes, all of that data is from a single machine.

> 
> > Since this is the networking path, in practical terms it means dropped
> > packets, failed RPC requests, tail latency spikes and overall service
> > degradation.
> 
> OK, thanks.   I assume 12 failures per second isn't a disaster, and that
> there's no need to fast-track this into 7.1?

Yes, I agree. No need to fast-track this.


