Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: JP Kobryn <jp.kobryn@linux.dev>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>
Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, ziy@nvidia.com, linux-mm@kvack.org,
	usama.arif@linux.dev, kirill@shutemov.name, willy@infradead.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order
Date: Tue, 16 Jun 2026 12:58:48 -0700	[thread overview]
Message-ID: <bb1e9f4c-be28-4bc8-8a45-a59b3af888f0@linux.dev> (raw)
In-Reply-To: <ae742ae6-a7b4-424b-bc87-c18200d9d203@kernel.org>

On 5/28/26 6:57 AM, Vlastimil Babka (SUSE) wrote:
> On 5/27/26 07:57, JP Kobryn wrote:
>> On 5/25/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
>>> On 5/19/26 22:28, Johannes Weiner wrote:
>>>> On Mon, May 18, 2026 at 06:25:32PM -0700, JP Kobryn (Meta) wrote:
>>>> This is an interesting patch. A couple of thoughts:
>>>>
>>>> 1. You disabled the highatomic reserve for this workload and it didn't
>>>> seem to matter. Presumably <costly orders don't need the protection.
>>>>
>>>> 2. Maxing out the reserves is odd. ALLOC_HIGHATOMIC allocations will
>>>> try reserved space first,
>>> Hmm, but if the allocation succeeds before entering slowpath,
>>> ALLOC_NON_BLOCK won't be set.
>>> But reserving another block should mean we already exhausted the 
>>> reserved ones.
>>> Unreserving is only done when direct reclaim made some progress but failed
>>> to produce a page. But if it works, or kswapd does the job, we won't 
>>> enter it?
>>
>> There was just no real pressure to invoke the unreserving. Let me know
>> if I'm misunderstanding the question.
> 
> Sorry, it was more thinking out loud about Johannes' point than a question.
> Yeah it seems there was no real pressure to invoke unreserving.
> 
> The reserving side is probably fine. Highatomic allocation will not try the
> already reserved blocks in he fastpath, which is maybe not ideal. But they
> will try them before reserving another block, and that's the important part.
> 

I sent a patch [0] that addresses this.

>>>> and I'd expect things that are commonly
>>>> highatomic to be short-lived. Why don't we stop with a couple of
>>>> claimed highatomic blocks that get continuously recycled?
>>> Maybe it's some big burst of highatomic allocations that leads to the
>>> reservations and then they stay around "forever"?
>>
>> I should add to the changelog the missing info that high frequency
>> net allocations are responsible for these high atomic reservations.
>> Even though the allocations are not necessarily long-lived, the
>> pageblocks remain high atomic.
> 
> OK, thanks for the info.
> 
>>> If that's the case I think we should be perhaps looking at the unreserving
>>> being done more proactively, rather than limiting things to costly order.
>>
>> What are your thoughts if we instead look at it as: should we be reserving
>> full pageblocks for small allocations?
> 
> Well, since migratetypes operate on the pageblock level, so do the
> highatomic reservations. It at least groups them together and not scatter
> all over random pageblocks?

Right, that's the trade-off. I'm not going to pursue this approach.
Instead, I've been looking for a more targeted fix in the relevant
allocator paths. See this patch [0].

> 
>> It seems to come down to whether we want the disproportionate protection 
>> of full
>> pageblocks (below costly order) for high atomic allocs vs letting them 
>> coalesce
>> in the buddy path. Is the data not enough to justify the latter?
> 
> I still think the data shows we might be too lax in unreserving.

Ack.

> 
>>>> 3. The impact on THP and compaction success rate is pretty
>>>> extreme. How can 1% of memory throw such a wrench into the gears?
>>> Maybe if ~all free memory is in the highatomic blocks, compaction can't be
>>> effective much. Or some suitability check somewhere in reclaim+compaction
>>> wrongly assumes the highatomic blocks are usable, so it won't do the work.
>>
>> I could be missing something, but I spent some time tonight looking into
>> this and didn't find an issue in the compaction/reclaim suitability path.
>>
>> __compaction_suitable() calls __zone_watermark_ok(), and that path
>> subtracts free MIGRATE_HIGHATOMIC pages from usable free memory for
>> callers without reserve access:
>>
>>   /*
>>    * If the caller does not have rights to reserves below the min
>>    * watermark then subtract the free pages reserved for highatomic.
>>    */
>>   if (likely(!(alloc_flags & ALLOC_RESERVES)))
>>       unusable_free += READ_ONCE(z->nr_free_highatomic);
>>
>> So free highatomic pages are removed from the usable free count there.
>>
>> Also, the suitable-free-block check in __zone_watermark_ok() only treats
>> MIGRATE_HIGHATOMIC as usable when alloc_flags includes
>> ALLOC_HIGHATOMIC (or ALLOC_OOM). __compaction_suitable() passes
>> ALLOC_CMA here (not ALLOC_HIGHATOMIC), so I don't think compaction is
>> incorrectly treating free highatomic blocks as usable.
> 
> OK, thanks for checking.
> 
>> The only caveat I noticed is the fragmentation accounting side:
>> fill_contig_page_info() / fragmentation_index() appear to count
>> free_area[order].nr_free across migratetypes, so fragmentation scoring
>> may look better than they really are. But that seems adjacent
>> to this patch.
> 
> Right.
> 
>> I think though that by the time we consider reclaim or compaction we're
>> dealing with the aftermath. The patch prevents the problem from occurring
>> up front.
> 
> But I think as a result the highatomic feature is effectively dead. Your
> results confirm there are no more Highatomic pageblocks and zero Atomic
> order-4+ allocations (actually it's weird there's still 1 highatomic
> pageblock with zero allocations that would reserve it, or is that a rounding
> error due to calculating average across multiple hosts?).

Likely a rounding issue.

> 
> I think it's not a surprise that there are no costly highatomic allocation
> attempts, we've always said they are too easy to fail, so likely nobody even
> tries them. MIGRATE_HIGHATOMIC was introduced by Mel [1] and evaluated on
> order-1. Even the non-costly orders can fail of course and should have
> fallbacks, highatomic reserves are just supposed to make the success more
> likely as that improves e.g. the networking receive performance, and they do
> use non-costly orders.
> 
> Did you observe no increase of net receive fallbacks due to this patch?
> Would that be an universal outcome? I.e. did highatomic reservations become
> obsolete thanks to other improvements to the page allocator since they were
> introduced? That would be great as we could remove it completely and
> simplify the code, but we don't know that yet.

See the separate patch [0] which takes a targeted approach on the
allocator path. It accounts for net fallbacks and should help napi/page
frag allocs in the fastpath.

> 
> If there are still benefits, they probably should stay, but that means keep
> them working for non-costly orders, and we should fix the observed problems
> differently. I can see two directions to try in that order.

Ack.

> 
> - You say there are "high frequency net allocations" so I assume they are
> ongoing. We could try modify the fastpath __alloc_frozen_pages_noprof() to
> properly evaluate ALLOC_HIGHATOMIC and let them prefer the reserved blocks
> in cases that do not end up in __alloc_pages_slowpath(). This should ensure
> the reserved blocks are actually being used even if we are above low
> watermarks and don't enter the slowpath.

Yes, this can be seen in the separate patch [0].

> 
> - If that doesn't help and we still have unused highatomic pageblocks,
> figure out how that happens - is the highatomic allocation frequency higher
> at some point, resulting in their increase, and then it drops and they stay
> around? If yes, think about how to make the unreserving more aggressive than
> it currently is.
> 
> [1]
> https://lore.kernel.org/all/1442832762-7247-10-git-send-email-mgorman@techsingularity.net/
> 

The patch below improves the allocator path. I'll explore opportunities
for unreserving.

[0] https://lore.kernel.org/all/20260616191420.52556-1-jp.kobryn@linux.dev/

next prev parent reply	other threads:[~2026-06-16 19:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  1:25 [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order JP Kobryn (Meta)
2026-05-19 19:27 ` Andrew Morton
2026-05-19 23:25   ` JP Kobryn (Meta)
2026-05-19 20:28 ` Johannes Weiner
2026-05-25  9:11   ` Vlastimil Babka (SUSE)
2026-05-27  5:57     ` JP Kobryn
2026-05-28 13:57       ` Vlastimil Babka (SUSE)
2026-06-16 19:58         ` JP Kobryn [this message]
2026-05-27  2:33   ` JP Kobryn
2026-05-28 17:09 ` Frank van der Linden
2026-06-16 20:00   ` JP Kobryn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb1e9f4c-be28-4bc8-8a45-a59b3af888f0@linux.dev \
    --to=jp.kobryn@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=kernel-team@meta.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox