From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: JP Kobryn <jp.kobryn@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@techsingularity.net>
Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
jackmanb@google.com, ziy@nvidia.com, linux-mm@kvack.org,
usama.arif@linux.dev, kirill@shutemov.name, willy@infradead.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order
Date: Thu, 28 May 2026 15:57:05 +0200
Message-ID: <ae742ae6-a7b4-424b-bc87-c18200d9d203@kernel.org>
In-Reply-To: <7a906c76-6dd9-4bd6-8bab-cb69eb0a3db6@linux.dev>
On 5/27/26 07:57, JP Kobryn wrote:
> On 5/25/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
>> On 5/19/26 22:28, Johannes Weiner wrote:
>>> On Mon, May 18, 2026 at 06:25:32PM -0700, JP Kobryn (Meta) wrote:
>>> This is an interesting patch. A couple of thoughts:
>>>
>>> 1. You disabled the highatomic reserve for this workload and it didn't
>>> seem to matter. Presumably <costly orders don't need the protection.
>>>
>>> 2. Maxing out the reserves is odd. ALLOC_HIGHATOMIC allocations will
>>> try reserved space first,
>> Hmm, but if the allocation succeeds before entering slowpath,
>> ALLOC_NON_BLOCK won't be set.
>> But reserving another block should mean we already exhausted the
>> reserved ones.
>> Unreserving is only done when direct reclaim made some progress but failed
>> to produce a page. But if it works, or kswapd does the job, we won't
>> enter it?
>
> There was just no real pressure to invoke the unreserving. Let me know
> if I'm misunderstanding the question.
Sorry, it was more thinking out loud about Johannes' point than a question.
Yeah it seems there was no real pressure to invoke unreserving.
The reserving side is probably fine. Highatomic allocations will not try the
already reserved blocks in the fastpath, which is maybe not ideal. But they
will try them before reserving another block, and that's the important part.
>>> and I'd expect things that are commonly
>>> highatomic to be short-lived. Why don't we stop with a couple of
>>> claimed highatomic blocks that get continuously recycled?
>> Maybe it's some big burst of highatomic allocations that leads to the
>> reservations and then they stay around "forever"?
>
> I should add to the changelog the missing info that high frequency
> net allocations are responsible for these high atomic reservations.
> Even though the allocations are not necessarily long-lived, the
> pageblocks remain high atomic.
OK, thanks for the info.
>> If that's the case I think we should be perhaps looking at the unreserving
>> being done more proactively, rather than limiting things to costly order.
>
> What are your thoughts if we instead look at it as: should we be reserving
> full pageblocks for small allocations?
Well, since migratetypes operate on the pageblock level, so do the
highatomic reservations. That at least groups them together instead of
scattering them all over random pageblocks?
> It seems to come down to whether we want the disproportionate protection
> of full
> pageblocks (below costly order) for high atomic allocs vs letting them
> coalesce
> in the buddy path. Is the data not enough to justify the latter?
I still think the data shows we might be too lax in unreserving.
>>> 3. The impact on THP and compaction success rate is pretty
>>> extreme. How can 1% of memory throw such a wrench into the gears?
>> Maybe if ~all free memory is in the highatomic blocks, compaction can't be
>> very effective. Or some suitability check somewhere in reclaim+compaction
>> wrongly assumes the highatomic blocks are usable, so it won't do the work.
>
> I could be missing something, but I spent some time tonight looking into
> this and didn't find an issue in the compaction/reclaim suitability path.
>
> __compaction_suitable() calls __zone_watermark_ok(), and that path
> subtracts free MIGRATE_HIGHATOMIC pages from usable free memory for
> callers without reserve access:
>
> 	/*
> 	 * If the caller does not have rights to reserves below the min
> 	 * watermark then subtract the free pages reserved for highatomic.
> 	 */
> 	if (likely(!(alloc_flags & ALLOC_RESERVES)))
> 		unusable_free += READ_ONCE(z->nr_free_highatomic);
>
> So free highatomic pages are removed from the usable free count there.
>
> Also, the suitable-free-block check in __zone_watermark_ok() only treats
> MIGRATE_HIGHATOMIC as usable when alloc_flags includes
> ALLOC_HIGHATOMIC (or ALLOC_OOM). __compaction_suitable() passes
> ALLOC_CMA here (not ALLOC_HIGHATOMIC), so I don't think compaction is
> incorrectly treating free highatomic blocks as usable.
OK, thanks for checking.
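To make the accounting above concrete, here is a minimal user-space sketch of
it. It is not kernel code: struct zone_sketch, SKETCH_ALLOC_RESERVES and both
helpers are hypothetical names made up for illustration, and the real
__zone_watermark_ok() also considers lowmem reserves and per-order free lists,
which are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the real ALLOC_RESERVES is defined in the kernel. */
#define SKETCH_ALLOC_RESERVES 0x1u

/* Hypothetical stand-in for the two zone counters that matter here. */
struct zone_sketch {
	long nr_free;            /* all free pages in the zone */
	long nr_free_highatomic; /* free pages inside MIGRATE_HIGHATOMIC blocks */
};

/* Free pages that count toward the watermark for this caller. */
static long usable_free_pages(const struct zone_sketch *z,
			      unsigned int alloc_flags)
{
	long unusable_free = 0;

	assert(z->nr_free >= z->nr_free_highatomic);

	/* Callers without reserve access must not count highatomic pages. */
	if (!(alloc_flags & SKETCH_ALLOC_RESERVES))
		unusable_free += z->nr_free_highatomic;

	return z->nr_free - unusable_free;
}

/* Simplified watermark test: usable free pages must exceed the mark. */
static bool watermark_ok_sketch(const struct zone_sketch *z, long mark,
				unsigned int alloc_flags)
{
	return usable_free_pages(z, alloc_flags) > mark;
}
```

With 1000 free pages of which 900 sit in highatomic blocks, a caller without
reserve access sees only 100 usable pages in this model, so a watermark of 128
fails for it even though the zone looks mostly free.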
> The only caveat I noticed is the fragmentation accounting side:
> fill_contig_page_info() / fragmentation_index() appear to count
> free_area[order].nr_free across migratetypes, so fragmentation scores
> may look better than they really are. But that seems adjacent
> to this patch.
Right.
> I think though that by the time we consider reclaim or compaction we're
> dealing with the aftermath. The patch prevents the problem from occurring
> up front.
But I think as a result the highatomic feature is effectively dead. Your
results confirm there are no more Highatomic pageblocks and zero Atomic
order-4+ allocations (actually it's weird there's still 1 highatomic
pageblock with zero allocations that would reserve it, or is that a rounding
error due to calculating average across multiple hosts?).
I think it's not a surprise that there are no costly highatomic allocation
attempts: we've always said they fail too easily, so likely nobody even
tries them. MIGRATE_HIGHATOMIC was introduced by Mel [1] and evaluated on
order-1. Even the non-costly orders can fail, of course, and should have
fallbacks. Highatomic reserves are just supposed to make success more
likely, as that improves e.g. the networking receive performance, and those
allocations do use non-costly orders.
Did you observe no increase of net receive fallbacks due to this patch?
Would that be a universal outcome? I.e. did highatomic reservations become
obsolete thanks to other improvements to the page allocator since they were
introduced? That would be great as we could remove them completely and
simplify the code, but we don't know that yet.
If there are still benefits, they probably should stay, but that means
keeping them working for non-costly orders, and we should fix the observed
problems differently. I can see two directions to try, in this order:
- You say there are "high frequency net allocations" so I assume they are
ongoing. We could try modifying the fastpath __alloc_frozen_pages_noprof() to
properly evaluate ALLOC_HIGHATOMIC and let them prefer the reserved blocks
in cases that do not end up in __alloc_pages_slowpath(). This should ensure
the reserved blocks are actually being used even if we are above low
watermarks and don't enter the slowpath.
- If that doesn't help and we still have unused highatomic pageblocks,
figure out how that happens - is the highatomic allocation frequency higher
at some point, resulting in their increase, and then it drops and they stay
around? If yes, think about how to make the unreserving more aggressive than
it currently is.
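As a rough picture of what the first direction aims for, here is a toy
user-space model. It is not a patch and not how the page allocator is
structured: struct toy_zone, enum page_source and toy_alloc() are hypothetical
names, and the only point is the ordering, where an eligible request drains
the already reserved pages before touching regular free pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a page came from in this toy model. */
enum page_source { SRC_NONE, SRC_HIGHATOMIC, SRC_REGULAR };

/* Hypothetical stand-in for a zone: just two free-page counters. */
struct toy_zone {
	long free_regular;    /* free pages outside highatomic blocks */
	long free_highatomic; /* free pages inside reserved highatomic blocks */
};

/*
 * Allocate one page. A highatomic-eligible request prefers the reserve,
 * so reserved blocks get used even when regular memory is plentiful.
 * Other requests never touch the reserve.
 */
static enum page_source toy_alloc(struct toy_zone *z, bool highatomic_eligible)
{
	assert(z->free_regular >= 0 && z->free_highatomic >= 0);

	if (highatomic_eligible && z->free_highatomic > 0) {
		z->free_highatomic--;
		return SRC_HIGHATOMIC;
	}
	if (z->free_regular > 0) {
		z->free_regular--;
		return SRC_REGULAR;
	}
	return SRC_NONE;
}
```

If the reserved blocks still sit unused under such a policy, that would point
at the second direction, i.e. the unreserving side.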
[1]
https://lore.kernel.org/all/1442832762-7247-10-git-send-email-mgorman@techsingularity.net/