From: siddhartha@kenip.in
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
Date: Mon, 28 Jul 2025 11:11:15 +0530
Message-ID: <57c50dbbccf38a97e6e9cbb3f2f75f01@kenip.in>
In-Reply-To: <fd6cc30b-cc8d-4003-ba01-fefd5d696ec6@suse.cz>

On 2025-07-07 14:26, Vlastimil Babka wrote:
> On 7/1/25 20:49, Zi Yan wrote:
>>>> This is very useful information and it's appreciated! Let's not
>>>> drown this out with restatements of stuff already covered.
>>>> 
>>>>> ⚙️ 5. mTHP note
>>>>> Although this patch doesn’t target mTHP directly, I believe a
>>>>> similar logic tweak could apply there too, especially with
>>>>> shmem-backed workloads (common in model servers using shared
>>>>> tensor memory). I’d be happy to help test any changes proposed
>>>>> there and report the results.
>>>> Dev - could we hold off on any effort to do something like this
>>>> until I've had a chance to refactor THP somewhat? This is already
>>>> a mess and I'd like to avoid us piling on more complexity.
>>>> 
>>>> We can revisit this at a later stage.
>>> 
>>> Yes of course. I ran a small benchmark on a quick dumb patch I
>>> wrote and didn't see any measurable perf improvement, probably
>>> because the highest THP order getting chosen is always PMD size.
>> 
>> I think mTHP is much more complicated, since mTHP has many sizes.
>> Trying to adjust VMA alignments to get mTHP might not work well, since
>> you never know what sizes new VMAs are going to have.
> 
> Yes, I agree it's more complicated. If there were a stream of
> allocations of varying small-ish sizes, aligning each of them to its
> smallest applicable mTHP could create gaps that wouldn't exist if we
> ignored the alignment, just found any free area, and in the end merged
> it into an existing one. Basically we'd risk recreating the issue with
> gaps.
> 
> Sticking to one size (2MB) mitigates this to some extent.
> Unfortunately, even after my fix the heuristics might be prone to
> gaps:
> 
> - no allocations are a multiple of 2MB - they will merge freely
> 
> - all allocations are a multiple of 2MB - the alignment heuristic will
> kick in, but the allocations should still merge as all boundaries are
> 2MB aligned
> 
> - allocations alternate between multiples and non-multiples of 2MB -
> this will still create gaps
> 
> Note we already had a report about ebizzy regressing due to my commit
> [1] and I suspect it might be due to this kind of scenario. A proper
> investigation would be useful but I didn't get to it.
> 
> Maybe the solution is to first check whether an unaligned search gives
> us a range that will merge with an adjacent area, and only try the
> alignment heuristics if it doesn't. This will still fail if mmap() is
> followed by e.g. mprotect() or madvise() that turns an initially
> un-mergeable area into a mergeable one. I have no ideas around that
> though. Just some thoughts to consider for anyone wanting to change
> things here further :)
> 
> [1] 
> https://lore.kernel.org/all/019401db769f%24961e7e20%24c25b7a60%24@telus.net/
> 
>> IMHO, it might be better to align the VMA to PMD or the largest mTHP
>> size (for example, on ARM64 with a 64KB base page, PMD THP is 512MB,
>> so a 2MB mTHP sounds more reasonable there) if possible and enable
>> VMA merging as much as possible for future huge page collapse. mTHP
>> can be used to fill the non-faulted holes in VMAs if necessary.
>> 
>>> 
>>> Out of curiosity, where do you plan to do the refactoring?
>> 
>> 
>> Best Regards,
>> Yan, Zi
>> 
Hi Lorenzo, Dev, Mel,

I'm following up on this patch submission from earlier this month:
"[PATCH] mm: limit THP alignment – performance gain observed in AI 
inference workloads."

The change limits THP alignment to PMD-sized mappings, avoiding 
unnecessary hugepage over-allocation in scenarios where 2MB alignment 
is not beneficial. We’ve observed consistent performance improvements 
in inference pipelines (specifically with OpenVINO) where the workload 
profile includes a mix of small and large allocations.

Please let me know if:
- There has been any progress or feedback from your end,
- The patch needs to align with ongoing THP refactoring efforts,
- Additional benchmarks, test traces, or system-level profiles would 
help.

Happy to revise or refine the patch based on further discussion. Thanks 
again for your time and input!

For your information, I have also posted the same findings on the 
OpenVINO and Hugging Face forums, and I am currently awaiting review of 
the corresponding commit on the OpenVINO GitHub repository.

Best regards,
Siddhartha Sharma

