From: siddhartha@kenip.in
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads
Date: Mon, 28 Jul 2025 11:11:15 +0530
Message-ID: <57c50dbbccf38a97e6e9cbb3f2f75f01@kenip.in>
In-Reply-To: <fd6cc30b-cc8d-4003-ba01-fefd5d696ec6@suse.cz>

On 2025-07-07 14:26, Vlastimil Babka wrote:
> On 7/1/25 20:49, Zi Yan wrote:
>>>> This is very useful information and it's appreciated! Let's not
>>>> drown this out with restatements of stuff already covered.
>>>> 
>>>>> ⚙️ 5. mTHP note
>>>>> Although this patch doesn’t target mTHP directly, I believe a
>>>>> similar logic tweak could apply there too, especially with
>>>>> shmem-backed workloads (common in model servers using shared
>>>>> tensor memory). I’d be happy to help test any changes proposed
>>>>> there and report the results.
>>>> Dev - could we hold off on any effort to do something like this
>>>> until I've had a chance to refactor THP somewhat? This is already
>>>> a mess and I'd like to avoid us piling on more complexity.
>>>> 
>>>> We can revisit this at a later stage.
>>> 
>>> Yes of course. I ran a small benchmark on a quick dumb patch I
>>> wrote and didn't see any measurable perf improvement, probably
>>> because the highest THP order getting chosen is always PMD size.
>> 
>> I think mTHP is much more complicated, since mTHP has many sizes.
>> Trying to adjust VMA alignments to get mTHP might not work well, since
>> you never know what sizes new VMAs are going to have.
> 
> Yes, I agree it's more complicated. If there were a stream of
> allocations of varying small-ish sizes, aligning each of them to its
> smallest applicable mTHP could create gaps that wouldn't exist if we
> ignored the alignment, just found any free area, and in the end merged
> it into an existing one. Basically we'd risk recreating the issue with
> gaps.
> 
> Sticking to one size (2MB) mitigates this to some extent.
> Unfortunately, even after my fix the heuristics might be prone to
> gaps:
> 
> - no allocations are a multiple of 2MB - they will merge freely
> 
> - all allocations are a multiple of 2MB - the alignment heuristic will
> kick in, but the allocations should still merge as all boundaries are
> 2MB aligned
> 
> - allocations alternate between multiples and non-multiples of 2MB -
> this will still create gaps
> 
> Note we already had a report about ebizzy regressing due to my commit
> [1] and I suspect it might be due to this kind of scenario. A proper
> investigation would be useful but I didn't get to it.
> 
> Maybe the solution is to first check whether an unaligned search gives
> us a range that will merge with an adjacent area, and only try the
> alignment heuristics if it doesn't. This will still fail if mmap() is
> followed by e.g. mprotect() or madvise() that turns an initially
> un-mergeable area into a mergeable one. I have no ideas around that
> though. Just some thoughts to consider for anyone wanting to change
> things here further :)
> 
> [1] 
> https://lore.kernel.org/all/019401db769f%24961e7e20%24c25b7a60%24@telus.net/
> 
>> IMHO, it might be better to align the VMA to PMD or the largest mTHP
>> size (for example, on ARM64 with a 64KB base page, PMD THP is 512MB,
>> so a 2MB mTHP sounds more reasonable there) if possible and enable
>> VMA merging as much as possible for future huge page collapse. mTHP
>> can be used to fill the non-faulted holes in VMAs if necessary.
>> 
>>> 
>>> Out of curiosity, where do you plan to do the refactoring?
>> 
>> 
>> Best Regards,
>> Yan, Zi
>> 
Hi Lorenzo, Dev, Mel,

I'm following up on this patch submission from earlier this month:
"[PATCH] mm: limit THP alignment – performance gain observed in AI 
inference workloads."

The change limits THP alignment to PMD-sized mappings, avoiding 
unnecessary hugepage over-allocation in scenarios where 2MB alignment 
is not beneficial. We’ve observed consistent performance improvements 
in inference pipelines (specifically with OpenVINO) where the workload 
profile includes a mix of small and large allocations.

Please let me know if:
- There has been any progress or feedback from your end,
- The patch needs to align with ongoing THP refactoring efforts,
- Additional benchmarks, test traces, or system-level profiles would 
help.

Happy to revise or refine the patch based on further discussion. Thanks 
again for your time and input!

For your information, I have also posted the same findings on the 
OpenVINO and Hugging Face forums, and I am currently awaiting review of 
the corresponding commit on the OpenVINO GitHub repository.

Best regards,
Siddhartha Sharma

