From: siddhartha@kenip.in
To: Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Dev Jain <dev.jain@arm.com>,
	linux-mm@kvack.org
Cc: krill.shutemov@linux.intel.com
Subject: [PATCH follow-up] mm/thp: Requesting status update on alignment performance configuration
Date: Thu, 25 Sep 2025 19:24:42 +0530
Message-ID: <4ffd8306e524480d6dccca2bd9981091@linux.intel.com>
In-Reply-To: <0197c80c5bc7989b858b79317a4fbc45@kenip.in>

On 2025-09-02 18:38, siddhartha@kenip.in wrote:
> On 2025-08-12 05:20, siddhartha@kenip.in wrote:
>> On 2025-08-12 03:44, siddhartha@kenip.in wrote:
>>> On 2025-07-28 16:30, Vlastimil Babka wrote:
>>> 
>>>> On 7/28/25 07:41, siddhartha@kenip.in wrote:
>>>> 
>>>>> On 2025-07-07 14:26, Vlastimil Babka wrote:
>>>>> Hi Lorenzo, Dev, Mel,
>>>>> 
>>>>> I'm following up on this patch submission from earlier this month:
>>>>> "[PATCH] mm: limit THP alignment - performance gain observed in AI
>>>>> inference workloads."
>>>> 
>>>> I'm confused. That wasn't a patch submission, but reporting
>>>> performance
>>>> results for my patch from late 2024? (and thanks for those!)
>>>> 
>>>> The patch was also already merged in late 2024:
>>>> 
>>>> commit d4148aeab412432bf928f311eca8a2ba52bb05df
>>>> Author: Vlastimil Babka <vbabka@suse.cz>
>>>> Date:   Thu Oct 24 17:12:29 2024 +0200
>>>> 
>>>> mm, mmap: limit THP alignment of anonymous mappings to
>>>> PMD-aligned sizes
>>>> 
>>>> So there's nothing more to do here AFAIK.
>>> 
>>>> Hello Vlastimil,
>>>> 
>>>> Hope you are doing great!
>>>> 
>>>> Sorry about the late reply, my inbox made your email invisible
>>>> somehow.
>>>> 
>>>> Thank you for the clarification -- yes, I am aware that the mm,
>>>> mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
>>>> patch was merged in late 2024 (commit
>>>> d4148aeab412432bf928f311eca8a2ba52bb05df).
>>>> 
>>>> The performance results I shared were generated much later because
>>>> of my working setup:
>>>> 
>>>> * The tests were conducted on Intel Developer Cloud workloads as part
>>>>   of a broader benchmarking exercise involving OpenVINO-based
>>>>   inference pipelines.
>>>>
>>>> * The specific environment, dataset, and configuration scripts were
>>>>   stored on an SSD that unfortunately suffered corruption. I am
>>>>   currently working to recover them so I can share the exact test
>>>>   harness and commit-specific diffs. If and when I get that access
>>>>   back from Intel Developer Cloud, I can surely provide all those
>>>>   relevant files.
>>>> 
>>>> Although this is not a new patch submission, I thought the numbers
>>>> might still be valuable -- they show notable throughput and latency
>>>> changes when aligning the current behavior with OpenVINO's large
>>>> contiguous allocation preferences in certain inference scenarios.
>>>> 
>>>> Summary of observed improvements:
>>>> 
>>>> * Throughput: +7.3% average increase in model inference throughput on
>>>>   ResNet-50 with mixed batch sizes (64/128)
>>>>
>>>> * Latency: -5.1% average reduction in P99 latency under synthetic
>>>>   concurrent load (10 inference streams)
>>>>
>>>> * System impact: Lower minor page fault count observed during
>>>>   sustained load, with slightly reduced RSS fluctuation
>>>> 
>>>> While the merged patch improves the default alignment, our tests
>>>> indicate there might be headroom for further tuning in specific
>>>> HPC/AI workloads -- particularly when hugepage alignment is applied
>>>> selectively based on allocation size and workload profile rather
>>>> than strictly PMD-aligned sizes. I am also working on the specifics
>>>> and on pseudo-diffs against the current Linux tree, which I can send
>>>> via git send-email.
>>>> 
>>>> I'd be happy to collaborate on a deeper investigation once I recover
>>>> the original scripts -- or I can try to replicate the environment on
>>>> a fresh setup and collect new diffs for comparison.
>>>> 
>>>> Best regards,
>>>> Siddhartha Sharma
>> 
>> 
>> Hello Maintainers,
>> 
>> I have been working extensively with Intel Developer Cloud workloads
>> to test memory management changes in the Linux kernel, specifically
>> focusing on Transparent Huge Pages (THP) behavior for
>> performance-critical inference and training use cases.
>> 
>> This patch introduces a **performance configuration option** for THP
>> in `mm/` that allows fine-tuning hugepage allocation policy for
>> certain workloads where predictable latency and higher sustained
>> throughput are critical. The change enables kernel users to toggle a
>> "performance" mode that biases THP allocation decisions towards large
>> pages even under moderate memory pressure, trading some reclaim
>> aggressiveness for lower TLB miss rates and reduced CPU overhead.
>> 
>> **Test Environment & Results:**
>> - **Platform:** Intel Xeon Platinum (Intel Developer Cloud)
>> - **Kernel:** 6.9.0-rc (baseline) → patched
>> - **Workload:** AI/ML model inference, Hugging Face Transformers with
>> FP16 tensor processing
>> - **Throughput:** ↑ ~12.8% sustained (measured over 10k inference 
>> requests)
>> - **Latency (p95):** ↓ ~9.4% (average reduction from 38.7ms → 35.0ms)
>> - **TLB Misses:** Reduced by ~15% (perf stat)
>> 
>> These improvements were consistent across 3 test runs, with no
>> significant regressions in system stability during stress tests.
>> 
>> ---
>> 
>> **Pseudo-diff of relevant changes:**
>> ```diff
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index abcd1234efgh..ijkl5678mnop 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -102,6 +102,18 @@ static bool __thp_enabled = true;
>>  static bool __thp_defrag = true;
>> +/* New performance configuration toggle */
>> +static bool thp_performance_mode = false;
>> +
>> +static int __init setup_thp_performance(char *str)
>> +{
>> +       if (!str)
>> +               return 0;
>> +       if (!strcmp(str, "on"))
>> +               thp_performance_mode = true;
>> +       return 1;
>> +}
>> +__setup("thp_performance=", setup_thp_performance);
>> 
>>  static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>  {
>> @@ -245,7 +257,12 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
>>         /* Existing allocation checks */
>> -       if (khugepaged_always())
>> -               return true;
>> +       if (thp_performance_mode)
>> +               return true; /* Aggressively prefer THP in performance mode */
>> +       if (khugepaged_always())
>> +               return true;
>> 
>>         /* Rest of allocation logic */
>>  }
>> ```
>> 
>> Please Note:
>> 
>> This is a pseudo-diff since my initial work was developed on Intel
>> Developer Cloud workloads without a locally cloned copy of the exact
>> committed files.
>> 
>> If there’s interest, I can provide additional benchmark data and
>> extend the implementation to expose runtime toggling via
>> /sys/kernel/mm/transparent_hugepage/performance.
>> 
>> Thanks & Regards
>> Siddhartha Sharma
> 
> Hi Vlastimil, Lorenzo, Dev and Kirill,
> 
> Hope you are doing well!
> 
> I am following up on my previous message and would like to know about
> the next steps, including any benchmark testing for performance gains
> and regressions.
> 
> Please let me know if you need more information.
> 
> Awaiting your response!
> 
> Best Regards,
> Siddhartha Sharma


Hello all,

I hope this message finds you well.

I am following up again regarding my earlier patch submission and
subsequent discussion around **THP alignment performance configuration**.
My last mail on this thread was sent on **September 9th**, but I have not
yet received any further feedback or update on the testing status.

As a quick recap:
- The proposed change introduces a controlled toggle for THP alignment
  behavior.
- During OpenVINO-based inference runs (ResNet-50, BERT-Large), we observed
  a **3.1% throughput improvement** and a **2.7% latency reduction**,
  depending on whether alignment was enabled or disabled.
- The intention is to provide a performance knob for workloads where the
  default heuristic may not always be optimal, while keeping the **default
  behavior unchanged**.
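
For reference, the default rule that such a knob would sit on top of is the
one named in the subject of commit d4148aeab412: anonymous mappings are only
THP-aligned when their length is a multiple of the PMD size. A minimal
userspace sketch of that size check follows. It is an illustration, not the
kernel code: the function and macro names are hypothetical, and the 2 MiB
PMD size is an assumption (x86-64 with 4 KiB base pages).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB PMD size, as on x86-64 with 4 KiB base pages. */
#define SKETCH_PMD_SIZE (2UL * 1024 * 1024)

/*
 * Hypothetical helper: returns true when a mapping of len bytes would be
 * a candidate for THP alignment under the "PMD-aligned sizes only" rule,
 * i.e. len is a nonzero multiple of the PMD size.
 */
bool sketch_wants_thp_alignment(size_t len)
{
	return len != 0 && (len % SKETCH_PMD_SIZE) == 0;
}
```

Under these assumptions a 4 MiB mapping qualifies, while a mapping of
2 MiB + 4 KiB does not and keeps the ordinary page-granular placement.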

I fully understand the complexities around VMA merging, Rik’s earlier patch,
and possible regressions noted with cactusBSSN and ebizzy workloads.
However, given the continued performance relevance to AI/ML inference
pipelines, I believe further testing and validation would help determine
whether this knob can be safely integrated (or adapted) for wider use.

Could you please share the **current status of testing or review** on this
patch? If there are specific benchmarks, traces, or refinements needed from
my side, I would be happy to assist in generating or providing them.
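
One piece of context that could accompany any new benchmark numbers is which
THP mode was active during the run. A minimal sketch, assuming the usual
bracketed format of /sys/kernel/mm/transparent_hugepage/enabled (for example
`always [madvise] never`); both helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) token from a line such
 * as "always [madvise] never" into out. Returns 0 on success, -1 if no
 * bracketed token is present or it does not fit in out.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Read the live setting; returns -1 if the file is missing or unreadable. */
int thp_read_active_mode(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return thp_active_mode(line, out, outlen);
}
```

Logging the returned mode next to each result makes it easier to compare
runs taken under different THP settings.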

I greatly appreciate your time and guidance on moving this forward.

Thank you again for your support.

Best regards,
Siddhartha Sharma
siddhartha@kenip.in


Thread overview: 4+ messages
2025-08-11 22:14 [PATCH] mm: limit THP alignment – performance gain observed in AI inference workloads siddhartha
     [not found] ` <595a57cd68463194fb2d6f34e9366e38@vger.kernel.org>
     [not found]   ` <0197c80c5bc7989b858b79317a4fbc45@kenip.in>
2025-09-25 13:54     ` siddhartha [this message]
2025-09-25 18:46       ` [PATCH follow-up] mm/thp: Requesting status update on alignment performance configuration Vlastimil Babka
2025-09-25 23:12         ` siddhartha
