From: Usama Arif <usamaarif642@gmail.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
	shakeel.butt@linux.dev, riel@surriel.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	Yafang Shao <laoar.shao@gmail.com>
Subject: Re: [PATCH 0/1] prctl: allow overriding system THP policy to always
Date: Wed, 7 May 2025 16:12:04 +0100
Message-ID: <96eccc48-b632-40b7-9797-1b0780ea59cd@gmail.com>
In-Reply-To: <293530AA-1AB7-4FA0-AF40-3A8464DC0198@nvidia.com>



On 07/05/2025 15:57, Zi Yan wrote:
> +Yafang, who is also looking at changing THP config at cgroup/container level.
> 
> On 7 May 2025, at 10:00, Usama Arif wrote:
> 
>> Allowing a per-process override of the global THP policy lets workloads
>> that have been shown to benefit from hugepages use them, without
>> regressing workloads that wouldn't benefit. This allows both types of
>> workloads to be run/stacked on the same machine.
>>
>> It also helps in rolling out hugepages in hyperscaler configurations
>> for workloads that benefit from them, where a single THP policy is
>> likely to be used across the entire fleet, and prctl will help override it.
>>
>> An advantage of doing it via prctl vs creating a cgroup specific
>> option (like /sys/fs/cgroup/test/memory.transparent_hugepage.enabled) is
>> that this will work even when there are no cgroups present, and my
>> understanding is that there is a strong preference for cgroup controls
>> to be hierarchical, which usually means they take a numerical value.
> 
> Hi Usama,
> 
> Do you mind giving an example of how to change THP policy for a set of
> processes running in a container (under a cgroup)?

Hi Zi,

In our case, we create the processes in the cgroup via systemd. We would
enable THP=always for processes in a cgroup the same way we enable KSM for
the cgroup. The change in systemd would be very similar to the line in [1]:
we would set prctl PR_SET_THP_ALWAYS in exec-invoke.
This happens at the start of the process, but by then you already know
whether you want THP=always for it or not.

[1] https://github.com/systemd/systemd/blob/2e72d3efafa88c1cb4d9b28dd4ade7c6ab7be29a/src/core/exec-invoke.c#L5045
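
As a rough illustration of what that exec-invoke change amounts to, here is
a minimal sketch of a helper that applies a prctl and then execs the target.
It is not the systemd code: exec_with_prctl() is a hypothetical name, and
PR_SET_THP_ALWAYS (78) is the value from the test program quoted below, which
only has this meaning with the patch applied; on other kernels that number
may be unassigned or mean something else. It also assumes the setting is
preserved across execve, which the systemd approach relies on.

```c
#include <sys/prctl.h>
#include <unistd.h>

/* Assumption: value from the test program in this thread, not a mainline header. */
#ifndef PR_SET_THP_ALWAYS
#define PR_SET_THP_ALWAYS 78
#endif

/*
 * Hypothetical helper: apply a per-process prctl, then replace the
 * process image with argv[0]. Does not return on success. Returns -1
 * with errno set if either the prctl or the exec fails.
 */
int exec_with_prctl(int option, unsigned long value, char *const argv[])
{
    /* Set the policy before exec so it is in place from the first fault. */
    if (prctl(option, value, 0UL, 0UL, 0UL) != 0)
        return -1;

    execvp(argv[0], argv);
    return -1; /* only reached if execvp() failed */
}
```

A launcher would call exec_with_prctl(PR_SET_THP_ALWAYS, 1, argv + 1) and
report errno if it returns.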

Thanks,
Usama

> 
> Yafang mentioned that the prctl approach would require restarting all running
> services[1] and has other inflexibilities, so he proposed to use BPF to change THP
> policy[2]. I wonder if Yafang's issues also apply to your case and if you
> have a solution to them.
> 
> Thanks.
> 
> [1] https://lore.kernel.org/linux-mm/CALOAHbCXMi2GaZdHJaNLXxGsJf-hkDTrztsQiceaBcJ8d8p3cA@mail.gmail.com/
> [2] https://lore.kernel.org/linux-mm/20250429024139.34365-1-laoar.shao@gmail.com/
>>
>>
>> The output and code of the test program are below:
>>
>> [root@vm4 vmuser]# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
>> [root@vm4 vmuser]# echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>> [root@vm4 vmuser]# ./a.out
>> Default THP setting:
>> THP is not set to 'always'.
>> PR_SET_THP_ALWAYS = 1
>> THP is set to 'always'.
>> PR_SET_THP_ALWAYS = 0
>> THP is not set to 'always'.
>>
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>> #include <sys/prctl.h>
>>
>> #define PR_SET_THP_ALWAYS 78
>> #define SIZE (12 * (2 * 1024 * 1024)) // 24 MB
>>
>> void check_smaps(void) {
>>     FILE *file = fopen("/proc/self/smaps", "r");
>>     if (!file) {
>>         perror("fopen");
>>         return;
>>     }
>>
>>     char line[256];
>>     int is_hugepage = 0;
>>     while (fgets(line, sizeof(line), file)) {
>>         // if (strstr(line, "AnonHugePages:"))
>>         //     printf("%s\n", line);
>>         if (strstr(line, "AnonHugePages:") && strstr(line, "24576 kB")) {
>>             // printf("%s\n", line);
>>             is_hugepage = 1;
>>             break;
>>         }
>>     }
>>     fclose(file);
>>     if (is_hugepage) {
>>         printf("THP is set to 'always'.\n");
>>     } else {
>>         printf("THP is not set to 'always'.\n");
>>     }
>> }
>>
>> void test_mmap_thp(void) {
>>     char *buffer = (char *)mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
>>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>     if (buffer == MAP_FAILED) {
>>         perror("mmap");
>>         return;
>>     }
>>     // Touch the memory to ensure it's allocated
>>     memset(buffer, 0, SIZE);
>>     check_smaps();
>>     munmap(buffer, SIZE);
>> }
>>
>> int main(void) {
>>     printf("Default THP setting: \n");
>>     test_mmap_thp();
>>     printf("PR_SET_THP_ALWAYS = 1 \n");
>>     // Report a failed prctl instead of silently testing the old policy
>>     if (prctl(PR_SET_THP_ALWAYS, 1UL, 0UL, 0UL, 0UL))
>>         perror("prctl");
>>     test_mmap_thp();
>>     printf("PR_SET_THP_ALWAYS = 0 \n");
>>     if (prctl(PR_SET_THP_ALWAYS, 0UL, 0UL, 0UL, 0UL))
>>         perror("prctl");
>>     test_mmap_thp();
>>
>>     return 0;
>> }
>>
>>
>> Usama Arif (1):
>>   prctl: allow overriding system THP policy to always per process
>>
>>  include/linux/huge_mm.h                          |  3 ++-
>>  include/linux/mm_types.h                         |  7 ++-----
>>  include/uapi/linux/prctl.h                       |  3 +++
>>  kernel/sys.c                                     | 16 ++++++++++++++++
>>  tools/include/uapi/linux/prctl.h                 |  3 +++
>>  .../perf/trace/beauty/include/uapi/linux/prctl.h |  3 +++
>>  6 files changed, 29 insertions(+), 6 deletions(-)
>>
>> -- 
>> 2.47.1
> 
> 
> --
> Best Regards,
> Yan, Zi


