From: Mike Rapoport <rppt@kernel.org>
To: Usama Arif <usamaarif642@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
laoar.shao@gmail.com, baolin.wang@linux.alibaba.com,
lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
npache@redhat.com, ryan.roberts@arm.com, vbabka@suse.cz,
jannh@google.com, Arnd Bergmann <arnd@arndb.de>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
kernel-team@meta.com, linux-api@vger.kernel.org
Subject: Re: [PATCH v3 0/7] prctl: introduce PR_SET/GET_THP_POLICY
Date: Thu, 22 May 2025 15:10:45 +0300
Message-ID: <aC8URbAzw06Ob4T8@kernel.org>
In-Reply-To: <20250519223307.3601786-1-usamaarif642@gmail.com>
(cc'ing linux-api)
On Mon, May 19, 2025 at 11:29:52PM +0100, Usama Arif wrote:
> This series allows changing the THP policy of a process according to the
> value set in arg2. Each policy is inherited across fork+exec:
> - PR_DEFAULT_MADV_HUGEPAGE: This will set VM_HUGEPAGE and clear VM_NOHUGEPAGE
> for the default VMA flags. It will also iterate through every VMA in the
> process and call hugepage_madvise on it, with MADV_HUGEPAGE policy.
> This effectively allows setting MADV_HUGEPAGE on the entire process.
> In an environment where different types of workloads are run on the
> same machine, this will allow workloads that benefit from always having
> hugepages to do so, without regressing those that don't.
> - PR_DEFAULT_MADV_NOHUGEPAGE: This will set VM_NOHUGEPAGE and clear VM_HUGEPAGE
> for the default VMA flags. It will also iterate through every VMA in the
> process and call hugepage_madvise on it, with MADV_NOHUGEPAGE policy.
> This effectively allows setting MADV_NOHUGEPAGE on the entire process.
> In an environment where different types of workloads are run on the
> same machine, this will allow workloads that benefit from having
> hugepages on an madvise basis only to do so, without regressing those
> that benefit from having hugepages always.
> - PR_THP_POLICY_SYSTEM: This will reset (clear) both VM_HUGEPAGE and
> VM_NOHUGEPAGE in the default VMA flags for the process.
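
For illustration only, the effect of the three policies on the default VMA
flags, as described above, can be modelled in userspace roughly as follows.
This is a sketch, not the code from the series: the enum and the function
name apply_thp_policy() are hypothetical, the flag values are just
illustrative bits, and the per-VMA walk that calls hugepage_madvise is left
out.

```c
#include <stdio.h>

/* Illustrative bit values for this userspace model only. */
#define VM_HUGEPAGE   0x20000000UL
#define VM_NOHUGEPAGE 0x40000000UL

/* Hypothetical stand-ins for the three policies named in the cover letter. */
enum thp_policy {
	THP_POLICY_SYSTEM,	/* models PR_THP_POLICY_SYSTEM */
	THP_POLICY_HUGEPAGE,	/* models PR_DEFAULT_MADV_HUGEPAGE */
	THP_POLICY_NOHUGEPAGE,	/* models PR_DEFAULT_MADV_NOHUGEPAGE */
};

/* Return the new default flags after applying a policy. */
static unsigned long apply_thp_policy(unsigned long def_flags,
				      enum thp_policy policy)
{
	/* Every policy starts by clearing both THP bits. */
	def_flags &= ~(VM_HUGEPAGE | VM_NOHUGEPAGE);

	switch (policy) {
	case THP_POLICY_HUGEPAGE:
		def_flags |= VM_HUGEPAGE;
		break;
	case THP_POLICY_NOHUGEPAGE:
		def_flags |= VM_NOHUGEPAGE;
		break;
	case THP_POLICY_SYSTEM:
		/* Both bits stay clear: fall back to the system setting. */
		break;
	}
	return def_flags;
}
```

The point of the model is that the two bits are mutually exclusive in the
default flags, and that the SYSTEM policy is simply the state with neither
bit set.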
>
> In hyperscalers, we have a single THP policy for the entire fleet.
> We have different types of workloads (e.g. AI/compute/databases/etc)
> running on a single server.
> Some of these workloads will benefit from always getting THP at fault
> (or collapsed by khugepaged), some of them will benefit by only getting
> them at madvise.
>
> This series is useful for 2 use cases:
> 1) global system policy = madvise, while we want some workloads to get THPs
> at fault and by khugepaged :- some processes (e.g. AI workloads) benefit
> from getting THPs at fault (and collapsed by khugepaged). Other workloads
> like databases will incur a regression (either a performance regression or
> they are completely memory bound and even a very slight increase in memory
> will cause them to OOM). So what these patches will do is allow setting
> prctl(PR_DEFAULT_MADV_HUGEPAGE) on the AI workloads. (This is how
> workloads are deployed in our (Meta's/Facebook) fleet at this moment.)
>
> 2) global system policy = always, while we want some workloads to get THPs
> only on madvise basis :- Same reason as 1). What these patches
> will do is allow setting prctl(PR_DEFAULT_MADV_NOHUGEPAGE) on the database
> workloads. (We hope this is us (Meta) in the near future: if a majority of
> workloads show that they benefit from always, we flip the default host
> setting to "always" across the fleet, and workloads that regress can opt out
> and be "madvise". New services developed will then be tested with always by
> default. "always" is also the default defconfig option upstream, so I would
> imagine this is faced by others as well.)
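
Since the policy is stated to be inherited across fork+exec, a launcher
could set it once and then exec the workload. Below is a minimal sketch of
that pattern for use case 1. It assumes that the uapi headers of a kernel
carrying this series define PR_SET_THP_POLICY and PR_DEFAULT_MADV_HUGEPAGE
as macros, and that arg3..arg5 are passed as 0. Neither assumption is
confirmed by the cover letter. The helper names are hypothetical. On
headers without the series the helper just fails with ENOSYS.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Returns 0 on success, -1 with errno set on failure. */
static int set_default_madv_hugepage(void)
{
#if defined(PR_SET_THP_POLICY) && defined(PR_DEFAULT_MADV_HUGEPAGE)
	/* Assumed calling convention: policy in arg2, remaining args zero. */
	return prctl(PR_SET_THP_POLICY, PR_DEFAULT_MADV_HUGEPAGE, 0, 0, 0);
#else
	/* Headers predate the series: report the feature as unavailable. */
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Set the policy, then replace this process with the workload.
 * Only returns (with -1) if execvp() itself fails.
 */
static int exec_with_thp_hugepage(char *const argv[])
{
	if (set_default_madv_hugepage() != 0)
		perror("PR_SET_THP_POLICY");	/* carry on with system policy */

	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

A wrapper's main() would call exec_with_thp_hugepage(argv + 1), so the
workload binary needs no changes. Use case 2 would look the same with
PR_DEFAULT_MADV_NOHUGEPAGE.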
>
> v2->v3: (Thanks Lorenzo for all the below feedback!)
> v2: https://lore.kernel.org/all/20250515133519.2779639-1-usamaarif642@gmail.com/
> - no more flags2.
> - no more MMF2_...
> - renamed policy to PR_DEFAULT_MADV_(NO)HUGEPAGE
> - mmap_write_lock_killable acquired in PR_GET_THP_POLICY
> - mmap_write lock fixed in PR_SET_THP_POLICY
> - mmap assert check in process_default_madv_hugepage
> - check if hugepage_global_enabled is enabled in the call and account for s390
> - set mm->def_flags VM_HUGEPAGE and VM_NOHUGEPAGE according to the policy in
> the way done by madvise(). I believe VM merge will not be broken in
> this way.
> - process_default_madv_hugepage function that does for_each_vma and calls
> hugepage_madvise.
>
> v1->v2:
> - change from modifying the THP decision making for the process, to modifying
> VMA flags only. This prevents further complicating the logic used to
> determine THP order (Thanks David!)
> - change from using a prctl per policy change to just using PR_SET_THP_POLICY
> and arg2 to set the policy. (Zi Yan)
> - Introduce PR_THP_POLICY_DEFAULT_NOHUGE and PR_THP_POLICY_DEFAULT_SYSTEM
> - Add selftests and documentation.
>
> Usama Arif (7):
> mm: khugepaged: extract vm flag setting outside of hugepage_madvise
> prctl: introduce PR_DEFAULT_MADV_HUGEPAGE for the process
> prctl: introduce PR_DEFAULT_MADV_NOHUGEPAGE for the process
> prctl: introduce PR_THP_POLICY_SYSTEM for the process
> selftests: prctl: introduce tests for PR_DEFAULT_MADV_NOHUGEPAGE
> selftests: prctl: introduce tests for PR_THP_POLICY_DEFAULT_HUGE
> docs: transhuge: document process level THP controls
>
>  Documentation/admin-guide/mm/transhuge.rst  |  42 +++
>  include/linux/huge_mm.h                     |   2 +
>  include/linux/mm.h                          |   2 +-
>  include/linux/mm_types.h                    |   4 +-
>  include/uapi/linux/prctl.h                  |   6 +
>  kernel/sys.c                                |  53 ++++
>  mm/huge_memory.c                            |  13 +
>  mm/khugepaged.c                             |  26 +-
>  tools/include/uapi/linux/prctl.h            |   6 +
>  .../trace/beauty/include/uapi/linux/prctl.h |   6 +
>  tools/testing/selftests/prctl/Makefile      |   2 +-
>  tools/testing/selftests/prctl/thp_policy.c  | 286 ++++++++++++++++++
>  12 files changed, 436 insertions(+), 12 deletions(-)
>  create mode 100644 tools/testing/selftests/prctl/thp_policy.c
>
> --
> 2.47.1
>
>
--
Sincerely yours,
Mike.