linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: akpm@linux-foundation.org, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
	gutierrez.asier@huawei-partners.com, willy@infradead.org,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	bpf@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment
Date: Mon, 26 May 2025 12:49:19 +0200
Message-ID: <c983ffa8-cd14-47d4-9430-b96acedd989c@redhat.com>
In-Reply-To: <CALOAHbB-KQ4+z-Lupv7RcxArfjX7qtWcrboMDdT4LdpoTXOMyw@mail.gmail.com>

On 26.05.25 11:37, Yafang Shao wrote:
> On Mon, May 26, 2025 at 4:14 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>> Hi all,
>>>
>>> Let’s summarize the current state of the discussion and identify how
>>> to move forward.
>>>
>>> - Global-Only Control is Not Viable
>>> We all seem to agree that a global-only control for THP is unwise. In
>>> practice, some workloads benefit from THP while others do not, so a
>>> one-size-fits-all approach doesn’t work.
>>>
>>> - Should We Use "Always" or "Madvise"?
>>> I suspect no one would choose 'always' in its current state. ;)
>>
>> IIRC, RHEL9 has had the default set to "always" for a long time.
> 
> good to know.
> 
>>
>> I guess it really depends on how different the workloads are that you
>> are running on the same machine.
> 
> Correct. If we want to enable THP for specific workloads without
> modifying the kernel, we must isolate them on dedicated servers.
> However, this approach wastes resources and is not an acceptable
> solution.
> 
>>
>>> Both Lorenzo and David propose relying on the madvise mode. However,
>>> since madvise is an unprivileged userspace mechanism, any user can
>>> freely adjust their THP policy. This makes fine-grained control
>>> impossible without breaking userspace compatibility—an undesirable
>>> tradeoff.
>>
>> If required, we could look into a "sealing" mechanism, that would
>> essentially lock modification attempts performed by the process (i.e.,
>> MADV_HUGEPAGE).
> 
> If we don’t introduce a new THP mode and instead rely solely on
> madvise, the "sealing" mechanism could either violate the intended
> semantics of madvise(), or simply break madvise() entirely, right?

We would have to be a bit careful, yes.

Errors from MADV_HUGEPAGE/MADV_NOHUGEPAGE are often ignored, because 
these options also fail with -EINVAL on kernels without THP support.
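
To illustrate the pattern described above, here is a minimal userspace
sketch of a caller that treats MADV_HUGEPAGE as a best-effort hint. The
helper name hint_hugepage() and its return convention are hypothetical,
chosen only for this example. Note that -EINVAL can also have other
causes (e.g., an unaligned address), which is part of why callers cannot
tell "no THP support" apart from other failures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the thread): ask for THP on a range.
 * Returns 0 if the hint was accepted, 1 if the kernel rejected it with
 * EINVAL (for example, a kernel built without THP support), and -1 on
 * any other error.
 */
static int hint_hugepage(void *addr, size_t len)
{
#ifdef MADV_HUGEPAGE
	if (madvise(addr, len, MADV_HUGEPAGE) == 0)
		return 0;
	if (errno == EINVAL)
		return 1;	/* commonly ignored by callers */
	return -1;
#else
	/* Headers without MADV_HUGEPAGE: treat as "hint unavailable". */
	(void)addr;
	(void)len;
	return 1;
#endif
}
```

A caller that silently accepts the "1" result today would also silently
accept a sealed policy that starts rejecting the hint, so the failure
mode chosen for sealing matters.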

Ignoring MADV_NOHUGEPAGE can be problematic with userfaultfd.

What you likely really want to do is seal once you have configured 
MADV_NOHUGEPAGE to be the default, and then fail any later MADV_HUGEPAGE.

>>
>> That could be added on top of the current proposals that are flying
>> around, and could be done e.g., per-process.
> 
> How about introducing a dedicated "process" mode? This would allow
> each process to use different THP modes—some in "always," others in
> "madvise," and the rest in "never." Future THP modes could also be
> added to this framework.

We have to be really careful about not creating even more mess with more 
modes.

What would that design look like in detail (how would we set it per 
process etc.)?
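
For comparison only, one per-process knob that already exists is
prctl(PR_SET_THP_DISABLE). It can only turn THP off for the calling
process, so it is not the "process" mode proposed above; the sketch just
shows what a per-process interface looks like today. The helper name
disable_thp_for_self() is hypothetical.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper (not from the thread): set the existing
 * "THP disable" flag for the calling process, then read it back.
 * Returns the value reported by PR_GET_THP_DISABLE (1 when the flag
 * is set), or -1 if the kernel rejected the request.
 */
static int disable_thp_for_self(void)
{
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

The flag is inherited across fork() and preserved across execve(), so a
small launcher can set it before exec'ing the real workload.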

-- 
Cheers,

David / dhildenb



