All of lore.kernel.org
 help / color / mirror / Atom feed
From: Usama Arif <usamaarif642@gmail.com>
To: David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	corbet@lwn.net, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org,
	shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
	laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
	jannh@google.com, Arnd Bergmann <arnd@arndb.de>,
	sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED
Date: Fri, 1 Aug 2025 12:26:03 +0100	[thread overview]
Message-ID: <3fc8b76c-825d-4aaa-98f4-ff64bb40497d@gmail.com> (raw)
In-Reply-To: <747509a6-8493-46c3-99d4-38b53a8a7504@redhat.com>



On 31/07/2025 15:54, David Hildenbrand wrote:
> On 31.07.25 16:38, Lorenzo Stoakes wrote:
>> Nits on subject:
>>
>> - It's >75 chars
> 
> No big deal. If we cna come up with something shorter, good.

Changed it to "mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED"
for the next revision. That would be 73 chars :)

> 
>> - advise is the verb, advice is the noun.
> 
> Yeah.
> 
>>
>> On Thu, Jul 31, 2025 at 01:27:20PM +0100, Usama Arif wrote:
>>> From: David Hildenbrand <david@redhat.com>
>>>
>>> Let's allow for making MADV_COLLAPSE succeed on areas that neither have
>>> VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled
>>> unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED).
>>
>> Hmm, I'm not sure about this.
>>
>> So far this prctl() has been the only way to override MADV_COLLAPSE
>> behaviour, but now we're allowing for this one case to not.
> 
> This is not an override really. prctl() disallowed MADV_COLLAPSE, but in the new mode we don't want that anymore.
> 
>> > I suppose the precedent is that MADV_COLLAPSE overrides 'madvise' sysfs
>> behaviour.
>> > I suppose what saves us here is 'advised' can be read to mean either
>> MADV_HUGEPAGE or MADV_COLLAPSE.
>> > And yes, MADV_COLLAPSE is clearly the user requesting this behaviour.
> 
> Exactly.
> 
>>
>> I think the vagueness here is one that already existed, because one could
>> perfectly one have expected MADV_COLLAPSE to obey sysfs and require
>> MADV_HUGEPAGE to have been applied, but of course this is not the case.
> 
> Yes.
> 
>>
>> OK so fine.
>>
>> BUT.
>>
>> I think the MADV_COLLAPSE man page will need to be updated to mention this.
>>
> 
> Yes.
> 

Thanks, yes will do and send this along with changes to prctl man page after this
makes into mm-stable.

>> And I REALLY think we should update the THP doc too to mention all these
>> prctl() modes.
>>
>> I'm not sure we cover that right now _at all_ and obviously we should
>> describe the new flags.
>>
>> Usama - can you add a patch to this series to do that?
> 
> Good point, let's document the interaction with prctl().
> 

I have added the following patch for the next revision. I know that a lot
of this will be in the man page as well, but I have gone the way of being very
very explicit of what are all the possible calls that can be made (hopefully thats
the right approach :))


commit 5f290d29741a514d0861d0f99c8b860ba6af9c37
Author: Usama Arif <usamaarif642@gmail.com>
Date:   Fri Aug 1 12:05:49 2025 +0100

    docs: transhuge: document process level THP controls
    
    This includes the PR_SET_THP_DISABLE/PR_GET_THP_DISABLE pair of
    prctl calls as well the newly introduced PR_THP_DISABLE_EXCEPT_ADVISED
    flag for the PR_SET_THP_DISABLE prctl call.
    
    Signed-off-by: Usama Arif <usamaarif642@gmail.com>

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 370fba113460..cce0a99beac8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -225,6 +225,45 @@ to "always" or "madvise"), and it'll be automatically shutdown when
 PMD-sized THP is disabled (when both the per-size anon control and the
 top-level control are "never")
 
+process THP controls
+--------------------
+
+A process can control its own THP behaviour using the ``PR_SET_THP_DISABLE``
+and ``PR_GET_THP_DISABLE`` pair of prctl(2) calls. These calls support the
+following arguments::
+
+       prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0):
+               This will set the MMF_DISABLE_THP_COMPLETELY mm flag which will
+               result in no THPs being faulted in or collapsed, irrespective
+               of global THP controls. This flag and hence the behaviour is
+               inherited across fork(2) and execve(2).
+
+       prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0):
+               This will set the MMF_DISABLE_THP_EXCEPT_ADVISED mm flag which
+               will result in THPs being faulted in or collapsed only for
+               the following cases:
+               - Global THP controls are set to "always" or "madvise" and
+                 the process has madvised the region with either MADV_HUGEPAGE
+                 or MADV_COLLAPSE.
+               - Global THP controls is set to "never" and the process has
+                 madvised the region with MADV_COLLAPSE.
+               This flag and hence the behaviour is inherited across fork(2)
+               and execve(2).
+
+       prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0):
+               This will clear the MMF_DISABLE_THP_COMPLETELY and
+               MMF_DISABLE_THP_EXCEPT_ADVISED mm flags. The process will
+               behave according to the global THP controls. This behaviour
+               will be inherited across fork(2) and execve(2).
+
+       prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0):
+               This will return the THP disable mm flag status of the process
+               that was set by prctl(PR_SET_THP_DISABLE, ...).
+               i.e.
+               - 1 if MMF_DISABLE_THP_COMPLETELY flag is set
+               - 3 if MMF_DISABLE_THP_EXCEPT_ADVISED flag is set
+               - 0 otherwise.
+
 Khugepaged controls
 -------------------


>>
>>>
>>> MADV_COLLAPSE is a clear advise that we want to collapse.
>>
>> advise -> advice.
>>
>>>
>>> Note that we still respect the VM_NOHUGEPAGE flag, just like
>>> MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only
>>> refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED.
>>
>> You also need to mention the shmem change you've made I think.
> 
> Yes.
> 
>> >>
>>> Co-developed-by: Usama Arif <usamaarif642@gmail.com>
>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   include/linux/huge_mm.h    | 8 +++++++-
>>>   include/uapi/linux/prctl.h | 2 +-
>>>   mm/huge_memory.c           | 5 +++--
>>>   mm/memory.c                | 6 ++++--
>>>   mm/shmem.c                 | 2 +-
>>>   5 files changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index b0ff54eee81c..aeaf93f8ac2e 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -329,7 +329,7 @@ struct thpsize {
>>>    * through madvise or prctl.
>>>    */
>>>   static inline bool vma_thp_disabled(struct vm_area_struct *vma,
>>> -        vm_flags_t vm_flags)
>>> +        vm_flags_t vm_flags, bool forced_collapse)
>>>   {
>>>       /* Are THPs disabled for this VMA? */
>>>       if (vm_flags & VM_NOHUGEPAGE)
>>> @@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
>>>        */
>>>       if (vm_flags & VM_HUGEPAGE)
>>>           return false;
>>> +    /*
>>> +     * Forcing a collapse (e.g., madv_collapse), is a clear advise to
>>
>> advise -> advice.
>>
>>> +     * use THPs.
>>> +     */
>>> +    if (forced_collapse)
>>> +        return false;
>>>       return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
>>>   }
>>>
>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>>> index 9c1d6e49b8a9..ee4165738779 100644
>>> --- a/include/uapi/linux/prctl.h
>>> +++ b/include/uapi/linux/prctl.h
>>> @@ -185,7 +185,7 @@ struct prctl_mm_map {
>>>   #define PR_SET_THP_DISABLE    41
>>>   /*
>>>    * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
>>> - * VM_HUGEPAGE).
>>> + * VM_HUGEPAGE / MADV_COLLAPSE).
>>
>> This is confusing you're mixing VMA flags with MADV ones... maybe just
>> stick to madvise ones, or add extra context around VM_HUGEPAGE bit?
> 
> I don't see anything confusing here, really.
> 
> But if it helps you, we can do
>     (e.g., MADV_HUGEPAGE / VM_HUGEPAGE, MADV_COLLAPSE).
> 
> (reason VM_HUGEPAGE is spelled out is that there might be code where we set VM_HUGEPAGE implicitly in the kernel)
> 
>>
>> Would need to be fixed up in a prior commit obviously.
>>
>>>    */
>>>   # define PR_THP_DISABLE_EXCEPT_ADVISED    (1 << 1)
>>>   #define PR_GET_THP_DISABLE    42
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 85252b468f80..ef5ccb0ec5d5 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>   {
>>>       const bool smaps = type == TVA_SMAPS;
>>>       const bool in_pf = type == TVA_PAGEFAULT;
>>> -    const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
>>> +    const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
>>> +    const bool enforce_sysfs = !forced_collapse;
>>
>> Can we just get rid of this enforce_sysfs altogether in patch 2/5 and use
>> forced_collapse?
> 
> Let's do that as a separate cleanup on top. I want least churn in that patch.
> 
> (had the same idea while writing that patch, but I have other things to focus on than cleaning up all this mess)
> 

I am happy to send this cleanup once this series makes it to mm-new and a new revision is not expected.

Thanks!
Usama

  parent reply	other threads:[~2025-08-01 11:26 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-31 12:27 [PATCH v2 0/5] prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised Usama Arif
2025-07-31 12:27 ` [PATCH v2 1/5] prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE Usama Arif
2025-07-31 12:40   ` Lorenzo Stoakes
2025-07-31 13:12     ` Usama Arif
2025-07-31 13:18       ` Lorenzo Stoakes
2025-07-31 13:20       ` David Hildenbrand
2025-07-31 15:13   ` Zi Yan
2025-07-31 12:27 ` [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum tva_type" for thp_vma_allowable_order*() Usama Arif
2025-07-31 14:00   ` Lorenzo Stoakes
2025-07-31 15:19     ` Zi Yan
2025-07-31 16:15     ` David Hildenbrand
2025-08-01 10:08       ` Lorenzo Stoakes
2025-07-31 19:20     ` Usama Arif
2025-08-01 10:12       ` Lorenzo Stoakes
2025-07-31 12:27 ` [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED Usama Arif
2025-07-31 14:38   ` Lorenzo Stoakes
2025-07-31 14:54     ` David Hildenbrand
2025-08-01 10:32       ` Lorenzo Stoakes
2025-08-01 11:26       ` Usama Arif [this message]
2025-07-31 12:27 ` [PATCH v2 4/5] selftests: prctl: introduce tests for disabling THPs completely Usama Arif
2025-07-31 19:42   ` David Hildenbrand
2025-08-01 11:42     ` Usama Arif
2025-08-01 12:53       ` David Hildenbrand
2025-07-31 12:27 ` [PATCH v2 5/5] selftests: prctl: introduce tests for disabling THPs except for madvise Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fc8b76c-825d-4aaa-98f4-ff64bb40497d@gmail.com \
    --to=usamaarif642@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=kernel-team@meta.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.