linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Usama Arif <usamaarif642@gmail.com>
To: David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	corbet@lwn.net, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org,
	shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
	laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
	jannh@google.com, Arnd Bergmann <arnd@arndb.de>,
	sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED
Date: Fri, 1 Aug 2025 12:26:03 +0100	[thread overview]
Message-ID: <3fc8b76c-825d-4aaa-98f4-ff64bb40497d@gmail.com> (raw)
In-Reply-To: <747509a6-8493-46c3-99d4-38b53a8a7504@redhat.com>



On 31/07/2025 15:54, David Hildenbrand wrote:
> On 31.07.25 16:38, Lorenzo Stoakes wrote:
>> Nits on subject:
>>
>> - It's >75 chars
> 
> No big deal. If we cna come up with something shorter, good.

Changed it to "mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED"
for the next revision. That would be 73 chars :)

> 
>> - advise is the verb, advice is the noun.
> 
> Yeah.
> 
>>
>> On Thu, Jul 31, 2025 at 01:27:20PM +0100, Usama Arif wrote:
>>> From: David Hildenbrand <david@redhat.com>
>>>
>>> Let's allow for making MADV_COLLAPSE succeed on areas that neither have
>>> VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled
>>> unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED).
>>
>> Hmm, I'm not sure about this.
>>
>> So far this prctl() has been the only way to override MADV_COLLAPSE
>> behaviour, but now we're allowing for this one case to not.
> 
> This is not an override really. prctl() disallowed MADV_COLLAPSE, but in the new mode we don't want that anymore.
> 
>> > I suppose the precedent is that MADV_COLLAPSE overrides 'madvise' sysfs
>> behaviour.
>> > I suppose what saves us here is 'advised' can be read to mean either
>> MADV_HUGEPAGE or MADV_COLLAPSE.
>> > And yes, MADV_COLLAPSE is clearly the user requesting this behaviour.
> 
> Exactly.
> 
>>
>> I think the vagueness here is one that already existed, because one could
>> perfectly one have expected MADV_COLLAPSE to obey sysfs and require
>> MADV_HUGEPAGE to have been applied, but of course this is not the case.
> 
> Yes.
> 
>>
>> OK so fine.
>>
>> BUT.
>>
>> I think the MADV_COLLAPSE man page will need to be updated to mention this.
>>
> 
> Yes.
> 

Thanks, yes will do and send this along with changes to prctl man page after this
makes into mm-stable.

>> And I REALLY think we should update the THP doc too to mention all these
>> prctl() modes.
>>
>> I'm not sure we cover that right now _at all_ and obviously we should
>> describe the new flags.
>>
>> Usama - can you add a patch to this series to do that?
> 
> Good point, let's document the interaction with prctl().
> 

I have added the following patch for the next revision. I know that a lot
of this will be in the man page as well, but I have gone the way of being very
very explicit of what are all the possible calls that can be made (hopefully thats
the right approach :))


commit 5f290d29741a514d0861d0f99c8b860ba6af9c37
Author: Usama Arif <usamaarif642@gmail.com>
Date:   Fri Aug 1 12:05:49 2025 +0100

    docs: transhuge: document process level THP controls
    
    This includes the PR_SET_THP_DISABLE/PR_GET_THP_DISABLE pair of
    prctl calls as well the newly introduced PR_THP_DISABLE_EXCEPT_ADVISED
    flag for the PR_SET_THP_DISABLE prctl call.
    
    Signed-off-by: Usama Arif <usamaarif642@gmail.com>

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 370fba113460..cce0a99beac8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -225,6 +225,45 @@ to "always" or "madvise"), and it'll be automatically shutdown when
 PMD-sized THP is disabled (when both the per-size anon control and the
 top-level control are "never")
 
+process THP controls
+--------------------
+
+A process can control its own THP behaviour using the ``PR_SET_THP_DISABLE``
+and ``PR_GET_THP_DISABLE`` pair of prctl(2) calls. These calls support the
+following arguments::
+
+       prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0):
+               This will set the MMF_DISABLE_THP_COMPLETELY mm flag which will
+               result in no THPs being faulted in or collapsed, irrespective
+               of global THP controls. This flag and hence the behaviour is
+               inherited across fork(2) and execve(2).
+
+       prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0):
+               This will set the MMF_DISABLE_THP_EXCEPT_ADVISED mm flag which
+               will result in THPs being faulted in or collapsed only for
+               the following cases:
+               - Global THP controls are set to "always" or "madvise" and
+                 the process has madvised the region with either MADV_HUGEPAGE
+                 or MADV_COLLAPSE.
+               - Global THP controls is set to "never" and the process has
+                 madvised the region with MADV_COLLAPSE.
+               This flag and hence the behaviour is inherited across fork(2)
+               and execve(2).
+
+       prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0):
+               This will clear the MMF_DISABLE_THP_COMPLETELY and
+               MMF_DISABLE_THP_EXCEPT_ADVISED mm flags. The process will
+               behave according to the global THP controls. This behaviour
+               will be inherited across fork(2) and execve(2).
+
+       prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0):
+               This will return the THP disable mm flag status of the process
+               that was set by prctl(PR_SET_THP_DISABLE, ...).
+               i.e.
+               - 1 if MMF_DISABLE_THP_COMPLETELY flag is set
+               - 3 if MMF_DISABLE_THP_EXCEPT_ADVISED flag is set
+               - 0 otherwise.
+
 Khugepaged controls
 -------------------


>>
>>>
>>> MADV_COLLAPSE is a clear advise that we want to collapse.
>>
>> advise -> advice.
>>
>>>
>>> Note that we still respect the VM_NOHUGEPAGE flag, just like
>>> MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only
>>> refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED.
>>
>> You also need to mention the shmem change you've made I think.
> 
> Yes.
> 
>> >>
>>> Co-developed-by: Usama Arif <usamaarif642@gmail.com>
>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>   include/linux/huge_mm.h    | 8 +++++++-
>>>   include/uapi/linux/prctl.h | 2 +-
>>>   mm/huge_memory.c           | 5 +++--
>>>   mm/memory.c                | 6 ++++--
>>>   mm/shmem.c                 | 2 +-
>>>   5 files changed, 16 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index b0ff54eee81c..aeaf93f8ac2e 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -329,7 +329,7 @@ struct thpsize {
>>>    * through madvise or prctl.
>>>    */
>>>   static inline bool vma_thp_disabled(struct vm_area_struct *vma,
>>> -        vm_flags_t vm_flags)
>>> +        vm_flags_t vm_flags, bool forced_collapse)
>>>   {
>>>       /* Are THPs disabled for this VMA? */
>>>       if (vm_flags & VM_NOHUGEPAGE)
>>> @@ -343,6 +343,12 @@ static inline bool vma_thp_disabled(struct vm_area_struct *vma,
>>>        */
>>>       if (vm_flags & VM_HUGEPAGE)
>>>           return false;
>>> +    /*
>>> +     * Forcing a collapse (e.g., madv_collapse), is a clear advise to
>>
>> advise -> advice.
>>
>>> +     * use THPs.
>>> +     */
>>> +    if (forced_collapse)
>>> +        return false;
>>>       return test_bit(MMF_DISABLE_THP_EXCEPT_ADVISED, &vma->vm_mm->flags);
>>>   }
>>>
>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>>> index 9c1d6e49b8a9..ee4165738779 100644
>>> --- a/include/uapi/linux/prctl.h
>>> +++ b/include/uapi/linux/prctl.h
>>> @@ -185,7 +185,7 @@ struct prctl_mm_map {
>>>   #define PR_SET_THP_DISABLE    41
>>>   /*
>>>    * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
>>> - * VM_HUGEPAGE).
>>> + * VM_HUGEPAGE / MADV_COLLAPSE).
>>
>> This is confusing you're mixing VMA flags with MADV ones... maybe just
>> stick to madvise ones, or add extra context around VM_HUGEPAGE bit?
> 
> I don't see anything confusing here, really.
> 
> But if it helps you, we can do
>     (e.g., MADV_HUGEPAGE / VM_HUGEPAGE, MADV_COLLAPSE).
> 
> (reason VM_HUGEPAGE is spelled out is that there might be code where we set VM_HUGEPAGE implicitly in the kernel)
> 
>>
>> Would need to be fixed up in a prior commit obviously.
>>
>>>    */
>>>   # define PR_THP_DISABLE_EXCEPT_ADVISED    (1 << 1)
>>>   #define PR_GET_THP_DISABLE    42
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 85252b468f80..ef5ccb0ec5d5 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -104,7 +104,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>   {
>>>       const bool smaps = type == TVA_SMAPS;
>>>       const bool in_pf = type == TVA_PAGEFAULT;
>>> -    const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
>>> +    const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
>>> +    const bool enforce_sysfs = !forced_collapse;
>>
>> Can we just get rid of this enforce_sysfs altogether in patch 2/5 and use
>> forced_collapse?
> 
> Let's do that as a separate cleanup on top. I want least churn in that patch.
> 
> (had the same idea while writing that patch, but I have other things to focus on than cleaning up all this mess)
> 

I am happy to send this cleanup once this series makes it to mm-new and a new revision is not expected.

Thanks!
Usama

  parent reply	other threads:[~2025-08-01 11:26 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-31 12:27 [PATCH v2 0/5] prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised Usama Arif
2025-07-31 12:27 ` [PATCH v2 1/5] prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE Usama Arif
2025-07-31 12:40   ` Lorenzo Stoakes
2025-07-31 13:12     ` Usama Arif
2025-07-31 13:18       ` Lorenzo Stoakes
2025-07-31 13:20       ` David Hildenbrand
2025-07-31 15:13   ` Zi Yan
2025-07-31 12:27 ` [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum tva_type" for thp_vma_allowable_order*() Usama Arif
2025-07-31 14:00   ` Lorenzo Stoakes
2025-07-31 15:19     ` Zi Yan
2025-07-31 16:15     ` David Hildenbrand
2025-08-01 10:08       ` Lorenzo Stoakes
2025-07-31 19:20     ` Usama Arif
2025-08-01 10:12       ` Lorenzo Stoakes
2025-07-31 12:27 ` [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED Usama Arif
2025-07-31 14:38   ` Lorenzo Stoakes
2025-07-31 14:54     ` David Hildenbrand
2025-08-01 10:32       ` Lorenzo Stoakes
2025-08-01 11:26       ` Usama Arif [this message]
2025-07-31 12:27 ` [PATCH v2 4/5] selftests: prctl: introduce tests for disabling THPs completely Usama Arif
2025-07-31 19:42   ` David Hildenbrand
2025-08-01 11:42     ` Usama Arif
2025-08-01 12:53       ` David Hildenbrand
2025-07-31 12:27 ` [PATCH v2 5/5] selftests: prctl: introduce tests for disabling THPs except for madvise Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3fc8b76c-825d-4aaa-98f4-ff64bb40497d@gmail.com \
    --to=usamaarif642@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=kernel-team@meta.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).