linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Usama Arif <usamaarif642@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	corbet@lwn.net, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, hannes@cmpxchg.org, baohua@kernel.org,
	shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
	laoar.shao@gmail.com, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, vbabka@suse.cz,
	jannh@google.com, Arnd Bergmann <arnd@arndb.de>,
	sj@kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum tva_type" for thp_vma_allowable_order*()
Date: Thu, 31 Jul 2025 18:15:35 +0200	[thread overview]
Message-ID: <09acd558-19b9-4964-823b-502b9044f954@redhat.com> (raw)
In-Reply-To: <c44cb864-3b36-4aa2-8040-60c97bfdc28e@lucifer.local>

On 31.07.25 16:00, Lorenzo Stoakes wrote:
> On Thu, Jul 31, 2025 at 01:27:19PM +0100, Usama Arif wrote:
>> From: David Hildenbrand <david@redhat.com>
>>
>> Describing the context through a type is much clearer, and good enough
>> for our case.

Just for the other patch, I'll let Usama take it from here, just a bunch 
of comments.

> 
> This is pretty bare bones. What context, what type? Under what
> circumstances?
> 
> This also is missing detail on the key difference here - that actually it
> turns out we _don't_ need these to be flags, rather we can have _distinct_
> modes which are clearer.
> 
> I'd say something like:
> 
> 	when determining which THP orders are eligiible for a VMA mapping,
> 	we have previously specified tva_flags, however it turns out it is
> 	really not necessary to treat these as flags.
> 
> 	Rather, we distinguish between distinct modes.
> 
> 	The only case where we previously combined flags was with
> 	TVA_ENFORCE_SYSFS, but we can avoid this by observing that this is
> 	the default, except for MADV_COLLAPSE or an edge cases in
> 	collapse_pte_mapped_thp() and hugepage_vma_revalidate(), and adding
> 	a mode specifically for this case - TVA_FORCED_COLLAPSE.
> 
> 	... stuff about the different modes...
> 
>>
>> We have:
>> * smaps handling for showing "THPeligible"
>> * Pagefault handling
>> * khugepaged handling
>> * Forced collapse handling: primarily MADV_COLLAPSE, but one other odd case
> 
> Can we actually state what this case is? I mean I guess a handwave in the
> form of 'an edge case in collapse_pte_mapped_thp()' will do also.

Yeah, something like that. I think we also call it when we previously 
checked that there is a THP and that we might be allowed to collapse. 
E.g., collapse_pte_mapped_thp() is also called from khugepaged code 
where we already checked the allowed order.

> 
> Hmm actually we do weird stuff with this so maybe just handwave.
> 
> Like uprobes calls collapse_pte_mapped_thp()... :/ I'm not sure this 'If we
> are here, we've succeeded in replacing all the native pages in the page
> cache with a single hugepage.' comment is even correct.

I think in all these cases we already have a THP and want to force that 
collapse in the page table.

[...]

>>
>> Really, we want to ignore sysfs only when we are forcing a collapse
>> through MADV_COLLAPSE, otherwise we want to enforce.
> 
> I'd say 'ignoring this edge case, ...'
> 
> I think the clearest thing might be to literally list the before/after
> like:
> 
> * TVA_SMAPS | TVA_ENFORCE_SYSFS -> TVA_SMAPS
> * TVA_IN_PF | TVA_ENFORCE_SYSFS -> TVA_PAGEFAULT
> * TVA_ENFORCE_SYSFS             -> TVA_KHUGEPAGED
> * 0                             -> TVA_FORCED_COLLAPSE
> 

That makes sense.

>>
>> With this change, we immediately know if we are in the forced collapse
>> case, which will be valuable next.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Acked-by: Usama Arif <usamaarif642@gmail.com>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> 
> Overall this is a great cleanup, some various nits however.
> 
>> ---
>>   fs/proc/task_mmu.c      |  4 ++--
>>   include/linux/huge_mm.h | 30 ++++++++++++++++++------------
>>   mm/huge_memory.c        |  8 ++++----
>>   mm/khugepaged.c         | 18 +++++++++---------
>>   mm/memory.c             | 14 ++++++--------
>>   5 files changed, 39 insertions(+), 35 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 3d6d8a9f13fc..d440df7b3d59 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1293,8 +1293,8 @@ static int show_smap(struct seq_file *m, void *v)
>>   	__show_smap(m, &mss, false);
>>
>>   	seq_printf(m, "THPeligible:    %8u\n",
>> -		   !!thp_vma_allowable_orders(vma, vma->vm_flags,
>> -			   TVA_SMAPS | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL));
>> +		   !!thp_vma_allowable_orders(vma, vma->vm_flags, TVA_SMAPS,
>> +					      THP_ORDERS_ALL));
> 
> This !! is so gross, wonder if we could have a bool wrapper. But not a big
> deal.
> 
> I also sort of _hate_ the smaps flag anyway, invoking this 'allowable
> orders' thing just for smaps reporting with maybe some minor delta is just
> odd.
> 
> Something like `bool vma_has_thp_allowed_orders(struct vm_area_struct
> *vma);` would be nicer.
> 
> Anyway thoughts for another time... :)

Yeah, that's not the only nasty bit here ... :)

> 
>>
>>   	if (arch_pkeys_enabled())
>>   		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 71db243a002e..b0ff54eee81c 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -94,12 +94,15 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
>>   #define THP_ORDERS_ALL	\
>>   	(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
>>
>> -#define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
> 
> Dumb question, but what does 'TVA' stand for? :P

Whoever came up with that probably used the function name where this is 
passed in

thp_vma_allowable_orders()

> 
>> -#define TVA_IN_PF		(1 << 1)	/* Page fault handler */
>> -#define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
>> +enum tva_type {
>> +	TVA_SMAPS,		/* Exposing "THPeligible:" in smaps. */
> 
> How I hate this flag (just an observation...)
> 
>> +	TVA_PAGEFAULT,		/* Serving a page fault. */
>> +	TVA_KHUGEPAGED,		/* Khugepaged collapse. */
> 
> This is equivalent to the TVA_ENFORCE_SYSFS case before, sort of a default
> I guess, but actually quite nice to add the context that it's sourced from
> khugepaged - I assume this will always be the case when specified?
> 
>> +	TVA_FORCED_COLLAPSE,	/* Forced collapse (i.e., MADV_COLLAPSE). */
> 
> Would put 'e.g.' here, then that allows 'space' for the edge case...

Makes sense.

> 
>> +};
>>
>> -#define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
>> -	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
>> +#define thp_vma_allowable_order(vma, vm_flags, type, order) \
>> +	(!!thp_vma_allowable_orders(vma, vm_flags, type, BIT(order)))
> 
> Nit, but maybe worth keeping tva_ prefix - tva_type - here just so it's
> clear what type it refers to.
> 
> But not end of the world.
> 
> Same comment goes for param names below etc.

No strong opinion, but I prefer to drop the prefix when it can be 
deduced from the type and we are inside of the very function that 
essentially defines these types (tva prefix is implicit, no other type 
applies).

These should probably just be inline functions at some point with proper 
types and doc (separate patch uin the future, of course).

[...]

>> +++ b/mm/khugepaged.c
>> @@ -474,8 +474,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>>   {
>>   	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>>   	    hugepage_pmd_enabled()) {
>> -		if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
>> -					    PMD_ORDER))
>> +		if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
>>   			__khugepaged_enter(vma->vm_mm);
>>   	}
>>   }
>> @@ -921,7 +920,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>>   				   struct collapse_control *cc)
>>   {
>>   	struct vm_area_struct *vma;
>> -	unsigned long tva_flags = cc->is_khugepaged ? TVA_ENFORCE_SYSFS : 0;
>> +	enum tva_type tva_type = cc->is_khugepaged ? TVA_KHUGEPAGED :
>> +				 TVA_FORCED_COLLAPSE;
> 
> This is great, this is so much clearer.
> 
> A nit though, I mean I come back to my 'type' vs 'tva_type' nit above, this
> is inconsistent, so we should choose one approach and stick with it.

This is outside of the function, so I would prefer to keep it here, but 
no stong opinion.

> 
>>
>>   	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
>>   		return SCAN_ANY_PROCESS;
>> @@ -932,7 +932,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>>
>>   	if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
>>   		return SCAN_ADDRESS_RANGE;
>> -	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_flags, PMD_ORDER))
>> +	if (!thp_vma_allowable_order(vma, vma->vm_flags, tva_type, PMD_ORDER))
>>   		return SCAN_VMA_CHECK;
>>   	/*
>>   	 * Anon VMA expected, the address may be unmapped then
>> @@ -1532,9 +1532,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
>>   	 * in the page cache with a single hugepage. If a mm were to fault-in
>>   	 * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>   	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>> -	 * analogously elide sysfs THP settings here.
>> +	 * analogously elide sysfs THP settings here and pretend we are
>> +	 * collapsing.
> 
> I think saying pretending here is potentially confusing, maybe worth saying
> 'force collapse'?

Makes sense.

-- 
Cheers,

David / dhildenb


  parent reply	other threads:[~2025-07-31 16:15 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-31 12:27 [PATCH v2 0/5] prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised Usama Arif
2025-07-31 12:27 ` [PATCH v2 1/5] prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE Usama Arif
2025-07-31 12:40   ` Lorenzo Stoakes
2025-07-31 13:12     ` Usama Arif
2025-07-31 13:18       ` Lorenzo Stoakes
2025-07-31 13:20       ` David Hildenbrand
2025-07-31 15:13   ` Zi Yan
2025-07-31 12:27 ` [PATCH v2 2/5] mm/huge_memory: convert "tva_flags" to "enum tva_type" for thp_vma_allowable_order*() Usama Arif
2025-07-31 14:00   ` Lorenzo Stoakes
2025-07-31 15:19     ` Zi Yan
2025-07-31 16:15     ` David Hildenbrand [this message]
2025-08-01 10:08       ` Lorenzo Stoakes
2025-07-31 19:20     ` Usama Arif
2025-08-01 10:12       ` Lorenzo Stoakes
2025-07-31 12:27 ` [PATCH v2 3/5] mm/huge_memory: treat MADV_COLLAPSE as an advise with PR_THP_DISABLE_EXCEPT_ADVISED Usama Arif
2025-07-31 14:38   ` Lorenzo Stoakes
2025-07-31 14:54     ` David Hildenbrand
2025-08-01 10:32       ` Lorenzo Stoakes
2025-08-01 11:26       ` Usama Arif
2025-07-31 12:27 ` [PATCH v2 4/5] selftests: prctl: introduce tests for disabling THPs completely Usama Arif
2025-07-31 19:42   ` David Hildenbrand
2025-08-01 11:42     ` Usama Arif
2025-08-01 12:53       ` David Hildenbrand
2025-07-31 12:27 ` [PATCH v2 5/5] selftests: prctl: introduce tests for disabling THPs except for madvise Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=09acd558-19b9-4964-823b-502b9044f954@redhat.com \
    --to=david@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=corbet@lwn.net \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=kernel-team@meta.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=sj@kernel.org \
    --cc=surenb@google.com \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).