Linux-mm Archive on lore.kernel.org
From: Dev Jain <dev.jain@arm.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	akpm@linux-foundation.org, ljs@kernel.org, hughd@google.com,
	chrisl@kernel.org, kasong@tencent.com
Cc: riel@surriel.com, liam@infradead.org, vbabka@kernel.org,
	harry@kernel.org, jannh@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, qi.zheng@linux.dev,
	shakeel.butt@linux.dev, baohua@kernel.org,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com,
	pfalcato@suse.de, ryan.roberts@arm.com,
	anshuman.khandual@arm.com
Subject: Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one
Date: Mon, 11 May 2026 14:23:24 +0530
Message-ID: <d2f11bbd-93ec-4a7e-9de3-ed4541914ad9@arm.com>
In-Reply-To: <5a4c3c3d-66c8-4ef6-bb6a-2ec0e32694a1@kernel.org>



On 11/05/26 12:40 pm, David Hildenbrand (Arm) wrote:
> On 5/6/26 11:44, Dev Jain wrote:
>> Simplify the code by refactoring the folio_test_hugetlb() branch into
>> a new function.
>>
>> While at it, convert BUG helpers to WARN helpers.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>  mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 69 insertions(+), 48 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a5f067a09de0f..a98acdea0530a 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>>  				     FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
>>  }
>>  
>> +/* Returns false if unmap needs to be aborted */
>> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
> 
> I'm wondering whether we should make it clearer that this belongs to the
> try_to_unmap family by calling it
> 
> 	ttu_hugetlb_folio

Yes, I had suggested a ttu_ prefix elsewhere in the first version, but
Lorenzo didn't like it (or perhaps he just didn't like that specific use
of ttu):

https://lore.kernel.org/all/a8b06f36-98e1-435c-881f-67242bc4304a@lucifer.local/

I don't know of a better name than "commit_ttu_lazyfree_folio" in
that case, but for the hugetlb case I like ttu_hugetlb_folio.

> 
>> +		struct folio *folio, struct page_vma_mapped_walk *pvmw,
>> +		struct page *page, enum ttu_flags flags, pte_t *pteval,
>> +		struct mmu_notifier_range *range, bool *exit_walk)
>> +{
>> +	/*
>> +	 * The try_to_unmap() is only passed a hugetlb page
>> +	 * in the case where the hugetlb page is poisoned.
>> +	 */
>> +	VM_WARN_ON_PAGE(!PageHWPoison(page), page);
> 
> IIRC, we will never actually get a tail page here.
> 
> Can we avoid passing a page by checking instead whether the hugetlb folio is
> marked as having a poisoned page?
> 
> See the folio_test_set_hwpoison() in hugetlb_update_hwpoison().
> 
> So you can simply use folio_test_hwpoison here instead.

Okay, I will confirm that and do this.

> 
> 
>> +	/*
>> +	 * huge_pmd_unshare may unmap an entire PMD page.
>> +	 * There is no way of knowing exactly which PMDs may
>> +	 * be cached for this mm, so we must flush them all.
>> +	 * start/end were already adjusted above to cover this
>> +	 * range.
>> +	 */
>> +	flush_cache_range(vma, range->start, range->end);
>> +
>> +	/*
>> +	 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> +	 * held in write mode.  Caller needs to explicitly
>> +	 * do this outside rmap routines.
>> +	 *
>> +	 * We also must hold hugetlb vma_lock in write mode.
>> +	 * Lock order dictates acquiring vma_lock BEFORE
>> +	 * i_mmap_rwsem.  We can only try lock here and fail
>> +	 * if unsuccessful.
>> +	 */
>> +	if (!folio_test_anon(folio)) {
>> +		struct mmu_gather tlb;
>> +
>> +		VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
>> +		if (!hugetlb_vma_trylock_write(vma)) {
>> +			*exit_walk = true;
>> +			return false;
>> +		}
>> +
>> +		tlb_gather_mmu_vma(&tlb, vma);
>> +		if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
>> +			hugetlb_vma_unlock_write(vma);
>> +			huge_pmd_unshare_flush(&tlb, vma);
>> +			tlb_finish_mmu(&tlb);
>> +			/*
>> +			 * The PMD table was unmapped,
>> +			 * consequently unmapping the folio.
>> +			 */
>> +			*exit_walk = true;
>> +			return true;
>> +		}
>> +		hugetlb_vma_unlock_write(vma);
>> +		tlb_finish_mmu(&tlb);
>> +	}
>> +	*pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
>> +	if (pte_dirty(*pteval))
>> +		folio_mark_dirty(folio);
>> +
>> +	*exit_walk = false;
>> +	return true;
> 
> 
> Can we instead introduce some enum that tells the caller how to proceed?
> 
> I assume we have
> 
> (a) Abort walk (ret = false + page_vma_mapped_walk_done())
> 
> (b) Walk done (ret = true + page_vma_mapped_walk_done())
> 
> (c) Continue walk (call page_vma_mapped_walk())
> 
> enum ttu_walk_result {
> 	TTU_WALK_CONTINUE,
> 	TTU_WALK_ABORT,
> 	TTU_WALK_DONE
> }

I had replied to a similar suggestion here:

https://lore.kernel.org/all/caa7c455-7472-48eb-a5dc-145e587d67ba@arm.com/

We probably don't have any other solution :)
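
For reference, a minimal userspace sketch of what the enum-based flow
might look like. The enum and its values are taken from the suggestion
quoted above; struct mock_state, mock_ttu_hugetlb() and mock_dispatch()
are hypothetical stand-ins for illustration only, not kernel code, and
the real helper would of course take the vma, folio and pvmw.

```c
#include <assert.h>
#include <stdbool.h>

/* Names taken from the suggestion quoted above. */
enum ttu_walk_result {
	TTU_WALK_CONTINUE,
	TTU_WALK_ABORT,
	TTU_WALK_DONE,
};

/* Hypothetical: outcomes of the vma trylock and of huge_pmd_unshare(). */
struct mock_state {
	bool trylock_ok;
	bool pmd_unshared;
};

/* Hypothetical helper mirroring the three exits of the hugetlb branch. */
static enum ttu_walk_result mock_ttu_hugetlb(const struct mock_state *s)
{
	if (!s->trylock_ok)
		return TTU_WALK_ABORT;	/* ret = false, stop the walk */
	if (s->pmd_unshared)
		return TTU_WALK_DONE;	/* ret unchanged, stop the walk */
	return TTU_WALK_CONTINUE;	/* PTE cleared, keep walking */
}

/*
 * Caller side, standing in for the goto walk_abort / goto walk_done
 * labels: updates *ret and returns true if the walk must stop.
 */
static bool mock_dispatch(enum ttu_walk_result res, bool *ret)
{
	assert(ret);

	switch (res) {
	case TTU_WALK_ABORT:
		*ret = false;
		return true;
	case TTU_WALK_DONE:
		return true;
	default:
		return false;
	}
}
```

With ret initialized to true, as try_to_unmap_one() does, only the
abort case changes it, so the helper no longer needs a separate bool
out-parameter.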

> 
>> +}
>> +
>>  /*
>>   * @arg: enum ttu_flags will be passed to this argument
>>   */
>> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>  				 PageAnonExclusive(subpage);
>>  
>>  		if (folio_test_hugetlb(folio)) {
>> -			bool anon = folio_test_anon(folio);
>> -
>> -			/*
>> -			 * The try_to_unmap() is only passed a hugetlb page
>> -			 * in the case where the hugetlb page is poisoned.
>> -			 */
>> -			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
>> -			/*
>> -			 * huge_pmd_unshare may unmap an entire PMD page.
>> -			 * There is no way of knowing exactly which PMDs may
>> -			 * be cached for this mm, so we must flush them all.
>> -			 * start/end were already adjusted above to cover this
>> -			 * range.
>> -			 */
>> -			flush_cache_range(vma, range.start, range.end);
>> +			bool exit_walk;
>>  
>> -			/*
>> -			 * To call huge_pmd_unshare, i_mmap_rwsem must be
>> -			 * held in write mode.  Caller needs to explicitly
>> -			 * do this outside rmap routines.
>> -			 *
>> -			 * We also must hold hugetlb vma_lock in write mode.
>> -			 * Lock order dictates acquiring vma_lock BEFORE
>> -			 * i_mmap_rwsem.  We can only try lock here and fail
>> -			 * if unsuccessful.
>> -			 */
>> -			if (!anon) {
>> -				struct mmu_gather tlb;
>> -
>> -				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> -				if (!hugetlb_vma_trylock_write(vma))
>> -					goto walk_abort;
>> -
>> -				tlb_gather_mmu_vma(&tlb, vma);
>> -				if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
>> -					hugetlb_vma_unlock_write(vma);
>> -					huge_pmd_unshare_flush(&tlb, vma);
>> -					tlb_finish_mmu(&tlb);
>> -					/*
>> -					 * The PMD table was unmapped,
>> -					 * consequently unmapping the folio.
>> -					 */
>> -					goto walk_done;
>> -				}
>> -				hugetlb_vma_unlock_write(vma);
>> -				tlb_finish_mmu(&tlb);
>> +			ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
>> +						  flags, &pteval, &range,
>> +						  &exit_walk);
>> +			if (exit_walk) {
>> +				page_vma_mapped_walk_done(&pvmw);
>> +				break;
> 
> In the old walk_abort case you wouldn't set ret = false?

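ret is set from the return value of unmap_hugetlb_folio(): the helper
returns false on the trylock failure path (the old walk_abort case) and
true otherwise.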
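
To spell out the contract in the posted version, here is a minimal
userspace sketch of how the (return value, *exit_walk) pair encodes the
three outcomes. mock_unmap_hugetlb() and its two bool inputs are
hypothetical; they only model the file-backed path, where the trylock
and huge_pmd_unshare() are attempted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mock of the exit paths of unmap_hugetlb_folio() as
 * posted in this patch. The return value becomes "ret" in the caller;
 * *exit_walk tells the caller whether to stop the pvmw walk.
 */
static bool mock_unmap_hugetlb(bool trylock_ok, bool pmd_unshared,
			       bool *exit_walk)
{
	assert(exit_walk);

	if (!trylock_ok) {
		*exit_walk = true;
		return false;	/* was: goto walk_abort */
	}
	if (pmd_unshared) {
		*exit_walk = true;
		return true;	/* was: goto walk_done */
	}
	*exit_walk = false;
	return true;		/* PTE cleared, caller carries on */
}
```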
> 
> When returning the enum you could simply do something like
> 
> switch (ret) {
> case TTU_WALK_ABORT:
> 	goto walk_abort;
> case TTU_WALK_DONE:
> 	goto walk_done;
> default:
> 	break;
> }
> 
> 
> While I like this patch, can we please just move all the hugetlb shite into this
> helper function?
> 
> Essentially, get rid of hugetlb special casing in the remainder of the function.
> 
> That also makes the function name clearer (right now it's only doing a part of
> hugetlb folio unmapping).

Okay, I can try that. It would mean splitting the pvmw walk into hugetlb and
non-hugetlb paths, but I suspect it would involve very little code duplication.

> 


