Linux-mm Archive on lore.kernel.org
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Dev Jain <dev.jain@arm.com>,
	akpm@linux-foundation.org, ljs@kernel.org, hughd@google.com,
	chrisl@kernel.org, kasong@tencent.com
Cc: riel@surriel.com, liam@infradead.org, vbabka@kernel.org,
	harry@kernel.org, jannh@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, qi.zheng@linux.dev,
	shakeel.butt@linux.dev, baohua@kernel.org,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com,
	pfalcato@suse.de, ryan.roberts@arm.com,
	anshuman.khandual@arm.com
Subject: Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one
Date: Mon, 11 May 2026 10:59:35 +0200
Message-ID: <f3ec1b78-e405-4b58-8588-faa97070cb3b@kernel.org>
In-Reply-To: <d2f11bbd-93ec-4a7e-9de3-ed4541914ad9@arm.com>

On 5/11/26 10:53, Dev Jain wrote:
> 
> 
> On 11/05/26 12:40 pm, David Hildenbrand (Arm) wrote:
>> On 5/6/26 11:44, Dev Jain wrote:
>>> Simplify the code by refactoring the folio_test_hugetlb() branch into
>>> a new function.
>>>
>>> While at it, convert BUG helpers to WARN helpers.
>>>
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>>  mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
>>>  1 file changed, 69 insertions(+), 48 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index a5f067a09de0f..a98acdea0530a 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>>>  				     FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
>>>  }
>>>  
>>> +/* Returns false if unmap needs to be aborted */
>>> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
>>
>> I'm wondering whether we should make it clearer that this belongs to the
>> try_to_unmap family by calling it
>>
>> 	ttu_hugetlb_folio
> 
> Yes I had suggested a ttu_ prefix somewhere else in the first version,
> Lorenzo didn't like it (or probably he didn't like that specific use
> of ttu):
> 
> https://lore.kernel.org/all/a8b06f36-98e1-435c-881f-67242bc4304a@lucifer.local/
> 
> Don't know about a better name other than "commit_ttu_lazyfree_folio" in
> that case, but for the hugetlb case, I like ttu_hugetlb_folio.

Yes, in particular, once we just process the whole hugetlb oddity in there.

I don't really care about the exact name as long as it's clear that this is not
something fairly generic.

[...]

>>
>>
>>> +	/*
>>> +	 * huge_pmd_unshare may unmap an entire PMD page.
>>> +	 * There is no way of knowing exactly which PMDs may
>>> +	 * be cached for this mm, so we must flush them all.
>>> +	 * start/end were already adjusted above to cover this
>>> +	 * range.
>>> +	 */
>>> +	flush_cache_range(vma, range->start, range->end);
>>> +
>>> +	/*
>>> +	 * To call huge_pmd_unshare, i_mmap_rwsem must be
>>> +	 * held in write mode.  Caller needs to explicitly
>>> +	 * do this outside rmap routines.
>>> +	 *
>>> +	 * We also must hold hugetlb vma_lock in write mode.
>>> +	 * Lock order dictates acquiring vma_lock BEFORE
>>> +	 * i_mmap_rwsem.  We can only try lock here and fail
>>> +	 * if unsuccessful.
>>> +	 */
>>> +	if (!folio_test_anon(folio)) {
>>> +		struct mmu_gather tlb;
>>> +
>>> +		VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
>>> +		if (!hugetlb_vma_trylock_write(vma)) {
>>> +			*exit_walk = true;
>>> +			return false;
>>> +		}
>>> +
>>> +		tlb_gather_mmu_vma(&tlb, vma);
>>> +		if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
>>> +			hugetlb_vma_unlock_write(vma);
>>> +			huge_pmd_unshare_flush(&tlb, vma);
>>> +			tlb_finish_mmu(&tlb);
>>> +			/*
>>> +			 * The PMD table was unmapped,
>>> +			 * consequently unmapping the folio.
>>> +			 */
>>> +			*exit_walk = true;
>>> +			return true;
>>> +		}
>>> +		hugetlb_vma_unlock_write(vma);
>>> +		tlb_finish_mmu(&tlb);
>>> +	}
>>> +	*pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
>>> +	if (pte_dirty(*pteval))
>>> +		folio_mark_dirty(folio);
>>> +
>>> +	*exit_walk = false;
>>> +	return true;
>>
>>
>> Can we instead introduce some enum that tells the caller how to proceed?
>>
>> I assume we have
>>
>> (a) Abort walk (ret = false + page_vma_mapped_walk_done())
>>
>> (b) Walk done (ret = true + page_vma_mapped_walk_done())
>>
>> (c) Continue walk (call page_vma_mapped_walk())
>>
>> enum ttu_walk_result {
>> 	TTU_WALK_CONTINUE,
>> 	TTU_WALK_ABORT,
>> 	TTU_WALK_DONE
>> };
> 
> I had replied to such a suggestion here:
> 
> https://lore.kernel.org/all/caa7c455-7472-48eb-a5dc-145e587d67ba@arm.com/
> 
> Probably we don't have any other solution : )

That looks like the right way to go. The boolean return is just nasty.
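
For illustration only, here is a small userspace sketch of how the three
outcomes (a), (b) and (c) listed above could be folded into the proposed
enum. The enum names come from the review; ttu_fold_result(),
ttu_dispatch() and struct walk_state are hypothetical stand-ins invented
for this sketch and are not part of the patch or of mm/rmap.c. It assumes,
as in the v3 helper, that a false return is always paired with
exit_walk == true.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome names as proposed in the review; not existing kernel API. */
enum ttu_walk_result {
	TTU_WALK_CONTINUE,	/* (c) keep calling page_vma_mapped_walk() */
	TTU_WALK_ABORT,		/* (a) ret = false, end the walk */
	TTU_WALK_DONE,		/* (b) ret = true, end the walk */
};

/* Hypothetical: fold the v3 (ret, exit_walk) pair into a single value. */
static enum ttu_walk_result ttu_fold_result(bool ret, bool exit_walk)
{
	if (!exit_walk) {
		/* Assumption: v3 never fails without also ending the walk. */
		assert(ret);
		return TTU_WALK_CONTINUE;
	}
	return ret ? TTU_WALK_DONE : TTU_WALK_ABORT;
}

/* Hypothetical stand-in for the caller's "ret" and walk state. */
struct walk_state {
	bool ret;		/* overall result, starts out true */
	bool walk_ended;	/* models page_vma_mapped_walk_done() */
};

/* Hypothetical caller-side dispatch, mirroring walk_abort / walk_done. */
static void ttu_dispatch(enum ttu_walk_result res, struct walk_state *ws)
{
	switch (res) {
	case TTU_WALK_ABORT:
		ws->ret = false;
		ws->walk_ended = true;
		break;
	case TTU_WALK_DONE:
		ws->walk_ended = true;
		break;
	default:
		/* TTU_WALK_CONTINUE: nothing to do, the loop goes on. */
		break;
	}
}
```

With a single return value the caller no longer has to read "ret" and
"exit_walk" together to tell an abort from a completed walk.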

>>> -					 */
>>> -					goto walk_done;
>>> -				}
>>> -				hugetlb_vma_unlock_write(vma);
>>> -				tlb_finish_mmu(&tlb);
>>> +			ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
>>> +						  flags, &pteval, &range,
>>> +						  &exit_walk);
>>> +			if (exit_walk) {
>>> +				page_vma_mapped_walk_done(&pvmw);
>>> +				break;
>>
>> In the old walk_abort case you wouldn't set ret = false?
> 
> ret will be set appropriately in unmap_hugetlb_folio.

Ah, right. Confusing ;)

>>
>> When returning the enum you could simply do something like
>>
>> switch (ret) {
>> case TTU_WALK_ABORT:
>> 	goto walk_abort;
>> case TTU_WALK_DONE:
>> 	goto walk_done;
>> default:
>> 	break;
>> }
>>
>>
>> While I like this patch, can we please just move all the hugetlb shite into this
>> helper function?
>>
>> Essentially, get rid of hugetlb special casing in the remainder of the function.
>>
>> That also makes the function name clearer (right now it's only doing a part of
>> hugetlb folio unmapping).
> 
> Okay I can try that. That would mean splitting the pvmw walk for hugetlb and
> non-hugetlb, but I suspect there would be very little code duplication.


Right, and it would also be clearer that the hugetlb path is really only
reached for hwpoison handling.


-- 
Cheers,

David


