From: Dev Jain <dev.jain@arm.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
akpm@linux-foundation.org, ljs@kernel.org, hughd@google.com,
chrisl@kernel.org, kasong@tencent.com
Cc: riel@surriel.com, liam@infradead.org, vbabka@kernel.org,
harry@kernel.org, jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, qi.zheng@linux.dev,
shakeel.butt@linux.dev, baohua@kernel.org,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com,
pfalcato@suse.de, ryan.roberts@arm.com,
anshuman.khandual@arm.com
Subject: Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one
Date: Mon, 11 May 2026 14:23:24 +0530
Message-ID: <d2f11bbd-93ec-4a7e-9de3-ed4541914ad9@arm.com>
In-Reply-To: <5a4c3c3d-66c8-4ef6-bb6a-2ec0e32694a1@kernel.org>
On 11/05/26 12:40 pm, David Hildenbrand (Arm) wrote:
> On 5/6/26 11:44, Dev Jain wrote:
>> Simplify the code by refactoring the folio_test_hugetlb() branch into
>> a new function.
>>
>> While at it, convert BUG helpers to WARN helpers.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
>> 1 file changed, 69 insertions(+), 48 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a5f067a09de0f..a98acdea0530a 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
>> }
>>
>> +/* Returns false if unmap needs to be aborted */
>> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
>
> I'm wondering whether we should make it clearer that this belongs to the
> try_to_unmap family by calling it
>
> ttu_hugetlb_folio
Yes, I had suggested a ttu_ prefix elsewhere in the first version, but
Lorenzo didn't like it (or perhaps he just didn't like that specific use
of ttu):
https://lore.kernel.org/all/a8b06f36-98e1-435c-881f-67242bc4304a@lucifer.local/
I don't know of a better name than "commit_ttu_lazyfree_folio" for
that case, but for the hugetlb case I like ttu_hugetlb_folio.
>
>> + struct folio *folio, struct page_vma_mapped_walk *pvmw,
>> + struct page *page, enum ttu_flags flags, pte_t *pteval,
>> + struct mmu_notifier_range *range, bool *exit_walk)
>> +{
>> + /*
>> + * The try_to_unmap() is only passed a hugetlb page
>> + * in the case where the hugetlb page is poisoned.
>> + */
>> + VM_WARN_ON_PAGE(!PageHWPoison(page), page);
>
> IIRC, we will never actually get a tail page here.
>
> Can we avoid passing a page by checking instead whether the hugetlb folio is
> marked as having a poisoned page?
>
> See the folio_test_set_hwpoison() in hugetlb_update_hwpoison().
>
> So you can simply use folio_test_hwpoison here instead.
Okay I will confirm and do this.
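
To illustrate the idea of a folio-level check, here is a small userspace toy model (not kernel code). The mock_ types and helpers are hypothetical stand-ins; the only assumption carried over from the discussion above is that the poison marker for a hugetlb folio is recorded once on the folio (modelled here as the head page's flags), so the helper would not need a struct page argument.

```c
#include <stdbool.h>

/* Hypothetical flag bit for this toy model; not the kernel's PG_hwpoison. */
#define MOCK_PG_HWPOISON (1UL << 0)

struct mock_page {
	unsigned long flags;
};

/* Toy folio: pages[0] plays the role of the head page. */
struct mock_folio {
	struct mock_page pages[4];
};

/* Set the folio-level marker and report whether it was already set. */
static bool mock_folio_test_set_hwpoison(struct mock_folio *folio)
{
	bool was_set = (folio->pages[0].flags & MOCK_PG_HWPOISON) != 0;

	folio->pages[0].flags |= MOCK_PG_HWPOISON;
	return was_set;
}

/* Folio-level query: no struct page argument is needed. */
static bool mock_folio_test_hwpoison(const struct mock_folio *folio)
{
	return (folio->pages[0].flags & MOCK_PG_HWPOISON) != 0;
}
```

With a check of this shape, the sanity check in the helper could take only the folio, assuming it is confirmed that the folio-level flag is always set by the time try_to_unmap() sees a hugetlb folio.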
>
>
>> + /*
>> + * huge_pmd_unshare may unmap an entire PMD page.
>> + * There is no way of knowing exactly which PMDs may
>> + * be cached for this mm, so we must flush them all.
>> + * start/end were already adjusted above to cover this
>> + * range.
>> + */
>> + flush_cache_range(vma, range->start, range->end);
>> +
>> + /*
>> + * To call huge_pmd_unshare, i_mmap_rwsem must be
>> + * held in write mode. Caller needs to explicitly
>> + * do this outside rmap routines.
>> + *
>> + * We also must hold hugetlb vma_lock in write mode.
>> + * Lock order dictates acquiring vma_lock BEFORE
>> + * i_mmap_rwsem. We can only try lock here and fail
>> + * if unsuccessful.
>> + */
>> + if (!folio_test_anon(folio)) {
>> + struct mmu_gather tlb;
>> +
>> + VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
>> + if (!hugetlb_vma_trylock_write(vma)) {
>> + *exit_walk = true;
>> + return false;
>> + }
>> +
>> + tlb_gather_mmu_vma(&tlb, vma);
>> + if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
>> + hugetlb_vma_unlock_write(vma);
>> + huge_pmd_unshare_flush(&tlb, vma);
>> + tlb_finish_mmu(&tlb);
>> + /*
>> + * The PMD table was unmapped,
>> + * consequently unmapping the folio.
>> + */
>> + *exit_walk = true;
>> + return true;
>> + }
>> + hugetlb_vma_unlock_write(vma);
>> + tlb_finish_mmu(&tlb);
>> + }
>> + *pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
>> + if (pte_dirty(*pteval))
>> + folio_mark_dirty(folio);
>> +
>> + *exit_walk = false;
>> + return true;
>
>
> Can we instead introduce some enum that tells the caller how to proceed?
>
> I assume we have
>
> (a) Abort walk (ret = false + page_vma_mapped_walk_done())
>
> (b) Walk done (ret = true + page_vma_mapped_walk_done())
>
> (c) Continue walk (call page_vma_mapped_walk())
>
> enum ttu_walk_result {
> TTU_WALK_CONTINUE,
> TTU_WALK_ABORT,
> TTU_WALK_DONE
> }
I replied to a similar suggestion here:
https://lore.kernel.org/all/caa7c455-7472-48eb-a5dc-145e587d67ba@arm.com/
We probably don't have any other solution :)
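
For reference, the three outcomes listed above can be sketched in plain userspace C (not kernel code). The enum follows the names proposed in the review; ttu_result_from_pair() and ttu_result_to_ret() are hypothetical helpers that exist only to show how the (return value, *exit_walk) pair used in v3 lines up with the enum.

```c
#include <stdbool.h>

/* Tri-state result, using the names proposed in the review. */
enum ttu_walk_result {
	TTU_WALK_CONTINUE,	/* PTE cleared, carry on with the walk */
	TTU_WALK_ABORT,		/* trylock failed: ret = false, stop the walk */
	TTU_WALK_DONE,		/* PMD table unshared: ret = true, stop the walk */
};

/*
 * Hypothetical helper: map the v3 (ret, exit_walk) pair onto the enum.
 * The v3 helper never produces (false, false); it is folded into
 * CONTINUE here only to keep the mapping total.
 */
static enum ttu_walk_result ttu_result_from_pair(bool ret, bool exit_walk)
{
	if (!exit_walk)
		return TTU_WALK_CONTINUE;
	return ret ? TTU_WALK_DONE : TTU_WALK_ABORT;
}

/* Hypothetical helper: the value try_to_unmap_one() would end up returning. */
static bool ttu_result_to_ret(enum ttu_walk_result res)
{
	return res != TTU_WALK_ABORT;
}
```

The enum encodes in one value what v3 spreads across a bool return and a bool out-parameter, which is why only three of the four pair combinations are meaningful.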
>
>> +}
>> +
>> /*
>> * @arg: enum ttu_flags will be passed to this argument
>> */
>> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> PageAnonExclusive(subpage);
>>
>> if (folio_test_hugetlb(folio)) {
>> - bool anon = folio_test_anon(folio);
>> -
>> - /*
>> - * The try_to_unmap() is only passed a hugetlb page
>> - * in the case where the hugetlb page is poisoned.
>> - */
>> - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
>> - /*
>> - * huge_pmd_unshare may unmap an entire PMD page.
>> - * There is no way of knowing exactly which PMDs may
>> - * be cached for this mm, so we must flush them all.
>> - * start/end were already adjusted above to cover this
>> - * range.
>> - */
>> - flush_cache_range(vma, range.start, range.end);
>> + bool exit_walk;
>>
>> - /*
>> - * To call huge_pmd_unshare, i_mmap_rwsem must be
>> - * held in write mode. Caller needs to explicitly
>> - * do this outside rmap routines.
>> - *
>> - * We also must hold hugetlb vma_lock in write mode.
>> - * Lock order dictates acquiring vma_lock BEFORE
>> - * i_mmap_rwsem. We can only try lock here and fail
>> - * if unsuccessful.
>> - */
>> - if (!anon) {
>> - struct mmu_gather tlb;
>> -
>> - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> - if (!hugetlb_vma_trylock_write(vma))
>> - goto walk_abort;
>> -
>> - tlb_gather_mmu_vma(&tlb, vma);
>> - if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
>> - hugetlb_vma_unlock_write(vma);
>> - huge_pmd_unshare_flush(&tlb, vma);
>> - tlb_finish_mmu(&tlb);
>> - /*
>> - * The PMD table was unmapped,
>> - * consequently unmapping the folio.
>> - */
>> - goto walk_done;
>> - }
>> - hugetlb_vma_unlock_write(vma);
>> - tlb_finish_mmu(&tlb);
>> + ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
>> + flags, &pteval, &range,
>> + &exit_walk);
>> + if (exit_walk) {
>> + page_vma_mapped_walk_done(&pvmw);
>> + break;
>
> In the old walk_abort case you wouldn't set ret = false?
ret is assigned from the return value of unmap_hugetlb_folio(), which
returns false on the trylock failure path.
>
> When returning the enum you could simply do something like
>
> switch (ret) {
> case TTU_WALK_ABORT:
> goto walk_abort;
> case TTU_WALK_DONE:
> goto walk_done;
> default:
> break;
> }
>
>
> While I like this patch, can we please just move all the hugetlb shite into this
> helper function?
>
> Essentially, get rid of hugetlb special casing in the remainder of the function.
>
> That also makes the function name clearer (right now it's only doing a part of
> hugetlb folio unmapping).
Okay, I can try that. That would mean splitting the pvmw walk into hugetlb and
non-hugetlb paths, but I suspect it would involve very little code duplication.
>