From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Dev Jain <dev.jain@arm.com>,
akpm@linux-foundation.org, ljs@kernel.org, hughd@google.com,
chrisl@kernel.org, kasong@tencent.com
Cc: riel@surriel.com, liam@infradead.org, vbabka@kernel.org,
harry@kernel.org, jannh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, qi.zheng@linux.dev,
shakeel.butt@linux.dev, baohua@kernel.org,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
baolin.wang@linux.alibaba.com, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, youngjun.park@lge.com,
pfalcato@suse.de, ryan.roberts@arm.com,
anshuman.khandual@arm.com
Subject: Re: [PATCH v3 2/9] mm/rmap: refactor hugetlb pte clearing in try_to_unmap_one
Date: Mon, 11 May 2026 09:10:05 +0200 [thread overview]
Message-ID: <5a4c3c3d-66c8-4ef6-bb6a-2ec0e32694a1@kernel.org> (raw)
In-Reply-To: <20260506094504.2588857-3-dev.jain@arm.com>
On 5/6/26 11:44, Dev Jain wrote:
> Simplify the code by refactoring the folio_test_hugetlb() branch into
> a new function.
>
> While at it, convert BUG helpers to WARN helpers.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> mm/rmap.c | 117 ++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 69 insertions(+), 48 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a5f067a09de0f..a98acdea0530a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1978,6 +1978,68 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> FPB_RESPECT_WRITE | FPB_RESPECT_SOFT_DIRTY);
> }
>
> +/* Returns false if unmap needs to be aborted */
> +static inline bool unmap_hugetlb_folio(struct vm_area_struct *vma,
I'm wondering whether we should make it clearer that this belongs to the
try_to_unmap() family by calling it ttu_hugetlb_folio() instead.
> + struct folio *folio, struct page_vma_mapped_walk *pvmw,
> + struct page *page, enum ttu_flags flags, pte_t *pteval,
> + struct mmu_notifier_range *range, bool *exit_walk)
> +{
> + /*
> + * The try_to_unmap() is only passed a hugetlb page
> + * in the case where the hugetlb page is poisoned.
> + */
> + VM_WARN_ON_PAGE(!PageHWPoison(page), page);
IIRC, we will never actually get a tail page here.

Can we avoid passing a page at all by instead checking whether the hugetlb
folio is marked as having a poisoned page? See the folio_test_set_hwpoison()
in hugetlb_update_hwpoison(). So you can simply use folio_test_hwpoison()
here instead.
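A minimal sketch of what I mean (assuming VM_WARN_ON_FOLIO() is fine to use
in this path):

	/* Sketch: check the poison marker on the folio, not on a subpage. */
	VM_WARN_ON_FOLIO(!folio_test_hwpoison(folio), folio);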
> + /*
> + * huge_pmd_unshare may unmap an entire PMD page.
> + * There is no way of knowing exactly which PMDs may
> + * be cached for this mm, so we must flush them all.
> + * start/end were already adjusted above to cover this
> + * range.
> + */
> + flush_cache_range(vma, range->start, range->end);
> +
> + /*
> + * To call huge_pmd_unshare, i_mmap_rwsem must be
> + * held in write mode. Caller needs to explicitly
> + * do this outside rmap routines.
> + *
> + * We also must hold hugetlb vma_lock in write mode.
> + * Lock order dictates acquiring vma_lock BEFORE
> + * i_mmap_rwsem. We can only try lock here and fail
> + * if unsuccessful.
> + */
> + if (!folio_test_anon(folio)) {
> + struct mmu_gather tlb;
> +
> + VM_WARN_ON(!(flags & TTU_RMAP_LOCKED));
> + if (!hugetlb_vma_trylock_write(vma)) {
> + *exit_walk = true;
> + return false;
> + }
> +
> + tlb_gather_mmu_vma(&tlb, vma);
> + if (huge_pmd_unshare(&tlb, vma, pvmw->address, pvmw->pte)) {
> + hugetlb_vma_unlock_write(vma);
> + huge_pmd_unshare_flush(&tlb, vma);
> + tlb_finish_mmu(&tlb);
> + /*
> + * The PMD table was unmapped,
> + * consequently unmapping the folio.
> + */
> + *exit_walk = true;
> + return true;
> + }
> + hugetlb_vma_unlock_write(vma);
> + tlb_finish_mmu(&tlb);
> + }
> + *pteval = huge_ptep_clear_flush(vma, pvmw->address, pvmw->pte);
> + if (pte_dirty(*pteval))
> + folio_mark_dirty(folio);
> +
> + *exit_walk = false;
> + return true;
Can we instead introduce an enum that tells the caller how to proceed?

I assume we have three cases:

(a) Abort walk (ret = false + page_vma_mapped_walk_done())
(b) Walk done (ret = true + page_vma_mapped_walk_done())
(c) Continue walk (call page_vma_mapped_walk())

	enum ttu_walk_result {
		TTU_WALK_CONTINUE,
		TTU_WALK_ABORT,
		TTU_WALK_DONE,
	};
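The three exit paths of the helper above would then map naturally (sketch):

	return TTU_WALK_ABORT;    /* instead of: *exit_walk = true;  return false; */
	return TTU_WALK_DONE;     /* instead of: *exit_walk = true;  return true;  */
	return TTU_WALK_CONTINUE; /* instead of: *exit_walk = false; return true;  */

which would also let you drop the exit_walk out-parameter entirely.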
> +}
> +
> /*
> * @arg: enum ttu_flags will be passed to this argument
> */
> @@ -2115,56 +2177,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> PageAnonExclusive(subpage);
>
> if (folio_test_hugetlb(folio)) {
> - bool anon = folio_test_anon(folio);
> -
> - /*
> - * The try_to_unmap() is only passed a hugetlb page
> - * in the case where the hugetlb page is poisoned.
> - */
> - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
> - /*
> - * huge_pmd_unshare may unmap an entire PMD page.
> - * There is no way of knowing exactly which PMDs may
> - * be cached for this mm, so we must flush them all.
> - * start/end were already adjusted above to cover this
> - * range.
> - */
> - flush_cache_range(vma, range.start, range.end);
> + bool exit_walk;
>
> - /*
> - * To call huge_pmd_unshare, i_mmap_rwsem must be
> - * held in write mode. Caller needs to explicitly
> - * do this outside rmap routines.
> - *
> - * We also must hold hugetlb vma_lock in write mode.
> - * Lock order dictates acquiring vma_lock BEFORE
> - * i_mmap_rwsem. We can only try lock here and fail
> - * if unsuccessful.
> - */
> - if (!anon) {
> - struct mmu_gather tlb;
> -
> - VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> - if (!hugetlb_vma_trylock_write(vma))
> - goto walk_abort;
> -
> - tlb_gather_mmu_vma(&tlb, vma);
> - if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
> - hugetlb_vma_unlock_write(vma);
> - huge_pmd_unshare_flush(&tlb, vma);
> - tlb_finish_mmu(&tlb);
> - /*
> - * The PMD table was unmapped,
> - * consequently unmapping the folio.
> - */
> - goto walk_done;
> - }
> - hugetlb_vma_unlock_write(vma);
> - tlb_finish_mmu(&tlb);
> + ret = unmap_hugetlb_folio(vma, folio, &pvmw, subpage,
> + flags, &pteval, &range,
> + &exit_walk);
> + if (exit_walk) {
> + page_vma_mapped_walk_done(&pvmw);
> + break;
In the old walk_abort case, you wouldn't set ret = false?

When returning the enum, you could simply do something like:

	switch (ret) {
	case TTU_WALK_ABORT:
		goto walk_abort;
	case TTU_WALK_DONE:
		goto walk_done;
	default:
		break;
	}
While I like this patch, can we please just move all the hugetlb shite into
this helper function? Essentially, get rid of the hugetlb special-casing in
the remainder of the function.

That also makes the function name clearer (right now it's only doing part of
the hugetlb folio unmapping).
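The caller would then shrink to something like this (sketch only; the
parameter list is illustrative, assuming the enum return and the
ttu_hugetlb_folio() name from above):

	if (folio_test_hugetlb(folio)) {
		switch (ttu_hugetlb_folio(vma, folio, &pvmw, flags, &range)) {
		case TTU_WALK_ABORT:
			goto walk_abort;
		case TTU_WALK_DONE:
			goto walk_done;
		case TTU_WALK_CONTINUE:
			/* Back to page_vma_mapped_walk() in the loop condition. */
			continue;
		}
	}

and everything below that point would only ever see !hugetlb folios.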
--
Cheers,
David