From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
akpm@linux-foundation.org, catalin.marinas@arm.com,
will@kernel.org
Cc: lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
baohua@kernel.org, dev.jain@arm.com, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/5] mm: rmap: support batched checks of the references for large folios
Date: Mon, 9 Feb 2026 10:20:27 +0100
Message-ID: <b86bfd4e-302c-4152-8dfd-41f67515b71d@kernel.org>
In-Reply-To: <44453a4c-50a2-4e7e-9d2a-ebf973ccf6b7@linux.alibaba.com>
On 2/9/26 10:14, Baolin Wang wrote:
>
>
> On 2/9/26 4:49 PM, David Hildenbrand (Arm) wrote:
>> On 12/26/25 07:07, Baolin Wang wrote:
>>> Currently, folio_referenced_one() always checks the young flag for each PTE
>>> sequentially, which is inefficient for large folios. This inefficiency is
>>> especially noticeable when reclaiming clean file-backed large folios, where
>>> folio_referenced() is observed as a significant performance hotspot.
>>>
>>> Moreover, on Arm64 architecture, which supports contiguous PTEs, there is already
>>> an optimization to clear the young flags for PTEs within a contiguous range.
>>> However, this is not sufficient. We can extend this to perform batched operations
>>> for the entire large folio (which might exceed the contiguous range:
>>> CONT_PTE_SIZE).
>>>
>>> Introduce a new API: clear_flush_young_ptes() to facilitate batched checking
>>> of the young flags and flushing TLB entries, thereby improving performance
>>> during large folio reclamation. And it will be overridden by the architecture
>>> that implements a more efficient batch operation in the following patches.
>>>
>>> While we are at it, rename ptep_clear_flush_young_notify() to
>>> clear_flush_young_ptes_notify() to indicate that this is a batch operation.
>>>
>>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> include/linux/mmu_notifier.h | 9 +++++----
>>> include/linux/pgtable.h | 31 +++++++++++++++++++++++++++++++
>>> mm/rmap.c | 31 ++++++++++++++++++++++++++++---
>>> 3 files changed, 64 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
>>> index d1094c2d5fb6..07a2bbaf86e9 100644
>>> --- a/include/linux/mmu_notifier.h
>>> +++ b/include/linux/mmu_notifier.h
>>> @@ -515,16 +515,17 @@ static inline void mmu_notifier_range_init_owner(
>>> range->owner = owner;
>>> }
>>> -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
>>> +#define clear_flush_young_ptes_notify(__vma, __address, __ptep, __nr) \
>>> ({ \
>>> int __young; \
>>> struct vm_area_struct *___vma = __vma; \
>>> unsigned long ___address = __address; \
>>> - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \
>>> + unsigned int ___nr = __nr; \
>>> + __young = clear_flush_young_ptes(___vma, ___address, __ptep, ___nr); \
>>> __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \
>>> ___address, \
>>> ___address + \
>>> - PAGE_SIZE); \
>>> + ___nr * PAGE_SIZE); \
>>> __young; \
>>> })
>>
>> Man, that's ugly. Not your fault, but could this be turned into an
>> inline function in a follow-up patch?
>
> Yes, the cleanup of these macros is already in my follow-up patch set.
>
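For readers following along: the macro above clears the young bit through the primary MMU for `__nr` PTEs and then queries secondary MMUs for the range from `__address` to `__address + __nr * PAGE_SIZE`. A rough user-space sketch of what an inline-function form might look like is shown here. It is not the follow-up patch mentioned above; every `mock_` name, the `struct mock_notifier` type, and the 4 KiB page size are assumptions made purely for illustration.

```c
#define MOCK_PAGE_SIZE 4096UL	/* assumption: 4 KiB base pages */

/* Hypothetical record of what the secondary-MMU hook was asked to check. */
struct mock_notifier {
	unsigned long start;
	unsigned long end;
	int young;		/* value the mocked notifier reports back */
};

/* Hypothetical stand-in for clear_flush_young_ptes(): reports a fixed result. */
static inline int mock_clear_flush_young_ptes(int primary_young, unsigned int nr)
{
	(void)nr;
	return primary_young;
}

/* Hypothetical stand-in for mmu_notifier_clear_flush_young(). */
static inline int mock_notifier_clear_flush_young(struct mock_notifier *mn,
		unsigned long start, unsigned long end)
{
	mn->start = start;
	mn->end = end;
	return mn->young;
}

/*
 * Inline-function shape of the notify helper: arguments are evaluated
 * exactly once, so the ___vma/___address/___nr temporaries of the macro
 * are no longer needed.
 */
static inline int mock_clear_flush_young_ptes_notify(struct mock_notifier *mn,
		int primary_young, unsigned long address, unsigned int nr)
{
	int young;

	young = mock_clear_flush_young_ptes(primary_young, nr);
	young |= mock_notifier_clear_flush_young(mn, address,
			address + nr * MOCK_PAGE_SIZE);
	return young;
}
```

With `nr` set to 16 and a start address of 0x10000, the mocked notifier sees the range ending at 0x20000, which mirrors the `___nr * PAGE_SIZE` change in the hunk above.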
>>> +#ifndef clear_flush_young_ptes
>>> +/**
>>> + * clear_flush_young_ptes - Clear the access bit and perform a TLB flush for PTEs
>>> + * that map consecutive pages of the same folio.
>>
>> With clear_young_dirty_ptes() description in mind, this should
>> probably be "Mark PTEs that map consecutive pages of the same folio as
>> clean and flush the TLB" ?
>
> IMO, “clean” is confusing here, as it sounds like clearing the dirty bit
> to make the folio clean.

I meant "as old", sorry. I quoted the wrong part of that description.

>
>>> + * @vma: The virtual memory area the pages are mapped into.
>>> + * @addr: Address the first page is mapped at.
>>> + * @ptep: Page table pointer for the first entry.
>>> + * @nr: Number of entries to clear access bit.
>>> + *
>>> + * May be overridden by the architecture; otherwise, implemented as a simple
>>> + * loop over ptep_clear_flush_young().
>>> + *
>>> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
>>> + * some PTEs might be write-protected.
>>> + *
>>> + * Context: The caller holds the page table lock. The PTEs map consecutive
>>> + * pages that belong to the same folio. The PTEs are all in the same PMD.
>>> + */
>>> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
>>> + unsigned long addr, pte_t *ptep,
>>> + unsigned int nr)
>>
>> Two-tab alignment on second+ line like all similar functions here.
>
> Sure.
>
>>> +{
>>> + int i, young = 0;
>>> +
>>> + for (i = 0; i < nr; ++i, ++ptep, addr += PAGE_SIZE)
>>> + young |= ptep_clear_flush_young(vma, addr, ptep);
>>> +
>>
>> Why don't we use a similar loop we use in clear_young_dirty_ptes() or
>> clear_full_ptes() etc? It's not only consistent but also optimizes out
>> the first check for nr.
>>
>> 	for (;;) {
>> 		young |= ptep_clear_flush_young(vma, addr, ptep);
>> 		if (--nr == 0)
>> 			break;
>> 		ptep++;
>> 		addr += PAGE_SIZE;
>> 	}
>
> We’ve discussed this loop pattern before [1], and it seems that people
> prefer the ‘for (;;)’ loop. Do you have a strong preference for changing
> it back?

Yes, to make all such helpers look consistent. Note that your version
was also not consistent with the other variants.

Ryan's point was about avoiding two ptep_clear_flush_young() calls, which
the for (;;) loop avoids as well.

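To make the two loop shapes concrete, here is a small user-space sketch. It is only a model, not kernel code: `struct mock_pte` and `mock_clear_flush_young()` are hypothetical stand-ins for a PTE and for ptep_clear_flush_young(), and address handling is left out. The counted variant tests `i < nr` before the first call, while the for (;;) variant makes the first call unconditionally and therefore relies on the caller passing nr >= 1. Both contain a single call site for the per-PTE helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE: only the young (accessed) bit is modeled. */
struct mock_pte {
	bool young;
};

/* Hypothetical stand-in for ptep_clear_flush_young(): clear and report the bit. */
static int mock_clear_flush_young(struct mock_pte *ptep)
{
	int young = ptep->young;

	ptep->young = false;
	return young;
}

/* Counted loop, as in the posted patch: checks i < nr before the first call. */
static int clear_young_counted(struct mock_pte *ptep, unsigned int nr)
{
	unsigned int i;
	int young = 0;

	for (i = 0; i < nr; ++i, ++ptep)
		young |= mock_clear_flush_young(ptep);

	return young;
}

/* for (;;) shape, as in the other batch helpers: the caller guarantees nr >= 1. */
static int clear_young_batched(struct mock_pte *ptep, unsigned int nr)
{
	int young = 0;

	assert(nr >= 1);
	for (;;) {
		young |= mock_clear_flush_young(ptep);
		if (--nr == 0)
			break;
		ptep++;
	}

	return young;
}
```

For any nr >= 1 the two functions clear the same entries and return the same result; they differ only in whether the first iteration is guarded.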
[...]
>>
>> And you will not have to mess with the "ptes" variable?
>
> We can't rely on pra->mapcount here, because a folio can be mapped in
> multiple VMAs. Even if the pra->mapcount is not zero, we can still call
> page_vma_mapped_walk_done() for the current VMA mapping when the entire
> folio is batched.

You are absolutely right for folios that are mapped into multiple processes.

--
Cheers,
David