From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, david@kernel.org,
catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
dev.jain@arm.com, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 4/5] arm64: mm: implement the architecture-specific clear_flush_young_ptes()
Date: Sat, 7 Mar 2026 10:14:34 +0800 [thread overview]
Message-ID: <721abb6a-93a0-4db3-9e69-ef23b253e4f5@linux.alibaba.com> (raw)
In-Reply-To: <CAGsJ_4wA-bpCaY8uLnwmzDT40s4RbsOXdRutvj5bZU+Em6cFeg@mail.gmail.com>
On 3/7/26 5:20 AM, Barry Song wrote:
> On Mon, Feb 9, 2026 at 10:07 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> Implement the Arm64 architecture-specific clear_flush_young_ptes() to enable
>> batched checking of young flags and TLB flushing, improving performance during
>> large folio reclamation.
>>
>> Performance testing:
>> Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
>> reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
>> 33% performance improvement on my Arm64 32-core server (and 10%+ improvement
>> on my X86 machine). Meanwhile, the hotspot folio_check_references() dropped
>> from approximately 35% to around 5%.
>>
>> W/o patchset:
>> real 0m1.518s
>> user 0m0.000s
>> sys 0m1.518s
>>
>> W/ patchset:
>> real 0m1.018s
>> user 0m0.000s
>> sys 0m1.018s
>>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>
> Reviewed-by: Barry Song <baohua@kernel.org>
Thanks Barry. But this series has been upstreamed, I can not add your
reviewed tag.
>
>> ---
>> arch/arm64/include/asm/pgtable.h | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 3dabf5ea17fa..a17eb8a76788 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1838,6 +1838,17 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>> return contpte_clear_flush_young_ptes(vma, addr, ptep, 1);
>> }
>>
>> +#define clear_flush_young_ptes clear_flush_young_ptes
>> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep,
>> + unsigned int nr)
>> +{
>> + if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
>> + return __ptep_clear_flush_young(vma, addr, ptep);
>> +
>> + return contpte_clear_flush_young_ptes(vma, addr, ptep, nr);
>> +}
>
> A similar question arises here:
>
> If nr = 4 for 16KB large folios and one of those entries is young,
> we end up flushing the TLB for all 4 PTEs.
>
> If all four entries are young, we win; if only one is young, it seems
> we flush 3 redundant pages. but arm64 has TLB coalescing, so
> maybe they are just one TLB?
We discussed a similar issue in the previous thread [1], and I quote
some comments from Ryan:
"
My concern was the opportunity cost of evicting the entries for all the
non-accessed parts of the folio from the TLB. But of course, I'm talking
nonsense because the architecture does not allow caching non-accessed
entries in the TLB.
"
[1]
https://lore.kernel.org/all/02239ca7-9701-4bfa-af0f-dcf0d05a3e89@linux.alibaba.com/
next prev parent reply other threads:[~2026-03-07 2:14 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-09 14:07 [PATCH v6 0/5] support batch checking of references and unmapping for large folios Baolin Wang
2026-02-09 14:07 ` [PATCH v6 1/5] mm: rmap: support batched checks of the references " Baolin Wang
2026-02-09 15:25 ` David Hildenbrand (Arm)
2026-03-06 21:07 ` Barry Song
2026-03-07 2:22 ` Baolin Wang
2026-03-07 8:02 ` Barry Song
2026-03-10 1:37 ` Baolin Wang
2026-03-10 8:17 ` David Hildenbrand (Arm)
2026-03-16 6:25 ` Baolin Wang
2026-03-16 14:15 ` David Hildenbrand (Arm)
2026-03-25 14:36 ` Lorenzo Stoakes (Oracle)
2026-03-25 14:58 ` David Hildenbrand (Arm)
2026-03-25 15:06 ` Lorenzo Stoakes (Oracle)
2026-03-25 15:30 ` Andrew Morton
2026-03-25 15:32 ` Lorenzo Stoakes (Oracle)
2026-03-25 16:23 ` Andrew Morton
2026-03-25 16:28 ` Lorenzo Stoakes (Oracle)
2026-03-25 18:43 ` Andrew Morton
2026-03-25 18:58 ` Lorenzo Stoakes (Oracle)
2026-03-26 1:47 ` Baolin Wang
2026-03-26 5:31 ` Barry Song
2026-03-26 11:10 ` Lorenzo Stoakes (Oracle)
2026-03-26 12:04 ` Baolin Wang
2026-03-26 12:21 ` Lorenzo Stoakes (Oracle)
2026-03-27 10:20 ` Baolin Wang
2026-03-27 9:00 ` David Hildenbrand (Arm)
2026-03-17 7:30 ` Barry Song
2026-03-18 1:37 ` Baolin Wang
2026-02-09 14:07 ` [PATCH v6 2/5] arm64: mm: factor out the address and ptep alignment into a new helper Baolin Wang
2026-02-09 14:07 ` [PATCH v6 3/5] arm64: mm: support batch clearing of the young flag for large folios Baolin Wang
2026-02-09 14:07 ` [PATCH v6 4/5] arm64: mm: implement the architecture-specific clear_flush_young_ptes() Baolin Wang
2026-02-09 15:30 ` David Hildenbrand (Arm)
2026-02-10 0:39 ` Baolin Wang
2026-03-06 21:20 ` Barry Song
2026-03-07 2:14 ` Baolin Wang [this message]
2026-03-07 7:41 ` Barry Song
2026-02-09 14:07 ` [PATCH v6 5/5] mm: rmap: support batched unmapping for file large folios Baolin Wang
2026-02-09 15:31 ` David Hildenbrand (Arm)
2026-02-10 1:53 ` [PATCH v6 0/5] support batch checking of references and unmapping for " Andrew Morton
2026-02-10 2:01 ` Baolin Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=721abb6a-93a0-4db3-9e69-ef23b253e4f5@linux.alibaba.com \
--to=baolin.wang@linux.alibaba.com \
--cc=21cnbao@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=harry.yoo@oracle.com \
--cc=jannh@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=riel@surriel.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.