From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Baolin Wang <baolin.wang@linux.alibaba.com>, akpm@linux-foundation.org
Cc: catalin.marinas@arm.com, will@kernel.org,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, riel@surriel.com,
harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
baohua@kernel.org, dev.jain@arm.com, axelrasmussen@google.com,
yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes()
Date: Fri, 6 Mar 2026 15:47:35 +0100
Message-ID: <6305e05e-2911-42b0-b6f5-7fdde787b778@kernel.org>
In-Reply-To: <7f891d42a720cc2e57862f3b79e4f774404f313c.1772778858.git.baolin.wang@linux.alibaba.com>
On 3/6/26 07:43, Baolin Wang wrote:
> Implement the Arm64 architecture-specific test_and_clear_young_ptes() to enable
> batched checking of young flags, improving performance during large folio
> reclamation when MGLRU is enabled.
>
> While we're at it, simplify ptep_test_and_clear_young() by calling
> test_and_clear_young_ptes(). Since callers guarantee that PTEs are present
> before calling these functions, we can use pte_cont() to check the CONT_PTE
> flag instead of pte_valid_cont().
>
> Performance testing:
> Enable MGLRU, then allocate 10G clean file-backed folios by mmap() in a memory
> cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface.
> I can observe 60%+ performance improvement on my Arm64 32-core server (and about
> 15% improvement on my X86 machine).
>
> W/o patchset:
> real 0m0.470s
> user 0m0.000s
> sys 0m0.470s
>
> W/ patchset:
> real 0m0.180s
> user 0m0.001s
> sys 0m0.179s
>
> Reviewed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> arch/arm64/include/asm/pgtable.h | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index aa4b13da6371..ab451d20e4c5 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1812,16 +1812,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  	return __ptep_get_and_clear(mm, addr, ptep);
>  }
>
> +#define test_and_clear_young_ptes test_and_clear_young_ptes
> +static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
> +					    unsigned long addr, pte_t *ptep,
> +					    unsigned int nr)
> +{
> +	if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
> +		return __ptep_test_and_clear_young(vma, addr, ptep);
> +
> +	return contpte_test_and_clear_young_ptes(vma, addr, ptep, nr);
> +}

Thinking out loud, what would happen if

(a) The range spans multiple possible cont ranges (like, 64 ptes).

(b) The first pte is !pte_cont(), but some others in there are?
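
To make the two cases concrete, here is a minimal userspace sketch that models only the dispatch condition from the hunk above. struct model_pte, model_dispatch() and the PATH_* names are hypothetical stand-ins, not kernel code, and what contpte_test_and_clear_young_ptes() then does with such a range is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE: only the bit the dispatch looks at. */
struct model_pte {
	bool cont;	/* models the CONT_PTE hint on this entry */
};

/* Which branch of the quoted wrapper would be taken (names are made up). */
enum model_path {
	PATH_SINGLE,	/* models __ptep_test_and_clear_young() */
	PATH_CONTPTE,	/* models contpte_test_and_clear_young_ptes() */
};

/*
 * Mirrors only the condition in the quoted hunk:
 *   nr == 1 && !pte_cont(first pte)  -> single-PTE path
 *   anything else                    -> contpte path
 * Only ptep[0] is inspected; later entries do not influence the choice.
 */
static enum model_path model_dispatch(const struct model_pte *ptep,
				      unsigned int nr)
{
	assert(nr >= 1);

	if (nr == 1 && !ptep[0].cont)
		return PATH_SINGLE;

	return PATH_CONTPTE;
}
```

In this model any call with nr > 1 takes PATH_CONTPTE regardless of the first entry, so both (a) and (b) end up in the contpte helper. The open part of the question is how that helper treats a range that mixes cont and non-cont entries.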
--
Cheers,
David