From: Ankur Arora <ankur.a.arora@oracle.com>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: ankur.a.arora@oracle.com, acme@kernel.org,
akpm@linux-foundation.org, boris.ostrovsky@oracle.com,
bp@alien8.de, dave.hansen@linux.intel.com, david@redhat.com,
hpa@zytor.com, konrad.wilk@oracle.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
luto@kernel.org, mingo@redhat.com, mjguzik@gmail.com,
namhyung@kernel.org, peterz@infradead.org, tglx@linutronix.de,
willy@infradead.org, x86@kernel.org
Subject: Re: [PATCH v7 00/16] mm: folio_zero_user: clear contiguous pages
Date: Mon, 06 Oct 2025 23:15:51 -0700
Message-ID: <877bx7gru0.fsf@oracle.com>
In-Reply-To: <20250923062935.2416128-1-raghavendra.kt@amd.com>

Raghavendra K T <raghavendra.kt@amd.com> writes:

> On 9/17/2025 8:54 PM, Ankur Arora wrote:
>> This series adds clearing of contiguous page ranges for hugepages,
>> improving on the current page-at-a-time approach in two ways:
>>
>> - amortizes the per-page setup cost over a larger extent
>>
>> - when using string instructions, exposes the real region size
>> to the processor.
>>
>> A processor could use a knowledge of the extent to optimize the
>> clearing. AMD Zen uarchs, as an example, elide allocation of
>> cachelines for regions larger than L3-size.
> [...]
>
> Hello,
>
> Feel free to add
>
> Tested-by: Raghavendra K T <raghavendra.kt@amd.com>

Great. Thanks Raghu.

> for whole series.
>
> [ I do understand that there may be minor tweaks to the clear page patches
> to convert nth_page once David's changes are in ]

Yeah, and a few other changes based on Andrew's and David's comments.

> SUT: AMD Zen5
>
> I also did a quick hack to unconditionally use CLZERO/MOVNT on top of
> Ankur's series to test how much additional benefit architectural
> enhancements can bring. [ In line with the second part of Ankur's old
> series, before the preempt lazy changes. ] Please note that this is only
> for testing: ideally, for smaller sizes we would want rep stosb only, and
> the threshold at which we switch to non-temporal stores should perhaps
> be a function of the L3 and/or L2 size.
>
> Results:
> base : 6.17-rc6 + perf bench patches
> clearpage : 6.17-rc6 + whole series from Ankur
> clzero : 6.17-rc6 + Ankur's series + clzero (below patch)
> movnt : 6.17-rc6 + Ankur's series + movnt (below patch)
>
> Command run: ./perf bench mem mmap -p 2MB -f demand -s 64GB -l 10
>
> Higher = better
>
>              preempt = lazy (GB/sec)   preempt = voluntary (GB/sec)
>
> base         20.655559                 19.712500
> clearpage    35.060572                 34.533414
> clzero       66.948422                 66.067265
> movnt        51.593506                 51.403765
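
For reference, the ratios against base can be worked out directly from
the table. A minimal sketch in C: the struct and function names are made
up for illustration, and only the numbers come from the results above.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in GB/sec, copied from the results table. */
struct clear_result {
	const char *name;
	double lazy;		/* preempt = lazy */
	double voluntary;	/* preempt = voluntary */
};

static const struct clear_result results[] = {
	{ "base",      20.655559, 19.712500 },
	{ "clearpage", 35.060572, 34.533414 },
	{ "clzero",    66.948422, 66.067265 },
	{ "movnt",     51.593506, 51.403765 },
};

#define NR_RESULTS (sizeof(results) / sizeof(results[0]))

/* Speedup of results[idx] over base (results[0]), preempt = lazy. */
static double speedup_lazy(size_t idx)
{
	assert(idx < NR_RESULTS);
	return results[idx].lazy / results[0].lazy;
}

/* Speedup of results[idx] over base (results[0]), preempt = voluntary. */
static double speedup_voluntary(size_t idx)
{
	assert(idx < NR_RESULTS);
	return results[idx].voluntary / results[0].voluntary;
}
```

For preempt=lazy this comes out at roughly 1.7x for clearpage, 3.2x for
clzero and 2.5x for movnt over base.
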

The CLZERO number with page-size=2MB is pretty impressive. But, as you
said, non-temporal instructions need more thought around thresholds
etc.

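
To make the threshold question concrete, here is one possible shape for
such a policy. This is only a sketch under assumptions: the enum, the
function name and the "extent at least as large as the last-level cache"
cutoff are all hypothetical, and nothing like this exists in the series.

```c
#include <assert.h>
#include <stddef.h>

enum clear_method {
	CLEAR_REP_STOSB,	/* cached stores, preferred for small extents */
	CLEAR_NON_TEMPORAL,	/* CLZERO/MOVNT style, bypasses the cache */
};

/*
 * Hypothetical policy: go non-temporal only when the extent being
 * cleared is at least as large as the last-level cache, since the
 * zeroed lines would not stay cached in that case anyway.
 * llc_size is assumed to be supplied by the caller (L3 and/or L2).
 */
static enum clear_method pick_clear_method(size_t len, size_t llc_size)
{
	assert(llc_size > 0);
	return len >= llc_size ? CLEAR_NON_TEMPORAL : CLEAR_REP_STOSB;
}
```

With an assumed 32MB L3, a single 2MB extent would stay on rep stosb
while a 1GB extent would take the non-temporal path. Whether the right
cutoff is the L3 size, the L2 size, or some fraction of either is
exactly the part that needs measurement.
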
> CLZERO/MOVNT experimental patch. Hope I have not missed anything here :)

Looks good to me :).

> -- >8 --
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 52c8910ba2ef..26cef2b187b9 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -3170,6 +3170,8 @@ config HAVE_ATOMIC_IOMAP
> def_bool y
> depends on X86_32
>
> +source "arch/x86/Kconfig.cpy"
> +
> source "arch/x86/kvm/Kconfig"
>
> source "arch/x86/Kconfig.cpufeatures"
> diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
> index 2361066d175e..aa2e62bbfa62 100644
> --- a/arch/x86/include/asm/page_64.h
> +++ b/arch/x86/include/asm/page_64.h
> @@ -84,11 +84,23 @@ static inline void clear_pages(void *addr, unsigned int npages)
> */
> kmsan_unpoison_memory(addr, len);
> asm volatile(ALTERNATIVE_2("call memzero_page_aligned_unrolled",
> - "shrq $3, %%rcx; rep stosq", X86_FEATURE_REP_GOOD,
> - "rep stosb", X86_FEATURE_ERMS)
> - : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> - : "a" (0)
> - : "cc", "memory");
> + "shrq $3, %%rcx; rep stosq", X86_FEATURE_REP_GOOD,
> +#if defined(CONFIG_CLEARPAGE_CLZERO)
> + "call clear_pages_clzero", X86_FEATURE_CLZERO)
> + : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> + : "a" (0)
> + : "cc", "memory");
> +#elif defined(CONFIG_CLEARPAGE_MOVNT)
> + "call clear_pages_movnt", X86_FEATURE_XMM2)
> + : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> + : "a" (0)
> + : "cc", "memory");
> +#else
> + "rep stosb", X86_FEATURE_ERMS)
> + : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> + : "a" (0)
> + : "cc", "memory");
> +#endif
> }
> #define clear_pages clear_pages
>
> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> index 27debe0c018c..0848287446dd 100644
> --- a/arch/x86/lib/clear_page_64.S
> +++ b/arch/x86/lib/clear_page_64.S
> @@ -4,6 +4,7 @@
> #include <linux/cfi_types.h>
> #include <linux/objtool.h>
> #include <asm/asm.h>
> +#include <asm/page_types.h>
>
> /*
> * Zero page aligned region.
> @@ -119,3 +120,40 @@ SYM_FUNC_START(rep_stos_alternative)
> _ASM_EXTABLE_UA(17b, .Lclear_user_tail)
> SYM_FUNC_END(rep_stos_alternative)
> EXPORT_SYMBOL(rep_stos_alternative)
> +
> +SYM_FUNC_START(clear_pages_movnt)
> + .p2align 4
> +.Lstart:
> + movnti %rax, 0x00(%rdi)
> + movnti %rax, 0x08(%rdi)
> + movnti %rax, 0x10(%rdi)
> + movnti %rax, 0x18(%rdi)
> + movnti %rax, 0x20(%rdi)
> + movnti %rax, 0x28(%rdi)
> + movnti %rax, 0x30(%rdi)
> + movnti %rax, 0x38(%rdi)
> + addq $0x40, %rdi
> + subl $0x40, %ecx
> + ja .Lstart
> + RET
> +SYM_FUNC_END(clear_pages_movnt)
> +EXPORT_SYMBOL_GPL(clear_pages_movnt)
> +
> +/*
> + * Zero a page using clzero (On AMD, with CPU_FEATURE_CLZERO.)
> + *
> + * Caller needs to issue a sfence at the end.
> + */
> +
> +SYM_FUNC_START(clear_pages_clzero)
> + movq %rdi,%rax
> + .p2align 4
> +.Liter:
> + clzero
> + addq $0x40, %rax
> + subl $0x40, %ecx
> + ja .Liter
> + sfence
> + RET
> +SYM_FUNC_END(clear_pages_clzero)
> +EXPORT_SYMBOL_GPL(clear_pages_clzero)
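
For anyone who wants to poke at the non-temporal path from user space,
the movnti loop can be approximated with compiler intrinsics. A rough
sketch, not kernel code: zero_region_nt() and its self-test are
hypothetical names, the 64-byte stride is taken from the patch, and the
trailing sfence mirrors what clear_pages_clzero does above. On builds
that are not x86-64 it falls back to a plain memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64(); also pulls in _mm_sfence() */
#endif

#define CACHELINE 64	/* assumed line size, matches the 0x40 stride */

/* Zero len bytes at addr; both must be cacheline aligned. */
static void zero_region_nt(void *addr, size_t len)
{
	assert((uintptr_t)addr % CACHELINE == 0 && len % CACHELINE == 0);
#if defined(__x86_64__)
	long long *p = addr;

	for (size_t i = 0; i < len / sizeof(*p); i++)
		_mm_stream_si64(&p[i], 0);	/* 8-byte non-temporal store */
	_mm_sfence();	/* order the weakly-ordered NT stores */
#else
	memset(addr, 0, len);	/* portable fallback, not non-temporal */
#endif
}

/* Returns 1 if a dirtied, aligned buffer reads back as all zeroes. */
static int zero_region_nt_selftest(void)
{
	static _Alignas(CACHELINE) unsigned char buf[2 * 4096];

	memset(buf, 0xa5, sizeof(buf));
	zero_region_nt(buf, sizeof(buf));
	for (size_t i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

This only checks correctness. It says nothing about throughput, which
is what the perf bench numbers above are for.
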
--
ankur