linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] arm64: clear_page: use stnp non-temporal instruction for performance optimizing
@ 2021-11-16 15:08 Guanghui Feng
  2021-11-16 18:17 ` Catalin Marinas
  2021-11-16 23:12 ` Robin Murphy
  0 siblings, 2 replies; 3+ messages in thread
From: Guanghui Feng @ 2021-11-16 15:08 UTC (permalink / raw)
  To: catalin.marinas, will, maz, qperret, linux-arm-kernel,
	linux-kernel
  Cc: baolin.wang, zhuo.song, zhangliguang

When clearing a page, there is no need to allocate cache lines to hold
the zeroed data. copy_page.S already uses the stnp instruction as an
optimization, so rewrite clear_page.S to use stnp as well. In my testing,
the stnp version is about twice as fast.

Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
---
 arch/arm64/lib/clear_page.S | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index b84b179..e9dc2d6 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -15,13 +15,18 @@
  *	x0 - dest
  */
 SYM_FUNC_START_PI(clear_page)
-	mrs	x1, dczid_el0
-	and	w1, w1, #0xf
-	mov	x2, #4
-	lsl	x1, x2, x1
-
-1:	dc	zva, x0
-	add	x0, x0, x1
+	mov	x1, #0
+	mov	x2, #0
+1:
+	stnp	x1, x2, [x0]
+	stnp	x1, x2, [x0, #16]
+	stnp	x1, x2, [x0, #32]
+	stnp	x1, x2, [x0, #48]
+	stnp	x1, x2, [x0, #64]
+	stnp	x1, x2, [x0, #80]
+	stnp	x1, x2, [x0, #96]
+	stnp	x1, x2, [x0, #112]
+	add	x0, x0, #128
 	tst	x0, #(PAGE_SIZE - 1)
 	b.ne	1b
 	ret
-- 
1.8.3.1
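
For readers comparing the two loops in the diff above, the following C sketch models the arithmetic only; it is not kernel code. It assumes the architectural layout of DCZID_EL0, where bits [3:0] (BS) hold log2 of the zeroing block size in 4-byte words, which is what the removed "mov x2, #4; lsl x1, x2, x1" sequence computes. The names zva_block_bytes, zva_iterations, stnp_iterations and the 4 KiB page size are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration; arm64 kernels can also be built with 16K or 64K pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Bytes zeroed by one DC ZVA: 4 << DCZID_EL0.BS, mirroring the removed code. */
static inline uint32_t zva_block_bytes(uint64_t dczid)
{
    return 4u << (dczid & 0xf);
}

/* Iterations of the old "dc zva; add; tst; b.ne" loop for one page. */
static inline uint32_t zva_iterations(uint64_t dczid)
{
    uint32_t block = zva_block_bytes(dczid);

    /* The loop only terminates correctly if a block fits inside a page. */
    assert(block <= SKETCH_PAGE_SIZE);
    return SKETCH_PAGE_SIZE / block;
}

/* Iterations of the proposed loop: eight 16-byte STNPs = 128 bytes each. */
static inline uint32_t stnp_iterations(void)
{
    return SKETCH_PAGE_SIZE / (8u * 16u);
}
```

As an example, with BS = 4 (a 64-byte block) the DC ZVA loop issues 64 zeroing operations per 4 KiB page, while the STNP loop runs 32 iterations of eight stores each. The sketch says nothing about which is faster on a given CPU; that is the open question in this thread.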


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] arm64: clear_page: use stnp non-temporal instruction for performance optimizing
  2021-11-16 15:08 [PATCH] arm64: clear_page: use stnp non-temporal instruction for performance optimizing Guanghui Feng
@ 2021-11-16 18:17 ` Catalin Marinas
  2021-11-16 23:12 ` Robin Murphy
  1 sibling, 0 replies; 3+ messages in thread
From: Catalin Marinas @ 2021-11-16 18:17 UTC (permalink / raw)
  To: Guanghui Feng
  Cc: will, maz, qperret, linux-arm-kernel, linux-kernel, baolin.wang,
	zhuo.song, zhangliguang

On Tue, Nov 16, 2021 at 11:08:14PM +0800, Guanghui Feng wrote:
> When clearing a page, there is no need to allocate cache lines to hold
> the zeroed data.

In theory, DC ZVA is supposed to trigger write streaming mode, so that all
writes go directly to memory and avoid cache allocation.

> copy_page.S already uses the stnp instruction as an
> optimization, so rewrite clear_page.S to use stnp as well. In my testing,
> the stnp version is about twice as fast.

On which CPU implementation? Is the same improvement seen on a wider
range of CPUs?

-- 
Catalin


* Re: [PATCH] arm64: clear_page: use stnp non-temporal instruction for performance optimizing
  2021-11-16 15:08 [PATCH] arm64: clear_page: use stnp non-temporal instruction for performance optimizing Guanghui Feng
  2021-11-16 18:17 ` Catalin Marinas
@ 2021-11-16 23:12 ` Robin Murphy
  1 sibling, 0 replies; 3+ messages in thread
From: Robin Murphy @ 2021-11-16 23:12 UTC (permalink / raw)
  To: Guanghui Feng, catalin.marinas, will, maz, qperret,
	linux-arm-kernel, linux-kernel
  Cc: baolin.wang, zhuo.song, zhangliguang

On 2021-11-16 15:08, Guanghui Feng wrote:
> When clearing a page, there is no need to allocate cache lines to hold
> the zeroed data. copy_page.S already uses the stnp instruction as an
> optimization, so rewrite clear_page.S to use stnp as well. In my testing,
> the stnp version is about twice as fast.
> 
> Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
> ---
>   arch/arm64/lib/clear_page.S | 19 ++++++++++++-------
>   1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> index b84b179..e9dc2d6 100644
> --- a/arch/arm64/lib/clear_page.S
> +++ b/arch/arm64/lib/clear_page.S
> @@ -15,13 +15,18 @@
>    *	x0 - dest
>    */
>   SYM_FUNC_START_PI(clear_page)
> -	mrs	x1, dczid_el0
> -	and	w1, w1, #0xf
> -	mov	x2, #4
> -	lsl	x1, x2, x1
> -
> -1:	dc	zva, x0
> -	add	x0, x0, x1
> +	mov	x1, #0
> +	mov	x2, #0

Regardless of the bigger question around the architectural intent that 
DC ZVA is supposed to be the best way to clear memory (sanity check: 
this wasn't under virtualisation with HCR_EL2.TDZ set, was it?) - out of 
curiosity, why do this and not just "stnp xzr, xzr, ..."?

Note also that this is liable to conflict with the patch for respecting 
DCZID_EL0.DZP. On which note, is DC {GVA,GZVA} performance also a 
concern, or does your platform not have MTE? If the performance anomaly
does turn out to be platform-specific, it might be better to quirk
those platforms to set DZP, rather than changing the code for everyone?

Robin.

> +1:
> +	stnp	x1, x2, [x0]
> +	stnp	x1, x2, [x0, #16]
> +	stnp	x1, x2, [x0, #32]
> +	stnp	x1, x2, [x0, #48]
> +	stnp	x1, x2, [x0, #64]
> +	stnp	x1, x2, [x0, #80]
> +	stnp	x1, x2, [x0, #96]
> +	stnp	x1, x2, [x0, #112]
> +	add	x0, x0, #128
>   	tst	x0, #(PAGE_SIZE - 1)
>   	b.ne	1b
>   	ret
> 
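
Robin's point about DCZID_EL0.DZP can be illustrated with a small C sketch. It assumes the architectural meaning of bit 4 (DZP): when the bit reads as 1, DC ZVA is prohibited, so a clear routine would need a plain-store fallback. The names dc_zva_permitted and choose_clear_method are hypothetical, and this is not the DZP patch referred to in the reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCZID_EL0.DZP, bit 4: set when DC ZVA must not be used. */
#define DCZID_DZP (UINT64_C(1) << 4)

/* Hypothetical selector, for illustration only. */
enum clear_method {
    CLEAR_DC_ZVA,       /* zero one block per DC ZVA */
    CLEAR_PLAIN_STORES  /* fall back to store instructions, e.g. stnp */
};

static inline bool dc_zva_permitted(uint64_t dczid)
{
    return (dczid & DCZID_DZP) == 0;
}

static inline enum clear_method choose_clear_method(uint64_t dczid)
{
    return dc_zva_permitted(dczid) ? CLEAR_DC_ZVA : CLEAR_PLAIN_STORES;
}
```

Under this model, a platform quirk that reports DZP as set would steer only the affected systems to the store-based path, leaving DC ZVA in place everywhere else.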

