From: Ard Biesheuvel <ardb@kernel.org>
To: linux-arm-kernel@lists.infradead.org
Cc: linux-hardening@vger.kernel.org, mark.rutland@arm.com,
	catalin.marinas@arm.com, will@kernel.org,
	Ard Biesheuvel <ardb@kernel.org>
Subject: [RFC PATCH 5/9] arm64: chacha-neon: move frame pop forward
Date: Wed, 13 Oct 2021 17:22:39 +0200
Message-ID: <20211013152243.2216899-6-ardb@kernel.org>
In-Reply-To: <20211013152243.2216899-1-ardb@kernel.org>

Instead of branching back to the common exit point of the routine to pop
the stack frame and return to the caller, move the frame pop to right
after the last use of the callee-saved registers. This simplifies the
generation of CFI unwind metadata and reduces the number of branches
needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
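For readers unfamiliar with the pattern, here is a minimal sketch of what
"popping the frame early" looks like. It is not taken from
chacha-neon-core.S: the routine name, register roles, and frame size are
hypothetical, and the kernel's frame_push/frame_pop macros are spelled out
as plain stp/ldp pairs so the fragment assembles on its own.

```
	// Hypothetical routine for illustration only.
	// Assumed arguments: x0 = buffer pointer, x1 = byte count.
	.text
	.globl	demo_early_pop
	.type	demo_early_pop, %function
demo_early_pop:
	stp	x29, x30, [sp, #-32]!	// push frame record
	mov	x29, sp
	stp	x19, x20, [sp, #16]	// save the callee-saved registers we use

	mov	x19, x0
	add	x20, x19, x1		// x20 = end of buffer
	sub	x2, x20, #1		// last use of x19/x20; result is in scratch x2

	ldp	x19, x20, [sp, #16]	// restore callee-saved registers
	ldp	x29, x30, [sp], #32	// pop the frame before the control flow diverges

	cbz	x1, 1f			// empty buffer: nothing to store
	strb	wzr, [x2]		// tail A: clear the last byte
	ret				// returns directly, no branch to a shared exit
1:	ret				// tail B: also returns directly
	.size	demo_early_pop, . - demo_early_pop
```

Because the frame is already gone when the paths split, each tail ends in
a bare ret, and the region in which the frame is live is a single
contiguous range of instructions rather than several ranges that rejoin
at one exit label.
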
 arch/arm64/crypto/chacha-neon-core.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/chacha-neon-core.S b/arch/arm64/crypto/chacha-neon-core.S
index b70ac76f2610..918c0beae019 100644
--- a/arch/arm64/crypto/chacha-neon-core.S
+++ b/arch/arm64/crypto/chacha-neon-core.S
@@ -691,6 +691,8 @@ CPU_BE(	  rev		a15, a15	)
 	zip2		v15.2d, v29.2d, v31.2d
 	  stp		a14, a15, [x1, #-8]
 
+	frame_pop
+
 	tbnz		x5, #63, .Lt128
 	ld1		{v28.16b-v31.16b}, [x2]
 
@@ -726,7 +728,6 @@ CPU_BE(	  rev		a15, a15	)
 	st1		{v24.16b-v27.16b}, [x1], #64
 	st1		{v28.16b-v31.16b}, [x1]
 
-.Lout:	frame_pop
 	ret
 
 	// fewer than 192 bytes of in/output
@@ -744,7 +745,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v23.16b, v23.16b, v31.16b
 	st1		{v20.16b-v23.16b}, [x5]		// overlapping stores
 1:	st1		{v16.16b-v19.16b}, [x1]
-	b		.Lout
+	ret
 
 	// fewer than 128 bytes of in/output
 .Lt128:	ld1		{v28.16b-v31.16b}, [x10]
@@ -772,7 +773,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v31.16b, v31.16b, v3.16b
 	st1		{v28.16b-v31.16b}, [x6]		// overlapping stores
 2:	st1		{v20.16b-v23.16b}, [x1]
-	b		.Lout
+	ret
 
 	// fewer than 320 bytes of in/output
 .Lt320:	cbz		x7, 3f				// exactly 256 bytes?
@@ -789,7 +790,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v31.16b, v31.16b, v3.16b
 	st1		{v28.16b-v31.16b}, [x7]		// overlapping stores
 3:	st1		{v24.16b-v27.16b}, [x1]
-	b		.Lout
+	ret
 SYM_FUNC_END(chacha_4block_xor_neon)
 
 	.section	".rodata", "a", %progbits
-- 
2.30.2



Thread overview: 32+ messages
2021-10-13 15:22 [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 1/9] arm64: assembler: enable PAC for non-leaf assembler routines Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 2/9] arm64: cache: use ALIAS version of linkage macros for local aliases Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 3/9] arm64: crypto: avoid overlapping linkage definitions for AES-CBC Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 4/9] arm64: aes-neonbs: move frame pop to end of function Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 5/9] arm64: chacha-neon: move frame pop forward Ard Biesheuvel [this message]
2021-10-13 15:22 ` [RFC PATCH 6/9] arm64: smccc: create proper stack frames for HVC/SMC calls Ard Biesheuvel
2021-10-13 15:44   ` Mark Brown
2021-10-13 15:22 ` [RFC PATCH 7/9] arm64: assembler: add unwind annotations to frame push/pop macros Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 8/9] arm64: unwind: add asynchronous unwind tables to the kernel proper Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC Ard Biesheuvel
2021-10-13 15:42   ` Mark Brown
2021-10-13 22:35   ` Dan Li
2021-10-14  9:41     ` Ard Biesheuvel
2021-10-13 17:52 ` [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack Ard Biesheuvel
2021-10-13 18:01 ` Nick Desaulniers
