* [PATCH] crypto: arm/chacha20 - always use vrev for 16-bit rotates
@ 2018-07-25 1:29 Eric Biggers
2018-07-25 6:18 ` Ard Biesheuvel
2018-08-03 13:59 ` Herbert Xu
From: Eric Biggers @ 2018-07-25 1:29 UTC
To: linux-arm-kernel
From: Eric Biggers <ebiggers@google.com>
The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/arm/crypto/chacha20-neon-core.S | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
index 3fecb2124c35..451a849ad518 100644
--- a/arch/arm/crypto/chacha20-neon-core.S
+++ b/arch/arm/crypto/chacha20-neon-core.S
@@ -51,9 +51,8 @@ ENTRY(chacha20_block_xor_neon)
 .Ldoubleround:
 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #16
-	vsri.u32	q3, q4, #16
+	veor		q3, q3, q0
+	vrev32.16	q3, q3
 
 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32	q2, q2, q3
@@ -82,9 +81,8 @@ ENTRY(chacha20_block_xor_neon)
 
 	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
 	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #16
-	vsri.u32	q3, q4, #16
+	veor		q3, q3, q0
+	vrev32.16	q3, q3
 
 	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
 	vadd.i32	q2, q2, q3
--
2.18.0
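
Editorial note: the patch relies on the fact that rotating a 32-bit word
by 16 bits is the same as swapping its two 16-bit halves, which is what
vrev32.16 does in each 32-bit lane. The following is a minimal scalar
sketch in C, not kernel code, that models one lane of each instruction
sequence and checks that they agree. The function names (rotl32,
rot16_shl_sri, rot16_rev, check_rot16_models) are hypothetical and chosen
for illustration only; the sketch says nothing about relative speed.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: rotate a 32-bit word left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Scalar model of one 32-bit lane of the old sequence:
 *   vshl.u32 d, s, #16   d = s << 16
 *   vsri.u32 d, s, #16   insert (s >> 16) into d, keeping d's top 16 bits
 */
static uint32_t rot16_shl_sri(uint32_t s)
{
	uint32_t d = s << 16;

	d = (d & 0xffff0000u) | (s >> 16);
	return d;
}

/*
 * Scalar model of one 32-bit lane of vrev32.16: reverse the order of the
 * 16-bit halfwords within the word, i.e. swap the high and low halves.
 */
static uint32_t rot16_rev(uint32_t s)
{
	uint32_t lo = s & 0xffffu;
	uint32_t hi = s >> 16;

	return (lo << 16) | hi;
}

/* Check that both models match rotl32(x, 16) on a spread of inputs. */
static void check_rot16_models(void)
{
	uint32_t i;

	for (i = 0; i < (1u << 16); i++) {
		uint32_t x = i * 0x9e3779b9u;	/* unsigned wraparound is intended */

		assert(rot16_shl_sri(x) == rotl32(x, 16));
		assert(rot16_rev(x) == rotl32(x, 16));
	}
}
```

Calling check_rot16_models() from a small test program exercises both
models. Note that the halfword swap also works in place, which is why the
new code no longer needs q4 as a temporary.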
* [PATCH] crypto: arm/chacha20 - always use vrev for 16-bit rotates
From: Ard Biesheuvel @ 2018-07-25 6:18 UTC
To: linux-arm-kernel
On 25 July 2018 at 03:29, Eric Biggers <ebiggers3@gmail.com> wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
> but the one-way code (used on remainder blocks) implements it with
> vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
* [PATCH] crypto: arm/chacha20 - always use vrev for 16-bit rotates
From: Herbert Xu @ 2018-08-03 13:59 UTC
To: linux-arm-kernel
On Tue, Jul 24, 2018 at 06:29:07PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
> but the one-way code (used on remainder blocks) implements it with
> vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt