From: Herbert Xu <herbert@gondor.apana.org.au>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org, andre.przywara@arm.com,
	linux-arm-kernel@lists.infradead.org,
	Eric Biggers <ebiggers@google.com>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>
Subject: Re: [PATCH v2] crypto: arm/chacha-neon - optimize for non-block size multiples
Date: Fri, 13 Nov 2020 16:10:17 +1100
Message-ID: <20201113051017.GC8350@gondor.apana.org.au>
In-Reply-To: <20201103162809.28167-1-ardb@kernel.org>

On Tue, Nov 03, 2020 at 05:28:09PM +0100, Ard Biesheuvel wrote:
> The current NEON based ChaCha implementation for ARM is optimized for
> multiples of 4x the ChaCha block size (64 bytes). This makes sense for
> block encryption, but given that ChaCha is also often used in the
> context of networking, it makes sense to consider arbitrary length
> inputs as well.
>
> For example, WireGuard typically uses 1420 byte packets, and performing
> ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
> and 3 invocations of chacha_block_xor_neon(), where the last one also
> involves a memcpy() using a buffer on the stack to process the final
> chunk of 1420 % 64 == 12 bytes.
>
> Let's optimize for this case as well, by letting chacha_4block_xor_neon()
> deal with any input size between 64 and 256 bytes, using NEON permutation
> instructions and overlapping loads and stores. This way, the 140 byte
> tail of a 1420 byte input buffer can simply be processed in one go.
>
> This results in the following performance improvements for 1420 byte
> blocks, without significant impact on power-of-2 input sizes. (Note
> that Raspberry Pi is widely used in combination with a 32-bit kernel,
> even though the core is 64-bit capable)
>
> Cortex-A8 (BeagleBone) : 7%
> Cortex-A15 (Calxeda Midway) : 21%
> Cortex-A53 (Raspberry Pi 3) : 3%
> Cortex-A72 (Raspberry Pi 4) : 19%
>
> Cc: Eric Biggers <ebiggers@google.com>
> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2:
> - avoid memcpy() if the residual byte count is exactly 64 bytes
> - get rid of register based post increments, and simply rewind the src
>   pointer as needed (the dst pointer did not need the register post
>   increment in the first place)
> - add benchmark results for 32-bit CPUs to commit log.
>
>  arch/arm/crypto/chacha-glue.c      | 34 +++----
>  arch/arm/crypto/chacha-neon-core.S | 97 ++++++++++++++++++--
>  2 files changed, 107 insertions(+), 24 deletions(-)

Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
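
Illustrative note on the quoted commit message: the call counts it gives
for a 1420 byte packet can be checked with a small model. The sketch
below is not the kernel code. The struct and the plan_before() and
plan_after() helpers are hypothetical names, and the dispatch rule in
plan_after() (use the 4-block routine while more than 64 bytes remain,
and skip the stack buffer when exactly 64 bytes are left) is an
assumption drawn from the description and the v2 notes above.

```c
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64

/* Hypothetical tally of helper invocations; not a kernel structure. */
struct chacha_call_plan {
	size_t four_block_calls; /* calls to chacha_4block_xor_neon() */
	size_t one_block_calls;  /* calls to chacha_block_xor_neon() */
	size_t bounce_bytes;     /* bytes staged via a stack buffer, 0 if none */
};

/*
 * Old scheme as described: the 4-block routine only handles full
 * 256 byte chunks, the rest goes through the single-block routine,
 * and a partial final block is copied through a stack buffer.
 */
static struct chacha_call_plan plan_before(size_t len)
{
	struct chacha_call_plan p = { 0, 0, 0 };

	p.four_block_calls = len / (4 * CHACHA_BLOCK_SIZE);
	len %= 4 * CHACHA_BLOCK_SIZE;
	p.one_block_calls = (len + CHACHA_BLOCK_SIZE - 1) / CHACHA_BLOCK_SIZE;
	p.bounce_bytes = len % CHACHA_BLOCK_SIZE;
	return p;
}

/*
 * New scheme (assumed dispatch rule): the 4-block routine takes any
 * chunk larger than one block and up to 256 bytes, so only a tail of
 * at most 64 bytes reaches the single-block routine.
 */
static struct chacha_call_plan plan_after(size_t len)
{
	struct chacha_call_plan p = { 0, 0, 0 };

	while (len > CHACHA_BLOCK_SIZE) {
		size_t chunk = len < 4 * CHACHA_BLOCK_SIZE ?
			       len : 4 * CHACHA_BLOCK_SIZE;

		p.four_block_calls++;
		len -= chunk;
	}
	if (len > 0) {
		p.one_block_calls = 1;
		/* An exact 64 byte remainder needs no stack buffer. */
		if (len != CHACHA_BLOCK_SIZE)
			p.bounce_bytes = len;
	}
	return p;
}
```

Under this model, plan_before(1420) gives 5 four-block calls, 3
single-block calls and a 12 byte bounce, matching the figures in the
commit message, while plan_after(1420) gives 6 four-block calls with
the 140 byte tail handled by the last of them and no bounce buffer.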