From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org,
    Ard Biesheuvel <ardb@kernel.org>,
    Herbert Xu <herbert@gondor.apana.org.au>,
    David Sterba <dsterba@suse.com>,
    "Jason A . Donenfeld" <Jason@zx2c4.com>,
    Paul Crowley <paulcrowley@google.com>
Subject: Re: [PATCH 0/5] crypto: add NEON-optimized BLAKE2b
Date: Wed, 16 Dec 2020 12:47:56 -0800
Message-ID: <X9pyfAaw5hQ6ngTI@gmail.com>
In-Reply-To: <20201215234708.105527-1-ebiggers@kernel.org>

On Tue, Dec 15, 2020 at 03:47:03PM -0800, Eric Biggers wrote:
> This patchset adds a NEON implementation of BLAKE2b for 32-bit ARM.
> Patches 1-4 prepare for it by making some updates to the generic
> implementation, while patch 5 adds the actual NEON implementation.
>
> On Cortex-A7 (which these days is the most common ARM processor that
> doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
> SHA-256, and slightly faster than SHA-1. It is also almost three times
> as fast as the generic implementation of BLAKE2b:
>
> Algorithm            Cycles per byte (on 4096-byte messages)
> ===================  =======================================
> blake2b-256-neon     14.1
> sha1-neon            16.4
> sha1-asm             20.8
> blake2s-256-generic  26.1
> sha256-neon          28.9
> sha256-asm           32.1
> blake2b-256-generic  39.9
>
> This implementation isn't directly based on any other implementation,
> but it borrows some ideas from previous NEON code I've written as well
> as from chacha-neon-core.S. At least on Cortex-A7, it is faster than
> the other NEON implementations of BLAKE2b I'm aware of (the
> implementation in the BLAKE2 official repository using intrinsics, and
> Andrew Moon's implementation which can be found in SUPERCOP).
>
> NEON-optimized BLAKE2b is useful because there is interest in using
> BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
> devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On
> these devices, the performance cost of upgrading to SHA-256 may be
> unacceptable, whereas BLAKE2b-256 would actually improve performance.
>
> Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
> is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
> BLAKE2b is actually faster than BLAKE2s. This is because NEON supports
> 64-bit operations, and because BLAKE2s's block size is too small for
> NEON to be helpful for it. The best I've been able to do with BLAKE2s
> on Cortex-A7 is 19.0 cpb with an optimized scalar implementation.
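To make the 64-bit point above concrete, here is a minimal portable C sketch of one-shot, unkeyed BLAKE2b written from the RFC 7693 description. It is not the NEON code from patch 5 and not the kernel's generic implementation; the names blake2b_ref(), compress() and g() are hypothetical. Every step of the G mixing function is a 64-bit add, XOR or rotate (by 32, 24, 16 and 63 bits): NEON can operate on 64-bit lanes directly, whereas scalar 32-bit ARM code has to split each such operation across a pair of 32-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IV and message permutations from RFC 7693; the identifiers are hypothetical. */
static const uint64_t b2b_iv[8] = {
	0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL,
	0x3c6ef372fe94f82bULL, 0xa54ff53a5f1d36f1ULL,
	0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL,
	0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL,
};

static const uint8_t b2b_sigma[10][16] = {
	{  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
	{ 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 },
	{ 11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4 },
	{  7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8 },
	{  9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13 },
	{  2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9 },
	{ 12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11 },
	{ 13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10 },
	{  6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5 },
	{ 10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13,  0 },
};

static uint64_t rotr64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* G: every step is a 64-bit add, XOR or rotate. */
static void g(uint64_t v[16], int a, int b, int c, int d, uint64_t x, uint64_t y)
{
	v[a] += v[b] + x; v[d] = rotr64(v[d] ^ v[a], 32);
	v[c] += v[d];     v[b] = rotr64(v[b] ^ v[c], 24);
	v[a] += v[b] + y; v[d] = rotr64(v[d] ^ v[a], 16);
	v[c] += v[d];     v[b] = rotr64(v[b] ^ v[c], 63);
}

/* Compress one 128-byte block; t = bytes processed so far (assumed < 2^64). */
static void compress(uint64_t h[8], const uint8_t *blk, uint64_t t, int last)
{
	uint64_t v[16], m[16];
	int i, r;

	for (i = 0; i < 16; i++) {
		m[i] = 0;
		for (int j = 7; j >= 0; j--)  /* little-endian load */
			m[i] = (m[i] << 8) | blk[8 * i + j];
	}
	for (i = 0; i < 8; i++) {
		v[i] = h[i];
		v[i + 8] = b2b_iv[i];
	}
	v[12] ^= t;              /* low half of the 128-bit byte counter */
	if (last)
		v[14] = ~v[14];  /* finalization flag */
	for (r = 0; r < 12; r++) {
		const uint8_t *s = b2b_sigma[r % 10];
		/* four column mixes, then four diagonal mixes */
		g(v, 0, 4,  8, 12, m[s[0]],  m[s[1]]);
		g(v, 1, 5,  9, 13, m[s[2]],  m[s[3]]);
		g(v, 2, 6, 10, 14, m[s[4]],  m[s[5]]);
		g(v, 3, 7, 11, 15, m[s[6]],  m[s[7]]);
		g(v, 0, 5, 10, 15, m[s[8]],  m[s[9]]);
		g(v, 1, 6, 11, 12, m[s[10]], m[s[11]]);
		g(v, 2, 7,  8, 13, m[s[12]], m[s[13]]);
		g(v, 3, 4,  9, 14, m[s[14]], m[s[15]]);
	}
	for (i = 0; i < 8; i++)
		h[i] ^= v[i] ^ v[i + 8];
}

/* One-shot unkeyed BLAKE2b; outlen == 32 gives BLAKE2b-256. */
void blake2b_ref(uint8_t *out, size_t outlen, const void *in, size_t inlen)
{
	const uint8_t *p = in;
	uint8_t last[128] = { 0 };
	uint64_t h[8], t = 0;
	size_t i;

	assert(outlen >= 1 && outlen <= 64);
	memcpy(h, b2b_iv, sizeof(h));
	h[0] ^= 0x01010000ULL ^ outlen;  /* depth 1, fanout 1, key length 0 */
	for (; inlen > 128; p += 128, inlen -= 128) {
		t += 128;
		compress(h, p, t, 0);
	}
	if (inlen)
		memcpy(last, p, inlen);  /* zero-padded final block */
	compress(h, last, t + inlen, 1);
	for (i = 0; i < outlen; i++)
		out[i] = (uint8_t)(h[i / 8] >> (8 * (i % 8)));
}
```

For example, blake2b_ref(digest, 32, data, len) would produce a BLAKE2b-256 digest. With outlen 64 and the input "abc", it should reproduce the BLAKE2b-512 test vector from RFC 7693 Appendix A, which begins ba 80 a5 3f.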

By the way, if people are interested in having my ARM scalar implementation of
BLAKE2s in the kernel as well, I can send a patchset for it. It ended up being
slower than BLAKE2b and SHA-1, so it wasn't as good a fit for the dm-verity use
case mentioned above. If it were added as "blake2s-256-arm", the table would
become:

Algorithm            Cycles per byte (on 4096-byte messages)
===================  =======================================
blake2b-256-neon     14.1
sha1-neon            16.4
blake2s-256-arm      19.0
sha1-asm             20.8
blake2s-256-generic  26.1
sha256-neon          28.9
sha256-asm           32.1
blake2b-256-generic  39.9
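Because every row was measured on the same Cortex-A7 with the same 4096-byte message length, the relative speeds quoted in the cover letter follow directly from the cycles-per-byte column: at a fixed clock, time is proportional to cpb. A small C sketch of that arithmetic (the struct and function names are made up for illustration, and blake2s-256-arm is only the proposed, not-yet-submitted driver name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cortex-A7 cycles per byte on 4096-byte messages, copied from the table above. */
struct cpb_result {
	const char *alg;
	double cpb;
};

static const struct cpb_result cortex_a7[] = {
	{ "blake2b-256-neon",    14.1 },
	{ "sha1-neon",           16.4 },
	{ "blake2s-256-arm",     19.0 },
	{ "sha1-asm",            20.8 },
	{ "blake2s-256-generic", 26.1 },
	{ "sha256-neon",         28.9 },
	{ "sha256-asm",          32.1 },
	{ "blake2b-256-generic", 39.9 },
};

/* Returns the cpb figure for 'alg', or -1.0 if it is not in the table. */
double cpb_of(const char *alg)
{
	for (size_t i = 0; i < sizeof(cortex_a7) / sizeof(cortex_a7[0]); i++)
		if (strcmp(cortex_a7[i].alg, alg) == 0)
			return cortex_a7[i].cpb;
	return -1.0;
}

/*
 * At a fixed clock, "X times as fast" is simply the slower algorithm's
 * cycles per byte divided by the faster algorithm's.
 */
double speedup(const char *fast, const char *slow)
{
	double f = cpb_of(fast), s = cpb_of(slow);

	assert(f > 0 && s > 0);  /* both names must be in the table */
	return s / f;
}
```

With these numbers, speedup("blake2b-256-neon", "sha256-neon") is about 2.05 and speedup("blake2b-256-neon", "blake2b-256-generic") is about 2.83, matching "over twice as fast as SHA-256" and "almost three times as fast as the generic implementation". The scalar blake2s-256-arm would still take about 1.35 times as many cycles per byte as blake2b-256-neon.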
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel