From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: Paul Crowley <paulcrowley@google.com>,
    Ard Biesheuvel <ard.biesheuvel@linaro.org>,
    "Jason A . Donenfeld" <Jason@zx2c4.com>,
    linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/4] crypto: ARM64 NEON optimized XChaCha and NHPoly1305 (for Adiantum)
Date: Mon, 3 Dec 2018 19:52:48 -0800
Message-ID: <20181204035252.14853-1-ebiggers@kernel.org>
Hello,
This series optimizes the Adiantum encryption mode for ARM64 by adding
an ARM64 NEON accelerated implementation of NHPoly1305, specifically the
NH part; and by modifying the existing ARM64 NEON implementation of
ChaCha20 to support XChaCha20 and XChaCha12.
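For readers unfamiliar with how XChaCha relates to ChaCha, the following is a minimal Python sketch of the construction, not the NEON code from this series. It assumes the nonce layout of the XChaCha IETF draft (16 bytes fed to HChaCha, the remaining 8 bytes used as the ChaCha nonce); the kernel's IV layout may differ. The names `permute`, `hchacha`, `chacha_block`, and `xchacha_block` are illustrative only. The `nrounds` parameter shows why a single core can serve both the 20-round and 12-round variants.

```python
import struct

MASK32 = 0xFFFFFFFF
# The ASCII constant "expand 32-byte k" as four little-endian 32-bit words.
SIGMA = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """Apply the ChaCha quarter round in place to state words a, b, c, d."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def permute(state, nrounds):
    """Run nrounds ChaCha rounds (alternating column and diagonal rounds)."""
    s = list(state)
    for _ in range(nrounds // 2):
        quarter_round(s, 0, 4, 8, 12)    # column round
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)   # diagonal round
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return s


def hchacha(key, nonce16, nrounds=20):
    """Derive a 32-byte subkey from a 32-byte key and a 16-byte nonce."""
    state = SIGMA + list(struct.unpack("<8I", key)) \
                  + list(struct.unpack("<4I", nonce16))
    s = permute(state, nrounds)
    # Unlike the ChaCha block function, there is no feed-forward addition;
    # the output is the first and last rows of the permuted state.
    return struct.pack("<8I", *(s[0:4] + s[12:16]))


def chacha_block(key, counter, nonce12, nrounds=20):
    """Produce one 64-byte ChaCha keystream block."""
    state = SIGMA + list(struct.unpack("<8I", key)) + [counter & MASK32] \
                  + list(struct.unpack("<3I", nonce12))
    s = permute(state, nrounds)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, state)))


def xchacha_block(key, nonce24, counter=0, nrounds=20):
    """XChaCha: HChaCha derives a subkey, then plain ChaCha uses it."""
    subkey = hchacha(key, nonce24[:16], nrounds)
    return chacha_block(subkey, counter, b"\x00" * 4 + nonce24[16:], nrounds)
```

Calling `xchacha_block(key, nonce24, nrounds=12)` gives the XChaCha12 variant under the same assumptions; only the round count changes.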
This greatly improves Adiantum performance on ARM64. For example,
encrypting 4096-byte messages (single-threaded) on a Raspberry Pi 3
Model B v1.2, which has a Cortex-A53 processor:
                               Before       After
                               ---------    ---------
    adiantum(xchacha12,aes)    44.1 MB/s    82.7 MB/s
    adiantum(xchacha20,aes)    35.5 MB/s    65.7 MB/s
Decryption is almost exactly the same speed as encryption.
The biggest benefit comes from accelerating XChaCha. Accelerating NH
gives a somewhat smaller, but still significant benefit.
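As a rough illustration of what the NH part computes, here is a Python sketch. It rests on assumptions about the parameters rather than on anything stated in this cover letter: 32-bit little-endian words, 16-byte message units, four passes, and each pass reading the key at a further 4-word offset. The function name `nh` and the constants are hypothetical, and the authoritative definition is the generic C implementation and the Adiantum paper.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
NUM_PASSES = 4   # assumed number of independent NH passes
UNIT_WORDS = 4   # assumed message unit: four 32-bit words (16 bytes)


def nh(key_words, message):
    """Return NUM_PASSES 64-bit sums over a message of 16-byte units.

    key_words is a sequence of 32-bit integers; pass p reads the key
    starting 4*p words further along than pass 0.
    """
    if len(message) % (4 * UNIT_WORDS) != 0:
        raise ValueError("message length must be a multiple of 16 bytes")
    n_units = len(message) // (4 * UNIT_WORDS)
    needed = UNIT_WORDS * n_units + UNIT_WORDS * (NUM_PASSES - 1)
    if len(key_words) < needed:
        raise ValueError("key too short: need %d words" % needed)

    words = struct.unpack("<%dI" % (len(message) // 4), message)
    sums = [0] * NUM_PASSES
    for u in range(n_units):
        m = words[UNIT_WORDS * u:UNIT_WORDS * (u + 1)]
        for p in range(NUM_PASSES):
            base = UNIT_WORDS * (u + p)
            k = key_words[base:base + UNIT_WORDS]
            # Add key to message mod 2^32, multiply word pairs that are
            # two words apart, and accumulate mod 2^64.
            sums[p] += ((m[0] + k[0]) & MASK32) * ((m[2] + k[2]) & MASK32)
            sums[p] += ((m[1] + k[1]) & MASK32) * ((m[3] + k[3]) & MASK32)
            sums[p] &= MASK64
    return sums
```

The inner loop is only additions and 32x32-to-64-bit multiplies over independent lanes, which is the kind of work that maps well onto SIMD instructions.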
Performance on 512-byte inputs is also improved, though that is much
slower in the first place. When Adiantum is used with dm-crypt (or
cryptsetup), we recommend using a 4096-byte sector size.
For comparison, on the same hardware AES-256-XTS encryption is only
24.5 MB/s and decryption 21.6 MB/s, both using the NEON-bitsliced
implementation ("xts-aes-neonbs"). That is the fastest AES-256-XTS
implementation on this processor, since it doesn't have the ARMv8
Cryptography Extensions. This is despite Adiantum also being a super-
pseudorandom permutation (SPRP) over the entire sector, unlike XTS.
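To put the figures quoted above side by side, this short Python snippet computes the speedup ratios. The numbers are copied from this message; the dictionary and function names are illustrative only.

```python
# Encryption throughput in MB/s on 4096-byte messages, as quoted above.
adiantum_before = {"xchacha12": 44.1, "xchacha20": 35.5}
adiantum_after = {"xchacha12": 82.7, "xchacha20": 65.7}
aes_256_xts_encrypt = 24.5  # xts-aes-neonbs on the same hardware


def speedup(new, old):
    """Ratio of new throughput to old throughput."""
    return new / old


for variant in ("xchacha12", "xchacha20"):
    vs_before = speedup(adiantum_after[variant], adiantum_before[variant])
    vs_xts = speedup(adiantum_after[variant], aes_256_xts_encrypt)
    print("adiantum(%s,aes): %.2fx vs. before, %.2fx vs. AES-256-XTS"
          % (variant, vs_before, vs_xts))
```

On these numbers the series is a bit under a 1.9x improvement for both variants, and Adiantum with XChaCha12 ends up more than three times as fast as AES-256-XTS encryption on this board.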
Note that XChaCha20 and XChaCha12 can be used for other purposes too.
Changed since v1:
    - Create full stack frame in hchacha_block_neon() and
      chacha_block_xor_neon().
    - Use x30 instead of lr.
    - Fix whitespace in nh-neon-core.S.
Eric Biggers (4):
  crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305
  crypto: arm64/chacha20 - add XChaCha20 support
  crypto: arm64/chacha20 - refactor to allow varying number of rounds
  crypto: arm64/chacha - add XChaCha12 support
 arch/arm64/crypto/Kconfig                     |   7 +-
 arch/arm64/crypto/Makefile                    |   7 +-
 ...hacha20-neon-core.S => chacha-neon-core.S} |  92 +++++---
 arch/arm64/crypto/chacha-neon-glue.c          | 207 ++++++++++++++++++
 arch/arm64/crypto/chacha20-neon-glue.c        | 133 -----------
 arch/arm64/crypto/nh-neon-core.S              | 103 +++++++++
 arch/arm64/crypto/nhpoly1305-neon-glue.c      |  77 +++++++
 7 files changed, 461 insertions(+), 165 deletions(-)
 rename arch/arm64/crypto/{chacha20-neon-core.S => chacha-neon-core.S} (90%)
 create mode 100644 arch/arm64/crypto/chacha-neon-glue.c
 delete mode 100644 arch/arm64/crypto/chacha20-neon-glue.c
 create mode 100644 arch/arm64/crypto/nh-neon-core.S
 create mode 100644 arch/arm64/crypto/nhpoly1305-neon-glue.c
--
2.19.2