From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: Paul Crowley <paulcrowley@google.com>,
Martin Willi <martin@strongswan.org>,
Milan Broz <gmazyland@gmail.com>,
"Jason A . Donenfeld" <Jason@zx2c4.com>,
linux-kernel@vger.kernel.org
Subject: [PATCH v3 0/6] crypto: x86_64 optimized XChaCha and NHPoly1305 (for Adiantum)
Date: Tue, 4 Dec 2018 22:19:59 -0800
Message-ID: <20181205062005.27727-1-ebiggers@kernel.org>
Hello,
This series optimizes the Adiantum encryption mode for x86_64 by adding
SSE2 and AVX2 accelerated implementations of NHPoly1305 (specifically,
of its NH part), and by modifying the existing x86_64 SSSE3/AVX2/AVX-512VL
implementation of ChaCha20 to support XChaCha20 and XChaCha12.
This greatly improves Adiantum performance on x86_64.
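For readers unfamiliar with XChaCha: it extends ChaCha's nonce to 24 bytes by first running HChaCha (the ChaCha permutation without the final feed-forward addition) over the key and the first 16 nonce bytes to derive a subkey, then running ordinary ChaCha with that subkey and the remaining nonce bytes. XChaCha12 differs from XChaCha20 only in the number of rounds, which is why the series makes the round count a parameter. The portable C below is a minimal sketch of that construction, not the kernel's SIMD code. All function names are hypothetical, and the layout of the final ChaCha state (64-bit block counter in words 12-13, nonce bytes 16-23 in words 14-15) is an assumption.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ChaCha quarter-round on state words a, b, c, d. */
static void chacha_qr(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
	x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
	x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* nrounds = 20 for (X)ChaCha20, 12 for (X)ChaCha12; two rounds per loop. */
static void chacha_permute(uint32_t x[16], int nrounds)
{
	for (int i = 0; i < nrounds; i += 2) {
		chacha_qr(x, 0, 4, 8, 12);	/* column round */
		chacha_qr(x, 1, 5, 9, 13);
		chacha_qr(x, 2, 6, 10, 14);
		chacha_qr(x, 3, 7, 11, 15);
		chacha_qr(x, 0, 5, 10, 15);	/* diagonal round */
		chacha_qr(x, 1, 6, 11, 12);
		chacha_qr(x, 2, 7, 8, 13);
		chacha_qr(x, 3, 4, 9, 14);
	}
}

/* State = 4 constants, 8 key words, 4 IV words (counter and/or nonce). */
static void chacha_init(uint32_t s[16], const uint32_t key[8], const uint8_t iv[16])
{
	s[0] = 0x61707865; s[1] = 0x3320646e; s[2] = 0x79622d32; s[3] = 0x6b206574;
	for (int i = 0; i < 8; i++)
		s[4 + i] = key[i];
	for (int i = 0; i < 4; i++)
		s[12 + i] = get_le32(iv + 4 * i);
}

/* One keystream block: permute a copy of the state, then add the input. */
static void chacha_block(const uint32_t s[16], uint32_t out[16], int nrounds)
{
	uint32_t x[16];

	memcpy(x, s, sizeof(x));
	chacha_permute(x, nrounds);
	for (int i = 0; i < 16; i++)
		out[i] = x[i] + s[i];
}

/* HChaCha: same permutation, no final addition; keep words 0-3 and 12-15. */
static void hchacha(const uint32_t key[8], const uint8_t nonce16[16],
		    uint32_t subkey[8], int nrounds)
{
	uint32_t x[16];

	chacha_init(x, key, nonce16);
	chacha_permute(x, nrounds);
	memcpy(&subkey[0], &x[0], 4 * sizeof(uint32_t));
	memcpy(&subkey[4], &x[12], 4 * sizeof(uint32_t));
}

/*
 * XChaCha: HChaCha over nonce bytes 0-15 derives a subkey; plain ChaCha then
 * uses it with an (assumed) 64-bit block counter in words 12-13 and nonce
 * bytes 16-23 in words 14-15.
 */
static void xchacha_init(uint32_t s[16], const uint32_t key[8],
			 const uint8_t nonce24[24], uint64_t counter, int nrounds)
{
	uint32_t subkey[8];
	uint8_t iv[16];

	hchacha(key, nonce24, subkey, nrounds);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(counter >> (8 * i));
	memcpy(iv + 8, nonce24 + 16, 8);
	chacha_init(s, subkey, iv);
}
```

Passing nrounds = 12 to both hchacha() and the subsequent chacha_block() calls gives the XChaCha12 variant; no other part of the construction changes.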
For example, encrypting 4096-byte messages (single-threaded) on a
Skylake-based processor (Intel Xeon, supports AVX-512VL and AVX2):
                                 Before       After
                               --------   ---------
    adiantum(xchacha12,aes)    348 MB/s   1493 MB/s
    adiantum(xchacha20,aes)    266 MB/s   1261 MB/s
And on a Zen-based processor (Threadripper 1950X, supports AVX2):
                                 Before       After
                               --------   ---------
    adiantum(xchacha12,aes)    505 MB/s   1292 MB/s
    adiantum(xchacha20,aes)    387 MB/s   1037 MB/s
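The relative gains are easier to compare as ratios. The small C helper below is a sketch with hypothetical names that computes after/before for each row of the two tables (data copied from the tables, in MB/s). It works out to roughly 4.3x and 4.7x on Skylake and 2.6x and 2.7x on Zen.

```c
/* One row of the 4096-byte encryption benchmarks (single-threaded, MB/s). */
struct bench_row {
	const char *cpu;
	const char *alg;
	double before_mbps;
	double after_mbps;
};

static const struct bench_row bench_rows[] = {
	{ "Skylake", "adiantum(xchacha12,aes)", 348.0, 1493.0 },
	{ "Skylake", "adiantum(xchacha20,aes)", 266.0, 1261.0 },
	{ "Zen",     "adiantum(xchacha12,aes)", 505.0, 1292.0 },
	{ "Zen",     "adiantum(xchacha20,aes)", 387.0, 1037.0 },
};

/* Ratio of new throughput to old throughput, e.g. 1493 / 348 = 4.29. */
static double bench_speedup(const struct bench_row *r)
{
	return r->after_mbps / r->before_mbps;
}

/* Speedup rounded to hundredths and scaled to an integer (4.29 -> 429). */
static int bench_speedup_x100(const struct bench_row *r)
{
	return (int)(bench_speedup(r) * 100.0 + 0.5);
}
```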
Decryption is almost exactly the same speed as encryption.
The biggest benefit comes from accelerating XChaCha. Accelerating NH
gives a somewhat smaller, but still significant benefit.
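NH lends itself to SSE2 and AVX2 because it is just 32-bit additions and 32x32->64-bit multiply-accumulates, computed in several independent passes. The portable C sketch below is modeled on the general shape of the generic NH used by Adiantum: 4 passes, 16-byte message units, and a key that advances 16 bytes per unit, with each pass using the key offset by 16 more bytes. The function name and parameter layout are assumptions, and this is not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define NH_NUM_PASSES   4   /* independent hashes computed side by side */
#define NH_MESSAGE_UNIT 16  /* message bytes consumed per iteration */

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * NH sketch: message_len must be a multiple of NH_MESSAGE_UNIT, and key must
 * hold message_len / 4 + 12 words. Pass p uses the key shifted by 4*p words,
 * so the four sums are independent and map naturally onto SIMD lanes.
 */
static void nh_sketch(const uint32_t *key, const uint8_t *message,
		      size_t message_len, uint64_t hash[NH_NUM_PASSES])
{
	uint64_t sums[NH_NUM_PASSES] = { 0, 0, 0, 0 };
	int p;

	while (message_len >= NH_MESSAGE_UNIT) {
		uint32_t m0 = load_le32(message + 0);
		uint32_t m1 = load_le32(message + 4);
		uint32_t m2 = load_le32(message + 8);
		uint32_t m3 = load_le32(message + 12);

		for (p = 0; p < NH_NUM_PASSES; p++) {
			const uint32_t *k = key + 4 * p;

			/* Additions wrap mod 2^32; products and sums wrap mod 2^64. */
			sums[p] += (uint64_t)(uint32_t)(m0 + k[0]) * (uint32_t)(m2 + k[2]);
			sums[p] += (uint64_t)(uint32_t)(m1 + k[1]) * (uint32_t)(m3 + k[3]);
		}
		key += NH_MESSAGE_UNIT / sizeof(*key);
		message += NH_MESSAGE_UNIT;
		message_len -= NH_MESSAGE_UNIT;
	}
	for (p = 0; p < NH_NUM_PASSES; p++)
		hash[p] = sums[p];
}
```

In a SIMD version, the four passes (or the two products within a pass) become vector lanes, which is what makes the SSE2 and AVX2 implementations pay off.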
Performance on 512-byte inputs is also improved, although throughput at
that size is much lower to begin with. When Adiantum is used with
dm-crypt (or cryptsetup), we recommend using a 4096-byte sector size.
For comparison, AES-256-XTS runs at 2710 MB/s on the Skylake CPU and
4140 MB/s on the Zen CPU. However, AES benefits from direct AES-NI
hardware support, whereas Adiantum is implemented entirely with
general-purpose instructions (scalar and SIMD). Also, unlike XTS,
Adiantum is a super-pseudorandom permutation over the entire sector.
Note that XChaCha20 and XChaCha12 can be used for other purposes too.
Changed since v2:
- Yield the FPU once per 4096 bytes rather than once per skcipher_walk
  step.
- Create a full stack frame in hchacha_block_ssse3() and
  chacha_block_xor_ssse3().
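The FPU-yield change can be pictured as a running byte count. Instead of ending and restarting the FPU section after every walk step, the code does so only after at least 4096 bytes. The intent appears to be to bound how long preemption stays disabled without paying for an FPU end/begin pair on every, possibly small, step. The userspace sketch below assumes hypothetical fake_fpu_begin()/fake_fpu_end() stand-ins for kernel_fpu_begin()/kernel_fpu_end(), and a plain array of step lengths in place of a real skcipher_walk. It is not the patch's code.

```c
#include <stddef.h>

#define FPU_YIELD_BYTES 4096	/* interval from the "Changed since v2" note */

/*
 * Hypothetical stand-ins for kernel_fpu_begin()/kernel_fpu_end(). Here they
 * only count how many separate FPU sections a request would use.
 */
static unsigned int fpu_sections;

static void fake_fpu_begin(void) { fpu_sections++; }
static void fake_fpu_end(void) { }

/*
 * Process one request given as a list of walk-step lengths. The FPU is
 * dropped and re-taken (a chance to be preempted) only once at least
 * FPU_YIELD_BYTES have been processed, not after every step.
 * Returns the number of FPU sections used.
 */
static unsigned int process_steps(const size_t *step_lens, size_t nsteps)
{
	size_t since_yield = 0;

	fpu_sections = 0;
	fake_fpu_begin();
	for (size_t i = 0; i < nsteps; i++) {
		/* SIMD en/decryption of step_lens[i] bytes would happen here. */
		since_yield += step_lens[i];
		if (since_yield >= FPU_YIELD_BYTES && i + 1 < nsteps) {
			fake_fpu_end();
			fake_fpu_begin();
			since_yield = 0;
		}
	}
	fake_fpu_end();
	return fpu_sections;
}
```

For example, sixteen 512-byte steps use 2 FPU sections here, whereas yielding after every step would use 16.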
Changed since v1:
- Rebase on top of the latest cryptodev, which includes the AVX-512VL
  accelerated ChaCha20 from Martin Willi.
Eric Biggers (6):
crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305
crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305
crypto: x86/chacha20 - add XChaCha20 support
crypto: x86/chacha20 - refactor to allow varying number of rounds
crypto: x86/chacha - add XChaCha12 support
crypto: x86/chacha - yield the FPU occasionally
arch/x86/crypto/Makefile | 15 +-
...a20-avx2-x86_64.S => chacha-avx2-x86_64.S} | 33 +-
...12vl-x86_64.S => chacha-avx512vl-x86_64.S} | 35 +--
...0-ssse3-x86_64.S => chacha-ssse3-x86_64.S} | 104 +++---
arch/x86/crypto/chacha20_glue.c | 208 ------------
arch/x86/crypto/chacha_glue.c | 297 ++++++++++++++++++
arch/x86/crypto/nh-avx2-x86_64.S | 157 +++++++++
arch/x86/crypto/nh-sse2-x86_64.S | 123 ++++++++
arch/x86/crypto/nhpoly1305-avx2-glue.c | 77 +++++
arch/x86/crypto/nhpoly1305-sse2-glue.c | 76 +++++
crypto/Kconfig | 28 +-
11 files changed, 861 insertions(+), 292 deletions(-)
rename arch/x86/crypto/{chacha20-avx2-x86_64.S => chacha-avx2-x86_64.S} (97%)
rename arch/x86/crypto/{chacha20-avx512vl-x86_64.S => chacha-avx512vl-x86_64.S} (97%)
rename arch/x86/crypto/{chacha20-ssse3-x86_64.S => chacha-ssse3-x86_64.S} (92%)
delete mode 100644 arch/x86/crypto/chacha20_glue.c
create mode 100644 arch/x86/crypto/chacha_glue.c
create mode 100644 arch/x86/crypto/nh-avx2-x86_64.S
create mode 100644 arch/x86/crypto/nh-sse2-x86_64.S
create mode 100644 arch/x86/crypto/nhpoly1305-avx2-glue.c
create mode 100644 arch/x86/crypto/nhpoly1305-sse2-glue.c
--
2.19.2