* [PATCH v3 0/6] crypto: x86_64 optimized XChaCha and NHPoly1305 (for Adiantum)
@ 2018-12-05  6:19 Eric Biggers
From: Eric Biggers @ 2018-12-05  6:19 UTC (permalink / raw)
  To: linux-crypto
  Cc: Paul Crowley, Martin Willi, Milan Broz, Jason A . Donenfeld,
	linux-kernel

Hello,

This series optimizes the Adiantum encryption mode for x86_64 by adding
SSE2 and AVX2 accelerated implementations of NHPoly1305, specifically
the NH part; and by modifying the existing x86_64 SSSE3/AVX2/AVX-512VL
implementation of ChaCha20 to support XChaCha20 and XChaCha12.

This greatly improves Adiantum performance on x86_64.  For example, when
encrypting 4096-byte messages (single-threaded) on a Skylake-based
processor (Intel Xeon, supports AVX-512VL and AVX2):

                           Before                After
                           --------              ---------
adiantum(xchacha12,aes)    348 MB/s              1493 MB/s
adiantum(xchacha20,aes)    266 MB/s              1261 MB/s

And on a Zen-based processor (Threadripper 1950X, supports AVX2):

                           Before                After
                           --------              ---------
adiantum(xchacha12,aes)    505 MB/s              1292 MB/s
adiantum(xchacha20,aes)    387 MB/s              1037 MB/s

Decryption is almost exactly the same speed as encryption.
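
As a quick way to read the tables above, the before/after figures can be
turned into speedup ratios.  The following is only an illustrative sketch:
the names RESULTS and speedup() are made up here and are not part of this
series; the numbers are copied from the tables.

```python
# Throughput in MB/s, copied from the two tables above: (before, after).
RESULTS = {
    "Skylake": {
        "adiantum(xchacha12,aes)": (348, 1493),
        "adiantum(xchacha20,aes)": (266, 1261),
    },
    "Zen": {
        "adiantum(xchacha12,aes)": (505, 1292),
        "adiantum(xchacha20,aes)": (387, 1037),
    },
}


def speedup(before, after):
    """Return the after/before throughput ratio."""
    return after / before


for cpu, rows in RESULTS.items():
    for algorithm, (before, after) in rows.items():
        print(f"{cpu:<8} {algorithm:<24} {speedup(before, after):.2f}x")
```

This works out to roughly 4.3x and 4.7x on the Skylake CPU and roughly
2.6x and 2.7x on the Zen CPU.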

The biggest benefit comes from accelerating XChaCha.  Accelerating NH
gives a somewhat smaller but still significant benefit.

Performance on 512-byte inputs is also improved, though throughput on
such short inputs is much lower to begin with.  When Adiantum is used
with dm-crypt (or cryptsetup), we recommend using a 4096-byte sector size.

For comparison, AES-256-XTS is 2710 MB/s on the Skylake CPU and
4140 MB/s on the Zen CPU.  However, AES has the benefit of direct AES-NI
hardware support whereas Adiantum is implemented entirely with
general-purpose instructions (scalar and SIMD).  Adiantum is also a
super-pseudorandom permutation over the entire sector, unlike XTS.

Note that XChaCha20 and XChaCha12 can be used for other purposes too.

Changed since v2:
  - Yield the FPU once per 4096 bytes rather than once per skcipher_walk
    step.
  - Create full stack frame in hchacha_block_ssse3() and
    chacha_block_xor_ssse3().
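
To illustrate the first item, here is a small userspace model of yielding
once per 4096 bytes instead of once per walk step.  It is a sketch under
assumptions, not the code from patch 6: count_fpu_yields() and its
parameters are hypothetical, and the "yield" is only counted, standing in
for leaving and re-entering the kernel FPU section.

```python
YIELD_INTERVAL = 4096  # bytes between yields, per the changelog item above


def count_fpu_yields(step_sizes, interval=YIELD_INTERVAL):
    """Count yields when yielding once per `interval` bytes processed.

    `step_sizes` models the byte counts handled in successive walk steps.
    """
    yields = 0
    budget = interval
    for nbytes in step_sizes:
        # The SIMD work on `nbytes` bytes would happen here.
        budget -= nbytes
        if budget <= 0:
            # Stand-in for briefly ending and restarting the FPU section.
            yields += 1
            budget = interval
    return yields


# Eight 512-byte steps add up to 4096 bytes, so this model yields once,
# where yielding on every step would have yielded eight times.
print(count_fpu_yields([512] * 8))
```

In this model, short steps no longer each pay the cost of a yield, while
long requests still give up the FPU at a bounded interval.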

Changed since v1:
  - Rebase on top of latest cryptodev with the AVX-512VL accelerated
    ChaCha20 from Martin Willi.

Eric Biggers (6):
  crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305
  crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305
  crypto: x86/chacha20 - add XChaCha20 support
  crypto: x86/chacha20 - refactor to allow varying number of rounds
  crypto: x86/chacha - add XChaCha12 support
  crypto: x86/chacha - yield the FPU occasionally

 arch/x86/crypto/Makefile                      |  15 +-
 ...a20-avx2-x86_64.S => chacha-avx2-x86_64.S} |  33 +-
 ...12vl-x86_64.S => chacha-avx512vl-x86_64.S} |  35 +--
 ...0-ssse3-x86_64.S => chacha-ssse3-x86_64.S} | 104 +++---
 arch/x86/crypto/chacha20_glue.c               | 208 ------------
 arch/x86/crypto/chacha_glue.c                 | 297 ++++++++++++++++++
 arch/x86/crypto/nh-avx2-x86_64.S              | 157 +++++++++
 arch/x86/crypto/nh-sse2-x86_64.S              | 123 ++++++++
 arch/x86/crypto/nhpoly1305-avx2-glue.c        |  77 +++++
 arch/x86/crypto/nhpoly1305-sse2-glue.c        |  76 +++++
 crypto/Kconfig                                |  28 +-
 11 files changed, 861 insertions(+), 292 deletions(-)
 rename arch/x86/crypto/{chacha20-avx2-x86_64.S => chacha-avx2-x86_64.S} (97%)
 rename arch/x86/crypto/{chacha20-avx512vl-x86_64.S => chacha-avx512vl-x86_64.S} (97%)
 rename arch/x86/crypto/{chacha20-ssse3-x86_64.S => chacha-ssse3-x86_64.S} (92%)
 delete mode 100644 arch/x86/crypto/chacha20_glue.c
 create mode 100644 arch/x86/crypto/chacha_glue.c
 create mode 100644 arch/x86/crypto/nh-avx2-x86_64.S
 create mode 100644 arch/x86/crypto/nh-sse2-x86_64.S
 create mode 100644 arch/x86/crypto/nhpoly1305-avx2-glue.c
 create mode 100644 arch/x86/crypto/nhpoly1305-sse2-glue.c

-- 
2.19.2

Thread overview: 8+ messages
2018-12-05  6:19 [PATCH v3 0/6] crypto: x86_64 optimized XChaCha and NHPoly1305 (for Adiantum) Eric Biggers
2018-12-05  6:20 ` [PATCH v3 1/6] crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305 Eric Biggers
2018-12-05  6:20 ` [PATCH v3 2/6] crypto: x86/nhpoly1305 - add AVX2 " Eric Biggers
2018-12-05  6:20 ` [PATCH v3 3/6] crypto: x86/chacha20 - add XChaCha20 support Eric Biggers
2018-12-05  6:20 ` [PATCH v3 4/6] crypto: x86/chacha20 - refactor to allow varying number of rounds Eric Biggers
2018-12-05  6:20 ` [PATCH v3 5/6] crypto: x86/chacha - add XChaCha12 support Eric Biggers
2018-12-05  6:20 ` [PATCH v3 6/6] crypto: x86/chacha - yield the FPU occasionally Eric Biggers
2018-12-13 10:32 ` [PATCH v3 0/6] crypto: x86_64 optimized XChaCha and NHPoly1305 (for Adiantum) Herbert Xu
