From: "Dey, Megha" <megha.dey@intel.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
"David S. Miller" <davem@davemloft.net>,
Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ravi.v.shankar@intel.com, tim.c.chen@intel.com,
andi.kleen@intel.com, dave.hansen@intel.com,
wajdi.k.feghali@intel.com, greg.b.tucker@intel.com,
robert.a.kasten@intel.com, rajendrakumar.chinnaiyan@intel.com,
tomasz.kantecki@intel.com, ryan.d.saffores@intel.com,
ilya.albrekht@intel.com, kyung.min.park@intel.com,
Tony Luck <tony.luck@intel.com>,
ira.weiny@intel.com
Subject: Re: [RFC V1 3/7] crypto: ghash - Optimized GHASH computations
Date: Fri, 15 Jan 2021 16:14:40 -0800
Message-ID: <dfb5f2e0-027d-2b9c-aec7-313ff0275381@intel.com>
In-Reply-To: <CAMj1kXGhGopfg19at5N_9q89-UA4irSgMULyDXg+dKhnbRrCZQ@mail.gmail.com>

Hi Ard,

On 12/19/2020 9:03 AM, Ard Biesheuvel wrote:
> On Fri, 18 Dec 2020 at 22:07, Megha Dey <megha.dey@intel.com> wrote:
>> From: Kyung Min Park <kyung.min.park@intel.com>
>>
>> Optimize GHASH computations with the 512-bit wide VPCLMULQDQ instructions.
>> The new instruction can process 4 x 16-byte blocks at a time.
>> For best parallelism and deeper out-of-order execution, the main loop of
>> the code works on 16 x 16-byte blocks at a time and performs a reduction
>> every 48 x 16-byte blocks. This approach needs 48 precomputed GHASH subkeys,
>> and the precompute operation has also been optimized to leverage 512-bit
>> registers, parallel carry-less multiplication, and reduction.
>>
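To illustrate why the reduction can be deferred, here is a minimal Python sketch. It is not the patch's assembly and is not bit-compatible with GCM's reflected bit ordering. It assumes plain polynomial arithmetic over GF(2) with the GHASH polynomial x^128 + x^7 + x^2 + x + 1. All function names and the `group` parameter are hypothetical. It compares per-block Horner evaluation against accumulating unreduced carry-less products with precomputed key powers and reducing once per group of 48 blocks.

```python
POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1 (plain, non-reflected form)


def clmul(a, b):
    """Carry-less multiply of two polynomials over GF(2); no reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_poly(p):
    """Reduce p modulo POLY so the result fits in 128 bits."""
    while p.bit_length() > 128:
        p ^= POLY << (p.bit_length() - 129)
    return p


def gf_mul(a, b):
    """Multiply in GF(2^128): carry-less multiply, then reduce."""
    return reduce_poly(clmul(a, b))


def ghash_horner(h, blocks):
    """Reference: one multiply and one reduction per block."""
    y = 0
    for x in blocks:
        y = gf_mul(y ^ x, h)
    return y


def key_powers(h, n):
    """Precompute [H^1, H^2, ..., H^n]."""
    powers = [h]
    for _ in range(n - 1):
        powers.append(gf_mul(powers[-1], h))
    return powers


def ghash_aggregated(h, blocks, group=48):
    """Accumulate unreduced products; reduce once per group of blocks."""
    powers = key_powers(h, group)
    y = 0
    for start in range(0, len(blocks), group):
        chunk = blocks[start:start + group]
        n = len(chunk)
        acc = 0
        for i, x in enumerate(chunk):
            if i == 0:
                x ^= y  # fold the running hash into the first block
            acc ^= clmul(x, powers[n - 1 - i])  # block i is paired with H^(n-i)
        y = reduce_poly(acc)  # single reduction for the whole group
    return y
```

For any list of 128-bit integers, `ghash_aggregated(h, blocks)` should return the same value as `ghash_horner(h, blocks)`, because reduction modulo the polynomial is linear with respect to XOR.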
>> The VPCLMULQDQ instruction accelerates the most time-consuming part of
>> GHASH, carry-less multiplication. Together with AVX-512F, VPCLMULQDQ
>> adds an EVEX-encoded 512-bit version of the PCLMULQDQ instruction.
>>
>> The glue code in the ghash_clmulni_intel module overrides the existing
>> PCLMULQDQ version with the VPCLMULQDQ version when the following criteria
>> are met:
>> At compile time:
>> 1. CONFIG_CRYPTO_AVX512 is enabled
>> 2. The toolchain (assembler) supports VPCLMULQDQ instructions
>> At runtime:
>> 1. The VPCLMULQDQ and AVX512VL features are supported on the platform
>>    (currently only Icelake)
>> 2. If compiled as a built-in module, ghash_clmulni_intel.use_avx512 is set at
>>    boot time or /sys/module/ghash_clmulni_intel/parameters/use_avx512 is set
>>    to 1 after boot.
>>    If compiled as a loadable module, the use_avx512 module parameter must be set:
>>    modprobe ghash_clmulni_intel use_avx512=1
>>
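The criteria above can be restated as a small decision function. This is only a sketch in Python of the logic as described in the text, not the glue code itself. The function names and arguments are hypothetical, and it assumes the CPU features appear in the `flags` line of /proc/cpuinfo as `vpclmulqdq` and `avx512vl`.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def use_vpclmulqdq_ghash(config_avx512, assembler_ok, flags, use_avx512):
    """True only if every compile-time and runtime criterion is met."""
    # Compile time: CONFIG_CRYPTO_AVX512 enabled and assembler support present.
    compile_time = config_avx512 and assembler_ok
    # Runtime: both CPU features present and the use_avx512 parameter set.
    runtime = {"vpclmulqdq", "avx512vl"} <= flags and use_avx512
    return bool(compile_time and runtime)
```

On a running system the flag set could be obtained with `cpu_flags(open("/proc/cpuinfo").read())`. If any single criterion fails, the function returns False, which corresponds to staying on the existing PCLMULQDQ version.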
>> With the new implementation, the tcrypt ghash speed test shows about a 4x
>> to 10x speedup for GHASH calculation compared to the original PCLMULQDQ
>> implementation when the bytes-per-update size is 256 bytes or above.
>> Detailed results for a variety of block sizes and update sizes are in
>> the table below. The test was performed on an Icelake-based platform
>> with the CPU frequency held constant.
>>
>> The average performance improvement of the AVX512 version over the current
>> implementation is as follows:
>> For bytes per update >= 1KB, we see an average improvement of 882% (~8.8x).
>> For bytes per update < 1KB, we see an average improvement of 370% (~3.7x).
>>
>> A typical tcrypt run of the GHASH calculation with the PCLMULQDQ and
>> VPCLMULQDQ instructions shows the following results.
>>
>> ---------------------------------------------------------------------------
>> |            |            |         cycles/operation         |            |
>> |            |            |      (the lower the better)      |            |
>> |    byte    |   bytes    |----------------------------------| percentage |
>> |   blocks   | per update |   GHASH test   |   GHASH test    | loss/gain  |
>> |            |            | with PCLMULQDQ | with VPCLMULQDQ |            |
>> |------------|------------|----------------|-----------------|------------|
>> |      16    |      16    |       144      |       233       |    -38.0   |
>> |      64    |      16    |       535      |       709       |    -24.5   |
>> |      64    |      64    |       210      |       146       |     43.8   |
>> |     256    |      16    |      1808      |      1911       |     -5.4   |
>> |     256    |      64    |       865      |       581       |     48.9   |
>> |     256    |     256    |       682      |       170       |    301.0   |
>> |    1024    |      16    |      6746      |      6935       |     -2.7   |
>> |    1024    |     256    |      2829      |       714       |    296.0   |
>> |    1024    |    1024    |      2543      |       341       |    645.0   |
>> |    2048    |      16    |     13219      |     13403       |     -1.3   |
>> |    2048    |     256    |      5435      |      1408       |    286.0   |
>> |    2048    |    1024    |      5218      |       685       |    661.0   |
>> |    2048    |    2048    |      5061      |       565       |    796.0   |
>> |    4096    |      16    |     40793      |     27615       |     47.8   |
>> |    4096    |     256    |     10662      |      2689       |    297.0   |
>> |    4096    |    1024    |     10196      |      1333       |    665.0   |
>> |    4096    |    4096    |     10049      |      1011       |    894.0   |
>> |    8192    |      16    |     51672      |     54599       |     -5.3   |
>> |    8192    |     256    |     21228      |      5284       |    301.0   |
>> |    8192    |    1024    |     20306      |      2556       |    694.0   |
>> |    8192    |    4096    |     20076      |      2044       |    882.0   |
>> |    8192    |    8192    |     20071      |      2017       |    895.0   |
>> ---------------------------------------------------------------------------
>>
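The text does not state how the "percentage loss/gain" column is computed. As an assumption, the figures look consistent, to within about one percentage point, with (PCLMULQDQ cycles / VPCLMULQDQ cycles - 1) * 100. A short Python sketch with a hypothetical helper name shows that reading on a few rows from the table:

```python
def gain_percent(cycles_pclmul, cycles_vpclmul):
    """Assumed formula: relative cycle reduction expressed as a percentage gain."""
    return (cycles_pclmul / cycles_vpclmul - 1.0) * 100.0


# (PCLMULQDQ cycles, VPCLMULQDQ cycles, percentage reported in the table)
rows = [
    (144, 233, -38.0),
    (210, 146, 43.8),
    (682, 170, 301.0),
    (2543, 341, 645.0),
    (20071, 2017, 895.0),
]

# Largest gap between the assumed formula and the reported column.
max_diff = max(abs(gain_percent(old, new) - reported) for old, new, reported in rows)
```

Under this reading, a negative value means the VPCLMULQDQ version took more cycles than the PCLMULQDQ version for that combination of sizes.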
>> This work was inspired by the AES GCM mode optimization published
>> in the Intel Optimized IPSEC Cryptographic library.
>> https://github.com/intel/intel-ipsec-mb/lib/avx512/gcm_vaes_avx512.asm
>>
>> Co-developed-by: Greg Tucker <greg.b.tucker@intel.com>
>> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
>> Co-developed-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
>> Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
>> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
>> Co-developed-by: Megha Dey <megha.dey@intel.com>
>> Signed-off-by: Megha Dey <megha.dey@intel.com>
> Hello Megha,
>
> What is the purpose of this separate GHASH module? GHASH is only used
> in combination with AES-CTR to produce GCM, and this series already
> contains a GCM driver.
>
> Do cores exist that implement PCLMULQDQ but not AES-NI?
>
> If not, I think we should be able to drop this patch (and remove the
> existing PCLMULQDQ GHASH driver as well)

AFAIK, dm-verity (an authenticated but not encrypted file system) is one
use case for authentication only, although I am not sure whether it
specifically uses GHASH or SHA.
Also, I do not know of any cores that implement PCLMULQDQ but not AES-NI.

Megha