From: Ben Greear <greearb@candelatech.com>
To: Ard Biesheuvel <ardb@kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
Eric Biggers <ebiggers@kernel.org>
Subject: Re: [PATCH 0/5] crypto: Implement cmac based on cbc skcipher
Date: Thu, 20 Aug 2020 06:54:58 -0700
Message-ID: <6bd84823-7dc6-e132-2959-e73d6806d2f1@candelatech.com>
In-Reply-To: <CAMj1kXGjPbscU=vzZwoX7gxuELgTYWk+wR3Z7vKk9RwKdhv1TQ@mail.gmail.com>
On 8/20/20 12:56 AM, Ard Biesheuvel wrote:
> On Thu, 20 Aug 2020 at 09:54, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>
>> On Thu, Aug 20, 2020 at 09:48:02AM +0200, Ard Biesheuvel wrote:
>>>
>>>> Or are you saying on Ben's machine cbc-aesni would have worse
>>>> performance vs. aes-generic?
>>>>
>>>
>>> Yes, given the pathological overhead of FPU preserve/restore for every
>>> block of 16 bytes processed by the cbcmac wrapper.
>>
>> I'm sceptical. Do we have numbers showing this? You can get them
>> from tcrypt with my patch:
>>
>> https://patchwork.kernel.org/patch/11701343/
>>
>> Just do
>>
>> modprobe tcrypt mode=400 alg='cmac(aes-aesni)' klen=16
>> modprobe tcrypt mode=400 alg='cmac(aes-generic)' klen=16
>>
>>> cmac() is not really relevant for performance, afaict. Only cbcmac()
>>> is used for bulk data.
>>
>> Sure but it's trivial to extend my cmac patch to support cbcmac.
>>
>
>
> Sure.
>
> Ben, care to have a go at the above on your hardware? It would help us
> get to the bottom of this issue.
Here's a run on an Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz:
testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[ 259.397756] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 244 cycles/operation, 15 cycles/byte
[ 259.397759] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 1052 cycles/operation, 16 cycles/byte
[ 259.397765] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 641 cycles/operation, 10 cycles/byte
[ 259.397768] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 3909 cycles/operation, 15 cycles/byte
[ 259.397786] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 2602 cycles/operation, 10 cycles/byte
[ 259.397797] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 2211 cycles/operation, 8 cycles/byte
[ 259.397807] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 15453 cycles/operation, 15 cycles/byte
[ 259.397872] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 8863 cycles/operation, 8 cycles/byte
[ 259.397910] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 8442 cycles/operation, 8 cycles/byte
[ 259.397946] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 43542 cycles/operation, 21 cycles/byte
[ 259.398110] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 17649 cycles/operation, 8 cycles/byte
[ 259.398184] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 21255 cycles/operation, 10 cycles/byte
[ 259.398267] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 16322 cycles/operation, 7 cycles/byte
[ 259.398335] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 60301 cycles/operation, 14 cycles/byte
[ 259.398585] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 34413 cycles/operation, 8 cycles/byte
[ 259.398728] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 32894 cycles/operation, 8 cycles/byte
[ 259.398865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 32521 cycles/operation, 7 cycles/byte
[ 259.399000] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 120415 cycles/operation, 14 cycles/byte
[ 259.399550] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 68635 cycles/operation, 8 cycles/byte
[ 259.399834] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 83770 cycles/operation, 10 cycles/byte
[ 259.400157] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 65075 cycles/operation, 7 cycles/byte
[ 259.400427] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 65085 cycles/operation, 7 cycles/byte
[ 294.171336]
testing speed of async cmac(aes-generic) (cmac(aes-generic))
[ 294.171340] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 275 cycles/operation, 17 cycles/byte
[ 294.171343] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 1191 cycles/operation, 18 cycles/byte
[ 294.171350] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 738 cycles/operation, 11 cycles/byte
[ 294.171354] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 4386 cycles/operation, 17 cycles/byte
[ 294.171374] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 2915 cycles/operation, 11 cycles/byte
[ 294.171387] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 2464 cycles/operation, 9 cycles/byte
[ 294.171398] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 17558 cycles/operation, 17 cycles/byte
[ 294.171472] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 14022 cycles/operation, 13 cycles/byte
[ 294.171530] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 9022 cycles/operation, 8 cycles/byte
[ 294.171569] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 38107 cycles/operation, 18 cycles/byte
[ 294.171722] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 18083 cycles/operation, 8 cycles/byte
[ 294.171798] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 17260 cycles/operation, 8 cycles/byte
[ 294.171870] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 17415 cycles/operation, 8 cycles/byte
[ 294.171943] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 66005 cycles/operation, 16 cycles/byte
[ 294.172217] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 36035 cycles/operation, 8 cycles/byte
[ 294.172366] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 42812 cycles/operation, 10 cycles/byte
[ 294.172533] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 53415 cycles/operation, 13 cycles/byte
[ 294.172745] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 133326 cycles/operation, 16 cycles/byte
[ 294.173297] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 90271 cycles/operation, 11 cycles/byte
[ 294.173646] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 68703 cycles/operation, 8 cycles/byte
[ 294.173931] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 67951 cycles/operation, 8 cycles/byte
[ 294.174213] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 68370 cycles/operation, 8 cycles/byte
On my slow apu2 board with processor: AMD GX-412TC SOC
testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[ 51.750514] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 600 cycles/operation, 37 cycle
[ 51.750532] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 2063 cycles/operation, 32 cycle
[ 51.750582] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1326 cycles/operation, 20 cycle
[ 51.750619] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 11190 cycles/operation, 43 cycle
[ 51.750775] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 4935 cycles/operation, 19 cycle
[ 51.750840] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 8652 cycles/operation, 33 cycle
[ 51.750948] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 43430 cycles/operation, 42 cycle
[ 51.751488] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 23589 cycles/operation, 23 cycle
[ 51.751810] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 18759 cycles/operation, 18 cycle
[ 51.752027] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 79699 cycles/operation, 38 cycle
[ 51.753035] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 39900 cycles/operation, 19 cycle
[ 51.753559] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 38390 cycles/operation, 18 cycle
[ 51.754057] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 40888 cycles/operation, 19 cycle
[ 51.754615] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 143019 cycles/operation, 34 cycle
[ 51.756369] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 89046 cycles/operation, 21 cycle
[ 51.757527] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 77992 cycles/operation, 19 cycle
[ 51.758526] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 76021 cycles/operation, 18 cycle
[ 51.759442] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 312260 cycles/operation, 38 cycle
[ 51.763195] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 176472 cycles/operation, 21 cycle
[ 51.765255] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 169565 cycles/operation, 20 cycle
[ 51.767321] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 164968 cycles/operation, 20 cycle
[ 51.769256] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 165096 cycles/operation, 20 cycle
testing speed of async cmac(aes-generic) (cmac(aes-generic))
[ 97.835925] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 665 cycles/operation, 41 cycle
[ 97.835945] tcrypt: test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 2430 cycles/operation, 37 cycle
[ 97.836016] tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1656 cycles/operation, 25 cycle
[ 97.836044] tcrypt: test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 9014 cycles/operation, 35 cycle
[ 97.836259] tcrypt: test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 13444 cycles/operation, 52 cycle
[ 97.836399] tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 8960 cycles/operation, 35 cycle
[ 97.836515] tcrypt: test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 51594 cycles/operation, 50 cycle
[ 97.837151] tcrypt: test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 28105 cycles/operation, 27 cycle
[ 97.837497] tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 31365 cycles/operation, 30 cycle
[ 97.837865] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 86111 cycles/operation, 42 cycle
[ 97.838927] tcrypt: test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 60021 cycles/operation, 29 cycle
[ 97.839628] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 56311 cycles/operation, 27 cycle
[ 97.840308] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 50877 cycles/operation, 24 cycle
[ 97.840943] tcrypt: test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 174028 cycles/operation, 42 cycle
[ 97.843205] tcrypt: test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 103243 cycles/operation, 25 cycle
[ 97.844524] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 99960 cycles/operation, 24 cycle
[ 97.845865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 121735 cycles/operation, 29 cycle
[ 97.847355] tcrypt: test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 387559 cycles/operation, 47 cycle
[ 97.851930] tcrypt: test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 223662 cycles/operation, 27 cycle
[ 97.854617] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 226131 cycles/operation, 27 cycle
[ 97.857385] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 203840 cycles/operation, 24 cycle
[ 97.859888] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 220232 cycles/operation, 26 cycle
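For anyone who wants to line the two runs up side by side, here is a rough sketch of a parser for the tcrypt lines above. It is not part of tcrypt or of Herbert's patch; the names `parse_tcrypt` and `compare` are made up for illustration, and the regex only assumes the line layout shown in these logs (it stops at "cycles/operation", so the truncated apu2 lines parse too).

```python
import re

# Matches tcrypt hash-speed lines in the layout pasted above, e.g.
# "tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 244 cycles/operation, ..."
LINE_RE = re.compile(
    r"test\s+(\d+)\s+\(\s*(\d+) byte blocks,\s*(\d+) bytes per update,"
    r"\s*(\d+) updates\):\s*(\d+) cycles/operation"
)


def parse_tcrypt(log):
    """Return {test number: (block bytes, update bytes, cycles per operation)}."""
    results = {}
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            test, blk, upd, _updates, cycles = map(int, m.groups())
            results[test] = (blk, upd, cycles)
    return results


def compare(aesni_log, generic_log):
    """Yield (test, block bytes, update bytes, generic/aesni cycle ratio).

    A ratio above 1.0 means the aesni run needed fewer cycles for that test.
    """
    aesni = parse_tcrypt(aesni_log)
    generic = parse_tcrypt(generic_log)
    for test in sorted(aesni.keys() & generic.keys()):
        blk, upd, a_cycles = aesni[test]
        _, _, g_cycles = generic[test]
        yield test, blk, upd, g_cycles / a_cycles


if __name__ == "__main__":
    # Two lines from each i7-7700T run above, as a smoke test.
    aesni = (
        "[ 259.397756] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 244 cycles/operation, 15 cycles/byte\n"
        "[ 259.397946] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 43542 cycles/operation, 21 cycles/byte\n"
    )
    generic = (
        "[ 294.171340] tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 275 cycles/operation, 17 cycles/byte\n"
        "[ 294.171569] tcrypt: test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 38107 cycles/operation, 18 cycles/byte\n"
    )
    for test, blk, upd, ratio in compare(aesni, generic):
        print(f"test {test:2d} blk={blk:5d} upd={upd:5d} generic/aesni={ratio:.2f}")
```

On the two i7 sample lines this gives a ratio of about 1.13 for test 0 and about 0.88 for test 9, i.e. generic came out ahead on that one test. These are single runs, so individual outliers should be taken with a grain of salt.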
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com