Linux cryptographic layer development
 help / color / mirror / Atom feed
From: Ben Greear <greearb@candelatech.com>
To: Ard Biesheuvel <ardb@kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>
Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Eric Biggers <ebiggers@kernel.org>
Subject: Re: [PATCH 0/5] crypto: Implement cmac based on cbc skcipher
Date: Thu, 20 Aug 2020 06:54:58 -0700	[thread overview]
Message-ID: <6bd84823-7dc6-e132-2959-e73d6806d2f1@candelatech.com> (raw)
In-Reply-To: <CAMj1kXGjPbscU=vzZwoX7gxuELgTYWk+wR3Z7vKk9RwKdhv1TQ@mail.gmail.com>

On 8/20/20 12:56 AM, Ard Biesheuvel wrote:
> On Thu, 20 Aug 2020 at 09:54, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>
>> On Thu, Aug 20, 2020 at 09:48:02AM +0200, Ard Biesheuvel wrote:
>>>
>>>> Or are you saying on Ben's machine cbc-aesni would have worse
>>>> performance vs. aes-generic?
>>>>
>>>
>>> Yes, given the pathological overhead of FPU preserve/restore for every
>>> block of 16 bytes processed by the cbcmac wrapper.
>>
>> I'm sceptical.  Do we have numbers showing this? You can get them
>> from tcrypt with my patch:
>>
>>          https://patchwork.kernel.org/patch/11701343/
>>
>> Just do
>>
>>          modprobe tcrypt mode=400 alg='cmac(aes-aesni)' klen=16
>>          modprobe tcrypt mode=400 alg='cmac(aes-generic)' klen=16
>>
>>> cmac() is not really relevant for performance, afaict. Only cbcmac()
>>> is used for bulk data.
>>
>> Sure but it's trivial to extend my cmac patch to support cbcmac.
>>
> 
> 
> Sure.
> 
> Ben, care to have a go at the above on your hardware? It would help us
> get to the bottom of this issue.

Here's a run on an:  Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz

                testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[  259.397756] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    244 cycles/operation,   15 cycles/byte
[  259.397759] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   1052 cycles/operation,   16 cycles/byte
[  259.397765] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):    641 cycles/operation,   10 cycles/byte
[  259.397768] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   3909 cycles/operation,   15 cycles/byte
[  259.397786] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   2602 cycles/operation,   10 cycles/byte
[  259.397797] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   2211 cycles/operation,    8 cycles/byte
[  259.397807] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  15453 cycles/operation,   15 cycles/byte
[  259.397872] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):   8863 cycles/operation,    8 cycles/byte
[  259.397910] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   8442 cycles/operation,    8 cycles/byte
[  259.397946] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  43542 cycles/operation,   21 cycles/byte
[  259.398110] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  17649 cycles/operation,    8 cycles/byte
[  259.398184] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  21255 cycles/operation,   10 cycles/byte
[  259.398267] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  16322 cycles/operation,    7 cycles/byte
[  259.398335] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  60301 cycles/operation,   14 cycles/byte
[  259.398585] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  34413 cycles/operation,    8 cycles/byte
[  259.398728] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  32894 cycles/operation,    8 cycles/byte
[  259.398865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  32521 cycles/operation,    7 cycles/byte
[  259.399000] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 120415 cycles/operation,   14 cycles/byte
[  259.399550] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  68635 cycles/operation,    8 cycles/byte
[  259.399834] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  83770 cycles/operation,   10 cycles/byte
[  259.400157] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  65075 cycles/operation,    7 cycles/byte
[  259.400427] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):  65085 cycles/operation,    7 cycles/byte
[  294.171336]
                testing speed of async cmac(aes-generic) (cmac(aes-generic))
[  294.171340] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    275 cycles/operation,   17 cycles/byte
[  294.171343] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   1191 cycles/operation,   18 cycles/byte
[  294.171350] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):    738 cycles/operation,   11 cycles/byte
[  294.171354] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   4386 cycles/operation,   17 cycles/byte
[  294.171374] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   2915 cycles/operation,   11 cycles/byte
[  294.171387] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   2464 cycles/operation,    9 cycles/byte
[  294.171398] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  17558 cycles/operation,   17 cycles/byte
[  294.171472] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  14022 cycles/operation,   13 cycles/byte
[  294.171530] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):   9022 cycles/operation,    8 cycles/byte
[  294.171569] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  38107 cycles/operation,   18 cycles/byte
[  294.171722] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  18083 cycles/operation,    8 cycles/byte
[  294.171798] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  17260 cycles/operation,    8 cycles/byte
[  294.171870] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  17415 cycles/operation,    8 cycles/byte
[  294.171943] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  66005 cycles/operation,   16 cycles/byte
[  294.172217] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  36035 cycles/operation,    8 cycles/byte
[  294.172366] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  42812 cycles/operation,   10 cycles/byte
[  294.172533] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  53415 cycles/operation,   13 cycles/byte
[  294.172745] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 133326 cycles/operation,   16 cycles/byte
[  294.173297] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  90271 cycles/operation,   11 cycles/byte
[  294.173646] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  68703 cycles/operation,    8 cycles/byte
[  294.173931] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  67951 cycles/operation,    8 cycles/byte
[  294.174213] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates):  68370 cycles/operation,    8 cycles/byte


On my slow apu2 board with processor: AMD GX-412TC SOC

               testing speed of async cmac(aes-aesni) (cmac(aes-aesni))
[   51.750514] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    600 cycles/operation,   37 cycle
[   51.750532] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   2063 cycles/operation,   32 cycle
[   51.750582] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   1326 cycles/operation,   20 cycle
[   51.750619] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):  11190 cycles/operation,   43 cycle
[   51.750775] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):   4935 cycles/operation,   19 cycle
[   51.750840] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8652 cycles/operation,   33 cycle
[   51.750948] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  43430 cycles/operation,   42 cycle
[   51.751488] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  23589 cycles/operation,   23 cycle
[   51.751810] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  18759 cycles/operation,   18 cycle
[   51.752027] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  79699 cycles/operation,   38 cycle
[   51.753035] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  39900 cycles/operation,   19 cycle
[   51.753559] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  38390 cycles/operation,   18 cycle
[   51.754057] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  40888 cycles/operation,   19 cycle
[   51.754615] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 143019 cycles/operation,   34 cycle
[   51.756369] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates):  89046 cycles/operation,   21 cycle
[   51.757527] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  77992 cycles/operation,   19 cycle
[   51.758526] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates):  76021 cycles/operation,   18 cycle
[   51.759442] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 312260 cycles/operation,   38 cycle
[   51.763195] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates): 176472 cycles/operation,   21 cycle
[   51.765255] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 169565 cycles/operation,   20 cycle
[   51.767321] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 164968 cycles/operation,   20 cycle
[   51.769256] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 165096 cycles/operation,   20 cycle

               testing speed of async cmac(aes-generic) (cmac(aes-generic))
[   97.835925] tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):    665 cycles/operation,   41 cycle
[   97.835945] tcrypt: test  1 (   64 byte blocks,   16 bytes per update,   4 updates):   2430 cycles/operation,   37 cycle
[   97.836016] tcrypt: test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   1656 cycles/operation,   25 cycle
[   97.836044] tcrypt: test  3 (  256 byte blocks,   16 bytes per update,  16 updates):   9014 cycles/operation,   35 cycle
[   97.836259] tcrypt: test  4 (  256 byte blocks,   64 bytes per update,   4 updates):  13444 cycles/operation,   52 cycle
[   97.836399] tcrypt: test  5 (  256 byte blocks,  256 bytes per update,   1 updates):   8960 cycles/operation,   35 cycle
[   97.836515] tcrypt: test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates):  51594 cycles/operation,   50 cycle
[   97.837151] tcrypt: test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates):  28105 cycles/operation,   27 cycle
[   97.837497] tcrypt: test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  31365 cycles/operation,   30 cycle
[   97.837865] tcrypt: test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates):  86111 cycles/operation,   42 cycle
[   97.838927] tcrypt: test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates):  60021 cycles/operation,   29 cycle
[   97.839628] tcrypt: test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates):  56311 cycles/operation,   27 cycle
[   97.840308] tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  50877 cycles/operation,   24 cycle
[   97.840943] tcrypt: test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 174028 cycles/operation,   42 cycle
[   97.843205] tcrypt: test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 103243 cycles/operation,   25 cycle
[   97.844524] tcrypt: test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates):  99960 cycles/operation,   24 cycle
[   97.845865] tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 121735 cycles/operation,   29 cycle
[   97.847355] tcrypt: test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates): 387559 cycles/operation,   47 cycle
[   97.851930] tcrypt: test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates): 223662 cycles/operation,   27 cycle
[   97.854617] tcrypt: test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 226131 cycles/operation,   27 cycle
[   97.857385] tcrypt: test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 203840 cycles/operation,   24 cycle
[   97.859888] tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 220232 cycles/operation,   26 cycle

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

  reply	other threads:[~2020-08-20 14:10 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-02  9:06 [PATCH] crypto: x86/aesni - implement accelerated CBCMAC, CMAC and XCBC shashes Ard Biesheuvel
2020-08-03 19:11 ` Ben Greear
2020-08-04 12:55   ` Ard Biesheuvel
2020-08-04 13:01     ` Ben Greear
2020-08-04 13:08       ` Ard Biesheuvel
2020-08-04 13:22         ` Ben Greear
2020-08-04 19:45         ` Ben Greear
2020-08-04 20:12           ` Ard Biesheuvel
2020-09-23 11:03           ` Ben Greear
2020-10-29 16:58             ` Ard Biesheuvel
2020-08-18  8:24 ` [PATCH 0/5] crypto: Implement cmac based on cbc skcipher Herbert Xu
2020-08-18  8:25   ` [PATCH 1/6] crypto: skcipher - Add helpers for sync skcipher spawn Herbert Xu
2020-08-18  8:25   ` [PATCH 2/6] crypto: ahash - Add helper to free single spawn instance Herbert Xu
2020-08-18  8:25   ` [PATCH 3/6] crypto: ahash - Add init_tfm/exit_tfm Herbert Xu
2020-08-18  8:25   ` [PATCH 4/6] crypto: ahash - Add ahash_alg_instance Herbert Xu
2020-08-18  8:25   ` [PATCH 5/6] crypto: ahash - Remove AHASH_REQUEST_ON_STACK Herbert Xu
2020-08-26 10:55     ` Ard Biesheuvel
2020-08-18  8:25   ` [PATCH 6/6] crypto: cmac - Use cbc skcipher instead of raw cipher Herbert Xu
2020-08-24  9:47     ` Ard Biesheuvel
2020-08-24 11:20       ` Herbert Xu
2020-08-18  8:31   ` [PATCH 0/5] crypto: Implement cmac based on cbc skcipher Ard Biesheuvel
2020-08-18 13:51     ` Herbert Xu
2020-08-18 13:56       ` Ben Greear
2020-08-18 14:05         ` Herbert Xu
2020-08-18 14:17           ` Ben Greear
2020-08-18 22:15             ` Herbert Xu
2020-08-18 22:27               ` Herbert Xu
2020-08-18 22:31                 ` Ben Greear
2020-08-18 22:33                   ` Herbert Xu
2020-08-18 22:39                     ` Ben Greear
2020-08-20  6:58                       ` Ard Biesheuvel
2020-08-20  7:01                         ` Herbert Xu
2020-08-20  7:04                           ` Ard Biesheuvel
2020-08-20  7:06                             ` Herbert Xu
2020-08-20  7:19                               ` Ard Biesheuvel
2020-08-20  7:29                                 ` Herbert Xu
2020-08-20  7:33                                   ` Ard Biesheuvel
2020-08-20  7:44                                     ` Herbert Xu
2020-08-20  7:48                                       ` Ard Biesheuvel
2020-08-20  7:53                                         ` Herbert Xu
2020-08-20  7:56                                           ` Ard Biesheuvel
2020-08-20 13:54                                             ` Ben Greear [this message]
2020-08-20 20:10                                               ` Herbert Xu
2020-08-20 22:09                                                 ` Ben Greear
2020-08-20 22:12                                                   ` Herbert Xu
2020-08-22 22:35                 ` Christian Lamparter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6bd84823-7dc6-e132-2959-e73d6806d2f1@candelatech.com \
    --to=greearb@candelatech.com \
    --cc=ardb@kernel.org \
    --cc=ebiggers@kernel.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox