From: Jussi Kivilinna <jussi.kivilinna@iki.fi>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
linux-crypto@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org, Andy Polyakov <appro@openssl.org>
Subject: Re: [PATCH 4/4] ARM: add support for bit sliced AES using NEON instructions
Date: Sun, 22 Sep 2013 14:12:07 +0300
Message-ID: <523ED087.7050006@iki.fi>
In-Reply-To: <1379702811-8025-5-git-send-email-ard.biesheuvel@linaro.org>
On 20.09.2013 21:46, Ard Biesheuvel wrote:
> This implementation of the AES algorithm gives around 45% speedup on Cortex-A15
> for CTR mode and for XTS in encryption mode. Both CBC and XTS in decryption mode
> are slightly faster (5 - 10% on Cortex-A15). [As CBC in encryption mode can only
> be performed sequentially, there is no speedup in this case.]
>
> Unlike the core AES cipher (on which this module also depends), this algorithm
> uses bit slicing to process up to 8 blocks in parallel in constant time. This
> algorithm does not rely on any lookup tables so it is believed to be
> invulnerable to cache timing attacks.
>
> The core code has been adopted from the OpenSSL project (in collaboration
> with the original author, on cc). For ease of maintenance, this version is
> identical to the upstream OpenSSL code, i.e., all modifications that were
> required to make it suitable for inclusion into the kernel have already been
> merged upstream.
>
> Cc: Andy Polyakov <appro@openssl.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
[..snip..]
> + bcc .Ldec_done
> + @ multiplication by 0x0e
Decryption can probably be made faster by implementing InvMixColumns slightly
differently. Instead of implementing the inverse MixColumns matrix directly, use
a preprocessing step followed by MixColumns, as described in section "4.1.3
Decryption" of "The Design of Rijndael: AES - The Advanced Encryption Standard"
(J. Daemen, V. Rijmen / 2002).
In short, the MixColumns and InvMixColumns matrices have the following relation:
| 0e 0b 0d 09 |   | 02 03 01 01 |   | 05 00 04 00 |
| 09 0e 0b 0d | = | 01 02 03 01 | x | 00 05 00 04 |
| 0d 09 0e 0b |   | 01 01 02 03 |   | 04 00 05 00 |
| 0b 0d 09 0e |   | 03 01 01 02 |   | 00 04 00 05 |
A bit-sliced implementation of the 05-00-04-00 matrix is much shorter than one
of the 0e-0b-0d-09 matrix, so even when combined with MixColumns, the total
instruction count for InvMixColumns implemented this way should be nearly half
of the current count.
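For anyone who wants to double-check the matrix relation above, here is a
minimal sketch in Python that multiplies the two right-hand matrices over
GF(2^8) with the AES reduction polynomial 0x11b. The names (gf_mul, circulant,
mat_mul, MIX, PRE, INV_MIX) are made up for this illustration and do not
appear in the patch or in the OpenSSL code.

```python
# Illustrative check only: verifies InvMixColumns = MixColumns x preprocessing
# matrix over GF(2^8). All names here are hypothetical helpers.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= 0x11b       # reduce modulo the AES polynomial
        b >>= 1
    return result

def circulant(first_row):
    """Build a 4x4 matrix where each row is the previous row rotated right."""
    return [first_row[-i:] + first_row[:-i] for i in range(4)]

def mat_mul(a, b):
    """Multiply two 4x4 matrices with GF(2^8) arithmetic (addition is XOR)."""
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = 0
            for k in range(4):
                acc ^= gf_mul(a[i][k], b[k][j])
            out[i][j] = acc
    return out

MIX = circulant([0x02, 0x03, 0x01, 0x01])      # MixColumns
PRE = circulant([0x05, 0x00, 0x04, 0x00])      # preprocessing step
INV_MIX = circulant([0x0e, 0x0b, 0x0d, 0x09])  # InvMixColumns

assert mat_mul(MIX, PRE) == INV_MIX
```

Since 05*a = a ^ 04*a, each output byte of the preprocessing step is
a[i] ^ 04*(a[i] ^ a[i+2]), which is why its bit-sliced form needs so few
operations.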
See [1] for an implementation of this using the AVX instruction set.
-Jussi
[1] https://github.com/jkivilin/supercop-blockciphers/blob/beyond_master/crypto_stream/aes128ctr/avx/aes_asm_bitslice_avx.S#L234