From: Eric Biggers <ebiggers@kernel.org>
To: Jerry Shih <jerry.shih@sifive.com>
Cc: linux-riscv@lists.infradead.org,
"Palmer Dabbelt" <palmer@dabbelt.com>,
linux-crypto@vger.kernel.org,
"Christoph Müllner" <christoph.muellner@vrull.eu>,
"Heiko Stuebner" <heiko@sntech.de>,
"Phoebe Chen" <phoebe.chen@sifive.com>,
"Andy Chiu" <andy.chiu@sifive.com>
Subject: Re: [PATCH riscv/for-next] crypto: riscv - parallelize AES-CBC decryption
Date: Sat, 10 Feb 2024 10:12:40 -0800
Message-ID: <20240210181240.GA1098@sol.localdomain>
In-Reply-To: <04703246-6EF6-4B54-B8F1-96EDEC2FBA6B@sifive.com>
On Sat, Feb 10, 2024 at 11:25:27PM +0800, Jerry Shih wrote:
> > .macro aes_cbc_decrypt keylen
> > + srli LEN, LEN, 2 // Convert LEN from bytes to words
> > vle32.v v16, (IVP) // Load IV
> > 1:
> > - vle32.v v17, (INP) // Load ciphertext block
> > - vmv.v.v v18, v17 // Save ciphertext block
> > - aes_decrypt v17, \keylen // Decrypt
> > - vxor.vv v17, v17, v16 // XOR with IV or prev ciphertext block
> > - vse32.v v17, (OUTP) // Store plaintext block
> > - vmv.v.v v16, v18 // Next "IV" is prev ciphertext block
> > - addi INP, INP, 16
> > - addi OUTP, OUTP, 16
> > - addi LEN, LEN, -16
> > + vsetvli t0, LEN, e32, m4, ta, ma
> > + vle32.v v20, (INP) // Load ciphertext blocks
> > + vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks
> > + addi t1, t0, -4
> > + vslidedown.vx v24, v20, t1 // Save last ciphertext block
>
> Do we need to setup the `e32, len=t0` for next IV?
> I think we only need 128bit IV (with VL=4).
>
> > + aes_decrypt v20, \keylen // Decrypt the blocks
> > + vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks
> > + vse32.v v20, (OUTP) // Store plaintext blocks
> > + vmv.v.v v16, v24 // Next "IV" is last ciphertext block
>
> Same VL issue here.
It's true that the vslidedown.vx and vmv.v.v only need vl=4. But it also works
fine with vl unchanged. It just results in some extra data being moved in the
registers. My hypothesis is that this is going to be faster than having the
three extra instructions per loop iteration to change the vl to 4 twice.
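The claim that the wider vl is harmless can be checked with a toy model. The sketch below is plain Python, not kernel code: it assumes VLEN = 128 (so an e32/m4 register group holds 16 words), ignores masking and vstart, and the helper names (vslidedown, vmv_v_v, next_iv) are made up for illustration. It only shows that the low four words of v16, which are all that the next iteration's vslideup.vi and the final IV store consume, are the same whether vl is 4 or the full t0.

```python
# Toy model of one e32/m4 register group as a flat list of 32-bit words.
# Assumption: VLEN = 128, so VLMAX = 16 words (four AES blocks).
VLMAX = 16

def vslidedown(vd, vs2, offset, vl):
    """Model vslidedown.vx: vd[i] = vs2[i + offset] for i < vl, or 0 when
    the source index is >= VLMAX. Tail elements (i >= vl) are left
    undisturbed, which is one behaviour the tail-agnostic policy allows."""
    out = list(vd)
    for i in range(vl):
        src = i + offset
        out[i] = vs2[src] if src < VLMAX else 0
    return out

def vmv_v_v(vd, vs1, vl):
    """Model vmv.v.v: copy the first vl elements of vs1 into vd."""
    out = list(vd)
    out[:vl] = vs1[:vl]
    return out

def next_iv(ciphertext_words, narrow_vl):
    """Return the low four words of v16 after the slide-down and the move.
    narrow_vl=True models the variant that drops vl to 4 first."""
    t0 = len(ciphertext_words)                         # vl granted by vsetvli
    v20 = ciphertext_words + [0xDEAD] * (VLMAX - t0)   # junk beyond vl
    v24 = [0xAAAA] * VLMAX                             # stale contents
    v16 = [0xBBBB] * VLMAX                             # stale contents
    vl = 4 if narrow_vl else t0
    v24 = vslidedown(v24, v20, t0 - 4, vl)             # save last block
    v16 = vmv_v_v(v16, v24, vl)                        # next "IV"
    return v16[:4]

for t0 in (4, 8, 12, 16):
    words = list(range(100, 100 + t0))
    assert next_iv(words, True) == next_iv(words, False) == words[-4:]
```

With the full vl the extra elements of v24 and v16 pick up junk or zeros, but nothing downstream reads them. The model says nothing about which variant is faster.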
I still have no real hardware to test on, so I have no quantitative data. All I
can do is go with my instinct, which is that the shorter version will be better.
If you have access to a real CPU that supports the RISC-V vector crypto
extensions, I'd be interested in the performance you get from each variant.
(Of course, different RISC-V CPU implementations may have quite different
performance characteristics, so that still won't be definitive.)
Here is the alternative variant given as a diff from this patch:
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.S b/arch/riscv/crypto/aes-riscv64-zvkned.S
index 43541aad6386c..ef380771f606a 100644
--- a/arch/riscv/crypto/aes-riscv64-zvkned.S
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.S
@@ -146,10 +146,13 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned)
vle32.v v20, (INP) // Load ciphertext blocks
vslideup.vi v16, v20, 4 // Setup prev ciphertext blocks
addi t1, t0, -4
+ vsetivli zero, 4, e32, m4, ta, ma
vslidedown.vx v24, v20, t1 // Save last ciphertext block
+ vsetvli t0, LEN, e32, m4, ta, ma
aes_decrypt v20, \keylen // Decrypt the blocks
vxor.vv v20, v20, v16 // XOR with prev ciphertext blocks
vse32.v v20, (OUTP) // Store plaintext blocks
+ vsetivli zero, 4, e32, m4, ta, ma
vmv.v.v v16, v24 // Next "IV" is last ciphertext block
slli t1, t0, 2 // Words to bytes
add INP, INP, t1
@@ -157,7 +160,6 @@ SYM_FUNC_END(aes_ecb_decrypt_zvkned)
sub LEN, LEN, t0
bnez LEN, 1b
- vsetivli zero, 4, e32, m1, ta, ma
vse32.v v16, (IVP) // Store next IV
ret
.endm
A third variant would be to just replace vmv.v.v with vmv1r.v.
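As a rough illustration of why that would also work: vmv1r.v copies one whole vector register regardless of vl, and one register is at least 128 bits here, so it always carries the saved block. The sketch below is a plain-Python model under that assumption; the helper names are hypothetical, and it does not model the instruction's cost.

```python
def vmv1r(group_vd, group_vs2, vlen_bits):
    """Model vmv1r.v: copy the first register of the source group
    (VLEN bits) into the destination, ignoring vl entirely."""
    words_per_reg = vlen_bits // 32
    out = list(group_vd)
    out[:words_per_reg] = group_vs2[:words_per_reg]
    return out

def after_vmv1r(vlen_bits):
    """Return (low four words of v16, words of v16 beyond register 0)
    after modelling 'vmv1r.v v16, v24' on m4 register groups."""
    words_per_reg = vlen_bits // 32
    words_per_group = 4 * words_per_reg
    v24 = list(range(1000, 1000 + words_per_group))  # last block in words 0..3
    v16 = [0] * words_per_group
    v16 = vmv1r(v16, v24, vlen_bits)
    return v16[:4], v16[words_per_reg:]

for vlen in (128, 256, 512):
    low, rest = after_vmv1r(vlen)
    assert low == [1000, 1001, 1002, 1003]   # saved block always arrives
    assert all(w == 0 for w in rest)         # registers 1..3 untouched
```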
In general, this level of micro-optimization probably needs to wait until
there are a variety of CPUs to test on. We know that parallelizing the
algorithms is helpful, so we should do that, as this patch does. But the
effects of small variations in the instruction sequences are currently unclear.
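For reference, the reason CBC decryption parallelizes at all is that each plaintext block depends only on two ciphertext blocks: P[i] = D(C[i]) xor C[i-1], with C[-1] being the IV. The sketch below checks that a batched formulation matches the block-at-a-time loop. It is plain Python with a toy stand-in for the block cipher (toy_decrypt is hypothetical and is not AES); the "shifted by one block" list plays the role of the vslideup.vi step in the patch.

```python
def toy_decrypt(block, key):
    """Hypothetical stand-in for aes_decrypt: any per-block function works
    for this check. This is NOT AES."""
    return [((w ^ k) * 3 + 1) & 0xFFFFFFFF for w, k in zip(block, key)]

def xor_blocks(a, b):
    return [x ^ y for x, y in zip(a, b)]

def cbc_decrypt_serial(blocks, iv, key):
    """One block per iteration, as in the original loop."""
    out, prev = [], iv
    for c in blocks:
        out.append(xor_blocks(toy_decrypt(c, key), prev))
        prev = c
    return out, prev                      # plaintext, next IV

def cbc_decrypt_batched(blocks, iv, key):
    """All blocks of a batch at once: every D(C[i]) is independent."""
    prev_blocks = [iv] + blocks[:-1]      # ciphertext shifted by one block
    decrypted = [toy_decrypt(c, key) for c in blocks]   # parallelizable
    out = [xor_blocks(d, p) for d, p in zip(decrypted, prev_blocks)]
    return out, blocks[-1]                # next IV is the last ciphertext block

key = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
iv = [1, 2, 3, 4]
blocks = [[i, i + 1, i + 2, i + 3] for i in range(10, 50, 10)]
assert cbc_decrypt_serial(blocks, iv, key) == cbc_decrypt_batched(blocks, iv, key)
```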
- Eric