From: Eric Biggers <ebiggers@kernel.org>
To: Lukasz Stelmach <l.stelmach@samsung.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
"David S. Miller" <davem@davemloft.net>,
linux-crypto@vger.kernel.org
Subject: Re: xor_blocks() assumptions
Date: Tue, 3 Jan 2023 23:46:44 -0800
Message-ID: <Y7Uu5GkxfrejPJXL@sol.localdomain>
In-Reply-To: <dleftjbknfoopx.fsf%l.stelmach@samsung.com>

On Tue, Jan 03, 2023 at 12:13:30PM +0100, Lukasz Stelmach wrote:
> > It also would be worth considering just optimizing crypto_xor() by
> > unrolling the word-at-a-time loop to 4x or so.
>
> If I understand correctly the generic 8regs and 32regs implementations
> in include/asm-generic/xor.h are what you mean. Using xor_blocks() in
> crypto_xor() could enable them for free on architectures lacking SIMD or
> vector instructions.

I actually meant exactly what I said -- unrolling the word-at-a-time loop in
crypto_xor(). Not using xor_blocks(). Something like this:

diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index 61b327206b557..c0b90f14cae18 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -167,7 +167,18 @@ static inline void crypto_xor(u8 *dst, const u8 *src, unsigned int size)
 		unsigned long *s = (unsigned long *)src;
 		unsigned long l;
 
-		while (size > 0) {
+		while (size >= 4 * sizeof(unsigned long)) {
+			l = get_unaligned(d) ^ get_unaligned(s++);
+			put_unaligned(l, d++);
+			l = get_unaligned(d) ^ get_unaligned(s++);
+			put_unaligned(l, d++);
+			l = get_unaligned(d) ^ get_unaligned(s++);
+			put_unaligned(l, d++);
+			l = get_unaligned(d) ^ get_unaligned(s++);
+			put_unaligned(l, d++);
+			size -= 4 * sizeof(unsigned long);
+		}
+		while (size > 0) {
 			l = get_unaligned(d) ^ get_unaligned(s++);
 			put_unaligned(l, d++);
 			size -= sizeof(unsigned long);

Actually, the compiler might unroll the loop automatically anyway, so the
above change might not even be necessary. The point is, I expect that a proper
scalar implementation will perform well for pretty much anything other than
large input sizes.
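
For readers who want to experiment outside the kernel, here is a minimal
userspace sketch of the same 4x-unrolled word-at-a-time idea. It is not the
kernel code: load_word(), store_word(), xor_word() and xor_unrolled() are
hypothetical names, memcpy() stands in for get_unaligned()/put_unaligned(),
and the byte tail is an addition for arbitrary sizes (the kernel fast path
only runs when size is a multiple of sizeof(unsigned long)).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for get_unaligned(): memcpy is a portable unaligned load. */
static inline unsigned long load_word(const uint8_t *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Userspace stand-in for put_unaligned(). */
static inline void store_word(uint8_t *p, unsigned long w)
{
	memcpy(p, &w, sizeof(w));
}

/* XOR one machine word of src into dst. */
static inline void xor_word(uint8_t *dst, const uint8_t *src)
{
	store_word(dst, load_word(dst) ^ load_word(src));
}

/* dst ^= src over size bytes, four words per iteration where possible. */
void xor_unrolled(uint8_t *dst, const uint8_t *src, size_t size)
{
	const size_t w = sizeof(unsigned long);

	/* Main loop: four words per iteration. */
	while (size >= 4 * w) {
		xor_word(dst, src);
		xor_word(dst + w, src + w);
		xor_word(dst + 2 * w, src + 2 * w);
		xor_word(dst + 3 * w, src + 3 * w);
		dst += 4 * w;
		src += 4 * w;
		size -= 4 * w;
	}
	/* Up to three whole words may remain, so this must be a loop. */
	while (size >= w) {
		xor_word(dst, src);
		dst += w;
		src += w;
		size -= w;
	}
	/* Byte tail for sizes that are not a multiple of the word size. */
	while (size > 0) {
		*dst++ ^= *src++;
		size--;
	}
}
```

Comparing the generated assembly of this against a plain one-word loop at -O2
is one way to check whether the compiler already unrolls on a given target.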

It's only large input sizes where xor_blocks() might be worth it, considering
the significant overhead of the indirect call in xor_blocks() as well as
entering a SIMD code section. (Note that indirect calls are very expensive
these days, due to the speculative execution mitigations.)
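
To illustrate the trade-off described above, here is a rough userspace sketch
of size-based dispatch: small inputs take an inline scalar path, and only
large inputs pay for an indirect call. Everything in it is hypothetical:
BULK_THRESHOLD is an arbitrary placeholder rather than a measured value,
xor_bulk is a plain function pointer standing in for a runtime-selected
routine, and bulk_calls exists only so the chosen path can be observed. It
does not show how xor_blocks() is actually wired up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-over point; a real value would have to be benchmarked. */
#define BULK_THRESHOLD 4096

typedef void (*xor_bulk_fn)(uint8_t *dst, const uint8_t *src, size_t size);

/* Counts indirect calls, purely for observation in this sketch. */
static unsigned int bulk_calls;

/* Scalar path: cheap to enter, good enough for small inputs. */
static inline void xor_scalar(uint8_t *dst, const uint8_t *src, size_t size)
{
	for (size_t i = 0; i < size; i++)
		dst[i] ^= src[i];
}

/* Placeholder for an optimized bulk routine (e.g. a SIMD one). */
static void xor_bulk_generic(uint8_t *dst, const uint8_t *src, size_t size)
{
	bulk_calls++;
	xor_scalar(dst, src, size);
}

/* Selected at runtime in a real system; fixed here for simplicity. */
static xor_bulk_fn xor_bulk = xor_bulk_generic;

/* dst ^= src, taking the indirect call only when the input is large. */
void xor_dispatch(uint8_t *dst, const uint8_t *src, size_t size)
{
	if (size < BULK_THRESHOLD) {
		xor_scalar(dst, src, size);	/* direct, inlinable */
		return;
	}
	xor_bulk(dst, src, size);		/* indirect call */
}
```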

Of course, the real question is what real-world scenario you are actually
trying to optimize for...

- Eric

Thread overview: 5+ messages
[not found] <CGME20230102224447eucas1p1dad1a2362030eee0d3890dd3546a1532@eucas1p1.samsung.com>
2023-01-02 22:44 ` xor_blocks() assumptions Lukasz Stelmach
2023-01-02 23:03 ` Eric Biggers
2023-01-03 11:13 ` Lukasz Stelmach
2023-01-03 14:01 ` Ard Biesheuvel
2023-01-04 7:46 ` Eric Biggers [this message]