From: Eric Biggers <ebiggers@kernel.org>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>
Cc: linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
ardb@kernel.org
Subject: Re: [PATCH 1/2] crypto: x86/sha256-ni - convert to use rounds macros
Date: Thu, 11 Apr 2024 09:16:55 -0700 [thread overview]
Message-ID: <20240411161655.GA9356@sol.localdomain> (raw)
In-Reply-To: <450F5ED9B5834B2EA883786C32E1A30E@H270>
Hi Stefan,
On Thu, Apr 11, 2024 at 09:42:00AM +0200, Stefan Kanthak wrote:
> "Eric Biggers" <ebiggers@kernel.org> wrote:
>
> > On Tue, Apr 09, 2024 at 06:52:02PM +0200, Stefan Kanthak wrote:
> >> "Eric Biggers" <ebiggers@kernel.org> wrote:
> >>
> >> > +.macro do_4rounds i, m0, m1, m2, m3
> >> > +.if \i < 16
> >> > + movdqu \i*4(DATA_PTR), MSG
> >> > + pshufb SHUF_MASK, MSG
> >> > + movdqa MSG, \m0
> >> > +.else
> >> > + movdqa \m0, MSG
> >> > +.endif
> >> > + paddd \i*4(SHA256CONSTANTS), MSG
> >>
> >> To load the round constant independent from and parallel to the previous
> >> instructions which use \m0 I recommend to change the first lines of the
> >> do_4rounds macro as follows (this might save 1+ cycle per macro invocation,
> >> and most obviously 2 lines):
> >>
> >> .macro do_4rounds i, m0, m1, m2, m3
> >> .if \i < 16
> >> movdqu \i*4(DATA_PTR), \m0
> >> pshufb SHUF_MASK, \m0
> >> .endif
> >> movdqa \i*4(SHA256CONSTANTS), MSG
> >> paddd \m0, MSG
> >> ...
> >
> > Yes, your suggestion looks good. I don't see any performance difference on
> > Ice Lake, but it does shorten the source code. It belongs in a separate patch
> > though, since this patch isn't meant to change the output.
>
> Hmmm... the output was already changed: 2 palignr/pblendw and 16 pshufd
> have been replaced with punpck?qdq, and 17 displacements changed.
Yes, the second patch does that. Your comment is on the first patch, so I
thought you were suggesting changing the first patch. I'll handle your
suggestion in another patch. I'm just trying to keep changes to the actual
output separate from source code only cleanups.
> Next simplification, and 5 more lines gone: replace the macro do_16rounds
> with a repetition
>
> @@ ...
> -.macro do_16rounds i
> - do_4rounds (\i + 0), MSGTMP0, MSGTMP1, MSGTMP2, MSGTMP3
> - do_4rounds (\i + 4), MSGTMP1, MSGTMP2, MSGTMP3, MSGTMP0
> - do_4rounds (\i + 8), MSGTMP2, MSGTMP3, MSGTMP0, MSGTMP1
> - do_4rounds (\i + 12), MSGTMP3, MSGTMP0, MSGTMP1, MSGTMP2
> -.endm
> -
> @@ ...
> - do_16rounds 0
> - do_16rounds 16
> - do_16rounds 32
> - do_16rounds 48
> +.irp i, 0, 16, 32, 48
> + do_4rounds (\i + 0), MSGTMP0, MSGTMP1, MSGTMP2, MSGTMP3
> + do_4rounds (\i + 4), MSGTMP1, MSGTMP2, MSGTMP3, MSGTMP0
> + do_4rounds (\i + 8), MSGTMP2, MSGTMP3, MSGTMP0, MSGTMP1
> + do_4rounds (\i + 12), MSGTMP3, MSGTMP0, MSGTMP1, MSGTMP2
> +.endr
>
> This doesn't change the instructions generated, so it belongs to this patch.
Yes, that makes sense.
> The following suggestion changes instructions: AFAIK all processors which
> support the SHA extensions support AVX too
No (unfortunately). Several generations of Intel's low-power CPUs support SHA
but not AVX. Namely Goldmont, Goldmont Plus, and Tremont.
We could provide two SHA-256 implementations, one with AVX and one without. I
think it's not worthwhile, though.
We ran into this issue with AES-NI too; I was hoping that we could just provide
the new AES-NI + AVX implementation of AES-XTS, and remove the older
implementation that uses AES-NI alone. However, apparently the above-mentioned
low-power CPUs do need the implementation that uses AES-NI alone.
> And last: are the "#define ... %xmm?" really necessary?
>
> - MSG can't be anything but %xmm0;
> - MSGTMP4 is despite its prefix MSG also used to shuffle STATE0 and STATE1,
> so it should be named TMP instead (if kept);
> - MSGTMP0 to MSGTMP3 are the circular message schedule, they should be named
> MSG0 to MSG3 instead (if kept).
>
> I suggest to remove at least those which are now encapsulated in the macro
> and the repetition.
Yes, I noticed the weird naming too. I'll rename MSGTMP[0-3] to MSG[0-3], and
MSGTMP4 to TMP.
You're correct that MSG has to be xmm0 because of how sha256rnds2 uses xmm0 as
an implicit operand. I think I'll leave the '#define' for MSG anyway because
the list of aliases helps make it clear what each register is used for.
Thanks,
- Eric
next prev parent reply other threads:[~2024-04-11 16:16 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-09 12:42 [PATCH 0/2] crypto: x86/sha256-ni - cleanup and optimization Eric Biggers
2024-04-09 12:42 ` [PATCH 1/2] crypto: x86/sha256-ni - convert to use rounds macros Eric Biggers
2024-04-09 16:52 ` Stefan Kanthak
2024-04-09 23:36 ` Eric Biggers
2024-04-11 7:42 ` Stefan Kanthak
2024-04-11 16:16 ` Eric Biggers [this message]
2024-04-09 12:42 ` [PATCH 2/2] crypto: x86/sha256-ni - optimize code size Eric Biggers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240411161655.GA9356@sol.localdomain \
--to=ebiggers@kernel.org \
--cc=ardb@kernel.org \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stefan.kanthak@nexgo.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox