* [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
This series adds missing vzeroupper instructions before returning from
code that uses ymm registers.
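
As an illustration of the pattern (not code from this series), the same rule can be sketched in C with intrinsics. The names sum4_avx2(), sum4_scalar() and sum4() are hypothetical, and the sketch assumes GCC or Clang on x86-64. A routine that touches ymm registers clears their upper halves before returning, so that later legacy-SSE code does not run with dirty upper state. Compilers typically emit vzeroupper on their own for functions like this; hand-written assembly has to do it explicitly.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/*
 * Hypothetical helper (not from this series): adds four 64-bit lanes
 * using a ymm register, then clears the upper ymm halves before returning.
 */
__attribute__((target("avx2")))
static uint64_t sum4_avx2(const uint64_t v[4])
{
	/* The 256-bit load leaves the upper half of a ymm register dirty. */
	__m256i x = _mm256_loadu_si256((const __m256i *)v);
	__m128i s = _mm_add_epi64(_mm256_castsi256_si128(x),
				  _mm256_extracti128_si256(x, 1));
	uint64_t out[2];

	_mm_storeu_si128((__m128i *)out, s);
	/* Counterpart of the vzeroupper placed before RET in the patches. */
	_mm256_zeroupper();
	return out[0] + out[1];
}
#endif

/* Portable fallback for CPUs or compilers without the AVX2 path. */
static uint64_t sum4_scalar(const uint64_t v[4])
{
	return v[0] + v[1] + v[2] + v[3];
}

/* Hypothetical dispatcher: takes the AVX2 path only if the CPU supports it. */
static uint64_t sum4(const uint64_t v[4])
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	if (__builtin_cpu_supports("avx2"))
		return sum4_avx2(v);
#endif
	return sum4_scalar(v);
}
```

Calling sum4() on an array holding 1, 2, 3, 4 returns 10 on either path; the only difference between the two paths is the register state left behind for the caller.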
Eric Biggers (3):
crypto: x86/nh-avx2 - add missing vzeroupper
crypto: x86/sha256-avx2 - add missing vzeroupper
crypto: x86/sha512-avx2 - add missing vzeroupper
arch/x86/crypto/nh-avx2-x86_64.S | 1 +
arch/x86/crypto/sha256-avx2-asm.S | 1 +
arch/x86/crypto/sha512-avx2-asm.S | 1 +
3 files changed, 3 insertions(+)
base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
--
2.44.0
* [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it. This is necessary to avoid reducing the performance of SSE
code.
Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/nh-avx2-x86_64.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
 
 	vpaddq T5, T4, T4
 	vpaddq T1, T0, T0
 	vpaddq T4, T0, T0
 	vmovdqu T0, (HASH)
+	vzeroupper
 	RET
 SYM_FUNC_END(nh_avx2)
--
2.44.0
* [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.
Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/sha256-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
 	popq %r15
 	popq %r14
 	popq %r13
 	popq %r12
 	popq %rbx
+	vzeroupper
 	RET
 SYM_FUNC_END(sha256_transform_rorx)
 
 .section .rodata.cst512.K256, "aM", @progbits, 512
 .align 64
--
2.44.0
* [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.
Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/sha512-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
 	pop %r14
 	pop %r13
 	pop %r12
 	pop %rbx
 
+	vzeroupper
 	RET
 SYM_FUNC_END(sha512_transform_rorx)
 
 ########################################################################
 ### Binary Data
--
2.44.0
* Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:38 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
> popq %r15
> popq %r14
> popq %r13
> popq %r12
> popq %rbx
> + vzeroupper
> RET
> SYM_FUNC_END(sha256_transform_rorx)
>
> .section .rodata.cst512.K256, "aM", @progbits, 512
> .align 64
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:38 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
> pop %r14
> pop %r13
> pop %r12
> pop %rbx
>
> + vzeroupper
> RET
> SYM_FUNC_END(sha512_transform_rorx)
>
> ########################################################################
> ### Binary Data
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:42 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> > from it. This is necessary to avoid reducing the performance of SSE
> > code.
> >
> > Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> > arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> > index ef73a3ab8726..791386d9a83a 100644
> > --- a/arch/x86/crypto/nh-avx2-x86_64.S
> > +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> > @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
> >
> > vpaddq T5, T4, T4
> > vpaddq T1, T0, T0
> > vpaddq T4, T0, T0
> > vmovdqu T0, (HASH)
> > + vzeroupper
> > RET
> > SYM_FUNC_END(nh_avx2)
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
From: Herbert Xu @ 2024-04-12 7:33 UTC
To: Eric Biggers; +Cc: linux-crypto, x86, tim.c.chen
Eric Biggers <ebiggers@kernel.org> wrote:
> This series adds missing vzeroupper instructions before returning from
> code that uses ymm registers.
>
> Eric Biggers (3):
> crypto: x86/nh-avx2 - add missing vzeroupper
> crypto: x86/sha256-avx2 - add missing vzeroupper
> crypto: x86/sha512-avx2 - add missing vzeroupper
>
> arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 3 files changed, 3 insertions(+)
>
>
> base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
All applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt