* [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
This series adds missing vzeroupper instructions before returning from
code that uses ymm registers.
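
As a rough user-space illustration of the pattern this series applies (the patches themselves change hand-written kernel assembly, not C), the sketch below uses 256-bit ymm registers and then clears their upper halves before returning, which is what the patch descriptions say is needed to avoid slowing down later SSE code. The names sum_u64() and sum_u64_avx2() are hypothetical and do not exist in the kernel. The example assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not kernel code): sum n 64-bit words using ymm registers. */
__attribute__((target("avx2")))
static uint64_t sum_u64_avx2(const uint64_t *src, size_t n)
{
	__m256i acc = _mm256_setzero_si256();
	uint64_t lanes[4];
	uint64_t total;
	size_t i;

	/* Process four 64-bit words per iteration in one 256-bit register. */
	for (i = 0; i + 4 <= n; i += 4)
		acc = _mm256_add_epi64(acc,
				_mm256_loadu_si256((const __m256i *)(src + i)));

	_mm256_storeu_si256((__m256i *)lanes, acc);

	/*
	 * Done with ymm registers: zero their upper 128 bits before returning,
	 * mirroring the "vzeroupper; RET" sequence the patches add.
	 */
	_mm256_zeroupper();

	total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
	for (; i < n; i++)	/* scalar tail for the remaining 0-3 words */
		total += src[i];
	return total;
}

/* Hypothetical wrapper: use the AVX2 path only when the CPU supports it. */
uint64_t sum_u64(const uint64_t *src, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(src != NULL || n == 0);

	if (__builtin_cpu_supports("avx2"))
		return sum_u64_avx2(src, n);

	for (i = 0; i < n; i++)
		total += src[i];
	return total;
}
```

Compilers typically emit vzeroupper on their own when AVX code generation is enabled, so the explicit _mm256_zeroupper() call here is mainly for illustration. Hand-written assembly gets no such help, so the instruction has to be added manually before each return.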
Eric Biggers (3):
crypto: x86/nh-avx2 - add missing vzeroupper
crypto: x86/sha256-avx2 - add missing vzeroupper
crypto: x86/sha512-avx2 - add missing vzeroupper
arch/x86/crypto/nh-avx2-x86_64.S | 1 +
arch/x86/crypto/sha256-avx2-asm.S | 1 +
arch/x86/crypto/sha512-avx2-asm.S | 1 +
3 files changed, 3 insertions(+)
base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
--
2.44.0
* [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it. This is necessary to avoid reducing the performance of SSE
code.
Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/nh-avx2-x86_64.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
vpaddq T5, T4, T4
vpaddq T1, T0, T0
vpaddq T4, T0, T0
vmovdqu T0, (HASH)
+ vzeroupper
RET
SYM_FUNC_END(nh_avx2)
--
2.44.0
* [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.
Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/sha256-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbx
+ vzeroupper
RET
SYM_FUNC_END(sha256_transform_rorx)
.section .rodata.cst512.K256, "aM", @progbits, 512
.align 64
--
2.44.0
* [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
From: Eric Biggers @ 2024-04-06 0:26 UTC
To: linux-crypto; +Cc: x86, Tim Chen
From: Eric Biggers <ebiggers@google.com>
Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it. This is necessary to avoid reducing the
performance of SSE code.
Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
arch/x86/crypto/sha512-avx2-asm.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
pop %r14
pop %r13
pop %r12
pop %rbx
+ vzeroupper
RET
SYM_FUNC_END(sha512_transform_rorx)
########################################################################
### Binary Data
--
2.44.0
* Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:38 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
> popq %r15
> popq %r14
> popq %r13
> popq %r12
> popq %rbx
> + vzeroupper
> RET
> SYM_FUNC_END(sha256_transform_rorx)
>
> .section .rodata.cst512.K256, "aM", @progbits, 512
> .align 64
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:38 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it. This is necessary to avoid reducing the
> performance of SSE code.
>
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
> pop %r14
> pop %r13
> pop %r12
> pop %rbx
>
> + vzeroupper
> RET
> SYM_FUNC_END(sha512_transform_rorx)
>
> ########################################################################
> ### Binary Data
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
From: Tim Chen @ 2024-04-09 22:42 UTC
To: Eric Biggers, linux-crypto; +Cc: x86
On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> > from it. This is necessary to avoid reducing the performance of SSE
> > code.
> >
> > Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> > arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> > index ef73a3ab8726..791386d9a83a 100644
> > --- a/arch/x86/crypto/nh-avx2-x86_64.S
> > +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> > @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
> >
> > vpaddq T5, T4, T4
> > vpaddq T1, T0, T0
> > vpaddq T4, T0, T0
> > vmovdqu T0, (HASH)
> > + vzeroupper
> > RET
> > SYM_FUNC_END(nh_avx2)
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
* Re: [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
From: Herbert Xu @ 2024-04-12 7:33 UTC
To: Eric Biggers; +Cc: linux-crypto, x86, tim.c.chen
Eric Biggers <ebiggers@kernel.org> wrote:
> This series adds missing vzeroupper instructions before returning from
> code that uses ymm registers.
>
> Eric Biggers (3):
> crypto: x86/nh-avx2 - add missing vzeroupper
> crypto: x86/sha256-avx2 - add missing vzeroupper
> crypto: x86/sha512-avx2 - add missing vzeroupper
>
> arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 3 files changed, 3 insertions(+)
>
>
> base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
All applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt