linux-crypto.vger.kernel.org archive mirror
* [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
@ 2024-04-06  0:26 Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

This series adds missing vzeroupper instructions before returning from
code that uses ymm registers.
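
[Editorial note, not part of the original posting: the general shape of the
fix in each patch is sketched below. `my_avx2_func` is a hypothetical
function, not one from this series. On some x86 CPUs, returning with the
upper halves of the ymm registers dirty imposes a transition penalty on
subsequently executed legacy-SSE instructions; vzeroupper clears those
upper halves before the function returns.]

```asm
/* Hypothetical sketch of the pattern; not code from this series. */
SYM_TYPED_FUNC_START(my_avx2_func)
	vmovdqu		(%rdi), %ymm0	/* uses upper 128 bits of ymm0 */
	/* ... AVX2 computation ... */
	vmovdqu		%ymm0, (%rsi)
	vzeroupper			/* zero upper halves of all ymm regs, */
	RET				/* so later SSE code avoids stalls */
SYM_FUNC_END(my_avx2_func)
```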

Eric Biggers (3):
  crypto: x86/nh-avx2 - add missing vzeroupper
  crypto: x86/sha256-avx2 - add missing vzeroupper
  crypto: x86/sha512-avx2 - add missing vzeroupper

 arch/x86/crypto/nh-avx2-x86_64.S  | 1 +
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 3 files changed, 3 insertions(+)


base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
-- 
2.44.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:42   ` Tim Chen
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it.  This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/nh-avx2-x86_64.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
 
 	vpaddq		T5, T4, T4
 	vpaddq		T1, T0, T0
 	vpaddq		T4, T0, T0
 	vmovdqu		T0, (HASH)
+	vzeroupper
 	RET
 SYM_FUNC_END(nh_avx2)
-- 
2.44.0



* [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:38   ` Tim Chen
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
  2024-04-12  7:33 ` [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Herbert Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
 	popq	%r15
 	popq	%r14
 	popq	%r13
 	popq	%r12
 	popq	%rbx
+	vzeroupper
 	RET
 SYM_FUNC_END(sha256_transform_rorx)
 
 .section	.rodata.cst512.K256, "aM", @progbits, 512
 .align 64
-- 
2.44.0



* [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:38   ` Tim Chen
  2024-04-12  7:33 ` [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Herbert Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
 	pop	%r14
 	pop	%r13
 	pop	%r12
 	pop	%rbx
 
+	vzeroupper
 	RET
 SYM_FUNC_END(sha512_transform_rorx)
 
 ########################################################################
 ### Binary Data
-- 
2.44.0



* Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
@ 2024-04-09 22:38   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:38 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it.  This is necessary to avoid reducing the
> performance of SSE code.
> 
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/sha256-avx2-asm.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
>  	popq	%r15
>  	popq	%r14
>  	popq	%r13
>  	popq	%r12
>  	popq	%rbx
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha256_transform_rorx)
>  
>  .section	.rodata.cst512.K256, "aM", @progbits, 512
>  .align 64

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>


* Re: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
@ 2024-04-09 22:38   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:38 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it.  This is necessary to avoid reducing the
> performance of SSE code.
> 
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/sha512-avx2-asm.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
>  	pop	%r14
>  	pop	%r13
>  	pop	%r12
>  	pop	%rbx
>  
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha512_transform_rorx)
>  
>  ########################################################################
>  ### Binary Data

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>


* Re: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
@ 2024-04-09 22:42   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:42 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> from it.  This is necessary to avoid reducing the performance of SSE
> code.
> 
> Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/nh-avx2-x86_64.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> index ef73a3ab8726..791386d9a83a 100644
> --- a/arch/x86/crypto/nh-avx2-x86_64.S
> +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
>  
>  	vpaddq		T5, T4, T4
>  	vpaddq		T1, T0, T0
>  	vpaddq		T4, T0, T0
>  	vmovdqu		T0, (HASH)
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(nh_avx2)

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>


* Re: [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
                   ` (2 preceding siblings ...)
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
@ 2024-04-12  7:33 ` Herbert Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2024-04-12  7:33 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, x86, tim.c.chen

Eric Biggers <ebiggers@kernel.org> wrote:
> This series adds missing vzeroupper instructions before returning from
> code that uses ymm registers.
> 
> Eric Biggers (3):
>  crypto: x86/nh-avx2 - add missing vzeroupper
>  crypto: x86/sha256-avx2 - add missing vzeroupper
>  crypto: x86/sha512-avx2 - add missing vzeroupper
> 
> arch/x86/crypto/nh-avx2-x86_64.S  | 1 +
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 3 files changed, 3 insertions(+)
> 
> 
> base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

