* [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
@ 2024-04-06  0:26 Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

This series adds missing vzeroupper instructions before returning from
code that uses ymm registers.
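
As background for readers less familiar with the issue: on some x86
CPUs, returning with the upper 128 bits of the ymm registers still
"dirty" can make later legacy SSE instructions pay a transition or
false-dependency cost, and vzeroupper clears that state.  The sketch
below is only an illustration in C, not code from this series.  The
names sum4_avx2(), sum4_scalar() and sum4() are hypothetical, and it
assumes GCC or Clang targeting x86-64.  Compilers normally emit
vzeroupper on their own at the end of AVX functions, so the explicit
_mm256_zeroupper() call merely mirrors what hand-written assembly has
to do itself.

```c
#include <immintrin.h>
#include <stdint.h>

/* Portable fallback used when the CPU lacks AVX2 (hypothetical helper). */
static uint64_t sum4_scalar(const uint64_t p[4])
{
	return p[0] + p[1] + p[2] + p[3];
}

/*
 * Hypothetical helper, not part of the series: adds the four 64-bit
 * values in p[] using 256-bit (ymm) operations.
 */
__attribute__((target("avx2")))
static uint64_t sum4_avx2(const uint64_t p[4])
{
	/* Load all four values into one ymm register. */
	__m256i v = _mm256_loadu_si256((const __m256i *)p);
	/* Swap the two 128-bit halves: lanes become p2, p3, p0, p1. */
	__m256i swapped = _mm256_permute4x64_epi64(v, 0x4e);
	/* Lane 0 now holds p0 + p2 and lane 1 holds p1 + p3. */
	__m256i s = _mm256_add_epi64(v, swapped);
	uint64_t out[4];

	_mm256_storeu_si256((__m256i *)out, s);

	/*
	 * Clear the upper halves of all ymm registers before returning,
	 * the C-level counterpart of the vzeroupper added before RET in
	 * each patch of this series.
	 */
	_mm256_zeroupper();

	return out[0] + out[1];
}

/* Runtime dispatch so the sketch also runs on CPUs without AVX2. */
uint64_t sum4(const uint64_t p[4])
{
	if (__builtin_cpu_supports("avx2"))
		return sum4_avx2(p);
	return sum4_scalar(p);
}
```

Calling sum4() on an array of four uint64_t values returns their sum
on either path; the only difference is that the AVX2 path leaves the
ymm upper halves clean for whatever SSE code runs next.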

Eric Biggers (3):
  crypto: x86/nh-avx2 - add missing vzeroupper
  crypto: x86/sha256-avx2 - add missing vzeroupper
  crypto: x86/sha512-avx2 - add missing vzeroupper

 arch/x86/crypto/nh-avx2-x86_64.S  | 1 +
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 3 files changed, 3 insertions(+)


base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659
-- 
2.44.0


* [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:42   ` Tim Chen
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since nh_avx2() uses ymm registers, execute vzeroupper before returning
from it.  This is necessary to avoid reducing the performance of SSE
code.

Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/nh-avx2-x86_64.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
index ef73a3ab8726..791386d9a83a 100644
--- a/arch/x86/crypto/nh-avx2-x86_64.S
+++ b/arch/x86/crypto/nh-avx2-x86_64.S
@@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
 
 	vpaddq		T5, T4, T4
 	vpaddq		T1, T0, T0
 	vpaddq		T4, T0, T0
 	vmovdqu		T0, (HASH)
+	vzeroupper
 	RET
 SYM_FUNC_END(nh_avx2)
-- 
2.44.0


* [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:38   ` Tim Chen
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
  2024-04-12  7:33 ` [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Herbert Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/sha256-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 9918212faf91..0ffb072be956 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
 	popq	%r15
 	popq	%r14
 	popq	%r13
 	popq	%r12
 	popq	%rbx
+	vzeroupper
 	RET
 SYM_FUNC_END(sha256_transform_rorx)
 
 .section	.rodata.cst512.K256, "aM", @progbits, 512
 .align 64
-- 
2.44.0


* [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
@ 2024-04-06  0:26 ` Eric Biggers
  2024-04-09 22:38   ` Tim Chen
  2024-04-12  7:33 ` [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Herbert Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Biggers @ 2024-04-06  0:26 UTC (permalink / raw)
  To: linux-crypto; +Cc: x86, Tim Chen

From: Eric Biggers <ebiggers@google.com>

Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
before returning from it.  This is necessary to avoid reducing the
performance of SSE code.

Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/sha512-avx2-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
index f08496cd6870..24973f42c43f 100644
--- a/arch/x86/crypto/sha512-avx2-asm.S
+++ b/arch/x86/crypto/sha512-avx2-asm.S
@@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
 	pop	%r14
 	pop	%r13
 	pop	%r12
 	pop	%rbx
 
+	vzeroupper
 	RET
 SYM_FUNC_END(sha512_transform_rorx)
 
 ########################################################################
 ### Binary Data
-- 
2.44.0


* Re: [PATCH 2/3] crypto: x86/sha256-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 2/3] crypto: x86/sha256-avx2 " Eric Biggers
@ 2024-04-09 22:38   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:38 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since sha256_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it.  This is necessary to avoid reducing the
> performance of SSE code.
> 
> Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/sha256-avx2-asm.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
> index 9918212faf91..0ffb072be956 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -714,10 +714,11 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx)
>  	popq	%r15
>  	popq	%r14
>  	popq	%r13
>  	popq	%r12
>  	popq	%rbx
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha256_transform_rorx)
>  
>  .section	.rodata.cst512.K256, "aM", @progbits, 512
>  .align 64

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>

* Re: [PATCH 3/3] crypto: x86/sha512-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
@ 2024-04-09 22:38   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:38 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Since sha512_transform_rorx() uses ymm registers, execute vzeroupper
> before returning from it.  This is necessary to avoid reducing the
> performance of SSE code.
> 
> Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  arch/x86/crypto/sha512-avx2-asm.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S
> index f08496cd6870..24973f42c43f 100644
> --- a/arch/x86/crypto/sha512-avx2-asm.S
> +++ b/arch/x86/crypto/sha512-avx2-asm.S
> @@ -678,10 +678,11 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx)
>  	pop	%r14
>  	pop	%r13
>  	pop	%r12
>  	pop	%rbx
>  
> +	vzeroupper
>  	RET
>  SYM_FUNC_END(sha512_transform_rorx)
>  
>  ########################################################################
>  ### Binary Data

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>

* Re: [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper
  2024-04-06  0:26 ` [PATCH 1/3] crypto: x86/nh-avx2 - add missing vzeroupper Eric Biggers
@ 2024-04-09 22:42   ` Tim Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Chen @ 2024-04-09 22:42 UTC (permalink / raw)
  To: Eric Biggers, linux-crypto; +Cc: x86

On Fri, 2024-04-05 at 20:26 -0400, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Since nh_avx2() uses ymm registers, execute vzeroupper before returning
> > from it.  This is necessary to avoid reducing the performance of SSE
> > code.
> > 
> > Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305")
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > ---
> >  arch/x86/crypto/nh-avx2-x86_64.S | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S
> > index ef73a3ab8726..791386d9a83a 100644
> > --- a/arch/x86/crypto/nh-avx2-x86_64.S
> > +++ b/arch/x86/crypto/nh-avx2-x86_64.S
> > @@ -152,7 +152,8 @@ SYM_TYPED_FUNC_START(nh_avx2)
> >  
> >  	vpaddq		T5, T4, T4
> >  	vpaddq		T1, T0, T0
> >  	vpaddq		T4, T0, T0
> >  	vmovdqu		T0, (HASH)
> > +	vzeroupper
> >  	RET
> >  SYM_FUNC_END(nh_avx2)

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>

* Re: [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions
  2024-04-06  0:26 [PATCH 0/3] crypto: x86 - add missing vzeroupper instructions Eric Biggers
                   ` (2 preceding siblings ...)
  2024-04-06  0:26 ` [PATCH 3/3] crypto: x86/sha512-avx2 " Eric Biggers
@ 2024-04-12  7:33 ` Herbert Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Herbert Xu @ 2024-04-12  7:33 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-crypto, x86, tim.c.chen

Eric Biggers <ebiggers@kernel.org> wrote:
> This series adds missing vzeroupper instructions before returning from
> code that uses ymm registers.
> 
> Eric Biggers (3):
>  crypto: x86/nh-avx2 - add missing vzeroupper
>  crypto: x86/sha256-avx2 - add missing vzeroupper
>  crypto: x86/sha512-avx2 - add missing vzeroupper
> 
> arch/x86/crypto/nh-avx2-x86_64.S  | 1 +
> arch/x86/crypto/sha256-avx2-asm.S | 1 +
> arch/x86/crypto/sha512-avx2-asm.S | 1 +
> 3 files changed, 3 insertions(+)
> 
> 
> base-commit: 4ad27a8be9dbefd4820da0f60da879d512b2f659

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
