* [PATCH 0/2] crypto/arm64: Reduce stack bloat from scoped ksimd
From: Ard Biesheuvel @ 2025-12-03 16:38 UTC
To: linux-crypto; +Cc: Ard Biesheuvel, Arnd Bergmann, Eric Biggers, Herbert Xu
Arnd reports that the new scoped ksimd changes result in excessive stack
bloat in the XTS routines in some cases. Fix this for AES-XTS and
SM4-XTS.
Note that the offending patches went in via the libcrypto tree, so these
changes should either go via the same route, or wait for -rc1.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel (2):
crypto/arm64: aes/xts - Using single ksimd scope to reduce stack bloat
crypto/arm64: sm4/xts: Merge ksimd scopes to reduce stack bloat
arch/arm64/crypto/aes-glue.c | 75 ++++++++++----------
arch/arm64/crypto/aes-neonbs-glue.c | 44 ++++++------
arch/arm64/crypto/sm4-ce-glue.c | 42 ++++++-----
3 files changed, 77 insertions(+), 84 deletions(-)
--
2.47.3
* [PATCH 1/2] crypto/arm64: aes/xts - Using single ksimd scope to reduce stack bloat
From: Ard Biesheuvel @ 2025-12-03 16:38 UTC
To: linux-crypto; +Cc: Ard Biesheuvel, Arnd Bergmann, Eric Biggers, Herbert Xu
The ciphertext stealing logic in the AES-XTS implementation creates a
separate ksimd scope to call into the FP/SIMD core routines, and in some
cases (CONFIG_KASAN_STACK is one, but there might be others), the
528-byte kernel-mode FP/SIMD buffer that is allocated inside this scope
is not shared with the preceding ksimd scope, resulting in unnecessary
stack bloat.
Considering that
a) the XTS ciphertext stealing logic is never called for block
encryption use cases, and XTS is rarely used for anything else,
b) in the vast majority of cases, the entire input block is processed
during the first iteration of the loop,
we can combine both ksimd scopes into a single one, with no practical
impact on how often or for how long FP/SIMD is enabled and disabled,
allowing us to reuse the same stack slot for both FP/SIMD routine calls.
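(Illustration only, not part of the patch: a toy userspace sketch of the
effect. The struct and helpers below are stand-ins for the real
kernel-mode FP/SIMD state and for whatever scoped_ksimd() allocates; they
only show why two sibling scopes can cost two ~528 byte stack slots while
a single merged scope needs one.)

  #include <string.h>

  struct fpsimd_state { unsigned char regs[528]; };  /* stand-in buffer */

  static void simd_call(struct fpsimd_state *st)
  {
          memset(st, 0, sizeof(*st));  /* pretend to touch the FP/SIMD state */
  }

  /* Two sibling scopes: with a KASAN_STACK-style stack layout the compiler
   * may keep two distinct 528 byte slots live in this frame. */
  void two_scopes(void)
  {
          {
                  struct fpsimd_state st;  /* slot #1: bulk processing */
                  simd_call(&st);
          }
          {
                  struct fpsimd_state st;  /* slot #2: ciphertext stealing */
                  simd_call(&st);
          }
  }

  /* One enclosing scope, as after this patch: a single slot serves both. */
  void one_scope(void)
  {
          struct fpsimd_state st;
          simd_call(&st);  /* bulk processing */
          simd_call(&st);  /* ciphertext stealing */
  }

  int main(void) { two_scopes(); one_scope(); return 0; }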
Fixes: ba3c1b3b5ac9 ("crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/aes-glue.c | 75 ++++++++++----------
arch/arm64/crypto/aes-neonbs-glue.c | 44 ++++++------
2 files changed, 57 insertions(+), 62 deletions(-)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index b087b900d279..c51d4487e9e9 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -549,38 +549,37 @@ static int __maybe_unused xts_encrypt(struct skcipher_request *req)
tail = 0;
}
- for (first = 1; walk.nbytes >= AES_BLOCK_SIZE; first = 0) {
- int nbytes = walk.nbytes;
+ scoped_ksimd() {
+ for (first = 1; walk.nbytes >= AES_BLOCK_SIZE; first = 0) {
+ int nbytes = walk.nbytes;
- if (walk.nbytes < walk.total)
- nbytes &= ~(AES_BLOCK_SIZE - 1);
+ if (walk.nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
- scoped_ksimd()
aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key1.key_enc, rounds, nbytes,
ctx->key2.key_enc, walk.iv, first);
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
- }
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
- if (err || likely(!tail))
- return err;
+ if (err || likely(!tail))
+ return err;
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+ dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
- scoped_ksimd()
aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key1.key_enc, rounds, walk.nbytes,
ctx->key2.key_enc, walk.iv, first);
-
+ }
return skcipher_walk_done(&walk, 0);
}
@@ -619,39 +618,37 @@ static int __maybe_unused xts_decrypt(struct skcipher_request *req)
tail = 0;
}
- for (first = 1; walk.nbytes >= AES_BLOCK_SIZE; first = 0) {
- int nbytes = walk.nbytes;
+ scoped_ksimd() {
+ for (first = 1; walk.nbytes >= AES_BLOCK_SIZE; first = 0) {
+ int nbytes = walk.nbytes;
- if (walk.nbytes < walk.total)
- nbytes &= ~(AES_BLOCK_SIZE - 1);
+ if (walk.nbytes < walk.total)
+ nbytes &= ~(AES_BLOCK_SIZE - 1);
- scoped_ksimd()
aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key1.key_dec, rounds, nbytes,
ctx->key2.key_enc, walk.iv, first);
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
- }
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ }
- if (err || likely(!tail))
- return err;
-
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+ if (err || likely(!tail))
+ return err;
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
+ dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
- scoped_ksimd()
aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
ctx->key1.key_dec, rounds, walk.nbytes,
ctx->key2.key_enc, walk.iv, first);
-
+ }
return skcipher_walk_done(&walk, 0);
}
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index d496effb0a5b..cb87c8fc66b3 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -312,13 +312,13 @@ static int __xts_crypt(struct skcipher_request *req, bool encrypt,
if (err)
return err;
- while (walk.nbytes >= AES_BLOCK_SIZE) {
- int blocks = (walk.nbytes / AES_BLOCK_SIZE) & ~7;
- out = walk.dst.virt.addr;
- in = walk.src.virt.addr;
- nbytes = walk.nbytes;
+ scoped_ksimd() {
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ int blocks = (walk.nbytes / AES_BLOCK_SIZE) & ~7;
+ out = walk.dst.virt.addr;
+ in = walk.src.virt.addr;
+ nbytes = walk.nbytes;
- scoped_ksimd() {
if (blocks >= 8) {
if (first == 1)
neon_aes_ecb_encrypt(walk.iv, walk.iv,
@@ -344,30 +344,28 @@ static int __xts_crypt(struct skcipher_request *req, bool encrypt,
ctx->twkey, walk.iv, first);
nbytes = first = 0;
}
+ err = skcipher_walk_done(&walk, nbytes);
}
- err = skcipher_walk_done(&walk, nbytes);
- }
- if (err || likely(!tail))
- return err;
+ if (err || likely(!tail))
+ return err;
- /* handle ciphertext stealing */
- dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+ /* handle ciphertext stealing */
+ dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
- skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
- req->iv);
+ skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+ req->iv);
- err = skcipher_walk_virt(&walk, req, false);
- if (err)
- return err;
+ err = skcipher_walk_virt(&walk, req, false);
+ if (err)
+ return err;
- out = walk.dst.virt.addr;
- in = walk.src.virt.addr;
- nbytes = walk.nbytes;
+ out = walk.dst.virt.addr;
+ in = walk.src.virt.addr;
+ nbytes = walk.nbytes;
- scoped_ksimd() {
if (encrypt)
neon_aes_xts_encrypt(out, in, ctx->cts.key_enc,
ctx->key.rounds, nbytes, ctx->twkey,
--
2.47.3
* [PATCH 2/2] crypto/arm64: sm4/xts: Merge ksimd scopes to reduce stack bloat
From: Ard Biesheuvel @ 2025-12-03 16:38 UTC
To: linux-crypto; +Cc: Ard Biesheuvel, Arnd Bergmann, Eric Biggers, Herbert Xu
Merge the two ksimd scopes in the implementation of SM4-XTS to prevent
stack bloat in cases where the compiler fails to combine the stack slots
for the kernel-mode FP/SIMD buffers.
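(Background note, not part of the patch: with the merged scope, the early
"return err" now happens from inside the ksimd scope. Assuming
scoped_ksimd() follows the kernel's cleanup-attribute scoped-guard
pattern, as the "scoped guard API" naming in the earlier conversion
suggests, this is fine because the end-of-SIMD hook runs on every exit
from the scope, early returns included. The macro below is a toy
userspace stand-in, not the kernel's definition.)

  #include <stdio.h>

  static void simd_end(int *unused)
  {
          (void)unused;
          puts("simd end");  /* stand-in for the real SIMD teardown */
  }

  /* Toy stand-in for a cleanup-attribute based scoped guard. */
  #define scoped_simd() \
          for (int _guard __attribute__((cleanup(simd_end))) = (puts("simd begin"), 0); \
               !_guard; _guard = 1)

  static int xts_crypt(int fail)
  {
          scoped_simd() {
                  if (fail)
                          return -1;  /* "simd end" is still printed here */
                  puts("bulk blocks + ciphertext stealing");
          }
          return 0;
  }

  int main(void)
  {
          xts_crypt(0);
          xts_crypt(1);
          return 0;
  }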
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/sm4-ce-glue.c | 42 ++++++++++----------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64/crypto/sm4-ce-glue.c
index 5569cece5a0b..0933ba45fbe7 100644
--- a/arch/arm64/crypto/sm4-ce-glue.c
+++ b/arch/arm64/crypto/sm4-ce-glue.c
@@ -346,11 +346,11 @@ static int sm4_xts_crypt(struct skcipher_request *req, bool encrypt)
tail = 0;
}
- while ((nbytes = walk.nbytes) >= SM4_BLOCK_SIZE) {
- if (nbytes < walk.total)
- nbytes &= ~(SM4_BLOCK_SIZE - 1);
+ scoped_ksimd() {
+ while ((nbytes = walk.nbytes) >= SM4_BLOCK_SIZE) {
+ if (nbytes < walk.total)
+ nbytes &= ~(SM4_BLOCK_SIZE - 1);
- scoped_ksimd() {
if (encrypt)
sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
walk.src.virt.addr, walk.iv, nbytes,
@@ -359,32 +359,30 @@ static int sm4_xts_crypt(struct skcipher_request *req, bool encrypt)
sm4_ce_xts_dec(ctx->key1.rkey_dec, walk.dst.virt.addr,
walk.src.virt.addr, walk.iv, nbytes,
rkey2_enc);
- }
- rkey2_enc = NULL;
+ rkey2_enc = NULL;
- err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
- if (err)
- return err;
- }
+ err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+ if (err)
+ return err;
+ }
- if (likely(tail == 0))
- return 0;
+ if (likely(tail == 0))
+ return 0;
- /* handle ciphertext stealing */
+ /* handle ciphertext stealing */
- dst = src = scatterwalk_ffwd(sg_src, req->src, subreq.cryptlen);
- if (req->dst != req->src)
- dst = scatterwalk_ffwd(sg_dst, req->dst, subreq.cryptlen);
+ dst = src = scatterwalk_ffwd(sg_src, req->src, subreq.cryptlen);
+ if (req->dst != req->src)
+ dst = scatterwalk_ffwd(sg_dst, req->dst, subreq.cryptlen);
- skcipher_request_set_crypt(&subreq, src, dst, SM4_BLOCK_SIZE + tail,
- req->iv);
+ skcipher_request_set_crypt(&subreq, src, dst, SM4_BLOCK_SIZE + tail,
+ req->iv);
- err = skcipher_walk_virt(&walk, &subreq, false);
- if (err)
- return err;
+ err = skcipher_walk_virt(&walk, &subreq, false);
+ if (err)
+ return err;
- scoped_ksimd() {
if (encrypt)
sm4_ce_xts_enc(ctx->key1.rkey_enc, walk.dst.virt.addr,
walk.src.virt.addr, walk.iv, walk.nbytes,
--
2.47.3
* Re: [PATCH 0/2] crypto/arm64: Reduce stack bloat from scoped ksimd
From: Eric Biggers @ 2025-12-03 18:10 UTC
To: Ard Biesheuvel; +Cc: linux-crypto, Arnd Bergmann, Herbert Xu
On Wed, Dec 03, 2025 at 05:38:04PM +0100, Ard Biesheuvel wrote:
> Arnd reports that the new scoped ksimd changes result in excessive stack
> bloat in the XTS routines in some cases. Fix this for AES-XTS and
> SM4-XTS.
>
> Note that the offending patches went in via the libcrypto tree, so these
> changes should either go via the same route, or wait for -rc1
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>
> Ard Biesheuvel (2):
> crypto/arm64: aes/xts - Using single ksimd scope to reduce stack bloat
> crypto/arm64: sm4/xts: Merge ksimd scopes to reduce stack bloat
>
Thanks, looks good to me. I'll plan to take these and send another pull
request probably sometime next week.
- Eric
* Re: [PATCH 0/2] crypto/arm64: Reduce stack bloat from scoped ksimd
From: Arnd Bergmann @ 2025-12-03 22:23 UTC
To: Ard Biesheuvel, linux-crypto; +Cc: Eric Biggers, Herbert Xu
On Wed, Dec 3, 2025, at 17:38, Ard Biesheuvel wrote:
> Arnd reports that the new scoped ksimd changes result in excessive stack
> bloat in the XTS routines in some cases. Fix this for AES-XTS and
> SM4-XTS.
>
> Note that the offending patches went in via the libcrypto tree, so these
> changes should either go via the same route, or wait for -rc1
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>
> Ard Biesheuvel (2):
> crypto/arm64: aes/xts - Using single ksimd scope to reduce stack bloat
> crypto/arm64: sm4/xts: Merge ksimd scopes to reduce stack bloat
I've tested a few of the configurations that I saw issues on
and they all look good so far.
Tested-by: Arnd Bergmann <arnd@arndb.de>
I'll leave randconfig builds running with this enabled to see if anything
was missing.
Arnd
* Re: [PATCH 0/2] crypto/arm64: Reduce stack bloat from scoped ksimd
From: Eric Biggers @ 2025-12-08 23:10 UTC
To: Ard Biesheuvel; +Cc: linux-crypto, Arnd Bergmann, Herbert Xu
On Wed, Dec 03, 2025 at 05:38:04PM +0100, Ard Biesheuvel wrote:
> Arnd reports that the new scoped ksimd changes result in excessive stack
> bloat in the XTS routines in some cases. Fix this for AES-XTS and
> SM4-XTS.
>
> Note that the offending patches went in via the libcrypto tree, so these
> changes should either go via the same route, or wait for -rc1
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>
> Ard Biesheuvel (2):
> crypto/arm64: aes/xts - Using single ksimd scope to reduce stack bloat
> crypto/arm64: sm4/xts: Merge ksimd scopes to reduce stack bloat
>
> arch/arm64/crypto/aes-glue.c | 75 ++++++++++----------
> arch/arm64/crypto/aes-neonbs-glue.c | 44 ++++++------
> arch/arm64/crypto/sm4-ce-glue.c | 42 ++++++-----
> 3 files changed, 77 insertions(+), 84 deletions(-)
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-fixes
- Eric