* Sporadic errors with alg selftest on next kernel.
From: Ingo Franzki @ 2025-05-19 8:09 UTC
To: Herbert Xu
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
Hi Herbert,
besides the regression found in paes-crt on s390x (reported and analyzed by Harald already), we sporadically encounter additional strange failures in our CI on the next kernel:
During this weekend's CI run, we got the following:
alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
Last week, we had a similar failure:
aes_s390: Allocating AES fallback algorithm ctr(aes) failed
alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
These are only isolated failures; they are not reproducible and happen on only one system, although the same code runs on multiple systems.
So it must be some kind of race condition...
-17 is EEXIST, and from a quick look into the code, this might come from registering an alg (e.g. __crypto_register_alg(), crypto_register_template(), af_alg_register_type(), crypto_add_alg()) when the alg is already there.
So it looks like something tries to register the same alg although it was already registered concurrently?
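To illustrate why a duplicate registration surfaces as -17, here is a minimal userspace sketch, assuming a flat table keyed by driver name. It is not the kernel's code: toy_register_alg() and toy_algs are hypothetical names, and the real registration path checks more name combinations than this.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ALGS 16

/* Hypothetical stand-in for the list of registered algorithms. */
struct toy_alg {
	char driver_name[64];
};

static struct toy_alg toy_algs[TOY_MAX_ALGS];
static int toy_nalgs;

/*
 * Register an algorithm under its driver name. A second registration
 * of a name that is already present is refused with -EEXIST (-17 on
 * Linux) instead of replacing the existing entry.
 */
static int toy_register_alg(const char *driver_name)
{
	for (int i = 0; i < toy_nalgs; i++)
		if (strcmp(toy_algs[i].driver_name, driver_name) == 0)
			return -EEXIST;

	if (toy_nalgs == TOY_MAX_ALGS)
		return -ENOMEM;

	snprintf(toy_algs[toy_nalgs].driver_name,
		 sizeof(toy_algs[toy_nalgs].driver_name), "%s", driver_name);
	toy_nalgs++;
	return 0;
}
```

Whoever registers first wins; the loser only sees the error code, with no indication of who registered the name concurrently.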
Note that the s390x-AES ciphers are usually added via module_cpu_feature_match() automatically.
Maybe a selftest run attempts to add them again while aes_s390_init() is about to add them as well?
It's hard to debug, since it only happens sporadically and can't be reproduced easily.
Any idea where this might come from?
We have not seen these kinds of errors for a long time, and still don't see them on kernels other than next.
Kind regards,
Ingo
--
Ingo Franzki
eMail: ifranzki@linux.ibm.com
Tel: ++49 (0)7031-16-4648
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM DATA Privacy Statement: https://www.ibm.com/privacy/us/en/
* Re: Sporadic errors with alg selftest on next kernel.
From: Herbert Xu @ 2025-05-19 8:28 UTC
To: Ingo Franzki
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
> Hi Herbert,
>
> besides the regression found in paes-crt on s390x (reported and analyzed by Harald already), we sporadically encounter additional strange failures in our CI on the next kernel:
>
> During this weekend's CI run, we got the following:
>
> alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
> alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
>
> Last week, we had a similar failure:
>
> aes_s390: Allocating AES fallback algorithm ctr(aes) failed
> alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
> alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
>
> These are only isolated failures; they are not reproducible and happen on only one system, although the same code runs on multiple systems.
> So it must be some kind of race condition...
>
> -17 is EEXIST, and from a quick look into the code, this might come from registering an alg (e.g. __crypto_register_alg(), crypto_register_template(), af_alg_register_type(), crypto_add_alg()) when the alg is already there.
> So it looks like something tries to register the same alg although it was already registered concurrently?
Yes, it looks like a race condition. It's normal for multiple
entities to try to construct the same algorithm at the same time.
The larvals/test larvals are meant to take care of that problem,
but from time to time there are bugs (e.g., commit 7505436e2925)
that cause errors like this.
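A rough sketch of the deduplication the larval scheme provides, assuming a simplified single-threaded model: the first requester for a given name creates a placeholder and owns construction, and later requesters for the same name find that placeholder and would wait on it instead of building a second copy. toy_larval_get() and struct toy_larval are hypothetical; the kernel does the check-and-insert under a lock and waits on a completion, both of which are omitted here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_LARVALS 16

/* Placeholder for an algorithm that is still being constructed. */
struct toy_larval {
	char name[64];	/* the name that was requested, e.g. "ctr(aes)" */
	bool in_use;
};

static struct toy_larval toy_larvals[TOY_MAX_LARVALS];

/*
 * Find the pending larval for 'name' or create one. *created is set
 * to true only for the first requester, which then owns construction;
 * later requesters for the same name get the same larval back and
 * would wait on it. NOTE: single-threaded sketch; a real version must
 * do the search and insert atomically under a lock.
 */
static struct toy_larval *toy_larval_get(const char *name, bool *created)
{
	struct toy_larval *free_slot = NULL;

	for (int i = 0; i < TOY_MAX_LARVALS; i++) {
		struct toy_larval *l = &toy_larvals[i];

		if (l->in_use && strcmp(l->name, name) == 0) {
			*created = false;
			return l;
		}
		if (!l->in_use && !free_slot)
			free_slot = l;
	}

	if (!free_slot)
		return NULL;

	snprintf(free_slot->name, sizeof(free_slot->name), "%s", name);
	free_slot->in_use = true;
	*created = true;
	return free_slot;
}
```

Note that the key is the requested name: two different spellings of the same algorithm get two separate larvals in this model.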
> It's hard to debug, since it only happens sporadically and can't be reproduced easily.
>
> Any idea where this might come from?
> We have not seen these kinds of errors for a long time, and still don't see them on kernels other than next.
Well, the immediate reason is that the extra tests are now enabled by
default. So your CI likely wasn't executing these extra tests before,
and now it is. One thing the extra tests do is allocate a generic
fallback to compare the test results against; that's what was
happening in the first error you saw above.
The next proximate cause is parallel testing, but this has been
around for a few months already.
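The comparison against a generic fallback can be pictured as differential testing: feed the same input to the implementation under test and to a generic reference, then compare the outputs. This is a toy sketch under that assumption only; the XOR "cipher" and all toy_* functions are hypothetical stand-ins, not testmgr code or a real algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference ("generic") implementation: byte-at-a-time XOR with a key. */
static void toy_xor_generic(uint8_t *dst, const uint8_t *src, size_t len,
			    uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ key;
}

/* "Optimized" implementation under test: 8 bytes at a time plus a tail. */
static void toy_xor_fast(uint8_t *dst, const uint8_t *src, size_t len,
			 uint8_t key)
{
	uint64_t k = 0x0101010101010101ULL * key;	/* key in every byte */
	size_t i = 0;

	for (; i + 8 <= len; i += 8) {
		uint64_t w;

		memcpy(&w, src + i, 8);
		w ^= k;
		memcpy(dst + i, &w, 8);
	}
	for (; i < len; i++)
		dst[i] = src[i] ^ key;
}

/*
 * Differential check: run both implementations on the same input for
 * every length up to max_len. Returns 0 if they always agree, 1 on a
 * mismatch, and -1 if max_len exceeds the scratch buffers.
 */
static int toy_compare_impls(const uint8_t *src, size_t max_len, uint8_t key)
{
	uint8_t a[256], b[256];

	if (max_len > sizeof(a))
		return -1;
	for (size_t len = 0; len <= max_len; len++) {
		toy_xor_generic(a, src, len, key);
		toy_xor_fast(b, src, len, key);
		if (memcmp(a, b, len) != 0)
			return 1;
	}
	return 0;
}
```

The point relevant to this thread is that such a check needs its own instance of the reference algorithm, which is an extra allocation that can race with a driver allocating its fallback.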
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Sporadic errors with alg selftest on next kernel.
From: Herbert Xu @ 2025-05-19 8:34 UTC
To: Ingo Franzki
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
>
> We have not seen these kinds of errors for a long time, and still don't see them on kernels other than next.
Could you check whether the CI runs have the extra testing enabled
for non-next kernels? If they actually had extra tests enabled before
the current next kernel then that would be surprising.
Cheers,
* Re: Sporadic errors with alg selftest on next kernel.
From: Ingo Franzki @ 2025-05-19 8:43 UTC
To: Herbert Xu
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On 19.05.2025 10:34, Herbert Xu wrote:
> On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
>>
>> We have not seen these kinds of errors for a long time, and still don't see them on kernels other than next.
>
> Could you check whether the CI runs have the extra testing enabled
> for non-next kernels? If they actually had extra tests enabled before
> the current next kernel then that would be surprising.
No, we do not explicitly enable the extra tests, neither on non-next kernels nor on next kernels.
So it may very well be that enabling the extra tests by default is what now makes the failures show up.
>
> Cheers,
* Re: Sporadic errors with alg selftest on next kernel.
From: Herbert Xu @ 2025-05-19 9:34 UTC
To: Ingo Franzki
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
> Hi Herbert,
>
> besides the regression found in paes-crt on s390x (reported and analyzed by Harald already), we sporadically encounter additional strange failures in our CI on the next kernel:
>
> During this weekend's CI run, we got the following:
>
> alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
> alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
>
> Last week, we had a similar failure:
>
> aes_s390: Allocating AES fallback algorithm ctr(aes) failed
> alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
> alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
OK, I think I can see one scenario where this can happen. Because
you guys use a fallback, and the extra tests also use a
generic fallback, both will be trying to construct the same
thing at the same time.
However, they will construct it under different names: the s390
driver will allocate ctr(aes), and the extra tests will instead
opt for the more verbose ctr(aes-generic). The larval scheme only
works if everyone uses the same name, e.g., ctr(aes).
So yes, they can indeed talk past each other and end up with a clash
at registration time, and the losing side will get EEXIST.
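The clash described above can be replayed deterministically in a toy model, assuming two properties taken from this explanation: pending constructions are deduplicated by the requested name, while registration is keyed by the resulting driver name. All names are hypothetical, and toy_resolve() simply assumes that both requests end up building the same instance.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX 8

/* Registered algorithms, keyed by driver name. */
static char toy_registered[TOY_MAX][64];
static int toy_nregistered;

/* Pending constructions, keyed by the name that was *requested*. */
struct toy_larval {
	char requested[64];
	const char *driver_name;	/* what this construction will register */
};

static struct toy_larval toy_larvals[TOY_MAX];
static int toy_nlarvals;

/* Hypothetical: assume "ctr(aes)" builds the same generic instance. */
static const char *toy_resolve(const char *requested)
{
	if (strcmp(requested, "ctr(aes)") == 0)
		return "ctr(aes-generic)";
	return requested;
}

/* Start constructing 'requested'; reuse a pending entry with the same name. */
static struct toy_larval *toy_begin(const char *requested)
{
	for (int i = 0; i < toy_nlarvals; i++)
		if (strcmp(toy_larvals[i].requested, requested) == 0)
			return &toy_larvals[i];

	if (toy_nlarvals == TOY_MAX)
		return NULL;

	struct toy_larval *l = &toy_larvals[toy_nlarvals++];

	snprintf(l->requested, sizeof(l->requested), "%s", requested);
	l->driver_name = toy_resolve(l->requested);
	return l;
}

/* Finish a construction by registering its result under its driver name. */
static int toy_finish(const struct toy_larval *l)
{
	for (int i = 0; i < toy_nregistered; i++)
		if (strcmp(toy_registered[i], l->driver_name) == 0)
			return -EEXIST;

	if (toy_nregistered == TOY_MAX)
		return -ENOMEM;

	snprintf(toy_registered[toy_nregistered], sizeof(toy_registered[0]),
		 "%s", l->driver_name);
	toy_nregistered++;
	return 0;
}
```

Because "ctr(aes)" and "ctr(aes-generic)" never share a larval, both constructions proceed, and whichever registers second gets -EEXIST, which matches the failure mode in the report.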
Let me try to work out a solution for this, probably by retrying
on EEXIST, or perhaps by turning EEXIST into EAGAIN in the template
instantiation code path.
Cheers,
* [PATCH] crypto: api - Redo lookup on EEXIST
From: Herbert Xu @ 2025-05-19 10:29 UTC
To: Ingo Franzki
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
>
> During this weekend's CI run, we got the following:
>
> alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
> alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
>
> Last week, we had a similar failure:
>
> aes_s390: Allocating AES fallback algorithm ctr(aes) failed
> alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
> alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
Please try this patch:
---8<---
When two crypto algorithm lookups occur at the same time with
different names for the same algorithm, e.g., ctr(aes-generic)
and ctr(aes), they will both be instantiated. However, only one
of them can be registered. The second instantiation will fail
with EEXIST.
Avoid failing the second lookup by making it retry, but only once
because there are tricky names such as gcm_base(ctr(aes),ghash)
that will always fail, despite triggering instantiation and EEXIST.
Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
Fixes: 2825982d9d66 ("[CRYPTO] api: Added event notification")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/crypto/api.c b/crypto/api.c
index 133d9b626922..5724d62e9d07 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -219,10 +219,19 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg,
 		if (crypto_is_test_larval(larval))
 			crypto_larval_kill(larval);
 		alg = ERR_PTR(-ETIMEDOUT);
-	} else if (!alg) {
+	} else if (!alg || PTR_ERR(alg) == -EEXIST) {
+		int err = alg ? -EEXIST : -EAGAIN;
+
+		/*
+		 * EEXIST is expected because two probes can be scheduled
+		 * at the same time with one using alg_name and the other
+		 * using driver_name. Do a re-lookup but do not retry in
+		 * case we hit a quirk like gcm_base(ctr(aes),...) which
+		 * will never match.
+		 */
 		alg = &larval->alg;
 		alg = crypto_alg_lookup(alg->cra_name, type, mask) ?:
-		      ERR_PTR(-EAGAIN);
+		      ERR_PTR(err);
 	} else if (IS_ERR(alg))
 		;
 	else if (crypto_is_test_larval(larval) &&
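A userspace model of the decision the patched branch makes, assuming 'outcome' encodes what the wait on the larval produced. toy_after_wait() and the TOY_* constants are hypothetical; the lookup callback stands in for the single crypto_alg_lookup() by cra_name in the patch.

```c
#include <errno.h>

/* Hypothetical encoding of what the wait on a larval produced. */
enum {
	TOY_OK = 0,	/* a usable algorithm was registered */
	TOY_NONE = 1,	/* construction finished without an algorithm */
};

/*
 * Model of the patched branch: on "nothing produced" or -EEXIST, do a
 * single re-lookup by the generic name. If it misses, return -EAGAIN
 * for the old case but keep -EEXIST for the new one, so that (assuming
 * callers retry on -EAGAIN) a name that can never match is not retried
 * again. 'lookup' returns nonzero on a hit.
 */
static int toy_after_wait(int outcome, int (*lookup)(void))
{
	if (outcome == TOY_NONE || outcome == -EEXIST) {
		int err = outcome == -EEXIST ? -EEXIST : -EAGAIN;

		return lookup() ? TOY_OK : err;
	}
	return outcome;	/* success or any other error passes through */
}

/* Example lookup callbacks for the two interesting cases. */
static int toy_lookup_hit(void) { return 1; }
static int toy_lookup_miss(void) { return 0; }
```

The re-lookup happens exactly once, matching the patch comment about names like gcm_base(ctr(aes),...) that will never match.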
* Re: [PATCH] crypto: api - Redo lookup on EEXIST
From: Ingo Franzki @ 2025-05-19 13:47 UTC
To: Herbert Xu
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On 19.05.2025 12:29, Herbert Xu wrote:
> On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
>>
>> During this weekend's CI run, we got the following:
>>
>> alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
>> alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
>>
>> Last week, we had a similar failure:
>>
>> aes_s390: Allocating AES fallback algorithm ctr(aes) failed
>> alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
>> alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
>
> Please try this patch:
Since I can't reproduce the problem at will, I can't tell whether your patch solves it. From what I can tell, it looks reasonable.
Nevertheless, I have added your patch to our CI so that, starting with the next day's run, it is applied on top of the next kernel tree for a while.
Let's run it for a few days and see whether the error still shows up.
* Re: [PATCH] crypto: api - Redo lookup on EEXIST
From: Ingo Franzki @ 2025-05-26 6:44 UTC
To: Herbert Xu
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On 19.05.2025 15:47, Ingo Franzki wrote:
> On 19.05.2025 12:29, Herbert Xu wrote:
>> On Mon, May 19, 2025 at 10:09:10AM +0200, Ingo Franzki wrote:
>>>
>>> During this weekend's CI run, we got the following:
>>>
>>> alg: aead: error allocating gcm_base(ctr(aes-generic),ghash-generic) (generic impl of gcm(aes)): -17
>>> alg: self-tests for gcm(aes) using gcm-aes-s390 failed (rc=-17)
>>>
>>> Last week, we had a similar failure:
>>>
>>> aes_s390: Allocating AES fallback algorithm ctr(aes) failed
>>> alg: skcipher: failed to allocate transform for ctr-aes-s390: -17
>>> alg: self-tests for ctr(aes) using ctr-aes-s390 failed (rc=-17)
>>
>> Please try this patch:
>
> Since I can't reproduce the problem at will, I can't tell whether your patch solves it. From what I can tell, it looks reasonable.
> Nevertheless, I have added your patch to our CI so that, starting with the next day's run, it is applied on top of the next kernel tree for a while.
> Let's run it for a few days and see whether the error still shows up.
It has now run for several days in our CI with this patch on top of next without the error showing up again, so I would claim that your patch fixes the problem.
Can you please include it in the next kernel?
* Re: [PATCH] crypto: api - Redo lookup on EEXIST
From: Herbert Xu @ 2025-05-26 8:13 UTC
To: Ingo Franzki
Cc: Eric Biggers, Harald Freudenberger, Holger Dengler, linux-crypto
On Mon, May 26, 2025 at 08:44:33AM +0200, Ingo Franzki wrote:
>
> It has now run for several days in our CI with this patch on top of next without the error showing up again, so I would claim that your patch fixes the problem.
> Can you please include it in the next kernel?
Great! The patch is already in my pull request for 6.16.
Thanks,