public inbox for ltp@lists.linux.it
 help / color / mirror / Atom feed
From: Eric Biggers via ltp <ltp@lists.linux.it>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: lkp@intel.com, "Horia Geantă" <horia.geanta@nxp.com>,
	"Russell King (Oracle)" <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	"kernel test robot" <oliver.sang@intel.com>,
	linux-crypto@vger.kernel.org, oe-lkp@lists.linux.dev,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	ltp@lists.linux.it
Subject: Re: [LTP] [PATCH] crypto: api - Fix generic algorithm self-test races
Date: Mon, 2 Sep 2024 10:05:54 -0700	[thread overview]
Message-ID: <20240902170554.GA77251@sol.localdomain> (raw)
In-Reply-To: <ZtQgVOnK6WzdIDlU@gondor.apana.org.au>

On Sun, Sep 01, 2024 at 04:05:40PM +0800, Herbert Xu wrote:
> On Fri, Aug 30, 2024 at 10:51:54AM -0700, Eric Biggers wrote:
> >
> > Given below in defconfig form, use 'make olddefconfig' to apply.  The failures
> > are nondeterministic and sometimes there are different ones, for example:
> > 
> > [    0.358017] alg: skcipher: failed to allocate transform for cbc(twofish-generic): -2
> > [    0.358365] alg: self-tests for cbc(twofish) using cbc(twofish-generic) failed (rc=-2)
> > [    0.358535] alg: skcipher: failed to allocate transform for cbc(camellia-generic): -2
> > [    0.358918] alg: self-tests for cbc(camellia) using cbc(camellia-generic) failed (rc=-2)
> > [    0.371533] alg: skcipher: failed to allocate transform for xts(ecb(aes-generic)): -2
> > [    0.371922] alg: self-tests for xts(aes) using xts(ecb(aes-generic)) failed (rc=-2)
> > 
> > Modules are not enabled, maybe that matters (I haven't checked yet).
> 
> Yes I think that was the key.  This triggers a massive self-test
> run which executes in parallel and reveals a few race conditions
> in the system.  I think it boils down to the following scenario:
> 
> Base algorithm X-generic, X-optimised
> Template Y
> Optimised algorithm Y-X-optimised
> 
> Everything gets registered, and then the self-tests are started.
> When Y-X-optimised gets tested, it requests the creation of the
> generic Y(X-generic).  Which then itself undergoes testing.
> 
> The race is that after Y(X-generic) gets registered, but just
> before it gets tested, X-optimised finally finishes self-testing
> which then causes all spawns of X-generic to be destroyed.  So
> by the time the self-test for Y(X-generic) comes along, it can
> no longer find the algorithm.  This error then bubbles up all
> the way up to the self-test of Y-X-optimised which then fails.
> 
> Note that there is some complexity that I've omitted here because
> when the generic self-test fails to find Y(X-generic) it actually
> triggers the construction of it again which then fails for various
> other reasons (these are not important because the construction
> should *not* be triggered at this point).
> 
> So in a way the error is expected, and we should probably remove
> the pr_err for the case where ENOENT is returned for the algorithm
> that we're currently testing.
> 
> The solution is two-fold.  First when an algorithm undergoes
> self-testing it should not trigger its construction.  Secondly
> if an instance larval fails to materialise due to it being destroyed
> by a more optimised algorithm coming along, it should obviously
> retry the construction.
> 
> Remove the check in __crypto_alg_lookup that stops a larval from
> matching new requests based on differences in the mask.  It is better
> to block new requests even if it is wrong and then simply retry the
> lookup.  If this ends up being the wrong larval it will sort iself
> out during the retry.
> 
> Reduce the CRYPTO_ALG_TYPE_MASK bits in type during larval creation
> as otherwise LSKCIPHER algorithms may not match SKCIPHER larvals.
> 
> Also block the instance creation during self-testing in the function
> crypto_larval_lookup by checking for CRYPTO_ALG_TESTED in the mask
> field.
> 
> Finally change the return value when crypto_alg_lookup fails in
> crypto_larval_wait to EAGAIN to redo the lookup.
> 
> Fixes: 37da5d0ffa7b ("crypto: api - Do not wait for tests during registration")
> Reported-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/crypto/api.c b/crypto/api.c
> index bbe29d438815..bfd177a4313a 100644
> --- a/crypto/api.c
> +++ b/crypto/api.c
> @@ -70,11 +70,6 @@ static struct crypto_alg *__crypto_alg_lookup(const char *name, u32 type,
>  		if ((q->cra_flags ^ type) & mask)
>  			continue;
>  
> -		if (crypto_is_larval(q) &&
> -		    !crypto_is_test_larval((struct crypto_larval *)q) &&
> -		    ((struct crypto_larval *)q)->mask != mask)
> -			continue;
> -
>  		exact = !strcmp(q->cra_driver_name, name);
>  		fuzzy = !strcmp(q->cra_name, name);
>  		if (!exact && !(fuzzy && q->cra_priority > best))
> @@ -113,6 +108,8 @@ struct crypto_larval *crypto_larval_alloc(const char *name, u32 type, u32 mask)
>  	if (!larval)
>  		return ERR_PTR(-ENOMEM);
>  
> +	type &= ~CRYPTO_ALG_TYPE_MASK | (mask ?: CRYPTO_ALG_TYPE_MASK);
> +
>  	larval->mask = mask;
>  	larval->alg.cra_flags = CRYPTO_ALG_LARVAL | type;
>  	larval->alg.cra_priority = -1;
> @@ -229,7 +226,7 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg)
>  		type = alg->cra_flags & ~(CRYPTO_ALG_LARVAL | CRYPTO_ALG_DEAD);
>  		mask = larval->mask;
>  		alg = crypto_alg_lookup(alg->cra_name, type, mask) ?:
> -		      ERR_PTR(-ENOENT);
> +		      ERR_PTR(-EAGAIN);
>  	} else if (IS_ERR(alg))
>  		;
>  	else if (crypto_is_test_larval(larval) &&
> @@ -308,8 +305,12 @@ static struct crypto_alg *crypto_larval_lookup(const char *name, u32 type,
>  
>  	if (!IS_ERR_OR_NULL(alg) && crypto_is_larval(alg))
>  		alg = crypto_larval_wait(alg);
> -	else if (!alg)
> +	else if (alg)
> +		;
> +	else if (!(mask & CRYPTO_ALG_TESTED))
>  		alg = crypto_larval_add(name, type, mask);
> +	else
> +		alg = ERR_PTR(-ENOENT);
>  
>  	return alg;
>  }

With both this patch "crypto: api - Fix generic algorithm self-test races" and
your other patch "crypto: algboss - Pass instance creation error up" applied,
I'm still getting errors occasionally, e.g.:

    [    5.155587] alg: skcipher: failed to allocate transform for cbc(sm4-generic): -2
    [    5.155954] alg: self-tests for cbc(sm4) using cbc(sm4-generic) failed (rc=-2)
    [    5.372511] alg: aead: failed to allocate transform for gcm_base(ctr(aes-generic),ghash-generic): -2
    [    5.372861] alg: self-tests for gcm(aes) using gcm_base(ctr(aes-generic),ghash-generic) failed (rc=-2)

I can't follow your explanation of what is going on here and what the fix is.
Would it make any sense to just revert the commits that introduced this problem?

- Eric

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

  reply	other threads:[~2024-09-02 17:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <ZrbTUk6DyktnO7qk@gondor.apana.org.au>
2024-08-16  8:45 ` [LTP] [PATCH 1/3] crypto: api - Remove instance larval fulfilment kernel test robot
2024-08-17  6:56   ` [LTP] [v3 PATCH " Herbert Xu via ltp
2024-08-17  6:57     ` [LTP] [v3 PATCH 2/3] crypto: api - Do not wait for tests during registration Herbert Xu via ltp
2024-08-17  6:58       ` [LTP] [v3 PATCH 3/3] crypto: simd - Do not call crypto_alloc_tfm " Herbert Xu via ltp
2024-08-27 18:48         ` Eric Biggers via ltp
2024-08-28  2:59           ` Herbert Xu via ltp
2024-08-30 17:51             ` Eric Biggers via ltp
2024-09-01  8:05               ` [LTP] [PATCH] crypto: api - Fix generic algorithm self-test races Herbert Xu via ltp
2024-09-02 17:05                 ` Eric Biggers via ltp [this message]
     [not found]                   ` <ZtZFOgh3WylktM1E@gondor.apana.org.au>
2024-10-05 22:24                     ` Eric Biggers via ltp
2024-10-06  0:53                       ` Herbert Xu via ltp
2024-10-06  3:06                         ` Eric Biggers via ltp
2024-10-07  4:32                           ` Herbert Xu via ltp
2024-10-07  7:58                             ` Herbert Xu via ltp
2024-10-07  8:31                             ` Herbert Xu via ltp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240902170554.GA77251@sol.localdomain \
    --to=ltp@lists.linux.it \
    --cc=ardb@kernel.org \
    --cc=davem@davemloft.net \
    --cc=ebiggers@kernel.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=horia.geanta@nxp.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=lkp@intel.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox