public inbox for linux-crypto@vger.kernel.org
 help / color / mirror / Atom feed
From: Horia Geanta <horia.geanta@nxp.com>
To: Kun Song <Kun.Song@windriver.com>,
	"herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>
Cc: Gaurav Jain <gaurav.jain@nxp.com>, Varun Sethi <V.Sethi@nxp.com>,
	Aymen Sghaier <aymen.sghaier@nxp.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"filip.pudak@windriver.com" <filip.pudak@windriver.com>,
	"heng.guo@windriver.com" <heng.guo@windriver.com>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	Meenakshi Aggarwal <meenakshi.aggarwal@nxp.com>,
	"richard.danter@windriver.com" <richard.danter@windriver.com>
Subject: Re: [PATCH v5.10.y] crypto: caam/jr - Fix possible caam_jr crash
Date: Wed, 31 Jan 2024 13:34:17 +0000	[thread overview]
Message-ID: <5be20cd5-8758-452c-81a9-6452df7aedeb@nxp.com> (raw)
In-Reply-To: <20231221093209.984929-1-Kun.Song@windriver.com>

On 12/21/2023 11:32 AM, Kun Song wrote:
> Test environment:
>   Linux kernel version: 5.10.y
>   Architecture: ARM Cortex-A
>   Processor: NXP Layerscape LS1028
> 
> Crash in reboot tests:
>   Reproducibility: 1%
> 
Replying here to comment on the log.
I've added all the people from the latest reply, i.e.:
https://lore.kernel.org/linux-crypto/AM0PR04MB6004AECDD044F1E6BF3B6732E7682@AM0PR04MB6004.eurprd04.prod.outlook.com

> If a job ring is still allocated, Once caam_jr_remove() returned,
> jrpriv will be freed and the registers will get unmapped.Then
In this case, most likely the root cause is different (see below).

> caam_jr_interrupt will get error irqstate value.
> So such a job ring will probably crash.Crash info is below:
> --------------------------------------
> RBS Sys: Restart ordered by epghd(0x1)
> RBS Sys: RESTARTING
This looks like a system restart.

> caam_jr 8030000.jr: Device is busy
> caam_jr 8020000.jr: Device is busy
> caam_jr 8010000.jr: Device is busy
For some reason, there are still tfms accounted for on all these caam job rings.
Maybe the system restart is not handled correctly at some point,
hence some resource leaks (unallocated tfms).

As already discussed, exiting early from caam_jr .remove() callback will leave
the HW unquiesced (e.g. job rings not flushed), interrupts still active.

> arm-smmu 5000000.iommu: disabling translation
From here onward caam memory accesses won't be translated.

> caam_jr 8010000.jr: job ring error: irqstate: 00000103
The error code means caam was not able to write the status of the job
it just finished in the output job ring (which is allocated in memory).

Most likely this happened due to iommu no longer translating the access.

> 
> Disabling interrupts is to ensure that the device removal
> operation is not interrupted.
> 
> Signed-off-by: Kun Song <Kun.Song@windriver.com>
> Reviewed-by: Hen Guo <Heng.Guo@windriver.com>
> ---
>  drivers/crypto/caam/jr.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
> index 6f669966ba2c..d191e8caa1ad 100644
> --- a/drivers/crypto/caam/jr.c
> +++ b/drivers/crypto/caam/jr.c
> @@ -135,6 +135,10 @@ static int caam_jr_remove(struct platform_device *pdev)
>  	jrdev = &pdev->dev;
>  	jrpriv = dev_get_drvdata(jrdev);
>  
> +	/* Disabling interrupts is ensure that the device removal operation
> +	 * is not interrupted by interrupts.
> +	 */
> +	devm_free_irq(jrdev, jrpriv->irq, jrdev);
>  	if (jrpriv->hwrng)
>  		caam_rng_exit(jrdev->parent);
>  
As pointed out by previous discussions, this is not enough.
Crashes could still occur due to crypto API users calling into caam driver,
which would be in an inconsistent state.

Whether .remove() being called is triggered by a system restart or
a manual device unbinding [1] is irrelevant, the driver mustn't crash.

I think Herbert's suggestion [2] on how to deal with HW devices going away
makes sense.

Not sure if the changes (thinking of crypto API part) could go into LTS kernels.
If not, we'll have to stick to caam driver-only changes (but IIUC
other crypto drivers are having the same issue with HW devices going away [3]).

Thanks,
Horia

[1] https://lore.kernel.org/linux-crypto/VI1PR04MB7023A7EC91599A537CB6A487EEB30@VI1PR04MB7023.eurprd04.prod.outlook.com/
[2] https://lore.kernel.org/linux-crypto/20190919134512.GA29320@gondor.apana.org.au/
[3] https://lore.kernel.org/linux-crypto/ZAqwTqw3lR+dnImO@gondor.apana.org.au/


      parent reply	other threads:[~2024-01-31 13:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-21  9:32 [PATCH v5.10.y] crypto: caam/jr - Fix possible caam_jr crash Kun Song
2023-12-22 11:43 ` Gaurav Jain (OSS)
2023-12-25  3:05   ` Kun Song
2024-01-10  1:39   ` [REMINDER] " Kun Song
2024-01-10  6:57     ` Gaurav Jain (OSS)
2024-01-10  7:00     ` Gaurav Jain
2024-01-10  8:25       ` Kun Song
2024-01-10  9:14         ` [EXT] " Gaurav Jain
2024-01-10 11:56           ` Kun Song
2024-01-11 10:46             ` [EXT] " Gaurav Jain
2024-01-31 13:34 ` Horia Geanta [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5be20cd5-8758-452c-81a9-6452df7aedeb@nxp.com \
    --to=horia.geanta@nxp.com \
    --cc=Kun.Song@windriver.com \
    --cc=V.Sethi@nxp.com \
    --cc=aymen.sghaier@nxp.com \
    --cc=davem@davemloft.net \
    --cc=filip.pudak@windriver.com \
    --cc=gaurav.jain@nxp.com \
    --cc=heng.guo@windriver.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    --cc=meenakshi.aggarwal@nxp.com \
    --cc=richard.danter@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox