* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 11:10 ` Jens Wiklander
@ 2025-08-20 12:39 ` Lars Persson
2025-08-20 13:13 ` Jens Wiklander
2025-08-20 13:08 ` Stauffer Thomas MTANA via OP-TEE
2025-08-21 6:42 ` Sumit Garg via OP-TEE
From: Lars Persson @ 2025-08-20 12:39 UTC
To: Jens Wiklander, Stauffer Thomas MTANA, Sumit Garg
Cc: op-tee@lists.trustedfirmware.org, Ferreira Joao MTANA
Hi
One reason this issue isn't seen more often is that the NXP SDK ships with the NS_TIMER_SWITCH=1 patch reverted.
The current SDK, for example, has this commit (it was already there in 2020 on older branches):
https://github.com/nxp-imx/imx-atf/commit/c73b052c4d57a10b9bfcd9002e8730088d854583
/Lars
From: Jens Wiklander <jens.wiklander@linaro.org>
Date: Wednesday, 20 August 2025 at 13:10
To: Stauffer Thomas MTANA <Thomas.Stauffer@mt.com>, Sumit Garg <sumit.garg@kernel.org>
Cc: op-tee@lists.trustedfirmware.org <op-tee@lists.trustedfirmware.org>, Ferreira Joao MTANA <Joao.Ferreira@mt.com>
Subject: Re: rcu_preempt detected stalls on CPUs/tasks
Hi Thomas,
On Mon, Aug 4, 2025 at 7:09 PM Stauffer Thomas MTANA via OP-TEE
<op-tee@lists.trustedfirmware.org> wrote:
>
> Hi,
>
> I'm running OP-TEE 4.5 with PKCS11TA and ATF lts-v2.12.4 on an iMX8MP. When I create a new RSA 4096-bit key pair with OP-TEE, I often get
>
> rcu_preempt detected stalls on CPUs/tasks
>
> from Linux 6.6.90 (mainline)
>
> Also, PID 0 is sometimes blocked for more than 30 seconds. When I create an RT task with even higher priority, that task is also blocked for up to 2 seconds. As a test, I disabled saving/restoring the NS timer registers in ATF (arm-trusted-firmware/lib/el3_runtime/aarch64/context_mgmt.c), and this seems to get rid of the problem completely: neither key creation nor signing causes any issue anymore. This hack may lead to other problems I do not fully understand yet. I "believe" that, at least since ARMv8, the CPUs have separate timers for the secure and non-secure worlds, but I would assume that ATF already implements this correctly.
I'm starting to suspect that we're setting NS_TIMER_SWITCH to 1 in
services/spd/opteed/opteed.mk based on a misunderstanding. OP-TEE can
use timers, but then it's using the EL3 physical timer. So OP-TEE
should stay off the EL1 physical timer. Sumit, what's your view?
>
> Maybe I'm completely wrong here (though I doubt I'm the first person to have this issue on this platform). Any hint in any direction would be helpful.
I'm surprised we haven't seen more of this issue.
Cheers,
Jens
>
> Regards
>
> Thomas
>
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 12:39 ` Lars Persson
@ 2025-08-20 13:13 ` Jens Wiklander
From: Jens Wiklander @ 2025-08-20 13:13 UTC
To: Lars Persson
Cc: Stauffer Thomas MTANA, Sumit Garg,
op-tee@lists.trustedfirmware.org, Ferreira Joao MTANA
Hi Lars,
On Wed, Aug 20, 2025 at 2:39 PM Lars Persson <Lars.Persson@axis.com> wrote:
>
> Hi
>
> One reason for not seeing this issue more is that the NXP SDK ships with the NS_TIMER_SWITCH=1 patch reverted.
>
> The current SDK for example has this commit (it was there already 2020 on older branches):
>
> https://github.com/nxp-imx/imx-atf/commit/c73b052c4d57a10b9bfcd9002e8730088d854583
Thanks, that's good to know.
Cheers,
Jens
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 11:10 ` Jens Wiklander
2025-08-20 12:39 ` Lars Persson
@ 2025-08-20 13:08 ` Stauffer Thomas MTANA via OP-TEE
2025-08-20 13:50 ` Jens Wiklander
2025-08-21 6:42 ` Sumit Garg via OP-TEE
From: Stauffer Thomas MTANA via OP-TEE @ 2025-08-20 13:08 UTC
To: Sumit Garg; +Cc: op-tee@lists.trustedfirmware.org
Hi Jens,
I have analyzed this a bit further since I last wrote. Here is what I "believe" at the moment:
* Linux uses the non-secure timer in arch_timer (physical/virtual) -> this is correct
* OP-TEE uses the secure timer (physical/virtual) -> this is correct
* ARM Trusted Firmware enables NS_TIMER_SWITCH=1 by default with opteed. IMHO this unnecessarily saves/restores the timer registers. Setting NS_TIMER_SWITCH=0 seems to solve the issue; neither my own tests nor xtest have shown any problems so far.
All of this comes with some uncertainty: I read through quite a lot of code, but I may have missed a case where something goes wrong.
I measured latencies with cyclictest. With NS_TIMER_SWITCH=1 they skyrocket (which explains all the other negative effects); with NS_TIMER_SWITCH=0 everything is back to normal, even during "heavy" operations such as creating 4096-bit RSA keys with OP-TEE.
Thomas
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 13:08 ` Stauffer Thomas MTANA via OP-TEE
@ 2025-08-20 13:50 ` Jens Wiklander
2025-08-20 14:46 ` Jens Wiklander
2025-08-20 16:04 ` Andrew Davis via OP-TEE
From: Jens Wiklander @ 2025-08-20 13:50 UTC
To: Stauffer Thomas MTANA
Cc: Sumit Garg, op-tee@lists.trustedfirmware.org, Lars Persson
Hi Thomas,
On Wed, Aug 20, 2025 at 3:09 PM Stauffer Thomas MTANA via OP-TEE
<op-tee@lists.trustedfirmware.org> wrote:
>
> Hi Jens,
>
> I analyzed this a little bit further since last time I wrote. Here what my "believe" at the moment is
>
> * Linux uses the non secure timer in arch_timer (physical/virtual) -> this is correct
> * OP-TEE uses the secure timer (physical/virtual) -> this is correct
Thanks for confirming.
> * ARM Trusted Firmware by default enables NS_TIMER_SWITCH=1 with opteed, this IMHO unnecessarily stores/restores time registers, setting NS_TIMER_SWITCH=0 seems to solve the issue, my personal tests and also xtest did not show me any issue so far
>
> All this with some uncertainty, I read through quite some code, but I could have missed a case, where something may go wrong I did not see.
>
> Latencies I tested with cycletest. With NS_TIMER_SWITCH=1 this skyrockets (and explains all the other negative consequences) with NS_TIMER_SWITCH=0, everything is back to normal, even doing "heavy" operation like creating 4096 bit RSA keys with OP-TEE.
This and Lars's findings clearly indicate that we shouldn't set
NS_TIMER_SWITCH=1. I'll propose a patch.
Cheers,
Jens
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 13:50 ` Jens Wiklander
@ 2025-08-20 14:46 ` Jens Wiklander
2025-08-21 9:19 ` Sumit Garg via OP-TEE
2025-08-20 16:04 ` Andrew Davis via OP-TEE
From: Jens Wiklander @ 2025-08-20 14:46 UTC
To: Stauffer Thomas MTANA
Cc: Sumit Garg, op-tee@lists.trustedfirmware.org, Lars Persson
FYI, here's the patch
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42078
Thanks,
Jens
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 14:46 ` Jens Wiklander
@ 2025-08-21 9:19 ` Sumit Garg via OP-TEE
From: Sumit Garg via OP-TEE @ 2025-08-21 9:19 UTC
To: Jens Wiklander
Cc: Stauffer Thomas MTANA, op-tee@lists.trustedfirmware.org,
Lars Persson
On Wed, Aug 20, 2025 at 04:46:06PM +0200, Jens Wiklander wrote:
> FYI, here's the patch
> https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42078
>
Thanks Jens, but as we discussed in the review of this patch, I have posted
a more complete fix [1] that keeps OP-TEE ftrace working while also removing
the context management of the non-secure EL1 physical timer registers.
[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42085
-Sumit
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 13:50 ` Jens Wiklander
2025-08-20 14:46 ` Jens Wiklander
@ 2025-08-20 16:04 ` Andrew Davis via OP-TEE
2025-08-21 7:08 ` Sumit Garg via OP-TEE
From: Andrew Davis via OP-TEE @ 2025-08-20 16:04 UTC
To: op-tee
On 8/20/25 8:50 AM, Jens Wiklander wrote:
> This and Lars's findings clearly indicate that we shouldn't set
> NS_TIMER_SWITCH=1. I'll propose a patch.
>
We came to the same conclusion last year; we disable it on our (TI) platforms
for the same reason [0], to prevent stalling Linux during OP-TEE operations.
Andrew
[0] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/25895
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 16:04 ` Andrew Davis via OP-TEE
@ 2025-08-21 7:08 ` Sumit Garg via OP-TEE
From: Sumit Garg via OP-TEE @ 2025-08-21 7:08 UTC
To: Andrew Davis; +Cc: op-tee
On Wed, Aug 20, 2025 at 11:04:15AM -0500, Andrew Davis via OP-TEE wrote:
> Same conclusion we came to last year, we disable it for our(TI) platforms
> for the same reason[0], to prevent stalling the Linux during OP-TEE ops.
>
I am unsure why folks chose to fix this problem in a platform-specific
manner (upstream or downstream), since it's a generic, platform-agnostic
problem. At least I should have been CCed on the problem report and the
proposed fix, since I added NS_TIMER_SWITCH=1 for OP-TEE in the first
place. Also, this means that until now nobody has been able to enable
ftrace on NXP and TI platforms.
-Sumit
* Re: rcu_preempt detected stalls on CPUs/tasks
2025-08-20 11:10 ` Jens Wiklander
2025-08-20 12:39 ` Lars Persson
2025-08-20 13:08 ` Stauffer Thomas MTANA via OP-TEE
@ 2025-08-21 6:42 ` Sumit Garg via OP-TEE
From: Sumit Garg via OP-TEE @ 2025-08-21 6:42 UTC
To: Jens Wiklander
Cc: Stauffer Thomas MTANA, op-tee@lists.trustedfirmware.org,
Ferreira Joao MTANA
On Wed, Aug 20, 2025 at 01:10:07PM +0200, Jens Wiklander wrote:
> I'm starting to suspect that we're setting NS_TIMER_SWITCH to 1 in
> services/spd/opteed/opteed.mk based on a misunderstanding. OP-TEE can
> use timers, but then it's using the EL3 physical timer. So OP-TEE
> should stay off the EL1 physical timer. Sumit, what's your view?
I had to dig into the history of why I added it in the first place. It was
basically added to save and restore the cntkctl_el1 register, which is needed
for ftrace to work correctly. Have a look here [1]. So your currently
proposed patch will break ftrace.
However, as a side effect, all the EL1 physical timer registers also got saved
and restored, which is the problem reported here. So the correct fix
would be to make NS_TIMER_SWITCH more granular and separate out the
cntkctl_el1 register save and restore.
[1] https://github.com/OP-TEE/optee_os/commit/edaf8c38f534497a65a460f0348a81d3e26b3518
-Sumit