public inbox for linux-kernel@vger.kernel.org
* Re: [BUG] CoreSight: WARN_ON in coresight_disclaim_device_unlocked due to register reset on CPU power-cycle
From: Suzuki K Poulose @ 2025-06-20  8:44 UTC
  To: Keita Morisaki, linux-kernel, alexander.shishkin
  Cc: Yi-ming Tseng, Eric Chan, Leo Yan, James Clark,
	coresight@lists.linaro.org, Mike Leach

Cc: coresight lists, Leo, James, Mike L


Hello!

Thanks for the report! In the future, please use
scripts/get_maintainer.pl to get the right list of people and mailing
lists for reporting issues.

Response inline, below.

On 20/06/2025 08:21, Keita Morisaki wrote:
> Hello folks,
> 
> I am writing to report a WARN_ON message I'm encountering in the 
> CoreSight driver on a multi-core ARM system running a 6.12-based kernel. 
> The warning appears consistently when disabling an Embedded Trace 
> Extension (ETE) source after it has been active. The issue is not 
> reproducible when CPUidle is disabled.
> 
> The problem occurs because the driver assumes the CoreSight claim 
> register is persistent, but it could be reset by the CPUidle power 
> management flow. Section B2.3.2 of the Arm CoreSight Architecture 
> Specification v3.0 [1] indicates that the claim register must be reset 
> on “reset”. A CPU power-up from an idle state can trigger a Cold reset, 
> which might explain this behavior.
> 
> My ftrace analysis confirms this. I traced the only two functions that 
> modify the claim state: coresight_set_claim_tags (which sets the claim) 
> and coresight_clear_claim_tags (which is the only part of the kernel 
> that writes to CLAIMCLR). The trace shows the claim being set, followed 
> by a CPUidle transition, but no subsequent call to 
> coresight_clear_claim_tags.
> 
> Here are the steps to reproduce the issue:
> 
> modprobe coresight_etm4x
> 
> # Enable any relevant sink
> 
> echo 1 > /sys/bus/coresight/devices/ete0/enable_source
> 
> echo 0 > /sys/bus/coresight/devices/ete0/enable_source
> 
> 
> Here is a relevant snippet from the ftrace log that illustrates the 
> sequence:
> 
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  0)               |  coresight_claim_device_unlocked [coresight]() {
>  0)   3.750 us    |    coresight_set_claim_tags [coresight](); // Claim is set here
>  0) + 20.260 us   |  }
>  0)               |  /* psci_domain_idle_enter: cpu_id=0 state={Our PSCI parameter value} */ // CPU goes idle
>  0)               |  /* psci_domain_idle_exit: cpu_id=0 state={Our PSCI parameter value} */ // CPU wakes up, causing Cold reset
>  ...
>  0) @ 309346.3 us |  coresight_disclaim_device_unlocked [coresight](); // Triggers WARN_ON
> 
> 
> The following WARN_ON [2] is printed because the CLAIMCLR register has 
> already been reset at the time coresight_disclaim_device_unlocked is 
> called, contrary to the driver's expectation.
> 

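To make the failure mode described above concrete, here is a minimal
userspace model of the claim-tag handshake. It is only a sketch: the
struct and function names are hypothetical, the claim tag is a plain
variable rather than the CLAIMSET/CLAIMCLR registers, and the "power
cycle" simply assumes the tags come back as zero after the reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Self-hosted claim tag bit (bit 1), as used by the coresight core. */
#define CLAIM_SELF_HOSTED (1u << 1)

/* Hypothetical stand-in for a trace unit: only the claim tag is modelled. */
struct model_dev {
	uint32_t claim_tags;	/* value that would be read back via CLAIMCLR */
};

/* Claiming sets the self-hosted tag (a CLAIMSET write on real hardware). */
static void model_set_claim(struct model_dev *dev)
{
	assert(dev);
	dev->claim_tags |= CLAIM_SELF_HOSTED;
}

/* Assumption: a reset on CPU power-up returns the claim tags to zero. */
static void model_power_cycle(struct model_dev *dev)
{
	assert(dev);
	dev->claim_tags = 0;
}

/*
 * Returns true when the expected claim is found and cleared, false in the
 * case where the real driver would take its WARN_ON path.
 */
static bool model_disclaim(struct model_dev *dev)
{
	assert(dev);
	if (dev->claim_tags != CLAIM_SELF_HOSTED)
		return false;
	dev->claim_tags &= ~CLAIM_SELF_HOSTED;
	return true;
}
```

In this model, claim followed by disclaim succeeds, while claim, power
cycle, disclaim returns false. That matches the sequence in the ftrace
log: the claim is never cleared by software, yet it is gone by the time
the disclaim runs.
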
We have the ETM driver performing the save/restore of ETM context during
CPUidle. This is only done when the ETM/ETE is described (via DT) as
losing context over PM operations. If it is not described that way, the
driver doesn't do anything. This could be problematic. Could you try adding:

"arm,coresight-loses-context-with-cpu"


property to the ETE nodes and see if it makes a difference?

Kind regards
Suzuki

[0] https://elixir.bootlin.com/linux/v6.12/source/Documentation/devicetree/bindings/arm/arm,coresight-etm.yaml#L79

> [416.354181][C0] WARNING: CPU: 0 PID: 0 at drivers/hwtracing/coresight/coresight-core.c:187 coresight_disclaim_device_unlocked+0x84/0x9c [coresight]
> [416.535454][C0] Call trace:
> [416.538606][C0]  coresight_disclaim_device_unlocked+0x84/0x9c [coresight]
> [416.549359][C0]  etm4_disable_hw+0x2d8/0x374 [coresight_etm4x]
> [416.623310][C0]  do_idle+0x1d4/0x264
> 
> (Note on tracing: To get this detailed trace, I made two modifications 
> to the kernel. First, since the trace_psci_domain_idle_enter/exit events 
> are not available in kernel 6.12, I cherry-picked the upstream patch 
> 7b7644831e72 [3] to add them. Second, to specifically trace the claim 
> functions, I temporarily replaced their inline compiler hints with 
> noinline.)
> 
> Given the evidence, it appears the driver's assumption that the claim 
> register is persistent across CPU power states is incorrect and may need 
> to be addressed.
> 
> Could you please provide your guidance on this?
> 
> Thank you for your time and assistance.
> 
> [1] https://developer.arm.com/documentation/ihi0029/latest/
> [2] https://elixir.bootlin.com/linux/v6.12/source/drivers/hwtracing/coresight/coresight-core.c#L187
> [3] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b7644831e7276f52a233ec685d13c965fff09d9
> 
> Best regards,
> Keita



* Re: [BUG] CoreSight: WARN_ON in coresight_disclaim_device_unlocked due to register reset on CPU power-cycle
From: Keita Morisaki @ 2025-06-20 10:19 UTC
  To: suzuki.poulose
  Cc: alexander.shishkin, coresight, ericchancf, james.clark, keyz,
	leo.yan, linux-kernel, mike.leach, yimingtseng

Hi,
(Resending the same message in plain text. The previous message was rejected by the mailing list because it contained HTML.)
Thank you so much for the quick response. I really appreciate it.

> Thanks for the report ! In the future, please use
> scripts/get_maintainer.pl for the clear list of people/list
> for reporting issues.

I will do that!

> We have the ETM driver performing the save/restore of ETM context during
> a CPUidle. This is only done when the ETM/ETE is described to be losing
> context over PM operation. If this is not done (via DT), the driver
> doesn't do anything. This could be problematic. Could you try adding:
>
> "arm,coresight-loses-context-with-cpu"
>
>
> property to the ETE nodes and see if it makes a difference ?

Noted. We will try this and get back to you.

Best,
Keita


* Re: [BUG] CoreSight: WARN_ON in coresight_disclaim_device_unlocked due to register reset on CPU power-cycle
From: Keita Morisaki @ 2025-06-23  8:59 UTC
  To: suzuki.poulose
  Cc: alexander.shishkin, coresight, ericchancf, james.clark, keyz,
	leo.yan, linux-kernel, mike.leach, yimingtseng

> We have the ETM driver performing the save/restore of ETM context during
> a CPUidle. This is only done when the ETM/ETE is described to be losing
> context over PM operation. If this is not done (via DT), the driver
> doesn't do anything. This could be problematic. Could you try adding:
>
> "arm,coresight-loses-context-with-cpu"
>
>
> property to the ETE nodes and see if it makes a difference ?

I tried this in our environment, and this worked well. The "arm,coresight-loses-context-with-cpu" property was what we needed.
Thank you so much again for the swift response with the useful information!

Best,
Keita


* Re: [BUG] CoreSight: WARN_ON in coresight_disclaim_device_unlocked due to register reset on CPU power-cycle
From: James Clark @ 2025-06-23 12:05 UTC
  To: Keita Morisaki, suzuki.poulose, Leo Yan
  Cc: alexander.shishkin, coresight, ericchancf, linux-kernel,
	mike.leach, yimingtseng



On 23/06/2025 9:59 am, Keita Morisaki wrote:
>> We have the ETM driver performing the save/restore of ETM context during
>> a CPUidle. This is only done when the ETM/ETE is described to be losing
>> context over PM operation. If this is not done (via DT), the driver
>> doesn't do anything. This could be problematic. Could you try adding:
>>
>> "arm,coresight-loses-context-with-cpu"
>>
>>
>> property to the ETE nodes and see if it makes a difference ?
> 
> I tried this in our environment, and this worked well. The "arm,coresight-loses-context-with-cpu" property was what we needed.
> Thank you so much again for the swift response with the useful information!
> 
> Best,
> Keita


Hi Keita,

Thanks for the report. We discussed internally and decided that it would 
be better for the driver to always save the context by default, because 
this mistake is easy to make. Saving when it doesn't need to be saved 
doesn't do any harm, but not saving when it should be causes quite bad bugs.

So "arm,coresight-loses-context-with-cpu" will be ignored in the future 
and we'll add a new flag like "arm,coresight-save-context" if anyone 
wants the optimization of not saving.

Thanks
James



* Re: [BUG] CoreSight: WARN_ON in coresight_disclaim_device_unlocked due to register reset on CPU power-cycle
From: Leo Yan @ 2025-06-24 13:02 UTC
  To: James Clark
  Cc: Keita Morisaki, suzuki.poulose, alexander.shishkin, coresight,
	ericchancf, linux-kernel, mike.leach, yimingtseng

On Mon, Jun 23, 2025 at 01:05:13PM +0100, James Clark wrote:

[...]

> Hi Keita,
> 
> Thanks for the report. We discussed internally and decided that it would be
> better for the driver to always save the context by default, because this
> mistake is easy to make. Saving when it doesn't need to be saved doesn't do
> any harm, but not saving when it should be causes quite bad bugs.
> 
> So "arm,coresight-loses-context-with-cpu" will be ignored in the future and
> we'll add a new flag like "arm,coresight-save-context" if anyone wants the
> optimization of not saving.

I'm a bit concerned that we might provide information that has not yet
been finalized.

Until any changes land in the mainline kernel, I'd recommend using the
option "coresight_etm4x.pm_save_enable=2" on the Linux kernel command
line. This provides a reliable configuration for production
environments, as it behaves consistently on the current mainline kernel
and on any future versions.

Setting coresight_etm4x.pm_save_enable=2 overrides any setting in the
device tree binding and always enables context save and restore for
the ETM / ETE.

If coresight_etm4x.pm_save_enable is set to 1, the ETM driver will
never perform context save and restore. Setting it to 0 (the default
value) allows the device tree or ACPI to determine whether context
save and restore should be performed.
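
As a rough illustration of the three modes described above, the decision
can be modelled as below. This is a userspace sketch with made-up names
(the enum, should_save_context() and fw_loses_context are hypothetical),
not the driver's actual code; fw_loses_context stands for whatever the
device tree or ACPI description says.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of coresight_etm4x.pm_save_enable as described in this thread. */
enum pm_save_mode {
	PM_SAVE_FIRMWARE = 0,	/* default: follow the device tree / ACPI */
	PM_SAVE_NEVER = 1,	/* never save or restore context */
	PM_SAVE_ALWAYS = 2,	/* always save and restore context */
};

/*
 * Decide whether the ETM/ETE context is saved across CPU power-down.
 * fw_loses_context models the firmware description, e.g. the presence
 * of "arm,coresight-loses-context-with-cpu" on the node.
 */
static bool should_save_context(int pm_save_enable, bool fw_loses_context)
{
	/* The model only covers the three documented values. */
	assert(pm_save_enable >= PM_SAVE_FIRMWARE &&
	       pm_save_enable <= PM_SAVE_ALWAYS);

	switch (pm_save_enable) {
	case PM_SAVE_NEVER:
		return false;
	case PM_SAVE_ALWAYS:
		return true;
	default:
		return fw_loses_context;
	}
}
```

With pm_save_enable=2 the result no longer depends on the firmware
description, which is why it gives the same behaviour whether or not
the property is present.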

If you are trying to upstream the DT binding for ETE, you need to omit
the property "arm,coresight-loses-context-with-cpu" since it is not
defined in the ETE device tree YAML schema now. As James mentioned, we
need to consolidate this part.

Thanks,
Leo

