From: James Clark <james.clark@linaro.org>
To: Leo Yan <leo.yan@arm.com>, Yabin Cui <yabinc@google.com>
Cc: coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Mike Leach <mike.leach@linaro.org>,
Levi Yun <yeoreum.yun@arm.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Keita Morisaki <keyz@google.com>,
Yuanfang Zhang <quic_yuanfang@quicinc.com>
Subject: Re: [PATCH v2 25/28] coresight: trbe: Save and restore state across CPU low power state
Date: Thu, 4 Sep 2025 14:15:21 +0100
Message-ID: <ba94795e-367c-429a-a19f-2a220e33a117@linaro.org>
In-Reply-To: <20250701-arm_cs_pm_fix_v3-v2-25-23ebb864fcc1@arm.com>
On 01/07/2025 3:53 pm, Leo Yan wrote:
> From: Yabin Cui <yabinc@google.com>
>
> Similar to ETE, TRBE may lose its context when a CPU enters a low power
> state. Worse, if ETE is restored without TRBE being restored, an
> enabled source device with no enabled sink device can cause a CPU hang
> on some devices (e.g., Pixel 9).
>
> The save and restore flows are described in section K5.5 "Context
> switching" of the Arm ARM (ARM DDI 0487 L.a). This commit adds save and
> restore callbacks that follow the software usage defined in the
> architecture manual.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> Co-developed-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
Hi Leo,

I tested at this commit, to avoid hitting any issues with the last 3
hotplug changes, but ran into two issues. They seem to be triggered by
running the CPU online/offline/enable_source stress test followed by
the Perf "Check Arm CoreSight trace data recording and synthesized
samples" test. They reproduce with the tests run in either order, but
not when running only one of them after a reboot.

The first one is hit while running one of the tests:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.16.0-rc3+ #475 Not tainted
-----------------------------------------------------
perf-exec/709 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000804002cd0 (&drvdata->spinlock){+.+.}-{2:2}, at: cti_enable+0x40/0x130 [coresight_cti]
and this task is already holding:
ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
which would create a new lock dependency:
(&ctx->lock){....}-{2:2} -> (&drvdata->spinlock){+.+.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&cpuctx_lock){-...}-{2:2}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
to a HARDIRQ-irq-unsafe lock:
(&drvdata->spinlock){+.+.}-{2:2}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
other info that might help us debug this:
Chain exists of:
&cpuctx_lock --> &ctx->lock --> &drvdata->spinlock
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&drvdata->spinlock);
                               local_irq_disable();
                               lock(&cpuctx_lock);
                               lock(&ctx->lock);
  <Interrupt>
    lock(&cpuctx_lock);

 *** DEADLOCK ***
4 locks held by perf-exec/709:
 #0: ffff0008066b66f8 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0x54/0x690
 #1: ffff0008066b67a0 (&sig->exec_update_lock){++++}-{4:4}, at: exec_mmap+0x48/0x2b0
 #2: ffff000976a467f0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0xb4/0x6b8
 #3: ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&cpuctx_lock){-...}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_event_exit_context+0x3c/0xb0
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_event_exit_cpu+0x344/0x3d8
cpuhp_invoke_callback+0x120/0x2a0
cpuhp_thread_fun+0x170/0x1d8
smpboot_thread_fn+0x1c0/0x328
kthread+0x148/0x250
ret_from_fork+0x10/0x20
}
... key at: [<ffff800082bbe238>] cpuctx_lock+0x0/0x10
-> (&ctx->lock){....}-{2:2} {
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock_irq+0x70/0xb8
find_get_pmu_context+0x88/0x238
__arm64_sys_perf_event_open+0x794/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
}
... key at: [<ffff800082bbe1d0>] __perf_event_init_context.__key+0x0/0x10
... acquired at:
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x6c/0x2f0
remote_function+0x58/0x78
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_install_in_context+0x1a0/0x290
__arm64_sys_perf_event_open+0x103c/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
-> (&drvdata->spinlock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
SOFTIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8
}
... key at: [<ffff80007b10d2a8>] cti_probe.__key+0x0/0xffffffffffffdd58 [coresight_cti]
... acquired at:
_raw_spin_lock_irqsave+0x70/0xc0
cti_enable+0x40/0x130 [coresight_cti]
_coresight_enable_path+0x134/0x3c0 [coresight]
coresight_enable_path+0x28/0x88 [coresight]
etm_event_start+0xe0/0x228 [coresight]
etm_event_add+0x40/0x68 [coresight]
event_sched_in+0x270/0x418
visit_groups_merge+0x428/0xcd0
__pmu_ctx_sched_in+0xa0/0xe0
ctx_sched_in+0x110/0x188
ctx_resched+0x1c0/0x2b8
perf_event_exec+0x29c/0x6b8
begin_new_exec+0x378/0x558
load_elf_binary+0x2b0/0xb00
bprm_execve+0x394/0x690
do_execveat_common+0x2a0/0x300
__arm64_sys_execve+0x50/0x70
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
===============================================
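For reference, the rule lockdep is enforcing in the splat above can be
reduced to a toy userspace model. This is only an illustration, not
lockdep's implementation: struct lock_class, find_irq_inversion() and
the flattening of the dependency graph into a linear chain are all
assumptions of the sketch. The usage bits are copied from the splat
({-...} means the class was taken in hardirq context, a leading '+' as
in {+.+.} means it was taken with hardirqs enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Toy model of lockdep's HARDIRQ-safe -> HARDIRQ-unsafe check.
 * NOT lockdep's code: each class just carries two usage bits that
 * mirror the first character of the {....} suffix in the splat.
 */
struct lock_class {
	const char *name;
	bool in_hardirq;      /* HARDIRQ-safe: acquired from a hardirq handler */
	bool hardirq_enabled; /* HARDIRQ-unsafe: acquired with hardirqs on */
};

/*
 * chain[0] is the outermost lock. Returns true and stores the indices
 * of the offending pair if a HARDIRQ-safe lock is (transitively) held
 * while a HARDIRQ-unsafe lock is acquired further down the chain.
 */
bool find_irq_inversion(const struct lock_class *const chain[], size_t n,
			size_t *safe, size_t *unsafe)
{
	assert(chain != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!chain[i]->in_hardirq)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (chain[j]->hardirq_enabled) {
				*safe = i;
				*unsafe = j;
				return true;
			}
		}
	}
	return false;
}

/* Usage bits taken from the splat above. */
const struct lock_class cpuctx_lock = { "&cpuctx_lock", true, false };  /* {-...} */
const struct lock_class ctx_lock = { "&ctx->lock", false, false };      /* {....} */
const struct lock_class cti_lock = { "&drvdata->spinlock", false, true }; /* {+.+.} */

/* Hypothetical: the same CTI lock if no path took it with hardirqs on. */
const struct lock_class cti_lock_irqsave = { "&drvdata->spinlock", false, false };

/* Print a one-line verdict for a chain. */
void report(const struct lock_class *const chain[], size_t n)
{
	size_t s, u;

	if (find_irq_inversion(chain, n, &s, &u))
		printf("%s -> ... -> %s: HARDIRQ-safe -> HARDIRQ-unsafe\n",
		       chain[s]->name, chain[u]->name);
	else
		printf("no HARDIRQ inversion\n");
}
```

Fed the chain &cpuctx_lock -> &ctx->lock -> &drvdata->spinlock, the
model flags the first and last locks, matching the splat. Marking the
CTI lock as never taken with hardirqs enabled (as it would be if the
sysfs cti_disable() path used an irqsave variant of the lock) removes
the inversion in the model; whether that is the right fix for the
driver is untested.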
The second one is hit when reloading the modules:

$ sudo rmmod coresight_stm coresight_funnel stm_core \
    coresight_replicator coresight_tpiu coresight_etm4x coresight_tmc \
    coresight_cti coresight_cpu_debug coresight_trbe coresight
$ sudo modprobe coresight
$ sudo modprobe coresight_stm
$ sudo modprobe coresight_funnel
$ sudo modprobe stm_core
$ sudo modprobe coresight_replicator
$ sudo modprobe coresight_cpu_debug
$ sudo modprobe coresight_tpiu
$ sudo modprobe coresight_etm4x
$ sudo modprobe coresight_tmc
$ sudo modprobe coresight_trbe
$ sudo modprobe coresight_cti

Unable to handle kernel NULL pointer dereference at virtual address 00000000000004f0
pc : cti_cpu_pm_notify+0x74/0x160 [coresight_cti]
lr : cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
Call trace:
cti_cpu_pm_notify+0x74/0x160 [coresight_cti] (P)
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8
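A NULL dereference reported at virtual address 0x4f0 typically means a
field at offset 0x4f0 was read through a NULL struct pointer. Which
pointer cti_cpu_pm_notify() actually dereferences at that offset is not
known from the trace. The toy userspace sketch below only shows that
address arithmetic and a guarded per-CPU lookup; struct fake_drvdata,
fake_percpu_drvdata[] and fake_pm_notify() are hypothetical stand-ins,
not CTI driver code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical layout: the padding only places 'state' at offset 0x4f0
 * to show how such a fault address arises. The real struct and field
 * behind the offset in cti_cpu_pm_notify() are unknown.
 */
struct fake_drvdata {
	char pad[0x4f0];
	int state;
};

#define NR_FAKE_CPUS 4

/* Hypothetical per-CPU table: an entry stays NULL until a device probes. */
struct fake_drvdata *fake_percpu_drvdata[NR_FAKE_CPUS];

/* Address a load of p->state would touch: 0 + 0x4f0 when p is NULL. */
uintptr_t state_address(const struct fake_drvdata *p)
{
	return (uintptr_t)p + offsetof(struct fake_drvdata, state);
}

/* Hypothetical notifier body: returns -1 if nothing is registered. */
int fake_pm_notify(int cpu)
{
	struct fake_drvdata *drvdata;

	if (cpu < 0 || cpu >= NR_FAKE_CPUS)
		return -1;
	drvdata = fake_percpu_drvdata[cpu];
	if (!drvdata)	/* no device registered (yet) on this CPU */
		return -1;
	return drvdata->state;
}
```

A NULL check like this would only hide the symptom if the underlying
problem is the CPU PM notifier being live before the per-CPU data it
reads is set up (or after it is torn down), which the module load
order above might expose; that ordering would need checking first.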