All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@arm.com>
To: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@linaro.org>,
	James Clark <james.clark@linaro.org>,
	Levi Yun <yeoreum.yun@arm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Yabin Cui <yabinc@google.com>, Keita Morisaki <keyz@google.com>,
	Yuanfang Zhang <quic_yuanfang@quicinc.com>,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 02/28] coresight: etm4x: Always set tracer's device mode on target CPU
Date: Mon, 4 Aug 2025 10:08:12 +0100	[thread overview]
Message-ID: <20250804090812.GI143191@e132581.arm.com> (raw)
In-Reply-To: <09f9d195-7f3c-4cf1-95da-7e29c398ebcc@arm.com>

On Tue, Jul 15, 2025 at 12:56:54PM +0530, Anshuman Khandual wrote:
> 
> On 01/07/25 8:23 PM, Leo Yan wrote:
> > When enabling a tracer via SysFS interface, the device mode may be set
> > by any CPU - not necessarily the target CPU. This can lead to race
> > condition in SMP, and may result in incorrect mode values being read.
> > 
> > Consider the following example, where CPU0 attempts to enable the tracer
> > on CPU1 (the target CPU):
> > 
> >  CPU0                                    CPU1
> >  etm4_enable()
> >   ` coresight_take_mode(SYSFS)
> >   ` etm4_enable_sysfs()
> >      ` smp_call_function_single() ---->  etm4_enable_hw_smp_call()
> >      			                /
> >                                        /  CPU idle:
> >                                       /   etm4_cpu_save()
> >                                      /     ` coresight_get_mode()
> > 	       Failed to enable h/w /         ^^^
> >   ` coresight_set_mode(DISABLED) <-'          Read the intermediate SYSFS mode
> 
> The problem is - CPU1's HW state and CPU1's sysfs mode state might not
> remain in sync if CPU1 goes into idle state just after an unsuccessful
> etm4_enable_sysfs() attempt from CPU0. In which case a subsequent read
> coresight_get_mode() on CPU1 might erroneously give us DISABLED state,

In this case, CPU1 reads an intermediate "SYSFS" state, even though it
failed in etm4_enable_hw_smp_call(). The current code defers setting
the state to DISABLED on CPU0. As a result, CPU1 will incorrectly save
and restore the ETM context based on the intermediate "SYSFS" state.

> which actually does not seem to be too bad as the earlier enablement
> attempt had failed anyway. Just trying to understand what is the real
> problem here.

The problem is CPU1 might get an intermediate state, it turns out a
stale value and might guide CPU idle flow to wrongly save and restore
ETM context.

> > In this case, CPU0 initiates the operation by taking the SYSFS mode to
> > avoid conflicts with the Perf mode. It then sends an IPI to CPU1 to
> > configure the tracer registers. If any error occurs during this process,
> 
> What kind of error can happen during this process ?

So far, it might fail to claim a device and return an error.

A similar issue might occur when CPU1 exits an idle state. For example,
if CPU0 initiates the ETM enabling flow and sets the SYSFS mode in
advance, once CPU1 is woken up from idle by an IPI, it reads the ETM
state (SYSFS mode) and then restores and enables the ETM. This can
happen even before CPU1 invokes etm4_enable_hw_smp_call() to complete
the ETM enable flow.

> > CPU0 rolls back by setting the mode to DISABLED.
> 
> Which seems OK.
> 
> > 
> > However, if CPU1 enters an idle state during this time, it might read
> > the intermediate SYSFS mode. As a result, the CPU PM flow could wrongly
> > save and restore tracer context that is actually disabled.
> 
> Right but CPU0 had marked the CPU1' state as DISABLED after the enable
> attempt had failed. So what is the problem ?

There is a race condition between CPU0 writing the state and CPU1
reading the state (during its CPU idle flow). CPU1 might read a state
that is inconsistent with the actual ETM hardware state, which causes
CPU1 to save and restore the ETM context incorrectly.

A wider view is this series heavily relies on the ETM state to decide
the linked path has been enabled and take action for saving and
restoring all components on the path (not for ETM device only). We need
a reliable state machine to reflect hardware state. To avoid any
intermediate state, we always set the state on the target CPU.

Thanks,
Leo


  reply	other threads:[~2025-08-04  9:26 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-01 14:53 [PATCH v2 00/28] CoreSight: Address CPU Power Management Issues Leo Yan
2025-07-01 14:53 ` [PATCH v2 01/28] coresight: Change device mode to atomic type Leo Yan
2025-07-02  9:49   ` Yeoreum Yun
2025-07-02 10:38     ` Leo Yan
2025-07-02 16:35       ` Yeoreum Yun
2025-07-15  6:53   ` Anshuman Khandual
2025-08-04  8:22     ` Leo Yan
2025-07-01 14:53 ` [PATCH v2 02/28] coresight: etm4x: Always set tracer's device mode on target CPU Leo Yan
2025-07-02 10:14   ` Yeoreum Yun
2025-07-15  7:26   ` Anshuman Khandual
2025-08-04  9:08     ` Leo Yan [this message]
2025-08-21  9:45   ` James Clark
2025-08-21 12:51     ` Leo Yan
2025-07-01 14:53 ` [PATCH v2 03/28] coresight: etm3x: " Leo Yan
2025-07-02  3:04   ` kernel test robot
2025-07-02  4:08   ` kernel test robot
2025-07-02 10:18   ` Yeoreum Yun
2025-07-02 10:42     ` Leo Yan
2025-07-01 14:53 ` [PATCH v2 04/28] coresight: etm4x: Correct polling IDLE bit Leo Yan
2025-07-02 10:24   ` Yeoreum Yun
2025-07-01 14:53 ` [PATCH v2 05/28] coresight: etm4x: Ensure context synchronization is not ignored Leo Yan
2025-07-02 11:10   ` Yeoreum Yun
2025-07-02 14:35     ` Leo Yan
2025-07-01 14:53 ` [PATCH v2 06/28] coresight: etm4x: Add context synchronization before enabling trace Leo Yan
2025-07-02 11:05   ` Yeoreum Yun
2025-07-02 14:40     ` Leo Yan
2025-07-02 16:21       ` Yeoreum Yun
2025-07-01 14:53 ` [PATCH v2 07/28] coresight: etm4x: Properly control filter in CPU idle with FEAT_TRF Leo Yan
2025-07-01 14:53 ` [PATCH v2 08/28] coresight: etm4x: Remove the state_needs_restore flag Leo Yan
2025-07-02 11:19   ` Yeoreum Yun
2025-07-01 14:53 ` [PATCH v2 09/28] coresight: etm4x: Add flag to control single-shot restart Leo Yan
2025-07-01 14:53 ` [PATCH v2 10/28] coresight: etm4x: Reuse normal enable and disable logic in CPU idle Leo Yan
2025-07-01 14:53 ` [PATCH v2 11/28] coresight: Populate CPU ID into the coresight_device structure Leo Yan
2025-07-02  6:34   ` kernel test robot
2025-07-01 14:53 ` [PATCH v2 12/28] coresight: sysfs: Validate CPU online status for per-CPU sources Leo Yan
2025-07-02 12:55   ` Yeoreum Yun
2025-07-01 14:53 ` [PATCH v2 13/28] coresight: Set per CPU source pointer Leo Yan
2025-07-01 14:53 ` [PATCH v2 14/28] coresight: Register CPU PM notifier in core layer Leo Yan
2025-07-01 14:53 ` [PATCH v2 15/28] coresight: etm4x: Hook CPU PM callbacks Leo Yan
2025-07-01 14:53 ` [PATCH v2 16/28] coresight: Add callback to determine if context save/restore is needed Leo Yan
2025-07-01 14:53 ` [PATCH v2 17/28] coresight: etm4x: Remove redundant condition checks in save and restore Leo Yan
2025-07-01 14:53 ` [PATCH v2 18/28] coresight: cti: Fix race condition by using device mode Leo Yan
2025-07-01 14:53 ` [PATCH v2 19/28] coresight: cti: Introduce CS_MODE_DEBUG mode Leo Yan
2025-07-01 14:53 ` [PATCH v2 20/28] coresight: cti: Properly handle modes in CPU PM notifiers Leo Yan
2025-07-01 14:53 ` [PATCH v2 21/28] coresight: Add per-CPU path pointer Leo Yan
2025-07-01 14:53 ` [PATCH v2 22/28] coresight: Add 'in_idle' argument to path enable/disable functions Leo Yan
2025-07-01 14:53 ` [PATCH v2 23/28] coresight: Control path during CPU idle Leo Yan
2025-07-01 14:53 ` [PATCH v2 24/28] coresight: Add PM callbacks for percpu sink Leo Yan
2025-07-01 14:53 ` [PATCH v2 25/28] coresight: trbe: Save and restore state across CPU low power state Leo Yan
2025-09-04 13:15   ` James Clark
2025-07-01 14:53 ` [PATCH v2 26/28] coresight: Take hotplug lock in enable_source_store() for Sysfs mode Leo Yan
2025-07-01 14:53 ` [PATCH v2 27/28] coresight: Move CPU hotplug callbacks to core layer Leo Yan
2025-07-01 14:53 ` [PATCH v2 28/28] coresight: Manage activated path during CPU hotplug Leo Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250804090812.GI143191@e132581.arm.com \
    --to=leo.yan@arm.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=anshuman.khandual@arm.com \
    --cc=coresight@lists.linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=james.clark@linaro.org \
    --cc=keyz@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mike.leach@linaro.org \
    --cc=quic_yuanfang@quicinc.com \
    --cc=suzuki.poulose@arm.com \
    --cc=yabinc@google.com \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.