Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@arm.com>
To: James Clark <james.clark@linaro.org>
Cc: coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@arm.com>,
	Yeoreum Yun <yeoreum.yun@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Will Deacon <will@kernel.org>, Yabin Cui <yabinc@google.com>,
	Keita Morisaki <keyz@google.com>,
	Jie Gan <jie.gan@oss.qualcomm.com>,
	Yuanfang Zhang <quic_yuanfang@quicinc.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Tamas Petz <tamas.petz@arm.com>,
	Thomas Gleixner <tglx@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v11 07/27] coresight: Take a reference on csdev
Date: Fri, 8 May 2026 14:28:44 +0100	[thread overview]
Message-ID: <20260508132844.GG3778514@e132581.arm.com> (raw)
In-Reply-To: <b9fb8b0f-54fc-4fd2-9a0f-44889b5a8eb0@linaro.org>

On Thu, May 07, 2026 at 03:55:26PM +0100, James Clark wrote:

[...]

> Hi Leo,
> 
> Testing on the Orion O6 board was all good, and so was stress testing
> concurrent sysfs mode and hotplug on Juno.

Thanks a lot for test!

> However, I was trying to stress test sysfs mode and rmmod on Juno and
> ran into an issue, although a similar issue is present without your
> patchset.

I don't think CPU PM introduces additional complexity for the above cases.
The reason is that CPU PM notifiers _only_ apply to active sessions, and
once a device is enabled, the module cannot be removed.

If the race conditions between enabling/disabling sessions and module
load/unload are properly handled, CPU PM should be safe. If we have bug
in these race conditions, the high frequency data access in CPU PM may
expose the issues - I don't expect CPU PM is the culprit.

> If you run an rmmod on all the coresight devices at the same time as an
> enable_source / disable loop you always get this:
> 
>  WARNING: possible circular locking dependency detected
>  7.0.0-rc1+ #713 Tainted: G                 N
>  ------------------------------------------------------
>  rmmod/1361 is trying to acquire lock:
>  ffff0008042f69a8 (kn->active#144){++++}-{0:0}, at:
> __kernfs_remove+0x1b8/0x2c8

kn->active is not a lock but an active reference of sysfs node, but it
use lockdep annotation to detect lock dependency.

>   Possible unsafe locking scenario:
>         CPU0                    CPU1
>         ----                    ----
>    lock(coresight_mutex);
>                                 lock(cpu_hotplug_lock);
>                                 lock(coresight_mutex);
>    lock(kn->active#144);
>   *** DEADLOCK ***

The potential deadlock sequence could be:

  kernfs_fop_write_iter()
   `> kernfs_get_active_of()    => acquire kn->active
   `> coresight_enable_sysfs()  => acquire coresight_mutex

  coresight_unregister()        => acquire coresight_mutex
   `> device_unregister()
       `> __kernfs_remove()
           `> kernfs_drain()    => acquire kn->active

> I think the issue can be fixed by releasing the coresight_mutex before
> device_unregister():
> 
>   diff --git a/drivers/hwtracing/coresight/coresight-core.c
> b/drivers/hwtracing/coresight/coresight-core.c
>   index 015363da12fa..620560880f12 100644
>   --- a/drivers/hwtracing/coresight/coresight-core.c
>   +++ b/drivers/hwtracing/coresight/coresight-core.c
>   @@ -1639,8 +1639,8 @@ void coresight_unregister(struct coresight_device
> *csdev)
>           coresight_remove_conns(csdev);
>           coresight_clear_default_sink(csdev);
>           coresight_release_platform_data(csdev->dev.parent, csdev->pdata);
>   -       device_unregister(&csdev->dev);
>           mutex_unlock(&coresight_mutex);
>   +       device_unregister(&csdev->dev);
>    }
>    EXPORT_SYMBOL_GPL(coresight_unregister);

If so, we also need to move device_register() out of the mutex scope.

That said, I still think we should dive a bit if can use smaller locking
granluarity (combining with bus management provided by device model).

> Although I didn't think too hard about the implications, but it might be ok
> because once all the connections are removed the device can't be used so
> releasing the coresight_mutex isn't an issue.
> 
> But then testing that I ran into some kind of refleak where I couldn't
> unload modules anymore, even though I'd disabled everything. But that could
> be a different issue:
> 
>   rmmod: ERROR: Module coresight_funnel is in use
>   rmmod: ERROR: Module coresight_replicator is in use
>   rmmod: ERROR: Module coresight_etm4x is in use
>   rmmod: ERROR: Module coresight_tmc is in use
>   rmmod: ERROR: Module coresight_cti is in use
>   rmmod: ERROR: Module coresight is in use by: coresight_tmc coresight_cti
> coresight_etm4x coresight_replicator coresight_funnel

I suspect this is due to module references are not properly released, or
the entire CS path is not properly disabled.

After the issue occurs, can the ETM sysfs knob still be accessed? I am
curious whether this is caused by the sysfs knob disappearing so no way
to disable the path or the sysfs knob still exists but the driver
internally misses to disable the path.

> Anyway I don't think your patches make this worse, so we can probably ignore
> it, but it would be good to be able to stress test the new modifications
> around the same area.

As no regression in test, I agree that we should not defer this series.

We can fix the race with module load/unload as a separate task:

  - sysfs mode + module load/unload
  - perf mode + module load/unload

Then we can combine stress test with CPU idle/hotplug.

Thanks,
Leo


  reply	other threads:[~2026-05-08 13:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01 16:47 [PATCH v11 00/27] CoreSight: Refactor power management for CoreSight path Leo Yan
2026-05-01 16:47 ` [PATCH v11 01/27] coresight: Handle helper enable failure properly Leo Yan
2026-05-01 16:47 ` [PATCH v11 02/27] coresight: Extract device init into coresight_init_device() Leo Yan
2026-05-01 16:47 ` [PATCH v11 03/27] coresight: Populate CPU ID into coresight_device Leo Yan
2026-05-01 16:47 ` [PATCH v11 04/27] coresight: Remove .cpu_id() callback from source ops Leo Yan
2026-05-01 16:47 ` [PATCH v11 05/27] coresight: Take hotplug lock in enable_source_store() for Sysfs mode Leo Yan
2026-05-01 16:47 ` [PATCH v11 06/27] coresight: perf: Retrieve path and source from event data Leo Yan
2026-05-01 16:47 ` [PATCH v11 07/27] coresight: Take a reference on csdev Leo Yan
2026-05-07 14:55   ` James Clark
2026-05-08 13:28     ` Leo Yan [this message]
2026-05-01 16:47 ` [PATCH v11 08/27] coresight: Move per-CPU source pointer to core layer Leo Yan
2026-05-01 16:47 ` [PATCH v11 09/27] coresight: Grab per-CPU source device during AUX setup Leo Yan
2026-05-06  9:08   ` Suzuki K Poulose
2026-05-01 16:47 ` [PATCH v11 10/27] coresight: Register CPU PM notifier in core layer Leo Yan
2026-05-01 16:47 ` [PATCH v11 11/27] coresight: etm4x: Hook CPU PM callbacks Leo Yan
2026-05-01 16:47 ` [PATCH v11 12/27] coresight: etm4x: Remove redundant checks in PM save and restore Leo Yan
2026-05-01 16:47 ` [PATCH v11 13/27] coresight: syscfg: Use IRQ-safe spinlock to protect active variables Leo Yan
2026-05-01 16:47 ` [PATCH v11 14/27] coresight: Move source helper disabling to coresight_disable_path() Leo Yan
2026-05-06  9:05   ` Suzuki K Poulose
2026-05-06  9:33     ` Leo Yan
2026-05-01 16:47 ` [PATCH v11 15/27] coresight: Control path with range Leo Yan
2026-05-01 16:47 ` [PATCH v11 16/27] coresight: Use helpers to fetch first and last nodes Leo Yan
2026-05-01 16:47 ` [PATCH v11 17/27] coresight: Introduce coresight_enable_source() helper Leo Yan
2026-05-01 16:47 ` [PATCH v11 18/27] coresight: Save active path for system tracers Leo Yan
2026-05-01 16:48 ` [PATCH v11 19/27] coresight: etm4x: Set active path on target CPU Leo Yan
2026-05-01 16:48 ` [PATCH v11 20/27] coresight: etm3x: " Leo Yan
2026-05-01 16:48 ` [PATCH v11 21/27] coresight: sysfs: Use source's path pointer for path control Leo Yan
2026-05-01 16:48 ` [PATCH v11 22/27] coresight: Control path during CPU idle Leo Yan
2026-05-01 16:48 ` [PATCH v11 23/27] coresight: Add PM callbacks for sink device Leo Yan
2026-05-01 16:48 ` [PATCH v11 24/27] coresight: trbe: Save and restore state across CPU low power state Leo Yan
2026-05-01 16:48 ` [PATCH v11 25/27] coresight: sysfs: Increment refcount only for system tracers Leo Yan
2026-05-06  9:57   ` Suzuki K Poulose
2026-05-06 10:18     ` Leo Yan
2026-05-01 16:48 ` [PATCH v11 26/27] coresight: Move CPU hotplug callbacks to core layer Leo Yan
2026-05-01 16:48 ` [PATCH v11 27/27] coresight: sysfs: Validate CPU online status for per-CPU sources Leo Yan
2026-05-06  7:33 ` [PATCH v11 00/27] CoreSight: Refactor power management for CoreSight path Jie Gan
2026-05-06  9:38   ` Leo Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508132844.GG3778514@e132581.arm.com \
    --to=leo.yan@arm.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=coresight@lists.linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=james.clark@linaro.org \
    --cc=jie.gan@oss.qualcomm.com \
    --cc=keyz@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@arm.com \
    --cc=peterz@infradead.org \
    --cc=quic_yuanfang@quicinc.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tamas.petz@arm.com \
    --cc=tglx@kernel.org \
    --cc=will@kernel.org \
    --cc=yabinc@google.com \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox