Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: James Clark <james.clark@linaro.org>
To: Leo Yan <leo.yan@arm.com>
Cc: coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@arm.com>,
	Yeoreum Yun <yeoreum.yun@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Will Deacon <will@kernel.org>, Yabin Cui <yabinc@google.com>,
	Keita Morisaki <keyz@google.com>,
	Jie Gan <jie.gan@oss.qualcomm.com>,
	Yuanfang Zhang <quic_yuanfang@quicinc.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Tamas Petz <tamas.petz@arm.com>,
	Thomas Gleixner <tglx@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v11 07/27] coresight: Take a reference on csdev
Date: Thu, 7 May 2026 15:55:26 +0100	[thread overview]
Message-ID: <b9fb8b0f-54fc-4fd2-9a0f-44889b5a8eb0@linaro.org> (raw)
In-Reply-To: <20260501-arm_coresight_path_power_management_improvement-v11-7-fc7fb9d5af1c@arm.com>



On 01/05/2026 5:47 pm, Leo Yan wrote:
> coresight_get_ref() currently pins the provider module and takes a
> reference on the parent device. That does not pin the CoreSight device
> itself.
> 
> A driver can still be unbound while the CoreSight device is part of an
> active trace path. In that case coresight_unregister() may release the
> coresight_device while the path still holds its csdev pointer.

Hi Leo,

Testing on the Orion O6 board was all good, and so was stress testing 
concurrent sysfs mode and hotplug on Juno. However, I was trying to 
stress test sysfs mode and rmmod on Juno and ran into an issue, although 
a similar issue is present without your patchset.

If you run an rmmod on all the coresight devices at the same time as an 
enable_source / disable loop you always get this:

  WARNING: possible circular locking dependency detected
  7.0.0-rc1+ #713 Tainted: G                 N
  ------------------------------------------------------
  rmmod/1361 is trying to acquire lock:
  ffff0008042f69a8 (kn->active#144){++++}-{0:0}, at: 
__kernfs_remove+0x1b8/0x2c8
  but task is already holding lock:
  ffff80007abfc9d8 (coresight_mutex){+.+.}-{4:4}, at: 
coresight_unregister+0x50/0x280 [coresight]
  which lock already depends on the new lock.
  the existing dependency chain (in reverse order) is:
  -> #2 (coresight_mutex){+.+.}-{4:4}:
         __mutex_lock_common+0xe0/0x1408
         mutex_lock_nested+0x38/0x50
         coresight_enable_sysfs+0x34/0x1f0 [coresight]
         enable_source_store+0x6c/0xc0 [coresight]
         dev_attr_store+0x24/0x40
         sysfs_kf_write+0xa8/0xd0
         kernfs_fop_write_iter+0x120/0x1d0
         vfs_write+0x238/0x310
         ksys_write+0x80/0xf8
         __arm64_sys_write+0x28/0x40
         invoke_syscall+0x4c/0xe8
         el0_svc_common+0x9c/0xf0
         do_el0_svc+0x28/0x40
         el0_svc+0x50/0x230
         el0t_64_sync_handler+0x78/0x130
         el0t_64_sync+0x198/0x1a0
  -> #1 (cpu_hotplug_lock){++++}-{0:0}:
         cpus_read_lock+0x5c/0x160
         enable_source_store+0x5c/0xc0 [coresight]
         dev_attr_store+0x24/0x40
         sysfs_kf_write+0xa8/0xd0
         kernfs_fop_write_iter+0x120/0x1d0
         vfs_write+0x238/0x310
         ksys_write+0x80/0xf8
         __arm64_sys_write+0x28/0x40
         invoke_syscall+0x4c/0xe8
         el0_svc_common+0x9c/0xf0
         do_el0_svc+0x28/0x40
         el0_svc+0x50/0x230
         el0t_64_sync_handler+0x78/0x130
         el0t_64_sync+0x198/0x1a0
  -> #0 (kn->active#144){++++}-{0:0}:
         __lock_acquire+0x1b0c/0x3478
         lock_acquire+0x118/0x338
         kernfs_drain+0xe8/0x1e8
         __kernfs_remove+0x1b8/0x2c8
         kernfs_remove_by_name_ns+0x74/0xf0
         sysfs_remove_group+0xb0/0x118
         sysfs_remove_groups+0x40/0x68
         device_remove_attrs+0xc0/0x108
         device_del+0x1e4/0x360
         device_unregister+0x2c/0x90
         coresight_unregister+0x244/0x280 [coresight]
         etm4_remove_dev+0xa4/0xc8 [coresight_etm4x]
         etm4_remove_amba+0x24/0x38 [coresight_etm4x]
         amba_remove+0x3c/0x138
         device_release_driver_internal+0x188/0x2a0
         driver_detach+0x9c/0xe8
         bus_remove_driver+0xec/0x178
         driver_unregister+0x3c/0x68
         amba_driver_unregister+0x1c/0x30
         cleanup_module+0x1c/0xcc0 [coresight_etm4x]
         __arm64_sys_delete_module+0x240/0x3c0
         invoke_syscall+0x4c/0xe8
         el0_svc_common+0x9c/0xf0
         do_el0_svc+0x28/0x40
         el0_svc+0x50/0x230
         el0t_64_sync_handler+0x78/0x130
         el0t_64_sync+0x198/0x1a0
  other info that might help us debug this:
  Chain   kn->active#144 --> cpu_hotplug_lock --> coresight_mutex
   Possible unsafe locking scenario:
         CPU0                    CPU1
         ----                    ----
    lock(coresight_mutex);
                                 lock(cpu_hotplug_lock);
                                 lock(coresight_mutex);
    lock(kn->active#144);
   *** DEADLOCK ***
  2 locks held by rmmod/1361:
   #0: ffff00080103b0e8 (&dev->mutex){....}-{4:4}, at: 
device_release_driver_internal+0x5c/0x2a0
   #1: ffff80007abfc9d8 (coresight_mutex){+.+.}-{4:4}, at: 
coresight_unregister+0x50/0x280 [coresight]
  stack backtrace:
  CPU: 4 UID: 0 PID: 1361 Comm: rmmod Tainted: G                 N 
7.0.0-rc1+ #713 PREEMPT
  Tainted: [N]=TEST
  Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno 
Development Platform, BIOS EDK II Oct 19 2019
  Call trace:
   show_stack+0x24/0x38 (C)
   dump_stack_lvl+0x3c/0xb8
   dump_stack+0x18/0x24
   print_circular_bug+0x324/0x330
   check_noncircular+0x178/0x198
   __lock_acquire+0x1b0c/0x3478
   lock_acquire+0x118/0x338
   kernfs_drain+0xe8/0x1e8
   __kernfs_remove+0x1b8/0x2c8
   kernfs_remove_by_name_ns+0x74/0xf0
   sysfs_remove_group+0xb0/0x118
   sysfs_remove_groups+0x40/0x68
   device_remove_attrs+0xc0/0x108
   device_del+0x1e4/0x360
   device_unregister+0x2c/0x90
   coresight_unregister+0x244/0x280 [coresight]
   etm4_remove_dev+0xa4/0xc8 [coresight_etm4x]
   etm4_remove_amba+0x24/0x38 [coresight_etm4x]
   amba_remove+0x3c/0x138
   device_release_driver_internal+0x188/0x2a0
   driver_detach+0x9c/0xe8
   bus_remove_driver+0xec/0x178
   driver_unregister+0x3c/0x68
   amba_driver_unregister+0x1c/0x30
   cleanup_module+0x1c/0xcc0 [coresight_etm4x]
   __arm64_sys_delete_module+0x240/0x3c0
   invoke_syscall+0x4c/0xe8
   el0_svc_common+0x9c/0xf0
   do_el0_svc+0x28/0x40
   el0_svc+0x50/0x230
   el0t_64_sync_handler+0x78/0x130
   el0t_64_sync+0x198/0x1a0


I think the issue can be fixed by releasing the coresight_mutex before 
device_unregister():

   diff --git a/drivers/hwtracing/coresight/coresight-core.c 
b/drivers/hwtracing/coresight/coresight-core.c
   index 015363da12fa..620560880f12 100644
   --- a/drivers/hwtracing/coresight/coresight-core.c
   +++ b/drivers/hwtracing/coresight/coresight-core.c
   @@ -1639,8 +1639,8 @@ void coresight_unregister(struct 
coresight_device *csdev)
           coresight_remove_conns(csdev);
           coresight_clear_default_sink(csdev);
           coresight_release_platform_data(csdev->dev.parent, csdev->pdata);
   -       device_unregister(&csdev->dev);
           mutex_unlock(&coresight_mutex);
   +       device_unregister(&csdev->dev);
    }
    EXPORT_SYMBOL_GPL(coresight_unregister);

Although I didn't think too hard about the implications, but it might be 
ok because once all the connections are removed the device can't be used 
so releasing the coresight_mutex isn't an issue.

But then testing that I ran into some kind of refleak where I couldn't 
unload modules anymore, even though I'd disabled everything. But that 
could be a different issue:

   rmmod: ERROR: Module coresight_funnel is in use
   rmmod: ERROR: Module coresight_replicator is in use
   rmmod: ERROR: Module coresight_etm4x is in use
   rmmod: ERROR: Module coresight_tmc is in use
   rmmod: ERROR: Module coresight_cti is in use
   rmmod: ERROR: Module coresight is in use by: coresight_tmc 
coresight_cti coresight_etm4x coresight_replicator coresight_funnel


Anyway I don't think your patches make this worse, so we can probably 
ignore it, but it would be good to be able to stress test the new 
modifications around the same area.

> 
> Take a reference on &csdev->dev when grabbing a CoreSight device and drop
> it in coresight_put_ref().
> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
>   drivers/hwtracing/coresight/coresight-core.c | 23 ++++++++++++++---------
>   1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index 5519d323f21f38d719b9030c7d77a9a61948ba1d..1a389f63ed006e054dc6bc9764f8be8c1def9da1 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -651,14 +651,18 @@ struct coresight_device *coresight_get_sink_by_id(u32 id)
>    */
>   static bool coresight_get_ref(struct coresight_device *csdev)
>   {
> -	struct device *dev = csdev->dev.parent;
> +	struct device *parent = csdev->dev.parent;
>   
>   	/* Make sure the driver can't be removed */
> -	if (!try_module_get(dev->driver->owner))
> +	if (!try_module_get(parent->driver->owner))
>   		return false;
> -	/* Make sure the device can't go away */
> -	get_device(dev);
> -	pm_runtime_get_sync(dev);
> +
> +	/* Make sure parent device can't go away and is powered on */
> +	get_device(parent);
> +	pm_runtime_get_sync(parent);
> +
> +	/* Make sure csdev can't go away */
> +	get_device(&csdev->dev);
>   	return true;
>   }
>   
> @@ -670,11 +674,12 @@ static bool coresight_get_ref(struct coresight_device *csdev)
>    */
>   static void coresight_put_ref(struct coresight_device *csdev)
>   {
> -	struct device *dev = csdev->dev.parent;
> +	struct device *parent = csdev->dev.parent;
>   
> -	pm_runtime_put(dev);
> -	put_device(dev);
> -	module_put(dev->driver->owner);
> +	put_device(&csdev->dev);
> +	pm_runtime_put(parent);
> +	put_device(parent);
> +	module_put(parent->driver->owner);
>   }
>   
>   /*
> 



  reply	other threads:[~2026-05-07 14:55 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01 16:47 [PATCH v11 00/27] CoreSight: Refactor power management for CoreSight path Leo Yan
2026-05-01 16:47 ` [PATCH v11 01/27] coresight: Handle helper enable failure properly Leo Yan
2026-05-01 16:47 ` [PATCH v11 02/27] coresight: Extract device init into coresight_init_device() Leo Yan
2026-05-01 16:47 ` [PATCH v11 03/27] coresight: Populate CPU ID into coresight_device Leo Yan
2026-05-01 16:47 ` [PATCH v11 04/27] coresight: Remove .cpu_id() callback from source ops Leo Yan
2026-05-01 16:47 ` [PATCH v11 05/27] coresight: Take hotplug lock in enable_source_store() for Sysfs mode Leo Yan
2026-05-01 16:47 ` [PATCH v11 06/27] coresight: perf: Retrieve path and source from event data Leo Yan
2026-05-01 16:47 ` [PATCH v11 07/27] coresight: Take a reference on csdev Leo Yan
2026-05-07 14:55   ` James Clark [this message]
2026-05-08 13:28     ` Leo Yan
2026-05-01 16:47 ` [PATCH v11 08/27] coresight: Move per-CPU source pointer to core layer Leo Yan
2026-05-01 16:47 ` [PATCH v11 09/27] coresight: Grab per-CPU source device during AUX setup Leo Yan
2026-05-06  9:08   ` Suzuki K Poulose
2026-05-01 16:47 ` [PATCH v11 10/27] coresight: Register CPU PM notifier in core layer Leo Yan
2026-05-01 16:47 ` [PATCH v11 11/27] coresight: etm4x: Hook CPU PM callbacks Leo Yan
2026-05-01 16:47 ` [PATCH v11 12/27] coresight: etm4x: Remove redundant checks in PM save and restore Leo Yan
2026-05-01 16:47 ` [PATCH v11 13/27] coresight: syscfg: Use IRQ-safe spinlock to protect active variables Leo Yan
2026-05-01 16:47 ` [PATCH v11 14/27] coresight: Move source helper disabling to coresight_disable_path() Leo Yan
2026-05-06  9:05   ` Suzuki K Poulose
2026-05-06  9:33     ` Leo Yan
2026-05-01 16:47 ` [PATCH v11 15/27] coresight: Control path with range Leo Yan
2026-05-01 16:47 ` [PATCH v11 16/27] coresight: Use helpers to fetch first and last nodes Leo Yan
2026-05-01 16:47 ` [PATCH v11 17/27] coresight: Introduce coresight_enable_source() helper Leo Yan
2026-05-01 16:47 ` [PATCH v11 18/27] coresight: Save active path for system tracers Leo Yan
2026-05-01 16:48 ` [PATCH v11 19/27] coresight: etm4x: Set active path on target CPU Leo Yan
2026-05-01 16:48 ` [PATCH v11 20/27] coresight: etm3x: " Leo Yan
2026-05-01 16:48 ` [PATCH v11 21/27] coresight: sysfs: Use source's path pointer for path control Leo Yan
2026-05-01 16:48 ` [PATCH v11 22/27] coresight: Control path during CPU idle Leo Yan
2026-05-01 16:48 ` [PATCH v11 23/27] coresight: Add PM callbacks for sink device Leo Yan
2026-05-01 16:48 ` [PATCH v11 24/27] coresight: trbe: Save and restore state across CPU low power state Leo Yan
2026-05-01 16:48 ` [PATCH v11 25/27] coresight: sysfs: Increment refcount only for system tracers Leo Yan
2026-05-06  9:57   ` Suzuki K Poulose
2026-05-06 10:18     ` Leo Yan
2026-05-01 16:48 ` [PATCH v11 26/27] coresight: Move CPU hotplug callbacks to core layer Leo Yan
2026-05-01 16:48 ` [PATCH v11 27/27] coresight: sysfs: Validate CPU online status for per-CPU sources Leo Yan
2026-05-06  7:33 ` [PATCH v11 00/27] CoreSight: Refactor power management for CoreSight path Jie Gan
2026-05-06  9:38   ` Leo Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b9fb8b0f-54fc-4fd2-9a0f-44889b5a8eb0@linaro.org \
    --to=james.clark@linaro.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=coresight@lists.linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jie.gan@oss.qualcomm.com \
    --cc=keyz@google.com \
    --cc=leo.yan@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@arm.com \
    --cc=peterz@infradead.org \
    --cc=quic_yuanfang@quicinc.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tamas.petz@arm.com \
    --cc=tglx@kernel.org \
    --cc=will@kernel.org \
    --cc=yabinc@google.com \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox