From: Hans de Goede <hansg@kernel.org>
To: Benjamin Berg <benjamin@sipsolutions.net>, linux-pm@vger.kernel.org
Cc: Zhang Rui <rui.zhang@intel.com>, Benjamin Berg <benjamin.berg@intel.com>
Subject: Re: [PATCH v2] thermal: use a custom lock class for intel x86_pkg_temp
Date: Wed, 25 Jun 2025 11:44:25 +0200 [thread overview]
Message-ID: <4225e2d2-4fd8-49ee-b0e2-97cd5737423b@kernel.org> (raw)
In-Reply-To: <20250624132406.1485407-1-benjamin@sipsolutions.net>
Hi Benjamin,
On 24-Jun-25 3:24 PM, Benjamin Berg wrote:
> From: Benjamin Berg <benjamin.berg@intel.com>
>
> The intel driver has code paths that will take the tz->lock while the
> cpuhp_state-up lock is held. As the cpuhp_state-up lock is used in other
> code paths, it may happen that lockdep detects possible deadlocks
> through unrelated thermal zone devices.
>
> Fix these false positives by using a separate lockdep class for the
> x86_pkg_temp thermal device.
>
> Reported-by: Hans de Goede <hansg@kernel.org>
> Closes: https://lore.kernel.org/linux-pm/e9d7ef79-6a24-4515-aa35-d1f2357da798@kernel.org/
> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Thank you for trying to fix this, the fix does get rid of cpuhp_state-up
in the lockdep report, only for it to be replaced with the thermal_list_lock
unfortunately (full lockdep report below).
It seems that the real issue is trying to lock wiphy.mtx while holding
tz->lock.
Or maybe we need this patch + a patch to not hold wiphy.mtx while
registering the thermalzone ?
Regards,
Hans
[ 19.759432] ======================================================
[ 19.759434] WARNING: possible circular locking dependency detected
[ 19.759436] 6.16.0-rc3+ #11 Not tainted
[ 19.759439] ------------------------------------------------------
[ 19.759440] modprobe/881 is trying to acquire lock:
[ 19.759442] ffff897b4c940768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.759485]
but task is already holding lock:
[ 19.759486] ffff897b0204e708 (&tz->lock){+.+.}-{4:4}, at: thermal_zone_device_set_mode+0x20/0xa0
[ 19.759498]
which lock already depends on the new lock.
[ 19.759500]
the existing dependency chain (in reverse order) is:
[ 19.759501]
-> #5 (&tz->lock){+.+.}-{4:4}:
[ 19.759506] __mutex_lock+0x9f/0xed0
[ 19.759513] thermal_zone_device_register_with_trips+0x573/0x640
[ 19.759517] thermal_tripless_zone_device_register+0x1b/0x30
[ 19.759521] int3400_thermal_probe+0x21d/0x4f0 [int3400_thermal]
[ 19.759526] platform_probe+0x3f/0xa0
[ 19.759531] really_probe+0xdb/0x340
[ 19.759535] __driver_probe_device+0x78/0x140
[ 19.759539] driver_probe_device+0x1f/0xa0
[ 19.759542] __driver_attach+0xcb/0x1e0
[ 19.759546] bus_for_each_dev+0x63/0xa0
[ 19.759549] bus_add_driver+0x10a/0x1f0
[ 19.759552] driver_register+0x71/0xe0
[ 19.759556] do_one_initcall+0x54/0x390
[ 19.759561] do_init_module+0x62/0x240
[ 19.759565] __do_sys_init_module+0x164/0x190
[ 19.759568] do_syscall_64+0x94/0x3d0
[ 19.759571] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.759574]
-> #4 (thermal_list_lock){+.+.}-{4:4}:
[ 19.759578] __mutex_lock+0x9f/0xed0
[ 19.759581] __thermal_cooling_device_register.part.0+0x16e/0x2a0
[ 19.759585] acpi_processor_thermal_init+0x24/0xb0
[ 19.759591] acpi_soft_cpu_online+0xa7/0x140
[ 19.759593] cpuhp_invoke_callback+0x1ab/0x660
[ 19.759599] cpuhp_thread_fun+0x187/0x270
[ 19.759602] smpboot_thread_fn+0x12a/0x2e0
[ 19.759607] kthread+0x108/0x240
[ 19.759612] ret_from_fork+0x232/0x2a0
[ 19.759617] ret_from_fork_asm+0x1a/0x30
[ 19.759621]
-> #3 (cpuhp_state-up){+.+.}-{0:0}:
[ 19.759625] cpuhp_thread_fun+0x99/0x270
[ 19.759628] smpboot_thread_fn+0x12a/0x2e0
[ 19.759631] kthread+0x108/0x240
[ 19.759635] ret_from_fork+0x232/0x2a0
[ 19.759638] ret_from_fork_asm+0x1a/0x30
[ 19.759642]
-> #2 (cpu_hotplug_lock){++++}-{0:0}:
[ 19.759645] cpus_read_lock+0x3c/0xe0
[ 19.759649] static_key_slow_inc+0x12/0x30
[ 19.759656] __nf_register_net_hook+0xb7/0x210
[ 19.759662] nf_register_net_hook+0x2d/0x90
[ 19.759665] nf_tables_addchain.constprop.0+0x2dd/0x6f0 [nf_tables]
[ 19.759686] nf_tables_newchain+0x78f/0xb10 [nf_tables]
[ 19.759701] nfnetlink_rcv_batch+0x7a5/0xc50 [nfnetlink]
[ 19.759705] nfnetlink_rcv+0x12d/0x150 [nfnetlink]
[ 19.759708] netlink_unicast+0x1bf/0x2b0
[ 19.759712] netlink_sendmsg+0x211/0x430
[ 19.759714] ____sys_sendmsg+0x373/0x3b0
[ 19.759719] ___sys_sendmsg+0x7d/0xc0
[ 19.759722] __sys_sendmsg+0x5e/0xb0
[ 19.759724] do_syscall_64+0x94/0x3d0
[ 19.759727] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.759729]
-> #1 (&nft_net->commit_mutex){+.+.}-{4:4}:
[ 19.759733] __mutex_lock+0x9f/0xed0
[ 19.759735] nf_tables_netdev_event+0x59/0xc0 [nf_tables]
[ 19.759751] notifier_call_chain+0x3d/0x100
[ 19.759755] register_netdevice+0x731/0x8f0
[ 19.759759] cfg80211_register_netdevice+0x4c/0xf0 [cfg80211]
[ 19.759828] ieee80211_if_add+0x475/0x740 [mac80211]
[ 19.759921] ieee80211_register_hw+0xd6b/0xdb0 [mac80211]
[ 19.759967] iwl_op_mode_mld_start+0x438/0x4b0 [iwlmld]
[ 19.759989] _iwl_op_mode_start+0x67/0x100 [iwlwifi]
[ 19.760012] iwl_opmode_register+0x6b/0xc0 [iwlwifi]
[ 19.760028] iwl_mld_init+0x19/0xff0 [iwlmld]
[ 19.760043] do_one_initcall+0x54/0x390
[ 19.760046] do_init_module+0x62/0x240
[ 19.760049] __do_sys_init_module+0x164/0x190
[ 19.760052] do_syscall_64+0x94/0x3d0
[ 19.760055] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.760057]
-> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
[ 19.760061] __lock_acquire+0x1481/0x2270
[ 19.760067] lock_acquire+0xc9/0x2c0
[ 19.760070] __mutex_lock+0x9f/0xed0
[ 19.760074] iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760098] __thermal_zone_get_temp+0x29/0x90
[ 19.760102] __thermal_zone_device_update+0x69/0x480
[ 19.760106] thermal_zone_device_set_mode+0x52/0xa0
[ 19.760110] iwl_mld_thermal_zone_register+0x144/0x1d0 [iwlmld]
[ 19.760126] iwl_op_mode_mld_start+0x460/0x4b0 [iwlmld]
[ 19.760140] _iwl_op_mode_start+0x67/0x100 [iwlwifi]
[ 19.760157] iwl_opmode_register+0x6b/0xc0 [iwlwifi]
[ 19.760173] iwl_mld_init+0x19/0xff0 [iwlmld]
[ 19.760187] do_one_initcall+0x54/0x390
[ 19.760190] do_init_module+0x62/0x240
[ 19.760193] __do_sys_init_module+0x164/0x190
[ 19.760196] do_syscall_64+0x94/0x3d0
[ 19.760198] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.760201]
other info that might help us debug this:
[ 19.760202] Chain exists of:
&rdev->wiphy.mtx --> thermal_list_lock --> &tz->lock
[ 19.760208] Possible unsafe locking scenario:
[ 19.760209] CPU0 CPU1
[ 19.760211] ---- ----
[ 19.760212] lock(&tz->lock);
[ 19.760215] lock(thermal_list_lock);
[ 19.760218] lock(&tz->lock);
[ 19.760220] lock(&rdev->wiphy.mtx);
[ 19.760223]
*** DEADLOCK ***
[ 19.760224] 2 locks held by modprobe/881:
[ 19.760227] #0: ffffffffc1914c68 (iwlwifi_opmode_table_mtx){+.+.}-{4:4}, at: iwl_opmode_register+0x21/0xc0 [iwlwifi]
[ 19.760247] #1: ffff897b0204e708 (&tz->lock){+.+.}-{4:4}, at: thermal_zone_device_set_mode+0x20/0xa0
[ 19.760255]
stack backtrace:
[ 19.760259] CPU: 11 UID: 0 PID: 881 Comm: modprobe Not tainted 6.16.0-rc3+ #11 PREEMPT(lazy)
[ 19.760262] Hardware name: Dell Inc. XPS 16 9640/09CK4V, BIOS 1.12.0 02/10/2025
[ 19.760264] Call Trace:
[ 19.760266] <TASK>
[ 19.760267] dump_stack_lvl+0x68/0x90
[ 19.760271] print_circular_bug.cold+0x185/0x1d0
[ 19.760276] check_noncircular+0x10f/0x130
[ 19.760279] ? __kernel_text_address+0xe/0x30
[ 19.760282] ? unwind_get_return_address+0x26/0x50
[ 19.760287] __lock_acquire+0x1481/0x2270
[ 19.760293] lock_acquire+0xc9/0x2c0
[ 19.760296] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760314] __mutex_lock+0x9f/0xed0
[ 19.760317] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760332] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760345] ? lock_acquire+0xc9/0x2c0
[ 19.760348] ? thermal_zone_device_set_mode+0x20/0xa0
[ 19.760352] ? lock_acquire+0xd9/0x2c0
[ 19.760356] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760368] iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld]
[ 19.760382] ? lock_is_held_type+0xd5/0x140
[ 19.760386] __thermal_zone_get_temp+0x29/0x90
[ 19.760389] __thermal_zone_device_update+0x69/0x480
[ 19.760395] thermal_zone_device_set_mode+0x52/0xa0
[ 19.760399] iwl_mld_thermal_zone_register+0x144/0x1d0 [iwlmld]
[ 19.760416] iwl_op_mode_mld_start+0x460/0x4b0 [iwlmld]
[ 19.760435] _iwl_op_mode_start+0x67/0x100 [iwlwifi]
[ 19.760453] iwl_opmode_register+0x6b/0xc0 [iwlwifi]
[ 19.760468] ? __pfx_iwl_mld_init+0x10/0x10 [iwlmld]
[ 19.760482] iwl_mld_init+0x19/0xff0 [iwlmld]
[ 19.760495] do_one_initcall+0x54/0x390
[ 19.760499] do_init_module+0x62/0x240
[ 19.760502] ? __do_sys_init_module+0x164/0x190
[ 19.760504] __do_sys_init_module+0x164/0x190
[ 19.760510] do_syscall_64+0x94/0x3d0
[ 19.760513] ? lock_acquire+0xc9/0x2c0
[ 19.760516] ? __folio_batch_add_and_move+0x8f/0x2f0
[ 19.760519] ? lock_acquire+0xd9/0x2c0
[ 19.760522] ? find_held_lock+0x2b/0x80
[ 19.760524] ? find_held_lock+0x2b/0x80
[ 19.760526] ? find_held_lock+0x2b/0x80
[ 19.760529] ? rcu_read_unlock+0x17/0x60
[ 19.760533] ? lock_release+0x1a0/0x2d0
[ 19.760537] ? __lock_acquire+0x45f/0x2270
[ 19.760541] ? __handle_mm_fault+0xaf4/0xe20
[ 19.760545] ? lock_acquire+0xc9/0x2c0
[ 19.760549] ? find_held_lock+0x2b/0x80
[ 19.760551] ? rcu_read_unlock+0x17/0x60
[ 19.760553] ? lock_release+0x1a0/0x2d0
[ 19.760556] ? find_held_lock+0x2b/0x80
[ 19.760559] ? exc_page_fault+0x8c/0x240
[ 19.760561] ? lock_release+0x1a0/0x2d0
[ 19.760564] ? do_user_addr_fault+0x370/0x6b0
[ 19.760568] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 19.760571] RIP: 0033:0x7f531b702bae
[ 19.760574] Code: 48 8b 0d 5d 32 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2a 32 0f 00 f7 d8 64 89 01 48
[ 19.760576] RSP: 002b:00007ffe4c645798 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 19.760580] RAX: ffffffffffffffda RBX: 00005600c715be60 RCX: 00007f531b702bae
[ 19.760581] RDX: 00005600a44015ee RSI: 00000000000cbf71 RDI: 00005600c7d0acd0
[ 19.760583] RBP: 00007ffe4c645850 R08: 00005600c715bd40 R09: 00007f531b7f6ac0
[ 19.760584] R10: 00005600c715b010 R11: 0000000000000246 R12: 0000000000040000
[ 19.760585] R13: 00005600c715bdc0 R14: 00005600a44015ee R15: 0000000000000000
[ 19.760590] </TASK>
[ 19.760846] iwlwifi 0000:03:00.0: Registered PHC clock: iwlwifi-PTP, with index: 0
Regards,
Hans
>
> ---
>
> Hi,
>
> I believe that this should solve the lockdep warning that Hans was
> seeing. That said, I have not tested it much.
>
> v2:
> - Fix function name
> - Mark lock class variable static
>
> Benjamin
> ---
> drivers/thermal/intel/x86_pkg_temp_thermal.c | 2 ++
> drivers/thermal/thermal_core.c | 7 +++++++
> include/linux/thermal.h | 8 ++++++++
> 3 files changed, 17 insertions(+)
>
> diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
> index 3fc679b6f11b..15d3c904eaa3 100644
> --- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
> +++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
> @@ -310,6 +310,7 @@ static int pkg_temp_thermal_trips_init(int cpu, int tj_max,
>
> static int pkg_temp_thermal_device_add(unsigned int cpu)
> {
> + static struct lock_class_key x86_pkg_temp_class;
> struct thermal_trip trips[MAX_NUMBER_OF_TRIPS] = { 0 };
> int id = topology_logical_die_id(cpu);
> u32 eax, ebx, ecx, edx;
> @@ -349,6 +350,7 @@ static int pkg_temp_thermal_device_add(unsigned int cpu)
> err = PTR_ERR(zonedev->tzone);
> goto out_kfree_zonedev;
> }
> + thermal_zone_device_set_lock_class(zonedev->tzone, &x86_pkg_temp_class);
> err = thermal_zone_device_enable(zonedev->tzone);
> if (err)
> goto out_unregister_tz;
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 17ca5c082643..ff5c2e01904a 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1657,6 +1657,13 @@ struct thermal_zone_device *thermal_tripless_zone_device_register(
> }
> EXPORT_SYMBOL_GPL(thermal_tripless_zone_device_register);
>
> +void thermal_zone_device_set_lock_class(struct thermal_zone_device *tz,
> + struct lock_class_key *lock_class)
> +{
> + lockdep_set_class_and_name(&tz->lock, lock_class, tz->type);
> +}
> +EXPORT_SYMBOL_GPL(thermal_zone_device_set_lock_class);
> +
> void *thermal_zone_device_priv(struct thermal_zone_device *tzd)
> {
> return tzd->devdata;
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 0b5ed6821080..3cb4cf60b66d 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -240,6 +240,9 @@ struct thermal_zone_device *thermal_tripless_zone_device_register(
> const struct thermal_zone_device_ops *ops,
> const struct thermal_zone_params *tzp);
>
> +void thermal_zone_device_set_lock_class(struct thermal_zone_device *tz,
> + struct lock_class_key *lock_class);
> +
> void thermal_zone_device_unregister(struct thermal_zone_device *tz);
>
> void *thermal_zone_device_priv(struct thermal_zone_device *tzd);
> @@ -290,6 +293,11 @@ static inline struct thermal_zone_device *thermal_tripless_zone_device_register(
> const struct thermal_zone_params *tzp)
> { return ERR_PTR(-ENODEV); }
>
> +static inline void
> +thermal_zone_device_set_lock_class(struct thermal_zone_device *tz,
> + struct lock_class_key *lock_class)
> +{ }
> +
> static inline void thermal_zone_device_unregister(struct thermal_zone_device *tz)
> { }
>
next prev parent reply other threads:[~2025-06-25 9:44 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-24 13:24 [PATCH v2] thermal: use a custom lock class for intel x86_pkg_temp Benjamin Berg
2025-06-25 9:44 ` Hans de Goede [this message]
2025-06-25 10:15 ` Berg, Benjamin
2025-06-25 11:34 ` Hans de Goede
2025-06-25 13:23 ` Hans de Goede
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4225e2d2-4fd8-49ee-b0e2-97cd5737423b@kernel.org \
--to=hansg@kernel.org \
--cc=benjamin.berg@intel.com \
--cc=benjamin@sipsolutions.net \
--cc=linux-pm@vger.kernel.org \
--cc=rui.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox