Hi Benjamin, On 25-Jun-25 12:15 PM, Berg, Benjamin wrote: > On Wed, 2025-06-25 at 11:44 +0200, Hans de Goede wrote: >> Hi Benjamin, >> >> On 24-Jun-25 3:24 PM, Benjamin Berg wrote: >>> From: Benjamin Berg >>> >>> The intel driver has code paths that will take the tz->lock while the >>> cpuhp_state-up lock is held. As the cpuhp_state-up lock is used in other >>> code paths, it may happen that lockdep detects possible deadlocks >>> through unrelated thermal zone devices. >>> >>> Fix these false positives by using a separate lockdep class for the >>> x86_pkg_temp thermal device. >>> >>> Reported-by: Hans de Goede >>> Closes: https://lore.kernel.org/linux-pm/e9d7ef79-6a24-4515-aa35-d1f2357da798@kernel.org/ >>> Signed-off-by: Benjamin Berg >> >> Thank you for trying to fix this, the fix does get rid of cpuhp_state-up >> in the lockdep report, only for it to be replaced with the thermal_list_lock >> unfortunately (full lockdep report below). >> >> It seems that the real issue is trying to lock wiphy.mtx while holding >> tz->lock. >> >> Or maybe we need this patch + a patch to not hold wiphy.mtx while >> registering the thermalzone ? > > I think my patch simply did not work, because > thermal_zone_device_register_with_trips is already taking &tz->lock > before the lock class is changed. > > What might work is to instead pass the lock class in when registering > the thermal zone device. i.e. just add a new helper with an additional > argument for the lock class and set it right after the mutex_init call > for &tz->lock. I wrote the attached patch to give each call-site (so each tz driver) its own lock_class_key. Unfortunately this does not help and AFAICT the lockdep report is unchanged from your v2 patch. I'll put the new full lockdep report below. Regards, Hans [ 19.526864] ====================================================== [ 19.526866] WARNING: possible circular locking dependency detected [ 19.526868] 6.16.0-rc3+ #17 Tainted: G W [ 19.526871] ------------------------------------------------------ [ 19.526873] modprobe/893 is trying to acquire lock: [ 19.526875] ffff8c5c17570768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.526915] but task is already holding lock: [ 19.526916] ffff8c5bee900708 (&tz->lock#5){+.+.}-{4:4}, at: thermal_zone_device_set_mode+0x20/0xa0 [ 19.526929] which lock already depends on the new lock. [ 19.526931] the existing dependency chain (in reverse order) is: [ 19.526932] -> #5 (&tz->lock#5){+.+.}-{4:4}: [ 19.526938] __mutex_lock+0x9f/0xed0 [ 19.526944] thermal_zone_device_register_with_trips_and_lockkey+0x571/0x640 [ 19.526948] iwl_mld_thermal_zone_register+0x130/0x1e0 [iwlmld] [ 19.526967] iwl_op_mode_mld_start+0x460/0x4b0 [iwlmld] [ 19.526992] _iwl_op_mode_start+0x67/0x100 [iwlwifi] [ 19.527013] iwl_opmode_register+0x6b/0xc0 [iwlwifi] [ 19.527029] iwl_mld_init+0x19/0xff0 [iwlmld] [ 19.527045] do_one_initcall+0x54/0x390 [ 19.527051] do_init_module+0x62/0x240 [ 19.527056] __do_sys_init_module+0x164/0x190 [ 19.527058] do_syscall_64+0x94/0x3d0 [ 19.527062] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.527065] -> #4 (thermal_list_lock){+.+.}-{4:4}: [ 19.527069] __mutex_lock+0x9f/0xed0 [ 19.527072] __thermal_cooling_device_register.part.0+0x16e/0x2a0 [ 19.527076] acpi_processor_thermal_init+0x24/0xb0 [ 19.527082] acpi_soft_cpu_online+0xa7/0x140 [ 19.527085] cpuhp_invoke_callback+0x1ab/0x660 [ 19.527090] cpuhp_thread_fun+0x187/0x270 [ 19.527094] smpboot_thread_fn+0x12a/0x2e0 [ 19.527099] kthread+0x108/0x240 [ 19.527103] ret_from_fork+0x232/0x2a0 [ 19.527109] ret_from_fork_asm+0x1a/0x30 [ 19.527113] -> #3 (cpuhp_state-up){+.+.}-{0:0}: [ 19.527117] cpuhp_thread_fun+0x99/0x270 [ 19.527121] smpboot_thread_fn+0x12a/0x2e0 [ 19.527124] kthread+0x108/0x240 [ 19.527127] ret_from_fork+0x232/0x2a0 [ 19.527131] ret_from_fork_asm+0x1a/0x30 [ 19.527134] -> #2 (cpu_hotplug_lock){++++}-{0:0}: [ 19.527138] cpus_read_lock+0x3c/0xe0 [ 19.527142] static_key_slow_inc+0x12/0x30 [ 19.527148] __nf_register_net_hook+0xb7/0x210 [ 19.527154] nf_register_net_hook+0x2d/0x90 [ 19.527157] nf_tables_addchain.constprop.0+0x2dd/0x6f0 [nf_tables] [ 19.527180] nf_tables_newchain+0x78f/0xb10 [nf_tables] [ 19.527194] nfnetlink_rcv_batch+0x7a5/0xc50 [nfnetlink] [ 19.527199] nfnetlink_rcv+0x12d/0x150 [nfnetlink] [ 19.527202] netlink_unicast+0x1bf/0x2b0 [ 19.527205] netlink_sendmsg+0x211/0x430 [ 19.527207] ____sys_sendmsg+0x373/0x3b0 [ 19.527212] ___sys_sendmsg+0x7d/0xc0 [ 19.527216] __sys_sendmsg+0x5e/0xb0 [ 19.527218] do_syscall_64+0x94/0x3d0 [ 19.527221] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.527223] -> #1 (&nft_net->commit_mutex){+.+.}-{4:4}: [ 19.527228] __mutex_lock+0x9f/0xed0 [ 19.527230] nf_tables_netdev_event+0x59/0xc0 [nf_tables] [ 19.527246] notifier_call_chain+0x3d/0x100 [ 19.527250] register_netdevice+0x731/0x8f0 [ 19.527254] cfg80211_register_netdevice+0x4c/0xf0 [cfg80211] [ 19.527318] ieee80211_if_add+0x475/0x740 [mac80211] [ 19.527406] ieee80211_register_hw+0xd6b/0xdb0 [mac80211] [ 19.527453] iwl_op_mode_mld_start+0x438/0x4b0 [iwlmld] [ 19.527476] _iwl_op_mode_start+0x67/0x100 [iwlwifi] [ 19.527494] iwl_opmode_register+0x6b/0xc0 [iwlwifi] [ 19.527507] Adding alias for supply vdd-amp,(null) -> vdd-amp,sdw:0:0:01fa:4243:01 [ 19.527510] iwl_mld_init+0x19/0xff0 [iwlmld] [ 19.527526] do_one_initcall+0x54/0x390 [ 19.527530] do_init_module+0x62/0x240 [ 19.527533] __do_sys_init_module+0x164/0x190 [ 19.527535] do_syscall_64+0x94/0x3d0 [ 19.527538] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.527540] -> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}: [ 19.527545] __lock_acquire+0x1481/0x2270 [ 19.527550] lock_acquire+0xc9/0x2c0 [ 19.527554] __mutex_lock+0x9f/0xed0 [ 19.527557] iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527580] __thermal_zone_get_temp+0x29/0x90 [ 19.527584] __thermal_zone_device_update+0x69/0x480 [ 19.527588] thermal_zone_device_set_mode+0x52/0xa0 [ 19.527592] iwl_mld_thermal_zone_register+0x14b/0x1e0 [iwlmld] [ 19.527608] iwl_op_mode_mld_start+0x460/0x4b0 [iwlmld] [ 19.527626] _iwl_op_mode_start+0x67/0x100 [iwlwifi] [ 19.527642] iwl_opmode_register+0x6b/0xc0 [iwlwifi] [ 19.527658] iwl_mld_init+0x19/0xff0 [iwlmld] [ 19.527672] do_one_initcall+0x54/0x390 [ 19.527675] do_init_module+0x62/0x240 [ 19.527678] __do_sys_init_module+0x164/0x190 [ 19.527681] do_syscall_64+0x94/0x3d0 [ 19.527683] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.527685] other info that might help us debug this: [ 19.527687] Chain exists of: &rdev->wiphy.mtx --> thermal_list_lock --> &tz->lock#5 [ 19.527694] Possible unsafe locking scenario: [ 19.527695] CPU0 CPU1 [ 19.527697] ---- ---- [ 19.527698] lock(&tz->lock#5); [ 19.527701] lock(thermal_list_lock); [ 19.527704] lock(&tz->lock#5); [ 19.527707] lock(&rdev->wiphy.mtx); [ 19.527710] *** DEADLOCK *** [ 19.527711] 2 locks held by modprobe/893: [ 19.527714] #0: ffffffffc1a47c68 (iwlwifi_opmode_table_mtx){+.+.}-{4:4}, at: iwl_opmode_register+0x21/0xc0 [iwlwifi] [ 19.527735] #1: ffff8c5bee900708 (&tz->lock#5){+.+.}-{4:4}, at: thermal_zone_device_set_mode+0x20/0xa0 [ 19.527743] stack backtrace: [ 19.527747] CPU: 8 UID: 0 PID: 893 Comm: modprobe Tainted: G W 6.16.0-rc3+ #17 PREEMPT(lazy) [ 19.527751] Tainted: [W]=WARN [ 19.527752] Hardware name: Dell Inc. XPS 16 9640/09CK4V, BIOS 1.12.0 02/10/2025 [ 19.527754] Call Trace: [ 19.527756] [ 19.527757] dump_stack_lvl+0x68/0x90 [ 19.527762] print_circular_bug.cold+0x185/0x1d0 [ 19.527767] check_noncircular+0x10f/0x130 [ 19.527770] ? __kernel_text_address+0xe/0x30 [ 19.527773] ? unwind_get_return_address+0x26/0x50 [ 19.527778] __lock_acquire+0x1481/0x2270 [ 19.527784] lock_acquire+0xc9/0x2c0 [ 19.527787] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527806] __mutex_lock+0x9f/0xed0 [ 19.527809] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527824] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527839] ? lock_acquire+0xc9/0x2c0 [ 19.527843] ? thermal_zone_device_set_mode+0x20/0xa0 [ 19.527846] ? lock_acquire+0xd9/0x2c0 [ 19.527850] ? iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527864] iwl_mld_tzone_get_temp+0x2f/0x1d0 [iwlmld] [ 19.527885] ? lock_is_held_type+0xd5/0x140 [ 19.527889] __thermal_zone_get_temp+0x29/0x90 [ 19.527892] __thermal_zone_device_update+0x69/0x480 [ 19.527898] thermal_zone_device_set_mode+0x52/0xa0 [ 19.527901] iwl_mld_thermal_zone_register+0x14b/0x1e0 [iwlmld] [ 19.527920] iwl_op_mode_mld_start+0x460/0x4b0 [iwlmld] [ 19.527943] _iwl_op_mode_start+0x67/0x100 [iwlwifi] [ 19.527963] iwl_opmode_register+0x6b/0xc0 [iwlwifi] [ 19.527979] ? __pfx_iwl_mld_init+0x10/0x10 [iwlmld] [ 19.527994] iwl_mld_init+0x19/0xff0 [iwlmld] [ 19.528008] do_one_initcall+0x54/0x390 [ 19.528013] do_init_module+0x62/0x240 [ 19.528015] ? __do_sys_init_module+0x164/0x190 [ 19.528018] __do_sys_init_module+0x164/0x190 [ 19.528025] do_syscall_64+0x94/0x3d0 [ 19.528028] ? lock_acquire+0xc9/0x2c0 [ 19.528032] ? find_held_lock+0x2b/0x80 [ 19.528035] ? rcu_read_unlock+0x17/0x60 [ 19.528040] ? lock_release+0x1a0/0x2d0 [ 19.528043] ? find_held_lock+0x2b/0x80 [ 19.528045] ? exc_page_fault+0x8c/0x240 [ 19.528047] ? lock_release+0x1a0/0x2d0 [ 19.528052] ? do_user_addr_fault+0x370/0x6b0 [ 19.528056] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.528058] RIP: 0033:0x7f7cf7f02bae [ 19.528063] Code: 48 8b 0d 5d 32 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2a 32 0f 00 f7 d8 64 89 01 48 [ 19.528065] RSP: 002b:00007ffeadb3b848 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 19.528068] RAX: ffffffffffffffda RBX: 00005560c13fae60 RCX: 00007f7cf7f02bae [ 19.528070] RDX: 00005560a17a75ee RSI: 00000000000cbfb9 RDI: 00005560c1fa9cd0 [ 19.528071] RBP: 00007ffeadb3b900 R08: 00005560c13fad40 R09: 00007f7cf7ff6ac0 [ 19.528072] R10: 00005560c13fa010 R11: 0000000000000246 R12: 0000000000040000 [ 19.528073] R13: 00005560c13fadc0 R14: 00005560a17a75ee R15: 0000000000000000 [ 19.528078] [ 19.528285] iwlwifi 0000:03:00.0: Registered PHC clock: iwlwifi-PTP, with index: 0