public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed
* 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
@ 2026-02-23 22:36 Ben Greear
  2026-02-27 16:31 ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-02-23 22:36 UTC (permalink / raw)
  To: linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm

Hello,

I hit a deadlock related to CMA memory allocation: one thread is attempting to
flush all work while holding a wifi-related mutex, while a work queue is
attempting to process a wifi regdomain work item.  I really don't see any good
way to fix this; it would seem that any code holding a mutex that could block a
work queue cannot safely allocate CMA memory?  Hopefully someone else has a
better idea.
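To make the shape of the hang concrete, here is a minimal userspace sketch (Python; names like `reg_todo` and `wiphy_mtx` are illustrative stand-ins, not kernel APIs): one thread plays the workqueue, the main thread holds the mutex and then "flushes" pending work, and the flush can never finish because the queued item is itself blocked on that same mutex.  The waits are bounded only so the demo terminates instead of actually hanging.

```python
import threading
import queue

work_q = queue.Queue()          # stands in for the shared workqueue
wiphy_mtx = threading.Lock()    # stands in for rtnl / wiphy.mtx
reg_done = threading.Event()

def worker():
    # Single workqueue thread: processes items one at a time.
    while True:
        fn = work_q.get()
        if fn is None:
            break
        fn()

def reg_todo():
    # Work item that needs the mutex (like reg_todo taking rtnl/wiphy.mtx).
    # Bounded acquire so the demo terminates rather than blocking forever.
    if wiphy_mtx.acquire(timeout=2.0):
        wiphy_mtx.release()
    reg_done.set()

threading.Thread(target=worker, daemon=True).start()

with wiphy_mtx:                 # the "ip" path: mutex held...
    work_q.put(reg_todo)        # ...a reg work item is pending...
    # ...and the CMA path now "flushes", i.e. waits for pending work.
    # A bounded wait; an unbounded flush here would never return.
    flush_completed = reg_done.wait(timeout=0.5)

print("flush completed while holding the mutex:", flush_completed)  # → False
```

Once the mutex is dropped, the stuck work item completes normally, which matches the observation that nothing is broken except the ordering.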

For whatever reason, my hacked-up kernel will print the sysrq process stack
traces I need to understand this, and my stable 6.18.13 will not.  But the
locks held match in both cases, so this is almost certainly the same problem.
I can reproduce it on both unmodified stable and my own tree.  The details
below are from my modified 6.18.9+ kernel.

I only hit this (reliably?) with a KASAN-enabled kernel, likely because KASAN
makes things slow enough to hit the problem and/or causes CMA allocations in a
different manner.

The general way to reproduce is to have a large number of Intel BE200 radios
in a system and bring them admin up and down.


## From 6.18.13 (un-modified)

40479 Feb 23 14:13:31 ct523c-de7c kernel: 5 locks held by kworker/u32:11/34989:
40480 Feb 23 14:13:31 ct523c-de7c kernel:  #0: ffff888120161148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0xf7a/0x17b0
40481 Feb 23 14:13:31 ct523c-de7c kernel:  #1: ffff8881a561fd20 ((work_completion)(&rdev->wiphy_work)){+.+.}-{0:0}, at: process_one_work+0x7ca/0x17b0
40482 Feb 23 14:13:31 ct523c-de7c kernel:  #2: ffff88815e618788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: cfg80211_wiphy_work+0x5c/0x570 [cfg80211]
40483 Feb 23 14:13:31 ct523c-de7c kernel:  #3: ffffffff87232e60 (&cma->alloc_mutex){+.+.}-{4:4}, at: __cma_alloc+0x3c5/0xd20
40484 Feb 23 14:13:31 ct523c-de7c kernel:  #4: ffffffff8534f668 (lock#5){+.+.}-{4:4}, at: __lru_add_drain_all+0x5f/0x530

40488 Feb 23 14:13:31 ct523c-de7c kernel: 4 locks held by kworker/1:0/39480:
40489 Feb 23 14:13:31 ct523c-de7c kernel:  #0: ffff88812006b148 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xf7a/0x17b0
40490 Feb 23 14:13:31 ct523c-de7c kernel:  #1: ffff88814087fd20 (reg_work){+.+.}-{0:0}, at: process_one_work+0x7ca/0x17b0
40491 Feb 23 14:13:31 ct523c-de7c kernel:  #2: ffffffff85970028 (rtnl_mutex){+.+.}-{4:4}, at: reg_todo+0x18/0x770 [cfg80211]
40492 Feb 23 14:13:31 ct523c-de7c kernel:  #3: ffff88815e618788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: reg_process_self_managed_hints+0x70/0x190 [cfg80211]


## The rest of this is from my hacked 6.18.9+ kernel.

### The thread trying to allocate CMA is blocked here, trying to flush work.

Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux...
(gdb) l *(alloc_contig_range_noprof+0x1de)
0xffffffff8162453e is in alloc_contig_range_noprof (/home2/greearb/git/linux-6.18.dev.y/mm/page_alloc.c:6798).
6793			.reason = MR_CONTIG_RANGE,
6794		};
6795	
6796		lru_cache_disable();
6797	
6798		while (pfn < end || !list_empty(&cc->migratepages)) {
6799			if (fatal_signal_pending(current)) {
6800				ret = -EINTR;
6801				break;
6802			}
(gdb) l *(__lru_add_drain_all+0x19b)
0xffffffff815ae44b is in __lru_add_drain_all (/home2/greearb/git/linux-6.18.dev.y/mm/swap.c:884).
879				queue_work_on(cpu, mm_percpu_wq, work);
880				__cpumask_set_cpu(cpu, &has_work);
881			}
882		}
883	
884		for_each_cpu(cpu, &has_work)
885			flush_work(&per_cpu(lru_add_drain_work, cpu));
886	
887	done:
888		mutex_unlock(&lock);
(gdb)


### The other thread is trying to process a regdom request, and trying to use RCU and RTNL???

Type "apropos word" to search for commands related to "word"...
Reading symbols from net/wireless/cfg80211.ko...
(gdb) l *(reg_todo+0x18)
0xe238 is in reg_todo (/home2/greearb/git/linux-6.18.dev.y/net/wireless/reg.c:3107).
3102	 */
3103	static void reg_process_pending_hints(void)
3104	{
3105		struct regulatory_request *reg_request, *lr;
3106	
3107		lr = get_last_request();
3108	
3109		/* When last_request->processed becomes true this will be rescheduled */
3110		if (lr && !lr->processed) {
3111			pr_debug("Pending regulatory request, waiting for it to be processed...\n");
(gdb)

static struct regulatory_request *get_last_request(void)
{
	return rcu_dereference_rtnl(last_request);
}


task:kworker/6:0     state:D stack:0     pid:56    tgid:56    ppid:2      task_flags:0x4208060 flags:0x00080000
Workqueue: events reg_todo [cfg80211]
Call Trace:
  <TASK>
  __schedule+0x526/0x1290
  preempt_schedule_notrace+0x35/0x50
  preempt_schedule_notrace_thunk+0x16/0x30
  rcu_is_watching+0x2a/0x30
  lock_acquire+0x26d/0x2c0
  schedule+0xac/0x120
  ? schedule+0x8d/0x120
  schedule_preempt_disabled+0x11/0x20
  __mutex_lock+0x726/0x1070
  ? reg_todo+0x18/0x2b0 [cfg80211]
  ? reg_todo+0x18/0x2b0 [cfg80211]
  reg_todo+0x18/0x2b0 [cfg80211]
  process_one_work+0x221/0x6d0
  worker_thread+0x1e5/0x3b0
  ? rescuer_thread+0x450/0x450
  kthread+0x108/0x220
  ? kthreads_online_cpu+0x110/0x110
  ret_from_fork+0x1c6/0x220
  ? kthreads_online_cpu+0x110/0x110
  ret_from_fork_asm+0x11/0x20
  </TASK>

task:ip              state:D stack:0     pid:72857 tgid:72857 ppid:72843  task_flags:0x400100 flags:0x00080001
Call Trace:
  <TASK>
  __schedule+0x526/0x1290
  ? schedule+0x8d/0x120
  ? schedule+0xe2/0x120
  schedule+0x36/0x120
  schedule_timeout+0xf9/0x110
  ? mark_held_locks+0x40/0x70
  __wait_for_common+0xbe/0x1e0
  ? hrtimer_nanosleep_restart+0x120/0x120
  ? __flush_work+0x20b/0x530
  __flush_work+0x34e/0x530
  ? flush_workqueue_prep_pwqs+0x160/0x160
  ? bpf_prog_test_run_tracing+0x160/0x2d0
  __lru_add_drain_all+0x19b/0x220
  alloc_contig_range_noprof+0x1de/0x8a0
  __cma_alloc+0x1f1/0x6a0
  __dma_direct_alloc_pages.isra.0+0xcb/0x2f0
  dma_direct_alloc+0x7b/0x250
  dma_alloc_attrs+0xa1/0x2a0
  _iwl_pcie_ctxt_info_dma_alloc_coherent+0x31/0xb0 [iwlwifi]
  iwl_pcie_ctxt_info_alloc_dma+0x20/0x50 [iwlwifi]
  iwl_pcie_init_fw_sec+0x2fc/0x380 [iwlwifi]
  iwl_pcie_ctxt_info_v2_alloc+0x19e/0x530 [iwlwifi]
  iwl_trans_pcie_gen2_start_fw+0x2e2/0x820 [iwlwifi]
  ? lock_is_held_type+0x92/0x100
  iwl_trans_start_fw+0x77/0x90 [iwlwifi]
  iwl_mld_load_fw_wait_alive+0x97/0x2c0 [iwlmld]
  ? iwl_mld_mac80211_sta_state+0x780/0x780 [iwlmld]
  ? lock_is_held_type+0x92/0x100
  iwl_mld_load_fw+0x91/0x240 [iwlmld]
  ? ieee80211_open+0x3d/0xe0 [mac80211]
  ? lock_is_held_type+0x92/0x100
  iwl_mld_start_fw+0x44/0x470 [iwlmld]
  iwl_mld_mac80211_start+0x3d/0x1b0 [iwlmld]
  drv_start+0x6f/0x1d0 [mac80211]
  ieee80211_do_open+0x2d6/0x960 [mac80211]
  ieee80211_open+0x62/0xe0 [mac80211]
  __dev_open+0x11a/0x2e0
  __dev_change_flags+0x1f8/0x280
  netif_change_flags+0x22/0x60
  do_setlink.isra.0+0xe57/0x11a0
  ? __mutex_lock+0xb0/0x1070
  ? __mutex_lock+0x99e/0x1070
  ? __nla_validate_parse+0x5e/0xcd0
  ? rtnl_newlink+0x355/0xb50
  ? cap_capable+0x90/0x100
  ? security_capable+0x72/0x80
  rtnl_newlink+0x7e8/0xb50
  ? __lock_acquire+0x436/0x2190
  ? lock_acquire+0xc2/0x2c0
  ? rtnetlink_rcv_msg+0x97/0x660
  ? find_held_lock+0x2b/0x80
  ? do_setlink.isra.0+0x11a0/0x11a0
  ? rtnetlink_rcv_msg+0x3ea/0x660
  ? lock_release+0xcc/0x290
  ? do_setlink.isra.0+0x11a0/0x11a0
  rtnetlink_rcv_msg+0x409/0x660
  ? rtnl_fdb_dump+0x240/0x240
  netlink_rcv_skb+0x56/0x100
  netlink_unicast+0x1e1/0x2d0
  netlink_sendmsg+0x219/0x460
  __sock_sendmsg+0x38/0x70
  ____sys_sendmsg+0x214/0x280
  ? import_iovec+0x2c/0x30
  ? copy_msghdr_from_user+0x6c/0xa0
  ___sys_sendmsg+0x85/0xd0
  ? __lock_acquire+0x436/0x2190
  ? find_held_lock+0x2b/0x80
  ? lock_acquire+0xc2/0x2c0
  ? mntput_no_expire+0x43/0x460
  ? find_held_lock+0x2b/0x80
  ? mntput_no_expire+0x8c/0x460
  __sys_sendmsg+0x6b/0xc0
  do_syscall_64+0x6b/0x11b0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-02-23 22:36 6.18.13 iwlwifi deadlock allocating cma while work-item is active Ben Greear
@ 2026-02-27 16:31 ` Ben Greear
  2026-03-01 15:38   ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-02-27 16:31 UTC (permalink / raw)
  To: linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm

On 2/23/26 14:36, Ben Greear wrote:
> Hello,
> 
> I hit a deadlock related to CMA memory allocation: one thread is attempting to
> flush all work while holding a wifi-related mutex, while a work queue is
> attempting to process a wifi regdomain work item.  I really don't see any good
> way to fix this; it would seem that any code holding a mutex that could block a
> work queue cannot safely allocate CMA memory?  Hopefully someone else has a
> better idea.

I tried using a kthread to do the regulatory domain processing instead of a
work item, and that seems to have solved the problem.  If that seems like a
reasonable approach to the wifi stack folks, I can post a patch.
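For what it's worth, a minimal sketch of why a dedicated thread sidesteps the cycle (Python, illustrative names only, not the actual cfg80211 symbols): once the regulatory work has its own queue and thread, flushing the shared workqueue while holding the mutex no longer has to wait on the mutex-blocked regulatory item.

```python
import threading
import queue

shared_q = queue.Queue()        # the workqueue that gets flushed
reg_q = queue.Queue()           # consumed only by the dedicated thread
wiphy_mtx = threading.Lock()

def reg_thread():
    # Dedicated "kthread": drains only the regulatory queue.
    while True:
        fn = reg_q.get()
        if fn is None:
            break
        fn()

def reg_todo():
    # Still blocks on the mutex, but nobody flushing shared_q cares.
    if wiphy_mtx.acquire(timeout=2.0):
        wiphy_mtx.release()

threading.Thread(target=reg_thread, daemon=True).start()

with wiphy_mtx:
    reg_q.put(reg_todo)         # reg work blocks on the mutex...
    # ...but flushing the shared queue only waits for shared_q items,
    # which do not depend on wiphy_mtx, so it returns immediately.
    shared_q.join()
    flushed = True

print("shared-queue flush completed:", flushed)  # → True
```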

Thanks,
Ben



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-02-27 16:31 ` Ben Greear
@ 2026-03-01 15:38   ` Ben Greear
  2026-03-02  8:07     ` Johannes Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-01 15:38 UTC (permalink / raw)
  To: linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm

On 2/27/26 08:31, Ben Greear wrote:
> On 2/23/26 14:36, Ben Greear wrote:
>> Hello,
>>
>> I hit a deadlock related to CMA memory allocation: one thread is attempting to
>> flush all work while holding a wifi-related mutex, while a work queue is
>> attempting to process a wifi regdomain work item.  I really don't see any good
>> way to fix this; it would seem that any code holding a mutex that could block a
>> work queue cannot safely allocate CMA memory?  Hopefully someone else has a
>> better idea.
> 
> I tried using a kthread to do the regulatory domain processing instead of a
> work item, and that seems to have solved the problem.  If that seems like a
> reasonable approach to the wifi stack folks, I can post a patch.

The other net/wireless work-item 'disconnect_work' also needs to be moved to the kthread
for the same reason....

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-01 15:38   ` Ben Greear
@ 2026-03-02  8:07     ` Johannes Berg
  2026-03-02 15:26       ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Berg @ 2026-03-02  8:07 UTC (permalink / raw)
  To: Ben Greear, linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm

On Sun, 2026-03-01 at 07:38 -0800, Ben Greear wrote:
> On 2/27/26 08:31, Ben Greear wrote:
> > On 2/23/26 14:36, Ben Greear wrote:
> > > Hello,
> > > 
> > > I hit a deadlock related to CMA memory allocation: one thread is attempting to
> > > flush all work while holding a wifi-related mutex, while a work queue is
> > > attempting to process a wifi regdomain work item.  I really don't see any good
> > > way to fix this; it would seem that any code holding a mutex that could block a
> > > work queue cannot safely allocate CMA memory?  Hopefully someone else has a
> > > better idea.
> > 
> > I tried using a kthread to do the regulatory domain processing instead of a
> > work item, and that seems to have solved the problem.  If that seems like a
> > reasonable approach to the wifi stack folks, I can post a patch.
> 
> The other net/wireless work-item 'disconnect_work' also needs to be moved to the kthread
> for the same reason....

I don't think we want to use a kthread for this, it doesn't really make
sense.

Was this with lockdep? If so, did it complain about anything?

I'm having a hard time seeing why it would deadlock at all when wifi
uses  schedule_work() and therefore the system_percpu_wq, and
__lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
to RTNL etc.?

I think we need a real explanation here rather than "if I randomly
change this, it no longer appears".

johannes

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-02  8:07     ` Johannes Berg
@ 2026-03-02 15:26       ` Ben Greear
  2026-03-02 15:38         ` Johannes Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-02 15:26 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm

On 3/2/26 00:07, Johannes Berg wrote:
> On Sun, 2026-03-01 at 07:38 -0800, Ben Greear wrote:
>> On 2/27/26 08:31, Ben Greear wrote:
>>> On 2/23/26 14:36, Ben Greear wrote:
>>>> Hello,
>>>>
>>>> I hit a deadlock related to CMA memory allocation: one thread is attempting to
>>>> flush all work while holding a wifi-related mutex, while a work queue is
>>>> attempting to process a wifi regdomain work item.  I really don't see any good
>>>> way to fix this; it would seem that any code holding a mutex that could block a
>>>> work queue cannot safely allocate CMA memory?  Hopefully someone else has a
>>>> better idea.
>>>
>>> I tried using a kthread to do the regulatory domain processing instead of a
>>> work item, and that seems to have solved the problem.  If that seems like a
>>> reasonable approach to the wifi stack folks, I can post a patch.
>>
>> The other net/wireless work-item 'disconnect_work' also needs to be moved to the kthread
>> for the same reason....
> 
> I don't think we want to use a kthread for this, it doesn't really make
> sense.
> 
> Was this with lockdep? If so, did it complain about anything?
> 
> I'm having a hard time seeing why it would deadlock at all when wifi
> uses  schedule_work() and therefore the system_percpu_wq, and
> __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
> lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
> to RTNL etc.?
> 
> I think we need a real explanation here rather than "if I randomly
> change this, it no longer appears".

The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
allocating CMA memory, as expected.

And the CMA allocation path attempts to flush the work queues in
at least some cases.

If there is a work item queued that is trying to grab rtnl and/or wiphy lock
when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.

Lockdep doesn't warn about this.
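Spelled out as a "waits for" graph, the cycle described above is easy to check mechanically.  This is only a sketch with illustrative node names (whether the middle edge, the flush waiting on the reg work item, really exists is the open question in this thread):

```python
# The "waits for" edges in the reported hang, as laid out above
# (node names are illustrative labels, not kernel symbols):
deps = {
    "wiphy.mtx holder": ["flush of pending work"],  # CMA alloc drains LRU work
    "flush of pending work": ["reg_work"],          # flush waits on queued items
    "reg_work": ["wiphy.mtx holder"],               # reg_todo takes rtnl/wiphy.mtx
}

def has_cycle(graph):
    """Plain DFS cycle detection over a 'waits for' graph."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True          # back-edge: a wait cycle exists
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in graph)

print(has_cycle(deps))  # → True: the flush can never complete
```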

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-02 15:26       ` Ben Greear
@ 2026-03-02 15:38         ` Johannes Berg
  2026-03-02 15:50           ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Berg @ 2026-03-02 15:38 UTC (permalink / raw)
  To: Ben Greear, linux-wireless; +Cc: Korenblit, Miriam Rachel, linux-mm, Tejun Heo

On Mon, 2026-03-02 at 07:26 -0800, Ben Greear wrote:
> 
> > 
> > Was this with lockdep? If so, did it complain about anything?
> > 
> > I'm having a hard time seeing why it would deadlock at all when wifi
> > uses  schedule_work() and therefore the system_percpu_wq, and
> > __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
> > lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
> > to RTNL etc.?
> > 
> > I think we need a real explanation here rather than "if I randomly
> > change this, it no longer appears".
> 
> The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
> allocating CMA memory, as expected.
> 
> And the CMA allocation path attempts to flush the work queues in
> at least some cases.
> 
> If there is a work item queued that is trying to grab rtnl and/or wiphy lock
> when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.
> 
> Lockdep doesn't warn about this.

It really should, in cases where it can actually happen, I wrote the
code myself for that... Though things have changed since, and the checks
were lost at least once (and re-added), so I suppose it's possible that
they were lost _again_, but the flushing system is far more flexible now
and it's not flushing the same workqueue anyway, so it shouldn't happen.

I stand by what I said before, need to show more precisely what depends
on what, and I'm not going to accept a random kthread into this.

johannes


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-02 15:38         ` Johannes Berg
@ 2026-03-02 15:50           ` Ben Greear
  2026-03-03 11:49             ` Johannes Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-02 15:50 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless
  Cc: Korenblit, Miriam Rachel, linux-mm, Tejun Heo

On 3/2/26 07:38, Johannes Berg wrote:
> On Mon, 2026-03-02 at 07:26 -0800, Ben Greear wrote:
>>
>>>
>>> Was this with lockdep? If so, it complain about anything?
>>>
>>> I'm having a hard time seeing why it would deadlock at all when wifi
>>> uses  schedule_work() and therefore the system_percpu_wq, and
>>> __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
>>> lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
>>> to RTNL etc.?
>>>
>>> I think we need a real explanation here rather than "if I randomly
>>> change this, it no longer appears".
>>
>> The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
>> allocating CMA memory, as expected.
>>
>> And the CMA allocation path attempts to flush the work queues in
>> at least some cases.
>>
>> If there is a work item queued that is trying to grab rtnl and/or wiphy lock
>> when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.
>>
>> Lockdep doesn't warn about this.
> 
> It really should, in cases where it can actually happen, I wrote the
> code myself for that... Though things have changed since, and the checks
> were lost at least once (and re-added), so I suppose it's possible that
> they were lost _again_, but the flushing system is far more flexible now
> and it's not flushing the same workqueue anyway, so it shouldn't happen.
> 
> I stand by what I said before, need to show more precisely what depends
> on what, and I'm not going to accept a random kthread into this.

My first email on the topic has process stack traces as well as a lockdep
locks-held printout that points to the deadlock.  I'm not sure what else to
offer... please let me know what you'd like to see.

Thanks,
Ben


> 
> johannes
> 

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-02 15:50           ` Ben Greear
@ 2026-03-03 11:49             ` Johannes Berg
  2026-03-03 20:52               ` Tejun Heo
  2026-03-04  3:08               ` Hillf Danton
  0 siblings, 2 replies; 22+ messages in thread
From: Johannes Berg @ 2026-03-03 11:49 UTC (permalink / raw)
  To: Ben Greear, linux-wireless
  Cc: Korenblit, Miriam Rachel, linux-mm, Tejun Heo, linux-kernel

On Mon, 2026-03-02 at 07:50 -0800, Ben Greear wrote:
> On 3/2/26 07:38, Johannes Berg wrote:
> > On Mon, 2026-03-02 at 07:26 -0800, Ben Greear wrote:
> > > 
> > > > 
> > > > Was this with lockdep? If so, it complain about anything?
> > > > 
> > > > I'm having a hard time seeing why it would deadlock at all when wifi
> > > > uses  schedule_work() and therefore the system_percpu_wq, and
> > > > __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
> > > > lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
> > > > to RTNL etc.?
> > > > 
> > > > I think we need a real explanation here rather than "if I randomly
> > > > change this, it no longer appears".
> > > 
> > > The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
> > > allocating CMA memory, as expected.
> > > 
> > > And the CMA allocation path attempts to flush the work queues in
> > > at least some cases.
> > > 
> > > If there is a work item queued that is trying to grab rtnl and/or wiphy lock
> > > when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.
> > > 
> > > Lockdep doesn't warn about this.
> > 
> > It really should, in cases where it can actually happen, I wrote the
> > code myself for that... Though things have changed since, and the checks
> > were lost at least once (and re-added), so I suppose it's possible that
> > they were lost _again_, but the flushing system is far more flexible now
> > and it's not flushing the same workqueue anyway, so it shouldn't happen.
> > 
> > I stand by what I said before, need to show more precisely what depends
> > on what, and I'm not going to accept a random kthread into this.
> 
> My first email on the topic has process stack traces as well as lockdep
> locks-held printout that points to the deadlock.  I'm not sure what else to offer...please let me know
> what you'd like to see.

Fair. I don't know; I don't think there's anything that even shows a
dependency between the two workqueues,
"((wq_completion)events_unbound)" and "((wq_completion)events)", and
there would have to be one for it to deadlock this way?

But one is mm_percpu_wq and the other is system_percpu_wq.

Tejun, does the workqueue code somehow introduce a dependency between
different per-CPU workqueues that's not modelled in lockdep?

johannes


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 11:49             ` Johannes Berg
@ 2026-03-03 20:52               ` Tejun Heo
  2026-03-03 21:03                 ` Johannes Berg
  2026-03-03 21:12                 ` Johannes Berg
  2026-03-04  3:08               ` Hillf Danton
  1 sibling, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2026-03-03 20:52 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Ben Greear, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

Hello,

On Tue, Mar 03, 2026 at 12:49:24PM +0100, Johannes Berg wrote:
> Fair. I don't know, I don't think there's anything that even shows that
> there's a dependency between the two workqueues and the
> "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> there would have to be for it to deadlock this way because of that?
> 
> But one is mm_percpu_wq and the other is system_percpu_wq.
> 
> Tejun, does the workqueue code somehow introduce a dependency between
> different per-CPU workqueues that's not modelled in lockdep?

Hopefully not. Kinda late to the party. Why isn't mm_percpu_wq making
forward progress? It should in all circumstances. What are the work item and
kworker doing?

Thanks.

-- 
tejun


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 20:52               ` Tejun Heo
@ 2026-03-03 21:03                 ` Johannes Berg
  2026-03-03 21:12                 ` Johannes Berg
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Berg @ 2026-03-03 21:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ben Greear, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

On Tue, 2026-03-03 at 10:52 -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 03, 2026 at 12:49:24PM +0100, Johannes Berg wrote:
> > Fair. I don't know, I don't think there's anything that even shows that
> > there's a dependency between the two workqueues and the
> > "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> > there would have to be for it to deadlock this way because of that?
> > 
> > But one is mm_percpu_wq and the other is system_percpu_wq.
> > 
> > Tejun, does the workqueue code somehow introduce a dependency between
> > different per-CPU workqueues that's not modelled in lockdep?
> 
> Hopefully not. Kinda late to the party.

Yeah, sorry, should've included a link:
https://lore.kernel.org/linux-wireless/fa4e82ee-eb14-3930-c76c-f3bd59c5f258@candelatech.com/

> Why isn't mm_percpu_wq making
> forward progress? That should in all circumstances. What's the work item and
> kworker doing?

So it seems that first iwlwifi is holding the RTNL:

  ieee80211_open+0x62/0xe0 [mac80211]
  __dev_open+0x11a/0x2e0
  __dev_change_flags+0x1f8/0x280
  netif_change_flags+0x22/0x60
  do_setlink.isra.0+0xe57/0x11a0
  rtnl_newlink+0x7e8/0xb50

(last stack trace at the above link)
This stuff definitely happens with the RTNL held, although I didn't
check now which function actually acquires it in this stack.

Simultaneously the kworker/6:0 is stuck in reg_todo(), trying to acquire
the RTNL.

So far that seems fairly normal. The kworker/6:0 running reg_todo()
is from net/wireless/reg.c, reg_work, scheduled to system_percpu_wq (by
simply schedule_work.)

Now iwlwifi is also trying to allocate coherent DMA memory (continuing
the stack trace), potentially a significant chunk for firmware loading:

  dma_direct_alloc+0x7b/0x250
  dma_alloc_attrs+0xa1/0x2a0
  _iwl_pcie_ctxt_info_dma_alloc_coherent+0x31/0xb0 [iwlwifi]
  iwl_pcie_ctxt_info_alloc_dma+0x20/0x50 [iwlwifi]
  iwl_pcie_init_fw_sec+0x2fc/0x380 [iwlwifi]
  iwl_pcie_ctxt_info_v2_alloc+0x19e/0x530 [iwlwifi]
  iwl_trans_pcie_gen2_start_fw+0x2e2/0x820 [iwlwifi]
  iwl_trans_start_fw+0x77/0x90 [iwlwifi]
  iwl_mld_load_fw_wait_alive+0x97/0x2c0 [iwlmld]
  iwl_mld_load_fw+0x91/0x240 [iwlmld]
  iwl_mld_start_fw+0x44/0x470 [iwlmld]
  iwl_mld_mac80211_start+0x3d/0x1b0 [iwlmld]
  drv_start+0x6f/0x1d0 [mac80211]
  ieee80211_do_open+0x2d6/0x960 [mac80211]
  ieee80211_open+0x62/0xe0 [mac80211]

This is fine, but then it gets into __flush_work() in
__lru_add_drain_all():

  __flush_work+0x34e/0x530
  __lru_add_drain_all+0x19b/0x220
  alloc_contig_range_noprof+0x1de/0x8a0
  __cma_alloc+0x1f1/0x6a0
  __dma_direct_alloc_pages.isra.0+0xcb/0x2f0
  dma_direct_alloc+0x7b/0x250

which is because __lru_add_drain_all() schedules a bunch of workers, one
for each CPU, onto the mm_percpu_wq and then waits for them.

Conceptually, I see nothing wrong with this, hence my question; Ben says
that the system stops making progress at this point.
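For what it's worth, the stall Ben reports is what you'd get if the flush target ends up queued behind the mutex-blocked item in a pool that cannot field another worker. A userspace sketch of that head-of-line effect, using Python threads as stand-in "workers" (this models the suspected failure mode, not documented cmwq behavior, since cmwq normally spawns an extra worker when one blocks):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

rtnl = threading.Lock()       # stand-in for rtnl_mutex
drained = threading.Event()   # set when the "drain" item has run

def reg_todo():
    with rtnl:                # blocks: the "flusher" holds the lock
        pass

def lru_drain():
    drained.set()

pool = ThreadPoolExecutor(max_workers=1)  # one worker, like a stuck per-CPU pool

rtnl.acquire()                # "flusher" context holds rtnl...
pool.submit(reg_todo)         # ...while reg_todo waits for it on the only worker
pool.submit(lru_drain)        # flush target queued behind reg_todo
stalled = not drained.wait(timeout=0.5)   # the flush cannot complete
rtnl.release()                # break the cycle
completed = drained.wait(timeout=2)       # now the drain runs
pool.shutdown(wait=True)
print(stalled, completed)     # prints: True True
```

With concurrency of one, the drain item never runs until reg_todo releases the worker, and reg_todo never runs to completion until the flusher drops rtnl, so a flush_work() from the rtnl holder hangs.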

johannes


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 20:52               ` Tejun Heo
  2026-03-03 21:03                 ` Johannes Berg
@ 2026-03-03 21:12                 ` Johannes Berg
  2026-03-03 21:40                   ` Ben Greear
  1 sibling, 1 reply; 22+ messages in thread
From: Johannes Berg @ 2026-03-03 21:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ben Greear, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

On Tue, 2026-03-03 at 10:52 -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 03, 2026 at 12:49:24PM +0100, Johannes Berg wrote:
> > Fair. I don't know, I don't think there's anything that even shows that
> > there's a dependency between the two workqueues and the
> > "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> > there would have to be for it to deadlock this way because of that?
> > 
> > But one is mm_percpu_wq and the other is system_percpu_wq.
> > 
> > Tejun, does the workqueue code somehow introduce a dependency between
> > different per-CPU workqueues that's not modelled in lockdep?
> 
> Hopefully not. Kinda late to the party. Why isn't mm_percpu_wq making
> forward progress? That should in all circumstances. What's the work item and
> kworker doing?

Oh and in addition: the worker that's kicked off by
__lru_add_drain_all() doesn't really seem to do anything long-running?
It's lru_add_drain_per_cpu(), which is lru_add_and_bh_lrus_drain(),
which would appear to be entirely non-sleepable code (holding either
local locks or having irqs disabled.) It also doesn't show up in the
log, apparently, hence my question about strange dependencies.

johannes


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 21:12                 ` Johannes Berg
@ 2026-03-03 21:40                   ` Ben Greear
  2026-03-03 21:54                     ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-03 21:40 UTC (permalink / raw)
  To: Johannes Berg, Tejun Heo
  Cc: linux-wireless, Korenblit, Miriam Rachel, linux-mm, linux-kernel

On 3/3/26 13:12, Johannes Berg wrote:
> On Tue, 2026-03-03 at 10:52 -1000, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Mar 03, 2026 at 12:49:24PM +0100, Johannes Berg wrote:
>>> Fair. I don't know, I don't think there's anything that even shows that
>>> there's a dependency between the two workqueues and the
>>> "((wq_completion)events_unbound)" and "((wq_completion)events)", and
>>> there would have to be for it to deadlock this way because of that?
>>>
>>> But one is mm_percpu_wq and the other is system_percpu_wq.
>>>
>>> Tejun, does the workqueue code somehow introduce a dependency between
>>> different per-CPU workqueues that's not modelled in lockdep?
>>
>> Hopefully not. Kinda late to the party. Why isn't mm_percpu_wq making
>> forward progress? That should in all circumstances. What's the work item and
>> kworker doing?
> 
> Oh and in addition: the worker that's kicked off by
> __lru_add_drain_all() doesn't really seem to do anything long-running?
> It's lru_add_drain_per_cpu(), which is lru_add_and_bh_lrus_drain(),
> which would appear to be entirely non-sleepable code (holding either
> local locks or having irqs disabled.) It also doesn't show up in the
> log, apparently, hence my question about strange dependencies.

Hello Tejun,

If I use a kthread to do the blocking reg_todo work, then the problem
goes away, so it does appear that the work-flush logic down in swap.c
is being blocked by the reg_todo work item, not just the swap.c
logic blocking against itself.

My kthread hack left the reg_todo work item logic in place, but instead of
the work item doing any blocking work, it instead just wakes the kthread
I added and has that kthread do the work under mutex.

The second regulatory related work item in net/wireless/ causes the same
lockup, though it was harder to reproduce.  Putting that work in the kthread
also seems to have fixed it.

I could only ever reproduce this with KASAN (and lockdep and other debugging options
enabled), my guess is that this is because then the system runs slower and/or there
is more memory pressure.

I should still be able to reproduce this if I switch to upstream kernel, so
if there is any debugging code you'd like me to execute, I will attempt to
do so.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 21:40                   ` Ben Greear
@ 2026-03-03 21:54                     ` Tejun Heo
  2026-03-04  0:02                       ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2026-03-03 21:54 UTC (permalink / raw)
  To: Ben Greear
  Cc: Johannes Berg, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

Hello,

On Tue, Mar 03, 2026 at 01:40:54PM -0800, Ben Greear wrote:
> If I use a kthread to do the blocking reg_todo work, then the problem
> goes away, so it somehow does appear that the work flush logic down in swap.c
> is somehow being blocked by the reg_todo work item, not just the swap.c
> logic somehow blocking against itself.
> 
> My kthread hack left the reg_todo work item logic in place, but instead of
> the work item doing any blocking work, it instead just wakes the kthread
> I added and has that kthread do the work under mutex.
> 
> The second regulatory related work item in net/wireless/ causes the same
> lockup, though it was harder to reproduce.  Putting that work in the kthread
> also seems to have fixed it.
> 
> I could only ever reproduce this with KASAN (and lockdep and other debugging options
> enabled), my guess is that this is because then the system runs slower and/or there
> is more memory pressure.
> 
> I should still be able to reproduce this if I switch to upstream kernel, so
> if there is any debugging code you'd like me to execute, I will attempt to
> do so.

I think the main thing is finding out what state the work item is in. Is it
pending, running, or finished? You can enable wq tracepoints to figure that
out or if you can take a crashdump when it's stalled, nowadays it's really
easy to tell the state w/ something like claude code and drgn. Just tell
claude to use drgn to look at the crashdump and ask it to locate the work
item and what it's doing. It works surprisingly well.

Thanks.

-- 
tejun


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 21:54                     ` Tejun Heo
@ 2026-03-04  0:02                       ` Ben Greear
  2026-03-04 17:14                         ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-04  0:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Johannes Berg, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

On 3/3/26 13:54, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 03, 2026 at 01:40:54PM -0800, Ben Greear wrote:
>> If I use a kthread to do the blocking reg_todo work, then the problem
>> goes away, so it somehow does appear that the work flush logic down in swap.c
>> is somehow being blocked by the reg_todo work item, not just the swap.c
>> logic somehow blocking against itself.
>>
>> My kthread hack left the reg_todo work item logic in place, but instead of
>> the work item doing any blocking work, it instead just wakes the kthread
>> I added and has that kthread do the work under mutex.
>>
>> The second regulatory related work item in net/wireless/ causes the same
>> lockup, though it was harder to reproduce.  Putting that work in the kthread
>> also seems to have fixed it.
>>
>> I could only ever reproduce this with KASAN (and lockdep and other debugging options
>> enabled), my guess is that this is because then the system runs slower and/or there
>> is more memory pressure.
>>
>> I should still be able to reproduce this if I switch to upstream kernel, so
>> if there is any debugging code you'd like me to execute, I will attempt to
>> do so.
> 
> I think the main thing is findin out what state the work item is in. Is it
> pending, running, or finished? You can enable wq tracepoints to figure that
> out or if you can take a crashdump when it's stalled, nowadays it's really
> easy to tell the state w/ something like claude code and drgn. Just tell
> claude to use drgn to look at the crashdump and ask it to locate the work
> item and what it's doing. It works surprisingly well.

Could the logic that detects blocked work-queues instead be instrumented
to print out more useful information so that just reproducing the problem
and providing dmesg output will be sufficient?  Or does dmesg already provide
enough that would give you a clue as to what is going on?

If I were to attempt to use AI on the coredump, would echoing 'c' to /proc/sysrq-trigger
with kdump enabled (when deadlock is happening) be the appropriate action to grab the core file?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-03 11:49             ` Johannes Berg
  2026-03-03 20:52               ` Tejun Heo
@ 2026-03-04  3:08               ` Hillf Danton
  2026-03-04  6:57                 ` Johannes Berg
  1 sibling, 1 reply; 22+ messages in thread
From: Hillf Danton @ 2026-03-04  3:08 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Ben Greear, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	Tejun Heo, linux-kernel

On Tue, 03 Mar 2026 12:49:24 +0100 Johannes Berg wrote:
>On Mon, 2026-03-02 at 07:50 -0800, Ben Greear wrote:
>> On 3/2/26 07:38, Johannes Berg wrote:
>> > On Mon, 2026-03-02 at 07:26 -0800, Ben Greear wrote:
>> > > > 
>> > > > Was this with lockdep? If so, it complain about anything?
>> > > > 
>> > > > I'm having a hard time seeing why it would deadlock at all when wifi
>> > > > uses  schedule_work() and therefore the system_percpu_wq, and
>> > > > __lru_add_drain_all() flushes lru_add_drain_work on mm_percpu_wq, and
>> > > > lru_add_and_bh_lrus_drain() doesn't really _seem_ to do anything related
>> > > > to RTNL etc.?
>> > > > 
>> > > > I think we need a real explanation here rather than "if I randomly
>> > > > change this, it no longer appears".
>> > > 
>> > > The path where iwlwifi acquires CMA holds rtnl and/or wiphy locks before
>> > > allocating CMA memory, as expected.
>> > > 
>> > > And the CMA allocation path attempts to flush the work queues in
>> > > at least some cases.
>> > > 
>> > > If there is a work item queued that is trying to grab rtnl and/or wiphy lock
>> > > when CMA attempts to flush, then the flush work cannot complete, so it deadlocks.
>> > > 
>> > > Lockdep doesn't warn about this.
>> > 
>> > It really should, in cases where it can actually happen, I wrote the
>> > code myself for that... Though things have changed since, and the checks
>> > were lost at least once (and re-added), so I suppose it's possible that
>> > they were lost _again_, but the flushing system is far more flexible now
>> > and it's not flushing the same workqueue anyway, so it shouldn't happen.
>> > 
>> > I stand by what I said before, need to show more precisely what depends
>> > on what, and I'm not going to accept a random kthread into this.
>> 
>> My first email on the topic has process stack traces as well as lockdep
>> locks-held printout that points to the deadlock.  I'm not sure what else to offer...please let me know
>> what you'd like to see.
>
> Fair. I don't know, I don't think there's anything that even shows that
> there's a dependency between the two workqueues and the
> "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> there would have to be for it to deadlock this way because of that?
>
Given the locks held [1],

	kworker/1:0/39480	kworker/u32:11/34989
	rtnl_mutex
				&rdev->wiphy.mtx
				__lru_add_drain_all
				  flush_work(&per_cpu(lru_add_drain_work, cpu))
	&rdev->wiphy.mtx

__if__ there is a work item queued __before__ one of the flush targets on the
workqueue and it acquires the rtnl mutex, then no deadlock can arise, because
when worker-xyz goes off CPU after failing to take the rtnl lock, worker-xyz+1
dequeues the flush target and completes it, since the target has nothing to do
with rtnl. The same applies to the wiphy lock.

BTW, is there any chance of work that acquires the rtnl lock being queued on mm_percpu_wq?

[1] Subject: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
https://lore.kernel.org/linux-wireless/fa4e82ee-eb14-3930-c76c-f3bd59c5f258@candelatech.com/
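The rescue scenario described above, worker-xyz+1 picking up the flush target while worker-xyz sleeps on rtnl, can be sketched the same way in userspace (Python threads as stand-in workers, names illustrative; it assumes the pool really can field a second worker, which the reported hang suggests it could not in practice):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

rtnl = threading.Lock()       # stand-in for rtnl_mutex
drained = threading.Event()   # set when the flush target has run

def reg_todo():
    with rtnl:                # worker-xyz sleeps here
        pass

def lru_drain():
    drained.set()

pool = ThreadPoolExecutor(max_workers=2)  # worker-xyz and worker-xyz+1

rtnl.acquire()                # flusher holds rtnl
pool.submit(reg_todo)         # worker-xyz blocks on rtnl
pool.submit(lru_drain)        # worker-xyz+1 runs this regardless
flush_ok = drained.wait(timeout=2)  # flush completes despite the blocked item
rtnl.release()
pool.shutdown(wait=True)
print(flush_ok)               # prints: True
```

So the question reduces to why, in the observed hang, no additional worker dequeued the flush target.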


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-04  3:08               ` Hillf Danton
@ 2026-03-04  6:57                 ` Johannes Berg
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Berg @ 2026-03-04  6:57 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Ben Greear, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	Tejun Heo, linux-kernel

On Wed, 2026-03-04 at 11:08 +0800, Hillf Danton wrote:
> > 
> > Fair. I don't know, I don't think there's anything that even shows that
> > there's a dependency between the two workqueues and the
> > "((wq_completion)events_unbound)" and "((wq_completion)events)", and
> > there would have to be for it to deadlock this way because of that?
> > 
> Given the locks held [1],
> 
> 	kworker/1:0/39480	kworker/u32:11/34989
> 	rtnl_mutex
> 				&rdev->wiphy.mtx
> 				__lru_add_drain_all
> 				  flush_work(&per_cpu(lru_add_drain_work, cpu))
> 	&rdev->wiphy.mtx
> 
> __if__ there is one work item queued __before__ one of the flush targets on
> workqueue and it acquires the rtnl mutex, then no deadlock can rise,
> because worker-xyz gets off CPU due to failing to take the rtnl lock then
> worker-xyz+1 dequeus the flush target and completes it due to nothing
> with rtnl. Same applies to the wiphy lock.

Right.

> BTW any chance for queuing work that acquires rtnl lock on mm_percpu_wq?

There really is only the work I was describing and vmstat_work (calling
vmstat_update) on that workqueue, afaict.

johannes


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-04  0:02                       ` Ben Greear
@ 2026-03-04 17:14                         ` Tejun Heo
  2026-03-10 16:10                           ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2026-03-04 17:14 UTC (permalink / raw)
  To: Ben Greear
  Cc: Johannes Berg, linux-wireless, Korenblit, Miriam Rachel, linux-mm,
	linux-kernel

Hello,

(Partially drafted with the help of Claude)

On Tue, Mar 03, 2026 at 04:02:14PM -0800, Ben Greear wrote:
> Could the logic that detects blocked work-queues instead be instrumented
> to print out more useful information so that just reproducing the problem
> and providing dmesg output will be sufficient?  Or does dmesg already provide
> enough that would give you a clue as to what is going on?

It may not be exactly the same issue, but Breno just posted a patch that
might help. The current watchdog only prints backtraces for workers that
are actively running on CPU, so sleeping culprits are invisible. His
patch removes that filter so all in-flight workers get printed:

  http://lkml.kernel.org/r/aag4tTyeiZyw0jID@gmail.com

Might be worth trying.

> If I were to attempt to use AI on the coredump, would echoing 'c' to
> /proc/sysrq-trigger with kdump enabled (when deadlock is happening) be
> the appropriate action to grab the core file?

Yes, that's right, but you need to set up kdump first. The quickest way
depends on your distro:

 - Fedora/RHEL: dnf install kexec-tools, then kdumpctl reset-crashkernel,
   systemctl enable --now kdump
 - Ubuntu/Debian: apt install kdump-tools (say Yes to enable), reboot
 - Arch: Install kexec-tools, add crashkernel=512M to your kernel
   cmdline, create a kdump.service that runs
   kexec -p /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img \
     --append="root=<your-root> irqpoll nr_cpus=1 reset_devices"

After reboot, verify with: cat /sys/kernel/kexec_crash_size (should be
non-zero). Then when the deadlock happens:

  echo c > /proc/sysrq-trigger

The system will panic and boot into the kdump kernel. Note that the
kdump kernel runs with very limited memory, so you can't do much there
directly. Use makedumpfile to save a compressed dump to disk:

  makedumpfile -l -d 31 /proc/vmcore /var/crash/vmcore

Most distros' kdump setups do this automatically. Once the dump is saved,
the system reboots back to normal and you can analyze it at your leisure
with drgn:

  drgn -c /var/crash/vmcore

Thanks.

--
tejun


* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-04 17:14                         ` Tejun Heo
@ 2026-03-10 16:10                           ` Ben Greear
  2026-03-10 18:06                             ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-10 16:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Johannes Berg, linux-wireless, Miriam Rachel, linux-mm,
	linux-kernel

On 3/4/26 09:14, Tejun Heo wrote:
> Hello,
> 
> (Partially drafted with the help of Claude)
> 
> On Tue, Mar 03, 2026 at 04:02:14PM -0800, Ben Greear wrote:
>> Could the logic that detects blocked work-queues instead be instrumented
>> to print out more useful information so that just reproducing the problem
>> and providing dmesg output will be sufficient?  Or does dmesg already provide
>> enough that would give you a clue as to what is going on?
> 
> It may not be exactly the same issue, but Breno just posted a patch that
> might help. The current watchdog only prints backtraces for workers that
> are actively running on CPU, so sleeping culprits are invisible. His
> patch removes that filter so all in-flight workers get printed:
> 
>    http://lkml.kernel.org/r/aag4tTyeiZyw0jID@gmail.com
> 
> Might be worth trying.

Hello Tejun,

I applied the first 4 patches of the v2 of that series to my 6.18.14 kernel, with
my use-kthread-for-regdom patches reverted.

Stock 6.18.16 kernel crashes too often in the wifi driver to reliably reproduce the deadlock
there.

Both the reg_todo and lru_drain are on CPU 5, but I'm not sure if that really
matters.

Does this info below show anything useful to you?

Mar 10 08:59:15 ct523c-2103 kernel: BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 57507s!
Mar 10 08:59:15 ct523c-2103 kernel: Showing busy workqueues and worker pools:
Mar 10 08:59:15 ct523c-2103 kernel: workqueue events: flags=0x100
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=2 refcnt=3
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 264128:disconnect_work [cfg80211] for 57502s disconnect_work [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=9 refcnt=10
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 271323:reg_todo [cfg80211] for 57507s
Mar 10 08:59:15 ct523c-2103 kernel:     pending: reg_todo [cfg80211], igb_watchdog_task [igb], output_poll_execute [drm_kms_helper], kernfs_notify_workfn, 
key_garbage_collector, 2*update_super_work, netstamp_clear
Mar 10 08:59:15 ct523c-2103 kernel: workqueue events_unbound: flags=0x2
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=2 refcnt=3
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 236337:cfg80211_wiphy_work [cfg80211] for 57507s cfg80211_wiphy_work [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel: workqueue events_unbound: flags=0x2
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=1 refcnt=2
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 142096:linkwatch_event for 57502s linkwatch_event
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=3 refcnt=4
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 218388:fsnotify_mark_destroy_workfn for 55995s fsnotify_mark_destroy_workfn BAR(1309) 
,267638:fsnotify_connector_destroy_workfn for 55995s fsnotify_connector_destroy_workfn
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=2 refcnt=4
Mar 10 08:59:15 ct523c-2103 kernel: workqueue events_freezable: flags=0x104
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=2
Mar 10 08:59:15 ct523c-2103 kernel:     pending: pci_pme_list_scan
Mar 10 08:59:15 ct523c-2103 kernel: workqueue events_power_efficient: flags=0x180
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=2 refcnt=3
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 268226:reg_check_chans_work [cfg80211] for 57441s reg_check_chans_work [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=218 refcnt=219
Mar 10 08:59:15 ct523c-2103 kernel:     pending: gc_worker [nf_conntrack], hub_post_resume, 216*ioc_release_fn
Mar 10 08:59:15 ct523c-2103 kernel: workqueue rcu_gp: flags=0x108
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=2
Mar 10 08:59:15 ct523c-2103 kernel:     pending: process_srcu
Mar 10 08:59:15 ct523c-2103 kernel: workqueue mm_percpu_wq: flags=0x8
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=2 refcnt=4
Mar 10 08:59:15 ct523c-2103 kernel:     pending: lru_add_drain_per_cpu BAR(236337), vmstat_update
Mar 10 08:59:15 ct523c-2103 kernel: workqueue cgroup_offline: flags=0x100
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=107
Mar 10 08:59:15 ct523c-2103 kernel:     pending: css_killed_work_fn
Mar 10 08:59:15 ct523c-2103 kernel:     inactive: 105*css_killed_work_fn
Mar 10 08:59:15 ct523c-2103 kernel: workqueue cgroup_release: flags=0x100
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=15
Mar 10 08:59:15 ct523c-2103 kernel:     pending: css_release_work_fn
Mar 10 08:59:15 ct523c-2103 kernel:     inactive: 13*css_release_work_fn
Mar 10 08:59:15 ct523c-2103 kernel: workqueue cgroup_bpf_destroy: flags=0x100
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=54
Mar 10 08:59:15 ct523c-2103 kernel:     pending: cgroup_bpf_release
Mar 10 08:59:15 ct523c-2103 kernel:     inactive: 52*cgroup_bpf_release
Mar 10 08:59:15 ct523c-2103 kernel: workqueue ipv6_addrconf: flags=0x6000a
Mar 10 08:59:15 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=1 refcnt=14
Mar 10 08:59:15 ct523c-2103 kernel:     in-flight: 202972:addrconf_dad_work for 57505s
Mar 10 08:59:15 ct523c-2103 kernel:     inactive: 4*addrconf_verify_work
Mar 10 08:59:15 ct523c-2103 kernel: pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 203067 243191
Mar 10 08:59:15 ct523c-2103 kernel: pool 14: cpus=3 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 709498 699449
Mar 10 08:59:15 ct523c-2103 kernel: pool 22: cpus=5 node=0 flags=0x0 nice=0 hung=57507s workers=3 idle: 166929 260518
Mar 10 08:59:15 ct523c-2103 kernel: pool 32: cpus=0-7 flags=0x4 nice=0 hung=0s workers=9 idle: 631693 712021 717023 671858
Mar 10 08:59:15 ct523c-2103 kernel: Showing backtraces of busy workers in stalled CPU-bound worker pools:
Mar 10 08:59:15 ct523c-2103 kernel: pool 22:
Mar 10 08:59:15 ct523c-2103 kernel: task:kworker/5:2     state:D stack:0     pid:271323 tgid:271323 ppid:2      task_flags:0x4208060 flags:0x00080000
Mar 10 08:59:15 ct523c-2103 kernel: Workqueue: events reg_todo [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel: Call Trace:
Mar 10 08:59:15 ct523c-2103 kernel:  <TASK>
Mar 10 08:59:15 ct523c-2103 kernel:  __schedule+0x106f/0x4340
Mar 10 08:59:15 ct523c-2103 kernel:  ? lock_acquire+0x155/0x2e0
Mar 10 08:59:15 ct523c-2103 kernel:  ? io_schedule_timeout+0x150/0x150
Mar 10 08:59:15 ct523c-2103 kernel:  ? __schedule+0x1865/0x4340
Mar 10 08:59:15 ct523c-2103 kernel:  preempt_schedule_notrace+0x4c/0x70
Mar 10 08:59:15 ct523c-2103 kernel:  preempt_schedule_notrace_thunk+0x16/0x30
Mar 10 08:59:15 ct523c-2103 kernel:  rcu_is_watching+0x59/0x70
Mar 10 08:59:15 ct523c-2103 kernel:  lock_acquire+0x291/0x2e0
Mar 10 08:59:15 ct523c-2103 kernel:  schedule+0x211/0x3a0
Mar 10 08:59:15 ct523c-2103 kernel:  ? schedule+0x1f2/0x3a0
Mar 10 08:59:15 ct523c-2103 kernel:  schedule_preempt_disabled+0x11/0x20
Mar 10 08:59:15 ct523c-2103 kernel:  __mutex_lock+0xd02/0x1d60
Mar 10 08:59:15 ct523c-2103 kernel:  ? reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:  ? ww_mutex_lock+0x160/0x160
Mar 10 08:59:15 ct523c-2103 kernel:  ? __mutex_unlock_slowpath+0x15d/0x770
Mar 10 08:59:15 ct523c-2103 kernel:  ? wait_for_completion_io_timeout+0x20/0x20
Mar 10 08:59:15 ct523c-2103 kernel:  ? reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:  reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:  reg_todo+0x52e/0x7c0 [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:  ? lock_release+0xce/0x290
Mar 10 08:59:15 ct523c-2103 kernel:  process_one_work+0x88b/0x1820
Mar 10 08:59:15 ct523c-2103 kernel:  ? pwq_dec_nr_in_flight+0xe00/0xe00
Mar 10 08:59:15 ct523c-2103 kernel:  ? reg_process_hint+0x1480/0x1480 [cfg80211]
Mar 10 08:59:15 ct523c-2103 kernel:  worker_thread+0x5a1/0xfd0
Mar 10 08:59:15 ct523c-2103 kernel:  ? __kthread_parkme+0xc6/0x1f0
Mar 10 08:59:15 ct523c-2103 kernel:  ? rescuer_thread+0x1350/0x1350
Mar 10 08:59:15 ct523c-2103 kernel:  kthread+0x3b7/0x770
Mar 10 08:59:15 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:15 ct523c-2103 kernel:  ? ret_from_fork+0x17/0x3a0
Mar 10 08:59:15 ct523c-2103 kernel:  ? lock_release+0xce/0x290
Mar 10 08:59:15 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:15 ct523c-2103 kernel:  ret_from_fork+0x28b/0x3a0
Mar 10 08:59:15 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:15 ct523c-2103 kernel:  ret_from_fork_asm+0x11/0x20
Mar 10 08:59:15 ct523c-2103 kernel:  </TASK>
Mar 10 08:59:46 ct523c-2103 kernel: BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 57537s!
Mar 10 08:59:46 ct523c-2103 kernel: Showing busy workqueues and worker pools:
Mar 10 08:59:46 ct523c-2103 kernel: workqueue events: flags=0x100
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=2 refcnt=3
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 264128:disconnect_work [cfg80211] for 57533s disconnect_work [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=9 refcnt=10
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 271323:reg_todo [cfg80211] for 57537s
Mar 10 08:59:46 ct523c-2103 kernel:     pending: reg_todo [cfg80211], igb_watchdog_task [igb], output_poll_execute [drm_kms_helper], kernfs_notify_workfn, 
key_garbage_collector, 2*update_super_work, netstamp_clear
Mar 10 08:59:46 ct523c-2103 kernel: workqueue events_unbound: flags=0x2
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=2 refcnt=3
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 236337:cfg80211_wiphy_work [cfg80211] for 57537s cfg80211_wiphy_work [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel: workqueue events_unbound: flags=0x2
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=1 refcnt=2
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 142096:linkwatch_event for 57533s linkwatch_event
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=3 refcnt=4
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 218388:fsnotify_mark_destroy_workfn for 56026s fsnotify_mark_destroy_workfn BAR(1309) 
,267638:fsnotify_connector_destroy_workfn for 56026s fsnotify_connector_destroy_workfn
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=2 refcnt=4
Mar 10 08:59:46 ct523c-2103 kernel: workqueue events_freezable: flags=0x104
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=2
Mar 10 08:59:46 ct523c-2103 kernel:     pending: pci_pme_list_scan
Mar 10 08:59:46 ct523c-2103 kernel: workqueue events_power_efficient: flags=0x180
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 14: cpus=3 node=0 flags=0x0 nice=0 active=2 refcnt=3
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 268226:reg_check_chans_work [cfg80211] for 57472s reg_check_chans_work [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=218 refcnt=219
Mar 10 08:59:46 ct523c-2103 kernel:     pending: gc_worker [nf_conntrack], hub_post_resume, 216*ioc_release_fn
Mar 10 08:59:46 ct523c-2103 kernel: workqueue rcu_gp: flags=0x108
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=2
Mar 10 08:59:46 ct523c-2103 kernel:     pending: process_srcu
Mar 10 08:59:46 ct523c-2103 kernel: workqueue mm_percpu_wq: flags=0x8
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=2 refcnt=4
Mar 10 08:59:46 ct523c-2103 kernel:     pending: lru_add_drain_per_cpu BAR(236337), vmstat_update
Mar 10 08:59:46 ct523c-2103 kernel: workqueue cgroup_offline: flags=0x100
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=107
Mar 10 08:59:46 ct523c-2103 kernel:     pending: css_killed_work_fn
Mar 10 08:59:46 ct523c-2103 kernel:     inactive: 105*css_killed_work_fn
Mar 10 08:59:46 ct523c-2103 kernel: workqueue cgroup_release: flags=0x100
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=15
Mar 10 08:59:46 ct523c-2103 kernel:     pending: css_release_work_fn
Mar 10 08:59:46 ct523c-2103 kernel:     inactive: 13*css_release_work_fn
Mar 10 08:59:46 ct523c-2103 kernel: workqueue cgroup_bpf_destroy: flags=0x100
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 22: cpus=5 node=0 flags=0x0 nice=0 active=1 refcnt=54
Mar 10 08:59:46 ct523c-2103 kernel:     pending: cgroup_bpf_release
Mar 10 08:59:46 ct523c-2103 kernel:     inactive: 52*cgroup_bpf_release
Mar 10 08:59:46 ct523c-2103 kernel: workqueue ipv6_addrconf: flags=0x6000a
Mar 10 08:59:46 ct523c-2103 kernel:   pwq 32: cpus=0-7 flags=0x4 nice=0 active=1 refcnt=14
Mar 10 08:59:46 ct523c-2103 kernel:     in-flight: 202972:addrconf_dad_work for 57536s
Mar 10 08:59:46 ct523c-2103 kernel:     inactive: 4*addrconf_verify_work
Mar 10 08:59:46 ct523c-2103 kernel: pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 203067 243191
Mar 10 08:59:46 ct523c-2103 kernel: pool 14: cpus=3 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 709498 699449
Mar 10 08:59:46 ct523c-2103 kernel: pool 22: cpus=5 node=0 flags=0x0 nice=0 hung=57537s workers=3 idle: 166929 260518
Mar 10 08:59:46 ct523c-2103 kernel: pool 32: cpus=0-7 flags=0x4 nice=0 hung=0s workers=9 idle: 631693 712021 717023 671858
Mar 10 08:59:46 ct523c-2103 kernel: Showing backtraces of busy workers in stalled CPU-bound worker pools:
Mar 10 08:59:46 ct523c-2103 kernel: pool 22:
Mar 10 08:59:46 ct523c-2103 kernel: task:kworker/5:2     state:D stack:0     pid:271323 tgid:271323 ppid:2      task_flags:0x4208060 flags:0x00080000
Mar 10 08:59:46 ct523c-2103 kernel: Workqueue: events reg_todo [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel: Call Trace:
Mar 10 08:59:46 ct523c-2103 kernel:  <TASK>
Mar 10 08:59:46 ct523c-2103 kernel:  __schedule+0x106f/0x4340
Mar 10 08:59:46 ct523c-2103 kernel:  ? lock_acquire+0x155/0x2e0
Mar 10 08:59:46 ct523c-2103 kernel:  ? io_schedule_timeout+0x150/0x150
Mar 10 08:59:46 ct523c-2103 kernel:  ? __schedule+0x1865/0x4340
Mar 10 08:59:46 ct523c-2103 kernel:  preempt_schedule_notrace+0x4c/0x70
Mar 10 08:59:46 ct523c-2103 kernel:  preempt_schedule_notrace_thunk+0x16/0x30
Mar 10 08:59:46 ct523c-2103 kernel:  rcu_is_watching+0x59/0x70
Mar 10 08:59:46 ct523c-2103 kernel:  lock_acquire+0x291/0x2e0
Mar 10 08:59:46 ct523c-2103 kernel:  schedule+0x211/0x3a0
Mar 10 08:59:46 ct523c-2103 kernel:  ? schedule+0x1f2/0x3a0
Mar 10 08:59:46 ct523c-2103 kernel:  schedule_preempt_disabled+0x11/0x20
Mar 10 08:59:46 ct523c-2103 kernel:  __mutex_lock+0xd02/0x1d60
Mar 10 08:59:46 ct523c-2103 kernel:  ? reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:  ? ww_mutex_lock+0x160/0x160
Mar 10 08:59:46 ct523c-2103 kernel:  ? __mutex_unlock_slowpath+0x15d/0x770
Mar 10 08:59:46 ct523c-2103 kernel:  ? wait_for_completion_io_timeout+0x20/0x20
Mar 10 08:59:46 ct523c-2103 kernel:  ? reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:  reg_process_self_managed_hints+0x70/0x190 [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:  reg_todo+0x52e/0x7c0 [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:  ? lock_release+0xce/0x290
Mar 10 08:59:46 ct523c-2103 kernel:  process_one_work+0x88b/0x1820
Mar 10 08:59:46 ct523c-2103 kernel:  ? pwq_dec_nr_in_flight+0xe00/0xe00
Mar 10 08:59:46 ct523c-2103 kernel:  ? reg_process_hint+0x1480/0x1480 [cfg80211]
Mar 10 08:59:46 ct523c-2103 kernel:  worker_thread+0x5a1/0xfd0
Mar 10 08:59:46 ct523c-2103 kernel:  ? __kthread_parkme+0xc6/0x1f0
Mar 10 08:59:46 ct523c-2103 kernel:  ? rescuer_thread+0x1350/0x1350
Mar 10 08:59:46 ct523c-2103 kernel:  kthread+0x3b7/0x770
Mar 10 08:59:46 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:46 ct523c-2103 kernel:  ? ret_from_fork+0x17/0x3a0
Mar 10 08:59:46 ct523c-2103 kernel:  ? lock_release+0xce/0x290
Mar 10 08:59:46 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:46 ct523c-2103 kernel:  ret_from_fork+0x28b/0x3a0
Mar 10 08:59:46 ct523c-2103 kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Mar 10 08:59:46 ct523c-2103 kernel:  ret_from_fork_asm+0x11/0x20
Mar 10 08:59:46 ct523c-2103 kernel:  </TASK>


>> If I were to attempt to use AI on the coredump, would echoing 'c' to
>> /proc/sysrq-trigger with kdump enabled (when deadlock is happening) be
>> the appropriate action to grab the core file?
> 
> Yes, that's right, but you need to set up kdump first. The quickest way
> depends on your distro:
> 
>   - Fedora/RHEL: dnf install kexec-tools, then kdumpctl reset-crashkernel,
>     systemctl enable --now kdump
>   - Ubuntu/Debian: apt install kdump-tools (say Yes to enable), reboot
>   - Arch: Install kexec-tools, add crashkernel=512M to your kernel
>     cmdline, create a kdump.service that runs
>     kexec -p /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img \
>       --append="root=<your-root> irqpoll nr_cpus=1 reset_devices"
> 
> After reboot, verify with: cat /sys/kernel/kexec_crash_size (should be
> non-zero). Then when the deadlock happens:
> 
>    echo c > /proc/sysrq-trigger

I have kdump enabled already, and I could create a vmcore like this.
I have never used drgn, nor Claude.  Based on the logs above, do
you still think it would be helpful to try drgn?  If so, can you please
suggest some commands or approaches specific to this particular bug?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-10 16:10                           ` Ben Greear
@ 2026-03-10 18:06                             ` Tejun Heo
  2026-03-10 19:18                               ` Ben Greear
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2026-03-10 18:06 UTC (permalink / raw)
  To: Ben Greear
  Cc: Johannes Berg, linux-wireless, Miriam Rachel, linux-mm,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

Hello,

Thanks for the detailed dump. One thing that doesn't look right is the
number of pending work items on pool 22 (CPU 5). The pool reports 2 idle
workers, yet there are 7+ work items sitting in the pending list across
multiple workqueues. If the pool were making forward progress, those items
would have been picked up by the idle workers. So, the pool itself seems to
be stuck for some reason, and the cfg80211 mutex stall may be a consequence
rather than the cause.

Let's try using drgn on the crash dump. I'm attaching a prompt that you can
feed to Claude (or any LLM with tool access to drgn). It contains workqueue
internals documentation, drgn code snippets, and a systematic investigation
procedure. The idea is:

1. Generate the crash dump when the deadlock is happening:

     echo c > /proc/sysrq-trigger

2. After the crash kernel boots, create the dump file:

     makedumpfile -c -d 31 /proc/vmcore /tmp/vmcore.dmp

3. Feed the attached prompt to Claude with drgn access to the dump. It
   should produce a Markdown report with its findings that you can post
   back here.

This is a bit experimental, so let's see whether it works. Either way, the
report should at least give us concrete data points to work with.

Thanks.

-- 
tejun

[-- Attachment #2: wq-drgn-prompt.txt --]
[-- Type: text/plain, Size: 27606 bytes --]

# Workqueue Lockup Investigation with drgn

You are investigating a Linux kernel workqueue lockup using drgn on a crash
dump. The system reported a workqueue pool stall on CPU 5 with `reg_todo`
[cfg80211] stuck for ~57500 seconds. Your job is to determine the root cause.

## HOW TO RUN DRGN

```bash
# Install drgn (if not already installed):
#   pip3 install drgn
#   OR on Fedora: dnf install drgn

# Run drgn on a crash dump:
drgn -c /path/to/vmcore

# If symbols aren't found automatically, point to the vmlinux:
drgn -c /path/to/vmcore -s /path/to/vmlinux

# For modules, point to the module directory:
drgn -c /path/to/vmcore -s /path/to/vmlinux -s /lib/modules/$(uname -r)/

# Inside drgn, 'prog' is the program object. You can run Python interactively
# or pass a script path as a positional argument:
drgn -c /path/to/vmcore -s /path/to/vmlinux myscript.py
```

All code blocks in this document are Python code to run inside the drgn
interactive shell or from a script file passed to drgn.

## METHODOLOGY — READ THIS FIRST

**CRITICAL RULES — violating these will produce wrong conclusions:**

1. **NEVER jump on any specific lead without concrete evidence.** Do not
   assume you know the answer from the dmesg alone. The dmesg gives you a
   starting point, not a conclusion.

2. **Draw conclusions if and only if the hard facts support them.** Every
   claim you make must be backed by specific drgn output — an address, a
   value, a stack trace. If you cannot show the evidence, say "I don't have
   evidence for this" and move on.

3. **Present results and thought process with specific, concrete evidence.**
   Show the drgn commands you ran and the relevant output. Then explain what
   that output means. Evidence first, interpretation second.

4. **Think holistically — do NOT separate workqueue stall from lock stalls.**
   A stuck workqueue pool can CAUSE what looks like deadlocks elsewhere.
   Work items that are expected to run but cannot (because the pool is
   stuck) will stall anything waiting on their completion. A mutex holder
   might be waiting for a work item that will never run. What looks like a
   "deadlock" might actually be a consequence of the pool stall, not the
   cause. Always consider both directions of causality.

5. **Check everything systematically.** Do not skip steps because you think
   you already know the answer. Complete Phase 1 fully before moving to
   Phase 2. If the pool has multiple pending work items that are not being
   processed, the pool IS stuck — do not dismiss this as "a transient
   snapshot."

## WORKQUEUE ARCHITECTURE

### Overview

The Linux workqueue subsystem processes deferred work using kernel threads
(workers) organized into pools:

- **worker_pool**: A group of kernel threads (workers) that share a worklist.
  Each CPU has two standard pools: pool[0] (normal priority) and pool[1]
  (high priority). Unbound pools serve work not tied to a specific CPU.

- **workqueue_struct**: A named workqueue (e.g., "events",
  "events_power_efficient"). Each workqueue connects to pools via
  pool_workqueue (pwq) structures — one pwq per pool the workqueue uses.
  Multiple workqueues share the same underlying pool.

- **pool_workqueue (pwq)**: Links a workqueue to a pool. Tracks nr_active
  (how many work items from this workqueue are active in the pool) and
  enforces max_active limits. Work items exceeding max_active go to
  pwq->inactive_works instead of pool->worklist.

- **worker**: A kernel thread that picks work from pool->worklist and
  executes it. Workers are either idle (on pool->idle_list) or busy
  (in pool->busy_hash, executing a work item).

### Concurrency Management (CMWQ)

For bound (per-CPU) pools, the workqueue uses a concurrency management
protocol based on `pool->nr_running`:

- **nr_running** counts workers actively running on CPU (not sleeping, not
  idle, not marked CPU_INTENSIVE).

- When a worker sleeps (e.g., waiting on a mutex), the scheduler calls
  `wq_worker_sleeping()` which decrements nr_running. If nr_running hits 0
  and there is pending work, `kick_pool()` wakes an idle worker.

- When a worker wakes up, `wq_worker_running()` increments nr_running.

- The decision functions:
  - `need_more_worker(pool)`: `!list_empty(&pool->worklist) && !pool->nr_running`
    — need a worker if work is pending AND nobody is running.
  - `may_start_working(pool)`: `pool->nr_idle` — can proceed only if there
    are idle workers remaining (so there's always a reserve).
  - `keep_working(pool)`: `!list_empty(&pool->worklist) && pool->nr_running <= 1`
    — current worker keeps going if work pending and it's the only runner.

- **Key insight**: If nr_running > 0, the pool assumes someone is handling
  work and does NOT wake idle workers, even if work is pending. A stuck
  nr_running > 0 with no worker actually on CPU would prevent all forward
  progress.
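
To make the interaction between these predicates concrete, here is a
simplified plain-Python model of the same logic (pool state is passed in as
integers; the real kernel versions live in kernel/workqueue.c and take a
worker_pool):

```python
# Simplified Python model of the CMWQ decision predicates described above.
# Pool state is passed in as plain integers rather than read via drgn.

def need_more_worker(worklist_len, nr_running):
    # Wake a worker only if work is pending AND nobody is running.
    return worklist_len > 0 and nr_running == 0

def may_start_working(nr_idle):
    # A woken worker may process work only if an idle reserve remains.
    return nr_idle > 0

def keep_working(worklist_len, nr_running):
    # The current worker keeps going if work is pending and it is
    # (at most) the only runner.
    return worklist_len > 0 and nr_running <= 1

# The failure mode from the key insight: pending work, but a stuck
# nr_running > 0 means kick_pool() never wakes an idle worker.
print(need_more_worker(worklist_len=7, nr_running=1))  # False -> pool stalls
print(need_more_worker(worklist_len=7, nr_running=0))  # True  -> kick_pool()
```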

### Worker Lifecycle and the 2-Idle-Worker Invariant

The worker_thread() main loop:
1. Wake up, leave idle state (nr_idle--)
2. Check `need_more_worker()` — if no work or nr_running > 0, go to sleep
3. Check `may_start_working()` — if nr_idle == 0, become manager and create
   new workers before proceeding
4. Clear PREP flag, enter concurrency management (nr_running++)
5. Process work items from pool->worklist in a loop
6. When done, enter idle state (nr_idle++) and sleep

**The 2-idle-worker invariant**: The pool maintains at least 2 idle workers
(enforced by `too_many_workers()` which only culls when nr_idle > 2). This
ensures that when one idle worker wakes to process work (step 1: nr_idle--),
there is still at least one idle worker remaining. If nr_idle hits 0, the
woken worker must become the "manager" and create new workers before it can
process any work.
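
As a rough sketch of the culling side of this invariant (simplified from
`too_many_workers()` in kernel/workqueue.c; the real function also counts an
active manager as idle):

```python
# Simplified sketch of too_many_workers(): idle workers are culled only
# while more than 2 exist, and only when the idle surplus beyond the
# 2-worker reserve outweighs the busy count.
MAX_IDLE_WORKERS_RATIO = 4  # kernel constant: at most 1 idle per 4 busy

def too_many_workers(nr_workers, nr_idle):
    nr_busy = nr_workers - nr_idle
    return nr_idle > 2 and (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy

print(too_many_workers(nr_workers=3, nr_idle=2))  # False: 2-worker reserve kept
print(too_many_workers(nr_workers=8, nr_idle=4))  # True: surplus 2*4 >= 4 busy
```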

**Worker creation** (`create_worker()`): Allocates memory with GFP_KERNEL
and calls `kthread_create_on_node()`. Both operations can stall indefinitely
if the system is under memory pressure and reclaim is not making progress.
GFP_KERNEL allocations will not fail — they block in the allocator waiting
for pages. If memory reclaim is broken for any reason, `create_worker()` will
hang forever. This would prevent the pool from recovering if it runs out of
idle workers.

**Mayday/rescuer mechanism**: If `create_worker()` cannot make progress, the
pool's mayday_timer fires and sends distress signals to workqueues that have
WQ_MEM_RECLAIM set. Those workqueues have a dedicated rescuer thread that
can process their work items without needing new workers. However, regular
workqueues like "events" do NOT have rescuers — if they run out of workers,
they are stuck.

### Watchdog

The pool watchdog checks whether pool->watchdog_ts has advanced. watchdog_ts
is updated each time a worker picks up a new work item from the worklist.
If watchdog_ts hasn't advanced for wq_watchdog_thresh seconds (default 30),
the pool is considered stalled. The "hung=Ns" in the stall warning shows
`jiffies - pool->watchdog_ts` converted to seconds.
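
The `hung=Ns` figure is therefore simple jiffies arithmetic. A quick sanity
check (the jiffies values below are illustrative, and CONFIG_HZ=1000 is
assumed; verify against the actual kernel config):

```python
# How the watchdog's "hung=Ns" value is derived from pool->watchdog_ts.
HZ = 1000  # assumed CONFIG_HZ

def hung_seconds(jiffies_now, watchdog_ts):
    return (jiffies_now - watchdog_ts) // HZ

# With an illustrative delta matching this report's stall warning:
print(hung_seconds(jiffies_now=4_357_507_000, watchdog_ts=4_300_000_000))
# -> 57507, i.e. "hung=57507s"
```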

## DATA STRUCTURE REFERENCE (for drgn)

### Accessing Pools

```python
from drgn.helpers.linux.percpu import per_cpu
from drgn.helpers.linux.list import list_for_each_entry
from drgn import Object, cast

# Per-CPU normal-priority pool for CPU N:
pool = per_cpu(prog["cpu_worker_pools"], cpu)[0]

# Per-CPU high-priority pool for CPU N:
pool = per_cpu(prog["cpu_worker_pools"], cpu)[1]
```

### worker_pool fields
```
pool.cpu              — int, associated CPU (-1 for unbound)
pool.id               — int, pool ID
pool.nr_running       — int, workers currently running on CPU
pool.nr_workers       — int, total workers
pool.nr_idle          — int, currently idle workers
pool.worklist         — list_head, pending work items
pool.idle_list        — list_head, idle workers
pool.workers          — list_head, all workers (iterate via "node" member)
pool.busy_hash        — hashtable of busy workers
pool.manager          — struct worker *, current manager (or NULL)
pool.flags            — uint: POOL_MANAGER_ACTIVE=0x2, POOL_DISASSOCIATED=0x4
pool.watchdog_ts      — unsigned long, jiffies of last forward progress
pool.cpu_stall        — bool, set by watchdog when stalled
```

### worker fields (iterate via pool.workers, link member "node")
```
worker.task           — struct task_struct *, the kthread
worker.current_work   — struct work_struct *, work being executed (NULL if idle)
worker.current_func   — work_func_t, function of current work
worker.current_pwq    — struct pool_workqueue *, pwq of current work
worker.current_at     — u64, ktime at start of current work
worker.sleeping       — int, 1 if worker went to sleep (decremented nr_running)
worker.flags          — uint: WORKER_DIE=0x2, WORKER_IDLE=0x4, WORKER_PREP=0x8,
                        WORKER_CPU_INTENSIVE=0x40, WORKER_UNBOUND=0x80
worker.id             — int, worker ID (shows in task name as kworker/CPU:ID)
worker.last_active    — unsigned long, jiffies of last activity
worker.pool           — struct worker_pool *, associated pool
worker.scheduled      — list_head, scheduled works for this worker
```

### work_struct fields (iterate via pool.worklist, link member "entry")
```
work.data             — atomic_long_t, encodes pwq pointer + flags
work.func             — work_func_t, the function to execute
work.entry            — list_head, linkage in worklist

# Extracting pwq from work->data:
data = work.data.counter.value_()
WORK_STRUCT_PWQ_BIT = 1 << 2
WORK_STRUCT_PWQ_SHIFT = 8  # bits 0-7 are flags, bits 8+ are pwq pointer
if data & WORK_STRUCT_PWQ_BIT:
    pwq_addr = data & ~((1 << WORK_STRUCT_PWQ_SHIFT) - 1)
    pwq = Object(prog, "struct pool_workqueue", address=pwq_addr)
    wq_name = pwq.wq.name.string_().decode()
```

### pool_workqueue fields
```
pwq.pool              — struct worker_pool *, the pool
pwq.wq                — struct workqueue_struct *, the workqueue
pwq.nr_active         — int, active work items from this wq in this pool
pwq.inactive_works    — list_head, work items waiting for nr_active < max_active
pwq.stats[]           — u64 array: [0]=STARTED, [1]=COMPLETED, [2]=CPU_TIME,
                        [3]=CPU_INTENSIVE, [4]=CM_WAKEUP, [5]=REPATRIATED,
                        [6]=MAYDAY, [7]=RESCUED
```

### workqueue_struct fields
```
wq.name               — char[], workqueue name
wq.flags              — uint, WQ_UNBOUND=0x2, WQ_FREEZABLE=0x4,
                        WQ_MEM_RECLAIM=0x8, WQ_HIGHPRI=0x10
wq.max_active         — int, max concurrent work items per pwq
wq.cpu_pwq            — per-cpu pointer to pwqs (for bound workqueues)
wq.pwqs               — list_head, all pwqs (iterate via "pwqs_node")
wq.rescuer            — struct worker * (non-NULL if WQ_MEM_RECLAIM)
```

### Mutex inspection
```python
# struct mutex has an owner field (atomic_long_t)
# Low 3 bits are flags, remaining bits are task_struct pointer
owner_val = mutex.owner.counter.value_()
owner_ptr = owner_val & ~0x7
if owner_ptr:
    owner_task = Object(prog, "struct task_struct", address=owner_ptr)
    print(f"mutex owner: {owner_task.comm.string_().decode()} "
          f"pid={owner_task.pid.value_()}")
    for frame in prog.stack_trace(owner_task):
        print(f"  {frame}")
```

### Stack traces
```python
# IMPORTANT: prog.stack_trace() takes a task_struct or PID, NOT a CPU number
# By task_struct pointer:
for frame in prog.stack_trace(task):
    print(frame)

# By PID:
for frame in prog.stack_trace(pid_number):
    print(frame)
```

### Jiffies time delta
```python
jiffies = prog["jiffies"].value_()
hz = 1000  # CONFIG_HZ, usually 1000 on x86 — verify with the kernel config
delta_jiffies = jiffies - pool.watchdog_ts.value_()
delta_seconds = delta_jiffies / hz
```

## INVESTIGATION PROCEDURE

### Phase 1: Is the workqueue pool stuck?

The dmesg says pool 22 (cpus=5, normal priority) is hung for ~57500s with
multiple pending work items. If a pool has pending work items that are not
being processed, the pool IS stuck. Do not dismiss pending items as "about
to be run" — at 57500s, anything pending is stuck.

Your goal in this phase is to determine WHY the pool is stuck: is it a
concurrency management state bug (nr_running wrong), a worker shortage
(no idle workers, can't create new ones), or something else?

**Step 1.1: Pool overview**
```python
cpu = 5
pool = per_cpu(prog["cpu_worker_pools"], cpu)[0]
print(f"Pool {pool.id.value_()} on CPU {pool.cpu.value_()}")
print(f"  nr_running:  {pool.nr_running.value_()}")
print(f"  nr_workers:  {pool.nr_workers.value_()}")
print(f"  nr_idle:     {pool.nr_idle.value_()}")
print(f"  flags:       0x{pool.flags.value_():x}")
print(f"  cpu_stall:   {pool.cpu_stall.value_()}")
print(f"  manager:     {pool.manager}")

jiffies = prog["jiffies"].value_()
wts = pool.watchdog_ts.value_()
print(f"  watchdog_ts: {wts} (jiffies={jiffies}, delta={jiffies - wts})")
print(f"  worklist empty: {pool.worklist.next.value_() == pool.worklist.address_of_().value_()}")
```

**What to look for:**
- **nr_running**: If > 0 but no worker is actually executing on CPU, the
  concurrency management thinks someone is running when nobody is. This
  would prevent idle workers from being woken. Check every worker to verify
  whether nr_running matches reality.
- **nr_idle**: If 0 and there's pending work, the pool has no reserve
  workers. Check if a manager is active trying to create one — and if so,
  what the manager is stuck on (likely a GFP_KERNEL allocation that won't
  return).
- **nr_workers**: Compare to nr_idle to see how many are busy.
- **flags**: Check POOL_MANAGER_ACTIVE (0x2) — is someone trying to create
  workers?

**Step 1.2: Enumerate ALL workers**
```python
WORKER_DIE = 0x2
WORKER_IDLE = 0x4
WORKER_PREP = 0x8
WORKER_CPU_INTENSIVE = 0x40
WORKER_UNBOUND = 0x80

for worker in list_for_each_entry("struct worker",
        pool.workers.address_of_(), "node"):
    task = worker.task
    pid = task.pid.value_()
    flags = worker.flags.value_()
    state = []
    if flags & WORKER_DIE: state.append("DIE")
    if flags & WORKER_IDLE: state.append("IDLE")
    if flags & WORKER_PREP: state.append("PREP")
    if flags & WORKER_CPU_INTENSIVE: state.append("CPU_INTENSIVE")
    if flags & WORKER_UNBOUND: state.append("UNBOUND")
    if not state: state.append("RUNNING")

    cur = worker.current_work
    func_name = str(worker.current_func) if int(cur) else "(none)"
    sleeping = worker.sleeping.value_()
    last_active = worker.last_active.value_()

    print(f"  worker {worker.id.value_()}: pid={pid}, "
          f"flags=0x{flags:x} [{','.join(state)}], "
          f"sleeping={sleeping}, last_active={last_active}, "
          f"current_func={func_name}")

    # Stack trace for ALL non-idle workers:
    if not (flags & WORKER_IDLE):
        try:
            print(f"    Stack trace:")
            for frame in prog.stack_trace(task):
                print(f"      {frame}")
        except Exception as e:
            print(f"    (stack trace failed: {e})")
    # Also check idle workers' task state — are they truly sleeping idle?
    else:
        tstate = task.__state.value_() if hasattr(task, '__state') else task.state.value_()
        print(f"    task state: {tstate}")
```

**What to look for:**
- For each non-idle worker: What is it doing? Is it sleeping on a lock?
  Is it stuck in an allocation? Is it the manager trying to create workers?
- For idle workers: Are they in TASK_IDLE state as expected? If not,
  something is wrong.
- Cross-check: Does the count of non-IDLE, non-PREP, non-CPU_INTENSIVE
  workers match nr_running? If not, the concurrency management state is
  inconsistent.
- Check worker.sleeping for non-idle workers: if sleeping==1, the worker
  went through wq_worker_sleeping() and decremented nr_running. If all
  non-idle workers have sleeping==1, nr_running should be 0.
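
The cross-check described above can be written as a small helper over the
(flags, sleeping) values printed in Step 1.2 (a simplified model; the
kernel's accounting has more transitions than this captures):

```python
# Compute the nr_running value the worker states imply, to compare
# against pool.nr_running. A worker contributes to nr_running only if it
# is in concurrency management (not IDLE/PREP/CPU_INTENSIVE) and has not
# gone through wq_worker_sleeping() (sleeping == 0).
WORKER_IDLE = 0x4
WORKER_PREP = 0x8
WORKER_CPU_INTENSIVE = 0x40

def expected_nr_running(workers):
    # workers: list of (flags, sleeping) tuples as printed in Step 1.2.
    excluded = WORKER_IDLE | WORKER_PREP | WORKER_CPU_INTENSIVE
    return sum(1 for flags, sleeping in workers
               if not (flags & excluded) and not sleeping)

# Example: one busy worker sleeping on a mutex, two idle workers.
workers = [(0x0, 1), (WORKER_IDLE, 0), (WORKER_IDLE, 0)]
print(expected_nr_running(workers))  # 0 -- if pool.nr_running reads 1,
                                     # the CM state is inconsistent
```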

**Step 1.3: Check pending work items**
```python
count = 0
for work in list_for_each_entry("struct work_struct",
        pool.worklist.address_of_(), "entry"):
    data = work.data.counter.value_()
    func = str(work.func)
    wq_name = "?"
    if data & (1 << 2):  # WORK_STRUCT_PWQ_BIT
        pwq_addr = data & ~((1 << 8) - 1)
        try:
            pwq = Object(prog, "struct pool_workqueue", address=pwq_addr)
            wq_name = pwq.wq.name.string_().decode()
        except:
            wq_name = f"(pwq@{pwq_addr:#x})"
    print(f"  [{count}] func={func}, wq={wq_name}")
    count += 1
print(f"Total pending: {count}")
```

These are work items waiting to be executed. If there are multiple items and
idle workers exist, something is preventing the workers from picking them up.
Correlate with nr_running and worker states from Step 1.2.
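That combination — queued work, idle workers available, nothing running, stale watchdog — is the "stuck pool" signature. As a quick predicate (plain Python; the 30s threshold is illustrative, not a kernel constant):

```python
def pool_looks_stuck(nr_pending, nr_idle, nr_running, hung_secs):
    """True when work is queued, idle workers exist to take it, nothing
    is executing, and the watchdog timestamp has not advanced."""
    return (nr_pending > 0 and nr_idle > 0
            and nr_running == 0 and hung_secs > 30)

# The state reported in this thread: 7+ pending items, 2 idle workers.
print(pool_looks_stuck(7, 2, 0, 120))   # True
print(pool_looks_stuck(0, 2, 0, 120))   # False — empty worklist is fine
```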

**Step 1.4: Check pwq statistics**

Find pwqs associated with this pool and check started vs completed counts:
```python
# Iterate all workqueues and find pwqs for this pool
pool_id = pool.id.value_()
for wq in list_for_each_entry("struct workqueue_struct",
        prog["workqueues"].address_of_(), "list"):
    try:
        # cpu_pwq is a per-CPU pointer member, so use per_cpu_ptr() and
        # dereference it (per_cpu() is for per-CPU variables):
        pwq = per_cpu_ptr(wq.cpu_pwq, cpu)[0]
        if pwq.pool.id.value_() == pool_id:
            started = pwq.stats[0].value_()
            completed = pwq.stats[1].value_()
            nr_active = pwq.nr_active.value_()
            if started > 0 or nr_active > 0:
                print(f"  wq '{wq.name.string_().decode()}': "
                      f"started={started}, completed={completed}, "
                      f"in_flight={started-completed}, "
                      f"nr_active={nr_active}, "
                      f"max_active={wq.max_active.value_()}")
    except:
        pass  # unbound workqueues don't have cpu_pwq
```

**Step 1.5: Determine the stall mechanism**

Based on the above evidence, determine WHICH of these scenarios applies:

**Scenario A — nr_running stuck > 0**: nr_running is positive but no worker
is actually executing on CPU. All non-idle workers have sleeping==1
(decremented nr_running) or are in PREP state, yet nr_running hasn't
reached 0. This prevents kick_pool() from waking idle workers. This would
be a concurrency management bug.

**Scenario B — No idle workers, manager stuck**: nr_idle is 0,
POOL_MANAGER_ACTIVE is set, and the manager worker is stuck trying to create
a new worker (blocked in GFP_KERNEL allocation or kthread_create). The pool
cannot make progress because there are no idle workers to wake and creating
new ones is blocked. Check the manager's stack trace to see where it's stuck.
If it's in the page allocator, this suggests system-wide memory pressure
where reclaim is not working.

**Scenario C — Workers all blocked on locks**: There are workers processing
work items, but every single one is sleeping on a mutex/lock. nr_running
correctly went to 0, idle workers were woken, but they too picked up work
items that immediately blocked. Eventually all workers are blocked and
no idle workers remain.

**Scenario D — Something else**: If none of the above, describe exactly what
you see and what doesn't add up.
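A small decision helper makes the A/B/C/D mapping explicit. This is a sketch of the reasoning above, not kernel API — the inputs are the values gathered in Steps 1.1–1.2:

```python
def classify_stall(nr_running, nr_idle, expected_running,
                   manager_active, manager_blocked_in_alloc,
                   all_workers_blocked_on_locks):
    """Map Phase-1 evidence onto scenarios A-D."""
    if nr_running > 0 and expected_running == 0:
        return "A"   # nr_running stuck high: concurrency-management bug
    if nr_idle == 0 and manager_active and manager_blocked_in_alloc:
        return "B"   # no idle workers and worker creation is blocked
    if all_workers_blocked_on_locks:
        return "C"   # every worker is sleeping on a mutex/lock
    return "D"       # none of the above: describe what you see

print(classify_stall(1, 2, 0, False, False, False))  # "A"
print(classify_stall(0, 0, 0, True, True, False))    # "B"
```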

### Phase 2: Investigate the broader stall

**IMPORTANT**: Do not assume the lock that reg_todo is waiting on is the
"root cause." The pool being stuck can CAUSE lock stalls elsewhere. Consider:

- If work item X is expected to release lock L, and work item X is pending
  on a stuck pool, then anything waiting on lock L will appear deadlocked —
  but the real cause is the pool stall, not a lock ordering bug.
- The cfg80211 reg_todo waiting on a mutex might be a VICTIM, not the cause.
  The mutex holder might itself be waiting for something that depends on the
  stuck pool.

**Step 2.1: Identify what the stuck worker(s) are waiting on**

For each non-idle worker found in Step 1.2, examine its stack trace. If it's
in `__mutex_lock`, `__rwsem_down_*`, `schedule_preempt_disabled`, or similar,
identify which lock:

```python
# For a worker stuck in __mutex_lock, try to extract the lock:
for frame in prog.stack_trace(stuck_task):
    if "mutex_lock" in str(frame):
        try:
            lock = frame["lock"]
            print(f"  Mutex at: {lock}")
            owner_val = lock.owner.counter.value_()
            owner_ptr = owner_val & ~0x7
            if owner_ptr:
                owner = Object(prog, "struct task_struct",
                               address=owner_ptr)
                print(f"  Owner: {owner.comm.string_().decode()} "
                      f"pid={owner.pid.value_()}")
                print(f"  Owner stack:")
                for f in prog.stack_trace(owner):
                    print(f"    {f}")
            else:
                print(f"  No owner (owner_val=0x{owner_val:x})")
        except Exception as e:
            print(f"  (couldn't extract lock: {e})")
```

**Step 2.2: Follow the dependency chain**

For each lock owner found above:
1. Is the owner running, sleeping, or in D state?
2. If sleeping — what is it waiting on? Another lock? A completion? I/O?
3. If waiting on another lock — find THAT lock's owner and repeat.
4. At each step: is the waited-on resource something that depends on a
   work item running? If so, check which pool/workqueue that work item
   would run on — is THAT pool also stuck?

Continue until you find either:
- A cycle (A→B→C→A) — true deadlock
- A task waiting on something that depends on the stuck pool — the pool
  stall is the root cause, the "deadlock" is a symptom
- A task waiting on something unrelated (I/O, userspace, etc.)
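The chain walk can be sketched as a loop over a waiter→blocker map. In practice each edge comes from a stack trace plus the lock-owner extraction in Step 2.1; the task and work-item names below are purely illustrative:

```python
def walk_dependency_chain(start, waits_on, max_depth=32):
    """Follow waiter -> blocker edges until the chain terminates (I/O,
    userspace, a stuck work item) or revisits a node (true deadlock)."""
    chain, seen = [start], {start}
    cur = start
    for _ in range(max_depth):
        nxt = waits_on.get(cur)
        if nxt is None:
            return chain, "terminates"
        if nxt in seen:
            chain.append(nxt)
            return chain, "cycle"
        chain.append(nxt)
        seen.add(nxt)
        cur = nxt
    return chain, "too-deep"

# Hypothetical chain: reg_todo waits on the wiphy mutex holder, which
# waits on a work item queued on the stuck pool — no cycle, so the pool
# stall (not lock ordering) would be the root cause.
g = {"reg_todo": "kworker/u32:11", "kworker/u32:11": "work@pool22"}
print(walk_dependency_chain("reg_todo", g))
```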

**Step 2.3: Check other pools and CPUs**

The stall might not be limited to pool 22. Check all CPU pools:
```python
nr_cpus = int(prog["nr_cpu_ids"])
jiffies = prog["jiffies"].value_()
for cpu in range(nr_cpus):
    pool = per_cpu(prog["cpu_worker_pools"], cpu)[0]
    nr_r = pool.nr_running.value_()
    nr_w = pool.nr_workers.value_()
    nr_i = pool.nr_idle.value_()
    wts = pool.watchdog_ts.value_()
    delta = (jiffies - wts) / 1000  # seconds, assuming CONFIG_HZ=1000
    empty = pool.worklist.next.value_() == pool.worklist.address_of_().value_()
    if not empty or delta > 30:
        print(f"CPU {cpu}: pool {pool.id.value_()}, nr_running={nr_r}, "
              f"nr_workers={nr_w}, nr_idle={nr_i}, "
              f"hung={delta:.0f}s, worklist_empty={empty}")
```

**Step 2.4: Check for system-wide memory pressure**

If the pool stall involves a manager stuck in allocation:
```python
# Check global free memory (approximate; assumes 4 KiB pages and that
# the vm_zone_stat counters are present, true on recent kernels)
try:
    free = prog["vm_zone_stat"][int(prog["NR_FREE_PAGES"])].counter.value_()
    print(f"Global free pages: {free} (~{free * 4096 // (1 << 20)} MiB)")
except Exception as e:
    print(f"(couldn't read vm_zone_stat: {e})")

# Check if any workers are stuck in page allocation:
from drgn.helpers.linux.pid import for_each_task
alloc_stuck = []
for task in for_each_task(prog):
    tstate = task.__state.value_() if hasattr(task, '__state') else task.state.value_()
    if tstate != 0:  # not TASK_RUNNING
        try:
            for frame in prog.stack_trace(task):
                if "alloc_pages" in str(frame) or "__alloc_pages" in str(frame):
                    alloc_stuck.append(
                        f"{task.comm.string_().decode()} pid={task.pid.value_()}")
                    break
        except:
            pass
if alloc_stuck:
    print(f"Tasks stuck in page allocation: {len(alloc_stuck)}")
    for t in alloc_stuck[:20]:
        print(f"  {t}")
```

**Step 2.5: Check all D-state tasks for patterns**
```python
from drgn.helpers.linux.pid import for_each_task
d_state_tasks = []
for task in for_each_task(prog):
    tstate = task.__state.value_() if hasattr(task, '__state') else task.state.value_()
    if tstate & 0x02:  # TASK_UNINTERRUPTIBLE (also matches TASK_IDLE = 0x402)
        comm = task.comm.string_().decode()
        pid = task.pid.value_()
        try:
            frames = [str(f) for f in prog.stack_trace(task)]
            # Categorize by what they're stuck on
            category = "unknown"
            for f in frames:
                if "mutex_lock" in f: category = "mutex"; break
                if "rwsem" in f: category = "rwsem"; break
                if "alloc_pages" in f: category = "alloc"; break
                if "wait_for_completion" in f: category = "completion"; break
                if "worker_thread" in f: category = "wq_idle"; break
            d_state_tasks.append((category, comm, pid, frames[0] if frames else "?"))
        except:
            d_state_tasks.append(("error", comm, pid, "?"))

# Group by category
from collections import Counter
cats = Counter(cat for cat, _, _, _ in d_state_tasks)
print(f"D-state tasks by category: {dict(cats)}")
print(f"Total D-state tasks: {len(d_state_tasks)}")
for cat, comm, pid, top_frame in sorted(d_state_tasks)[:30]:
    print(f"  [{cat}] {comm} pid={pid}: {top_frame}")
```

## REPORTING

Present your findings in this order:

1. **Pool State** — Hard facts from Phase 1: nr_running, nr_idle,
   nr_workers, pending work count, watchdog delta. For each worker: state,
   current function, sleeping flag, stack trace. These are raw facts — state
   them without interpretation first.

2. **Stall Mechanism** — Based on the pool state, which scenario (A/B/C/D)
   applies and why. Cite specific values: "nr_running=1 but only worker is
   in __mutex_lock with sleeping=1, so nr_running should be 0" or
   "nr_idle=0, POOL_MANAGER_ACTIVE=1, manager pid=X stuck in __alloc_pages."

3. **Dependency Analysis** — The lock/wait chain from Phase 2. For each
   link: who is waiting, on what, who holds it, what are they doing. Note
   explicitly whether any link depends on a work item that would run on the
   stuck pool.

4. **Root Cause** — What started the stall. This might be:
   - A workqueue concurrency management bug (nr_running inconsistency)
   - A worker creation failure (memory pressure preventing GFP_KERNEL allocs)
   - A lock ordering issue in cfg80211/networking that caused workers to
     all block
   - Something else entirely
   State this with evidence. If you cannot determine root cause with
   certainty, say so and list the candidates with evidence for/against each.

5. **Cascade Effects** — What other subsystems are stalled as a consequence,
   and through what mechanism (blocked on same lock, waiting for stuck work
   item, etc.).

For EVERY claim, cite the specific drgn output. If you cannot support a
claim, explicitly say so and mark it as hypothesis.

## OUTPUT FORMAT

Write your complete findings as a Markdown file. The file should be
self-contained and suitable for posting as a reply to the bug report on
LKML. Structure it as follows:

```markdown
# Workqueue Lockup Analysis — CPU 5 Pool

## 1. Pool State

(raw facts: nr_running, nr_idle, nr_workers, flags, watchdog delta)

### Workers

(table or list of every worker: id, pid, state, current_func, sleeping,
stack trace for non-idle workers)

### Pending Work Items

(list of all pending work items on the worklist with function and workqueue)

### PWQ Statistics

(started/completed counts for active pwqs on this pool)

## 2. Stall Mechanism

(which scenario applies and WHY, with specific values cited as evidence)

## 3. Lock / Dependency Analysis

(the chain: who waits on what, who holds it, what are THEY waiting on —
with addresses and stack traces at each step. note whether any link depends
on a work item on the stuck pool.)

## 4. Other Pools and System-Wide State

(any other stuck pools, D-state task summary, memory pressure indicators)

## 5. Root Cause

(what started the stall, with evidence. or candidates if uncertain.)

## 6. Cascade Effects

(what else is broken as a consequence)
```

Save this file and tell the user where it is so they can attach it to their
reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-10 18:06                             ` Tejun Heo
@ 2026-03-10 19:18                               ` Ben Greear
  2026-03-10 19:47                                 ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Greear @ 2026-03-10 19:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Johannes Berg, linux-wireless, Miriam Rachel, linux-mm,
	linux-kernel

On 3/10/26 11:06, Tejun Heo wrote:
> Hello,
> 
> Thanks for the detailed dump. One thing that doesn't look right is the
> number of pending work items on pool 22 (CPU 5). The pool reports 2 idle
> workers, yet there are 7+ work items sitting in the pending list across
> multiple workqueues. If the pool were making forward progress, those items
> would have been picked up by the idle workers. So, the pool itself seems to
> be stuck for some reason, and the cfg80211 mutex stall may be a consequence
> rather than the cause.
> 
> Let's try using drgn on the crash dump. I'm attaching a prompt that you can
> feed to Claude (or any LLM with tool access to drgn). It contains workqueue
> internals documentation, drgn code snippets, and a systematic investigation
> procedure. The idea is:
> 
> 1. Generate the crash dump when the deadlock is happening:
> 
>       echo c > /proc/sysrq-trigger
> 
> 2. After the crash kernel boots, create the dump file:
> 
>       makedumpfile -c -d 31 /proc/vmcore /tmp/vmcore.dmp
> 
> 3. Feed the attached prompt to Claude with drgn access to the dump. It
>     should produce a Markdown report with its findings that you can post
>     back here.
> 
> This is a bit experimental, so let's see whether it works. Either way, the
> report should at least give us concrete data points to work with.
> 
> Thanks.

Thanks for that.  It will probably be a few days before I flip back to debugging
that lockup as I'm trying to get something ready for our internal release (using
kthread work-around).

While working on another bug, I found evidence (but not proof yet) that the code below
can be called multiple times for the same object.  The bug I'm tracking is that this
may be the cause of list corruption (my debugging logs and work-arounds are in the method below).

But could this work-item (re)initialization also explain the work-queue system going
weird?  Just using kthreads, which 'fixes' the problem for me,
really shouldn't make a difference to the code below, so probably
it is not related?


void ieee80211_link_init(struct ieee80211_sub_if_data *sdata,
			 int link_id,
			 struct ieee80211_link_data *link,
			 struct ieee80211_bss_conf *link_conf)
{
	struct ieee80211_local *local = sdata->local;
	bool deflink = link_id < 0;

	lockdep_assert_wiphy(local->hw.wiphy);

	if (link_id < 0)
		link_id = 0;

	if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
		struct ieee80211_sub_if_data *ap_bss;
		struct ieee80211_bss_conf *ap_bss_conf;

		ap_bss = container_of(sdata->bss,
				      struct ieee80211_sub_if_data, u.ap);
		ap_bss_conf = sdata_dereference(ap_bss->vif.link_conf[link_id],
						ap_bss);
		memcpy(link_conf, ap_bss_conf, sizeof(*link_conf));
	}

	link->sdata = sdata;
	link->link_id = link_id;
	link->conf = link_conf;
	link_conf->link_id = link_id;
	link_conf->vif = &sdata->vif;
	link->ap_power_level = IEEE80211_UNSET_POWER_LEVEL;
	link->user_power_level = sdata->local->user_power_level;
	link_conf->txpower = INT_MIN;

	wiphy_work_init(&link->csa.finalize_work,
			ieee80211_csa_finalize_work);
	wiphy_work_init(&link->color_change_finalize_work,
			ieee80211_color_change_finalize_work);
	wiphy_delayed_work_init(&link->color_collision_detect_work,
				ieee80211_color_collision_detection_work);
	/* I see some sort of list corruption where links don't get removed from chanctx
	 * lists.  I think if we are in a list while here, that could cause it.  deflink
	 * appears to have chance of doing that.  So, remove from list first if
	 * it is indeed in one.
	 */
	if (WARN_ON_ONCE((link->assigned_chanctx_list.next != LIST_POISON1)
			 && (link->assigned_chanctx_list.next != link->assigned_chanctx_list.prev)
			 && (link->assigned_chanctx_list.next))) {
		sdata_err(sdata, "link-init: %d called while already in an assigned-chan-ctx list, clearing.\n",
			  link_id);
		list_del(&link->assigned_chanctx_list);
	}
	if (WARN_ON_ONCE((link->reserved_chanctx_list.next != LIST_POISON1)
			 && (link->reserved_chanctx_list.next != link->reserved_chanctx_list.prev)
			 && (link->reserved_chanctx_list.next))) {
		sdata_err(sdata, "link-init: %d called while already in a reserved-chan-ctx list, clearing.\n",
			  link_id);
		list_del(&link->reserved_chanctx_list);
	}

	INIT_LIST_HEAD(&link->assigned_chanctx_list);
	INIT_LIST_HEAD(&link->reserved_chanctx_list);
	wiphy_delayed_work_init(&link->dfs_cac_timer_work,
				ieee80211_dfs_cac_timer_work);

	if (!deflink) {
		switch (sdata->vif.type) {
		case NL80211_IFTYPE_AP:
		case NL80211_IFTYPE_AP_VLAN:
			ether_addr_copy(link_conf->addr,
					sdata->wdev.links[link_id].addr);
			link_conf->bssid = link_conf->addr;
			WARN_ON(!(sdata->wdev.valid_links & BIT(link_id)));
			break;
		case NL80211_IFTYPE_STATION:
			/* station sets the bssid in ieee80211_mgd_setup_link */
			break;
		default:
			WARN_ON(1);
		}

		ieee80211_link_debugfs_add(link);
	}

	rcu_assign_pointer(sdata->vif.link_conf[link_id], link_conf);
	rcu_assign_pointer(sdata->link[link_id], link);
}


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-10 19:18                               ` Ben Greear
@ 2026-03-10 19:47                                 ` Tejun Heo
  2026-03-10 19:48                                   ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2026-03-10 19:47 UTC (permalink / raw)
  To: Ben Greear
  Cc: Johannes Berg, linux-wireless, Miriam Rachel, linux-mm,
	linux-kernel

Hello,

On Tue, Mar 10, 2026 at 12:18:49PM -0700, Ben Greear wrote:
...
> But could this work-item (re)initialization also explain work-queue system going
> weird?  Just using kthreads, which 'fixes' the problem for me,
> really shouldn't make a difference to the code below, so probably
> it is not related?

Oh, re-initing can definitely corrupt things. Workqueue shares work list
across all work items sharing the pool, so the blast radius can be bigger.
ie. It'd be *possible* for kthread_worker to get lucky.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
  2026-03-10 19:47                                 ` Tejun Heo
@ 2026-03-10 19:48                                   ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2026-03-10 19:48 UTC (permalink / raw)
  To: Ben Greear
  Cc: Johannes Berg, linux-wireless, Miriam Rachel, linux-mm,
	linux-kernel

On Tue, Mar 10, 2026 at 09:47:59AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 10, 2026 at 12:18:49PM -0700, Ben Greear wrote:
> ...
> > But could this work-item (re)initialization also explain work-queue system going
> > weird?  Just using kthreads, which 'fixes' the problem for me,
> > really shouldn't make a difference to the code below, so probably
> > it is not related?
> 
> > Oh, re-initing can definitely corrupt things. Workqueue shares work list
> across all work items sharing the pool, so the blast radius can be bigger.
> ie. It'd be *possible* for kthread_worker to get lucky.

BTW, if you enable CONFIG_DEBUG_OBJECTS_WORK, re-init should trigger a dump.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-10 19:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-23 22:36 6.18.13 iwlwifi deadlock allocating cma while work-item is active Ben Greear
2026-02-27 16:31 ` Ben Greear
2026-03-01 15:38   ` Ben Greear
2026-03-02  8:07     ` Johannes Berg
2026-03-02 15:26       ` Ben Greear
2026-03-02 15:38         ` Johannes Berg
2026-03-02 15:50           ` Ben Greear
2026-03-03 11:49             ` Johannes Berg
2026-03-03 20:52               ` Tejun Heo
2026-03-03 21:03                 ` Johannes Berg
2026-03-03 21:12                 ` Johannes Berg
2026-03-03 21:40                   ` Ben Greear
2026-03-03 21:54                     ` Tejun Heo
2026-03-04  0:02                       ` Ben Greear
2026-03-04 17:14                         ` Tejun Heo
2026-03-10 16:10                           ` Ben Greear
2026-03-10 18:06                             ` Tejun Heo
2026-03-10 19:18                               ` Ben Greear
2026-03-10 19:47                                 ` Tejun Heo
2026-03-10 19:48                                   ` Tejun Heo
2026-03-04  3:08               ` Hillf Danton
2026-03-04  6:57                 ` Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox