From: Ben Greear <greearb@candelatech.com>
To: linux-wireless <linux-wireless@vger.kernel.org>
Cc: "Korenblit, Miriam Rachel" <miriam.rachel.korenblit@intel.com>,
	linux-mm@kvack.org
Subject: Re: 6.18.13 iwlwifi deadlock allocating cma while work-item is active.
Date: Fri, 27 Feb 2026 08:31:05 -0800
Message-ID: <18c4bfed-caca-bef3-a139-63d7fa48940a@candelatech.com>
In-Reply-To: <fa4e82ee-eb14-3930-c76c-f3bd59c5f258@candelatech.com>

On 2/23/26 14:36, Ben Greear wrote:
> Hello,
> 
> I hit a deadlock in which a CMA memory allocation tries to flush all work
> while holding a wifi-related mutex, at the same time as a workqueue worker is trying to process a
> wifi regdomain work item.  I really don't see any good way to fix this;
> it would seem that any code holding a mutex that could block a workqueue
> cannot safely allocate CMA memory?  Hopefully someone else has a better idea.

I tried using a kthread to do the regulatory domain processing instead of a work item,
and that seems to have solved the problem.  If that seems like a reasonable approach to
the wifi stack folks, I can post a patch.
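
To give a rough idea of the shape of the change, here is a userspace analogue
using pthreads.  It is not kernel code and not the actual patch; every name in
it (reg_thread, reg_thread_kick, and so on) is made up for illustration.  The
idea is only that requests are handed to one dedicated thread, so a request
that sleeps on a mutex occupies that thread rather than a shared worker.

```c
/* Userspace sketch only: hypothetical names, not the cfg80211 code. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct reg_thread {
	pthread_t       tid;
	pthread_mutex_t lock;      /* protects pending, processed and stop */
	pthread_cond_t  wake;      /* signalled on new work or on stop */
	int             pending;   /* requests queued but not yet handled */
	int             processed; /* requests handled so far */
	bool            stop;
};

/* Stand-in for the real processing, which may sleep on other locks. */
static void reg_process_one(struct reg_thread *rt)
{
	(void)rt;
}

static void *reg_thread_fn(void *arg)
{
	struct reg_thread *rt = arg;

	pthread_mutex_lock(&rt->lock);
	for (;;) {
		while (rt->pending == 0 && !rt->stop)
			pthread_cond_wait(&rt->wake, &rt->lock);
		if (rt->pending == 0 && rt->stop)
			break;
		rt->pending--;
		/* Drop the queue lock while processing so callers of
		 * reg_thread_kick() never wait behind a slow request. */
		pthread_mutex_unlock(&rt->lock);
		reg_process_one(rt);
		pthread_mutex_lock(&rt->lock);
		rt->processed++;
	}
	pthread_mutex_unlock(&rt->lock);
	return NULL;
}

/* Returns 0 on success, like pthread_create(). */
int reg_thread_start(struct reg_thread *rt)
{
	rt->pending = 0;
	rt->processed = 0;
	rt->stop = false;
	pthread_mutex_init(&rt->lock, NULL);
	pthread_cond_init(&rt->wake, NULL);
	return pthread_create(&rt->tid, NULL, reg_thread_fn, rt);
}

/* Takes the place of scheduling a work item: count the request and wake
 * the dedicated thread. */
void reg_thread_kick(struct reg_thread *rt)
{
	pthread_mutex_lock(&rt->lock);
	rt->pending++;
	pthread_cond_signal(&rt->wake);
	pthread_mutex_unlock(&rt->lock);
}

/* Lets the thread drain what is queued, joins it, and returns the number
 * of requests it handled. */
int reg_thread_stop(struct reg_thread *rt)
{
	pthread_mutex_lock(&rt->lock);
	rt->stop = true;
	pthread_cond_signal(&rt->wake);
	pthread_mutex_unlock(&rt->lock);
	pthread_join(rt->tid, NULL);
	pthread_cond_destroy(&rt->wake);
	pthread_mutex_destroy(&rt->lock);
	return rt->processed;
}
```

In the kernel the same pattern would presumably use the kthread and wait-queue
helpers in place of pthreads, but the details of that are for the patch itself.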

Thanks,
Ben

> 
> For whatever reason, my hacked-up kernel prints the sysrq process stack traces I need
> to understand this, and my stable 6.18.13 does not.  But the held locks match in both cases, so this is
> almost certainly the same problem.  I can reproduce it on both unmodified stable
> and my own kernel.  The details below are from my modified 6.18.9+ kernel.
> 
> I only hit this (reliably?) with a KASAN-enabled kernel, likely because KASAN makes things slow enough to
> hit the problem and/or causes CMA allocations to happen in a different manner.
> 
> The general way to reproduce is to have a large number of Intel BE200 radios in a system and
> repeatedly bring them admin up and down.
> 
> 
> ## From 6.18.13 (un-modified)
> 
> 40479 Feb 23 14:13:31 ct523c-de7c kernel: 5 locks held by kworker/u32:11/34989:
> 40480 Feb 23 14:13:31 ct523c-de7c kernel:  #0: ffff888120161148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0xf7a/0x17b0
> 40481 Feb 23 14:13:31 ct523c-de7c kernel:  #1: ffff8881a561fd20 ((work_completion)(&rdev->wiphy_work)){+.+.}-{0:0}, at: process_one_work+0x7ca/0x17b0
> 40482 Feb 23 14:13:31 ct523c-de7c kernel:  #2: ffff88815e618788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: cfg80211_wiphy_work+0x5c/0x570 [cfg80211]
> 40483 Feb 23 14:13:31 ct523c-de7c kernel:  #3: ffffffff87232e60 (&cma->alloc_mutex){+.+.}-{4:4}, at: __cma_alloc+0x3c5/0xd20
> 40484 Feb 23 14:13:31 ct523c-de7c kernel:  #4: ffffffff8534f668 (lock#5){+.+.}-{4:4}, at: __lru_add_drain_all+0x5f/0x530
> 
> 40488 Feb 23 14:13:31 ct523c-de7c kernel: 4 locks held by kworker/1:0/39480:
> 40489 Feb 23 14:13:31 ct523c-de7c kernel:  #0: ffff88812006b148 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xf7a/0x17b0
> 40490 Feb 23 14:13:31 ct523c-de7c kernel:  #1: ffff88814087fd20 (reg_work){+.+.}-{0:0}, at: process_one_work+0x7ca/0x17b0
> 40491 Feb 23 14:13:31 ct523c-de7c kernel:  #2: ffffffff85970028 (rtnl_mutex){+.+.}-{4:4}, at: reg_todo+0x18/0x770 [cfg80211]
> 40492 Feb 23 14:13:31 ct523c-de7c kernel:  #3: ffff88815e618788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: reg_process_self_managed_hints+0x70/0x190 [cfg80211]
> 
> 
> ## The rest of this is from my hacked 6.18.9+ kernel.
> 
> ### The thread trying to allocate CMA is blocked here, trying to flush work.
> 
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from vmlinux...
> (gdb) l *(alloc_contig_range_noprof+0x1de)
> 0xffffffff8162453e is in alloc_contig_range_noprof (/home2/greearb/git/linux-6.18.dev.y/mm/page_alloc.c:6798).
> 6793            .reason = MR_CONTIG_RANGE,
> 6794        };
> 6795
> 6796        lru_cache_disable();
> 6797
> 6798        while (pfn < end || !list_empty(&cc->migratepages)) {
> 6799            if (fatal_signal_pending(current)) {
> 6800                ret = -EINTR;
> 6801                break;
> 6802            }
> (gdb) l *(__lru_add_drain_all+0x19b)
> 0xffffffff815ae44b is in __lru_add_drain_all (/home2/greearb/git/linux-6.18.dev.y/mm/swap.c:884).
> 879                queue_work_on(cpu, mm_percpu_wq, work);
> 880                __cpumask_set_cpu(cpu, &has_work);
> 881            }
> 882        }
> 883
> 884        for_each_cpu(cpu, &has_work)
> 885            flush_work(&per_cpu(lru_add_drain_work, cpu));
> 886
> 887    done:
> 888        mutex_unlock(&lock);
> (gdb)
> 
> 
> ### The other thread is trying to process a regdom request, and trying to use RCU and rtnl???
> 
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from net/wireless/cfg80211.ko...
> (gdb) l *(reg_todo+0x18)
> 0xe238 is in reg_todo (/home2/greearb/git/linux-6.18.dev.y/net/wireless/reg.c:3107).
> 3102     */
> 3103    static void reg_process_pending_hints(void)
> 3104    {
> 3105        struct regulatory_request *reg_request, *lr;
> 3106
> 3107        lr = get_last_request();
> 3108
> 3109        /* When last_request->processed becomes true this will be rescheduled */
> 3110        if (lr && !lr->processed) {
> 3111            pr_debug("Pending regulatory request, waiting for it to be processed...\n");
> (gdb)
> 
> static struct regulatory_request *get_last_request(void)
> {
>      return rcu_dereference_rtnl(last_request);
> }
> 
> 
> task:kworker/6:0     state:D stack:0     pid:56    tgid:56    ppid:2      task_flags:0x4208060 flags:0x00080000
> Workqueue: events reg_todo [cfg80211]
> Call Trace:
>   <TASK>
>   __schedule+0x526/0x1290
>   preempt_schedule_notrace+0x35/0x50
>   preempt_schedule_notrace_thunk+0x16/0x30
>   rcu_is_watching+0x2a/0x30
>   lock_acquire+0x26d/0x2c0
>   schedule+0xac/0x120
>   ? schedule+0x8d/0x120
>   schedule_preempt_disabled+0x11/0x20
>   __mutex_lock+0x726/0x1070
>   ? reg_todo+0x18/0x2b0 [cfg80211]
>   ? reg_todo+0x18/0x2b0 [cfg80211]
>   reg_todo+0x18/0x2b0 [cfg80211]
>   process_one_work+0x221/0x6d0
>   worker_thread+0x1e5/0x3b0
>   ? rescuer_thread+0x450/0x450
>   kthread+0x108/0x220
>   ? kthreads_online_cpu+0x110/0x110
>   ret_from_fork+0x1c6/0x220
>   ? kthreads_online_cpu+0x110/0x110
>   ret_from_fork_asm+0x11/0x20
>   </TASK>
> 
> task:ip              state:D stack:0     pid:72857 tgid:72857 ppid:72843  task_flags:0x400100 flags:0x00080001
> Call Trace:
>   <TASK>
>   __schedule+0x526/0x1290
>   ? schedule+0x8d/0x120
>   ? schedule+0xe2/0x120
>   schedule+0x36/0x120
>   schedule_timeout+0xf9/0x110
>   ? mark_held_locks+0x40/0x70
>   __wait_for_common+0xbe/0x1e0
>   ? hrtimer_nanosleep_restart+0x120/0x120
>   ? __flush_work+0x20b/0x530
>   __flush_work+0x34e/0x530
>   ? flush_workqueue_prep_pwqs+0x160/0x160
>   ? bpf_prog_test_run_tracing+0x160/0x2d0
>   __lru_add_drain_all+0x19b/0x220
>   alloc_contig_range_noprof+0x1de/0x8a0
>   __cma_alloc+0x1f1/0x6a0
>   __dma_direct_alloc_pages.isra.0+0xcb/0x2f0
>   dma_direct_alloc+0x7b/0x250
>   dma_alloc_attrs+0xa1/0x2a0
>   _iwl_pcie_ctxt_info_dma_alloc_coherent+0x31/0xb0 [iwlwifi]
>   iwl_pcie_ctxt_info_alloc_dma+0x20/0x50 [iwlwifi]
>   iwl_pcie_init_fw_sec+0x2fc/0x380 [iwlwifi]
>   iwl_pcie_ctxt_info_v2_alloc+0x19e/0x530 [iwlwifi]
>   iwl_trans_pcie_gen2_start_fw+0x2e2/0x820 [iwlwifi]
>   ? lock_is_held_type+0x92/0x100
>   iwl_trans_start_fw+0x77/0x90 [iwlwifi]
>   iwl_mld_load_fw_wait_alive+0x97/0x2c0 [iwlmld]
>   ? iwl_mld_mac80211_sta_state+0x780/0x780 [iwlmld]
>   ? lock_is_held_type+0x92/0x100
>   iwl_mld_load_fw+0x91/0x240 [iwlmld]
>   ? ieee80211_open+0x3d/0xe0 [mac80211]
>   ? lock_is_held_type+0x92/0x100
>   iwl_mld_start_fw+0x44/0x470 [iwlmld]
>   iwl_mld_mac80211_start+0x3d/0x1b0 [iwlmld]
>   drv_start+0x6f/0x1d0 [mac80211]
>   ieee80211_do_open+0x2d6/0x960 [mac80211]
>   ieee80211_open+0x62/0xe0 [mac80211]
>   __dev_open+0x11a/0x2e0
>   __dev_change_flags+0x1f8/0x280
>   netif_change_flags+0x22/0x60
>   do_setlink.isra.0+0xe57/0x11a0
>   ? __mutex_lock+0xb0/0x1070
>   ? __mutex_lock+0x99e/0x1070
>   ? __nla_validate_parse+0x5e/0xcd0
>   ? rtnl_newlink+0x355/0xb50
>   ? cap_capable+0x90/0x100
>   ? security_capable+0x72/0x80
>   rtnl_newlink+0x7e8/0xb50
>   ? __lock_acquire+0x436/0x2190
>   ? lock_acquire+0xc2/0x2c0
>   ? rtnetlink_rcv_msg+0x97/0x660
>   ? find_held_lock+0x2b/0x80
>   ? do_setlink.isra.0+0x11a0/0x11a0
>   ? rtnetlink_rcv_msg+0x3ea/0x660
>   ? lock_release+0xcc/0x290
>   ? do_setlink.isra.0+0x11a0/0x11a0
>   rtnetlink_rcv_msg+0x409/0x660
>   ? rtnl_fdb_dump+0x240/0x240
>   netlink_rcv_skb+0x56/0x100
>   netlink_unicast+0x1e1/0x2d0
>   netlink_sendmsg+0x219/0x460
>   __sock_sendmsg+0x38/0x70
>   ____sys_sendmsg+0x214/0x280
>   ? import_iovec+0x2c/0x30
>   ? copy_msghdr_from_user+0x6c/0xa0
>   ___sys_sendmsg+0x85/0xd0
>   ? __lock_acquire+0x436/0x2190
>   ? find_held_lock+0x2b/0x80
>   ? lock_acquire+0xc2/0x2c0
>   ? mntput_no_expire+0x43/0x460
>   ? find_held_lock+0x2b/0x80
>   ? mntput_no_expire+0x8c/0x460
>   __sys_sendmsg+0x6b/0xc0
>   do_syscall_64+0x6b/0x11b0
>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> Thanks,
> Ben
> 


