* 6.16-rcX: crashing way too often on thinkpad X220 @ 2025-07-22 17:35 Pavel Machek 2025-07-23 15:06 ` Pavel Machek 0 siblings, 1 reply; 27+ messages in thread From: Pavel Machek @ 2025-07-22 17:35 UTC (permalink / raw) To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa [-- Attachment #1: Type: text/plain, Size: 4184 bytes --] Hi! I have had 4th or so crash with 6.16. That's not usual. This time capslock was blinking, indicating panic. Two crashes were like that, two were just lockups. This time I checked logs, and nothing unusual (besides this, but..) Jul 22 09:05:20 duo pipewire[3473]: Failed to receive portal pid: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get PID of nam> Jul 22 19:08:57 duo pulseaudio[3474]: ALSA woke us up to write new data to the device, but there was actually nothing to write. Jul 22 19:08:57 duo pulseaudio[3474]: Most likely this is a bug in the ALSA driver 'snd_hda_intel'. Please report this issue to the A> Jul 22 19:08:57 duo pulseaudio[3474]: We were woken up with POLLOUT set -- however a subsequent snd_pcm_avail() returned 0 or another> Digging through older logs, I get: Jul 06 15:22:20 duo kernel: iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control Jul 06 15:22:20 duo kernel: iwlwifi 0000:03:00.0: pci_enable_msi failed - -38 Jul 06 15:22:20 duo kernel: ACPI: \: failed to evaluate _DSM 2c176672-0b22-294b-814f-75e4dd26b5fd rev:0 func:0 (0x1001) Jul 06 15:22:20 duo kernel: ACPI: \: failed to evaluate _DSM 2c176672-0b22-294b-814f-75e4dd26b5fd rev:0 func:0 (0x1001) Jul 06 15:22:20 duo kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance' Jul 06 15:22:20 duo kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2 Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: firmware Patch file not found, tried: Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' ... Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: firmware Patch file not found, tried: Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000. Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump: Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x003B00F7 | data1 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0xED408AD8 | beacon time Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x40AC6528 | tsf low Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000000E | tsf hi Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000578 | time gp1 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x190D1D6C | time gp2 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3 Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version ... Jul 08 12:23:29 duo kernel: usb 1-1-port2: unable to enumerate USB device Jul 08 12:50:42 duo kernel: mmc0: tried to HW reset card, got error -123 Jul 08 13:44:20 duo kernel: usb 1-1.2: device descriptor read/64, error -71 Jul 08 13:44:20 duo kernel: usb 1-1.2: device descriptor read/64, error -71 Jul 08 13:44:57 duo kernel: usb 1-1.2: device descriptor read/64, error -71 But I doubt any of that is related. Any ideas? Does anyone see similar crashes? Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-22 17:35 6.16-rcX: crashing way too often on thinkpad X220 Pavel Machek @ 2025-07-23 15:06 ` Pavel Machek 2025-07-23 15:11 ` Dave Hansen 2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek 0 siblings, 2 replies; 27+ messages in thread From: Pavel Machek @ 2025-07-23 15:06 UTC (permalink / raw) To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa [-- Attachment #1: Type: text/plain, Size: 258 bytes --] Hi! > I have had 4th or so crash with 6.16. That's not usual. Two more crashes today. Last one was with 6.16-rc7+. Ideas welcome, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 15:06 ` Pavel Machek @ 2025-07-23 15:11 ` Dave Hansen 2025-07-23 20:35 ` Pavel Machek 2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek 1 sibling, 1 reply; 27+ messages in thread From: Dave Hansen @ 2025-07-23 15:11 UTC (permalink / raw) To: Pavel Machek, kernel list, tglx, mingo, bp, dave.hansen, x86, hpa On 7/23/25 08:06, Pavel Machek wrote: >> I have had 4th or so crash with 6.16. That's not usual. > Two more crashes today. Last one was with 6.16-rc7+. > > Ideas welcome, Still nothing in the logs? The age old idea of getting console and trying to capture an oops always applies. I've got an x230 sitting here that I have an ExpressCard serial port for. There's also netconsole and xhci debug cables. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 15:11 ` Dave Hansen @ 2025-07-23 20:35 ` Pavel Machek 0 siblings, 0 replies; 27+ messages in thread From: Pavel Machek @ 2025-07-23 20:35 UTC (permalink / raw) To: Dave Hansen; +Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa [-- Attachment #1: Type: text/plain, Size: 910 bytes --] Hi! > >> I have had 4th or so crash with 6.16. That's not usual. > > Two more crashes today. Last one was with 6.16-rc7+. > > > > Ideas welcome, > > Still nothing in the logs? > > The age old idea of getting console and trying to capture an oops always > applies. > > I've got an x230 sitting here that I have an ExpressCard serial port > for. There's also netconsole and xhci debug cables. I was kind of wornering if I'm only one seeing problems. I've now built 6.15, and I plan to switch to it on next crash to see if it is maybe hardware being flakey or what. Oh, and I got _something_, see the thread. But that did not kill the machine, so... I'm currently traveling, and wifi reception is bad (and not really chance to do heavy debugging). Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 15:06 ` Pavel Machek 2025-07-23 15:11 ` Dave Hansen @ 2025-07-23 15:42 ` Pavel Machek 2025-07-23 17:13 ` Johannes Berg 2025-07-23 17:31 ` Thomas Gleixner 1 sibling, 2 replies; 27+ messages in thread From: Pavel Machek @ 2025-07-23 15:42 UTC (permalink / raw) To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless [-- Attachment #1.1: Type: text/plain, Size: 4581 bytes --] Hi! > > I have had 4th or so crash with 6.16. That's not usual. > > Two more crashes today. Last one was with 6.16-rc7+. Did kernel boot on console (w/o X), and got this: not sure if it is related. Best regards, Pavel ... [ 18.466106] systemd-journald[1449]: File /var/log/journal/66f054f85acd4f7b9d2f07ff9b88ad15/user-1000.journal corrupted or uncleanly shut down, renaming and replacing. [ 21.302012] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 [ 21.598035] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 [ 24.925016] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) [ 24.925050] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3) [ 24.933544] wlp3s0: authenticated [ 24.937771] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3) [ 24.941781] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=8) [ 24.963083] wlp3s0: associated [ 25.625057] bcache: register_bdev() registered backing device sda2 [ 26.635587] EXT4-fs (bcache0): recovery complete [ 26.652144] EXT4-fs (bcache0): mounted filesystem 233f8e6b-8b89-4425-9eb4-8cb18077dd52 r/w with ordered data mode. Quota mode: disabled. [ 30.062975] Adding 9878100k swap on /dev/sdb1. Priority:-2 extents:1 across:9878100k SS [ 402.124881] wlp3s0: deauthenticated from c8:3a:35:f0:ad:f1 (Reason: 7=CLASS3_FRAME_FROM_NONASSOC_STA) [ 402.125635] ------------[ cut here ]------------ [ 402.125638] raw_local_irq_restore() called with IRQs enabled [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 [ 402.125654] Modules linked in: [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 [ 402.125713] Call Trace: [ 402.125716] <TASK> [ 402.125719] console_flush_all+0x41e/0x460 [ 402.125725] ? console_flush_all+0x43/0x460 [ 402.125735] console_unlock+0x55/0x100 [ 402.125741] vprintk_emit+0x157/0x320 [ 402.125748] vprintk_default+0x18/0x20 [ 402.125752] vprintk+0x9/0x10 [ 402.125756] _printk+0x52/0x70 [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 [ 402.125784] ieee80211_iface_work+0x3ad/0x500 [ 402.125792] cfg80211_wiphy_work+0xbf/0xe0 [ 402.125799] process_one_work+0x3fb/0x500 [ 402.125805] ? process_one_work+0x3bf/0x500 [ 402.125809] ? process_one_work+0x3bf/0x500 [ 402.125813] ? process_one_work+0x3bf/0x500 [ 402.125822] worker_thread+0x189/0x340 [ 402.125828] ? __pfx_worker_thread+0x10/0x10 [ 402.125834] kthread+0x104/0x260 [ 402.125840] ? __pfx_kthread+0x10/0x10 [ 402.125846] ret_from_fork+0x7d/0xe0 [ 402.125852] ? __pfx_kthread+0x10/0x10 [ 402.125857] ret_from_fork_asm+0x1a/0x30 [ 402.125870] </TASK> [ 402.125872] ---[ end trace 0000000000000000 ]--- [ 402.267771] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) [ 402.268566] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3) [ 402.277687] wlp3s0: authenticated [ 402.280664] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3) [ 402.285133] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=7) [ 402.306254] wlp3s0: associated -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #1.2: dmesg.723.gz --] [-- Type: application/gzip, Size: 20664 bytes --] [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek @ 2025-07-23 17:13 ` Johannes Berg 2025-07-23 20:37 ` Pavel Machek ` (2 more replies) 2025-07-23 17:31 ` Thomas Gleixner 1 sibling, 3 replies; 27+ messages in thread From: Johannes Berg @ 2025-07-23 17:13 UTC (permalink / raw) To: Pavel Machek, kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless Hi, On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote: > [ 402.125635] ------------[ cut here ]------------ > [ 402.125638] raw_local_irq_restore() called with IRQs enabled > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > [ 402.125654] Modules linked in: > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > [ 402.125713] Call Trace: > [ 402.125716] <TASK> > [ 402.125719] console_flush_all+0x41e/0x460 > [ 402.125725] ? console_flush_all+0x43/0x460 > [ 402.125735] console_unlock+0x55/0x100 > [ 402.125741] vprintk_emit+0x157/0x320 > [ 402.125748] vprintk_default+0x18/0x20 > [ 402.125752] vprintk+0x9/0x10 > [ 402.125756] _printk+0x52/0x70 > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 > [ 402.125784] ieee80211_iface_work+0x3ad/0x500 That's not great, but I don't see how the driver or wifi subsystem is involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks let alone disable IRQs or use raw_ APIs, and it's in the middle of printk anyway. No idea what might be going on here, sorry. johannes ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 17:13 ` Johannes Berg @ 2025-07-23 20:37 ` Pavel Machek 2025-07-24 8:50 ` John Ogness 2025-07-23 21:42 ` Pavel Machek 2025-09-15 9:32 ` Pavel Machek 2 siblings, 1 reply; 27+ messages in thread From: Pavel Machek @ 2025-07-23 20:37 UTC (permalink / raw) To: Johannes Berg Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless, pmladek, rostedt, john.ogness, senozhatsky [-- Attachment #1: Type: text/plain, Size: 2766 bytes --] Hi! > On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote: > > [ 402.125635] ------------[ cut here ]------------ > > [ 402.125638] raw_local_irq_restore() called with IRQs enabled > > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > > [ 402.125654] Modules linked in: > > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > > [ 402.125713] Call Trace: > > [ 402.125716] <TASK> > > [ 402.125719] console_flush_all+0x41e/0x460 > > [ 402.125725] ? console_flush_all+0x43/0x460 > > [ 402.125735] console_unlock+0x55/0x100 > > [ 402.125741] vprintk_emit+0x157/0x320 > > [ 402.125748] vprintk_default+0x18/0x20 > > [ 402.125752] vprintk+0x9/0x10 > > [ 402.125756] _printk+0x52/0x70 > > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 > > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 > > [ 402.125784] ieee80211_iface_work+0x3ad/0x500 > > That's not great, but I don't see how the driver or wifi subsystem is > involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks > let alone disable IRQs or use raw_ APIs, and it's in the middle of > printk anyway. > > No idea what might be going on here, sorry. So lets ask printk() people if they have some idea... / seen something similar / ....? Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 20:37 ` Pavel Machek @ 2025-07-24 8:50 ` John Ogness 0 siblings, 0 replies; 27+ messages in thread From: John Ogness @ 2025-07-24 8:50 UTC (permalink / raw) To: Pavel Machek, Johannes Berg Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless, pmladek, rostedt, senozhatsky On 2025-07-23, Pavel Machek <pavel@ucw.cz> wrote: >> On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote: >> > [ 402.125635] ------------[ cut here ]------------ >> > [ 402.125638] raw_local_irq_restore() called with IRQs enabled >> > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >> > [ 402.125654] Modules linked in: >> > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >> > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >> > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 >> > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >> > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >> > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 >> > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >> > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 >> > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff >> > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 >> > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 >> > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 >> > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 >> > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 >> > [ 402.125713] Call Trace: >> > [ 402.125716] <TASK> >> > [ 402.125719] console_flush_all+0x41e/0x460 >> > [ 402.125725] ? console_flush_all+0x43/0x460 >> > [ 402.125735] console_unlock+0x55/0x100 >> > [ 402.125741] vprintk_emit+0x157/0x320 >> > [ 402.125748] vprintk_default+0x18/0x20 >> > [ 402.125752] vprintk+0x9/0x10 >> > [ 402.125756] _printk+0x52/0x70 >> > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 >> > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 >> > [ 402.125784] ieee80211_iface_work+0x3ad/0x500 >> >> That's not great, but I don't see how the driver or wifi subsystem is >> involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks >> let alone disable IRQs or use raw_ APIs, and it's in the middle of >> printk anyway. printk() does perform local_irq_save()/_restore(). I rechecked all the sites and they are properly balanced in all paths. It looks like you may have found other issues related to corruption, so I will assume printk is not the culprit. But please keep us CC or ping us again if printk comes under suspicion again. John Ogness ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 17:13 ` Johannes Berg 2025-07-23 20:37 ` Pavel Machek @ 2025-07-23 21:42 ` Pavel Machek 2025-09-15 9:32 ` Pavel Machek 2 siblings, 0 replies; 27+ messages in thread From: Pavel Machek @ 2025-07-23 21:42 UTC (permalink / raw) To: Johannes Berg Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless [-- Attachment #1: Type: text/plain, Size: 30317 bytes --] Hi! > On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote: > > [ 402.125635] ------------[ cut here ]------------ > > [ 402.125638] raw_local_irq_restore() called with IRQs enabled > > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > > [ 402.125654] Modules linked in: > > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > > [ 402.125713] Call Trace: > > [ 402.125716] <TASK> > > [ 402.125719] console_flush_all+0x41e/0x460 > > [ 402.125725] ? console_flush_all+0x43/0x460 > > [ 402.125735] console_unlock+0x55/0x100 > > [ 402.125741] vprintk_emit+0x157/0x320 > > [ 402.125748] vprintk_default+0x18/0x20 > > [ 402.125752] vprintk+0x9/0x10 > > [ 402.125756] _printk+0x52/0x70 > > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 > > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 > > [ 402.125784] ieee80211_iface_work+0x3ad/0x500 > > That's not great, but I don't see how the driver or wifi subsystem is > involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks > let alone disable IRQs or use raw_ APIs, and it's in the middle of > printk anyway. > > No idea what might be going on here, sorry. Does this give any more hints? I seem to have... pretty unhappy wifi, followed by slab corruption. Is my assumption correct that this looks like wifi problem? (I can share full logs via personal email). BR, Pavel Jul 23 23:29:19 duo kernel: wlp3s0: authenticated Jul 23 23:29:19 duo kernel: wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3) Jul 23 23:29:19 duo kernel: wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=7) Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: Associated with c8:3a:35:f0:ad:f1 Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 Jul 23 23:29:19 duo kernel: wlp3s0: associated Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2521] device (wlp3s0): supplicant interface state: associating -> associated Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2522] device (wlp3s0): DHCPv4 lease renewal requested Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2524] dhcp4 (wlp3s0): canceled DHCP transaction Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2525] dhcp4 (wlp3s0): state changed extended -> terminated Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2532] dhcp4 (wlp3s0): activation: beginning transaction (timeout in 45 seconds) Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.5357] device (wlp3s0): supplicant interface state: associated -> 4way_handshake Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: WPA: Key negotiation completed with c8:3a:35:f0:ad:f1 [PTK=CCMP GTK=CCMP] Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-CONNECTED - Connection to c8:3a:35:f0:ad:f1 completed [id=0 id_str=] Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.5766] device (wlp3s0): supplicant interface state: 4way_handshake -> completed Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-71 noise=9999 txrate=0 Jul 23 23:29:23 duo sudo[78117]: pavel : TTY=pts/1 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda Jul 23 23:29:23 duo sudo[78117]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000) Jul 23 23:29:23 duo sudo[78117]: pam_unix(sudo:session): session closed for user root Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000. Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump: Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x003900EE | data1 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x14413F07 | beacon time Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0xF6E050F9 | tsf low Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000005CB | time gp1 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x533515F9 | time gp2 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x003A001C | hcmd Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0xA786300A | isr0 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x1141E000 | isr1 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000F1A | isr2 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x014730C0 | isr3 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00010110 | isr_pref Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00004288 | l2p_control Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000003C | l2p_duration Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | l2p_mhvalid Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x001054F7 | l2p_addr_match Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000098A8 | flow_handler Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Device error - SW reset Jul 23 23:29:27 duo kernel: ieee80211 phy0: Hardware restart was requested Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 Jul 23 23:29:29 duo sudo[78137]: pavel : TTY=pts/0 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda Jul 23 23:29:29 duo sudo[78137]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000) Jul 23 23:29:29 duo sudo[78137]: pam_unix(sudo:session): session closed for user root Jul 23 23:29:47 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-58 noise=9999 txrate=54000 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000. Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump: Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0045011D | data1 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x30406BE5 | beacon time Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0xF816841B | tsf low Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000056B | time gp1 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x01312AC3 | time gp2 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000C001C | hcmd Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0xA7F6300A | isr0 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0101E000 | isr1 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000E1A | isr2 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x014768C0 | isr3 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4 Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x10014112 | isr_pref Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000000C4 | l2p_control Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000586 | l2p_duration Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000007 | l2p_mhvalid Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00101042 | l2p_addr_match Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000A8B8 | flow_handler Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Device error - SW reset Jul 23 23:29:47 duo kernel: ieee80211 phy0: Hardware restart was requested Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 Jul 23 23:29:48 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 Jul 23 23:29:53 duo sudo[78189]: pavel : TTY=pts/1 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda Jul 23 23:29:53 duo sudo[78189]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000) Jul 23 23:29:53 duo sudo[78189]: pam_unix(sudo:session): session closed for user root Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000. Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump: Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x003C00FA | data1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0D005446 | beacon time Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0xF86E1BBA | tsf low Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000577 | time gp1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00528405 | time gp2 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000A001C | hcmd Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0xA7F63008 | isr0 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0101E000 | isr1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000E1A | isr2 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x014730C0 | isr3 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x10010112 | isr_pref Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000000C4 | l2p_control Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000592 | l2p_duration Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000007 | l2p_mhvalid Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00101042 | l2p_addr_match Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000818 | flow_handler Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Device error - reprobe! Jul 23 23:29:53 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-DISCONNECTED bssid=c8:3a:35:f0:ad:f1 reason=3 locally_generated=1 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_LEDS_CMD failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_LEDS_CMD failed: FW Error Jul 23 23:29:53 duo kernel: wlp3s0: deauthenticating from c8:3a:35:f0:ad:f1 by local choice (Reason: 3=DEAUTH_LEAVING) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_TXFIFO_FLUSH failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: flush request fail Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------ Jul 23 23:29:53 duo kernel: bad state = 0 Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:755 iwl_trans_wait_tx_queues_empty+0x44/0x50 Jul 23 23:29:53 duo kernel: Modules linked in: Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary) Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk Jul 23 23:29:53 duo kernel: RIP: 0010:iwl_trans_wait_tx_queues_empty+0x44/0x50 Jul 23 23:29:53 duo kernel: Code: cc cc cc 80 3d d8 df 47 01 00 b8 fb ff ff ff 75 ec 44 89 c6 48 c7 c7 27 a0 d8 85 89 45 fc c6 05 bd df 47 01 01 e8 8c 79 52 ff <0f> 0b 8b 45 fc c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bf7f8 EFLAGS: 00010282 Jul 23 23:29:53 duo kernel: RAX: 0000000000000000 RBX: ffff888101f63060 RCX: 0000000000000000 Jul 23 23:29:53 duo kernel: RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bf800 R08: ffffffff8613bc08 R09: 00000000ffffdfff Jul 23 23:29:53 duo kernel: R10: 0000000000000aa3 R11: ffffffff8610bc20 R12: 00000000000ffdef Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: ffff888101f60e00 R15: ffff888101f630d8 Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000 Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 23 23:29:53 duo kernel: CR2: 000055f98d7e8cb0 CR3: 000000010911e004 CR4: 00000000000606b0 Jul 23 23:29:53 duo kernel: Call Trace: Jul 23 23:29:53 duo kernel: <TASK> Jul 23 23:29:53 duo kernel: iwlagn_mac_flush+0x90/0xb0 Jul 23 23:29:53 duo kernel: __ieee80211_flush_queues+0x1aa/0x200 Jul 23 23:29:53 duo kernel: ieee80211_flush_queues+0xe/0x20 Jul 23 23:29:53 duo kernel: ieee80211_set_disassoc+0x6c4/0x940 Jul 23 23:29:53 duo kernel: ieee80211_mgd_deauth+0x120/0x5b0 Jul 23 23:29:53 duo kernel: ieee80211_deauth+0x13/0x20 Jul 23 23:29:53 duo kernel: cfg80211_mlme_deauth+0xac/0x140 Jul 23 23:29:53 duo kernel: cfg80211_mlme_down+0x96/0xc0 Jul 23 23:29:53 duo kernel: cfg80211_disconnect+0x152/0x170 Jul 23 23:29:53 duo kernel: cfg80211_leave+0x180/0x1b0 Jul 23 23:29:53 duo kernel: cfg80211_netdev_notifier_call+0xed/0x3e0 Jul 23 23:29:53 duo kernel: raw_notifier_call_chain+0x44/0x60 Jul 23 23:29:53 duo kernel: call_netdevice_notifiers_info+0x40/0x80 Jul 23 23:29:53 duo kernel: __dev_close_many+0x5a/0x200 Jul 23 23:29:53 duo kernel: dev_close_many+0xa4/0x180 Jul 23 23:29:53 duo kernel: netif_close+0x8c/0xc0 Jul 23 23:29:53 duo kernel: dev_close+0x27/0x70 Jul 23 23:29:53 duo kernel: cfg80211_shutdown_all_interfaces+0x4b/0xf0 Jul 23 23:29:53 duo kernel: ieee80211_remove_interfaces+0x44/0x220 Jul 23 23:29:53 duo kernel: ieee80211_unregister_hw+0x39/0x140 Jul 23 23:29:53 duo kernel: iwlagn_mac_unregister+0x27/0x40 Jul 23 23:29:53 duo kernel: iwl_op_mode_dvm_stop+0x16/0xb0 Jul 23 23:29:53 duo kernel: _iwl_op_mode_stop+0x35/0x70 Jul 23 23:29:53 duo kernel: iwl_drv_stop+0x2b/0xa0 Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x29/0x40 Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0 Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70 Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210 Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20 Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90 Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50 Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340 Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10 Jul 23 23:29:53 duo kernel: kthread+0x104/0x260 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30 Jul 23 23:29:53 duo kernel: </TASK> Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]--- Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error Jul 23 23:29:53 duo kernel: wlp3s0: HW problem - can not stop rx aggregation for c8:3a:35:f0:ad:f1 tid 0 Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error Jul 23 23:29:53 duo kernel: wlp3s0: failed to remove key (0, c8:3a:35:f0:ad:f1) from hardware (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_QOS_PARAM failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Failed to update QoS Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error Jul 23 23:29:53 duo kernel: wlp3s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-5) Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5) Jul 23 23:29:53 duo avahi-daemon[2850]: Interface wlp3s0.IPv4 no longer relevant for mDNS. Jul 23 23:29:53 duo avahi-daemon[2850]: Leaving mDNS multicast group on interface wlp3s0.IPv4 with address 192.168.1.110. Jul 23 23:29:53 duo avahi-daemon[2850]: Withdrawing address record for 192.168.1.110 on wlp3s0. Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6437] device (wlp3s0): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed') Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6710] dhcp4 (wlp3s0): canceled DHCP transaction Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6711] dhcp4 (wlp3s0): state changed unknown -> terminated Jul 23 23:29:53 duo systemd[1]: Starting Load/Save RF Kill Switch Status... Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------ Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at mm/slab_common.c:1002 __ksize+0xe9/0x110 Jul 23 23:29:53 duo kernel: Modules linked in: Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary) Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk Jul 23 23:29:53 duo kernel: RIP: 0010:__ksize+0xe9/0x110 Jul 23 23:29:53 duo kernel: Code: 29 48 89 d9 48 29 d1 48 c1 f9 06 48 89 ca 48 c1 e2 0c 48 03 15 a8 5c a6 01 48 39 d6 75 12 48 8b 13 83 e2 40 48 0f 44 c7 eb 8e <0f> 0b 31 c0 eb 88 0f 0b 31 c0 eb 82 48 8b 05 34 5a c9 01 e9 2d ff Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bfc88 EFLAGS: 00010246 Jul 23 23:29:53 duo kernel: RAX: 8000000000000000 RBX: ffffea000777f640 RCX: ffffffffffffffff Jul 23 23:29:53 duo kernel: RDX: ffffea0000000000 RSI: ffff8881ddfd9c80 RDI: ffff8881ddfd9c80 Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bfc90 R08: 0000000000000000 R09: 000000000020000a Jul 23 23:29:53 duo kernel: R10: 00000000000000db R11: ffff88813dd989b8 R12: ffff8881ddfd9c80 Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: 0000000000000d70 R15: 0000000000000057 Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000 Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 23 23:29:53 duo kernel: CR2: 00000ac403a953c8 CR3: 0000000129dce002 CR4: 00000000000606b0 Jul 23 23:29:53 duo kernel: Call Trace: Jul 23 23:29:53 duo kernel: <TASK> Jul 23 23:29:53 duo kernel: kfree_sensitive+0x14/0x50 Jul 23 23:29:53 duo kernel: iwl_pcie_tx_free+0x16b/0x260 Jul 23 23:29:53 duo kernel: iwl_trans_pcie_free+0x201/0x230 Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x31/0x40 Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0 Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70 Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210 Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20 Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90 Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50 Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340 Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10 Jul 23 23:29:53 duo kernel: kthread+0x104/0x260 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30 Jul 23 23:29:53 duo kernel: </TASK> Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]--- Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------ Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at mm/slub.c:4753 free_large_kmalloc+0xb5/0xd0 Jul 23 23:29:53 duo kernel: Modules linked in: Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary) Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk Jul 23 23:29:53 duo kernel: RIP: 0010:free_large_kmalloc+0xb5/0xd0 Jul 23 23:29:53 duo kernel: Code: ca 85 48 89 55 f0 c6 05 51 30 dd 01 01 e8 83 51 f0 ff 48 8b 55 f0 eb 96 4c 89 e7 e8 95 3a fa ff 4c 8b 65 f8 c9 c3 cc cc cc cc <0f> 0b 48 c7 c6 08 7f ca 85 4c 89 e7 e8 6a 6f fc ff 4c 8b 65 f8 c9 Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bfc30 EFLAGS: 00010297 Jul 23 23:29:53 duo kernel: RAX: 00000000000000f2 RBX: ffff8881ddfd9c80 RCX: ffffffffffffffff Jul 23 23:29:53 duo kernel: RDX: 0000000000000000 RSI: ffff8881ddfd9c80 RDI: ffffea000777f640 Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bfc40 R08: 0000000000000000 R09: 000000000020000a Jul 23 23:29:53 duo kernel: R10: 00000000000000db R11: ffff88813dd989b8 R12: ffffea000777f640 Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: 0000000000000d70 R15: 0000000000000057 Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000 Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 23 23:29:53 duo kernel: CR2: 00000ac403a953c8 CR3: 0000000129dce002 CR4: 00000000000606b0 Jul 23 23:29:53 duo kernel: Call Trace: Jul 23 23:29:53 duo kernel: <TASK> Jul 23 23:29:53 duo kernel: kfree+0xef/0x210 Jul 23 23:29:53 duo kernel: ? __ksize+0xeb/0x110 Jul 23 23:29:53 duo kernel: kfree_sensitive+0x21/0x50 Jul 23 23:29:53 duo kernel: iwl_pcie_tx_free+0x16b/0x260 Jul 23 23:29:53 duo kernel: iwl_trans_pcie_free+0x201/0x230 Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x31/0x40 Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0 Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70 Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210 Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20 Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90 Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50 Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500 Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340 Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10 Jul 23 23:29:53 duo kernel: kthread+0x104/0x260 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0 Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10 Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30 Jul 23 23:29:53 duo kernel: </TASK> Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]--- Jul 23 23:29:53 duo kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881ddfd9d00 pfn:0x1ddfd9 Jul 23 23:29:53 duo kernel: flags: 0x8000000000000000(zone=2) Jul 23 23:29:53 duo kernel: page_type: f2(table) Jul 23 23:29:53 duo kernel: raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000 Jul 23 23:29:53 duo kernel: raw: ffff8881ddfd9d00 ffff88801f09d680 00000001f2000000 0000000000000000 Jul 23 23:29:53 duo kernel: page dumped because: Not a kmalloc allocation Jul 23 23:29:53 duo kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Jul 23 23:29:53 duo kernel: #PF: supervisor instruction fetch in kernel mode Jul 23 23:29:53 duo kernel: #PF: error_code(0x0010) - not-present page Jul 23 23:29:53 duo kernel: PGD 0 P4D 0 Jul 23 23:29:53 duo kernel: Oops: Oops: 0010 [#1] SMP PTI Jul 23 23:29:53 duo kernel: CPU: 3 UID: 0 PID: 3093 Comm: Xorg Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary) Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 Jul 23 23:29:53 duo dbus-daemon[3483]: [session uid=1000 pid=3483] Activating service name='org.freedesktop.Notifications' requested by ':1.45' (uid=1000 pid=3746 comm="nm-applet ") -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 17:13 ` Johannes Berg 2025-07-23 20:37 ` Pavel Machek 2025-07-23 21:42 ` Pavel Machek @ 2025-09-15 9:32 ` Pavel Machek 2 siblings, 0 replies; 27+ messages in thread From: Pavel Machek @ 2025-09-15 9:32 UTC (permalink / raw) To: Johannes Berg Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless [-- Attachment #1: Type: text/plain, Size: 2778 bytes --] Hi! > > [ 402.125635] ------------[ cut here ]------------ > > [ 402.125638] raw_local_irq_restore() called with IRQs enabled > > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > > [ 402.125654] Modules linked in: > > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > > [ 402.125713] Call Trace: > > [ 402.125716] <TASK> > > [ 402.125719] console_flush_all+0x41e/0x460 > > [ 402.125725] ? console_flush_all+0x43/0x460 > > [ 402.125735] console_unlock+0x55/0x100 > > [ 402.125741] vprintk_emit+0x157/0x320 > > [ 402.125748] vprintk_default+0x18/0x20 > > [ 402.125752] vprintk+0x9/0x10 > > [ 402.125756] _printk+0x52/0x70 > > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30 > > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20 > > [ 402.125784] ieee80211_iface_work+0x3ad/0x500 > > That's not great, but I don't see how the driver or wifi subsystem is > involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks > let alone disable IRQs or use raw_ APIs, and it's in the middle of > printk anyway. > > No idea what might be going on here, sorry. For the record, problem was eventually traced to intel wifi driver bug in bugzilla, and at least experimental patch exist fixing it. Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, Netanyahu and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek 2025-07-23 17:13 ` Johannes Berg @ 2025-07-23 17:31 ` Thomas Gleixner 2025-07-23 17:32 ` Thomas Gleixner 1 sibling, 1 reply; 27+ messages in thread From: Thomas Gleixner @ 2025-07-23 17:31 UTC (permalink / raw) To: Pavel Machek, kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: > Did kernel boot on console (w/o X), and got this: not sure if it is > related. > [ 402.125635] ------------[ cut here ]------------ > [ 402.125638] raw_local_irq_restore() called with IRQs enabled > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > [ 402.125654] Modules linked in: > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > [ 402.125713] Call Trace: > [ 402.125716] <TASK> > [ 402.125719] console_flush_all+0x41e/0x460 Can you please decode this, so we can see which part of that code it is? > [ 402.125725] ? console_flush_all+0x43/0x460 > [ 402.125735] console_unlock+0x55/0x100 > [ 402.125741] vprintk_emit+0x157/0x320 > [ 402.125748] vprintk_default+0x18/0x20 > [ 402.125752] vprintk+0x9/0x10 > [ 402.125756] _printk+0x52/0x70 Thanks, tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 17:31 ` Thomas Gleixner @ 2025-07-23 17:32 ` Thomas Gleixner 2025-07-24 17:15 ` Pavel Machek 0 siblings, 1 reply; 27+ messages in thread From: Thomas Gleixner @ 2025-07-23 17:32 UTC (permalink / raw) To: Pavel Machek, kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: > On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: >> Did kernel boot on console (w/o X), and got this: not sure if it is >> related. >> [ 402.125635] ------------[ cut here ]------------ >> [ 402.125638] raw_local_irq_restore() called with IRQs enabled >> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >> [ 402.125654] Modules linked in: >> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 >> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 >> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 >> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff >> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 >> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 >> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 >> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 >> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 >> [ 402.125713] Call Trace: >> [ 402.125716] <TASK> >> [ 402.125719] console_flush_all+0x41e/0x460 > > Can you please decode this, so we can see which part of that code it is? And enable lockdep so that we can see where the interrupts were enabled? Thanks, tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-23 17:32 ` Thomas Gleixner @ 2025-07-24 17:15 ` Pavel Machek 2025-07-24 17:51 ` Waiman Long 0 siblings, 1 reply; 27+ messages in thread From: Pavel Machek @ 2025-07-24 17:15 UTC (permalink / raw) To: Thomas Gleixner Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, longman, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness [-- Attachment #1: Type: text/plain, Size: 17298 bytes --] Hi! On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote: > On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: > > On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: > >> Did kernel boot on console (w/o X), and got this: not sure if it is > >> related. > >> [ 402.125635] ------------[ cut here ]------------ > >> [ 402.125638] raw_local_irq_restore() called with IRQs enabled > >> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > >> [ 402.125654] Modules linked in: > >> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) > >> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC > >> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > >> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work > >> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > >> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > >> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 > >> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 > >> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > >> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 > >> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 > >> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 > >> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 > >> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 > >> [ 402.125713] Call Trace: > >> [ 402.125716] <TASK> > >> [ 402.125719] console_flush_all+0x41e/0x460 > > > > Can you please decode this, so we can see which part of that code it is? > > And enable lockdep so that we can see where the interrupts were enabled? Enabled lockdep and got this one. It seems resume with bad wifi signal does it on 6.16...? Is it any good? Any decoding needed? ... [ 32.361445] CPU2 is up [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 32.361982] Disabled fast string operations [ 32.366800] CPU3 is up [ 32.370186] ACPI: PM: Waking up from system sleep state S3 [ 32.393904] ACPI: EC: interrupt unblocked [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. [ 32.409738] ACPI: EC: event unblocked [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out [ 32.771931] sd 2:0:0:0: [sdb] Starting disk [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out [ 32.782731] ata3.00: configured for UDMA/133 [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] [ 33.810033] PM: resume devices took 1.404 seconds [ 33.841597] OOM killer enabled. [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking [ 33.843280] Restarting tasks: Starting [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! [ 33.852744] Restarting tasks: Done [ 33.888306] PM: suspend exit [ 33.941831] Bluetooth: hci0: BCM: chip id 63 [ 33.944910] Bluetooth: hci0: BCM: features 0x07 [ 33.961985] Bluetooth: hci0: BCM20702A [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' [ 34.032198] Bluetooth: MGMT ver 1.23 [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out [ 35.206578] sd 0:0:0:0: [sda] Starting disk [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out [ 35.278928] ata1.00: configured for UDMA/133 [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) [ 54.284095] ================================ [ 54.284097] WARNING: inconsistent lock state [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S [ 54.284104] -------------------------------- [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 [ 54.284133] {IN-SOFTIRQ-W} state was registered at: [ 54.284135] lock_acquire+0xdc/0x300 [ 54.284144] console_lock_spinning_enable+0x59/0x60 [ 54.284148] console_flush_all+0x2c0/0x4d0 [ 54.284153] console_unlock+0x55/0x100 [ 54.284157] vprintk_emit+0x15b/0x3a0 [ 54.284162] vprintk_default+0x18/0x20 [ 54.284166] vprintk+0x9/0x10 [ 54.284171] _printk+0x52/0x70 [ 54.284177] _credit_init_bits.part.0+0xec/0x160 [ 54.284187] entropy_timer+0xaa/0xc0 [ 54.284194] call_timer_fn+0xa7/0x260 [ 54.284200] expire_timers+0xef/0x1b0 [ 54.284205] run_timer_base+0xb0/0x140 [ 54.284210] run_timer_softirq+0xb/0x40 [ 54.284215] handle_softirqs+0xcd/0x490 [ 54.284222] irq_exit_rcu+0xa2/0x160 [ 54.284227] sysvec_apic_timer_interrupt+0x9b/0xc0 [ 54.284236] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 54.284242] cpuidle_enter_state+0x124/0x550 [ 54.284247] cpuidle_enter+0x29/0x40 [ 54.284253] do_idle+0x1d9/0x260 [ 54.284262] cpu_startup_entry+0x27/0x30 [ 54.284270] start_secondary+0x11e/0x140 [ 54.284277] common_startup_64+0x129/0x138 [ 54.284285] irq event stamp: 35912 [ 54.284287] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0 [ 54.284296] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0 [ 54.284302] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820 [ 54.284312] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820 [ 54.284319] other info that might help us debug this: [ 54.284321] Possible unsafe locking scenario: [ 54.284322] CPU0 [ 54.284323] ---- [ 54.284324] lock(console_owner); [ 54.284328] <Interrupt> [ 54.284329] lock(console_owner); [ 54.284332] *** DEADLOCK *** [ 54.284333] 8 locks held by wpa_supplicant/2940: [ 54.284336] #0: ffffffff8646f9b0 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40 [ 54.284351] #1: ffff888103068768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: nl80211_pre_doit+0xc2/0x250 [ 54.284367] #2: ffffffff86344200 (console_lock){+.+.}-{0:0}, at: vprintk_default+0x18/0x20 [ 54.284376] #3: ffffffff86344250 (console_srcu){....}-{0:0}, at: console_flush_all+0x43/0x4d0 [ 54.284386] #4: ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 [ 54.284395] #5: ffffffff86263f60 (printk_legacy_map-wait-type-override){+...}-{4:4}, at: console_flush_all+0x2d2/0x4d0 [ 54.284404] #6: ffffffff863ed2f8 (printing_lock){+...}-{3:3}, at: vt_console_print+0x55/0x470 [ 54.284415] #7: ffffffff86346ac0 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x5e/0x550 [ 54.284428] stack backtrace: [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 [ 54.284446] Call Trace: [ 54.284448] <TASK> [ 54.284450] dump_stack_lvl+0x88/0xd0 [ 54.284458] dump_stack+0x10/0x20 [ 54.284463] print_usage_bug.part.0+0x237/0x2d0 [ 54.284470] mark_lock.part.0+0xa9c/0xfb0 [ 54.284479] mark_held_locks+0x4d/0x80 [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0 [ 54.284493] trace_hardirqs_on+0x5a/0xe0 [ 54.284500] _raw_spin_unlock_irq+0x23/0x60 [ 54.284505] __flush_work+0x3b4/0x550 [ 54.284513] ? __timer_delete+0x2f/0xd0 [ 54.284519] ? timer_delete+0xb/0x20 [ 54.284524] ? try_to_grab_pending+0x12a/0x320 [ 54.284534] cancel_delayed_work_sync+0x65/0x70 [ 54.284539] fbcon_cursor+0xbe/0x160 [ 54.284548] hide_cursor+0x2c/0xc0 [ 54.284553] vt_console_print+0x45e/0x470 [ 54.284560] console_flush_all+0x301/0x4d0 [ 54.284565] ? console_flush_all+0x2d2/0x4d0 [ 54.284569] ? console_flush_all+0x43/0x4d0 [ 54.284572] ? console_flush_all+0x2d2/0x4d0 [ 54.284579] console_unlock+0x55/0x100 [ 54.284584] vprintk_emit+0x15b/0x3a0 [ 54.284590] vprintk_default+0x18/0x20 [ 54.284595] vprintk+0x9/0x10 [ 54.284600] _printk+0x52/0x70 [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0 [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20 [ 54.284625] ieee80211_auth+0x13/0x20 [ 54.284631] cfg80211_mlme_auth+0xeb/0x2a0 [ 54.284640] nl80211_authenticate+0x341/0x3a0 [ 54.284653] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130 [ 54.284663] genl_rcv_msg+0x14c/0x250 [ 54.284669] ? __pfx_nl80211_pre_doit+0x10/0x10 [ 54.284676] ? __pfx_nl80211_authenticate+0x10/0x10 [ 54.284684] ? __pfx_nl80211_post_doit+0x10/0x10 [ 54.284691] ? __pfx_genl_rcv_msg+0x10/0x10 [ 54.284697] netlink_rcv_skb+0x55/0x100 [ 54.284706] genl_rcv+0x24/0x40 [ 54.284712] netlink_unicast+0x1e8/0x2b0 [ 54.284718] netlink_sendmsg+0x1e3/0x400 [ 54.284726] ____sys_sendmsg+0x2a0/0x2f0 [ 54.284736] ? copy_msghdr_from_user+0x71/0xb0 [ 54.284744] ___sys_sendmsg+0x85/0xd0 [ 54.284751] ? find_held_lock+0x31/0x90 [ 54.284756] ? __might_fault+0x2c/0x70 [ 54.284762] ? __this_cpu_preempt_check+0x13/0x20 [ 54.284766] ? __might_fault+0x2c/0x70 [ 54.284771] ? __might_fault+0x2c/0x70 [ 54.284776] ? debug_smp_processor_id+0x17/0x20 [ 54.284781] __sys_sendmsg+0x6e/0xd0 [ 54.284788] ? do_syscall_64+0x39/0x2c0 [ 54.284795] __x64_sys_sendmsg+0x1a/0x20 [ 54.284801] x64_sys_call+0x929/0x2150 [ 54.284808] do_syscall_64+0x71/0x2c0 [ 54.284814] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 54.284819] RIP: 0033:0x7f79e0d28fb3 [ 54.284825] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 [ 54.284831] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 54.284837] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3 [ 54.284840] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006 [ 54.284843] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0 [ 54.284846] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560 [ 54.284849] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20 [ 54.284856] </TASK> [ 54.285774] ------------[ cut here ]------------ [ 54.285777] raw_local_irq_restore() called with IRQs enabled [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 [ 54.285801] Modules linked in: [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282 [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000 [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000 [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000 [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000 [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0 [ 54.285869] Call Trace: [ 54.285872] <TASK> [ 54.285874] console_flush_all+0x47d/0x4d0 [ 54.285880] ? console_flush_all+0x43/0x4d0 [ 54.285885] ? console_flush_all+0x2d2/0x4d0 [ 54.285892] console_unlock+0x55/0x100 [ 54.285898] vprintk_emit+0x15b/0x3a0 [ 54.285904] vprintk_default+0x18/0x20 [ 54.285910] vprintk+0x9/0x10 [ 54.285915] _printk+0x52/0x70 [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0 [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20 [ 54.285940] ieee80211_auth+0x13/0x20 [ 54.285945] cfg80211_mlme_auth+0xeb/0x2a0 [ 54.285954] nl80211_authenticate+0x341/0x3a0 [ 54.285967] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130 [ 54.285977] genl_rcv_msg+0x14c/0x250 [ 54.285984] ? __pfx_nl80211_pre_doit+0x10/0x10 [ 54.285991] ? __pfx_nl80211_authenticate+0x10/0x10 [ 54.286000] ? __pfx_nl80211_post_doit+0x10/0x10 [ 54.286007] ? __pfx_genl_rcv_msg+0x10/0x10 [ 54.286014] netlink_rcv_skb+0x55/0x100 [ 54.286023] genl_rcv+0x24/0x40 [ 54.286029] netlink_unicast+0x1e8/0x2b0 [ 54.286036] netlink_sendmsg+0x1e3/0x400 [ 54.286044] ____sys_sendmsg+0x2a0/0x2f0 [ 54.286052] ? copy_msghdr_from_user+0x71/0xb0 [ 54.286061] ___sys_sendmsg+0x85/0xd0 [ 54.286068] ? find_held_lock+0x31/0x90 [ 54.286073] ? __might_fault+0x2c/0x70 [ 54.286078] ? __this_cpu_preempt_check+0x13/0x20 [ 54.286083] ? __might_fault+0x2c/0x70 [ 54.286088] ? __might_fault+0x2c/0x70 [ 54.286094] ? debug_smp_processor_id+0x17/0x20 [ 54.286100] __sys_sendmsg+0x6e/0xd0 [ 54.286108] ? do_syscall_64+0x39/0x2c0 [ 54.286114] __x64_sys_sendmsg+0x1a/0x20 [ 54.286168] x64_sys_call+0x929/0x2150 [ 54.286182] do_syscall_64+0x71/0x2c0 [ 54.286193] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 54.286200] RIP: 0033:0x7f79e0d28fb3 [ 54.286207] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 [ 54.286215] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 54.286225] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3 [ 54.286230] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006 [ 54.286235] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0 [ 54.286246] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560 [ 54.286247] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20 [ 54.286266] </TASK> [ 54.286269] irq event stamp: 35912 [ 54.286273] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0 [ 54.286286] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0 [ 54.286294] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820 [ 54.286306] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820 [ 54.286318] ---[ end trace 0000000000000000 ]--- [ 54.373428] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3) [ 54.381882] wlp3s0: authenticated [ 54.386240] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3) [ 54.392641] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=6) [ 54.413392] wlp3s0: associated Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 17:15 ` Pavel Machek @ 2025-07-24 17:51 ` Waiman Long 2025-07-24 17:55 ` Waiman Long ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Waiman Long @ 2025-07-24 17:51 UTC (permalink / raw) To: Pavel Machek, Thomas Gleixner Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On 7/24/25 1:15 PM, Pavel Machek wrote: > Hi! > > On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote: >> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: >>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: >>>> Did kernel boot on console (w/o X), and got this: not sure if it is >>>> related. >>>> [ 402.125635] ------------[ cut here ]------------ >>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled >>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >>>> [ 402.125654] Modules linked in: >>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 >>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 >>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 >>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff >>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 >>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 >>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 >>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 >>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 >>>> [ 402.125713] Call Trace: >>>> [ 402.125716] <TASK> >>>> [ 402.125719] console_flush_all+0x41e/0x460 >>> Can you please decode this, so we can see which part of that code it is? >> And enable lockdep so that we can see where the interrupts were enabled? > Enabled lockdep and got this one. It seems resume with bad wifi signal > does it on 6.16...? > > Is it any good? Any decoding needed? > > ... > [ 32.361445] CPU2 is up > [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 > [ 32.361982] Disabled fast string operations > [ 32.366800] CPU3 is up > [ 32.370186] ACPI: PM: Waking up from system sleep state S3 > [ 32.393904] ACPI: EC: interrupt unblocked > [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. > [ 32.409738] ACPI: EC: event unblocked > [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 > [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci > [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) > [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) > [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > [ 32.771931] sd 2:0:0:0: [sdb] Starting disk > [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > [ 32.782731] ata3.00: configured for UDMA/133 > [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 > [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci > [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci > [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] > [ 33.810033] PM: resume devices took 1.404 seconds > [ 33.841597] OOM killer enabled. > [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking > [ 33.843280] Restarting tasks: Starting > [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! > [ 33.852744] Restarting tasks: Done > [ 33.888306] PM: suspend exit > [ 33.941831] Bluetooth: hci0: BCM: chip id 63 > [ 33.944910] Bluetooth: hci0: BCM: features 0x07 > [ 33.961985] Bluetooth: hci0: BCM20702A > [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 > [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: > [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' > [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' > [ 34.032198] Bluetooth: MGMT ver 1.23 > [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > [ 35.206578] sd 0:0:0:0: [sda] Starting disk > [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > [ 35.278928] ata1.00: configured for UDMA/133 > [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down > [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) > > [ 54.284095] ================================ > [ 54.284097] WARNING: inconsistent lock state > [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S > [ 54.284104] -------------------------------- > [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: > [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 The lockdep warning just means that console_owner_lock is acquired both in softirq context and in task context with interrupt enabled. That can leads to deadlock. So the remedy is to always take console_owner_lock with interrupt disabled, i.e. with raw_spin_lock_irqsave/raw_spin_lock_irqrestore. Cheers, Longman > [ 54.284133] {IN-SOFTIRQ-W} state was registered at: > [ 54.284135] lock_acquire+0xdc/0x300 > [ 54.284144] console_lock_spinning_enable+0x59/0x60 > [ 54.284148] console_flush_all+0x2c0/0x4d0 > [ 54.284153] console_unlock+0x55/0x100 > [ 54.284157] vprintk_emit+0x15b/0x3a0 > [ 54.284162] vprintk_default+0x18/0x20 > [ 54.284166] vprintk+0x9/0x10 > [ 54.284171] _printk+0x52/0x70 > [ 54.284177] _credit_init_bits.part.0+0xec/0x160 > [ 54.284187] entropy_timer+0xaa/0xc0 > [ 54.284194] call_timer_fn+0xa7/0x260 > [ 54.284200] expire_timers+0xef/0x1b0 > [ 54.284205] run_timer_base+0xb0/0x140 > [ 54.284210] run_timer_softirq+0xb/0x40 > [ 54.284215] handle_softirqs+0xcd/0x490 > [ 54.284222] irq_exit_rcu+0xa2/0x160 > [ 54.284227] sysvec_apic_timer_interrupt+0x9b/0xc0 > [ 54.284236] asm_sysvec_apic_timer_interrupt+0x1b/0x20 > [ 54.284242] cpuidle_enter_state+0x124/0x550 > [ 54.284247] cpuidle_enter+0x29/0x40 > [ 54.284253] do_idle+0x1d9/0x260 > [ 54.284262] cpu_startup_entry+0x27/0x30 > [ 54.284270] start_secondary+0x11e/0x140 > [ 54.284277] common_startup_64+0x129/0x138 > [ 54.284285] irq event stamp: 35912 > [ 54.284287] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0 > [ 54.284296] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0 > [ 54.284302] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820 > [ 54.284312] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820 > [ 54.284319] > other info that might help us debug this: > [ 54.284321] Possible unsafe locking scenario: > > [ 54.284322] CPU0 > [ 54.284323] ---- > [ 54.284324] lock(console_owner); > [ 54.284328] <Interrupt> > [ 54.284329] lock(console_owner); > [ 54.284332] > *** DEADLOCK *** > > [ 54.284333] 8 locks held by wpa_supplicant/2940: > [ 54.284336] #0: ffffffff8646f9b0 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40 > [ 54.284351] #1: ffff888103068768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: nl80211_pre_doit+0xc2/0x250 > [ 54.284367] #2: ffffffff86344200 (console_lock){+.+.}-{0:0}, at: vprintk_default+0x18/0x20 > [ 54.284376] #3: ffffffff86344250 (console_srcu){....}-{0:0}, at: console_flush_all+0x43/0x4d0 > [ 54.284386] #4: ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 > [ 54.284395] #5: ffffffff86263f60 (printk_legacy_map-wait-type-override){+...}-{4:4}, at: console_flush_all+0x2d2/0x4d0 > [ 54.284404] #6: ffffffff863ed2f8 (printing_lock){+...}-{3:3}, at: vt_console_print+0x55/0x470 > [ 54.284415] #7: ffffffff86346ac0 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x5e/0x550 > [ 54.284428] > stack backtrace: > [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) > [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC > [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 54.284446] Call Trace: > [ 54.284448] <TASK> > [ 54.284450] dump_stack_lvl+0x88/0xd0 > [ 54.284458] dump_stack+0x10/0x20 > [ 54.284463] print_usage_bug.part.0+0x237/0x2d0 > [ 54.284470] mark_lock.part.0+0xa9c/0xfb0 > [ 54.284479] mark_held_locks+0x4d/0x80 > [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0 > [ 54.284493] trace_hardirqs_on+0x5a/0xe0 > [ 54.284500] _raw_spin_unlock_irq+0x23/0x60 > [ 54.284505] __flush_work+0x3b4/0x550 > [ 54.284513] ? __timer_delete+0x2f/0xd0 > [ 54.284519] ? timer_delete+0xb/0x20 > [ 54.284524] ? try_to_grab_pending+0x12a/0x320 > [ 54.284534] cancel_delayed_work_sync+0x65/0x70 > [ 54.284539] fbcon_cursor+0xbe/0x160 > [ 54.284548] hide_cursor+0x2c/0xc0 > [ 54.284553] vt_console_print+0x45e/0x470 > [ 54.284560] console_flush_all+0x301/0x4d0 > [ 54.284565] ? console_flush_all+0x2d2/0x4d0 > [ 54.284569] ? console_flush_all+0x43/0x4d0 > [ 54.284572] ? console_flush_all+0x2d2/0x4d0 > [ 54.284579] console_unlock+0x55/0x100 > [ 54.284584] vprintk_emit+0x15b/0x3a0 > [ 54.284590] vprintk_default+0x18/0x20 > [ 54.284595] vprintk+0x9/0x10 > [ 54.284600] _printk+0x52/0x70 > [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0 > [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.284625] ieee80211_auth+0x13/0x20 > [ 54.284631] cfg80211_mlme_auth+0xeb/0x2a0 > [ 54.284640] nl80211_authenticate+0x341/0x3a0 > [ 54.284653] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130 > [ 54.284663] genl_rcv_msg+0x14c/0x250 > [ 54.284669] ? __pfx_nl80211_pre_doit+0x10/0x10 > [ 54.284676] ? __pfx_nl80211_authenticate+0x10/0x10 > [ 54.284684] ? __pfx_nl80211_post_doit+0x10/0x10 > [ 54.284691] ? __pfx_genl_rcv_msg+0x10/0x10 > [ 54.284697] netlink_rcv_skb+0x55/0x100 > [ 54.284706] genl_rcv+0x24/0x40 > [ 54.284712] netlink_unicast+0x1e8/0x2b0 > [ 54.284718] netlink_sendmsg+0x1e3/0x400 > [ 54.284726] ____sys_sendmsg+0x2a0/0x2f0 > [ 54.284736] ? copy_msghdr_from_user+0x71/0xb0 > [ 54.284744] ___sys_sendmsg+0x85/0xd0 > [ 54.284751] ? find_held_lock+0x31/0x90 > [ 54.284756] ? __might_fault+0x2c/0x70 > [ 54.284762] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.284766] ? __might_fault+0x2c/0x70 > [ 54.284771] ? __might_fault+0x2c/0x70 > [ 54.284776] ? debug_smp_processor_id+0x17/0x20 > [ 54.284781] __sys_sendmsg+0x6e/0xd0 > [ 54.284788] ? do_syscall_64+0x39/0x2c0 > [ 54.284795] __x64_sys_sendmsg+0x1a/0x20 > [ 54.284801] x64_sys_call+0x929/0x2150 > [ 54.284808] do_syscall_64+0x71/0x2c0 > [ 54.284814] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 54.284819] RIP: 0033:0x7f79e0d28fb3 > [ 54.284825] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 > [ 54.284831] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e > [ 54.284837] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3 > [ 54.284840] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006 > [ 54.284843] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0 > [ 54.284846] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560 > [ 54.284849] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20 > [ 54.284856] </TASK> > [ 54.285774] ------------[ cut here ]------------ > [ 54.285777] raw_local_irq_restore() called with IRQs enabled > [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > [ 54.285801] Modules linked in: > [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) > [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC > [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282 > [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000 > [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000 > [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000 > [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000 > [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0 > [ 54.285869] Call Trace: > [ 54.285872] <TASK> > [ 54.285874] console_flush_all+0x47d/0x4d0 > [ 54.285880] ? console_flush_all+0x43/0x4d0 > [ 54.285885] ? console_flush_all+0x2d2/0x4d0 > [ 54.285892] console_unlock+0x55/0x100 > [ 54.285898] vprintk_emit+0x15b/0x3a0 > [ 54.285904] vprintk_default+0x18/0x20 > [ 54.285910] vprintk+0x9/0x10 > [ 54.285915] _printk+0x52/0x70 > [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0 > [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.285940] ieee80211_auth+0x13/0x20 > [ 54.285945] cfg80211_mlme_auth+0xeb/0x2a0 > [ 54.285954] nl80211_authenticate+0x341/0x3a0 > [ 54.285967] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130 > [ 54.285977] genl_rcv_msg+0x14c/0x250 > [ 54.285984] ? __pfx_nl80211_pre_doit+0x10/0x10 > [ 54.285991] ? __pfx_nl80211_authenticate+0x10/0x10 > [ 54.286000] ? __pfx_nl80211_post_doit+0x10/0x10 > [ 54.286007] ? __pfx_genl_rcv_msg+0x10/0x10 > [ 54.286014] netlink_rcv_skb+0x55/0x100 > [ 54.286023] genl_rcv+0x24/0x40 > [ 54.286029] netlink_unicast+0x1e8/0x2b0 > [ 54.286036] netlink_sendmsg+0x1e3/0x400 > [ 54.286044] ____sys_sendmsg+0x2a0/0x2f0 > [ 54.286052] ? copy_msghdr_from_user+0x71/0xb0 > [ 54.286061] ___sys_sendmsg+0x85/0xd0 > [ 54.286068] ? find_held_lock+0x31/0x90 > [ 54.286073] ? __might_fault+0x2c/0x70 > [ 54.286078] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.286083] ? __might_fault+0x2c/0x70 > [ 54.286088] ? __might_fault+0x2c/0x70 > [ 54.286094] ? debug_smp_processor_id+0x17/0x20 > [ 54.286100] __sys_sendmsg+0x6e/0xd0 > [ 54.286108] ? do_syscall_64+0x39/0x2c0 > [ 54.286114] __x64_sys_sendmsg+0x1a/0x20 > [ 54.286168] x64_sys_call+0x929/0x2150 > [ 54.286182] do_syscall_64+0x71/0x2c0 > [ 54.286193] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 54.286200] RIP: 0033:0x7f79e0d28fb3 > [ 54.286207] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 > [ 54.286215] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e > [ 54.286225] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3 > [ 54.286230] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006 > [ 54.286235] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0 > [ 54.286246] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560 > [ 54.286247] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20 > [ 54.286266] </TASK> > [ 54.286269] irq event stamp: 35912 > [ 54.286273] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0 > [ 54.286286] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0 > [ 54.286294] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820 > [ 54.286306] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820 > [ 54.286318] ---[ end trace 0000000000000000 ]--- > [ 54.373428] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3) > [ 54.381882] wlp3s0: authenticated > [ 54.386240] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3) > [ 54.392641] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=6) > [ 54.413392] wlp3s0: associated > > Best regards, > Pavel > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 17:51 ` Waiman Long @ 2025-07-24 17:55 ` Waiman Long 2025-07-24 21:24 ` Thomas Gleixner 2025-07-24 19:07 ` Pavel Machek 2025-07-24 19:45 ` John Ogness 2 siblings, 1 reply; 27+ messages in thread From: Waiman Long @ 2025-07-24 17:55 UTC (permalink / raw) To: Pavel Machek, Thomas Gleixner Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On 7/24/25 1:51 PM, Waiman Long wrote: > On 7/24/25 1:15 PM, Pavel Machek wrote: >> Hi! >> >> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote: >>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: >>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: >>>>> Did kernel boot on console (w/o X), and got this: not sure if it is >>>>> related. >>>>> [ 402.125635] ------------[ cut here ]------------ >>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled >>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at >>>>> kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >>>>> [ 402.125654] Modules linked in: >>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: >>>>> G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS >>>>> 8DET73WW (1.43 ) 10/12/2016 >>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 >>>>> cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 >>>>> e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 >>>>> 90 90 90 90 90 90 >>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: >>>>> 0000000000000000 >>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: >>>>> 00000000ffffffff >>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: >>>>> 0000000000000001 >>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: >>>>> 0000000000000001 >>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: >>>>> 0000000000000000 >>>>> [ 402.125704] FS: 0000000000000000(0000) >>>>> GS:ffff88829007f000(0000) knlGS:0000000000000000 >>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: >>>>> 00000000000606b0 >>>>> [ 402.125713] Call Trace: >>>>> [ 402.125716] <TASK> >>>>> [ 402.125719] console_flush_all+0x41e/0x460 >>>> Can you please decode this, so we can see which part of that code >>>> it is? >>> And enable lockdep so that we can see where the interrupts were >>> enabled? >> Enabled lockdep and got this one. It seems resume with bad wifi signal >> does it on 6.16...? >> >> Is it any good? Any decoding needed? >> >> ... >> [ 32.361445] CPU2 is up >> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 >> [ 32.361982] Disabled fast string operations >> [ 32.366800] CPU3 is up >> [ 32.370186] ACPI: PM: Waking up from system sleep state S3 >> [ 32.393904] ACPI: EC: interrupt unblocked >> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency >> changed to 50Mhz. >> [ 32.409738] ACPI: EC: event unblocked >> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using >> ehci-pci >> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) >> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) >> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) >> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE >> LOCK) filtered out >> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) >> filtered out >> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk >> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE >> LOCK) filtered out >> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) >> filtered out >> [ 32.782731] ata3.00: configured for UDMA/133 >> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using >> ehci-pci >> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using >> ehci-pci >> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x >> [..5472], y [..4448] >> [ 33.810033] PM: resume devices took 1.404 seconds >> [ 33.841597] OOM killer enabled. >> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking >> [ 33.843280] Restarting tasks: Starting >> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! >> [ 33.852744] Restarting tasks: Done >> [ 33.888306] PM: suspend exit >> [ 33.941831] Bluetooth: hci0: BCM: chip id 63 >> [ 33.944910] Bluetooth: hci0: BCM: features 0x07 >> [ 33.961985] Bluetooth: hci0: BCM20702A >> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 >> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, >> tried: >> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' >> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' >> [ 34.032198] Bluetooth: MGMT ver 1.23 >> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE >> LOCK) filtered out >> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) >> filtered out >> [ 35.206578] sd 0:0:0:0: [sda] Starting disk >> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE >> LOCK) filtered out >> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) >> filtered out >> [ 35.278928] ata1.00: configured for UDMA/133 >> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down >> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local >> address=a0:88:b4:62:a7:30) >> >> [ 54.284095] ================================ >> [ 54.284097] WARNING: inconsistent lock state >> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >> [ 54.284104] -------------------------------- >> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: >> console_lock_spinning_enable+0x3d/0x60 > > The lockdep warning just means that console_owner_lock is acquired > both in softirq context and in task context with interrupt enabled. > That can leads to deadlock. So the remedy is to always take > console_owner_lock with interrupt disabled, i.e. with > raw_spin_lock_irqsave/raw_spin_lock_irqrestore. I suppose that this lock can also be acquired in hardirq context. So a similar HARDIRQ warning can be printed if that happens first. Regards, Longman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 17:55 ` Waiman Long @ 2025-07-24 21:24 ` Thomas Gleixner 2025-07-25 0:33 ` Hillf Danton 2025-07-25 15:23 ` Thomas Gleixner 0 siblings, 2 replies; 27+ messages in thread From: Thomas Gleixner @ 2025-07-24 21:24 UTC (permalink / raw) To: Waiman Long, Pavel Machek Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness, Greg Kroah-Hartman, Jiri Slaby, Tejun Heo, Simona Vetter On Thu, Jul 24 2025 at 13:55, Waiman Long wrote: > On 7/24/25 1:51 PM, Waiman Long wrote: >>> [ 54.284095] ================================ >>> [ 54.284097] WARNING: inconsistent lock state >>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >>> [ 54.284104] -------------------------------- >>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: >>> console_lock_spinning_enable+0x3d/0x60 >> >> The lockdep warning just means that console_owner_lock is acquired >> both in softirq context and in task context with interrupt enabled. >> That can leads to deadlock. So the remedy is to always take >> console_owner_lock with interrupt disabled, i.e. with >> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. > > I suppose that this lock can also be acquired in hardirq context. So a > similar HARDIRQ warning can be printed if that happens first. None of this makes any sense whatsoever. Both invocations of console_lock_spinning_enable() are within a hard interrupt disabled section. printk_safe_enter_irqsave(flags); console_lock_spinning_enable(); The lockdep map which is printed here is not console_owner_lock, it's console_owner, which is a software managed lockdep_map: static struct lockdep_map console_owner_dep_map = { .name = "console_owner" }; It's touched in console_lock_spinning_enable(), console_lock_spinning_disable_and_check() and console_trylock_spinning(). In all cases with interrupts disabled and all sites use printk_safe_enter_irqsave() which internally uses local_irq_save(). Now lets go back to the initial report, which is further down: [ 54.285777] raw_local_irq_restore() called with IRQs enabled [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 [ 54.285801] Modules linked in: [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282 [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000 [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000 [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000 [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000 [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0 [ 54.285869] Call Trace: [ 54.285872] <TASK> [ 54.285874] console_flush_all+0x47d/0x4d0 [ 54.285880] ? console_flush_all+0x43/0x4d0 [ 54.285885] ? console_flush_all+0x2d2/0x4d0 [ 54.285892] console_unlock+0x55/0x100 [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0 [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20 [ 54.285940] ieee80211_auth+0x13/0x20 I asked for decoding that console_flush_all+0x47d/0x4d0 line, but I'm 100% sure now that it is line 3128 in console_emit_next_record() which is inlined into console_flush_all() printk_safe_exit_irqrestore(flags); Pavel, can you confirm that? But let's look at the other stack trace in the lockdep splat which is way more interesting: stack backtrace: [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 [ 54.284446] Call Trace: [ 54.284448] <TASK> [ 54.284450] dump_stack_lvl+0x88/0xd0 [ 54.284458] dump_stack+0x10/0x20 [ 54.284463] print_usage_bug.part.0+0x237/0x2d0 [ 54.284470] mark_lock.part.0+0xa9c/0xfb0 [ 54.284479] mark_held_locks+0x4d/0x80 [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0 [ 54.284493] trace_hardirqs_on+0x5a/0xe0 [ 54.284500] _raw_spin_unlock_irq+0x23/0x60 [ 54.284505] __flush_work+0x3b4/0x550 [ 54.284513] ? __timer_delete+0x2f/0xd0 [ 54.284519] ? timer_delete+0xb/0x20 [ 54.284524] ? try_to_grab_pending+0x12a/0x320 [ 54.284534] cancel_delayed_work_sync+0x65/0x70 [ 54.284539] fbcon_cursor+0xbe/0x160 [ 54.284548] hide_cursor+0x2c/0xc0 [ 54.284553] vt_console_print+0x45e/0x470 [ 54.284560] console_flush_all+0x301/0x4d0 [ 54.284565] ? console_flush_all+0x2d2/0x4d0 [ 54.284569] ? console_flush_all+0x43/0x4d0 [ 54.284572] ? console_flush_all+0x2d2/0x4d0 [ 54.284579] console_unlock+0x55/0x100 [ 54.284584] vprintk_emit+0x15b/0x3a0 [ 54.284590] vprintk_default+0x18/0x20 [ 54.284595] vprintk+0x9/0x10 [ 54.284600] _printk+0x52/0x70 [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0 [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20 [ 54.284625] ieee80211_auth+0x13/0x20 That's the same call chain as the above which complains about the already enabled interrupt. So the lockdep splat is telling us that the raw_spin_unlock_irq() in the workqueue code is making console_owner unsafe. Now let's look where this comes from: console_flush_all+0x301/0x4d0 invokes vt_console_print+0x45e/0x470 via console_emit_next_record() ... printk_safe_enter_irqsave(flags); ... con->write(con, outbuf, pmsg.outbuf_len); Again decoding console_flush_all+0x301/0x4d0 should confirm that. Now that ends up in cancel_delayed_work_sync() via hide_cursor() -> fbcon_cursor(). cancel_delayed_work_sync() has a might_sleep() in it _and_ __flush_work() uses raw_spin_[un]lock_irq() as it requires to be invoked in thread or in some cases in BH context. vt_console_print() is the write() callback of the VT console driver. As that is a legacy console, it can't do that. Only the new NBCON variants are allowed to sleep in their write() callbacks because they are running in their own printer thread context. For atomic printouts in case of emergency they can have a write_atomic() callback. If they don't then printing is not attempted, but that's not relevant for vt_console_print() obviously. So lets look at that call chain further vt_console_print() hide_cursor() vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor() fbcon_cursor() if (vc->vc_cursor_type & CUR_SW) fbcon_del_cursor_work(info) cancel_delayed_work_sync(&ops->cursor_work); Here we are. Now I checked whether this is a recent change, but I have to say it has been that way for a long time. It's a very interesting question why this never popped up before. The only explanation I have so far is that the code path in cancel_delayed_work_sync() which uses [un]lock_irq() is only reached, when the works is executed at that point. Unfortunatly the might_sleep() check in __cancel_work_sync() is after the point which determines that. So it's a matter of debug options being enabled and the probability of hitting a work which is actually executed to trip over this problem. I have no fix for that, but at least there should be some clues now where to tackle this. It's definitely not a printk bug. The culprit was hiding in plain sight for a long time in the fbcon_console code. Thanks, tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 21:24 ` Thomas Gleixner @ 2025-07-25 0:33 ` Hillf Danton 2025-07-25 15:23 ` Thomas Gleixner 1 sibling, 0 replies; 27+ messages in thread From: Hillf Danton @ 2025-07-25 0:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Waiman Long, Pavel Machek, LKML, Petr Mladek, John Ogness, Tejun Heo On Thu, 24 Jul 2025 23:24:26 +0200 Thomas Gleixner wrote: >On Thu, Jul 24 2025 at 13:55, Waiman Long wrote: >> On 7/24/25 1:51 PM, Waiman Long wrote: >>>> [ 54.284095] ================================ >>>> [ 54.284097] WARNING: inconsistent lock state >>>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >>>> [ 54.284104] -------------------------------- >>>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >>>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >>>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: >>>> console_lock_spinning_enable+0x3d/0x60 >>> >>> The lockdep warning just means that console_owner_lock is acquired >>> both in softirq context and in task context with interrupt enabled. >>> That can leads to deadlock. So the remedy is to always take >>> console_owner_lock with interrupt disabled, i.e. with >>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. >> >> I suppose that this lock can also be acquired in hardirq context. So a >> similar HARDIRQ warning can be printed if that happens first. > > None of this makes any sense whatsoever. > > Both invocations of console_lock_spinning_enable() are within a hard > interrupt disabled section. > > printk_safe_enter_irqsave(flags); > console_lock_spinning_enable(); > > The lockdep map which is printed here is not console_owner_lock, > it's console_owner, which is a software managed lockdep_map: > > static struct lockdep_map console_owner_dep_map = { > .name = "console_owner" > }; > > It's touched in console_lock_spinning_enable(), > console_lock_spinning_disable_and_check() and console_trylock_spinning(). > > In all cases with interrupts disabled and all sites use > printk_safe_enter_irqsave() which internally uses local_irq_save(). > > Now lets go back to the initial report, which is further down: > > [ 54.285777] raw_local_irq_restore() called with IRQs enabled > [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 > [ 54.285801] Modules linked in: > [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) > [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC > [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 > [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b > 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 > [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282 > [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000 > [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff > [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000 > [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000 > [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000 > [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0 > [ 54.285869] Call Trace: > [ 54.285872] <TASK> > [ 54.285874] console_flush_all+0x47d/0x4d0 > [ 54.285880] ? console_flush_all+0x43/0x4d0 > [ 54.285885] ? console_flush_all+0x2d2/0x4d0 > [ 54.285892] console_unlock+0x55/0x100 > [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0 > [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.285940] ieee80211_auth+0x13/0x20 > > I asked for decoding that console_flush_all+0x47d/0x4d0 line, but I'm > 100% sure now that it is line 3128 in console_emit_next_record() which > is inlined into console_flush_all() > > printk_safe_exit_irqrestore(flags); > > Pavel, can you confirm that? > > But let's look at the other stack trace in the lockdep splat which is > way more interesting: > > stack backtrace: > [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary) > [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC > [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 > [ 54.284446] Call Trace: > [ 54.284448] <TASK> > [ 54.284450] dump_stack_lvl+0x88/0xd0 > [ 54.284458] dump_stack+0x10/0x20 > [ 54.284463] print_usage_bug.part.0+0x237/0x2d0 > [ 54.284470] mark_lock.part.0+0xa9c/0xfb0 > [ 54.284479] mark_held_locks+0x4d/0x80 > [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0 > [ 54.284493] trace_hardirqs_on+0x5a/0xe0 > [ 54.284500] _raw_spin_unlock_irq+0x23/0x60 > [ 54.284505] __flush_work+0x3b4/0x550 > [ 54.284513] ? __timer_delete+0x2f/0xd0 > [ 54.284519] ? timer_delete+0xb/0x20 > [ 54.284524] ? try_to_grab_pending+0x12a/0x320 > [ 54.284534] cancel_delayed_work_sync+0x65/0x70 > [ 54.284539] fbcon_cursor+0xbe/0x160 > [ 54.284548] hide_cursor+0x2c/0xc0 > [ 54.284553] vt_console_print+0x45e/0x470 > [ 54.284560] console_flush_all+0x301/0x4d0 > [ 54.284565] ? console_flush_all+0x2d2/0x4d0 > [ 54.284569] ? console_flush_all+0x43/0x4d0 > [ 54.284572] ? console_flush_all+0x2d2/0x4d0 > [ 54.284579] console_unlock+0x55/0x100 > [ 54.284584] vprintk_emit+0x15b/0x3a0 > [ 54.284590] vprintk_default+0x18/0x20 > [ 54.284595] vprintk+0x9/0x10 > [ 54.284600] _printk+0x52/0x70 > [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0 > [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20 > [ 54.284625] ieee80211_auth+0x13/0x20 > > That's the same call chain as the above which complains about the > already enabled interrupt. > > So the lockdep splat is telling us that the raw_spin_unlock_irq() in the > workqueue code is making console_owner unsafe. > Point A > Now let's look where this comes from: > > console_flush_all+0x301/0x4d0 > > invokes > > vt_console_print+0x45e/0x470 > > via > > console_emit_next_record() > ... > printk_safe_enter_irqsave(flags); > ... > con->write(con, outbuf, pmsg.outbuf_len); > > Again decoding console_flush_all+0x301/0x4d0 should confirm that. > > Now that ends up in cancel_delayed_work_sync() via hide_cursor() -> > fbcon_cursor(). > > cancel_delayed_work_sync() has a might_sleep() in it _and_ > __flush_work() uses raw_spin_[un]lock_irq() as it requires to be invoked > in thread or in some cases in BH context. > > vt_console_print() is the write() callback of the VT console driver. As > that is a legacy console, it can't do that. > > Only the new NBCON variants are allowed to sleep in their write() > callbacks because they are running in their own printer thread > context. For atomic printouts in case of emergency they can have a > write_atomic() callback. If they don't then printing is not attempted, > but that's not relevant for vt_console_print() obviously. > > So lets look at that call chain further > > vt_console_print() > hide_cursor() > vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor() > > fbcon_cursor() > if (vc->vc_cursor_type & CUR_SW) > fbcon_del_cursor_work(info) > cancel_delayed_work_sync(&ops->cursor_work); > cancel_delayed_work_sync(&ops->cursor_work); __cancel_work_sync() __cancel_work() local_irq_save() ... local_irq_restore() // Point B __flush_work() start_flush_work() raw_spin_lock_irq(&pool->lock); ... raw_spin_unlock_irq(&pool->lock); // Point C The difference between Point B and C wrt irq creates Point A. > Here we are. > > Now I checked whether this is a recent change, but I have to say it has > been that way for a long time. It's a very interesting question why this > never popped up before. > > The only explanation I have so far is that the code path in > cancel_delayed_work_sync() which uses [un]lock_irq() is only reached, > when the works is executed at that point. > > Unfortunatly the might_sleep() check in __cancel_work_sync() is after > the point which determines that. > > So it's a matter of debug options being enabled and the probability of > hitting a work which is actually executed to trip over this problem. > No, debug makes no sense. Nor does a running work. > I have no fix for that, but at least there should be some clues now > where to tackle this. > Yes > It's definitely not a printk bug. The culprit was hiding in plain sight > for a long time in the fbcon_console code. > Nope because of Point A. Hillf > Thanks, > > tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 21:24 ` Thomas Gleixner 2025-07-25 0:33 ` Hillf Danton @ 2025-07-25 15:23 ` Thomas Gleixner 2025-07-25 19:05 ` Pavel Machek 1 sibling, 1 reply; 27+ messages in thread From: Thomas Gleixner @ 2025-07-25 15:23 UTC (permalink / raw) To: Waiman Long, Pavel Machek Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness, Greg Kroah-Hartman, Jiri Slaby, Tejun Heo, Simona Vetter, Thierry Reding On Thu, Jul 24 2025 at 23:24, Thomas Gleixner wrote: > On Thu, Jul 24 2025 at 13:55, Waiman Long wrote: > So lets look at that call chain further > > vt_console_print() > hide_cursor() > vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor() > > fbcon_cursor() > if (vc->vc_cursor_type & CUR_SW) > fbcon_del_cursor_work(info) > cancel_delayed_work_sync(&ops->cursor_work); > > Here we are. It's actually very simple to reproduce in a VM. No wireless, no suspend/resume, no special hardware required. Enable framebuffer and framebuffer console. Add console=tty0 to the kernel command line. Log into FB console and do # echo -e '\033[?17;0;64c' which enables the software cursor, which in turn enables the above conditional invocation of fbcon_del_cursor_work(). Then force a printk # echo h >/proc/sysrq-trigger and watch the show. [ 1406.245343] BUG: sleeping function called from invalid context at kernel/workqueue.c:4359 [ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash [ 1406.245346] preempt_count: 2, expected: 0 [ 1406.245347] RCU nest depth: 1, expected: 0 [ 1406.245348] 7 locks held by bash/1819: [ 1406.245349] #0: ffff9a730c4ba428 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x73/0xf0 [ 1406.245444] #1: ffffffffa7ade120 (rcu_read_lock){....}-{1:3}, at: __handle_sysrq+0x3d/0x270 [ 1406.245457] #2: ffffffffa7adb500 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x80 [ 1406.245464] #3: ffffffffa7adb550 (console_srcu){....}-{0:0}, at: console_flush_all+0x3c/0x530 [ 1406.245470] #4: ffffffffa7a6b140 (console_owner){....}-{0:0}, at: console_lock_spinning_enable+0x42/0x70 [ 1406.245475] #5: ffffffffa7a6b0c0 (printk_legacy_map-wait-type-override){....}-{4:4}, at: console_flush_all+0x2fe/0x530 [ 1406.245480] #6: ffffffffa7b8e4d8 (printing_lock){....}-{3:3}, at: vt_console_print+0x5b/0x490 [ 1406.245487] irq event stamp: 92690 [ 1406.245488] hardirqs last enabled at (92689): [<ffffffffa656edef>] console_flush_all+0x49f/0x530 [ 1406.245491] hardirqs last disabled at (92690): [<ffffffffa656edff>] console_flush_all+0x4af/0x530 [ 1406.245493] softirqs last enabled at (91822): [<ffffffffa6455166>] __fpu_restore_sig+0x206/0x6a0 [ 1406.245497] softirqs last disabled at (91820): [<ffffffffa6455088>] __fpu_restore_sig+0x128/0x6a0 [ 1406.245499] Preemption disabled at: [ 1406.245499] [<ffffffffa6570211>] vprintk_emit+0x1f1/0x430 [ 1406.245505] CPU: 46 UID: 0 PID: 1819 Comm: bash Not tainted 6.16.0-rc7+ #446 PREEMPT(voluntary) [ 1406.245508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 1406.245510] Call Trace: [ 1406.245514] <TASK> [ 1406.245519] dump_stack_lvl+0x77/0xb0 [ 1406.245526] __might_resched+0x1b2/0x2d0 [ 1406.245539] cancel_delayed_work_sync+0x57/0x90 [ 1406.245548] fbcon_cursor+0xb2/0x140 [ 1406.245558] hide_cursor+0x24/0xb0 [ 1406.245562] vt_console_print+0x471/0x490 [ 1406.245565] ? __pfx_vt_console_print+0x10/0x10 [ 1406.245575] console_flush_all+0x32b/0x530 As I've enabled CONFIG_DEBUG_ATOMIC_SLEEP I get the splat immediately even when there is no work queued. (I misread that code yesterday with my tried brain and thought the might_sleep() would be too late). And clearly this is invoked with interrupts disabled: [ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash Now if there would be work executed at that moment then the lockdep splat _AND_ the subsequent 'irqs already enabled' splat would materialize. I can't be bothered to figure out how to make the work executed because the above is already a clear prove of the theory :) It requires that the software cursor is enabled. That's probably rare, so that explains why this went unnoticed for so long. Contrary to all the halluzinations about the printk locking and lockdep correctness, this is a crystal clear bug in the framebuffer console code and has absolutely nothing to do with the printk() core code. The workqueue issue got introduced with commit 3b0fb6ab25dd ("fbcon: Use delayed work for cursor") in 5.19, but that replaced a del_timer_sync(), which had a similar problem because del_timer_sync() cannot be invoked in hard interrupt context, but printk() can ... That del_timer_sync() was conditional up to 4.2, where it became unconditional in commit 2a17d7e80f1d ("fbcon: unconditionally initialize cursor blink interval") just to be "fixed" shortly before 4.2-rc1 by moving the conditional back in commit a5edce421848 ("fbcon: Avoid deleting a timer in IRQ context"). That conditional did not fix anything as it just pushed the delete into the rarely executed path, therefore hiding the bug, which existed from the very beginning when this cursor blinking muck was introduced with commit acba9cd01974 ("fbcon: cursor blink control") in 2.6.23. So this software cursor code path was never tested with printk() and actually the splat which led to commit a5edce421848 ("fbcon: Avoid deleting a timer in IRQ context") should have made it crystal clear that this code path is broken: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at ../kernel/time/timer.c:1098 del_timer_sync+0x4c/0x54() Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc4-next-20150519 #1 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [] (warn_slowpath_null) from [] (del_timer_sync+0x4c/0x54) [] (del_timer_sync) from [] (fbcon_del_cursor_timer+0x2c/0x40) [] (fbcon_del_cursor_timer) from [] (fbcon_cursor+0x9c/0x180) [] (fbcon_cursor) from [] (hide_cursor+0x30/0x98) [] (hide_cursor) from [] (vt_console_print+0x2a8/0x340) [] (vt_console_print) from [] (call_console_drivers.constprop.23+0xc8/0xec) [] (call_console_drivers.constprop.23) from [] (console_unlock+0x498/0x4f0) [] (console_unlock) from [] (vprintk_emit+0x1f0/0x508) [] (vprintk_emit) from [] (vprintk_default+0x24/0x2c) [] (vprintk_default) from [] (printk+0x70/0x88) But no, hiding it behind a rarely true conditional without actually testing it, is way better. Seriously? The changelog should have been: This commit papers over the underlying problem by restoring the condition... I leave it as an exercise to the framebuffer people how to fix this trainwreck for real. Thanks, tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-25 15:23 ` Thomas Gleixner @ 2025-07-25 19:05 ` Pavel Machek 0 siblings, 0 replies; 27+ messages in thread From: Pavel Machek @ 2025-07-25 19:05 UTC (permalink / raw) To: Thomas Gleixner Cc: Waiman Long, kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness, Greg Kroah-Hartman, Jiri Slaby, Tejun Heo, Simona Vetter, Thierry Reding [-- Attachment #1: Type: text/plain, Size: 755 bytes --] Hi! > Enable framebuffer and framebuffer console. Add console=tty0 to the > kernel command line. > > Log into FB console and do > > # echo -e '\033[?17;0;64c' > > which enables the software cursor, which in turn enables the above > conditional invocation of fbcon_del_cursor_work(). Then force a printk > > # echo h >/proc/sysrq-trigger > > and watch the show. Heh, fun. Yes, long, long time ago I enabled software cursor. I don't normally use the console these days, but I switched to console for debugging the wireless problem -- hoping I would catch some kind of backtrace on the console. Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 17:51 ` Waiman Long 2025-07-24 17:55 ` Waiman Long @ 2025-07-24 19:07 ` Pavel Machek 2025-07-24 19:22 ` Waiman Long 2025-07-24 19:45 ` John Ogness 2 siblings, 1 reply; 27+ messages in thread From: Pavel Machek @ 2025-07-24 19:07 UTC (permalink / raw) To: Waiman Long Cc: Thomas Gleixner, kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness [-- Attachment #1: Type: text/plain, Size: 4433 bytes --] Hi! > > > > Can you please decode this, so we can see which part of that code it is? > > > And enable lockdep so that we can see where the interrupts were enabled? > > Enabled lockdep and got this one. It seems resume with bad wifi signal > > does it on 6.16...? > > > > Is it any good? Any decoding needed? > > > > ... > > [ 32.361445] CPU2 is up > > [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 > > [ 32.361982] Disabled fast string operations > > [ 32.366800] CPU3 is up > > [ 32.370186] ACPI: PM: Waking up from system sleep state S3 > > [ 32.393904] ACPI: EC: interrupt unblocked > > [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. > > [ 32.409738] ACPI: EC: event unblocked > > [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 > > [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci > > [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) > > [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > > [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) > > [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > > [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > > [ 32.771931] sd 2:0:0:0: [sdb] Starting disk > > [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > > [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > > [ 32.782731] ata3.00: configured for UDMA/133 > > [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 > > [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci > > [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci > > [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] > > [ 33.810033] PM: resume devices took 1.404 seconds > > [ 33.841597] OOM killer enabled. > > [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking > > [ 33.843280] Restarting tasks: Starting > > [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! > > [ 33.852744] Restarting tasks: Done > > [ 33.888306] PM: suspend exit > > [ 33.941831] Bluetooth: hci0: BCM: chip id 63 > > [ 33.944910] Bluetooth: hci0: BCM: features 0x07 > > [ 33.961985] Bluetooth: hci0: BCM20702A > > [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 > > [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: > > [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' > > [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' > > [ 34.032198] Bluetooth: MGMT ver 1.23 > > [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > > [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > > [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > > [ 35.206578] sd 0:0:0:0: [sda] Starting disk > > [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out > > [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out > > [ 35.278928] ata1.00: configured for UDMA/133 > > [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down > > [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) > > > > [ 54.284095] ================================ > > [ 54.284097] WARNING: inconsistent lock state > > [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S > > [ 54.284104] -------------------------------- > > [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > > [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: > > [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 > > The lockdep warning just means that console_owner_lock is acquired both in > softirq context and in task context with interrupt enabled. That can leads > to deadlock. So the remedy is to always take console_owner_lock with > interrupt disabled, i.e. with > raw_spin_lock_irqsave/raw_spin_lock_irqrestore. So you suspect problem is in the printk code? Best regards, Pavel -- I don't work for Nazis and criminals, and neither should you. Boycott Putin, Trump, and Musk! [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 19:07 ` Pavel Machek @ 2025-07-24 19:22 ` Waiman Long 2025-07-24 22:21 ` Thomas Gleixner 0 siblings, 1 reply; 27+ messages in thread From: Waiman Long @ 2025-07-24 19:22 UTC (permalink / raw) To: Pavel Machek, Waiman Long Cc: Thomas Gleixner, kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On 7/24/25 3:07 PM, Pavel Machek wrote: > Hi! > >>>>> Can you please decode this, so we can see which part of that code it is? >>>> And enable lockdep so that we can see where the interrupts were enabled? >>> Enabled lockdep and got this one. It seems resume with bad wifi signal >>> does it on 6.16...? >>> >>> Is it any good? Any decoding needed? >>> >>> ... >>> [ 32.361445] CPU2 is up >>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 >>> [ 32.361982] Disabled fast string operations >>> [ 32.366800] CPU3 is up >>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3 >>> [ 32.393904] ACPI: EC: interrupt unblocked >>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. >>> [ 32.409738] ACPI: EC: event unblocked >>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci >>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) >>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) >>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) >>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk >>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 32.782731] ata3.00: configured for UDMA/133 >>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci >>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci >>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] >>> [ 33.810033] PM: resume devices took 1.404 seconds >>> [ 33.841597] OOM killer enabled. >>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking >>> [ 33.843280] Restarting tasks: Starting >>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! >>> [ 33.852744] Restarting tasks: Done >>> [ 33.888306] PM: suspend exit >>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63 >>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07 >>> [ 33.961985] Bluetooth: hci0: BCM20702A >>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 >>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: >>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' >>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' >>> [ 34.032198] Bluetooth: MGMT ver 1.23 >>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk >>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 35.278928] ata1.00: configured for UDMA/133 >>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down >>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) >>> >>> [ 54.284095] ================================ >>> [ 54.284097] WARNING: inconsistent lock state >>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >>> [ 54.284104] -------------------------------- >>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 >> The lockdep warning just means that console_owner_lock is acquired both in >> softirq context and in task context with interrupt enabled. That can leads >> to deadlock. So the remedy is to always take console_owner_lock with >> interrupt disabled, i.e. with >> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. > So you suspect problem is in the printk code? This lockdep problem is likely unrelated to the IRQ flag problem reported in the original email as it shouldn't affect the IRQ flag. Cheers, Longman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 19:22 ` Waiman Long @ 2025-07-24 22:21 ` Thomas Gleixner 2025-07-25 1:56 ` Waiman Long 0 siblings, 1 reply; 27+ messages in thread From: Thomas Gleixner @ 2025-07-24 22:21 UTC (permalink / raw) To: Waiman Long, Pavel Machek, Waiman Long Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On Thu, Jul 24 2025 at 15:22, Waiman Long wrote: > On 7/24/25 3:07 PM, Pavel Machek wrote: >>> The lockdep warning just means that console_owner_lock is acquired both in >>> softirq context and in task context with interrupt enabled. That can leads >>> to deadlock. So the remedy is to always take console_owner_lock with >>> interrupt disabled, i.e. with >>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. >> So you suspect problem is in the printk code? > > This lockdep problem is likely unrelated to the IRQ flag problem > reported in the original email as it shouldn't affect the IRQ flag. Wrong you are ... :) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 22:21 ` Thomas Gleixner @ 2025-07-25 1:56 ` Waiman Long 0 siblings, 0 replies; 27+ messages in thread From: Waiman Long @ 2025-07-25 1:56 UTC (permalink / raw) To: Thomas Gleixner, Waiman Long, Pavel Machek Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness On 7/24/25 6:21 PM, Thomas Gleixner wrote: > On Thu, Jul 24 2025 at 15:22, Waiman Long wrote: >> On 7/24/25 3:07 PM, Pavel Machek wrote: >>>> The lockdep warning just means that console_owner_lock is acquired both in >>>> softirq context and in task context with interrupt enabled. That can leads >>>> to deadlock. So the remedy is to always take console_owner_lock with >>>> interrupt disabled, i.e. with >>>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. >>> So you suspect problem is in the printk code? >> This lockdep problem is likely unrelated to the IRQ flag problem >> reported in the original email as it shouldn't affect the IRQ flag. > Wrong you are ... :) > Sorry for coming to a conclusion early without looking more deeply into the printk code. I do admit that printk() isn't an area that I know enough about. Will look harder next time. Thanks, Longman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 17:51 ` Waiman Long 2025-07-24 17:55 ` Waiman Long 2025-07-24 19:07 ` Pavel Machek @ 2025-07-24 19:45 ` John Ogness 2025-07-25 0:16 ` Waiman Long 2 siblings, 1 reply; 27+ messages in thread From: John Ogness @ 2025-07-24 19:45 UTC (permalink / raw) To: Waiman Long, Pavel Machek, Thomas Gleixner Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek On 2025-07-24, Waiman Long <llong@redhat.com> wrote: > On 7/24/25 1:15 PM, Pavel Machek wrote: >> Hi! >> >> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote: >>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: >>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: >>>>> Did kernel boot on console (w/o X), and got this: not sure if it is >>>>> related. >>>>> [ 402.125635] ------------[ cut here ]------------ >>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled >>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >>>>> [ 402.125654] Modules linked in: >>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 >>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 >>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 >>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff >>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 >>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 >>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 >>>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 >>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 >>>>> [ 402.125713] Call Trace: >>>>> [ 402.125716] <TASK> >>>>> [ 402.125719] console_flush_all+0x41e/0x460 >>>> Can you please decode this, so we can see which part of that code it is? >>> And enable lockdep so that we can see where the interrupts were enabled? >> Enabled lockdep and got this one. It seems resume with bad wifi signal >> does it on 6.16...? >> >> Is it any good? Any decoding needed? >> >> ... >> [ 32.361445] CPU2 is up >> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 >> [ 32.361982] Disabled fast string operations >> [ 32.366800] CPU3 is up >> [ 32.370186] ACPI: PM: Waking up from system sleep state S3 >> [ 32.393904] ACPI: EC: interrupt unblocked >> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. >> [ 32.409738] ACPI: EC: event unblocked >> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci >> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) >> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) >> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) >> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk >> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >> [ 32.782731] ata3.00: configured for UDMA/133 >> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci >> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci >> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] >> [ 33.810033] PM: resume devices took 1.404 seconds >> [ 33.841597] OOM killer enabled. >> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking >> [ 33.843280] Restarting tasks: Starting >> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! >> [ 33.852744] Restarting tasks: Done >> [ 33.888306] PM: suspend exit >> [ 33.941831] Bluetooth: hci0: BCM: chip id 63 >> [ 33.944910] Bluetooth: hci0: BCM: features 0x07 >> [ 33.961985] Bluetooth: hci0: BCM20702A >> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 >> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: >> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' >> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' >> [ 34.032198] Bluetooth: MGMT ver 1.23 >> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >> [ 35.206578] sd 0:0:0:0: [sda] Starting disk >> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >> [ 35.278928] ata1.00: configured for UDMA/133 >> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down >> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) >> >> [ 54.284095] ================================ >> [ 54.284097] WARNING: inconsistent lock state >> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >> [ 54.284104] -------------------------------- >> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 > > The lockdep warning just means that console_owner_lock is acquired both > in softirq context and in task context with interrupt enabled. That can > leads to deadlock. So the remedy is to always take console_owner_lock > with interrupt disabled, i.e. with > raw_spin_lock_irqsave/raw_spin_lock_irqrestore. There are only 3 functions using this lock: 1. console_lock_spinning_enable() 2. console_lock_spinning_disable_and_check() 3. console_trylock_spinning() For all call sites of 1 and 2, it is surrounded by printk_safe_enter_irqsave()/_exit_irqrestore() calls. These explicitly call local_irq_save()/_restore(). For 3, it explicitly calls printk_safe_enter_irqsave()/_exit_irqrestore(). I wonder if this is a false positive because console_trylock_spinning() annotates the acquire after re-enabling interrupts: static int console_trylock_spinning(void) { ... printk_safe_exit_irqrestore(flags); ... mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); ... } As to the crashing problem at hand, it might be faster to bisect the wireless-next-2025-05-22 merge. John Ogness ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-24 19:45 ` John Ogness @ 2025-07-25 0:16 ` Waiman Long 2025-07-25 13:44 ` Thomas Gleixner 0 siblings, 1 reply; 27+ messages in thread From: Waiman Long @ 2025-07-25 0:16 UTC (permalink / raw) To: John Ogness, Waiman Long, Pavel Machek, Thomas Gleixner Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek On 7/24/25 3:45 PM, John Ogness wrote: > On 2025-07-24, Waiman Long <llong@redhat.com> wrote: >> On 7/24/25 1:15 PM, Pavel Machek wrote: >>> Hi! >>> >>> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote: >>>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote: >>>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote: >>>>>> Did kernel boot on console (w/o X), and got this: not sure if it is >>>>>> related. >>>>>> [ 402.125635] ------------[ cut here ]------------ >>>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled >>>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30 >>>>>> [ 402.125654] Modules linked in: >>>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary) >>>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC >>>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016 >>>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work >>>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30 >>>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 >>>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282 >>>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000 >>>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff >>>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001 >>>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001 >>>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000 >>>>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000 >>>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0 >>>>>> [ 402.125713] Call Trace: >>>>>> [ 402.125716] <TASK> >>>>>> [ 402.125719] console_flush_all+0x41e/0x460 >>>>> Can you please decode this, so we can see which part of that code it is? >>>> And enable lockdep so that we can see where the interrupts were enabled? >>> Enabled lockdep and got this one. It seems resume with bad wifi signal >>> does it on 6.16...? >>> >>> Is it any good? Any decoding needed? >>> >>> ... >>> [ 32.361445] CPU2 is up >>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3 >>> [ 32.361982] Disabled fast string operations >>> [ 32.366800] CPU3 is up >>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3 >>> [ 32.393904] ACPI: EC: interrupt unblocked >>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz. >>> [ 32.409738] ACPI: EC: event unblocked >>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci >>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300) >>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) >>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300) >>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk >>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 32.782731] ata3.00: configured for UDMA/133 >>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 >>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci >>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci >>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448] >>> [ 33.810033] PM: resume devices took 1.404 seconds >>> [ 33.841597] OOM killer enabled. >>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking >>> [ 33.843280] Restarting tasks: Starting >>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock! >>> [ 33.852744] Restarting tasks: Done >>> [ 33.888306] PM: suspend exit >>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63 >>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07 >>> [ 33.961985] Bluetooth: hci0: BCM20702A >>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000 >>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried: >>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd' >>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd' >>> [ 34.032198] Bluetooth: MGMT ver 1.23 >>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk >>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out >>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out >>> [ 35.278928] ata1.00: configured for UDMA/133 >>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down >>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30) >>> >>> [ 54.284095] ================================ >>> [ 54.284097] WARNING: inconsistent lock state >>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S >>> [ 54.284104] -------------------------------- >>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. >>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes: >>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60 >> The lockdep warning just means that console_owner_lock is acquired both >> in softirq context and in task context with interrupt enabled. That can >> leads to deadlock. So the remedy is to always take console_owner_lock >> with interrupt disabled, i.e. with >> raw_spin_lock_irqsave/raw_spin_lock_irqrestore. > There are only 3 functions using this lock: > > 1. console_lock_spinning_enable() > 2. console_lock_spinning_disable_and_check() > 3. console_trylock_spinning() > > For all call sites of 1 and 2, it is surrounded by > printk_safe_enter_irqsave()/_exit_irqrestore() calls. These explicitly > call local_irq_save()/_restore(). > > For 3, it explicitly calls > printk_safe_enter_irqsave()/_exit_irqrestore(). > > I wonder if this is a false positive because console_trylock_spinning() > annotates the acquire after re-enabling interrupts: > > static int console_trylock_spinning(void) > { > ... > printk_safe_exit_irqrestore(flags); > ... > mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); > ... > } > > As to the crashing problem at hand, it might be faster to bisect the > wireless-next-2025-05-22 merge. I am sorry for confusing conole_owner_lock with the console_owner lockdep map. Yes, this lockdep splat should be a false positive. You do need to put the mutex_acquire before printk_safe_exit_irqrestore() to avoid this splat. Cheers, Longman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-25 0:16 ` Waiman Long @ 2025-07-25 13:44 ` Thomas Gleixner 2025-07-25 19:38 ` Waiman Long 0 siblings, 1 reply; 27+ messages in thread From: Thomas Gleixner @ 2025-07-25 13:44 UTC (permalink / raw) To: Waiman Long, John Ogness, Waiman Long, Pavel Machek Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek On Thu, Jul 24 2025 at 20:16, Waiman Long wrote: > On 7/24/25 3:45 PM, John Ogness wrote: > I am sorry for confusing conole_owner_lock with the console_owner > lockdep map. Yes, this lockdep splat should be a false positive. You do > need to put the mutex_acquire before printk_safe_exit_irqrestore() to > avoid this splat. No. The printk code is absolutely correct as it is. Did you actually read my analysis? The lockdep splat is triggered by the raw_spin_unlock_irq() in __flush_work() as it makes console_owner unsafe. Again: printk() local_irq_save() acquire(console_owner); con->write() vt_console_print() hide_cursor() vc->vc_sw->con_cursor() fbcon_cursor() fbcon_del_cursor_work() cancel_delayed_work_sync() __flush_work() raw_spin_unlock_irq() <- FAIL The initial report from cancel_delayed_work_sync() _CANNOT_ be invoked from atomic contexts, unless the work is marked as BH work. And for BH work it cannot be called from hard interrupt context for obvious reasons. Marking that cursor work BH is not helping either, as printk() can be invoked from everywhere. There is no way to legitimately invoke cancel_delayed_work_sync() from the legacy console write() callback ever. No? Thanks, tglx ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220 2025-07-25 13:44 ` Thomas Gleixner @ 2025-07-25 19:38 ` Waiman Long 0 siblings, 0 replies; 27+ messages in thread From: Waiman Long @ 2025-07-25 19:38 UTC (permalink / raw) To: Thomas Gleixner, Waiman Long, John Ogness, Pavel Machek Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will, miriam.rachel.korenblit, linux-wireless, Petr Mladek On 7/25/25 9:44 AM, Thomas Gleixner wrote: > On Thu, Jul 24 2025 at 20:16, Waiman Long wrote: >> On 7/24/25 3:45 PM, John Ogness wrote: >> I am sorry for confusing conole_owner_lock with the console_owner >> lockdep map. Yes, this lockdep splat should be a false positive. You do >> need to put the mutex_acquire before printk_safe_exit_irqrestore() to >> avoid this splat. > No. The printk code is absolutely correct as it is. > > Did you actually read my analysis? I have read your analysis and I believe in your conclusion. > > The lockdep splat is triggered by the raw_spin_unlock_irq() in > __flush_work() as it makes console_owner unsafe. > > Again: > > printk() > local_irq_save() > acquire(console_owner); > con->write() > vt_console_print() > hide_cursor() > vc->vc_sw->con_cursor() > fbcon_cursor() > fbcon_del_cursor_work() > cancel_delayed_work_sync() > __flush_work() > raw_spin_unlock_irq() <- FAIL > > The initial report from > > cancel_delayed_work_sync() _CANNOT_ be invoked from atomic contexts, > unless the work is marked as BH work. And for BH work it cannot be > called from hard interrupt context for obvious reasons. > > Marking that cursor work BH is not helping either, as printk() can be > invoked from everywhere. > > There is no way to legitimately invoke cancel_delayed_work_sync() from > the legacy console write() callback ever. > > No? I am not disputing this. Thanks, Longman ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-09-15 9:40 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-07-22 17:35 6.16-rcX: crashing way too often on thinkpad X220 Pavel Machek 2025-07-23 15:06 ` Pavel Machek 2025-07-23 15:11 ` Dave Hansen 2025-07-23 20:35 ` Pavel Machek 2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek 2025-07-23 17:13 ` Johannes Berg 2025-07-23 20:37 ` Pavel Machek 2025-07-24 8:50 ` John Ogness 2025-07-23 21:42 ` Pavel Machek 2025-09-15 9:32 ` Pavel Machek 2025-07-23 17:31 ` Thomas Gleixner 2025-07-23 17:32 ` Thomas Gleixner 2025-07-24 17:15 ` Pavel Machek 2025-07-24 17:51 ` Waiman Long 2025-07-24 17:55 ` Waiman Long 2025-07-24 21:24 ` Thomas Gleixner 2025-07-25 0:33 ` Hillf Danton 2025-07-25 15:23 ` Thomas Gleixner 2025-07-25 19:05 ` Pavel Machek 2025-07-24 19:07 ` Pavel Machek 2025-07-24 19:22 ` Waiman Long 2025-07-24 22:21 ` Thomas Gleixner 2025-07-25 1:56 ` Waiman Long 2025-07-24 19:45 ` John Ogness 2025-07-25 0:16 ` Waiman Long 2025-07-25 13:44 ` Thomas Gleixner 2025-07-25 19:38 ` Waiman Long
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).