* 6.16-rcX: crashing way too often on thinkpad X220
@ 2025-07-22 17:35 Pavel Machek
2025-07-23 15:06 ` Pavel Machek
0 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2025-07-22 17:35 UTC (permalink / raw)
To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa
[-- Attachment #1: Type: text/plain, Size: 4184 bytes --]
Hi!
I have had 4th or so crash with 6.16. That's not usual.
This time capslock was blinking, indicating panic. Two crashes were
like that, two were just lockups.
This time I checked logs, and nothing unusual (besides this, but..)
Jul 22 09:05:20 duo pipewire[3473]: Failed to receive portal pid: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get PID of nam>
Jul 22 19:08:57 duo pulseaudio[3474]: ALSA woke us up to write new data to the device, but there was actually nothing to write.
Jul 22 19:08:57 duo pulseaudio[3474]: Most likely this is a bug in the ALSA driver 'snd_hda_intel'. Please report this issue to the A>
Jul 22 19:08:57 duo pulseaudio[3474]: We were woken up with POLLOUT
set -- however a subsequent snd_pcm_avail() returned 0 or another>
Digging through older logs, I get:
Jul 06 15:22:20 duo kernel: iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
Jul 06 15:22:20 duo kernel: iwlwifi 0000:03:00.0: pci_enable_msi failed - -38
Jul 06 15:22:20 duo kernel: ACPI: \: failed to evaluate _DSM 2c176672-0b22-294b-814f-75e4dd26b5fd rev:0 func:0 (0x1001)
Jul 06 15:22:20 duo kernel: ACPI: \: failed to evaluate _DSM 2c176672-0b22-294b-814f-75e4dd26b5fd rev:0 func:0 (0x1001)
Jul 06 15:22:20 duo kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Jul 06 15:22:20 duo kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: firmware Patch file not found, tried:
Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
Jul 06 15:22:20 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
...
Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: firmware Patch file not found, tried:
Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
Jul 08 08:21:32 duo kernel: Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x003B00F7 | data1
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0xED408AD8 | beacon time
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x40AC6528 | tsf low
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x0000000E | tsf hi
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000578 | time gp1
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x190D1D6C | time gp2
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version
Jul 08 09:40:07 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version
...
Jul 08 12:23:29 duo kernel: usb 1-1-port2: unable to enumerate USB device
Jul 08 12:50:42 duo kernel: mmc0: tried to HW reset card, got error -123
Jul 08 13:44:20 duo kernel: usb 1-1.2: device descriptor read/64, error -71
Jul 08 13:44:20 duo kernel: usb 1-1.2: device descriptor read/64, error -71
Jul 08 13:44:57 duo kernel: usb 1-1.2: device descriptor read/64, error -71
But I doubt any of that is related.
Any ideas? Does anyone see similar crashes?
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-22 17:35 6.16-rcX: crashing way too often on thinkpad X220 Pavel Machek
@ 2025-07-23 15:06 ` Pavel Machek
2025-07-23 15:11 ` Dave Hansen
2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek
0 siblings, 2 replies; 26+ messages in thread
From: Pavel Machek @ 2025-07-23 15:06 UTC (permalink / raw)
To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa
[-- Attachment #1: Type: text/plain, Size: 258 bytes --]
Hi!
> I have had 4th or so crash with 6.16. That's not usual.
Two more crashes today. Last one was with 6.16-rc7+.
Ideas welcome,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 15:06 ` Pavel Machek
@ 2025-07-23 15:11 ` Dave Hansen
2025-07-23 20:35 ` Pavel Machek
2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek
1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2025-07-23 15:11 UTC (permalink / raw)
To: Pavel Machek, kernel list, tglx, mingo, bp, dave.hansen, x86, hpa
On 7/23/25 08:06, Pavel Machek wrote:
>> I have had 4th or so crash with 6.16. That's not usual.
> Two more crashes today. Last one was with 6.16-rc7+.
>
> Ideas welcome,
Still nothing in the logs?
The age old idea of getting console and trying to capture an oops always
applies.
I've got an x230 sitting here that I have an ExpressCard serial port
for. There's also netconsole and xhci debug cables.
^ permalink raw reply [flat|nested] 26+ messages in thread
* locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 15:06 ` Pavel Machek
2025-07-23 15:11 ` Dave Hansen
@ 2025-07-23 15:42 ` Pavel Machek
2025-07-23 17:13 ` Johannes Berg
2025-07-23 17:31 ` Thomas Gleixner
1 sibling, 2 replies; 26+ messages in thread
From: Pavel Machek @ 2025-07-23 15:42 UTC (permalink / raw)
To: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will,
longman, miriam.rachel.korenblit, linux-wireless
[-- Attachment #1.1: Type: text/plain, Size: 4581 bytes --]
Hi!
> > I have had 4th or so crash with 6.16. That's not usual.
>
> Two more crashes today. Last one was with 6.16-rc7+.
Did kernel boot on console (w/o X), and got this: not sure if it is
related.
Best regards,
Pavel
...
[ 18.466106] systemd-journald[1449]: File /var/log/journal/66f054f85acd4f7b9d2f07ff9b88ad15/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[ 21.302012] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 21.598035] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 24.925016] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
[ 24.925050] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3)
[ 24.933544] wlp3s0: authenticated
[ 24.937771] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3)
[ 24.941781] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=8)
[ 24.963083] wlp3s0: associated
[ 25.625057] bcache: register_bdev() registered backing device sda2
[ 26.635587] EXT4-fs (bcache0): recovery complete
[ 26.652144] EXT4-fs (bcache0): mounted filesystem 233f8e6b-8b89-4425-9eb4-8cb18077dd52 r/w with ordered data mode. Quota mode: disabled.
[ 30.062975] Adding 9878100k swap on /dev/sdb1. Priority:-2 extents:1 across:9878100k SS
[ 402.124881] wlp3s0: deauthenticated from c8:3a:35:f0:ad:f1 (Reason: 7=CLASS3_FRAME_FROM_NONASSOC_STA)
[ 402.125635] ------------[ cut here ]------------
[ 402.125638] raw_local_irq_restore() called with IRQs enabled
[ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
[ 402.125654] Modules linked in:
[ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
[ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
[ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
[ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
[ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
[ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
[ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
[ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
[ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
[ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
[ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
[ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
[ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
[ 402.125713] Call Trace:
[ 402.125716] <TASK>
[ 402.125719] console_flush_all+0x41e/0x460
[ 402.125725] ? console_flush_all+0x43/0x460
[ 402.125735] console_unlock+0x55/0x100
[ 402.125741] vprintk_emit+0x157/0x320
[ 402.125748] vprintk_default+0x18/0x20
[ 402.125752] vprintk+0x9/0x10
[ 402.125756] _printk+0x52/0x70
[ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30
[ 402.125775] ? __this_cpu_preempt_check+0x13/0x20
[ 402.125784] ieee80211_iface_work+0x3ad/0x500
[ 402.125792] cfg80211_wiphy_work+0xbf/0xe0
[ 402.125799] process_one_work+0x3fb/0x500
[ 402.125805] ? process_one_work+0x3bf/0x500
[ 402.125809] ? process_one_work+0x3bf/0x500
[ 402.125813] ? process_one_work+0x3bf/0x500
[ 402.125822] worker_thread+0x189/0x340
[ 402.125828] ? __pfx_worker_thread+0x10/0x10
[ 402.125834] kthread+0x104/0x260
[ 402.125840] ? __pfx_kthread+0x10/0x10
[ 402.125846] ret_from_fork+0x7d/0xe0
[ 402.125852] ? __pfx_kthread+0x10/0x10
[ 402.125857] ret_from_fork_asm+0x1a/0x30
[ 402.125870] </TASK>
[ 402.125872] ---[ end trace 0000000000000000 ]---
[ 402.267771] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
[ 402.268566] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3)
[ 402.277687] wlp3s0: authenticated
[ 402.280664] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3)
[ 402.285133] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=7)
[ 402.306254] wlp3s0: associated
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #1.2: dmesg.723.gz --]
[-- Type: application/gzip, Size: 20664 bytes --]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek
@ 2025-07-23 17:13 ` Johannes Berg
2025-07-23 20:37 ` Pavel Machek
2025-07-23 21:42 ` Pavel Machek
2025-07-23 17:31 ` Thomas Gleixner
1 sibling, 2 replies; 26+ messages in thread
From: Johannes Berg @ 2025-07-23 17:13 UTC (permalink / raw)
To: Pavel Machek, kernel list, tglx, mingo, bp, dave.hansen, x86, hpa,
peterz, will, longman, miriam.rachel.korenblit, linux-wireless
Hi,
On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote:
> [ 402.125635] ------------[ cut here ]------------
> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> [ 402.125654] Modules linked in:
> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
> [ 402.125713] Call Trace:
> [ 402.125716] <TASK>
> [ 402.125719] console_flush_all+0x41e/0x460
> [ 402.125725] ? console_flush_all+0x43/0x460
> [ 402.125735] console_unlock+0x55/0x100
> [ 402.125741] vprintk_emit+0x157/0x320
> [ 402.125748] vprintk_default+0x18/0x20
> [ 402.125752] vprintk+0x9/0x10
> [ 402.125756] _printk+0x52/0x70
> [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30
> [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20
> [ 402.125784] ieee80211_iface_work+0x3ad/0x500
That's not great, but I don't see how the driver or wifi subsystem is
involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks
let alone disable IRQs or use raw_ APIs, and it's in the middle of
printk anyway.
No idea what might be going on here, sorry.
johannes
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek
2025-07-23 17:13 ` Johannes Berg
@ 2025-07-23 17:31 ` Thomas Gleixner
2025-07-23 17:32 ` Thomas Gleixner
1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-23 17:31 UTC (permalink / raw)
To: Pavel Machek, kernel list, mingo, bp, dave.hansen, x86, hpa,
peterz, will, longman, miriam.rachel.korenblit, linux-wireless,
Petr Mladek, John Ogness
On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
> Did kernel boot on console (w/o X), and got this: not sure if it is
> related.
> [ 402.125635] ------------[ cut here ]------------
> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> [ 402.125654] Modules linked in:
> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
> [ 402.125713] Call Trace:
> [ 402.125716] <TASK>
> [ 402.125719] console_flush_all+0x41e/0x460
Can you please decode this, so we can see which part of that code it is?
> [ 402.125725] ? console_flush_all+0x43/0x460
> [ 402.125735] console_unlock+0x55/0x100
> [ 402.125741] vprintk_emit+0x157/0x320
> [ 402.125748] vprintk_default+0x18/0x20
> [ 402.125752] vprintk+0x9/0x10
> [ 402.125756] _printk+0x52/0x70
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 17:31 ` Thomas Gleixner
@ 2025-07-23 17:32 ` Thomas Gleixner
2025-07-24 17:15 ` Pavel Machek
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-23 17:32 UTC (permalink / raw)
To: Pavel Machek, kernel list, mingo, bp, dave.hansen, x86, hpa,
peterz, will, longman, miriam.rachel.korenblit, linux-wireless,
Petr Mladek, John Ogness
On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
>> Did kernel boot on console (w/o X), and got this: not sure if it is
>> related.
>> [ 402.125635] ------------[ cut here ]------------
>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>> [ 402.125654] Modules linked in:
>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
>> [ 402.125713] Call Trace:
>> [ 402.125716] <TASK>
>> [ 402.125719] console_flush_all+0x41e/0x460
>
> Can you please decode this, so we can see which part of that code it is?
And enable lockdep so that we can see where the interrupts were enabled?
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 15:11 ` Dave Hansen
@ 2025-07-23 20:35 ` Pavel Machek
0 siblings, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2025-07-23 20:35 UTC (permalink / raw)
To: Dave Hansen; +Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa
[-- Attachment #1: Type: text/plain, Size: 910 bytes --]
Hi!
> >> I have had 4th or so crash with 6.16. That's not usual.
> > Two more crashes today. Last one was with 6.16-rc7+.
> >
> > Ideas welcome,
>
> Still nothing in the logs?
>
> The age old idea of getting console and trying to capture an oops always
> applies.
>
> I've got an x230 sitting here that I have an ExpressCard serial port
> for. There's also netconsole and xhci debug cables.
I was kind of wornering if I'm only one seeing problems. I've now
built 6.15, and I plan to switch to it on next crash to see if it is
maybe hardware being flakey or what.
Oh, and I got _something_, see the thread. But that did not kill the
machine, so...
I'm currently traveling, and wifi reception is bad (and not really
chance to do heavy debugging).
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 17:13 ` Johannes Berg
@ 2025-07-23 20:37 ` Pavel Machek
2025-07-24 8:50 ` John Ogness
2025-07-23 21:42 ` Pavel Machek
1 sibling, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2025-07-23 20:37 UTC (permalink / raw)
To: Johannes Berg
Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will,
longman, miriam.rachel.korenblit, linux-wireless, pmladek,
rostedt, john.ogness, senozhatsky
[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]
Hi!
> On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote:
> > [ 402.125635] ------------[ cut here ]------------
> > [ 402.125638] raw_local_irq_restore() called with IRQs enabled
> > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> > [ 402.125654] Modules linked in:
> > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
> > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
> > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
> > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
> > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
> > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
> > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
> > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
> > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
> > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
> > [ 402.125713] Call Trace:
> > [ 402.125716] <TASK>
> > [ 402.125719] console_flush_all+0x41e/0x460
> > [ 402.125725] ? console_flush_all+0x43/0x460
> > [ 402.125735] console_unlock+0x55/0x100
> > [ 402.125741] vprintk_emit+0x157/0x320
> > [ 402.125748] vprintk_default+0x18/0x20
> > [ 402.125752] vprintk+0x9/0x10
> > [ 402.125756] _printk+0x52/0x70
> > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30
> > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20
> > [ 402.125784] ieee80211_iface_work+0x3ad/0x500
>
> That's not great, but I don't see how the driver or wifi subsystem is
> involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks
> let alone disable IRQs or use raw_ APIs, and it's in the middle of
> printk anyway.
>
> No idea what might be going on here, sorry.
So lets ask printk() people if they have some idea... / seen something
similar / ....?
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 17:13 ` Johannes Berg
2025-07-23 20:37 ` Pavel Machek
@ 2025-07-23 21:42 ` Pavel Machek
1 sibling, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2025-07-23 21:42 UTC (permalink / raw)
To: Johannes Berg
Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will,
longman, miriam.rachel.korenblit, linux-wireless
[-- Attachment #1: Type: text/plain, Size: 30317 bytes --]
Hi!
> On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote:
> > [ 402.125635] ------------[ cut here ]------------
> > [ 402.125638] raw_local_irq_restore() called with IRQs enabled
> > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> > [ 402.125654] Modules linked in:
> > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
> > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
> > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
> > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
> > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
> > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
> > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
> > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
> > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
> > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
> > [ 402.125713] Call Trace:
> > [ 402.125716] <TASK>
> > [ 402.125719] console_flush_all+0x41e/0x460
> > [ 402.125725] ? console_flush_all+0x43/0x460
> > [ 402.125735] console_unlock+0x55/0x100
> > [ 402.125741] vprintk_emit+0x157/0x320
> > [ 402.125748] vprintk_default+0x18/0x20
> > [ 402.125752] vprintk+0x9/0x10
> > [ 402.125756] _printk+0x52/0x70
> > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30
> > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20
> > [ 402.125784] ieee80211_iface_work+0x3ad/0x500
>
> That's not great, but I don't see how the driver or wifi subsystem is
> involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks
> let alone disable IRQs or use raw_ APIs, and it's in the middle of
> printk anyway.
>
> No idea what might be going on here, sorry.
Does this give any more hints? I seem to have... pretty unhappy wifi,
followed by slab corruption.
Is my assumption correct that this looks like wifi problem?
(I can share full logs via personal email).
BR,
Pavel
Jul 23 23:29:19 duo kernel: wlp3s0: authenticated
Jul 23 23:29:19 duo kernel: wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3)
Jul 23 23:29:19 duo kernel: wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=7)
Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: Associated with c8:3a:35:f0:ad:f1
Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Jul 23 23:29:19 duo kernel: wlp3s0: associated
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2521] device (wlp3s0): supplicant interface state: associating -> associated
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2522] device (wlp3s0): DHCPv4 lease renewal requested
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2524] dhcp4 (wlp3s0): canceled DHCP transaction
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2525] dhcp4 (wlp3s0): state changed extended -> terminated
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.2532] dhcp4 (wlp3s0): activation: beginning transaction (timeout in 45 seconds)
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.5357] device (wlp3s0): supplicant interface state: associated -> 4way_handshake
Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: WPA: Key negotiation completed with c8:3a:35:f0:ad:f1 [PTK=CCMP GTK=CCMP]
Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-CONNECTED - Connection to c8:3a:35:f0:ad:f1 completed [id=0 id_str=]
Jul 23 23:29:19 duo NetworkManager[2853]: <info> [1753306159.5766] device (wlp3s0): supplicant interface state: 4way_handshake -> completed
Jul 23 23:29:19 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-71 noise=9999 txrate=0
Jul 23 23:29:23 duo sudo[78117]: pavel : TTY=pts/1 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda
Jul 23 23:29:23 duo sudo[78117]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jul 23 23:29:23 duo sudo[78117]: pam_unix(sudo:session): session closed for user root
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x003900EE | data1
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x14413F07 | beacon time
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0xF6E050F9 | tsf low
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000005CB | time gp1
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x533515F9 | time gp2
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x003A001C | hcmd
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0xA786300A | isr0
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x1141E000 | isr1
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000F1A | isr2
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x014730C0 | isr3
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00010110 | isr_pref
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00004288 | l2p_control
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x0000003C | l2p_duration
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | l2p_mhvalid
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x001054F7 | l2p_addr_match
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: 0x000098A8 | flow_handler
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Device error - SW reset
Jul 23 23:29:27 duo kernel: ieee80211 phy0: Hardware restart was requested
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Jul 23 23:29:27 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Jul 23 23:29:29 duo sudo[78137]: pavel : TTY=pts/0 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda
Jul 23 23:29:29 duo sudo[78137]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jul 23 23:29:29 duo sudo[78137]: pam_unix(sudo:session): session closed for user root
Jul 23 23:29:47 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-58 noise=9999 txrate=54000
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0045011D | data1
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x30406BE5 | beacon time
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0xF816841B | tsf low
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000056B | time gp1
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x01312AC3 | time gp2
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000C001C | hcmd
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0xA7F6300A | isr0
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0101E000 | isr1
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000E1A | isr2
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x014768C0 | isr3
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x10014112 | isr_pref
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x000000C4 | l2p_control
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000586 | l2p_duration
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000007 | l2p_mhvalid
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00101042 | l2p_addr_match
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: 0x0000A8B8 | flow_handler
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Device error - SW reset
Jul 23 23:29:47 duo kernel: ieee80211 phy0: Hardware restart was requested
Jul 23 23:29:47 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Jul 23 23:29:48 duo kernel: iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
Jul 23 23:29:53 duo sudo[78189]: pavel : TTY=pts/1 ; PWD=/fast/pavel ; USER=root ; COMMAND=/usr/sbin/hdparm -B 255 /dev/sda
Jul 23 23:29:53 duo sudo[78189]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Jul 23 23:29:53 duo sudo[78189]: pam_unix(sudo:session): session closed for user root
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1 6000g2a-6.ucode
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Status: 0x0000044C, count: 6
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00001030 | ADVANCED_SYSASSERT
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C064 | uPc
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000C05C | branchlink2
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x003C00FA | data1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000800 | line
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0D005446 | beacon time
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0xF86E1BBA | tsf low
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000577 | time gp1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00528405 | time gp2
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp3
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x754312A8 | uCode version
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000000B0 | hw version
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00488700 | board version
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000A001C | hcmd
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0xA7F63008 | isr0
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x0101E000 | isr1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000E1A | isr2
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x014730C0 | isr3
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x10010112 | isr_pref
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00023098 | wait_event
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x000000C4 | l2p_control
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000592 | l2p_duration
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000007 | l2p_mhvalid
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00101042 | l2p_addr_match
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000015 | lmpm_pmg_sel
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x06061222 | timestamp
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: 0x00000818 | flow_handler
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Device error - reprobe!
Jul 23 23:29:53 duo wpa_supplicant[2868]: wlp3s0: CTRL-EVENT-DISCONNECTED bssid=c8:3a:35:f0:ad:f1 reason=3 locally_generated=1
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_LEDS_CMD failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_LEDS_CMD failed: FW Error
Jul 23 23:29:53 duo kernel: wlp3s0: deauthenticating from c8:3a:35:f0:ad:f1 by local choice (Reason: 3=DEAUTH_LEAVING)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_TXFIFO_FLUSH failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: flush request fail
Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------
Jul 23 23:29:53 duo kernel: bad state = 0
Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:755 iwl_trans_wait_tx_queues_empty+0x44/0x50
Jul 23 23:29:53 duo kernel: Modules linked in:
Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary)
Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk
Jul 23 23:29:53 duo kernel: RIP: 0010:iwl_trans_wait_tx_queues_empty+0x44/0x50
Jul 23 23:29:53 duo kernel: Code: cc cc cc 80 3d d8 df 47 01 00 b8 fb ff ff ff 75 ec 44 89 c6 48 c7 c7 27 a0 d8 85 89 45 fc c6 05 bd df 47 01 01 e8 8c 79 52 ff <0f> 0b 8b 45 fc c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bf7f8 EFLAGS: 00010282
Jul 23 23:29:53 duo kernel: RAX: 0000000000000000 RBX: ffff888101f63060 RCX: 0000000000000000
Jul 23 23:29:53 duo kernel: RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bf800 R08: ffffffff8613bc08 R09: 00000000ffffdfff
Jul 23 23:29:53 duo kernel: R10: 0000000000000aa3 R11: ffffffff8610bc20 R12: 00000000000ffdef
Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: ffff888101f60e00 R15: ffff888101f630d8
Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000
Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 23:29:53 duo kernel: CR2: 000055f98d7e8cb0 CR3: 000000010911e004 CR4: 00000000000606b0
Jul 23 23:29:53 duo kernel: Call Trace:
Jul 23 23:29:53 duo kernel: <TASK>
Jul 23 23:29:53 duo kernel: iwlagn_mac_flush+0x90/0xb0
Jul 23 23:29:53 duo kernel: __ieee80211_flush_queues+0x1aa/0x200
Jul 23 23:29:53 duo kernel: ieee80211_flush_queues+0xe/0x20
Jul 23 23:29:53 duo kernel: ieee80211_set_disassoc+0x6c4/0x940
Jul 23 23:29:53 duo kernel: ieee80211_mgd_deauth+0x120/0x5b0
Jul 23 23:29:53 duo kernel: ieee80211_deauth+0x13/0x20
Jul 23 23:29:53 duo kernel: cfg80211_mlme_deauth+0xac/0x140
Jul 23 23:29:53 duo kernel: cfg80211_mlme_down+0x96/0xc0
Jul 23 23:29:53 duo kernel: cfg80211_disconnect+0x152/0x170
Jul 23 23:29:53 duo kernel: cfg80211_leave+0x180/0x1b0
Jul 23 23:29:53 duo kernel: cfg80211_netdev_notifier_call+0xed/0x3e0
Jul 23 23:29:53 duo kernel: raw_notifier_call_chain+0x44/0x60
Jul 23 23:29:53 duo kernel: call_netdevice_notifiers_info+0x40/0x80
Jul 23 23:29:53 duo kernel: __dev_close_many+0x5a/0x200
Jul 23 23:29:53 duo kernel: dev_close_many+0xa4/0x180
Jul 23 23:29:53 duo kernel: netif_close+0x8c/0xc0
Jul 23 23:29:53 duo kernel: dev_close+0x27/0x70
Jul 23 23:29:53 duo kernel: cfg80211_shutdown_all_interfaces+0x4b/0xf0
Jul 23 23:29:53 duo kernel: ieee80211_remove_interfaces+0x44/0x220
Jul 23 23:29:53 duo kernel: ieee80211_unregister_hw+0x39/0x140
Jul 23 23:29:53 duo kernel: iwlagn_mac_unregister+0x27/0x40
Jul 23 23:29:53 duo kernel: iwl_op_mode_dvm_stop+0x16/0xb0
Jul 23 23:29:53 duo kernel: _iwl_op_mode_stop+0x35/0x70
Jul 23 23:29:53 duo kernel: iwl_drv_stop+0x2b/0xa0
Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x29/0x40
Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0
Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70
Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210
Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20
Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90
Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50
Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340
Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10
Jul 23 23:29:53 duo kernel: kthread+0x104/0x260
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30
Jul 23 23:29:53 duo kernel: </TASK>
Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]---
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error
Jul 23 23:29:53 duo kernel: wlp3s0: HW problem - can not stop rx aggregation for c8:3a:35:f0:ad:f1 tid 0
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error
Jul 23 23:29:53 duo kernel: wlp3s0: failed to remove key (0, c8:3a:35:f0:ad:f1) from hardware (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_QOS_PARAM failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Failed to update QoS
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error
Jul 23 23:29:53 duo kernel: wlp3s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-5)
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Command REPLY_RXON failed: FW Error
Jul 23 23:29:53 duo kernel: iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-5)
Jul 23 23:29:53 duo avahi-daemon[2850]: Interface wlp3s0.IPv4 no longer relevant for mDNS.
Jul 23 23:29:53 duo avahi-daemon[2850]: Leaving mDNS multicast group on interface wlp3s0.IPv4 with address 192.168.1.110.
Jul 23 23:29:53 duo avahi-daemon[2850]: Withdrawing address record for 192.168.1.110 on wlp3s0.
Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6437] device (wlp3s0): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6710] dhcp4 (wlp3s0): canceled DHCP transaction
Jul 23 23:29:53 duo NetworkManager[2853]: <info> [1753306193.6711] dhcp4 (wlp3s0): state changed unknown -> terminated
Jul 23 23:29:53 duo systemd[1]: Starting Load/Save RF Kill Switch Status...
Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------
Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at mm/slab_common.c:1002 __ksize+0xe9/0x110
Jul 23 23:29:53 duo kernel: Modules linked in:
Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary)
Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk
Jul 23 23:29:53 duo kernel: RIP: 0010:__ksize+0xe9/0x110
Jul 23 23:29:53 duo kernel: Code: 29 48 89 d9 48 29 d1 48 c1 f9 06 48 89 ca 48 c1 e2 0c 48 03 15 a8 5c a6 01 48 39 d6 75 12 48 8b 13 83 e2 40 48 0f 44 c7 eb 8e <0f> 0b 31 c0 eb 88 0f 0b 31 c0 eb 82 48 8b 05 34 5a c9 01 e9 2d ff
Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bfc88 EFLAGS: 00010246
Jul 23 23:29:53 duo kernel: RAX: 8000000000000000 RBX: ffffea000777f640 RCX: ffffffffffffffff
Jul 23 23:29:53 duo kernel: RDX: ffffea0000000000 RSI: ffff8881ddfd9c80 RDI: ffff8881ddfd9c80
Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bfc90 R08: 0000000000000000 R09: 000000000020000a
Jul 23 23:29:53 duo kernel: R10: 00000000000000db R11: ffff88813dd989b8 R12: ffff8881ddfd9c80
Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: 0000000000000d70 R15: 0000000000000057
Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000
Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 23:29:53 duo kernel: CR2: 00000ac403a953c8 CR3: 0000000129dce002 CR4: 00000000000606b0
Jul 23 23:29:53 duo kernel: Call Trace:
Jul 23 23:29:53 duo kernel: <TASK>
Jul 23 23:29:53 duo kernel: kfree_sensitive+0x14/0x50
Jul 23 23:29:53 duo kernel: iwl_pcie_tx_free+0x16b/0x260
Jul 23 23:29:53 duo kernel: iwl_trans_pcie_free+0x201/0x230
Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x31/0x40
Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0
Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70
Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210
Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20
Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90
Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50
Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340
Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10
Jul 23 23:29:53 duo kernel: kthread+0x104/0x260
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30
Jul 23 23:29:53 duo kernel: </TASK>
Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]---
Jul 23 23:29:53 duo kernel: ------------[ cut here ]------------
Jul 23 23:29:53 duo kernel: WARNING: CPU: 2 PID: 40123 at mm/slub.c:4753 free_large_kmalloc+0xb5/0xd0
Jul 23 23:29:53 duo kernel: Modules linked in:
Jul 23 23:29:53 duo kernel: CPU: 2 UID: 0 PID: 40123 Comm: kworker/2:0 Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary)
Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
Jul 23 23:29:53 duo kernel: Workqueue: events iwl_trans_reprobe_wk
Jul 23 23:29:53 duo kernel: RIP: 0010:free_large_kmalloc+0xb5/0xd0
Jul 23 23:29:53 duo kernel: Code: ca 85 48 89 55 f0 c6 05 51 30 dd 01 01 e8 83 51 f0 ff 48 8b 55 f0 eb 96 4c 89 e7 e8 95 3a fa ff 4c 8b 65 f8 c9 c3 cc cc cc cc <0f> 0b 48 c7 c6 08 7f ca 85 4c 89 e7 e8 6a 6f fc ff 4c 8b 65 f8 c9
Jul 23 23:29:53 duo kernel: RSP: 0018:ffffc9000b2bfc30 EFLAGS: 00010297
Jul 23 23:29:53 duo kernel: RAX: 00000000000000f2 RBX: ffff8881ddfd9c80 RCX: ffffffffffffffff
Jul 23 23:29:53 duo kernel: RDX: 0000000000000000 RSI: ffff8881ddfd9c80 RDI: ffffea000777f640
Jul 23 23:29:53 duo kernel: RBP: ffffc9000b2bfc40 R08: 0000000000000000 R09: 000000000020000a
Jul 23 23:29:53 duo kernel: R10: 00000000000000db R11: ffff88813dd989b8 R12: ffffea000777f640
Jul 23 23:29:53 duo kernel: R13: 0000000000000000 R14: 0000000000000d70 R15: 0000000000000057
Jul 23 23:29:53 duo kernel: FS: 0000000000000000(0000) GS:ffff88828ffff000(0000) knlGS:0000000000000000
Jul 23 23:29:53 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 23 23:29:53 duo kernel: CR2: 00000ac403a953c8 CR3: 0000000129dce002 CR4: 00000000000606b0
Jul 23 23:29:53 duo kernel: Call Trace:
Jul 23 23:29:53 duo kernel: <TASK>
Jul 23 23:29:53 duo kernel: kfree+0xef/0x210
Jul 23 23:29:53 duo kernel: ? __ksize+0xeb/0x110
Jul 23 23:29:53 duo kernel: kfree_sensitive+0x21/0x50
Jul 23 23:29:53 duo kernel: iwl_pcie_tx_free+0x16b/0x260
Jul 23 23:29:53 duo kernel: iwl_trans_pcie_free+0x201/0x230
Jul 23 23:29:53 duo kernel: iwl_pci_remove+0x31/0x40
Jul 23 23:29:53 duo kernel: pci_device_remove+0x3c/0xa0
Jul 23 23:29:53 duo kernel: device_remove+0x41/0x70
Jul 23 23:29:53 duo kernel: device_release_driver_internal+0x1a6/0x210
Jul 23 23:29:53 duo kernel: device_driver_detach+0xf/0x20
Jul 23 23:29:53 duo kernel: device_reprobe+0x18/0x90
Jul 23 23:29:53 duo kernel: iwl_trans_reprobe_wk+0x17/0x50
Jul 23 23:29:53 duo kernel: process_one_work+0x3fb/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: ? process_one_work+0x3bf/0x500
Jul 23 23:29:53 duo kernel: worker_thread+0x189/0x340
Jul 23 23:29:53 duo kernel: ? __pfx_worker_thread+0x10/0x10
Jul 23 23:29:53 duo kernel: kthread+0x104/0x260
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork+0x7d/0xe0
Jul 23 23:29:53 duo kernel: ? __pfx_kthread+0x10/0x10
Jul 23 23:29:53 duo kernel: ret_from_fork_asm+0x1a/0x30
Jul 23 23:29:53 duo kernel: </TASK>
Jul 23 23:29:53 duo kernel: ---[ end trace 0000000000000000 ]---
Jul 23 23:29:53 duo kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881ddfd9d00 pfn:0x1ddfd9
Jul 23 23:29:53 duo kernel: flags: 0x8000000000000000(zone=2)
Jul 23 23:29:53 duo kernel: page_type: f2(table)
Jul 23 23:29:53 duo kernel: raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
Jul 23 23:29:53 duo kernel: raw: ffff8881ddfd9d00 ffff88801f09d680 00000001f2000000 0000000000000000
Jul 23 23:29:53 duo kernel: page dumped because: Not a kmalloc allocation
Jul 23 23:29:53 duo kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jul 23 23:29:53 duo kernel: #PF: supervisor instruction fetch in kernel mode
Jul 23 23:29:53 duo kernel: #PF: error_code(0x0010) - not-present page
Jul 23 23:29:53 duo kernel: PGD 0 P4D 0
Jul 23 23:29:53 duo kernel: Oops: Oops: 0010 [#1] SMP PTI
Jul 23 23:29:53 duo kernel: CPU: 3 UID: 0 PID: 3093 Comm: Xorg Tainted: G S W 6.16.0-rc7+ #303 PREEMPT(voluntary)
Jul 23 23:29:53 duo kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Jul 23 23:29:53 duo kernel: Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
Jul 23 23:29:53 duo dbus-daemon[3483]: [session uid=1000 pid=3483] Activating service name='org.freedesktop.Notifications' requested by ':1.45' (uid=1000 pid=3746 comm="nm-applet ")
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 20:37 ` Pavel Machek
@ 2025-07-24 8:50 ` John Ogness
0 siblings, 0 replies; 26+ messages in thread
From: John Ogness @ 2025-07-24 8:50 UTC (permalink / raw)
To: Pavel Machek, Johannes Berg
Cc: kernel list, tglx, mingo, bp, dave.hansen, x86, hpa, peterz, will,
longman, miriam.rachel.korenblit, linux-wireless, pmladek,
rostedt, senozhatsky
On 2025-07-23, Pavel Machek <pavel@ucw.cz> wrote:
>> On Wed, 2025-07-23 at 17:42 +0200, Pavel Machek wrote:
>> > [ 402.125635] ------------[ cut here ]------------
>> > [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>> > [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>> > [ 402.125654] Modules linked in:
>> > [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>> > [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>> > [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
>> > [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>> > [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>> > [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
>> > [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>> > [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
>> > [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
>> > [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
>> > [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
>> > [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
>> > [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
>> > [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
>> > [ 402.125713] Call Trace:
>> > [ 402.125716] <TASK>
>> > [ 402.125719] console_flush_all+0x41e/0x460
>> > [ 402.125725] ? console_flush_all+0x43/0x460
>> > [ 402.125735] console_unlock+0x55/0x100
>> > [ 402.125741] vprintk_emit+0x157/0x320
>> > [ 402.125748] vprintk_default+0x18/0x20
>> > [ 402.125752] vprintk+0x9/0x10
>> > [ 402.125756] _printk+0x52/0x70
>> > [ 402.125766] ieee80211_sta_rx_queued_mgmt+0x4c8/0xd30
>> > [ 402.125775] ? __this_cpu_preempt_check+0x13/0x20
>> > [ 402.125784] ieee80211_iface_work+0x3ad/0x500
>>
>> That's not great, but I don't see how the driver or wifi subsystem is
>> involved ... ieee80211_sta_rx_queued_mgmt() doesn't even use spinlocks
>> let alone disable IRQs or use raw_ APIs, and it's in the middle of
>> printk anyway.
printk() does perform local_irq_save()/_restore(). I rechecked all the
sites and they are properly balanced in all paths.
It looks like you may have found other issues related to corruption, so
I will assume printk is not the culprit. But please keep us CC or ping
us again if printk comes under suspicion again.
John Ogness
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-23 17:32 ` Thomas Gleixner
@ 2025-07-24 17:15 ` Pavel Machek
2025-07-24 17:51 ` Waiman Long
0 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2025-07-24 17:15 UTC (permalink / raw)
To: Thomas Gleixner
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
longman, miriam.rachel.korenblit, linux-wireless, Petr Mladek,
John Ogness
[-- Attachment #1: Type: text/plain, Size: 17298 bytes --]
Hi!
On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote:
> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
> > On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
> >> Did kernel boot on console (w/o X), and got this: not sure if it is
> >> related.
> >> [ 402.125635] ------------[ cut here ]------------
> >> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
> >> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> >> [ 402.125654] Modules linked in:
> >> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
> >> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
> >> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> >> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
> >> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> >> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> >> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
> >> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
> >> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> >> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
> >> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
> >> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
> >> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
> >> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
> >> [ 402.125713] Call Trace:
> >> [ 402.125716] <TASK>
> >> [ 402.125719] console_flush_all+0x41e/0x460
> >
> > Can you please decode this, so we can see which part of that code it is?
>
> And enable lockdep so that we can see where the interrupts were enabled?
Enabled lockdep and got this one. It seems resume with bad wifi signal
does it on 6.16...?
Is it any good? Any decoding needed?
...
[ 32.361445] CPU2 is up
[ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 32.361982] Disabled fast string operations
[ 32.366800] CPU3 is up
[ 32.370186] ACPI: PM: Waking up from system sleep state S3
[ 32.393904] ACPI: EC: interrupt unblocked
[ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
[ 32.409738] ACPI: EC: event unblocked
[ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
[ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
[ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
[ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
[ 32.771931] sd 2:0:0:0: [sdb] Starting disk
[ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
[ 32.782731] ata3.00: configured for UDMA/133
[ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
[ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
[ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
[ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
[ 33.810033] PM: resume devices took 1.404 seconds
[ 33.841597] OOM killer enabled.
[ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
[ 33.843280] Restarting tasks: Starting
[ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
[ 33.852744] Restarting tasks: Done
[ 33.888306] PM: suspend exit
[ 33.941831] Bluetooth: hci0: BCM: chip id 63
[ 33.944910] Bluetooth: hci0: BCM: features 0x07
[ 33.961985] Bluetooth: hci0: BCM20702A
[ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
[ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
[ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
[ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
[ 34.032198] Bluetooth: MGMT ver 1.23
[ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
[ 35.206578] sd 0:0:0:0: [sda] Starting disk
[ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
[ 35.278928] ata1.00: configured for UDMA/133
[ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
[ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
[ 54.284095] ================================
[ 54.284097] WARNING: inconsistent lock state
[ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
[ 54.284104] --------------------------------
[ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
[ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
[ 54.284133] {IN-SOFTIRQ-W} state was registered at:
[ 54.284135] lock_acquire+0xdc/0x300
[ 54.284144] console_lock_spinning_enable+0x59/0x60
[ 54.284148] console_flush_all+0x2c0/0x4d0
[ 54.284153] console_unlock+0x55/0x100
[ 54.284157] vprintk_emit+0x15b/0x3a0
[ 54.284162] vprintk_default+0x18/0x20
[ 54.284166] vprintk+0x9/0x10
[ 54.284171] _printk+0x52/0x70
[ 54.284177] _credit_init_bits.part.0+0xec/0x160
[ 54.284187] entropy_timer+0xaa/0xc0
[ 54.284194] call_timer_fn+0xa7/0x260
[ 54.284200] expire_timers+0xef/0x1b0
[ 54.284205] run_timer_base+0xb0/0x140
[ 54.284210] run_timer_softirq+0xb/0x40
[ 54.284215] handle_softirqs+0xcd/0x490
[ 54.284222] irq_exit_rcu+0xa2/0x160
[ 54.284227] sysvec_apic_timer_interrupt+0x9b/0xc0
[ 54.284236] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 54.284242] cpuidle_enter_state+0x124/0x550
[ 54.284247] cpuidle_enter+0x29/0x40
[ 54.284253] do_idle+0x1d9/0x260
[ 54.284262] cpu_startup_entry+0x27/0x30
[ 54.284270] start_secondary+0x11e/0x140
[ 54.284277] common_startup_64+0x129/0x138
[ 54.284285] irq event stamp: 35912
[ 54.284287] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0
[ 54.284296] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0
[ 54.284302] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820
[ 54.284312] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820
[ 54.284319]
other info that might help us debug this:
[ 54.284321] Possible unsafe locking scenario:
[ 54.284322] CPU0
[ 54.284323] ----
[ 54.284324] lock(console_owner);
[ 54.284328] <Interrupt>
[ 54.284329] lock(console_owner);
[ 54.284332]
*** DEADLOCK ***
[ 54.284333] 8 locks held by wpa_supplicant/2940:
[ 54.284336] #0: ffffffff8646f9b0 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
[ 54.284351] #1: ffff888103068768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: nl80211_pre_doit+0xc2/0x250
[ 54.284367] #2: ffffffff86344200 (console_lock){+.+.}-{0:0}, at: vprintk_default+0x18/0x20
[ 54.284376] #3: ffffffff86344250 (console_srcu){....}-{0:0}, at: console_flush_all+0x43/0x4d0
[ 54.284386] #4: ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
[ 54.284395] #5: ffffffff86263f60 (printk_legacy_map-wait-type-override){+...}-{4:4}, at: console_flush_all+0x2d2/0x4d0
[ 54.284404] #6: ffffffff863ed2f8 (printing_lock){+...}-{3:3}, at: vt_console_print+0x55/0x470
[ 54.284415] #7: ffffffff86346ac0 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x5e/0x550
[ 54.284428]
stack backtrace:
[ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
[ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC
[ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
[ 54.284446] Call Trace:
[ 54.284448] <TASK>
[ 54.284450] dump_stack_lvl+0x88/0xd0
[ 54.284458] dump_stack+0x10/0x20
[ 54.284463] print_usage_bug.part.0+0x237/0x2d0
[ 54.284470] mark_lock.part.0+0xa9c/0xfb0
[ 54.284479] mark_held_locks+0x4d/0x80
[ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0
[ 54.284493] trace_hardirqs_on+0x5a/0xe0
[ 54.284500] _raw_spin_unlock_irq+0x23/0x60
[ 54.284505] __flush_work+0x3b4/0x550
[ 54.284513] ? __timer_delete+0x2f/0xd0
[ 54.284519] ? timer_delete+0xb/0x20
[ 54.284524] ? try_to_grab_pending+0x12a/0x320
[ 54.284534] cancel_delayed_work_sync+0x65/0x70
[ 54.284539] fbcon_cursor+0xbe/0x160
[ 54.284548] hide_cursor+0x2c/0xc0
[ 54.284553] vt_console_print+0x45e/0x470
[ 54.284560] console_flush_all+0x301/0x4d0
[ 54.284565] ? console_flush_all+0x2d2/0x4d0
[ 54.284569] ? console_flush_all+0x43/0x4d0
[ 54.284572] ? console_flush_all+0x2d2/0x4d0
[ 54.284579] console_unlock+0x55/0x100
[ 54.284584] vprintk_emit+0x15b/0x3a0
[ 54.284590] vprintk_default+0x18/0x20
[ 54.284595] vprintk+0x9/0x10
[ 54.284600] _printk+0x52/0x70
[ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0
[ 54.284619] ? __this_cpu_preempt_check+0x13/0x20
[ 54.284625] ieee80211_auth+0x13/0x20
[ 54.284631] cfg80211_mlme_auth+0xeb/0x2a0
[ 54.284640] nl80211_authenticate+0x341/0x3a0
[ 54.284653] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130
[ 54.284663] genl_rcv_msg+0x14c/0x250
[ 54.284669] ? __pfx_nl80211_pre_doit+0x10/0x10
[ 54.284676] ? __pfx_nl80211_authenticate+0x10/0x10
[ 54.284684] ? __pfx_nl80211_post_doit+0x10/0x10
[ 54.284691] ? __pfx_genl_rcv_msg+0x10/0x10
[ 54.284697] netlink_rcv_skb+0x55/0x100
[ 54.284706] genl_rcv+0x24/0x40
[ 54.284712] netlink_unicast+0x1e8/0x2b0
[ 54.284718] netlink_sendmsg+0x1e3/0x400
[ 54.284726] ____sys_sendmsg+0x2a0/0x2f0
[ 54.284736] ? copy_msghdr_from_user+0x71/0xb0
[ 54.284744] ___sys_sendmsg+0x85/0xd0
[ 54.284751] ? find_held_lock+0x31/0x90
[ 54.284756] ? __might_fault+0x2c/0x70
[ 54.284762] ? __this_cpu_preempt_check+0x13/0x20
[ 54.284766] ? __might_fault+0x2c/0x70
[ 54.284771] ? __might_fault+0x2c/0x70
[ 54.284776] ? debug_smp_processor_id+0x17/0x20
[ 54.284781] __sys_sendmsg+0x6e/0xd0
[ 54.284788] ? do_syscall_64+0x39/0x2c0
[ 54.284795] __x64_sys_sendmsg+0x1a/0x20
[ 54.284801] x64_sys_call+0x929/0x2150
[ 54.284808] do_syscall_64+0x71/0x2c0
[ 54.284814] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 54.284819] RIP: 0033:0x7f79e0d28fb3
[ 54.284825] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
[ 54.284831] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 54.284837] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3
[ 54.284840] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006
[ 54.284843] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0
[ 54.284846] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560
[ 54.284849] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20
[ 54.284856] </TASK>
[ 54.285774] ------------[ cut here ]------------
[ 54.285777] raw_local_irq_restore() called with IRQs enabled
[ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
[ 54.285801] Modules linked in:
[ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
[ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC
[ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
[ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
[ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
[ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282
[ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000
[ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000
[ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000
[ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000
[ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0
[ 54.285869] Call Trace:
[ 54.285872] <TASK>
[ 54.285874] console_flush_all+0x47d/0x4d0
[ 54.285880] ? console_flush_all+0x43/0x4d0
[ 54.285885] ? console_flush_all+0x2d2/0x4d0
[ 54.285892] console_unlock+0x55/0x100
[ 54.285898] vprintk_emit+0x15b/0x3a0
[ 54.285904] vprintk_default+0x18/0x20
[ 54.285910] vprintk+0x9/0x10
[ 54.285915] _printk+0x52/0x70
[ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0
[ 54.285934] ? __this_cpu_preempt_check+0x13/0x20
[ 54.285940] ieee80211_auth+0x13/0x20
[ 54.285945] cfg80211_mlme_auth+0xeb/0x2a0
[ 54.285954] nl80211_authenticate+0x341/0x3a0
[ 54.285967] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130
[ 54.285977] genl_rcv_msg+0x14c/0x250
[ 54.285984] ? __pfx_nl80211_pre_doit+0x10/0x10
[ 54.285991] ? __pfx_nl80211_authenticate+0x10/0x10
[ 54.286000] ? __pfx_nl80211_post_doit+0x10/0x10
[ 54.286007] ? __pfx_genl_rcv_msg+0x10/0x10
[ 54.286014] netlink_rcv_skb+0x55/0x100
[ 54.286023] genl_rcv+0x24/0x40
[ 54.286029] netlink_unicast+0x1e8/0x2b0
[ 54.286036] netlink_sendmsg+0x1e3/0x400
[ 54.286044] ____sys_sendmsg+0x2a0/0x2f0
[ 54.286052] ? copy_msghdr_from_user+0x71/0xb0
[ 54.286061] ___sys_sendmsg+0x85/0xd0
[ 54.286068] ? find_held_lock+0x31/0x90
[ 54.286073] ? __might_fault+0x2c/0x70
[ 54.286078] ? __this_cpu_preempt_check+0x13/0x20
[ 54.286083] ? __might_fault+0x2c/0x70
[ 54.286088] ? __might_fault+0x2c/0x70
[ 54.286094] ? debug_smp_processor_id+0x17/0x20
[ 54.286100] __sys_sendmsg+0x6e/0xd0
[ 54.286108] ? do_syscall_64+0x39/0x2c0
[ 54.286114] __x64_sys_sendmsg+0x1a/0x20
[ 54.286168] x64_sys_call+0x929/0x2150
[ 54.286182] do_syscall_64+0x71/0x2c0
[ 54.286193] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 54.286200] RIP: 0033:0x7f79e0d28fb3
[ 54.286207] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
[ 54.286215] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 54.286225] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3
[ 54.286230] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006
[ 54.286235] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0
[ 54.286246] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560
[ 54.286247] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20
[ 54.286266] </TASK>
[ 54.286269] irq event stamp: 35912
[ 54.286273] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0
[ 54.286286] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0
[ 54.286294] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820
[ 54.286306] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820
[ 54.286318] ---[ end trace 0000000000000000 ]---
[ 54.373428] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3)
[ 54.381882] wlp3s0: authenticated
[ 54.386240] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3)
[ 54.392641] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=6)
[ 54.413392] wlp3s0: associated
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 17:15 ` Pavel Machek
@ 2025-07-24 17:51 ` Waiman Long
2025-07-24 17:55 ` Waiman Long
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Waiman Long @ 2025-07-24 17:51 UTC (permalink / raw)
To: Pavel Machek, Thomas Gleixner
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness
On 7/24/25 1:15 PM, Pavel Machek wrote:
> Hi!
>
> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote:
>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
>>>> Did kernel boot on console (w/o X), and got this: not sure if it is
>>>> related.
>>>> [ 402.125635] ------------[ cut here ]------------
>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>>>> [ 402.125654] Modules linked in:
>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
>>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
>>>> [ 402.125713] Call Trace:
>>>> [ 402.125716] <TASK>
>>>> [ 402.125719] console_flush_all+0x41e/0x460
>>> Can you please decode this, so we can see which part of that code it is?
>> And enable lockdep so that we can see where the interrupts were enabled?
> Enabled lockdep and got this one. It seems resume with bad wifi signal
> does it on 6.16...?
>
> Is it any good? Any decoding needed?
>
> ...
> [ 32.361445] CPU2 is up
> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 32.361982] Disabled fast string operations
> [ 32.366800] CPU3 is up
> [ 32.370186] ACPI: PM: Waking up from system sleep state S3
> [ 32.393904] ACPI: EC: interrupt unblocked
> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
> [ 32.409738] ACPI: EC: event unblocked
> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> [ 32.782731] ata3.00: configured for UDMA/133
> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
> [ 33.810033] PM: resume devices took 1.404 seconds
> [ 33.841597] OOM killer enabled.
> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
> [ 33.843280] Restarting tasks: Starting
> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
> [ 33.852744] Restarting tasks: Done
> [ 33.888306] PM: suspend exit
> [ 33.941831] Bluetooth: hci0: BCM: chip id 63
> [ 33.944910] Bluetooth: hci0: BCM: features 0x07
> [ 33.961985] Bluetooth: hci0: BCM20702A
> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
> [ 34.032198] Bluetooth: MGMT ver 1.23
> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> [ 35.206578] sd 0:0:0:0: [sda] Starting disk
> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> [ 35.278928] ata1.00: configured for UDMA/133
> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
>
> [ 54.284095] ================================
> [ 54.284097] WARNING: inconsistent lock state
> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
> [ 54.284104] --------------------------------
> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
The lockdep warning just means that console_owner_lock is acquired both
in softirq context and in task context with interrupt enabled. That can
leads to deadlock. So the remedy is to always take console_owner_lock
with interrupt disabled, i.e. with
raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
Cheers,
Longman
> [ 54.284133] {IN-SOFTIRQ-W} state was registered at:
> [ 54.284135] lock_acquire+0xdc/0x300
> [ 54.284144] console_lock_spinning_enable+0x59/0x60
> [ 54.284148] console_flush_all+0x2c0/0x4d0
> [ 54.284153] console_unlock+0x55/0x100
> [ 54.284157] vprintk_emit+0x15b/0x3a0
> [ 54.284162] vprintk_default+0x18/0x20
> [ 54.284166] vprintk+0x9/0x10
> [ 54.284171] _printk+0x52/0x70
> [ 54.284177] _credit_init_bits.part.0+0xec/0x160
> [ 54.284187] entropy_timer+0xaa/0xc0
> [ 54.284194] call_timer_fn+0xa7/0x260
> [ 54.284200] expire_timers+0xef/0x1b0
> [ 54.284205] run_timer_base+0xb0/0x140
> [ 54.284210] run_timer_softirq+0xb/0x40
> [ 54.284215] handle_softirqs+0xcd/0x490
> [ 54.284222] irq_exit_rcu+0xa2/0x160
> [ 54.284227] sysvec_apic_timer_interrupt+0x9b/0xc0
> [ 54.284236] asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [ 54.284242] cpuidle_enter_state+0x124/0x550
> [ 54.284247] cpuidle_enter+0x29/0x40
> [ 54.284253] do_idle+0x1d9/0x260
> [ 54.284262] cpu_startup_entry+0x27/0x30
> [ 54.284270] start_secondary+0x11e/0x140
> [ 54.284277] common_startup_64+0x129/0x138
> [ 54.284285] irq event stamp: 35912
> [ 54.284287] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0
> [ 54.284296] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0
> [ 54.284302] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820
> [ 54.284312] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820
> [ 54.284319]
> other info that might help us debug this:
> [ 54.284321] Possible unsafe locking scenario:
>
> [ 54.284322] CPU0
> [ 54.284323] ----
> [ 54.284324] lock(console_owner);
> [ 54.284328] <Interrupt>
> [ 54.284329] lock(console_owner);
> [ 54.284332]
> *** DEADLOCK ***
>
> [ 54.284333] 8 locks held by wpa_supplicant/2940:
> [ 54.284336] #0: ffffffff8646f9b0 (cb_lock){++++}-{4:4}, at: genl_rcv+0x15/0x40
> [ 54.284351] #1: ffff888103068768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: nl80211_pre_doit+0xc2/0x250
> [ 54.284367] #2: ffffffff86344200 (console_lock){+.+.}-{0:0}, at: vprintk_default+0x18/0x20
> [ 54.284376] #3: ffffffff86344250 (console_srcu){....}-{0:0}, at: console_flush_all+0x43/0x4d0
> [ 54.284386] #4: ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
> [ 54.284395] #5: ffffffff86263f60 (printk_legacy_map-wait-type-override){+...}-{4:4}, at: console_flush_all+0x2d2/0x4d0
> [ 54.284404] #6: ffffffff863ed2f8 (printing_lock){+...}-{3:3}, at: vt_console_print+0x55/0x470
> [ 54.284415] #7: ffffffff86346ac0 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x5e/0x550
> [ 54.284428]
> stack backtrace:
> [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
> [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 54.284446] Call Trace:
> [ 54.284448] <TASK>
> [ 54.284450] dump_stack_lvl+0x88/0xd0
> [ 54.284458] dump_stack+0x10/0x20
> [ 54.284463] print_usage_bug.part.0+0x237/0x2d0
> [ 54.284470] mark_lock.part.0+0xa9c/0xfb0
> [ 54.284479] mark_held_locks+0x4d/0x80
> [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0
> [ 54.284493] trace_hardirqs_on+0x5a/0xe0
> [ 54.284500] _raw_spin_unlock_irq+0x23/0x60
> [ 54.284505] __flush_work+0x3b4/0x550
> [ 54.284513] ? __timer_delete+0x2f/0xd0
> [ 54.284519] ? timer_delete+0xb/0x20
> [ 54.284524] ? try_to_grab_pending+0x12a/0x320
> [ 54.284534] cancel_delayed_work_sync+0x65/0x70
> [ 54.284539] fbcon_cursor+0xbe/0x160
> [ 54.284548] hide_cursor+0x2c/0xc0
> [ 54.284553] vt_console_print+0x45e/0x470
> [ 54.284560] console_flush_all+0x301/0x4d0
> [ 54.284565] ? console_flush_all+0x2d2/0x4d0
> [ 54.284569] ? console_flush_all+0x43/0x4d0
> [ 54.284572] ? console_flush_all+0x2d2/0x4d0
> [ 54.284579] console_unlock+0x55/0x100
> [ 54.284584] vprintk_emit+0x15b/0x3a0
> [ 54.284590] vprintk_default+0x18/0x20
> [ 54.284595] vprintk+0x9/0x10
> [ 54.284600] _printk+0x52/0x70
> [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0
> [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.284625] ieee80211_auth+0x13/0x20
> [ 54.284631] cfg80211_mlme_auth+0xeb/0x2a0
> [ 54.284640] nl80211_authenticate+0x341/0x3a0
> [ 54.284653] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130
> [ 54.284663] genl_rcv_msg+0x14c/0x250
> [ 54.284669] ? __pfx_nl80211_pre_doit+0x10/0x10
> [ 54.284676] ? __pfx_nl80211_authenticate+0x10/0x10
> [ 54.284684] ? __pfx_nl80211_post_doit+0x10/0x10
> [ 54.284691] ? __pfx_genl_rcv_msg+0x10/0x10
> [ 54.284697] netlink_rcv_skb+0x55/0x100
> [ 54.284706] genl_rcv+0x24/0x40
> [ 54.284712] netlink_unicast+0x1e8/0x2b0
> [ 54.284718] netlink_sendmsg+0x1e3/0x400
> [ 54.284726] ____sys_sendmsg+0x2a0/0x2f0
> [ 54.284736] ? copy_msghdr_from_user+0x71/0xb0
> [ 54.284744] ___sys_sendmsg+0x85/0xd0
> [ 54.284751] ? find_held_lock+0x31/0x90
> [ 54.284756] ? __might_fault+0x2c/0x70
> [ 54.284762] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.284766] ? __might_fault+0x2c/0x70
> [ 54.284771] ? __might_fault+0x2c/0x70
> [ 54.284776] ? debug_smp_processor_id+0x17/0x20
> [ 54.284781] __sys_sendmsg+0x6e/0xd0
> [ 54.284788] ? do_syscall_64+0x39/0x2c0
> [ 54.284795] __x64_sys_sendmsg+0x1a/0x20
> [ 54.284801] x64_sys_call+0x929/0x2150
> [ 54.284808] do_syscall_64+0x71/0x2c0
> [ 54.284814] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 54.284819] RIP: 0033:0x7f79e0d28fb3
> [ 54.284825] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
> [ 54.284831] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 54.284837] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3
> [ 54.284840] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006
> [ 54.284843] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0
> [ 54.284846] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560
> [ 54.284849] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20
> [ 54.284856] </TASK>
> [ 54.285774] ------------[ cut here ]------------
> [ 54.285777] raw_local_irq_restore() called with IRQs enabled
> [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> [ 54.285801] Modules linked in:
> [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
> [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282
> [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000
> [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000
> [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000
> [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000
> [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0
> [ 54.285869] Call Trace:
> [ 54.285872] <TASK>
> [ 54.285874] console_flush_all+0x47d/0x4d0
> [ 54.285880] ? console_flush_all+0x43/0x4d0
> [ 54.285885] ? console_flush_all+0x2d2/0x4d0
> [ 54.285892] console_unlock+0x55/0x100
> [ 54.285898] vprintk_emit+0x15b/0x3a0
> [ 54.285904] vprintk_default+0x18/0x20
> [ 54.285910] vprintk+0x9/0x10
> [ 54.285915] _printk+0x52/0x70
> [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0
> [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.285940] ieee80211_auth+0x13/0x20
> [ 54.285945] cfg80211_mlme_auth+0xeb/0x2a0
> [ 54.285954] nl80211_authenticate+0x341/0x3a0
> [ 54.285967] genl_family_rcv_msg_doit.constprop.0+0xd4/0x130
> [ 54.285977] genl_rcv_msg+0x14c/0x250
> [ 54.285984] ? __pfx_nl80211_pre_doit+0x10/0x10
> [ 54.285991] ? __pfx_nl80211_authenticate+0x10/0x10
> [ 54.286000] ? __pfx_nl80211_post_doit+0x10/0x10
> [ 54.286007] ? __pfx_genl_rcv_msg+0x10/0x10
> [ 54.286014] netlink_rcv_skb+0x55/0x100
> [ 54.286023] genl_rcv+0x24/0x40
> [ 54.286029] netlink_unicast+0x1e8/0x2b0
> [ 54.286036] netlink_sendmsg+0x1e3/0x400
> [ 54.286044] ____sys_sendmsg+0x2a0/0x2f0
> [ 54.286052] ? copy_msghdr_from_user+0x71/0xb0
> [ 54.286061] ___sys_sendmsg+0x85/0xd0
> [ 54.286068] ? find_held_lock+0x31/0x90
> [ 54.286073] ? __might_fault+0x2c/0x70
> [ 54.286078] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.286083] ? __might_fault+0x2c/0x70
> [ 54.286088] ? __might_fault+0x2c/0x70
> [ 54.286094] ? debug_smp_processor_id+0x17/0x20
> [ 54.286100] __sys_sendmsg+0x6e/0xd0
> [ 54.286108] ? do_syscall_64+0x39/0x2c0
> [ 54.286114] __x64_sys_sendmsg+0x1a/0x20
> [ 54.286168] x64_sys_call+0x929/0x2150
> [ 54.286182] do_syscall_64+0x71/0x2c0
> [ 54.286193] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 54.286200] RIP: 0033:0x7f79e0d28fb3
> [ 54.286207] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
> [ 54.286215] RSP: 002b:00007ffc79c940b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 54.286225] RAX: ffffffffffffffda RBX: 000055b73b414650 RCX: 00007f79e0d28fb3
> [ 54.286230] RDX: 0000000000000000 RSI: 00007ffc79c940f0 RDI: 0000000000000006
> [ 54.286235] RBP: 000055b73b44dc70 R08: 0000000000000004 R09: 00007f79e0dfabe0
> [ 54.286246] R10: 00007ffc79c941c4 R11: 0000000000000246 R12: 000055b73b414560
> [ 54.286247] R13: 00007ffc79c940f0 R14: 00007ffc79c941c4 R15: 000055b73b453f20
> [ 54.286266] </TASK>
> [ 54.286269] irq event stamp: 35912
> [ 54.286273] hardirqs last enabled at (35911): [<ffffffff8432d3a1>] __down_trylock_console_sem+0xb1/0xc0
> [ 54.286286] hardirqs last disabled at (35912): [<ffffffff84330651>] console_flush_all+0x3e1/0x4d0
> [ 54.286294] softirqs last enabled at (35906): [<ffffffff85608c7e>] ieee80211_check_fast_xmit+0xce/0x820
> [ 54.286306] softirqs last disabled at (35904): [<ffffffff85608c2c>] ieee80211_check_fast_xmit+0x7c/0x820
> [ 54.286318] ---[ end trace 0000000000000000 ]---
> [ 54.373428] wlp3s0: send auth to c8:3a:35:f0:ad:f1 (try 1/3)
> [ 54.381882] wlp3s0: authenticated
> [ 54.386240] wlp3s0: associate with c8:3a:35:f0:ad:f1 (try 1/3)
> [ 54.392641] wlp3s0: RX AssocResp from c8:3a:35:f0:ad:f1 (capab=0x411 status=0 aid=6)
> [ 54.413392] wlp3s0: associated
>
> Best regards,
> Pavel
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 17:51 ` Waiman Long
@ 2025-07-24 17:55 ` Waiman Long
2025-07-24 21:24 ` Thomas Gleixner
2025-07-24 19:07 ` Pavel Machek
2025-07-24 19:45 ` John Ogness
2 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2025-07-24 17:55 UTC (permalink / raw)
To: Pavel Machek, Thomas Gleixner
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness
On 7/24/25 1:51 PM, Waiman Long wrote:
> On 7/24/25 1:15 PM, Pavel Machek wrote:
>> Hi!
>>
>> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote:
>>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
>>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
>>>>> Did kernel boot on console (w/o X), and got this: not sure if it is
>>>>> related.
>>>>> [ 402.125635] ------------[ cut here ]------------
>>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at
>>>>> kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>>>>> [ 402.125654] Modules linked in:
>>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted:
>>>>> G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS
>>>>> 8DET73WW (1.43 ) 10/12/2016
>>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3
>>>>> cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01
>>>>> e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90
>>>>> 90 90 90 90 90 90
>>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX:
>>>>> 0000000000000000
>>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI:
>>>>> 00000000ffffffff
>>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09:
>>>>> 0000000000000001
>>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12:
>>>>> 0000000000000001
>>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15:
>>>>> 0000000000000000
>>>>> [ 402.125704] FS: 0000000000000000(0000)
>>>>> GS:ffff88829007f000(0000) knlGS:0000000000000000
>>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4:
>>>>> 00000000000606b0
>>>>> [ 402.125713] Call Trace:
>>>>> [ 402.125716] <TASK>
>>>>> [ 402.125719] console_flush_all+0x41e/0x460
>>>> Can you please decode this, so we can see which part of that code
>>>> it is?
>>> And enable lockdep so that we can see where the interrupts were
>>> enabled?
>> Enabled lockdep and got this one. It seems resume with bad wifi signal
>> does it on 6.16...?
>>
>> Is it any good? Any decoding needed?
>>
>> ...
>> [ 32.361445] CPU2 is up
>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
>> [ 32.361982] Disabled fast string operations
>> [ 32.366800] CPU3 is up
>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3
>> [ 32.393904] ACPI: EC: interrupt unblocked
>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency
>> changed to 50Mhz.
>> [ 32.409738] ACPI: EC: event unblocked
>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using
>> ehci-pci
>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE
>> LOCK) filtered out
>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES)
>> filtered out
>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE
>> LOCK) filtered out
>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES)
>> filtered out
>> [ 32.782731] ata3.00: configured for UDMA/133
>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using
>> ehci-pci
>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using
>> ehci-pci
>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x
>> [..5472], y [..4448]
>> [ 33.810033] PM: resume devices took 1.404 seconds
>> [ 33.841597] OOM killer enabled.
>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
>> [ 33.843280] Restarting tasks: Starting
>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
>> [ 33.852744] Restarting tasks: Done
>> [ 33.888306] PM: suspend exit
>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63
>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07
>> [ 33.961985] Bluetooth: hci0: BCM20702A
>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found,
>> tried:
>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
>> [ 34.032198] Bluetooth: MGMT ver 1.23
>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE
>> LOCK) filtered out
>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES)
>> filtered out
>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk
>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE
>> LOCK) filtered out
>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES)
>> filtered out
>> [ 35.278928] ata1.00: configured for UDMA/133
>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local
>> address=a0:88:b4:62:a7:30)
>>
>> [ 54.284095] ================================
>> [ 54.284097] WARNING: inconsistent lock state
>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>> [ 54.284104] --------------------------------
>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at:
>> console_lock_spinning_enable+0x3d/0x60
>
> The lockdep warning just means that console_owner_lock is acquired
> both in softirq context and in task context with interrupt enabled.
> That can leads to deadlock. So the remedy is to always take
> console_owner_lock with interrupt disabled, i.e. with
> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
I suppose that this lock can also be acquired in hardirq context. So a
similar HARDIRQ warning can be printed if that happens first.
Regards,
Longman
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 17:51 ` Waiman Long
2025-07-24 17:55 ` Waiman Long
@ 2025-07-24 19:07 ` Pavel Machek
2025-07-24 19:22 ` Waiman Long
2025-07-24 19:45 ` John Ogness
2 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2025-07-24 19:07 UTC (permalink / raw)
To: Waiman Long
Cc: Thomas Gleixner, kernel list, mingo, bp, dave.hansen, x86, hpa,
peterz, will, miriam.rachel.korenblit, linux-wireless,
Petr Mladek, John Ogness
[-- Attachment #1: Type: text/plain, Size: 4433 bytes --]
Hi!
> > > > Can you please decode this, so we can see which part of that code it is?
> > > And enable lockdep so that we can see where the interrupts were enabled?
> > Enabled lockdep and got this one. It seems resume with bad wifi signal
> > does it on 6.16...?
> >
> > Is it any good? Any decoding needed?
> >
> > ...
> > [ 32.361445] CPU2 is up
> > [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [ 32.361982] Disabled fast string operations
> > [ 32.366800] CPU3 is up
> > [ 32.370186] ACPI: PM: Waking up from system sleep state S3
> > [ 32.393904] ACPI: EC: interrupt unblocked
> > [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
> > [ 32.409738] ACPI: EC: event unblocked
> > [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> > [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
> > [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
> > [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
> > [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> > [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> > [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
> > [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> > [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> > [ 32.782731] ata3.00: configured for UDMA/133
> > [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
> > [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
> > [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
> > [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
> > [ 33.810033] PM: resume devices took 1.404 seconds
> > [ 33.841597] OOM killer enabled.
> > [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
> > [ 33.843280] Restarting tasks: Starting
> > [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
> > [ 33.852744] Restarting tasks: Done
> > [ 33.888306] PM: suspend exit
> > [ 33.941831] Bluetooth: hci0: BCM: chip id 63
> > [ 33.944910] Bluetooth: hci0: BCM: features 0x07
> > [ 33.961985] Bluetooth: hci0: BCM20702A
> > [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
> > [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
> > [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
> > [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
> > [ 34.032198] Bluetooth: MGMT ver 1.23
> > [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> > [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> > [ 35.206578] sd 0:0:0:0: [sda] Starting disk
> > [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> > [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
> > [ 35.278928] ata1.00: configured for UDMA/133
> > [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
> > [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
> >
> > [ 54.284095] ================================
> > [ 54.284097] WARNING: inconsistent lock state
> > [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
> > [ 54.284104] --------------------------------
> > [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
> > [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
>
> The lockdep warning just means that console_owner_lock is acquired both in
> softirq context and in task context with interrupt enabled. That can leads
> to deadlock. So the remedy is to always take console_owner_lock with
> interrupt disabled, i.e. with
> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
So you suspect problem is in the printk code?
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 19:07 ` Pavel Machek
@ 2025-07-24 19:22 ` Waiman Long
2025-07-24 22:21 ` Thomas Gleixner
0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2025-07-24 19:22 UTC (permalink / raw)
To: Pavel Machek, Waiman Long
Cc: Thomas Gleixner, kernel list, mingo, bp, dave.hansen, x86, hpa,
peterz, will, miriam.rachel.korenblit, linux-wireless,
Petr Mladek, John Ogness
On 7/24/25 3:07 PM, Pavel Machek wrote:
> Hi!
>
>>>>> Can you please decode this, so we can see which part of that code it is?
>>>> And enable lockdep so that we can see where the interrupts were enabled?
>>> Enabled lockdep and got this one. It seems resume with bad wifi signal
>>> does it on 6.16...?
>>>
>>> Is it any good? Any decoding needed?
>>>
>>> ...
>>> [ 32.361445] CPU2 is up
>>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
>>> [ 32.361982] Disabled fast string operations
>>> [ 32.366800] CPU3 is up
>>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3
>>> [ 32.393904] ACPI: EC: interrupt unblocked
>>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
>>> [ 32.409738] ACPI: EC: event unblocked
>>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
>>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
>>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
>>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
>>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 32.782731] ata3.00: configured for UDMA/133
>>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
>>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
>>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
>>> [ 33.810033] PM: resume devices took 1.404 seconds
>>> [ 33.841597] OOM killer enabled.
>>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
>>> [ 33.843280] Restarting tasks: Starting
>>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
>>> [ 33.852744] Restarting tasks: Done
>>> [ 33.888306] PM: suspend exit
>>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63
>>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07
>>> [ 33.961985] Bluetooth: hci0: BCM20702A
>>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
>>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
>>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
>>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
>>> [ 34.032198] Bluetooth: MGMT ver 1.23
>>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk
>>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 35.278928] ata1.00: configured for UDMA/133
>>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
>>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
>>>
>>> [ 54.284095] ================================
>>> [ 54.284097] WARNING: inconsistent lock state
>>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>>> [ 54.284104] --------------------------------
>>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
>> The lockdep warning just means that console_owner_lock is acquired both in
>> softirq context and in task context with interrupt enabled. That can leads
>> to deadlock. So the remedy is to always take console_owner_lock with
>> interrupt disabled, i.e. with
>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
> So you suspect problem is in the printk code?
This lockdep problem is likely unrelated to the IRQ flag problem
reported in the original email as it shouldn't affect the IRQ flag.
Cheers,
Longman
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 17:51 ` Waiman Long
2025-07-24 17:55 ` Waiman Long
2025-07-24 19:07 ` Pavel Machek
@ 2025-07-24 19:45 ` John Ogness
2025-07-25 0:16 ` Waiman Long
2 siblings, 1 reply; 26+ messages in thread
From: John Ogness @ 2025-07-24 19:45 UTC (permalink / raw)
To: Waiman Long, Pavel Machek, Thomas Gleixner
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek
On 2025-07-24, Waiman Long <llong@redhat.com> wrote:
> On 7/24/25 1:15 PM, Pavel Machek wrote:
>> Hi!
>>
>> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote:
>>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
>>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
>>>>> Did kernel boot on console (w/o X), and got this: not sure if it is
>>>>> related.
>>>>> [ 402.125635] ------------[ cut here ]------------
>>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>>>>> [ 402.125654] Modules linked in:
>>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
>>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
>>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
>>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
>>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
>>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
>>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
>>>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
>>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
>>>>> [ 402.125713] Call Trace:
>>>>> [ 402.125716] <TASK>
>>>>> [ 402.125719] console_flush_all+0x41e/0x460
>>>> Can you please decode this, so we can see which part of that code it is?
>>> And enable lockdep so that we can see where the interrupts were enabled?
>> Enabled lockdep and got this one. It seems resume with bad wifi signal
>> does it on 6.16...?
>>
>> Is it any good? Any decoding needed?
>>
>> ...
>> [ 32.361445] CPU2 is up
>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
>> [ 32.361982] Disabled fast string operations
>> [ 32.366800] CPU3 is up
>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3
>> [ 32.393904] ACPI: EC: interrupt unblocked
>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
>> [ 32.409738] ACPI: EC: event unblocked
>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>> [ 32.782731] ata3.00: configured for UDMA/133
>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
>> [ 33.810033] PM: resume devices took 1.404 seconds
>> [ 33.841597] OOM killer enabled.
>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
>> [ 33.843280] Restarting tasks: Starting
>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
>> [ 33.852744] Restarting tasks: Done
>> [ 33.888306] PM: suspend exit
>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63
>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07
>> [ 33.961985] Bluetooth: hci0: BCM20702A
>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
>> [ 34.032198] Bluetooth: MGMT ver 1.23
>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk
>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>> [ 35.278928] ata1.00: configured for UDMA/133
>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
>>
>> [ 54.284095] ================================
>> [ 54.284097] WARNING: inconsistent lock state
>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>> [ 54.284104] --------------------------------
>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
>
> The lockdep warning just means that console_owner_lock is acquired both
> in softirq context and in task context with interrupt enabled. That can
> leads to deadlock. So the remedy is to always take console_owner_lock
> with interrupt disabled, i.e. with
> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
There are only 3 functions using this lock:
1. console_lock_spinning_enable()
2. console_lock_spinning_disable_and_check()
3. console_trylock_spinning()
For all call sites of 1 and 2, it is surrounded by
printk_safe_enter_irqsave()/_exit_irqrestore() calls. These explicitly
call local_irq_save()/_restore().
For 3, it explicitly calls
printk_safe_enter_irqsave()/_exit_irqrestore().
I wonder if this is a false positive because console_trylock_spinning()
annotates the acquire after re-enabling interrupts:
static int console_trylock_spinning(void)
{
...
printk_safe_exit_irqrestore(flags);
...
mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
...
}
As to the crashing problem at hand, it might be faster to bisect the
wireless-next-2025-05-22 merge.
John Ogness
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 17:55 ` Waiman Long
@ 2025-07-24 21:24 ` Thomas Gleixner
2025-07-25 0:33 ` Hillf Danton
2025-07-25 15:23 ` Thomas Gleixner
0 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-24 21:24 UTC (permalink / raw)
To: Waiman Long, Pavel Machek
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness,
Greg Kroah-Hartman, Jiri Slaby, Tejun Heo, Simona Vetter
On Thu, Jul 24 2025 at 13:55, Waiman Long wrote:
> On 7/24/25 1:51 PM, Waiman Long wrote:
>>> [ 54.284095] ================================
>>> [ 54.284097] WARNING: inconsistent lock state
>>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>>> [ 54.284104] --------------------------------
>>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at:
>>> console_lock_spinning_enable+0x3d/0x60
>>
>> The lockdep warning just means that console_owner_lock is acquired
>> both in softirq context and in task context with interrupt enabled.
>> That can leads to deadlock. So the remedy is to always take
>> console_owner_lock with interrupt disabled, i.e. with
>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
>
> I suppose that this lock can also be acquired in hardirq context. So a
> similar HARDIRQ warning can be printed if that happens first.
None of this makes any sense whatsoever.
Both invocations of console_lock_spinning_enable() are within a hard
interrupt disabled section.
printk_safe_enter_irqsave(flags);
console_lock_spinning_enable();
The lockdep map which is printed here is not console_owner_lock,
it's console_owner, which is a software managed lockdep_map:
static struct lockdep_map console_owner_dep_map = {
.name = "console_owner"
};
It's touched in console_lock_spinning_enable(),
console_lock_spinning_disable_and_check() and console_trylock_spinning().
In all cases with interrupts disabled and all sites use
printk_safe_enter_irqsave() which internally uses local_irq_save().
Now lets go back to the initial report, which is further down:
[ 54.285777] raw_local_irq_restore() called with IRQs enabled
[ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
[ 54.285801] Modules linked in:
[ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
[ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC
[ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
[ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
[ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b
5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
[ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282
[ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000
[ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000
[ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000
[ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000
[ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0
[ 54.285869] Call Trace:
[ 54.285872] <TASK>
[ 54.285874] console_flush_all+0x47d/0x4d0
[ 54.285880] ? console_flush_all+0x43/0x4d0
[ 54.285885] ? console_flush_all+0x2d2/0x4d0
[ 54.285892] console_unlock+0x55/0x100
[ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0
[ 54.285934] ? __this_cpu_preempt_check+0x13/0x20
[ 54.285940] ieee80211_auth+0x13/0x20
I asked for decoding that console_flush_all+0x47d/0x4d0 line, but I'm
100% sure now that it is line 3128 in console_emit_next_record() which
is inlined into console_flush_all()
printk_safe_exit_irqrestore(flags);
Pavel, can you confirm that?
But let's look at the other stack trace in the lockdep splat which is
way more interesting:
stack backtrace:
[ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
[ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC
[ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
[ 54.284446] Call Trace:
[ 54.284448] <TASK>
[ 54.284450] dump_stack_lvl+0x88/0xd0
[ 54.284458] dump_stack+0x10/0x20
[ 54.284463] print_usage_bug.part.0+0x237/0x2d0
[ 54.284470] mark_lock.part.0+0xa9c/0xfb0
[ 54.284479] mark_held_locks+0x4d/0x80
[ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0
[ 54.284493] trace_hardirqs_on+0x5a/0xe0
[ 54.284500] _raw_spin_unlock_irq+0x23/0x60
[ 54.284505] __flush_work+0x3b4/0x550
[ 54.284513] ? __timer_delete+0x2f/0xd0
[ 54.284519] ? timer_delete+0xb/0x20
[ 54.284524] ? try_to_grab_pending+0x12a/0x320
[ 54.284534] cancel_delayed_work_sync+0x65/0x70
[ 54.284539] fbcon_cursor+0xbe/0x160
[ 54.284548] hide_cursor+0x2c/0xc0
[ 54.284553] vt_console_print+0x45e/0x470
[ 54.284560] console_flush_all+0x301/0x4d0
[ 54.284565] ? console_flush_all+0x2d2/0x4d0
[ 54.284569] ? console_flush_all+0x43/0x4d0
[ 54.284572] ? console_flush_all+0x2d2/0x4d0
[ 54.284579] console_unlock+0x55/0x100
[ 54.284584] vprintk_emit+0x15b/0x3a0
[ 54.284590] vprintk_default+0x18/0x20
[ 54.284595] vprintk+0x9/0x10
[ 54.284600] _printk+0x52/0x70
[ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0
[ 54.284619] ? __this_cpu_preempt_check+0x13/0x20
[ 54.284625] ieee80211_auth+0x13/0x20
That's the same call chain as the above which complains about the
already enabled interrupt.
So the lockdep splat is telling us that the raw_spin_unlock_irq() in the
workqueue code is making console_owner unsafe.
Now let's look where this comes from:
console_flush_all+0x301/0x4d0
invokes
vt_console_print+0x45e/0x470
via
console_emit_next_record()
...
printk_safe_enter_irqsave(flags);
...
con->write(con, outbuf, pmsg.outbuf_len);
Again decoding console_flush_all+0x301/0x4d0 should confirm that.
Now that ends up in cancel_delayed_work_sync() via hide_cursor() ->
fbcon_cursor().
cancel_delayed_work_sync() has a might_sleep() in it _and_
__flush_work() uses raw_spin_[un]lock_irq() as it requires to be invoked
in thread or in some cases in BH context.
vt_console_print() is the write() callback of the VT console driver. As
that is a legacy console, it can't do that.
Only the new NBCON variants are allowed to sleep in their write()
callbacks because they are running in their own printer thread
context. For atomic printouts in case of emergency they can have a
write_atomic() callback. If they don't then printing is not attempted,
but that's not relevant for vt_console_print() obviously.
So lets look at that call chain further
vt_console_print()
hide_cursor()
vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor()
fbcon_cursor()
if (vc->vc_cursor_type & CUR_SW)
fbcon_del_cursor_work(info)
cancel_delayed_work_sync(&ops->cursor_work);
Here we are.
Now I checked whether this is a recent change, but I have to say it has
been that way for a long time. It's a very interesting question why this
never popped up before.
The only explanation I have so far is that the code path in
cancel_delayed_work_sync() which uses [un]lock_irq() is only reached,
when the works is executed at that point.
Unfortunatly the might_sleep() check in __cancel_work_sync() is after
the point which determines that.
So it's a matter of debug options being enabled and the probability of
hitting a work which is actually executed to trip over this problem.
I have no fix for that, but at least there should be some clues now
where to tackle this.
It's definitely not a printk bug. The culprit was hiding in plain sight
for a long time in the fbcon_console code.
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 19:22 ` Waiman Long
@ 2025-07-24 22:21 ` Thomas Gleixner
2025-07-25 1:56 ` Waiman Long
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-24 22:21 UTC (permalink / raw)
To: Waiman Long, Pavel Machek, Waiman Long
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness
On Thu, Jul 24 2025 at 15:22, Waiman Long wrote:
> On 7/24/25 3:07 PM, Pavel Machek wrote:
>>> The lockdep warning just means that console_owner_lock is acquired both in
>>> softirq context and in task context with interrupt enabled. That can leads
>>> to deadlock. So the remedy is to always take console_owner_lock with
>>> interrupt disabled, i.e. with
>>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
>> So you suspect problem is in the printk code?
>
> This lockdep problem is likely unrelated to the IRQ flag problem
> reported in the original email as it shouldn't affect the IRQ flag.
Wrong you are ... :)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 19:45 ` John Ogness
@ 2025-07-25 0:16 ` Waiman Long
2025-07-25 13:44 ` Thomas Gleixner
0 siblings, 1 reply; 26+ messages in thread
From: Waiman Long @ 2025-07-25 0:16 UTC (permalink / raw)
To: John Ogness, Waiman Long, Pavel Machek, Thomas Gleixner
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek
On 7/24/25 3:45 PM, John Ogness wrote:
> On 2025-07-24, Waiman Long <llong@redhat.com> wrote:
>> On 7/24/25 1:15 PM, Pavel Machek wrote:
>>> Hi!
>>>
>>> On Wed 2025-07-23 19:32:10, Thomas Gleixner wrote:
>>>> On Wed, Jul 23 2025 at 19:31, Thomas Gleixner wrote:
>>>>> On Wed, Jul 23 2025 at 17:42, Pavel Machek wrote:
>>>>>> Did kernel boot on console (w/o X), and got this: not sure if it is
>>>>>> related.
>>>>>> [ 402.125635] ------------[ cut here ]------------
>>>>>> [ 402.125638] raw_local_irq_restore() called with IRQs enabled
>>>>>> [ 402.125645] WARNING: CPU: 3 PID: 387 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
>>>>>> [ 402.125654] Modules linked in:
>>>>>> [ 402.125661] CPU: 3 UID: 0 PID: 387 Comm: kworker/u16:5 Tainted: G S 6.16.0-rc7+ #303 PREEMPT(voluntary)
>>>>>> [ 402.125667] Tainted: [S]=CPU_OUT_OF_SPEC
>>>>>> [ 402.125668] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
>>>>>> [ 402.125671] Workqueue: events_unbound cfg80211_wiphy_work
>>>>>> [ 402.125678] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
>>>>>> [ 402.125683] Code: 90 90 90 90 90 80 3d 51 3d dc 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 c0 4f c9 85 48 89 e5 c6 05 38 3d dc 00 01 e8 9b d8 e6 fe <0f> 0b 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
>>>>>> [ 402.125686] RSP: 0018:ffffc9000173fb30 EFLAGS: 00010282
>>>>>> [ 402.125691] RAX: 0000000000000000 RBX: ffffffff8616b460 RCX: 0000000000000000
>>>>>> [ 402.125694] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
>>>>>> [ 402.125696] RBP: ffffc9000173fb30 R08: 0000000028935f32 R09: 0000000000000001
>>>>>> [ 402.125699] R10: 0000000000000044 R11: ffff888100ba52c8 R12: 0000000000000001
>>>>>> [ 402.125702] R13: ffffc9000173fbcb R14: ffffffff84301224 R15: 0000000000000000
>>>>>> [ 402.125704] FS: 0000000000000000(0000) GS:ffff88829007f000(0000) knlGS:0000000000000000
>>>>>> [ 402.125707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 402.125710] CR2: 000055967d471ee0 CR3: 0000000006046001 CR4: 00000000000606b0
>>>>>> [ 402.125713] Call Trace:
>>>>>> [ 402.125716] <TASK>
>>>>>> [ 402.125719] console_flush_all+0x41e/0x460
>>>>> Can you please decode this, so we can see which part of that code it is?
>>>> And enable lockdep so that we can see where the interrupts were enabled?
>>> Enabled lockdep and got this one. It seems resume with bad wifi signal
>>> does it on 6.16...?
>>>
>>> Is it any good? Any decoding needed?
>>>
>>> ...
>>> [ 32.361445] CPU2 is up
>>> [ 32.361729] smpboot: Booting Node 0 Processor 3 APIC 0x3
>>> [ 32.361982] Disabled fast string operations
>>> [ 32.366800] CPU3 is up
>>> [ 32.370186] ACPI: PM: Waking up from system sleep state S3
>>> [ 32.393904] ACPI: EC: interrupt unblocked
>>> [ 32.396000] sdhci-pci 0000:0d:00.0: MMC controller base frequency changed to 50Mhz.
>>> [ 32.409738] ACPI: EC: event unblocked
>>> [ 32.470808] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>>> [ 32.687300] usb 2-1.4: reset full-speed USB device number 4 using ehci-pci
>>> [ 32.758329] ata2: SATA link down (SStatus 0 SControl 300)
>>> [ 32.758375] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [ 32.762316] ata5: SATA link down (SStatus 0 SControl 300)
>>> [ 32.764585] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 32.764593] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 32.771931] sd 2:0:0:0: [sdb] Starting disk
>>> [ 32.777439] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 32.777450] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 32.782731] ata3.00: configured for UDMA/133
>>> [ 32.786846] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0
>>> [ 32.858513] usb 2-1.3: reset full-speed USB device number 3 using ehci-pci
>>> [ 33.026331] usb 2-1.6: reset high-speed USB device number 5 using ehci-pci
>>> [ 33.662583] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
>>> [ 33.810033] PM: resume devices took 1.404 seconds
>>> [ 33.841597] OOM killer enabled.
>>> [ 33.841808] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: docking
>>> [ 33.843280] Restarting tasks: Starting
>>> [ 33.849066] ACPI: \_SB_.PCI0.LPC_.EC__.BAT1: Unable to dock!
>>> [ 33.852744] Restarting tasks: Done
>>> [ 33.888306] PM: suspend exit
>>> [ 33.941831] Bluetooth: hci0: BCM: chip id 63
>>> [ 33.944910] Bluetooth: hci0: BCM: features 0x07
>>> [ 33.961985] Bluetooth: hci0: BCM20702A
>>> [ 33.962628] Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
>>> [ 33.970094] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
>>> [ 33.970993] Bluetooth: hci0: BCM: 'brcm/BCM20702A1-0a5c-21e6.hcd'
>>> [ 33.971410] Bluetooth: hci0: BCM: 'brcm/BCM-0a5c-21e6.hcd'
>>> [ 34.032198] Bluetooth: MGMT ver 1.23
>>> [ 34.838285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>> [ 35.166235] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 35.167828] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 35.206578] sd 0:0:0:0: [sda] Starting disk
>>> [ 35.238578] ata1.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
>>> [ 35.240197] ata1.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
>>> [ 35.278928] ata1.00: configured for UDMA/133
>>> [ 35.613471] e1000e 0000:00:19.0 enp0s25: NIC Link is Down
>>> [ 54.283825] wlp3s0: authenticate with c8:3a:35:f0:ad:f1 (local address=a0:88:b4:62:a7:30)
>>>
>>> [ 54.284095] ================================
>>> [ 54.284097] WARNING: inconsistent lock state
>>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>>> [ 54.284104] --------------------------------
>>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at: console_lock_spinning_enable+0x3d/0x60
>> The lockdep warning just means that console_owner_lock is acquired both
>> in softirq context and in task context with interrupt enabled. That can
>> leads to deadlock. So the remedy is to always take console_owner_lock
>> with interrupt disabled, i.e. with
>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
> There are only 3 functions using this lock:
>
> 1. console_lock_spinning_enable()
> 2. console_lock_spinning_disable_and_check()
> 3. console_trylock_spinning()
>
> For all call sites of 1 and 2, it is surrounded by
> printk_safe_enter_irqsave()/_exit_irqrestore() calls. These explicitly
> call local_irq_save()/_restore().
>
> For 3, it explicitly calls
> printk_safe_enter_irqsave()/_exit_irqrestore().
>
> I wonder if this is a false positive because console_trylock_spinning()
> annotates the acquire after re-enabling interrupts:
>
> static int console_trylock_spinning(void)
> {
> ...
> printk_safe_exit_irqrestore(flags);
> ...
> mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> ...
> }
>
> As to the crashing problem at hand, it might be faster to bisect the
> wireless-next-2025-05-22 merge.
I am sorry for confusing conole_owner_lock with the console_owner
lockdep map. Yes, this lockdep splat should be a false positive. You do
need to put the mutex_acquire before printk_safe_exit_irqrestore() to
avoid this splat.
Cheers,
Longman
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 21:24 ` Thomas Gleixner
@ 2025-07-25 0:33 ` Hillf Danton
2025-07-25 15:23 ` Thomas Gleixner
1 sibling, 0 replies; 26+ messages in thread
From: Hillf Danton @ 2025-07-25 0:33 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Waiman Long, Pavel Machek, LKML, Petr Mladek, John Ogness,
Tejun Heo
On Thu, 24 Jul 2025 23:24:26 +0200 Thomas Gleixner wrote:
>On Thu, Jul 24 2025 at 13:55, Waiman Long wrote:
>> On 7/24/25 1:51 PM, Waiman Long wrote:
>>>> [ 54.284095] ================================
>>>> [ 54.284097] WARNING: inconsistent lock state
>>>> [ 54.284100] 6.16.0-rc7+ #305 Tainted: G S
>>>> [ 54.284104] --------------------------------
>>>> [ 54.284105] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>>> [ 54.284108] wpa_supplicant/2940 [HC0[0]:SC0[0]:HE0:SE1] takes:
>>>> [ 54.284114] ffffffff86263fe0 (console_owner){+.?.}-{0:0}, at:
>>>> console_lock_spinning_enable+0x3d/0x60
>>>
>>> The lockdep warning just means that console_owner_lock is acquired
>>> both in softirq context and in task context with interrupt enabled.
>>> That can leads to deadlock. So the remedy is to always take
>>> console_owner_lock with interrupt disabled, i.e. with
>>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
>>
>> I suppose that this lock can also be acquired in hardirq context. So a
>> similar HARDIRQ warning can be printed if that happens first.
>
> None of this makes any sense whatsoever.
>
> Both invocations of console_lock_spinning_enable() are within a hard
> interrupt disabled section.
>
> printk_safe_enter_irqsave(flags);
> console_lock_spinning_enable();
>
> The lockdep map which is printed here is not console_owner_lock,
> it's console_owner, which is a software managed lockdep_map:
>
> static struct lockdep_map console_owner_dep_map = {
> .name = "console_owner"
> };
>
> It's touched in console_lock_spinning_enable(),
> console_lock_spinning_disable_and_check() and console_trylock_spinning().
>
> In all cases with interrupts disabled and all sites use
> printk_safe_enter_irqsave() which internally uses local_irq_save().
>
> Now lets go back to the initial report, which is further down:
>
> [ 54.285777] raw_local_irq_restore() called with IRQs enabled
> [ 54.285788] WARNING: CPU: 3 PID: 2940 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x25/0x30
> [ 54.285801] Modules linked in:
> [ 54.285807] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
> [ 54.285814] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 54.285817] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 54.285820] RIP: 0010:warn_bogus_irq_restore+0x25/0x30
> [ 54.285828] Code: 90 90 90 90 90 80 3d c2 5f e3 00 00 74 05 c3 cc cc cc cc 55 48 c7 c7 98 2f fa 85 c6 05 ac 5f e3 00 01 48 89 e5 e8 db 23 be fe <0f> 0b
> 5d c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90
> [ 54.285834] RSP: 0018:ffffc90001acb690 EFLAGS: 00010282
> [ 54.285839] RAX: 0000000000000000 RBX: ffffffff863ed460 RCX: 0000000000000000
> [ 54.285843] RDX: 0000000000000003 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 54.285847] RBP: ffffc90001acb690 R08: 0000000000000000 R09: 0000000000000000
> [ 54.285850] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [ 54.285853] R13: ffffc90001acb72b R14: 0000000000000200 R15: 0000000000000000
> [ 54.285857] FS: 00007f79e08da1c0(0000) GS:ffff88828fcf5000(0000) knlGS:0000000000000000
> [ 54.285862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 54.285865] CR2: 000055b73b455000 CR3: 000000010b07e005 CR4: 00000000000606b0
> [ 54.285869] Call Trace:
> [ 54.285872] <TASK>
> [ 54.285874] console_flush_all+0x47d/0x4d0
> [ 54.285880] ? console_flush_all+0x43/0x4d0
> [ 54.285885] ? console_flush_all+0x2d2/0x4d0
> [ 54.285892] console_unlock+0x55/0x100
> [ 54.285924] ieee80211_mgd_auth+0x2a3/0x5e0
> [ 54.285934] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.285940] ieee80211_auth+0x13/0x20
>
> I asked for decoding that console_flush_all+0x47d/0x4d0 line, but I'm
> 100% sure now that it is line 3128 in console_emit_next_record() which
> is inlined into console_flush_all()
>
> printk_safe_exit_irqrestore(flags);
>
> Pavel, can you confirm that?
>
> But let's look at the other stack trace in the lockdep splat which is
> way more interesting:
>
> stack backtrace:
> [ 54.284433] CPU: 3 UID: 0 PID: 2940 Comm: wpa_supplicant Tainted: G S 6.16.0-rc7+ #305 PREEMPT(voluntary)
> [ 54.284441] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 54.284443] Hardware name: LENOVO 4291W3B/4291W3B, BIOS 8DET73WW (1.43 ) 10/12/2016
> [ 54.284446] Call Trace:
> [ 54.284448] <TASK>
> [ 54.284450] dump_stack_lvl+0x88/0xd0
> [ 54.284458] dump_stack+0x10/0x20
> [ 54.284463] print_usage_bug.part.0+0x237/0x2d0
> [ 54.284470] mark_lock.part.0+0xa9c/0xfb0
> [ 54.284479] mark_held_locks+0x4d/0x80
> [ 54.284486] lockdep_hardirqs_on_prepare+0xff/0x1c0
> [ 54.284493] trace_hardirqs_on+0x5a/0xe0
> [ 54.284500] _raw_spin_unlock_irq+0x23/0x60
> [ 54.284505] __flush_work+0x3b4/0x550
> [ 54.284513] ? __timer_delete+0x2f/0xd0
> [ 54.284519] ? timer_delete+0xb/0x20
> [ 54.284524] ? try_to_grab_pending+0x12a/0x320
> [ 54.284534] cancel_delayed_work_sync+0x65/0x70
> [ 54.284539] fbcon_cursor+0xbe/0x160
> [ 54.284548] hide_cursor+0x2c/0xc0
> [ 54.284553] vt_console_print+0x45e/0x470
> [ 54.284560] console_flush_all+0x301/0x4d0
> [ 54.284565] ? console_flush_all+0x2d2/0x4d0
> [ 54.284569] ? console_flush_all+0x43/0x4d0
> [ 54.284572] ? console_flush_all+0x2d2/0x4d0
> [ 54.284579] console_unlock+0x55/0x100
> [ 54.284584] vprintk_emit+0x15b/0x3a0
> [ 54.284590] vprintk_default+0x18/0x20
> [ 54.284595] vprintk+0x9/0x10
> [ 54.284600] _printk+0x52/0x70
> [ 54.284608] ieee80211_mgd_auth+0x2a3/0x5e0
> [ 54.284619] ? __this_cpu_preempt_check+0x13/0x20
> [ 54.284625] ieee80211_auth+0x13/0x20
>
> That's the same call chain as the above which complains about the
> already enabled interrupt.
>
> So the lockdep splat is telling us that the raw_spin_unlock_irq() in the
> workqueue code is making console_owner unsafe.
>
Point A
> Now let's look where this comes from:
>
> console_flush_all+0x301/0x4d0
>
> invokes
>
> vt_console_print+0x45e/0x470
>
> via
>
> console_emit_next_record()
> ...
> printk_safe_enter_irqsave(flags);
> ...
> con->write(con, outbuf, pmsg.outbuf_len);
>
> Again decoding console_flush_all+0x301/0x4d0 should confirm that.
>
> Now that ends up in cancel_delayed_work_sync() via hide_cursor() ->
> fbcon_cursor().
>
> cancel_delayed_work_sync() has a might_sleep() in it _and_
> __flush_work() uses raw_spin_[un]lock_irq() as it requires to be invoked
> in thread or in some cases in BH context.
>
> vt_console_print() is the write() callback of the VT console driver. As
> that is a legacy console, it can't do that.
>
> Only the new NBCON variants are allowed to sleep in their write()
> callbacks because they are running in their own printer thread
> context. For atomic printouts in case of emergency they can have a
> write_atomic() callback. If they don't then printing is not attempted,
> but that's not relevant for vt_console_print() obviously.
>
> So lets look at that call chain further
>
> vt_console_print()
> hide_cursor()
> vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor()
>
> fbcon_cursor()
> if (vc->vc_cursor_type & CUR_SW)
> fbcon_del_cursor_work(info)
> cancel_delayed_work_sync(&ops->cursor_work);
>
cancel_delayed_work_sync(&ops->cursor_work);
__cancel_work_sync()
__cancel_work()
local_irq_save()
...
local_irq_restore() // Point B
__flush_work()
start_flush_work()
raw_spin_lock_irq(&pool->lock);
...
raw_spin_unlock_irq(&pool->lock); // Point C
The difference between Point B and C wrt irq creates Point A.
> Here we are.
>
> Now I checked whether this is a recent change, but I have to say it has
> been that way for a long time. It's a very interesting question why this
> never popped up before.
>
> The only explanation I have so far is that the code path in
> cancel_delayed_work_sync() which uses [un]lock_irq() is only reached,
> when the works is executed at that point.
>
> Unfortunatly the might_sleep() check in __cancel_work_sync() is after
> the point which determines that.
>
> So it's a matter of debug options being enabled and the probability of
> hitting a work which is actually executed to trip over this problem.
>
No, debug makes no sense. Nor does a running work.
> I have no fix for that, but at least there should be some clues now
> where to tackle this.
>
Yes
> It's definitely not a printk bug. The culprit was hiding in plain sight
> for a long time in the fbcon_console code.
>
Nope because of Point A.
Hillf
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 22:21 ` Thomas Gleixner
@ 2025-07-25 1:56 ` Waiman Long
0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2025-07-25 1:56 UTC (permalink / raw)
To: Thomas Gleixner, Waiman Long, Pavel Machek
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness
On 7/24/25 6:21 PM, Thomas Gleixner wrote:
> On Thu, Jul 24 2025 at 15:22, Waiman Long wrote:
>> On 7/24/25 3:07 PM, Pavel Machek wrote:
>>>> The lockdep warning just means that console_owner_lock is acquired both in
>>>> softirq context and in task context with interrupt enabled. That can leads
>>>> to deadlock. So the remedy is to always take console_owner_lock with
>>>> interrupt disabled, i.e. with
>>>> raw_spin_lock_irqsave/raw_spin_lock_irqrestore.
>>> So you suspect problem is in the printk code?
>> This lockdep problem is likely unrelated to the IRQ flag problem
>> reported in the original email as it shouldn't affect the IRQ flag.
> Wrong you are ... :)
>
Sorry for coming to a conclusion early without looking more deeply into
the printk code. I do admit that printk() isn't an area that I know
enough about.
Will look harder next time.
Thanks,
Longman
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-25 0:16 ` Waiman Long
@ 2025-07-25 13:44 ` Thomas Gleixner
2025-07-25 19:38 ` Waiman Long
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-25 13:44 UTC (permalink / raw)
To: Waiman Long, John Ogness, Waiman Long, Pavel Machek
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek
On Thu, Jul 24 2025 at 20:16, Waiman Long wrote:
> On 7/24/25 3:45 PM, John Ogness wrote:
> I am sorry for confusing conole_owner_lock with the console_owner
> lockdep map. Yes, this lockdep splat should be a false positive. You do
> need to put the mutex_acquire before printk_safe_exit_irqrestore() to
> avoid this splat.
No. The printk code is absolutely correct as it is.
Did you actually read my analysis?
The lockdep splat is triggered by the raw_spin_unlock_irq() in
__flush_work() as it makes console_owner unsafe.
Again:
printk()
local_irq_save()
acquire(console_owner);
con->write()
vt_console_print()
hide_cursor()
vc->vc_sw->con_cursor()
fbcon_cursor()
fbcon_del_cursor_work()
cancel_delayed_work_sync()
__flush_work()
raw_spin_unlock_irq() <- FAIL
The initial report from
cancel_delayed_work_sync() _CANNOT_ be invoked from atomic contexts,
unless the work is marked as BH work. And for BH work it cannot be
called from hard interrupt context for obvious reasons.
Marking that cursor work BH is not helping either, as printk() can be
invoked from everywhere.
There is no way to legitimately invoke cancel_delayed_work_sync() from
the legacy console write() callback ever.
No?
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-24 21:24 ` Thomas Gleixner
2025-07-25 0:33 ` Hillf Danton
@ 2025-07-25 15:23 ` Thomas Gleixner
2025-07-25 19:05 ` Pavel Machek
1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2025-07-25 15:23 UTC (permalink / raw)
To: Waiman Long, Pavel Machek
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek, John Ogness,
Greg Kroah-Hartman, Jiri Slaby, Tejun Heo, Simona Vetter,
Thierry Reding
On Thu, Jul 24 2025 at 23:24, Thomas Gleixner wrote:
> On Thu, Jul 24 2025 at 13:55, Waiman Long wrote:
> So lets look at that call chain further
>
> vt_console_print()
> hide_cursor()
> vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor()
>
> fbcon_cursor()
> if (vc->vc_cursor_type & CUR_SW)
> fbcon_del_cursor_work(info)
> cancel_delayed_work_sync(&ops->cursor_work);
>
> Here we are.
It's actually very simple to reproduce in a VM. No wireless, no
suspend/resume, no special hardware required.
Enable framebuffer and framebuffer console. Add console=tty0 to the
kernel command line.
Log into FB console and do
# echo -e '\033[?17;0;64c'
which enables the software cursor, which in turn enables the above
conditional invocation of fbcon_del_cursor_work(). Then force a printk
# echo h >/proc/sysrq-trigger
and watch the show.
[ 1406.245343] BUG: sleeping function called from invalid context at kernel/workqueue.c:4359
[ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash
[ 1406.245346] preempt_count: 2, expected: 0
[ 1406.245347] RCU nest depth: 1, expected: 0
[ 1406.245348] 7 locks held by bash/1819:
[ 1406.245349] #0: ffff9a730c4ba428 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x73/0xf0
[ 1406.245444] #1: ffffffffa7ade120 (rcu_read_lock){....}-{1:3}, at: __handle_sysrq+0x3d/0x270
[ 1406.245457] #2: ffffffffa7adb500 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x80
[ 1406.245464] #3: ffffffffa7adb550 (console_srcu){....}-{0:0}, at: console_flush_all+0x3c/0x530
[ 1406.245470] #4: ffffffffa7a6b140 (console_owner){....}-{0:0}, at: console_lock_spinning_enable+0x42/0x70
[ 1406.245475] #5: ffffffffa7a6b0c0 (printk_legacy_map-wait-type-override){....}-{4:4}, at: console_flush_all+0x2fe/0x530
[ 1406.245480] #6: ffffffffa7b8e4d8 (printing_lock){....}-{3:3}, at: vt_console_print+0x5b/0x490
[ 1406.245487] irq event stamp: 92690
[ 1406.245488] hardirqs last enabled at (92689): [<ffffffffa656edef>] console_flush_all+0x49f/0x530
[ 1406.245491] hardirqs last disabled at (92690): [<ffffffffa656edff>] console_flush_all+0x4af/0x530
[ 1406.245493] softirqs last enabled at (91822): [<ffffffffa6455166>] __fpu_restore_sig+0x206/0x6a0
[ 1406.245497] softirqs last disabled at (91820): [<ffffffffa6455088>] __fpu_restore_sig+0x128/0x6a0
[ 1406.245499] Preemption disabled at:
[ 1406.245499] [<ffffffffa6570211>] vprintk_emit+0x1f1/0x430
[ 1406.245505] CPU: 46 UID: 0 PID: 1819 Comm: bash Not tainted 6.16.0-rc7+ #446 PREEMPT(voluntary)
[ 1406.245508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 1406.245510] Call Trace:
[ 1406.245514] <TASK>
[ 1406.245519] dump_stack_lvl+0x77/0xb0
[ 1406.245526] __might_resched+0x1b2/0x2d0
[ 1406.245539] cancel_delayed_work_sync+0x57/0x90
[ 1406.245548] fbcon_cursor+0xb2/0x140
[ 1406.245558] hide_cursor+0x24/0xb0
[ 1406.245562] vt_console_print+0x471/0x490
[ 1406.245565] ? __pfx_vt_console_print+0x10/0x10
[ 1406.245575] console_flush_all+0x32b/0x530
As I've enabled CONFIG_DEBUG_ATOMIC_SLEEP I get the splat immediately
even when there is no work queued. (I misread that code yesterday with
my tried brain and thought the might_sleep() would be too late).
And clearly this is invoked with interrupts disabled:
[ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash
Now if there would be work executed at that moment then the lockdep
splat _AND_ the subsequent 'irqs already enabled' splat would
materialize.
I can't be bothered to figure out how to make the work executed because
the above is already a clear prove of the theory :)
It requires that the software cursor is enabled. That's probably rare,
so that explains why this went unnoticed for so long.
Contrary to all the halluzinations about the printk locking and lockdep
correctness, this is a crystal clear bug in the framebuffer console code
and has absolutely nothing to do with the printk() core code.
The workqueue issue got introduced with commit 3b0fb6ab25dd ("fbcon: Use
delayed work for cursor") in 5.19, but that replaced a del_timer_sync(),
which had a similar problem because del_timer_sync() cannot be invoked
in hard interrupt context, but printk() can ...
That del_timer_sync() was conditional up to 4.2, where it became
unconditional in commit 2a17d7e80f1d ("fbcon: unconditionally initialize
cursor blink interval") just to be "fixed" shortly before 4.2-rc1 by
moving the conditional back in commit a5edce421848 ("fbcon: Avoid
deleting a timer in IRQ context").
That conditional did not fix anything as it just pushed the delete into
the rarely executed path, therefore hiding the bug, which existed from
the very beginning when this cursor blinking muck was introduced with
commit acba9cd01974 ("fbcon: cursor blink control") in 2.6.23.
So this software cursor code path was never tested with printk() and
actually the splat which led to commit a5edce421848 ("fbcon: Avoid
deleting a timer in IRQ context") should have made it crystal clear that
this code path is broken:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ../kernel/time/timer.c:1098 del_timer_sync+0x4c/0x54()
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc4-next-20150519 #1
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[] (warn_slowpath_null) from [] (del_timer_sync+0x4c/0x54)
[] (del_timer_sync) from [] (fbcon_del_cursor_timer+0x2c/0x40)
[] (fbcon_del_cursor_timer) from [] (fbcon_cursor+0x9c/0x180)
[] (fbcon_cursor) from [] (hide_cursor+0x30/0x98)
[] (hide_cursor) from [] (vt_console_print+0x2a8/0x340)
[] (vt_console_print) from [] (call_console_drivers.constprop.23+0xc8/0xec)
[] (call_console_drivers.constprop.23) from [] (console_unlock+0x498/0x4f0)
[] (console_unlock) from [] (vprintk_emit+0x1f0/0x508)
[] (vprintk_emit) from [] (vprintk_default+0x24/0x2c)
[] (vprintk_default) from [] (printk+0x70/0x88)
But no, hiding it behind a rarely true conditional without actually
testing it, is way better. Seriously?
The changelog should have been:
This commit papers over the underlying problem by restoring the
condition...
I leave it as an exercise to the framebuffer people how to fix this
trainwreck for real.
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-25 15:23 ` Thomas Gleixner
@ 2025-07-25 19:05 ` Pavel Machek
0 siblings, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2025-07-25 19:05 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Waiman Long, kernel list, mingo, bp, dave.hansen, x86, hpa,
peterz, will, miriam.rachel.korenblit, linux-wireless,
Petr Mladek, John Ogness, Greg Kroah-Hartman, Jiri Slaby,
Tejun Heo, Simona Vetter, Thierry Reding
[-- Attachment #1: Type: text/plain, Size: 755 bytes --]
Hi!
> Enable framebuffer and framebuffer console. Add console=tty0 to the
> kernel command line.
>
> Log into FB console and do
>
> # echo -e '\033[?17;0;64c'
>
> which enables the software cursor, which in turn enables the above
> conditional invocation of fbcon_del_cursor_work(). Then force a printk
>
> # echo h >/proc/sysrq-trigger
>
> and watch the show.
Heh, fun. Yes, long, long time ago I enabled software cursor. I
don't normally use the console these days, but I switched to console
for debugging the wireless problem -- hoping I would catch some kind
of backtrace on the console.
Best regards,
Pavel
--
I don't work for Nazis and criminals, and neither should you.
Boycott Putin, Trump, and Musk!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: locking problems in iwlwifi? was Re: 6.16-rcX: crashing way too often on thinkpad X220
2025-07-25 13:44 ` Thomas Gleixner
@ 2025-07-25 19:38 ` Waiman Long
0 siblings, 0 replies; 26+ messages in thread
From: Waiman Long @ 2025-07-25 19:38 UTC (permalink / raw)
To: Thomas Gleixner, Waiman Long, John Ogness, Pavel Machek
Cc: kernel list, mingo, bp, dave.hansen, x86, hpa, peterz, will,
miriam.rachel.korenblit, linux-wireless, Petr Mladek
On 7/25/25 9:44 AM, Thomas Gleixner wrote:
> On Thu, Jul 24 2025 at 20:16, Waiman Long wrote:
>> On 7/24/25 3:45 PM, John Ogness wrote:
>> I am sorry for confusing conole_owner_lock with the console_owner
>> lockdep map. Yes, this lockdep splat should be a false positive. You do
>> need to put the mutex_acquire before printk_safe_exit_irqrestore() to
>> avoid this splat.
> No. The printk code is absolutely correct as it is.
>
> Did you actually read my analysis?
I have read your analysis and I believe in your conclusion.
>
> The lockdep splat is triggered by the raw_spin_unlock_irq() in
> __flush_work() as it makes console_owner unsafe.
>
> Again:
>
> printk()
> local_irq_save()
> acquire(console_owner);
> con->write()
> vt_console_print()
> hide_cursor()
> vc->vc_sw->con_cursor()
> fbcon_cursor()
> fbcon_del_cursor_work()
> cancel_delayed_work_sync()
> __flush_work()
> raw_spin_unlock_irq() <- FAIL
>
> The initial report from
>
> cancel_delayed_work_sync() _CANNOT_ be invoked from atomic contexts,
> unless the work is marked as BH work. And for BH work it cannot be
> called from hard interrupt context for obvious reasons.
>
> Marking that cursor work BH is not helping either, as printk() can be
> invoked from everywhere.
>
> There is no way to legitimately invoke cancel_delayed_work_sync() from
> the legacy console write() callback ever.
>
> No?
I am not disputing this.
Thanks,
Longman
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2025-07-25 19:38 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-22 17:35 6.16-rcX: crashing way too often on thinkpad X220 Pavel Machek
2025-07-23 15:06 ` Pavel Machek
2025-07-23 15:11 ` Dave Hansen
2025-07-23 20:35 ` Pavel Machek
2025-07-23 15:42 ` locking problems in iwlwifi? was " Pavel Machek
2025-07-23 17:13 ` Johannes Berg
2025-07-23 20:37 ` Pavel Machek
2025-07-24 8:50 ` John Ogness
2025-07-23 21:42 ` Pavel Machek
2025-07-23 17:31 ` Thomas Gleixner
2025-07-23 17:32 ` Thomas Gleixner
2025-07-24 17:15 ` Pavel Machek
2025-07-24 17:51 ` Waiman Long
2025-07-24 17:55 ` Waiman Long
2025-07-24 21:24 ` Thomas Gleixner
2025-07-25 0:33 ` Hillf Danton
2025-07-25 15:23 ` Thomas Gleixner
2025-07-25 19:05 ` Pavel Machek
2025-07-24 19:07 ` Pavel Machek
2025-07-24 19:22 ` Waiman Long
2025-07-24 22:21 ` Thomas Gleixner
2025-07-25 1:56 ` Waiman Long
2025-07-24 19:45 ` John Ogness
2025-07-25 0:16 ` Waiman Long
2025-07-25 13:44 ` Thomas Gleixner
2025-07-25 19:38 ` Waiman Long
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).