* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
@ 2025-05-13 16:48 Bert Karwatzki
2025-05-13 22:33 ` Thomas Gleixner
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-13 16:48 UTC (permalink / raw)
To: linux-kernel
Cc: Bert Karwatzki, linux-next, llvm, Johannes Berg, Thomas Gleixner
>
> I'll now start a bisection where I revert 76a853f86c97 where possible in
> order to find the remaining bugs.
>
The second bisection (from v6.15-rc6 to next-20250512) is finished now:
This commit leads to lockups and kernel panics after
watching ~5-10min of a youtube video while compiling a kernel,
reverting it in next-20250512 is possible:
76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag")
This commit leads to the boot failure, reverting leads to the
compile error it is supposed to fix:
97f4b999e0c8 ("genirq: Use scoped_guard() to shut clang up")
So are these kernel bugs or clang bugs?
Bert Karwatzki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-13 16:48 lockup and kernel panic in linux-next-202505{09,12} when compiled with clang Bert Karwatzki
@ 2025-05-13 22:33 ` Thomas Gleixner
2025-05-14 0:11 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2025-05-13 22:33 UTC (permalink / raw)
To: Bert Karwatzki, linux-kernel
Cc: Bert Karwatzki, linux-next, llvm, Johannes Berg

On Tue, May 13 2025 at 18:48, Bert Karwatzki wrote:
>>
>> I'll now start a bisection where I revert 76a853f86c97 where possible in
>> order to find the remaining bugs.
>
> The second bisection (from v6.15-rc6 to next-20250512) is finished now:
>
> This commit leads to lockups and kernel panics after
> watching ~5-10min of a youtube video while compiling a kernel,
> reverting it in next-20250512 is possible:
> 76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag")
> This commit leads to the boot failure, reverting leads to the
> compile error it is supposed to fix:
> 97f4b999e0c8 ("genirq: Use scoped_guard() to shut clang up")

I really have a hard time to understand what you are trying to explain
here. 'This commit leads..' is so unspecified that I can't make any
sense of it.

Also please make sure that you have commit b5fcb6898202 ("genirq: Ensure
flags in lock guard is consistently initialized") in your tree when
re-testing. That's fixing another subtle (AFAICT clang only) problem in
the guard conversion. If it's not in next yet, you can just merge

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core

into next or wait for the next next integration.

Thanks

        tglx
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-13 22:33 ` Thomas Gleixner
@ 2025-05-14 0:11 ` Bert Karwatzki
2025-05-14 9:32 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-14 0:11 UTC (permalink / raw)
To: Thomas Gleixner, linux-kernel; +Cc: linux-next, llvm, Johannes Berg, spasswolf

On Wednesday, 14.05.2025 at 00:33 +0200, Thomas Gleixner wrote:
> On Tue, May 13 2025 at 18:48, Bert Karwatzki wrote:
> > >
> > > I'll now start a bisection where I revert 76a853f86c97 where possible in
> > > order to find the remaining bugs.
> >
> > The second bisection (from v6.15-rc6 to next-20250512) is finished now:
> >
> > This commit leads to lockups and kernel panics after
> > watching ~5-10min of a youtube video while compiling a kernel,
> > reverting it in next-20250512 is possible:
> > 76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag")
> > This commit leads to the boot failure, reverting leads to the
> > compile error it is supposed to fix:
> > 97f4b999e0c8 ("genirq: Use scoped_guard() to shut clang up")
>
> I really have a hard time to understand what you are trying to explain
> here. 'This commit leads..' is so unspecified that I can't make any
> sense of it.
>
> Also please make sure that you have commit b5fcb6898202 ("genirq: Ensure
> flags in lock guard is consistently initialized") in your tree when
> re-testing. That's fixing another subtle (AFAICT clang only) problem in
> the guard conversion. If it's not in next yet, you can just merge
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
>
> into next or wait for the next next integration.
>
> Thanks
>
> tglx

I merged git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core into
next-20250513 and this fixes the boot failure, but the system still locks up
after a few minutes (with flashing capslock). To solve this I need to revert
76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag").

Also, commit 97f4b999e0c8 did not actually cause the boot failure; that was a
bisection error.

Bert Karwatzki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 0:11 ` Bert Karwatzki
@ 2025-05-14 9:32 ` Bert Karwatzki
2025-05-14 10:23 ` Johannes Berg
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-14 9:32 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-next, llvm, Johannes Berg, spasswolf, Thomas Gleixner

On Wednesday, 14.05.2025 at 02:11 +0200, Bert Karwatzki wrote:
> On Wednesday, 14.05.2025 at 00:33 +0200, Thomas Gleixner wrote:
> > On Tue, May 13 2025 at 18:48, Bert Karwatzki wrote:
> > > >
> > > > I'll now start a bisection where I revert 76a853f86c97 where possible in
> > > > order to find the remaining bugs.
> > >
> > > The second bisection (from v6.15-rc6 to next-20250512) is finished now:
> > >
> > > This commit leads to lockups and kernel panics after
> > > watching ~5-10min of a youtube video while compiling a kernel,
> > > reverting it in next-20250512 is possible:
> > > 76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag")
> > > This commit leads to the boot failure, reverting leads to the
> > > compile error it is supposed to fix:
> > > 97f4b999e0c8 ("genirq: Use scoped_guard() to shut clang up")
> >
> > I really have a hard time to understand what you are trying to explain
> > here. 'This commit leads..' is so unspecified that I can't make any
> > sense of it.
> >
> > Also please make sure that you have commit b5fcb6898202 ("genirq: Ensure
> > flags in lock guard is consistently initialized") in your tree when
> > re-testing. That's fixing another subtle (AFAICT clang only) problem in
> > the guard conversion. If it's not in next yet, you can just merge
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
> >
> > into next or wait for the next next integration.
> >
> > Thanks
> >
> > tglx
>
>
> I merged git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core into
> next-20250513 and this fixes the boot failure, but the system still locks up
> after a few minutes (with flashing capslock). To solve this I need to revert
> 76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS skb tx_flags flag").
>
> Also, commit 97f4b999e0c8 did not actually cause the boot failure; that was a
> bisection error.
>
> Bert Karwatzki

To investigate the problem with commit 76a853f86c97 ("wifi: free SKBTX_WIFI_STATUS
skb tx_flags flag") I used next-20250513 with irq/core merged to fix the boot
issue and with commit 76a853f86c97 reverted.

$ git log --oneline
bb3ff0e21a16 Revert "wifi: free SKBTX_WIFI_STATUS skb tx_flags flag"
28d1f7734aa3 Merge branch 'irq/core' into clang_panic
aa94665adc28 (tag: next-20250513, origin/master, origin/HEAD, master) Add linux-next specific files for 20250513

Then I reapplied commit 76a853f86c97 hunk by hunk and found the one hunk that
causes the problem:

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 3e751dd3ae7b..63df21228029 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4648,8 +4648,7 @@ static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
 			memcpy(IEEE80211_SKB_CB(seg), info, sizeof(*info));
 	}
 
-	if (unlikely(skb->sk &&
-		     skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) {
+	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
 		info->status_data = ieee80211_store_ack_skb(local, skb,
 							    &info->flags, NULL);
 		if (info->status_data)

This is enough to cause a kernel panic when compiled with clang (clang-19.1.7
from debian sid). Compiling the same kernel with gcc (gcc-14.2.0 from debian
sid) shows no problem.

The wifi card used is
04:00.0 Network controller [0280]: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz [14c3:0608]

Bert Karwatzki
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 9:32 ` Bert Karwatzki
@ 2025-05-14 10:23 ` Johannes Berg
2025-05-14 13:46 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2025-05-14 10:23 UTC (permalink / raw)
To: Bert Karwatzki, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless

+ linux-wireless

On Wed, 2025-05-14 at 09:32 +0000, Bert Karwatzki wrote:

> Then I reapplied commit 76a853f86c97 hunk by hunk and found the one hunk that
> causes the problem:
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 3e751dd3ae7b..63df21228029 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -4648,8 +4648,7 @@ static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
>  			memcpy(IEEE80211_SKB_CB(seg), info, sizeof(*info));
>  	}
>  
> -	if (unlikely(skb->sk &&
> -		     skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) {
> +	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
>  		info->status_data = ieee80211_store_ack_skb(local, skb,
>  							    &info->flags, NULL);
>  		if (info->status_data)

I think it crashed later on the status, but this inserts the skb into
the IDR so the status can pick it up to return the status and afaict
_that's_ where it crashed.

Still I don't really know what could go wrong? The (copied) skb should
still have been keeping the socket alive.

> This is enough to cause a kernel panic when compiled with clang (clang-19.1.7
> from debian sid). Compiling the same kernel with gcc (gcc-14.2.0 from debian
> sid) shows no problem.

Right, even stranger. But I can't even say you should look at this place
(which inserts) or the other (which takes it out again and crashed) to
compare the code :-/

johannes
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 10:23 ` Johannes Berg
@ 2025-05-14 13:46 ` Bert Karwatzki
2025-05-14 17:49 ` Johannes Berg
2025-05-14 18:56 ` Johannes Berg
0 siblings, 2 replies; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-14 13:46 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Wednesday, 14.05.2025 at 12:23 +0200, Johannes Berg wrote:
> + linux-wireless
>
> On Wed, 2025-05-14 at 09:32 +0000, Bert Karwatzki wrote:
>
> > Then I reapplied commit 76a853f86c97 hunk by hunk and found the one hunk that
> > causes the problem:
> >
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index 3e751dd3ae7b..63df21228029 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -4648,8 +4648,7 @@ static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
> >  			memcpy(IEEE80211_SKB_CB(seg), info, sizeof(*info));
> >  	}
> >  
> > -	if (unlikely(skb->sk &&
> > -		     skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) {
> > +	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
> >  		info->status_data = ieee80211_store_ack_skb(local, skb,
> >  							    &info->flags, NULL);
> >  		if (info->status_data)
>
> I think it crashed later on the status, but this inserts the skb into
> the IDR so the status can pick it up to return the status and afaict
> _that's_ where it crashed.
>
> Still I don't really know what could go wrong? The (copied) skb should
> still have been keeping the socket alive.
>
> > This is enough to cause a kernel panic when compiled with clang (clang-19.1.7
> > from debian sid). Compiling the same kernel with gcc (gcc-14.2.0 from debian
> > sid) shows no problem.
>
> Right, even stranger. But I can't even say you should look at this place
> (which inserts) or the other (which takes it out again and crashed) to
> compare the code :-/
>
>
> johannes

I've split off the problematic piece of code into a noinline function to simplify the disassembly:

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 20de6e6b0929..075e012d9992 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4582,7 +4582,19 @@ static bool ieee80211_tx_8023(struct ieee80211_sub_if_data *sdata,
 	return ret;
 }
 
-static noinline void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
+static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
+							    struct ieee80211_local *local,
+							    struct ieee80211_tx_info *info)
+{
+	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
+		info->status_data = ieee80211_store_ack_skb(local, skb,
+							    &info->flags, NULL);
+		if (info->status_data)
+			info->status_data_idr = 1;
+	}
+}
+
+static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
 				struct net_device *dev, struct sta_info *sta,
 				struct ieee80211_key *key, struct sk_buff *skb)
 {
@@ -4648,12 +4660,7 @@ static noinline void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
 			memcpy(IEEE80211_SKB_CB(seg), info, sizeof(*info));
 	}
 
-	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
-		info->status_data = ieee80211_store_ack_skb(local, skb,
-							    &info->flags, NULL);
-		if (info->status_data)
-			info->status_data_idr = 1;
-	}
+	ieee80211_8023_xmit_clang_debug_helper(skb, local, info);
 
 	dev_sw_netstats_tx_add(dev, skbs, len);
 	sta->deflink.tx_stats.packets[queue] += skbs;

This shows the same behaviour as the old code, i.e. kernel panic when compiled
with clang(-19.1.7), no problem when compiled with gcc(-14.2.0).

When compiled with clang the disassembly of the function is (from objdump -d)

000000000000a260 <ieee80211_8023_xmit_clang_debug_helper>:
    a260:	48 8b 47 18          	mov    0x18(%rdi),%rax
    a264:	48 85 c0             	test   %rax,%rax
    a267:	74 0c                	je     a275 <ieee80211_8023_xmit_clang_debug_helper+0x15>
    a269:	53                   	push   %rbx
    a26a:	48 f7 40 60 00 00 08 	testq  $0x80000,0x60(%rax)
    a271:	00
    a272:	75 07                	jne    a27b <ieee80211_8023_xmit_clang_debug_helper+0x1b>
    a274:	5b                   	pop    %rbx
    a275:	2e e9 00 00 00 00    	cs jmp a27b <ieee80211_8023_xmit_clang_debug_helper+0x1b>
    a27b:	48 89 f8             	mov    %rdi,%rax
    a27e:	48 89 f7             	mov    %rsi,%rdi
    a281:	48 89 c6             	mov    %rax,%rsi
    a284:	48 89 d3             	mov    %rdx,%rbx
    a287:	31 c9                	xor    %ecx,%ecx
    a289:	e8 02 ff ff ff       	call   a190 <ieee80211_store_ack_skb>
    a28e:	25 ff 1f 00 00       	and    $0x1fff,%eax
    a293:	89 c2                	mov    %eax,%edx
    a295:	b9 0f 00 fe ff       	mov    $0xfffe000f,%ecx
    a29a:	23 4b 04             	and    0x4(%rbx),%ecx
    a29d:	c1 e2 04             	shl    $0x4,%edx
    a2a0:	09 d1                	or     %edx,%ecx
    a2a2:	89 4b 04             	mov    %ecx,0x4(%rbx)
    a2a5:	85 c0                	test   %eax,%eax
    a2a7:	74 cb                	je     a274 <ieee80211_8023_xmit_clang_debug_helper+0x14>
    a2a9:	83 c9 08             	or     $0x8,%ecx
    a2ac:	89 4b 04             	mov    %ecx,0x4(%rbx)
    a2af:	eb c3                	jmp    a274 <ieee80211_8023_xmit_clang_debug_helper+0x14>
    a2b1:	66 66 66 66 66 66 2e 	data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
    a2b8:	0f 1f 84 00 00 00 00
    a2bf:	00

When compiled with gcc the disassembly is

00000000000010e0 <ieee80211_8023_xmit_clang_debug_helper>:
    10e0:	48 8b 4f 18          	mov    0x18(%rdi),%rcx
    10e4:	48 89 f8             	mov    %rdi,%rax
    10e7:	48 85 c9             	test   %rcx,%rcx
    10ea:	75 05                	jne    10f1 <ieee80211_8023_xmit_clang_debug_helper+0x11>
    10ec:	e9 00 00 00 00       	jmp    10f1 <ieee80211_8023_xmit_clang_debug_helper+0x11>
    10f1:	48 8b 49 60          	mov    0x60(%rcx),%rcx
    10f5:	f7 c1 00 00 08 00    	test   $0x80000,%ecx
    10fb:	74 ef                	je     10ec <ieee80211_8023_xmit_clang_debug_helper+0xc>
    10fd:	48 83 ec 08          	sub    $0x8,%rsp
    1101:	48 89 f7             	mov    %rsi,%rdi
    1104:	31 c9                	xor    %ecx,%ecx
    1106:	48 89 c6             	mov    %rax,%rsi
    1109:	48 89 14 24          	mov    %rdx,(%rsp)
    110d:	e8 ce f8 ff ff       	call   9e0 <ieee80211_store_ack_skb>
    1112:	48 8b 14 24          	mov    (%rsp),%rdx
    1116:	89 c1                	mov    %eax,%ecx
    1118:	8b 42 04             	mov    0x4(%rdx),%eax
    111b:	81 e1 ff 1f 00 00    	and    $0x1fff,%ecx
    1121:	c1 e1 04             	shl    $0x4,%ecx
    1124:	25 0f 00 fe ff       	and    $0xfffe000f,%eax
    1129:	09 c8                	or     %ecx,%eax
    112b:	89 42 04             	mov    %eax,0x4(%rdx)
    112e:	a9 f0 ff 01 00       	test   $0x1fff0,%eax
    1133:	74 04                	je     1139 <ieee80211_8023_xmit_clang_debug_helper+0x59>
    1135:	80 4a 04 08          	orb    $0x8,0x4(%rdx)
    1139:	48 83 c4 08          	add    $0x8,%rsp
    113d:	e9 00 00 00 00       	jmp    1142 <ieee80211_8023_xmit_clang_debug_helper+0x62>
    1142:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
    1149:	00 00 00 00
    114d:	0f 1f 00             	nopl   (%rax)
    1150:	90                   	nop
    1151:	90                   	nop
    1152:	90                   	nop
    1153:	90                   	nop
    1154:	90                   	nop
    1155:	90                   	nop
    1156:	90                   	nop
    1157:	90                   	nop
    1158:	90                   	nop
    1159:	90                   	nop
    115a:	90                   	nop
    115b:	90                   	nop
    115c:	90                   	nop
    115d:	90                   	nop
    115e:	90                   	nop
    115f:	90                   	nop

I've not yet taken a closer look, but perhaps the error is obvious to someone else.

Bert Karwatzki
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 13:46 ` Bert Karwatzki
@ 2025-05-14 17:49 ` Johannes Berg
2025-05-14 18:56 ` Johannes Berg
1 sibling, 0 replies; 20+ messages in thread
From: Johannes Berg @ 2025-05-14 17:49 UTC (permalink / raw)
To: Bert Karwatzki, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless

On Wed, 2025-05-14 at 15:46 +0200, Bert Karwatzki wrote:
>
> When compiled with clang the disassembly of the function is (from objdump -d)

Can you show with relocations ("objdump -dr" I think)? The jumps with
four 00 bytes don't really make sense if there aren't relocations for
them.

johannes
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 13:46 ` Bert Karwatzki
2025-05-14 17:49 ` Johannes Berg
@ 2025-05-14 18:56 ` Johannes Berg
2025-05-14 22:27 ` Bert Karwatzki
1 sibling, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2025-05-14 18:56 UTC (permalink / raw)
To: Bert Karwatzki, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless

>
> I've split off the problematic piece of code into an noinline function to simplify the disassembly:
>

Oh and also, does it even still crash with that? :)

Still I feel it's possibly some kind of weird side-effect and not
strictly a compiler issue? But I don't see anything so far.

johannes
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-14 18:56 ` Johannes Berg
@ 2025-05-14 22:27 ` Bert Karwatzki
2025-05-15 6:30 ` Johannes Berg
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-14 22:27 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Wednesday, 14.05.2025 at 20:56 +0200, Johannes Berg wrote:
> >
> > I've split off the problematic piece of code into an noinline function to simplify the disassembly:
> >
>
> Oh and also, does it even still crash with that? :)

Yes, it still crashes when compiled with clang.

>
> Still I feel it's possibly some kind of weird side-effect and not
> strictly a compiler issue? But I don't see anything so far.
>
> johannes

The problem only occurs with PREEMPT_RT=y and clang. I was able to capture the panic message via netconsole:

[ 267.339591][ T575] BUG: unable to handle page fault for address: ffffffff51e080b0
[ 267.339598][ T575] #PF: supervisor write access in kernel mode
[ 267.339602][ T575] #PF: error_code(0x0002) - not-present page
[ 267.339606][ T575] PGD f1cc3c067 P4D f1cc3c067 PUD 0
[ 267.339613][ T575] Oops: Oops: 0002 [#1] SMP NOPTI
[ 267.339622][ T575] CPU: 0 UID: 0 PID: 575 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #968 PREEMPT_{RT,(full)}
[ 267.339629][ T575] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 267.339632][ T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 66 a9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 267.339659][ T575] RSP: 0018:ffffcc5a81edf998 EFLAGS: 00010002
[ 267.339664][ T575] RAX: ffffffffa87a5ee0 RBX: 0000000000000286 RCX: 0000000000040000
[ 267.339668][ T575] RDX: ffff8b6d2e6231c0 RSI: 0000000000000010 RDI: ffff8b5e8855cda8
[ 267.339671][ T575] RBP: ffff8b5e852ff300 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffa87f07f0 R12: ffff8b5e8855cd90
[ 267.339677][ T575] R13: ffff8b5ec1bd2480 R14: ffff8b5e8855cda8 R15: ffff8b5e8855cda8
[ 267.339681][ T575] FS: 0000000000000000(0000) GS:ffff8b6d84fc1000(0000) knlGS:0000000000000000
[ 267.339684][ T575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 267.339687][ T575] CR2: ffffffff51e080b0 CR3: 0000000f1cc3a000 CR4: 0000000000750ef0
[ 267.339690][ T575] PKRU: 55555554
[ 267.339692][ T575] Call Trace:
[ 267.339701][ T575] <TASK>
[ 267.339705][ T575] _raw_spin_lock_irqsave+0x57/0x60
[ 267.339714][ T575] rt_spin_lock+0x73/0xa0
[ 267.339720][ T575] sock_queue_err_skb+0xdc/0x140
[ 267.339727][ T575] skb_complete_wifi_ack+0xa9/0x120
[ 267.339737][ T575] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
[ 267.339799][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
[ 267.339804][ T575] ? start_dl_timer+0xcf/0x110
[ 267.339814][ T575] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
[ 267.339851][ T575] ? raw_spin_rq_lock_nested+0x15/0x80
[ 267.339862][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
[ 267.339866][ T575] ? rt_spin_lock+0x3d/0xa0
[ 267.339873][ T575] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
[ 267.339886][ T575] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
[ 267.339901][ T575] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76]
[ 267.339912][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
[ 267.339915][ T575] ? rt_spin_unlock+0x12/0x40
[ 267.339918][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
[ 267.339924][ T575] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib]
[ 267.339938][ T575] mt7921_mac_tx_free+0x112/0x260 [mt7921_common]
[ 267.339950][ T575] mt7921_rx_check+0x33/0xe0 [mt7921_common]
[ 267.339957][ T575] mt76_dma_rx_poll+0x322/0x660 [mt76]
[ 267.339970][ T575] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib]
[ 267.339982][ T575] mt792x_poll_rx+0x71/0x120 [mt792x_lib]
[ 267.339989][ T575] __napi_poll+0x2a/0x170
[ 267.339994][ T575] ? napi_threaded_poll_loop+0x32/0x1b0
[ 267.339998][ T575] napi_threaded_poll_loop+0xe4/0x1b0
[ 267.340001][ T575] ? napi_threaded_poll_loop+0x32/0x1b0
[ 267.340007][ T575] napi_threaded_poll+0x57/0x80
[ 267.340011][ T575] ? __pfx_napi_threaded_poll+0x10/0x10
[ 267.340014][ T575] kthread+0x25c/0x280
[ 267.340020][ T575] ? __pfx_kthread+0x10/0x10
[ 267.340025][ T575] ret_from_fork+0xc4/0x1b0
[ 267.340030][ T575] ? __pfx_kthread+0x10/0x10
[ 267.340034][ T575] ret_from_fork_asm+0x1a/0x30
[ 267.340043][ T575] </TASK>
[ 267.340045][ T575] Modules linked in: netconsole ccm snd_seq_dummy snd_hrtimer snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_scodec_component snd_hda_codec_generic snd_hda_codec_hdmi btusb btbcm btintel snd_hda_intel btrtl btmtk snd_intel_dspcfg snd_hda_codec snd_soc_dmic snd_acp3x_pdm_dma snd_acp3x_rn uvcvideo bluetooth snd_soc_core snd_hwdep videobuf2_vmalloc videobuf2_memops uvc snd_hda_core videobuf2_v4l2 snd_pcm_oss videodev snd_mixer_oss snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config snd_soc_acpi msi_wmi ecdh_generic ecc mc wmi_bmof sparse_keymap snd_timer edac_mce_amd snd k10temp ccp snd_pci_acp3x soundcore battery ac button joydev hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_als hid_sensor_prox hid_sensor_accel_3d hid_sensor_trigger hid_sensor_iio_common amd_pmc industrialio_triggered_buffer kfifo_buf evdev industrialio mt7921e mt7921_common mt792x_lib mt76_ libarc4 cfg80211
[ 267.340161][ T575] rfkill msr fuse nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 amdgpu usbhid drm_panel_backlight_quirks cec drm_buddy drm_suballoc_helper drm_exec i2c_algo_bit drm_display_helper xhci_pci gpu_sched drm_ttm_helper xhci_hcd hid_sensor_hub ttm hid_multitouch mfd_core hid_generic psmouse i2c_hid_acpi drm_client_lib nvme amd_sfh usbcore i2c_hid drm_kms_helper hid serio_raw nvme_core r8169 i2c_piix4 i2c_smbus usb_common amdxcp crc16 i2c_designware_platform i2c_designware_core
[ 267.340214][ T575] CR2: ffffffff51e080b0
[ 267.340219][ T575] ---[ end trace 0000000000000000 ]---
[ 267.536499][ T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 66 a9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 267.536514][ T575] RSP: 0018:ffffcc5a81edf998 EFLAGS: 00010002
[ 267.536518][ T575] RAX: ffffffffa87a5ee0 RBX: 0000000000000286 RCX: 0000000000040000
[ 267.536521][ T575] RDX: ffff8b6d2e6231c0 RSI: 0000000000000010 RDI: ffff8b5e8855cda8
[ 267.536523][ T575] RBP: ffff8b5e852ff300 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffa87f07f0 R12: ffff8b5e8855cd90
[ 267.536526][ T575] R13: ffff8b5ec1bd2480 R14: ffff8b5e8855cda8 R15: ffff8b5e8855cda8
[ 267.536528][ T575] FS: 0000000000000000(0000) GS:ffff8b6d84fc1000(0000) knlGS:0000000000000000
[ 267.536530][ T575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 267.536532][ T575] CR2: ffffffff51e080b0 CR3: 0000000f1cc3a000 CR4: 0000000000750ef0
[ 267.536534][ T575] PKRU: 55555554
[ 267.536536][ T575] Kernel panic - not syncing: Fatal exception in interrupt
[ 267.536948][ T575] Kernel Offset: 0x26e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 267.735256][ T575] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Another try showed a different error (non-canonical address!)
[ 115.685734][ T579] Oops: general protection fault, probably for non-canonical address 0x9d504b8ce3373: 0000 [#1] SMP NOPTI
[ 115.685742][ T579] CPU: 13 UID: 0 PID: 579 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #970 PREEMPT_{RT,(full)}
[ 115.685747][ T579] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 115.685749][ T579] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 86 b9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 115.685758][ T579] RSP: 0018:ffffd0c244fc3998 EFLAGS: 00010006
[ 115.685761][ T579] RAX: 0009d504ff4811a3 RBX: 0000000000000286 RCX: 0000000000380000
[ 115.685764][ T579] RDX: ffff8e13ee9631c0 RSI: 0000000000000010 RDI: ffff8e08c29126a8
[ 115.685765][ T579] RBP: ffff8e055300d400 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffb89f07f0 R12: ffff8e08c2912690
[ 115.685769][ T579] R13: ffff8e056a2f2480 R14: ffff8e08c29126a8 R15: ffff8e08c29126a8
[ 115.685771][ T579] FS: 0000000000000000(0000) GS:ffff8e1435101000(0000) knlGS:0000000000000000
[ 115.685773][ T579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.685776][ T579] CR2: 00007f56b8d28000 CR3: 0000000163d15000 CR4: 0000000000750ef0
[ 115.685778][ T579] PKRU: 55555554
[ 115.685779][ T579] Call Trace:
[ 115.685782][ T579] <TASK>
[ 115.685784][ T579] _raw_spin_lock_irqsave+0x57/0x60
[ 115.685790][ T579] rt_spin_lock+0x73/0xa0
[ 115.685795][ T579] sock_queue_err_skb+0xdc/0x140
[ 115.685801][ T579] skb_complete_wifi_ack+0xa9/0x120
[ 115.685809][ T579] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
[ 115.685858][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685862][ T579] ? wake_up_q+0x4e/0xe0
[ 115.685867][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685872][ T579] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
[ 115.685902][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685905][ T579] ? rt_spin_lock+0x3d/0xa0
[ 115.685910][ T579] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
[ 115.685920][ T579] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
[ 115.685932][ T579] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76]
[ 115.685942][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685945][ T579] ? rt_spin_unlock+0x12/0x40
[ 115.685947][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685954][ T579] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib]
[ 115.685964][ T579] mt7921_mac_tx_free+0x112/0x260 [mt7921_common]
[ 115.685975][ T579] mt7921_rx_check+0x33/0xe0 [mt7921_common]
[ 115.685982][ T579] mt76_dma_rx_poll+0x322/0x660 [mt76]
[ 115.685993][ T579] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib]
[ 115.686001][ T579] mt792x_poll_rx+0x71/0x120 [mt792x_lib]
[ 115.686009][ T579] __napi_poll+0x2a/0x170
[ 115.686013][ T579] ? napi_threaded_poll_loop+0x32/0x1b0
[ 115.686017][ T579] napi_threaded_poll_loop+0xe4/0x1b0
[ 115.686020][ T579] ? napi_threaded_poll_loop+0x32/0x1b0
[ 115.686026][ T579] napi_threaded_poll+0x57/0x80
[ 115.686029][ T579] ? __pfx_napi_threaded_poll+0x10/0x10
[ 115.686032][ T579] kthread+0x25c/0x280
[ 115.686038][ T579] ? __pfx_kthread+0x10/0x10
[ 115.686043][ T579] ret_from_fork+0xc4/0x1b0
[ 115.686047][ T579] ? __pfx_kthread+0x10/0x10
[ 115.686051][ T579] ret_from_fork_asm+0x1a/0x30
[ 115.686058][ T579] </TASK>
[ 115.686060][ T579] Modules linked in: ccm netconsole snd_seq_dummy snd_hrtimer snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_scodec_component snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btbcm snd_hda_codec btintel btrtl uvcvideo snd_acp3x_pdm_dma btmtk snd_soc_dmic snd_hwdep snd_acp3x_rn snd_soc_core snd_hda_core videobuf2_vmalloc videobuf2_memops uvc bluetooth snd_pcm_oss videobuf2_v4l2 snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x snd_acp_config videobuf2_common snd_timer msi_wmi snd_soc_acpi ecdh_generic ecc mc wmi_bmof sparse_keymap edac_mce_amd snd ccp soundcore k10temp snd_pci_acp3x battery ac button joydev hid_sensor_prox hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_magn_3d hid_sensor_trigger hid_sensor_iio_common amd_pmc industrialio_triggered_buffer kfifo_buf evdev industrialio mt7921e mt7921_common mt792x_lib mt76_ libarc4 cfg80211
[ 115.686166][ T579] rfkill msr fuse nvme_fabrics configfs efi_pstore efivarfs autofs4 ext4 mbcache jbd2 amdgpu usbhid drm_panel_backlight_quirks cec drm_buddy drm_suballoc_helper drm_exec i2c_algo_bit drm_display_helper xhci_pci gpu_sched hid_sensor_hub xhci_hcd psmouse drm_ttm_helper mfd_core hid_multitouch hid_generic ttm i2c_hid_acpi serio_raw usbcore drm_client_lib nvme amd_sfh i2c_hid hid drm_kms_helper nvme_core r8169 i2c_piix4 amdxcp usb_common crc16 i2c_smbus i2c_designware_platform i2c_designware_core
[ 115.686236][ T579] ---[ end trace 0000000000000000 ]---
[ 115.782274][ T579] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 86 b9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 115.782274][ T579] RSP: 0018:ffffd0c244fc3998 EFLAGS: 00010006
[ 115.782274][ T579] RAX: 0009d504ff4811a3 RBX: 0000000000000286 RCX: 0000000000380000
[ 115.782274][ T579] RDX: ffff8e13ee9631c0 RSI: 0000000000000010 RDI: ffff8e08c29126a8
[ 115.846760][ T579] RBP: ffff8e055300d400 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffb89f07f0 R12: ffff8e08c2912690
[ 115.846760][ T579] R13: ffff8e056a2f2480 R14: ffff8e08c29126a8 R15: ffff8e08c29126a8
[ 115.846765][ T579] FS: 0000000000000000(0000) GS:ffff8e1435101000(0000) knlGS:0000000000000000
[ 115.846765][ T579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.846770][ T579] CR2: 00007f56b8d28000 CR3: 000000018d63a000 CR4: 0000000000750ef0
[ 115.846770][ T579] PKRU: 55555554
[ 115.846773][ T579] Kernel panic - not syncing: Fatal exception in interrupt
[ 115.846773][ T579] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 115.685734][ T579] Oops: general protection fault, probably for non-canonical address 0x9d504b8ce3373: 0000 [#1] SMP NOPTI
[ 115.685742][ T579] CPU: 13 UID: 0 PID: 579 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #970 PREEMPT_{RT,(full)}
[ 115.685747][ T579] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 115.685749][ T579] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 86 b9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 115.685758][ T579] RSP: 0018:ffffd0c244fc3998 EFLAGS: 00010006
[ 115.685761][ T579] RAX: 0009d504ff4811a3 RBX: 0000000000000286 RCX: 0000000000380000
[ 115.685764][ T579] RDX: ffff8e13ee9631c0 RSI: 0000000000000010 RDI: ffff8e08c29126a8
[ 115.685765][ T579] RBP: ffff8e055300d400 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffb89f07f0 R12: ffff8e08c2912690
[ 115.685769][ T579] R13: ffff8e056a2f2480 R14: ffff8e08c29126a8 R15: ffff8e08c29126a8
[ 115.685771][ T579] FS: 0000000000000000(0000) GS:ffff8e1435101000(0000) knlGS:0000000000000000
[ 115.685773][ T579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 115.685776][ T579] CR2: 00007f56b8d28000 CR3: 0000000163d15000 CR4: 0000000000750ef0
[ 115.685778][ T579] PKRU: 55555554
[ 115.685779][ T579] Call Trace:
[ 115.685782][ T579] <TASK>
[ 115.685784][ T579] _raw_spin_lock_irqsave+0x57/0x60
[ 115.685790][ T579] rt_spin_lock+0x73/0xa0
[ 115.685795][ T579] sock_queue_err_skb+0xdc/0x140
[ 115.685801][ T579] skb_complete_wifi_ack+0xa9/0x120
[ 115.685809][ T579] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
[ 115.685858][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685862][ T579] ? wake_up_q+0x4e/0xe0
[ 115.685867][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685872][ T579] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
[ 115.685902][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685905][ T579] ? rt_spin_lock+0x3d/0xa0
[ 115.685910][ T579] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
[ 115.685920][ T579] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
[ 115.685932][ T579] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76]
[ 115.685942][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685945][ T579] ? rt_spin_unlock+0x12/0x40
[ 115.685947][ T579] ? srso_alias_return_thunk+0x5/0xfbef5
[ 115.685954][ T579] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib]
[ 115.685964][ T579] mt7921_mac_tx_free+0x112/0x260 [mt7921_common]
[ 115.685975][ T579] mt7921_rx_check+0x33/0xe0 [mt7921_common]
[ 115.685982][ T579] mt76_dma_rx_poll+0x322/0x660 [mt76]
[ 115.685993][ T579] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib]
[ 115.686001][ T579] mt792x_poll_rx+0x71/0x120 [mt792x_lib]
[ 115.686009][ T579] __napi_poll+0x2a/0x170
[ 115.686013][ T579] ? napi_threaded_poll_loop+0x32/0x1b0
[ 115.686017][ T579] napi_threaded_poll_loop+0xe4/0x1b0
[ 115.686020][ T579] ? napi_threaded_poll_loop+0x32/0x1b0
[ 115.686026][ T579] napi_threaded_poll+0x57/0x80
[ 115.686029][ T579] ? __pfx_napi_threaded_poll+0x10/0x10
[ 115.686032][ T579] kthread+0x25c/0x280
[ 115.686038][ T579] ? __pfx_kthread+0x10/0x10
[ 115.686043][ T579] ret_from_fork+0xc4/0x1b0
[ 115.686047][ T579] ? __pfx_kthread+0x10/0x10
[ 115.686051][ T579] ret_from_fork_asm+0x1a/0x30
[ 115.686058][ T579] </TASK>
[ 115.686060][ T579] Modules linked in: ccm netconsole snd_seq_dummy snd_hrtimer snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_scodec_component snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btbcm snd_hda_codec btintel btrtl uvcvideo snd_acp3x_pdm_dma btmtk snd_soc_dmic snd_hwdep snd_acp3x_rn snd_soc_core snd_hda_core videobuf2_vmalloc videobuf2_memops uvc bluetooth snd_pcm_oss videobuf2_v4l2 snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x snd_acp_config videobuf2_common snd_timer msi_wmi snd_soc_acpi ecdh_generic ecc mc wmi_bmof sparse_keymap edac_mce_amd snd ccp soundcore k10temp snd_pci_acp3x battery ac button joydev hid_sensor_prox hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_magn_3d hid_sensor_trigger hid_sensor_iio_common amd_pmc industrialio_triggered_buffer kfifo_buf evdev industrialio mt7921e mt7921_common mt792x_lib
mt76_ libarc4 cfg80211 [ 115.686166][ T579] rfkill msr fuse nvme_fabrics configfs efi_pstore efivarfs autofs4 ext4 mbcache jbd2 amdgpu usbhid drm_panel_backlight_quirks cec drm_buddy drm_suballoc_helper drm_exec i2c_algo_bit drm_display_helper xhci_pci gpu_sched hid_sensor_hub xhci_hcd psmouse drm_ttm_helper mfd_core hid_multitouch hid_generic ttm i2c_hid_acpi serio_raw usbcore drm_client_lib nvme amd_sfh i2c_hid hid drm_kms_helper nvme_core r8169 i2c_piix4 amdxcp usb_common crc16 i2c_smbus i2c_designware_platform i2c_designware_core [ 115.686236][ T579] ---[ end trace 0000000000000000 ]--- [ 115.782274][ T579] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0 86 b9 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8 [ 115.782274][ T579] RSP: 0018:ffffd0c244fc3998 EFLAGS: 00010006 [ 115.782274][ T579] RAX: 0009d504ff4811a3 RBX: 0000000000000286 RCX: 0000000000380000 [ 115.782274][ T579] RDX: ffff8e13ee9631c0 RSI: 0000000000000010 RDI: ffff8e08c29126a8 [ 115.846760][ T579] RBP: ffff8e055300d400 R08: fffffffffffffff8 R09: 0000000000000001 R11: ffffffffb89f07f0 R12: ffff8e08c2912690 [ 115.846760][ T579] R13: ffff8e056a2f2480 R14: ffff8e08c29126a8 R15: ffff8e08c29126a8 [ 115.846765][ T579] FS: 0000000000000000(0000) GS:ffff8e1435101000(0000) knlGS:0000000000000000 [ 115.846765][ T579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.846770][ T579] CR2: 00007f56b8d28000 CR3: 000000018d63a000 CR4: 0000000000750ef0 [ 115.846770][ T579] PKRU: 55555554 [ 115.846773][ T579] Kernel panic - not syncing: Fatal exception in interrupt [ 115.846773][ T579] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 115.846773][ T579] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Bert Karwatzki ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
  2025-05-14 22:27 ` Bert Karwatzki
@ 2025-05-15  6:30 ` Johannes Berg
  2025-05-15  9:10 ` Bert Karwatzki
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2025-05-15  6:30 UTC (permalink / raw)
To: Bert Karwatzki, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless

On Thu, 2025-05-15 at 00:27 +0200, Bert Karwatzki wrote:
> On Wednesday, 14.05.2025 at 20:56 +0200, Johannes Berg wrote:
> > >
> > > I've split off the problematic piece of code into a noinline function to simplify the disassembly:
> > >
> >
> > Oh and also, does it even still crash with that? :)
>
> Yes, it still crashes when compiled with clang.

OK, just checking. :)

FWIW, I'm not convinced at all that the code you were looking at is
really the problem. The crash (see below) is happening on the status
side. Of course it cannot crash on the status side if on the TX side we
never enter anything into the IDR data structure, and never tag the SKB
to look up in the IDR and therefore never try to create the status
report on the status side.

Basically what happens is this:

 - on TX, if we have a socket requesting status, create a copy of the
   SKB, put it into the IDR, and put the IDR index into the original
   skb->cb
 - then transmit the original skb, of course
 - on TX status report from the driver, see if the skb->cb is tagged with
   the IDR value, if so, report the copy of the SKB back to the socket
   with the status information

(The reason we need to make a copy is that the SKB could be encrypted or
otherwise modified in flight, and we don't want to undo that, rather
keeping a copy for the report.)

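The three steps described above can be sketched in plain userspace C. This is only an illustration of the bookkeeping, not mac80211 code: `struct frame`, `struct pending_table`, `tx_store_copy()` and `tx_status_report()` are hypothetical names, and a fixed-size array stands in for the kernel IDR.

```c
#include <stddef.h>

#define MAX_PENDING 16              /* hypothetical table size */

struct frame {                      /* stand-in for an skb */
    char payload[32];
    int  status_id;                 /* stand-in for the IDR index kept in skb->cb; 0 = untagged */
};

struct pending_table {              /* stand-in for the IDR */
    struct frame copies[MAX_PENDING];
    int          used[MAX_PENDING];
};

/*
 * TX side: if the sender asked for status, keep a copy of the frame and
 * tag the original with the slot id. Returns the id, or 0 if nothing
 * was stored (no status requested, or table full).
 */
static int tx_store_copy(struct pending_table *t, struct frame *f, int wants_status)
{
    f->status_id = 0;
    if (!wants_status)
        return 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->used[i]) {
            t->copies[i] = *f;      /* the copy is taken before any later modification */
            t->used[i] = 1;
            f->status_id = i + 1;   /* ids start at 1 so that 0 can mean "untagged" */
            return f->status_id;
        }
    }
    return 0;
}

/*
 * Status side: if the frame is tagged, remove the stored copy from the
 * table and hand it back in *out. Returns 1 if a report should be
 * queued to the socket, 0 otherwise.
 */
static int tx_status_report(struct pending_table *t, const struct frame *f, struct frame *out)
{
    int id = f->status_id;

    if (id <= 0 || id > MAX_PENDING || !t->used[id - 1])
        return 0;
    *out = t->copies[id - 1];
    t->used[id - 1] = 0;
    return 1;
}
```

In this model an untagged frame can never produce a report on the status side, which is the dependency between the TX side and the status side that the message points out.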
> [ 267.339591][ T575] BUG: unable to handle page fault for address: ffffffff51e080b0
> [ 267.339598][ T575] #PF: supervisor write access in kernel mode
> [ 267.339602][ T575] #PF: error_code(0x0002) - not-present page
> [ 267.339606][ T575] PGD f1cc3c067 P4D f1cc3c067 PUD 0
> [ 267.339613][ T575] Oops: Oops: 0002 [#1] SMP NOPTI
> [ 267.339622][ T575] CPU: 0 UID: 0 PID: 575 Comm: napi/phy0-0 Not tainted
> 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #968 PREEMPT_{RT,(full)}
> [ 267.339629][ T575] Hardware name: Micro-Star International Co., Ltd. Alpha
> 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
> [ 267.339632][ T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
...
> [ 267.339692][ T575] Call Trace:
> [ 267.339701][ T575] <TASK>
> [ 267.339705][ T575] _raw_spin_lock_irqsave+0x57/0x60
> [ 267.339714][ T575] rt_spin_lock+0x73/0xa0
> [ 267.339720][ T575] sock_queue_err_skb+0xdc/0x140
> [ 267.339727][ T575] skb_complete_wifi_ack+0xa9/0x120
> [ 267.339737][ T575] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
> [ 267.339799][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 267.339804][ T575] ? start_dl_timer+0xcf/0x110
> [ 267.339814][ T575] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
> [ 267.339851][ T575] ? raw_spin_rq_lock_nested+0x15/0x80
> [ 267.339862][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 267.339866][ T575] ? rt_spin_lock+0x3d/0xa0
> [ 267.339873][ T575] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
> [ 267.339886][ T575] mt76_tx_status_unlock+0x1e0/0x230 [mt76]

Yeah so that's the crash on the status report as explained above, it
kind of looks almost like the skb->sk was freed and somehow invalid now?
But I don't see a general issue here (will keep digging), and how come
it only shows up with clang?

Since it reproduces pretty reliably, maybe you could do with KASAN?

Also could be interesting - what userspace are you running with wifi?
What tool is even setting up the wifi status? If you don't really know
maybe just put WARN_ON(1) into net/core/sock.c where SO_WIFI_STATUS is
written (sk_setsockopt).

johannes

^ permalink raw reply	[flat|nested] 20+ messages in thread
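For context on the question of which tool sets up the wifi status: a userspace program requests it per socket with `setsockopt(2)`. The following is a minimal sketch, assuming a Linux system. The helper names `set_wifi_status()` and `get_wifi_status()` are made up for illustration, and the fallback value 41 is the asm-generic definition, which does not hold on every architecture.

```c
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_WIFI_STATUS
#define SO_WIFI_STATUS 41   /* assumption: asm-generic value, for libcs that do not expose it */
#endif

/* Ask the kernel to report wifi TX status for this socket (on != 0) or stop (on == 0). */
static int set_wifi_status(int fd, int on)
{
    return setsockopt(fd, SOL_SOCKET, SO_WIFI_STATUS, &on, sizeof(on));
}

/* Read the option back; returns 0 or 1, or -1 if the query fails. */
static int get_wifi_status(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_WIFI_STATUS, &val, &len) < 0)
        return -1;
    return val;
}
```

A program that never makes such a call should never reach the SO_WIFI_STATUS case in sk_setsockopt(), which is what the suggested WARN_ON(1) would reveal.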
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
  2025-05-15  6:30 ` Johannes Berg
@ 2025-05-15  9:10 ` Bert Karwatzki
  2025-05-16 18:19 ` Bert Karwatzki
  0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-15  9:10 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Thursday, 15.05.2025 at 08:30 +0200, Johannes Berg wrote:
> On Thu, 2025-05-15 at 00:27 +0200, Bert Karwatzki wrote:
> > On Wednesday, 14.05.2025 at 20:56 +0200, Johannes Berg wrote:
> > > >
> > > > I've split off the problematic piece of code into a noinline function to simplify the disassembly:
> > > >
> > >
> > > Oh and also, does it even still crash with that? :)
> >
> > Yes, it still crashes when compiled with clang.
>
> OK, just checking. :)

To be more precise, I need clang AND PREEMPT_RT=y to get a crash.

>
> FWIW, I'm not convinced at all that the code you were looking at is
> really the problem. The crash (see below) is happening on the status
> side. Of course it cannot crash on the status side if on the TX side we
> never enter anything into the IDR data structure, and never tag the SKB
> to look up in the IDR and therefore never try to create the status
> report on the status side.

After looking at the backtrace I'm also no longer convinced that this piece of
code is the problem.

>
> Basically what happens is this:
>
>  - on TX, if we have a socket requesting status, create a copy of the
>    SKB, put it into the IDR, and put the IDR index into the original
>    skb->cb
>  - then transmit the original skb, of course
>  - on TX status report from the driver, see if the skb->cb is tagged with
>    the IDR value, if so, report the copy of the SKB back to the socket
>    with the status information
>
> (The reason we need to make a copy is that the SKB could be encrypted or
> otherwise modified in flight, and we don't want to undo that, rather
> keeping a copy for the report.)
>
> > [ 267.339591][ T575] BUG: unable to handle page fault for address: ffffffff51e080b0
> > [ 267.339598][ T575] #PF: supervisor write access in kernel mode
> > [ 267.339602][ T575] #PF: error_code(0x0002) - not-present page
> > [ 267.339606][ T575] PGD f1cc3c067 P4D f1cc3c067 PUD 0
> > [ 267.339613][ T575] Oops: Oops: 0002 [#1] SMP NOPTI
> > [ 267.339622][ T575] CPU: 0 UID: 0 PID: 575 Comm: napi/phy0-0 Not tainted
> > 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #968 PREEMPT_{RT,(full)}
> > [ 267.339629][ T575] Hardware name: Micro-Star International Co., Ltd. Alpha
> > 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
> > [ 267.339632][ T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
> ...
> > [ 267.339692][ T575] Call Trace:
> > [ 267.339701][ T575] <TASK>
> > [ 267.339705][ T575] _raw_spin_lock_irqsave+0x57/0x60
> > [ 267.339714][ T575] rt_spin_lock+0x73/0xa0
> > [ 267.339720][ T575] sock_queue_err_skb+0xdc/0x140
> > [ 267.339727][ T575] skb_complete_wifi_ack+0xa9/0x120
> > [ 267.339737][ T575] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
> > [ 267.339799][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 267.339804][ T575] ? start_dl_timer+0xcf/0x110
> > [ 267.339814][ T575] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
> > [ 267.339851][ T575] ? raw_spin_rq_lock_nested+0x15/0x80
> > [ 267.339862][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 267.339866][ T575] ? rt_spin_lock+0x3d/0xa0
> > [ 267.339873][ T575] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
> > [ 267.339886][ T575] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
>
> Yeah so that's the crash on the status report as explained above, it
> kind of looks almost like the skb->sk was freed and somehow invalid now?
> But I don't see a general issue here (will keep digging), and how come
> it only shows up with clang?
>
> Since it reproduces pretty reliably, maybe you could do with KASAN?
>

I'm currently doing a test run with KASAN enabled. It has been running for ~1h
so far (without KASAN the maximum time to a crash was about 10min), so KASAN is
probably masking the bug (there are no messages from KASAN in dmesg).

> Also could be interesting - what userspace are you running with wifi?
> What tool is even setting up the wifi status? If you don't really know
> maybe just put WARN_ON(1) into net/core/sock.c where SO_WIFI_STATUS is
> written (sk_setsockopt).
>
> johannes

To record these backtraces I disabled wifi just after booting (it usually
takes ~5s to connect here) with NetworkManager (nmcli, from Debian sid, last
updated on 20250511, before I encountered this bug):
$ nmcli radio wifi off
Then I set up netconsole, re-enabled wifi and waited for the crash:
$ nmcli radio wifi on

Bert Karwatzki

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
  2025-05-15  9:10 ` Bert Karwatzki
@ 2025-05-16 18:19 ` Bert Karwatzki
  2025-05-17 11:34 ` Bert Karwatzki
  0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-16 18:19 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

I've added a debugging statement:

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 3bd5ee0995fe..853493eca4f5 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
 					struct ieee80211_local *local,
 					struct ieee80211_tx_info *info)
 {
-	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
+	if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
+				 sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
+		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
+			printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n",
+			       __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS));
 		info->status_data = ieee80211_store_ack_skb(local, skb,
 							    &info->flags, NULL);
 		if (info->status_data)

This gives the following log output (and a lockup), indicating that
sock_flag(skb->sk, SOCK_WIFI_STATUS) and
(skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent
(when compiled with clang and PREEMPT_RT=y):

2025-05-16T20:09:58.818563+02:00 lisa kernel: [ T581] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
2025-05-16T20:10:19.829599+02:00 lisa kernel: [ C2] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
2025-05-16T20:10:19.829607+02:00 lisa kernel: [ C2] rcu: 4-...!: (1 GPs behind)
idle=1ddc/1/0x4000000000000000 softirq=0/0 fqs=72 rcuc=21002 jiffies(starved) 2025-05-16T20:10:19.829609+02:00 lisa kernel: [ C2] rcu: 14-...!: (1 GPs behind) idle=4cbc/1/0x4000000000000000 softirq=0/0 fqs=72 rcuc=21013 jiffies(starved) 2025-05-16T20:10:19.829611+02:00 lisa kernel: [ C2] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-15): P581 2025-05-16T20:10:19.829613+02:00 lisa kernel: [ C2] rcu: (detected by 2, t=21002 jiffies, g=7525, q=973 ncpus=16) 2025-05-16T20:10:19.829615+02:00 lisa kernel: [ C2] Sending NMI from CPU 2 to CPUs 4: 2025-05-16T20:10:19.829616+02:00 lisa kernel: [ C4] NMI backtrace for cpu 4 2025-05-16T20:10:19.829618+02:00 lisa kernel: [ C4] CPU: 4 UID: 0 PID: 581 Comm: napi/phy0-0 Not tainted 6.15.0-rc6- next-20250513-llvm-00011-gf9a7992d47e7 #978 PREEMPT_{RT,(full)} 2025-05-16T20:10:19.829620+02:00 lisa kernel: [ C4] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024 2025-05-16T20:10:19.829622+02:00 lisa kernel: [ C4] RIP: 0010:rtlock_slowlock_locked+0xaed/0xc70 2025-05-16T20:10:19.829623+02:00 lisa kernel: [ C4] Code: 59 61 6a ff 49 c7 07 01 00 00 00 4d 89 7f 08 65 ff 0d b7 bd c1 00 74 4f 4d 85 ed 0f 84 76 ff ff ff e8 77 28 71 ff 48 8b 43 18 <48> 83 e0 fe 49 39 c5 75 2a 41 83 7d 34 00 0f 84 54 ff ff ff 41 8b 2025-05-16T20:10:19.829625+02:00 lisa kernel: [ C4] RSP: 0018:ffffcc0dc1ef7b00 EFLAGS: 00000246 2025-05-16T20:10:19.829627+02:00 lisa kernel: [ C4] RAX: ffff89e9c52f8001 RBX: ffff89e9e17a2e10 RCX: ffff89e9c52f8001 2025-05-16T20:10:19.829629+02:00 lisa kernel: [ C4] RDX: ffffcc0dc1ef7b38 RSI: ffff89e9c52fd000 RDI: ffffcc0dc1ef7bf0 2025-05-16T20:10:19.829631+02:00 lisa kernel: [ C4] RBP: ffff89e9c52fd820 R08: ffffffffffffeb42 R09: 0000000000000002 2025-05-16T20:10:19.829632+02:00 lisa kernel: [ C4] R10: 00000000000000e4 R11: 00000000000005fe R12: ffffcc0dc1ef7b38 2025-05-16T20:10:19.829634+02:00 lisa kernel: [ C4] R13: ffff89e9c52f8000 R14: ffff89e9c52fd000 R15: ffffcc0dc1ef7bf0 
2025-05-16T20:10:19.829636+02:00 lisa kernel: [ C4] FS: 0000000000000000(0000) GS:ffff89f8986c1000(0000) knlGS:0000000000000000 2025-05-16T20:10:19.829637+02:00 lisa kernel: [ C4] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2025-05-16T20:10:19.829638+02:00 lisa kernel: [ C4] CR2: 00007f9ea92e2000 CR3: 00000007e5a3a000 CR4: 0000000000750ef0 2025-05-16T20:10:19.829640+02:00 lisa kernel: [ C4] PKRU: 55555554 2025-05-16T20:10:19.829642+02:00 lisa kernel: [ C4] Call Trace: 2025-05-16T20:10:19.829643+02:00 lisa kernel: [ C4] <TASK> 2025-05-16T20:10:19.829644+02:00 lisa kernel: [ C4] ? rt_spin_unlock+0x12/0x40 2025-05-16T20:10:19.829646+02:00 lisa kernel: [ C4] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829648+02:00 lisa kernel: [ C4] rt_spin_lock+0x81/0xa0 2025-05-16T20:10:19.829649+02:00 lisa kernel: [ C4] mt76_rx_complete+0x49/0x2e0 [mt76] 2025-05-16T20:10:19.829651+02:00 lisa kernel: [ C4] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829653+02:00 lisa kernel: [ C4] mt76_rx_poll_complete+0x4a4/0x4d0 [mt76] 2025-05-16T20:10:19.829654+02:00 lisa kernel: [ C4] ? mt76_dma_rx_poll+0xf6/0x660 [mt76] 2025-05-16T20:10:19.829656+02:00 lisa kernel: [ C4] mt76_dma_rx_poll+0x147/0x660 [mt76] 2025-05-16T20:10:19.829657+02:00 lisa kernel: [ C4] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib] 2025-05-16T20:10:19.829658+02:00 lisa kernel: [ C4] mt792x_poll_rx+0x71/0x120 [mt792x_lib] 2025-05-16T20:10:19.829660+02:00 lisa kernel: [ C4] __napi_poll+0x2a/0x170 2025-05-16T20:10:19.829662+02:00 lisa kernel: [ C4] ? napi_threaded_poll_loop+0x32/0x1b0 2025-05-16T20:10:19.829663+02:00 lisa kernel: [ C4] napi_threaded_poll_loop+0xe4/0x1b0 2025-05-16T20:10:19.829678+02:00 lisa kernel: [ C4] ? napi_threaded_poll_loop+0x32/0x1b0 2025-05-16T20:10:19.829679+02:00 lisa kernel: [ C4] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 2025-05-16T20:10:19.829680+02:00 lisa kernel: [ C4] napi_threaded_poll+0x57/0x80 2025-05-16T20:10:19.829682+02:00 lisa kernel: [ C4] ? 
__pfx_napi_threaded_poll+0x10/0x10 2025-05-16T20:10:19.829683+02:00 lisa kernel: [ C4] kthread+0x25c/0x280 2025-05-16T20:10:19.829685+02:00 lisa kernel: [ C4] ? __pfx_kthread+0x10/0x10 2025-05-16T20:10:19.829696+02:00 lisa kernel: [ C4] ret_from_fork+0xc4/0x1b0 2025-05-16T20:10:19.829698+02:00 lisa kernel: [ C4] ? __pfx_kthread+0x10/0x10 2025-05-16T20:10:19.829699+02:00 lisa kernel: [ C4] ret_from_fork_asm+0x1a/0x30 2025-05-16T20:10:19.829701+02:00 lisa kernel: [ C4] </TASK> 2025-05-16T20:10:19.829702+02:00 lisa kernel: [ C2] Sending NMI from CPU 2 to CPUs 14: 2025-05-16T20:10:19.829704+02:00 lisa kernel: [ C14] NMI backtrace for cpu 14 2025-05-16T20:10:19.829705+02:00 lisa kernel: [ C14] CPU: 14 UID: 0 PID: 585 Comm: napi/phy0-0 Not tainted 6.15.0-rc6- next-20250513-llvm-00011-gf9a7992d47e7 #978 PREEMPT_{RT,(full)} 2025-05-16T20:10:19.829707+02:00 lisa kernel: [ C14] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024 2025-05-16T20:10:19.829708+02:00 lisa kernel: [ C14] RIP: 0010:queued_spin_lock_slowpath+0x134/0x1c0 2025-05-16T20:10:19.829710+02:00 lisa kernel: [ C14] Code: 03 c1 e6 04 83 e0 fc 49 c7 c0 f8 ff ff ff 49 8b 84 40 a0 fa 98 95 48 89 94 06 c0 21 06 96 83 7a 08 00 75 08 f3 90 83 7a 08 00 <74> f8 48 8b 32 48 85 f6 74 09 0f 0d 0e eb 0a 31 f6 eb 06 31 f6 eb 2025-05-16T20:10:19.829714+02:00 lisa kernel: [ C14] RSP: 0018:ffffcc0dc201f998 EFLAGS: 00000046 2025-05-16T20:10:19.829715+02:00 lisa kernel: [ C14] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 00000000003c0000 2025-05-16T20:10:19.829717+02:00 lisa kernel: [ C14] RDX: ffff89f82e9a31c0 RSI: 0000000000000010 RDI: ffff89ea89ad79a8 2025-05-16T20:10:19.829718+02:00 lisa kernel: [ C14] RBP: ffff89ea05e8e000 R08: fffffffffffffff8 R09: 0000000000000001 2025-05-16T20:10:19.829720+02:00 lisa kernel: [ C14] R10: 0000000000000001 R11: ffffffff951f07f0 R12: ffff89ea89ad7990 2025-05-16T20:10:19.829722+02:00 lisa kernel: [ C14] R13: ffff89e9e17a2480 R14: 
ffff89ea89ad79a8 R15: ffff89ea89ad79a8 2025-05-16T20:10:19.829723+02:00 lisa kernel: [ C14] FS: 0000000000000000(0000) GS:ffff89f898941000(0000) knlGS:0000000000000000 2025-05-16T20:10:19.829735+02:00 lisa kernel: [ C14] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2025-05-16T20:10:19.829737+02:00 lisa kernel: [ C14] CR2: 00007f522c1f7000 CR3: 00000007e5a3a000 CR4: 0000000000750ef0 2025-05-16T20:10:19.829738+02:00 lisa kernel: [ C14] PKRU: 55555554 2025-05-16T20:10:19.829740+02:00 lisa kernel: [ C14] Call Trace: 2025-05-16T20:10:19.829741+02:00 lisa kernel: [ C14] <TASK> 2025-05-16T20:10:19.829743+02:00 lisa kernel: [ C14] _raw_spin_lock_irqsave+0x57/0x60 2025-05-16T20:10:19.829744+02:00 lisa kernel: [ C14] rt_spin_lock+0x73/0xa0 2025-05-16T20:10:19.829745+02:00 lisa kernel: [ C14] sock_queue_err_skb+0xdc/0x140 2025-05-16T20:10:19.829773+02:00 lisa kernel: [ C14] skb_complete_wifi_ack+0xa9/0x120 2025-05-16T20:10:19.829775+02:00 lisa kernel: [ C14] ieee80211_report_used_skb+0x541/0x6e0 [mac80211] 2025-05-16T20:10:19.829786+02:00 lisa kernel: [ C14] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829816+02:00 lisa kernel: [ C14] ? __schedule+0x506/0x1280 2025-05-16T20:10:19.829822+02:00 lisa kernel: [ C14] ? preempt_schedule_irq+0x42/0x80 2025-05-16T20:10:19.829823+02:00 lisa kernel: [ C14] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211] 2025-05-16T20:10:19.829824+02:00 lisa kernel: [ C14] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829826+02:00 lisa kernel: [ C14] ? rt_spin_lock+0x3d/0xa0 2025-05-16T20:10:19.829828+02:00 lisa kernel: [ C14] ? mt76_tx_status_unlock+0x38/0x230 [mt76] 2025-05-16T20:10:19.829829+02:00 lisa kernel: [ C14] mt76_tx_status_unlock+0x1e0/0x230 [mt76] 2025-05-16T20:10:19.829830+02:00 lisa kernel: [ C14] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76] 2025-05-16T20:10:19.829832+02:00 lisa kernel: [ C14] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829833+02:00 lisa kernel: [ C14] ? 
rt_spin_unlock+0x12/0x40 2025-05-16T20:10:19.829834+02:00 lisa kernel: [ C14] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829836+02:00 lisa kernel: [ C14] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib] 2025-05-16T20:10:19.829838+02:00 lisa kernel: [ C14] mt7921_mac_tx_free+0x112/0x260 [mt7921_common] 2025-05-16T20:10:19.829839+02:00 lisa kernel: [ C14] mt7921_rx_check+0x33/0xe0 [mt7921_common] 2025-05-16T20:10:19.829841+02:00 lisa kernel: [ C14] mt76_dma_rx_poll+0x322/0x660 [mt76] 2025-05-16T20:10:19.829842+02:00 lisa kernel: [ C14] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib] 2025-05-16T20:10:19.829843+02:00 lisa kernel: [ C14] mt792x_poll_rx+0x71/0x120 [mt792x_lib] 2025-05-16T20:10:19.829845+02:00 lisa kernel: [ C14] __napi_poll+0x2a/0x170 2025-05-16T20:10:19.829846+02:00 lisa kernel: [ C14] ? napi_threaded_poll_loop+0x32/0x1b0 2025-05-16T20:10:19.829848+02:00 lisa kernel: [ C14] napi_threaded_poll_loop+0xe4/0x1b0 2025-05-16T20:10:19.829849+02:00 lisa kernel: [ C14] ? napi_threaded_poll_loop+0x32/0x1b0 2025-05-16T20:10:19.829851+02:00 lisa kernel: [ C14] ? ttwu_do_activate+0xa9/0x1a0 2025-05-16T20:10:19.829863+02:00 lisa kernel: [ C14] ? srso_alias_return_thunk+0x5/0xfbef5 2025-05-16T20:10:19.829864+02:00 lisa kernel: [ C14] napi_threaded_poll+0x57/0x80 2025-05-16T20:10:19.829866+02:00 lisa kernel: [ C14] ? __pfx_napi_threaded_poll+0x10/0x10 2025-05-16T20:10:19.829867+02:00 lisa kernel: [ C14] kthread+0x25c/0x280 2025-05-16T20:10:19.829868+02:00 lisa kernel: [ C14] ? __pfx_kthread+0x10/0x10 2025-05-16T20:10:19.829871+02:00 lisa kernel: [ C14] ret_from_fork+0xc4/0x1b0 2025-05-16T20:10:19.829873+02:00 lisa kernel: [ C14] ? __pfx_kthread+0x10/0x10 2025-05-16T20:10:19.829874+02:00 lisa kernel: [ C14] ret_from_fork_asm+0x1a/0x30 2025-05-16T20:10:19.829875+02:00 lisa kernel: [ C14] </TASK> Bert Karwatzki ^ permalink raw reply related [flat|nested] 20+ messages in thread
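A side note on the consistency check in the debugging statement above: it XORs a masked bit field with a boolean. The mismatch that was logged (0 versus 1) is a real mismatch under any reading, but if the mask is not bit 0, the raw XOR is also non-zero when both indicators are set. The sketch below shows a normalized comparison next to the raw one. `DEMO_SKBTX_WIFI_STATUS` and both function names are hypothetical, and the bit value `1u << 4` is an assumption made purely for illustration.

```c
#include <stdbool.h>

#define DEMO_SKBTX_WIFI_STATUS (1u << 4)   /* assumed bit value, illustration only */

/* Normalized check: true only when exactly one of the two indicators is set. */
static bool wifi_status_mismatch(unsigned int tx_flags, bool sock_flag_set)
{
    bool skb_flag_set = (tx_flags & DEMO_SKBTX_WIFI_STATUS) != 0;

    return skb_flag_set != sock_flag_set;
}

/* Raw XOR of mask result and boolean, in the style of the debug statement. */
static bool raw_xor_mismatch(unsigned int tx_flags, bool sock_flag_set)
{
    return ((tx_flags & DEMO_SKBTX_WIFI_STATUS) ^ (unsigned int)sock_flag_set) != 0;
}
```

With these definitions both functions agree on the logged case (tx_flags bit clear, socket flag set), while only the raw XOR variant reports a mismatch when both are set.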
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
  2025-05-16 18:19 ` Bert Karwatzki
@ 2025-05-17 11:34 ` Bert Karwatzki
  2025-05-17 19:49 ` Bert Karwatzki
  0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-17 11:34 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Friday, 16.05.2025 at 20:19 +0200, Bert Karwatzki wrote:
> I've added a debugging statement:
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 3bd5ee0995fe..853493eca4f5 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
>  					struct ieee80211_local *local,
>  					struct ieee80211_tx_info *info)
>  {
> -	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
> +	if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
> +				 sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
> +		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
> +			printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n",
> +			       __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS));
>  		info->status_data = ieee80211_store_ack_skb(local, skb,
>  							    &info->flags, NULL);
>  		if (info->status_data)
>
> This gives the following logoutput (and a lockup), indicating that sock_flag(skb->sk, SOCK_WIFI_STATUS) and
> (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent (when compiled with clang and
> PREEMPT_RT=y)

I've added more debugging output:

diff --git a/include/net/sock.h b/include/net/sock.h
index e223102337c7..e13560b5b7a8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2735,8 +2735,10 @@ static inline void _sock_tx_timestamp(struct sock *sk,
 			*tskey = atomic_inc_return(&sk->sk_tskey) - 1;
 		}
 	}
-	if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS)))
+	if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) {
+		printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk);
 		*tx_flags |= SKBTX_WIFI_STATUS;
+	}
 }
 
 static inline void sock_tx_timestamp(struct sock *sk,
diff --git a/net/core/sock.c b/net/core/sock.c
index e02a78538e3e..f6589ad5ba36 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1548,6 +1548,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case SO_WIFI_STATUS:
+		printk(KERN_INFO "%s: setting SOCK_WIFI_STATUS to %u for sk = %px\n", __func__, valbool, sk);
 		sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
 		break;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 853493eca4f5..eee2f80949c6 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4588,9 +4588,12 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
 {
 	if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
 				 sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
-		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
+		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) {
 			printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n",
 			       __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS));
+			printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags);
+			return; // This should make this case non-fatal.
+		}
 		info->status_data = ieee80211_store_ack_skb(local, skb,
 							    &info->flags, NULL);
 		if (info->status_data)

After ~15min of uptime this gives:

[ 189.337797] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 189.337803] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c4e00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 191.325256] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 191.325259] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c5a00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 257.591831] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 257.591844] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1baf3bca00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 301.786963] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 301.786967] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1c1bc40100 skb->sk->sk_flags = 0xffffffffb4efe640
[ 302.780881] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 302.780884] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a44cf6000 skb->sk->sk_flags = 0xffffffffb4efe640
[ 482.792298] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 482.792304] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 482.806144] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 482.806148] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4c500 skb->sk->sk_flags = 0xffffffffb4efe640
[ 482.817280] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 482.817284] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4df00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 552.327291] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 552.327295] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640
[ 916.971599] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1
[ 916.971607] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a62834000 skb->sk->sk_flags = 0xffffffffb4efe640

The printk()s in sk_setsockopt() and _sock_tx_timestamp() are not called at
all, so the flag SOCK_WIFI_STATUS is actually never set! The value printed for
skb->sk->sk_flags looks suspiciously like a pointer, and as sk_flags is
actually a member of a union in struct sock_common, it seems clang is using
sk_flags for one of the other union members here:

struct sock_common {
	[...]
	union {
		unsigned long	skc_flags;
		struct sock	*skc_listener; /* request_sock */
		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
	};
	[...]
}

Bert Karwatzki

^ permalink raw reply related	[flat|nested] 20+ messages in thread
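To illustrate the observation above, here is a small userspace sketch of how a flag test can appear to succeed when the storage actually holds a pointer through another union member. `union demo_common`, `demo_flag_set()` and `demo_flags_after_pointer_store()` are hypothetical names, and `DEMO_FLAG_BIT` is an arbitrary index chosen for the example, not the kernel's value for SOCK_WIFI_STATUS. The sketch does not show why the kernel ends up in this state; it only demonstrates the aliasing effect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two members sharing storage, loosely modelled on the union quoted above. */
union demo_common {
    uint64_t    flags;      /* stands in for skc_flags */
    const void *listener;   /* stands in for skc_listener */
};

#define DEMO_FLAG_BIT 19u   /* hypothetical bit index, illustration only */

/* Test a single bit of a flags word. */
static bool demo_flag_set(uint64_t flags, unsigned int bit)
{
    return (flags >> bit) & 1u;
}

/* Store a pointer through one member, then read the same bytes as flags. */
static uint64_t demo_flags_after_pointer_store(const void *p)
{
    union demo_common c;

    c.flags = 0;        /* clear all bytes first */
    c.listener = p;     /* the object is now used as "the other kind" */
    return c.flags;     /* pointer bits reinterpreted as flag bits */
}
```

Applied to the value from the log, `demo_flag_set(0xffffffffb4efe640ULL, DEMO_FLAG_BIT)` is true, simply because kernel addresses have many bits set. A flag test on such a word says nothing about whether the socket option was ever requested.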
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-17 11:34 ` Bert Karwatzki
@ 2025-05-17 19:49 ` Bert Karwatzki
2025-05-18 1:30 ` Jason Xing
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-17 19:49 UTC (permalink / raw)
To: Johannes Berg, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, Jason Xing, spasswolf

On Saturday, 17.05.2025 at 13:34 +0200, Bert Karwatzki wrote:
> On Friday, 16.05.2025 at 20:19 +0200, Bert Karwatzki wrote:
> > I've added a debugging statement:
> >
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index 3bd5ee0995fe..853493eca4f5 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
> >  		struct ieee80211_local *local,
> >  		struct ieee80211_tx_info *info)
> >  {
> > -	if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) {
> > +	if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
> > +		sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
> > +		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS))
> > +			printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk,
> > SOCK_WIFI_STATUS) = %u\n",
> > +				__func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk,
> > SOCK_WIFI_STATUS));
> >  		info->status_data = ieee80211_store_ack_skb(local, skb,
> >  			&info->flags, NULL);
> >  		if (info->status_data)
> >
> > This gives the following log output (and a lockup), indicating that sock_flag(skb->sk, SOCK_WIFI_STATUS) and
> > (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent (when compiled with clang and
> > PREEMPT_RT=y)
>
> I've added more debugging output:
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e223102337c7..e13560b5b7a8 100644
> --- 
a/include/net/sock.h > +++ b/include/net/sock.h > @@ -2735,8 +2735,10 @@ static inline void _sock_tx_timestamp(struct sock *sk, > *tskey = atomic_inc_return(&sk->sk_tskey) - 1; > } > } > - if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) > + if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { > + printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > *tx_flags |= SKBTX_WIFI_STATUS; > + } > } > > static inline void sock_tx_timestamp(struct sock *sk, > diff --git a/net/core/sock.c b/net/core/sock.c > index e02a78538e3e..f6589ad5ba36 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1548,6 +1548,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, > break; > > case SO_WIFI_STATUS: > + printk(KERN_INFO "%s: setting SOCK_WIFI_STATUS to %u for sk = %px\n", __func__, valbool, sk); > sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool); > break; > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > index 853493eca4f5..eee2f80949c6 100644 > --- a/net/mac80211/tx.c > +++ b/net/mac80211/tx.c > @@ -4588,9 +4588,12 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb, > { > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > - if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) { > printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n", > __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS)); > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags); > + return; // This should make this case non-fatal. 
> + } > info->status_data = ieee80211_store_ack_skb(local, skb, > &info->flags, NULL); > if (info->status_data) > > > > This gives after ~15min uptime > > [ 189.337797] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 189.337803] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c4e00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 191.325256] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 191.325259] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c5a00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 257.591831] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 257.591844] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1baf3bca00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 301.786963] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 301.786967] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1c1bc40100 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 302.780881] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 302.780884] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a44cf6000 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 482.792298] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 482.792304] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 482.806144] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, 
SOCK_WIFI_STATUS) = 1 > [ 482.806148] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4c500 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 482.817280] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 482.817284] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4df00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 552.327291] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 552.327295] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > [ 916.971599] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > [ 916.971607] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a62834000 skb->sk->sk_flags = 0xffffffffb4efe640 > > The printk()s in sk_set_sockopt() and _sock_tx_timestamp() are not called at all so the flag > SOCK_WIFI_STATUS is actually nevers set! What is printed when printing skb->sk->sk_flags looks > suspiciously like a pointer, and as sk_flags is actually a member of a union in struct sock_common > it seems clang is using sk_flags for one of the other union members here > > struct sock_common { > [...] > union { > unsigned long skc_flags; > struct sock *skc_listener; /* request_sock */ > struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ > }; > [...] > } > > Bert Karwatzki I added even more debugging output and found out why commit 76a853f86c97 (" wifi: free SKBTX_WIFI_STATUS skb tx_flags flag") does not work. 
diff --git a/include/net/sock.h b/include/net/sock.h index e13560b5b7a8..6e1291d2e5a1 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2738,6 +2738,8 @@ static inline void _sock_tx_timestamp(struct sock *sk, if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); *tx_flags |= SKBTX_WIFI_STATUS; + } else { + printk(KERN_INFO "%s: NOT setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); } } diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 20915895bdaa..4913b09c0617 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -912,6 +912,7 @@ reqsk_alloc_noprof(const struct request_sock_ops *ops, struct sock *sk_listener, return NULL; } req->rsk_listener = sk_listener; + printk(KERN_INFO "%s: sk_listener = %px\n", __func__, sk_listener); } req->rsk_ops = ops; req_to_sk(req)->sk_prot = sk_listener->sk_prot; @@ -986,6 +987,7 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req, nreq_sk->sk_incoming_cpu = req_sk->sk_incoming_cpu; nreq->rsk_listener = sk; + printk(KERN_INFO "%s: rsk_listener =%px\n", __func__, sk); /* We need not acquire fastopenq->lock * because the child socket is locked in inet_csk_listen_stop(). diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 67efe9501581..1a3108ec7503 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -190,6 +190,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const struct inet_sock *inet = inet_sk(sk); tw->tw_dr = dr; + printk(KERN_INFO "%s: sk = %px tw_dr = %px\n", __func__, sk, dr); /* Give us an identity. 
*/
 	tw->tw_daddr = inet->inet_daddr;
 	tw->tw_rcv_saddr = inet->inet_rcv_saddr;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index eee2f80949c6..227b86427e06 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4586,6 +4586,8 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
 		struct ieee80211_local *local,
 		struct ieee80211_tx_info *info)
 {
+	if (skb->sk)
+		printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags);
 	if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ||
 		sock_flag(skb->sk, SOCK_WIFI_STATUS)))) {
 		if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) {

This monitors the value of skb->sk->sk_flags in all cases, not only in the error case, and also monitors
the places where the other members of the sk_flags union are set. The error occurs when, at the start
of ieee80211_8023_xmit_clang_debug_helper(), the union does not actually hold skc_flags
but instead holds skc_tw_dr, which is then interpreted as flags.
So why does it work with gcc but fail with clang? sock_flag(skb->sk, SOCK_WIFI_STATUS) tests bit 19 of
skb->sk->sk_flags.

Here are the important snippets of debug output:

clang:
[ T575] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8f1bebba4300 skb->sk->sk_flags = 0xffffffffa16fe640

Here test_bit(SOCK_WIFI_STATUS, &skb->sk->sk_flags) is 1, because bit 19 of 0xffffffffa16fe640 is set.

gcc:
[ T600] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8d3506bec700 skb->sk->sk_flags = 0xffffffff93d40100

Here test_bit(SOCK_WIFI_STATUS, &skb->sk->sk_flags) is 0, because bit 19 of 0xffffffff93d40100 is clear.

So the fact that this works with gcc just seems like luck. I've not yet tested why it works with clang when
PREEMPT_RT is not enabled, but my guess is that in that case we have a tw_dr pointer which fails the test_bit().

Bert Karwatzki

^ permalink raw reply related [flat|nested] 20+ messages in thread
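The bit arithmetic in the message above can be checked in userspace. This is a minimal sketch, assuming (as the message states) that SOCK_WIFI_STATUS is bit 19; WIFI_STATUS_BIT and bit_is_set() are names made up for this illustration, and bit_is_set() only mimics what test_bit() does on a single 64-bit word. The two constants are the sk_flags values copied from the clang and gcc log lines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit index quoted in the message above; not read from kernel headers. */
#define WIFI_STATUS_BIT 19u

/* Userspace stand-in for test_bit(nr, &word) on one 64-bit word. */
static int bit_is_set(uint64_t word, unsigned int nr)
{
	assert(nr < 64); /* shifting by 64 or more would be undefined */
	return (int)((word >> nr) & 1u);
}

#ifdef DEMO_MAIN
int main(void)
{
	/* sk_flags values as printed in the clang and gcc logs. */
	const uint64_t clang_word = 0xffffffffa16fe640ULL;
	const uint64_t gcc_word = 0xffffffff93d40100ULL;

	printf("clang: 0x%llx bit %u = %d\n",
	       (unsigned long long)clang_word, WIFI_STATUS_BIT,
	       bit_is_set(clang_word, WIFI_STATUS_BIT));
	printf("gcc:   0x%llx bit %u = %d\n",
	       (unsigned long long)gcc_word, WIFI_STATUS_BIT,
	       bit_is_set(gcc_word, WIFI_STATUS_BIT));
	return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a small program that prints both results. Which value ends up in the union depends only on where the pointed-to object happens to live in each build, so a different kernel layout could flip either result.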
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-17 19:49 ` Bert Karwatzki
@ 2025-05-18 1:30 ` Jason Xing
2025-05-18 12:12 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Jason Xing @ 2025-05-18 1:30 UTC (permalink / raw)
To: Bert Karwatzki
Cc: Johannes Berg, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless

Hi Bert,

Thanks for your report and analysis!

On Sun, May 18, 2025 at 3:49 AM Bert Karwatzki <spasswolf@web.de> wrote:
>
> On Saturday, 17.05.2025 at 13:34 +0200, Bert Karwatzki wrote:
> > On Friday, 16.05.2025 at 20:19 +0200, Bert Karwatzki wrote:
> > > I've added a debugging statement:
> > >
> > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > > index 3bd5ee0995fe..853493eca4f5 100644
> > > --- a/net/mac80211/tx.c
> > > +++ b/net/mac80211/tx.c
> > > @@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,

What is the caller of it? Is it a function that you customized?

> > > struct ieee80211_local *local, > > > struct ieee80211_tx_info *info) > > > { > > > - if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) { > > > + if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > > > + sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) > > > + printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, > > > SOCK_WIFI_STATUS) = %u\n", > > > + __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, > > > SOCK_WIFI_STATUS)); > > > info->status_data = ieee80211_store_ack_skb(local, skb, > > > &info->flags, NULL); > > > if (info->status_data) > > > > > > This gives the following logoutput (and a lockup), indicating that sock_flag(skb->sk, SOCK_WIFI_STATUS) and > > > (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent (when compiled with clang and > > > PREEMPT_RT=y) Moving skc_flags out of the union can solve the issue, right? Simple modification looks like this: diff --git a/include/net/sock.h b/include/net/sock.h index 3e15d7105ad2..5810c7b80507 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -189,13 +189,13 @@ struct sock_common { atomic64_t skc_cookie; + unsigned long skc_flags; /* following fields are padding to force * offset(struct sock, sk_refcnt) == 128 on 64bit arches * assuming IPV6 is enabled. We use this padding differently * for different kind of 'sockets' */ union { - unsigned long skc_flags; struct sock *skc_listener; /* request_sock */ struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ }; Can you give it a try? 
> > > > I've added more debugging output: > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index e223102337c7..e13560b5b7a8 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -2735,8 +2735,10 @@ static inline void _sock_tx_timestamp(struct sock *sk, > > *tskey = atomic_inc_return(&sk->sk_tskey) - 1; > > } > > } > > - if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) > > + if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { > > + printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > > *tx_flags |= SKBTX_WIFI_STATUS; > > + } > > } > > > > static inline void sock_tx_timestamp(struct sock *sk, > > diff --git a/net/core/sock.c b/net/core/sock.c > > index e02a78538e3e..f6589ad5ba36 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -1548,6 +1548,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, > > break; > > > > case SO_WIFI_STATUS: > > + printk(KERN_INFO "%s: setting SOCK_WIFI_STATUS to %u for sk = %px\n", __func__, valbool, sk); > > sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool); > > break; > > > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > > index 853493eca4f5..eee2f80949c6 100644 > > --- a/net/mac80211/tx.c > > +++ b/net/mac80211/tx.c > > @@ -4588,9 +4588,12 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb, > > { > > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > > - if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) { > > printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n", > > __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS)); > > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, 
skb->sk, skb->sk->sk_flags); > > + return; // This should make this case non-fatal. > > + } > > info->status_data = ieee80211_store_ack_skb(local, skb, > > &info->flags, NULL); > > if (info->status_data) > > > > > > > > This gives after ~15min uptime > > > > [ 189.337797] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 189.337803] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c4e00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 191.325256] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 191.325259] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c5a00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 257.591831] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 257.591844] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1baf3bca00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 301.786963] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 301.786967] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1c1bc40100 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 302.780881] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 302.780884] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a44cf6000 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 482.792298] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 482.792304] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 
482.806144] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 482.806148] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4c500 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 482.817280] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 482.817284] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4df00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 552.327291] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 552.327295] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > > [ 916.971599] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > [ 916.971607] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a62834000 skb->sk->sk_flags = 0xffffffffb4efe640 > > > > The printk()s in sk_set_sockopt() and _sock_tx_timestamp() are not called at all so the flag > > SOCK_WIFI_STATUS is actually nevers set! What is printed when printing skb->sk->sk_flags looks > > suspiciously like a pointer, and as sk_flags is actually a member of a union in struct sock_common > > it seems clang is using sk_flags for one of the other union members here > > > > struct sock_common { > > [...] > > union { > > unsigned long skc_flags; > > struct sock *skc_listener; /* request_sock */ > > struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ > > }; > > [...] > > } > > > > Bert Karwatzki > > I added even more debugging output and found out why commit 76a853f86c97 (" wifi: free > SKBTX_WIFI_STATUS skb tx_flags flag") does not work. 
> > diff --git a/include/net/sock.h b/include/net/sock.h > index e13560b5b7a8..6e1291d2e5a1 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -2738,6 +2738,8 @@ static inline void _sock_tx_timestamp(struct sock *sk, > if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { > printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > *tx_flags |= SKBTX_WIFI_STATUS; > + } else { > + printk(KERN_INFO "%s: NOT setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > } > } > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index 20915895bdaa..4913b09c0617 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -912,6 +912,7 @@ reqsk_alloc_noprof(const struct request_sock_ops *ops, struct sock *sk_listener, > return NULL; > } > req->rsk_listener = sk_listener; > + printk(KERN_INFO "%s: sk_listener = %px\n", __func__, sk_listener); > } > req->rsk_ops = ops; > req_to_sk(req)->sk_prot = sk_listener->sk_prot; > @@ -986,6 +987,7 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req, > nreq_sk->sk_incoming_cpu = req_sk->sk_incoming_cpu; > > nreq->rsk_listener = sk; > + printk(KERN_INFO "%s: rsk_listener =%px\n", __func__, sk); > > /* We need not acquire fastopenq->lock > * because the child socket is locked in inet_csk_listen_stop(). > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c > index 67efe9501581..1a3108ec7503 100644 > --- a/net/ipv4/inet_timewait_sock.c > +++ b/net/ipv4/inet_timewait_sock.c > @@ -190,6 +190,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, > const struct inet_sock *inet = inet_sk(sk); > > tw->tw_dr = dr; > + printk(KERN_INFO "%s: sk = %px tw_dr = %px\n", __func__, sk, dr); > /* Give us an identity. 
*/ > tw->tw_daddr = inet->inet_daddr; > tw->tw_rcv_saddr = inet->inet_rcv_saddr; > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > index eee2f80949c6..227b86427e06 100644 > --- a/net/mac80211/tx.c > +++ b/net/mac80211/tx.c > @@ -4586,6 +4586,8 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb, > struct ieee80211_local *local, > struct ieee80211_tx_info *info) > { > + if (skb->sk) > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags); > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) { > > > This monitor the value of skb->sk->sk_flags not only in the error case but in all cases, and also monitors > the places where the other members of the sk_flags union are set. The error occurs when at the start > of ieee80211_8023_xmit_clang_debug_helper() sk_flags is not actually the skc_flags member of the union > but insted is skc_tw_dr which is only interpreted is flags. > So why does it work with gcc but fail with clang? sock_flag(skb->sk, SOCK_WIFI_STATUS) test bit 19 of > skb->sk->sk_flags Could you say more about this? I don't follow it. Why does the gcc test just miss the crash issue? Is there anything (like call trace) different between them? My worry is that all the callers calling sock_flag might have such potential risk... Thanks, Jason > > Here are the important snippets of debug output: > > clang: > [ T575] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8f1bebba4300 skb->sk->sk_flags = 0xffffffffa16fe640 > > Here test_bit(0xffffffffa16fe640, SOCK_WIFI_STATUS) is 1. > > gcc: > [ T600] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8d3506bec700 skb->sk->sk_flags = 0xffffffff93d40100 > Here test_bit(0xffffffff93d40100, SOCK_WIFI_STATUS) is 0. > > So that this works with gcc just seems like luck. 
I've not yet test why it works with clang when PREEMPT_RT is not > enabled but my guess is that in that case we have a tw_dr pointer which fails the test_bit(). > > Bert Karwatzki > > > > > > > ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-18 1:30 ` Jason Xing
@ 2025-05-18 12:12 ` Bert Karwatzki
2025-05-18 12:43 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-18 12:12 UTC (permalink / raw)
To: Jason Xing
Cc: Johannes Berg, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Sunday, 18.05.2025 at 09:30 +0800, Jason Xing wrote:
> Hi Bert,
>
> Thanks for your report and analysis!
>
> On Sun, May 18, 2025 at 3:49 AM Bert Karwatzki <spasswolf@web.de> wrote:
> >
> > On Saturday, 17.05.2025 at 13:34 +0200, Bert Karwatzki wrote:
> > > On Friday, 16.05.2025 at 20:19 +0200, Bert Karwatzki wrote:
> > > > I've added a debugging statement:
> > > >
> > > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > > > index 3bd5ee0995fe..853493eca4f5 100644
> > > > --- a/net/mac80211/tx.c
> > > > +++ b/net/mac80211/tx.c
> > > > @@ -4586,7 +4586,11 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb,
>
> What is the caller of it? Is it a function that you customized?

The only caller of ieee80211_8023_xmit_clang_debug_helper() is ieee80211_8023_xmit(). I split this
code out into a separate helper because I thought at the time that clang might be producing incorrect
code, but it turned out clang did nothing wrong.

> > > > > struct ieee80211_local *local, > > > > struct ieee80211_tx_info *info) > > > > { > > > > - if (unlikely(skb->sk && sock_flag(skb->sk, SOCK_WIFI_STATUS))) { > > > > + if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > > > > + sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > > > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) > > > > + printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, > > > > SOCK_WIFI_STATUS) = %u\n", > > > > + __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, > > > > SOCK_WIFI_STATUS)); > > > > info->status_data = ieee80211_store_ack_skb(local, skb, > > > > &info->flags, NULL); > > > > if (info->status_data) > > > > > > > > This gives the following logoutput (and a lockup), indicating that sock_flag(skb->sk, SOCK_WIFI_STATUS) and > > > > (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) are actually NOT equivalent (when compiled with clang and > > > > PREEMPT_RT=y) > > Moving skc_flags out of the union can solve the issue, right? Simple > modification looks like this: > diff --git a/include/net/sock.h b/include/net/sock.h > index 3e15d7105ad2..5810c7b80507 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -189,13 +189,13 @@ struct sock_common { > > atomic64_t skc_cookie; > > + unsigned long skc_flags; > /* following fields are padding to force > * offset(struct sock, sk_refcnt) == 128 on 64bit arches > * assuming IPV6 is enabled. We use this padding differently > * for different kind of 'sockets' > */ > union { > - unsigned long skc_flags; > struct sock *skc_listener; /* request_sock */ > struct inet_timewait_death_row *skc_tw_dr; /* > inet_timewait_sock */ > }; > > Can you give it a try? 
I thought this would work, but applying this patch on both next-20250513 and next-20250516
gives the usual kernel panic (captured via netconsole) or the lockup (which I'm not repeating here,
as it is ~1000 lines).

[ 199.627464][ T580] Oops: general protection fault, probably for non-canonical address 0xff510aa8ab572985: 0000 [#1] SMP NOPTI
[ 199.627475][ T580] CPU: 8 UID: 0 PID: 580 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00005-gdd968010bbfa #993 PREEMPT_{RT,(full)}
[ 199.627481][ T580] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 199.627484][ T580] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
[ 199.627494][ T580] Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 40 0f b7 c0 89 c6 83 e6 03 c1 e6 04 83 e0 fc 49 c7 c0 f8 ff ff ff 49 8b 84 40 a0 fa 98 ab <48> 89 94 06 c0 21 06 ac 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 199.627497][ T580] RSP: 0018:ffffd0c301e77998 EFLAGS: 00010006
[ 199.627501][ T580] RAX: ff510aa8ff5107b5 RBX: 0000000000000286 RCX: 0000000000240000
[ 199.627503][ T580] RDX: ffff8a716e8231c0 RSI: 0000000000000010 RDI: ffff8a64c7ed35f8
[ 199.627505][ T580] RBP: ffff8a62c8751200 R08: fffffffffffffff8 R09: 0000000000000001
[ 199.627507][ T580] R10: 0000000000000001 R11: ffffffffab1f0820 R12: ffff8a64c7ed35e0
[ 199.627509][ T580] R13: ffff8a62cbaf2480 R14: ffff8a64c7ed35f8 R15: ffff8a64c7ed35f8
[ 199.627511][ T580] FS: 0000000000000000(0000) GS:ffff8a71c27c1000(0000) knlGS:0000000000000000
[ 199.627513][ T580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 199.627515][ T580] CR2: 00007fbadcaec0b0 CR3: 00000007fc23a000 CR4: 0000000000750ef0
[ 199.627518][ T580] PKRU: 55555554
[ 199.627519][ T580] Call Trace:
[ 199.627522][ T580] <TASK>
[ 199.627525][ T580] _raw_spin_lock_irqsave+0x57/0x60
[ 199.627531][ T580] rt_spin_lock+0x73/0xa0
[ 199.627536][ T580] sock_queue_err_skb+0xdc/0x140
[ 199.627542][ T580] skb_complete_wifi_ack+0xa9/0x120
[ 199.627551][ T580] 
ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
[ 199.627598][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627604][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627608][ T580] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
[ 199.627636][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627638][ T580] ? rt_spin_lock+0x3d/0xa0
[ 199.627646][ T580] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
[ 199.627657][ T580] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
[ 199.627668][ T580] __mt76_tx_complete_skb+0x13b/0x2e0 [mt76]
[ 199.627676][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627679][ T580] ? rt_spin_unlock+0x12/0x40
[ 199.627682][ T580] ? srso_alias_return_thunk+0x5/0xfbef5
[ 199.627688][ T580] mt76_connac2_txwi_free+0x127/0x150 [mt76_connac_lib]
[ 199.627698][ T580] mt7921_mac_tx_free+0x112/0x260 [mt7921_common]
[ 199.627708][ T580] mt7921_rx_check+0x33/0xe0 [mt7921_common]
[ 199.627715][ T580] mt76_dma_rx_poll+0x322/0x660 [mt76]
[ 199.627725][ T580] ? mt792x_poll_rx+0x2a/0x120 [mt792x_lib]
[ 199.627732][ T580] mt792x_poll_rx+0x71/0x120 [mt792x_lib]
[ 199.627739][ T580] __napi_poll+0x2a/0x170
[ 199.627743][ T580] ? napi_threaded_poll_loop+0x32/0x1b0
[ 199.627746][ T580] napi_threaded_poll_loop+0xe4/0x1b0
[ 199.627749][ T580] ? napi_threaded_poll_loop+0x32/0x1b0
[ 199.627751][ T580] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 199.627757][ T580] napi_threaded_poll+0x57/0x80
[ 199.627760][ T580] ? __pfx_napi_threaded_poll+0x10/0x10
[ 199.627763][ T580] kthread+0x25c/0x280
[ 199.627769][ T580] ? __pfx_kthread+0x10/0x10
[ 199.627773][ T580] ret_from_fork+0xc4/0x1b0
[ 199.627777][ T580] ? __pfx_kthread+0x10/0x10
[ 199.627781][ T580] ret_from_fork_asm+0x1a/0x30
[ 199.627788][ T580] </TASK>
[ 199.627789][ T580] Modules linked in: sd_mod scsi_mod scsi_common netconsole ccm snd_seq_dummy snd_hrtimer snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_scodec_component snd_hda_codec_generic snd_hda_codec_hdmi btusb btbcm btintel btrtl snd_hda_intel btmtk snd_intel_dspcfg snd_hda_codec snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma snd_hwdep bluetooth snd_hda_core snd_soc_core uvcvideo videobuf2_vmalloc videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x snd_acp_config videobuf2_common snd_timer snd_soc_acpi msi_wmi ecdh_generic ecc sparse_keymap mc wmi_bmof edac_mce_amd snd k10temp snd_pci_acp3x ccp soundcore battery ac button joydev hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_prox hid_sensor_als hid_sensor_gyro_3d hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf amd_pmc evdev industrialio mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76
[ 199.627877][ T580] mac80211 libarc4 cfg80211 rfkill msr fuse nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 amdgpu usbhid drm_panel_backlight_quirks cec drm_buddy drm_suballoc_helper drm_exec i2c_algo_bit drm_display_helper gpu_sched drm_ttm_helper hid_sensor_hub ttm xhci_pci hid_multitouch mfd_core hid_generic xhci_hcd i2c_hid_acpi drm_client_lib usbcore psmouse amd_sfh i2c_hid drm_kms_helper nvme hid serio_raw nvme_core amdxcp r8169 i2c_piix4 crc16 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 199.627931][ T580] ---[ end trace 0000000000000000 ]---
[ 199.781799][ T580] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
[ 199.781799][ T580] Code: c8 c1 e8 10 66 87 47 02 66 85 c0 74 40 0f b7 c0 89 c6 83 e6 03 c1 e6 04 83 e0 fc 49 c7 c0 f8 ff ff ff 49 8b 84 40 a0 fa 98 ab <48> 89 94 06 c0 21 06 ac 83 7a 08 00 75 08 f3 90 83 7a 08 00 74 f8
[ 199.781799][ T580] RSP: 0018:ffffd0c301e77998 EFLAGS: 00010006
[ 199.781799][ T580] RAX: ff510aa8ff5107b5 RBX: 0000000000000286 RCX: 0000000000240000
[ 199.781799][ T580] RDX: ffff8a716e8231c0 RSI: 0000000000000010 RDI: ffff8a64c7ed35f8
[ 199.781799][ T580] RBP: ffff8a62c8751200 R08: fffffffffffffff8 R09: 0000000000000001
[ 199.781799][ T580] R10: 0000000000000001 R11: ffffffffab1f0820 R12: ffff8a64c7ed35e0
[ 199.781799][ T580] R13: ffff8a62cbaf2480 R14: ffff8a64c7ed35f8 R15: ffff8a64c7ed35f8
[ 199.781799][ T580] FS: 0000000000000000(0000) GS:ffff8a71c27c1000(0000) knlGS:0000000000000000
[ 199.781799][ T580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 199.781799][ T580] CR2: 00007fbadcaec0b0 CR3: 00000007fc23a000 CR4: 0000000000750ef0
[ 199.781799][ T580] PKRU: 55555554
[ 199.781799][ T580] Kernel panic - not syncing: Fatal exception in interrupt
[ 199.788541][ T580] Kernel Offset: 0x29800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 199.788541][ T580] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

I even tried this version of your patch, to keep the offset of skc_refcnt at 128,
but it doesn't work, either.

commit fca84c5cde713be480544a64ed6680afc3319670 Author: Bert Karwatzki <spasswolf@web.de> Date: Sun May 18 13:32:36 2025 +0200 include: net: sock: move skc_flags out of the union Signed-off-by: Bert Karwatzki <spasswolf@web.de> diff --git a/include/net/sock.h b/include/net/sock.h index 3e15d7105ad2..e73929a4da6e 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -195,7 +195,6 @@ struct sock_common { * for different kind of 'sockets' */ union { - unsigned long skc_flags; struct sock *skc_listener; /* request_sock */ struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ }; @@ -221,6 +220,9 @@ struct sock_common { }; refcount_t skc_refcnt; + + /* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128 */ + unsigned long skc_flags; /* private: */ int skc_dontcopy_end[0]; union { > > > > > > I've added more debugging output: > > > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > > index e223102337c7..e13560b5b7a8 100644 > > > --- a/include/net/sock.h > > > +++ b/include/net/sock.h > > > @@ -2735,8 +2735,10 @@ static inline void _sock_tx_timestamp(struct sock *sk, > > > *tskey = atomic_inc_return(&sk->sk_tskey) - 1; > > > } > > > } > > > - if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) > > > + if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { > > > + printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > > > *tx_flags |= SKBTX_WIFI_STATUS; > > > + } > > > } > > > > > > static inline void sock_tx_timestamp(struct sock *sk, > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > index e02a78538e3e..f6589ad5ba36 100644 > > > --- a/net/core/sock.c > > > +++ b/net/core/sock.c > > > @@ -1548,6 +1548,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, > > > break; > > > > > > case SO_WIFI_STATUS: > > > + printk(KERN_INFO "%s: setting SOCK_WIFI_STATUS to %u for sk = %px\n", __func__, valbool, sk); > > > sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool); > > > break; > > > > > > diff --git 
a/net/mac80211/tx.c b/net/mac80211/tx.c > > > index 853493eca4f5..eee2f80949c6 100644 > > > --- a/net/mac80211/tx.c > > > +++ b/net/mac80211/tx.c > > > @@ -4588,9 +4588,12 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb, > > > { > > > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > > > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > > > - if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) > > > + if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) { > > > printk(KERN_INFO "%s: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = %u sock_flag(skb->sk, SOCK_WIFI_STATUS) = %u\n", > > > __func__, (skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS), sock_flag(skb->sk, SOCK_WIFI_STATUS)); > > > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags); > > > + return; // This should make this case non-fatal. > > > + } > > > info->status_data = ieee80211_store_ack_skb(local, skb, > > > &info->flags, NULL); > > > if (info->status_data) > > > > > > > > > > > > This gives after ~15min uptime > > > > > > [ 189.337797] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 189.337803] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c4e00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 191.325256] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 191.325259] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1b798c5a00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 257.591831] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 257.591844] [ T576] ieee80211_8023_xmit_clang_debug_helper: 
skb->sk = ffff8c1baf3bca00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 301.786963] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 301.786967] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1c1bc40100 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 302.780881] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 302.780884] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a44cf6000 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 482.792298] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 482.792304] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 482.806144] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 482.806148] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4c500 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 482.817280] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 482.817284] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4df00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 552.327291] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 552.327295] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1da0f4de00 skb->sk->sk_flags = 0xffffffffb4efe640 > > > [ 916.971599] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS = 0 sock_flag(skb->sk, SOCK_WIFI_STATUS) = 1 > > > [ 
916.971607] [ T576] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8c1a62834000 skb->sk->sk_flags = 0xffffffffb4efe640 > > > > > > The printk()s in sk_set_sockopt() and _sock_tx_timestamp() are not called at all so the flag > > > SOCK_WIFI_STATUS is actually nevers set! What is printed when printing skb->sk->sk_flags looks > > > suspiciously like a pointer, and as sk_flags is actually a member of a union in struct sock_common > > > it seems clang is using sk_flags for one of the other union members here > > > > > > struct sock_common { > > > [...] > > > union { > > > unsigned long skc_flags; > > > struct sock *skc_listener; /* request_sock */ > > > struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */ > > > }; > > > [...] > > > } > > > > > > Bert Karwatzki > > > > I added even more debugging output and found out why commit 76a853f86c97 (" wifi: free > > SKBTX_WIFI_STATUS skb tx_flags flag") does not work. > > > > diff --git a/include/net/sock.h b/include/net/sock.h > > index e13560b5b7a8..6e1291d2e5a1 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -2738,6 +2738,8 @@ static inline void _sock_tx_timestamp(struct sock *sk, > > if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS))) { > > printk(KERN_INFO "%s: setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > > *tx_flags |= SKBTX_WIFI_STATUS; > > + } else { > > + printk(KERN_INFO "%s: NOT setting SKBTX_WIFI_STATUS for sk = %px\n", __func__, sk); > > } > > } > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > > index 20915895bdaa..4913b09c0617 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -912,6 +912,7 @@ reqsk_alloc_noprof(const struct request_sock_ops *ops, struct sock *sk_listener, > > return NULL; > > } > > req->rsk_listener = sk_listener; > > + printk(KERN_INFO "%s: sk_listener = %px\n", __func__, sk_listener); > > } > > req->rsk_ops = ops; > > req_to_sk(req)->sk_prot = 
sk_listener->sk_prot; > > @@ -986,6 +987,7 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req, > > nreq_sk->sk_incoming_cpu = req_sk->sk_incoming_cpu; > > > > nreq->rsk_listener = sk; > > + printk(KERN_INFO "%s: rsk_listener =%px\n", __func__, sk); > > > > /* We need not acquire fastopenq->lock > > * because the child socket is locked in inet_csk_listen_stop(). > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c > > index 67efe9501581..1a3108ec7503 100644 > > --- a/net/ipv4/inet_timewait_sock.c > > +++ b/net/ipv4/inet_timewait_sock.c > > @@ -190,6 +190,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, > > const struct inet_sock *inet = inet_sk(sk); > > > > tw->tw_dr = dr; > > + printk(KERN_INFO "%s: sk = %px tw_dr = %px\n", __func__, sk, dr); > > /* Give us an identity. */ > > tw->tw_daddr = inet->inet_daddr; > > tw->tw_rcv_saddr = inet->inet_rcv_saddr; > > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c > > index eee2f80949c6..227b86427e06 100644 > > --- a/net/mac80211/tx.c > > +++ b/net/mac80211/tx.c > > @@ -4586,6 +4586,8 @@ static noinline void ieee80211_8023_xmit_clang_debug_helper(struct sk_buff *skb, > > struct ieee80211_local *local, > > struct ieee80211_tx_info *info) > > { > > + if (skb->sk) > > + printk(KERN_INFO "%s: skb->sk = %px skb->sk->sk_flags = 0x%lx\n", __func__, skb->sk, skb->sk->sk_flags); > > if (unlikely(skb->sk && ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) || > > sock_flag(skb->sk, SOCK_WIFI_STATUS)))) { > > if ((skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS) ^ sock_flag(skb->sk, SOCK_WIFI_STATUS)) { > > > > > > This monitor the value of skb->sk->sk_flags not only in the error case but in all cases, and also monitors > > the places where the other members of the sk_flags union are set. 
> > The error occurs when, at the start of ieee80211_8023_xmit_clang_debug_helper(),
> > sk_flags is not actually the skc_flags member of the union but instead is
> > skc_tw_dr, which is only interpreted as flags.
> > So why does it work with gcc but fail with clang? sock_flag(skb->sk, SOCK_WIFI_STATUS)
> > tests bit 19 of skb->sk->sk_flags
>
> Could you say more about this? I don't follow it. Why does the gcc
> test just miss the crash issue? Is there anything (like call trace)
> different between them?
>

I think it is just pointer lottery: the pointer in the gcc version has bit 19
not set, while the pointer in the clang version has bit 19 set. Why this is
always the case I don't know; there is KASLR active after all.

By the way, the pointer value that is incorrectly used as sk_flags is set in
inet_twsk_alloc() (called by tcp_time_wait()):

struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
					   struct inet_timewait_death_row *dr,
					   const int state)
{
	struct inet_timewait_sock *tw;

	if (refcount_read(&dr->tw_refcount) - 1 >=
	    READ_ONCE(dr->sysctl_max_tw_buckets))
		return NULL;

	tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
			      GFP_ATOMIC);
	if (tw) {
		const struct inet_sock *inet = inet_sk(sk);

		tw->tw_dr = dr; // This is incorrectly used as sk_flags! XXX

> My worry is that all the callers calling sock_flag might have such
> potential risk...
>
> Thanks,
> Jason

I'd worry about that, too. How can callers of sock_flag() know which part of
the union is active? At least for debugging purposes one could add a bool to
struct sock_common which is false by default and gets set to true when the
pointer members of the union are set, e.g. in inet_twsk_alloc():

struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
					   struct inet_timewait_death_row *dr,
					   const int state)
{
	struct inet_timewait_sock *tw;

	if (refcount_read(&dr->tw_refcount) - 1 >=
	    READ_ONCE(dr->sysctl_max_tw_buckets))
		return NULL;

	tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
			      GFP_ATOMIC);
	if (tw) {
		const struct inet_sock *inet = inet_sk(sk);

		tw->tw_dr = dr;
		tw->is_pointer = true;

>

Bert Karwatzki

^ permalink raw reply related [flat|nested] 20+ messages in thread
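As an illustration of the failure mode discussed in this message, here is a
minimal userspace sketch (not kernel code). The names demo_sock,
demo_flag_unchecked(), demo_flag_checked() and the is_pointer tag are
hypothetical and only mirror the idea proposed above; bit 19 and the value
0xffffffffb4efe640 are taken from the debug output earlier in the thread. It
shows how a pointer stored through one union member can read back as a "set"
flag bit through the other member, and how a tag could let a reader reject it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit index of SOCK_WIFI_STATUS as stated in the thread (assumption here). */
#define DEMO_WIFI_STATUS_BIT 19

/* Hypothetical stand-in for the union in struct sock_common. */
union demo_overlay {
	uint64_t flags;       /* meaningful for full sockets only */
	uint64_t tw_dr_bits;  /* pointer bits, as stored for timewait sockets */
};

struct demo_sock {
	bool is_pointer;      /* hypothetical debug tag proposed above */
	union demo_overlay u;
};

static bool demo_test_bit(uint64_t word, unsigned int bit)
{
	assert(bit < 64);
	return (word >> bit) & 1u;
}

/* Reads the flags word no matter which union member was last written. */
static bool demo_flag_unchecked(const struct demo_sock *sk, unsigned int bit)
{
	return demo_test_bit(sk->u.flags, bit);
}

/* Refuses to interpret pointer bits as flags when the tag is set. */
static bool demo_flag_checked(const struct demo_sock *sk, unsigned int bit)
{
	if (sk->is_pointer)
		return false;
	return demo_test_bit(sk->u.flags, bit);
}
```

With u.tw_dr_bits set to 0xffffffffb4efe640 and is_pointer set to true, the
unchecked reader reports bit 19 as set, while the checked reader does not.
Whether a pointer happens to have bit 19 set depends on the address, which
would fit the "pointer lottery" description above.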
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-18 12:12 ` Bert Karwatzki
@ 2025-05-18 12:43 ` Bert Karwatzki
2025-05-18 14:15 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-18 12:43 UTC (permalink / raw)
To: Jason Xing
Cc: Johannes Berg, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Sunday, 18.05.2025 at 14:12 +0200, Bert Karwatzki wrote:
>
> I even tried this version of your patch, to keep the offset of skc_refcnt at 128,
> but it doesn't work, either.
>
> commit fca84c5cde713be480544a64ed6680afc3319670
> Author: Bert Karwatzki <spasswolf@web.de>
> Date:   Sun May 18 13:32:36 2025 +0200
>
>     include: net: sock: move skc_flags out of the union
>
>     Signed-off-by: Bert Karwatzki <spasswolf@web.de>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 3e15d7105ad2..e73929a4da6e 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -195,7 +195,6 @@ struct sock_common {
>  	 * for different kind of 'sockets'
>  	 */
>  	union {
> -		unsigned long	skc_flags;
>  		struct sock	*skc_listener; /* request_sock */
>  		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
>  	};
> @@ -221,6 +220,9 @@ struct sock_common {
>  	};
>
>  	refcount_t		skc_refcnt;
> +
> +	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128 */
> +	unsigned long		skc_flags;
>  	/* private: */
>  	int			skc_dontcopy_end[0];
>  	union {
>

In the patch above I accidentally put skc_flags in the part of struct sock_common
which does not get copied, but putting it below skc_dontcopy_end[0] does not work,
either:

diff --git a/include/net/sock.h b/include/net/sock.h
index 3e15d7105ad2..6d69753a205a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -195,7 +195,6 @@ struct sock_common {
 	 * for different kind of 'sockets'
 	 */
 	union {
-		unsigned long	skc_flags;
 		struct sock	*skc_listener; /* request_sock */
 		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
 	};
@@ -221,8 +220,12 @@ struct sock_common {
 	};

 	refcount_t		skc_refcnt;
+
 	/* private: */
 	int			skc_dontcopy_end[0];
+	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128
+	 * Also place it below skc_dontcopy_end[0] */
+	unsigned long		skc_flags;
 	union {
 		u32		skc_rxhash;
 		u32		skc_window_clamp;

This locks up as usual.

Bert Karwatzki

^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-18 12:43 ` Bert Karwatzki
@ 2025-05-18 14:15 ` Bert Karwatzki
2025-05-18 14:41 ` Bert Karwatzki
0 siblings, 1 reply; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-18 14:15 UTC (permalink / raw)
To: Jason Xing
Cc: Johannes Berg, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Sunday, 18.05.2025 at 14:43 +0200, Bert Karwatzki wrote:
> On Sunday, 18.05.2025 at 14:12 +0200, Bert Karwatzki wrote:
> >
> > I even tried this version of your patch, to keep the offset of skc_refcnt at 128,
> > but it doesn't work, either.
> >
> > commit fca84c5cde713be480544a64ed6680afc3319670
> > Author: Bert Karwatzki <spasswolf@web.de>
> > Date:   Sun May 18 13:32:36 2025 +0200
> >
> >     include: net: sock: move skc_flags out of the union
> >
> >     Signed-off-by: Bert Karwatzki <spasswolf@web.de>
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 3e15d7105ad2..e73929a4da6e 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -195,7 +195,6 @@ struct sock_common {
> >  	 * for different kind of 'sockets'
> >  	 */
> >  	union {
> > -		unsigned long	skc_flags;
> >  		struct sock	*skc_listener; /* request_sock */
> >  		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
> >  	};
> > @@ -221,6 +220,9 @@ struct sock_common {
> >  	};
> >
> >  	refcount_t		skc_refcnt;
> > +
> > +	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128 */
> > +	unsigned long		skc_flags;
> >  	/* private: */
> >  	int			skc_dontcopy_end[0];
> >  	union {
> >
>
> In the patch above I accidentally put skc_flags in the part of struct sock_common
> which does not get copied, but putting it below skc_dontcopy_end[0] does not work,
> either:
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 3e15d7105ad2..6d69753a205a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -195,7 +195,6 @@ struct sock_common {
>  	 * for different kind of 'sockets'
>  	 */
>  	union {
> -		unsigned long	skc_flags;
>  		struct sock	*skc_listener; /* request_sock */
>  		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
>  	};
> @@ -221,8 +220,12 @@ struct sock_common {
>  	};
>
>  	refcount_t		skc_refcnt;
> +
>  	/* private: */
>  	int			skc_dontcopy_end[0];
> +	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128
> +	 * Also place it below skc_dontcopy_end[0] */
> +	unsigned long		skc_flags;
>  	union {
>  		u32		skc_rxhash;
>  		u32		skc_window_clamp;
>
> This locks up as usual.
>
> Bert Karwatzki

So I did some more monitoring and found that even though skc_flags is removed from
the union it can take strange values, e.g.:

Here the value is not even a pointer (perhaps uninitialized memory?):
[ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fc2abf4cc0 skb->sk->sk_flags = 0xa00f7fe57b16f7e1
These could be pointers, but as pointers they would only be aligned to a 2-byte boundary ...
[ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fbd0bd3210 skb->sk->sk_flags = 0xffffc0f1c62dcc4e
[ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fbd0bd3210 skb->sk->sk_flags = 0xffffc0f1c62dcc4e

Bert Karwatzki

^ permalink raw reply [flat|nested] 20+ messages in thread
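The classification of the logged sk_flags values in this message ("not even a
pointer", "aligned to a 2-byte boundary") can be reproduced with a small
userspace sketch. This is only a rough heuristic and assumes x86-64 with
4-level paging (48-bit virtual addresses), where kernel addresses have their
upper 17 bits set; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest power of two dividing v, i.e. its lowest set bit (0 for v == 0). */
static uint64_t demo_alignment(uint64_t v)
{
	return v & (~v + 1);
}

/*
 * Heuristic only: true if v lies in the upper canonical half of a 48-bit
 * virtual address space (bits 63..47 all set). Assumes 4-level paging.
 */
static bool demo_is_upper_half_48(uint64_t v)
{
	return (v >> 47) == 0x1ffffULL;
}
```

Applied to the values above, 0xa00f7fe57b16f7e1 is not in the upper half at
all, while 0xffffc0f1c62dcc4e is, but its lowest set bit is bit 1, so it is
only 2-byte aligned, which would be unusual for a pointer to a kernel object.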
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
2025-05-18 14:15 ` Bert Karwatzki
@ 2025-05-18 14:41 ` Bert Karwatzki
0 siblings, 0 replies; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-18 14:41 UTC (permalink / raw)
To: Jason Xing
Cc: Johannes Berg, linux-kernel@vger.kernel.org, linux-next@vger.kernel.org, llvm@lists.linux.dev, Thomas Gleixner, linux-wireless, spasswolf

On Sunday, 18.05.2025 at 16:15 +0200, Bert Karwatzki wrote:
> On Sunday, 18.05.2025 at 14:43 +0200, Bert Karwatzki wrote:
> > On Sunday, 18.05.2025 at 14:12 +0200, Bert Karwatzki wrote:
> > >
> > > I even tried this version of your patch, to keep the offset of skc_refcnt at 128,
> > > but it doesn't work, either.
> > >
> > > commit fca84c5cde713be480544a64ed6680afc3319670
> > > Author: Bert Karwatzki <spasswolf@web.de>
> > > Date:   Sun May 18 13:32:36 2025 +0200
> > >
> > >     include: net: sock: move skc_flags out of the union
> > >
> > >     Signed-off-by: Bert Karwatzki <spasswolf@web.de>
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 3e15d7105ad2..e73929a4da6e 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -195,7 +195,6 @@ struct sock_common {
> > >  	 * for different kind of 'sockets'
> > >  	 */
> > >  	union {
> > > -		unsigned long	skc_flags;
> > >  		struct sock	*skc_listener; /* request_sock */
> > >  		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
> > >  	};
> > > @@ -221,6 +220,9 @@ struct sock_common {
> > >  	};
> > >
> > >  	refcount_t		skc_refcnt;
> > > +
> > > +	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128 */
> > > +	unsigned long		skc_flags;
> > >  	/* private: */
> > >  	int			skc_dontcopy_end[0];
> > >  	union {
> > >
> >
> > In the patch above I accidentally put skc_flags in the part of struct sock_common
> > which does not get copied, but putting it below skc_dontcopy_end[0] does not work,
> > either:
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 3e15d7105ad2..6d69753a205a 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -195,7 +195,6 @@ struct sock_common {
> >  	 * for different kind of 'sockets'
> >  	 */
> >  	union {
> > -		unsigned long	skc_flags;
> >  		struct sock	*skc_listener; /* request_sock */
> >  		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
> >  	};
> > @@ -221,8 +220,12 @@ struct sock_common {
> >  	};
> >
> >  	refcount_t		skc_refcnt;
> > +
> >  	/* private: */
> >  	int			skc_dontcopy_end[0];
> > +	/* place skc_flags here to keep offset(struct sock, sk_refcnt) == 128
> > +	 * Also place it below skc_dontcopy_end[0] */
> > +	unsigned long		skc_flags;
> >  	union {
> >  		u32		skc_rxhash;
> >  		u32		skc_window_clamp;
> >
> > This locks up as usual.
> >
> > Bert Karwatzki
>
> So I did some more monitoring and found that even though skc_flags is removed from
> the union it can take strange values, e.g.:
>
> Here the value is not even a pointer (perhaps uninitialized memory?):
> [ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fc2abf4cc0 skb->sk->sk_flags = 0xa00f7fe57b16f7e1
> These could be pointers, but as pointers they would only be aligned to a 2-byte boundary ...
> [ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fbd0bd3210 skb->sk->sk_flags = 0xffffc0f1c62dcc4e
> [ T572] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff88fbd0bd3210 skb->sk->sk_flags = 0xffffc0f1c62dcc4e
>
> Bert Karwatzki

I tried to set sk_flags to 0 in sk_prot_alloc() like this:

commit 269f21266477e74321e32e0b022dda8e98785589 (HEAD -> clang_panic)
Author: Bert Karwatzki <spasswolf@web.de>
Date:   Sun May 18 16:28:39 2025 +0200

    net: core: sock: set initial sk_flags to 0 in sk_prot_alloc()

    Signed-off-by: Bert Karwatzki <spasswolf@web.de>

diff --git a/net/core/sock.c b/net/core/sock.c
index f6589ad5ba36..acaa39ad18be 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2216,6 +2216,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 			goto out_free_sec;
 	}

+	sk->sk_flags = 0;
 	return sk;

 out_free_sec:

But that didn't work:

[ 13.832282] [ T579] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8962805faee0 skb->sk->sk_flags = 0x4472000044f00000
[...]
[ 124.165094] [ T579] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff896280760550 skb->sk->sk_flags = 0x726f2e65746f7571
[...]
[ 185.138202] [ T579] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8960c78b7a90 skb->sk->sk_flags = 0x8000000000000025
[...]
[ 290.623998] [ T579] ieee80211_8023_xmit_clang_debug_helper: skb->sk = ffff8961936b7870 skb->sk->sk_flags = 0xffff8961936b78f0

Bert Karwatzki

^ permalink raw reply related [flat|nested] 20+ messages in thread
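Some of the sk_flags values logged in this message may be easier to interpret
when dumped byte by byte. The following userspace sketch (helper names are
hypothetical, not part of the kernel) renders a 64-bit value as its eight
bytes in little-endian memory order, as they would be laid out on x86-64, and
replaces non-printable bytes with '.'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Render the 8 bytes of v, least significant byte first (little-endian
 * memory order), into out; non-printable bytes become '.'.
 */
static void demo_u64_to_ascii(uint64_t v, char out[9])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++) {
		unsigned char c = (unsigned char)(v >> (8 * i));
		out[i] = isprint(c) ? (char)c : '.';
	}
	out[8] = '\0';
}

/* Convenience wrapper: does v decode to the expected 8-character string? */
static int demo_decodes_to(uint64_t v, const char *expected)
{
	char buf[9];

	demo_u64_to_ascii(v, buf);
	return strcmp(buf, expected) == 0;
}
```

For the value 0x726f2e65746f7571 from the log above, all eight bytes are
printable ASCII and read "quote.or". That might hint that sk_flags is being
read from memory holding unrelated data rather than a flags word, but this is
only a guess based on the value itself.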
* Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
@ 2025-05-13 22:15 Bert Karwatzki
0 siblings, 0 replies; 20+ messages in thread
From: Bert Karwatzki @ 2025-05-13 22:15 UTC (permalink / raw)
To: linux-kernel
Cc: Bert Karwatzki, linux-next, llvm, Johannes Berg, Thomas Gleixner
Commit 97f4b999e0c8 ("genirq: Use scoped_guard() to shut clang up") may have been
a false lead: I reverted the following commits in next-20250512 and the boot failure
is still there.
73e2e0671c90 (HEAD -> clang_panic) Revert "genirq/manage: Convert to lock guards"
ff2e5dfa1c21 Revert "genirq/manage: Rework irq_update_affinity_desc()"
f2be1d787117 Revert "genirq/manage: Rework __irq_apply_affinity_hint()"
bc2493e2bdef Revert "genirq/manage: Rework irq_set_vcpu_affinity()"
8c1736260f99 Revert "genirq/manage: Rework __disable_irq_nosync()"
dd529a9bc52d Revert "genirq/manage: Rework enable_irq()"
75316d9120cf Revert "genirq/manage: Rework irq_set_irq_wake()"
544ff63947f5 Revert "genirq/manage: Rework can_request_irq()"
198028713b99 Revert "genirq/manage: Rework irq_set_parent()"
70a3f6953491 Revert "genirq/manage: Rework enable_percpu_irq()"
bcb28ca2603d Revert "genirq/manage: Rework irq_percpu_is_enabled()"
5858d87ac7e3 Revert "genirq/manage: Rework disable_percpu_irq()"
1a1f97a3dde0 Revert "genirq/manage: Rework prepare_percpu_nmi()"
e249ccf0dde0 Revert "genirq/manage: Rework teardown_percpu_nmi()"
9be3639bdde9 Revert "genirq: Remove irq_[get|put]_desc*()"
942b93a1ee9c Revert "genirq/manage: Rework irq_get_irqchip_state()"
8f731e7b7475 Revert "genirq/manage: Rework irq_set_irqchip_state()"
5bb621187696 Revert "genirq: Use scoped_guard() to shut clang up"
6539255c6012 Revert "wifi: free SKBTX_WIFI_STATUS skb tx_flags flag"
edef45700477 (tag: next-20250512, origin/master, origin/HEAD, master) Add linux-next specific files for 20250512
Bert Karwatzki
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-05-18 14:41 UTC | newest]
Thread overview: 20+ messages
2025-05-13 16:48 lockup and kernel panic in linux-next-202505{09,12} when compiled with clang Bert Karwatzki
2025-05-13 22:33 ` Thomas Gleixner
2025-05-14 0:11 ` Bert Karwatzki
2025-05-14 9:32 ` Bert Karwatzki
2025-05-14 10:23 ` Johannes Berg
2025-05-14 13:46 ` Bert Karwatzki
2025-05-14 17:49 ` Johannes Berg
2025-05-14 18:56 ` Johannes Berg
2025-05-14 22:27 ` Bert Karwatzki
2025-05-15 6:30 ` Johannes Berg
2025-05-15 9:10 ` Bert Karwatzki
2025-05-16 18:19 ` Bert Karwatzki
2025-05-17 11:34 ` Bert Karwatzki
2025-05-17 19:49 ` Bert Karwatzki
2025-05-18 1:30 ` Jason Xing
2025-05-18 12:12 ` Bert Karwatzki
2025-05-18 12:43 ` Bert Karwatzki
2025-05-18 14:15 ` Bert Karwatzki
2025-05-18 14:41 ` Bert Karwatzki
-- strict thread matches above, loose matches on Subject: below --
2025-05-13 22:15 Bert Karwatzki