From: Bert Karwatzki <spasswolf@web.de>
To: Johannes Berg <johannes@sipsolutions.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
Thomas Gleixner <tglx@linutronix.de>,
linux-wireless@vger.kernel.org, spasswolf@web.de
Subject: Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
Date: Thu, 15 May 2025 11:10:31 +0200
Message-ID: <ba97a2559cda1b14e0c9754523ff1152bdad90ef.camel@web.de>
In-Reply-To: <8684a2b4bf367e2e2a97e2b52356ffe5436a8270.camel@sipsolutions.net>
On Thursday, 15.05.2025 at 08:30 +0200, Johannes Berg wrote:
> On Thu, 2025-05-15 at 00:27 +0200, Bert Karwatzki wrote:
> > On Wednesday, 14.05.2025 at 20:56 +0200, Johannes Berg wrote:
> > > >
> > > > I've split off the problematic piece of code into a noinline function to simplify the disassembly:
> > > >
> > >
> > > Oh and also, does it even still crash with that? :)
> >
> > Yes, it still crashes when compiled with clang.
>
> OK, just checking. :)
To be more precise, I need both clang and PREEMPT_RT=y to get a crash.
>
> FWIW, I'm not convinced at all that the code you were looking at is
> really the problem. The crash (see below) is happening on the status
> side. Of course it cannot crash on the status side if on the TX side we
> never enter anything into the IDR data structure, and never tag the SKB
> to look up in the IDR and therefore never try to create the status
> report on the status side.
After looking at the backtrace, I'm also no longer convinced that this piece of
code is the problem.
>
> Basically what happens is this:
>
> - on TX, if we have a socket requesting status, create a copy of the
> SKB, put it into the IDR, and put the IDR index into the original
> skb->cb
> - then transmit the original skb, of course
> - on TX status report from the driver, see if the skb->cb is tagged with
> the IDR value, if so, report the copy of the SKB back to the socket
> with the status information
>
> (The reason we need to make a copy is that the SKB could be encrypted or
> otherwise modified in flight, and we don't want to undo that, rather
> keeping a copy for the report.)
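The flow described above can be modelled in a few lines of userspace Python.
This is only a toy sketch to make the three steps concrete, not mac80211 code:
the names Skb, Socket, StatusTracker, ack_id and error_queue are hypothetical,
a plain dict stands in for the IDR, and the XOR step merely stands in for
"encrypted or otherwise modified in flight".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Skb:
    payload: bytes
    ack_id: int = 0  # hypothetical stand-in for the tag in skb->cb; 0 = untagged


@dataclass
class Socket:
    wants_status: bool = False  # stand-in for a socket that requested TX status
    error_queue: List[Tuple[bytes, bool]] = field(default_factory=list)


class StatusTracker:
    """Toy model: a dict plays the role of the IDR."""

    def __init__(self) -> None:
        self._pending: Dict[int, Tuple[Skb, Socket]] = {}
        self._next_id = 1

    def tx(self, skb: Skb, sock: Optional[Socket]) -> Skb:
        # Step 1: if a socket requested status, keep an unmodified copy
        # and tag the original with the index of that copy.
        if sock is not None and sock.wants_status:
            ack_id = self._next_id
            self._next_id += 1
            self._pending[ack_id] = (Skb(skb.payload), sock)
            skb.ack_id = ack_id
        # Step 2: transmit the original; it may be modified in flight.
        skb.payload = bytes(b ^ 0xFF for b in skb.payload)
        return skb

    def tx_status(self, skb: Skb, acked: bool) -> None:
        # Step 3: on status, an untagged skb produces no report at all.
        if skb.ack_id == 0:
            return
        entry = self._pending.pop(skb.ack_id, None)
        if entry is None:
            return
        copy, sock = entry
        # Report the untouched copy, not the modified original.
        sock.error_queue.append((copy.payload, acked))


tracker = StatusTracker()
sock = Socket(wants_status=True)
sent = tracker.tx(Skb(b"hello"), sock)
tracker.tx_status(sent, acked=True)
```

In this model the pending entry itself keeps the socket object alive until the
status arrives. Whether the equivalent guarantee holds on the kernel path is
exactly what the backtrace below calls into question.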
>
> > [ 267.339591][ T575] BUG: unable to handle page fault for address: ffffffff51e080b0
> > [ 267.339598][ T575] #PF: supervisor write access in kernel mode
> > [ 267.339602][ T575] #PF: error_code(0x0002) - not-present page
> > [ 267.339606][ T575] PGD f1cc3c067 P4D f1cc3c067 PUD 0
> > [ 267.339613][ T575] Oops: Oops: 0002 [#1] SMP NOPTI
> > [ 267.339622][ T575] CPU: 0 UID: 0 PID: 575 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #968 PREEMPT_{RT,(full)}
> > [ 267.339629][ T575] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
> > [ 267.339632][ T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
> ...
> > [ 267.339692][ T575] Call Trace:
> > [ 267.339701][ T575] <TASK>
> > [ 267.339705][ T575] _raw_spin_lock_irqsave+0x57/0x60
> > [ 267.339714][ T575] rt_spin_lock+0x73/0xa0
> > [ 267.339720][ T575] sock_queue_err_skb+0xdc/0x140
> > [ 267.339727][ T575] skb_complete_wifi_ack+0xa9/0x120
> > [ 267.339737][ T575] ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
> > [ 267.339799][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 267.339804][ T575] ? start_dl_timer+0xcf/0x110
> > [ 267.339814][ T575] ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
> > [ 267.339851][ T575] ? raw_spin_rq_lock_nested+0x15/0x80
> > [ 267.339862][ T575] ? srso_alias_return_thunk+0x5/0xfbef5
> > [ 267.339866][ T575] ? rt_spin_lock+0x3d/0xa0
> > [ 267.339873][ T575] ? mt76_tx_status_unlock+0x38/0x230 [mt76]
> > [ 267.339886][ T575] mt76_tx_status_unlock+0x1e0/0x230 [mt76]
>
> Yeah so that's the crash on the status report as explained above, it
> kind of looks almost like the skb->sk was freed and somehow invalid now?
> But I don't see a general issue here (will keep digging), and how come
> it only shows up with clang?
>
> Since it reproduces pretty reliably, maybe you could do with KASAN?
>
I'm currently doing a test run with KASAN enabled. It has been running for ~1h
so far (without KASAN the maximum time to a crash was about 10min), so KASAN is
probably hiding the bug (there are no messages from KASAN in dmesg).
> Also could be interesting - what userspace are you running with wifi?
> What tool is even setting up the wifi status? If you don't really know
> maybe just put WARN_ON(1) into net/core/sock.c where SO_WIFI_STATUS is
> written (sk_setsockopt).
>
> johannes
To record these backtraces I disabled wifi just after booting (it usually takes
~5s to connect here) with NetworkManager (nmcli) from Debian sid (last updated
on 20250511, before I encountered this bug):
$ nmcli radio wifi off
Then I set up netconsole, re-enabled wifi, and waited for the crash:
$ nmcli radio wifi on
Bert Karwatzki