public inbox for linux-next@vger.kernel.org
From: Bert Karwatzki <spasswolf@web.de>
To: Johannes Berg <johannes@sipsolutions.net>,
	"linux-kernel@vger.kernel.org"	 <linux-kernel@vger.kernel.org>
Cc: "linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
	 "llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	Thomas Gleixner <tglx@linutronix.de>,
	 linux-wireless@vger.kernel.org, spasswolf@web.de
Subject: Re: lockup and kernel panic in linux-next-202505{09,12} when compiled with clang
Date: Thu, 15 May 2025 11:10:31 +0200
Message-ID: <ba97a2559cda1b14e0c9754523ff1152bdad90ef.camel@web.de>
In-Reply-To: <8684a2b4bf367e2e2a97e2b52356ffe5436a8270.camel@sipsolutions.net>

On Thursday, 15.05.2025 at 08:30 +0200, Johannes Berg wrote:
> On Thu, 2025-05-15 at 00:27 +0200, Bert Karwatzki wrote:
> > On Wednesday, 14.05.2025 at 20:56 +0200, Johannes Berg wrote:
> > > > 
> > > > I've split off the problematic piece of code into a noinline function to simplify the disassembly:
> > > > 
> > > 
> > > Oh and also, does it even still crash with that? :)
> > 
> > Yes, it still crashes when compiled with clang.
> 
> OK, just checking. :)

To be more precise, I need both clang AND PREEMPT_RT=y to get a crash.

> 
> FWIW, I'm not convinced at all that the code you were looking at is
> really the problem. The crash (see below) is happening on the status
> side. Of course it cannot crash on the status side if on the TX side we
> never enter anything into the IDR data structure, and never tag the SKB
> to look up in the IDR and therefore never try to create the status
> report on the status side.

After looking at the backtrace, I'm also no longer convinced that this piece of
code is the problem.

> 
> Basically what happens is this:
> 
> - on TX, if we have a socket requesting status, create a copy of the
>   SKB, put it into the IDR, and put the IDR index into the original
>   skb->cb
> - then transmit the original skb, of course
> - on TX status report from the driver, see if the skb->cb is tagged with
>   the IDR value, if so, report the copy of the SKB back to the socket
>   with the status information
> 
> (The reason we need to make a copy is that the SKB could be encrypted or
> otherwise modified in flight, and we don't want to undo that, rather
> keeping a copy for the report.)
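
As a rough illustration of the flow described above, here is a userspace toy
model. It is not the mac80211 code: toy_skb, toy_tx, toy_tx_status and the
fixed-size pending[] table (standing in for the IDR) are all made-up names,
and locking, sockets and error queues are left out entirely.

```c
/* Userspace toy model of the TX-status flow; none of these names exist in the kernel. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PENDING 16

struct toy_skb {
	int ack_id;              /* stands in for the IDR index kept in skb->cb; 0 = untagged */
	size_t len;
	unsigned char data[64];
};

/* Stand-in for the IDR: slot i holds the copy registered under id i. */
static struct toy_skb *pending[MAX_PENDING];

/* TX side: if status was requested, keep a copy and tag the original. */
int toy_tx(struct toy_skb *skb, int status_requested)
{
	skb->ack_id = 0;
	if (!status_requested)
		return 0;

	for (int id = 1; id < MAX_PENDING; id++) {
		if (!pending[id]) {
			struct toy_skb *copy = malloc(sizeof(*copy));

			if (!copy)
				return -1;
			*copy = *skb;        /* snapshot before any in-flight modification */
			pending[id] = copy;
			skb->ack_id = id;    /* tag the original with the table index */
			return 0;
		}
	}
	return -1;                           /* no free id */
}

/* Status side: return the saved copy (caller frees), or NULL if untagged. */
struct toy_skb *toy_tx_status(const struct toy_skb *skb)
{
	int id = skb->ack_id;
	struct toy_skb *copy;

	if (id <= 0 || id >= MAX_PENDING)
		return NULL;

	copy = pending[id];
	pending[id] = NULL;                  /* drop the entry so the id can be reused */
	return copy;
}
```

In this model an untagged skb (ack_id == 0) never reaches the lookup, which
matches the point that the status side cannot be involved unless the TX side
entered something into the table first.
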
> 
> >  [  267.339591][  T575] BUG: unable to handle page fault for address: ffffffff51e080b0
> >  [  267.339598][  T575] #PF: supervisor write access in kernel mode
> >  [  267.339602][  T575] #PF: error_code(0x0002) - not-present page
> >  [  267.339606][  T575] PGD f1cc3c067 P4D f1cc3c067 PUD 0 
> >  [  267.339613][  T575] Oops: Oops: 0002 [#1] SMP NOPTI
> >  [  267.339622][  T575] CPU: 0 UID: 0 PID: 575 Comm: napi/phy0-0 Not tainted 6.15.0-rc6-next-20250513-llvm-00009-gec34cd07a425 #968 PREEMPT_{RT,(full)}
> >  [  267.339629][  T575] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
> >  [  267.339632][  T575] RIP: 0010:queued_spin_lock_slowpath+0x120/0x1c0
> ...
> > [  267.339692][  T575] Call Trace:
> >  [  267.339701][  T575]  <TASK>
> >  [  267.339705][  T575]  _raw_spin_lock_irqsave+0x57/0x60
> >  [  267.339714][  T575]  rt_spin_lock+0x73/0xa0
> >  [  267.339720][  T575]  sock_queue_err_skb+0xdc/0x140
> >  [  267.339727][  T575]  skb_complete_wifi_ack+0xa9/0x120
> >  [  267.339737][  T575]  ieee80211_report_used_skb+0x541/0x6e0 [mac80211]
> >  [  267.339799][  T575]  ? srso_alias_return_thunk+0x5/0xfbef5
> >  [  267.339804][  T575]  ? start_dl_timer+0xcf/0x110
> >  [  267.339814][  T575]  ieee80211_tx_status_ext+0x3b3/0x870 [mac80211]
> >  [  267.339851][  T575]  ? raw_spin_rq_lock_nested+0x15/0x80
> >  [  267.339862][  T575]  ? srso_alias_return_thunk+0x5/0xfbef5
> >  [  267.339866][  T575]  ? rt_spin_lock+0x3d/0xa0
> >  [  267.339873][  T575]  ? mt76_tx_status_unlock+0x38/0x230 [mt76]
> >  [  267.339886][  T575]  mt76_tx_status_unlock+0x1e0/0x230 [mt76]
> 
> Yeah so that's the crash on the status report as explained above, it
> kind of looks almost like the skb->sk was freed and somehow invalid now?
> But I don't see a general issue here (will keep digging), and how come
> it only shows up with clang?
> 
> Since it reproduces pretty reliably, maybe you could do with KASAN?
> 

I'm currently doing a test run with KASAN enabled. It has been running for ~1h
so far (without KASAN the longest time to a crash was about 10min), so KASAN is
probably masking the bug (there are no KASAN messages in dmesg).

> Also could be interesting - what userspace are you running with wifi?
> What tool is even setting up the wifi status? If you don't really know
> maybe just put WARN_ON(1) into net/core/sock.c where SO_WIFI_STATUS is
> written (sk_setsockopt).
>
> johannes
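
For context, a userspace program asks for these status reports by setting
SO_WIFI_STATUS on a socket, which is the path that ends up in sk_setsockopt.
A minimal sketch follows; request_wifi_status is a made-up helper name, and
the fallback value 41 is an assumption taken from asm-generic/socket.h (some
architectures use a different number). Which program does this on the
affected system is not known from this thread.

```c
#include <sys/socket.h>

#ifndef SO_WIFI_STATUS
#define SO_WIFI_STATUS 41	/* assumption: asm-generic value, differs on some arches */
#endif

/*
 * Hypothetical helper: ask the kernel to report wifi TX status for
 * packets sent on this socket. Returns 0 on success, -1 on error
 * (errno is set by setsockopt).
 */
int request_wifi_status(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_SOCKET, SO_WIFI_STATUS, &one, sizeof(one));
}
```

A WARN_ON(1) at the SO_WIFI_STATUS case in sk_setsockopt would then print a
backtrace naming whichever process makes a call like this.
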

To record these backtraces I disabled wifi right after booting (it usually
takes ~5s to connect here) using NetworkManager (nmcli) from Debian sid (last
updated on 20250511, before I encountered this bug):
$ nmcli radio wifi off
Then I set up netconsole, re-enabled wifi, and waited for the crash:
$ nmcli radio wifi on

Bert Karwatzki

