From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Mingcong Bai <jeffbai@aosc.io>,
Thorsten Leemhuis <regressions@leemhuis.info>,
Linux regressions mailing list <regressions@lists.linux.dev>,
LKML <linux-kernel@vger.kernel.org>, rcu <rcu@vger.kernel.org>,
sakiiily@aosc.io, Kexy Biscuit <kexybiscuit@aosc.io>
Subject: Re: [Regression] wifi problems since tg3 started throwing rcu stall warnings
Date: Tue, 12 Nov 2024 13:50:40 +0100
Message-ID: <ZzNPIOR8aaxfrLE2@localhost.localdomain>
In-Reply-To: <814ca9e3-df3b-45ce-ad36-9659b445c499@paulmck-laptop>
On Fri, Nov 08, 2024 at 07:14:41AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 08, 2024 at 02:46:16PM +0100, Frederic Weisbecker wrote:
> > On Fri, Nov 08, 2024 at 12:29:40AM +0800, Mingcong Bai wrote:
> > > Hi Frederic,
> > >
> > > <snip>
> > >
> > > > Sorry for the lag, I still don't understand how this specific commit
> > > > can produce this issue. Can you please retry with and without this
> > > > commit
> > > > reverted?
> > >
> > > Just tested v6.12-rc6 with and without the revert. Without the revert, the
> > > touchpad and the wireless adapter both stopped working, whereas with the
> > > revert, both devices function as normal.
> > >
> > > I have attached the dmesg for both kernels below. Unlike the log we got last
> > > time, there is no direct reference to tg3 any more, but the NMI backtrace
> > > still pointed to NetworkManager and net/netlink-related functions (perhaps a
> > > debug kernel would be more helpful?). Here's a snippet:
> > >
> > > [ 10.337720] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P683 } 21 jiffies s: 781 root: 0x0/T
> > > [ 10.339168] rcu: blocking rcu_node structures (internal RCU debug):
> > > [ 10.591480] loop0: detected capacity change from 0 to 8
> > > [ 11.777733] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 21 jiffies s: 1077 root: 0x8/.
> > > [ 11.779210] rcu: blocking rcu_node structures (internal RCU debug):
> > > [ 11.780630] Sending NMI from CPU 1 to CPUs 3:
> > > [ 11.780659] NMI backtrace for cpu 3
> > > [ 11.780663] CPU: 3 UID: 0 PID: 1027 Comm: NetworkManager Not tainted 6.12.0-aosc-main #1
> >
> > Funny, this happens on bootup and no CPU has ever gone offline, so the path
> > modified by this patch shouldn't have been taken. And yet this commit has
> > an influence to the point of reliably triggering that stall.
> >
> > I'm running out of ideas. Paul, any clue?
>
> Here is one straw to grasp at...
>
> Is it possible that one of the CPUs had a problem coming online at boot,
> and therefore backed out of the online process, thus appearing to at
> least some of the CPU-hotplug notifiers to have gone offline?

I looked for it in the dmesg and there are indeed rejected CPUs, but very early,
before secondary boot-up.

Just in case: Mingcong Bai, can you test the following patch without the
revert and see if it triggers something?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35949ec1f935..b4f8ed8138d3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5170,6 +5170,7 @@ void rcutree_migrate_callbacks(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	bool needwake;
 
+	WARN_ON_ONCE(1);
 	if (rcu_rdp_is_offloaded(rdp))
 		return;
 
Thanks.
>
> Thanx, Paul
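
For readers unfamiliar with the probe in the patch above: WARN_ON_ONCE(1) at
the top of rcutree_migrate_callbacks() logs a warning the first time the
function is entered, which would show whether that path runs at all during
boot. Below is a minimal userspace sketch of that one-shot behaviour. It is
not the kernel macro: struct warn_site, warn_on_once_sketch() and
migrate_callbacks_sketch() are hypothetical names invented for illustration,
and the hit counter is an addition the kernel macro does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a one-shot warning site (not kernel code). */
struct warn_site {
	const char *where;	/* label printed with the warning */
	bool warned;		/* set once the first warning has been printed */
	unsigned int hits;	/* illustrative only: times the condition was true */
};

/*
 * Print a warning the first time cond is true at this site and stay
 * quiet on later hits. Returns cond so callers can still branch on it.
 */
static bool warn_on_once_sketch(struct warn_site *site, bool cond)
{
	assert(site != NULL);

	if (!cond)
		return false;

	site->hits++;
	if (!site->warned) {
		site->warned = true;
		fprintf(stderr, "WARNING: reached %s\n", site->where);
	}
	return true;
}

/* Stand-in for the instrumented function: the condition is always true. */
static void migrate_callbacks_sketch(struct warn_site *site)
{
	warn_on_once_sketch(site, true);
	/* ... the real work of the function would follow here ... */
}
```

Calling migrate_callbacks_sketch() several times on the same site prints a
single warning line. In the real test, one warning with a backtrace in dmesg
would mean the supposedly untaken path was entered, and no warning would mean
it never ran.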