From: Frederic Weisbecker <frederic@kernel.org>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: "Michał Pecio" <michal.pecio@gmail.com>,
anna-maria@linutronix.de, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
mingo@kernel.org, tglx@linutronix.de,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Hayes Wang" <hayeswang@realtek.com>,
"Simon Horman" <horms@kernel.org>,
netdev@vger.kernel.org
Subject: Re: NOHZ tick-stop error: local softirq work is pending, handler #08!!! on Dell XPS 13 9360
Date: Tue, 11 Feb 2025 16:02:05 +0100 [thread overview]
Message-ID: <Z6tmbdl646D_UjrY@localhost.localdomain> (raw)
In-Reply-To: <10de7289-653f-43b1-ad46-2e8a0cd42724@molgen.mpg.de>
Le Tue, Feb 11, 2025 at 12:57:33PM +0100, Paul Menzel a écrit :
> Dear Frederic,
>
>
> Thank you for your reply.
>
>
> Am 10.02.25 um 14:26 schrieb Frederic Weisbecker:
> > Le Mon, Feb 10, 2025 at 12:59:42PM +0100, Paul Menzel a écrit :
>
> > > Am 10.02.25 um 12:45 schrieb Michał Pecio:
> > >
> > > > > > > > > > > On Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022, with Linux
> > > > > > > > > > > 6.9-rc2+
> > > >
> > > > > Just for the record, I am still seeing this with 6.14.0-rc1
> > > >
> > > > Is this a regression? If so, which versions were not affected?
> > >
> > > Unfortunately, I do not know. Right now, my logs go back until September
> > > 2024.
> > >
> > > Sep 22 13:08:04 abreu kernel: Linux version 6.11.0-07273-g1e7530883cd2 (build@bohemianrhapsody.molgen.mpg.de) (gcc (Debian 14.2.0-5) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.1) #12 SMP PREEMPT_DYNAMIC Sun Sep 22 09:57:36 CEST 2024
> > >
> > > > How hard to reproduce? Wasn't it during resume from hibernation?
> > >
> > > It’s not easy to reproduce, and I believe it’s not related with resuming
> > > from hibernation (which I do not use) or ACPI S3 suspend. I think, I can
> > > force it more, when having the USB-C adapter with only the network cable
> > > plugged into it, and then running `sudo powertop --auto-tune`. But sometimes
> > > it seems unrelated.
> > >
> > > > IRQ isuses may be a red herring, this code here is a busy wait under
> > > > spinlock. There are a few of those, they cause various problems.
> > > >
> > > > if (xhci_handshake(&xhci->op_regs->status,
> > > > STS_RESTORE, 0, 100 * 1000)) {
> > > > xhci_warn(xhci, "WARN: xHC restore state timeout\n");
> > > > spin_unlock_irq(&xhci->lock);
> > > > return -ETIMEDOUT;
> > > > }
> > > >
> > > > This thing timing out may be close to the root cause of everything.
> > >
> > > Interesting. Hopefully the USB folks have an idea.
> >
> > Handler #08 is NET_RX. So something raised the NET_RX on some non-appropriate
> > place, perhaps...
> >
> > Can I ask you one more trace dump?
> >
> > I need:
> >
> > echo 1 > /sys/kernel/tracing/events/irq/softirq_raise/enable
> > echo 1 > /sys/kernel/tracing/options/stacktrace
> >
> > Unfortunately this will also involve a small patch:
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index fa058510af9c..accd2eb8c927 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1159,6 +1159,9 @@ static bool report_idle_softirq(void)
> > if (local_bh_blocked())
> > return false;
> > + trace_printk("STOP\n");
> > + trace_dump_stack(0);
> > + tracing_off();
> > pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
> > pending);
> > ratelimit++;
>
> Thank you for your help. I applied the patch on top of 6.14-rc2, and was
> able to reproduce the issue. Please find the Linux messages attached, and
> the trace can be downloaded [1].
So here is the offender:
=> __raise_softirq_irqoff
=> __napi_schedule
=> rtl8152_runtime_resume.isra.0
=> rtl8152_resume
=> usb_resume_interface.isra.0
=> usb_resume_both
=> __rpm_callback
=> rpm_callback
=> rpm_resume
=> __pm_runtime_resume
=> usb_autoresume_device
=> usb_remote_wakeup
=> hub_event
=> process_one_work
=> worker_thread
=> kthread
=> ret_from_fork
=> ret_from_fork_asm
It is calling napi_schedule() from a non-interrupt. And since
____napi_schedule() assumes to be called from an interrupt, it
raises the softirq accordingly without waking up ksoftirqd.
Can you try the following fix (untested, sorry...) ?
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 468c73974046..8f6ea4e7685c 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8537,8 +8537,11 @@ static int rtl8152_runtime_resume(struct r8152 *tp)
clear_bit(SELECTIVE_SUSPEND, &tp->flags);
smp_mb__after_atomic();
- if (!list_empty(&tp->rx_done))
+ if (!list_empty(&tp->rx_done)) {
+ local_bh_disable();
napi_schedule(&tp->napi);
+ local_bh_enable();
+ }
usb_submit_urb(tp->intr_urb, GFP_NOIO);
} else {
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 67964dc4db95..1bd730b881f0 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -619,6 +619,17 @@ do { \
(!in_softirq() || in_irq() || in_nmi())); \
} while (0)
+/*
+ * Assert to be either in hardirq or in serving softirq or with
+ * softirqs disabled. Verifies a safe context to queue a softirq
+ * with __raise_softirq_irqoff().
+ */
+#define lockdep_assert_in_interrupt() \
+do { \
+ WARN_ON_ONCE(__lockdep_enabled && !in_interrupt()); \
+} while (0)
+
+
extern void lockdep_assert_in_softirq_func(void);
#else
@@ -634,6 +645,7 @@ extern void lockdep_assert_in_softirq_func(void);
# define lockdep_assert_preemption_enabled() do { } while (0)
# define lockdep_assert_preemption_disabled() do { } while (0)
# define lockdep_assert_in_softirq() do { } while (0)
+# define lockdep_assert_in_interrupt() do { } while (0)
# define lockdep_assert_in_softirq_func() do { } while (0)
#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index c0021cbd28fc..80e415ccf2c8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4666,6 +4666,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
struct task_struct *thread;
lockdep_assert_irqs_disabled();
+ lockdep_assert_in_interrupt();
if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
/* Paired with smp_mb__before_atomic() in
next prev parent reply other threads:[~2025-02-11 15:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-04 7:28 NOHZ tick-stop error: local softirq work is pending, handler #08!!! on Dell XPS 13 9360 Paul Menzel
2024-04-08 10:10 ` Anna-Maria Behnsen
2024-04-08 11:18 ` Paul Menzel
2024-04-09 7:57 ` Anna-Maria Behnsen
2024-08-21 8:20 ` Paul Menzel
2024-08-21 21:01 ` Paul Menzel
2024-10-24 12:48 ` Paul Menzel
2025-02-10 7:55 ` Paul Menzel
2025-02-10 11:45 ` Michał Pecio
2025-02-10 11:59 ` Paul Menzel
2025-02-10 13:26 ` Frederic Weisbecker
2025-02-11 11:57 ` Paul Menzel
2025-02-11 15:02 ` Frederic Weisbecker [this message]
2025-02-13 17:07 ` Paul Menzel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6tmbdl646D_UjrY@localhost.localdomain \
--to=frederic@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=anna-maria@linutronix.de \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hayeswang@realtek.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=michal.pecio@gmail.com \
--cc=mingo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pmenzel@molgen.mpg.de \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox