public inbox for linux-kernel@vger.kernel.org
From: Joel Fernandes <joelagnelf@nvidia.com>
To: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	ankur.a.arora@oracle.com,
	Frederic Weisbecker <frederic@kernel.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	neeraj.upadhyay@kernel.org, urezki@gmail.com,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	xiqi2@huawei.com,
	"Wangshaobo (bobo)" <bobo.shaobowang@huawei.com>
Subject: Re: [QUESTION] problems report: rcu_read_unlock_special() called in irq_exit() causes dead loop
Date: Tue, 3 Jun 2025 21:35:26 -0400
Message-ID: <20250604013526.GA1192922@joelnvbox>
In-Reply-To: <a82784fd-d51e-4ea2-9d5c-43db971a3074@nvidia.com>

On Tue, Jun 03, 2025 at 03:22:42PM -0400, Joel Fernandes wrote:
> 
> 
> On 6/3/2025 3:03 PM, Joel Fernandes wrote:
> > 
> > 
> > On 6/3/2025 2:59 PM, Joel Fernandes wrote:
> >> On Fri, May 30, 2025 at 09:55:45AM +0800, Xiongfeng Wang wrote:
> >>> Hi Joel,
> >>>
> >>> On 2025/5/29 0:30, Joel Fernandes wrote:
> >>>> On Wed, May 21, 2025 at 5:43 AM Xiongfeng Wang
> >>>> <wangxiongfeng2@huawei.com> wrote:
> >>>>>
> >>>>> Hi RCU experts,
> >>>>>
> >>>>> When I ran syzkaller on Linux 6.6 with CONFIG_PREEMPT_RCU enabled, I got
> >>>>> the following soft lockup. The call trace is too long, so I put it at the end.
> >>>>> The issue can also be reproduced on the latest kernel.
> >>>>>
> >>>>> The issue is as follows. CPU3 is waiting for a spin_lock that is held by CPU1,
> >>>>> but CPU1 is stuck in the following dead loop.
> >>>>>
> >>>>> irq_exit()
> >>>>>   __irq_exit_rcu()
> >>>>>     /* in_hardirq() returns false after this */
> >>>>>     preempt_count_sub(HARDIRQ_OFFSET)
> >>>>>     tick_irq_exit()
> >>>>>       tick_nohz_irq_exit()
> >>>>>             tick_nohz_stop_sched_tick()
> >>>>>               trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> >>>>>                    __bpf_trace_tick_stop()
> >>>>>                       bpf_trace_run2()
> >>>>>                             rcu_read_unlock_special()
> >>>>>                               /* will send an IPI to itself */
> >>>>>                               irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> >>>>>
> >>>>> /* after interrupts are enabled again, the irq_work is called */
> >>>>> asm_sysvec_irq_work()
> >>>>>   sysvec_irq_work()
> >>>>> irq_exit() /* after handling the irq_work, we enter irq_exit() again */
> >>>>>   __irq_exit_rcu()
> >>>>>     ...skip...
> >>>>>            /* we queue an irq_work again, and enter a dead loop */
> >>>>>            irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> >>>>

The following is a candidate fix (one of several being considered and
discussed). It checks whether context tracking thinks we are still in an
IRQ and, if so, skips queuing the irq_work. IMO, this case should be rare
enough that skipping it shouldn't be an issue, and repeatedly self-IPIing
while we're exiting an IRQ is dangerous anyway.
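
For illustration only, the loop and the effect of the guard can be modeled
in user space. This is a rough sketch, not kernel code: struct sim_cpu,
sim_unlock_special(), sim_irq() and sim_run() are hypothetical names made up
for the model. It assumes that every IRQ exit path ends an RCU reader with a
deferred quiescent state outstanding, as in the reported trace where a BPF
program is hooked on trace_tick_stop(). The irq_nesting field only stands in
for what ct_nmi_nesting() reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct sim_cpu {
	int irq_nesting;	/* stands in for ct_nmi_nesting() */
	bool iw_pending;	/* stands in for rdp->defer_qs_iw_pending */
};

/* Models only the irq_work decision in rcu_read_unlock_special(). */
static void sim_unlock_special(struct sim_cpu *c, bool guard)
{
	/* With the guard, skip the self-IPI while still inside an IRQ. */
	bool in_irq = guard && c->irq_nesting != 0;

	if (!c->iw_pending && !in_irq)
		c->iw_pending = true;	/* models queuing a self-IPI */
}

/* One interrupt: the handler runs, then the exit path ends an RCU reader. */
static void sim_irq(struct sim_cpu *c, bool guard)
{
	c->irq_nesting++;		/* models IRQ entry */
	c->iw_pending = false;		/* the queued irq_work has now run */
	sim_unlock_special(c, guard);	/* tracepoint on the exit path */
	c->irq_nesting--;		/* models the end of IRQ exit */
}

/* Returns how many interrupts were taken, capped at max_irqs. */
static int sim_run(bool guard, int max_irqs)
{
	struct sim_cpu c = { 0, false };
	int taken = 0;

	assert(max_irqs > 0);
	do {
		sim_irq(&c, guard);
		taken++;
	} while (c.iw_pending && taken < max_irqs);

	return taken;
}
```

In this model, sim_run(false, 1000) keeps re-queuing the work and only stops
at the cap, while sim_run(true, 1000) stops after the first interrupt. It
says nothing about whether skipping the irq_work delays the quiescent state
in the real kernel.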

Thoughts? Xiongfeng, do you want to try it?

Btw, I could easily reproduce it as a boot hang by doing:

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -638,6 +638,10 @@ void irq_enter(void)
 
 static inline void tick_irq_exit(void)
 {
+	rcu_read_lock();
+	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
+	rcu_read_unlock();
+
 #ifdef CONFIG_NO_HZ_COMMON
 	int cpu = smp_processor_id();
 
---8<-----------------------

From: Joel Fernandes <joelagnelf@nvidia.com>
Subject: [PATCH] Do not schedule irq_work when IRQ is exiting

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 include/linux/context_tracking_irq.h |  2 ++
 kernel/context_tracking.c            | 12 ++++++++++++
 kernel/rcu/tree_plugin.h             |  3 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
index 197916ee91a4..35a5ad971514 100644
--- a/include/linux/context_tracking_irq.h
+++ b/include/linux/context_tracking_irq.h
@@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
 void ct_irq_exit_irqson(void);
 void ct_nmi_enter(void);
 void ct_nmi_exit(void);
+bool ct_in_irq(void);
 #else
 static __always_inline void ct_irq_enter(void) { }
 static __always_inline void ct_irq_exit(void) { }
@@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
 static inline void ct_irq_exit_irqson(void) { }
 static __always_inline void ct_nmi_enter(void) { }
 static __always_inline void ct_nmi_exit(void) { }
+static inline bool ct_in_irq(void) { return false; }
 #endif
 
 #endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index fb5be6e9b423..8e8055cf04af 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
 	ct_nmi_exit();
 }
 
+/**
+ * ct_in_irq - check if CPU is currently in a tracked IRQ context.
+ *
+ * Returns true if ct_irq_enter() has been called and ct_irq_exit()
+ * has not yet been called. This indicates the CPU is currently
+ * processing an interrupt.
+ */
+bool ct_in_irq(void)
+{
+	return ct_nmi_nesting() != 0;
+}
+
 /*
  * Wrapper for ct_irq_enter() where interrupts are enabled.
  *
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3c0bbbbb686f..a3eebd4c841e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -673,7 +673,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
+			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
+			    !ct_in_irq()) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
 				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
-- 
2.34.1

