bpf.vger.kernel.org archive mirror
* [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ
@ 2025-06-09 18:01 Joel Fernandes
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Joel Fernandes @ 2025-06-09 18:01 UTC (permalink / raw)
  To: linux-kernel, Frederic Weisbecker, Paul E. McKenney
  Cc: Xiongfeng Wang, rcu, bpf, Joel Fernandes

context_tracking keeps track of whether we're handling an IRQ well after
the preempt masks take it off their books. We need this functionality in
a follow-up patch to fix a bug. Provide a helper API for this.
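
For illustration, a minimal sketch of the kind of check this enables (the
actual user is the RCU change in the next patch, which guards the self-IPI
path; rdp->defer_qs_iw is the deferred-QS irq_work used there):

	/* Only queue the self-IPI when context-tracking says we are not in an IRQ. */
	if (!ct_in_irq())
		irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);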

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 include/linux/context_tracking_irq.h |  2 ++
 kernel/context_tracking.c            | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
index 197916ee91a4..35a5ad971514 100644
--- a/include/linux/context_tracking_irq.h
+++ b/include/linux/context_tracking_irq.h
@@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
 void ct_irq_exit_irqson(void);
 void ct_nmi_enter(void);
 void ct_nmi_exit(void);
+bool ct_in_irq(void);
 #else
 static __always_inline void ct_irq_enter(void) { }
 static __always_inline void ct_irq_exit(void) { }
@@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
 static inline void ct_irq_exit_irqson(void) { }
 static __always_inline void ct_nmi_enter(void) { }
 static __always_inline void ct_nmi_exit(void) { }
+static inline bool ct_in_irq(void) { return false; }
 #endif
 
 #endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index fb5be6e9b423..d0759ef9a6bd 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
 	ct_nmi_exit();
 }
 
+/**
+ * ct_in_irq - check if CPU is in a context-tracked IRQ context.
+ *
+ * Returns true if ct_irq_enter() has been called and ct_irq_exit()
+ * has not yet been called. This indicates the CPU is currently
+ * processing an interrupt.
+ */
+bool ct_in_irq(void)
+{
+	return ct_nmi_nesting() != 0;
+}
+
 /*
  * Wrapper for ct_irq_enter() where interrupts are enabled.
  *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 18:01 [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
@ 2025-06-09 18:01 ` Joel Fernandes
  2025-06-09 19:49   ` Boqun Feng
                     ` (2 more replies)
  2025-06-09 18:05 ` [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
  2025-06-11 16:25 ` Frederic Weisbecker
  2 siblings, 3 replies; 14+ messages in thread
From: Joel Fernandes @ 2025-06-09 18:01 UTC (permalink / raw)
  To: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang
  Cc: Xiongfeng Wang, rcu, bpf, Joel Fernandes

During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lock up if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path, causing a recursive lockup.

This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
using context-tracking to tell us if we're still in an IRQ.
Context-tracking keeps track of the IRQ until after the tracepoint, so
it cures the issue.

irq_exit()
  __irq_exit_rcu()
    /* in_hardirq() returns false after this */
    preempt_count_sub(HARDIRQ_OFFSET)
    tick_irq_exit()
      tick_nohz_irq_exit()
	    tick_nohz_stop_sched_tick()
	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
		   __bpf_trace_tick_stop()
		      bpf_trace_run2()
			    rcu_read_unlock_special()
                              /* will send a IPI to itself */
			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);

A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:

  static inline void tick_irq_exit(void)
  {
 +	rcu_read_lock();
 +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
 +	rcu_read_unlock();
 +

While at it, add some comments to this code.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree_plugin.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3c0bbbbb686f..53d8b3415776 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -653,6 +653,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 		struct rcu_node *rnp = rdp->mynode;
 
+		// In cases where the RCU-reader is boosted, we'd attempt deboost sooner than
+		// later to prevent inducing latency to other RT tasks. Also, expedited GPs
+		// should not be delayed too much. Track both these needs in expboost.
 		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
 			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
 			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
@@ -670,10 +673,15 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			// Also if no expediting and no possible deboosting,
 			// slow is OK.  Plus nohz_full CPUs eventually get
 			// tick enabled.
+			//
+			// Also prevent doing this if context-tracking thinks
+			// we're handling an IRQ (including when we're exiting
+			// one -- required to prevent self-IPI deadloops).
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
+			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
+			    !ct_in_irq()) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
 				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ
  2025-06-09 18:01 [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
@ 2025-06-09 18:05 ` Joel Fernandes
  2025-06-11 16:25 ` Frederic Weisbecker
  2 siblings, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2025-06-09 18:05 UTC (permalink / raw)
  To: linux-kernel, Frederic Weisbecker, Paul E. McKenney
  Cc: Xiongfeng Wang, rcu, bpf



On 6/9/2025 2:01 PM, Joel Fernandes wrote:
> context_tracking keeps track of whether we're handling an IRQ well after
> the preempt masks take it off their books. We need this functionality in
> a follow-up patch to fix a bug. Provide a helper API for this.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  include/linux/context_tracking_irq.h |  2 ++
>  kernel/context_tracking.c            | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
> index 197916ee91a4..35a5ad971514 100644
> --- a/include/linux/context_tracking_irq.h
> +++ b/include/linux/context_tracking_irq.h
> @@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
>  void ct_irq_exit_irqson(void);
>  void ct_nmi_enter(void);
>  void ct_nmi_exit(void);
> +bool ct_in_irq(void);
>  #else
>  static __always_inline void ct_irq_enter(void) { }
>  static __always_inline void ct_irq_exit(void) { }
> @@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
>  static inline void ct_irq_exit_irqson(void) { }
>  static __always_inline void ct_nmi_enter(void) { }
>  static __always_inline void ct_nmi_exit(void) { }
> +static inline bool ct_in_irq(void) { return false; }
I did s/inline/__always_inline/ here. Will send with next posting.

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
@ 2025-06-09 19:49   ` Boqun Feng
  2025-06-09 23:26     ` Frederic Weisbecker
  2025-06-10 12:23   ` Frederic Weisbecker
  2025-06-11 16:05   ` Boqun Feng
  2 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2025-06-09 19:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Xiongfeng Wang, rcu, bpf

Hi Joel,

On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> During rcu_read_unlock_special(), if this happens during irq_exit(), we
> can lock up if an IPI is issued. This is because the IPI itself triggers
> the irq_exit() path, causing a recursive lockup.
> 
> This is precisely what Xiongfeng found when invoking a BPF program on
> the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> using context-tracking to tell us if we're still in an IRQ.
> Context-tracking keeps track of the IRQ until after the tracepoint, so
> it cures the issue.
> 

This does fix the issue, but do we know when the CPU will eventually
report a QS after this fix? I believe we still want to report a QS as
early as possible in this case?

Regards,
Boqun

> irq_exit()
>   __irq_exit_rcu()
>     /* in_hardirq() returns false after this */
>     preempt_count_sub(HARDIRQ_OFFSET)
>     tick_irq_exit()
>       tick_nohz_irq_exit()
> 	    tick_nohz_stop_sched_tick()
> 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> 		   __bpf_trace_tick_stop()
> 		      bpf_trace_run2()
> 			    rcu_read_unlock_special()
>                               /* will send a IPI to itself */
> 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> 
> A simple reproducer can also be obtained by doing the following in
> tick_irq_exit(). It will hang on boot without the patch:
> 
>   static inline void tick_irq_exit(void)
>   {
>  +	rcu_read_lock();
>  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
>  +	rcu_read_unlock();
>  +
> 
> While at it, add some comments to this code.
> 
> Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
> Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  kernel/rcu/tree_plugin.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..53d8b3415776 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -653,6 +653,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  		struct rcu_node *rnp = rdp->mynode;
>  
> +		// In cases where the RCU-reader is boosted, we'd attempt deboost sooner than
> +		// later to prevent inducing latency to other RT tasks. Also, expedited GPs
> +		// should not be delayed too much. Track both these needs in expboost.
>  		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
>  			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
>  			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> @@ -670,10 +673,15 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			// Also if no expediting and no possible deboosting,
>  			// slow is OK.  Plus nohz_full CPUs eventually get
>  			// tick enabled.
> +			//
> +			// Also prevent doing this if context-tracking thinks
> +			// we're handling an IRQ (including when we're exiting
> +			// one -- required to prevent self-IPI deadloops).
>  			set_tsk_need_resched(current);
>  			set_preempt_need_resched();
>  			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> -			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> +			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
> +			    !ct_in_irq()) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
>  				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 19:49   ` Boqun Feng
@ 2025-06-09 23:26     ` Frederic Weisbecker
  2025-06-10  0:49       ` Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-06-09 23:26 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Joel Fernandes, linux-kernel, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Xiongfeng Wang, rcu,
	bpf

On Mon, Jun 09, 2025 at 12:49:06PM -0700, Boqun Feng wrote:
> Hi Joel,
> 
> On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> > During rcu_read_unlock_special(), if this happens during irq_exit(), we
> > can lock up if an IPI is issued. This is because the IPI itself triggers
> > the irq_exit() path, causing a recursive lockup.
> > 
> > This is precisely what Xiongfeng found when invoking a BPF program on
> > the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> > using context-tracking to tell us if we're still in an IRQ.
> > Context-tracking keeps track of the IRQ until after the tracepoint, so
> > it cures the issue.
> > 
> 
> This does fix the issue, but do we know when the CPU will eventually
> report a QS after this fix? I believe we still want to report a QS as
> early as possible in this case?

If !ct_in_irq(), we issue a self-IPI, then preempt_schedule_irq() will
call into schedule() and report a QS (if preempt/bh is not disabled; otherwise
this is delayed until preempt_enable() or local_bh_enable() issues preempt_schedule()).

If ct_in_irq(), we are already in an IRQ, so it's the same as above
eventually.
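
Roughly, a sketch of that reporting path (assuming preemption is otherwise
enabled):

  self-IPI / next interrupt
    interrupt return path
      preempt_schedule_irq()
        __schedule()
          rcu_note_context_switch()  /* deferred QS gets reported here */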

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 23:26     ` Frederic Weisbecker
@ 2025-06-10  0:49       ` Boqun Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2025-06-10  0:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes, linux-kernel, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Xiongfeng Wang, rcu,
	bpf

On Tue, Jun 10, 2025 at 01:26:46AM +0200, Frederic Weisbecker wrote:
> On Mon, Jun 09, 2025 at 12:49:06PM -0700, Boqun Feng wrote:
> > Hi Joel,
> > 
> > On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> > > During rcu_read_unlock_special(), if this happens during irq_exit(), we
> > > can lock up if an IPI is issued. This is because the IPI itself triggers
> > > the irq_exit() path, causing a recursive lockup.
> > > 
> > > This is precisely what Xiongfeng found when invoking a BPF program on
> > > the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> > > using context-tracking to tell us if we're still in an IRQ.
> > > Context-tracking keeps track of the IRQ until after the tracepoint, so
> > > it cures the issue.
> > > 
> > 
> > This does fix the issue, but do we know when the CPU will eventually
> > report a QS after this fix? I believe we still want to report a QS as
> > early as possible in this case?
> 
> If !ct_in_irq(), we issue a self-IPI, then preempt_schedule_irq() will
> call into schedule() and report a QS (if preempt/bh is not disabled; otherwise
> this is delayed until preempt_enable() or local_bh_enable() issues preempt_schedule()).
> 
> If ct_in_irq(), we are already in an IRQ, so it's the same as above
> eventually.
> 

I see, I was missing this, thanks for pointing out ;-)

Regards,
Boqun

> Thanks.
> 
> -- 
> Frederic Weisbecker
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
  2025-06-09 19:49   ` Boqun Feng
@ 2025-06-10 12:23   ` Frederic Weisbecker
  2025-06-10 15:47     ` Joel Fernandes
  2025-06-12  3:06     ` Xiongfeng Wang
  2025-06-11 16:05   ` Boqun Feng
  2 siblings, 2 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2025-06-10 12:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Xiongfeng Wang, rcu,
	bpf

On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> During rcu_read_unlock_special(), if this happens during irq_exit(), we
> can lock up if an IPI is issued. This is because the IPI itself triggers
> the irq_exit() path, causing a recursive lockup.
> 
> This is precisely what Xiongfeng found when invoking a BPF program on
> the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> using context-tracking to tell us if we're still in an IRQ.
> Context-tracking keeps track of the IRQ until after the tracepoint, so
> it cures the issue.
> 
> irq_exit()
>   __irq_exit_rcu()
>     /* in_hardirq() returns false after this */
>     preempt_count_sub(HARDIRQ_OFFSET)
>     tick_irq_exit()
>       tick_nohz_irq_exit()
> 	    tick_nohz_stop_sched_tick()
> 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> 		   __bpf_trace_tick_stop()
> 		      bpf_trace_run2()
> 			    rcu_read_unlock_special()
>                               /* will send a IPI to itself */
> 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> 
> A simple reproducer can also be obtained by doing the following in
> tick_irq_exit(). It will hang on boot without the patch:
> 
>   static inline void tick_irq_exit(void)
>   {
>  +	rcu_read_lock();
>  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
>  +	rcu_read_unlock();
>  +
> 
> While at it, add some comments to this code.
> 
> Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
> Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Acked-by: Frederic Weisbecker <frederic@kernel.org>

Just a few remarks:

> ---
>  kernel/rcu/tree_plugin.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..53d8b3415776 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -653,6 +653,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  		struct rcu_node *rnp = rdp->mynode;
>  
> +		// In cases where the RCU-reader is boosted, we'd attempt deboost sooner than
> +		// later to prevent inducing latency to other RT tasks. Also, expedited GPs
> +		// should not be delayed too much. Track both these needs in expboost.
>  		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
>  			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
>  			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> @@ -670,10 +673,15 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			// Also if no expediting and no possible deboosting,
>  			// slow is OK.  Plus nohz_full CPUs eventually get
>  			// tick enabled.
> +			//
> +			// Also prevent doing this if context-tracking thinks
> +			// we're handling an IRQ (including when we're exiting
> +			// one -- required to prevent self-IPI deadloops).
>  			set_tsk_need_resched(current);
>  			set_preempt_need_resched();
>  			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> -			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> +			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu) &&
> +			    !ct_in_irq()) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
>  				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> --

Looking at the irq work handling here:

* What is the point of ->defer_qs_iw_pending? If the irq work is already
  queued, it won't be requeued because the irq work code already prevents
  that.

* CONFIG_PREEMPT_RT && !CONFIG_RCU_STRICT_GRACE_PERIOD would queue a lazy irq
  work but still raise a hardirq to wake up softirq to handle it. It's pointless
  because there is nothing to execute in softirq, all we care about is the
  hardirq.
  Also since the work is empty it might as well be executed in hard irq, that
  shouldn't induce more latency in RT.

* Empty hard irq work raised to trigger something on irq exit also exists
  elsewhere (see nohz_full_kick_func()). Would it make sense to have that
  implemented in irq_work.c instead and trigger that through a simple
  irq_work_kick()?

And then this would look like (only built-tested):

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 136f2980cba3..4149ed516524 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -57,6 +57,9 @@ static inline bool irq_work_is_hard(struct irq_work *work)
 bool irq_work_queue(struct irq_work *work);
 bool irq_work_queue_on(struct irq_work *work, int cpu);
 
+bool irq_work_kick(void);
+bool irq_work_kick_on(int cpu);
+
 void irq_work_tick(void);
 void irq_work_sync(struct irq_work *work);
 
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 73f7e1fd4ab4..383a3e9050d9 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -181,6 +181,22 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 #endif /* CONFIG_SMP */
 }
 
+static void kick_func(struct irq_work *work)
+{
+}
+
+static DEFINE_PER_CPU(struct irq_work, kick_work) = IRQ_WORK_INIT_HARD(kick_func);
+
+bool irq_work_kick(void)
+{
+	return irq_work_queue(this_cpu_ptr(&kick_work));
+}
+
+bool irq_work_kick_on(int cpu)
+{
+	return irq_work_queue_on(per_cpu_ptr(&kick_work, cpu), cpu);
+}
+
 bool irq_work_needs_cpu(void)
 {
 	struct llist_head *raised, *lazy;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index a9a811d9d7a3..b33888071e41 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -191,7 +191,6 @@ struct rcu_data {
 					/*  during and after the last grace */
 					/* period it is aware of. */
 	struct irq_work defer_qs_iw;	/* Obtain later scheduler attention. */
-	bool defer_qs_iw_pending;	/* Scheduler attention pending? */
 	struct work_struct strict_work;	/* Schedule readers for strict GPs. */
 
 	/* 2) batch handling */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3c0bbbbb686f..0c7b7c220b46 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -619,17 +619,6 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
 	rcu_preempt_deferred_qs_irqrestore(t, flags);
 }
 
-/*
- * Minimal handler to give the scheduler a chance to re-evaluate.
- */
-static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
-{
-	struct rcu_data *rdp;
-
-	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
-	rdp->defer_qs_iw_pending = false;
-}
-
 /*
  * Handle special cases during rcu_read_unlock(), such as needing to
  * notify RCU core processing or task having blocked during the RCU
@@ -673,18 +662,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
+			    expboost && cpu_online(rdp->cpu)) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
-				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
-				    IS_ENABLED(CONFIG_PREEMPT_RT))
-					rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(
-								rcu_preempt_deferred_qs_handler);
-				else
-					init_irq_work(&rdp->defer_qs_iw,
-						      rcu_preempt_deferred_qs_handler);
-				rdp->defer_qs_iw_pending = true;
-				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
+				irq_work_kick();
 			}
 		}
 		local_irq_restore(flags);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c527b421c865..84170656334d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -377,14 +377,6 @@ static bool can_stop_full_tick(int cpu, struct tick_sched *ts)
 	return true;
 }
 
-static void nohz_full_kick_func(struct irq_work *work)
-{
-	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
-}
-
-static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
-	IRQ_WORK_INIT_HARD(nohz_full_kick_func);
-
 /*
  * Kick this CPU if it's full dynticks in order to force it to
  * re-evaluate its dependency on the tick and restart it if necessary.
@@ -396,7 +388,7 @@ static void tick_nohz_full_kick(void)
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		return;
 
-	irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
+	irq_work_kick();
 }
 
 /*
@@ -408,7 +400,7 @@ void tick_nohz_full_kick_cpu(int cpu)
 	if (!tick_nohz_full_cpu(cpu))
 		return;
 
-	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
+	irq_work_kick_on(cpu);
 }
 
 static void tick_nohz_kick_task(struct task_struct *tsk)

  
  
-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-10 12:23   ` Frederic Weisbecker
@ 2025-06-10 15:47     ` Joel Fernandes
  2025-06-12  3:06     ` Xiongfeng Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2025-06-10 15:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Xiongfeng Wang, rcu,
	bpf



On 6/10/2025 8:23 AM, Frederic Weisbecker wrote:
> On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
>> During rcu_read_unlock_special(), if this happens during irq_exit(), we
>> can lock up if an IPI is issued. This is because the IPI itself triggers
>> the irq_exit() path, causing a recursive lockup.
>>
>> This is precisely what Xiongfeng found when invoking a BPF program on
>> the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
>> using context-tracking to tell us if we're still in an IRQ.
>> Context-tracking keeps track of the IRQ until after the tracepoint, so
>> it cures the issue.
>>
>> irq_exit()
>>   __irq_exit_rcu()
>>     /* in_hardirq() returns false after this */
>>     preempt_count_sub(HARDIRQ_OFFSET)
>>     tick_irq_exit()
>>       tick_nohz_irq_exit()
>> 	    tick_nohz_stop_sched_tick()
>> 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
>> 		   __bpf_trace_tick_stop()
>> 		      bpf_trace_run2()
>> 			    rcu_read_unlock_special()
>>                               /* will send a IPI to itself */
>> 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>>
>> A simple reproducer can also be obtained by doing the following in
>> tick_irq_exit(). It will hang on boot without the patch:
>>
>>   static inline void tick_irq_exit(void)
>>   {
>>  +	rcu_read_lock();
>>  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
>>  +	rcu_read_unlock();
>>  +
>>
>> While at it, add some comments to this code.
>>
>> Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
>> Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> 
> Acked-by: Frederic Weisbecker <frederic@kernel.org>

Thanks.

> * What is the point of ->defer_qs_iw_pending? If the irq work is already
>   queued, it won't be requeued because the irq work code already prevents
>   that.

Sure, but I think maybe we should not even attempt to queue the irq_work if
defer_qs_iw_pending? I understand there's no harm, but we'd depend on irq_work
internals for the intended behavior.

> 
> * CONFIG_PREEMPT_RT && !CONFIG_RCU_STRICT_GRACE_PERIOD would queue a lazy irq
>   work but still raise a hardirq to wake up softirq to handle it. It's pointless
>   because there is nothing to execute in softirq, all we care about is the
>   hardirq.
>   Also since the work is empty it might as well be executed in hard irq, that
>   shouldn't induce more latency in RT.

Oh, hm. So your irq_work_kick() on PREEMPT_RT would only trigger the hard irq?

That does make sense to me. Let's add the RT folks (Sebastian) as well to confirm
this behavior is sound?

> 
> * Empty hard irq work raised to trigger something on irq exit also exists
>   elsewhere (see nohz_full_kick_func()). Would it make sense to have that
>   implemented in irq_work.c instead and trigger that through a simple
>   irq_work_kick()?

Yeah, sure. We'd probably need some serious testing to make sure we didn't break
anything else and perhaps some new test cases. From past experience, this code
path seems easy to break. But, nice change!

thanks,

 - Joel


> And then this would look like (only built-tested):
> 
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 136f2980cba3..4149ed516524 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -57,6 +57,9 @@ static inline bool irq_work_is_hard(struct irq_work *work)
>  bool irq_work_queue(struct irq_work *work);
>  bool irq_work_queue_on(struct irq_work *work, int cpu);
>  
> +bool irq_work_kick(void);
> +bool irq_work_kick_on(int cpu);
> +
>  void irq_work_tick(void);
>  void irq_work_sync(struct irq_work *work);
>  
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 73f7e1fd4ab4..383a3e9050d9 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -181,6 +181,22 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
>  #endif /* CONFIG_SMP */
>  }
>  
> +static void kick_func(struct irq_work *work)
> +{
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, kick_work) = IRQ_WORK_INIT_HARD(kick_func);
> +
> +bool irq_work_kick(void)
> +{
> +	return irq_work_queue(this_cpu_ptr(&kick_work));
> +}
> +
> +bool irq_work_kick_on(int cpu)
> +{
> +	return irq_work_queue_on(per_cpu_ptr(&kick_work, cpu), cpu);
> +}
> +
>  bool irq_work_needs_cpu(void)
>  {
>  	struct llist_head *raised, *lazy;
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index a9a811d9d7a3..b33888071e41 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -191,7 +191,6 @@ struct rcu_data {
>  					/*  during and after the last grace */
>  					/* period it is aware of. */
>  	struct irq_work defer_qs_iw;	/* Obtain later scheduler attention. */
> -	bool defer_qs_iw_pending;	/* Scheduler attention pending? */
>  	struct work_struct strict_work;	/* Schedule readers for strict GPs. */
>  
>  	/* 2) batch handling */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..0c7b7c220b46 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -619,17 +619,6 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }
>  
> -/*
> - * Minimal handler to give the scheduler a chance to re-evaluate.
> - */
> -static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
> -{
> -	struct rcu_data *rdp;
> -
> -	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
> -	rdp->defer_qs_iw_pending = false;
> -}
> -
>  /*
>   * Handle special cases during rcu_read_unlock(), such as needing to
>   * notify RCU core processing or task having blocked during the RCU
> @@ -673,18 +662,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			set_tsk_need_resched(current);
>  			set_preempt_need_resched();
>  			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> -			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> +			    expboost && cpu_online(rdp->cpu)) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
> -				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> -				    IS_ENABLED(CONFIG_PREEMPT_RT))
> -					rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(
> -								rcu_preempt_deferred_qs_handler);
> -				else
> -					init_irq_work(&rdp->defer_qs_iw,
> -						      rcu_preempt_deferred_qs_handler);
> -				rdp->defer_qs_iw_pending = true;
> -				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> +				irq_work_kick();
>  			}
>  		}
>  		local_irq_restore(flags);
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index c527b421c865..84170656334d 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -377,14 +377,6 @@ static bool can_stop_full_tick(int cpu, struct tick_sched *ts)
>  	return true;
>  }
>  
> -static void nohz_full_kick_func(struct irq_work *work)
> -{
> -	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
> -	IRQ_WORK_INIT_HARD(nohz_full_kick_func);
> -
>  /*
>   * Kick this CPU if it's full dynticks in order to force it to
>   * re-evaluate its dependency on the tick and restart it if necessary.
> @@ -396,7 +388,7 @@ static void tick_nohz_full_kick(void)
>  	if (!tick_nohz_full_cpu(smp_processor_id()))
>  		return;
>  
> -	irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
> +	irq_work_kick();
>  }
>  
>  /*
> @@ -408,7 +400,7 @@ void tick_nohz_full_kick_cpu(int cpu)
>  	if (!tick_nohz_full_cpu(cpu))
>  		return;
>  
> -	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> +	irq_work_kick_on(cpu);
>  }
>  
>  static void tick_nohz_kick_task(struct task_struct *tsk)
> 
>   
>   


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
  2025-06-09 19:49   ` Boqun Feng
  2025-06-10 12:23   ` Frederic Weisbecker
@ 2025-06-11 16:05   ` Boqun Feng
  2025-06-11 16:16     ` Paul E. McKenney
  2 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2025-06-11 16:05 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Xiongfeng Wang, rcu, bpf

On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> During rcu_read_unlock_special(), if this happens during irq_exit(), we
> can lock up if an IPI is issued. This is because the IPI itself triggers
> the irq_exit() path, causing a recursive lockup.
> 
> This is precisely what Xiongfeng found when invoking a BPF program on
> the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> using context-tracking to tell us if we're still in an IRQ.
> Context-tracking keeps track of the IRQ until after the tracepoint, so
> it cures the issue.
> 
> irq_exit()
>   __irq_exit_rcu()
>     /* in_hardirq() returns false after this */
>     preempt_count_sub(HARDIRQ_OFFSET)
>     tick_irq_exit()

@Frederic, while we are at it, what's the purpose of in_hardirq() in
tick_irq_exit()? For nested interrupt detection?

Regards,
Boqun

>       tick_nohz_irq_exit()
> 	    tick_nohz_stop_sched_tick()
> 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> 		   __bpf_trace_tick_stop()
> 		      bpf_trace_run2()
> 			    rcu_read_unlock_special()
>                               /* will send a IPI to itself */
> 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> 
> A simple reproducer can also be obtained by doing the following in
> tick_irq_exit(). It will hang on boot without the patch:
> 
>   static inline void tick_irq_exit(void)
>   {
>  +	rcu_read_lock();
>  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
>  +	rcu_read_unlock();
>  +
> 
> While at it, add some comments to this code.
> 
> Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
> Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-11 16:05   ` Boqun Feng
@ 2025-06-11 16:16     ` Paul E. McKenney
  2025-06-11 16:21       ` Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2025-06-11 16:16 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Joel Fernandes, linux-kernel, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Xiongfeng Wang, rcu, bpf

On Wed, Jun 11, 2025 at 09:05:06AM -0700, Boqun Feng wrote:
> On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> > During rcu_read_unlock_special(), if this happens during irq_exit(), we
> > can lock up if an IPI is issued. This is because the IPI itself triggers
> > the irq_exit() path, causing a recursive lockup.
> > 
> > This is precisely what Xiongfeng found when invoking a BPF program on
> > the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> > using context-tracking to tell us if we're still in an IRQ.
> > Context-tracking keeps track of the IRQ until after the tracepoint, so
> > it cures the issue.
> > 
> > irq_exit()
> >   __irq_exit_rcu()
> >     /* in_hardirq() returns false after this */
> >     preempt_count_sub(HARDIRQ_OFFSET)
> >     tick_irq_exit()
> 
> @Frederic, while we are at it, what's the purpose of in_hardirq() in
> tick_irq_exit()? For nested interrupt detection?

If you are talking about the comment, these sorts of comments help
people reading the code, the point being that some common-code function
that invokes in_hardirq() after that point will get the wrong answer
from it.  The context-tracking code does the same for whether or not
RCU is watching.
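
For instance (a made-up example), a common helper reached from
tick_irq_exit() that did something like

	if (in_hardirq())
		do_hardirq_only_work();  /* hypothetical helper */

would take the wrong branch at that point, even though the CPU is still
finishing an interrupt.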

							Thanx, Paul

> Regards,
> Boqun
> 
> >       tick_nohz_irq_exit()
> > 	    tick_nohz_stop_sched_tick()
> > 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> > 		   __bpf_trace_tick_stop()
> > 		      bpf_trace_run2()
> > 			    rcu_read_unlock_special()
> >                               /* will send a IPI to itself */
> > 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > 
> > A simple reproducer can also be obtained by doing the following in
> > tick_irq_exit(). It will hang on boot without the patch:
> > 
> >   static inline void tick_irq_exit(void)
> >   {
> >  +	rcu_read_lock();
> >  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
> >  +	rcu_read_unlock();
> >  +
> > 
> > While at it, add some comments to this code.
> > 
> > Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> > Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
> > Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> [...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-11 16:16     ` Paul E. McKenney
@ 2025-06-11 16:21       ` Boqun Feng
  0 siblings, 0 replies; 14+ messages in thread
From: Boqun Feng @ 2025-06-11 16:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, linux-kernel, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Xiongfeng Wang, rcu, bpf

On Wed, Jun 11, 2025 at 09:16:05AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 11, 2025 at 09:05:06AM -0700, Boqun Feng wrote:
> > On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> > > During rcu_read_unlock_special(), if this happens during irq_exit(), we
> > > can lock up if an IPI is issued. This is because the IPI itself triggers
> > > the irq_exit() path, causing a recursive lockup.
> > > 
> > > This is precisely what Xiongfeng found when invoking a BPF program on
> > > the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
> > > using context-tracking to tell us if we're still in an IRQ.
> > > Context-tracking keeps track of the IRQ until after the tracepoint, so
> > > it cures the issue.
> > > 
> > > irq_exit()
> > >   __irq_exit_rcu()
> > >     /* in_hardirq() returns false after this */
> > >     preempt_count_sub(HARDIRQ_OFFSET)
> > >     tick_irq_exit()
> > 
> > @Frederic, while we are at it, what's the purpose of in_hardirq() in
> > tick_irq_exit()? For nested interrupt detection?
> 
> If you are talking about the comment, these sorts of comments help
> people reading the code, the point being that some common-code function
> that invokes in_hardirq() after that point will get the wrong answer
> from it.  The context-tracking code does the same for whether or not

The thing is that tick_irq_exit() is supposed to be only called in
irq_exit() IIUC (given its name), and so without nested interrupts,
in_hardirq() will also give the wrong answer.

Regards,
Boqun

> RCU is watching.
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > >       tick_nohz_irq_exit()
> > > 	    tick_nohz_stop_sched_tick()
> > > 	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
> > > 		   __bpf_trace_tick_stop()
> > > 		      bpf_trace_run2()
> > > 			    rcu_read_unlock_special()
> > >                               /* will send a IPI to itself */
> > > 			      irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> > > 
> > > A simple reproducer can also be obtained by doing the following in
> > > tick_irq_exit(). It will hang on boot without the patch:
> > > 
> > >   static inline void tick_irq_exit(void)
> > >   {
> > >  +	rcu_read_lock();
> > >  +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
> > >  +	rcu_read_unlock();
> > >  +
> > > 
> > > While at it, add some comments to this code.
> > > 
> > > Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> > > Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
> > > Tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> > > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> > [...]
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ
  2025-06-09 18:01 [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
  2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
  2025-06-09 18:05 ` [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
@ 2025-06-11 16:25 ` Frederic Weisbecker
  2 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2025-06-11 16:25 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: linux-kernel, Paul E. McKenney, Xiongfeng Wang, rcu, bpf

On Mon, Jun 09, 2025 at 02:01:23PM -0400, Joel Fernandes wrote:
> context_tracking keeps track of whether we're handling an IRQ well after
> the preempt masks take it off their books. We need this functionality in
> a follow-up patch to fix a bug. Provide a helper API for this.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  include/linux/context_tracking_irq.h |  2 ++
>  kernel/context_tracking.c            | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h
> index 197916ee91a4..35a5ad971514 100644
> --- a/include/linux/context_tracking_irq.h
> +++ b/include/linux/context_tracking_irq.h
> @@ -9,6 +9,7 @@ void ct_irq_enter_irqson(void);
>  void ct_irq_exit_irqson(void);
>  void ct_nmi_enter(void);
>  void ct_nmi_exit(void);
> +bool ct_in_irq(void);
>  #else
>  static __always_inline void ct_irq_enter(void) { }
>  static __always_inline void ct_irq_exit(void) { }
> @@ -16,6 +17,7 @@ static inline void ct_irq_enter_irqson(void) { }
>  static inline void ct_irq_exit_irqson(void) { }
>  static __always_inline void ct_nmi_enter(void) { }
>  static __always_inline void ct_nmi_exit(void) { }
> +static inline bool ct_in_irq(void) { return false; }
>  #endif
>  
>  #endif
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index fb5be6e9b423..d0759ef9a6bd 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -392,6 +392,18 @@ noinstr void ct_irq_exit(void)
>  	ct_nmi_exit();
>  }
>  
> +/**
> + * ct_in_irq - check if CPU is in a context-tracked IRQ context.
> + *
> + * Returns true if ct_irq_enter() has been called and ct_irq_exit()
> + * has not yet been called. This indicates the CPU is currently
> + * processing an interrupt.
> + */
> +bool ct_in_irq(void)
> +{
> +	return ct_nmi_nesting() != 0;

If rcu_is_watching() and not in an interrupt, ct_nmi_nesting()
is actually CT_NESTING_IRQ_NONIDLE. If rcu_is_watching() and
in an interrupt, ct_nmi_nesting() can be CT_NESTING_IRQ_NONIDLE + whatever.

So this doesn't work. I wish we could remove that CT_NESTING_IRQ_NONIDLE
that is there for hysterical raisins but that doesn't fit in an urgent pile.

So probably:

bool ct_in_irq(void)
{
	long nesting = ct_nmi_nesting();

	return (nesting && nesting != CT_NESTING_IRQ_NONIDLE);
}

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-10 12:23   ` Frederic Weisbecker
  2025-06-10 15:47     ` Joel Fernandes
@ 2025-06-12  3:06     ` Xiongfeng Wang
  2025-06-12 11:37       ` Frederic Weisbecker
  1 sibling, 1 reply; 14+ messages in thread
From: Xiongfeng Wang @ 2025-06-12  3:06 UTC (permalink / raw)
  To: Frederic Weisbecker, Joel Fernandes
  Cc: linux-kernel, Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu, bpf, xiqi2

[-- Attachment #1: Type: text/plain, Size: 5037 bytes --]

+cc (Qi, my colleague who helps testing the modification)

On 2025/6/10 20:23, Frederic Weisbecker wrote:
> On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
>> During rcu_read_unlock_special(), if this happens during irq_exit(), we

...skipped...

We have tested the modification below, without the fix written by Joel,
using the previous syzkaller benchmark. The kernel still panics.
The dmesg log is attached.

Thanks,
Xiongfeng

> 
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 136f2980cba3..4149ed516524 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -57,6 +57,9 @@ static inline bool irq_work_is_hard(struct irq_work *work)
>  bool irq_work_queue(struct irq_work *work);
>  bool irq_work_queue_on(struct irq_work *work, int cpu);
>  
> +bool irq_work_kick(void);
> +bool irq_work_kick_on(int cpu);
> +
>  void irq_work_tick(void);
>  void irq_work_sync(struct irq_work *work);
>  
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 73f7e1fd4ab4..383a3e9050d9 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -181,6 +181,22 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
>  #endif /* CONFIG_SMP */
>  }
>  
> +static void kick_func(struct irq_work *work)
> +{
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, kick_work) = IRQ_WORK_INIT_HARD(kick_func);
> +
> +bool irq_work_kick(void)
> +{
> +	return irq_work_queue(this_cpu_ptr(&kick_work));
> +}
> +
> +bool irq_work_kick_on(int cpu)
> +{
> +	return irq_work_queue_on(per_cpu_ptr(&kick_work, cpu), cpu);
> +}
> +
>  bool irq_work_needs_cpu(void)
>  {
>  	struct llist_head *raised, *lazy;
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index a9a811d9d7a3..b33888071e41 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -191,7 +191,6 @@ struct rcu_data {
>  					/*  during and after the last grace */
>  					/* period it is aware of. */
>  	struct irq_work defer_qs_iw;	/* Obtain later scheduler attention. */
> -	bool defer_qs_iw_pending;	/* Scheduler attention pending? */
>  	struct work_struct strict_work;	/* Schedule readers for strict GPs. */
>  
>  	/* 2) batch handling */
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c0bbbbb686f..0c7b7c220b46 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -619,17 +619,6 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }
>  
> -/*
> - * Minimal handler to give the scheduler a chance to re-evaluate.
> - */
> -static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
> -{
> -	struct rcu_data *rdp;
> -
> -	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
> -	rdp->defer_qs_iw_pending = false;
> -}
> -
>  /*
>   * Handle special cases during rcu_read_unlock(), such as needing to
>   * notify RCU core processing or task having blocked during the RCU
> @@ -673,18 +662,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			set_tsk_need_resched(current);
>  			set_preempt_need_resched();
>  			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
> -			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
> +			    expboost && cpu_online(rdp->cpu)) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
> -				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
> -				    IS_ENABLED(CONFIG_PREEMPT_RT))
> -					rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(
> -								rcu_preempt_deferred_qs_handler);
> -				else
> -					init_irq_work(&rdp->defer_qs_iw,
> -						      rcu_preempt_deferred_qs_handler);
> -				rdp->defer_qs_iw_pending = true;
> -				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
> +				irq_work_kick();
>  			}
>  		}
>  		local_irq_restore(flags);
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index c527b421c865..84170656334d 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -377,14 +377,6 @@ static bool can_stop_full_tick(int cpu, struct tick_sched *ts)
>  	return true;
>  }
>  
> -static void nohz_full_kick_func(struct irq_work *work)
> -{
> -	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) =
> -	IRQ_WORK_INIT_HARD(nohz_full_kick_func);
> -
>  /*
>   * Kick this CPU if it's full dynticks in order to force it to
>   * re-evaluate its dependency on the tick and restart it if necessary.
> @@ -396,7 +388,7 @@ static void tick_nohz_full_kick(void)
>  	if (!tick_nohz_full_cpu(smp_processor_id()))
>  		return;
>  
> -	irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
> +	irq_work_kick();
>  }
>  
>  /*
> @@ -408,7 +400,7 @@ void tick_nohz_full_kick_cpu(int cpu)
>  	if (!tick_nohz_full_cpu(cpu))
>  		return;
>  
> -	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> +	irq_work_kick_on(cpu);
>  }
>  
>  static void tick_nohz_kick_task(struct task_struct *tsk)
> 
>   
>   
> 

[-- Attachment #2: irq_panic.txt --]
[-- Type: text/plain, Size: 24044 bytes --]

[ 2392.445785][T72129] e1000 0000:00:03.0 ens3: Reset adapter
[ 2444.656512][    C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2444.657196][    C0] rcu: 	2-....: (0 ticks this GP) idle=a31c/1/0x4000000000000000 softirq=281524/281524 fqs=24874
[ 2444.658112][    C0] rcu: 	(detected by 0, t=60002 jiffies, g=720513, q=257232 ncpus=4)
[ 2444.658802][    C0] Sending NMI from CPU 0 to CPUs 2:
[ 2444.659275][    C2] NMI backtrace for cpu 2
[ 2444.659283][    C2] CPU: 2 PID: 85034 Comm: syz.11.10669 Not tainted 6.6.0+ #10
[ 2444.659305][    C2] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2444.659310][    C2] RIP: 0010:__sanitizer_cov_trace_pc+0x42/0x80
[ 2444.659333][    C2] Code: a9 00 01 ff 00 74 1d f6 c4 01 74 4a a9 00 00 0f 00 75 43 a9 00 00 f0 00 75 3c 8b 82 c4 14 00 00 85 c0 74 32 8b 82 a0 14 00 00 <83> f8 02 75 27 48 8b b2 a8 14 00 00 8b 92 a4 14 00 00 48 8b 06 48
[ 2444.659343][    C2] RSP: 0018:ff11000119909e08 EFLAGS: 00000046
[ 2444.659351][    C2] RAX: 0000000000000000 RBX: ff1100011993e380 RCX: ffffffff9fa11001
[ 2444.659357][    C2] RDX: ff11000112ac0000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 2444.659362][    C2] RBP: 0000000012ac0001 R08: 0000000000000001 R09: ffe21c00233213b3
[ 2444.659368][    C2] R10: 0000000000000001 R11: ff11000119909ff8 R12: 1fe22000233213ca
[ 2444.659374][    C2] R13: ffa0000005c15028 R14: dffffc0000000000 R15: 0000000000000000
[ 2444.659379][    C2] FS:  00007ffaad1b86c0(0000) GS:ff11000119900000(0000) knlGS:0000000000000000
[ 2444.659389][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2444.659395][    C2] CR2: 0000000000638300 CR3: 000000002c596003 CR4: 0000000000771ee0
[ 2444.659401][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2444.659406][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2444.659412][    C2] PKRU: 00000000
[ 2444.659414][    C2] Call Trace:
[ 2444.659418][    C2]  <IRQ>
[ 2444.659421][    C2]  __irq_work_queue_local+0xc1/0x290
[ 2444.659436][    C2]  irq_work_kick+0x53/0x80
[ 2444.659446][    C2]  bpf_trace_run2+0xf7/0x220
[ 2444.659458][    C2]  ? __pfx_bpf_trace_run2+0x10/0x10
[ 2444.659470][    C2]  __bpf_trace_tick_stop+0xb4/0xf0
[ 2444.659479][    C2]  ? __pfx___bpf_trace_tick_stop+0x10/0x10
[ 2444.659488][    C2]  ? __pfx_sched_clock_cpu+0x10/0x10
[ 2444.659504][    C2]  ? rcu_iw_handler+0x41/0xf0
[ 2444.659519][    C2]  check_tick_dependency+0x362/0x670
[ 2444.659534][    C2]  __tick_nohz_full_update_tick+0xd1/0x220
[ 2444.659551][    C2]  tick_nohz_irq_exit+0x22c/0x2a0
[ 2444.659561][    C2]  sysvec_irq_work+0x6a/0x80
[ 2444.659577][    C2]  </IRQ>
[ 2444.659579][    C2]  <TASK>
[ 2444.659582][    C2]  asm_sysvec_irq_work+0x1a/0x20
[ 2444.659597][    C2] RIP: 0010:rcu_read_unlock_special+0x112/0x280
[ 2444.659609][    C2] Code: 05 2b 69 92 60 a9 00 00 0f 00 75 40 4d 85 f6 0f 84 af 00 00 00 45 84 e4 0f 84 a6 00 00 00 bf 09 00 00 00 e8 d0 47 df ff fb 5b <5d> 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 5b 5d 41 5c 41 5d 41 5e
[ 2444.659618][    C2] RSP: 0018:ff1100002d606f38 EFLAGS: 00000283
[ 2444.659625][    C2] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 1fe2200023328c49
[ 2444.659630][    C2] RDX: 0000000000000001 RSI: 0000000000000046 RDI: ff11000100f33084
[ 2444.659636][    C2] RBP: ff11000119946680 R08: 0000000000000000 R09: fffffbfff5ad6a54
[ 2444.659641][    C2] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000080000001
[ 2444.659646][    C2] R13: dffffc0000000000 R14: 0000000000000200 R15: ffffffffabd42f80
[ 2444.659658][    C2]  page_vma_mapped_walk+0x1830/0x2220
[ 2444.659677][    C2]  ? __pfx_page_vma_mapped_walk+0x10/0x10
[ 2444.659691][    C2]  ? __lruvec_stat_mod_folio+0x13f/0x1e0
[ 2444.659704][    C2]  ? folio_add_anon_rmap_ptes+0x1ab/0x2c0
[ 2444.659717][    C2]  remove_migration_pte+0x1d9/0xfb0
[ 2444.659734][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2444.659750][    C2]  ? __anon_vma_interval_tree_subtree_search+0x171/0x1f0
[ 2444.659766][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2444.659781][    C2]  rmap_walk_anon+0x2b0/0x980
[ 2444.659795][    C2]  rmap_walk_locked+0x5d/0x90
[ 2444.659808][    C2]  remove_migration_ptes+0xcc/0x130
[ 2444.659818][    C2]  ? __pfx_remove_migration_ptes+0x10/0x10
[ 2444.659829][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2444.659843][    C2]  ? _raw_spin_lock+0x85/0xe0
[ 2444.659858][    C2]  remap_page.part.0+0xb5/0x170
[ 2444.659874][    C2]  __split_huge_page+0xb05/0x13d0
[ 2444.659887][    C2]  split_huge_page_to_list_to_order+0x12f3/0x17f0
[ 2444.659902][    C2]  ? __pfx_split_huge_page_to_list_to_order+0x10/0x10
[ 2444.659916][    C2]  ? __cgroup_account_cputime+0x8d/0xc0
[ 2444.659929][    C2]  madvise_cold_or_pageout_pte_range+0x1966/0x2450
[ 2444.659943][    C2]  ? enqueue_entity+0xe1c/0x33d0
[ 2444.659956][    C2]  ? __pfx_madvise_cold_or_pageout_pte_range+0x10/0x10
[ 2444.659968][    C2]  ? check_preempt_wakeup_fair+0x435/0x760
[ 2444.659980][    C2]  ? wakeup_preempt+0x193/0x260
[ 2444.659994][    C2]  ? __pfx_madvise_cold_or_pageout_pte_range+0x10/0x10
[ 2444.660006][    C2]  walk_pmd_range.isra.0+0x240/0x720
[ 2444.660024][    C2]  walk_pud_range.isra.0+0x3d3/0x6c0
[ 2444.660041][    C2]  walk_p4d_range+0x2ef/0x4f0
[ 2444.660057][    C2]  walk_pgd_range+0x27e/0x530
[ 2444.660073][    C2]  __walk_page_range+0x4ab/0x5a0
[ 2444.660094][    C2]  ? find_vma+0x81/0xb0
[ 2444.660109][    C2]  ? __pfx_find_vma+0x10/0x10
[ 2444.660123][    C2]  ? folios_put_refs+0x510/0x740
[ 2444.660132][    C2]  ? walk_page_test+0xa0/0x190
[ 2444.660147][    C2]  walk_page_range+0x2a0/0x530
[ 2444.660162][    C2]  ? __pfx_walk_page_range+0x10/0x10
[ 2444.660179][    C2]  ? folio_batch_move_lru+0x2b8/0x3d0
[ 2444.660187][    C2]  ? __pfx_lru_add_fn+0x10/0x10
[ 2444.660196][    C2]  madvise_pageout_page_range+0x1cc/0x6d0
[ 2444.660209][    C2]  ? __pfx_madvise_pageout_page_range+0x10/0x10
[ 2444.660223][    C2]  madvise_pageout+0x1f4/0x400
[ 2444.660235][    C2]  ? __pfx_madvise_pageout+0x10/0x10
[ 2444.660248][    C2]  ? futex_wait+0x552/0x680
[ 2444.660262][    C2]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2444.660278][    C2]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2444.660293][    C2]  ? mas_prev_setup.constprop.0+0xb4/0x530
[ 2444.660310][    C2]  madvise_vma_behavior+0x8fa/0xe30
[ 2444.660324][    C2]  ? __pfx_madvise_vma_behavior+0x10/0x10
[ 2444.660336][    C2]  ? find_vma_prev+0xf5/0x170
[ 2444.660345][    C2]  ? __pfx_find_vma_prev+0x10/0x10
[ 2444.660356][    C2]  ? do_madvise+0x4d8/0x650
[ 2444.660369][    C2]  do_madvise+0x3af/0x650
[ 2444.660381][    C2]  ? __pfx_do_madvise+0x10/0x10
[ 2444.660393][    C2]  ? __se_sys_futex+0xf7/0x390
[ 2444.660405][    C2]  ? __se_sys_futex+0x100/0x390
[ 2444.660417][    C2]  ? __pfx___se_sys_futex+0x10/0x10
[ 2444.660430][    C2]  ? restore_fpregs_from_fpstate+0x40/0x100
[ 2444.660447][    C2]  __x64_sys_madvise+0xaf/0x120
[ 2444.660459][    C2]  ? syscall_exit_to_user_mode+0x12e/0x1e0
[ 2444.660473][    C2]  ? __ct_user_exit+0x1c/0xe0
[ 2444.660486][    C2]  do_syscall_64+0x59/0x110
[ 2444.660501][    C2]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[ 2444.660514][    C2] RIP: 0033:0x54d2cd
[ 2444.660537][    C2] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
[ 2444.660546][    C2] RSP: 002b:00007ffaad1b8048 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
[ 2444.660554][    C2] RAX: ffffffffffffffda RBX: 0000000000795fa0 RCX: 000000000054d2cd
[ 2444.660560][    C2] RDX: 0000000000000015 RSI: 0000000000003000 RDI: 0000000020001000
[ 2444.660565][    C2] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2444.660570][    C2] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000795fac
[ 2444.660575][    C2] R13: 0000000000000000 R14: 0000000000795fa0 R15: 00007ffaad198000
[ 2444.660584][    C2]  </TASK>
[ 2581.883413][    C0] watchdog: BUG: soft lockup - CPU#0 stuck for 186s! [kworker/0:4:72129]
[ 2581.884169][    C0] Modules linked in:
[ 2581.884506][    C0] CPU: 0 PID: 72129 Comm: kworker/0:4 Not tainted 6.6.0+ #10
[ 2581.885124][    C0] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2581.885871][    C0] Workqueue: events e1000_reset_task
[ 2581.886336][    C0] RIP: 0010:_raw_spin_unlock_irqrestore+0x43/0x80
[ 2581.886891][    C0] Code: fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 2c c6 07 00 f7 c6 00 02 00 00 74 01 fb 65 ff 0d f5 2f 05 58 <74> 09 48 83 c4 10 c3 cc cc cc cc 0f 1f 44 00 00 48 83 c4 10 c3 cc
[ 2581.888516][    C0] RSP: 0018:ff11000032d5fb40 EFLAGS: 00000246
[ 2581.889016][    C0] RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff9f6d869d
[ 2581.889665][    C0] RDX: 0000000000000004 RSI: 0000000000000246 RDI: ff11000100a940b4
[ 2581.890333][    C0] RBP: ff11000100a94000 R08: 0000000000000001 R09: ffe21c00065abf54
[ 2581.890991][    C0] R10: 0000000000000003 R11: 0000000000000020 R12: ffe21c0020152809
[ 2581.891646][    C0] R13: dffffc0000000000 R14: ff11000100a940b4 R15: 0000000000000001
[ 2581.892318][    C0] FS:  0000000000000000(0000) GS:ff11000119800000(0000) knlGS:0000000000000000
[ 2581.893069][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2581.893639][    C0] CR2: 000000001bfad4b0 CR3: 000000010833c002 CR4: 0000000000771ef0
[ 2581.894308][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2581.894932][    C0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2581.895554][    C0] PKRU: 55555554
[ 2581.895859][    C0] Call Trace:
[ 2581.896136][    C0]  <TASK>
[ 2581.896375][    C0]  __synchronize_hardirq+0x168/0x230
[ 2581.896807][    C0]  ? __pfx___synchronize_hardirq+0x10/0x10
[ 2581.897295][    C0]  ? schedule_timeout+0x4c1/0x770
[ 2581.897713][    C0]  ? __pfx_ref_tracker_alloc+0x10/0x10
[ 2581.898182][    C0]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2581.898693][    C0]  __synchronize_irq+0x96/0x200
[ 2581.899127][    C0]  ? __pfx___synchronize_irq+0x10/0x10
[ 2581.899595][    C0]  ? __pfx_napi_disable+0x10/0x10
[ 2581.900034][    C0]  ? linkwatch_schedule_work+0x189/0x1d0
[ 2581.900511][    C0]  ? linkwatch_fire_event+0x6e/0x270
[ 2581.900975][    C0]  synchronize_irq+0x2d/0x40
[ 2581.901365][    C0]  e1000_down+0x3bc/0x790
[ 2581.901741][    C0]  ? e1000_reset_task+0x66/0xb0
[ 2581.902170][    C0]  e1000_reinit_locked+0xd0/0xf0
[ 2581.902599][    C0]  process_one_work+0x661/0x1020
[ 2581.903023][    C0]  worker_thread+0x849/0x1090
[ 2581.903450][    C0]  ? __kthread_parkme+0x10d/0x190
[ 2581.903882][    C0]  ? __pfx_worker_thread+0x10/0x10
[ 2581.904338][    C0]  kthread+0x2f4/0x3f0
[ 2581.904686][    C0]  ? __pfx_kthread+0x10/0x10
[ 2581.905105][    C0]  ret_from_fork+0x4a/0x80
[ 2581.905492][    C0]  ? __pfx_kthread+0x10/0x10
[ 2581.905894][    C0]  ret_from_fork_asm+0x1b/0x30
[ 2581.906308][    C0]  </TASK>
[ 2581.906583][    C0] Sending NMI from CPU 0 to CPUs 1-3:
[ 2581.907051][    C2] NMI backtrace for cpu 2
[ 2581.907059][    C2] CPU: 2 PID: 85034 Comm: syz.11.10669 Not tainted 6.6.0+ #10
[ 2581.907070][    C2] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2581.907075][    C2] RIP: 0010:native_apic_msr_write+0x28/0x40
[ 2581.907100][    C2] Code: 90 90 f3 0f 1e fa 8d 87 30 ff ff ff 83 e0 ef 74 20 89 f8 83 e0 ef 83 f8 20 74 16 c1 ef 04 31 d2 89 f0 8d 8f 00 08 00 00 0f 30 <66> 90 c3 cc cc cc cc c3 cc cc cc cc 89 f6 31 d2 89 cf e9 91 82 e9
[ 2581.907110][    C2] RSP: 0018:ff11000119909cb0 EFLAGS: 00000046
[ 2581.907118][    C2] RAX: 00000000000000f6 RBX: 0000000000000001 RCX: 000000000000083f
[ 2581.907124][    C2] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 2581.907130][    C2] RBP: 0000000012ac0001 R08: 0000000000000001 R09: ffe21c002332138a
[ 2581.907136][    C2] R10: 0000000000000001 R11: 0000000000000000 R12: 1fe22000233213a1
[ 2581.907141][    C2] R13: ffa0000005c15028 R14: dffffc0000000000 R15: 0000000000000000
[ 2581.907147][    C2] FS:  00007ffaad1b86c0(0000) GS:ff11000119900000(0000) knlGS:0000000000000000
[ 2581.907158][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2581.907164][    C2] CR2: 0000000000638300 CR3: 000000002c596003 CR4: 0000000000771ee0
[ 2581.907170][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2581.907176][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2581.907181][    C2] PKRU: 00000000
[ 2581.907184][    C2] Call Trace:
[ 2581.907189][    C2]  <IRQ>
[ 2581.907192][    C2]  arch_irq_work_raise+0x54/0x70
[ 2581.907205][    C2]  __irq_work_queue_local+0xc1/0x290
[ 2581.907218][    C2]  irq_work_kick+0x53/0x80
[ 2581.907229][    C2]  bpf_trace_run2+0xf7/0x220
[ 2581.907240][    C2]  ? __pfx_bpf_trace_run2+0x10/0x10
[ 2581.907251][    C2]  ? read_tsc+0x9/0x20
[ 2581.907260][    C2]  ? ktime_get+0xfd/0x160
[ 2581.907275][    C2]  __bpf_trace_tick_stop+0xb4/0xf0
[ 2581.907285][    C2]  ? __pfx___bpf_trace_tick_stop+0x10/0x10
[ 2581.907294][    C2]  ? __pfx_sched_clock_cpu+0x10/0x10
[ 2581.907310][    C2]  ? hrtimer_interrupt+0x57f/0x7a0
[ 2581.907324][    C2]  check_tick_dependency+0x362/0x670
[ 2581.907340][    C2]  __tick_nohz_full_update_tick+0xd1/0x220
[ 2581.907357][    C2]  tick_nohz_irq_exit+0x22c/0x2a0
[ 2581.907367][    C2]  sysvec_irq_work+0x36/0x80
[ 2581.907379][    C2]  asm_sysvec_irq_work+0x1a/0x20
[ 2581.907394][    C2] RIP: 0010:handle_softirqs+0x12b/0x580
[ 2581.907408][    C2] Code: c1 e8 03 44 89 74 24 30 4c 01 e8 44 89 7c 24 2c 48 89 44 24 20 48 89 6c 24 18 65 66 c7 05 b3 2b b3 60 00 00 fb bb ff ff ff ff <48> c7 c0 c0 a0 80 ab 41 0f bc dc 83 c3 01 49 89 c2 0f 84 8e 00 00
[ 2581.907417][    C2] RSP: 0018:ff11000119909f70 EFLAGS: 00000286
[ 2581.907424][    C2] RAX: ffe21c0022558000 RBX: 00000000ffffffff RCX: 1fe22000233261bc
[ 2581.907430][    C2] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ff11000119930de0
[ 2581.907435][    C2] RBP: ff11000112ac0000 R08: 0000000000000001 R09: ffe21c0023326219
[ 2581.907441][    C2] R10: 0000000000000000 R11: 3030303030302052 R12: 0000000000000200
[ 2581.907446][    C2] R13: dffffc0000000000 R14: 0000000000400140 R15: 000000000000000a
[ 2581.907459][    C2]  irq_exit_rcu+0x134/0x190
[ 2581.907472][    C2]  sysvec_irq_work+0x6a/0x80
[ 2581.907482][    C2]  </IRQ>
[ 2581.907485][    C2]  <TASK>
[ 2581.907487][    C2]  asm_sysvec_irq_work+0x1a/0x20
[ 2581.907502][    C2] RIP: 0010:rcu_read_unlock_special+0x112/0x280
[ 2581.907514][    C2] Code: 05 2b 69 92 60 a9 00 00 0f 00 75 40 4d 85 f6 0f 84 af 00 00 00 45 84 e4 0f 84 a6 00 00 00 bf 09 00 00 00 e8 d0 47 df ff fb 5b <5d> 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 5b 5d 41 5c 41 5d 41 5e
[ 2581.907523][    C2] RSP: 0018:ff1100002d606f38 EFLAGS: 00000283
[ 2581.907529][    C2] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 1fe2200023328c49
[ 2581.907535][    C2] RDX: 0000000000000001 RSI: 0000000000000046 RDI: ff11000100f33084
[ 2581.907541][    C2] RBP: ff11000119946680 R08: 0000000000000000 R09: fffffbfff5ad6a54
[ 2581.907546][    C2] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000080000001
[ 2581.907552][    C2] R13: dffffc0000000000 R14: 0000000000000200 R15: ffffffffabd42f80
[ 2581.907563][    C2]  page_vma_mapped_walk+0x1830/0x2220
[ 2581.907583][    C2]  ? __pfx_page_vma_mapped_walk+0x10/0x10
[ 2581.907598][    C2]  ? __lruvec_stat_mod_folio+0x13f/0x1e0
[ 2581.907612][    C2]  ? folio_add_anon_rmap_ptes+0x1ab/0x2c0
[ 2581.907625][    C2]  remove_migration_pte+0x1d9/0xfb0
[ 2581.907642][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2581.907659][    C2]  ? __anon_vma_interval_tree_subtree_search+0x171/0x1f0
[ 2581.907675][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2581.907690][    C2]  rmap_walk_anon+0x2b0/0x980
[ 2581.907703][    C2]  rmap_walk_locked+0x5d/0x90
[ 2581.907717][    C2]  remove_migration_ptes+0xcc/0x130
[ 2581.907727][    C2]  ? __pfx_remove_migration_ptes+0x10/0x10
[ 2581.907737][    C2]  ? __pfx_remove_migration_pte+0x10/0x10
[ 2581.907753][    C2]  ? _raw_spin_lock+0x85/0xe0
[ 2581.907767][    C2]  remap_page.part.0+0xb5/0x170
[ 2581.907783][    C2]  __split_huge_page+0xb05/0x13d0
[ 2581.907796][    C2]  split_huge_page_to_list_to_order+0x12f3/0x17f0
[ 2581.907811][    C2]  ? __pfx_split_huge_page_to_list_to_order+0x10/0x10
[ 2581.907826][    C2]  ? __cgroup_account_cputime+0x8d/0xc0
[ 2581.907838][    C2]  madvise_cold_or_pageout_pte_range+0x1966/0x2450
[ 2581.907853][    C2]  ? enqueue_entity+0xe1c/0x33d0
[ 2581.907866][    C2]  ? __pfx_madvise_cold_or_pageout_pte_range+0x10/0x10
[ 2581.907879][    C2]  ? check_preempt_wakeup_fair+0x435/0x760
[ 2581.907891][    C2]  ? wakeup_preempt+0x193/0x260
[ 2581.907905][    C2]  ? __pfx_madvise_cold_or_pageout_pte_range+0x10/0x10
[ 2581.907917][    C2]  walk_pmd_range.isra.0+0x240/0x720
[ 2581.907936][    C2]  walk_pud_range.isra.0+0x3d3/0x6c0
[ 2581.907953][    C2]  walk_p4d_range+0x2ef/0x4f0
[ 2581.907969][    C2]  walk_pgd_range+0x27e/0x530
[ 2581.907985][    C2]  __walk_page_range+0x4ab/0x5a0
[ 2581.908001][    C2]  ? find_vma+0x81/0xb0
[ 2581.908015][    C2]  ? __pfx_find_vma+0x10/0x10
[ 2581.908029][    C2]  ? folios_put_refs+0x510/0x740
[ 2581.908038][    C2]  ? walk_page_test+0xa0/0x190
[ 2581.908053][    C2]  walk_page_range+0x2a0/0x530
[ 2581.908069][    C2]  ? __pfx_walk_page_range+0x10/0x10
[ 2581.908086][    C2]  ? folio_batch_move_lru+0x2b8/0x3d0
[ 2581.908100][    C2]  ? __pfx_lru_add_fn+0x10/0x10
[ 2581.908109][    C2]  madvise_pageout_page_range+0x1cc/0x6d0
[ 2581.908122][    C2]  ? __pfx_madvise_pageout_page_range+0x10/0x10
[ 2581.908137][    C2]  madvise_pageout+0x1f4/0x400
[ 2581.908148][    C2]  ? __pfx_madvise_pageout+0x10/0x10
[ 2581.908161][    C2]  ? futex_wait+0x552/0x680
[ 2581.908176][    C2]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2581.908192][    C2]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2581.908207][    C2]  ? mas_prev_setup.constprop.0+0xb4/0x530
[ 2581.908224][    C2]  madvise_vma_behavior+0x8fa/0xe30
[ 2581.908238][    C2]  ? __pfx_madvise_vma_behavior+0x10/0x10
[ 2581.908250][    C2]  ? find_vma_prev+0xf5/0x170
[ 2581.908259][    C2]  ? __pfx_find_vma_prev+0x10/0x10
[ 2581.908270][    C2]  ? do_madvise+0x4d8/0x650
[ 2581.908283][    C2]  do_madvise+0x3af/0x650
[ 2581.908295][    C2]  ? __pfx_do_madvise+0x10/0x10
[ 2581.908307][    C2]  ? __se_sys_futex+0xf7/0x390
[ 2581.908319][    C2]  ? __se_sys_futex+0x100/0x390
[ 2581.908332][    C2]  ? __pfx___se_sys_futex+0x10/0x10
[ 2581.908344][    C2]  ? restore_fpregs_from_fpstate+0x40/0x100
[ 2581.908361][    C2]  __x64_sys_madvise+0xaf/0x120
[ 2581.908373][    C2]  ? syscall_exit_to_user_mode+0x12e/0x1e0
[ 2581.908387][    C2]  ? __ct_user_exit+0x1c/0xe0
[ 2581.908400][    C2]  do_syscall_64+0x59/0x110
[ 2581.908415][    C2]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[ 2581.908428][    C2] RIP: 0033:0x54d2cd
[ 2581.908456][    C2] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
[ 2581.908465][    C2] RSP: 002b:00007ffaad1b8048 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
[ 2581.908473][    C2] RAX: ffffffffffffffda RBX: 0000000000795fa0 RCX: 000000000054d2cd
[ 2581.908479][    C2] RDX: 0000000000000015 RSI: 0000000000003000 RDI: 0000000020001000
[ 2581.908484][    C2] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2581.908489][    C2] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000795fac
[ 2581.908495][    C2] R13: 0000000000000000 R14: 0000000000795fa0 R15: 00007ffaad198000
[ 2581.908504][    C2]  </TASK>
[ 2581.908508][    C1] NMI backtrace for cpu 1 skipped: idling at default_idle+0xf/0x20
[ 2581.908563][    C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20
[ 2581.909034][    C0] Kernel panic - not syncing: softlockup: hung tasks
[ 2581.909039][    C0] CPU: 0 PID: 72129 Comm: kworker/0:4 Tainted: G             L     6.6.0+ #10
[ 2581.909048][    C0] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2581.909053][    C0] Workqueue: events e1000_reset_task
[ 2581.909066][    C0] Call Trace:
[ 2581.909069][    C0]  <IRQ>
[ 2581.909072][    C0]  dump_stack_lvl+0x72/0xa0
[ 2581.909086][    C0]  panic+0x64b/0x6e0
[ 2581.909102][    C0]  ? __pfx_panic+0x10/0x10
[ 2581.909111][    C0]  ? irq_work_claim+0x76/0xa0
[ 2581.909122][    C0]  ? irq_work_queue+0x2a/0x70
[ 2581.909132][    C0]  ? watchdog_timer_fn+0x3af/0x450
[ 2581.909142][    C0]  watchdog_timer_fn+0x3c0/0x450
[ 2581.909151][    C0]  ? __pfx_watchdog_timer_fn+0x10/0x10
[ 2581.909160][    C0]  __run_hrtimer+0x13c/0x6b0
[ 2581.909173][    C0]  __hrtimer_run_queues+0x170/0x290
[ 2581.909187][    C0]  ? __pfx___hrtimer_run_queues+0x10/0x10
[ 2581.909199][    C0]  ? read_tsc+0x9/0x20
[ 2581.909206][    C0]  ? ktime_get_update_offsets_now+0x213/0x2f0
[ 2581.909218][    C0]  hrtimer_interrupt+0x2ed/0x7a0
[ 2581.909233][    C0]  __sysvec_apic_timer_interrupt+0x83/0x250
[ 2581.909245][    C0]  sysvec_apic_timer_interrupt+0x65/0x80
[ 2581.909258][    C0]  </IRQ>
[ 2581.909260][    C0]  <TASK>
[ 2581.909263][    C0]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 2581.909278][    C0] RIP: 0010:_raw_spin_unlock_irqrestore+0x43/0x80
[ 2581.909291][    C0] Code: fa 48 c1 ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 2c c6 07 00 f7 c6 00 02 00 00 74 01 fb 65 ff 0d f5 2f 05 58 <74> 09 48 83 c4 10 c3 cc cc cc cc 0f 1f 44 00 00 48 83 c4 10 c3 cc
[ 2581.909300][    C0] RSP: 0018:ff11000032d5fb40 EFLAGS: 00000246
[ 2581.909307][    C0] RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff9f6d869d
[ 2581.909312][    C0] RDX: 0000000000000004 RSI: 0000000000000246 RDI: ff11000100a940b4
[ 2581.909317][    C0] RBP: ff11000100a94000 R08: 0000000000000001 R09: ffe21c00065abf54
[ 2581.909323][    C0] R10: 0000000000000003 R11: 0000000000000020 R12: ffe21c0020152809
[ 2581.909328][    C0] R13: dffffc0000000000 R14: ff11000100a940b4 R15: 0000000000000001
[ 2581.909335][    C0]  ? __synchronize_hardirq+0x15d/0x230
[ 2581.909349][    C0]  __synchronize_hardirq+0x168/0x230
[ 2581.909361][    C0]  ? __pfx___synchronize_hardirq+0x10/0x10
[ 2581.909372][    C0]  ? schedule_timeout+0x4c1/0x770
[ 2581.909383][    C0]  ? __pfx_ref_tracker_alloc+0x10/0x10
[ 2581.909397][    C0]  ? __sanitizer_cov_trace_switch+0x54/0x90
[ 2581.909413][    C0]  __synchronize_irq+0x96/0x200
[ 2581.909425][    C0]  ? __pfx___synchronize_irq+0x10/0x10
[ 2581.909437][    C0]  ? __pfx_napi_disable+0x10/0x10
[ 2581.909447][    C0]  ? linkwatch_schedule_work+0x189/0x1d0
[ 2581.909462][    C0]  ? linkwatch_fire_event+0x6e/0x270
[ 2581.909471][    C0]  synchronize_irq+0x2d/0x40
[ 2581.909482][    C0]  e1000_down+0x3bc/0x790
[ 2581.909496][    C0]  ? e1000_reset_task+0x66/0xb0
[ 2581.909510][    C0]  e1000_reinit_locked+0xd0/0xf0
[ 2581.999836][    C0]  process_one_work+0x661/0x1020
[ 2582.000262][    C0]  worker_thread+0x849/0x1090
[ 2582.000663][    C0]  ? __kthread_parkme+0x10d/0x190
[ 2582.001084][    C0]  ? __pfx_worker_thread+0x10/0x10
[ 2582.001525][    C0]  kthread+0x2f4/0x3f0
[ 2582.001874][    C0]  ? __pfx_kthread+0x10/0x10
[ 2582.002270][    C0]  ret_from_fork+0x4a/0x80
[ 2582.002643][    C0]  ? __pfx_kthread+0x10/0x10
[ 2582.003030][    C0]  ret_from_fork_asm+0x1b/0x30
[ 2582.003684][    C0]  </TASK>
[ 2582.004763][    C0] Kernel Offset: disabled
[ 2582.005129][    C0] ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting
  2025-06-12  3:06     ` Xiongfeng Wang
@ 2025-06-12 11:37       ` Frederic Weisbecker
  0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2025-06-12 11:37 UTC (permalink / raw)
  To: Xiongfeng Wang
  Cc: Joel Fernandes, linux-kernel, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
	bpf, xiqi2

On Thu, Jun 12, 2025 at 11:06:07AM +0800, Xiongfeng Wang wrote:
> +cc (Qi, my colleague who is helping test the modification)
> 
> On 2025/6/10 20:23, Frederic Weisbecker wrote:
> > On Mon, Jun 09, 2025 at 02:01:24PM -0400, Joel Fernandes wrote:
> >> During rcu_read_unlock_special(), if this happens during irq_exit(), we
> 
> ...skipped...
> 
> We have tested the modification below, without the modification written by Joel,
> using the previous syzkaller benchmark. The kernel still panics.
> The dmesg log is attached.

Yes, it's a cleanup that doesn't include Joel's change yet. So this is
expected.
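
To make the difference concrete, here is a minimal, self-contained
user-space sketch of the idea behind Joel's change. This is not the
actual patch 2/2: ct_in_irq() is only modeled after the helper from
patch 1/2, and every other name below is an illustrative stand-in.
The point it demonstrates is that while the CPU is still
context-tracked as handling an IRQ, the deferred quiescent state
should be reported by a path that does not raise a self-IPI (modeled
here as setting need_resched), since the self-IPI is what recursed
through irq_exit() in the splat above.

/* Rough model only: the real decision lives in rcu_read_unlock_special(). */
#include <stdbool.h>
#include <stdio.h>

static bool in_irq_ct;        /* stand-in for the context-tracking IRQ state */
static bool need_resched_set; /* stand-in for the resched flag */

static bool ct_in_irq(void) { return in_irq_ct; }

static void raise_irq_work(void) { puts("IPI/irq_work raised"); }

static void set_need_resched(void)
{
	need_resched_set = true;
	puts("deferred via need_resched");
}

/*
 * Decision on the unlock-special slow path: never raise the self-IPI
 * while the CPU is still context-tracked as in-IRQ, because the
 * resulting irq_exit() would recurse into the same path.
 */
static void report_deferred_qs(void)
{
	if (ct_in_irq())
		set_need_resched();
	else
		raise_irq_work();
}

int main(void)
{
	in_irq_ct = true;   /* e.g. invoked from the irq_exit() path */
	report_deferred_qs();

	in_irq_ct = false;  /* normal task context */
	report_deferred_qs();
	return 0;
}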

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-06-12 11:37 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-09 18:01 [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
2025-06-09 18:01 ` [PATCH 2/2] rcu: Fix lockup when RCU reader used while IRQ exiting Joel Fernandes
2025-06-09 19:49   ` Boqun Feng
2025-06-09 23:26     ` Frederic Weisbecker
2025-06-10  0:49       ` Boqun Feng
2025-06-10 12:23   ` Frederic Weisbecker
2025-06-10 15:47     ` Joel Fernandes
2025-06-12  3:06     ` Xiongfeng Wang
2025-06-12 11:37       ` Frederic Weisbecker
2025-06-11 16:05   ` Boqun Feng
2025-06-11 16:16     ` Paul E. McKenney
2025-06-11 16:21       ` Boqun Feng
2025-06-09 18:05 ` [PATCH 1/2] context_tracking: Provide helper to determine if we're in IRQ Joel Fernandes
2025-06-11 16:25 ` Frederic Weisbecker
