public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] rcu: Fix yet another wake up from offline related issue
@ 2024-10-02 14:57 Frederic Weisbecker
  2024-10-02 14:57 ` [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-02 14:57 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

Hi,

A new warning has been reported due to swake_up_one_online() use
from an offline IRQ:

	https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com

Here is a tentative fix.

Similar issue can happen with exp kthread and GP kthread if offline tick
fires and there is a pending deferred quiescent state to report on
PREEMPT_RCU. Currently only oneshot ticks are disabled while the CPU is
offlining. I have yet to take care of the periodic tick implementation.
Work in progress...

Frederic Weisbecker (3):
  rcu/nocb: Use switch/case on NOCB timer state machine
  rcu/nocb: Fix rcuog wake-up from offline softirq
  rcu: Report callbacks enqueued on offline CPU blind spot

 kernel/rcu/tree.c      |  3 +++
 kernel/rcu/tree.h      |  1 +
 kernel/rcu/tree_nocb.h | 47 +++++++++++++++++++++++++++++++-----------
 3 files changed, 39 insertions(+), 12 deletions(-)

-- 
2.46.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine
  2024-10-02 14:57 [PATCH 0/3] rcu: Fix yet another wake up from offline related issue Frederic Weisbecker
@ 2024-10-02 14:57 ` Frederic Weisbecker
  2024-10-10  8:16   ` Boqun Feng
  2024-10-02 14:57 ` [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq Frederic Weisbecker
  2024-10-02 14:57 ` [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot Frederic Weisbecker
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-02 14:57 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

It's more convenient to benefit from the fallthrough feature of
switch / case to handle the timer state machine. Also a new state is
about to be added that will take advantage of it.

No intended functional change.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 97b99cd06923..2fb803f863da 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -271,22 +271,35 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
-	/*
-	 * Bypass wakeup overrides previous deferments. In case of
-	 * callback storms, no need to wake up too early.
-	 */
-	if (waketype == RCU_NOCB_WAKE_LAZY &&
-	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
-		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
-	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
+	switch (waketype) {
+	case RCU_NOCB_WAKE_BYPASS:
+		/*
+		 * Bypass wakeup overrides previous deferments. In case of
+		 * callback storms, no need to wake up too early.
+		 */
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
-	} else {
+		break;
+	case RCU_NOCB_WAKE_LAZY:
+		if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
+			mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
+			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+		}
+		/*
+		 * If the timer is already armed, a non-lazy enqueue may have happened
+		 * in-between. Don't delay it and fall-through.
+		 */
+		break;
+	case RCU_NOCB_WAKE:
+		fallthrough;
+	case RCU_NOCB_WAKE_FORCE:
 		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
 			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
 		if (rdp_gp->nocb_defer_wakeup < waketype)
 			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+		break;
+	default:
+		WARN_ON_ONCE(1);
 	}
 
 	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq
  2024-10-02 14:57 [PATCH 0/3] rcu: Fix yet another wake up from offline related issue Frederic Weisbecker
  2024-10-02 14:57 ` [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine Frederic Weisbecker
@ 2024-10-02 14:57 ` Frederic Weisbecker
  2024-10-09 18:23   ` Joel Fernandes
  2024-10-02 14:57 ` [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot Frederic Weisbecker
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-02 14:57 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu,
	kernel test robot

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from an IRQ. When that happens on NOCB, the
rcuog wake-up is deferred through an IPI to an online CPU in order not
to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from an IRQ is buggy as reported in
the following scenario:

	WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
	Modules linked in: rcutorture torture
	CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
	Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
	RIP: 0010:smp_call_function_single
	<IRQ>
	swake_up_one_online
	__call_rcu_nocb_wake
	__call_rcu_common
	? rcu_torture_one_read
	call_timer_fn
	__run_timers
	run_timer_softirq
	handle_softirqs
	irq_exit_rcu
	? tick_handle_periodic
	sysvec_apic_timer_interrupt
	</IRQ>

The periodic tick must be shutdown when the CPU is offline, just like is
done for oneshot tick. This must be fixed but this is not enough:
softirqs can happen on any hardirq tail and reproduce the above scenario.

Fix this with introducing a special deferred rcuog wake up mode when the
CPU is offline. This deferred wake up doesn't arm any timer and simply
wait for rcu_report_cpu_dead() to be called in order to flush any
pending rcuog wake up.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.h      |  1 +
 kernel/rcu/tree_nocb.h | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index a9a811d9d7a3..7ed060edd12b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -290,6 +290,7 @@ struct rcu_data {
 #define RCU_NOCB_WAKE_LAZY	2
 #define RCU_NOCB_WAKE		3
 #define RCU_NOCB_WAKE_FORCE	4
+#define RCU_NOCB_WAKE_OFFLINE   5
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 2fb803f863da..8648233e1717 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -295,6 +295,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	case RCU_NOCB_WAKE_FORCE:
 		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
 			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
+		fallthrough;
+	case RCU_NOCB_WAKE_OFFLINE:
 		if (rdp_gp->nocb_defer_wakeup < waketype)
 			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 		break;
@@ -562,8 +564,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	lazy_len = READ_ONCE(rdp->lazy_len);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
-		// Only lazy CBs in bypass list
-		if (lazy_len && bypass_len == lazy_len) {
+		if (cpu_is_offline(rdp->cpu)) {
+			/*
+			 * Offline CPUs can't call swake_up_one_online() from IRQs. Rely
+			 * on the final deferred wake-up rcutree_report_cpu_dead()
+			 */
+			rcu_nocb_unlock(rdp);
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_OFFLINE,
+					   TPS("WakeEmptyIsDeferredOffline"));
+		} else if (lazy_len && bypass_len == lazy_len) {
+			// Only lazy CBs in bypass list
 			rcu_nocb_unlock(rdp);
 			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
 					   TPS("WakeLazy"));
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-02 14:57 [PATCH 0/3] rcu: Fix yet another wake up from offline related issue Frederic Weisbecker
  2024-10-02 14:57 ` [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine Frederic Weisbecker
  2024-10-02 14:57 ` [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq Frederic Weisbecker
@ 2024-10-02 14:57 ` Frederic Weisbecker
  2024-10-02 15:00   ` Frederic Weisbecker
  2 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-02 14:57 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
blind spot. Report any potential misuse.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a60616e69b66..36070b6bf4a1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
 	head->func = func;
 	head->next = NULL;
 	kasan_record_aux_stack_noalloc(head);
+
 	local_irq_save(flags);
 	rdp = this_cpu_ptr(&rcu_data);
+	RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline CPU!");
+
 	lazy = lazy_in && !rcu_async_should_hurry();
 
 	/* Add the callback to our list. */
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-02 14:57 ` [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot Frederic Weisbecker
@ 2024-10-02 15:00   ` Frederic Weisbecker
  2024-10-09  2:03     ` Paul E. McKenney
  2024-10-09  2:24     ` Neeraj Upadhyay
  0 siblings, 2 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-02 15:00 UTC (permalink / raw)
  To: LKML
  Cc: Boqun Feng, Joel Fernandes, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu

Le Wed, Oct 02, 2024 at 04:57:38PM +0200, Frederic Weisbecker a écrit :
> Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
> blind spot. Report any potential misuse.
> 
> Reported-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a60616e69b66..36070b6bf4a1 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
>  	head->func = func;
>  	head->next = NULL;
>  	kasan_record_aux_stack_noalloc(head);
> +
>  	local_irq_save(flags);
>  	rdp = this_cpu_ptr(&rcu_data);
> +	RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline
> CPU!");

This should be !rcu_rdp_cpu_online(rdp)

Sigh...

> +
>  	lazy = lazy_in && !rcu_async_should_hurry();
>  
>  	/* Add the callback to our list. */
> -- 
> 2.46.0
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-02 15:00   ` Frederic Weisbecker
@ 2024-10-09  2:03     ` Paul E. McKenney
  2024-10-09 15:13       ` Paul E. McKenney
  2024-10-09  2:24     ` Neeraj Upadhyay
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2024-10-09  2:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

On Wed, Oct 02, 2024 at 05:00:03PM +0200, Frederic Weisbecker wrote:
> Le Wed, Oct 02, 2024 at 04:57:38PM +0200, Frederic Weisbecker a écrit :
> > Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
> > blind spot. Report any potential misuse.
> > 
> > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a60616e69b66..36070b6bf4a1 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
> >  	head->func = func;
> >  	head->next = NULL;
> >  	kasan_record_aux_stack_noalloc(head);
> > +
> >  	local_irq_save(flags);
> >  	rdp = this_cpu_ptr(&rcu_data);
> > +	RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline
> > CPU!");
> 
> This should be !rcu_rdp_cpu_online(rdp)
> 
> Sigh...

I am pulling this in for testing with this change, thank you!

							Thanx, Paul

> > +
> >  	lazy = lazy_in && !rcu_async_should_hurry();
> >  
> >  	/* Add the callback to our list. */
> > -- 
> > 2.46.0
> > 
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-02 15:00   ` Frederic Weisbecker
  2024-10-09  2:03     ` Paul E. McKenney
@ 2024-10-09  2:24     ` Neeraj Upadhyay
  1 sibling, 0 replies; 14+ messages in thread
From: Neeraj Upadhyay @ 2024-10-09  2:24 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Boqun Feng, Joel Fernandes, Paul E . McKenney, Uladzislau Rezki,
	Zqiang, rcu



On 10/2/2024 8:30 PM, Frederic Weisbecker wrote:
> Le Wed, Oct 02, 2024 at 04:57:38PM +0200, Frederic Weisbecker a écrit :
>> Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
>> blind spot. Report any potential misuse.
>>
>> Reported-by: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>> ---
>>  kernel/rcu/tree.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index a60616e69b66..36070b6bf4a1 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
>>  	head->func = func;
>>  	head->next = NULL;
>>  	kasan_record_aux_stack_noalloc(head);
>> +
>>  	local_irq_save(flags);
>>  	rdp = this_cpu_ptr(&rcu_data);
>> +	RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline
>> CPU!");
> 
> This should be !rcu_rdp_cpu_online(rdp)
> 

With this patch series, 600 mins RCU torture overnight testing completed
without failures at my end.

- Neeraj

> Sigh...
> 
>> +
>>  	lazy = lazy_in && !rcu_async_should_hurry();
>>  
>>  	/* Add the callback to our list. */
>> -- 
>> 2.46.0
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-09  2:03     ` Paul E. McKenney
@ 2024-10-09 15:13       ` Paul E. McKenney
  2024-10-10 14:30         ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2024-10-09 15:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

On Tue, Oct 08, 2024 at 07:03:50PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 02, 2024 at 05:00:03PM +0200, Frederic Weisbecker wrote:
> > Le Wed, Oct 02, 2024 at 04:57:38PM +0200, Frederic Weisbecker a écrit :
> > > Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
> > > blind spot. Report any potential misuse.
> > > 
> > > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index a60616e69b66..36070b6bf4a1 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
> > >  	head->func = func;
> > >  	head->next = NULL;
> > >  	kasan_record_aux_stack_noalloc(head);
> > > +
> > >  	local_irq_save(flags);
> > >  	rdp = this_cpu_ptr(&rcu_data);
> > > +	RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline
> > > CPU!");
> > 
> > This should be !rcu_rdp_cpu_online(rdp)
> > 
> > Sigh...
> 
> I am pulling this in for testing with this change, thank you!

And:

Tested-by: Paul E. McKenney <paulmck@kernel.org>

> 							Thanx, Paul
> 
> > > +
> > >  	lazy = lazy_in && !rcu_async_should_hurry();
> > >  
> > >  	/* Add the callback to our list. */
> > > -- 
> > > 2.46.0
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq
  2024-10-02 14:57 ` [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq Frederic Weisbecker
@ 2024-10-09 18:23   ` Joel Fernandes
  2024-10-09 20:56     ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2024-10-09 18:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Boqun Feng, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu, kernel test robot

Hi Frederic,

On Wed, Oct 2, 2024 at 10:57 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> After a CPU has set itself offline and before it eventually calls
> rcutree_report_cpu_dead(), there are still opportunities for callbacks
> to be enqueued, for example from an IRQ. When that happens on NOCB, the
> rcuog wake-up is deferred through an IPI to an online CPU in order not
> to call into the scheduler and risk arming the RT-bandwidth after
> hrtimers have been migrated out and disabled.
>
> But performing a synchronized IPI from an IRQ is buggy as reported in
> the following scenario:
>
>         WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
>         Modules linked in: rcutorture torture
>         CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
>         Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
>         RIP: 0010:smp_call_function_single
>         <IRQ>
>         swake_up_one_online
>         __call_rcu_nocb_wake
>         __call_rcu_common
>         ? rcu_torture_one_read
>         call_timer_fn
>         __run_timers
>         run_timer_softirq
>         handle_softirqs
>         irq_exit_rcu

This call stack seems a bit confusing with the context of the first
paragraph. In the beginning of changelog, you had mentioned the issue
happens from IRQ, but in fact the callstack here is from softirq
right? The IRQ issue should already be resolved in current code
AFAICS.

>         ? tick_handle_periodic
>         sysvec_apic_timer_interrupt
>         </IRQ>
>
> The periodic tick must be shutdown when the CPU is offline, just like is
> done for oneshot tick. This must be fixed but this is not enough:
> softirqs can happen on any hardirq tail and reproduce the above scenario.
>
> Fix this with introducing a special deferred rcuog wake up mode when the
> CPU is offline. This deferred wake up doesn't arm any timer and simply
> wait for rcu_report_cpu_dead() to be called in order to flush any
> pending rcuog wake up.
[...]
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index a9a811d9d7a3..7ed060edd12b 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -290,6 +290,7 @@ struct rcu_data {
>  #define RCU_NOCB_WAKE_LAZY     2
>  #define RCU_NOCB_WAKE          3
>  #define RCU_NOCB_WAKE_FORCE    4
> +#define RCU_NOCB_WAKE_OFFLINE   5
>
>  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
>                                         /* For jiffies_till_first_fqs and */
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 2fb803f863da..8648233e1717 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -295,6 +295,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>         case RCU_NOCB_WAKE_FORCE:
>                 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
>                         mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> +               fallthrough;
> +       case RCU_NOCB_WAKE_OFFLINE:
>                 if (rdp_gp->nocb_defer_wakeup < waketype)
>                         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>                 break;
> @@ -562,8 +564,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>         lazy_len = READ_ONCE(rdp->lazy_len);
>         if (was_alldone) {
>                 rdp->qlen_last_fqs_check = len;
> -               // Only lazy CBs in bypass list
> -               if (lazy_len && bypass_len == lazy_len) {
> +               if (cpu_is_offline(rdp->cpu)) {
> +                       /*
> +                        * Offline CPUs can't call swake_up_one_online() from IRQs. Rely
> +                        * on the final deferred wake-up rcutree_report_cpu_dead()
> +                        */
> +                       rcu_nocb_unlock(rdp);
> +                       wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_OFFLINE,
> +                                          TPS("WakeEmptyIsDeferredOffline"));
> +               } else if (lazy_len && bypass_len == lazy_len) {

Since the call stack is when softirqs are disabled,  would an
alternative fix be (pseudocode):

Change the following in the "if (was_alldone)" block:

               if (!irqs_disabled_flags(flags)) {

to:
               if (!irqs_disabled_flags(flags) && !in_softirq())

?

That way perhaps an additional RCU_NOCB flag is not needed.

Or does that not work for some reason?

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq
  2024-10-09 18:23   ` Joel Fernandes
@ 2024-10-09 20:56     ` Frederic Weisbecker
  2024-10-10  0:27       ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-09 20:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Boqun Feng, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu, kernel test robot

Le Wed, Oct 09, 2024 at 02:23:15PM -0400, Joel Fernandes a écrit :
> Hi Frederic,
> 
> On Wed, Oct 2, 2024 at 10:57 AM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > After a CPU has set itself offline and before it eventually calls
> > rcutree_report_cpu_dead(), there are still opportunities for callbacks
> > to be enqueued, for example from an IRQ. When that happens on NOCB, the
> > rcuog wake-up is deferred through an IPI to an online CPU in order not
> > to call into the scheduler and risk arming the RT-bandwidth after
> > hrtimers have been migrated out and disabled.
> >
> > But performing a synchronized IPI from an IRQ is buggy as reported in
> > the following scenario:
> >
> >         WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
> >         Modules linked in: rcutorture torture
> >         CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
> >         Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
> >         RIP: 0010:smp_call_function_single
> >         <IRQ>
> >         swake_up_one_online
> >         __call_rcu_nocb_wake
> >         __call_rcu_common
> >         ? rcu_torture_one_read
> >         call_timer_fn
> >         __run_timers
> >         run_timer_softirq
> >         handle_softirqs
> >         irq_exit_rcu
> 
> This call stack seems a bit confusing with the context of the first
> paragraph. In the beginning of changelog, you had mentioned the issue
> happens from IRQ, but in fact the callstack here is from softirq
> right? The IRQ issue should already be resolved in current code
> AFAICS.

Indeed, I need to s/IRQ/softirq for clarity.

> 
> >         ? tick_handle_periodic
> >         sysvec_apic_timer_interrupt
> >         </IRQ>
> >
> > The periodic tick must be shutdown when the CPU is offline, just like is
> > done for oneshot tick. This must be fixed but this is not enough:
> > softirqs can happen on any hardirq tail and reproduce the above scenario.
> >
> > Fix this with introducing a special deferred rcuog wake up mode when the
> > CPU is offline. This deferred wake up doesn't arm any timer and simply
> > wait for rcu_report_cpu_dead() to be called in order to flush any
> > pending rcuog wake up.
> [...]
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index a9a811d9d7a3..7ed060edd12b 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -290,6 +290,7 @@ struct rcu_data {
> >  #define RCU_NOCB_WAKE_LAZY     2
> >  #define RCU_NOCB_WAKE          3
> >  #define RCU_NOCB_WAKE_FORCE    4
> > +#define RCU_NOCB_WAKE_OFFLINE   5
> >
> >  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> >                                         /* For jiffies_till_first_fqs and */
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 2fb803f863da..8648233e1717 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -295,6 +295,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >         case RCU_NOCB_WAKE_FORCE:
> >                 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> >                         mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> > +               fallthrough;
> > +       case RCU_NOCB_WAKE_OFFLINE:
> >                 if (rdp_gp->nocb_defer_wakeup < waketype)
> >                         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> >                 break;
> > @@ -562,8 +564,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> >         lazy_len = READ_ONCE(rdp->lazy_len);
> >         if (was_alldone) {
> >                 rdp->qlen_last_fqs_check = len;
> > -               // Only lazy CBs in bypass list
> > -               if (lazy_len && bypass_len == lazy_len) {
> > +               if (cpu_is_offline(rdp->cpu)) {
> > +                       /*
> > +                        * Offline CPUs can't call swake_up_one_online() from IRQs. Rely
> > +                        * on the final deferred wake-up rcutree_report_cpu_dead()
> > +                        */
> > +                       rcu_nocb_unlock(rdp);
> > +                       wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_OFFLINE,
> > +                                          TPS("WakeEmptyIsDeferredOffline"));
> > +               } else if (lazy_len && bypass_len == lazy_len) {
> 
> Since the call stack is when softirqs are disabled,  would an
> alternative fix be (pseudocode):
> 
> Change the following in the "if (was_alldone)" block:
> 
>                if (!irqs_disabled_flags(flags)) {
> 
> to:
>                if (!irqs_disabled_flags(flags) && !in_softirq())
> 
> ?
> 
> That way perhaps an additional RCU_NOCB flag is not needed.
> 
> Or does that not work for some reason?

It works but this forces the wake-up through the timer when a callback is
enqueued from softirqs. And waking up from the timer is a bit more overhead
and also added GP delay. It could be this though:

    if (!irqs_disabled_flags(flags) && cpu_online(smp_processor_id()))

Hmm?

Thanks.
> 
> thanks,
> 
>  - Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq
  2024-10-09 20:56     ` Frederic Weisbecker
@ 2024-10-10  0:27       ` Joel Fernandes
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2024-10-10  0:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Boqun Feng, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu, kernel test robot

On Wed, Oct 9, 2024 at 4:56 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
[...]
> > >         ? tick_handle_periodic
> > >         sysvec_apic_timer_interrupt
> > >         </IRQ>
> > >
> > > The periodic tick must be shutdown when the CPU is offline, just like is
> > > done for oneshot tick. This must be fixed but this is not enough:
> > > softirqs can happen on any hardirq tail and reproduce the above scenario.
> > >
> > > Fix this with introducing a special deferred rcuog wake up mode when the
> > > CPU is offline. This deferred wake up doesn't arm any timer and simply
> > > wait for rcu_report_cpu_dead() to be called in order to flush any
> > > pending rcuog wake up.
> > [...]
> > > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > > index a9a811d9d7a3..7ed060edd12b 100644
> > > --- a/kernel/rcu/tree.h
> > > +++ b/kernel/rcu/tree.h
> > > @@ -290,6 +290,7 @@ struct rcu_data {
> > >  #define RCU_NOCB_WAKE_LAZY     2
> > >  #define RCU_NOCB_WAKE          3
> > >  #define RCU_NOCB_WAKE_FORCE    4
> > > +#define RCU_NOCB_WAKE_OFFLINE   5
> > >
> > >  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> > >                                         /* For jiffies_till_first_fqs and */
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 2fb803f863da..8648233e1717 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -295,6 +295,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> > >         case RCU_NOCB_WAKE_FORCE:
> > >                 if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> > >                         mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> > > +               fallthrough;
> > > +       case RCU_NOCB_WAKE_OFFLINE:
> > >                 if (rdp_gp->nocb_defer_wakeup < waketype)
> > >                         WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > >                 break;
> > > @@ -562,8 +564,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
> > >         lazy_len = READ_ONCE(rdp->lazy_len);
> > >         if (was_alldone) {
> > >                 rdp->qlen_last_fqs_check = len;
> > > -               // Only lazy CBs in bypass list
> > > -               if (lazy_len && bypass_len == lazy_len) {
> > > +               if (cpu_is_offline(rdp->cpu)) {
> > > +                       /*
> > > +                        * Offline CPUs can't call swake_up_one_online() from IRQs. Rely
> > > +                        * on the final deferred wake-up rcutree_report_cpu_dead()
> > > +                        */
> > > +                       rcu_nocb_unlock(rdp);
> > > +                       wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_OFFLINE,
> > > +                                          TPS("WakeEmptyIsDeferredOffline"));
> > > +               } else if (lazy_len && bypass_len == lazy_len) {
> >
> > Since the call stack is when softirqs are disabled,  would an
> > alternative fix be (pseudocode):
> >
> > Change the following in the "if (was_alldone)" block:
> >
> >                if (!irqs_disabled_flags(flags)) {
> >
> > to:
> >                if (!irqs_disabled_flags(flags) && !in_softirq())
> >
> > ?
> >
> > That way perhaps an additional RCU_NOCB flag is not needed.
> >
> > Or does that not work for some reason?
>
> It works but this forces the wake-up through the timer when a callback is
> enqueued from softirqs. And waking up from the timer is a bit more overhead
> and also added GP delay. It could be this though:
>
>     if (!irqs_disabled_flags(flags) && cpu_online(smp_processor_id()))
>

This makes sense to me and also will future-proof this code path from
potential users who end up here. I think it will work.

Feel free to add to this and the next patch:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine
  2024-10-02 14:57 ` [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine Frederic Weisbecker
@ 2024-10-10  8:16   ` Boqun Feng
  2024-10-10 12:55     ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Boqun Feng @ 2024-10-10  8:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Joel Fernandes, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu

On Wed, Oct 02, 2024 at 04:57:36PM +0200, Frederic Weisbecker wrote:
> It's more convenient to benefit from the fallthrough feature of
> switch / case to handle the timer state machine. Also a new state is
> about to be added that will take advantage of it.
> 
> No intended functional change.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree_nocb.h | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 97b99cd06923..2fb803f863da 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -271,22 +271,35 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>  
>  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
>  
> -	/*
> -	 * Bypass wakeup overrides previous deferments. In case of
> -	 * callback storms, no need to wake up too early.
> -	 */
> -	if (waketype == RCU_NOCB_WAKE_LAZY &&
> -	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {

In the old code, if this "if" branch is not taken,

> -		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
> -		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> -	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
> +	switch (waketype) {
> +	case RCU_NOCB_WAKE_BYPASS:
> +		/*
> +		 * Bypass wakeup overrides previous deferments. In case of
> +		 * callback storms, no need to wake up too early.
> +		 */
>  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
>  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> -	} else {

... it will end up in this else branch, however,

> +		break;
> +	case RCU_NOCB_WAKE_LAZY:
> +		if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> +			mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
> +			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> +		}
> +		/*
> +		 * If the timer is already armed, a non-lazy enqueue may have happened
> +		 * in-between. Don't delay it and fall-through.
> +		 */
> +		break;

... here we break instead of fallthrough when waketype ==
RCU_NOCB_WAKE_LAZY and rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_NOT, this
seems to me a functional change, is this intented?

Regards,
Boqun

> +	case RCU_NOCB_WAKE:
> +		fallthrough;
> +	case RCU_NOCB_WAKE_FORCE:
>  		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
>  			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
>  		if (rdp_gp->nocb_defer_wakeup < waketype)
>  			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
>  	}
>  
>  	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine
  2024-10-10  8:16   ` Boqun Feng
@ 2024-10-10 12:55     ` Frederic Weisbecker
  0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2024-10-10 12:55 UTC (permalink / raw)
  To: Boqun Feng
  Cc: LKML, Joel Fernandes, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu

Le Thu, Oct 10, 2024 at 01:16:41AM -0700, Boqun Feng a écrit :
> On Wed, Oct 02, 2024 at 04:57:36PM +0200, Frederic Weisbecker wrote:
> > It's more convenient to benefit from the fallthrough feature of
> > switch / case to handle the timer state machine. Also a new state is
> > about to be added that will take advantage of it.
> > 
> > No intended functional change.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/rcu/tree_nocb.h | 33 +++++++++++++++++++++++----------
> >  1 file changed, 23 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 97b99cd06923..2fb803f863da 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -271,22 +271,35 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >  
> >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> >  
> > -	/*
> > -	 * Bypass wakeup overrides previous deferments. In case of
> > -	 * callback storms, no need to wake up too early.
> > -	 */
> > -	if (waketype == RCU_NOCB_WAKE_LAZY &&
> > -	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> 
> In the old code, if this "if" branch is not taken,
> 
> > -		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
> > -		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > -	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
> > +	switch (waketype) {
> > +	case RCU_NOCB_WAKE_BYPASS:
> > +		/*
> > +		 * Bypass wakeup overrides previous deferments. In case of
> > +		 * callback storms, no need to wake up too early.
> > +		 */
> >  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> >  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > -	} else {
> 
> ... it will end up in this else branch, however,
> 
> > +		break;
> > +	case RCU_NOCB_WAKE_LAZY:
> > +		if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
> > +			mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
> > +			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > +		}
> > +		/*
> > +		 * If the timer is already armed, a non-lazy enqueue may have happened
> > +		 * in-between. Don't delay it and fall-through.
> > +		 */
> > +		break;
> 
> ... here we break instead of fallthrough when waketype ==
> RCU_NOCB_WAKE_LAZY and rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_NOT, this
> seems to me a functional change, is this intented?

You unveiled my secret plans!

I think it was intended at the last minute, because it "fixes" some rare case
where RCU_NOCB_WAKE_LAZY can be spuriously set over a RCU_NOCB_WAKE. The effect
shouldn't be too bad because the timer still fires in one jiffy but still
this is confusing.

So saying in the changelog "No functional change intended" was not intended.

I'll refactor the changelogs and comments.

Thanks.

> 
> Regards,
> Boqun
> 
> > +	case RCU_NOCB_WAKE:
> > +		fallthrough;
> > +	case RCU_NOCB_WAKE_FORCE:
> >  		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> >  			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> >  		if (rdp_gp->nocb_defer_wakeup < waketype)
> >  			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > +		break;
> > +	default:
> > +		WARN_ON_ONCE(1);
> >  	}
> >  
> >  	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
> > -- 
> > 2.46.0
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot
  2024-10-09 15:13       ` Paul E. McKenney
@ 2024-10-10 14:30         ` Joel Fernandes
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2024-10-10 14:30 UTC (permalink / raw)
  To: paulmck
  Cc: Frederic Weisbecker, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

On Wed, Oct 9, 2024 at 11:13 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Oct 08, 2024 at 07:03:50PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 02, 2024 at 05:00:03PM +0200, Frederic Weisbecker wrote:
> > > Le Wed, Oct 02, 2024 at 04:57:38PM +0200, Frederic Weisbecker a écrit :
> > > > Callbacks enqueued after rcutree_report_cpu_dead() fall into RCU barrier
> > > > blind spot. Report any potential misuse.
> > > >
> > > > Reported-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index a60616e69b66..36070b6bf4a1 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3084,8 +3084,11 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
> > > >   head->func = func;
> > > >   head->next = NULL;
> > > >   kasan_record_aux_stack_noalloc(head);
> > > > +
> > > >   local_irq_save(flags);
> > > >   rdp = this_cpu_ptr(&rcu_data);
> > > > + RCU_LOCKDEP_WARN(rcu_rdp_cpu_online(rdp), "Callback enqueued on offline
> > > > CPU!");
> > >
> > > This should be !rcu_rdp_cpu_online(rdp)
> > >
> > > Sigh...
> >
> > I am pulling this in for testing with this change, thank you!
>
> And:
>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>

With the correction,
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-10-10 14:30 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-02 14:57 [PATCH 0/3] rcu: Fix yet another wake up from offline related issue Frederic Weisbecker
2024-10-02 14:57 ` [PATCH 1/3] rcu/nocb: Use switch/case on NOCB timer state machine Frederic Weisbecker
2024-10-10  8:16   ` Boqun Feng
2024-10-10 12:55     ` Frederic Weisbecker
2024-10-02 14:57 ` [PATCH 2/3] rcu/nocb: Fix rcuog wake-up from offline softirq Frederic Weisbecker
2024-10-09 18:23   ` Joel Fernandes
2024-10-09 20:56     ` Frederic Weisbecker
2024-10-10  0:27       ` Joel Fernandes
2024-10-02 14:57 ` [PATCH 3/3] rcu: Report callbacks enqueued on offline CPU blind spot Frederic Weisbecker
2024-10-02 15:00   ` Frederic Weisbecker
2024-10-09  2:03     ` Paul E. McKenney
2024-10-09 15:13       ` Paul E. McKenney
2024-10-10 14:30         ` Joel Fernandes
2024-10-09  2:24     ` Neeraj Upadhyay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox