All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] RCU NOCB for v6.13
@ 2024-11-06 15:32 Frederic Weisbecker
  2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Neeraj Upadhyay,
	Paul E . McKenney, Steven Rostedt, Uladzislau Rezki, Zqiang, rcu

Hello,

Please find below the RCU NOCB patches targeted for the upcoming
merge window.

Yue Haibing (1):
  rcu: Remove unused declaration rcu_segcblist_offload()

Zqiang (1):
  rcu/nocb: Fix missed RCU barrier on deoffloading

 kernel/rcu/rcu_segcblist.h |  1 -
 kernel/rcu/tree_nocb.h     | 13 ++++++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Thanks.

-- 
2.46.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload()
  2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
@ 2024-11-06 15:32 ` Frederic Weisbecker
  2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
  2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay
  2 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
  To: LKML
  Cc: Yue Haibing, Boqun Feng, Joel Fernandes, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Neeraj Upadhyay,
	Paul E . McKenney, Steven Rostedt, Uladzislau Rezki, Zqiang, rcu,
	Frederic Weisbecker, Neeraj Upadhyay

From: Yue Haibing <yuehaibing@huawei.com>

Commit 17351eb59abd ("rcu/nocb: Simplify (de-)offloading state machine")
removed the implementation but leave declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu_segcblist.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 259904075636..fadc08ad4b7b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -120,7 +120,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
 void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
 void rcu_segcblist_init(struct rcu_segcblist *rsclp);
 void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
 bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
 struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
  2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
  2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
@ 2024-11-06 15:32 ` Frederic Weisbecker
  2024-11-11  7:37   ` Neeraj Upadhyay
  2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay
  2 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
  To: LKML
  Cc: Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Zqiang <qiang.zhang1211@gmail.com>

Currently, running rcutorture test with torture_type=rcu fwd_progress=8
n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
test_boost=2, will trigger the following warning:

	WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
	RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
	 Call Trace:
	  <TASK>
	  ? __warn+0x7e/0x120
	  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
	  ? report_bug+0x18e/0x1a0
	  ? handle_bug+0x3d/0x70
	  ? exc_invalid_op+0x18/0x70
	  ? asm_exc_invalid_op+0x1a/0x20
	  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
	  rcu_nocb_cpu_deoffload+0x70/0xa0
	  rcu_nocb_toggle+0x136/0x1c0
	  ? __pfx_rcu_nocb_toggle+0x10/0x10
	  kthread+0xd1/0x100
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork+0x2f/0x50
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork_asm+0x1a/0x30
	  </TASK>

CPU0                               CPU2                          CPU3
//rcu_nocb_toggle             //nocb_cb_wait                   //rcutorture

// deoffload CPU1             // process CPU1's rdp
rcu_barrier()
    rcu_segcblist_entrain()
        rcu_segcblist_add_len(1);
        // len == 2
        // enqueue barrier
        // callback to CPU1's
        // rdp->cblist
                             rcu_do_batch()
                                 // invoke CPU1's rdp->cblist
                                 // callback
                                 rcu_barrier_callback()
                                                             rcu_barrier()
                                                               mutex_lock(&rcu_state.barrier_mutex);
                                                               // still see len == 2
                                                               // enqueue barrier callback
                                                               // to CPU1's rdp->cblist
                                                               rcu_segcblist_entrain()
                                                                   rcu_segcblist_add_len(1);
                                                                   // len == 3
                                 // decrement len
                                 rcu_segcblist_add_len(-2);
                             kthread_parkme()

// CPU1's rdp->cblist len == 1
// Warn because there is
// still a pending barrier
// trigger warning
WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
cpus_read_unlock();

                                                                // wait CPU1 to comes online and
                                                                // invoke barrier callback on
                                                                // CPU1 rdp's->cblist
                                                                wait_for_completion(&rcu_state.barrier_completion);
// deoffload CPU4
cpus_read_lock()
  rcu_barrier()
    mutex_lock(&rcu_state.barrier_mutex);
    // block on barrier_mutex
    // wait rcu_barrier() on
    // CPU3 to unlock barrier_mutex
    // but CPU3 unlock barrier_mutex
    // need to wait CPU1 comes online
    // when CPU1 going online will block on cpus_write_lock

The above scenario will not only trigger a WARN_ON_ONCE(), but also
trigger a deadlock.

Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU
will either observe the decremented callback counter down to 0 and spare
the callback enqueue, or rcuo will observe the new callback and keep
rdp->nocb_cb_sleep to false.

Therefore check rdp->nocb_cb_sleep before parking to make sure no
further rcu_barrier() is waiting on the rdp.

Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 16865475120b..2605dd234a13 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 					    nocb_cb_wait_cond(rdp));
 	if (kthread_should_park()) {
-		kthread_parkme();
+		/*
+		 * kthread_park() must be preceded by an rcu_barrier().
+		 * But yet another rcu_barrier() might have sneaked in between
+		 * the barrier callback execution and the callbacks counter
+		 * decrement.
+		 */
+		if (rdp->nocb_cb_sleep) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			kthread_parkme();
+		}
 	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
 		WARN_ON(signal_pending(current));
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
  2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
@ 2024-11-11  7:37   ` Neeraj Upadhyay
  2024-11-12 21:37     ` Frederic Weisbecker
  0 siblings, 1 reply; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-11  7:37 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
	Mathieu Desnoyers, Paul E . McKenney, Steven Rostedt,
	Uladzislau Rezki, rcu


> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 16865475120b..2605dd234a13 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>  					    nocb_cb_wait_cond(rdp));
>  	if (kthread_should_park()) {
> -		kthread_parkme();
> +		/*
> +		 * kthread_park() must be preceded by an rcu_barrier().
> +		 * But yet another rcu_barrier() might have sneaked in between
> +		 * the barrier callback execution and the callbacks counter
> +		 * decrement.
> +		 */
> +		if (rdp->nocb_cb_sleep) {

Is READ_ONCE() not required here?


- Neeraj

> +			rcu_nocb_lock_irqsave(rdp, flags);
> +			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> +			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			kthread_parkme();
> +		}
>  	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
>  		WARN_ON(signal_pending(current));
>  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] RCU NOCB for v6.13
  2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
  2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
  2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
@ 2024-11-11 17:33 ` Neeraj Upadhyay
  2 siblings, 0 replies; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-11 17:33 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
	Mathieu Desnoyers, Paul E . McKenney, Steven Rostedt,
	Uladzislau Rezki, Zqiang, rcu

On 11/6/2024 9:02 PM, Frederic Weisbecker wrote:
> Hello,
> 
> Please find below the RCU NOCB patches targeted for the upcoming
> merge window.
> 
> Yue Haibing (1):
>   rcu: Remove unused declaration rcu_segcblist_offload()
> 
> Zqiang (1):
>   rcu/nocb: Fix missed RCU barrier on deoffloading
> 
>  kernel/rcu/rcu_segcblist.h |  1 -
>  kernel/rcu/tree_nocb.h     | 13 ++++++++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> Thanks.
> 

Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>

- Neeraj


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
  2024-11-11  7:37   ` Neeraj Upadhyay
@ 2024-11-12 21:37     ` Frederic Weisbecker
  2024-11-13  3:22       ` Neeraj Upadhyay
  0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-12 21:37 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: LKML, Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
> 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 16865475120b..2605dd234a13 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> >  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> >  					    nocb_cb_wait_cond(rdp));
> >  	if (kthread_should_park()) {
> > -		kthread_parkme();
> > +		/*
> > +		 * kthread_park() must be preceded by an rcu_barrier().
> > +		 * But yet another rcu_barrier() might have sneaked in between
> > +		 * the barrier callback execution and the callbacks counter
> > +		 * decrement.
> > +		 */
> > +		if (rdp->nocb_cb_sleep) {
> 
> Is READ_ONCE() not required here?

No because it can't be written concurrently at this point. The value observed
here if kthread_should_park() must have been written locally on the previous
call to nocb_cb_wait().

Thanks.


> 
> 
> - Neeraj
> 
> > +			rcu_nocb_lock_irqsave(rdp, flags);
> > +			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> > +			rcu_nocb_unlock_irqrestore(rdp, flags);
> > +			kthread_parkme();
> > +		}
> >  	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
> >  		WARN_ON(signal_pending(current));
> >  		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
  2024-11-12 21:37     ` Frederic Weisbecker
@ 2024-11-13  3:22       ` Neeraj Upadhyay
  0 siblings, 0 replies; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-13  3:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett,
	Lai Jiangshan, Mathieu Desnoyers, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu



On 11/13/2024 3:07 AM, Frederic Weisbecker wrote:
> Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
>>
>>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>>> index 16865475120b..2605dd234a13 100644
>>> --- a/kernel/rcu/tree_nocb.h
>>> +++ b/kernel/rcu/tree_nocb.h
>>> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>>>  	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>>>  					    nocb_cb_wait_cond(rdp));
>>>  	if (kthread_should_park()) {
>>> -		kthread_parkme();
>>> +		/*
>>> +		 * kthread_park() must be preceded by an rcu_barrier().
>>> +		 * But yet another rcu_barrier() might have sneaked in between
>>> +		 * the barrier callback execution and the callbacks counter
>>> +		 * decrement.
>>> +		 */
>>> +		if (rdp->nocb_cb_sleep) {
>>
>> Is READ_ONCE() not required here?
> 
> No because it can't be written concurrently at this point. The value observed
> here if kthread_should_park() must have been written locally on the previous
> call to nocb_cb_wait().
> 

Ok, got it. I was not aware of any other flow (other than the one described in
this fix) which can race with it. So, asked.



- Neeraj

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-11-13  3:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
2024-11-11  7:37   ` Neeraj Upadhyay
2024-11-12 21:37     ` Frederic Weisbecker
2024-11-13  3:22       ` Neeraj Upadhyay
2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.