* [PATCH 0/2] RCU NOCB for v6.13
@ 2024-11-06 15:32 Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
Lai Jiangshan, Mathieu Desnoyers, Neeraj Upadhyay,
Paul E . McKenney, Steven Rostedt, Uladzislau Rezki, Zqiang, rcu
Hello,
Please find below the RCU NOCB patches targeted for the upcoming
merge window.
Yue Haibing (1):
rcu: Remove unused declaration rcu_segcblist_offload()
Zqiang (1):
rcu/nocb: Fix missed RCU barrier on deoffloading
kernel/rcu/rcu_segcblist.h | 1 -
kernel/rcu/tree_nocb.h | 13 ++++++++++++-
2 files changed, 12 insertions(+), 2 deletions(-)
Thanks.
--
2.46.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload()
2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
@ 2024-11-06 15:32 ` Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay
2 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
To: LKML
Cc: Yue Haibing, Boqun Feng, Joel Fernandes, Josh Triplett,
Lai Jiangshan, Mathieu Desnoyers, Neeraj Upadhyay,
Paul E . McKenney, Steven Rostedt, Uladzislau Rezki, Zqiang, rcu,
Frederic Weisbecker, Neeraj Upadhyay
From: Yue Haibing <yuehaibing@huawei.com>
Commit 17351eb59abd ("rcu/nocb: Simplify (de-)offloading state machine")
removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/rcu_segcblist.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 259904075636..fadc08ad4b7b 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -120,7 +120,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);
void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
void rcu_segcblist_init(struct rcu_segcblist *rsclp);
void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
-void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload);
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp);
struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp);
--
2.46.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
@ 2024-11-06 15:32 ` Frederic Weisbecker
2024-11-11 7:37 ` Neeraj Upadhyay
2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay
2 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-06 15:32 UTC (permalink / raw)
To: LKML
Cc: Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker
From: Zqiang <qiang.zhang1211@gmail.com>
Currently, running rcutorture test with torture_type=rcu fwd_progress=8
n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
test_boost=2, will trigger the following warning:
WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
Call Trace:
<TASK>
? __warn+0x7e/0x120
? rcu_nocb_rdp_deoffload+0x292/0x2a0
? report_bug+0x18e/0x1a0
? handle_bug+0x3d/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? rcu_nocb_rdp_deoffload+0x292/0x2a0
rcu_nocb_cpu_deoffload+0x70/0xa0
rcu_nocb_toggle+0x136/0x1c0
? __pfx_rcu_nocb_toggle+0x10/0x10
kthread+0xd1/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
CPU0 CPU2 CPU3
//rcu_nocb_toggle //nocb_cb_wait //rcutorture
// deoffload CPU1 // process CPU1's rdp
rcu_barrier()
rcu_segcblist_entrain()
rcu_segcblist_add_len(1);
// len == 2
// enqueue barrier
// callback to CPU1's
// rdp->cblist
rcu_do_batch()
// invoke CPU1's rdp->cblist
// callback
rcu_barrier_callback()
rcu_barrier()
mutex_lock(&rcu_state.barrier_mutex);
// still see len == 2
// enqueue barrier callback
// to CPU1's rdp->cblist
rcu_segcblist_entrain()
rcu_segcblist_add_len(1);
// len == 3
// decrement len
rcu_segcblist_add_len(-2);
kthread_parkme()
// CPU1's rdp->cblist len == 1
// Warn because there is
// still a pending barrier
// trigger warning
WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
cpus_read_unlock();
// wait CPU1 to comes online and
// invoke barrier callback on
// CPU1 rdp's->cblist
wait_for_completion(&rcu_state.barrier_completion);
// deoffload CPU4
cpus_read_lock()
rcu_barrier()
mutex_lock(&rcu_state.barrier_mutex);
// block on barrier_mutex
// wait rcu_barrier() on
// CPU3 to unlock barrier_mutex
// but CPU3 unlock barrier_mutex
// need to wait CPU1 comes online
// when CPU1 going online will block on cpus_write_lock
The above scenario will not only trigger a WARN_ON_ONCE(), but also
trigger a deadlock.
Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU
will either observe the decremented callback counter down to 0 and spare
the callback enqueue, or rcuo will observe the new callback and keep
rdp->nocb_cb_sleep to false.
Therefore check rdp->nocb_cb_sleep before parking to make sure no
further rcu_barrier() is waiting on the rdp.
Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree_nocb.h | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 16865475120b..2605dd234a13 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
nocb_cb_wait_cond(rdp));
if (kthread_should_park()) {
- kthread_parkme();
+ /*
+ * kthread_park() must be preceded by an rcu_barrier().
+ * But yet another rcu_barrier() might have sneaked in between
+ * the barrier callback execution and the callbacks counter
+ * decrement.
+ */
+ if (rdp->nocb_cb_sleep) {
+ rcu_nocb_lock_irqsave(rdp, flags);
+ WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
+ rcu_nocb_unlock_irqrestore(rdp, flags);
+ kthread_parkme();
+ }
} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
WARN_ON(signal_pending(current));
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
--
2.46.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
@ 2024-11-11 7:37 ` Neeraj Upadhyay
2024-11-12 21:37 ` Frederic Weisbecker
0 siblings, 1 reply; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-11 7:37 UTC (permalink / raw)
To: Frederic Weisbecker, LKML
Cc: Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
Mathieu Desnoyers, Paul E . McKenney, Steven Rostedt,
Uladzislau Rezki, rcu
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 16865475120b..2605dd234a13 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> nocb_cb_wait_cond(rdp));
> if (kthread_should_park()) {
> - kthread_parkme();
> + /*
> + * kthread_park() must be preceded by an rcu_barrier().
> + * But yet another rcu_barrier() might have sneaked in between
> + * the barrier callback execution and the callbacks counter
> + * decrement.
> + */
> + if (rdp->nocb_cb_sleep) {
Is READ_ONCE() not required here?
- Neeraj
> + rcu_nocb_lock_irqsave(rdp, flags);
> + WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> + rcu_nocb_unlock_irqrestore(rdp, flags);
> + kthread_parkme();
> + }
> } else if (READ_ONCE(rdp->nocb_cb_sleep)) {
> WARN_ON(signal_pending(current));
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] RCU NOCB for v6.13
2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
@ 2024-11-11 17:33 ` Neeraj Upadhyay
2 siblings, 0 replies; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-11 17:33 UTC (permalink / raw)
To: Frederic Weisbecker, LKML
Cc: Boqun Feng, Joel Fernandes, Josh Triplett, Lai Jiangshan,
Mathieu Desnoyers, Paul E . McKenney, Steven Rostedt,
Uladzislau Rezki, Zqiang, rcu
On 11/6/2024 9:02 PM, Frederic Weisbecker wrote:
> Hello,
>
> Please find below the RCU NOCB patches targeted for the upcoming
> merge window.
>
> Yue Haibing (1):
> rcu: Remove unused declaration rcu_segcblist_offload()
>
> Zqiang (1):
> rcu/nocb: Fix missed RCU barrier on deoffloading
>
> kernel/rcu/rcu_segcblist.h | 1 -
> kernel/rcu/tree_nocb.h | 13 ++++++++++++-
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> Thanks.
>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
- Neeraj
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
2024-11-11 7:37 ` Neeraj Upadhyay
@ 2024-11-12 21:37 ` Frederic Weisbecker
2024-11-13 3:22 ` Neeraj Upadhyay
0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2024-11-12 21:37 UTC (permalink / raw)
To: Neeraj Upadhyay
Cc: LKML, Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett,
Lai Jiangshan, Mathieu Desnoyers, Paul E . McKenney,
Steven Rostedt, Uladzislau Rezki, rcu
Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
>
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 16865475120b..2605dd234a13 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > nocb_cb_wait_cond(rdp));
> > if (kthread_should_park()) {
> > - kthread_parkme();
> > + /*
> > + * kthread_park() must be preceded by an rcu_barrier().
> > + * But yet another rcu_barrier() might have sneaked in between
> > + * the barrier callback execution and the callbacks counter
> > + * decrement.
> > + */
> > + if (rdp->nocb_cb_sleep) {
>
> Is READ_ONCE() not required here?
No because it can't be written concurrently at this point. The value observed
here if kthread_should_park() must have been written locally on the previous
call to nocb_cb_wait().
Thanks.
>
>
> - Neeraj
>
> > + rcu_nocb_lock_irqsave(rdp, flags);
> > + WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> > + rcu_nocb_unlock_irqrestore(rdp, flags);
> > + kthread_parkme();
> > + }
> > } else if (READ_ONCE(rdp->nocb_cb_sleep)) {
> > WARN_ON(signal_pending(current));
> > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading
2024-11-12 21:37 ` Frederic Weisbecker
@ 2024-11-13 3:22 ` Neeraj Upadhyay
0 siblings, 0 replies; 7+ messages in thread
From: Neeraj Upadhyay @ 2024-11-13 3:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, Zqiang, Boqun Feng, Joel Fernandes, Josh Triplett,
Lai Jiangshan, Mathieu Desnoyers, Paul E . McKenney,
Steven Rostedt, Uladzislau Rezki, rcu
On 11/13/2024 3:07 AM, Frederic Weisbecker wrote:
> Le Mon, Nov 11, 2024 at 01:07:16PM +0530, Neeraj Upadhyay a écrit :
>>
>>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>>> index 16865475120b..2605dd234a13 100644
>>> --- a/kernel/rcu/tree_nocb.h
>>> +++ b/kernel/rcu/tree_nocb.h
>>> @@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
>>> swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
>>> nocb_cb_wait_cond(rdp));
>>> if (kthread_should_park()) {
>>> - kthread_parkme();
>>> + /*
>>> + * kthread_park() must be preceded by an rcu_barrier().
>>> + * But yet another rcu_barrier() might have sneaked in between
>>> + * the barrier callback execution and the callbacks counter
>>> + * decrement.
>>> + */
>>> + if (rdp->nocb_cb_sleep) {
>>
>> Is READ_ONCE() not required here?
>
> No because it can't be written concurrently at this point. The value observed
> here if kthread_should_park() must have been written locally on the previous
> call to nocb_cb_wait().
>
Ok, got it. I was not aware of any other flow (other than the one described in
this fix) which can race with it. So, asked.
- Neeraj
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-11-13 3:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-06 15:32 [PATCH 0/2] RCU NOCB for v6.13 Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 1/2] rcu: Remove unused declaration rcu_segcblist_offload() Frederic Weisbecker
2024-11-06 15:32 ` [PATCH 2/2] rcu/nocb: Fix missed RCU barrier on deoffloading Frederic Weisbecker
2024-11-11 7:37 ` Neeraj Upadhyay
2024-11-12 21:37 ` Frederic Weisbecker
2024-11-13 3:22 ` Neeraj Upadhyay
2024-11-11 17:33 ` [PATCH 0/2] RCU NOCB for v6.13 Neeraj Upadhyay
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).