* tasks-trace RCU: question about grace period forward progress
@ 2021-02-25 14:22 Mathieu Desnoyers
2021-02-25 15:36 ` Paul E. McKenney
0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Desnoyers @ 2021-02-25 14:22 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt,
Lai Jiangshan, Joel Fernandes, Google
Hi Paul,
Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this:
static inline void rcu_read_lock_trace(void)
{
struct task_struct *t = current;
WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
barrier();
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
t->trc_reader_special.b.need_mb)
smp_mb(); // Pairs with update-side barriers
rcu_lock_acquire(&rcu_trace_lock_map);
}
static inline void rcu_read_unlock_trace(void)
{
int nesting;
struct task_struct *t = current;
rcu_lock_release(&rcu_trace_lock_map);
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
barrier(); // Critical section before disabling.
// Disable IPI-based setting of .need_qs.
WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.
}
rcu_read_unlock_trace_special(t, nesting);
}
AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical
section with a counter, and grace periods iterate over all threads to make sure they
are not within a read-side critical section before they can complete:
# define rcu_tasks_trace_qs(t) \
do { \
if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
!unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
smp_store_release(&(t)->trc_reader_checked, true); \
smp_mb(); /* Readers partitioned by store. */ \
} \
} while (0)
It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
state to track whether threads are nested within a critical section:
https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
static inline void _urcu_mb_read_lock_update(unsigned long tmp)
{
if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
cmm_smp_mb();
} else
_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
}
static inline void _urcu_mb_read_lock(void)
{
unsigned long tmp;
urcu_assert(URCU_TLS(urcu_mb_reader).registered);
cmm_barrier();
tmp = URCU_TLS(urcu_mb_reader).ctr;
urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
_urcu_mb_read_lock_update(tmp);
}
The main difference between the two algorithm is that task-trace within the
kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
incremented or flipped between 0 and 1 by the grace period. This allow RCU readers
outermost nesting starting after the beginning of the grace period not to prevent
progress of the grace period.
Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
grace period from ever completing.
Or is this handled in a clever way that I am missing here ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 14:22 tasks-trace RCU: question about grace period forward progress Mathieu Desnoyers @ 2021-02-25 15:36 ` Paul E. McKenney 2021-02-25 15:47 ` Mathieu Desnoyers 0 siblings, 1 reply; 8+ messages in thread From: Paul E. McKenney @ 2021-02-25 15:36 UTC (permalink / raw) To: Mathieu Desnoyers Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote: > Hi Paul, > > Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this: > > static inline void rcu_read_lock_trace(void) > { > struct task_struct *t = current; > > WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1); > barrier(); > if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && > t->trc_reader_special.b.need_mb) > smp_mb(); // Pairs with update-side barriers > rcu_lock_acquire(&rcu_trace_lock_map); > } > > static inline void rcu_read_unlock_trace(void) > { > int nesting; > struct task_struct *t = current; > > rcu_lock_release(&rcu_trace_lock_map); > nesting = READ_ONCE(t->trc_reader_nesting) - 1; > barrier(); // Critical section before disabling. > // Disable IPI-based setting of .need_qs. > WRITE_ONCE(t->trc_reader_nesting, INT_MIN); > if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) { > WRITE_ONCE(t->trc_reader_nesting, nesting); > return; // We assume shallow reader nesting. > } > rcu_read_unlock_trace_special(t, nesting); > } > > AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical > section with a counter, and grace periods iterate over all threads to make sure they > are not within a read-side critical section before they can complete: > > # define rcu_tasks_trace_qs(t) \ > do { \ > if (!likely(READ_ONCE((t)->trc_reader_checked)) && \ > !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \ > smp_store_release(&(t)->trc_reader_checked, true); \ > smp_mb(); /* Readers partitioned by store. */ \ > } \ > } while (0) > > It reminds me of the liburcu urcu-mb flavor which also deals with per-thread > state to track whether threads are nested within a critical section: > > https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90 > https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125 > > static inline void _urcu_mb_read_lock_update(unsigned long tmp) > { > if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) { > _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr)); > cmm_smp_mb(); > } else > _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT); > } > > static inline void _urcu_mb_read_lock(void) > { > unsigned long tmp; > > urcu_assert(URCU_TLS(urcu_mb_reader).registered); > cmm_barrier(); > tmp = URCU_TLS(urcu_mb_reader).ctr; > urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK); > _urcu_mb_read_lock_update(tmp); > } > > The main difference between the two algorithm is that task-trace within the > kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either > incremented or flipped between 0 and 1 by the grace period. This allow RCU readers > outermost nesting starting after the beginning of the grace period not to prevent > progress of the grace period. > > Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the > grace period from ever completing. > > Or is this handled in a clever way that I am missing here ? There are several mechanisms designed to handle this. The following paragraphs describe these at a high level. The trc_wait_for_one_reader() is invoked on each task. It uses the try_invoke_on_locked_down_task(), which, if the task is currently not running, keeps it that way and invokes trc_inspect_reader(). If the locked-down task is in a read-side critical section, the need_qs field is set, which will cause the task's next rcu_read_lock_trace() to report the quiescent state. If read-side memory barriers have been enabled, trc_inspect_reader() is able to check for a reader being active, and if not, reports the quiescent state. If there is a reader, trc_inspect_reader() reports failure, which is another path to the following paragraph. If the task could not be locked down due its currently running, then trc_wait_for_one_reader() attempts to send an IPI, which results in trc_read_check_handler() rechecking for a read-side critical section and either reporting the quiescent state immediately or proceding in the same way that trc_inspect_reader() does. The trc_read_check_handler() of course checks to make sure that the target task is still running before doing anything. If the attempt to send the IPI fails, then the task is rechecked in a later pass. So what sequence of events did you find that causes these mechanisms to fail? Thanx, Paul ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 15:36 ` Paul E. McKenney @ 2021-02-25 15:47 ` Mathieu Desnoyers 2021-02-25 18:33 ` Paul E. McKenney 0 siblings, 1 reply; 8+ messages in thread From: Mathieu Desnoyers @ 2021-02-25 15:47 UTC (permalink / raw) To: paulmck Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google ----- On Feb 25, 2021, at 10:36 AM, paulmck paulmck@kernel.org wrote: > On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote: >> Hi Paul, >> >> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), >> and I see this: >> >> static inline void rcu_read_lock_trace(void) >> { >> struct task_struct *t = current; >> >> WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1); >> barrier(); >> if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && >> t->trc_reader_special.b.need_mb) >> smp_mb(); // Pairs with update-side barriers >> rcu_lock_acquire(&rcu_trace_lock_map); >> } >> >> static inline void rcu_read_unlock_trace(void) >> { >> int nesting; >> struct task_struct *t = current; >> >> rcu_lock_release(&rcu_trace_lock_map); >> nesting = READ_ONCE(t->trc_reader_nesting) - 1; >> barrier(); // Critical section before disabling. >> // Disable IPI-based setting of .need_qs. >> WRITE_ONCE(t->trc_reader_nesting, INT_MIN); >> if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) { >> WRITE_ONCE(t->trc_reader_nesting, nesting); >> return; // We assume shallow reader nesting. >> } >> rcu_read_unlock_trace_special(t, nesting); >> } >> >> AFAIU, each thread keeps track of whether it is nested within a RCU read-side >> critical >> section with a counter, and grace periods iterate over all threads to make sure >> they >> are not within a read-side critical section before they can complete: >> >> # define rcu_tasks_trace_qs(t) \ >> do { \ >> if (!likely(READ_ONCE((t)->trc_reader_checked)) && \ >> !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \ >> smp_store_release(&(t)->trc_reader_checked, true); \ >> smp_mb(); /* Readers partitioned by store. */ \ >> } \ >> } while (0) >> >> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread >> state to track whether threads are nested within a critical section: >> >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90 >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125 >> >> static inline void _urcu_mb_read_lock_update(unsigned long tmp) >> { >> if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) { >> _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, >> _CMM_LOAD_SHARED(urcu_mb_gp.ctr)); >> cmm_smp_mb(); >> } else >> _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT); >> } >> >> static inline void _urcu_mb_read_lock(void) >> { >> unsigned long tmp; >> >> urcu_assert(URCU_TLS(urcu_mb_reader).registered); >> cmm_barrier(); >> tmp = URCU_TLS(urcu_mb_reader).ctr; >> urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK); >> _urcu_mb_read_lock_update(tmp); >> } >> >> The main difference between the two algorithm is that task-trace within the >> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either >> incremented or flipped between 0 and 1 by the grace period. This allow RCU >> readers >> outermost nesting starting after the beginning of the grace period not to >> prevent >> progress of the grace period. >> >> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the >> grace period from ever completing. >> >> Or is this handled in a clever way that I am missing here ? > > There are several mechanisms designed to handle this. The following > paragraphs describe these at a high level. > > The trc_wait_for_one_reader() is invoked on each task. It uses the > try_invoke_on_locked_down_task(), which, if the task is currently not > running, keeps it that way and invokes trc_inspect_reader(). If the > locked-down task is in a read-side critical section, the need_qs field > is set, which will cause the task's next rcu_read_lock_trace() to report > the quiescent state. I suspect you meant "rcu_read_unlock_trace()" here. > > If read-side memory barriers have been enabled, trc_inspect_reader() > is able to check for a reader being active, and if not, reports the > quiescent state. If there is a reader, trc_inspect_reader() reports > failure, which is another path to the following paragraph. > > If the task could not be locked down due its currently running, > then trc_wait_for_one_reader() attempts to send an IPI, which results in > trc_read_check_handler() rechecking for a read-side critical section > and either reporting the quiescent state immediately or proceding in the > same way that trc_inspect_reader() does. The trc_read_check_handler() > of course checks to make sure that the target task is still running > before doing anything. If the attempt to send the IPI fails, then > the task is rechecked in a later pass. > > So what sequence of events did you find that causes these mechanisms > to fail? The explanation you provide takes care of my concerns, so I don't have any remaining problematic scenario in mind. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 15:47 ` Mathieu Desnoyers @ 2021-02-25 18:33 ` Paul E. McKenney 2021-02-25 20:20 ` Mathieu Desnoyers 2021-02-25 22:23 ` Peter Zijlstra 0 siblings, 2 replies; 8+ messages in thread From: Paul E. McKenney @ 2021-02-25 18:33 UTC (permalink / raw) To: Mathieu Desnoyers Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google On Thu, Feb 25, 2021 at 10:47:32AM -0500, Mathieu Desnoyers wrote: > ----- On Feb 25, 2021, at 10:36 AM, paulmck paulmck@kernel.org wrote: > > > On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote: > >> Hi Paul, > >> > >> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), > >> and I see this: > >> > >> static inline void rcu_read_lock_trace(void) > >> { > >> struct task_struct *t = current; > >> > >> WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1); > >> barrier(); > >> if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) && > >> t->trc_reader_special.b.need_mb) > >> smp_mb(); // Pairs with update-side barriers > >> rcu_lock_acquire(&rcu_trace_lock_map); > >> } > >> > >> static inline void rcu_read_unlock_trace(void) > >> { > >> int nesting; > >> struct task_struct *t = current; > >> > >> rcu_lock_release(&rcu_trace_lock_map); > >> nesting = READ_ONCE(t->trc_reader_nesting) - 1; > >> barrier(); // Critical section before disabling. > >> // Disable IPI-based setting of .need_qs. > >> WRITE_ONCE(t->trc_reader_nesting, INT_MIN); > >> if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) { > >> WRITE_ONCE(t->trc_reader_nesting, nesting); > >> return; // We assume shallow reader nesting. > >> } > >> rcu_read_unlock_trace_special(t, nesting); > >> } > >> > >> AFAIU, each thread keeps track of whether it is nested within a RCU read-side > >> critical > >> section with a counter, and grace periods iterate over all threads to make sure > >> they > >> are not within a read-side critical section before they can complete: > >> > >> # define rcu_tasks_trace_qs(t) \ > >> do { \ > >> if (!likely(READ_ONCE((t)->trc_reader_checked)) && \ > >> !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \ > >> smp_store_release(&(t)->trc_reader_checked, true); \ > >> smp_mb(); /* Readers partitioned by store. */ \ > >> } \ > >> } while (0) > >> > >> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread > >> state to track whether threads are nested within a critical section: > >> > >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90 > >> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125 > >> > >> static inline void _urcu_mb_read_lock_update(unsigned long tmp) > >> { > >> if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) { > >> _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, > >> _CMM_LOAD_SHARED(urcu_mb_gp.ctr)); > >> cmm_smp_mb(); > >> } else > >> _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT); > >> } > >> > >> static inline void _urcu_mb_read_lock(void) > >> { > >> unsigned long tmp; > >> > >> urcu_assert(URCU_TLS(urcu_mb_reader).registered); > >> cmm_barrier(); > >> tmp = URCU_TLS(urcu_mb_reader).ctr; > >> urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK); > >> _urcu_mb_read_lock_update(tmp); > >> } > >> > >> The main difference between the two algorithm is that task-trace within the > >> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either > >> incremented or flipped between 0 and 1 by the grace period. This allow RCU > >> readers > >> outermost nesting starting after the beginning of the grace period not to > >> prevent > >> progress of the grace period. > >> > >> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the > >> grace period from ever completing. > >> > >> Or is this handled in a clever way that I am missing here ? > > > > There are several mechanisms designed to handle this. The following > > paragraphs describe these at a high level. > > > > The trc_wait_for_one_reader() is invoked on each task. It uses the > > try_invoke_on_locked_down_task(), which, if the task is currently not > > running, keeps it that way and invokes trc_inspect_reader(). If the > > locked-down task is in a read-side critical section, the need_qs field > > is set, which will cause the task's next rcu_read_lock_trace() to report > > the quiescent state. > > I suspect you meant "rcu_read_unlock_trace()" here. You are quite correct, apologies for my early morning confusion! > > If read-side memory barriers have been enabled, trc_inspect_reader() > > is able to check for a reader being active, and if not, reports the > > quiescent state. If there is a reader, trc_inspect_reader() reports > > failure, which is another path to the following paragraph. > > > > If the task could not be locked down due its currently running, > > then trc_wait_for_one_reader() attempts to send an IPI, which results in > > trc_read_check_handler() rechecking for a read-side critical section > > and either reporting the quiescent state immediately or proceding in the > > same way that trc_inspect_reader() does. The trc_read_check_handler() > > of course checks to make sure that the target task is still running > > before doing anything. If the attempt to send the IPI fails, then > > the task is rechecked in a later pass. > > > > So what sequence of events did you find that causes these mechanisms > > to fail? > > The explanation you provide takes care of my concerns, so I don't have > any remaining problematic scenario in mind. Would the block comment added by the below patch have helped? One question for Peter... Does each and every context switch imply a full barrier? I am pretty sure that it does, but figured that this was a good time to double-check, given that RCU Tasks Trace assumes this. ;-) Thanx, Paul ------------------------------------------------------------------------ commit 581f79546b6be406a9c7280b2d3511b60821efe0 Author: Paul E. McKenney <paulmck@kernel.org> Date: Thu Feb 25 10:26:00 2021 -0800 rcu-tasks: Add block comment laying out RCU Tasks Trace design This commit adds a block comment that gives a high-level overview of how RCU tasks trace grace periods progress. It also adds a note about how exiting tasks are handles, plus it gives an overview of the memory ordering. Reported-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 17c8ebe..f818357 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread); // flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace // readers can operate from idle, offline, and exception entry/exit in no // way allows rcu_preempt and rcu_sched readers to also do so. +// +// The implementation uses rcu_tasks_wait_gp(), which relies on function +// pointers in the rcu_tasks structure. The rcu_spawn_tasks_trace_kthread() +// function sets these function pointers up so that rcu_tasks_wait_gp() +// invokes these functions in this order: +// +// rcu_tasks_trace_pregp_step(): +// Initialize the count of readers and block CPU-hotplug operations. +// rcu_tasks_trace_pertask(), invoked on every non-idle task: +// Initialize per-task state and attempt to identify an immediate +// quiescent state for that task, or, failing that, attempt to set +// that task's .need_qs flag so that that task's next outermost +// rcu_read_unlock_trace() will report the quiescent state (in which +// case the count of readers is incremented). If both attempts fail, +// the task is added to a "holdout" list. +// rcu_tasks_trace_postscan(): +// Initialize state and attempt to identify an immediate quiescent +// state as above (but only for idle tasks), unblock CPU-hotplug +// operations, and wait for an RCU grace period to avoid races with +// tasks that are in the process of exiting. +// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty: +// Scans the holdout list, attempting to identify a quiescent state +// for each task on the list. If there is a quiescent state, the +// corresponding task is removed from the holdout list. +// rcu_tasks_trace_postgp(): +// Wait for the count of readers do drop to zero, reporting any stalls. +// Also execute full memory barriers to maintain ordering with code +// executing after the grace period. +// +// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks. +// +// Pre-grace-period update-side code is ordered before the grace +// period via the ->cbs_lock and barriers in rcu_tasks_kthread(). +// Pre-grace-period read-side code is ordered before the grace period by +// atomic_dec_and_test() of the count of readers (for IPIed readers) and by +// scheduler context-switch ordering (for locked-down non-running readers). // The lockdep state must be outside of #ifdef to be useful. #ifdef CONFIG_DEBUG_LOCK_ALLOC ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 18:33 ` Paul E. McKenney @ 2021-02-25 20:20 ` Mathieu Desnoyers 2021-02-25 20:55 ` Paul E. McKenney 2021-02-25 22:23 ` Peter Zijlstra 1 sibling, 1 reply; 8+ messages in thread From: Mathieu Desnoyers @ 2021-02-25 20:20 UTC (permalink / raw) To: paulmck Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google ----- On Feb 25, 2021, at 1:33 PM, paulmck paulmck@kernel.org wrote: [...] > commit 581f79546b6be406a9c7280b2d3511b60821efe0 > Author: Paul E. McKenney <paulmck@kernel.org> > Date: Thu Feb 25 10:26:00 2021 -0800 > > rcu-tasks: Add block comment laying out RCU Tasks Trace design > > This commit adds a block comment that gives a high-level overview of > how RCU tasks trace grace periods progress. It also adds a note about > how exiting tasks are handles, plus it gives an overview of the memory handles -> handled > ordering. > > Reported-by: Peter Zijlstra <peterz@infradead.org> > Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > index 17c8ebe..f818357 100644 > --- a/kernel/rcu/tasks.h > +++ b/kernel/rcu/tasks.h > @@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread); > // flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace > // readers can operate from idle, offline, and exception entry/exit in no > // way allows rcu_preempt and rcu_sched readers to also do so. > +// > +// The implementation uses rcu_tasks_wait_gp(), which relies on function > +// pointers in the rcu_tasks structure. The rcu_spawn_tasks_trace_kthread() > +// function sets these function pointers up so that rcu_tasks_wait_gp() > +// invokes these functions in this order: > +// > +// rcu_tasks_trace_pregp_step(): > +// Initialize the count of readers and block CPU-hotplug operations. > +// rcu_tasks_trace_pertask(), invoked on every non-idle task: > +// Initialize per-task state and attempt to identify an immediate > +// quiescent state for that task, or, failing that, attempt to set > +// that task's .need_qs flag so that that task's next outermost > +// rcu_read_unlock_trace() will report the quiescent state (in which > +// case the count of readers is incremented). If both attempts fail, > +// the task is added to a "holdout" list. > +// rcu_tasks_trace_postscan(): > +// Initialize state and attempt to identify an immediate quiescent > +// state as above (but only for idle tasks), unblock CPU-hotplug > +// operations, and wait for an RCU grace period to avoid races with > +// tasks that are in the process of exiting. > +// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty: > +// Scans the holdout list, attempting to identify a quiescent state > +// for each task on the list. If there is a quiescent state, the > +// corresponding task is removed from the holdout list. > +// rcu_tasks_trace_postgp(): > +// Wait for the count of readers do drop to zero, reporting any stalls. > +// Also execute full memory barriers to maintain ordering with code > +// executing after the grace period. > +// > +// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks. > +// > +// Pre-grace-period update-side code is ordered before the grace > +// period via the ->cbs_lock and barriers in rcu_tasks_kthread(). > +// Pre-grace-period read-side code is ordered before the grace period by > +// atomic_dec_and_test() of the count of readers (for IPIed readers) and by > +// scheduler context-switch ordering (for locked-down non-running readers). The rest looks good, thanks! Mathieu > > // The lockdep state must be outside of #ifdef to be useful. > #ifdef CONFIG_DEBUG_LOCK_ALLOC -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 20:20 ` Mathieu Desnoyers @ 2021-02-25 20:55 ` Paul E. McKenney 0 siblings, 0 replies; 8+ messages in thread From: Paul E. McKenney @ 2021-02-25 20:55 UTC (permalink / raw) To: Mathieu Desnoyers Cc: linux-kernel, rcu, Peter Zijlstra, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google On Thu, Feb 25, 2021 at 03:20:34PM -0500, Mathieu Desnoyers wrote: > ----- On Feb 25, 2021, at 1:33 PM, paulmck paulmck@kernel.org wrote: > [...] > > commit 581f79546b6be406a9c7280b2d3511b60821efe0 > > Author: Paul E. McKenney <paulmck@kernel.org> > > Date: Thu Feb 25 10:26:00 2021 -0800 > > > > rcu-tasks: Add block comment laying out RCU Tasks Trace design > > > > This commit adds a block comment that gives a high-level overview of > > how RCU tasks trace grace periods progress. It also adds a note about > > how exiting tasks are handles, plus it gives an overview of the memory > > handles -> handled Good eyes, fixed! > > ordering. > > > > Reported-by: Peter Zijlstra <peterz@infradead.org> > > Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > index 17c8ebe..f818357 100644 > > --- a/kernel/rcu/tasks.h > > +++ b/kernel/rcu/tasks.h > > @@ -726,6 +726,42 @@ EXPORT_SYMBOL_GPL(show_rcu_tasks_rude_gp_kthread); > > // flavors, rcu_preempt and rcu_sched. The fact that RCU Tasks Trace > > // readers can operate from idle, offline, and exception entry/exit in no > > // way allows rcu_preempt and rcu_sched readers to also do so. > > +// > > +// The implementation uses rcu_tasks_wait_gp(), which relies on function > > +// pointers in the rcu_tasks structure. The rcu_spawn_tasks_trace_kthread() > > +// function sets these function pointers up so that rcu_tasks_wait_gp() > > +// invokes these functions in this order: > > +// > > +// rcu_tasks_trace_pregp_step(): > > +// Initialize the count of readers and block CPU-hotplug operations. > > +// rcu_tasks_trace_pertask(), invoked on every non-idle task: > > +// Initialize per-task state and attempt to identify an immediate > > +// quiescent state for that task, or, failing that, attempt to set > > +// that task's .need_qs flag so that that task's next outermost > > +// rcu_read_unlock_trace() will report the quiescent state (in which > > +// case the count of readers is incremented). If both attempts fail, > > +// the task is added to a "holdout" list. > > +// rcu_tasks_trace_postscan(): > > +// Initialize state and attempt to identify an immediate quiescent > > +// state as above (but only for idle tasks), unblock CPU-hotplug > > +// operations, and wait for an RCU grace period to avoid races with > > +// tasks that are in the process of exiting. > > +// check_all_holdout_tasks_trace(), repeatedly until holdout list is empty: > > +// Scans the holdout list, attempting to identify a quiescent state > > +// for each task on the list. If there is a quiescent state, the > > +// corresponding task is removed from the holdout list. > > +// rcu_tasks_trace_postgp(): > > +// Wait for the count of readers do drop to zero, reporting any stalls. > > +// Also execute full memory barriers to maintain ordering with code > > +// executing after the grace period. > > +// > > +// The exit_tasks_rcu_finish_trace() synchronizes with exiting tasks. > > +// > > +// Pre-grace-period update-side code is ordered before the grace > > +// period via the ->cbs_lock and barriers in rcu_tasks_kthread(). > > +// Pre-grace-period read-side code is ordered before the grace period by > > +// atomic_dec_and_test() of the count of readers (for IPIed readers) and by > > +// scheduler context-switch ordering (for locked-down non-running readers). > > The rest looks good, thanks! Thank you for looking it over! Thanx, Paul ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 18:33 ` Paul E. McKenney 2021-02-25 20:20 ` Mathieu Desnoyers @ 2021-02-25 22:23 ` Peter Zijlstra 2021-02-25 23:05 ` Paul E. McKenney 1 sibling, 1 reply; 8+ messages in thread From: Peter Zijlstra @ 2021-02-25 22:23 UTC (permalink / raw) To: Paul E. McKenney Cc: Mathieu Desnoyers, linux-kernel, rcu, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google On Thu, Feb 25, 2021 at 10:33:21AM -0800, Paul E. McKenney wrote: > One question for Peter... Does each and every context switch imply a > full barrier? Yes, also see the smp_mb__after_spinlock() in __schedule() :-) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: tasks-trace RCU: question about grace period forward progress 2021-02-25 22:23 ` Peter Zijlstra @ 2021-02-25 23:05 ` Paul E. McKenney 0 siblings, 0 replies; 8+ messages in thread From: Paul E. McKenney @ 2021-02-25 23:05 UTC (permalink / raw) To: Peter Zijlstra Cc: Mathieu Desnoyers, linux-kernel, rcu, Josh Triplett, rostedt, Lai Jiangshan, Joel Fernandes, Google On Thu, Feb 25, 2021 at 11:23:18PM +0100, Peter Zijlstra wrote: > On Thu, Feb 25, 2021 at 10:33:21AM -0800, Paul E. McKenney wrote: > > One question for Peter... Does each and every context switch imply a > > full barrier? > > Yes, also see the smp_mb__after_spinlock() in __schedule() :-) Whew!!! ;-) Yeah, I could make RCU Tasks Trace deal with lack of a full barrier in that case, but I would rather not... I could imagine optimizing so that the full barrier happened only when tasks migrated, but I could also imagine a world of hurt stemming from such an optimization! Thanx, Paul ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-02-25 23:06 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-02-25 14:22 tasks-trace RCU: question about grace period forward progress Mathieu Desnoyers 2021-02-25 15:36 ` Paul E. McKenney 2021-02-25 15:47 ` Mathieu Desnoyers 2021-02-25 18:33 ` Paul E. McKenney 2021-02-25 20:20 ` Mathieu Desnoyers 2021-02-25 20:55 ` Paul E. McKenney 2021-02-25 22:23 ` Peter Zijlstra 2021-02-25 23:05 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox