Re: tasks-trace RCU: question about grace period forward progress

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: paulmck <paulmck@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	rcu <rcu@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Josh Triplett <josh@joshtriplett.org>,
	rostedt <rostedt@goodmis.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>
Subject: Re: tasks-trace RCU: question about grace period forward progress
Date: Thu, 25 Feb 2021 10:47:32 -0500 (EST)	[thread overview]
Message-ID: <47558398.5024.1614268052985.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20210225153656.GQ2743@paulmck-ThinkPad-P72>

----- On Feb 25, 2021, at 10:36 AM, paulmck paulmck@kernel.org wrote:

> On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote:
>> Hi Paul,
>> 
>> Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(),
>> and I see this:
>> 
>> static inline void rcu_read_lock_trace(void)
>> {
>>         struct task_struct *t = current;
>> 
>>         WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
>>         barrier();
>>         if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
>>             t->trc_reader_special.b.need_mb)
>>                 smp_mb(); // Pairs with update-side barriers
>>         rcu_lock_acquire(&rcu_trace_lock_map);
>> }
>> 
>> static inline void rcu_read_unlock_trace(void)
>> {
>>         int nesting;
>>         struct task_struct *t = current;
>> 
>>         rcu_lock_release(&rcu_trace_lock_map);
>>         nesting = READ_ONCE(t->trc_reader_nesting) - 1;
>>         barrier(); // Critical section before disabling.
>>         // Disable IPI-based setting of .need_qs.
>>         WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
>>         if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
>>                 WRITE_ONCE(t->trc_reader_nesting, nesting);
>>                 return;  // We assume shallow reader nesting.
>>         }
>>         rcu_read_unlock_trace_special(t, nesting);
>> }
>> 
>> AFAIU, each thread keeps track of whether it is nested within a RCU read-side
>> critical
>> section with a counter, and grace periods iterate over all threads to make sure
>> they
>> are not within a read-side critical section before they can complete:
>> 
>> # define rcu_tasks_trace_qs(t)                                          \
>>         do {                                                            \
>>                 if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
>>                     !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
>>                         smp_store_release(&(t)->trc_reader_checked, true); \
>>                         smp_mb(); /* Readers partitioned by store. */   \
>>                 }                                                       \
>>         } while (0)
>> 
>> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
>> state to track whether threads are nested within a critical section:
>> 
>> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
>> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
>> 
>> static inline void _urcu_mb_read_lock_update(unsigned long tmp)
>> {
>> 	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
>> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr,
>> 		_CMM_LOAD_SHARED(urcu_mb_gp.ctr));
>> 		cmm_smp_mb();
>> 	} else
>> 		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
>> }
>> 
>> static inline void _urcu_mb_read_lock(void)
>> {
>> 	unsigned long tmp;
>> 
>> 	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
>> 	cmm_barrier();
>> 	tmp = URCU_TLS(urcu_mb_reader).ctr;
>> 	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
>> 	_urcu_mb_read_lock_update(tmp);
>> }
>> 
>> The main difference between the two algorithm is that task-trace within the
>> kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
>> incremented or flipped between 0 and 1 by the grace period. This allow RCU
>> readers
>> outermost nesting starting after the beginning of the grace period not to
>> prevent
>> progress of the grace period.
>> 
>> Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
>> grace period from ever completing.
>> 
>> Or is this handled in a clever way that I am missing here ?
> 
> There are several mechanisms designed to handle this.  The following
> paragraphs describe these at a high level.
> 
> The trc_wait_for_one_reader() is invoked on each task.  It uses the
> try_invoke_on_locked_down_task(), which, if the task is currently not
> running, keeps it that way and invokes trc_inspect_reader().  If the
> locked-down task is in a read-side critical section, the need_qs field
> is set, which will cause the task's next rcu_read_lock_trace() to report
> the quiescent state.

I suspect you meant "rcu_read_unlock_trace()" here.

> 
> If read-side memory barriers have been enabled, trc_inspect_reader()
> is able to check for a reader being active, and if not, reports the
> quiescent state.  If there is a reader, trc_inspect_reader() reports
> failure, which is another path to the following paragraph.
> 
> If the task could not be locked down due its currently running,
> then trc_wait_for_one_reader() attempts to send an IPI, which results in
> trc_read_check_handler() rechecking for a read-side critical section
> and either reporting the quiescent state immediately or proceding in the
> same way that trc_inspect_reader() does.  The trc_read_check_handler()
> of course checks to make sure that the target task is still running
> before doing anything.  If the attempt to send the IPI fails, then
> the task is rechecked in a later pass.
> 
> So what sequence of events did you find that causes these mechanisms
> to fail?

The explanation you provide takes care of my concerns, so I don't have
any remaining problematic scenario in mind.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

next prev parent reply	other threads:[~2021-02-25 15:48 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-25 14:22 tasks-trace RCU: question about grace period forward progress Mathieu Desnoyers
2021-02-25 15:36 ` Paul E. McKenney
2021-02-25 15:47   ` Mathieu Desnoyers [this message]
2021-02-25 18:33     ` Paul E. McKenney
2021-02-25 20:20       ` Mathieu Desnoyers
2021-02-25 20:55         ` Paul E. McKenney
2021-02-25 22:23       ` Peter Zijlstra
2021-02-25 23:05         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47558398.5024.1614268052985.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.