All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: paulmck <paulmck@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	rcu <rcu@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Josh Triplett <josh@joshtriplett.org>,
	rostedt <rostedt@goodmis.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>
Subject: tasks-trace RCU: question about grace period forward progress
Date: Thu, 25 Feb 2021 09:22:48 -0500 (EST)	[thread overview]
Message-ID: <354598689.4868.1614262968890.JavaMail.zimbra@efficios.com> (raw)

Hi Paul,

Answering a question from Peter on IRC got me to look at rcu_read_lock_trace(), and I see this:

static inline void rcu_read_lock_trace(void)
{
        struct task_struct *t = current;

        WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
        barrier();
        if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
            t->trc_reader_special.b.need_mb)
                smp_mb(); // Pairs with update-side barriers
        rcu_lock_acquire(&rcu_trace_lock_map);
}

static inline void rcu_read_unlock_trace(void)
{
        int nesting;
        struct task_struct *t = current;

        rcu_lock_release(&rcu_trace_lock_map);
        nesting = READ_ONCE(t->trc_reader_nesting) - 1;
        barrier(); // Critical section before disabling.
        // Disable IPI-based setting of .need_qs.
        WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
        if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
                WRITE_ONCE(t->trc_reader_nesting, nesting);
                return;  // We assume shallow reader nesting.
        }
        rcu_read_unlock_trace_special(t, nesting);
}

AFAIU, each thread keeps track of whether it is nested within a RCU read-side critical
section with a counter, and grace periods iterate over all threads to make sure they
are not within a read-side critical section before they can complete:

# define rcu_tasks_trace_qs(t)                                          \
        do {                                                            \
                if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
                    !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
                        smp_store_release(&(t)->trc_reader_checked, true); \
                        smp_mb(); /* Readers partitioned by store. */   \
                }                                                       \
        } while (0)

It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
state to track whether threads are nested within a critical section:

https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125

static inline void _urcu_mb_read_lock_update(unsigned long tmp)
{
	if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
		cmm_smp_mb();
	} else
		_CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
}

static inline void _urcu_mb_read_lock(void)
{
	unsigned long tmp;

	urcu_assert(URCU_TLS(urcu_mb_reader).registered);
	cmm_barrier();
	tmp = URCU_TLS(urcu_mb_reader).ctr;
	urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
	_urcu_mb_read_lock_update(tmp);
}

The main difference between the two algorithm is that task-trace within the
kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which is either
incremented or flipped between 0 and 1 by the grace period. This allow RCU readers
outermost nesting starting after the beginning of the grace period not to prevent
progress of the grace period.

Without this, a steady flow of incoming tasks-trace-RCU readers can prevent the
grace period from ever completing.

Or is this handled in a clever way that I am missing here ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

             reply	other threads:[~2021-02-25 14:23 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-25 14:22 Mathieu Desnoyers [this message]
2021-02-25 15:36 ` tasks-trace RCU: question about grace period forward progress Paul E. McKenney
2021-02-25 15:47   ` Mathieu Desnoyers
2021-02-25 18:33     ` Paul E. McKenney
2021-02-25 20:20       ` Mathieu Desnoyers
2021-02-25 20:55         ` Paul E. McKenney
2021-02-25 22:23       ` Peter Zijlstra
2021-02-25 23:05         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=354598689.4868.1614262968890.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.