From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing
Date: Tue, 6 Aug 2019 17:35:01 -0700 [thread overview]
Message-ID: <20190807003501.GX28441@linux.ibm.com> (raw)
In-Reply-To: <20190807000313.GA161170@google.com>
On Tue, Aug 06, 2019 at 08:03:13PM -0400, Joel Fernandes wrote:
> On Fri, Aug 02, 2019 at 08:14:49AM -0700, Paul E. McKenney wrote:
> > Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs
> > takes advantage of unrelated grace periods, thus reducing the memory
> > footprint in the face of floods of call_rcu() invocations. However,
> > the ->cblist field is a more-complex rcu_segcblist structure which must
> > be protected via locking. Even though there are only three entities
> > which can acquire this lock (the CPU invoking call_rcu(), the no-CBs
> > grace-period kthread, and the no-CBs callbacks kthread), the contention
> > on this lock is excessive under heavy stress.
> >
> > This commit therefore greatly reduces contention by provisioning
> > an rcu_cblist structure field named ->nocb_bypass within the
> > rcu_data structure. Each no-CBs CPU is permitted only a limited
> > number of enqueues onto the ->cblist per jiffy, controlled by a new
> > nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to
> > about 16 enqueues per millisecond (16 * 1000 / HZ). When that limit is
> > exceeded, the CPU instead enqueues onto the new ->nocb_bypass.
>
> Looks quite interesting. I am guessing the not-no-CB (regular) enqueues don't
> need to use the same technique because both enqueues / callback execution are
> happening on same CPU..
That is the theory! ;-)
> Still looking through patch but I understood the basic idea. Some nits below:
>
> [snip]
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 2c3e9068671c..e4df86db8137 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -200,18 +200,26 @@ struct rcu_data {
> > atomic_t nocb_lock_contended; /* Contention experienced. */
> > int nocb_defer_wakeup; /* Defer wakeup of nocb_kthread. */
> > struct timer_list nocb_timer; /* Enforce finite deferral. */
> > + unsigned long nocb_gp_adv_time; /* Last call_rcu() CB adv (jiffies). */
> > +
> > + /* The following fields are used by call_rcu, hence own cacheline. */
> > + raw_spinlock_t nocb_bypass_lock ____cacheline_internodealigned_in_smp;
> > + struct rcu_cblist nocb_bypass; /* Lock-contention-bypass CB list. */
> > + unsigned long nocb_bypass_first; /* Time (jiffies) of first enqueue. */
> > + unsigned long nocb_nobypass_last; /* Last ->cblist enqueue (jiffies). */
> > + int nocb_nobypass_count; /* # ->cblist enqueues at ^^^ time. */
>
> Can these and below fields be ifdef'd out if !CONFIG_RCU_NOCB_CPU so as to
> keep the size of struct smaller for benefit of systems that don't use NOCB?
Please see below...
> > /* The following fields are used by GP kthread, hence own cacheline. */
> > raw_spinlock_t nocb_gp_lock ____cacheline_internodealigned_in_smp;
> > - bool nocb_gp_sleep;
> > - /* Is the nocb GP thread asleep? */
> > + struct timer_list nocb_bypass_timer; /* Force nocb_bypass flush. */
> > + bool nocb_gp_sleep; /* Is the nocb GP thread asleep? */
>
> And these too, I think.
>
>
> > struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
> > bool nocb_cb_sleep; /* Is the nocb CB thread asleep? */
> > struct task_struct *nocb_cb_kthread;
> > struct rcu_data *nocb_next_cb_rdp;
> > /* Next rcu_data in wakeup chain. */
> >
> > - /* The following fields are used by CB kthread, hence new cachline. */
> > + /* The following fields are used by CB kthread, hence new cacheline. */
> > struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
> > /* GP rdp takes GP-end wakeups. */
> > #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
I believe that they in fact are all under CONFIG_RCU_NOCB_CPU.
> [snip]
> > +static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
> > +{
> > + rcu_lockdep_assert_cblist_protected(rdp);
> > + if (!rcu_segcblist_is_offloaded(&rdp->cblist) ||
> > + !rcu_nocb_bypass_trylock(rdp))
> > + return;
> > + WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
> > +}
> > +
> > +/*
> > + * See whether it is appropriate to use the ->nocb_bypass list in order
> > + * to control contention on ->nocb_lock. A limited number of direct
> > + * enqueues are permitted into ->cblist per jiffy. If ->nocb_bypass
> > + * is non-empty, further callbacks must be placed into ->nocb_bypass,
> > + * otherwise rcu_barrier() breaks. Use rcu_nocb_flush_bypass() to switch
> > + * back to direct use of ->cblist. However, ->nocb_bypass should not be
> > + * used if ->cblist is empty, because otherwise callbacks can be stranded
> > + * on ->nocb_bypass because we cannot count on the current CPU ever again
> > + * invoking call_rcu(). The general rule is that if ->nocb_bypass is
> > + * non-empty, the corresponding no-CBs grace-period kthread must not be
> > + * in an indefinite sleep state.
> > + *
> > + * Finally, it is not permitted to use the bypass during early boot,
> > + * as doing so would confuse the auto-initialization code. Besides
> > + * which, there is no point in worrying about lock contention while
> > + * there is only one CPU in operation.
> > + */
> > +static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > + bool *was_alldone, unsigned long flags)
> > +{
> > + unsigned long c;
> > + unsigned long cur_gp_seq;
> > + unsigned long j = jiffies;
> > + long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > +
> > + if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
> > + *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> > + return false; /* Not offloaded, no bypassing. */
> > + }
> > + lockdep_assert_irqs_disabled();
> > +
> > + // Don't use ->nocb_bypass during early boot.
>
> Very minor nit: comment style should be /* */
I thought that Linus said that "//" was now OK. Am I confused?
Thanx, Paul
next prev parent reply other threads:[~2019-08-07 0:35 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-02 15:14 [PATCH tip/core/rcu 0/14] No-CBs bypass addition for v5.4 Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 01/14] rcu/nocb: Atomic ->len field in rcu_segcblist structure Paul E. McKenney
2019-08-04 14:50 ` Peter Zijlstra
2019-08-04 14:52 ` Peter Zijlstra
2019-08-04 18:45 ` Paul E. McKenney
2019-08-04 18:42 ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing Paul E. McKenney
2019-08-07 0:03 ` Joel Fernandes
2019-08-07 0:16 ` Joel Fernandes
2019-08-07 0:35 ` Paul E. McKenney [this message]
2019-08-07 0:40 ` Steven Rostedt
2019-08-07 1:17 ` Paul E. McKenney
2019-08-07 1:24 ` Steven Rostedt
2019-08-07 3:47 ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 03/14] rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 04/14] rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 05/14] rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 06/14] rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 07/14] rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 08/14] rcu/nocb: Reduce __call_rcu_nocb_wake() " Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 09/14] rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 10/14] rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 11/14] EXP nohz: Add TICK_DEP_BIT_RCU Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 12/14] rcu/nohz: Force on tick when invoking lots of callbacks Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 13/14] rcutorture: Force on tick for readers and callback flooders Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs Paul E. McKenney
2019-08-04 14:43 ` Peter Zijlstra
2019-08-04 14:48 ` Peter Zijlstra
2019-08-04 18:41 ` Paul E. McKenney
2019-08-04 20:24 ` Paul E. McKenney
2019-08-05 4:19 ` Paul E. McKenney
2019-08-05 8:07 ` Peter Zijlstra
2019-08-05 14:47 ` Paul E. McKenney
2019-08-05 8:05 ` Peter Zijlstra
2019-08-05 14:54 ` Paul E. McKenney
2019-08-05 15:50 ` Peter Zijlstra
2019-08-05 17:48 ` Paul E. McKenney
2019-08-06 18:08 ` Paul E. McKenney
2019-08-07 21:41 ` Paul E. McKenney
2019-08-08 20:35 ` Paul E. McKenney
2019-08-08 21:30 ` Paul E. McKenney
2019-08-09 16:51 ` Paul E. McKenney
2019-08-09 18:07 ` Joel Fernandes
2019-08-09 18:39 ` Paul E. McKenney
2019-08-12 21:02 ` Frederic Weisbecker
2019-08-12 23:23 ` Paul E. McKenney
2019-08-13 1:33 ` Joel Fernandes
2019-08-13 12:30 ` Frederic Weisbecker
2019-08-13 14:48 ` Paul E. McKenney
2019-08-14 17:55 ` Joel Fernandes
2019-08-14 22:05 ` Paul E. McKenney
2019-08-15 15:07 ` Joel Fernandes
2019-08-15 17:23 ` Paul E. McKenney
2019-08-15 18:15 ` Joel Fernandes
2019-08-15 18:39 ` Paul E. McKenney
2019-08-15 19:42 ` Joel Fernandes
2019-08-13 21:06 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190807003501.GX28441@linux.ibm.com \
--to=paulmck@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.