From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
jiangshanlai@gmail.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
fweisbec@gmail.com, oleg@redhat.com
Subject: Re: [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled
Date: Sun, 1 Jul 2018 15:25:01 -0700 [thread overview]
Message-ID: <20180701222501.GC3593@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180701174045.GA111992@joelaf.mtv.corp.google.com>
On Sun, Jul 01, 2018 at 10:40:45AM -0700, Joel Fernandes wrote:
> On Wed, Jun 27, 2018 at 01:49:14PM -0700, Paul E. McKenney wrote:
> > This commit defers reporting of RCU-preempt quiescent states at
> > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > preemption are disabled. These deferred quiescent states are reported
> > at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
> > offline operation. Of course, if another RCU read-side critical
> > section has started in the meantime, the reporting of the quiescent
> > state will be further deferred.
> >
> > This also means that disabling preemption, interrupts, and/or
> > softirqs will act as an RCU-preempt read-side critical section.
> > This is enforced by checking preempt_count() as needed.
> >
> > Some special cases must be handled on an ad-hoc basis, for example,
> > context switch is a quiescent state even though both the scheduler and
> > do_exit() disable preemption. In these cases, additional calls to
> > rcu_preempt_deferred_qs() override the preemption disabling. Similar
> > logic overrides disabled interrupts in rcu_preempt_check_callbacks()
> > because in this case the quiescent state happened just before the
> > corresponding scheduling-clock interrupt.
> >
> > This change lifts a long-standing restriction that required that if
> > interrupts were disabled across a call to rcu_read_unlock() that the
> > matching rcu_read_lock() also be contained within that interrupts-disabled
> > region of code. Because the reporting of the corresponding RCU-preempt
> > quiescent state is now deferred until after interrupts have been enabled,
> > it is no longer possible for this situation to result in deadlocks
> > involving the scheduler's runqueue and priority-inheritance locks.
> > This may allow some code simplification that might reduce interrupt
> > latency a bit. Unfortunately, this would also defer deboosting a
> > low-priority task that had been subjected to RCU priority boosting,
> > so real-time-response considerations might well force this restriction
> > to remain in place.
> >
> > Because RCU-preempt grace periods are now blocked not only by RCU
> > read-side critical sections, but also by disabling of interrupts,
> > preemption, and softirqs, it will be possible to eliminate RCU-bh and
> > RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may
> > require some additional plumbing to provide the network denial-of-service
> > guarantees that have been traditionally provided by RCU-bh. Once these
> > are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
> > into RCU-sched. This would mean that all kernels would have but
> > one flavor of RCU, which would open the door to significant code
> > cleanup.
> >
> > Moving to a single flavor of RCU would also have the beneficial effect
> > of reducing the NOCB kthreads by at least a factor of two.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> [...]
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c1b17f5b9361..ff5c70eae47d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -371,6 +371,9 @@ static void rcu_preempt_note_context_switch(bool preempt)
> > * behalf of preempted instance of __rcu_read_unlock().
> > */
> > rcu_read_unlock_special(t);
> > + rcu_preempt_deferred_qs(t);
> > + } else {
> > + rcu_preempt_deferred_qs(t);
> > }
> >
> > /*
> > @@ -464,54 +467,51 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
> > }
> >
> > /*
> > - * Handle special cases during rcu_read_unlock(), such as needing to
> > - * notify RCU core processing or task having blocked during the RCU
> > - * read-side critical section.
> > + * Report deferred quiescent states. The deferral time can
> > + * be quite short, for example, in the case of the call from
> > + * rcu_read_unlock_special().
> > */
> > -static void rcu_read_unlock_special(struct task_struct *t)
> > +static void
> > +rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
> > {
> > bool empty_exp;
> > bool empty_norm;
> > bool empty_exp_now;
> > - unsigned long flags;
> > struct list_head *np;
> > bool drop_boost_mutex = false;
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > union rcu_special special;
> >
> > - /* NMI handlers cannot block and cannot safely manipulate state. */
> > - if (in_nmi())
> > - return;
> > -
> > - local_irq_save(flags);
> > -
> > /*
> > * If RCU core is waiting for this CPU to exit its critical section,
> > * report the fact that it has exited. Because irqs are disabled,
> > * t->rcu_read_unlock_special cannot change.
> > */
> > special = t->rcu_read_unlock_special;
> > + rdp = this_cpu_ptr(rcu_state_p->rda);
> > + if (!special.s && !rdp->deferred_qs) {
> > + local_irq_restore(flags);
> > + return;
> > + }
> > if (special.b.need_qs) {
> > rcu_preempt_qs();
> > t->rcu_read_unlock_special.b.need_qs = false;
> > - if (!t->rcu_read_unlock_special.s) {
> > + if (!t->rcu_read_unlock_special.s && !rdp->deferred_qs) {
> > local_irq_restore(flags);
> > return;
> > }
> > }
> >
> > /*
> > - * Respond to a request for an expedited grace period, but only if
> > - * we were not preempted, meaning that we were running on the same
> > - * CPU throughout. If we were preempted, the exp_need_qs flag
> > - * would have been cleared at the time of the first preemption,
> > - * and the quiescent state would be reported when we were dequeued.
> > + * Respond to a request by an expedited grace period for a
> > + * quiescent state from this CPU. Note that requests from
> > + * tasks are handled when removing the task from the
> > + * blocked-tasks list below.
> > */
> > - if (special.b.exp_need_qs) {
> > - WARN_ON_ONCE(special.b.blocked);
> > + if (special.b.exp_need_qs || rdp->deferred_qs) {
> > t->rcu_read_unlock_special.b.exp_need_qs = false;
> > - rdp = this_cpu_ptr(rcu_state_p->rda);
> > + rdp->deferred_qs = false;
> > rcu_report_exp_rdp(rcu_state_p, rdp, true);
> > if (!t->rcu_read_unlock_special.s) {
> > local_irq_restore(flags);
> > @@ -519,19 +519,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > }
> > }
> >
> > - /* Hardware IRQ handlers cannot block, complain if they get here. */
> > - if (in_irq() || in_serving_softirq()) {
> > - lockdep_rcu_suspicious(__FILE__, __LINE__,
> > - "rcu_read_unlock() from irq or softirq with blocking in critical section!!!\n");
> > - pr_alert("->rcu_read_unlock_special: %#x (b: %d, enq: %d nq: %d)\n",
> > - t->rcu_read_unlock_special.s,
> > - t->rcu_read_unlock_special.b.blocked,
> > - t->rcu_read_unlock_special.b.exp_need_qs,
> > - t->rcu_read_unlock_special.b.need_qs);
> > - local_irq_restore(flags);
> > - return;
> > - }
> > -
> > /* Clean up if blocked during RCU read-side critical section. */
> > if (special.b.blocked) {
> > t->rcu_read_unlock_special.b.blocked = false;
> > @@ -602,6 +589,66 @@ static void rcu_read_unlock_special(struct task_struct *t)
> > }
> > }
> >
> > +/*
> > + * Is a deferred quiescent-state pending, and are we also not in
> > + * an RCU read-side critical section? It is the caller's responsibility
> > + * to ensure it is otherwise safe to report any deferred quiescent
> > + * states. The reason for this is that it is safe to report a
> > + * quiescent state during context switch even though preemption
> > + * is disabled. This function cannot be expected to understand these
> > + * nuances, so the caller must handle them.
> > + */
> > +static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> > +{
> > + return (this_cpu_ptr(&rcu_preempt_data)->deferred_qs ||
> > + READ_ONCE(t->rcu_read_unlock_special.s)) &&
> > + !t->rcu_read_lock_nesting;
> > +}
> > +
> > +/*
> > + * Report a deferred quiescent state if needed and safe to do so.
> > + * As with rcu_preempt_need_deferred_qs(), "safe" involves only
> > + * not being in an RCU read-side critical section. The caller must
> > + * evaluate safety in terms of interrupt, softirq, and preemption
> > + * disabling.
> > + */
> > +static void rcu_preempt_deferred_qs(struct task_struct *t)
> > +{
> > + unsigned long flags;
> > +
> > + if (!rcu_preempt_need_deferred_qs(t))
> > + return;
> > + local_irq_save(flags);
> > + rcu_preempt_deferred_qs_irqrestore(t, flags);
> > +}
> > +
> > +/*
> > + * Handle special cases during rcu_read_unlock(), such as needing to
> > + * notify RCU core processing or task having blocked during the RCU
> > + * read-side critical section.
> > + */
> > +static void rcu_read_unlock_special(struct task_struct *t)
> > +{
> > + unsigned long flags;
> > + bool preempt_bh_were_disabled = !!(preempt_count() & ~HARDIRQ_MASK);
>
> Would it be better to just test for those bits just to be safe the higher
> order bits don't bleed in, such as PREEMPT_NEED_RESCHED, something like the
> following based on the 'dev' branch?
Good point! My plan is to merge it into the original commit with
attribution. Please let me know if you have objections.
Thanx, Paul
> thanks,
>
> - Joel
>
> ---8<-----------------------
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index dfeca11c9fe7..ca7cfdf422f1 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -626,7 +626,8 @@ static void rcu_preempt_deferred_qs(struct task_struct *t)
> static void rcu_read_unlock_special(struct task_struct *t)
> {
> unsigned long flags;
> - bool preempt_bh_were_disabled = !!(preempt_count() & ~HARDIRQ_MASK);
> + bool preempt_bh_were_disabled = !!(preempt_count() &
> + (PREEMPT_MASK | SOFTIRQ_MASK));
> bool irqs_were_disabled;
>
> /* NMI handlers cannot block and cannot safely manipulate state. */
>
next prev parent reply other threads:[~2018-07-01 22:23 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-27 20:48 [PATCH RFC tip/core/rcu 0/2] RCU consolidation patches Paul E. McKenney
2018-06-27 20:49 ` [PATCH RFC tip/core/rcu 1/2] rcu: Defer reporting RCU-preempt quiescent states when disabled Paul E. McKenney
2018-07-01 17:40 ` Joel Fernandes
2018-07-01 22:25 ` Paul E. McKenney [this message]
2018-07-02 0:37 ` Joel Fernandes
2018-07-02 3:03 ` Paul E. McKenney
2018-07-01 18:38 ` Joel Fernandes
2018-07-01 22:27 ` Paul E. McKenney
2018-07-02 0:35 ` Joel Fernandes
2018-07-02 3:11 ` Paul E. McKenney
2018-07-02 4:49 ` Joel Fernandes
2018-07-02 13:02 ` Paul E. McKenney
2018-07-04 6:43 ` [lkp-robot] [rcu] e46874dd99: WARNING:suspicious_RCU_usage kernel test robot
2018-07-04 6:43 ` kernel test robot
2018-07-04 11:47 ` Paul E. McKenney
2018-07-04 11:47 ` Paul E. McKenney
2018-06-27 20:49 ` [PATCH RFC tip/core/rcu 2/2] rcutorture: Handle extended "rcu" read-side critical sections Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180701222501.GC3593@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.