All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, joel@joelfernandes.org
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs
Date: Sun, 4 Aug 2019 11:41:59 -0700	[thread overview]
Message-ID: <20190804184159.GC28441@linux.ibm.com> (raw)
In-Reply-To: <20190804144835.GB2386@hirez.programming.kicks-ass.net>

On Sun, Aug 04, 2019 at 04:48:35PM +0200, Peter Zijlstra wrote:
> On Sun, Aug 04, 2019 at 04:43:17PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 02, 2019 at 08:15:01AM -0700, Paul E. McKenney wrote:
> > > The multi_cpu_stop() function relies on the scheduler to gain control from
> > > whatever is running on the various online CPUs, including any nohz_full
> > > CPUs running long loops in kernel-mode code.  Lack of the scheduler-clock
> > > interrupt on such CPUs can delay multi_cpu_stop() for several minutes
> > > and can also result in RCU CPU stall warnings.  This commit therefore
> > > causes multi_cpu_stop() to enable the scheduler-clock interrupt on all
> > > online CPUs.
> > 
> > This sounds wrong; should we be fixing sched_can_stop_tick() instead to
> > return false when the stop task is runnable?

Agreed.  However, it is proving surprisingly hard to come up with a
code sequence that has the effect of rcu_nocb without nohz_full.
And rcu_nocb works just fine.  With nohz_full also in place, I am
decreasing the failure rate, but it still fails, perhaps a few times
per hour of TREE04 rcutorture on an eight-CPU system.  (My 12-CPU
system stubbornly refuses to fail.  Good thing I kept the eight-CPU
system around, I guess.)

When I arrive at some sequence of actions that actually work reliably,
then by all means let's put it somewhere in the NO_HZ_FULL machinery!

> And even without that; I don't understand how we're not instantly
> preempted the moment we enqueue the stop task.

There is no preemption because CONFIG_PREEMPT=n for the scenarios still
having trouble.  Yes, there are cond_resched() calls, but they don't do
anything unless the appropriate flags are set, which won't always happen
without the tick, apparently.  Or without -something- that isn't always
happening as it should.

> Any enqueue, should go through check_preempt_curr() which will be an
> instant resched_curr() when we just woke the stop class.

I did try hitting all of the CPUs with resched_cpu().  Ten times on each
CPU with a ten-jiffy wait between each.  This might have decreased the
probability of excessively long CPU-stopper waits by a factor of two or
three, but it did not eliminate the excessively long waits.

What else should I try?

For example, are there any diagnostics I could collect, say from within
the CPU stopper when things are taking too long?  I see CPU-stopper
delays in excess of five -minutes-, so this is anything but subtle.

							Thanx, Paul


  reply	other threads:[~2019-08-04 18:42 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02 15:14 [PATCH tip/core/rcu 0/14] No-CBs bypass addition for v5.4 Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 01/14] rcu/nocb: Atomic ->len field in rcu_segcblist structure Paul E. McKenney
2019-08-04 14:50   ` Peter Zijlstra
2019-08-04 14:52     ` Peter Zijlstra
2019-08-04 18:45       ` Paul E. McKenney
2019-08-04 18:42     ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 02/14] rcu/nocb: Add bypass callback queueing Paul E. McKenney
2019-08-07  0:03   ` Joel Fernandes
2019-08-07  0:16     ` Joel Fernandes
2019-08-07  0:35     ` Paul E. McKenney
2019-08-07  0:40       ` Steven Rostedt
2019-08-07  1:17         ` Paul E. McKenney
2019-08-07  1:24           ` Steven Rostedt
2019-08-07  3:47             ` Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 03/14] rcu/nocb: EXP Check use and usefulness of ->nocb_lock_contended Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 04/14] rcu/nocb: Print no-CBs diagnostics when rcutorture writer unduly delayed Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 05/14] rcu/nocb: Avoid synchronous wakeup in __call_rcu_nocb_wake() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 06/14] rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 07/14] rcu/nocb: Reduce nocb_cb_wait() leaf rcu_node ->lock contention Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 08/14] rcu/nocb: Reduce __call_rcu_nocb_wake() " Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 09/14] rcu/nocb: Don't wake no-CBs GP kthread if timer posted under overload Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 10/14] rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 11/14] EXP nohz: Add TICK_DEP_BIT_RCU Paul E. McKenney
2019-08-02 15:14 ` [PATCH RFC tip/core/rcu 12/14] rcu/nohz: Force on tick when invoking lots of callbacks Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 13/14] rcutorture: Force on tick for readers and callback flooders Paul E. McKenney
2019-08-02 15:15 ` [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs Paul E. McKenney
2019-08-04 14:43   ` Peter Zijlstra
2019-08-04 14:48     ` Peter Zijlstra
2019-08-04 18:41       ` Paul E. McKenney [this message]
2019-08-04 20:24         ` Paul E. McKenney
2019-08-05  4:19           ` Paul E. McKenney
2019-08-05  8:07             ` Peter Zijlstra
2019-08-05 14:47               ` Paul E. McKenney
2019-08-05  8:05         ` Peter Zijlstra
2019-08-05 14:54           ` Paul E. McKenney
2019-08-05 15:50             ` Peter Zijlstra
2019-08-05 17:48               ` Paul E. McKenney
2019-08-06 18:08                 ` Paul E. McKenney
2019-08-07 21:41                   ` Paul E. McKenney
2019-08-08 20:35                     ` Paul E. McKenney
2019-08-08 21:30                       ` Paul E. McKenney
2019-08-09 16:51                         ` Paul E. McKenney
2019-08-09 18:07                           ` Joel Fernandes
2019-08-09 18:39                             ` Paul E. McKenney
2019-08-12 21:02   ` Frederic Weisbecker
2019-08-12 23:23     ` Paul E. McKenney
2019-08-13  1:33       ` Joel Fernandes
2019-08-13 12:30       ` Frederic Weisbecker
2019-08-13 14:48         ` Paul E. McKenney
2019-08-14 17:55           ` Joel Fernandes
2019-08-14 22:05             ` Paul E. McKenney
2019-08-15 15:07               ` Joel Fernandes
2019-08-15 17:23                 ` Paul E. McKenney
2019-08-15 18:15                   ` Joel Fernandes
2019-08-15 18:39                     ` Paul E. McKenney
2019-08-15 19:42                       ` Joel Fernandes
2019-08-13 21:06       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190804184159.GC28441@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.