From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: sched/core warning triggers on rcu torture test
Date: Thu, 28 Jun 2018 09:44:48 -0700
Message-ID: <20180628164448.GL3593@linux.vnet.ibm.com>
In-Reply-To: <20180628163323.GB19886@lerouge>

On Thu, Jun 28, 2018 at 06:33:24PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 27, 2018 at 07:25:29AM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 27, 2018 at 12:40:15PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Jun 26, 2018 at 10:48:26AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jun 26, 2018 at 06:32:55PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Jun 26, 2018 at 06:16:04PM +0200, Anna-Maria Gleixner wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > during rcu torture tests (TREE04 and TREE07) I noticed that a
> > > > > > WARN_ON_ONCE() in sched core triggers on a recent 4.18-rc2 based
> > > > > > kernel (6f0d349d922b ("Merge
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) as well as
> > > > > > on a 4.17.3.
> > > > 
> > > > First, I am very glad that I am not the only one running rcutorture!  ;-)
> > > > 
> > > > > > I'm running the tests on a machine with 144 cores:
> > > > > > 
> > > > > >   tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "9*TREE07"
> > > > > >   tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 144 --duration 120 --configs "18*TREE04"
> > > > > > 
> > > > > > 
> > > > > > The warning was introduced by commit d84b31313ef8 ("sched/isolation:
> > > > > > Offload residual 1Hz scheduler tick").
> > > > > > 
> > > > > > 
> > > > > > Output looks similar for all tests I did (this one is the output of
> > > > > > the 4.18-rc2 based kernel):
> > > > > > 
> > > > > > WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
> > > > > 
> > > > > That's nohz_full stuff, is that a normal part of rcutorture? In any
> > > > > case, is the one housekeeping CPU getting seriously overloaded or
> > > > > something?
> > > > 
> > > > Yes, nohz_full is a normal part of rcutorture because RCU has to deal
> > > > differently with userspace execution in the nohz_full case.
> > > > 
> > > > I do see this splat (at least when I don't comment it out), but I
> > > > do share my system with others, so I could easily be overloading the
> > > > housekeeping vCPUs due to hypervisor preemption.  I was intending to
> > > > dig into this one once I got done consolidating RCU-bh, RCU-preempt,
> > > > and RCU-sched at Linus's behest.
> > > > 
> > > > On overloading the housekeeping CPU without outside load, let's look at
> > > > TREE04 and TREE07 separately.
> > > > 
> > > > TREE04 uses eight CPUs, and seven of them ("nohz_full=1-7") are nohz_full
> > > > CPUs, and rcutorture doesn't generate all that large of a callback load.
> > > > It looks like all 144 CPUs are used in this case (18*8), though RCU
> > > > enforces idle periods in order to test idle/non-idle transitions.
> > > > But was there anything else running on the machine at the time?
> > > > 
> > > > TREE07 uses 16 CPUs, and eight of them ("nohz_full=2-9") are nohz_full
> > > > CPUs.  Again, it looks like all 144 CPUs are used (9*16).
> > > > 
> > > > I sometimes see this on TASKS03 as well, which uses two CPUs, and one of
> > > > them ("nohz_full=1") is a nohz_full CPU.
> > > > 
> > > > If your system is otherwise idle, would it make sense to trace context
> > > > switches on CPU 0 to see what it is up to?  And to do an ftrace_dump()
> > > > and turn tracing off when the warning triggers as well?
> > > 
> > > Yeah, you guys reported this warning to me a while ago. I didn't manage to
> > > reproduce it because I fought with, and failed on, a high-NR_CPUS machine.
> > > But apparently 8 CPUs are enough. Let me try that with TREE04.
> > 
> > Looking forward to hearing what you find!
> 
> Please check "[PATCH] sched/nohz: Skip remote tick on idle task entirely", which
> I just posted, in the hope that the warning didn't trigger for some other reason
> in your testing.

Very cool, thank you!  Firing up rcutorture with this now.

							Thanx, Paul
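
For reference, the debugging step suggested earlier in the quoted thread
(tracing context switches on CPU 0 and turning tracing off when the warning
triggers) might be set up roughly as follows.  This is only a sketch, not
something posted in the thread: the function name trace_cpu_switches is
hypothetical, the tracefs mount point is an assumption (commonly
/sys/kernel/tracing), and the simple single-word mask only suits low CPU
numbers.

```shell
#!/bin/sh
# Sketch only: restrict ftrace to one CPU and record context switches there.
# The function name and the tracefs path passed in are assumptions made for
# illustration; adjust them to the system under test.

trace_cpu_switches() {
    tracefs=$1      # tracefs mount point, e.g. /sys/kernel/tracing
    cpu=$2          # CPU to trace, e.g. 0 for the housekeeping CPU

    # tracing_cpumask takes a hexadecimal CPU bitmask.  A single-word mask
    # like this one is only meant for small CPU numbers (below 32).
    printf '%x\n' "$((1 << cpu))" > "$tracefs/tracing_cpumask" || return 1

    # Record every context switch via the sched_switch tracepoint.
    echo 1 > "$tracefs/events/sched/sched_switch/enable" || return 1

    # Start (or resume) tracing.
    echo 1 > "$tracefs/tracing_on" || return 1
}

# Example, as root inside the guest:
#   sysctl kernel.traceoff_on_warning=1    # stop tracing when a WARN fires
#   trace_cpu_switches /sys/kernel/tracing 0
```

With kernel.traceoff_on_warning set, the ring buffer should stop at the
point of the splat, so the "trace" file under the same directory can be read
afterwards to see what CPU 0 was doing just before the warning.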

