From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>,
Dave Hansen <dave.hansen@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Christoph Lameter <cl@linux.com>
Subject: Re: [bisected] pre-3.16 regression on open() scalability
Date: Thu, 19 Jun 2014 11:14:04 -0700 [thread overview]
Message-ID: <20140619181403.GF4904@linux.vnet.ibm.com> (raw)
In-Reply-To: <1403155458.5189.54.camel@marge.simpson.net>
On Thu, Jun 19, 2014 at 07:24:18AM +0200, Mike Galbraith wrote:
> On Wed, 2014-06-18 at 21:19 -0700, Paul E. McKenney wrote:
> > On Wed, Jun 18, 2014 at 08:38:16PM -0700, Andi Kleen wrote:
> > > On Wed, Jun 18, 2014 at 07:13:37PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 18, 2014 at 06:42:00PM -0700, Andi Kleen wrote:
> > > > >
> > > > > I still think it's totally the wrong direction to pollute so
> > > > > many fast paths with this obscure debugging check workaround
> > > > > unconditionally.
> > > >
> > > > OOM prevention should count for something, I would hope.
> > >
> > > OOM in what scenario? This is getting bizarre.
> >
> > On the bizarre part, at least we agree on something. ;-)
> >
> > CONFIG_NO_HZ_FULL booted with at least one nohz_full CPU. Said CPU
> > gets into the kernel and stays there, not necessarily generating RCU
> > callbacks. The other CPUs are very likely generating RCU callbacks.
> > Because the nohz_full CPU is in the kernel, and because there are no
> > scheduling-clock interrupts on that CPU, grace periods do not complete.
> > Eventually, the callbacks from the other CPUs (and perhaps also some
> > from the nohz_full CPU, for that matter) OOM the machine.
> >
> > Now this scenario constitutes an abuse of CONFIG_NO_HZ_FULL, because it
> > is intended for CPUs that execute either in userspace (in which case
> > those CPUs are in extended quiescent states so that RCU can happily
> > ignore them) or for real-time workloads with low CPU untilization (in
> > which case RCU sees them go idle, which is also a quiescent state).
> > But that won't stop people from abusing their kernels and complaining
> > when things break.
>
> IMHO, those people can keep the pieces.
>
> I don't even enable RCU_BOOST in -rt kernels, because that safety net
> has a price. The instant Joe User picks up the -rt shovel, it's his
> grave, and he gets to do the digging. Instead of trying to save his
> bacon, I hand him a slightly better shovel, let him prioritize all
> kthreads including workqueues. Joe can dig all he wants to, and it's on
> him, I just make sure he has the means to bury himself properly :)
One of the nice things about NO_HZ_FULL is that it should reduce the
need for RCU_BOOST. One of the purposes of RCU_BOOST is for the guy
who has an infinite-loop bug in a high-priority RT-priority process,
because such processes can starve out low-priority RCU readers. But
with a properly configured NO_HZ_FULL system, the low-priority processes
aren't sharing CPUs with the RT-priority processes.
In fact, it might be worth making RCU_BOOST depend on !NO_HZ_FULL in -rt.
> > This same thing can also happen without CONFIG_NO_HZ full, though
> > the system has to work a bit harder. In this case, the CPU looping
> > in the kernel has scheduling-clock interrupts, but if all it does
> > is cond_resched(), RCU is never informed of any quiescent states.
> > The whole point of this patch is to make those cond_resched() calls,
> > which are quiescent states, visible to RCU.
> >
> > > If something keeps looping forever in the kernel creating
> > > RCU callbacks without any real quiescent states it's simply broken.
> >
> > I could get behind that. But by that definition, there is a lot of
> > breakage in the current kernel, especially as we move to larger CPU
> > counts.
>
> Not only larger CPU counts: skipping the -rq clock update on wakeup
> (cycle saving optimization) turned out to be deadly to boxen with a
> zillion disks because our wakeup latency can be so incredibly horrible
> that falsely attributing wakeup latency to the next task to run
> (watchdog) resulted in it being throttled for long enough that big IO
> boxen panicked during boot.
>
> The root cause of that wasn't the optimization, the root was the
> horrific amounts of time we can spend locked up in the kernel.
Completely agreed!
Thanx, Paul
next prev parent reply other threads:[~2014-06-19 18:14 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-13 20:04 [bisected] pre-3.16 regression on open() scalability Dave Hansen
2014-06-13 22:45 ` Paul E. McKenney
2014-06-13 23:35 ` Dave Hansen
2014-06-14 2:03 ` Paul E. McKenney
2014-06-17 23:10 ` Dave Hansen
2014-06-18 0:00 ` Josh Triplett
2014-06-18 0:15 ` Andi Kleen
2014-06-18 1:04 ` Paul E. McKenney
2014-06-18 2:27 ` Andi Kleen
2014-06-18 4:47 ` Paul E. McKenney
2014-06-18 12:40 ` Andi Kleen
2014-06-18 12:56 ` Paul E. McKenney
2014-06-18 14:29 ` Christoph Lameter
2014-06-18 0:18 ` Paul E. McKenney
2014-06-18 6:33 ` Dave Hansen
2014-06-18 12:58 ` Paul E. McKenney
2014-06-18 17:36 ` Dave Hansen
2014-06-18 20:30 ` Paul E. McKenney
2014-06-18 23:51 ` Paul E. McKenney
2014-06-19 1:42 ` Andi Kleen
2014-06-19 2:13 ` Paul E. McKenney
2014-06-19 2:29 ` Paul E. McKenney
2014-06-19 2:50 ` Mike Galbraith
2014-06-19 4:19 ` Paul E. McKenney
2014-06-19 3:38 ` Andi Kleen
2014-06-19 4:19 ` Paul E. McKenney
2014-06-19 5:24 ` Mike Galbraith
2014-06-19 18:14 ` Paul E. McKenney [this message]
2014-06-19 4:52 ` Eric Dumazet
2014-06-19 5:23 ` Paul E. McKenney
2014-06-19 14:42 ` Christoph Lameter
2014-06-19 18:09 ` Paul E. McKenney
2014-06-19 20:31 ` Christoph Lameter
2014-06-19 20:42 ` Paul E. McKenney
2014-06-19 20:50 ` Andi Kleen
2014-06-19 21:03 ` Paul E. McKenney
2014-06-19 21:13 ` Christoph Lameter
2014-06-19 21:16 ` Christoph Lameter
2014-06-19 21:32 ` josh
2014-06-19 23:07 ` Paul E. McKenney
2014-06-20 15:20 ` Christoph Lameter
2014-06-20 15:38 ` Paul E. McKenney
2014-06-20 16:07 ` Christoph Lameter
2014-06-20 16:30 ` Paul E. McKenney
2014-06-20 17:39 ` Dave Hansen
2014-06-20 18:15 ` Paul E. McKenney
2014-06-18 21:48 ` Paul E. McKenney
2014-06-18 22:03 ` Dave Hansen
2014-06-18 22:52 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140619181403.GF4904@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=ak@linux.intel.com \
--cc=cl@linux.com \
--cc=dave.hansen@intel.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tim.c.chen@intel.com \
--cc=umgwanakikbuti@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.