Re: [RFC][PATCH 00/16] sched: Core scheduling

From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	subhra.mazumdar@oracle.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	kerrnel@google.com
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Date: Tue, 19 Feb 2019 16:15:32 +0100
Message-ID: <20190219151532.GA40581@gmail.com>
In-Reply-To: <CAHk-=whVrNomWXRmCjnBJkosiwiGXz5pYb63aXy=nSPGjvc-1g@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Feb 18, 2019 at 12:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > If there were close to no VMEXITs, it beat smt=off, if there were lots
> > of VMEXITs it was far far worse. Supposedly hosting people try their
> > very bestest to have no VMEXITs so it mostly works for them (with the
> > obvious exception of single VCPU guests).
> >
> > It's just that people have been bugging me for this crap; and I figure
> > I'd post it now that it's not exploding anymore and let others have at.
> 
> The patches didn't look disgusting to me, but I admittedly just
> scanned through them quickly.
> 
> Are there downsides (maintenance and/or performance) when core
> scheduling _isn't_ enabled? I guess if it's not a maintenance or
> performance nightmare when off, it's ok to just give people the
> option.

So this bit is the main straight-line performance impact when the 
CONFIG_SCHED_CORE Kconfig feature is present (which I expect distros to 
enable broadly):

  +static inline bool sched_core_enabled(struct rq *rq)
  +{
  +       return static_branch_unlikely(&__sched_core_enabled) && rq->core_enabled;
  +}

   static inline raw_spinlock_t *rq_lockp(struct rq *rq)
   {
  +       if (sched_core_enabled(rq))
  +               return &rq->core->__lock;
  +
          return &rq->__lock;
   }


This should, at least in principle, keep the runtime overhead down to a 
few more NOPs and a slightly bigger instruction cache footprint - modulo 
compiler shenanigans.
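
To make the shape of that check concrete, here is a minimal userspace sketch of the same pattern. It is not the patch code: the plain global flag `sched_core_key_enabled` is a hypothetical stand-in for the kernel's static key (which is runtime-patched so that the disabled case falls through a NOP), `raw_spinlock_t` is a dummy type, and the lock field is named `lock` rather than `__lock` to stay clear of reserved identifiers in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Dummy stand-in for the kernel's raw_spinlock_t; no locking happens here. */
typedef struct { int locked; } raw_spinlock_t;

struct rq {
	raw_spinlock_t lock;   /* per-runqueue lock (rq->__lock in the diff) */
	struct rq *core;       /* runqueue whose lock the whole core shares */
	bool core_enabled;     /* per-rq opt-in, as in the quoted diff */
};

/*
 * Assumption: an ordinary global replaces the static key. In the kernel,
 * static_branch_unlikely() avoids this load-and-test when the feature is
 * off; this sketch only models the logic, not the code patching.
 */
static bool sched_core_key_enabled;

static inline bool sched_core_enabled(const struct rq *rq)
{
	return sched_core_key_enabled && rq->core_enabled;
}

/* Pick the core-wide lock when core scheduling is on, else the rq's own. */
static inline raw_spinlock_t *rq_lockp(struct rq *rq)
{
	if (sched_core_enabled(rq)) {
		assert(rq->core != NULL);
		return &rq->core->lock;
	}
	return &rq->lock;
}
```

With the global flag off, every runqueue resolves to its own lock. With it on, SMT siblings whose `core` points at the same runqueue resolve to one shared lock, which is the behaviour every wrapped rq->lock user now has to go through.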

Here's the code generation impact on x86-64 defconfig:

   text	   data	    bss	    dec	    hex	filename
    228	     48	      0	    276	    114	sched.core.n/cpufreq.o (ex sched.core.n/built-in.a)
    228	     48	      0	    276	    114	sched.core.y/cpufreq.o (ex sched.core.y/built-in.a)

   4438	     96	      0	   4534	   11b6	sched.core.n/completion.o (ex sched.core.n/built-in.a)
   4438	     96	      0	   4534	   11b6	sched.core.y/completion.o (ex sched.core.y/built-in.a)

   2167	   2428	      0	   4595	   11f3	sched.core.n/cpuacct.o (ex sched.core.n/built-in.a)
   2167	   2428	      0	   4595	   11f3	sched.core.y/cpuacct.o (ex sched.core.y/built-in.a)

  61099	  22114	    488	  83701	  146f5	sched.core.n/core.o (ex sched.core.n/built-in.a)
  70541	  25370	    508	  96419	  178a3	sched.core.y/core.o (ex sched.core.y/built-in.a)

   3262	   6272	      0	   9534	   253e	sched.core.n/wait_bit.o (ex sched.core.n/built-in.a)
   3262	   6272	      0	   9534	   253e	sched.core.y/wait_bit.o (ex sched.core.y/built-in.a)

  12235	    341	     96	  12672	   3180	sched.core.n/rt.o (ex sched.core.n/built-in.a)
  13073	    917	     96	  14086	   3706	sched.core.y/rt.o (ex sched.core.y/built-in.a)

  10293	    477	   1928	  12698	   319a	sched.core.n/topology.o (ex sched.core.n/built-in.a)
  10363	    509	   1928	  12800	   3200	sched.core.y/topology.o (ex sched.core.y/built-in.a)

    886	     24	      0	    910	    38e	sched.core.n/cpupri.o (ex sched.core.n/built-in.a)
    886	     24	      0	    910	    38e	sched.core.y/cpupri.o (ex sched.core.y/built-in.a)

   1061	     64	      0	   1125	    465	sched.core.n/stop_task.o (ex sched.core.n/built-in.a)
   1077	    128	      0	   1205	    4b5	sched.core.y/stop_task.o (ex sched.core.y/built-in.a)

  18443	    365	     24	  18832	   4990	sched.core.n/deadline.o (ex sched.core.n/built-in.a)
  20019	   2189	     24	  22232	   56d8	sched.core.y/deadline.o (ex sched.core.y/built-in.a)

   1123	      8	     64	   1195	    4ab	sched.core.n/loadavg.o (ex sched.core.n/built-in.a)
   1123	      8	     64	   1195	    4ab	sched.core.y/loadavg.o (ex sched.core.y/built-in.a)

   1323	      8	      0	   1331	    533	sched.core.n/stats.o (ex sched.core.n/built-in.a)
   1323	      8	      0	   1331	    533	sched.core.y/stats.o (ex sched.core.y/built-in.a)

   1282	    164	     32	   1478	    5c6	sched.core.n/isolation.o (ex sched.core.n/built-in.a)
   1282	    164	     32	   1478	    5c6	sched.core.y/isolation.o (ex sched.core.y/built-in.a)

   1564	     36	      0	   1600	    640	sched.core.n/cpudeadline.o (ex sched.core.n/built-in.a)
   1564	     36	      0	   1600	    640	sched.core.y/cpudeadline.o (ex sched.core.y/built-in.a)

   1640	     56	      0	   1696	    6a0	sched.core.n/swait.o (ex sched.core.n/built-in.a)
   1640	     56	      0	   1696	    6a0	sched.core.y/swait.o (ex sched.core.y/built-in.a)

   1859	    244	     32	   2135	    857	sched.core.n/clock.o (ex sched.core.n/built-in.a)
   1859	    244	     32	   2135	    857	sched.core.y/clock.o (ex sched.core.y/built-in.a)

   2339	      8	      0	   2347	    92b	sched.core.n/cputime.o (ex sched.core.n/built-in.a)
   2339	      8	      0	   2347	    92b	sched.core.y/cputime.o (ex sched.core.y/built-in.a)

   3014	     32	      0	   3046	    be6	sched.core.n/membarrier.o (ex sched.core.n/built-in.a)
   3014	     32	      0	   3046	    be6	sched.core.y/membarrier.o (ex sched.core.y/built-in.a)

  50027	    964	     96	  51087	   c78f	sched.core.n/fair.o (ex sched.core.n/built-in.a)
  51537	   2484	     96	  54117	   d365	sched.core.y/fair.o (ex sched.core.y/built-in.a)

   3192	    220	      0	   3412	    d54	sched.core.n/idle.o (ex sched.core.n/built-in.a)
   3276	    252	      0	   3528	    dc8	sched.core.y/idle.o (ex sched.core.y/built-in.a)

   3633	      0	      0	   3633	    e31	sched.core.n/pelt.o (ex sched.core.n/built-in.a)
   3633	      0	      0	   3633	    e31	sched.core.y/pelt.o (ex sched.core.y/built-in.a)

   3794	    160	      0	   3954	    f72	sched.core.n/wait.o (ex sched.core.n/built-in.a)
   3794	    160	      0	   3954	    f72	sched.core.y/wait.o (ex sched.core.y/built-in.a)

I'd say this one is representative:

   text	   data	    bss	    dec	    hex	filename
  12235	    341	     96	  12672	   3180	sched.core.n/rt.o (ex sched.core.n/built-in.a)
  13073	    917	     96	  14086	   3706	sched.core.y/rt.o (ex sched.core.y/built-in.a)

That roughly 7% text bloat (12235 to 13073 bytes) is primarily due to the 
higher rq-lock inlining overhead, I believe.
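
For anyone who wants to redo the arithmetic on the other object files, the growth can be computed from the two rows of each pair. The helper below is a hypothetical convenience function, not part of the series; fed the rt.o rows above, it gives about 6.8% for text (12235 to 13073) and about 11% for the dec total (12672 to 14086).

```c
#include <assert.h>

/*
 * Percentage growth from `before` to `after`, e.g. the text column of a
 * sched.core.n object versus its sched.core.y counterpart.
 * `before` must be nonzero.
 */
static double pct_growth(unsigned long before, unsigned long after)
{
	assert(before > 0);
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

Calling `pct_growth(12235, 13073)` covers the text column; the same call on the data or dec columns shows how much of the increase comes from outside the instruction stream.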

This is roughly what you'd expect from a change wrapping all 350+ inlined 
instantiations of rq->lock uses. I.e. it might make sense to uninline it.

In terms of long term maintenance overhead, ignoring the overhead of the 
core-scheduling feature itself, the rq-lock wrappery is the biggest 
ugliness, the rest is mostly isolated.

So if this actually *works*, improves the performance of some real 
VMEXIT-poor SMT workloads, and allows the enabling of HyperThreading with 
untrusted VMs without inviting thousands of guest roots, then I'm 
cautiously in support of it.

> That all assumes that it works at all for the people who are clamoring 
> for this feature, but I guess they can run some loads on it eventually. 
> It's a holiday in the US right now ("Presidents' Day"), but maybe we 
> can get some numbers this week?

Such numbers would be *very* helpful indeed.

Thanks,

	Ingo
