From: Matt Fleming <matt@codeblueprint.co.uk>
To: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Morten.Rasmussen@arm.com, dietmar.eggemann@arm.com,
kernellwp@gmail.com, yuyang.du@intel.com,
umgwanakikbuti@gmail.com,
Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
Date: Sat, 3 Dec 2016 21:47:07 +0000 [thread overview]
Message-ID: <20161203214707.GI20785@codeblueprint.co.uk> (raw)
In-Reply-To: <CAE40pde4VRH8LfRWMX3Vfq5pRoysUB6UuHTzN+VrcKMdSNW0uA@mail.gmail.com>
On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
>
> For background, is this from the "A decade of wasted cores" paper's
> patches?
No, this patch fixes an issue I originally reported here,
https://lkml.kernel.org/r/20160923115808.2330-1-matt@codeblueprint.co.uk
Essentially, if you have an idle or partially-idle system and a
workload that consists of fork()'ing a bunch of tasks, where each of
those tasks immediately sleeps waiting for some wakeup, then those
tasks aren't spread across all idle CPUs very well.
We saw this issue when running hackbench with a small loop count, such
that the actual benchmark setup (fork()'ing) is where the majority of
the runtime is spent.
In that scenario there's a large potential/blocked load but
essentially no runnable load, and without Vincent's patch applied the
balance-on-fork scheduler code only considers runnable load.
The closest thing I can find in the "A decade of wasted cores" paper
is "The Overload-on-Wakeup bug", but I don't think that's the issue
here since:
a) We're balancing on fork, not wakeup
b) The balance-on-fork code balances across nodes OK
> What's the expected typical gain? Thanks,
The results are still coming back from the SUSE performance test grid
but they do show that this patch is mainly a win for multi-socket
machines with more than 8 cores or thereabouts.
[ Vincent, I'll follow up to your PATCH 1/2 with the results that are
specifically for that patch ]
Assuming a fork-intensive or fork-dominated workload, and a
multi-socket machine, such as this 2-socket NUMA box with 12 cores
per socket and HT enabled (48 CPUs), we saw a very clear win of
between +10% and +15% for processes communicating via pipes,
(1) tip-sched = tip/sched/core branch
(2) fix-fig-for-fork = (1) + PATCH 1/2
(3) fix-sig = (2) + PATCH 2/2
hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched fix-fig-for-fork fix-sig
Amean 1 0.0717 ( 0.00%) 0.0696 ( 2.99%) 0.0730 ( -1.79%)
Amean 4 0.1244 ( 0.00%) 0.1200 ( 3.56%) 0.1190 ( 4.36%)
Amean 7 0.1891 ( 0.00%) 0.1937 ( -2.42%) 0.1831 ( 3.17%)
Amean 12 0.2964 ( 0.00%) 0.3116 ( -5.11%) 0.2784 ( 6.07%)
Amean 21 0.4011 ( 0.00%) 0.4090 ( -1.96%) 0.3574 ( 10.90%)
Amean 30 0.4944 ( 0.00%) 0.4654 ( 5.87%) 0.4171 ( 15.63%)
Amean 48 0.6113 ( 0.00%) 0.6309 ( -3.20%) 0.5331 ( 12.78%)
Amean 79 0.8616 ( 0.00%) 0.8706 ( -1.04%) 0.7710 ( 10.51%)
Amean 110 1.1304 ( 0.00%) 1.2211 ( -8.02%) 1.0163 ( 10.10%)
Amean 141 1.3754 ( 0.00%) 1.4279 ( -3.81%) 1.2803 ( 6.92%)
Amean 172 1.6217 ( 0.00%) 1.7367 ( -7.09%) 1.5363 ( 5.27%)
Amean 192 1.7809 ( 0.00%) 2.0199 (-13.42%) 1.7129 ( 3.82%)
Things look even better when using threads and pipes, with wins
between 11% and 29% when looking at results outside of the noise,
hackbench-thread-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched fix-fig-for-fork fix-sig
Amean 1 0.0736 ( 0.00%) 0.0794 ( -7.96%) 0.0779 ( -5.83%)
Amean 4 0.1709 ( 0.00%) 0.1690 ( 1.09%) 0.1663 ( 2.68%)
Amean 7 0.2836 ( 0.00%) 0.3080 ( -8.61%) 0.2640 ( 6.90%)
Amean 12 0.4393 ( 0.00%) 0.4843 (-10.24%) 0.4090 ( 6.89%)
Amean 21 0.5821 ( 0.00%) 0.6369 ( -9.40%) 0.5126 ( 11.95%)
Amean 30 0.6557 ( 0.00%) 0.6459 ( 1.50%) 0.5711 ( 12.90%)
Amean 48 0.7924 ( 0.00%) 0.7760 ( 2.07%) 0.6286 ( 20.68%)
Amean 79 1.0534 ( 0.00%) 1.0551 ( -0.16%) 0.8481 ( 19.49%)
Amean 110 1.5286 ( 0.00%) 1.4504 ( 5.11%) 1.1121 ( 27.24%)
Amean 141 1.9507 ( 0.00%) 1.7790 ( 8.80%) 1.3804 ( 29.23%)
Amean 172 2.2261 ( 0.00%) 2.3330 ( -4.80%) 1.6336 ( 26.62%)
Amean 192 2.3753 ( 0.00%) 2.3307 ( 1.88%) 1.8246 ( 23.19%)
Somewhat surprisingly, I can see improvements for UMA machines with
fewer cores when the workload heavily saturates the machine and the
workload isn't dominated by fork. Such heavy saturation isn't super
realistic, but still interesting. I haven't dug into why these results
occurred, but I am happy things didn't instead fall off a cliff.
Here's a 4-cpu UMA box showing some improvement at the higher end,
hackbench-process-pipes
4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
tip-sched fix-fig-for-fork fix-sig
Amean 1 3.5060 ( 0.00%) 3.5747 ( -1.96%) 3.5117 ( -0.16%)
Amean 3 7.7113 ( 0.00%) 7.8160 ( -1.36%) 7.7747 ( -0.82%)
Amean 5 11.4453 ( 0.00%) 11.5710 ( -1.10%) 11.3870 ( 0.51%)
Amean 7 15.3147 ( 0.00%) 15.9420 ( -4.10%) 15.8450 ( -3.46%)
Amean 12 25.5110 ( 0.00%) 24.3410 ( 4.59%) 22.6717 ( 11.13%)
Amean 16 32.3010 ( 0.00%) 28.5897 ( 11.49%) 25.7473 ( 20.29%)