From: Vincent Guittot <vincent.guittot@linaro.org>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Morten.Rasmussen@arm.com, dietmar.eggemann@arm.com,
kernellwp@gmail.com, yuyang.du@intel.com,
umgwanakikbuti@gmail.com,
Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
Date: Mon, 5 Dec 2016 10:27:36 +0100 [thread overview]
Message-ID: <20161205092735.GA9161@linaro.org> (raw)
In-Reply-To: <20161203214707.GI20785@codeblueprint.co.uk>
On Saturday 03 Dec 2016 at 21:47:07 (+0000), Matt Fleming wrote:
> On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
> >
> > For background, is this from the "A decade of wasted cores" paper's
> > patches?
>
> No, this patch fixes an issue I originally reported here,
>
> https://lkml.kernel.org/r/20160923115808.2330-1-matt@codeblueprint.co.uk
>
> Essentially, if you have an idle or partially-idle system and a
> workload that consists of fork()'ing a bunch of tasks, where each of
> those tasks immediately sleeps waiting for some wakeup, then those
> tasks aren't spread across all idle CPUs very well.
>
> We saw this issue when running hackbench with a small loop count, such
> that the actual benchmark setup (fork()'ing) is where the majority of
> the runtime is spent.
>
> In that scenario, there's a large potential/blocked load, but
> essentially no runnable load, and the balance on fork scheduler code
> only cares about runnable load without Vincent's patch applied.
>
> The closest thing I can find in the "A decade of wasted cores" paper
> is "The Overload-on-Wakeup bug", but I don't think that's the issue
> here since,
>
> a) We're balancing on fork, not wakeup
> b) The balance-on-fork code balances across nodes OK
>
> > What's the expected typical gain? Thanks,
>
> The results are still coming back from the SUSE performance test grid
> but they do show that this patch is mainly a win for multi-socket
> machines with more than 8 cores or thereabouts.
>
> [ Vincent, I'll follow up to your PATCH 1/2 with the results that are
> specifically for that patch ]
>
> Assuming a fork-intensive or fork-dominated workload, and a
> multi-socket machine, such as this 2 socket, NUMA, with 12 cores and
> HT enabled (48 cpus), we saw a very clear win between +10% and +15%
> for processes communicating via pipes,
>
> (1) tip-sched = tip/sched/core branch
> (2) fix-fig-for-fork = (1) + PATCH 1/2
> (3) fix-sig = (1) + (2) + PATCH 2/2
>
> hackbench-process-pipes
> 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
> tip-sched fix-fig-for-fork fix-sig
> Amean 1 0.0717 ( 0.00%) 0.0696 ( 2.99%) 0.0730 ( -1.79%)
> Amean 4 0.1244 ( 0.00%) 0.1200 ( 3.56%) 0.1190 ( 4.36%)
> Amean 7 0.1891 ( 0.00%) 0.1937 ( -2.42%) 0.1831 ( 3.17%)
> Amean 12 0.2964 ( 0.00%) 0.3116 ( -5.11%) 0.2784 ( 6.07%)
> Amean 21 0.4011 ( 0.00%) 0.4090 ( -1.96%) 0.3574 ( 10.90%)
> Amean 30 0.4944 ( 0.00%) 0.4654 ( 5.87%) 0.4171 ( 15.63%)
> Amean 48 0.6113 ( 0.00%) 0.6309 ( -3.20%) 0.5331 ( 12.78%)
> Amean 79 0.8616 ( 0.00%) 0.8706 ( -1.04%) 0.7710 ( 10.51%)
> Amean 110 1.1304 ( 0.00%) 1.2211 ( -8.02%) 1.0163 ( 10.10%)
> Amean 141 1.3754 ( 0.00%) 1.4279 ( -3.81%) 1.2803 ( 6.92%)
> Amean 172 1.6217 ( 0.00%) 1.7367 ( -7.09%) 1.5363 ( 5.27%)
> Amean 192 1.7809 ( 0.00%) 2.0199 (-13.42%) 1.7129 ( 3.82%)
>
> Things look even better when using threads and pipes, with wins
> between 11% and 29% when looking at results outside of the noise,
>
> hackbench-thread-pipes
> 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
> tip-sched fix-fig-for-fork fix-sig
> Amean 1 0.0736 ( 0.00%) 0.0794 ( -7.96%) 0.0779 ( -5.83%)
> Amean 4 0.1709 ( 0.00%) 0.1690 ( 1.09%) 0.1663 ( 2.68%)
> Amean 7 0.2836 ( 0.00%) 0.3080 ( -8.61%) 0.2640 ( 6.90%)
> Amean 12 0.4393 ( 0.00%) 0.4843 (-10.24%) 0.4090 ( 6.89%)
> Amean 21 0.5821 ( 0.00%) 0.6369 ( -9.40%) 0.5126 ( 11.95%)
> Amean 30 0.6557 ( 0.00%) 0.6459 ( 1.50%) 0.5711 ( 12.90%)
> Amean 48 0.7924 ( 0.00%) 0.7760 ( 2.07%) 0.6286 ( 20.68%)
> Amean 79 1.0534 ( 0.00%) 1.0551 ( -0.16%) 0.8481 ( 19.49%)
> Amean 110 1.5286 ( 0.00%) 1.4504 ( 5.11%) 1.1121 ( 27.24%)
> Amean 141 1.9507 ( 0.00%) 1.7790 ( 8.80%) 1.3804 ( 29.23%)
> Amean 172 2.2261 ( 0.00%) 2.3330 ( -4.80%) 1.6336 ( 26.62%)
> Amean 192 2.3753 ( 0.00%) 2.3307 ( 1.88%) 1.8246 ( 23.19%)
>
> Somewhat surprisingly, I can see improvements for UMA machines with
> fewer cores when the workload heavily saturates the machine and the
> workload isn't dominated by fork. Such heavy saturation isn't super
> realistic, but still interesting. I haven't dug into why these results
> occurred, but I am happy things didn't instead fall off a cliff.
>
> Here's a 4-cpu UMA box showing some improvement at the higher end,
>
> hackbench-process-pipes
> 4.9.0-rc6 4.9.0-rc6 4.9.0-rc6
> tip-sched fix-fig-for-fork fix-sig
> Amean 1 3.5060 ( 0.00%) 3.5747 ( -1.96%) 3.5117 ( -0.16%)
> Amean 3 7.7113 ( 0.00%) 7.8160 ( -1.36%) 7.7747 ( -0.82%)
> Amean 5 11.4453 ( 0.00%) 11.5710 ( -1.10%) 11.3870 ( 0.51%)
> Amean 7 15.3147 ( 0.00%) 15.9420 ( -4.10%) 15.8450 ( -3.46%)
> Amean 12 25.5110 ( 0.00%) 24.3410 ( 4.59%) 22.6717 ( 11.13%)
> Amean 16 32.3010 ( 0.00%) 28.5897 ( 11.49%) 25.7473 ( 20.29%)
Hi Matt,
Thanks for the results.
During review, Morten pointed out that the test condition
(100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
(100*min_avg_load > imbalance_scale*this_avg_load). But I see lower
performance with this change. Could you run tests with the change below on
top of the patchset?
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8d1ae7..0129fbb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
if (!idlest ||
(min_runnable_load > (this_runnable_load + imbalance)) ||
((this_runnable_load < (min_runnable_load + imbalance)) &&
- (100*min_avg_load > imbalance_scale*this_avg_load)))
+ (100*this_avg_load < imbalance_scale*min_avg_load)))
return NULL;
return idlest;
}
--
2.7.4
Thread overview: 31+ messages
2016-11-25 15:34 [PATCH 0/2 v2] sched: improve spread of tasks during fork Vincent Guittot
2016-11-25 15:34 ` [PATCH 1/2 v2] sched: fix find_idlest_group for fork Vincent Guittot
2016-11-28 17:01 ` Matt Fleming
2016-11-28 17:20 ` Vincent Guittot
2016-11-29 10:57 ` Morten Rasmussen
2016-11-29 11:42 ` Peter Zijlstra
2016-11-29 11:44 ` Matt Fleming
2016-11-29 12:30 ` Peter Zijlstra
2016-11-29 14:46 ` Morten Rasmussen
2016-12-05 8:48 ` Peter Zijlstra
2016-11-29 13:04 ` Vincent Guittot
2016-11-29 14:50 ` Morten Rasmussen
2016-11-29 14:57 ` Vincent Guittot
2016-12-03 23:25 ` Matt Fleming
2016-12-05 9:17 ` Vincent Guittot
2016-11-25 15:34 ` [PATCH 2/2 v2] sched: use load_avg for selecting idlest group Vincent Guittot
2016-11-30 12:49 ` Morten Rasmussen
2016-11-30 13:49 ` Vincent Guittot
2016-11-30 13:54 ` Vincent Guittot
2016-11-30 14:24 ` Morten Rasmussen
2016-12-02 15:20 ` Vincent Guittot
2016-12-02 22:24 ` Matt Fleming
2016-11-30 14:23 ` Morten Rasmussen
2016-12-03 3:31 ` Brendan Gregg
2016-12-03 21:47 ` Matt Fleming
2016-12-05 9:27 ` Vincent Guittot [this message]
2016-12-05 13:35 ` Matt Fleming
2016-12-08 14:09 ` Matt Fleming
2016-12-08 14:33 ` Vincent Guittot
2016-11-28 17:02 ` [PATCH 0/2 v2] sched: improve spread of tasks during fork Matt Fleming
2016-11-28 17:20 ` Vincent Guittot