Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group

From: Vincent Guittot <vincent.guittot@linaro.org>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Morten.Rasmussen@arm.com, dietmar.eggemann@arm.com,
	kernellwp@gmail.com, yuyang.du@intel.com,
	umgwanakikbuti@gmail.com,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
Date: Mon, 5 Dec 2016 10:27:36 +0100	[thread overview]
Message-ID: <20161205092735.GA9161@linaro.org> (raw)
In-Reply-To: <20161203214707.GI20785@codeblueprint.co.uk>

On Saturday 03 Dec 2016 at 21:47:07 (+0000), Matt Fleming wrote:
> On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
> > 
> > For background, is this from the "A decade of wasted cores" paper's
> > patches?
> 
> No, this patch fixes an issue I originally reported here,
> 
>   https://lkml.kernel.org/r/20160923115808.2330-1-matt@codeblueprint.co.uk
> 
> Essentially, if you have an idle or partially-idle system and a
> workload that consists of fork()'ing a bunch of tasks, where each of
> those tasks immediately sleeps waiting for some wakeup, then those
> tasks aren't spread across all idle CPUs very well.
> 
> We saw this issue when running hackbench with a small loop count, such
> that the actual benchmark setup (fork()'ing) is where the majority of
> the runtime is spent.
> 
> In that scenario, there's a large potential/blocked load, but
> essentially no runnable load, and the balance on fork scheduler code
> only cares about runnable load without Vincent's patch applied.
> 
> The closest thing I can find in the "A decade of wasted cores" paper
> is "The Overload-on-Wakeup bug", but I don't think that's the issue
> here since,
> 
>   a) We're balancing on fork, not wakeup
>   b) The fork on balance code balances across nodes OK
> 
> > What's the expected typical gain? Thanks,
> 
> The results are still coming back from the SUSE performance test grid
> but they do show that this patch is mainly a win for multi-socket
> machines with more than 8 cores or thereabouts.
> 
>  [ Vincent, I'll follow up to your PATCH 1/2 with the results that are
>    specifically for that patch ]
> 
> Assuming a fork-intensive or fork-dominated workload, and a
> multi-socket machine, such as this 2 socket, NUMA, with 12 cores and
> HT enabled (48 cpus), we saw a very clear win between +10% and +15%
> for processes communicating via pipes,
> 
>   (1) tip-sched = tip/sched/core branch
>   (2) fix-fig-for-fork = (1) + PATCH 1/2
>   (3) fix-sig = (1) + (2) + PATCH 2/2
> 
> hackbench-process-pipes
>                          4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                          tip-sched      fix-fig-for-fork               fix-sig
> Amean    1        0.0717 (  0.00%)      0.0696 (  2.99%)      0.0730 ( -1.79%)
> Amean    4        0.1244 (  0.00%)      0.1200 (  3.56%)      0.1190 (  4.36%)
> Amean    7        0.1891 (  0.00%)      0.1937 ( -2.42%)      0.1831 (  3.17%)
> Amean    12       0.2964 (  0.00%)      0.3116 ( -5.11%)      0.2784 (  6.07%)
> Amean    21       0.4011 (  0.00%)      0.4090 ( -1.96%)      0.3574 ( 10.90%)
> Amean    30       0.4944 (  0.00%)      0.4654 (  5.87%)      0.4171 ( 15.63%)
> Amean    48       0.6113 (  0.00%)      0.6309 ( -3.20%)      0.5331 ( 12.78%)
> Amean    79       0.8616 (  0.00%)      0.8706 ( -1.04%)      0.7710 ( 10.51%)
> Amean    110      1.1304 (  0.00%)      1.2211 ( -8.02%)      1.0163 ( 10.10%)
> Amean    141      1.3754 (  0.00%)      1.4279 ( -3.81%)      1.2803 (  6.92%)
> Amean    172      1.6217 (  0.00%)      1.7367 ( -7.09%)      1.5363 (  5.27%)
> Amean    192      1.7809 (  0.00%)      2.0199 (-13.42%)      1.7129 (  3.82%)
> 
> Things look even better when using threads and pipes, with wins
> between 11% and 29% when looking at results outside of the noise,
> 
> hackbench-thread-pipes
>                          4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                          tip-sched      fix-fig-for-fork               fix-sig
> Amean    1        0.0736 (  0.00%)      0.0794 ( -7.96%)      0.0779 ( -5.83%)
> Amean    4        0.1709 (  0.00%)      0.1690 (  1.09%)      0.1663 (  2.68%)
> Amean    7        0.2836 (  0.00%)      0.3080 ( -8.61%)      0.2640 (  6.90%)
> Amean    12       0.4393 (  0.00%)      0.4843 (-10.24%)      0.4090 (  6.89%)
> Amean    21       0.5821 (  0.00%)      0.6369 ( -9.40%)      0.5126 ( 11.95%)
> Amean    30       0.6557 (  0.00%)      0.6459 (  1.50%)      0.5711 ( 12.90%)
> Amean    48       0.7924 (  0.00%)      0.7760 (  2.07%)      0.6286 ( 20.68%)
> Amean    79       1.0534 (  0.00%)      1.0551 ( -0.16%)      0.8481 ( 19.49%)
> Amean    110      1.5286 (  0.00%)      1.4504 (  5.11%)      1.1121 ( 27.24%)
> Amean    141      1.9507 (  0.00%)      1.7790 (  8.80%)      1.3804 ( 29.23%)
> Amean    172      2.2261 (  0.00%)      2.3330 ( -4.80%)      1.6336 ( 26.62%)
> Amean    192      2.3753 (  0.00%)      2.3307 (  1.88%)      1.8246 ( 23.19%)
> 
> Somewhat surprisingly, I can see improvements for UMA machines with
> fewer cores when the workload heavily saturates the machine and the
> workload isn't dominated by fork. Such heavy saturation isn't super
> realistic, but still interesting. I haven't dug into why these results
> occurred, but I am happy things didn't instead fall off a cliff.
> 
> Here's a 4-cpu UMA box showing some improvement at the higher end,
> 
> hackbench-process-pipes
>                         4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                         tip-sched      fix-fig-for-fork               fix-sig
> Amean    1       3.5060 (  0.00%)      3.5747 ( -1.96%)      3.5117 ( -0.16%)
> Amean    3       7.7113 (  0.00%)      7.8160 ( -1.36%)      7.7747 ( -0.82%)
> Amean    5      11.4453 (  0.00%)     11.5710 ( -1.10%)     11.3870 (  0.51%)
> Amean    7      15.3147 (  0.00%)     15.9420 ( -4.10%)     15.8450 ( -3.46%)
> Amean    12     25.5110 (  0.00%)     24.3410 (  4.59%)     22.6717 ( 11.13%)
> Amean    16     32.3010 (  0.00%)     28.5897 ( 11.49%)     25.7473 ( 20.29%)

Hi Matt,

Thanks for the results.

During the review, Morten pointed out that the test condition
(100*this_avg_load < imbalance_scale*min_avg_load) makes more sense than
(100*min_avg_load > imbalance_scale*this_avg_load). But I see lower
performance with this change. Could you run tests with the change below on
top of the patchset?

---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8d1ae7..0129fbb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	if (!idlest ||
 	    (min_runnable_load > (this_runnable_load + imbalance)) ||
 	    ((this_runnable_load < (min_runnable_load + imbalance)) &&
-			(100*min_avg_load > imbalance_scale*this_avg_load)))
+			(100*this_avg_load < imbalance_scale*min_avg_load)))
 		return NULL;
 	return idlest;
 }
-- 
2.7.4
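
To make the difference between the two conditions concrete, here is a
minimal standalone sketch in plain userspace C, not the kernel code itself.
The helper names stay_local_old() and stay_local_new() are hypothetical,
and the imbalance_scale value of 112 used in the note below is only an
illustrative assumption. Each helper evaluates just the avg_load clause
from the hunk above; when that clause is true (and the runnable_load clause
next to it also holds), find_idlest_group() returns NULL and the task stays
in the local group.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the clause removed by the patch:
 * true only when the idlest remote group's avg_load exceeds the
 * local group's avg_load by more than the imbalance_scale margin.
 */
static inline bool stay_local_old(unsigned long this_avg_load,
				  unsigned long min_avg_load,
				  unsigned long imbalance_scale)
{
	return 100 * min_avg_load > imbalance_scale * this_avg_load;
}

/*
 * Hypothetical helper mirroring the clause added by the patch:
 * true as long as the local group's avg_load is within the
 * imbalance_scale margin of the idlest remote group's avg_load.
 */
static inline bool stay_local_new(unsigned long this_avg_load,
				  unsigned long min_avg_load,
				  unsigned long imbalance_scale)
{
	return 100 * this_avg_load < imbalance_scale * min_avg_load;
}
```

With an assumed imbalance_scale of 112, the two clauses disagree whenever
the loads are close: for this_avg_load = 110 and min_avg_load = 100 the old
clause is false while the new clause is true, so the new form keeps the
task local in cases where the old form did not veto the remote group. Once
the gap is large (this_avg_load = 120, min_avg_load = 100) the new clause
is false as well. That bias toward the local group might be related to the
lower performance I see, but this is only a guess.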
