Re: [PATCH] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Valentin Schneider <valentin.schneider@arm.com>
To: peter.puhov@linaro.org
Cc: linux-kernel@vger.kernel.org, robert.foley@linaro.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
Date: Wed, 17 Jun 2020 11:50:22 +0100	[thread overview]
Message-ID: <jhjo8pidl01.mognet@arm.com> (raw)
In-Reply-To: <20200616164801.18644-1-peter.puhov@linaro.org>


On 16/06/20 17:48, peter.puhov@linaro.org wrote:
> From: Peter Puhov <peter.puhov@linaro.org>
> We tested this patch with following benchmarks:
>   perf bench -f simple sched pipe -l 4000000
>   perf bench -f simple sched messaging -l 30000
>   perf bench -f simple  mem memset -s 3GB -l 15 -f default
>   perf bench -f simple futex wake -s -t 640 -w 1
>   sysbench cpu --threads=8 --cpu-max-prime=10000 run
>   sysbench memory --memory-access-mode=rnd --threads=8 run
>   sysbench threads --threads=8 run
>   sysbench mutex --mutex-num=1 --threads=8 run
>   hackbench --loops 20000
>   hackbench --pipe --threads --loops 20000
>   hackbench --pipe --threads --loops 20000 --datasize 4096
>
> and found some performance improvements in:
>   sysbench threads
>   sysbench mutex
>   perf bench futex wake
> and no regressions in others.
>

One nitpick for the results of those: condensing them in a table form would
make them more reader-friendly. Perhaps something like:

| Benchmark        | Metric   | Lower is better? | BASELINE | SERIES | DELTA |
|------------------+----------+------------------+----------+--------+-------|
| Sysbench threads | # events | No               |    45526 |  56567 |  +24% |
| Sysbench mutex   | ...      |                  |          |        |       |

If you want to include more stats for each benchmark, you could have one table
per (e.g. see [1]) - it'd still be a more readable form (or so I believe).

[1]: https://lore.kernel.org/lkml/20200206191957.12325-1-valentin.schneider@arm.com/

> ---
>  kernel/sched/fair.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..abcbdf80ee75 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8662,8 +8662,14 @@ static bool update_pick_idlest(struct sched_group *idlest,
>
>       case group_has_spare:
>               /* Select group with most idle CPUs */
> -		if (idlest_sgs->idle_cpus >= sgs->idle_cpus)
> +		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
>                       return false;
> +
> +		/* Select group with lowest group_util */
> +		if (idlest_sgs->idle_cpus == sgs->idle_cpus &&
> +			idlest_sgs->group_util <= sgs->group_util)
> +			return false;
> +
>               break;
>       }

update_sd_pick_busiest() uses the group's nr_running instead. You mention
in the changelog that using nr_running is a possible alternative, did you
try benchmarking that and seeing how it compares to using group_util?

I think it would be nice to keep pick_busiest() and pick_idlest() aligned
wherever possible/sensible.

Also, there can be cases where one group has a few "big" tasks and another
has a handful more "small" tasks. Say something like

  sgs_a->group_util = U
  sgs_a->sum_nr_running = N

  sgs_b->group_util = U*4/3
  sgs_b->sum_nr_running = N*2/3

  (sgs_b has more util per task, i.e. bigger tasks on average)

Given that we're in the 'group_has_spare' case, I would think picking the
group with the lesser amount of running tasks would make sense. Though I
guess you can find pathological cases where the util per task difference is
huge and we should look at util first...

next prev parent reply	other threads:[~2020-06-17 10:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16 16:48 [PATCH] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal peter.puhov
2020-06-17 10:50 ` Valentin Schneider [this message]
2020-06-17 14:52   ` Peter Puhov
2020-07-02  9:27     ` Dietmar Eggemann
2020-07-02 13:20       ` Mel Gorman
2020-07-02 13:45         ` Vincent Guittot
2020-07-01  9:19 ` [sched/fair] 0b9730e694: vm-scalability.throughput 7.7% improvement kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jhjo8pidl01.mognet@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peter.puhov@linaro.org \
    --cc=peterz@infradead.org \
    --cc=robert.foley@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.