public inbox for linux-kernel@vger.kernel.org
From: Valentin Schneider <valentin.schneider@arm.com>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org,
	Aubrey Li <aubrey.li@intel.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Jiang Biao <benbjiang@gmail.com>
Subject: Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Mon, 09 Nov 2020 15:54:36 +0000
Message-ID: <jhjsg9iy18j.mognet@arm.com>
In-Reply-To: <ac73a9e2-8cc0-b1fe-fc2b-14b9cb21c8bf@linux.intel.com>


On 09/11/20 13:40, Li, Aubrey wrote:
> On 2020/11/7 5:20, Valentin Schneider wrote:
>>
>> On 21/10/20 16:03, Aubrey Li wrote:
>>> From: Aubrey Li <aubrey.li@intel.com>
>>>
>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>> and when the CPU exits idle, its bit will be cleared.
>>>
>>> When a task wakes up and selects an idle cpu, scanning the idle cpumask
>>> has a lower cost than scanning all the cpus in the last level cache
>>> domain, especially when the system is heavily loaded.
>>>
>>
>> FWIW I gave this a spin on my arm64 desktop (Ampere eMAG, 32 core). I get
>> some barely noticeable (AIUI not statistically significant for bench sched)
>> changes for 100 iterations of:
>>
>> | bench                              | metric   |   mean |     std |    q90 |    q99 |
>> |------------------------------------+----------+--------+---------+--------+--------|
>> | hackbench --loops 5000 --groups 1  | duration | -1.07% |  -2.23% | -0.88% | -0.25% |
>> | hackbench --loops 5000 --groups 2  | duration | -0.79% | +30.60% | -0.49% | -0.74% |
>> | hackbench --loops 5000 --groups 4  | duration | -0.54% |  +6.99% | -0.21% | -0.12% |
>> | perf bench sched pipe -T -l 100000 | ops/sec  | +1.05% |  -2.80% | -0.17% | +0.39% |
>>
>> q90 & q99 being the 90th and 99th percentile.
>>
>> Base was tip/sched/core at:
>> d8fcb81f1acf ("sched/fair: Check for idle core in wake_affine")
>
> Thanks for the data, Valentin! So does the negative value mean improvement?
>

For hackbench yes (shorter is better); for perf bench sched no, since the
metric here is ops/sec so higher is better.

That said, I ran (via a tool) a 2-sample Kolmogorov–Smirnov test against
the two sample sets (tip/sched/core vs tip/sched/core+patch), and the
p-value for perf bench sched is quite high (~0.9), which means we can't
reject the hypothesis that both sample sets come from the same
distribution. Long story short, we can't say whether the patch had a
noticeable impact on that benchmark.
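
For readers unfamiliar with the test, here is a minimal sketch of a
2-sample Kolmogorov–Smirnov check in plain Python. It is not the tool
used above; the function names are made up for illustration, the p-value
uses the standard asymptotic Kolmogorov series (an approximation that is
only reasonable for moderately large samples), and the sample data is
synthetic.

```python
import math
import random


def ks_2samp_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every sample <= x so ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here and Q is 1 to within ~1e-12.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))


# Synthetic stand-ins for two sets of 100 benchmark durations.
rng = random.Random(0)
base = [rng.gauss(10.0, 0.5) for _ in range(100)]
patched = [rng.gauss(10.0, 0.5) for _ in range(100)]

d = ks_2samp_statistic(base, patched)
p = ks_pvalue_asymptotic(d, len(base), len(patched))
print(f"D={d:.3f} p={p:.3f}")
```

A high p-value means the data gives no grounds to reject "same
distribution"; it does not prove the two distributions are equal.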

> If so, the data looks as expected to me. We set the idle cpumask bit every
> time we enter idle, but only clear it at tick frequency, so if the workload
> is not heavy enough, a CPU can go idle many times between two ticks and the
> idle cpumask ends up almost equal to sched_domain_span(sd), which makes no
> difference.
>
> But if the system load is heavy enough, CPUs have little or no chance to
> enter idle, so their bits get cleared at the tick, and the number of set
> bits in sds_idle_cpus(sd->shared) becomes far smaller than the number in
> sched_domain_span(sd) when the LLC domain has a large number of CPUs.
>
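
The set-on-idle / clear-on-tick behaviour described in the quote can be
pictured with a toy model. This is only a sketch of the idea in Python,
not the patch's code: the class and method names are hypothetical, and
the mask is a plain integer rather than a kernel cpumask.

```python
class IdleMaskModel:
    """Toy model: bit set on idle entry, cleared only at a tick on a busy CPU."""

    def __init__(self, nr_cpus):
        self.nr_cpus = nr_cpus
        self.idle_mask = 0              # bit N set => CPU N is a wakeup candidate
        self.busy = [False] * nr_cpus   # what each CPU is actually doing

    def enter_idle(self, cpu):
        self.busy[cpu] = False
        self.idle_mask |= 1 << cpu

    def exit_idle(self, cpu):
        # The bit is deliberately left set; it goes stale until the next tick.
        self.busy[cpu] = True

    def tick(self, cpu):
        if self.busy[cpu]:
            self.idle_mask &= ~(1 << cpu)

    def candidates(self):
        return [c for c in range(self.nr_cpus) if (self.idle_mask >> c) & 1]


model = IdleMaskModel(4)
for cpu in range(4):
    model.enter_idle(cpu)

model.exit_idle(1)           # CPU 1 gets work between two ticks
stale = model.candidates()   # CPU 1 is still listed
model.tick(1)                # the tick finds CPU 1 busy and clears its bit
fresh = model.candidates()
print(stale, fresh)
```

Under light load most CPUs re-enter idle before a tick finds them busy,
so the mask stays close to the full span; under heavy load the ticks
keep clearing bits and the scan has fewer candidates to visit.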

With hackbench -g 4 that's 160 tasks (against 32 CPUs, all under the same
LLC), although the work done by each task isn't much. I'll try bumping that
a notch, or increasing the size of the messages.
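
The 160 figure follows from hackbench's per-group task count. A quick
sketch, assuming the default of 20 senders and 20 receivers per group
(the default is an assumption here; the constants and helper name are
illustrative):

```python
# Assumed hackbench defaults: 20 senders + 20 receivers per group.
SENDERS_PER_GROUP = 20
RECEIVERS_PER_GROUP = 20


def hackbench_tasks(groups):
    """Total tasks spawned for a given --groups value."""
    return groups * (SENDERS_PER_GROUP + RECEIVERS_PER_GROUP)


nr_cpus = 32  # CPU count of the test machine mentioned above
for groups in (1, 2, 4):
    tasks = hackbench_tasks(groups)
    print(f"--groups {groups}: {tasks} tasks, {tasks / nr_cpus:.2f} per CPU")
```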

> For example, if I run 4 x overcommit uperf on a system with 192 CPUs,
> I observed:
> - default, the average of this_sd->avg_scan_cost is 223.12ns
> - patch, the average of this_sd->avg_scan_cost is 63.4ns
>
> And select_idle_cpu is called 7670253 times per second, so the scan cost
> saved per CPU is (223.12 - 63.4)ns * 7670253 / 192 = 6.4ms per second. As
> a result, I saw uperf throughput improve by 60+%.
>

That's ~1.2s of "extra" CPU time per second across the whole system, which
sounds pretty cool.
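
The arithmetic behind both figures, as a short sketch. It uses only the
numbers quoted above and assumes the call rate is system-wide, which is
what the division by 192 implies; the variable names are illustrative.

```python
# Figures quoted from the uperf run above.
avg_scan_cost_default_ns = 223.12
avg_scan_cost_patch_ns = 63.4
calls_per_sec = 7670253   # select_idle_cpu calls per second (assumed system-wide)
nr_cpus = 192

saved_ns_per_call = avg_scan_cost_default_ns - avg_scan_cost_patch_ns

# Total scan time saved across the system, per second of wall-clock time.
total_saved_s = saved_ns_per_call * calls_per_sec / 1e9

# The same saving spread evenly over all CPUs, in milliseconds.
per_cpu_saved_ms = total_saved_s / nr_cpus * 1e3

print(f"saved per call:  {saved_ns_per_call:.2f} ns")
print(f"saved in total:  {total_saved_s:.2f} s per second")
print(f"saved per CPU:   {per_cpu_saved_ms:.1f} ms per second")
```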

I don't think I've ever played with uperf. I'll give that a shot someday.

> Thanks,
> -Aubrey
>
>
>
