From: Valentin Schneider <valentin.schneider@arm.com>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org,
Aubrey Li <aubrey.li@intel.com>,
Qais Yousef <qais.yousef@arm.com>,
Jiang Biao <benbjiang@gmail.com>
Subject: Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Mon, 09 Nov 2020 15:54:36 +0000
Message-ID: <jhjsg9iy18j.mognet@arm.com>
In-Reply-To: <ac73a9e2-8cc0-b1fe-fc2b-14b9cb21c8bf@linux.intel.com>

On 09/11/20 13:40, Li, Aubrey wrote:
> On 2020/11/7 5:20, Valentin Schneider wrote:
>>
>> On 21/10/20 16:03, Aubrey Li wrote:
>>> From: Aubrey Li <aubrey.li@intel.com>
>>>
>>> Add an idle cpumask to track idle CPUs in the sched domain. When a CPU
>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>> and when the CPU exits idle, its bit will be cleared.
>>>
>>> When a task wakes up and selects an idle CPU, scanning the idle cpumask
>>> costs less than scanning all the CPUs in the last level cache domain,
>>> especially when the system is heavily loaded.
>>>
>>
>> FWIW I gave this a spin on my arm64 desktop (Ampere eMAG, 32 core). I get
>> some barely noticeable (AIUI not statistically significant for bench sched)
>> changes for 100 iterations of:
>>
>> | bench                              | metric   |   mean |     std |    q90 |    q99 |
>> |------------------------------------+----------+--------+---------+--------+--------|
>> | hackbench --loops 5000 --groups 1  | duration | -1.07% |  -2.23% | -0.88% | -0.25% |
>> | hackbench --loops 5000 --groups 2  | duration | -0.79% | +30.60% | -0.49% | -0.74% |
>> | hackbench --loops 5000 --groups 4  | duration | -0.54% |  +6.99% | -0.21% | -0.12% |
>> | perf bench sched pipe -T -l 100000 | ops/sec  | +1.05% |  -2.80% | -0.17% | +0.39% |
>>
>> q90 & q99 being the 90th and 99th percentile.
>>
>> Base was tip/sched/core at:
>> d8fcb81f1acf ("sched/fair: Check for idle core in wake_affine")
>
> Thanks for the data, Valentin! So does the negative value mean improvement?
>

For hackbench yes (shorter is better); for perf bench sched no, since the
metric there is ops/sec, so higher is better.

That said, the tool I use runs a 2-sample Kolmogorov–Smirnov test
against the two sample sets (tip/sched/core vs tip/sched/core+patch), and
the p-value for perf bench sched is quite high (~0.9), which means we can't
reject the hypothesis that both sample sets come from the same
distribution. Long story short, we can't say whether the patch had a
noticeable impact on that benchmark.

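For anyone who wants to reproduce that kind of check, here is a minimal
pure-Python sketch of a 2-sample Kolmogorov–Smirnov test. It is not the tool
mentioned above; the function name ks_2samp_sketch and the sample data are
made up for illustration, and the p-value uses the asymptotic Kolmogorov
distribution, so it is only approximate for small sample sets.

```python
import math
import random


def ks_2samp_sketch(a, b):
    """Return (D, p) for a two-sample KS test (asymptotic p-value).

    D is the largest vertical gap between the two empirical CDFs.
    Hypothetical helper written for illustration only.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, comparing the ECDFs after each distinct value.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))

    # Asymptotic survival function of the Kolmogorov distribution,
    # with the usual small-sample correction on the effective size.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges poorly here and p is indistinguishable from 1.
        return d, 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up samples standing in for 100 benchmark iterations each.
    base = [rng.gauss(100.0, 5.0) for _ in range(100)]
    patched = [rng.gauss(100.0, 5.0) for _ in range(100)]
    d, p = ks_2samp_sketch(base, patched)
    print(f"D={d:.3f} p={p:.3f}")
```

A high p-value here only means the test found no evidence that the two
sample sets differ; it does not prove that they are the same.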
> If so, the data looks as expected to me. We set the idle cpumask every time
> we enter idle but only clear it at the tick frequency, so if the workload
> is not heavy enough, there can be a lot of idle time between two ticks and
> the idle cpumask is almost equal to sched_domain_span(sd), which makes no
> difference.
>
> But if the system load is heavy enough, CPUs have few or no chances to enter
> idle, so the idle cpumask can be cleared during the tick. That makes the
> number of bits set in sds_idle_cpus(sd->shared) far smaller than in
> sched_domain_span(sd) when the LLC domain has a large number of CPUs.
>
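The behaviour described above can be sketched with a toy Python model. This
is not kernel code: the class and method names are hypothetical, and it
assumes (per the description above) that a CPU's bit is set on every idle
entry and only cleared at the next tick if the CPU is busy by then.

```python
class IdleMaskModel:
    """Toy model of a per-LLC idle cpumask (illustrative only)."""

    def __init__(self, nr_cpus):
        self.span = set(range(nr_cpus))  # stands in for sched_domain_span(sd)
        self.idle_mask = set()           # stands in for sds_idle_cpus(sd->shared)
        self.busy = set(range(nr_cpus))  # CPUs currently running a task

    def enter_idle(self, cpu):
        self.busy.discard(cpu)
        self.idle_mask.add(cpu)          # bit set on every idle entry

    def exit_idle(self, cpu):
        self.busy.add(cpu)               # bit deliberately left set here

    def tick(self, cpu):
        if cpu in self.busy:
            self.idle_mask.discard(cpu)  # stale bit cleared lazily at the tick

    def select_idle_cpu(self):
        """Scan only the idle mask; return (cpu or None, CPUs scanned)."""
        scanned = 0
        for cpu in sorted(self.idle_mask):
            scanned += 1
            if cpu not in self.busy:
                return cpu, scanned
        return None, scanned


if __name__ == "__main__":
    model = IdleMaskModel(32)
    # Light load: every CPU idles between ticks, so the mask equals the span.
    for cpu in range(32):
        model.enter_idle(cpu)
    print(len(model.idle_mask), len(model.span))
    # Heavy load: CPUs stay busy across ticks, so the mask drains.
    for cpu in range(32):
        model.exit_idle(cpu)
        model.tick(cpu)
    print(len(model.idle_mask), len(model.span))
```

In this model the scan length is bounded by the number of bits set in the
idle mask rather than by the size of the span, which is where the saving
under heavy load would come from.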

With hackbench -g 4 that's 160 tasks (against 32 CPUs, all under the same
LLC), although the work done by each task isn't much. I'll try bumping that
a notch, or increasing the size of the messages.

> For example, if I run 4 x overcommit uperf on a system with 192 CPUs,
> I observed:
> - default, the average of this_sd->avg_scan_cost is 223.12ns
> - patch, the average of this_sd->avg_scan_cost is 63.4ns
>
> And select_idle_cpu is called 7670253 times per second, so the scan cost
> saved per CPU is (223.12 - 63.4) * 7670253 / 192 = 6.4ms every second. As a
> result, I saw uperf throughput improve by 60+%.
>

That's ~1.2s of "extra" CPU time per second, which sounds pretty cool.

I don't think I've ever played with uperf. I'll give that a shot someday.

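For reference, the arithmetic behind the 6.4ms and ~1.2s figures can be
checked with a few lines of Python. The numbers are the ones quoted above;
the variable names are made up, and the call rate is assumed to be
system-wide, since the quoted formula divides it by the CPU count.

```python
# Figures quoted in the thread.
base_scan_ns = 223.12        # average avg_scan_cost without the patch
patched_scan_ns = 63.4       # average avg_scan_cost with the patch
calls_per_sec = 7_670_253    # select_idle_cpu calls per second (assumed system-wide)
nr_cpus = 192

# Total scan time saved across the system for each second of wall-clock time.
saved_total_ns = (base_scan_ns - patched_scan_ns) * calls_per_sec
saved_total_s = saved_total_ns / 1e9

# The same saving expressed per CPU, in milliseconds.
saved_per_cpu_ms = saved_total_ns / nr_cpus / 1e6

print(f"total: {saved_total_s:.2f} s per second")
print(f"per CPU: {saved_per_cpu_ms:.2f} ms per second")
```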
> Thanks,
> -Aubrey
>
>
>