From: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Swapnil Sapkal <Swapnil.Sapkal@amd.com>,
Aaron Lu <aaron.lu@intel.com>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
shrikanth hegde <sshegde@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms
Date: Wed, 26 Jul 2023 13:03:54 +0530 [thread overview]
Message-ID: <077e87e2-be7c-c8d5-fa25-fe46e012d3f8@linux.vnet.ibm.com> (raw)
In-Reply-To: <20230725193048.124796-1-mathieu.desnoyers@efficios.com>
On 7/26/23 1:00 AM, Mathieu Desnoyers wrote:
> Allow select_task_rq to consider a cpu as idle for 1ms after that cpu
> has exited the idle loop.
>
> This speeds up the following hackbench workload on a 192 cores AMD EPYC
> 9654 96-Core Processor (over 2 sockets):
>
> hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100
>
> from 49s to 34s. (30% speedup)
>
> My working hypothesis for why this helps is: queuing more than a single
> task on the runqueue of a cpu which just exited idle rather than
> spreading work over other idle cpus helps power efficiency on systems
> with large number of cores.
>
> This was developed as part of the investigation into a weird regression
> reported by AMD where adding a raw spinlock in the scheduler context
> switch accelerated hackbench.
>
> It turned out that changing this raw spinlock for a loop of 10000x
> cpu_relax within do_idle() had similar benefits.
>
> This patch achieve a similar effect without the busy-waiting by
> introducing a runqueue state sampling the sched_clock() when exiting
> idle, which allows select_task_rq to consider "as idle" a cpu which has
> recently exited idle.
>
> This patch should be considered "food for thoughts", and I would be glad
> to hear feedback on whether it causes regressions on _other_ workloads,
> and whether it helps with the hackbench workload on large Intel system
> as well.
>
> Link: https://lore.kernel.org/r/09e0f469-a3f7-62ef-75a1-e64cec2dcfc5@amd.com
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
> Cc: Aaron Lu <aaron.lu@intel.com>
> Cc: x86@kernel.org
> ---
> kernel/sched/core.c | 4 ++++
> kernel/sched/sched.h | 3 +++
> 2 files changed, 7 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..d40e3a0a5ced 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6769,6 +6769,7 @@ void __sched schedule_idle(void)
> * TASK_RUNNING state.
> */
> WARN_ON_ONCE(current->__state);
> + WRITE_ONCE(this_rq()->idle_end_time, sched_clock());
> do {
> __schedule(SM_NONE);
> } while (need_resched());
> @@ -7300,6 +7301,9 @@ int idle_cpu(int cpu)
> {
> struct rq *rq = cpu_rq(cpu);
>
> + if (sched_clock() < READ_ONCE(rq->idle_end_time) + IDLE_CPU_DELAY_NS)
Wouldn't this hurt the latency badly? Specially on a loaded system with
a workload that does a lot of wakeup.
ran schbench on a 50% loaded system with stress-ng. (there could be a better benchmark to measure latency)
I see that latency takes a hit. specially tail latencies.full log below with different schbench groups.
6.5-rc3 6.5-rc3+this patch
Groups: 1
50.0th: 14.0 13.0
75.0th: 16.0 16.0
90.0th: 19.5 20.0
95.0th: 53.0 226.0
99.0th: 1969.0 2165.0
99.5th: 2912.0 2648.0
99.9th: 4680.0 4142.0
Groups: 2
50.0th: 15.5 15.5
75.0th: 18.0 19.5
90.0th: 25.5 497.0
95.0th: 323.0 1384.0
99.0th: 2055.0 3144.0
99.5th: 2972.0 4014.0
99.9th: 6026.0 6560.0
Groups: 4
50.0th: 18.0 18.5
75.0th: 21.5 26.0
90.0th: 56.0 940.5
95.0th: 678.0 1896.0
99.0th: 2484.0 3756.0
99.5th: 3224.0 4616.0
99.9th: 4960.0 6824.0
Groups: 8
50.0th: 23.5 25.5
75.0th: 30.5 421.5
90.0th: 443.5 1722.0
95.0th: 1410.0 2736.0
99.0th: 3942.0 5496.0
99.5th: 5232.0 7016.0
99.9th: 7996.0 8896.0
Groups: 16
50.0th: 33.5 41.5
75.0th: 49.0 752.0
90.0th: 1067.5 2332.0
95.0th: 2093.0 3468.0
99.0th: 5048.0 6728.0
99.5th: 6760.0 7624.0
99.9th: 8592.0 9504.0
Groups: 32
50.0th: 60.0 79.0
75.0th: 456.5 1712.0
90.0th: 2788.0 3996.0
95.0th: 4544.0 5768.0
99.0th: 8444.0 9104.0
99.5th: 9168.0 9808.0
99.9th: 11984.0 12448.0
> + return 1;
> +
> if (rq->curr != rq->idle)
> return 0;
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 81ac605b9cd5..8932e198a33a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -97,6 +97,8 @@
> # define SCHED_WARN_ON(x) ({ (void)(x), 0; })
> #endif
>
> +#define IDLE_CPU_DELAY_NS 1000000 /* 1ms */
> +
> struct rq;
> struct cpuidle_state;
>
> @@ -1010,6 +1012,7 @@ struct rq {
>
> struct task_struct __rcu *curr;
> struct task_struct *idle;
> + u64 idle_end_time;
> struct task_struct *stop;
> unsigned long next_balance;
> struct mm_struct *prev_mm;
next prev parent reply other threads:[~2023-07-26 8:05 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-25 19:30 [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms Mathieu Desnoyers
2023-07-26 7:33 ` Shrikanth Hegde [this message]
2023-07-26 8:04 ` Shrikanth Hegde
2023-07-26 14:07 ` Mathieu Desnoyers
2023-07-26 17:40 ` Shrikanth Hegde
2023-07-26 18:56 ` Mathieu Desnoyers
2023-07-26 19:16 ` Mathieu Desnoyers
2023-08-01 7:24 ` Aaron Lu
2023-08-01 15:03 ` Chen Yu
2023-08-03 20:21 ` Mathieu Desnoyers
2023-08-03 5:53 ` Swapnil Sapkal
2023-08-03 20:12 ` Mathieu Desnoyers
2023-08-05 15:37 ` Shrikanth Hegde
2023-07-27 5:04 ` Chen Yu
2023-08-01 7:42 ` Aaron Lu
2023-08-04 14:04 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=077e87e2-be7c-c8d5-fa25-fe46e012d3f8@linux.vnet.ibm.com \
--to=sshegde@linux.vnet.ibm.com \
--cc=Swapnil.Sapkal@amd.com \
--cc=aaron.lu@intel.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox