From: Benjamin Segall <bsegall@google.com>
To: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Cc: mingo@redhat.com, peterz@infradead.org,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
tglx@linutronix.de, srikar@linux.vnet.ibm.com,
arjan@linux.intel.com, svaidy@linux.ibm.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] Interleave cfs bandwidth timers for improved single thread performance at low utilization
Date: Thu, 23 Feb 2023 15:09:46 -0800
Message-ID: <xm26356w3rnp.fsf@google.com>
In-Reply-To: <20230223185153.1499710-1-sshegde@linux.vnet.ibm.com> (Shrikanth Hegde's message of "Fri, 24 Feb 2023 00:21:53 +0530")
Shrikanth Hegde <sshegde@linux.vnet.ibm.com> writes:
> The CFS bandwidth controller uses an hrtimer for its period timer.
> Currently no initial expiry value is set, so when there are multiple CPU
> cgroups all of their period timers align at expiry.
>
> There is a performance gain that can be achieved here if the timers are
> interleaved when the utilization of each CPU cgroup is low and the total
> utilization of all the CPU cgroups is less than 50%. If the timers are
> interleaved, then the unthrottled cgroup can run freely without many
> context switches and can also benefit from SMT folding. This effect will
> be further amplified in an SPLPAR environment.
>
> This commit adds a random offset after initializing each hrtimer. This
> would result in interleaving the timers at expiry, which helps in achieving
> the said performance gain.
>
> This was tested on a powerpc platform with 8 cores, SMT=8. Socket power
> was measured while the workload was running. stress-ng was benchmarked
> along with the power measurements. Throughput-oriented benchmarks show a
> significant gain of up to 25%, while power consumption increases by up
> to 15%.
>
> Workload: stress-ng --cpu=32 --cpu-ops=50000.
> 1CG - 1 cgroup is running.
> 2CG - 2 cgroups are running together.
> Time taken to complete stress-ng is in seconds and power is in watts.
> Each cgroup is throttled at 25% with a period of 100ms.
>                 6.2-rc6                  |          with patch
> 8 core   1CG    power   2CG    power   |  1CG    power   2CG    power
>          27.5   80.6    40     90      |  27.3   82      32.3   104
>          27.5   81      40.2   91      |  27.5   81      38.7   96
>          27.7   80      40.1   89      |  27.6   80      29.7   106
>          27.7   80.1    40.3   94      |  27.6   80      31.5   105
>
> Latency might be affected by this change. That could happen if the CPU
> was in a deep idle state, which is possible if we interleave the timers.
> schbench was used to measure the latency. Each cgroup is throttled at 25%
> with the period set to 100ms. The numbers are for both cgroups running
> simultaneously. Latency values don't degrade much, and some improvement
> is seen in tail latencies.
>
>               6.2-rc6    with patch
> Groups: 16
> 50.0th:          39.5          42.5
> 75.0th:         924.0         922.0
> 90.0th:         972.0         968.0
> 95.0th:        1005.5         994.0
> 99.0th:        4166.0        2287.0
> 99.5th:        7314.0        7448.0
> 99.9th:       15024.0       13600.0
>
> Groups: 32
> 50.0th:         819.0         463.0
> 75.0th:        1596.0         918.0
> 90.0th:        5992.0        1281.5
> 95.0th:       13184.0        2765.0
> 99.0th:       21792.0       14240.0
> 99.5th:       25696.0       18920.0
> 99.9th:       33280.0       35776.0
>
> Groups: 64
> 50.0th:        4806.0        3440.0
> 75.0th:       31136.0       33664.0
> 90.0th:       54144.0       58752.0
> 95.0th:       66176.0       67200.0
> 99.0th:       84736.0       91520.0
> 99.5th:       97408.0      114048.0
> 99.9th:      136448.0      140032.0
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ben Segall <bsegall@google.com>
>
> Initial RFC PATCH, discussions and details on the problem:
> Link1: https://lore.kernel.org/lkml/5ae3cb09-8c9a-11e8-75a7-cc774d9bc283@linux.vnet.ibm.com/
> Link2: https://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/
>
> ---
> kernel/sched/fair.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff4dbbae3b10..2a4a0969e04f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5923,6 +5923,10 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>  	cfs_b->period_timer.function = sched_cfs_period_timer;
> +
> +	/* Add a random offset so that timers interleave */
> +	hrtimer_set_expires(&cfs_b->period_timer,
> +			    get_random_u32_below(cfs_b->period));
>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>  	cfs_b->slack_started = false;
> --
> 2.31.1
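
For readers following along, here is a rough userspace sketch of why a
per-timer initial offset interleaves the expiries. It is not kernel code.
It assumes the period timer is only ever advanced in whole-period steps, so
its phase within the period is fixed by the initial expiry value. The names
xorshift32(), random_below() and next_expiry() are made up for this
illustration; random_below() merely stands in for get_random_u32_below().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative PRNG state; the kernel uses its own random source. */
static uint32_t prng_state = 2463534242u;

/* Plain xorshift32 generator, used only to keep the sketch self-contained. */
static uint32_t xorshift32(void)
{
	uint32_t x = prng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	prng_state = x;
	return x;
}

/*
 * Hypothetical stand-in for get_random_u32_below(): value in [0, ceil).
 * The modulo is slightly biased, which does not matter for illustration.
 */
static uint32_t random_below(uint32_t ceil)
{
	assert(ceil > 0);
	return xorshift32() % ceil;
}

/*
 * First expiry at or after now_ns for a periodic timer whose initial
 * expiry was offset_ns. Because the timer only moves forward by whole
 * periods, every expiry keeps the phase offset_ns % period_ns.
 */
static uint64_t next_expiry(uint64_t offset_ns, uint64_t period_ns,
			    uint64_t now_ns)
{
	uint64_t periods;

	assert(period_ns > 0);
	if (now_ns <= offset_ns)
		return offset_ns;

	/* Round the elapsed time up to a whole number of periods. */
	periods = (now_ns - offset_ns + period_ns - 1) / period_ns;
	return offset_ns + periods * period_ns;
}
```

With an initial expiry of 0 for every cgroup, next_expiry(0, period, now)
gives the same value for all of them, so the timers fire together. Seeding
each timer with random_below(period) instead gives each cgroup its own
phase, which is the interleaving the patch is after.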
Thread overview: 6+ messages
2023-02-23 18:51 [PATCH V3] Interleave cfs bandwidth timers for improved single thread performance at low utilization Shrikanth Hegde
2023-02-23 23:09 ` Benjamin Segall [this message]
2023-03-09 14:21 ` Shrikanth Hegde
2023-03-14 7:55 ` Vincent Guittot
2023-03-14 9:59 ` Peter Zijlstra
2023-03-22 9:22 ` [tip: sched/core] sched: " tip-bot2 for Shrikanth Hegde