From: Benjamin Segall <bsegall@google.com>
To: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Cc: mingo@redhat.com, peterz@infradead.org,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	tglx@linutronix.de, srikar@linux.vnet.ibm.com,
	arjan@linux.intel.com, svaidy@linux.ibm.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] Interleave cfs bandwidth timers for improved single thread performance at low utilization
Date: Thu, 23 Feb 2023 15:09:46 -0800
Message-ID: <xm26356w3rnp.fsf@google.com>
In-Reply-To: <20230223185153.1499710-1-sshegde@linux.vnet.ibm.com> (Shrikanth Hegde's message of "Fri, 24 Feb 2023 00:21:53 +0530")

Shrikanth Hegde <sshegde@linux.vnet.ibm.com> writes:

> The CPU cfs bandwidth controller uses an hrtimer for its period timer.
> Currently no initial expiry value is set, so all period timers align at
> expiry. This happens when there are multiple CPU cgroups.
>
> There is a performance gain that can be achieved here if the timers are
> interleaved when the utilization of each CPU cgroup is low and the total
> utilization of all the CPU cgroups is less than 50%. If the timers are
> interleaved, then the unthrottled cgroup can run freely without many
> context switches and can also benefit from SMT folding. This effect will
> be further amplified in an SPLPAR environment.
>
> This commit adds a random offset after initializing each hrtimer. This
> would result in interleaving the timers at expiry, which helps in achieving
> the said performance gain.
>
> This was tested on a powerpc platform with 8 cores and SMT=8. Socket power
> was measured while the workload was running. stress-ng was benchmarked
> along with power information. Throughput-oriented benchmarks show a
> significant gain of up to 25%, while power consumption increases by up to
> 15%.
>
> Workload: stress-ng --cpu=32 --cpu-ops=50000.
> 1CG - 1 cgroup is running.
> 2CG - 2 cgroups are running together.
> Time taken to complete stress-ng is in seconds and power is in watts.
> Each cgroup is throttled at 25% with 100ms as the period value.
>                  6.2-rc6               |          with patch
> 8 core   1CG    power   2CG    power   |  1CG    power   2CG    power
>          27.5   80.6    40     90      |  27.3   82      32.3   104
>          27.5   81      40.2   91      |  27.5   81      38.7   96
>          27.7   80      40.1   89      |  27.6   80      29.7   106
>          27.7   80.1    40.3   94      |  27.6   80      31.5   105
>
> Latency might be affected by this change. That could happen if the CPU was
> in a deep idle state, which is possible if we interleave the timers.
> schbench was used to measure the latency. Each cgroup is throttled at 25%
> with the period value set to 100ms. Numbers are from runs where both
> cgroups are running simultaneously. Latency values don't degrade much.
> Some improvement is seen in tail latencies.
>
> 		6.2-rc6        with patch
> Groups: 16
> 50.0th:          39.5            42.5
> 75.0th:         924.0           922.0
> 90.0th:         972.0           968.0
> 95.0th:        1005.5           994.0
> 99.0th:        4166.0          2287.0
> 99.5th:        7314.0          7448.0
> 99.9th:       15024.0         13600.0
>
> Groups: 32
> 50.0th:         819.0           463.0
> 75.0th:        1596.0           918.0
> 90.0th:        5992.0          1281.5
> 95.0th:       13184.0          2765.0
> 99.0th:       21792.0         14240.0
> 99.5th:       25696.0         18920.0
> 99.9th:       33280.0         35776.0
>
> Groups: 64
> 50.0th:        4806.0          3440.0
> 75.0th:       31136.0         33664.0
> 90.0th:       54144.0         58752.0
> 95.0th:       66176.0         67200.0
> 99.0th:       84736.0         91520.0
> 99.5th:       97408.0        114048.0
> 99.9th:      136448.0        140032.0
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Ben Segall <bsegall@google.com>

>
> Initial RFC PATCH, discussions and details on the problem:
> Link1: https://lore.kernel.org/lkml/5ae3cb09-8c9a-11e8-75a7-cc774d9bc283@linux.vnet.ibm.com/
> Link2: https://lore.kernel.org/lkml/9c57c92c-3e0c-b8c5-4be9-8f4df344a347@linux.vnet.ibm.com/
>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff4dbbae3b10..2a4a0969e04f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5923,6 +5923,10 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>  	cfs_b->period_timer.function = sched_cfs_period_timer;
> +
> +	/* Add a random offset so that timers interleave */
> +	hrtimer_set_expires(&cfs_b->period_timer,
> +			    get_random_u32_below(cfs_b->period));
>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>  	cfs_b->slack_started = false;
> --
> 2.31.1
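
To illustrate the idea in the patch above, here is a minimal userspace
sketch, not kernel code. It assumes each period timer fires at
base + k * period, so timers whose base is 0 (the "no initial value" case)
all expire at the same instants, while a random base in [0, period) shifts
each cgroup's expiries apart. The names forward_expiry, random_offset and
show_interleaving are hypothetical, and rand() only stands in for the
kernel's get_random_u32_below().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PERIOD_NS 100000000ULL /* 100ms, the period used in the tests above */

/*
 * Hypothetical userspace stand-in for forwarding a periodic timer:
 * if base is still in the future, keep it; otherwise return the first
 * expiry of the form base + k * period that lies strictly after now.
 */
static uint64_t forward_expiry(uint64_t base, uint64_t now, uint64_t period)
{
	assert(period > 0);
	if (now < base)
		return base;
	return base + ((now - base) / period + 1) * period;
}

/*
 * Hypothetical stand-in for a bounded random helper: an offset in
 * [0, period). rand() is used only for illustration; its range and
 * uniformity are platform dependent and it is not what the kernel uses.
 */
static uint64_t random_offset(uint64_t period)
{
	assert(period > 0);
	return (uint64_t)rand() % period;
}

/* Print the next expiry of ngroups timers: base 0 versus a random base. */
static void show_interleaving(unsigned int ngroups, uint64_t now)
{
	for (unsigned int i = 0; i < ngroups; i++) {
		uint64_t aligned = forward_expiry(0, now, PERIOD_NS);
		uint64_t offset = forward_expiry(random_offset(PERIOD_NS),
						 now, PERIOD_NS);

		printf("cgroup %u: aligned=%llu ns  offset=%llu ns\n", i,
		       (unsigned long long)aligned,
		       (unsigned long long)offset);
	}
}
```

Calling show_interleaving(4, 250000000ULL) from a main() prints the same
aligned expiry for every cgroup, while the offset expiries will typically
differ from one cgroup to the next.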
