From: Ingo Molnar <mingo@kernel.org>
To: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Joe Mario <jmario@redhat.com>
Subject: Re: [PATCH] sched/idle: Make idle poll dynamic per-cpu
Date: Sun, 15 Jan 2023 10:15:31 +0100 [thread overview]
Message-ID: <Y8PEM9TK4vTPWuxH@gmail.com> (raw)
In-Reply-To: <20230112162426.217522-1-bristot@kernel.org>
* Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> idle=poll is frequently used on ultra-low-latency systems. Examples of
> such systems are high-performance trading and 5G NVRAM. The performance
> gain is given by avoiding the idle driver machinery and by keeping the
> CPU is always in an active state - avoiding (odd) hardware heuristics that
> are out of the control of the OS.
>
> Currently, idle=poll is an all-or-nothing static option defined at
> boot time. The motivation for creating this option dynamic and per-cpu
> are two:
>
> 1) Reduce the power usage/heat by allowing only selected CPUs to
> do idle polling;
> 2) Allow multi-tenant systems (e.g., Kubernetes) to enable idle
> polling only when ultra-low-latency applications are present
> on specific CPUs.
>
> Joe Mario did some experiments with this option enabled, and the results
> were significant. For example, by using dynamic idle polling on
> selected CPUs, cyclictest performance is optimal (like when using
> idle=poll), but cpu power consumption drops from 381 to 233 watts.
>
> Also, limiting idle=poll to the set of CPUs that benefits from
> it allows other CPUs to benefit from frequency boosts. Joe also
> shows that the results can be in the order of 80nsec round trip
> improvement when system-wide idle=poll was not used.
>
> The user can enable idle polling with this command:
> # echo 1 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> And disable it via:
> # echo 0 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> By default, all CPUs have idle polling disabled (the current behavior).
> A static key avoids the CPU mask check overhead when no idle polling
> is enabled.
Sounds useful in general.
A couple of observations:
ABI: how about putting the new file into the existing
/sys/devices/system/cpu/cpuidle/ directory - the sysfs space of cpuidle?
Arguably this flag is an extension of it.
> extern char __cpuidle_text_start[], __cpuidle_text_end[];
>
> +/*
> + * per-cpu idle polling selector.
> + */
> +static struct cpumask cpu_poll_mask;
> +DEFINE_STATIC_KEY_FALSE(cpu_poll_enabled);
> +
> +/*
> + * Protects the mask/static key relation.
> + */
> +DEFINE_MUTEX(cpu_poll_mutex);
> +
> +static ssize_t idle_poll_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int cpu = dev->id;
> + int retval, set;
> + bool val;
> +
> + retval = kstrtobool(buf, &val);
> + if (retval)
> + return retval;
> +
> + mutex_lock(&cpu_poll_mutex);
> +
> + if (val) {
> + set = cpumask_test_and_set_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already on, do not increase the static key usage.
> + */
> + if (!set)
> + static_branch_inc(&cpu_poll_enabled);
> + } else {
> + set = cpumask_test_and_clear_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already off, do not decrease the static key usage.
> + */
> + if (set)
> + static_branch_dec(&cpu_poll_enabled);
> + }
Nit: I think 'old_bit' or so is easier to read than a generic 'set'?
> +
> + mutex_unlock(&cpu_poll_mutex);
Also, is cpu_poll_mutex locking really necessary, given that these bitops
methods are atomic, and CPUs observe cpu_poll_enabled without taking any
locks?
> +static int is_cpu_idle_poll(int cpu)
> +{
> + if (static_branch_unlikely(&cpu_poll_enabled))
> + return cpumask_test_cpu(cpu, &cpu_poll_mask);
> +
> + return 0;
> +}
static inline might be justified in this case I guess.
> @@ -51,18 +136,21 @@ __setup("hlt", cpu_idle_nopoll_setup);
>
> static noinline int __cpuidle cpu_idle_poll(void)
> {
> - trace_cpu_idle(0, smp_processor_id());
> + int cpu = smp_processor_id();
> +
> + trace_cpu_idle(0, cpu);
> stop_critical_timings();
> ct_idle_enter();
> local_irq_enable();
>
> while (!tif_need_resched() &&
> - (cpu_idle_force_poll || tick_check_broadcast_expired()))
> + (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)))
> cpu_relax();
>
> ct_idle_exit();
> start_critical_timings();
> - trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> + trace_cpu_idle(PWR_EVENT_EXIT, cpu);
>
> return 1;
So I think the introduction of the 'cpu' local variable to clean up the
flow of cpu_idle_poll() should be a separate preparatory patch, which will
make the addition of the is_cpu_idle_poll() call a bit easier to read in
the second patch.
> }
> @@ -296,7 +384,8 @@ static void do_idle(void)
> * broadcast device expired for us, we don't want to go deep
> * idle as we know that the IPI is going to arrive right away.
> */
> - if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
> + if (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)) {
> tick_nohz_idle_restart_tick();
> cpu_idle_poll();
Shouldn't we check is_cpu_idle_poll() right after the cpu_idle_force_poll
check, and before the tick_check_broadcast_expired() check?
Shouldn't matter to the outcome, but for consistency's sake.
Plus, if we are doing this anyway, maybe cpu_idle_force_poll could now be
implemented as 0/all setting of cpu_poll_mask, eliminating the
cpu_idle_force_poll flag? As a third patch on top.
Thanks,
Ingo
next prev parent reply other threads:[~2023-01-15 9:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-12 16:24 [PATCH] sched/idle: Make idle poll dynamic per-cpu Daniel Bristot de Oliveira
2023-01-15 9:15 ` Ingo Molnar [this message]
2023-01-17 11:20 ` Daniel Bristot de Oliveira
2023-01-16 1:43 ` Chen Yu
2023-01-16 8:53 ` Peter Zijlstra
2023-01-16 9:02 ` Ingo Molnar
2023-01-16 9:28 ` Ingo Molnar
2023-01-16 9:51 ` Vincent Guittot
2023-01-16 10:11 ` Daniel Bristot de Oliveira
2023-01-16 10:06 ` Daniel Bristot de Oliveira
2023-01-16 11:54 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y8PEM9TK4vTPWuxH@gmail.com \
--to=mingo@kernel.org \
--cc=bristot@kernel.org \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jmario@redhat.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox