From: Ingo Molnar <mingo@kernel.org>
To: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Joe Mario <jmario@redhat.com>
Subject: Re: [PATCH] sched/idle: Make idle poll dynamic per-cpu
Date: Sun, 15 Jan 2023 10:15:31 +0100 [thread overview]
Message-ID: <Y8PEM9TK4vTPWuxH@gmail.com> (raw)
In-Reply-To: <20230112162426.217522-1-bristot@kernel.org>
* Daniel Bristot de Oliveira <bristot@kernel.org> wrote:
> idle=poll is frequently used on ultra-low-latency systems. Examples of
> such systems are high-performance trading and 5G NVRAM. The performance
> gain is given by avoiding the idle driver machinery and by keeping the
> CPU is always in an active state - avoiding (odd) hardware heuristics that
> are out of the control of the OS.
>
> Currently, idle=poll is an all-or-nothing static option defined at
> boot time. The motivation for creating this option dynamic and per-cpu
> are two:
>
> 1) Reduce the power usage/heat by allowing only selected CPUs to
> do idle polling;
> 2) Allow multi-tenant systems (e.g., Kubernetes) to enable idle
> polling only when ultra-low-latency applications are present
> on specific CPUs.
>
> Joe Mario did some experiments with this option enabled, and the results
> were significant. For example, by using dynamic idle polling on
> selected CPUs, cyclictest performance is optimal (like when using
> idle=poll), but cpu power consumption drops from 381 to 233 watts.
>
> Also, limiting idle=poll to the set of CPUs that benefits from
> it allows other CPUs to benefit from frequency boosts. Joe also
> shows that the results can be in the order of 80nsec round trip
> improvement when system-wide idle=poll was not used.
>
> The user can enable idle polling with this command:
> # echo 1 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> And disable it via:
> # echo 0 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> By default, all CPUs have idle polling disabled (the current behavior).
> A static key avoids the CPU mask check overhead when no idle polling
> is enabled.
Sounds useful in general.
A couple of observations:
ABI: how about putting the new file into the existing
/sys/devices/system/cpu/cpuidle/ directory - the sysfs space of cpuidle?
Arguably this flag is an extension of it.
> extern char __cpuidle_text_start[], __cpuidle_text_end[];
>
> +/*
> + * per-cpu idle polling selector.
> + */
> +static struct cpumask cpu_poll_mask;
> +DEFINE_STATIC_KEY_FALSE(cpu_poll_enabled);
> +
> +/*
> + * Protects the mask/static key relation.
> + */
> +DEFINE_MUTEX(cpu_poll_mutex);
> +
> +static ssize_t idle_poll_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int cpu = dev->id;
> + int retval, set;
> + bool val;
> +
> + retval = kstrtobool(buf, &val);
> + if (retval)
> + return retval;
> +
> + mutex_lock(&cpu_poll_mutex);
> +
> + if (val) {
> + set = cpumask_test_and_set_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already on, do not increase the static key usage.
> + */
> + if (!set)
> + static_branch_inc(&cpu_poll_enabled);
> + } else {
> + set = cpumask_test_and_clear_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already off, do not decrease the static key usage.
> + */
> + if (set)
> + static_branch_dec(&cpu_poll_enabled);
> + }
Nit: I think 'old_bit' or so is easier to read than a generic 'set'?
> +
> + mutex_unlock(&cpu_poll_mutex);
Also, is cpu_poll_mutex locking really necessary, given that these bitops
methods are atomic, and CPUs observe cpu_poll_enabled without taking any
locks?
> +static int is_cpu_idle_poll(int cpu)
> +{
> + if (static_branch_unlikely(&cpu_poll_enabled))
> + return cpumask_test_cpu(cpu, &cpu_poll_mask);
> +
> + return 0;
> +}
static inline might be justified in this case I guess.
> @@ -51,18 +136,21 @@ __setup("hlt", cpu_idle_nopoll_setup);
>
> static noinline int __cpuidle cpu_idle_poll(void)
> {
> - trace_cpu_idle(0, smp_processor_id());
> + int cpu = smp_processor_id();
> +
> + trace_cpu_idle(0, cpu);
> stop_critical_timings();
> ct_idle_enter();
> local_irq_enable();
>
> while (!tif_need_resched() &&
> - (cpu_idle_force_poll || tick_check_broadcast_expired()))
> + (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)))
> cpu_relax();
>
> ct_idle_exit();
> start_critical_timings();
> - trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> + trace_cpu_idle(PWR_EVENT_EXIT, cpu);
>
> return 1;
So I think the introduction of the 'cpu' local variable to clean up the
flow of cpu_idle_poll() should be a separate preparatory patch, which will
make the addition of the is_cpu_idle_poll() call a bit easier to read in
the second patch.
> }
> @@ -296,7 +384,8 @@ static void do_idle(void)
> * broadcast device expired for us, we don't want to go deep
> * idle as we know that the IPI is going to arrive right away.
> */
> - if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
> + if (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)) {
> tick_nohz_idle_restart_tick();
> cpu_idle_poll();
Shouldn't we check is_cpu_idle_poll() right after the cpu_idle_force_poll
check, and before the tick_check_broadcast_expired() check?
Shouldn't matter to the outcome, but for consistency's sake.
Plus, if we are doing this anyway, maybe cpu_idle_force_poll could now be
implemented as 0/all setting of cpu_poll_mask, eliminating the
cpu_idle_force_poll flag? As a third patch on top.
Thanks,
Ingo
next prev parent reply other threads:[~2023-01-15 9:15 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-12 16:24 [PATCH] sched/idle: Make idle poll dynamic per-cpu Daniel Bristot de Oliveira
2023-01-15 9:15 ` Ingo Molnar [this message]
2023-01-17 11:20 ` Daniel Bristot de Oliveira
2023-01-16 1:43 ` Chen Yu
2023-01-16 8:53 ` Peter Zijlstra
2023-01-16 9:02 ` Ingo Molnar
2023-01-16 9:28 ` Ingo Molnar
2023-01-16 9:51 ` Vincent Guittot
2023-01-16 10:11 ` Daniel Bristot de Oliveira
2023-01-16 10:06 ` Daniel Bristot de Oliveira
2023-01-16 11:54 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y8PEM9TK4vTPWuxH@gmail.com \
--to=mingo@kernel.org \
--cc=bristot@kernel.org \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jmario@redhat.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.