Re: [PATCH] sched/idle: Make idle poll dynamic per-cpu

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Joe Mario <jmario@redhat.com>
Subject: Re: [PATCH] sched/idle: Make idle poll dynamic per-cpu
Date: Sun, 15 Jan 2023 10:15:31 +0100	[thread overview]
Message-ID: <Y8PEM9TK4vTPWuxH@gmail.com> (raw)
In-Reply-To: <20230112162426.217522-1-bristot@kernel.org>


* Daniel Bristot de Oliveira <bristot@kernel.org> wrote:

> idle=poll is frequently used on ultra-low-latency systems. Examples of
> such systems are high-performance trading and 5G NVRAM. The performance
> gain is given by avoiding the idle driver machinery and by keeping the
> CPU is always in an active state - avoiding (odd) hardware heuristics that
> are out of the control of the OS.
> 
> Currently, idle=poll is an all-or-nothing static option defined at
> boot time. The motivation for creating this option dynamic and per-cpu
> are two:
> 
>   1) Reduce the power usage/heat by allowing only selected CPUs to
>      do idle polling;
>   2) Allow multi-tenant systems (e.g., Kubernetes) to enable idle
>      polling only when ultra-low-latency applications are present
>      on specific CPUs.
> 
> Joe Mario did some experiments with this option enabled, and the results
> were significant. For example, by using dynamic idle polling on
> selected CPUs, cyclictest performance is optimal (like when using
> idle=poll), but cpu power consumption drops from 381 to 233 watts.
> 
> Also, limiting idle=poll to the set of CPUs that benefits from
> it allows other CPUs to benefit from frequency boosts. Joe also
> shows that the results can be in the order of 80nsec round trip
> improvement when system-wide idle=poll was not used.
> 
> The user can enable idle polling with this command:
>   # echo 1 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
> 
> And disable it via:
>   # echo 0 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
> 
> By default, all CPUs have idle polling disabled (the current behavior).
> A static key avoids the CPU mask check overhead when no idle polling
> is enabled.

Sounds useful in general.

A couple of observations:

ABI: how about putting the new file into the existing 
/sys/devices/system/cpu/cpuidle/ directory - the sysfs space of cpuidle? 
Arguably this flag is an extension of it.


>  extern char __cpuidle_text_start[], __cpuidle_text_end[];
>  
> +/*
> + * per-cpu idle polling selector.
> + */
> +static struct cpumask cpu_poll_mask;
> +DEFINE_STATIC_KEY_FALSE(cpu_poll_enabled);
> +
> +/*
> + * Protects the mask/static key relation.
> + */
> +DEFINE_MUTEX(cpu_poll_mutex);
> +
> +static ssize_t idle_poll_store(struct device *dev, struct device_attribute *attr,
> +			       const char *buf, size_t count)
> +{
> +	int cpu = dev->id;
> +	int retval, set;
> +	bool val;
> +
> +	retval = kstrtobool(buf, &val);
> +	if (retval)
> +		return retval;
> +
> +	mutex_lock(&cpu_poll_mutex);
> +
> +	if (val) {
> +		set = cpumask_test_and_set_cpu(cpu, &cpu_poll_mask);
> +
> +		/*
> +		 * If the CPU was already on, do not increase the static key usage.
> +		 */
> +		if (!set)
> +			static_branch_inc(&cpu_poll_enabled);
> +	} else {
> +		set = cpumask_test_and_clear_cpu(cpu, &cpu_poll_mask);
> +
> +		/*
> +		 * If the CPU was already off, do not decrease the static key usage.
> +		 */
> +		if (set)
> +			static_branch_dec(&cpu_poll_enabled);
> +	}

Nit: I think 'old_bit' or so is easier to read than a generic 'set'?


> +
> +	mutex_unlock(&cpu_poll_mutex);

Also, is cpu_poll_mutex locking really necessary, given that these bitops 
methods are atomic, and CPUs observe cpu_poll_enabled without taking any 
locks?

> +static int is_cpu_idle_poll(int cpu)
> +{
> +	if (static_branch_unlikely(&cpu_poll_enabled))
> +		return cpumask_test_cpu(cpu, &cpu_poll_mask);
> +
> +	return 0;
> +}

static inline might be justified in this case I guess.

> @@ -51,18 +136,21 @@ __setup("hlt", cpu_idle_nopoll_setup);
>  
>  static noinline int __cpuidle cpu_idle_poll(void)
>  {
> -	trace_cpu_idle(0, smp_processor_id());
> +	int cpu = smp_processor_id();
> +
> +	trace_cpu_idle(0, cpu);
>  	stop_critical_timings();
>  	ct_idle_enter();
>  	local_irq_enable();
>  
>  	while (!tif_need_resched() &&
> -	       (cpu_idle_force_poll || tick_check_broadcast_expired()))
> +	       (cpu_idle_force_poll || tick_check_broadcast_expired()
> +		|| is_cpu_idle_poll(cpu)))
>  		cpu_relax();
>  
>  	ct_idle_exit();
>  	start_critical_timings();
> -	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> +	trace_cpu_idle(PWR_EVENT_EXIT, cpu);
>  
>  	return 1;

So I think the introduction of the 'cpu' local variable to clean up the 
flow of cpu_idle_poll() should be a separate preparatory patch, which will 
make the addition of the is_cpu_idle_poll() call a bit easier to read in 
the second patch.

>  }
> @@ -296,7 +384,8 @@ static void do_idle(void)
>  		 * broadcast device expired for us, we don't want to go deep
>  		 * idle as we know that the IPI is going to arrive right away.
>  		 */
> -		if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
> +		if (cpu_idle_force_poll || tick_check_broadcast_expired()
> +		    || is_cpu_idle_poll(cpu)) {
>  			tick_nohz_idle_restart_tick();
>  			cpu_idle_poll();

Shouldn't we check is_cpu_idle_poll() right after the cpu_idle_force_poll 
check, and before the tick_check_broadcast_expired() check?

Shouldn't matter to the outcome, but for consistency's sake.

Plus, if we are doing this anyway, maybe cpu_idle_force_poll could now be 
implemented as 0/all setting of cpu_poll_mask, eliminating the 
cpu_idle_force_poll flag? As a third patch on top.

Thanks,

	Ingo

next prev parent reply	other threads:[~2023-01-15  9:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12 16:24 [PATCH] sched/idle: Make idle poll dynamic per-cpu Daniel Bristot de Oliveira
2023-01-15  9:15 ` Ingo Molnar [this message]
2023-01-17 11:20   ` Daniel Bristot de Oliveira
2023-01-16  1:43 ` Chen Yu
2023-01-16  8:53 ` Peter Zijlstra
2023-01-16  9:02   ` Ingo Molnar
2023-01-16  9:28     ` Ingo Molnar
2023-01-16  9:51       ` Vincent Guittot
2023-01-16 10:11         ` Daniel Bristot de Oliveira
2023-01-16 10:06       ` Daniel Bristot de Oliveira
2023-01-16 11:54       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y8PEM9TK4vTPWuxH@gmail.com \
    --to=mingo@kernel.org \
    --cc=bristot@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=jmario@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox