public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/core: Bypass arch_scale_freq_tick() on nohz_full CPUs
@ 2021-08-27 18:49 Nicolas Saenz Julienne
  2021-08-30  9:31 ` Peter Zijlstra
  0 siblings, 1 reply; 2+ messages in thread
From: Nicolas Saenz Julienne @ 2021-08-27 18:49 UTC (permalink / raw)
  To: linux-kernel, mingo, peterz
  Cc: vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, juri.lelli, mtosatti, nilal, frederic,
	Nicolas Saenz Julienne

arch_scale_freq_tick() calculations generally track the increment of
frequency performance counters in between scheduler ticks to provide
estimations on how DVFS swayed frequency on that CPU. This information
eventually allows for more precise scheduling decisions. It's all fine
and good, but nohz_full CPUs are capable of disabling the tick
indefinitely and potentially trigger overflows in
arch_scale_freq_tick()'s calculations once it's eventually re-enabled.

This will happen for both users of this interface: x86 and arm64. And
it's also relevant that the heuristic on what to do in case of
operations overflowing varies depending on the implementation. It goes
from fully disabling frequency invariance scaling on all CPUS, to
ignoring this is a possibility.

It's arguable that nohz_full CPUs are unlikely to benefit much from this
feature, since their aim is to allow for uninterrupted execution of a
single task, effectively getting the scheduler out of the way. Also,
DVFS itself is also unlikely be used on most nohz_full systems, given
its effects on latency.

So get around this by not calling arch_scale_freq_tick() on nohz_full
CPUs.

Note that tick_nohz_full_cpu() relies on a static branch, which avoids
degrading performance on the rest of systems.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
---
 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2fd623b2270d..8c04ec0e073a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5016,7 +5016,13 @@ void scheduler_tick(void)
 	unsigned long thermal_pressure;
 	u64 resched_latency;
 
-	arch_scale_freq_tick();
+	/*
+	 * nohz_full CPUs are capable of disabling the scheduler tick
+	 * indefinitely, potentially overflowing arch_scale_freq_tick()
+	 * calculations once it's re-enabled.
+	 */
+	if (!tick_nohz_full_cpu(smp_processor_id()))
+		arch_scale_freq_tick();
 	sched_clock_tick();
 
 	rq_lock(rq, &rf);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] sched/core: Bypass arch_scale_freq_tick() on nohz_full CPUs
  2021-08-27 18:49 [PATCH] sched/core: Bypass arch_scale_freq_tick() on nohz_full CPUs Nicolas Saenz Julienne
@ 2021-08-30  9:31 ` Peter Zijlstra
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Zijlstra @ 2021-08-30  9:31 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, juri.lelli, mtosatti, nilal, frederic

On Fri, Aug 27, 2021 at 08:49:10PM +0200, Nicolas Saenz Julienne wrote:
> arch_scale_freq_tick() calculations generally track the increment of
> frequency performance counters in between scheduler ticks to provide
> estimations on how DVFS swayed frequency on that CPU. This information
> eventually allows for more precise scheduling decisions. It's all fine
> and good, but nohz_full CPUs are capable of disabling the tick
> indefinitely and potentially trigger overflows in
> arch_scale_freq_tick()'s calculations once it's eventually re-enabled.
> 
> This will happen for both users of this interface: x86 and arm64. And
> it's also relevant that the heuristic on what to do in case of
> operations overflowing varies depending on the implementation. It goes
> from fully disabling frequency invariance scaling on all CPUS, to
> ignoring this is a possibility.
> 
> It's arguable that nohz_full CPUs are unlikely to benefit much from this
> feature, since their aim is to allow for uninterrupted execution of a
> single task, effectively getting the scheduler out of the way. Also,
> DVFS itself is also unlikely be used on most nohz_full systems, given
> its effects on latency.
> 
> So get around this by not calling arch_scale_freq_tick() on nohz_full
> CPUs.
> 
> Note that tick_nohz_full_cpu() relies on a static branch, which avoids
> degrading performance on the rest of systems.
> 
> Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
> ---
>  kernel/sched/core.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2fd623b2270d..8c04ec0e073a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5016,7 +5016,13 @@ void scheduler_tick(void)
>  	unsigned long thermal_pressure;
>  	u64 resched_latency;
>  
> -	arch_scale_freq_tick();
> +	/*
> +	 * nohz_full CPUs are capable of disabling the scheduler tick
> +	 * indefinitely, potentially overflowing arch_scale_freq_tick()
> +	 * calculations once it's re-enabled.
> +	 */
> +	if (!tick_nohz_full_cpu(smp_processor_id()))
> +		arch_scale_freq_tick();
>  	sched_clock_tick();
>  
>  	rq_lock(rq, &rf);

Hurmph,.. I'm not too happy with this.. Fundamentally the whole
NOHZ_FULL state should be dynamic, it currently isn't but that's
arguably a bug of the current implementation.

As such the above doesn't really dtrt, since a CPU disabling NOHZ_FULL
will then still suffer all them overflows you mentioned.

Frederic, how should this be done right? Is there a place where upon
entering/exiting NOHZ_FULL we can do fixups?

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-08-30  9:31 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-27 18:49 [PATCH] sched/core: Bypass arch_scale_freq_tick() on nohz_full CPUs Nicolas Saenz Julienne
2021-08-30  9:31 ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox