[patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings

public inbox for linux-kernel@vger.kernel.org

* [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
       [not found] <77e42990-0ea3-fc53-8051-6856a92ad4d0@google.com>
@ 2025-01-06 20:39 ` David Rientjes
  2025-01-07 18:14   ` Madadi Vineeth Reddy
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-06 20:39 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel

The need_resched warnings are controlled by two tunables in debugfs:
 - latency_warn_ms
 - latency_warn_once

By default, latency_warn_once is enabled.  Thus, a need_resched warning
is only emitted once per boot.

If the user changes this default and disables latency_warn_once, then
let the user control, through latency_warn_ms, the threshold at which
these warnings trigger.  Do not impose our own ratelimiting on top,
which may make it appear that there are no cases where need_resched is
set for longer than the threshold.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 kernel/sched/debug.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();


* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-06 20:39 ` [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings David Rientjes
@ 2025-01-07 18:14   ` Madadi Vineeth Reddy
  2025-01-07 20:15     ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Madadi Vineeth Reddy @ 2025-01-07 18:14 UTC (permalink / raw)
  To: David Rientjes
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel, Madadi Vineeth Reddy

Hi David Rientjes,

On 07/01/25 02:09, David Rientjes wrote:
> The need_resched warnings are controlled by two tunables in debugfs:
>  - latency_warn_ms
>  - latency_warn_once
> 
> By default, latency_warn_once is enabled.  Thus, a need_resched warning
> is only emitted once per boot.
> 
> If the user changes this default and disables latency_warn_once, then
> let the user control, through latency_warn_ms, the threshold at which
> these warnings trigger.  Do not impose our own ratelimiting on top,
> which may make it appear that there are no cases where need_resched is
> set for longer than the threshold.

Any idea why it was initially kept to one warning per hour?

The possible reasons that come to mind are to prevent excessive logging under
high CPU contention, as well as to ensure that a warning logged once an hour
indicates the issue is not caused by a short workload spike. Additionally,
this rate limit might help avoid impacting system performance due to excessive
logging.

However, if the default value of latency_warn_once is changed to disable it, it
may be acceptable to bypass the rate limit, as it would indicate a preference
for logging over performance.

Thoughts?

Thanks,
Madadi Vineeth Reddy

> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  kernel/sched/debug.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>  
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> -		return;
> -
>  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>  	dump_stack();



* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 18:14   ` Madadi Vineeth Reddy
@ 2025-01-07 20:15     ` David Rientjes
  2025-01-07 20:45       ` Josh Don
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-07 20:15 UTC (permalink / raw)
  To: Madadi Vineeth Reddy, Josh Don
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel

On Tue, 7 Jan 2025, Madadi Vineeth Reddy wrote:

> Hi David Rientjes,
> 
> On 07/01/25 02:09, David Rientjes wrote:
> > The need_resched warnings are controlled by two tunables in debugfs:
> >  - latency_warn_ms
> >  - latency_warn_once
> > 
> > By default, latency_warn_once is enabled.  Thus, a need_resched warning
> > is only emitted once per boot.
> > 
> > If the user changes this default and disables latency_warn_once, then
> > let the user control, through latency_warn_ms, the threshold at which
> > these warnings trigger.  Do not impose our own ratelimiting on top,
> > which may make it appear that there are no cases where need_resched is
> > set for longer than the threshold.
> 
> Any idea why it was initially kept to one warning per hour?
> 

Adding Josh Don who may have insight into this historically.

> The possible reasons that come to mind are to prevent excessive logging under
> high CPU contention, as well as to ensure that a warning logged once an hour
> indicates the issue is not caused by a short workload spike. Additionally,
> this rate limit might help avoid impacting system performance due to excessive
> logging.
> 
> However, if the default value of latency_warn_once is changed to disable it, it
> may be acceptable to bypass the rate limit, as it would indicate a preference
> for logging over performance.
> 

Right, I think this should be entirely up to what the admin configures in 
debugfs.  If they elect to disable latency_warn_once, we'll simply emit 
the information as often as they specify in latency_warn_ms and not add 
our own ratelimiting on top.  If they have a preference for lots of 
logging, so be it, let's not hide that data.

> Thoughts?
> 
> Thanks,
> Madadi Vineeth Reddy
> 
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  kernel/sched/debug.c | 5 -----
> >  1 file changed, 5 deletions(-)
> > 
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
> >  
> >  void resched_latency_warn(int cpu, u64 latency)
> >  {
> > -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> > -
> > -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> > -		return;
> > -
> >  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
> >  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
> >  	dump_stack();
> 
> 


* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 20:15     ` David Rientjes
@ 2025-01-07 20:45       ` Josh Don
  2025-01-09 17:59         ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Josh Don @ 2025-01-07 20:45 UTC (permalink / raw)
  To: David Rientjes
  Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Tue, Jan 7, 2025 at 12:15 PM David Rientjes <rientjes@google.com> wrote:
>
> On Tue, 7 Jan 2025, Madadi Vineeth Reddy wrote:
>
> > Any idea why it was initially kept to one warning per hour?
> >
>
> Adding Josh Don who may have insight into this historically.

No idea on the hour default, unfortunately. Almost certainly arbitrary.

> > The possible reasons that come to mind are to prevent excessive logging under
> > high CPU contention, as well as to ensure that a warning logged once an hour
> > indicates the issue is not caused by a short workload spike. Additionally,
> > this rate limit might help avoid impacting system performance due to excessive
> > logging.
> >
> > However, if the default value of latency_warn_once is changed to disable it, it
> > may be acceptable to bypass the rate limit, as it would indicate a preference
> > for logging over performance.
> >
>
> Right, I think this should be entirely up to what the admin configures in
> debugfs.  If they elect to disable latency_warn_once, we'll simply emit
> the information as often as they specify in latency_warn_ms and not add
> our own ratelimiting on top.  If they have a preference for lots of
> logging, so be it, let's not hide that data.

Your change doesn't reset rq->last_seen_need_resched_ns, so now
without the ratelimit I think we'll get a dump every single tick until
we eventually reschedule.

Another potential benefit to the ratelimit is that if we have
something wedging multiple cpus concurrently, we don't spam the log
(if warn_once is disabled). Though, probably an unlikely occurrence.

I think if you modify the patch to reset last_seen_need_resched_ns
that'll give the behavior you're after.

Best,
Josh


* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 20:45       ` Josh Don
@ 2025-01-09 17:59         ` David Rientjes
  2025-01-09 18:53           ` Josh Don
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-09 17:59 UTC (permalink / raw)
  To: Josh Don
  Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Tue, 7 Jan 2025, Josh Don wrote:

> > Right, I think this should be entirely up to what the admin configures in
> > debugfs.  If they elect to disable latency_warn_once, we'll simply emit
> > the information as often as they specify in latency_warn_ms and not add
> > our own ratelimiting on top.  If they have a preference for lots of
> > logging, so be it, let's not hide that data.
> 
> Your change doesn't reset rq->last_seen_need_resched_ns, so now
> without the ratelimit I think we'll get a dump every single tick until
> we eventually reschedule.
> 
> Another potential benefit to the ratelimit is that if we have
> something wedging multiple cpus concurrently, we don't spam the log
> (if warn_once is disabled). Though, probably an unlikely occurrence.
> 
> I think if you modify the patch to reset last_seen_need_resched_ns
> that'll give the behavior you're after.
> 

Thanks Josh for pointing this out!  I'm surprised by the implementation 
here where, even though it's only CONFIG_SCHED_DEBUG, we'd be taking the 
function call every tick only to find that the ratelimit makes it a no-op 
:/

Is that worth improving as well?

Otherwise, please take a look, is this what you had in mind?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5659,8 +5659,10 @@ void sched_tick(void)
 
 	rq_unlock(rq, &rf);
 
-	if (sched_feat(LATENCY_WARN) && resched_latency)
+	if (sched_feat(LATENCY_WARN) && resched_latency) {
 		resched_latency_warn(cpu, resched_latency);
+		rq->last_seen_need_resched_ns = 0;
+	}
 
 	perf_event_task_tick();
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();


* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-09 17:59         ` David Rientjes
@ 2025-01-09 18:53           ` Josh Don
  2025-01-10  0:22             ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Josh Don @ 2025-01-09 18:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Thu, Jan 9, 2025 at 9:59 AM David Rientjes <rientjes@google.com> wrote:
>
> On Tue, 7 Jan 2025, Josh Don wrote:
>
> > > Right, I think this should be entirely up to what the admin configures in
> > > debugfs.  If they elect to disable latency_warn_once, we'll simply emit
> > > the information as often as they specify in latency_warn_ms and not add
> > > our own ratelimiting on top.  If they have a preference for lots of
> > > logging, so be it, let's not hide that data.
> >
> > Your change doesn't reset rq->last_seen_need_resched_ns, so now
> > without the ratelimit I think we'll get a dump every single tick until
> > we eventually reschedule.
> >
> > Another potential benefit to the ratelimit is that if we have
> > something wedging multiple cpus concurrently, we don't spam the log
> > (if warn_once is disabled). Though, probably an unlikely occurrence.
> >
> > I think if you modify the patch to reset last_seen_need_resched_ns
> > that'll give the behavior you're after.
> >
>
> Thanks Josh for pointing this out!  I'm surprised by the implementation
> here where, even though it's only CONFIG_SCHED_DEBUG, we'd be taking the
> function call every tick only to find that the ratelimit makes it a no-op
> :/
>
> Is that worth improving as well?

I think your change takes care of it by removing the ratelimit entirely :)

> Otherwise, please take a look, is this what you had in mind?

I'm realizing now that we'll end up getting multiple splats for a
single very long stall (one each time the warning threshold elapses). We could fix
that by using a magic number rather than 0 here (such as U64_MAX), and
then teach resched_latency() to bail out on this value.

Additionally, while it might appear odd on the surface to write to the
rq field without holding the rq lock, we'll never have a concurrent
read/write of a given rq's last_seen_need_resched_ns, so that's fine; I
just wanted to mention it explicitly.

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5659,8 +5659,10 @@ void sched_tick(void)
>
>         rq_unlock(rq, &rf);
>
> -       if (sched_feat(LATENCY_WARN) && resched_latency)
> +       if (sched_feat(LATENCY_WARN) && resched_latency) {
>                 resched_latency_warn(cpu, resched_latency);
> +               rq->last_seen_need_resched_ns = 0;
> +       }
>
>         perf_event_task_tick();
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -       static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -       if (likely(!__ratelimit(&latency_check_ratelimit)))
> -               return;
> -

I think it is possible some users would want a control to enact some
type of rate-limit even with warn_once disabled, but for now I think
this is perfectly reasonable. We can always add a separate knob later
on to control a minimum cooldown between splats in that case.

>         pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>                cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>         dump_stack();


* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-09 18:53           ` Josh Don
@ 2025-01-10  0:22             ` David Rientjes
  0 siblings, 0 replies; 7+ messages in thread
From: David Rientjes @ 2025-01-10  0:22 UTC (permalink / raw)
  To: Josh Don
  Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Thu, 9 Jan 2025, Josh Don wrote:

> > Otherwise, please take a look, is this what you had in mind?
> 
> I'm realizing now that we'll end up getting multiple splats for a
> single very long stall (one each time the warning threshold elapses). We could fix
> that by using a magic number rather than 0 here (such as U64_MAX), and
> then teach resched_latency() to bail out on this value.
> 

Ack, ok.  I'll drop this patch because I see that there are existing users
(at least one NVMe library) that care about tuning both of these values
with what appears to be some amount of thought:

/sys/kernel/debug/sched/latency_warn_once=0
/sys/kernel/debug/sched/latency_warn_ms=16

and the intent was not that they get excessive output that they don't need 
or aren't expecting.  Thanks for looking at this!

