* [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
       [not found] <77e42990-0ea3-fc53-8051-6856a92ad4d0@google.com>
@ 2025-01-06 20:39 ` David Rientjes
  2025-01-07 18:14   ` Madadi Vineeth Reddy
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-06 20:39 UTC
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel

The need_resched warnings are controlled by two tunables in debugfs:
 - latency_warn_ms
 - latency_warn_once

By default, latency_warn_once is enabled. Thus, a need_resched warning
is only emitted once per boot.

If the user configures this to not be the case and changes the default,
then allow the user to also control the threshold through latency_warn_ms
that these warnings trigger. Do not impose our own ratelimiting on top
that may make it appear like there are no cases where need_resched is set
for longer than the threshold.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 kernel/sched/debug.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();
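The effect of the removed `DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1)` can be illustrated with a small userspace model. This is only a sketch and not the kernel's `__ratelimit()` implementation: `toy_ratelimit`, `toy_ratelimit_allow()` and `count_warnings()` are hypothetical names, time is counted in abstract ticks, and the model assumes the worst case in which a warning is attempted on every tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a ratelimit state (not kernel code). */
struct toy_ratelimit {
	uint64_t interval;	/* window length, in ticks */
	int burst;		/* events allowed per window */
	uint64_t begin;		/* tick at which the current window started */
	int printed;		/* events already allowed in this window */
	bool started;		/* false until the first event is seen */
};

/* Return true if an event at tick 'now' is allowed through. */
static bool toy_ratelimit_allow(struct toy_ratelimit *rs, uint64_t now)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/*
 * Count the warnings that get through when one is attempted on every
 * tick for 'seconds' seconds, with and without a one-per-hour limit.
 */
static int count_warnings(uint64_t hz, uint64_t seconds, bool use_ratelimit)
{
	struct toy_ratelimit rs = { .interval = 60 * 60 * hz, .burst = 1 };
	int emitted = 0;

	for (uint64_t tick = 0; tick < seconds * hz; tick++) {
		if (!use_ratelimit || toy_ratelimit_allow(&rs, tick))
			emitted++;
	}
	return emitted;
}
```

Under these assumptions, two hours at 250 ticks per second give `count_warnings(250, 7200, true) == 2` and `count_warnings(250, 7200, false) == 1800000`. The first figure is the behavior the patch description objects to (stalls above the threshold become invisible), the second is the flood that the replies in this thread go on to discuss.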
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-06 20:39 ` [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings David Rientjes
@ 2025-01-07 18:14   ` Madadi Vineeth Reddy
  2025-01-07 20:15     ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Madadi Vineeth Reddy @ 2025-01-07 18:14 UTC
To: David Rientjes
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel, Madadi Vineeth Reddy

Hi David Rientjes,

On 07/01/25 02:09, David Rientjes wrote:
> The need_resched warnings are controlled by two tunables in debugfs:
>  - latency_warn_ms
>  - latency_warn_once
>
> By default, latency_warn_once is enabled. Thus, a need_resched warning
> is only emitted once per boot.
>
> If the user configures this to not be the case and changes the default,
> then allow the user to also control the threshold through latency_warn_ms
> that these warnings trigger. Do not impose our own ratelimiting on top
> that may make it appear like there are no cases where need_resched is set
> for longer than the threshold.

Any idea why it was initially kept to one warning per hour?

The possible reasons that come to mind are to prevent excessive logging under
high CPU contention, as well as to ensure that a warning logged once an hour
indicates the issue is not caused by a short workload spike. Additionally,
this rate limit might help avoid impacting system performance due to excessive
logging.

However, if the default value of latency_warn_once is changed to disable it, it
may be acceptable to bypass the rate limit, as it would indicate a preference
for logging over performance.

Thoughts?

Thanks,
Madadi Vineeth Reddy

>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  kernel/sched/debug.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> -		return;
> -
>  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>  	dump_stack();
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 18:14   ` Madadi Vineeth Reddy
@ 2025-01-07 20:15     ` David Rientjes
  2025-01-07 20:45       ` Josh Don
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-07 20:15 UTC
To: Madadi Vineeth Reddy, Josh Don
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel

On Tue, 7 Jan 2025, Madadi Vineeth Reddy wrote:

> Hi David Rientjes,
>
> On 07/01/25 02:09, David Rientjes wrote:
> > The need_resched warnings are controlled by two tunables in debugfs:
> >  - latency_warn_ms
> >  - latency_warn_once
> >
> > By default, latency_warn_once is enabled. Thus, a need_resched warning
> > is only emitted once per boot.
> >
> > If the user configures this to not be the case and changes the default,
> > then allow the user to also control the threshold through latency_warn_ms
> > that these warnings trigger. Do not impose our own ratelimiting on top
> > that may make it appear like there are no cases where need_resched is set
> > for longer than the threshold.
>
> Any idea why it was initially kept to one warning per hour?
>

Adding Josh Don who may have insight into this historically.

> The possible reasons that come to mind are to prevent excessive logging under
> high CPU contention, as well as to ensure that a warning logged once an hour
> indicates the issue is not caused by a short workload spike. Additionally,
> this rate limit might help avoid impacting system performance due to excessive
> logging.
>
> However, if the default value of latency_warn_once is changed to disable it, it
> may be acceptable to bypass the rate limit, as it would indicate a preference
> for logging over performance.
>

Right, I think this should be entirely up to what the admin configures in
debugfs. If they elect to disable latency_warn_once, we'll simply emit
the information as often as they specify in latency_warn_ms and not add
our own ratelimiting on top. If they have a preference for lots of
logging, so be it, let's not hide that data.

> Thoughts?
>
> Thanks,
> Madadi Vineeth Reddy
>
> >
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  kernel/sched/debug.c | 5 -----
> >  1 file changed, 5 deletions(-)
> >
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
> >
> >  void resched_latency_warn(int cpu, u64 latency)
> >  {
> > -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> > -
> > -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> > -		return;
> > -
> >  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
> >  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
> >  	dump_stack();
>
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 20:15     ` David Rientjes
@ 2025-01-07 20:45       ` Josh Don
  2025-01-09 17:59         ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Josh Don @ 2025-01-07 20:45 UTC
To: David Rientjes
Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Tue, Jan 7, 2025 at 12:15 PM David Rientjes <rientjes@google.com> wrote:
>
> On Tue, 7 Jan 2025, Madadi Vineeth Reddy wrote:
>
> > Any idea why it was initially kept to one warning per hour?
> >
>
> Adding Josh Don who may have insight into this historically.

No idea on the hour default, unfortunately. Almost certainly arbitrary.

> > The possible reasons that come to mind are to prevent excessive logging under
> > high CPU contention, as well as to ensure that a warning logged once an hour
> > indicates the issue is not caused by a short workload spike. Additionally,
> > this rate limit might help avoid impacting system performance due to excessive
> > logging.
> >
> > However, if the default value of latency_warn_once is changed to disable it, it
> > may be acceptable to bypass the rate limit, as it would indicate a preference
> > for logging over performance.
> >
>
> Right, I think this should be entirely up to what the admin configures in
> debugfs. If they elect to disable latency_warn_once, we'll simply emit
> the information as often as they specify in latency_warn_ms and not add
> our own ratelimiting on top. If they have a preference for lots of
> logging, so be it, let's not hide that data.

Your change doesn't reset rq->last_seen_need_resched_ns, so now
without the ratelimit I think we'll get a dump every single tick until
we eventually reschedule.

Another potential benefit to the ratelimit is that if we have
something wedging multiple cpus concurrently, we don't spam the log
(if warn_once is disabled). Though, probably an unlikely occurrence.

I think if you modify the patch to reset last_seen_need_resched_ns
that'll give the behavior you're after.

Best,
Josh
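The "dump every single tick" concern can be sketched with a simplified userspace model of the latency check. The names `toy_rq`, `toy_resched_latency()` and `count_splats_no_reset()` are hypothetical, and the logic is a reduced approximation of the scheduler's check rather than the kernel source: it assumes need_resched stays set for the whole stall, that `latency_warn_once` is disabled, and that nothing resets the timestamp after a warning.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical, reduced stand-in for the per-CPU runqueue fields involved. */
struct toy_rq {
	uint64_t last_seen_need_resched_ns;	/* 0 means "not recorded yet" */
	int ticks_without_resched;
};

/*
 * Simplified latency check: record the first tick on which need_resched
 * is seen, then report the elapsed time once it exceeds the threshold.
 * Returns 0 when no warning is due.
 */
static uint64_t toy_resched_latency(struct toy_rq *rq, uint64_t now_ns,
				    uint64_t warn_ms)
{
	uint64_t latency;

	if (!rq->last_seen_need_resched_ns) {
		rq->last_seen_need_resched_ns = now_ns;
		rq->ticks_without_resched = 0;
		return 0;
	}

	rq->ticks_without_resched++;
	latency = now_ns - rq->last_seen_need_resched_ns;
	if (latency <= warn_ms * NSEC_PER_MSEC)
		return 0;

	return latency;
}

/*
 * Count warnings over one stall of 'stall_ms' with a tick every 'tick_ms',
 * with no ratelimit and no reset of the timestamp after a warning.
 */
static int count_splats_no_reset(uint64_t stall_ms, uint64_t tick_ms,
				 uint64_t warn_ms)
{
	struct toy_rq rq = { 0 };
	int splats = 0;

	/* Start at tick_ms so that "now" is never 0, the unset marker. */
	for (uint64_t t = tick_ms; t <= stall_ms; t += tick_ms) {
		if (toy_resched_latency(&rq, t * NSEC_PER_MSEC, warn_ms))
			splats++;
	}
	return splats;
}
```

In this model a 1000 ms stall with a 4 ms tick and a 100 ms threshold gives `count_splats_no_reset(1000, 4, 100) == 224`: once the threshold is crossed, every later tick of the same stall produces another warning and stack dump.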
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-07 20:45       ` Josh Don
@ 2025-01-09 17:59         ` David Rientjes
  2025-01-09 18:53           ` Josh Don
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2025-01-09 17:59 UTC
To: Josh Don
Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Tue, 7 Jan 2025, Josh Don wrote:

> > Right, I think this should be entirely up to what the admin configures in
> > debugfs. If they elect to disable latency_warn_once, we'll simply emit
> > the information as often as they specify in latency_warn_ms and not add
> > our own ratelimiting on top. If they have a preference for lots of
> > logging, so be it, let's not hide that data.
>
> Your change doesn't reset rq->last_seen_need_resched_ns, so now
> without the ratelimit I think we'll get a dump every single tick until
> we eventually reschedule.
>
> Another potential benefit to the ratelimit is that if we have
> something wedging multiple cpus concurrently, we don't spam the log
> (if warn_once is disabled). Though, probably an unlikely occurrence.
>
> I think if you modify the patch to reset last_seen_need_resched_ns
> that'll give the behavior you're after.
>

Thanks Josh for pointing this out! I'm surprised by the implementation
here where, even though it's only CONFIG_SCHED_DEBUG, we'd be taking the
function call every tick only to find that the ratelimit makes it a no-op
:/

Is that worth improving as well?

Otherwise, please take a look, is this what you had in mind?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5659,8 +5659,10 @@ void sched_tick(void)
 
 	rq_unlock(rq, &rf);
 
-	if (sched_feat(LATENCY_WARN) && resched_latency)
+	if (sched_feat(LATENCY_WARN) && resched_latency) {
 		resched_latency_warn(cpu, resched_latency);
+		rq->last_seen_need_resched_ns = 0;
+	}
 
 	perf_event_task_tick();
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-09 17:59         ` David Rientjes
@ 2025-01-09 18:53           ` Josh Don
  2025-01-10  0:22             ` David Rientjes
  0 siblings, 1 reply; 7+ messages in thread
From: Josh Don @ 2025-01-09 18:53 UTC
To: David Rientjes
Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Thu, Jan 9, 2025 at 9:59 AM David Rientjes <rientjes@google.com> wrote:
>
> On Tue, 7 Jan 2025, Josh Don wrote:
>
> > > Right, I think this should be entirely up to what the admin configures in
> > > debugfs. If they elect to disable latency_warn_once, we'll simply emit
> > > the information as often as they specify in latency_warn_ms and not add
> > > our own ratelimiting on top. If they have a preference for lots of
> > > logging, so be it, let's not hide that data.
> >
> > Your change doesn't reset rq->last_seen_need_resched_ns, so now
> > without the ratelimit I think we'll get a dump every single tick until
> > we eventually reschedule.
> >
> > Another potential benefit to the ratelimit is that if we have
> > something wedging multiple cpus concurrently, we don't spam the log
> > (if warn_once is disabled). Though, probably an unlikely occurrence.
> >
> > I think if you modify the patch to reset last_seen_need_resched_ns
> > that'll give the behavior you're after.
> >
>
> Thanks Josh for pointing this out! I'm surprised by the implementation
> here where, even though it's only CONFIG_SCHED_DEBUG, we'd be taking the
> function call every tick only to find that the ratelimit makes it a no-op
> :/
>
> Is that worth improving as well?

I think your change takes care of it by removing the ratelimit entirely :)

> Otherwise, please take a look, is this what you had in mind?

I'm realizing now that we'll end up getting multiple splats for a
single very long stall (one per the warning threshold). We could fix
that by using a magic number rather than 0 here (such as U64_MAX), and
then teach resched_latency() to bail out on this value.

Additionally, while on the surface it might appear odd to write to the
rq field not under lock, but we'll never have concurrent read/write to
a given rq's last_seen_need_resched, so that's fine, but just wanted
to mention explicitly.

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5659,8 +5659,10 @@ void sched_tick(void)
>
>  	rq_unlock(rq, &rf);
>
> -	if (sched_feat(LATENCY_WARN) && resched_latency)
> +	if (sched_feat(LATENCY_WARN) && resched_latency) {
>  		resched_latency_warn(cpu, resched_latency);
> +		rq->last_seen_need_resched_ns = 0;
> +	}
>
>  	perf_event_task_tick();
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> -		return;
> -

I think it is possible some users would want a control to enact some
type of rate-limit even with warn_once disabled, but for now I think
this is perfectly reasonable. We can always add a separate knob later
on to control a minimum cooldown between splats in that case.

>  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>  	dump_stack();
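The difference between resetting the timestamp to 0 and marking it with a sentinel can be compared in a small userspace model. This is a sketch of the idea only: `toy_rq`, `toy_policy`, `toy_resched_latency()` and `count_splats()` are hypothetical names, `UINT64_MAX` stands in for the kernel's `U64_MAX`, and the model assumes a single uninterrupted stall with `latency_warn_once` disabled.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* What to store in the timestamp after a warning has been emitted. */
enum toy_policy {
	RESET_TO_ZERO,	/* restart the measurement from the next tick */
	MARK_WARNED,	/* store a sentinel meaning "already warned" */
};

/* Hypothetical, reduced stand-in for the per-CPU runqueue field involved. */
struct toy_rq {
	uint64_t last_seen_need_resched_ns;	/* 0 means "not recorded yet" */
};

/* Simplified latency check that bails out on the sentinel value. */
static uint64_t toy_resched_latency(struct toy_rq *rq, uint64_t now_ns,
				    uint64_t warn_ms)
{
	uint64_t latency;

	if (rq->last_seen_need_resched_ns == UINT64_MAX)
		return 0;	/* this stall has already been reported */

	if (!rq->last_seen_need_resched_ns) {
		rq->last_seen_need_resched_ns = now_ns;
		return 0;
	}

	latency = now_ns - rq->last_seen_need_resched_ns;
	return latency > warn_ms * NSEC_PER_MSEC ? latency : 0;
}

/* Count warnings over one stall of 'stall_ms' with a tick every 'tick_ms'. */
static int count_splats(enum toy_policy policy, uint64_t stall_ms,
			uint64_t tick_ms, uint64_t warn_ms)
{
	struct toy_rq rq = { 0 };
	int splats = 0;

	/* Start at tick_ms so that "now" is never 0, the unset marker. */
	for (uint64_t t = tick_ms; t <= stall_ms; t += tick_ms) {
		if (!toy_resched_latency(&rq, t * NSEC_PER_MSEC, warn_ms))
			continue;
		splats++;
		rq.last_seen_need_resched_ns =
			(policy == MARK_WARNED) ? UINT64_MAX : 0;
	}
	return splats;
}
```

For a 1000 ms stall with a 4 ms tick and a 100 ms threshold, the model gives `count_splats(RESET_TO_ZERO, 1000, 4, 100) == 9` (roughly one splat per threshold interval) and `count_splats(MARK_WARNED, 1000, 4, 100) == 1`. The sentinel variant relies on the field being cleared back to 0 when the CPU finally reschedules so that a later stall can be reported; that clearing step is an assumption here and is not modeled.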
* Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
  2025-01-09 18:53           ` Josh Don
@ 2025-01-10  0:22             ` David Rientjes
  0 siblings, 0 replies; 7+ messages in thread
From: David Rientjes @ 2025-01-10 0:22 UTC
To: Josh Don
Cc: Madadi Vineeth Reddy, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, linux-kernel

On Thu, 9 Jan 2025, Josh Don wrote:

> > Otherwise, please take a look, is this what you had in mind?
>
> I'm realizing now that we'll end up getting multiple splats for a
> single very long stall (one per the warning threshold). We could fix
> that by using a magic number rather than 0 here (such as U64_MAX), and
> then teach resched_latency() to bail out on this value.
>

Ack, ok. I'll drop this patch because I see that there are existing
users (at least one NVMe library) that cares about tuning both of these
values with what appears to be some amount of thought:

/sys/kernel/debug/sched/latency_warn_once=0
/sys/kernel/debug/sched/latency_warn_ms=16

and the intent was not that they get excessive output that they don't
need or aren't expecting.

Thanks for looking at this!