* [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() @ 2024-03-18 19:28 John Stultz 2024-03-20 5:23 ` Vishal Chourasia 0 siblings, 1 reply; 4+ messages in thread From: John Stultz @ 2024-03-18 19:28 UTC (permalink / raw) To: LKML Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Phil Auld, Valentin Schneider, kernel-team, Zimuzo Ezeozue Zimuzo reported seeing occasional cases in perfetto traces where tasks went from sleeping directly to trace_sched_wakeup() without always seeing a trace_sched_waking(). Looking at the code, trace_sched_wakeup() is only called in ttwu_do_wakeup() The call paths that get you to ttwu_do_wakeup() are: try_to_wake_up() -> ttwu_do_wakeup() try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup() try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup() sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup() where trace_sched_waking() is currently called only in try_to_wake_up(). So add a trace_sched_waking() call to sched_ttwu_pending(), so we see the same state machine transitions. With this change, the number of unexpected state transitions in perfetto was greatly reduced. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Phil Auld <pauld@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: kernel-team@android.com Reviewed-by: Phil Auld <pauld@redhat.com> Reported-by: Zimuzo Ezeozue <zezeozue@google.com> Signed-off-by: John Stultz <jstultz@google.com> --- v2: * Minor commit message fix suggested by Phil Auld --- kernel/sched/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9116bcc90346..233f06360d6a 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3894,6 +3894,7 @@ void sched_ttwu_pending(void *arg) if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq))) set_task_cpu(p, cpu_of(rq)); + trace_sched_waking(p); ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf); } -- 2.44.0.291.gc1ea87d7ee-goog ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() 2024-03-18 19:28 [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() John Stultz @ 2024-03-20 5:23 ` Vishal Chourasia 2024-03-20 22:42 ` John Stultz 0 siblings, 1 reply; 4+ messages in thread From: Vishal Chourasia @ 2024-03-20 5:23 UTC (permalink / raw) To: John Stultz Cc: LKML, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Phil Auld, Valentin Schneider, kernel-team, Zimuzo Ezeozue Hi, On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote: > Zimuzo reported seeing occasional cases in perfetto traces where > tasks went from sleeping directly to trace_sched_wakeup() > without always seeing a trace_sched_waking(). > > Looking at the code, trace_sched_wakeup() is only called in > ttwu_do_wakeup() > > The call paths that get you to ttwu_do_wakeup() are: > try_to_wake_up() -> ttwu_do_wakeup() > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup() > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup() > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup() Notably, sched_ttwu_pending() is invoked for remote wakeups. Given this, I anticipate a scenario similar to the following occurred: When a process (P) is to be awakened on a remote CPU, the scheduler adds P to the remote CPU's wakelist,a per-CPU queue, and sends an IPI to the remote CPU. This action triggers sched_ttwu_pending() on the remote CPU, which then processes the wakelist and wakes up the queued processes. In this scenario, the "waking trace" of P, signifying the initiation of the wake-up, is recorded on the CPU where try_to_wake_up was executed. Meanwhile, the "wakeup trace," denoting the completion of the wake-up, is observed on the remote CPU where sched_ttwu_pending() is executed. Is there a possibility that something other than the above occurred in your case? > > where trace_sched_waking() is currently called only in > try_to_wake_up(). > > So add a trace_sched_waking() call to sched_ttwu_pending(), so > we see the same state machine transitions. > > With this change, the number of unexpected state transitions in > perfetto was greatly reduced. > > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Juri Lelli <juri.lelli@redhat.com> > Cc: Vincent Guittot <vincent.guittot@linaro.org> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Ben Segall <bsegall@google.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com> > Cc: Phil Auld <pauld@redhat.com> > Cc: Valentin Schneider <vschneid@redhat.com> > Cc: kernel-team@android.com > Reviewed-by: Phil Auld <pauld@redhat.com> > Reported-by: Zimuzo Ezeozue <zezeozue@google.com> > Signed-off-by: John Stultz <jstultz@google.com> > --- > v2: > * Minor commit message fix suggested by Phil Auld > --- > kernel/sched/core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 9116bcc90346..233f06360d6a 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3894,6 +3894,7 @@ void sched_ttwu_pending(void *arg) > if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq))) > set_task_cpu(p, cpu_of(rq)); > > + trace_sched_waking(p); > ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf); > } > > -- > 2.44.0.291.gc1ea87d7ee-goog > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() 2024-03-20 5:23 ` Vishal Chourasia @ 2024-03-20 22:42 ` John Stultz 2024-03-26 6:20 ` Vishal Chourasia 0 siblings, 1 reply; 4+ messages in thread From: John Stultz @ 2024-03-20 22:42 UTC (permalink / raw) To: Vishal Chourasia Cc: LKML, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Phil Auld, Valentin Schneider, kernel-team, Zimuzo Ezeozue On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@linux.ibm.com> wrote: > On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote: > > Zimuzo reported seeing occasional cases in perfetto traces where > > tasks went from sleeping directly to trace_sched_wakeup() > > without always seeing a trace_sched_waking(). > > > > Looking at the code, trace_sched_wakeup() is only called in > > ttwu_do_wakeup() > > > > The call paths that get you to ttwu_do_wakeup() are: > > try_to_wake_up() -> ttwu_do_wakeup() > > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup() > > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup() > > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup() > Notably, sched_ttwu_pending() is invoked for remote wakeups. > > Given this, I anticipate a scenario similar to the following > occurred: When a process (P) is to be awakened on a remote CPU, > the scheduler adds P to the remote CPU's wakelist,a per-CPU queue, > and sends an IPI to the remote CPU. This action triggers > sched_ttwu_pending() on the remote CPU, which then processes the > wakelist and wakes up the queued processes. Hey! Thanks for taking a look here and sharing this feedback. Indeed, looking closer here I do have a hard time seeing how we can get to sched_ttwu_pending() without having gone through the paths in try_to_wake_up() that would call trace_sched_wakeup(): try_to_wake_up()->[ttwu_queue ->] ttwu_queue_wakelist()->sched_ttwu_pending()->... > In this scenario, the "waking trace" of P, signifying the initiation > of the wake-up, is recorded on the CPU where try_to_wake_up was executed. > Meanwhile, the "wakeup trace," denoting the completion of the wake-up, > is observed on the remote CPU where sched_ttwu_pending() is executed. > > Is there a possibility that something other than the above occurred > in your case? I suspect that the case reported may be that the task went to sleep right after the trace_sched_waking(). Which could result in the transition from sleeping to trace_sched_wakeup() without the trace_sched_waking() inbetween. The added latency from the remote wakeup probably increases the chance for this race to occur. So I'll withdraw this change for now and sync back up to check if my suspicions are correct or not. Thanks so much again for pointing this out. -john ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() 2024-03-20 22:42 ` John Stultz @ 2024-03-26 6:20 ` Vishal Chourasia 0 siblings, 0 replies; 4+ messages in thread From: Vishal Chourasia @ 2024-03-26 6:20 UTC (permalink / raw) To: John Stultz Cc: LKML, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Phil Auld, Valentin Schneider, kernel-team, Zimuzo Ezeozue On Wed, Mar 20, 2024 at 03:42:31PM -0700, John Stultz wrote: > On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@linux.ibm.com> wrote: > > On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote: > > > Zimuzo reported seeing occasional cases in perfetto traces where > > > tasks went from sleeping directly to trace_sched_wakeup() > > > without always seeing a trace_sched_waking(). > > > > > > Looking at the code, trace_sched_wakeup() is only called in > > > ttwu_do_wakeup() > > > > > > The call paths that get you to ttwu_do_wakeup() are: > > > try_to_wake_up() -> ttwu_do_wakeup() > > > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup() > > > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup() > > > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup() > > Notably, sched_ttwu_pending() is invoked for remote wakeups. > > > > Given this, I anticipate a scenario similar to the following > > occurred: When a process (P) is to be awakened on a remote CPU, > > the scheduler adds P to the remote CPU's wakelist,a per-CPU queue, > > and sends an IPI to the remote CPU. This action triggers > > sched_ttwu_pending() on the remote CPU, which then processes the > > wakelist and wakes up the queued processes. > > Hey! Thanks for taking a look here and sharing this feedback. > > Indeed, looking closer here I do have a hard time seeing how we can > get to sched_ttwu_pending() without having gone through the paths in > try_to_wake_up() that would call trace_sched_wakeup(): > try_to_wake_up()->[ttwu_queue ->] > ttwu_queue_wakelist()->sched_ttwu_pending()->... > > > In this scenario, the "waking trace" of P, signifying the initiation > > of the wake-up, is recorded on the CPU where try_to_wake_up was executed. > > Meanwhile, the "wakeup trace," denoting the completion of the wake-up, > > is observed on the remote CPU where sched_ttwu_pending() is executed. > > > > Is there a possibility that something other than the above occurred > > in your case? > > I suspect that the case reported may be that the task went to sleep > right after the trace_sched_waking(). Which could result in the > transition from sleeping to trace_sched_wakeup() without the > trace_sched_waking() inbetween. The added latency from the remote > wakeup probably increases the chance for this race to occur. Hi, # trace-cmd record -e sched:* perf record -a -g -e sched:* ./schbench This generates perf.data and trace.dat file. Running a simple check for equality between sched_waking and sched_wakeup. I see two different results. # perf script -G | grep -i "sched_waking" | wc -l 35264 # perf script -G | grep -i "sched_wakeup:" | wc -l 35314 perf report suggests missing sched_waking events. Although trace collected through ftrace (using trace-cmd) suggests equal waking and wakeup events. # trace-cmd report | grep -i "sched_waking" | wc -l 35959 # trace-cmd report | grep -i "sched_wakeup:" | wc -l 35959 Below is a stack trace for a sched_wakeup event for which sched_waking was not recorded in perf.data file, schbench 307346 [023] 600177.250150: sched:sched_wakeup: perf:307321 [120] CPU:023 ttwu_do_activate+0x1c0 ([kernel.kallsyms]) print_fmt_xfs_perag_intents_class+0x0 ([xfs].data) sched_ttwu_pending+0x118 ([kernel.kallsyms]) => __flush_smp_call_function_queue+0x19c ([kernel.kallsyms]) smp_ipi_demux_relaxed+0xec ([kernel.kallsyms]) xive_muxed_ipi_action+0x20 ([kernel.kallsyms]) __handle_irq_event_percpu+0x8c ([kernel.kallsyms]) handle_irq_event_percpu+0x2c ([kernel.kallsyms]) handle_percpu_irq+0x80 ([kernel.kallsyms]) generic_handle_irq+0x50 ([kernel.kallsyms]) __do_irq+0xb8 ([kernel.kallsyms]) __do_IRQ+0x88 ([kernel.kallsyms]) print_fmt_xfs_perag_intents_class+0x0 ([xfs].data) do_IRQ+0x50 ([kernel.kallsyms]) hardware_interrupt_common_virt+0x28c ([kernel.kallsyms]) worker_thread+0x268 (/usr/bin/schbench) worker_thread+0x4cc (/usr/bin/schbench) start_thread+0x170 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) __clone+0xa0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6) Also, it has been observed for cases where sched_waking is missing only happens when perf task is woken up. Looking at the code in __flush_smp_call_function_queue, it invokes all the callbacks currently queue on the given cpu, and if sched_ttwu_pending is being called, suggesting that csd of type CSD_TYPE_TTWU was queued. Now, this happens inside try_to_wake_up. if there are no other way to queue a csd of type CSD_TYPE_TTWU, then this suggests some events are being missed by perf. > > So I'll withdraw this change for now and sync back up to check if my > suspicions are correct or not. > > Thanks so much again for pointing this out. > -john ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-03-26 6:20 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-18 19:28 [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() John Stultz 2024-03-20 5:23 ` Vishal Chourasia 2024-03-20 22:42 ` John Stultz 2024-03-26 6:20 ` Vishal Chourasia
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox