From: Vishal Chourasia <vishalc@linux.ibm.com>
To: John Stultz <jstultz@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Phil Auld <pauld@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
kernel-team@android.com, Zimuzo Ezeozue <zezeozue@google.com>
Subject: Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()
Date: Tue, 26 Mar 2024 11:50:12 +0530 [thread overview]
Message-ID: <ZgJpHAmxS4Lh8QEE@linux.ibm.com> (raw)
In-Reply-To: <CANDhNCro84C8uxTo5MJ1xX0wGnTXE7_REm4nqp7jGQ+mN0wfkQ@mail.gmail.com>
On Wed, Mar 20, 2024 at 03:42:31PM -0700, John Stultz wrote:
> On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@linux.ibm.com> wrote:
> > On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote:
> > > Zimuzo reported seeing occasional cases in perfetto traces where
> > > tasks went from sleeping directly to trace_sched_wakeup()
> > > without always seeing a trace_sched_waking().
> > >
> > > Looking at the code, trace_sched_wakeup() is only called in
> > > ttwu_do_wakeup()
> > >
> > > The call paths that get you to ttwu_do_wakeup() are:
> > > try_to_wake_up() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > Notably, sched_ttwu_pending() is invoked for remote wakeups.
> >
> > Given this, I anticipate a scenario similar to the following
> > occurred: When a process (P) is to be awakened on a remote CPU,
> > the scheduler adds P to the remote CPU's wakelist,a per-CPU queue,
> > and sends an IPI to the remote CPU. This action triggers
> > sched_ttwu_pending() on the remote CPU, which then processes the
> > wakelist and wakes up the queued processes.
>
> Hey! Thanks for taking a look here and sharing this feedback.
>
> Indeed, looking closer here I do have a hard time seeing how we can
> get to sched_ttwu_pending() without having gone through the paths in
> try_to_wake_up() that would call trace_sched_wakeup():
> try_to_wake_up()->[ttwu_queue ->]
> ttwu_queue_wakelist()->sched_ttwu_pending()->...
>
> > In this scenario, the "waking trace" of P, signifying the initiation
> > of the wake-up, is recorded on the CPU where try_to_wake_up was executed.
> > Meanwhile, the "wakeup trace," denoting the completion of the wake-up,
> > is observed on the remote CPU where sched_ttwu_pending() is executed.
> >
> > Is there a possibility that something other than the above occurred
> > in your case?
>
> I suspect that the case reported may be that the task went to sleep
> right after the trace_sched_waking(). Which could result in the
> transition from sleeping to trace_sched_wakeup() without the
> trace_sched_waking() inbetween. The added latency from the remote
> wakeup probably increases the chance for this race to occur.
Hi,
# trace-cmd record -e sched:* perf record -a -g -e sched:* ./schbench
This generates perf.data and trace.dat file.
Running a simple check for equality between sched_waking and
sched_wakeup. I see two different results.
# perf script -G | grep -i "sched_waking" | wc -l
35264
# perf script -G | grep -i "sched_wakeup:" | wc -l
35314
perf report suggests missing sched_waking events. Although trace
collected through ftrace (using trace-cmd) suggests equal waking and
wakeup events.
# trace-cmd report | grep -i "sched_waking" | wc -l
35959
# trace-cmd report | grep -i "sched_wakeup:" | wc -l
35959
Below is a stack trace for a sched_wakeup event for which sched_waking
was not recorded in perf.data file,
schbench 307346 [023] 600177.250150:
sched:sched_wakeup: perf:307321 [120] CPU:023
ttwu_do_activate+0x1c0 ([kernel.kallsyms])
print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
sched_ttwu_pending+0x118 ([kernel.kallsyms])
=> __flush_smp_call_function_queue+0x19c ([kernel.kallsyms])
smp_ipi_demux_relaxed+0xec ([kernel.kallsyms])
xive_muxed_ipi_action+0x20 ([kernel.kallsyms])
__handle_irq_event_percpu+0x8c ([kernel.kallsyms])
handle_irq_event_percpu+0x2c ([kernel.kallsyms])
handle_percpu_irq+0x80 ([kernel.kallsyms])
generic_handle_irq+0x50 ([kernel.kallsyms])
__do_irq+0xb8 ([kernel.kallsyms])
__do_IRQ+0x88 ([kernel.kallsyms])
print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
do_IRQ+0x50 ([kernel.kallsyms])
hardware_interrupt_common_virt+0x28c ([kernel.kallsyms])
worker_thread+0x268 (/usr/bin/schbench)
worker_thread+0x4cc (/usr/bin/schbench)
start_thread+0x170 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)
__clone+0xa0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)
Also, it has been observed for cases where sched_waking is missing only
happens when perf task is woken up.
Looking at the code in __flush_smp_call_function_queue,
it invokes all the callbacks currently queue on the given cpu, and if
sched_ttwu_pending is being called, suggesting that csd of type CSD_TYPE_TTWU
was queued. Now, this happens inside try_to_wake_up.
if there are no other way to queue a csd of type CSD_TYPE_TTWU, then
this suggests some events are being missed by perf.
>
> So I'll withdraw this change for now and sync back up to check if my
> suspicions are correct or not.
>
> Thanks so much again for pointing this out.
> -john
prev parent reply other threads:[~2024-03-26 6:20 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-18 19:28 [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() John Stultz
2024-03-20 5:23 ` Vishal Chourasia
2024-03-20 22:42 ` John Stultz
2024-03-26 6:20 ` Vishal Chourasia [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZgJpHAmxS4Lh8QEE@linux.ibm.com \
--to=vishalc@linux.ibm.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jstultz@google.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=zezeozue@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox