All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vishal Chourasia <vishalc@linux.ibm.com>
To: John Stultz <jstultz@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	kernel-team@android.com, Zimuzo Ezeozue <zezeozue@google.com>
Subject: Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()
Date: Tue, 26 Mar 2024 11:50:12 +0530	[thread overview]
Message-ID: <ZgJpHAmxS4Lh8QEE@linux.ibm.com> (raw)
In-Reply-To: <CANDhNCro84C8uxTo5MJ1xX0wGnTXE7_REm4nqp7jGQ+mN0wfkQ@mail.gmail.com>

On Wed, Mar 20, 2024 at 03:42:31PM -0700, John Stultz wrote:
> On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@linux.ibm.com> wrote:
> > On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote:
> > > Zimuzo reported seeing occasional cases in perfetto traces where
> > > tasks went from sleeping directly to trace_sched_wakeup()
> > > without always seeing a trace_sched_waking().
> > >
> > > Looking at the code, trace_sched_wakeup() is only called in
> > > ttwu_do_wakeup()
> > >
> > > The call paths that get you to ttwu_do_wakeup() are:
> > > try_to_wake_up() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > Notably, sched_ttwu_pending() is invoked for remote wakeups.
> >
> > Given this, I anticipate a scenario similar to the following
> > occurred: When a process (P) is to be awakened on a remote CPU,
> > the scheduler adds P to the remote CPU's wakelist,a per-CPU queue,
> > and sends an IPI to the remote CPU. This action triggers
> > sched_ttwu_pending() on the remote CPU, which then processes the
> > wakelist and wakes up the queued processes.
> 
> Hey! Thanks for taking a look here and sharing this feedback.
> 
> Indeed, looking closer here I do have a hard time seeing how we can
> get to sched_ttwu_pending() without having gone through the paths in
> try_to_wake_up() that would call trace_sched_wakeup():
>   try_to_wake_up()->[ttwu_queue ->]
> ttwu_queue_wakelist()->sched_ttwu_pending()->...
> 
> > In this scenario, the "waking trace" of P, signifying the initiation
> > of the wake-up, is recorded on the CPU where try_to_wake_up was executed.
> > Meanwhile, the "wakeup trace," denoting the completion of the wake-up,
> > is observed on the remote CPU where sched_ttwu_pending() is executed.
> >
> > Is there a possibility that something other than the above occurred
> > in your case?
> 
> I suspect that the case reported may be that the task went to sleep
> right after the trace_sched_waking(). Which could result in the
> transition from sleeping to trace_sched_wakeup() without the
> trace_sched_waking() inbetween.  The added latency from the remote
> wakeup probably increases the chance for this race to occur.

Hi,

# trace-cmd record -e sched:* perf record -a -g -e sched:* ./schbench
This generates perf.data and trace.dat file.

Running a simple check for equality between sched_waking and
sched_wakeup. I see two different results.

# perf script -G | grep -i "sched_waking" | wc -l
35264
# perf script -G | grep -i "sched_wakeup:" | wc -l
35314

perf report suggests missing sched_waking events. Although trace
collected through ftrace (using trace-cmd) suggests equal waking and
wakeup events.

# trace-cmd report | grep -i "sched_waking" | wc -l
35959
# trace-cmd report | grep -i "sched_wakeup:" | wc -l
35959

Below is a stack trace for a sched_wakeup event for which sched_waking
was not recorded in perf.data file,

schbench  307346 [023] 600177.250150:    
         sched:sched_wakeup: perf:307321 [120] CPU:023
         ttwu_do_activate+0x1c0 ([kernel.kallsyms])
         print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
         sched_ttwu_pending+0x118 ([kernel.kallsyms])
=>       __flush_smp_call_function_queue+0x19c ([kernel.kallsyms])
         smp_ipi_demux_relaxed+0xec ([kernel.kallsyms])
         xive_muxed_ipi_action+0x20 ([kernel.kallsyms])
         __handle_irq_event_percpu+0x8c ([kernel.kallsyms])
         handle_irq_event_percpu+0x2c ([kernel.kallsyms])
         handle_percpu_irq+0x80 ([kernel.kallsyms])
         generic_handle_irq+0x50 ([kernel.kallsyms])
         __do_irq+0xb8 ([kernel.kallsyms])
         __do_IRQ+0x88 ([kernel.kallsyms])
         print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
         do_IRQ+0x50 ([kernel.kallsyms])
         hardware_interrupt_common_virt+0x28c ([kernel.kallsyms])
         worker_thread+0x268 (/usr/bin/schbench)
         worker_thread+0x4cc (/usr/bin/schbench)
         start_thread+0x170 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)
         __clone+0xa0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)

Also, it has been observed for cases where sched_waking is missing only
happens when perf task is woken up.

Looking at the code in __flush_smp_call_function_queue,
it invokes all the callbacks currently queue on the given cpu, and if
sched_ttwu_pending is being called, suggesting that csd of type CSD_TYPE_TTWU
was queued. Now, this happens inside try_to_wake_up.

if there are no other way to queue a csd of type CSD_TYPE_TTWU, then
this suggests some events are being missed by perf.

> 
> So I'll withdraw this change for now and sync back up to check if my
> suspicions are correct or not.
> 
> Thanks so much again for pointing this out.
> -john

      reply	other threads:[~2024-03-26  6:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-18 19:28 [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending() John Stultz
2024-03-20  5:23 ` Vishal Chourasia
2024-03-20 22:42   ` John Stultz
2024-03-26  6:20     ` Vishal Chourasia [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZgJpHAmxS4Lh8QEE@linux.ibm.com \
    --to=vishalc@linux.ibm.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=jstultz@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=zezeozue@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.