All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@redhat.com>
To: Josh Hunt <johunt@akamai.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: perf record per-thread vs per-cpu behavior
Date: Thu, 9 Dec 2021 18:52:58 +0100	[thread overview]
Message-ID: <YbJCeiV6iTitxYEb@krava> (raw)
In-Reply-To: <40cd791f-c9fd-bf2f-2119-7225460a5e2c@akamai.com>

On Wed, Dec 08, 2021 at 03:33:18PM -0800, Josh Hunt wrote:
> Hi
> 
> I'm seeing what I believe to be incorrect behavior when running perf record
> -p and was hoping I could get some clarification from someone here. We have
> an application which has around 800 threads and when I run perf record -p
> <pid> on that the # of samples returned seems impossible given the # of
> cpus, sampling freq, and duration of profile.
> 
> I'm running on a 5.4 LTS kernel and the machine has 40 hw threads.
> 
> Running with -p <pid> I see:
> # perf record -p 7297 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 32 times to write data ]
> [ perf record: Captured and wrote 11.798 MB perf.data (58430 samples) ]
> 
> and I've confirmed there are actually 58k samples in the data file, but the
> # of samples should be closer to 5 * 49 * 40 = 9800.
> 
> Running with -a I see something closer to what I would expect:
> 
> # perf record -a -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.405 MB perf.data (9745 samples) ]
> 
> Running with -p on applications with fewer threads seems to produce
> something closer to what I would expect since the pid being profiled isn't
> on cpu 100% of the time:
> 
> # perf record -p 11629 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.279 MB perf.data (1324 samples) ]
> 
> It's very possible I'm doing something wrong or don't understand things :)
> but if anyone has any insights into the above I'd appreciate your help!

hi,
so with -p x you monitor also all the 800 threads

and IIRC the way -F works is by gradually moving counter's period into
proper value, to reach the desired frequency

so I think you get initial burst of samples when thread gets scheduled
(from 800 threads), which screws the expected count, like:

# ./perf script
                                           value:
 sched-messaging 281355 413544.604702:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281355 413544.604707:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281355 413544.604709:        195 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281355 413544.604711:     108360 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281355 413544.604758:   60054506 cycles:  ffffffff81653ced __list_del_entry_valid+0x2d ([kernel.kallsyms])
 sched-messaging 281341 413544.604778:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281341 413544.604780:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281341 413544.604782:        566 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281341 413544.604784:     357999 cycles:  ffffffff81c53ae7 rcu_nmi_exit+0x17 ([kernel.kallsyms])
 sched-messaging 281467 413544.604823:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281467 413544.604825:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281467 413544.604827:        645 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281467 413544.604829:     484508 cycles:  ffffffff81e015c6 nmi_restore+0x25 ([kernel.kallsyms])
 sched-messaging 281467 413544.605025:  345122294 cycles:  ffffffff81312f3a memcg_slab_post_alloc_hook+0x17a ([kernel.kallsyms])
 sched-messaging 281346 413544.605262:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281346 413544.605266:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281346 413544.605268:        260 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281346 413544.605270:     124558 cycles:  ffffffff8106f098 native_write_msr+0x8 ([kernel.kallsyms])
 sched-messaging 281345 413544.605306:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281345 413544.605308:          1 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281345 413544.605310:        518 cycles:  ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
 sched-messaging 281345 413544.605312:     344137 cycles:  ffffffff81c53ce2 irqentry_nmi_exit+0x32 ([kernel.kallsyms])
 ...

check the 'value:' column

with system wide monitoring you don't see this issue and the counter
stabilize fast, because it's not scheduled in/out all the time

jirka

> 
> I should also say that the resulting reports look different and it has me
> questioning the validity of the -p data.
> 
> I'm happy to provide more info if needed, and thanks for any guidance you
> can provide.
> 
> Josh
> 


      reply	other threads:[~2021-12-09 17:53 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 23:33 perf record per-thread vs per-cpu behavior Josh Hunt
2021-12-09 17:52 ` Jiri Olsa [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbJCeiV6iTitxYEb@krava \
    --to=jolsa@redhat.com \
    --cc=johunt@akamai.com \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.