From: Jiri Olsa <jolsa@redhat.com>
To: Josh Hunt <johunt@akamai.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: perf record per-thread vs per-cpu behavior
Date: Thu, 9 Dec 2021 18:52:58 +0100
Message-ID: <YbJCeiV6iTitxYEb@krava>
In-Reply-To: <40cd791f-c9fd-bf2f-2119-7225460a5e2c@akamai.com>
On Wed, Dec 08, 2021 at 03:33:18PM -0800, Josh Hunt wrote:
> Hi
>
> I'm seeing what I believe to be incorrect behavior when running perf record
> -p and was hoping I could get some clarification from someone here. We have
> an application with around 800 threads, and when I run perf record -p
> <pid> on it, the # of samples returned seems impossible given the # of
> cpus, the sampling freq, and the duration of the profile.
>
> I'm running on a 5.4 LTS kernel and the machine has 40 hw threads.
>
> Running with -p <pid> I see:
> # perf record -p 7297 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 32 times to write data ]
> [ perf record: Captured and wrote 11.798 MB perf.data (58430 samples) ]
>
> and I've confirmed there are actually 58k samples in the data file, but the
> # of samples should be closer to 5 * 49 * 40 = 9800.
>
> Running with -a I see something closer to what I would expect:
>
> # perf record -a -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.405 MB perf.data (9745 samples) ]
>
> Running with -p on applications with fewer threads seems to produce
> something closer to what I would expect, since the pid being profiled isn't
> on cpu 100% of the time:
>
> # perf record -p 11629 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.279 MB perf.data (1324 samples) ]
>
> It's very possible I'm doing something wrong or don't understand things :)
> but if anyone has any insights into the above I'd appreciate your help!
hi,
so with -p <pid> you also monitor all of the ~800 threads,
and IIRC the way -F works is by gradually adjusting the counter's period
toward the value that produces the requested frequency,
so I think you get an initial burst of samples whenever a thread gets
scheduled in (multiplied across 800 threads), which inflates the expected
count, like:
# ./perf script
value:
sched-messaging 281355 413544.604702: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604707: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604709: 195 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604711: 108360 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604758: 60054506 cycles: ffffffff81653ced __list_del_entry_valid+0x2d ([kernel.kallsyms])
sched-messaging 281341 413544.604778: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604780: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604782: 566 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604784: 357999 cycles: ffffffff81c53ae7 rcu_nmi_exit+0x17 ([kernel.kallsyms])
sched-messaging 281467 413544.604823: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604825: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604827: 645 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604829: 484508 cycles: ffffffff81e015c6 nmi_restore+0x25 ([kernel.kallsyms])
sched-messaging 281467 413544.605025: 345122294 cycles: ffffffff81312f3a memcg_slab_post_alloc_hook+0x17a ([kernel.kallsyms])
sched-messaging 281346 413544.605262: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605266: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605268: 260 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605270: 124558 cycles: ffffffff8106f098 native_write_msr+0x8 ([kernel.kallsyms])
sched-messaging 281345 413544.605306: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605308: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605310: 518 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605312: 344137 cycles: ffffffff81c53ce2 irqentry_nmi_exit+0x32 ([kernel.kallsyms])
...
check the 'value:' column (the sample period)
with system-wide monitoring you don't see this issue and the counters
stabilize quickly, because they're not being scheduled in/out all the time
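for illustration, here's a rough user-space sketch of that ramp-up; it's
not the kernel code (the real logic is perf_calculate_period() and
perf_adjust_period() in kernel/events/core.c), and the 3 GHz clock and
2 us interrupt overhead below are made-up numbers, but the /8 low-pass
step mirrors what perf_adjust_period() does:

/*
 * Simplified sketch of how a frequency-mode perf counter ("-F 49")
 * ramps its sample period up from 1.  Assumed numbers, not kernel code.
 */
#include <stdio.h>

int main(void)
{
	const double cpu_hz      = 3.0e9;  /* assumed core clock          */
	const double target_freq = 49.0;   /* perf record -F 49           */
	const double pmi_cost_s  = 2e-6;   /* assumed per-sample overhead */
	double period = 1.0;               /* freq events start at 1      */
	double total_s = 0.0;
	int i;

	for (i = 1; i <= 10; i++) {
		/* time for the counter to reach the current period */
		double elapsed_s = period / cpu_hz + pmi_cost_s;
		/* period that would give target_freq at the observed rate */
		double ideal = (period / elapsed_s) / target_freq;

		total_s += elapsed_s;
		printf("sample %2d  t=%.6fs  period=%12.0f cycles\n",
		       i, total_s, period);

		/* low-pass step toward the ideal period, as in
		 * perf_adjust_period(): move 1/8 of the way each time */
		period += (ideal - period) / 8.0;
	}
	return 0;
}

the first handful of samples land within microseconds of each other
before the period converges on ~49Hz; multiply that warm-up by ~800
per-thread counters and it can roughly account for the jump from the
expected ~9800 samples to the ~58k you observed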
jirka
>
> I should also say that the resulting reports look different and it has me
> questioning the validity of the -p data.
>
> I'm happy to provide more info if needed, and thanks for any guidance you
> can provide.
>
> Josh
>