From: Jiri Olsa <jolsa@redhat.com>
To: Josh Hunt <johunt@akamai.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: perf record per-thread vs per-cpu behavior
Date: Thu, 9 Dec 2021 18:52:58 +0100
Message-ID: <YbJCeiV6iTitxYEb@krava>
In-Reply-To: <40cd791f-c9fd-bf2f-2119-7225460a5e2c@akamai.com>
On Wed, Dec 08, 2021 at 03:33:18PM -0800, Josh Hunt wrote:
> Hi
>
> I'm seeing what I believe to be incorrect behavior when running perf record
> -p and was hoping I could get some clarification from someone here. We have
> an application with around 800 threads, and when I run perf record -p
> <pid> on it, the # of samples returned seems impossible given the # of
> cpus, the sampling freq, and the duration of the profile.
>
> I'm running on a 5.4 LTS kernel and the machine has 40 hw threads.
>
> Running with -p <pid> I see:
> # perf record -p 7297 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 32 times to write data ]
> [ perf record: Captured and wrote 11.798 MB perf.data (58430 samples) ]
>
> and I've confirmed there are actually 58k samples in the data file, but the
> # of samples should be closer to 5 * 49 * 40 = 9800.
>
> Running with -a I see something closer to what I would expect:
>
> # perf record -a -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.405 MB perf.data (9745 samples) ]
>
> Running with -p on applications with fewer threads seems to produce
> something closer to what I would expect, since the pid being profiled isn't
> on cpu 100% of the time:
>
> # perf record -p 11629 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.279 MB perf.data (1324 samples) ]
>
> It's very possible I'm doing something wrong or don't understand things :)
> but if anyone has any insights into the above I'd appreciate your help!
hi,
so with -p <pid> you also monitor all of the ~800 threads,
and IIRC the way -F works is by gradually adjusting the counter's period
toward the value that produces the requested frequency,
so I think you get an initial burst of samples whenever a thread gets
scheduled in (multiplied across 800 threads), which inflates the expected
count, like:
# ./perf script
value:
sched-messaging 281355 413544.604702: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604707: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604709: 195 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604711: 108360 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604758: 60054506 cycles: ffffffff81653ced __list_del_entry_valid+0x2d ([kernel.kallsyms])
sched-messaging 281341 413544.604778: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604780: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604782: 566 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604784: 357999 cycles: ffffffff81c53ae7 rcu_nmi_exit+0x17 ([kernel.kallsyms])
sched-messaging 281467 413544.604823: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604825: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604827: 645 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604829: 484508 cycles: ffffffff81e015c6 nmi_restore+0x25 ([kernel.kallsyms])
sched-messaging 281467 413544.605025: 345122294 cycles: ffffffff81312f3a memcg_slab_post_alloc_hook+0x17a ([kernel.kallsyms])
sched-messaging 281346 413544.605262: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605266: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605268: 260 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605270: 124558 cycles: ffffffff8106f098 native_write_msr+0x8 ([kernel.kallsyms])
sched-messaging 281345 413544.605306: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605308: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605310: 518 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605312: 344137 cycles: ffffffff81c53ce2 irqentry_nmi_exit+0x32 ([kernel.kallsyms])
...
check the 'value:' column (the sample period)
with system-wide monitoring you don't see this issue and the counters
stabilize quickly, because they're not being scheduled in/out all the time
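for illustration, here's a rough user-space sketch of that ramp-up; it's
not the kernel code (the real logic is perf_calculate_period() and
perf_adjust_period() in kernel/events/core.c), and the 3 GHz clock and
2 us interrupt overhead below are made-up numbers, but the /8 low-pass
step mirrors what perf_adjust_period() does:

/*
 * Simplified sketch of how a frequency-mode perf counter ("-F 49")
 * ramps its sample period up from 1.  Assumed numbers, not kernel code.
 */
#include <stdio.h>

int main(void)
{
	const double cpu_hz      = 3.0e9;  /* assumed core clock          */
	const double target_freq = 49.0;   /* perf record -F 49           */
	const double pmi_cost_s  = 2e-6;   /* assumed per-sample overhead */
	double period = 1.0;               /* freq events start at 1      */
	double total_s = 0.0;
	int i;

	for (i = 1; i <= 10; i++) {
		/* time for the counter to reach the current period */
		double elapsed_s = period / cpu_hz + pmi_cost_s;
		/* period that would give target_freq at the observed rate */
		double ideal = (period / elapsed_s) / target_freq;

		total_s += elapsed_s;
		printf("sample %2d  t=%.6fs  period=%12.0f cycles\n",
		       i, total_s, period);

		/* low-pass step toward the ideal period, as in
		 * perf_adjust_period(): move 1/8 of the way each time */
		period += (ideal - period) / 8.0;
	}
	return 0;
}

the first handful of samples land within microseconds of each other
before the period converges on ~49Hz; multiply that warm-up by ~800
per-thread counters and it can roughly account for the jump from the
expected ~9800 samples to the ~58k you observed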
jirka
>
> I should also say that the resulting reports look different and it has me
> questioning the validity of the -p data.
>
> I'm happy to provide more info if needed, and thanks for any guidance you
> can provide.
>
> Josh
>