* perf record per-thread vs per-cpu behavior
From: Josh Hunt @ 2021-12-08 23:33 UTC
To: linux-perf-users
Hi
I'm seeing what I believe to be incorrect behavior when running perf
record -p, and was hoping I could get some clarification from someone
here. We have an application with around 800 threads, and when I run
perf record -p <pid> on it, the number of samples returned seems
impossible given the number of CPUs, the sampling frequency, and the
duration of the profile.
I'm running on a 5.4 LTS kernel and the machine has 40 hw threads.
Running with -p <pid> I see:
# perf record -p 7297 -s -F 49 --call-graph lbr -- sleep 5
[ perf record: Woken up 32 times to write data ]
[ perf record: Captured and wrote 11.798 MB perf.data (58430 samples) ]
I've confirmed there are actually 58k samples in the data file, but
the number of samples should be closer to 5 s * 49 Hz * 40 CPUs = 9800.
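As a sketch of one way to double-check that count independently of perf
record's summary line (run against the perf.data from the command above):
# perf report -D -i perf.data | grep -c PERF_RECORD_SAMPLE
perf report -D dumps the raw event stream, and each PERF_RECORD_SAMPLE line
corresponds to one sample, so the grep count should land very close to the
total that perf record reports.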
Running with -a I see something closer to what I would expect:
# perf record -a -s -F 49 --call-graph lbr -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.405 MB perf.data (9745 samples) ]
Running with -p on applications with fewer threads seems to produce
something closer to what I would expect since the pid being profiled
isn't on cpu 100% of the time:
# perf record -p 11629 -s -F 49 --call-graph lbr -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.279 MB perf.data (1324 samples) ]
It's very possible I'm doing something wrong or don't understand things
:) but if anyone has any insights into the above I'd appreciate your help!
I should also say that the resulting reports look different and it has
me questioning the validity of the -p data.
I'm happy to provide more info if needed, and thanks for any guidance
you can provide.
Josh
* Re: perf record per-thread vs per-cpu behavior
From: Jiri Olsa @ 2021-12-09 17:52 UTC
To: Josh Hunt; +Cc: linux-perf-users
On Wed, Dec 08, 2021 at 03:33:18PM -0800, Josh Hunt wrote:
> Hi
>
> I'm seeing what I believe to be incorrect behavior when running perf record
> -p, and was hoping I could get some clarification from someone here. We have
> an application with around 800 threads, and when I run perf record -p <pid>
> on it, the number of samples returned seems impossible given the number of
> CPUs, the sampling frequency, and the duration of the profile.
>
> I'm running on a 5.4 LTS kernel and the machine has 40 hw threads.
>
> Running with -p <pid> I see:
> # perf record -p 7297 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 32 times to write data ]
> [ perf record: Captured and wrote 11.798 MB perf.data (58430 samples) ]
>
> I've confirmed there are actually 58k samples in the data file, but the
> number of samples should be closer to 5 s * 49 Hz * 40 CPUs = 9800.
>
> Running with -a I see something closer to what I would expect:
>
> # perf record -a -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.405 MB perf.data (9745 samples) ]
>
> Running with -p on applications with fewer threads seems to produce
> something closer to what I would expect since the pid being profiled isn't
> on cpu 100% of the time:
>
> # perf record -p 11629 -s -F 49 --call-graph lbr -- sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.279 MB perf.data (1324 samples) ]
>
> It's very possible I'm doing something wrong or don't understand things :)
> but if anyone has any insights into the above I'd appreciate your help!
hi,
so with -p <pid> you also monitor all of the 800 threads,
and IIRC the way -F works is by gradually adjusting the counter's period
toward the value that hits the desired frequency,
so I think you get an initial burst of samples each time a thread gets
scheduled in (across the 800 threads), which skews the expected count, like:
# ./perf script
value:
sched-messaging 281355 413544.604702: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604707: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604709: 195 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604711: 108360 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281355 413544.604758: 60054506 cycles: ffffffff81653ced __list_del_entry_valid+0x2d ([kernel.kallsyms])
sched-messaging 281341 413544.604778: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604780: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604782: 566 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281341 413544.604784: 357999 cycles: ffffffff81c53ae7 rcu_nmi_exit+0x17 ([kernel.kallsyms])
sched-messaging 281467 413544.604823: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604825: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604827: 645 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281467 413544.604829: 484508 cycles: ffffffff81e015c6 nmi_restore+0x25 ([kernel.kallsyms])
sched-messaging 281467 413544.605025: 345122294 cycles: ffffffff81312f3a memcg_slab_post_alloc_hook+0x17a ([kernel.kallsyms])
sched-messaging 281346 413544.605262: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605266: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605268: 260 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281346 413544.605270: 124558 cycles: ffffffff8106f098 native_write_msr+0x8 ([kernel.kallsyms])
sched-messaging 281345 413544.605306: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605308: 1 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605310: 518 cycles: ffffffff8106f094 native_write_msr+0x4 ([kernel.kallsyms])
sched-messaging 281345 413544.605312: 344137 cycles: ffffffff81c53ce2 irqentry_nmi_exit+0x32 ([kernel.kallsyms])
...
check the 'value:' column (the sample period, the number right before 'cycles:').
with system-wide monitoring you don't see this issue and the counter
stabilizes quickly, because it's not being scheduled in/out all the time
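to see the ramp directly in your own data, perf script can print the period
field explicitly; this is just a sketch against the perf.data from your -p run
(the -F field selection should be available in 5.4's perf):
# perf script -i perf.data -F comm,tid,time,period | head -20
threads that were just scheduled in show tiny periods (1, 1, a few hundred, ...)
before the value settles. using a fixed period with -c <count> instead of -F
should also sidestep the adaptive ramp entirely, at the cost of choosing the
period yourself.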
jirka
>
> I should also say that the resulting reports look different and it has me
> questioning the validity of the -p data.
>
> I'm happy to provide more info if needed, and thanks for any guidance you
> can provide.
>
> Josh
>