linux-perf-users.vger.kernel.org archive mirror
* Broken stack traces with --call-graph=fp and a multi-threaded app due to page faults?
@ 2023-11-08 10:46 Maksymilian Graczyk
  2023-11-10 10:45 ` Maksymilian Graczyk
  2023-11-10 15:59 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 7+ messages in thread
From: Maksymilian Graczyk @ 2023-11-08 10:46 UTC (permalink / raw)
  To: linux-perf-users; +Cc: syclops-project, Guilherme Amadio, Stephan Hageboeck

Hello all,

I have a problem with broken stack traces in perf when I profile a small 
multi-threaded program in C producing deep (~1000 entries) stacks with 
"perf record --call-graph=fp -e task-clock -F <any number> --off-cpu" 
attached to the program's PID. The callchains seem to stop at "random" 
points throughout my application, occasionally managing to reach the 
bottom (i.e. either "start" or "clone3").

This is the machine configuration I use:

* Intel Xeon Silver 4216 @ 2.10 GHz, with two 16-core CPUs and 
hyper-threading disabled
* 180 GB RAM, with no errors detected by Memtest86+ and no swap
* Gentoo with Linux 6.5.8 (installed from the Gentoo repo with 
gentoo-sources and compiled using this config: 
https://gist.github.com/maksgraczyk/1bce96841a5b2cb2a92f725635c04bf2)
* perf 6.6 with this quick patch of mine: 
https://gist.github.com/maksgraczyk/ee1dd98dda79129a35f7fd3acffb35fd
* Everything is compiled with "-fno-omit-frame-pointer" and 
"-mno-omit-leaf-frame-pointer" gcc flags
* kernel.perf_event_max_stack is set to 1024 and 
kernel.perf_event_paranoid is set to -1
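
For reproducibility, those two sysctl settings can be applied like this 
(a sketch; the command form is mine, the values are the ones listed 
above):

```shell
# Allow callchains of up to 1024 entries (the default is 127),
# enough for the ~1000-deep stacks of the test program
sysctl -w kernel.perf_event_max_stack=1024
# Allow unprivileged users unrestricted access to perf events
sysctl -w kernel.perf_event_paranoid=-1
```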

The code of the profiled program is at 
https://gist.github.com/maksgraczyk/da2bc6d0be9d4e7d88f8bea45221a542 
(the higher the value of NUM_THREADS, the more likely the issue is to 
occur; you may need to compile the code without compiler optimisations).

Alongside sampling-based profiling, I run syscall profiling with a 
separate "perf record" instance attached to the same PID.

When I debug the kernel using kgdb, I see roughly the following 
behaviour in the stack traversal loop in perf_callchain_user() 
(arch/x86/events/core.c) for the same thread being profiled:

1. The first sample goes fine: the entire stack is traversed.
2. The second sample breaks at some point inside my program, with a page 
fault due to page not present.
3. The third sample breaks at another *earlier* point inside my program, 
with a page fault due to page not present.
4. The fourth sample breaks at another *later* point inside my program, 
with a page fault due to page not present.

The stack frame addresses do not change throughout profiling and all 
page faults happen at __get_user(frame.next_frame, &fp->next_frame). The 
behaviour above also occasionally occurs in a single-threaded variant 
of the code (without pthread at all) at a very high sampling frequency 
(tens of thousands of Hz).

This issue makes profiling results unreliable for my use case: I 
usually profile multi-threaded applications with deep stacks of 
hundreds of entries (which is why my test program also produces a deep 
stack) and use flame graphs for later analysis.

Could you help me diagnose the problem? For example, what may be 
causing the page faults? I also ran tests (without kernel debugging) 
with syscall profiling and the "--off-cpu" flag disabled; broken stacks 
still appeared.

(I cannot use DWARF because it makes profiling too slow and perf.data 
size too large in my tests. I also want to avoid using 
non-portable/vendor-specific stack unwinding solutions like LBR, as we 
may need to run profiling on non-Intel CPUs.)

Best regards,
Maks Graczyk



Thread overview: 7+ messages
2023-11-08 10:46 Broken stack traces with --call-graph=fp and a multi-threaded app due to page faults? Maksymilian Graczyk
2023-11-10 10:45 ` Maksymilian Graczyk
2023-11-10 10:51   ` Maksymilian Graczyk
2023-11-10 15:59 ` Arnaldo Carvalho de Melo
2023-11-10 17:40   ` long BPF stack traces " Arnaldo Carvalho de Melo
2023-11-10 23:01     ` Namhyung Kim
2023-11-11 13:37       ` Maksymilian Graczyk
