Date: Fri, 10 Nov 2023 14:40:44 -0300
From: Arnaldo Carvalho de Melo
To: Maksymilian Graczyk
Cc: Hao Luo, Namhyung Kim, Jiri Olsa, Andrii Nakryiko, linux-perf-users@vger.kernel.org, syclops-project, Guilherme Amadio, Stephan Hageboeck
Subject: long BPF stack traces Re: Broken stack traces with --call-graph=fp and a multi-threaded app due to page faults?

On Fri, Nov 10, 2023 at 12:59:34PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Nov 08, 2023 at 11:46:03AM +0100, Maksymilian Graczyk wrote:
> > Alongside sampling-based profiling, I run syscall profiling with a separate
> > "perf record" instance attached to the same PID.
> >
> > When I debug the kernel using kgdb, I see more-or-less the following
> > behaviour happening in the stack traversal loop in perf_callchain_user() in
> > arch/x86/events/core.c for the same thread being profiled:
> >
> > 1. The first sample goes fine, the entire stack is traversed.
> > 2. The second sample breaks at some point inside my program, with a page
> > fault due to page not present.
> > 3. The third sample breaks at another *earlier* point inside my program,
> > with a page fault due to page not present.
> > 4. The fourth sample breaks at another *later* point inside my program, with
> > a page fault due to page not present.
 
> Namhyung, Jiri: ideas? I have to stop this analysis now, will continue later.

Would https://lore.kernel.org/all/20220225234339.2386398-7-haoluo@google.com/
come into play here? That is, would we need to use a sleepable tp_btf (see
below about __get_user(frame.next_frame...) page faulting) in
tools/perf/util/bpf_skel/off_cpu.bpf.c, which currently has these tp_btf
programs:

[acme@quaco perf-tools-next]$ grep tp_btf tools/perf/util/bpf_skel/off_cpu.bpf.c
SEC("tp_btf/task_newtask")
SEC("tp_btf/sched_switch")
[acme@quaco perf-tools-next]$

But Hao's patch (CCed here) doesn't seem to have made its way to
tools/lib/bpf/. Hao, why hasn't this made it into libbpf?

+ SEC_DEF("tp_btf.s/", TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),

- Arnaldo

> > The stack frame addresses do not change throughout profiling and all page
> > faults happen at __get_user(frame.next_frame, &fp->next_frame). The
> > behaviour above also occurs occasionally in a single-threaded variant of the
> > code (without pthread at all) with a very high sampling frequency (tens of
> > thousands of Hz).
> >
> > This issue makes profiling results unreliable for my use case, as I usually
> > profile multi-threaded applications with deep stacks with hundreds of
> > entries (hence why my test program also produces a deep stack) and use flame
> > graphs for later analysis.
> >
> > Could you help me diagnose the problem? For example, what may be the cause
> > of my page faults? I also did tests (without debugging though) without
> > syscall profiling and the "--off-cpu" flag, and broken stacks still appeared.
> >
> > (I cannot use DWARF because it makes profiling too slow and perf.data size
> > too large in my tests. I also want to avoid using
> > non-portable/vendor-specific stack unwinding solutions like LBR, as we may
> > need to run profiling on non-Intel CPUs.)