From: Namhyung Kim <namhyung@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
Masami Hiramatsu <mhiramat@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Andrew Morton <akpm@linux-foundation.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Indu Bhagat <indu.bhagat@oracle.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org, Mark Brown <broonie@kernel.org>,
linux-toolchains@vger.kernel.org, Jordan Rome <jordalgo@meta.com>,
Sam James <sam@gentoo.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Jens Remus <jremus@linux.ibm.com>,
Florian Weimer <fweimer@redhat.com>,
Andy Lutomirski <luto@kernel.org>, Weinan Liu <wnliu@google.com>,
Blake Jones <blakejones@google.com>,
Beau Belgrave <beaub@linux.microsoft.com>,
"Jose E. Marchesi" <jemarch@gnu.org>
Subject: Re: [PATCH v5 13/17] perf: Support deferred user callchains
Date: Thu, 8 May 2025 11:44:42 -0700
Message-ID: <aBz7mvEQwtlgNUjI@google.com>
In-Reply-To: <20250508120321.20677bc6@gandalf.local.home>
Hi Steve,
On Thu, May 08, 2025 at 12:03:21PM -0400, Steven Rostedt wrote:
> On Thu, 24 Apr 2025 12:25:42 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > +static void perf_event_callchain_deferred(struct callback_head *work)
> > +{
> > + struct perf_event *event = container_of(work, struct perf_event, pending_unwind_work);
> > + struct perf_callchain_deferred_event deferred_event;
> > + u64 callchain_context = PERF_CONTEXT_USER;
> > + struct unwind_stacktrace trace;
> > + struct perf_output_handle handle;
> > + struct perf_sample_data data;
> > + u64 nr;
> > +
> > + if (!event->pending_unwind_callback)
> > + return;
> > +
> > + if (unwind_deferred_trace(&trace) < 0)
> > + goto out;
> > +
> > + /*
> > + * All accesses to the event must belong to the same implicit RCU
> > + * read-side critical section as the ->pending_unwind_callback reset.
> > + * See comment in perf_pending_unwind_sync().
> > + */
> > + guard(rcu)();
> > +
> > + if (!current->mm)
> > + goto out;
> > +
> > + nr = trace.nr + 1 ; /* '+1' == callchain_context */
>
> Hi Namhyung,
>
> Talking with Beau about how Microsoft does their own deferred tracing, I
> wonder if the timestamp approach would be useful.
>
> This is where a timestamp is taken at the first request for a deferred
> trace, and this is recorded in the trace when it happens. It basically
> states that "this trace is good up until the given timestamp".
>
> The rationale for this is for lost events. Let's say you have:
>
> <task enters kernel>
> Request deferred trace
>
> <buffer fills up and events start to get lost>
>
> Deferred trace happens (but is dropped due to buffer being full)
>
> <task exits kernel>
>
> <task enters kernel again>
> Request deferred trace (Still dropped due to buffer being full)
>
> <Reader catches up and buffer is free again>
>
> Deferred trace happens (this time it is recorded)
> <task exits kernel>
>
> How would user space know that the deferred trace that was recorded doesn't
> go with the request (and kernel stack trace) that was done initially?
Right, this is a problem.
>
> If we add a timestamp, then it would look like:
>
> <task enters kernel>
> Request deferred trace
> [Record timestamp]
>
> <buffer fills up and events start to get lost>
>
> Deferred trace happens with timestamp (but is dropped due to buffer being full)
>
> <task exits kernel>
>
> <task enters kernel again>
> Request deferred trace (Still dropped due to buffer being full)
> [Record timestamp]
>
> <Reader catches up and buffer is free again>
>
> Deferred trace happens with timestamp (this time it is recorded)
> <task exits kernel>
>
> Then user space will look at the timestamp that was recorded and know that
> the user space stacktrace is not for the initial request: the initial
> kernel stack trace was taken before the timestamp recorded with the user
> space stacktrace, so the two do not belong together.
IIUC the deferred stacktrace will have the timestamp of the first
request, right?
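If so, the matching on the tool side could be as simple as the sketch below. This is only an illustration of my understanding: the struct and function names are made up (they are not existing perf structures), and it assumes the deferred record carries the first-request timestamp while each sample carries a TID and a time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tool-side views of the two records; not real perf structs. */
struct kernel_sample {
	uint32_t tid;
	uint64_t time;		/* when the kernel stack was sampled */
};

struct deferred_callchain {
	uint32_t tid;
	uint64_t first_request;	/* timestamp of the first deferred request */
};

/*
 * A deferred user callchain can only complete kernel samples of the same
 * thread taken at or after the first request it was recorded for.  An
 * older sample belongs to a request whose deferred trace was lost.
 */
static bool deferred_matches_sample(const struct deferred_callchain *d,
				    const struct kernel_sample *s)
{
	return d->tid == s->tid && s->time >= d->first_request;
}
```

A real tool would presumably also bound the match from above by the time the deferred record itself was emitted; that part is left out here.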
>
> The timestamp would become zero when exiting to user space. The first
> request will add it but would need a cmpxchg to do so, and if the cmpxchg
> fails, it then needs to check if the one recorded is before the current
> one, and if it isn't it still needs to update the timestamp (this is to
> handle races with NMIs).
Yep, it needs to maintain an accurate first timestamp.
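Something like the following userspace sketch is how I read the cmpxchg part. It uses C11 atomics as a stand-in for the kernel's cmpxchg(), and the slot type and function names are hypothetical, so take it as an illustration of the logic rather than the actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 0 means "no deferred request pending"; hypothetical per-task slot. */
typedef _Atomic uint64_t unwind_ts_t;

/* Called on exit to user space: forget the previous request. */
static void unwind_ts_reset(unwind_ts_t *slot)
{
	atomic_store(slot, 0);
}

/*
 * Record 'now' as the first-request timestamp unless an earlier one is
 * already stored.  Returns the timestamp that identifies this request.
 */
static uint64_t unwind_ts_record(unwind_ts_t *slot, uint64_t now)
{
	uint64_t old = 0;

	assert(now != 0);	/* 0 is reserved for "unset" */

	for (;;) {
		/* On failure, 'old' is updated to the value currently stored. */
		if (atomic_compare_exchange_strong(slot, &old, now))
			return now;
		/* An earlier (or equal) timestamp is already recorded: keep it. */
		if (old != 0 && old <= now)
			return old;
		/*
		 * The stored value is later than ours (e.g. an NMI got in
		 * between reading the clock and the cmpxchg), or the slot
		 * was reset meanwhile: retry against the value we just saw.
		 */
	}
}
```

The retry path is the NMI case from above: the interrupted context read its clock first but lost the cmpxchg, so it has to replace the later value with its own earlier one.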
>
> Basically, the timestamp would replace the cookie method.
>
> Thoughts?
Sounds good to me. You'll need to add it to the
PERF_RECORD_CALLCHAIN_DEFERRED record. It should probably check whether
sample_type has PERF_SAMPLE_TIME. It would work together with
PERF_SAMPLE_TID (which the perf tools will add anyway).
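To illustrate the sample_type check, here is a small standalone sketch. Everything prefixed with sketch_/SKETCH_ is hypothetical: the two bit values mirror PERF_SAMPLE_TID and PERF_SAMPLE_TIME from the uapi header, but the record layout and the idea of appending one u64 for the timestamp are only my assumption, not what the patch does today.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the perf_event_attr.sample_type bits. */
#define SKETCH_SAMPLE_TID	(1U << 1)
#define SKETCH_SAMPLE_TIME	(1U << 2)

/* Hypothetical fixed part of the record: header fields plus entry count. */
struct sketch_deferred_record {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
	uint64_t nr;		/* number of u64 callchain entries that follow */
};

/* Only carry the timestamp when samples have a time to compare against. */
static int sketch_has_timestamp(uint64_t sample_type)
{
	return (sample_type & SKETCH_SAMPLE_TIME) != 0;
}

/* Size of the fixed part, the entries, and the optional timestamp. */
static size_t sketch_record_size(uint64_t sample_type, uint64_t nr)
{
	size_t size = sizeof(struct sketch_deferred_record) +
		      nr * sizeof(uint64_t);

	if (sketch_has_timestamp(sample_type))
		size += sizeof(uint64_t);
	return size;
}
```

Without PERF_SAMPLE_TIME the tool has nothing to compare the timestamp with, so skipping it in that case keeps the record small.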
Thanks,
Namhyung
>
> > +
> > + deferred_event.header.type = PERF_RECORD_CALLCHAIN_DEFERRED;
> > + deferred_event.header.misc = PERF_RECORD_MISC_USER;
> > + deferred_event.header.size = sizeof(deferred_event) + (nr * sizeof(u64));
> > +
> > + deferred_event.nr = nr;
> > +
> > + perf_event_header__init_id(&deferred_event.header, &data, event);
> > +
> > + if (perf_output_begin(&handle, &data, event, deferred_event.header.size))
> > + goto out;
> > +
> > + perf_output_put(&handle, deferred_event);
> > + perf_output_put(&handle, callchain_context);
> > + perf_output_copy(&handle, trace.entries, trace.nr * sizeof(u64));
> > + perf_event__output_id_sample(event, &handle, &data);
> > +
> > + perf_output_end(&handle);
> > +
> > +out:
> > + event->pending_unwind_callback = 0;
> > + local_dec(&event->ctx->nr_no_switch_fast);
> > + rcuwait_wake_up(&event->pending_unwind_wait);
> > +}
> > +
Thread overview: 44+ messages
2025-04-24 16:25 [PATCH v5 00/17] perf: Deferred unwinding of user space stack traces Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 01/17] unwind_user: Add user space unwinding API Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 02/17] unwind_user: Add frame pointer support Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 03/17] unwind_user/x86: Enable frame pointer unwinding on x86 Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 04/17] perf/x86: Rename and move get_segment_base() and make it global Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 05/17] unwind_user: Add compat mode frame pointer support Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 06/17] unwind_user/x86: Enable compat mode frame pointer unwinding on x86 Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 07/17] unwind_user/deferred: Add unwind_deferred_trace() Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 08/17] unwind_user/deferred: Add unwind cache Steven Rostedt
2025-04-24 19:00 ` Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 09/17] perf: Remove get_perf_callchain() init_nr argument Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 10/17] perf: Have get_perf_callchain() return NULL if crosstask and user are set Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 11/17] perf: Simplify get_perf_callchain() user logic Steven Rostedt
2025-04-24 16:36 ` Peter Zijlstra
2025-04-24 17:28 ` Steven Rostedt
2025-04-24 17:42 ` Mathieu Desnoyers
2025-04-24 17:47 ` Steven Rostedt
2025-04-25 7:13 ` Peter Zijlstra
2025-04-24 16:25 ` [PATCH v5 12/17] perf: Skip user unwind if !current->mm Steven Rostedt
2025-04-24 16:37 ` Peter Zijlstra
2025-04-24 17:01 ` Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 13/17] perf: Support deferred user callchains Steven Rostedt
2025-04-24 16:38 ` Peter Zijlstra
2025-04-24 17:16 ` Steven Rostedt
2025-04-25 15:24 ` Namhyung Kim
2025-04-25 16:58 ` Steven Rostedt
2025-04-28 20:42 ` Namhyung Kim
2025-04-28 22:02 ` Steven Rostedt
2025-04-29 0:29 ` Namhyung Kim
2025-04-29 14:00 ` Steven Rostedt
2025-05-08 16:03 ` Steven Rostedt
2025-05-08 18:44 ` Namhyung Kim [this message]
2025-05-08 18:49 ` Mathieu Desnoyers
2025-05-08 18:54 ` Steven Rostedt
2025-05-09 12:23 ` Mathieu Desnoyers
2025-05-09 15:45 ` Namhyung Kim
2025-05-09 15:55 ` Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 14/17] perf tools: Minimal CALLCHAIN_DEFERRED support Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 15/17] perf record: Enable defer_callchain for user callchains Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 16/17] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Steven Rostedt
2025-04-24 16:25 ` [PATCH v5 17/17] perf tools: Merge deferred user callchains Steven Rostedt
2025-04-24 17:04 ` [PATCH v5 00/17] perf: Deferred unwinding of user space stack traces Steven Rostedt
2025-04-24 18:32 ` Miguel Ojeda
2025-04-24 18:41 ` Steven Rostedt