From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Steven Rostedt <rostedt@goodmis.org>,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Josh Poimboeuf <jpoimboe@kernel.org>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Indu Bhagat <indu.bhagat@oracle.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org, Mark Brown <broonie@kernel.org>,
linux-toolchains@vger.kernel.org, Jordan Rome <jordalgo@meta.com>,
Sam James <sam@gentoo.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Jens Remus <jremus@linux.ibm.com>,
Florian Weimer <fweimer@redhat.com>,
Andy Lutomirski <luto@kernel.org>, Weinan Liu <wnliu@google.com>,
Blake Jones <blakejones@google.com>,
Beau Belgrave <beaub@linux.microsoft.com>,
"Jose E. Marchesi" <jemarch@gnu.org>
Subject: Re: [PATCH v5 13/17] perf: Support deferred user callchains
Date: Thu, 8 May 2025 14:49:59 -0400
Message-ID: <89c62296-fbe4-4d9d-a2ec-19c4ca0c14b2@efficios.com>
In-Reply-To: <20250508120321.20677bc6@gandalf.local.home>
On 2025-05-08 12:03, Steven Rostedt wrote:
> On Thu, 24 Apr 2025 12:25:42 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> +static void perf_event_callchain_deferred(struct callback_head *work)
>> +{
>> +	struct perf_event *event = container_of(work, struct perf_event, pending_unwind_work);
>> +	struct perf_callchain_deferred_event deferred_event;
>> +	u64 callchain_context = PERF_CONTEXT_USER;
>> +	struct unwind_stacktrace trace;
>> +	struct perf_output_handle handle;
>> +	struct perf_sample_data data;
>> +	u64 nr;
>> +
>> +	if (!event->pending_unwind_callback)
>> +		return;
>> +
>> +	if (unwind_deferred_trace(&trace) < 0)
>> +		goto out;
>> +
>> +	/*
>> +	 * All accesses to the event must belong to the same implicit RCU
>> +	 * read-side critical section as the ->pending_unwind_callback reset.
>> +	 * See comment in perf_pending_unwind_sync().
>> +	 */
>> +	guard(rcu)();
>> +
>> +	if (!current->mm)
>> +		goto out;
>> +
>> +	nr = trace.nr + 1;	/* '+1' == callchain_context */
>
> Hi Namhyung,
>
> Talking with Beau about how Microsoft does their own deferred tracing, I
> wonder if the timestamp approach would be useful.
>
> This is where a timestamp is taken at the first request for a deferred
> trace, and this is recorded in the trace when it happens. It basically
> states that "this trace is good up until the given timestamp".
>
> The rationale for this is for lost events. Let's say you have:
>
> <task enters kernel>
> Request deferred trace
>
> <buffer fills up and events start to get lost>
>
> Deferred trace happens (but is dropped due to buffer being full)
>
> <task exits kernel>
>
> <task enters kernel again>
> Request deferred trace (Still dropped due to buffer being full)
>
> <Reader catches up and buffer is free again>
>
> Deferred trace happens (this time it is recorded)
> <task exits kernel>
>
> How would user space know that the deferred trace that was recorded doesn't
> go with the request (and kernel stack trace) that was done initially?
>
> If we add a timestamp, then it would look like:
>
> <task enters kernel>
> Request deferred trace
> [Record timestamp]
>
> <buffer fills up and events start to get lost>
>
> Deferred trace happens with timestamp (but is dropped due to buffer being full)
>
> <task exits kernel>
>
> <task enters kernel again>
> Request deferred trace (Still dropped due to buffer being full)
> [Record timestamp]
>
> <Reader catches up and buffer is free again>
>
> Deferred trace happens with timestamp (this time it is recorded)
> <task exits kernel>
>
> Then user space will look at the timestamp that was recorded and know that
> it's not for the initial request because the timestamp of the kernel stack
> trace done was before the timestamp of the user space stacktrace and
> therefore is not valid for the kernel stacktrace.
>
> The timestamp would become zero when exiting to user space. The first
> request will add it but would need a cmpxchg to do so, and if the cmpxchg
> fails, it then needs to check if the one recorded is before the current
> one, and if it isn't it still needs to update the timestamp (this is to
> handle races with NMIs).
>
> Basically, the timestamp would replace the cookie method.
>
> Thoughts?
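For reference, a minimal C11 sketch of the timestamp scheme as described above, modeled in user space with <stdatomic.h> rather than kernel primitives. The names (struct task_defer_state, request_deferred_trace(), exit_to_user(), deferred_trace_matches()) are hypothetical; it assumes 0 is reserved to mean "no request pending" and that the tool pairs each kernel sample with the next deferred user trace of the same task.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state: clock value of the first deferred-trace
 * request since the task entered the kernel; 0 means "none pending". */
struct task_defer_state {
	_Atomic uint64_t first_request_ts;
};

/*
 * Record a deferred-trace request made at time 'now' and return the
 * timestamp the deferred user trace will carry. The earliest value wins:
 * if an NMI read the clock after us but stored its (later) value first,
 * we find that later value and replace it with our earlier one.
 */
static uint64_t request_deferred_trace(struct task_defer_state *st, uint64_t now)
{
	uint64_t old = atomic_load(&st->first_request_ts);

	assert(now != 0);	/* 0 is reserved for "no request pending" */

	while (old == 0 || old > now) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&st->first_request_ts, &old, now))
			return now;
	}
	return old;		/* an earlier request already set it */
}

/* On return to user space, after the deferred trace has been emitted. */
static void exit_to_user(struct task_defer_state *st)
{
	atomic_store(&st->first_request_ts, 0);
}

/*
 * Consumer side: a deferred user trace carrying 'first_ts' is valid for a
 * kernel sample taken at 'sample_ts' only if the sample was taken at or
 * after the first request of the same kernel entry. Samples from an
 * earlier entry whose deferred trace was lost fail this check.
 */
static bool deferred_trace_matches(uint64_t sample_ts, uint64_t first_ts)
{
	return sample_ts >= first_ts;
}
```

Keeping the earliest value is what makes the NMI race benign: a request that read the clock earlier always ends up owning the stored timestamp, so every kernel sample taken during that kernel entry still passes the consumer-side check.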
AFAIR, the cookie method generates the cookie by combining the CPU
number with a per-CPU count.
This ensures that two CPUs can never emit the same cookie value by
accident, even when they emit cookies at the same time.
How would the timestamp method prevent this?
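To make the question concrete, a minimal sketch of such a cookie. The 12-bit CPU / 52-bit counter split, the names, and the plain array standing in for a per-CPU variable are assumptions for illustration, not the series' code. Two CPUs whose counters happen to be equal still produce different cookies, whereas two CPUs reading the same clock value could produce identical timestamps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: low CPU_BITS bits = CPU number, high bits = counter. */
#define CPU_BITS	12
#define COUNT_BITS	(64 - CPU_BITS)
#define MAX_CPUS	(1u << CPU_BITS)

static_assert(CPU_BITS + COUNT_BITS == 64, "cookie must use all 64 bits");

/*
 * Stand-in for a per-CPU variable. In the kernel the increment would have
 * to be safe against interrupts/NMIs on the same CPU; this model is
 * single-threaded.
 */
static uint64_t cookie_count[MAX_CPUS];

/*
 * Build the cookie for a request issued on 'cpu'. The CPU number is part
 * of the value, so two CPUs never produce the same cookie even when their
 * counters are equal. The counter wraps after 2^COUNT_BITS requests.
 */
static uint64_t next_cookie(unsigned int cpu)
{
	uint64_t cnt;

	assert(cpu < MAX_CPUS);
	cnt = ++cookie_count[cpu];

	return (cnt << CPU_BITS) | cpu;
}
```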
Thanks,
Mathieu
>
> -- Steve
>
>
>> +
>> +	deferred_event.header.type = PERF_RECORD_CALLCHAIN_DEFERRED;
>> +	deferred_event.header.misc = PERF_RECORD_MISC_USER;
>> +	deferred_event.header.size = sizeof(deferred_event) + (nr * sizeof(u64));
>> +
>> +	deferred_event.nr = nr;
>> +
>> +	perf_event_header__init_id(&deferred_event.header, &data, event);
>> +
>> +	if (perf_output_begin(&handle, &data, event, deferred_event.header.size))
>> +		goto out;
>> +
>> +	perf_output_put(&handle, deferred_event);
>> +	perf_output_put(&handle, callchain_context);
>> +	perf_output_copy(&handle, trace.entries, trace.nr * sizeof(u64));
>> +	perf_event__output_id_sample(event, &handle, &data);
>> +
>> +	perf_output_end(&handle);
>> +
>> +out:
>> +	event->pending_unwind_callback = 0;
>> +	local_dec(&event->ctx->nr_no_switch_fast);
>> +	rcuwait_wake_up(&event->pending_unwind_wait);
>> +}
>> +
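The size arithmetic in the quoted function can be checked from the consumer side. A minimal sketch, assuming the fixed part of the record is a struct perf_event_header followed by a u64 nr (mirrored here by the hypothetical struct deferred_callchain_hdr), with the first entry holding the PERF_CONTEXT_USER marker. The trailing sample_id fields, which perf_event_header__init_id() adds to header.size when sample_id_all is set, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value as defined in <linux/perf_event.h>. */
#define PERF_CONTEXT_USER	((uint64_t)-512)

/* Hypothetical mirror of struct perf_callchain_deferred_event:
 * a struct perf_event_header followed by the entry count. */
struct deferred_callchain_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
	uint64_t nr;	/* user frames + 1 for the context marker */
};

/* header.size before the sample_id fields are added, for 'frames' user frames. */
static size_t deferred_record_size(size_t frames)
{
	size_t nr = frames + 1;		/* '+1' == PERF_CONTEXT_USER marker */

	return sizeof(struct deferred_callchain_hdr) + nr * sizeof(uint64_t);
}

/*
 * Return the user frames that follow the header, skipping the leading
 * PERF_CONTEXT_USER marker; *frames receives their count (0 if the
 * record is malformed).
 */
static const uint64_t *user_frames(const struct deferred_callchain_hdr *rec,
				   const uint64_t *ips, size_t *frames)
{
	if (rec->nr == 0 || ips[0] != PERF_CONTEXT_USER) {
		*frames = 0;
		return NULL;
	}
	*frames = rec->nr - 1;
	return ips + 1;
}
```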
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com