From: Namhyung Kim <namhyung@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Indu Bhagat <indu.bhagat@oracle.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-perf-users@vger.kernel.org, Mark Brown <broonie@kernel.org>,
	linux-toolchains@vger.kernel.org, Jordan Rome <jordalgo@meta.com>,
	Sam James <sam@gentoo.org>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Jens Remus <jremus@linux.ibm.com>,
	Florian Weimer <fweimer@redhat.com>,
	Andy Lutomirski <luto@kernel.org>, Weinan Liu <wnliu@google.com>,
	Blake Jones <blakejones@google.com>,
	Beau Belgrave <beaub@linux.microsoft.com>,
	"Jose E. Marchesi" <jemarch@gnu.org>
Subject: Re: [PATCH v5 13/17] perf: Support deferred user callchains
Date: Thu, 8 May 2025 11:44:42 -0700
Message-ID: <aBz7mvEQwtlgNUjI@google.com>
In-Reply-To: <20250508120321.20677bc6@gandalf.local.home>

Hi Steve,

On Thu, May 08, 2025 at 12:03:21PM -0400, Steven Rostedt wrote:
> On Thu, 24 Apr 2025 12:25:42 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +static void perf_event_callchain_deferred(struct callback_head *work)
> > +{
> > +	struct perf_event *event = container_of(work, struct perf_event, pending_unwind_work);
> > +	struct perf_callchain_deferred_event deferred_event;
> > +	u64 callchain_context = PERF_CONTEXT_USER;
> > +	struct unwind_stacktrace trace;
> > +	struct perf_output_handle handle;
> > +	struct perf_sample_data data;
> > +	u64 nr;
> > +
> > +	if (!event->pending_unwind_callback)
> > +		return;
> > +
> > +	if (unwind_deferred_trace(&trace) < 0)
> > +		goto out;
> > +
> > +	/*
> > +	 * All accesses to the event must belong to the same implicit RCU
> > +	 * read-side critical section as the ->pending_unwind_callback reset.
> > +	 * See comment in perf_pending_unwind_sync().
> > +	 */
> > +	guard(rcu)();
> > +
> > +	if (!current->mm)
> > +		goto out;
> > +
> > +	nr = trace.nr + 1 ; /* '+1' == callchain_context */
> 
> Hi Namhyung,
> 
> Talking with Beau about how Microsoft does their own deferred tracing, I
> wonder if the timestamp approach would be useful.
> 
> This is where a timestamp is taken at the first request for a deferred
> trace, and this is recorded in the trace when it happens. It basically
> states that "this trace is good up until the given timestamp".
> 
> The rationale for this is for lost events. Let's say you have:
> 
>   <task enters kernel>
>     Request deferred trace
> 
>     <buffer fills up and events start to get lost>
> 
>     Deferred trace happens (but is dropped due to buffer being full)
> 
>   <task exits kernel>
> 
>   <task enters kernel again>
>     Request deferred trace  (Still dropped due to buffer being full)
> 
>     <Reader catches up and buffer is free again>
> 
>     Deferred trace happens (this time it is recorded)
>   <task exits kernel>
> 
> How would user space know that the deferred trace that was recorded doesn't
> go with the request (and kernel stack trace) that was done initially?

Right, this is a problem.

> 
> If we add a timestamp, then it would look like:
> 
>   <task enters kernel>
>     Request deferred trace
>     [Record timestamp]
> 
>     <buffer fills up and events start to get lost>
> 
>     Deferred trace happens with timestamp (but is dropped due to buffer being full)
> 
>   <task exits kernel>
> 
>   <task enters kernel again>
>     Request deferred trace  (Still dropped due to buffer being full)
>     [Record timestamp]
> 
>     <Reader catches up and buffer is free again>
> 
>     Deferred trace happens with timestamp (this time it is recorded)
>   <task exits kernel>
> 
> Then user space will look at the timestamp that was recorded and know that
> it's not for the initial request because the timestamp of the kernel stack
> trace done was before the timestamp of the user space stacktrace and
> therefore is not valid for the kernel stacktrace.

IIUC the deferred stacktrace will have the timestamp of the first
request, right?
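
Assuming that reading is right, the tool-side check could be as simple as
the sketch below.  The struct and function names are made up for
illustration (they are not perf tools code), and it assumes both records
carry PERF_SAMPLE_TID and PERF_SAMPLE_TIME:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tool-side view of a sample still waiting for its user callchain. */
struct pending_sample {
	uint32_t tid;
	uint64_t time;			/* when the kernel stack trace was taken */
};

/* Hypothetical view of a deferred callchain record. */
struct deferred_callchain {
	uint32_t tid;
	uint64_t first_request_time;	/* first request since kernel entry */
};

/*
 * A deferred callchain only covers samples of the same task that were
 * taken at or after the first request it records.  An older sample
 * belongs to an earlier kernel entry whose deferred trace was lost.
 */
static bool deferred_covers_sample(const struct deferred_callchain *d,
				   const struct pending_sample *s)
{
	return d->tid == s->tid && s->time >= d->first_request_time;
}
```

With the lost-event sequence above, the kernel stack trace from the first
entry has an earlier time than the deferred trace that finally gets
recorded, so the check rejects that pairing.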

> 
> The timestamp would become zero when exiting to user space. The first
> request will add it but would need a cmpxchg to do so, and if the cmpxchg
> fails, it then needs to check if the one recorded is before the current
> one, and if it isn't it still needs to update the timestamp (this is to
> handle races with NMIs).

Yep, it needs to maintain an accurate first timestamp.
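
Something like this userspace sketch, with C11 atomics standing in for the
kernel's cmpxchg.  The slot and helper names are hypothetical, and 0 is
assumed to mean "no request since the last exit to user space":

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Record 'now' as the first-request timestamp unless an earlier one is
 * already stored.  Returns the timestamp that ends up in the slot as
 * seen by this caller.
 */
static uint64_t record_first_request(_Atomic uint64_t *slot, uint64_t now)
{
	uint64_t old = 0;

	assert(now != 0);	/* 0 is reserved as the "unset" value */

	/* Fast path: the first request since kernel entry installs its time. */
	if (atomic_compare_exchange_strong(slot, &old, now))
		return now;

	/*
	 * The cmpxchg failed, so 'old' holds what is recorded.  If that is
	 * later than ours (e.g. an interrupting context stored its time
	 * after we read the clock), we still have to install ours.
	 */
	while (old == 0 || old > now) {
		if (atomic_compare_exchange_weak(slot, &old, now))
			return now;
	}

	return old;
}

/* Hypothetical hook for the exit-to-user path: forget the timestamp. */
static void reset_on_exit_to_user(_Atomic uint64_t *slot)
{
	atomic_store(slot, 0);
}
```

The retry loop is what keeps the earliest timestamp in the slot when two
contexts race to be the first requester.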

> 
> Basically, the timestamp would replace the cookie method.
> 
> Thoughts?

Sounds good to me.  You'll need to add it to the
PERF_RECORD_CALLCHAIN_DEFERRED record.  It should probably check whether
sample_type has PERF_SAMPLE_TIME.  It'd work along with PERF_SAMPLE_TID
(which will be added by the perf tools anyway).
 
Thanks,
Namhyung

> 
> > +
> > +	deferred_event.header.type = PERF_RECORD_CALLCHAIN_DEFERRED;
> > +	deferred_event.header.misc = PERF_RECORD_MISC_USER;
> > +	deferred_event.header.size = sizeof(deferred_event) + (nr * sizeof(u64));
> > +
> > +	deferred_event.nr = nr;
> > +
> > +	perf_event_header__init_id(&deferred_event.header, &data, event);
> > +
> > +	if (perf_output_begin(&handle, &data, event, deferred_event.header.size))
> > +		goto out;
> > +
> > +	perf_output_put(&handle, deferred_event);
> > +	perf_output_put(&handle, callchain_context);
> > +	perf_output_copy(&handle, trace.entries, trace.nr * sizeof(u64));
> > +	perf_event__output_id_sample(event, &handle, &data);
> > +
> > +	perf_output_end(&handle);
> > +
> > +out:
> > +	event->pending_unwind_callback = 0;
> > +	local_dec(&event->ctx->nr_no_switch_fast);
> > +	rcuwait_wake_up(&event->pending_unwind_wait);
> > +}
> > +
