linux-perf-users.vger.kernel.org archive mirror
From: Ian Rogers <irogers@google.com>
To: Beau Belgrave <beaub@linux.microsoft.com>
Cc: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	adrian.hunter@intel.com, primiano@google.com,
	aahringo@redhat.com, dcook@linux.microsoft.com
Subject: Re: [RFC PATCH 0/4] perf: Correlating user process data to samples
Date: Thu, 11 Apr 2024 21:52:22 -0700
Message-ID: <CAP-5=fVVQ5RGqEQo596to_3BYZ6vNFC_DR1nnunH_-Bb6bdpVg@mail.gmail.com>
In-Reply-To: <20240412001732.475-1-beaub@linux.microsoft.com>

On Thu, Apr 11, 2024 at 5:17 PM Beau Belgrave <beaub@linux.microsoft.com> wrote:
>
> In the Open Telemetry profiling SIG [1], we are trying to find a way to
> grab a tracing association quickly on a per-sample basis. The team at
> Elastic has a bespoke way to do this [2]; however, I'd like to see a
> more general way to achieve this. The folks I've been talking with seem
> open to the idea of just having a TLS value for this we could capture

Presumably TLS == Thread Local Storage.

> upon each sample. We could then just state, Open Telemetry SDKs should
> have a TLS value for span correlation. However, we need a way to sample
> the TLS or other value(s) when a sampling event is generated. This is
> supported today on Windows via EventActivityIdControl() [3]. Since
> Open Telemetry works on both Windows and Linux, ideally we can do
> something as efficient for Linux based workloads.
>
> This series explores how best to collect supporting data from a user
> process when a profile sample is collected. Having a value stored in
> TLS makes a lot of sense for this; however, there are other ways to
> explore. Whatever is chosen, kernel samples taken in process context
> should be able to get this supporting data. In these patches, on
> x86-64 the fsbase and gsbase are used for this.
>
> An option to explore suggested by Mathieu Desnoyers is to utilize rseq
> for processes to register a value location that can be included when
> profiling if desired. This would allow a tighter contract between user
> processes and a profiler.  It would allow better labeling/categorizing
> the correlation values.

It is hard to understand this idea. Are you saying stash a cookie in
TLS for samples to capture to indicate an activity? Restartable
sequences are about preemption on a CPU, not of a thread, so at least
my intuition is that they feel different. You could stash information
like this today by changing the thread name, which generates comm
events. I've wondered about having similar information in some form of
stack slot reserved for profiling, for example, to stash a pointer to
the name of a function being interpreted. Snapshotting the whole stack
is bad for both performance and security. A stack slot would be able
to deal with nesting.
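
To make the thread-name option concrete, here is a minimal sketch,
assuming Linux and glibc. prctl(PR_SET_NAME) is the real interface; the
helper name set_activity_comm() and the "act-<id>" naming scheme are
made up purely for illustration.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: encode an activity id in the calling thread's
 * name. The kernel limits the name to 16 bytes including the NUL.
 * "act-" plus a 32-bit decimal value is at most 14 characters, so it
 * always fits.
 */
static int set_activity_comm(unsigned int activity)
{
	char name[16];

	snprintf(name, sizeof(name), "act-%u", activity);
	return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}
```

Note that this costs a syscall on every activity change, so it does not
meet the "not a syscall" goal stated in the cover letter.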

> An ideal flow would look like this:
> User Task               Profile
> do_work();              sample() -> IP + No activity
> ...
> set_activity(123);
> ...
> do_work();              sample() -> IP + activity (123)
> ...
> set_activity(124);
> ...
> do_work();              sample() -> IP + activity (124)
>
> Ideally, the set_activity() method would not be a syscall. It needs to
> be very cheap as this should not bottleneck work. Ideally this is just
> a memcpy of 16-20 bytes as it is on Windows via EventActivityIdControl()
> using EVENT_ACTIVITY_CTRL_SET_ID.
>
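
For illustration, the user-space side of such a flow could look roughly
like the sketch below. This is only an assumption about the shape: the
16-byte id size, the struct layout, and every name in it
(activity_slot, set_activity, clear_activity, sample_activity) are
hypothetical and not taken from the patches. sample_activity() merely
stands in for whatever a profiler would read out at sample time.

```c
#include <stdint.h>
#include <string.h>

#define ACTIVITY_ID_LEN 16	/* assumed id size; the series does not fix one */

/* Hypothetical per-thread slot a profiler could be told to capture. */
struct activity_slot {
	uint8_t id[ACTIVITY_ID_LEN];
	uint8_t valid;		/* 0 means "no activity" */
};

static _Thread_local struct activity_slot current_activity;

/* No syscall: just a small copy into thread-local storage. */
static inline void set_activity(const uint8_t id[ACTIVITY_ID_LEN])
{
	memcpy(current_activity.id, id, ACTIVITY_ID_LEN);
	current_activity.valid = 1;
}

static inline void clear_activity(void)
{
	current_activity.valid = 0;
}

/* Stand-in for the sampler: returns 1 and copies the id if one is set. */
static inline int sample_activity(uint8_t out[ACTIVITY_ID_LEN])
{
	if (!current_activity.valid)
		return 0;
	memcpy(out, current_activity.id, ACTIVITY_ID_LEN);
	return 1;
}
```

A sample that interrupts set_activity() between the copy and the flag
update could observe a torn value; a real design would need to decide
how to handle that.
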
> For those not aware, Open Telemetry allows collecting data from multiple
> machines and showing where time was spent. The tracing context is already
> available for logs, but not for profiling samples. The idea is to show
> where slowdowns occur and have profile samples to explain why they
> occurred. This must be possible without having to track context
> switches to do this correlation, because profiling rates are typically
> 20 Hz to 1 kHz, while context switching rates are much higher. We do
> not want to have to consume high context switch rates just to know a
> correlation for a 20 Hz signal. In some environments these 20 Hz
> signals are always enabled.
>
> Regardless of whether TLS, rseq, or another source is used, I believe
> we will need a way for perf_events to include it within a sample. The
> changes in this series show how it could be done with TLS. There is some
> factoring work under perf to make it easier to add more dump types using
> the existing ABI. This is mostly to make the patches clearer; the
> refactoring parts could certainly be dropped in favor of
> duplicated/specialized paths.

fs and gs may be used for more than just the C runtime's TLS. For
example, they may be used by emulators or managed runtimes. I'm not
clear why this specific case couldn't be handled through BPF.

Thanks,
Ian

> 1. https://opentelemetry.io/blog/2024/profiling/
> 2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation
> 3. https://learn.microsoft.com/en-us/windows/win32/api/evntprov/nf-evntprov-eventactivityidcontrol
>
> Beau Belgrave (4):
>   perf/core: Introduce perf_prepare_dump_data()
>   perf: Introduce PERF_SAMPLE_TLS_USER sample type
>   perf/core: Factor perf_output_sample_udump()
>   perf/x86/core: Add tls dump support
>
>  arch/Kconfig                      |   7 ++
>  arch/x86/Kconfig                  |   1 +
>  arch/x86/events/core.c            |  14 +++
>  arch/x86/include/asm/perf_event.h |   5 +
>  include/linux/perf_event.h        |   7 ++
>  include/uapi/linux/perf_event.h   |   5 +-
>  kernel/events/core.c              | 166 +++++++++++++++++++++++-------
>  kernel/events/internal.h          |  16 +++
>  8 files changed, 180 insertions(+), 41 deletions(-)
>
>
> base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
> --
> 2.34.1
>
