linux-trace-kernel.vger.kernel.org archive mirror
* Copying TLS/user register data per perf-sample?
@ 2024-04-04 19:26 Beau Belgrave
  2024-04-09 23:32 ` Namhyung Kim
  2024-04-10 13:06 ` Masami Hiramatsu
  0 siblings, 2 replies; 7+ messages in thread
From: Beau Belgrave @ 2024-04-04 19:26 UTC (permalink / raw)
  To: Namhyung Kim, Masami Hiramatsu; +Cc: linux-trace-kernel, linux-kernel

Hello,

I'm looking into the possibility of capturing user data that is pointed
to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
perf_events.

I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
I think it could even use roughly the same ABI in the perf ring buffer.
Or it may be possible by some kprobe linked to the perf sample function.

This would allow a profiler to collect TLS (or other values) on x64. In
the Open Telemetry profiling SIG [1], we are trying to find a fast way
to grab a tracing association quickly on a per-thread basis. The team
at Elastic has a bespoke way to do this [2], however, I'd like to see a
more general way to achieve this. The folks I've been talking with seem
open to the idea of just having a TLS value for this we could capture
upon each sample. We could then just state, Open Telemetry SDKs should
have a TLS value for span correlation. However, we need a way to sample
the TLS value(s) when a sampling event is generated.

Is this already possible via some other means? It'd be great to be able
to do this directly at the perf_event sample via the ABI or a probe.

Thanks,
-Beau

1. https://opentelemetry.io/blog/2024/profiling/
2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation
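
For reference, the PERF_SAMPLE_STACK_USER payload in a sample record is
documented in perf_event.h as { u64 size; char data[size]; u64 dyn_size; },
with dyn_size present only when size is non-zero. A TLS capture that
reused "roughly the same ABI" could presumably be parsed the same way.
The sketch below is only an illustration of that layout; it is not the
format of any posted patch, and the names user_dump and parse_user_dump
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* View of one user-memory dump as it sits inside a sample record. */
struct user_dump {
	const uint8_t *data;	/* start of the copied bytes, NULL if none */
	uint64_t size;		/* bytes reserved for the dump in the record */
	uint64_t dyn_size;	/* bytes actually copied from user space */
};

/*
 * Parse { u64 size; char data[size]; u64 dyn_size; } starting at buf.
 * dyn_size is only present when size is non-zero. Values are read in
 * host byte order, as in the perf ring buffer.
 * Returns the number of bytes consumed, or 0 if the buffer is too short
 * or the fields are inconsistent.
 */
static size_t parse_user_dump(const uint8_t *buf, size_t len,
			      struct user_dump *out)
{
	uint64_t size, dyn;

	if (len < sizeof(size))
		return 0;
	memcpy(&size, buf, sizeof(size));

	out->data = NULL;
	out->size = size;
	out->dyn_size = 0;

	/* Nothing was dumped (for example, no user registers available). */
	if (size == 0)
		return sizeof(size);

	/* The data bytes and the trailing dyn_size must both fit. */
	if (size > len - sizeof(size) ||
	    len - sizeof(size) - size < sizeof(dyn))
		return 0;

	memcpy(&dyn, buf + sizeof(size) + size, sizeof(dyn));
	if (dyn > size)
		return 0;

	out->data = buf + sizeof(size);
	out->dyn_size = dyn;
	return sizeof(size) + (size_t)size + sizeof(dyn);
}
```

A consumer would only trust the first dyn_size bytes of data; the rest of
the reserved area is padding that was never filled from user memory.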


* Re: Copying TLS/user register data per perf-sample?
  2024-04-04 19:26 Copying TLS/user register data per perf-sample? Beau Belgrave
@ 2024-04-09 23:32 ` Namhyung Kim
  2024-04-10 15:37   ` Beau Belgrave
  2024-04-10 13:06 ` Masami Hiramatsu
  1 sibling, 1 reply; 7+ messages in thread
From: Namhyung Kim @ 2024-04-09 23:32 UTC (permalink / raw)
  To: Beau Belgrave; +Cc: Masami Hiramatsu, linux-trace-kernel, linux-kernel

Hello,

On Thu, Apr 4, 2024 at 12:26 PM Beau Belgrave <beaub@linux.microsoft.com> wrote:
>
> Hello,
>
> I'm looking into the possibility of capturing user data that is pointed
> to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> perf_events.
>
> I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> I think it could even use roughly the same ABI in the perf ring buffer.
> Or it may be possible by some kprobe linked to the perf sample function.
>
> This would allow a profiler to collect TLS (or other values) on x64. In
> the Open Telemetry profiling SIG [1], we are trying to find a fast way
> to grab a tracing association quickly on a per-thread basis. The team
> at Elastic has a bespoke way to do this [2], however, I'd like to see a
> more general way to achieve this. The folks I've been talking with seem
> open to the idea of just having a TLS value for this we could capture
> upon each sample. We could then just state, Open Telemetry SDKs should
> have a TLS value for span correlation. However, we need a way to sample
> the TLS value(s) when a sampling event is generated.
>
> Is this already possible via some other means? It'd be great to be able
> to do this directly at the perf_event sample via the ABI or a probe.

I don't think the current perf ABI allows capturing %fs/%gs + offset.
IIRC kprobes/uprobes don't have that either, but I could be wrong.

Thanks,
Namhyung

>
> 1. https://opentelemetry.io/blog/2024/profiling/
> 2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation


* Re: Copying TLS/user register data per perf-sample?
  2024-04-04 19:26 Copying TLS/user register data per perf-sample? Beau Belgrave
  2024-04-09 23:32 ` Namhyung Kim
@ 2024-04-10 13:06 ` Masami Hiramatsu
  2024-04-10 15:35   ` Beau Belgrave
  1 sibling, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2024-04-10 13:06 UTC (permalink / raw)
  To: Beau Belgrave; +Cc: Namhyung Kim, linux-trace-kernel, linux-kernel

On Thu, 4 Apr 2024 12:26:41 -0700
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> Hello,
> 
> I'm looking into the possibility of capturing user data that is pointed
> to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> perf_events.
> 
> I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> I think it could even use roughly the same ABI in the perf ring buffer.
> Or it may be possible by some kprobe linked to the perf sample function.
> 
> This would allow a profiler to collect TLS (or other values) on x64. In
> the Open Telemetry profiling SIG [1], we are trying to find a fast way
> to grab a tracing association quickly on a per-thread basis. The team
> at Elastic has a bespoke way to do this [2], however, I'd like to see a
> more general way to achieve this. The folks I've been talking with seem
> open to the idea of just having a TLS value for this we could capture
> upon each sample. We could then just state, Open Telemetry SDKs should
> have a TLS value for span correlation. However, we need a way to sample
> the TLS value(s) when a sampling event is generated.
> 
> Is this already possible via some other means? It'd be great to be able
> to do this directly at the perf_event sample via the ABI or a probe.
> 

Have you tried to use uprobes? It should be able to access user-space
registers including fs/gs.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: Copying TLS/user register data per perf-sample?
  2024-04-10 13:06 ` Masami Hiramatsu
@ 2024-04-10 15:35   ` Beau Belgrave
  2024-04-11 15:55     ` Masami Hiramatsu
  0 siblings, 1 reply; 7+ messages in thread
From: Beau Belgrave @ 2024-04-10 15:35 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Namhyung Kim, linux-trace-kernel, linux-kernel

On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> On Thu, 4 Apr 2024 12:26:41 -0700
> Beau Belgrave <beaub@linux.microsoft.com> wrote:
> 
> > Hello,
> > 
> > I'm looking into the possibility of capturing user data that is pointed
> > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > perf_events.
> > 
> > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > I think it could even use roughly the same ABI in the perf ring buffer.
> > Or it may be possible by some kprobe linked to the perf sample function.
> > 
> > This would allow a profiler to collect TLS (or other values) on x64. In
> > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > to grab a tracing association quickly on a per-thread basis. The team
> > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > more general way to achieve this. The folks I've been talking with seem
> > open to the idea of just having a TLS value for this we could capture
> > upon each sample. We could then just state, Open Telemetry SDKs should
> > have a TLS value for span correlation. However, we need a way to sample
> > the TLS value(s) when a sampling event is generated.
> > 
> > Is this already possible via some other means? It'd be great to be able
> > to do this directly at the perf_event sample via the ABI or a probe.
> > 
> 
> Have you tried to use uprobes? It should be able to access user-space
> registers including fs/gs.
> 

We need to get fs/gs during a sample interrupt from perf. If the sample
interrupt lands during kernel code (IE: syscall) we would also like to
get these TLS values when in process context.

I have some kernel patches that make this possible via perf_events, and
they work well. However, I don't want to reinvent the wheel if there is
already some way to get these via perf samples.

In OTel, we are trying to attribute samples to transactions that are
occurring. So the TLS fetch has to be aligned exactly with the sample.
You can do this via eBPF when it's available, however, we have
environments where eBPF is not available.

It's sounding like to do this properly without eBPF a new feature would
be required. If so, I do have some patches I can share in a bit as an
RFC.

Thanks,
-Beau

> Thank you,
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
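
For context, this is roughly what the existing perf_events ABI offers per
sample today: user registers (PERF_SAMPLE_REGS_USER) and a copy of the
user stack (PERF_SAMPLE_STACK_USER). The sketch below only fills in a
perf_event_attr; it assumes a CPU-clock software event, and the helper
name init_user_sample_attr and its parameters are hypothetical. Nothing
in it captures memory relative to the fs/gs base, which is the gap being
discussed.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill in a perf_event_attr for frequency-based CPU-clock sampling that
 * records user registers and a user stack copy with every sample.
 *
 * regs_mask:   arch-specific bitmask of registers to record
 * stack_bytes: bytes of user stack to copy; the kernel rejects values
 *              that are not a multiple of 8
 * freq_hz:     requested sampling frequency
 */
static void init_user_sample_attr(struct perf_event_attr *attr,
				  uint64_t regs_mask, uint32_t stack_bytes,
				  uint64_t freq_hz)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->freq = 1;			/* interpret sample_freq as Hz */
	attr->sample_freq = freq_hz;
	attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
			    PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
	attr->sample_regs_user = regs_mask;
	attr->sample_stack_user = stack_bytes;
}
```

The filled-in attr would then be passed to perf_event_open(2), typically
through syscall(SYS_perf_event_open, ...), and samples read back from the
mmap'ed ring buffer.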


* Re: Copying TLS/user register data per perf-sample?
  2024-04-09 23:32 ` Namhyung Kim
@ 2024-04-10 15:37   ` Beau Belgrave
  0 siblings, 0 replies; 7+ messages in thread
From: Beau Belgrave @ 2024-04-10 15:37 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Masami Hiramatsu, linux-trace-kernel, linux-kernel

On Tue, Apr 09, 2024 at 04:32:46PM -0700, Namhyung Kim wrote:
> Hello,
> 
> On Thu, Apr 4, 2024 at 12:26 PM Beau Belgrave <beaub@linux.microsoft.com> wrote:
> >
> > Hello,
> >
> > I'm looking into the possibility of capturing user data that is pointed
> > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > perf_events.
> >
> > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > I think it could even use roughly the same ABI in the perf ring buffer.
> > Or it may be possible by some kprobe linked to the perf sample function.
> >
> > This would allow a profiler to collect TLS (or other values) on x64. In
> > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > to grab a tracing association quickly on a per-thread basis. The team
> > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > more general way to achieve this. The folks I've been talking with seem
> > open to the idea of just having a TLS value for this we could capture
> > upon each sample. We could then just state, Open Telemetry SDKs should
> > have a TLS value for span correlation. However, we need a way to sample
> > the TLS value(s) when a sampling event is generated.
> >
> > Is this already possible via some other means? It'd be great to be able
> > to do this directly at the perf_event sample via the ABI or a probe.
> 
> I don't think the current perf ABI allows capturing %fs/%gs + offset.
> IIRC kprobes/uprobes don't have that either, but I could be wrong.
> 

Yeah, I didn't see it either. I have some patches that I will submit in
a bit as RFC that enable this functionality. I was hoping there was
already an easy way to do this.

Thanks,
-Beau

> Thanks,
> Namhyung
> 
> >
> > 1. https://opentelemetry.io/blog/2024/profiling/
> > 2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation


* Re: Copying TLS/user register data per perf-sample?
  2024-04-10 15:35   ` Beau Belgrave
@ 2024-04-11 15:55     ` Masami Hiramatsu
  2024-04-11 15:58       ` Beau Belgrave
  0 siblings, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2024-04-11 15:55 UTC (permalink / raw)
  To: Beau Belgrave; +Cc: Namhyung Kim, linux-trace-kernel, linux-kernel

On Wed, 10 Apr 2024 08:35:42 -0700
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> > On Thu, 4 Apr 2024 12:26:41 -0700
> > Beau Belgrave <beaub@linux.microsoft.com> wrote:
> > 
> > > Hello,
> > > 
> > > I'm looking into the possibility of capturing user data that is pointed
> > > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > > perf_events.
> > > 
> > > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > > I think it could even use roughly the same ABI in the perf ring buffer.
> > > Or it may be possible by some kprobe linked to the perf sample function.
> > > 
> > > This would allow a profiler to collect TLS (or other values) on x64. In
> > > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > > to grab a tracing association quickly on a per-thread basis. The team
> > > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > > more general way to achieve this. The folks I've been talking with seem
> > > open to the idea of just having a TLS value for this we could capture
> > > upon each sample. We could then just state, Open Telemetry SDKs should
> > > have a TLS value for span correlation. However, we need a way to sample
> > > the TLS value(s) when a sampling event is generated.
> > > 
> > > Is this already possible via some other means? It'd be great to be able
> > > to do this directly at the perf_event sample via the ABI or a probe.
> > > 
> > 
> > Have you tried to use uprobes? It should be able to access user-space
> > registers including fs/gs.
> > 
> 
> We need to get fs/gs during a sample interrupt from perf. If the sample
> interrupt lands during kernel code (IE: syscall) we would also like to
> get these TLS values when in process context.

OK, those are not directly accessible from pt_regs.

> 
> I have some kernel patches that make this possible via perf_events, and
> they work well. However, I don't want to reinvent the wheel if there is
> already some way to get these via perf samples.

I would like to see it. I think it is possible to introduce a helper
that gets the base address of the user TLS for probe events, starting
with x86 support.

> 
> In OTel, we are trying to attribute samples to transactions that are
> occurring. So the TLS fetch has to be aligned exactly with the sample.
> You can do this via eBPF when it's available, however, we have
> environments where eBPF is not available.
> 
> It's sounding like to do this properly without eBPF a new feature would
> be required. If so, I do have some patches I can share in a bit as an
> RFC.

It would be better to share it at the RFC stage, so that we can discuss
the overall direction.

Thank you,

> 
> Thanks,
> -Beau
> 
> > Thank you,
> > 
> > -- 
> > Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: Copying TLS/user register data per perf-sample?
  2024-04-11 15:55     ` Masami Hiramatsu
@ 2024-04-11 15:58       ` Beau Belgrave
  0 siblings, 0 replies; 7+ messages in thread
From: Beau Belgrave @ 2024-04-11 15:58 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Namhyung Kim, linux-trace-kernel, linux-kernel

On Fri, Apr 12, 2024 at 12:55:19AM +0900, Masami Hiramatsu wrote:
> On Wed, 10 Apr 2024 08:35:42 -0700
> Beau Belgrave <beaub@linux.microsoft.com> wrote:
> 
> > On Wed, Apr 10, 2024 at 10:06:28PM +0900, Masami Hiramatsu wrote:
> > > On Thu, 4 Apr 2024 12:26:41 -0700
> > > Beau Belgrave <beaub@linux.microsoft.com> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I'm looking into the possibility of capturing user data that is pointed
> > > > to by a user register (IE: fs/gs for TLS on x86/64) for each sample via
> > > > perf_events.
> > > > 
> > > > I was hoping to find a way to do this similar to PERF_SAMPLE_STACK_USER.
> > > > I think it could even use roughly the same ABI in the perf ring buffer.
> > > > Or it may be possible by some kprobe linked to the perf sample function.
> > > > 
> > > > This would allow a profiler to collect TLS (or other values) on x64. In
> > > > the Open Telemetry profiling SIG [1], we are trying to find a fast way
> > > > to grab a tracing association quickly on a per-thread basis. The team
> > > > at Elastic has a bespoke way to do this [2], however, I'd like to see a
> > > > more general way to achieve this. The folks I've been talking with seem
> > > > open to the idea of just having a TLS value for this we could capture
> > > > upon each sample. We could then just state, Open Telemetry SDKs should
> > > > have a TLS value for span correlation. However, we need a way to sample
> > > > the TLS value(s) when a sampling event is generated.
> > > > 
> > > > Is this already possible via some other means? It'd be great to be able
> > > > to do this directly at the perf_event sample via the ABI or a probe.
> > > > 
> > > 
> > > Have you tried to use uprobes? It should be able to access user-space
> > > registers including fs/gs.
> > > 
> > 
> > We need to get fs/gs during a sample interrupt from perf. If the sample
> > interrupt lands during kernel code (IE: syscall) we would also like to
> > get these TLS values when in process context.
> 
> OK, those are not directly accessible from pt_regs.
> 

Yeah, it's a per-arch thread attribute.

> > 
> > I have some kernel patches that make this possible via perf_events, and
> > they work well. However, I don't want to reinvent the wheel if there is
> > already some way to get these via perf samples.
> 
> I would like to see it. I think it is possible to introduce a helper
> that gets the base address of the user TLS for probe events, starting
> with x86 support.
> 

For sure, I'm hoping the patches start the right conversations.

> > 
> > In OTel, we are trying to attribute samples to transactions that are
> > occurring. So the TLS fetch has to be aligned exactly with the sample.
> > You can do this via eBPF when it's available, however, we have
> > environments where eBPF is not available.
> > 
> > It's sounding like to do this properly without eBPF a new feature would
> > be required. If so, I do have some patches I can share in a bit as an
> > RFC.
> 
> It would be better to share it at the RFC stage, so that we can discuss
> the overall direction.
> 

Agree, it could be that having the ability to run a probe on sample may
be a better option. Not sure.

Thanks,
-Beau

> Thank you,
> 
> > 
> > Thanks,
> > -Beau
> > 
> > > Thank you,
> > > 
> > > -- 
> > > Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
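
The exchange above notes that the fs/gs base is a per-arch thread
attribute rather than something found in pt_regs. From user space on
x86-64 Linux, the FS base of the calling thread can be read with
arch_prctl(ARCH_GET_FS), and an SDK could use it to work out where one of
its TLS slots sits relative to that base. The sketch below is a minimal
illustration under those assumptions; example_span_id, read_fs_base and
tls_slot_offset are hypothetical names, and on other platforms the
helpers simply report failure.

```c
#include <stdint.h>

#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>		/* ARCH_GET_FS */
#include <sys/syscall.h>	/* SYS_arch_prctl */
#include <unistd.h>		/* syscall() */
#endif

/* Hypothetical per-thread slot an SDK might publish for span correlation. */
static __thread uint64_t example_span_id;

/*
 * Store the calling thread's FS base in *base.
 * Returns 0 on success, -1 if unsupported on this platform or on error.
 */
static int read_fs_base(uint64_t *base)
{
#if defined(__linux__) && defined(__x86_64__)
	unsigned long value = 0;

	if (syscall(SYS_arch_prctl, ARCH_GET_FS, &value) != 0)
		return -1;
	*base = value;
	return 0;
#else
	(void)base;
	return -1;
#endif
}

/*
 * Compute the signed distance from the FS base to the TLS slot for the
 * calling thread. Returns 0 on success, -1 on failure.
 */
static int tls_slot_offset(int64_t *offset)
{
	uint64_t base;

	if (read_fs_base(&base) != 0)
		return -1;
	*offset = (int64_t)((uintptr_t)&example_span_id - (uintptr_t)base);
	return 0;
}
```

For variables in the static TLS block the offset is expected to be the
same on every thread, so a profiler that knew it could read the slot at
FS base plus offset for whichever thread was sampled. TLS belonging to
dynamically loaded modules may not have a fixed offset, so this should be
treated as an assumption to verify rather than a guarantee.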
