[PATCH 0/2] bpf: context casting for tail call and gtrace prog type

netdev.vger.kernel.org archive mirror

* [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
@ 2019-02-25 15:54 Kris Van Hees
  2019-02-26  6:18 ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-02-25 15:54 UTC (permalink / raw)
  To: netdev

The patches in this set are part of an effort to provide support for
tracing tools beyond attaching programs to probes and events and working
with the context data they provide.  It is also aimed at avoiding adding
new helpers for every piece of task information that tracers may want to
include in trace data (as was discussed at the Linux Plumbers Conference
BPF mini-conference track last year).

One of the main characteristics of tracers is that a variety of
information can be collected at the time of a probe firing.  When using
BPF programs to implement actions to be taken when a probe fires, the most
natural source of a large part of this information (task information,
probe data, tracer state) is the context that is associated with a BPF
program.  It is also possible to obtain (most of) this information by
means of full access helper calls like probe_read() but that isn't
something you want to make available to an unprivileged user.

So we have two areas where BPF programs can be very useful and powerful:

- BPF programs that are attached to a probe or event, operating on the
  context provided by the probe or event
- BPF programs that implement actions to be taken within the context of a
  tracing tool

The Linux kernel provides a wealth of probes and events to which we can
attach BPF programs.  These event sources do not have any knowledge of the
tracing tool that might be using them.  But being able to use them from any
tracing tool is definitely preferable over implementing your own probes and
events.  We definitely also do not want to 'teach' all the existing probes
and events about any possible BPF program type that would like to get called
from those probes and events.

So, to illustrate what we're trying to accomplish, consider a kprobe.  We
can attach a BPF program to it and it will be called with a 'struct pt_regs'
context.  From the side of our tracing tool, we also want information about
the task that triggered the kprobe to fire (beyond what is currently
available through helpers) and we want to be able to access that information
from a BPF program that implements what should happen when the probe fires
(e.g. recording the event and specific data that we are interested in).

The 2nd patch in this set implements a very basic generic tracer program
type (BPF_PROG_TYPE_GTRACE) that provides the pt_regs data and select task
data in its context.  We cannot attach a program of this type to a kprobe
because that probe supports BPF_PROG_TYPE_KPROBE instead.

The 1st patch in this set implements a mechanism to solve this issue: it
allows a tail-call from one program type to another if the callee type
supports conversion of a caller context into a context for the callee.

So, in the sample, the BPF_PROG_TYPE_GTRACE provides can_cast() and
cast_context() functions that support converting a BPF_PROG_TYPE_KPROBE
context into a BPF_PROG_TYPE_GTRACE context.
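
As a rough illustration of that pair of functions, here is a userspace model
(not the code from the patch).  Every type and function name in it is made up
for the sketch, and the task fields are only examples of what a gtrace
context could carry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's program types. */
enum prog_type_model {
	MODEL_PROG_KPROBE,
	MODEL_PROG_SOCKET_FILTER,
	MODEL_PROG_GTRACE,
};

/* Simplified register set standing in for 'struct pt_regs'. */
struct regs_model {
	unsigned long ip, sp, arg0;
};

/* Simplified task data; a kernel implementation would use 'current'. */
struct task_model {
	int pid, ppid;
	unsigned int euid;
};

/* Hypothetical gtrace context: probe registers plus task data. */
struct gtrace_ctx_model {
	struct regs_model regs;
	int pid;
	int ppid;
	unsigned int euid;
};

/* Callee-side check: can a context of type 'from' be converted? */
static bool gtrace_can_cast(enum prog_type_model from)
{
	return from == MODEL_PROG_KPROBE;
}

/*
 * Build a gtrace context from the caller's context.
 * Returns 0 on success, -1 if the source type is not supported.
 */
static int gtrace_cast_context(enum prog_type_model from, const void *src_ctx,
			       const struct task_model *task,
			       struct gtrace_ctx_model *dst)
{
	assert(src_ctx && task && dst);

	if (!gtrace_can_cast(from))
		return -1;

	/* Copy in the register data from the kprobe-style context. */
	memcpy(&dst->regs, src_ctx, sizeof(dst->regs));
	/* Populate the task information. */
	dst->pid = task->pid;
	dst->ppid = task->ppid;
	dst->euid = task->euid;
	return 0;
}
```

The point of the split is that can_cast() can be consulted when the caller is
verified, while cast_context() only runs when the tail-call is executed.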

The work flow a tracer can use is:

 1. The tracer creates a program array map, and inserts one or more programs
    of type BPF_PROG_TYPE_GTRACE.  These programs implement whatever actions
    are to be taken when a specific probe fires.  This step must be done first
    so that the program array is initialized with the correct program type.
    This type needs to be known so that when the calling program is verified,
    compatibility checking can be performed.
 2. The tracer loads a program of type BPF_PROG_TYPE_KPROBE and attaches it
    to the kprobe we're interested in.  This program contains a tail-call to
    a BPF_PROG_TYPE_GTRACE program in the program array.
 3. The kprobe fires and executes our program (of type BPF_PROG_TYPE_KPROBE).
   3.1 The program performs whatever operations that we need to have done
       at the level of the probe firing.
   3.2 The program performs a tail-call into a program from our program array.
     3.2.1 The execution of the tail-call instruction causes a call to be
           made to a cast_context() function provided by BPF_PROG_TYPE_GTRACE.
           This function creates a context structure, and populates it with
           task information and copies in the pt_regs data from the context
           that was passed to the BPF_PROG_TYPE_KPROBE program.
     3.2.2 The new context is assigned to R1 (replacing the original context),
           and execution is transferred to the called program.
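
The steps above can be modelled in plain userspace C.  This is only a sketch
of the control flow under simplifying assumptions: programs are function
pointers, the program array fixes its type on first insert, and all names
(prog_array_insert, tail_call_model, ...) are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum prog_type_model { MODEL_PROG_KPROBE, MODEL_PROG_GTRACE };

struct kprobe_ctx_model { unsigned long arg0; };
struct gtrace_ctx_model { unsigned long arg0; int pid; };

struct prog_model {
	enum prog_type_model type;
	long (*run)(void *ctx);
};

#define MAX_ENTRIES 4
struct prog_array_model {
	bool typed;			/* set by the first insert */
	enum prog_type_model type;	/* program type held by the array */
	const struct prog_model *progs[MAX_ENTRIES];
};

static const int model_current_pid = 1234;	/* stand-in for task info */

/* Step 1: the first inserted program determines the array's type. */
static int prog_array_insert(struct prog_array_model *arr, unsigned int idx,
			     const struct prog_model *p)
{
	if (idx >= MAX_ENTRIES)
		return -1;
	if (!arr->typed) {
		arr->type = p->type;
		arr->typed = true;
	} else if (arr->type != p->type) {
		return -1;
	}
	arr->progs[idx] = p;
	return 0;
}

/* Step 3.2.1: build the callee context from the caller context. */
static int cast_context_model(enum prog_type_model from,
			      enum prog_type_model to, const void *src,
			      struct gtrace_ctx_model *dst)
{
	if (from != MODEL_PROG_KPROBE || to != MODEL_PROG_GTRACE)
		return -1;
	dst->arg0 = ((const struct kprobe_ctx_model *)src)->arg0;
	dst->pid = model_current_pid;
	return 0;
}

/*
 * Step 3.2: returns 0 if control was transferred (the callee's result is
 * stored in *ret), or -1 if the tail-call failed and the caller continues.
 */
static int tail_call_model(enum prog_type_model caller_type, void *ctx,
			   const struct prog_array_model *arr,
			   unsigned int idx, long *ret)
{
	struct gtrace_ctx_model new_ctx;
	const struct prog_model *callee;

	assert(arr && ret);
	if (idx >= MAX_ENTRIES || !arr->progs[idx])
		return -1;
	callee = arr->progs[idx];

	if (arr->type != caller_type) {
		if (cast_context_model(caller_type, arr->type, ctx, &new_ctx))
			return -1;
		ctx = &new_ctx;	/* step 3.2.2: replaces the original context */
	}
	*ret = callee->run(ctx);
	return 0;
}

/* Example action: combines probe data with task data. */
static long record_action(void *ctx)
{
	const struct gtrace_ctx_model *g = ctx;

	return (long)g->arg0 + g->pid;
}
```

Note that a real tail-call never returns to the caller on success; the model
expresses that by handing back the callee's result as the final result.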

The implementation is done in such a way that existing tail-calls will work
without any change aside from the fact that the verifier is inserting an
instruction right before the tail-call.  That instruction simply loads the
BPF program type into R4.  This ensures that at the time of the tail-call,
the program type of the calling program can be passed to the cast_context()
function.  Knowledge about the program type of an executing program is not
available anywhere and we need to know what context we're trying to convert
from.  The function prototype for the (pseudo-)helper bpf_tail_call declares
only 3 arguments so existing code is not affected by this internal use of R4.
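
To make the rewrite concrete, here is a toy model of it.  The instruction
encoding below is invented for the example and has nothing to do with the
real BPF instruction format; a real verifier pass would also have to fix up
jump offsets, which this model sidesteps by having no jumps:

```c
#include <assert.h>
#include <stddef.h>

/* Invented mini instruction set, only for this illustration. */
enum op_model { OP_MOV_IMM, OP_TAIL_CALL, OP_EXIT };

struct insn_model {
	enum op_model op;
	int dst_reg;	/* register written by OP_MOV_IMM */
	int imm;	/* immediate loaded by OP_MOV_IMM */
};

/*
 * Copy 'in' to 'out', inserting "mov r4, prog_type" right before every
 * tail-call instruction.  Returns the number of instructions written, or
 * -1 if 'out' (capacity 'cap') is too small.
 */
static int patch_tail_calls(const struct insn_model *in, size_t n,
			    struct insn_model *out, size_t cap,
			    int prog_type)
{
	size_t i, j = 0;

	assert(in && out);
	for (i = 0; i < n; i++) {
		if (in[i].op == OP_TAIL_CALL) {
			if (j >= cap)
				return -1;
			/* Hidden 4th argument: the caller's program type. */
			out[j++] = (struct insn_model){ OP_MOV_IMM, 4,
							prog_type };
		}
		if (j >= cap)
			return -1;
		out[j++] = in[i];
	}
	return (int)j;
}
```

Programs without a tail-call come out of this pass unchanged, which matches
the claim that existing code is not affected.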

Obviously, if there is no conversion function or the conversion is not
supported, the tail-call will fail because that situation is effectively the
same as trying to call a program of an incompatible type.

The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
support what tracers commonly need, and I am also looking at ways to further
extend this model to allow more tracer-specific features as well without the
need for adding a BPF program type for every tracer.

	Kris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-02-25 15:54 [PATCH 0/2] bpf: context casting for tail call and gtrace prog type Kris Van Hees
@ 2019-02-26  6:18 ` Alexei Starovoitov
  2019-02-26  6:46   ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2019-02-26  6:18 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: netdev, bpf, daniel

On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote:
> 
> The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
> support what tracers commonly need, and I am also looking at ways to further
> extend this model to allow more tracer-specific features as well without the
> need for adding a BPF program type for every tracer.

It seems by themselves the patches don't provide any new functionality,
but instead look like plumbing to call external code.
This is no-go.
There were several attempts to do so in the past, so we documented it here:
Documentation/bpf/bpf_design_QA.rst
Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?

A: NO.

The answer is still the same.



* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-02-26  6:18 ` Alexei Starovoitov
@ 2019-02-26  6:46   ` Kris Van Hees
  2019-03-05 18:59     ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-02-26  6:46 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel

On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote:
> On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote:
> > 
> > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
> > support what tracers commonly need, and I am also looking at ways to further
> > extend this model to allow more tracer-specific features as well without the
> > need for adding a BPF program type for every tracer.
> 
> It seems by themselves the patches don't provide any new functionality,
> but instead look like plumbing to call external code.

The patches are definitely not plumbing to call external code, and if I gave
that impression I apologise.  I overlooked the information you quote below on
allowing new functionality through modules when I wrote the comment above but
please note that it was a forward-looking comment in terms of what could be
done - not a reason for the patches that I submitted.

The patches accomplish something that is totally independent from that: they
make it possible for existing events that execute BPF programs when triggered
to transfer control to a BPF program with a more rich context.  The first
patch makes such a transfer possible (using tail-call combined with converting
the context to the new program type), and the second patch provides one such
program type (generic trace).  The new functionality provided by the program
type is direct access to task information that previously could only be
obtained through helper calls.  E.g. the new program type allows programs to
access the task state, prio, ppid, euid, and egid.  None of those pieces of
information can currently be obtained unless you start poking around in
memory using bpf_probe_read() helper calls.

> This is no-go.
> There were several attempts to do so in the past, so we documented it here:
> Documentation/bpf/bpf_design_QA.rst
> Q: New functionality via kernel modules?
> ----------------------------------------
> Q: Can BPF functionality such as new program or map types, new
> helpers, etc be added out of kernel module code?
> 
> A: NO.
> 
> The answer is still the same.

Thanks for pointing this out - but again, my reference to modules was merely
musing about the possibilities.  This information clearly closes the door on
that train of thought, but that is not directly related to what I am doing
with the patches I submitted.

	Kris


* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-02-26  6:46   ` Kris Van Hees
@ 2019-03-05 18:59     ` Alexei Starovoitov
  2019-03-06  2:03       ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2019-03-05 18:59 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: netdev, bpf, daniel

On Tue, Feb 26, 2019 at 01:46:01AM -0500, Kris Van Hees wrote:
> On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote:
> > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote:
> > > 
> > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
> > > support what tracers commonly need, and I am also looking at ways to further
> > > extend this model to allow more tracer-specific features as well without the
> > > need for adding a BPF program type for every tracer.
> > 
> > It seems by themselves the patches don't provide any new functionality,
> > but instead look like plumbing to call external code.
> 
> The patches are definitely not plumbing to call external code, and if I gave
> that impression I apologise.  I overlooked the information you quote below on
> allowing new functionality through modules when I wrote the comment above but
> please note that it was a forward-looking comment in terms of what could be
> done - not a reason for the patches that I submitted.
> 
> The patches accomplish something that is totally independent from that: they
> make it possible for existing events that execute BPF programs when triggered
> to transfer control to a BPF program with a more rich context.  The first
> patch makes such a transfer possible (using tail-call combined with converting
> the context to the new program type), and the second patch provides one such
> program type (generic trace).  The new functionality provided by the program
> type is direct access to task information that previously could only be
> obtained through helper calls.  E.g. the new program type allows programs to
> access the task state, prio, ppid, euid, and egid.  None of those pieces of
> information can currently be obtained unless you start poking around in
> memory using bpf_probe_read() helper calls.

I don't think I understand the problem you're trying to solve.
From kprobe/tracepoints/etc bpf prog can use bpf_probe_read() to read everything.
Are you saying direct access to state, prio, ppid, euid, and egid via context
is much superior? Why? Because it's more stable?
Why stop at these fields then? task_struct has many others.

What we observed is that no matter how many fields we add to stable uapi
somebody will always request one more. For networking the total number of
such fields is contained, but for tracing we're talking about thousands
of useful fields. We cannot make them stable.
Hence we've been working on an alternative approach via BTF to make all
of kernel internal fields sort-of stable via 'compile once' technique that
we described at the last LPC.



* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-05 18:59     ` Alexei Starovoitov
@ 2019-03-06  2:03       ` Kris Van Hees
  2019-03-07 21:30         ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-03-06  2:03 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel

On Tue, Mar 05, 2019 at 10:59:52AM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 26, 2019 at 01:46:01AM -0500, Kris Van Hees wrote:
> > On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote:
> > > > 
> > > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to
> > > > support what tracers commonly need, and I am also looking at ways to further
> > > > extend this model to allow more tracer-specific features as well without the
> > > > need for adding a BPF program type for every tracer.
> > > 
> > > It seems by themselves the patches don't provide any new functionality,
> > > but instead look like plumbing to call external code.
> > 
> > The patches are definitely not plumbing to call external code, and if I gave
> > that impression I apologise.  I overlooked the information you quote below on
> > allowing new functionality through modules when I wrote the comment above but
> > please note that it was a forward-looking comment in terms of what could be
> > done - not a reason for the patches that I submitted.
> > 
> > The patches accomplish something that is totally independent from that: they
> > make it possible for existing events that execute BPF programs when triggered
> > to transfer control to a BPF program with a more rich context.  The first
> > patch makes such a transfer possible (using tail-call combined with converting
> > the context to the new program type), and the second patch provides one such
> > program type (generic trace).  The new functionality provided by the program
> > type is direct access to task information that previously could only be
> > obtained through helper calls.  E.g. the new program type allows programs to
> > access the task state, prio, ppid, euid, and egid.  None of those pieces of
> > information can currently be obtained unless you start poking around in
> > memory using bpf_probe_read() helper calls.
> 
> I don't think I understand the problem you're trying to solve.
> From kprobe/tracepoints/etc bpf prog can use bpf_probe_read() to read everything.
> Are you saying direct access to state, prio, ppid, euid, and egid via context
> is much superior? Why? Because it's more stable?

When you provide tracing to non-privileged users you definitely do not want
to allow BPF programs to access any memory they want in kernel space, yet you
would still want to be able to provide a decent amount of information about
tasks at time of probe firing.

> Why stop at these fields then? task_struct has many others.
>
> What we observed is that no matter how many fields we add to stable uapi
> somebody will always request one more. For networking the total number of
> such fields is contained, but for tracing we're talking about thousands
> of useful fields. We cannot make them stable.
> Hence we've been working on an alternative approach via BTF to make all
> of kernel internal fields sort-of stable via 'compile once' technique that
> we described at the last LPC.

Sure, but the ones I put in there were an example of how this can be used.
And again, in the case of unprivileged tracing, this easily becomes an issue
about where you end up enforcing what a tracing program can do and cannot do.
There will always be cases where more than the 'standard' information is
needed for a tracing task, and then it would be quite reasonable to conclude
that a higher level of privileges is required to accomplish that - but that
shouldn't prevent unprivileged tracing from being able to be useful as well.

Again, the limited set of fields I put in there right now is a matter of
showing how this can be used.  It is certainly meant to be expanded quite a
bit.

The primary reason though behind the context conversion approach and the
generic tracing program type and context is that tracing on Linux based on
the existing kernel facilities limits the userspace tools because userspace
has quite limited control over what happens when a probe/event fires.  One
of the features of advanced tracing tools has been the ability to have more
(safe) control over what happens when the probe/event fires and how data is
stored in output buffers.  Since the userspace tool is the one requesting data
and ultimately processes the generated data, it stands to reason that it
would benefit from being able to have more freedom in that area.  But that
means it needs to be able to provide a BPF program of a type that more closely
relates to the tracing tool functionality rather than the probe or event
itself (especially since probes and events are very specific, and by their
very nature should not really care about how userspace uses information).
This is again even more true for unprivileged tracing - right now there is a lot
of useful task information that you cannot get to without bpf_probe_read() but
unprivileged users really shouldn't be able to just read arbitrary kernel
memory.

So in summary, I am trying to solve two (related) problems:

- Ensure that unprivileged tracing can obtain information about the task that
  triggered a probe or event.  There will always be limitations but we can do
  better than is available now.
- Allow tracing tools the ability to provide actions to be performed when a
  probe or event fires, beyond what the individual BPF program types allow
  for the specific probe/event types (and do it in a generic manner, in a
  sense encapsulating multiple probe/event types in a more generic tracing
  context).

A patch I am currently working on ties into this (and I hope to get it ready
sometime next week).  It builds on the support you already have for accessing
packet data from the __sk_buff context.  If we can make this same functionality
available to other contexts as well, my goal would be to make it possible for
the generic tracing context to have a buffer (data and data_end members) that
the BPF program can issue direct stores to as a means to allow a tracing
program to control how data is written into the buffer.  I am still working
out some details but I have a prototype working, and it retains all safety
provisions that BPF offers us.  But being able to do things like this without
needing to touch the context of any other BPF program type is a great benefit
to offer tracing tools, as far as I see it.
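
To illustrate the access pattern I have in mind, here is a userspace model of
a context with data and data_end members.  The struct and function names are
hypothetical; the only thing borrowed from existing BPF practice is the rule
that a store is allowed only after a bounds check against data_end:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tracing context exposing an output buffer. */
struct trace_ctx_model {
	uint8_t *data;		/* start of the writable buffer */
	uint8_t *data_end;	/* one past the last writable byte */
};

/*
 * Store a 32-bit value at byte offset 'off', but only if the whole store
 * fits between data and data_end.  Returns 0 on success, -1 otherwise.
 */
static int trace_store_u32(struct trace_ctx_model *ctx, size_t off,
			   uint32_t val)
{
	size_t size;

	assert(ctx->data <= ctx->data_end);
	size = (size_t)(ctx->data_end - ctx->data);

	/* Bounds check first; written to avoid pointer overflow in C. */
	if (off > size || sizeof(val) > size - off)
		return -1;

	memcpy(ctx->data + off, &val, sizeof(val));
	return 0;
}
```

In a BPF program the check would be written against the context pointers
directly and the verifier would reject any store that is not covered by it;
the model just makes the same check explicit at run time.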

	Kris


* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-06  2:03       ` Kris Van Hees
@ 2019-03-07 21:30         ` Alexei Starovoitov
  2019-03-11 14:21           ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2019-03-07 21:30 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: netdev, bpf, daniel

On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> 
> So in summary, I am trying to solve two (related) problems:
> 
> - Ensure that unprivileged tracing can obtain information about the task that
>   triggered a probe or event.  There will always be limitations but we can do
>   better than is available now.

I think unprivileged tracing is a huge topic on its own.
It's too easy to create security holes with such a mechanism.
kprobe/tracepoints/etc have been historically root only and I don't see a way
for them to become unpriv.
imo the existing /proc/pid/status is already more powerful than
what you're proposing with gtrace context.

> - Allow tracing tools the ability to provide actions to be performed when a
>   probe or event fires, beyond what the individual BPF program types allow
>   for the specific probe/event types (and do it in a generic manner, in a
>   sense encapsulating multiple probe/event types in a more generic tracing
>   context).

I think existing bpf tracing is generic whereas proposed gtrace is not generic at all.
'generic' is a loaded word. we can throw it back and forth and won't make
any forward progress. Let's focus on technical bits, ok?

> A patch I am currently working on ties into this (and I hope to get it ready
> sometime next week).  It builds on the support you already have for accessing
> packet data from the __sk_buff context.  If we can make this same functionality
> available to other contexts as well, my goal would be to make it possible for
> the generic tracing context to have a buffer (data and data_end members) that
> the BPF program can issue direct stores to as a means to allow a tracing
> program to control how data is written into the buffer. 

sounds like you're trying to reinvent bpf_perf_event_output() mechanism.

> But being able to do things like this without
> needing to touch the context of any other BPF program type is a great benefit
> to offer tracing tools, as far as I see it.

I still don't understand what you're referring to by 'things like this'
that somehow will be possible in the future, but not possible today.
Could you please give concrete example?


* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-07 21:30         ` Alexei Starovoitov
@ 2019-03-11 14:21           ` Kris Van Hees
  2019-03-12  1:29             ` Brendan Gregg
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-03-11 14:21 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel

On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> > 
> > So in summary, I am trying to solve two (related) problems:
> > 
> > - Ensure that unprivileged tracing can obtain information about the task that
> >   triggered a probe or event.  There will always be limitations but we can do
> >   better than is available now.
> 
> I think unprivileged tracing is a huge topic on its own.
> > It's too easy to create security holes with such a mechanism.
> kprobe/tracepoints/etc have been historically root only and I don't see a way
> for them to become unpriv.

I agree it is a huge topic and one that needs very careful attention, but I
believe it certainly can be done (DTrace on Solaris provided it, and while we
haven't implemented it on Linux thus far, the availability of BPF actually
makes it more realistic).  This remains an important goal for us, and while
it will take a while to get there, we certainly want to contribute in this area.

> imo the existing /proc/pid/status is already more powerful than
> what you're proposing with gtrace context.

The problem with /proc/pid/status is that it cannot be read at the exact same
time as when a probe fires.  And when you're doing tracing, that is one of the
main requirements: looking at the system at the moment of the probe firing.
Surely, /proc/pid/status contains more information right now but as I mentioned
in earlier email, the gtrace context I proposed so far was merely showing what
I am trying to do - it is nowhere near the final version.

> > - Allow tracing tools the ability to provide actions to be performed when a
> >   probe or event fires, beyond what the individual BPF program types allow
> >   for the specific probe/event types (and do it in a generic manner, in a
> >   sense encapsulating multiple probe/event types in a more generic tracing
> >   context).
> 
> I think existing bpf tracing is generic whereas proposed gtrace is not generic at all.
> 'generic' is a loaded word. we can throw it back and forth and won't make
> any forward progress. Let's focus on technical bits, ok?

Sure, I definitely agree that we can mean very different things with 'generic'.
It is important then to explain what I mean here though because it is rather
crucial to the design.  From the perspective of tracing that we want to be able
to do we are mostly interested in being able to look at the system as a whole
through the sequence of probes that fire.  In that way, the specific context of
each probe is important, but the actions to be executed when probes fire need
more than just that probe context - they also need information about the task
that is executing.  An example would be to trace whether non-root executables
(often scripts) make use of setuid executables that could potentially be a
security risk.  You end up tracing through a possibly complex tree of task
clones, tasks issuing exec syscalls, and performing file operations.  The
probes themselves are pretty meaningless without the larger context.  It is the
task context combined with the probe specifics that provide the information you
need to execute meaningful actions when a probe fires.  And that information
needs to be obtained when the probe fires - reading it from /proc when the
probe data reaches userspace is often too late.  Think of the case where you
need to trace information when an exec syscall is executed - by the time
tracing data is available in a buffer for userspace to process, the syscall
will usually have completed and userspace will only be able to obtain task
info about the state *after* the exec took place.

The majority of tracing use-cases that I encounter relate to observing one or
more tasks rather than probes without caring about the task context, and I
would argue that most use-cases I have read about from other people match this
observation.  Since we want to be able to use BPF as the execution engine for
the probe actions, it seems to make sense to me that the BPF context available
to those programs would therefore be providing the task data and probe data,
so that the programs can be more naturally written based on the context that
makes sense for the functionality they are providing.

So, my patches provide an initial implementation of a BPF program context (and
a mechanism to execute programs in it as a result of probes triggering) that
serves the perspective of the tracer looking at tasks.

> > A patch I am currently working on ties into this (and I hope to get it ready
> > sometime next week).  It builds on the support you already have for accessing
> > packet data from the __sk_buff context.  If we can make this same functionality
> > available to other contexts as well, my goal would be to make it possible for
> > the generic tracing context to have a buffer (data and data_end members) that
> > the BPF program can issue direct stores to as a means to allow a tracing
> > program to control how data is written into the buffer. 
> 
> sounds like you're trying to reinvent bpf_perf_event_output() mechanism.

It sure sounds like that, but the difference is that bpf_perf_event_output()
encapsulates data you are providing in a specific format already.  Most of the
tracers that support BPF in some way seem to pick a perf_event output type
that allows providing raw data and use that to write out what they need to pass
to userspace.

Our situation is a bit different because we have an existing tracing tool
(DTrace) with its own requirements on what the output buffer data is supposed
to look like.  When I looked at the options of using (somewhat abusing) the
bpf_perf_event_output() mechanism as a vehicle to get data to userspace vs
supporting DTrace's format, I concluded that being able to let the tracer
(DTrace in my case) define the output format is a benefit because it means
others can do the same (if they want to).  And given that DTrace works with
multiple buffer types and with things like speculation buffers (buffers that
are used as temporary output store in place of the default buffer - in a way
that the rest of the action need not be aware of - and written to the default
buffer upon commit, or discarded when not needed), bpf_perf_event_output() is
not sufficient.
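
For readers not familiar with speculation buffers, the following userspace
sketch models the behaviour described above.  It is a loose model for
illustration only, not DTrace's implementation, and every name in it is made
up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

struct trace_buf_model {
	unsigned char data[BUF_SIZE];
	size_t len;
};

struct tracer_model {
	struct trace_buf_model main_buf;	/* default output buffer */
	struct trace_buf_model spec_buf;	/* temporary store */
	struct trace_buf_model *active;		/* where actions write */
};

static void tracer_init(struct tracer_model *t)
{
	memset(t, 0, sizeof(*t));
	t->active = &t->main_buf;
}

/* Redirect subsequent writes to the speculation buffer. */
static void speculate(struct tracer_model *t)
{
	t->spec_buf.len = 0;
	t->active = &t->spec_buf;
}

/* Actions call this without knowing which buffer is active. */
static int trace_write(struct tracer_model *t, const void *src, size_t n)
{
	struct trace_buf_model *b = t->active;

	assert(b);
	if (n > BUF_SIZE - b->len)
		return -1;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Append speculative data to the default buffer; -1 if it does not fit. */
static int commit(struct tracer_model *t)
{
	struct trace_buf_model *m = &t->main_buf, *s = &t->spec_buf;
	int rc = -1;

	if (s->len <= BUF_SIZE - m->len) {
		memcpy(m->data + m->len, s->data, s->len);
		m->len += s->len;
		rc = 0;
	}
	s->len = 0;
	t->active = m;
	return rc;
}

/* Drop speculative data and go back to the default buffer. */
static void discard(struct tracer_model *t)
{
	t->spec_buf.len = 0;
	t->active = &t->main_buf;
}
```

The part that matters for this discussion is that trace_write() is identical
in both modes: the tracer, not the action, decides where the data lands.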

> > But being able to do things like this without
> > needing to touch the context of any other BPF program type is a great benefit
> > to offer tracing tools, as far as I see it.
> 
> I still don't understand what you're referring to by 'things like this'
> that somehow will be possible in the future, but not possible today.
> Could you please give concrete example?

My apologies for not being clear.  I am referring to the features of the
gtrace context in terms of containing task information, and output buffers
to be used in BPF programs triggered from various probe sources (kprobe,
tracepoints, ...)  I would not want to suggest making changes to all the
different program contexts in order to support tracing needs because that
would be wrong.  Doing it in a central place makes it a lot easier to maintain
without impacting other program types, etc.

Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
to implement a lot of what existing tracing tools like DTrace can do, if you
write them based on that.  One limitation I am obviously working with is
that DTrace already exists and has existed for a long time.  And while it is
100% available as open source, it involves a pretty involved set of patches to
be applied to the kernel to be able to use it which is just not ideal.  Hence
the goal to make it available by re-using as much of the existing features in
Linux as possible, while still maintaining the same level of functionality in
DTrace.  That means we need to fill the gaps - and from where I am sitting,
the ways to do that might as well be of use to others (if they want to).

If phrasing things in the context of DTrace would make the conversation easier
I certainly don't mind doing that, but I really don't want to limit my patches
to supporting just DTrace (even if right now it might be the only tracer using
it).

	Kris


* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-11 14:21           ` Kris Van Hees
@ 2019-03-12  1:29             ` Brendan Gregg
  2019-03-12  3:24               ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Brendan Gregg @ 2019-03-12  1:29 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
[...]
> > > But being able to do things like this without
> > > needing to touch the context of any other BPF program type is a great benefit
> > > to offer tracing tools, as far as I see it.
> >
> > I still don't understand what you're referring to by 'things like this'
> > that somehow will be possible in the future, but not possible today.
> > Could you please give concrete example?
>
> My apologies for not being clear.  I am referring to the features of the
> gtrace context in terms of containing task information, and output buffers
> to be used in BPF programs triggered from various probe sources (kprobe,
> tracepoints, ...)  I would not want to suggest making changes to all the
> different program contexts in order to support tracing needs because that
> would be wrong.  Doing it in a central place makes it a lot easier to maintain
> without impacting other program types, etc.
>
> Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> to implement a lot of what existing tracing tools like DTrace can do, if you
> write them based on that.  One limitation I am obviously working with is
> that DTrace already exists and has existed for a long time.  And while it is
> 100% available as open source, it requires a pretty involved set of patches to
> be applied to the kernel to be able to use it, which is just not ideal.  Hence
> the goal to make it available by re-using as much of the existing features in
> Linux as possible, while still maintaining the same level of functionality in
> DTrace.  That means we need to fill the gaps - and from where I am sitting,
> the ways to do that might as well be of use to others (if they want to).
>
> If phrasing things in the context of DTrace would make the conversation easier
> I certainly don't mind doing that, but I really don't want to limit my patches
> to supporting just DTrace (even if right now it might be the only tracer using
> it).

As a concrete example, can you point to one of my own published DTrace
tools that BPF can't do? These were created to solve many real
production issues, and make good use cases. I've been porting them
over to BPF (bcc and bpftrace) without too much problem, and I can't
think of a single one that I couldn't port over today.

There's a few minor things that I'm currently doing workarounds for,
like ppid, but that should be satisfied with a few more helpers. And
if it's really niche, then BTF sounds like a good solution.

If your ultimate goal is to have a command called "dtrace" that runs D
programs, to support your existing users, then I'd add a lex/yacc pair
to bpftrace and have it emit a dtrace binary.

Brendan

--
Brendan Gregg, Senior Performance Architect, Netflix

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-12  1:29             ` Brendan Gregg
@ 2019-03-12  3:24               ` Kris Van Hees
  2019-03-12  6:03                 ` Brendan Gregg
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-03-12  3:24 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote:
> On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> >
> > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> [...]
> > > > But being able to do things like this without
> > > > needing to touch the context of any other BPF program type is a great benefit
> > > > to offer tracing tools, as far as I see it.
> > >
> > > I still don't understand what you're referring to by 'things like this'
> > > that somehow will be possible in the future, but not possible today.
> > > Could you please give concrete example?
> >
> > My apologies for not being clear.  I am referring to the features of the
> > gtrace context in terms of containing task information, and output buffers
> > to be used in BPF programs triggered from various probe sources (kprobe,
> > tracepoints, ...)  I would not want to suggest making changes to all the
> > different program contexts in order to support tracing needs because that
> > would be wrong.  Doing it in a central place makes it a lot easier to maintain
> > without impacting other program types, etc.
> >
> > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> > to implement a lot of what existing tracing tools like DTrace can do, if you
> > write them based on that.  One limitation I am obviously working with is
> > that DTrace already exists and has existed for a long time.  And while it is
> > 100% available as open source, it requires a pretty involved set of patches to
> > be applied to the kernel to be able to use it, which is just not ideal.  Hence
> > the goal to make it available by re-using as much of the existing features in
> > Linux as possible, while still maintaining the same level of functionality in
> > DTrace.  That means we need to fill the gaps - and from where I am sitting,
> > the ways to do that might as well be of use to others (if they want to).
> >
> > If phrasing things in the context of DTrace would make the conversation easier
> > I certainly don't mind doing that, but I really don't want to limit my patches
> > to supporting just DTrace (even if right now it might be the only tracer using
> > it).
> 
> As a concrete example, can you point to one of my own published DTrace
> tools that BPF can't do? These were created to solve many real
> production issues, and make good use cases. I've been porting them
> over to BPF (bcc and bpftrace) without too much problem, and I can't
> think of a single one that I couldn't port over today.

I am unclear how pointing at one of your published DTrace tools would
contribute to this discussion.  Surely the scope of use cases is not limited
to the DTrace scripts you published?

Either way, one of the features that I make use of is speculative tracing.
And yes, even that could be handled with some ugly workarounds but my intent
is to implement things in a more clean way rather than depending on a bunch
of workarounds to make it somewhat work.

> There's a few minor things that I'm currently doing workarounds for,
> like ppid, but that should be satisfied with a few more helpers. And
> if it's really niche, then BTF sounds like a good solution.

Of course, we can always add more helpers to get to information that is
needed, but that is hardly a practical solution in the long run, and at
Plumbers 2018 it was already indicated that just adding helpers to get to
more information about tasks is not the route people want to take.

> If your ultimate goal is to have a command called "dtrace" that runs D
> programs, to support your existing users, then I'd add a lex/yacc pair
> to bpftrace and have it emit a dtrace binary.

My goal is not to have a command called dtrace that somehow simply provides
some form of support for dtrace scripts in some legacy support model.  My
goal is to make DTrace available on Linux based on existing kernel features
(and contributing extra features where needed, in a collaborative manner).

DTrace is currently already available as open source for Linux but it involves
a much too invasive set of patches to the kernel, often (almost) duplicating
functionality that is already present.  That's not a good solution.  Working
on implementing the kernel portion to make use of kernel features has brought
to light some areas where contributions can help avoid workarounds and provide
mechanisms that can be of use to other tracing solutions as well.  That is the
basis for my patches.

	Kris

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-12  3:24               ` Kris Van Hees
@ 2019-03-12  6:03                 ` Brendan Gregg
  2019-03-12 16:53                   ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Brendan Gregg @ 2019-03-12  6:03 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote:
> > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > >
> > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> > [...]
> > > > > But being able to do things like this without
> > > > > needing to touch the context of any other BPF program type is a great benefit
> > > > > to offer tracing tools, as far as I see it.
> > > >
> > > > I still don't understand what you're referring to by 'things like this'
> > > > that somehow will be possible in the future, but not possible today.
> > > > Could you please give concrete example?
> > >
> > > My apologies for not being clear.  I am referring to the features of the
> > > gtrace context in terms of containing task information, and output buffers
> > > to be used in BPF programs triggered from various probe sources (kprobe,
> > > tracepoints, ...)  I would not want to suggest making changes to all the
> > > different program contexts in order to support tracing needs because that
> > > would be wrong.  Doing it in a central place makes it a lot easier to maintain
> > > without impacting other program types, etc.
> > >
> > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> > > to implement a lot of what existing tracing tools like DTrace can do, if you
> > > write them based on that.  One limitation I am obviously working with is
> > > that DTrace already exists and has existed for a long time.  And while it is
> > > 100% available as open source, it requires a pretty involved set of patches to
> > > be applied to the kernel to be able to use it, which is just not ideal.  Hence
> > > the goal to make it available by re-using as much of the existing features in
> > > Linux as possible, while still maintaining the same level of functionality in
> > > DTrace.  That means we need to fill the gaps - and from where I am sitting,
> > > the ways to do that might as well be of use to others (if they want to).
> > >
> > > If phrasing things in the context of DTrace would make the conversation easier
> > > I certainly don't mind doing that, but I really don't want to limit my patches
> > > to supporting just DTrace (even if right now it might be the only tracer using
> > > it).
> >
> > As a concrete example, can you point to one of my own published DTrace
> > tools that BPF can't do? These were created to solve many real
> > production issues, and make good use cases. I've been porting them
> > over to BPF (bcc and bpftrace) without too much problem, and I can't
> > think of a single one that I couldn't port over today.
>
> I am unclear how pointing at one of your published DTrace tools would
> contribute to this discussion.  Surely the scope of use cases is not limited
> to the DTrace scripts you published?
>
> Either way, one of the features that I make use of is speculative tracing.

The subject was a concrete example. I don't think I used speculative
tracing in any of my scripts. Can you pick a better example of
something BPF can't do?

> And yes, even that could be handled with some ugly workarounds but my intent
> is to implement things in a more clean way rather than depending on a bunch
> of workarounds to make it somewhat work.
>
> > There's a few minor things that I'm currently doing workarounds for,
> > like ppid, but that should be satisfied with a few more helpers. And
> > if it's really niche, then BTF sounds like a good solution.
>
> Of course, we can always add more helpers to get to information that is
> needed, but that is hardly a practical solution in the long run, and at
> Plumbers 2018 it was already indicated that just adding helpers to get to
> more information about tasks is not the route people want to take.
>
> > If your ultimate goal is to have a command called "dtrace" that runs D
> > programs, to support your existing users, then I'd add a lex/yacc pair
> > to bpftrace and have it emit a dtrace binary.
>
> My goal is not to have a command called dtrace that somehow simply provides
> some form of support for dtrace scripts in some legacy support model.  My
> goal is to make DTrace available on Linux based on existing kernel features
> (and contributing extra features where needed, in a collaborative manner).

If bpftrace builds a dtrace binary that runs D code, then you just did
make DTrace available on Linux.

And without changing the kernel.

Brendan

-- 
Brendan Gregg, Senior Performance Architect, Netflix

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-12  6:03                 ` Brendan Gregg
@ 2019-03-12 16:53                   ` Kris Van Hees
  2019-03-13 16:30                     ` Brendan Gregg
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-03-12 16:53 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote:
> On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> >
> > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote:
> > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > > >
> > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> > > [...]
> > > > > > But being able to do things like this without
> > > > > > needing to touch the context of any other BPF program type is a great benefit
> > > > > > to offer tracing tools, as far as I see it.
> > > > >
> > > > > I still don't understand what you're referring to by 'things like this'
> > > > > that somehow will be possible in the future, but not possible today.
> > > > > Could you please give concrete example?
> > > >
> > > > My apologies for not being clear.  I am referring to the features of the
> > > > gtrace context in terms of containing task information, and output buffers
> > > > to be used in BPF programs triggered from various probe sources (kprobe,
> > > > tracepoints, ...)  I would not want to suggest making changes to all the
> > > > different program contexts in order to support tracing needs because that
> > > > would be wrong.  Doing it in a central place makes it a lot easier to maintain
> > > > without impacting other program types, etc.
> > > >
> > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> > > > to implement a lot of what existing tracing tools like DTrace can do, if you
> > > > write them based on that.  One limitation I am obviously working with is
> > > > that DTrace already exists and has existed for a long time.  And while it is
> > > > 100% available as open source, it requires a pretty involved set of patches to
> > > > be applied to the kernel to be able to use it, which is just not ideal.  Hence
> > > > the goal to make it available by re-using as much of the existing features in
> > > > Linux as possible, while still maintaining the same level of functionality in
> > > > DTrace.  That means we need to fill the gaps - and from where I am sitting,
> > > > the ways to do that might as well be of use to others (if they want to).
> > > >
> > > > If phrasing things in the context of DTrace would make the conversation easier
> > > > I certainly don't mind doing that, but I really don't want to limit my patches
> > > > to supporting just DTrace (even if right now it might be the only tracer using
> > > > it).
> > >
> > > As a concrete example, can you point to one of my own published DTrace
> > > tools that BPF can't do? These were created to solve many real
> > > production issues, and make good use cases. I've been porting them
> > > over to BPF (bcc and bpftrace) without too much problem, and I can't
> > > think of a single one that I couldn't port over today.
> >
> > I am unclear how pointing at one of your published DTrace tools would
> > contribute to this discussion.  Surely the scope of use cases is not limited
> > to the DTrace scripts you published?
> >
> > Either way, one of the features that I make use of is speculative tracing.
> 
> The subject was a concrete example. I don't think I used speculative
> tracing in any of my scripts. Can you pick a better example of
> something BPF can't do?

Well, then speculative tracing is a good example of something that cannot be
done right now.  The specopen.d script that is part of the DTrace documentation
and is also featured in the test suite makes for a concrete example.  That
script has been used as a kind of template script in situations where we had
to analyze code paths associated with a specific subset of conditions that
could not be known beforehand.

Are there other ways to accomplish the same thing?  Sure.  But I don't think
that is really the point.  There are often multiple tools that can do the same
thing (or close to the same thing), and people have the option to choose one
or the other.  DTrace is one of those tools, as is systemtap, bpftrace, perf,
and other tools.

> > And yes, even that could be handled with some ugly workarounds but my intent
> > is to implement things in a more clean way rather than depending on a bunch
> > of workarounds to make it somewhat work.
> >
> > > There's a few minor things that I'm currently doing workarounds for,
> > > like ppid, but that should be satisfied with a few more helpers. And
> > > if it's really niche, then BTF sounds like a good solution.
> >
> > Of course, we can always add more helpers to get to information that is
> > needed, but that is hardly a practical solution in the long run, and at
> > Plumbers 2018 it was already indicated that just adding helpers to get to
> > more information about tasks is not the route people want to take.
> >
> > > If your ultimate goal is to have a command called "dtrace" that runs D
> > > programs, to support your existing users, then I'd add a lex/yacc pair
> > > to bpftrace and have it emit a dtrace binary.
> >
> > My goal is not to have a command called dtrace that somehow simply provides
> > some form of support for dtrace scripts in some legacy support model.  My
> > goal is to make DTrace available on Linux based on existing kernel features
> > (and contributing extra features where needed, in a collaborative manner).
> 
> If bpftrace builds a dtrace binary that runs D code, then you just did
> make DTrace available on Linux.
> 
> And without changing the kernel.

If bpftrace could do everything that DTrace does, in a way that is 100%
transparent to the user, and without requiring extra dependencies like e.g.
software development packages (llvm, etc) to be installed on the system where
it will be used in production, I think we wouldn't be having this conversation.

Anyway, I feel we're getting off track here...  the discussion is not about
whether bpftrace can do what dtrace can do, or any other tool for that matter.
There is a need for DTrace on Linux and I am working on making that possible
in a way where DTrace is one among multiple tracing tools, and by leveraging
existing features in the kernel to the extent possible rather than putting it
in as an almost self-contained monolith.  We already made those patches 
available a while ago - but that isn't the right way to go about this in the
long run and it isn't a benefit to the overall community because there isn't
any good way other tools can make use of it.  As part of doing the work to
contribute DTrace as a tool within the Linux tracing framework I identify
areas where there are gaps in terms of what we need, and I contribute patches
that fill those gaps in a way that makes it possible for others to make use of
those features as well.

	Kris

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-12 16:53                   ` Kris Van Hees
@ 2019-03-13 16:30                     ` Brendan Gregg
  2019-03-16  1:49                       ` Kris Van Hees
  0 siblings, 1 reply; 13+ messages in thread
From: Brendan Gregg @ 2019-03-13 16:30 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Tue, Mar 12, 2019 at 9:54 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote:
> > On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > >
> > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote:
> > > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > > > >
> > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> > > > [...]
> > > > > > > But being able to do things like this without
> > > > > > > needing to touch the context of any other BPF program type is a great benefit
> > > > > > > to offer tracing tools, as far as I see it.
> > > > > >
> > > > > > I still don't understand what you're referring to by 'things like this'
> > > > > > that somehow will be possible in the future, but not possible today.
> > > > > > Could you please give concrete example?
> > > > >
> > > > > My apologies for not being clear.  I am referring to the features of the
> > > > > gtrace context in terms of containing task information, and output buffers
> > > > > to be used in BPF programs triggered from various probe sources (kprobe,
> > > > > tracepoints, ...)  I would not want to suggest making changes to all the
> > > > > different program contexts in order to support tracing needs because that
> > > > > would be wrong.  Doing it in a central place makes it a lot easier to maintain
> > > > > without impacting other program types, etc.
> > > > >
> > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> > > > > to implement a lot of what existing tracing tools like DTrace can do, if you
> > > > > write them based on that.  One limitation I am obviously working with is
> > > > > that DTrace already exists and has existed for a long time.  And while it is
> > > > > 100% available as open source, it requires a pretty involved set of patches to
> > > > > be applied to the kernel to be able to use it, which is just not ideal.  Hence
> > > > > the goal to make it available by re-using as much of the existing features in
> > > > > Linux as possible, while still maintaining the same level of functionality in
> > > > > DTrace.  That means we need to fill the gaps - and from where I am sitting,
> > > > > the ways to do that might as well be of use to others (if they want to).
> > > > >
> > > > > If phrasing things in the context of DTrace would make the conversation easier
> > > > > I certainly don't mind doing that, but I really don't want to limit my patches
> > > > > to supporting just DTrace (even if right now it might be the only tracer using
> > > > > it).
> > > >
> > > > As a concrete example, can you point to one of my own published DTrace
> > > > tools that BPF can't do? These were created to solve many real
> > > > production issues, and make good use cases. I've been porting them
> > > > over to BPF (bcc and bpftrace) without too much problem, and I can't
> > > > think of a single one that I couldn't port over today.
> > >
> > > I am unclear how pointing at one of your published DTrace tools would
> > > contribute to this discussion.  Surely the scope of use cases is not limited
> > > to the DTrace scripts you published?
> > >
> > > Either way, one of the features that I make use of is speculative tracing.
> >
> > The subject was a concrete example. I don't think I used speculative
> > tracing in any of my scripts. Can you pick a better example of
> > something BPF can't do?
>
> Well, then speculative tracing is a good example of something that cannot be
> done right now.  The specopen.d script that is part of the DTrace documentation
> and is also featured in the test suite makes for a concrete example.  That
> script has been used as a kind of template script in situations where we had
> to analyze code paths associated with a specific subset of conditions that
> could not be known beforehand.

I used spec tracing zero times in any of my published scripts from
fifteen years of using DTrace, and zero times in my 1152-page DTrace
book. Might I find a use one day? Maybe, just as you have. But I
wouldn't call it a great example of necessary functionality!

Besides, I bet we could implement spec tracing using existing BPF maps
if we wanted -- I just haven't had that need.

>
> Are there other ways to accomplish the same thing?  Sure.  But I don't think
> that is really the point.  There are often multiple tools that can do the same
> thing (or close to the same thing), and people have the option to choose one
> or the other.  DTrace is one of those tools, as is systemtap, bpftrace, perf,
> and other tools.
>
> > > And yes, even that could be handled with some ugly workarounds but my intent
> > > is to implement things in a more clean way rather than depending on a bunch
> > > of workarounds to make it somewhat work.
> > >
> > > > There's a few minor things that I'm currently doing workarounds for,
> > > > like ppid, but that should be satisfied with a few more helpers. And
> > > > if it's really niche, then BTF sounds like a good solution.
> > >
> > > Of course, we can always add more helpers to get to information that is
> > > needed, but that is hardly a practical solution in the long run, and at
> > > Plumbers 2018 it was already indicated that just adding helpers to get to
> > > more information about tasks is not the route people want to take.
> > >
> > > > If your ultimate goal is to have a command called "dtrace" that runs D
> > > > programs, to support your existing users, then I'd add a lex/yacc pair
> > > > to bpftrace and have it emit a dtrace binary.
> > >
> > > My goal is not to have a command called dtrace that somehow simply provides
> > > some form of support for dtrace scripts in some legacy support model.  My
> > > goal is to make DTrace available on Linux based on existing kernel features
> > > (and contributing extra features where needed, in a collaborative manner).
> >
> > If bpftrace builds a dtrace binary that runs D code, then you just did
> > make DTrace available on Linux.
> >
> > And without changing the kernel.
>
> If bpftrace could do everything that DTrace does, in a way that is 100%
> transparent to the user, and without requiring extra dependencies like e.g.
> software development packages (llvm, etc) to be installed on the system where
> it will be used in production, I think we wouldn't be having this conversation.

bpftrace is solving that need, and it was built from the ground up to
work with eBPF. It's going to provide 100% of useful functionality.
Getting it available out of the box is both a user-space and distro
problem.

>
> Anyway, I feel we're getting off track here...  the discussion is not about
> whether bpftrace can do what dtrace can do, or any other tool for that matter.
> There is a need for DTrace on Linux and I am working on making that possible

I wanted DTrace on Linux five years ago. Now I don't. We have eBPF,
which does more.

> in a way where DTrace is one among multiple tracing tools, and by leveraging

Multiple tracing tools? This will fracture the user and engineering
communities. People have complained enough about Ftrace and perf and
BPF -- we don't need a fourth tracing framework that doesn't solve any
new problems.

We can add a legacy interface to bpftrace for running D programs, to
help people transition. That can be done as a user-space parser.

Brendan

-- 
Brendan Gregg, Senior Performance Architect, Netflix

* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
  2019-03-13 16:30                     ` Brendan Gregg
@ 2019-03-16  1:49                       ` Kris Van Hees
  0 siblings, 0 replies; 13+ messages in thread
From: Kris Van Hees @ 2019-03-16  1:49 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann

On Wed, Mar 13, 2019 at 09:30:02AM -0700, Brendan Gregg wrote:
> On Tue, Mar 12, 2019 at 9:54 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> >
> > On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote:
> > > On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > > >
> > > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote:
> > > > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote:
> > > > > [...]
> > > > > > > > But being able to do things like this without
> > > > > > > > needing to touch the context of any other BPF program type is a great benefit
> > > > > > > > to offer tracing tools, as far as I see it.
> > > > > > >
> > > > > > > I still don't understand what you're referring to by 'things like this'
> > > > > > > that somehow will be possible in the future, but not possible today.
> > > > > > > Could you please give concrete example?
> > > > > >
> > > > > > My apologies for not being clear.  I am referring to the features of the
> > > > > > gtrace context in terms of containing task information, and output buffers
> > > > > > to be used in BPF programs triggered from various probe sources (kprobe,
> > > > > > tracepoints, ...)  I would not want to suggest making changes to all the
> > > > > > different program contexts in order to support tracing needs because that
> > > > > > would be wrong.  Doing it in a central place makes it a lot easier to maintain
> > > > > > without impacting other program types, etc.
> > > > > >
> > > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used
> > > > > > to implement a lot of what existing tracing tools like DTrace can do, if you
> > > > > > write them based on that.  One limitation I am obviously working with is
> > > > > > that DTrace already exists and has existed for a long time.  And while it is
> > > > > > 100% available as open source, it requires a pretty involved set of patches to
> > > > > > be applied to the kernel to be able to use it, which is just not ideal.  Hence
> > > > > > the goal to make it available by re-using as much of the existing features in
> > > > > > Linux as possible, while still maintaining the same level of functionality in
> > > > > > DTrace.  That means we need to fill the gaps - and from where I am sitting,
> > > > > > the ways to do that might as well be of use to others (if they want to).
> > > > > >
> > > > > > If phrasing things in the context of DTrace would make the conversation easier
> > > > > > I certainly don't mind doing that, but I really don't want to limit my patches
> > > > > > to supporting just DTrace (even if right now it might be the only tracer using
> > > > > > it).
> > > > >
> > > > > As a concrete example, can you point to one of my own published DTrace
> > > > > tools that BPF can't do? These were created to solve many real
> > > > > production issues, and make good use cases. I've been porting them
> > > > > over to BPF (bcc and bpftrace) without too much problem, and I can't
> > > > > think of a single one that I couldn't port over today.
> > > >
> > > > I am unclear how pointing at one of your published DTrace tools would
> > > > contribute to this discussion.  Surely the scope of use cases is not limited
> > > > to the DTrace scripts you published?
> > > >
> > > > Either way, one of the features that I make use of is speculative tracing.
> > >
> > > The subject was a concrete example. I don't think I used speculative
> > > tracing in any of my scripts. Can you pick a better example of
> > > something BPF can't do?
> >
> > Well, then speculative tracing is a good example of something that cannot be
> > done right now.  The specopen.d script that is part of the DTrace documentation
> > and is also featured in the test suite makes for a concrete example.  That
> > script has been used as a kind of template script in situations where we had
> > to analyze code paths associated with a specific subset of conditions that
> > could not be known beforehand.
> 
> I used spec tracing zero times in any of my published scripts from
> fifteen years of using DTrace, and zero times in my 1152-page DTrace
> book. Might I find a use one day? Maybe, just as you have. But I
> wouldn't call it a great example of necessary functionality!

Perhaps not for you, but it is for me.  There are other much simpler things
that are currently missing, such as being able to obtain euid/egid, ppid (as
you mentioned already), ...  There are workarounds possible or helpers can be
added, but I would think that making them available by means of a mechanism
that is very much native to BPF (obtaining data from context, as is done for
networking packets) makes a lot more sense.

> Besides, I bet we could implement spec tracing using existing BPF maps
> if we wanted -- I just haven't had that need.
> 
> >
> > Are there other ways to accomplish the same thing?  Sure.  But I don't think
> > that is really the point.  There are often multiple tools that can do the same
> > thing (or close to the same thing), and people have the option to choose one
> > or the other.  DTrace is one of those tools, as is systemtap, bpftrace, perf,
> > and other tools.
> >
> > > > And yes, even that could be handled with some ugly workarounds but my intent
> > > > is to implement things in a more clean way rather than depending on a bunch
> > > > of workarounds to make it somewhat work.
> > > >
> > > > > There's a few minor things that I'm currently doing workarounds for,
> > > > > like ppid, but that should be satisfied with a few more helpers. And
> > > > > if it's really niche, then BTF sounds like a good solution.
> > > >
> > > > Of course, we can always add more helpers to get to information that is
> > > > needed, but that is hardly a practical solution in the long run, and at
> > > > Plumbers 2018 it was already indicated that just adding helpers to get to
> > > > more information about tasks is not the route people want to take.
> > > >
> > > > > If your ultimate goal is to have a command called "dtrace" that runs D
> > > > > programs, to support your existing users, then I'd add a lex/yacc pair
> > > > > to bpftrace and have it emit a dtrace binary.
> > > >
> > > > My goal is not to have a command called dtrace that somehow simply provides
> > > > some form of support for dtrace scripts in some legacy support model.  My
> > > > goal is to make DTrace available on Linux based on existing kernel features
> > > > (and contributing extra features where needed, in a collaborative manner).
> > >
> > > If bpftrace builds a dtrace binary that runs D code, then you just did
> > > make DTrace available on Linux.
> > >
> > > And without changing the kernel.
> >
> > If bpftrace could do everything that DTrace does, in a way that is 100%
> > transparent to the user, and without requiring extra dependencies like e.g.
> > software development packages (llvm, etc) to be installed on the system where
> > it will be used in production, I think we wouldn't be having this conversation.
> 
> bpftrace is solving that need, and it was built from the ground up to
> work with eBPF. It's going to provide 100% of useful functionality.
> Getting it available out of the box is both a user-space and distro
> problem.
> 
> >
> > Anyway, I feel we're getting off track here...  the discussion is not about
> > whether bpftrace can do what dtrace can do, or any other tool for that matter.
> > There is a need for DTrace on Linux and I am working on making that possible
> 
> I wanted DTrace on Linux five years ago. Now I don't. We have eBPF,
> which does more.

I respect your personal feeling on this, but there are people who use DTrace
on Linux, and who want DTrace to keep improving and be more based on existing
core kernel features.  You say that eBPF does more, but clearly it does not
since there are features DTrace has that eBPF currently does not provide (as
basic as ppid of a task).

But really, where does that even matter?  If eBPF ends up being able to do
more than the current DTrace can, then a DTrace that is making use of eBPF
will obviously be able to do more as well.  That's a great benefit in my
opinion.

> > in a way where DTrace is one among multiple tracing tools, and by leveraging
> 
> Multiple tracing tools? This will fracture the user and engineering
> communities. People have complained enough about Ftrace and perf and
> BPF -- we don't need a fourth tracing framework that doesn't solve any
> new problems.

People already use multiple tools and those tools continue being developed.
And I am not working on a tracing framework alongside whatever already exists,
but rather on integrating DTrace with the existing tracing features in the
kernel.  Even if you feel you don't need another tracing tool, fact is that
DTrace already exists on Linux even though the patches are not in the upstream
kernel.  People use it and like it.  Working to make it available without the
rather invasive kernel changes and instead building on the existing kernel
features means we also can contribute new features that expand the capabilities
of eBPF (within its excellent safety mechanisms) and hopefully other tracing
features.  What is wrong with that?

There will always be new features added to tools, there will always be new
tools that emerge.  That's the nature of software development.  And people
have a tendency to work with tools they like.  Should everyone stop writing
C programs because C++ exists?

> We can add a legacy interface to bpftrace for running D programs, to
> help people transition. That can be done as a user-space parser.

Perhaps - but there clearly are features you are missing.  And your statement
makes the rather big assumption that people *want* to transition.  Why would
you force people to transition, or live with some kind of legacy interface?

But the point of the discussion here is not whether bpftrace can be made to
behave as if it is dtrace - I submitted patches that implement new
functionality.  Are you suggesting that those patches should be rejected
off-hand based on the fact that you feel they are unnecessary because people
should use bpftrace instead?

	Kris

Thread overview: 13+ messages
2019-02-25 15:54 [PATCH 0/2] bpf: context casting for tail call and gtrace prog type Kris Van Hees
2019-02-26  6:18 ` Alexei Starovoitov
2019-02-26  6:46   ` Kris Van Hees
2019-03-05 18:59     ` Alexei Starovoitov
2019-03-06  2:03       ` Kris Van Hees
2019-03-07 21:30         ` Alexei Starovoitov
2019-03-11 14:21           ` Kris Van Hees
2019-03-12  1:29             ` Brendan Gregg
2019-03-12  3:24               ` Kris Van Hees
2019-03-12  6:03                 ` Brendan Gregg
2019-03-12 16:53                   ` Kris Van Hees
2019-03-13 16:30                     ` Brendan Gregg
2019-03-16  1:49                       ` Kris Van Hees
