* [PATCH 0/2] bpf: context casting for tail call and gtrace prog type
@ 2019-02-25 15:54 Kris Van Hees
  2019-02-26 6:18 ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Kris Van Hees @ 2019-02-25 15:54 UTC (permalink / raw)
  To: netdev

The patches in this set are part of an effort to provide support for tracing tools beyond attaching programs to probes and events and working with the context data they provide. It is also aimed at avoiding adding new helpers for every piece of task information that tracers may want to include in trace data (as was discussed at the Linux Plumbers Conference BPF mini-conference track last year).

One of the main characteristics of tracers is that a variety of information can be collected at the time of a probe firing. When using BPF programs to implement actions to be taken when a probe fires, the most natural source of a large part of this information (task information, probe data, tracer state) is the context that is associated with a BPF program. It is also possible to obtain (most of) this information by means of full access helper calls like probe_read() but that isn't something you want to make available to an unprivileged user.

So we have two areas where BPF programs can be very useful and powerful:

- BPF programs that are attached to a probe or event, operating on the context provided by the probe or event
- BPF programs that implement actions to be taken within the context of a tracing tool

The Linux kernel provides a wealth of probes and events to which we can attach BPF programs. These event sources do not have any knowledge of the tracing tool that might be using them. But being able to use them from any tracing tool is definitely preferable over implementing your own probes and events. We definitely also do not want to 'teach' all the existing probes and events about any possible BPF program type that would like to get called from those probes and events.
So, to illustrate what we're trying to accomplish, consider a kprobe. We can attach a BPF program to it and it will be called with a 'struct pt_regs' context. From the side of our tracing tool, we also want information about the task that triggered the kprobe to fire (beyond what is currently available through helpers) and we want to be able to access that information from a BPF program that implements what should happen when the probe fires (e.g. recording the event and specific data that we are interested in).

The 2nd patch in this set implements a very basic generic tracer program type (BPF_PROG_TYPE_GTRACE) that provides the pt_regs data and select task data in its context. We cannot attach a program of this type to a kprobe because that probe supports BPF_PROG_TYPE_KPROBE instead.

The 1st patch in this set implements a mechanism to solve this issue: it allows a tail-call from one program type to another if the callee type supports conversion of a caller context into a context for the callee. So, in the sample, the BPF_PROG_TYPE_GTRACE program type provides can_cast() and cast_context() functions that support converting a BPF_PROG_TYPE_KPROBE context into a BPF_PROG_TYPE_GTRACE context.

The work flow a tracer can use is:

1. The tracer creates a program array map, and inserts one or more programs of type BPF_PROG_TYPE_GTRACE. These programs implement whatever actions are to be taken when a specific probe fires. This step must be done first so that the program array is initialized with the correct program type. This type needs to be known so that when the calling program is verified, compatibility checking can be performed.
2. The tracer loads a program of type BPF_PROG_TYPE_KPROBE and attaches it to the kprobe we're interested in. This program contains a tail-call to a BPF_PROG_TYPE_GTRACE program in the program array.
3. The kprobe fires and executes our program (of type BPF_PROG_TYPE_KPROBE).
   3.1 The program performs whatever operations that we need to have done at the level of the probe firing.
   3.2 The program performs a tail-call into a program from our program array.
       3.2.1 The execution of the tail-call instruction causes a call to be made to a cast_context() function provided by BPF_PROG_TYPE_GTRACE. This function creates a context structure, and populates it with task information and copies in the pt_regs data from the context that was passed to the BPF_PROG_TYPE_KPROBE program.
       3.2.2 The new context is assigned to R1 (replacing the original context), and execution is transferred to the called program.

The implementation is done in such a way that existing tail-calls will work without any change aside from the fact that the verifier is inserting an instruction right before the tail-call. That instruction simply loads the BPF program type into R4. This ensures that at the time of the tail-call, the program type of the calling program can be passed to the cast_context() function. Knowledge about the program type of an executing program is not available anywhere and we need to know what context we're trying to convert from. The function prototype for the (pseudo-)helper bpf_tail_call declares only 3 arguments so existing code is not affected by this internal use of R4. Obviously, if there is no conversion function or the conversion is not supported, the tail-call will fail because that situation is effectively the same as trying to call a program of an incompatible type.

The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to support what tracers commonly need, and I am also looking at ways to further extend this model to allow more tracer-specific features as well without the need for adding a BPF program type for every tracer.

Kris
^ permalink raw reply [flat|nested] 13+ messages in thread
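The tail-call flow in the cover letter above can be modeled in plain user-space C. This is only a sketch of the idea, not the patch code: the struct layouts, the enum, the placeholder task values and the tail_call() function are all hypothetical stand-ins, and only the names can_cast() and cast_context() and the role of R4 come from the message itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
enum prog_type { PROG_KPROBE, PROG_GTRACE, PROG_OTHER };

struct regs_ctx {               /* models the kprobe context (pt_regs) */
    unsigned long ip;
    unsigned long sp;
};

struct gtrace_ctx {             /* models the richer callee context */
    struct regs_ctx regs;       /* copied from the caller context */
    int pid;                    /* task data added by the cast */
    int ppid;
};

struct prog {
    enum prog_type type;
    int (*run)(void *ctx);
};

/* Callee-side hook: can a context of type 'from' be converted? */
static int gtrace_can_cast(enum prog_type from)
{
    return from == PROG_KPROBE;
}

/* Callee-side hook: build the callee context from the caller context. */
static int gtrace_cast_context(enum prog_type from, const void *src,
                               struct gtrace_ctx *dst)
{
    assert(src != NULL && dst != NULL);
    if (!gtrace_can_cast(from))
        return -1;
    memset(dst, 0, sizeof(*dst));
    dst->regs = *(const struct regs_ctx *)src;
    dst->pid = 1234;            /* placeholder task information */
    dst->ppid = 1;
    return 0;
}

/*
 * Models the tail call. 'caller_type' plays the role of the value that
 * the verifier-inserted instruction loads into R4. A real failed tail
 * call falls through into the caller; here that is modeled as -1.
 */
static int tail_call(enum prog_type caller_type, void *ctx,
                     const struct prog *callee)
{
    struct gtrace_ctx gctx;

    if (callee->type == caller_type)
        return callee->run(ctx);        /* classic same-type tail call */
    if (callee->type == PROG_GTRACE &&
        gtrace_cast_context(caller_type, ctx, &gctx) == 0)
        return callee->run(&gctx);      /* new context replaces the old one */
    return -1;                          /* no conversion: call fails */
}

/* Example action program: reads task data straight from its context. */
static int gtrace_action(void *ctx)
{
    const struct gtrace_ctx *g = ctx;
    return g->ppid;
}
```

Calling tail_call(PROG_KPROBE, &regs, &action) with a gtrace-typed action runs the action on the converted context, while a caller type without a conversion (PROG_OTHER in this model) is rejected the same way an incompatible program type would be.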
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-02-25 15:54 [PATCH 0/2] bpf: context casting for tail call and gtrace prog type Kris Van Hees @ 2019-02-26 6:18 ` Alexei Starovoitov 2019-02-26 6:46 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Alexei Starovoitov @ 2019-02-26 6:18 UTC (permalink / raw) To: Kris Van Hees; +Cc: netdev, bpf, daniel On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote: > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to > support what tracers commonly need, and I am also looking at ways to further > extend this model to allow more tracer-specific features as well without the > need for adding a BPF program types for every tracer. It seems by themselves the patches don't provide any new functionality, but instead look like plumbing to call external code. This is no-go. There were several attempts to do so in the past, so we documented it here: Documentation/bpf/bpf_design_QA.rst Q: New functionality via kernel modules? ---------------------------------------- Q: Can BPF functionality such as new program or map types, new helpers, etc be added out of kernel module code? A: NO. The answer is still the same. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-02-26 6:18 ` Alexei Starovoitov @ 2019-02-26 6:46 ` Kris Van Hees 2019-03-05 18:59 ` Alexei Starovoitov 0 siblings, 1 reply; 13+ messages in thread From: Kris Van Hees @ 2019-02-26 6:46 UTC (permalink / raw) To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote: > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote: > > > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to > > support what tracers commonly need, and I am also looking at ways to further > > extend this model to allow more tracer-specific features as well without the > > need for adding a BPF program types for every tracer. > > It seems by themselves the patches don't provide any new functionality, > but instead look like plumbing to call external code. The patches are definitely not plumbing to call external code, and if I gave that impression I apologise. I overlooked the information you quote below on allowing new functionality through modules when I wrote the comment above but please note that it was a forward-looking comment in terms of what could be done - not a reason for the patches that I submitted. The patches accomplish something that is totally independent from that: they make it possible for existing events that execute BPF programs when triggered to transfer control to a BPF program with a more rich context. The first patch makes such a transfer possible (using tail-call combined with converting the context to the new program type), and the second patch provides one such program type (generic trace). The new functionality provided by the program type is direct access to task information that previously could only be obtained through helper calls. E.g. the new program type allows programs to access the task state, prio, ppid, euid, and egid. 
None of those pieces of information can currently be obtained unless you start poking around in memory using bpf_probe_read() helper calls. > This is no-go. > There were several attempts to do so in the past, so we documented it here: > Documentation/bpf/bpf_design_QA.rst > Q: New functionality via kernel modules? > ---------------------------------------- > Q: Can BPF functionality such as new program or map types, new > helpers, etc be added out of kernel module code? > > A: NO. > > The answer is still the same. Thanks for pointing this out - but again, my reference to modules was merely musing about the possibilities. This information clearly closes the door on that train of thought, but that is not directly related to what I am doing with the patches I submitted. Kris ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-02-26 6:46 ` Kris Van Hees @ 2019-03-05 18:59 ` Alexei Starovoitov 2019-03-06 2:03 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Alexei Starovoitov @ 2019-03-05 18:59 UTC (permalink / raw) To: Kris Van Hees; +Cc: netdev, bpf, daniel On Tue, Feb 26, 2019 at 01:46:01AM -0500, Kris Van Hees wrote: > On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote: > > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote: > > > > > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to > > > support what tracers commonly need, and I am also looking at ways to further > > > extend this model to allow more tracer-specific features as well without the > > > need for adding a BPF program types for every tracer. > > > > It seems by themselves the patches don't provide any new functionality, > > but instead look like plumbing to call external code. > > The patches are definitely not plumbing to call external code, and if I gave > that impression I apologise. I overlooked the information you quote below on > allowing new functionality through modules when I wrote the comment above but > please note that it was a forward-looking comment in terms of what could be > done - not a reason for the patches that I submitted. > > The patches accomplish something that is totally independent from that: they > make it possible for existing events that execute BPF programs when triggered > to transfer control to a BPF program with a more rich context. The first > patch makes such a transfer possible (using tail-call combined with converting > the context to the new program type), and the second patch provides one such > program type (generic trace). The new functionality provided by the program > type is direct access to task information that previously could only be > obtained through helper calls. E.g. 
the new program type allows programs to > access the task state, prio, ppid, euid, and egid. None of those pieces of > information can currently be obtained unless you start poking around in > memory using bpf_probe_read() helper calls. I don't think I understand the problem you're trying to solve. From kprobe/tracepoints/etc a bpf prog can use bpf_probe_read() to read everything. Are you saying direct access to state, prio, ppid, euid, and egid via context is much superior? Why? Because it's more stable? Why stop at these fields then? task_struct has many others. What we observed is that no matter how many fields we add to stable uapi somebody will always request one more. For networking the total number of such fields is contained, but for tracing we're talking about thousands of useful fields. We cannot make them stable. Hence we've been working on an alternative approach via BTF to make all of the kernel internal fields sort-of stable via the 'compile once' technique that we described at the last LPC. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-05 18:59 ` Alexei Starovoitov @ 2019-03-06 2:03 ` Kris Van Hees 2019-03-07 21:30 ` Alexei Starovoitov 0 siblings, 1 reply; 13+ messages in thread From: Kris Van Hees @ 2019-03-06 2:03 UTC (permalink / raw) To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel On Tue, Mar 05, 2019 at 10:59:52AM -0800, Alexei Starovoitov wrote: > On Tue, Feb 26, 2019 at 01:46:01AM -0500, Kris Van Hees wrote: > > On Mon, Feb 25, 2019 at 10:18:25PM -0800, Alexei Starovoitov wrote: > > > On Mon, Feb 25, 2019 at 07:54:13AM -0800, Kris Van Hees wrote: > > > > > > > > The goal is to further extend the BPF_PROG_TYPE_GTRACE implementation to > > > > support what tracers commonly need, and I am also looking at ways to further > > > > extend this model to allow more tracer-specific features as well without the > > > > need for adding a BPF program types for every tracer. > > > > > > It seems by themselves the patches don't provide any new functionality, > > > but instead look like plumbing to call external code. > > > > The patches are definitely not plumbing to call external code, and if I gave > > that impression I apologise. I overlooked the information you quote below on > > allowing new functionality through modules when I wrote the comment above but > > please note that it was a forward-looking comment in terms of what could be > > done - not a reason for the patches that I submitted. > > > > The patches accomplish something that is totally independent from that: they > > make it possible for existing events that execute BPF programs when triggered > > to transfer control to a BPF program with a more rich context. The first > > patch makes such a transfer possible (using tail-call combined with converting > > the context to the new program type), and the second patch provides one such > > program type (generic trace). 
The new functionality provided by the program > > type is direct access to task information that previously could only be > > obtained through helper calls. E.g. the new program type allows programs to > > access the task state, prio, ppid, euid, and egid. None of those pieces of > > information can currently be obtained unless you start poking around in > > memory using bpf_probe_read() helper calls. > > I don't think I understand the problem you're trying to solve. > From kprobe/tracepoints/etc bpf prog can use bpf_probe_read() to read everything. > Are you saying direct access to state, prio, ppid, euid, and egid via context > is much superior? Why? Because it's more stable? When you provide tracing to non-privileged users you definitely do not want to allow BPF programs to access any memory they want in kernel space, yet you would still want to be able to provide a decent amount of information about tasks at time of probe firing. > Why stop at these fields then? task_struct has many others. > > What we observed that no matter how many fields we add to stable uapi > somebody will always request one more. For networking the total number of > such fields is contained, but for tracing we're talking about thousands > of useful fields. We cannot make them stable. > Hence we've been working on alternative approach via BTF to make all > of kernel internal fields sort-of stable via 'compile once' technique that > we described at the last LPC. Sure, but the ones I put in there were an example of how this can be used. And again, in the case of unprivileged tracing, this easily becomes an issue about where you end up enforcing what a tracing program can do and cannot do. There will always be cases where more than the 'standard' information is needed for a tracing task, and then it would be quite reasonable to conclude that a higher level of privileges is required to accomplish that - but that shouldn't prevent unprivileged tracing from being able to be useful as well.
Again, the limited set of fields I put in there right now is a matter of showing how this can be used. It is certainly meant to be expanded quite a bit.

The primary reason though behind the context conversion approach and the generic tracing program type and context is that tracing on Linux based on the existing kernel facilities limits the userspace tools because userspace has quite limited control over what happens when a probe/event fires. One of the features of advanced tracing tools has been the ability to have more (safe) control over what happens when the probe/event fires and how data is stored in output buffers. Since the userspace tool is the one requesting data and ultimately processes the generated data, it stands to reason that it would benefit from being able to have more freedom in that area. But that means it needs to be able to provide a BPF program of a type that more closely relates to the tracing tool functionality rather than the probe or event itself (especially since probes and events are very specific, and by their very nature should not really care about how userspace uses information).

This is again even more true for unprivileged tracing - right now there is a lot of useful task information that you cannot get to without bpf_probe_read() but unprivileged users really shouldn't be able to just read arbitrary kernel memory.

So in summary, I am trying to solve two (related) problems:

- Ensure that unprivileged tracing can obtain information about the task that triggered a probe or event. There will always be limitations but we can do better than is available now.
- Allow tracing tools the ability to provide actions to be performed when a probe or event fires, beyond what the individual BPF program types allow for the specific probe/event types (and do it in a generic manner, in a sense encapsulating multiple probe/event types in a more generic tracing context).
A patch I am currently working on ties into this (and I hope to get it ready sometime next week). It builds on the support you already have for accessing packet data from the __sk_buff context. If we can make this same functionality available to other contexts as well, my goal would be to make it possible for the generic tracing context to have a buffer (data and data_end members) that the BPF program can issue direct stores to as a means to allow a tracing program to control how data is written into the buffer. I am still working out some details but I have a prototype working, and it retains all safety provisions that BPF offers us. But being able to do things like this without needing to touch the context of any other BPF program type is a great benefit to offer tracing tools, as far as I see it. Kris ^ permalink raw reply [flat|nested] 13+ messages in thread
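The data/data_end buffer idea in the message above relies on the same pattern used for direct packet access: every store must be provably inside [data, data_end) before it happens. The following user-space sketch shows that bounds check only. The struct name trace_buf_ctx and the helper buf_store() are hypothetical; just the member names data and data_end come from the message, and the prototype being described may well look different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context; only data/data_end are named in the message. */
struct trace_buf_ctx {
    uint8_t *data;      /* first writable byte */
    uint8_t *data_end;  /* one past the last writable byte */
};

/*
 * Store 'len' bytes at offset 'off' into the buffer.
 * Returns 0 on success and -1 if the store would leave the buffer.
 * In a BPF program the verifier would require an equivalent check
 * before any direct store through a pointer derived from ctx->data.
 */
static int buf_store(struct trace_buf_ctx *ctx, size_t off,
                     const void *src, size_t len)
{
    size_t avail;

    assert(ctx->data <= ctx->data_end);
    avail = (size_t)(ctx->data_end - ctx->data);

    /* Written this way so that off + len cannot overflow. */
    if (off > avail || len > avail - off)
        return -1;

    memcpy(ctx->data + off, src, len);
    return 0;
}
```

With an 8-byte backing array, two 4-byte stores at offsets 0 and 4 succeed, and a 4-byte store at offset 5 is refused because it would cross data_end.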
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-06 2:03 ` Kris Van Hees @ 2019-03-07 21:30 ` Alexei Starovoitov 2019-03-11 14:21 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Alexei Starovoitov @ 2019-03-07 21:30 UTC (permalink / raw) To: Kris Van Hees; +Cc: netdev, bpf, daniel On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > So in summary, I am trying to solve two (related) problems: > > - Ensure that unprivileged tracing can obtain information about the task that > triggered a probe or event. There will always be limitations but we can do > better than is available now. I think unprivileged tracing is a huge topic on its own. It's too easy to create security holes with such a mechanism. kprobe/tracepoints/etc have been historically root only and I don't see a way for them to become unpriv. imo the existing /proc/pid/status is already more powerful than what you're proposing with the gtrace context. > - Allow tracing tools ab ability to provide actions to be performed when a > probe or event fires, beyond what the individual BPF program types allow > for the specific probe/event types (and do it in a generic manner, in a > sense encapsulating multiple probe/event types in a more generic tracing > context). I think existing bpf tracing is generic whereas the proposed gtrace is not generic at all. 'generic' is a loaded word. We can throw it back and forth and won't make any forward progress. Let's focus on the technical bits, ok? > A patch I am currently working on ties into this (and I hope to get it ready > sometime next week). It builds on the support you already have for accessing > packet data from the __sk_buff context.
If we can make this same functionality > available to other contexts as well, my goal would be to make it possible for > the generic tracing context to have a buffer (data and data_end members) that > the BPF program can issue direct stores to as a means to allow a tracing > program to control how data is written into the buffer. Sounds like you're trying to reinvent the bpf_perf_event_output() mechanism. > But being able to do things like this without > needing to touch the context of any other BPF program type is a great benefit > to offer tracing tools, as far as I see it. I still don't understand what you're referring to by 'things like this' that somehow will be possible in the future, but not possible today. Could you please give a concrete example? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-07 21:30 ` Alexei Starovoitov @ 2019-03-11 14:21 ` Kris Van Hees 2019-03-12 1:29 ` Brendan Gregg 0 siblings, 1 reply; 13+ messages in thread From: Kris Van Hees @ 2019-03-11 14:21 UTC (permalink / raw) To: Alexei Starovoitov; +Cc: Kris Van Hees, netdev, bpf, daniel On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > > > So in summary, I am trying to solve two (related) problems: > > > > - Ensure that unprivileged tracing can obtain information about the task that > > triggered a probe or event. There will always be limitations but we can do > > better than is available now. > > I think unprivileged tracing is a huge topic on its own. > It's too easy to create security holes with such mechanism. > kprobe/tracepoints/etc have been historically root only and I don't see a way > for them to become unpriv. I agree it is a huge topic and one that needs very careful attention, but I believe it certainly can be done (DTrace on Solaris provided it, and while we haven't implemented it on Linux thus far, the availability of BPF actually makes it more realistic). This remains an important goal for us, and while it will take a while to get there, we certainly want to contribute in this area. > imo the existing /proc/pid/status is already more powerful than > what you're proposing with gtrace context. The problem with /proc/pid/status is that it cannot be read at the exact same time as when a probe fires. And when you're doing tracing, that is one of the main requirements: looking at the system at the moment of the probe firing. Surely, /proc/pid/status contains more information right now but as I mentioned in an earlier email, the gtrace context I proposed so far was merely showing what I am trying to do - it is nowhere near the final version.
> > - Allow tracing tools ab ability to provide actions to be performed when a > > probe or event fires, beyond what the individual BPF program types allow > > for the specific probe/event types (and do it in a generic manner, in a > > sense encapsulating multiple probe/event types in a more generic tracing > > context). > > I think existing bpf tracing is generic whereas proposed gtrace is not generic at all. > 'generic' is a loaded word. we can throw it back and forth and won't make > any forward progress. Let's focus on technical bits, ok? Sure, I definitely agree that we can mean very different things with 'generic'. It is important then to explain what I mean here though because it is rather crucial to the design. From the perspective of the tracing that we want to be able to do, we are mostly interested in being able to look at the system as a whole through the sequence of probes that fire. In that way, the specific context of each probe is important, but the actions to be executed when probes fire need more than just that probe context - they also need information about the task that is executing. An example would be to trace whether non-root executables (often scripts) make use of setuid executables that could potentially be a security risk. You end up tracing through a possibly complex tree of task clones, tasks issuing exec syscalls, and performing file operations. The probes themselves are pretty meaningless without the larger context. It is the task context combined with the probe specifics that provide the information you need to execute meaningful actions when a probe fires. And that information needs to be obtained when the probe fires - reading it from /proc when the probe data reaches userspace is often too late.
Think of the case where you need to trace information when an exec syscall is executed - by the time tracing data is available in a buffer for userspace to process, the syscall will usually have completed and userspace will only be able to obtain task info about the state *after* the exec took place. The majority of tracing use-cases that I encounter relate to observing one or more tasks rather than probes without caring about the task context, and I would argue that most use-cases I have read about from other people match this observation. Since we want to be able to use BPF as the execution engine for the probe actions, it seems to make sense to me that the BPF context available to those programs would therefore be providing the task data and probe data, so that the programs can be more naturally written based on the context that makes sense for the functionality they are providing. So, my patches provide an initial implementation of a BPF program context (and a mechanism to execute programs in it as a result of probes triggering) that serves the perspective of the tracer looking at tasks. > > A patch I am currently working on ties into this (and I hope to get it ready > > sometime next week). It builds on the support you already have for accessing > > packet data from the __sk_buff context. If we can make this same functionality > > available to other contexts as well, my goal would be to make it possible for > > the generic tracing context to have a buffer (data and data_end members) that > > the BPF program can issue direct stores to as a means to allow a tracing > > program to control how data is written into the buffer. > > sounds like you're trying to reinvent bpf_perf_event_output() mechanism. It sure sounds like that, but the difference is that bpf_perf_event_output() encapsulates data you are providing in a specific format already.
Most of the tracers that support BPF in some way seem to pick a perf_event output type that allows providing raw data and use that to write out what they need to pass to userspace. Our situation is a bit different because we have an existing tracing tool (DTrace) with its own requirements on what the output buffer data is supposed to look like. When I looked at the options of using (somewhat abusing) the bpf_perf_event_output() mechanism as a vehicle to get data to userspace vs supporting DTrace's format, I concluded that being able to let the tracer (DTrace in my case) define the output format is a benefit because it means others can do the same (if they want to). And given that DTrace works with multiple buffer types and with things like speculation buffers (buffers that are used as temporary output store in place of the default buffer - in a way that the rest of the action need not be aware of - and written to the default buffer upon commit, or discarded when not needed), bpf_perf_event_output() is not sufficient. > > But being able to do things like this without > > needing to touch the context of any other BPF program type is a great benefit > > to offer tracing tools, as far as I see it. > > I still don't understand what you're referring to by 'things like this' > that somehow will be possible in the future, but not possible today. > Could you please give concrete example? My apologies for not being clear. I am referring to the features of the gtrace context in terms of containing task information, and output buffers to be used in BPF programs triggered from various probe sources (kprobe, tracepoints, ...) I would not want to suggest making changes to all the different program contexts in order to support tracing needs because that would be wrong. Doing it in a central place makes it a lot easier to maintain without impacting other program types, etc.
Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used to implement a lot of what existing tracing tools like DTrace can do, if you write them based on that. One limitation I am obviously working with is that DTrace already exists and has existed for a long time. And while it is 100% available as open source, it involves a pretty involved set of patches to be applied to the kernel to be able to use it which is just not ideal. Hence the goal to make it available by re-using as much of the existing features in Linux as possible, while still maintaining the same level of functionality in DTrace. That means we need to fill the gaps - and from where I am sitting, the ways to do that might as well be of use to others (if they want to). If phrasing things in the context of DTrace would make the conversation easier I certainly don't mind doing that, but I really don't want to limit my patches to supporting just DTrace (even if right now it might be the only tracer using it). Kris ^ permalink raw reply [flat|nested] 13+ messages in thread
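The speculation buffers mentioned in the message above (temporary output that is either committed to the default buffer or discarded) can be pictured with a small user-space model. This is a sketch of the concept as described in the message, not DTrace's or the kernel's implementation; the names out_buf, buf_append(), spec_commit() and spec_discard() and the fixed capacity are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAP 64              /* arbitrary capacity for the sketch */

struct out_buf {
    unsigned char bytes[BUF_CAP];
    size_t len;                 /* number of valid bytes */
};

/*
 * Append n bytes; returns 0 on success, -1 if they do not fit.
 * The action code uses this for either kind of buffer, so it does
 * not need to know whether it is writing speculatively.
 */
static int buf_append(struct out_buf *b, const void *src, size_t n)
{
    assert(b->len <= BUF_CAP);
    if (n > BUF_CAP - b->len)
        return -1;
    memcpy(b->bytes + b->len, src, n);
    b->len += n;
    return 0;
}

/* Commit: move the speculative data into the default buffer. */
static int spec_commit(struct out_buf *spec, struct out_buf *dflt)
{
    if (buf_append(dflt, spec->bytes, spec->len) != 0)
        return -1;              /* speculative data is kept on failure */
    spec->len = 0;
    return 0;
}

/* Discard: drop the speculative data without touching the default buffer. */
static void spec_discard(struct out_buf *spec)
{
    spec->len = 0;
}
```

Data appended to the speculative buffer stays invisible in the default buffer until spec_commit() succeeds, and spec_discard() throws it away, which is the behavior a single fixed-format output call cannot express on its own.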
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-11 14:21 ` Kris Van Hees @ 2019-03-12 1:29 ` Brendan Gregg 2019-03-12 3:24 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Brendan Gregg @ 2019-03-12 1:29 UTC (permalink / raw) To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: [...] > > > But being able to do things like this without > > > needing to touch the context of any other BPF program type is a great benefit > > > to offer tracing tools, as far as I see it. > > > > I still don't understand what you're referring to by 'things like this' > > that somehow will be possible in the future, but not possible today. > > Could you please give concrete example? > > My apologies for not being clear. I am referring to the features of the > gtrace context in terms of containing task information, and output buffers > to be used in BPF programs triggered from various probe sources (kprobe, > tracepoints, ...) I would not want to suggest making changes to all the > different program contexts in order to support tracing needs because that > would be wrong. Doing it in a central place makes it a lot easier to maintain > without impacting other program types, etc. > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > to implement a lot of what existing tracing tools like DTrace can do, if you > write them based on that. One limitations I am obviously working with is > that DTrace already exists and has existed for a long time. And while it is > 100% available as open source, it involves a pretty involved set of patches to > be applied to the kernel to be able to use it which is just not ideal. 
Hence > the goal to make it available by re-using as much of the existing features in > Linux as possible, while still maintaining the same level of functionality in > DTrace. That means we need to fill the gaps - and from where I am sitting, > the ways to do that might as well be of use to others (if they want to). > > If phrasing things in the context of DTrace would make the conversation easier > I certainly don;t mind doing that, but I really don't want to limit my patches > to supporting just DTrace (even if right now it might be the only tracer using > it). As a concrete example, can you point to one of my own published DTrace tools that BPF can't do? These were created to solve many real production issues, and make good use cases. I've been porting them over to BPF (bcc and bpftrace) without too much problem, and I can't think of a single one that I couldn't port over today. There's a few minor things that I'm currently doing workarounds for, like ppid, but that should be satisfied with a few more helpers. And if it's really niche, then BTF sounds like a good solution. If your ultimate goal is to have a command called "dtrace" that runs D programs, to support your existing users, then I'd add a lex/yacc pair to bpftrace and have it emit a dtrace binary. Brendan -- Brendan Gregg, Senior Performance Architect, Netflix ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-12 1:29 ` Brendan Gregg @ 2019-03-12 3:24 ` Kris Van Hees 2019-03-12 6:03 ` Brendan Gregg 0 siblings, 1 reply; 13+ messages in thread From: Kris Van Hees @ 2019-03-12 3:24 UTC (permalink / raw) To: Brendan Gregg Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote: > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > [...] > > > > But being able to do things like this without > > > > needing to touch the context of any other BPF program type is a great benefit > > > > to offer tracing tools, as far as I see it. > > > > > > I still don't understand what you're referring to by 'things like this' > > > that somehow will be possible in the future, but not possible today. > > > Could you please give concrete example? > > > > My apologies for not being clear. I am referring to the features of the > > gtrace context in terms of containing task information, and output buffers > > to be used in BPF programs triggered from various probe sources (kprobe, > > tracepoints, ...) I would not want to suggest making changes to all the > > different program contexts in order to support tracing needs because that > > would be wrong. Doing it in a central place makes it a lot easier to maintain > > without impacting other program types, etc. > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > > to implement a lot of what existing tracing tools like DTrace can do, if you > > write them based on that. One limitations I am obviously working with is > > that DTrace already exists and has existed for a long time. 
And while it is > > 100% available as open source, it involves a pretty involved set of patches to > > be applied to the kernel to be able to use it which is just not ideal. Hence > > the goal to make it available by re-using as much of the existing features in > > Linux as possible, while still maintaining the same level of functionality in > > DTrace. That means we need to fill the gaps - and from where I am sitting, > > the ways to do that might as well be of use to others (if they want to). > > > > If phrasing things in the context of DTrace would make the conversation easier > > I certainly don;t mind doing that, but I really don't want to limit my patches > > to supporting just DTrace (even if right now it might be the only tracer using > > it). > > As a concrete example, can you point to one of my own published DTrace > tools that BPF can't do? These were created to solve many real > production issues, and make good use cases. I've been porting them > over to BPF (bcc and bpftrace) without too much problem, and I can't > think of a single one that I couldn't port over today. I am unclear how pointing at one of your published DTrace tools would contribute to this discussion. Surely the scope of use cases is not limited to the DTrace scripts you published? Either way, one of the features that I make use of is speculative tracing. And yes, even that could be handled with some ugly workarounds but my intent is to implement things in a more clean way rather than depending on a bunch of workarounds to make it somewhat work. > There's a few minor things that I'm currently doing workarounds for, > like ppid, but that should be satisfied with a few more helpers. And > if it's really niche, then BTF sounds like a good solution. 
Of course, we can always add more helpers to get to information that is needed, but that is hardly a practical solution in the long run, and at Plumbers 2018 it was already indicated that just adding helpers to get to more information about tasks is not the route people want to take. > If your ultimate goal is to have a command called "dtrace" that runs D > programs, to support your existing users, then I'd add a lex/yacc pair > to bpftrace and have it emit a dtrace binary. My goal is not to have a command called dtrace that somehow simply provides some form of support for dtrace scripts in some legacy support model. My goal is to make DTrace available on Linux based on existing kernel features (and contributing extra features where needed, in a collaborative manner). DTrace is currently already available as open source for Linux but it involves a much too invasive set of patches to the kernel, often (almost) duplicating functionality that is already present. That's not a good solution. Working on implementing the kernel portion to make use of kernel features has brought to light some areas where contributions can help avoid workarounds and provide mechanisms that can be of use to other tracing solutions as well. That is the basis for my patches. Kris ^ permalink raw reply [flat|nested] 13+ messages in thread
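[Editorial sketch, not part of the original message.] The alternative to per-field helpers that this reply argues for is the mechanism from the cover letter: the callee program type supplies can_cast() and cast_context(), so that a kprobe program can tail-call a gtrace program whose context carries both the pt_regs data and selected task data. The following is only a user-space model of that idea in Python, under loud assumptions: every name other than can_cast/cast_context and the two program types is hypothetical, the task fields are simply the ones mentioned in this thread, and the compatibility check is done at call time here, whereas the cover letter describes it as happening when the calling program is verified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

KPROBE = "kprobe"   # stands in for BPF_PROG_TYPE_KPROBE
GTRACE = "gtrace"   # stands in for BPF_PROG_TYPE_GTRACE


@dataclass
class TaskInfo:
    # Hypothetical selection of task data (fields named in this thread).
    pid: int
    ppid: int
    euid: int
    egid: int


@dataclass
class KprobeContext:
    regs: Dict[str, int]   # stand-in for struct pt_regs


@dataclass
class GtraceContext:
    regs: Dict[str, int]   # register data carried over from the caller
    task: TaskInfo         # task data added by the cast


def gtrace_can_cast(caller_type: str) -> bool:
    """Report whether a caller context of this type can be converted."""
    return caller_type == KPROBE


def gtrace_cast_context(ctx: KprobeContext, task: TaskInfo) -> GtraceContext:
    """Build the callee context from the caller context plus task data."""
    return GtraceContext(regs=dict(ctx.regs), task=task)


def tail_call(caller_type: str, ctx: KprobeContext, task: TaskInfo,
              callee: Callable[[GtraceContext], int]) -> Optional[int]:
    """Run a gtrace callee if the caller's context can be cast, else None."""
    if not gtrace_can_cast(caller_type):
        return None        # models a tail call that is not taken
    return callee(gtrace_cast_context(ctx, task))


def record_ppid(ctx: GtraceContext) -> int:
    # The action reads ppid from its context instead of calling a helper.
    return ctx.task.ppid


task = TaskInfo(pid=4242, ppid=1, euid=1000, egid=1000)
kctx = KprobeContext(regs={"ip": 0xDEADBEEF})
print(tail_call(KPROBE, kctx, task, record_ppid))
print(tail_call("xdp", kctx, task, record_ppid))
```

The point of the model is only the shape of the design: the kprobe side stays unaware of the tracer, and the knowledge of how to enrich the context lives with the callee program type.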
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-12 3:24 ` Kris Van Hees @ 2019-03-12 6:03 ` Brendan Gregg 2019-03-12 16:53 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Brendan Gregg @ 2019-03-12 6:03 UTC (permalink / raw) To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote: > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > [...] > > > > > But being able to do things like this without > > > > > needing to touch the context of any other BPF program type is a great benefit > > > > > to offer tracing tools, as far as I see it. > > > > > > > > I still don't understand what you're referring to by 'things like this' > > > > that somehow will be possible in the future, but not possible today. > > > > Could you please give concrete example? > > > > > > My apologies for not being clear. I am referring to the features of the > > > gtrace context in terms of containing task information, and output buffers > > > to be used in BPF programs triggered from various probe sources (kprobe, > > > tracepoints, ...) I would not want to suggest making changes to all the > > > different program contexts in order to support tracing needs because that > > > would be wrong. Doing it in a central place makes it a lot easier to maintain > > > without impacting other program types, etc. > > > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > > > to implement a lot of what existing tracing tools like DTrace can do, if you > > > write them based on that. 
One limitations I am obviously working with is > > > that DTrace already exists and has existed for a long time. And while it is > > > 100% available as open source, it involves a pretty involved set of patches to > > > be applied to the kernel to be able to use it which is just not ideal. Hence > > > the goal to make it available by re-using as much of the existing features in > > > Linux as possible, while still maintaining the same level of functionality in > > > DTrace. That means we need to fill the gaps - and from where I am sitting, > > > the ways to do that might as well be of use to others (if they want to). > > > > > > If phrasing things in the context of DTrace would make the conversation easier > > > I certainly don;t mind doing that, but I really don't want to limit my patches > > > to supporting just DTrace (even if right now it might be the only tracer using > > > it). > > > > As a concrete example, can you point to one of my own published DTrace > > tools that BPF can't do? These were created to solve many real > > production issues, and make good use cases. I've been porting them > > over to BPF (bcc and bpftrace) without too much problem, and I can't > > think of a single one that I couldn't port over today. > > I am unclear how pointing at one of your published DTrace tools would > contribute to this discussion. Surely the scope of use cases is not limited > to the DTrace scripts you published? > > Either way, one of the features that I make use of is speculative tracing. The subject was a concrete example. I don't think I used speculative tracing in any of my scripts. Can you pick a better example of something BPF can't do? > And yes, even that could be handled with some ugly workarounds but my intent > is to implement things in a more clean way rather than depending on a bunch > of workarounds to make it somewhat work. 
> > > There's a few minor things that I'm currently doing workarounds for, > > like ppid, but that should be satisfied with a few more helpers. And > > if it's really niche, then BTF sounds like a good solution. > > Of course, we can always add more helpers to get to information that is > needed, but that is hardly a practical solution in the long run, and at > Plumbers 2018 it was already indicated that just adding helpers to get to > more information about tasks is not the route people want to take. > > > If your ultimate goal is to have a command called "dtrace" that runs D > > programs, to support your existing users, then I'd add a lex/yacc pair > > to bpftrace and have it emit a dtrace binary. > > My goal is not to have a command called dtrace that somehow simply provides > some form of support for dtrace scripts in some legacy support model. My > goal is to make DTrace available on Linux based on existing kernel features > (and contributing extra features where needed, in a collaborative manner). If bpftrace builds a dtrace binary that runs D code, then you just did make DTrace available on Linux. And without changing the kernel. Brendan -- Brendan Gregg, Senior Performance Architect, Netflix ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-12 6:03 ` Brendan Gregg @ 2019-03-12 16:53 ` Kris Van Hees 2019-03-13 16:30 ` Brendan Gregg 0 siblings, 1 reply; 13+ messages in thread From: Kris Van Hees @ 2019-03-12 16:53 UTC (permalink / raw) To: Brendan Gregg Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote: > On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote: > > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > > [...] > > > > > > But being able to do things like this without > > > > > > needing to touch the context of any other BPF program type is a great benefit > > > > > > to offer tracing tools, as far as I see it. > > > > > > > > > > I still don't understand what you're referring to by 'things like this' > > > > > that somehow will be possible in the future, but not possible today. > > > > > Could you please give concrete example? > > > > > > > > My apologies for not being clear. I am referring to the features of the > > > > gtrace context in terms of containing task information, and output buffers > > > > to be used in BPF programs triggered from various probe sources (kprobe, > > > > tracepoints, ...) I would not want to suggest making changes to all the > > > > different program contexts in order to support tracing needs because that > > > > would be wrong. Doing it in a central place makes it a lot easier to maintain > > > > without impacting other program types, etc. 
> > > > > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > > > > to implement a lot of what existing tracing tools like DTrace can do, if you > > > > write them based on that. One limitations I am obviously working with is > > > > that DTrace already exists and has existed for a long time. And while it is > > > > 100% available as open source, it involves a pretty involved set of patches to > > > > be applied to the kernel to be able to use it which is just not ideal. Hence > > > > the goal to make it available by re-using as much of the existing features in > > > > Linux as possible, while still maintaining the same level of functionality in > > > > DTrace. That means we need to fill the gaps - and from where I am sitting, > > > > the ways to do that might as well be of use to others (if they want to). > > > > > > > > If phrasing things in the context of DTrace would make the conversation easier > > > > I certainly don;t mind doing that, but I really don't want to limit my patches > > > > to supporting just DTrace (even if right now it might be the only tracer using > > > > it). > > > > > > As a concrete example, can you point to one of my own published DTrace > > > tools that BPF can't do? These were created to solve many real > > > production issues, and make good use cases. I've been porting them > > > over to BPF (bcc and bpftrace) without too much problem, and I can't > > > think of a single one that I couldn't port over today. > > > > I am unclear how pointing at one of your published DTrace tools would > > contribute to this discussion. Surely the scope of use cases is not limited > > to the DTrace scripts you published? > > > > Either way, one of the features that I make use of is speculative tracing. > > The subject was a concrete example. I don't think I used speculative > tracing in any of my scripts. Can you pick a better example of > something BPF can't do? 
Well, then speculative tracing is a good example of something that cannot be done right now. The specopen.d script that is part of the DTrace documentation and is also featured in the test suite makes for a concrete example. That script has been used as a kind of template script in situations where we had to analyze code paths associated with a specific subset of conditions that could not be known beforehand. Are there other ways to accomplish the same thing? Sure. But I don't think that is really the point. There are often multiple tools that can do the same thing (or close to the same thing), and people have the option to choose one or the other. DTrace is one of those tools, as are systemtap, bpftrace, perf, and other tools. > > And yes, even that could be handled with some ugly workarounds but my intent > > is to implement things in a more clean way rather than depending on a bunch > > of workarounds to make it somewhat work. > > > > > There's a few minor things that I'm currently doing workarounds for, > > > like ppid, but that should be satisfied with a few more helpers. And > > > if it's really niche, then BTF sounds like a good solution. > > > > Of course, we can always add more helpers to get to information that is > > needed, but that is hardly a practical solution in the long run, and at > > Plumbers 2018 it was already indicated that just adding helpers to get to > > more information about tasks is not the route people want to take. > > > > > If your ultimate goal is to have a command called "dtrace" that runs D > > > programs, to support your existing users, then I'd add a lex/yacc pair > > > to bpftrace and have it emit a dtrace binary. > > > > My goal is not to have a command called dtrace that somehow simply provides > > some form of support for dtrace scripts in some legacy support model. My > > goal is to make DTrace available on Linux based on existing kernel features > > (and contributing extra features where needed, in a collaborative manner). 
> > If bpftrace builds a dtrace binary that runs D code, then you just did > make DTrace available on Linux. > > And without changing the kernel. If bpftrace could do everything that DTrace does, in a way that is 100% transparent to the user, and without requiring extra dependencies like e.g. software development packages (llvm, etc) to be installed on the system where it will be used in production, I think we wouldn't be having this conversation. Anyway, I feel we're getting off track here... the discussion is not about whether bpftrace can do what dtrace can do, or any other tool for that matter. There is a need for DTrace on Linux and I am working on making that possible in a way where DTrace is one among multiple tracing tools, and by leveraging existing features in the kernel to the extent possible rather than putting it in as an almost self-contained monolith. We already made those patches available a while ago - but that isn't the right way to go about this in the long run and it isn't a benefit to the overall community because there isn't any good way other tools can make use of it. As part of doing the work to contribute DTrace as a tool within the Linux tracing framework I identify areas where there are gaps in terms of what we need, and I contribute patches that fill those gaps in a way that makes it possible for others to make use of those features as well. Kris ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-12 16:53 ` Kris Van Hees @ 2019-03-13 16:30 ` Brendan Gregg 2019-03-16 1:49 ` Kris Van Hees 0 siblings, 1 reply; 13+ messages in thread From: Brendan Gregg @ 2019-03-13 16:30 UTC (permalink / raw) To: Kris Van Hees; +Cc: Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Tue, Mar 12, 2019 at 9:54 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote: > > On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote: > > > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > > > [...] > > > > > > > But being able to do things like this without > > > > > > > needing to touch the context of any other BPF program type is a great benefit > > > > > > > to offer tracing tools, as far as I see it. > > > > > > > > > > > > I still don't understand what you're referring to by 'things like this' > > > > > > that somehow will be possible in the future, but not possible today. > > > > > > Could you please give concrete example? > > > > > > > > > > My apologies for not being clear. I am referring to the features of the > > > > > gtrace context in terms of containing task information, and output buffers > > > > > to be used in BPF programs triggered from various probe sources (kprobe, > > > > > tracepoints, ...) I would not want to suggest making changes to all the > > > > > different program contexts in order to support tracing needs because that > > > > > would be wrong. Doing it in a central place makes it a lot easier to maintain > > > > > without impacting other program types, etc. 
> > > > > > > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > > > > > to implement a lot of what existing tracing tools like DTrace can do, if you > > > > > write them based on that. One limitations I am obviously working with is > > > > > that DTrace already exists and has existed for a long time. And while it is > > > > > 100% available as open source, it involves a pretty involved set of patches to > > > > > be applied to the kernel to be able to use it which is just not ideal. Hence > > > > > the goal to make it available by re-using as much of the existing features in > > > > > Linux as possible, while still maintaining the same level of functionality in > > > > > DTrace. That means we need to fill the gaps - and from where I am sitting, > > > > > the ways to do that might as well be of use to others (if they want to). > > > > > > > > > > If phrasing things in the context of DTrace would make the conversation easier > > > > > I certainly don;t mind doing that, but I really don't want to limit my patches > > > > > to supporting just DTrace (even if right now it might be the only tracer using > > > > > it). > > > > > > > > As a concrete example, can you point to one of my own published DTrace > > > > tools that BPF can't do? These were created to solve many real > > > > production issues, and make good use cases. I've been porting them > > > > over to BPF (bcc and bpftrace) without too much problem, and I can't > > > > think of a single one that I couldn't port over today. > > > > > > I am unclear how pointing at one of your published DTrace tools would > > > contribute to this discussion. Surely the scope of use cases is not limited > > > to the DTrace scripts you published? > > > > > > Either way, one of the features that I make use of is speculative tracing. > > > > The subject was a concrete example. I don't think I used speculative > > tracing in any of my scripts. Can you pick a better example of > > something BPF can't do? 
> > Well, then speculative tracing is a good example of something that cannot be > done right now. The specopen.d script that is part of the DTrace documentation > and is also featured in the test suite makes for a concrete example. That > script has been used as a kind of template script in situations where we had > to analyze code paths associated with a specific subset of conditions that > could not be known beforehand. I used spec tracing zero times in any of my published scripts from fifteen years of using DTrace, and zero times in my 1152-page DTrace book. Might I find a use one day? Maybe, just as you have. But I wouldn't call it a great example of necessary functionality! Besides, I bet we could implement spec tracing using existing BPF maps if we wanted -- I just haven't had that need. > > Are there other ways to accomplish the same thing? Sure. But I don't think > that is really the point. There are often multiple tools that can do the same > thing (or close to the same thing), and people have the option to choose one > or the other. DTrace is one of those tools, as are systemtap, bpftrace, perf, > and other tools. > > > > And yes, even that could be handled with some ugly workarounds but my intent > > > is to implement things in a more clean way rather than depending on a bunch > > > of workarounds to make it somewhat work. > > > > > > > There's a few minor things that I'm currently doing workarounds for, > > > > like ppid, but that should be satisfied with a few more helpers. And > > > > if it's really niche, then BTF sounds like a good solution. > > > > > > Of course, we can always add more helpers to get to information that is > > > needed, but that is hardly a practical solution in the long run, and at > > > Plumbers 2018 it was already indicated that just adding helpers to get to > > > more information about tasks is not the route people want to take. 
> > > > > > > If your ultimate goal is to have a command called "dtrace" that runs D > > > > programs, to support your existing users, then I'd add a lex/yacc pair > > > > to bpftrace and have it emit a dtrace binary. > > > > > > My goal is not to have a command called dtrace that somehow simply provides > > > some form of support for dtrace scripts in some legacy support model. My > > > goal is to make DTrace available on Linux based on existing kernel features > > > (and contributing extra features where needed, in a collaborative manner). > > > > If bpftrace builds a dtrace binary that runs D code, then you just did > > make DTrace available on Linux. > > > > And without changing the kernel. > > If bpftrace could do everything that DTrace does, in a way that is 100% > transparent to the user, and without requiring extra dependencies like e.g. > software development packages (llvm, etc) to be installed on the system where > it will be used in production, I think we wouldn't be having this conversation. bpftrace is solving that need, and it was built from the ground up to work with eBPF. It's going to provide 100% of useful functionality. Getting it available out of the box is both a user-space and distro problem. > > Anyway, I feel we're getting off track here... the discussion is not about > whether bpftrace can do what dtrace can do, or any other tool for that matter. > There is a need for DTrace on Linux and I am working on making that possible I wanted DTrace on Linux five years ago. Now I don't. We have eBPF, which does more. > in a way where DTrace is one among multiple tracing tools, and by leveraging Multiple tracing tools? This will fracture the user and engineering communities. People have complained enough about Ftrace and perf and BPF -- we don't need a fourth tracing framework that doesn't solve any new problems. We can add a legacy interface to bpftrace for running D programs, to help people transition. That can be done as a user-space parser. 
Brendan -- Brendan Gregg, Senior Performance Architect, Netflix ^ permalink raw reply [flat|nested] 13+ messages in thread
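[Editorial sketch, not part of the original message.] The suggestion above that speculative tracing could be built on existing BPF maps can be illustrated with a small user-space model. This is Python, not BPF, and it says nothing about verifier limits, per-CPU buffering or performance; the class and function names are hypothetical. It only mirrors the DTrace-style operations (speculation, speculate, commit, discard) with a dictionary playing the role of a hash map keyed by speculation ID, in the spirit of the specopen.d use case: trace speculatively, then keep the data only if the call failed.

```python
from typing import Dict, List


class SpeculationMap:
    """User-space stand-in for a map keyed by speculation ID."""

    def __init__(self, max_entries: int = 4) -> None:
        self.max_entries = max_entries
        self._buffers: Dict[int, List[str]] = {}
        self._next_id = 1
        self.output: List[str] = []   # stand-in for the trace output buffer

    def speculation(self) -> int:
        """Allocate a speculative buffer; 0 means none was available."""
        if len(self._buffers) >= self.max_entries:
            return 0
        spec_id = self._next_id
        self._next_id += 1
        self._buffers[spec_id] = []
        return spec_id

    def speculate(self, spec_id: int, record: str) -> None:
        """Append a record to the speculative buffer; ID 0 is ignored."""
        buf = self._buffers.get(spec_id)
        if buf is not None:
            buf.append(record)

    def commit(self, spec_id: int) -> None:
        """Move the speculative records to the output and free the buffer."""
        self.output.extend(self._buffers.pop(spec_id, []))

    def discard(self, spec_id: int) -> None:
        """Drop the speculative records and free the buffer."""
        self._buffers.pop(spec_id, None)


def trace_open(spec_map: SpeculationMap, path: str, errno: int) -> None:
    """Trace one open() call, keeping the records only when it failed."""
    spec_id = spec_map.speculation()
    spec_map.speculate(spec_id, f"open({path!r}) entry")
    spec_map.speculate(spec_id, f"open({path!r}) return errno={errno}")
    if errno != 0:
        spec_map.commit(spec_id)
    else:
        spec_map.discard(spec_id)


spec_map = SpeculationMap()
trace_open(spec_map, "/etc/passwd", 0)      # success: records are discarded
trace_open(spec_map, "/no/such/file", 2)    # failure: records are committed
print(spec_map.output)
```

Whether an equivalent built from real BPF maps would count as a clean implementation or as one of the workarounds discussed in this thread is exactly the point under dispute; the model takes no side on that.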
* Re: [PATCH 0/2] bpf: context casting for tail call and gtrace prog type 2019-03-13 16:30 ` Brendan Gregg @ 2019-03-16 1:49 ` Kris Van Hees 0 siblings, 0 replies; 13+ messages in thread From: Kris Van Hees @ 2019-03-16 1:49 UTC (permalink / raw) To: Brendan Gregg Cc: Kris Van Hees, Alexei Starovoitov, netdev, bpf, Daniel Borkmann On Wed, Mar 13, 2019 at 09:30:02AM -0700, Brendan Gregg wrote: > On Tue, Mar 12, 2019 at 9:54 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > On Mon, Mar 11, 2019 at 11:03:10PM -0700, Brendan Gregg wrote: > > > On Mon, Mar 11, 2019 at 8:24 PM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > > > On Mon, Mar 11, 2019 at 06:29:55PM -0700, Brendan Gregg wrote: > > > > > On Mon, Mar 11, 2019 at 7:21 AM Kris Van Hees <kris.van.hees@oracle.com> wrote: > > > > > > > > > > > > On Thu, Mar 07, 2019 at 01:30:37PM -0800, Alexei Starovoitov wrote: > > > > > > > On Tue, Mar 05, 2019 at 09:03:57PM -0500, Kris Van Hees wrote: > > > > > [...] > > > > > > > > But being able to do things like this without > > > > > > > > needing to touch the context of any other BPF program type is a great benefit > > > > > > > > to offer tracing tools, as far as I see it. > > > > > > > > > > > > > > I still don't understand what you're referring to by 'things like this' > > > > > > > that somehow will be possible in the future, but not possible today. > > > > > > > Could you please give concrete example? > > > > > > > > > > > > My apologies for not being clear. I am referring to the features of the > > > > > > gtrace context in terms of containing task information, and output buffers > > > > > > to be used in BPF programs triggered from various probe sources (kprobe, > > > > > > tracepoints, ...) I would not want to suggest making changes to all the > > > > > > different program contexts in order to support tracing needs because that > > > > > > would be wrong. 
Doing it in a central place makes it a lot easier to maintain > > > > > > without impacting other program types, etc. > > > > > > > > > > > > Of course, yes, bpf_probe_read() and bpf_perf_event_output() can be used > > > > > > to implement a lot of what existing tracing tools like DTrace can do, if you > > > > > > write them based on that. One limitations I am obviously working with is > > > > > > that DTrace already exists and has existed for a long time. And while it is > > > > > > 100% available as open source, it involves a pretty involved set of patches to > > > > > > be applied to the kernel to be able to use it which is just not ideal. Hence > > > > > > the goal to make it available by re-using as much of the existing features in > > > > > > Linux as possible, while still maintaining the same level of functionality in > > > > > > DTrace. That means we need to fill the gaps - and from where I am sitting, > > > > > > the ways to do that might as well be of use to others (if they want to). > > > > > > > > > > > > If phrasing things in the context of DTrace would make the conversation easier > > > > > > I certainly don;t mind doing that, but I really don't want to limit my patches > > > > > > to supporting just DTrace (even if right now it might be the only tracer using > > > > > > it). > > > > > > > > > > As a concrete example, can you point to one of my own published DTrace > > > > > tools that BPF can't do? These were created to solve many real > > > > > production issues, and make good use cases. I've been porting them > > > > > over to BPF (bcc and bpftrace) without too much problem, and I can't > > > > > think of a single one that I couldn't port over today. > > > > > > > > I am unclear how pointing at one of your published DTrace tools would > > > > contribute to this discussion. Surely the scope of use cases is not limited > > > > to the DTrace scripts you published? 
> > > > > > > > Either way, one of the features that I make use of is speculative tracing. > > > > > > The subject was a concrete example. I don't think I used speculative > > > tracing in any of my scripts. Can you pick a better example of > > > something BPF can't do? > > > > Well, then speculative tracing is a good example of something that cannot be > > done right now. The specopen.d script that is part of the DTrace documentation > > and is also featured in the test suite makes for a concrete example. That > > script has been used as a kind of template script in situations where we had > > to analyze code paths associated with a specific subset of conditions that > > could not be known beforehand. > > I used spec tracing zero times in any of my published scripts from > fifteen years of using DTrace, and zero times in my 1152-page DTrace > book. Might I find a use one day? Maybe, just as you have. But I > wouldn't call it a great example of necessary functionality! Perhaps not for you, but it is for me. There are other much simpler things that are currently missing, such as being able to obtain euid/egid, ppid (as you mentioned already), ... There are workarounds possible, or helpers can be added, but I would think that making them available by means of a mechanism that is very much native to BPF (obtaining data from context, as is done for networking packets) makes a lot more sense. > Besides, I bet we could implement spec tracing using existing BPF maps > if we wanted -- I just haven't had that need. > > > > > Are there other ways to accomplish the same thing? Sure. But I don't think > > that is really the point. There are often multiple tools that can do the same > > thing (or close to the same thing), and people have the option to choose one > > or the other. DTrace is one of those tools, as are systemtap, bpftrace, perf, > > and other tools. 
> >
> > > > And yes, even that could be handled with some ugly workarounds but my intent
> > > > is to implement things in a more clean way rather than depending on a bunch
> > > > of workarounds to make it somewhat work.
> > > >
> > > > > There's a few minor things that I'm currently doing workarounds for,
> > > > > like ppid, but that should be satisfied with a few more helpers. And
> > > > > if it's really niche, then BTF sounds like a good solution.
> > > >
> > > > Of course, we can always add more helpers to get to information that is
> > > > needed, but that is hardly a practical solution in the long run, and at
> > > > Plumbers 2018 it was already indicated that just adding helpers to get to
> > > > more information about tasks is not the route people want to take.
> > > >
> > > > > If your ultimate goal is to have a command called "dtrace" that runs D
> > > > > programs, to support your existing users, then I'd add a lex/yacc pair
> > > > > to bpftrace and have it emit a dtrace binary.
> > > >
> > > > My goal is not to have a command called dtrace that somehow simply provides
> > > > some form of support for dtrace scripts in some legacy support model. My
> > > > goal is to make DTrace available on Linux based on existing kernel features
> > > > (and contributing extra features where needed, in a collaborative manner).
> > >
> > > If bpftrace builds a dtrace binary that runs D code, then you just did
> > > make DTrace available on Linux.
> > >
> > > And without changing the kernel.
> >
> > If bpftrace could do everything that DTrace does, in a way that is 100%
> > transparent to the user, and without requiring extra dependencies like e.g.
> > software development packages (llvm, etc) to be installed on the system where
> > it will be used in production, I think we wouldn't be having this conversation.
>
> bpftrace is solving that need, and it was built from the ground up to
> work with eBPF. It's going to provide 100% of useful functionality.
> Getting it available out of the box is both a user-space and distro
> problem.
>
> >
> > Anyway, I feel we're getting off track here... the discussion is not about
> > whether bpftrace can do what dtrace can do, or any other tool for that matter.
> > There is a need for DTrace on Linux and I am working on making that possible
>
> I wanted DTrace on Linux five years ago. Now I don't. We have eBPF,
> which does more.

I respect your personal feeling on this, but there are people who use DTrace
on Linux, and who want DTrace to keep improving and be more based on existing
core kernel features. You say that eBPF does more, but clearly it does not
since there are features DTrace has that eBPF currently does not provide (as
basic as ppid of a task). But really, where does that even matter? If eBPF
ends up being able to do more than the current DTrace can, then a DTrace that
is making use of eBPF will obviously be able to do more as well. That's a
great benefit in my opinion.

> > in a way where DTrace is one among multiple tracing tools, and by leveraging
>
> Multiple tracing tools? This will fracture the user and engineering
> communities. People have complained enough about Ftrace and perf and
> BPF -- we don't need a fourth tracing framework that doesn't solve any
> new problems.

People already use multiple tools and those tools continue being developed.
And I am not working on a tracing framework alongside whatever already exists,
but rather on integrating DTrace with the existing tracing features in the
kernel. Even if you feel you don't need another tracing tool, fact is that
DTrace already exists on Linux even though the patches are not in the upstream
kernel. People use it and like it. Working to make it available without the
rather invasive kernel changes and instead building on the existing kernel
features means we also can contribute new features that expand the
capabilities of eBPF (within its excellent safety mechanisms) and hopefully
other tracing features.
What is wrong with that? There will always be new features added to tools,
and there will always be new tools that emerge. That's the nature of software
development. And people have a tendency to work with tools they like. Should
everyone stop writing C programs because C++ exists?

> We can add a legacy interface to bpftrace for running D programs, to
> help people transition. That can be done as a user-space parser.

Perhaps - but there clearly are features you are missing. And your statement
makes the rather big assumption that people *want* to transition. Why would
you force people to transition, or live with some kind of legacy interface?

But the point of the discussion here is not whether bpftrace can be made to
behave as if it is dtrace - I submitted patches that implement new
functionality. Are you suggesting that those patches should be rejected out of
hand based on the fact that you feel they are unnecessary because people
should use bpftrace instead?

	Kris
Thread overview: 13+ messages
2019-02-25 15:54 [PATCH 0/2] bpf: context casting for tail call and gtrace prog type Kris Van Hees
2019-02-26  6:18 ` Alexei Starovoitov
2019-02-26  6:46 ` Kris Van Hees
2019-03-05 18:59 ` Alexei Starovoitov
2019-03-06  2:03 ` Kris Van Hees
2019-03-07 21:30 ` Alexei Starovoitov
2019-03-11 14:21 ` Kris Van Hees
2019-03-12  1:29 ` Brendan Gregg
2019-03-12  3:24 ` Kris Van Hees
2019-03-12  6:03 ` Brendan Gregg
2019-03-12 16:53 ` Kris Van Hees
2019-03-13 16:30 ` Brendan Gregg
2019-03-16  1:49 ` Kris Van Hees