* ktap and ebpf integration
@ 2014-04-04 1:21 Jovi Zhangwei
  2014-04-04 6:26 ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread

From: Jovi Zhangwei @ 2014-04-04 1:21 UTC
  To: Alexei Starovoitov, Ingo Molnar
  Cc: Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

Hi Alexei,

We have talked a lot about ktap and ebpf integration these days.
Now I think we can dig deeper into some of the technical issues.

Firstly, I want to make sure you support this ktap and ebpf
integration direction. I am aware you have ongoing 'bpf filter'
patch set work, which actually overlaps with the ktap integration
effort (IMO the interface should be unified and simple for users,
so I think a debugfs filter file is not a good interface), so please
let me know your answer about this.

If the answer is yes, then we can go through ebpf core
improvements, for example:

- support global variable access
  This is mandatory for dynamic tracing; otherwise there is no way
  to run a simple script like one that gets function execution time.

- support timers in the kernel
  The final solution must support kernel timers for profiling
  and stack sampling.

- support registering multiple events in one script

- support trace_end

If the answer to the first question is no, and you still believe your
"bpf filter" solution is the correct way, that means there is no
need to integrate ktap and ebpf and no need for any ktap upstream
effort. I'm fine with that; I can then make another technical plan
for ktap.

Thank you.

Jovi

* Re: ktap and ebpf integration
From: Alexei Starovoitov @ 2014-04-04 6:26 UTC
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> Hi Alexei,
>
> We have talked a lot about ktap and ebpf integration these days.
> Now I think we can dig deeper into some of the technical issues.
>
> Firstly, I want to make sure you support this ktap and ebpf
> integration direction. I am aware you have ongoing 'bpf filter'
> patch set work, which actually overlaps with the ktap integration
> effort (IMO the interface should be unified and simple for users,
> so I think a debugfs filter file is not a good interface), so please
> let me know your answer about this.

I think the more choices users have, the better.
I'll continue with C-based filters and you can continue with ktap
syntax. That's OK. We can share all the kernel pieces, like:

1. user:   C -> llvm -> obj_file
   kernel: obj_file -> ibpf_verifier -> ibpf execution engine

2. user:   ktap language -> ktap_compiler -> obj_file
   kernel: obj_file -> ibpf_verifier -> ibpf execution engine

> If the answer is yes, then we can go through ebpf core
> improvements, for example:

In the architecture I'm proposing there are three main pieces:
- a user-facing language and a userspace compiler into the ibpf
  instruction set, stored in an object file format like ELF or
  something simpler
- an in-kernel loader for that object file, with a license and
  instruction verifier
- the ibpf execution engine

The ibpf execution engine can already do all of the requested features.
It's a matter of the loader and verifier accepting them.
For example:

> - support global variable access

From the execution engine's point of view, a global and a stack
variable make no difference. It's a 'ld rY, word ptr [rX]' instruction,
where register rX points to the stack or to some other memory location.
In my old patch set the verifier proved the correctness of stack and
table accesses only, since I didn't see the need for global pointers
yet, but we can add it.

>   This is mandatory for dynamic tracing; otherwise there is no way
>   to run a simple script like one that gets function execution time.

I don't understand the correlation between measuring function
execution time and global variables. I think userspace should be
measuring script execution time. Time sampling within the kernel can
be done from an ibpf program by calling ktime_get().

> - support timers in the kernel
>   The final solution must support kernel timers for profiling
>   and stack sampling.

We can let programs be executed in the kernel by timer events, but I
think it's a userspace task. If userspace can do it without hurting
performance, it probably should. For example, take systemtap's
'iotop.stp', which looks like:

probe vfs.read.return {
  reads[execname()] += bytes_read
}
probe vfs.write.return {
  writes[execname()] += bytes_written
}
# print top 10 IO processes every 5 seconds
probe timer.s(5) {
  foreach (name in writes)
    total_io[name] += writes[name]
  foreach (name in reads)
    total_io[name] += reads[name]
  printf ("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
  ...
}

The first two probe functions belong in the kernel as two independent
ibpf programs that access the 'reads' and 'writes' tables, while
'timer.s' really belongs in userspace. Every 5 seconds it can access
the 'reads' and 'writes' tables, sort them, print them, etc.

The important concept here is a user/kernel shared table.
An ibpf program can read/write it from the kernel, and a userspace
component can read/write it in parallel.
Back in September I posted patches for this style of table access
via netlink.
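Below is a minimal single-threaded C sketch of the iotop split described above. The two "probe" functions only update tables. The periodic merge-and-sort that `timer.s(5)` performs runs on the reader side. The table layout, the 64-slot size, and every function name here (`table_lookup_or_create`, `on_vfs_read_return`, `collect_totals`) are hypothetical illustrations, not the ibpf or netlink interface from the patches. A real kernel-owned table would also need locking or RCU, because probes and userspace access it in parallel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel-owned key/value table; sizes are illustrative. */
#define TABLE_SLOTS 64
#define KEY_LEN 16

struct entry {
    char key[KEY_LEN];   /* e.g. execname() */
    unsigned long value; /* accumulated bytes */
    int used;
};

struct table {
    struct entry slots[TABLE_SLOTS];
};

/* Return the entry for key, creating it if absent; NULL if the table is full. */
static struct entry *table_lookup_or_create(struct table *t, const char *key)
{
    for (int i = 0; i < TABLE_SLOTS; i++)
        if (t->slots[i].used && strncmp(t->slots[i].key, key, KEY_LEN - 1) == 0)
            return &t->slots[i];
    for (int i = 0; i < TABLE_SLOTS; i++) {
        struct entry *e = &t->slots[i];
        if (!e->used) {
            e->used = 1;
            strncpy(e->key, key, KEY_LEN - 1);
            e->key[KEY_LEN - 1] = '\0';
            e->value = 0;
            return e;
        }
    }
    return NULL;
}

/* "Kernel" side: each probe program only updates its own table. */
static void on_vfs_read_return(struct table *reads, const char *comm, unsigned long bytes)
{
    struct entry *e = table_lookup_or_create(reads, comm);
    if (e)
        e->value += bytes;
}

static void on_vfs_write_return(struct table *writes, const char *comm, unsigned long bytes)
{
    struct entry *e = table_lookup_or_create(writes, comm);
    if (e)
        e->value += bytes;
}

/* qsort comparator: larger totals first. */
static int by_value_desc(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->value < y->value) - (x->value > y->value);
}

/* "Userspace" side, run every 5 seconds: merge both tables into totals
 * and sort them. out must have room for TABLE_SLOTS entries. */
static int collect_totals(const struct table *reads, const struct table *writes,
                          struct entry *out)
{
    struct table total;
    const struct table *src[2] = { reads, writes };
    int n = 0;

    memset(&total, 0, sizeof(total));
    for (int s = 0; s < 2; s++)
        for (int i = 0; i < TABLE_SLOTS; i++) {
            const struct entry *e = &src[s]->slots[i];
            struct entry *t;
            if (!e->used)
                continue;
            t = table_lookup_or_create(&total, e->key);
            if (t)
                t->value += e->value;
        }
    for (int i = 0; i < TABLE_SLOTS; i++)
        if (total.slots[i].used)
            out[n++] = total.slots[i];
    qsort(out, (size_t)n, sizeof(*out), by_value_desc);
    return n;
}
```

Only the cheap table update sits on the event path. The sorting and printing that iotop.stp does every 5 seconds stay with the reader.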
Note that an ibpf program doesn't own memory. It can call
'bpf_table_update' to store a key/value pair in a kernel table.
Think of it as a small in-kernel database that an ibpf program can
store data to while userspace reads/writes data at the same time.

> - support registering multiple events in one script

I think it should be clear now that it's already supported.
One ibpf program == one function. An object file may contain
multiple programs that attach to different kprobe events and store
key/value pairs into the same or different tables.
From the verifier's point of view these programs are disjoint.
They cannot call each other. The verifier checks them independently.

> - support trace_end

If you mean the final printout of everything, then it's a userspace
task.

Thanks
Alexei
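The "one program == one function" model does not by itself limit how many events a function serves. The attach step decides that. The C sketch below assumes a hypothetical registry, kept by the tracing core, that attaches one function to every event name matching a glob such as "syscalls:*". POSIX `fnmatch()` is only a stand-in for however the core would expand such a pattern. The registry, the event names, and the `count_hits` program are illustrative, not an existing kernel interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* A "program" is just a function; it does not know which events it serves. */
typedef void (*prog_fn)(const char *event, void *ctx);

struct attachment {
    const char *event;
    prog_fn fn;
};

#define MAX_ATTACH 32

/* Hypothetical registry kept by the tracing core, not by the program. */
struct registry {
    struct attachment att[MAX_ATTACH];
    int count;
};

/* Attach fn to every known event whose name matches the glob pattern;
 * returns how many attachments were made. */
static int attach_matching(struct registry *r, const char *const *events,
                           size_t n_events, const char *pattern, prog_fn fn)
{
    int attached = 0;
    for (size_t i = 0; i < n_events && r->count < MAX_ATTACH; i++) {
        if (fnmatch(pattern, events[i], 0) == 0) {
            r->att[r->count].event = events[i];
            r->att[r->count].fn = fn;
            r->count++;
            attached++;
        }
    }
    return attached;
}

/* Called when an event fires: run every function attached to it. */
static void fire_event(const struct registry *r, const char *event, void *ctx)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->att[i].event, event) == 0)
            r->att[i].fn(event, ctx);
}

/* Example program: count how many times any attached event fired. */
static int hits;

static void count_hits(const char *event, void *ctx)
{
    (void)event;
    (void)ctx;
    hits++;
}
```

The mapping from events to functions lives entirely in the registry. The same `count_hits` function can therefore back "syscalls:*" and "*:*" without any change to the function itself.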

* Re: ktap and ebpf integration
From: Jovi Zhangwei @ 2014-04-04 7:26 UTC
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> [...]
>
> I think the more choices users have, the better.
> I'll continue with C-based filters and you can continue with ktap
> syntax. That's OK. We can share all the kernel pieces.

Now I understand that there is no way to integrate ktap and ibpf from
a technical point of view: the kernel side and the interface are
completely different, and obviously you don't want to change the
current per-event filter-file-based interface and kernel part, which
makes it impossible for ktap to integrate or share code with ibpf.

Anyway, I think there is no longer any need to upstream ktap.
I still enjoy the simplicity and flexibility ktap gives, and I hope
there will be a built-in kernel alternative in the future.

Special thanks to the people who put effort into reviewing ktap.

Jovi

* Re: ktap and ebpf integration
From: Ingo Molnar @ 2014-04-04 7:48 UTC
  To: Jovi Zhangwei
  Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

* Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:

> On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> > On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> >> [...]
> >
> > I think the more choices users have, the better.
> > I'll continue with C-based filters and you can continue with ktap
> > syntax. That's OK. We can share all the kernel pieces.
>
> Now I understand that there is no way to integrate ktap and ibpf from
> a technical point of view: the kernel side and the interface are
> completely different, and obviously you don't want to change the
> current per-event filter-file-based interface and kernel part, which
> makes it impossible for ktap to integrate or share code with ibpf.

In my reading that's not what Alexei wrote: he just suggested that as
long as the kernel bits are largely shared, the user-space bits
(syntax, etc.) can stay completely orthogonal and independent.

It also does not mean that ktap is forced to use the per-event
filter-file-based interface to pass BPF scripts to the kernel. BPF is
already used by various facilities in the kernel, with different
user-space APIs to interface with it.

So the main technical question is: why should ktap have its own
separate in-kernel code execution engine, if we already have the BPF
virtual machine (which is well-maintained, has excellent performance
through JIT, etc.), which could be reused and/or enhanced?

Is there any aspect of ktap's virtual machine that BPF does not have?

Thanks,

Ingo

* Re: ktap and ebpf integration
From: Jovi Zhangwei @ 2014-04-04 8:46 UTC
  To: Ingo Molnar
  Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 3:48 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>
>> On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> > On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> >> [...]
>> >
>> > I think the more choices users have, the better.
>> > I'll continue with C-based filters and you can continue with ktap
>> > syntax. That's OK. We can share all the kernel pieces.
>>
>> Now I understand that there is no way to integrate ktap and ibpf from
>> a technical point of view: the kernel side and the interface are
>> completely different, and obviously you don't want to change the
>> current per-event filter-file-based interface and kernel part, which
>> makes it impossible for ktap to integrate or share code with ibpf.
>
> In my reading that's not what Alexei wrote: he just suggested that as
> long as the kernel bits are largely shared, the user-space bits
> (syntax, etc.) can stay completely orthogonal and independent.
>
Actually I agree with this too: the kernel part should be unified and
well designed. I also agree that the userspace part should have a
unified program in the long term. We can start from C initially and
make some parts simpler and more flexible for end users (like
providing associative array and aggregation syntax as add-ons to C
syntax).

> It also does not mean that ktap is forced to use the per-event
> filter-file-based interface to pass BPF scripts to the kernel. BPF is
> already used by various facilities in the kernel, with different
> user-space APIs to interface with it.
>
The issue is the one-event-to-one-program mapping in BPF's design,
which Alexei already stated clearly. I really don't like this design.
How about supporting multiple events with the same probe callback?
ktap currently supports this: "trace *:* {}" traces all tracepoint
events. This ktap design is consistent with what perf does now, but
it strongly conflicts with the current BPF "one event, one program"
design.

This is why the interface really matters.

> So the main technical question is: why should ktap have its own
> separate in-kernel code execution engine, if we already have the BPF
> virtual machine (which is well-maintained, has excellent performance
> through JIT, etc.), which could be reused and/or enhanced?
>
I already said I agree with maintaining one bytecode engine in the kernel.

> Is there any aspect of ktap's virtual machine that BPF does not have?
>
As I already said, I don't want to bring the ktap virtual machine
into the kernel, even though I like it and put endless effort into it.
What I really want is a well-designed dynamic tracing framework, so I
hope I can bring ktap "features" (not the bytecode engine or the ktap
compiler) to enhance BPF:

- A tracing framework unified with perf (making it possible to
  integrate with perf someday):
    trace *:* {}
    trace syscalls:* {}
    trace probe:libc.so:* {}
    trace ftrace:function {}

  This basic framework is well designed and loved by ktap end users.
  (This design heavily conflicts with BPF's one-event, one-program model.)

- Timer events
  BPF insists that timers should move to userspace. I doubt that: it
  would even make BPF unable to profile kernel (or userspace) stacks.
  A timer event must fire in kernel space to get the stack. This is
  easy to understand, but BPF objects to it.

- Global variable access
  I also doubt how global variables can be accessed if BPF uses a
  one-event, one-function, one-program design.

- Flexible associative arrays (kernel part)
- Ring buffer (based on the ftrace ring buffer)
- Library and built-in functions
  These parts could also be reused by moving them into BPF.

- Samples
  The samples could be reused (of course the syntax needs to change).
  I think BPF should look at more ktap samples before settling on its
  final design.

This is what I want to bring to the current BPF design and
implementation.

But obviously these "features" cannot work on BPF, and the kernel
part cannot be shared between ktap and BPF, which forces ktap to move
away from BPF.

I guess I have already stated my concerns about the BPF design clearly.

Thanks.

Jovi

* Re: ktap and ebpf integration
From: Alexei Starovoitov @ 2014-04-04 15:57 UTC
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 1:46 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> On Fri, Apr 4, 2014 at 3:48 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>>
>>> [...]
>>
>> In my reading that's not what Alexei wrote: he just suggested that as
>> long as the kernel bits are largely shared, the user-space bits
>> (syntax, etc.) can stay completely orthogonal and independent.
>>
> Actually I agree with this too: the kernel part should be unified and
> well designed. I also agree that the userspace part should have a
> unified program in the long term. We can start from C initially and
> make some parts simpler and more flexible for end users (like
> providing associative array and aggregation syntax as add-ons to C
> syntax).
>
>> It also does not mean that ktap is forced to use the per-event
>> filter-file-based interface to pass BPF scripts to the kernel. BPF is
>> already used by various facilities in the kernel, with different
>> user-space APIs to interface with it.
>>
> The issue is the one-event-to-one-program mapping in BPF's design,
> which Alexei already stated clearly. I really don't like this design.
> How about supporting multiple events with the same probe callback?
> ktap currently supports this: "trace *:* {}" traces all tracepoint
> events. This ktap design is consistent with what perf does now, but
> it strongly conflicts with the current BPF "one event, one program"
> design.

I didn't say that. I said 'one bpf program = one function'.
The 'bpf program' terminology comes from the old days; the code is
full of 'prog' structures and variables. Here it may be confusing.
That's why I keep saying 'bpf program = function'.
Obviously nothing prevents the same program from being attached to
multiple events.

> This is why the interface really matters.
>
>> So the main technical question is: why should ktap have its own
>> separate in-kernel code execution engine, if we already have the BPF
>> virtual machine (which is well-maintained, has excellent performance
>> through JIT, etc.), which could be reused and/or enhanced?
>>
> I already said I agree with maintaining one bytecode engine in the kernel.
>
>> Is there any aspect of ktap's virtual machine that BPF does not have?
>>
> As I already said, I don't want to bring the ktap virtual machine
> into the kernel, even though I like it and put endless effort into it.
> What I really want is a well-designed dynamic tracing framework, so I
> hope I can bring ktap "features" (not the bytecode engine or the ktap
> compiler) to enhance BPF:
>
> - A tracing framework unified with perf (making it possible to
>   integrate with perf someday):
>     trace *:* {}
>     trace syscalls:* {}
>     trace probe:libc.so:* {}
>     trace ftrace:function {}
>
>   This basic framework is well designed and loved by ktap end users.
>   (This design heavily conflicts with BPF's one-event, one-program model.)

You misunderstood the proposed architecture.

> - Timer events
>   BPF insists that timers should move to userspace. I doubt that: it
>   would even make BPF unable to profile kernel (or userspace) stacks.
>   A timer event must fire in kernel space to get the stack. This is
>   easy to understand, but BPF objects to it.

Please point me to the ktap script that does what you have in mind,
and I can show how it can be done with ibpf.

> - Global variable access
>   I also doubt how global variables can be accessed if BPF uses a
>   one-event, one-function, one-program design.

It seems you misunderstood it again.

> - Flexible associative arrays (kernel part)
> - Ring buffer (based on the ftrace ring buffer)
> - Library and built-in functions
>   These parts could also be reused by moving them into BPF.

I don't think we're on the same page yet.
The ring buffer is a part of tracing. I hope you're not proposing
to copy it into ktapvm or ibpf. A bpf program can add events to the
ring buffer via a function call. In my earlier patches I demonstrated it.

> - Samples
>   The samples could be reused (of course the syntax needs to change).
>   I think BPF should look at more ktap samples before settling on its
>   final design.
>
> This is what I want to bring to the current BPF design and
> implementation.
>
> But obviously these "features" cannot work on BPF, and the kernel
> part cannot be shared between ktap and BPF, which forces ktap to move
> away from BPF.

You're drawing far-fetched conclusions without even trying to
understand what bpf can do.
I think the way to move forward is: you post a ktap script,
and I show how it's done with bpf.
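The following toy C sketch illustrates the point that "a bpf program can add events to the ring buffer via a function call". The program owns no buffer; it only calls a helper that appends a record. The buffer is a hypothetical fixed-size, single-producer array that drops records when full. It is a stand-in for illustration, not the ftrace ring buffer. `helper_emit_event`, `rb_drain` and `prog_on_futex` are invented names.

```c
#include <assert.h>

#define RB_SIZE 8 /* capacity in records; a power of two so indices wrap cleanly */

struct record {
    int pid;
    unsigned long value;
};

/* Hypothetical ring buffer owned by the tracing core, not by the program. */
struct ring_buffer {
    struct record rec[RB_SIZE];
    unsigned head;    /* total records written (producer side) */
    unsigned tail;    /* total records consumed (reader side) */
    unsigned dropped; /* records lost because the buffer was full */
};

/* Hypothetical helper a program would call; drops instead of blocking when full. */
static int helper_emit_event(struct ring_buffer *rb, int pid, unsigned long value)
{
    if (rb->head - rb->tail == RB_SIZE) {
        rb->dropped++;
        return -1;
    }
    rb->rec[rb->head % RB_SIZE] = (struct record){ pid, value };
    rb->head++;
    return 0;
}

/* Reader side: copy out up to max pending records, oldest first. */
static int rb_drain(struct ring_buffer *rb, struct record *out, int max)
{
    int n = 0;
    while (n < max && rb->tail != rb->head) {
        out[n++] = rb->rec[rb->tail % RB_SIZE];
        rb->tail++;
    }
    return n;
}

/* A "program" attached to a futex probe: it only calls the helper. */
static void prog_on_futex(struct ring_buffer *rb, int pid, unsigned long uaddr)
{
    helper_emit_event(rb, pid, uaddr);
}
```

Keeping the buffer behind a helper call means the engine never needs to know how the buffer is implemented. Swapping in a per-CPU or lock-free buffer would not change the program.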

* Re: ktap and ebpf integration
From: Alexei Starovoitov @ 2014-04-04 17:28 UTC
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 7:20 AM, Andi Kleen <andi@firstfloor.org> wrote:

> BTW I agree that EBPF won't work for ktap. The models
> (static vs dynamic typing etc.) are just too different.

If you meant 'static vs dynamic safety checking', then yes.
This is the main difference between the bpf and ktap approaches to
safety. The bpf engine and checker are disjoint: the interpreter is
dumb and just executes instructions, while the ktap interpreter has to
do all sorts of checking, since it cannot trust the instructions it
sees. In this sense, loops are not supported by ibpf today, since they
require run-time checks. I can think of a way to add such support,
but I'd rather not. Such an 'anti-feature' is not needed.

From the userspace point of view, 'ktap syntax' can use ibpf as-is.
Show me the script and I can show how ibpf can run it.
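A toy instruction set can show the split between a load-time checker and a "dumb" interpreter. In the C sketch below, the opcodes and encoding are invented and are not the real ibpf format. `verify()` runs once: it rejects backward jumps and out-of-range targets, so every accepted program finishes in at most `len` steps. `run()` then executes with no run-time checks. Rejecting backward jumps is one simple way to rule out loops statically; this sketch does not claim it is exactly how the ibpf verifier works.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, loosely inspired by BPF; not the real encoding. */
enum op { OP_MOV_IMM, OP_ADD_IMM, OP_JEQ_IMM, OP_JA, OP_EXIT };

struct insn {
    enum op op;
    int reg;  /* register index, 0..NREGS-1 */
    long imm; /* immediate operand */
    int off;  /* jump offset relative to the next instruction */
};

#define NREGS 4

/* Static check, done once at load time: registers in range, jumps only
 * forward and inside the program, last instruction is EXIT. Forward-only
 * jumps mean there are no loops, so every run ends within len steps. */
static int verify(const struct insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != OP_EXIT)
        return -1;
    for (size_t i = 0; i < len; i++) {
        const struct insn *in = &prog[i];
        if (in->reg < 0 || in->reg >= NREGS)
            return -1;
        if (in->op == OP_JEQ_IMM || in->op == OP_JA) {
            if (in->off < 0) /* backward jump: could form a loop */
                return -1;
            if (i + 1 + (size_t)in->off >= len) /* target outside program */
                return -1;
        }
    }
    return 0;
}

/* "Dumb" interpreter: trusts the verifier and performs no run-time checks. */
static long run(const struct insn *prog)
{
    long r[NREGS] = {0};
    for (size_t pc = 0;; pc++) {
        const struct insn *in = &prog[pc];
        switch (in->op) {
        case OP_MOV_IMM: r[in->reg] = in->imm; break;
        case OP_ADD_IMM: r[in->reg] += in->imm; break;
        case OP_JEQ_IMM: if (r[in->reg] == in->imm) pc += (size_t)in->off; break;
        case OP_JA:      pc += (size_t)in->off; break;
        case OP_EXIT:    return r[0];
        }
    }
}
```

All safety reasoning happens in `verify()`. `run()` stays a plain dispatch loop, which is the property that makes such an engine easy to JIT.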

* Re: ktap and ebpf integration
From: Jovi Zhangwei @ 2014-04-05 14:23 UTC
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Apr 4, 2014 at 7:20 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
>> BTW I agree that EBPF won't work for ktap. The models
>> (static vs dynamic typing etc.) are just too different.
>
> If you meant 'static vs dynamic safety checking', then yes.
> This is the main difference between the bpf and ktap approaches to
> safety. The bpf engine and checker are disjoint: the interpreter is
> dumb and just executes instructions, while the ktap interpreter has to
> do all sorts of checking, since it cannot trust the instructions it
> sees. In this sense, loops are not supported by ibpf today, since they
> require run-time checks. I can think of a way to add such support,
> but I'd rather not. Such an 'anti-feature' is not needed.
>
> From the userspace point of view, 'ktap syntax' can use ibpf as-is.
> Show me the script and I can show how ibpf can run it.

Well, please don't bring 'ktap syntax' into this. If you think
"integration" only means the ktap compiler compiling ktap syntax into
BPF bytecode, then you have entirely misunderstood the real problem.
Here are some ktap samples:

1) trace syscalls:* { print(argstr) }
   Registers many events.
   I posted this script in a previous mail, but didn't get an answer
   on how to support this in BPF.
   Note that ktap implements this with a library function
   (kdebug.trace_by_id) without changing the object file. Can BPF do this?

2) print("hello world")
   This is the simplest hello-world script in ktap. Note that the
   execution context is not a probe context but the main ktap context,
   while BPF's main context only allows declaring tables, nothing else.
   (You may think this hello-world script is not useful, but that's not
   true: many scripts don't have to run in probe context. For example,
   a script may just want to read some global variable in the kernel.)

3) var s = {}; trace *:* { s[probename] += 1 }
   The table variable s is allocated in the main context, as above.
   BPF doesn't allow allocating tables in this flexible way. ktap
   allows assigning table entries before registering events; BPF
   doesn't support that either.

4) var i = 0; trace *:* { i += 1 }
   This assigns a global variable, and it could be assigned a value
   other than 0 too. Please show me how BPF does this.
   (See the more complex global usage example in
   samples/schedule/schedtimes.kp.)

5) kdebug.kprobe("SyS_futex", function () { print(pid) })
   ktap registers the event through a function call, without changing
   the core VM. Obviously BPF cannot support this flexible callback
   mechanism.

6) time.profile { print(stack()) }
   Prints the kernel stack on a timer. Note that ktap implements this
   with a library function, without changing the bytecode object file
   format.

7) trace_end
   Note that there may be execution logic in the trace_end part, not
   just dumping everything as you said, so I don't understand why BPF
   wants to move trace_end to userspace. DTrace and stap both support
   this; why does BPF object to it?
   ktap implements trace_end with a function call, without changing
   any core VM design. I hope BPF can do this without introducing any
   change to the BPF object file format.

8) calling user-defined functions
   It seems BPF cannot call user-defined (non-inlined) functions.
   User-defined functions are useful once a dynamic tracing solution
   supports tapsets in the future (IMO it's hard to avoid user-defined
   tapsets).

Note that none of the ktap examples above change the core ktap virtual
machine or the object file format; tables and event registration are
both implemented by libraries. ktap decouples features from the VM
very well: tables/aggregation should be a feature, not part of the
core VM, but BPF glues everything together.

In summary, there are three key issues with BPF:

1) BPF couples tables into the compiler/validation program.
   Similarly to the table design, I think if BPF wants to support
   aggregation in the future, it will need to change the compiler and
   validation, and will keep changing as BPF supports more features.

2) BPF doesn't allow execution in the main context.
   This is the main issue for ktap integration: ktap allows assigning
   global variables and calling allowed functions before registering
   events, to initialize things. This is mandatory for ktap, and IMO
   it is mandatory for all generic dynamic tracing tools.

3) BPF mixes event registration logic into the object file format.
   The ktap object file isn't aware of any event logic; it's all just
   normal functions in ktap. But the BPF object file even has an
   "event" section.

IMO, the BPF engine should be a simple, generic script engine that
focuses only on being a script engine, not on features (tables,
aggregation, event registration, trace_end, timers, etc.). This is why
ktap is so simple and flexible, and it is what I really want BPF to
do. We have different opinions on those features; if they were
decoupled from the core BPF VM and object file design, everything
would be solved: each part could implement specific features through
its own library functions. This would be useful not only for ktap but
could also benefit other kernel subsystems and external modules.

All these issues mean we cannot run ktap on the BPF engine, because of
the current limited and specific BPF design.

Thanks.

Jovi
* Re: ktap and ebpf integration 2014-04-05 14:23 ` Jovi Zhangwei @ 2014-04-05 17:22 ` Alexei Starovoitov 2014-04-05 21:26 ` Jovi Zhangwei 2014-04-05 17:50 ` Andi Kleen 1 sibling, 1 reply; 13+ messages in thread From: Alexei Starovoitov @ 2014-04-05 17:22 UTC (permalink / raw) To: Jovi Zhangwei Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote: > On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote: >> >> 'ktap syntax' from user space point of view, can use ibpf as-is. >> Show me the script and I can show how ibpf can run it. > > Well, please don't engage 'ktap syntax' in here, if you think > "Integration" only means ktap compiler compiles ktap syntax > into BPF bytecode, then that's entirely misunderstood what's > the real problem in there, some ktap samples in below: Great. Nice examples. To better understand how they map to bpf architecture consider what bpf is: - bpf instruction set = assembler code - one bpf program = one function - obj_file generated by ktap or C compiler consists of multiple bpf programs (functions) and each one attaches to one or multiple events - events are [ku]probe/tracepoint events including init and fini events - bpf program cannot have loops or call other bpf programs, though it can call safe kernel functions like bpf_printk, bpf_gettimeofday, bfp_getpid, etc - one of such calls is 'bpf_load_pointer' = non-faulting access to any memory - another call is 'bpf_table_lookup' that does table lookup - bpf tables are not part of execution engine. tables are owned by kernel. User space can access them via netlink and may be through other mechanisms (like debugfs) Normal kernel C functions (like bpf_table_update) can access them in parallel. 'tables' is a mechanism to pass data between bpf programs and between bpf program and userspace > 1). trace syscalls:* { print(argstr) } > Register many events. 
> I posted this script in previous mail, but don't get the answer > how to support this in BPF. > Note ktap implement this by library function(kdebug,trace_by_id), > not change object file, can BPF does this? yes. should be clear from above explanation. > 2). print("hello world") > This is simplest hello world script in ktap, note that the > executing context is not probe context, but in main ktap > context, BPF main context only allow declare table, > nothing else. > (You may think this helloworld script is not useful, but not > true, many script don't have to run in probe context, for > example, the script just want to read some global variable in kernel) yes. see above. > 3). var s = {}; trace *:* { s[probename] += 1 } > variable table s is allocated in main context, same as above, > BPF disallow allocate table in this flexible way, ktap allow > assign table entries before register events, BPF also don't support. already supported. 's' is a table where key = probe_id, value = 4-byte integer > 4) var i = 0; trace *:* { i += 1} > Assign global variable in here, there also can assign other > value not 0, please show me how BPF do this. > (See complex global usage example in samples/schedule/schedtimes.kp) hmm. schedtimes.kp example doesn't have any global variables. RUNNING = 0 and SLEEPING = 2 are constants. as far as I can see even that complex example maps to bpf just fine > 5) kdebug.kprobe("SyS_futex", function () { print(pid) }) > ktap register event through function call, not change any core vm, > obviously BPF cannot support this flexible callback mechanism. I'm missing a 'callback' point here. seems you're attaching to futex and printing pid. That's supported. > 6). time.profile { print(stack()) } > print kernel stack in timer manner. Note ktap implement this by library > function, not change any bytecode object file format. I don't understand what 'time.profile' event is. 
Isn't this the same as attaching bpf program to some periodic
event and printing stack? That's supported.
Note: nothing stops the user from writing a bpf program that is attached
to an in-kernel periodic event like a timer.
I just don't want a built-in mechanism for timers, since it's a can
of worms from security point of view.

> 7). trace_end
> Note there may have execute logic in trace_end part, not just only
> dump everything as you said, so I don't understand why BPF
> want to move trace_end to userspace, Dtrace/stap both support
> this, why BPF object this?
> And ktap implement trace_end by function call, not change
> any core vm design, hope BPF can do this without introduce any
> change in BPF object file format.

in case of schedule/schedtimes.kp example
trace_end event should be part of userspace, since it walks
potentially very large tables.
At the same time there is a 'fini' event that in-kernel bpf program
can attach to.
If one of the bpf programs in obj_file is attached to 'init' event
it gets called upon obj_file loading. Similar with 'fini'.

> 8) call user defined function
> It seems BPF cannot call user defined function(not inlined),
> user defined function is useful when dynamic tracing solution
> support tapset in future(IMO it's hard to avoid user defined tapset).

completely the opposite.
bpf_call instruction is the key difference between new bpf and
classic bpf.

> in summary, three key issues in BPF:
>
> 1) BPF couples table in compiler/validation program.
> Similar with table design, I think if BPF want to support aggregation
> in future, it must need to change compiler and validation, and
> will keep changes if BPF support more features.

it should be clear that tables, bpf execution engine,
kernel functions are decoupled building blocks.
verifier brings things together by allowing fixed set of kernel
functions to be called from bpf program.
Obviously we cannot allow arbitrary function call from
programs. It's not safe.
> 2) BPF don't allow execute in main context
> This is the main issue to for ktap integration, ktap allow
> assign global variable, call allowed function before register
> events to initiate things, this is mandatory for ktap, and
> IMO it is mandatory for all generic dynamic tracing tools.

not true. see all of the above.

> 3) BPF mix event register logic in object format file
> ktap object file don't aware any event logic, it's just a normal
> function all in ktap, but in BPF object file, there even have a "event"
> section.

hmm. I'm missing 'issue' here.
I think it's a feature not an issue.
bpf program is a function. Like C function it doesn't embed
in itself where it's supposed to be called.
Separate 'section' in obj_file needs to describe relation
between event and function.

> IMO, BPF engine should be a simple and generic script engine,
> just focus on the script engine, not features(table/aggregation/

we won't be able to go too far while you keep thinking of
"execution engine" as "script engine"

> All these issues make we cannot let ktap run on BPF engine because
> of current BPF limited and specific design.

imo that sounds like you are just trying to find an excuse to do
your own "script engine"

You probably got an impression that I'm shutting down all of your
'bpf extension' requests. Not at all. If things are missing, let's add them.

Thanks
Alexei
* Re: ktap and ebpf integration
From: Jovi Zhangwei @ 2014-04-05 21:26 UTC
To: Alexei Starovoitov
Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

On Sun, Apr 6, 2014 at 1:22 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>> 'ktap syntax' from user space point of view, can use ibpf as-is.
>>> Show me the script and I can show how ibpf can run it.
>>
>> Well, please don't engage 'ktap syntax' in here, if you think
>> "Integration" only means ktap compiler compiles ktap syntax
>> into BPF bytecode, then that's entirely misunderstood what's
>> the real problem in there, some ktap samples in below:
>
> Great. Nice examples.
> To better understand how they map to bpf architecture
> consider what bpf is:
> - bpf instruction set = assembler code
> - one bpf program = one function
> - obj_file generated by ktap or C compiler consists of multiple
>   bpf programs (functions) and each one attaches to one or
>   multiple events
> - events are [ku]probe/tracepoint events including
>   init and fini events
> - bpf program cannot have loops or call other bpf programs,
>   though it can call safe kernel functions like bpf_printk,
>   bpf_gettimeofday, bpf_getpid, etc
> - one of such calls is 'bpf_load_pointer' = non-faulting access
>   to any memory
> - another call is 'bpf_table_lookup' that does table lookup
> - bpf tables are not part of execution engine.
>   tables are owned by kernel. User space can access them
>   via netlink and maybe through other mechanisms (like debugfs)
>   Normal kernel C functions (like bpf_table_update) can access
>   them in parallel.
>   'tables' is a mechanism to pass data between bpf programs
>   and between bpf program and userspace
>

It seems you are conflating "it's already supported" and "it will be
supported" below.

>> 1). trace syscalls:* { print(argstr) }
>> Register many events.
>> I posted this script in previous mail, but don't get the answer
>> how to support this in BPF.
>> Note ktap implement this by library function(kdebug,trace_by_id),
>> not change object file, can BPF does this?
>
> yes. should be clear from above explanation.
>

You still haven't given me a clear answer on how to register multiple
events in one script. You use the term "attach" for event registration,
so I guess it means using "cat *.bpf > /sys/kernel/***/event/filter"
to attach, right?

In my view, event registration should be self-described in the script.
Why is another command line needed for event registration? Does that
mean the user needs to run "cat" many times to register multiple events?

>> 2). print("hello world")
>> This is simplest hello world script in ktap, note that the
>> executing context is not probe context, but in main ktap
>> context, BPF main context only allow declare table,
>> nothing else.
>> (You may think this helloworld script is not useful, but not
>> true, many script don't have to run in probe context, for
>> example, the script just want to read some global variable in kernel)
>
> yes. see above.
>

Already supported, or will it be supported? I didn't find a way to
support this based on your patchset.

>> 3). var s = {}; trace *:* { s[probename] += 1 }
>> variable table s is allocated in main context, same as above,
>> BPF disallow allocate table in this flexible way, ktap allow
>> assign table entries before register events, BPF also don't support.
>
> already supported.
> 's' is a table where key = probe_id, value = 4-byte integer
>
>> 4) var i = 0; trace *:* { i += 1}
>> Assign global variable in here, there also can assign other
>> value not 0, please show me how BPF do this.
>> (See complex global usage example in samples/schedule/schedtimes.kp)
>
> hmm. schedtimes.kp example doesn't have any global variables.
> RUNNING = 0 and SLEEPING = 2 are constants.
> as far as I can see even that complex example maps to bpf just fine
>

First, global variable access is not supported in your patchset; the
compiler reports this clearly. So if you think it is already supported,
it should really be "will be supported". Anyway, I'm glad to see BPF
is going to support this.

If I guess right, you plan to initialize global variables in an 'init'
section, which is fine with me. Again, that's "will be supported", not
"supported". This mail is the first time I've heard of the 'init' and
'fini' sections; they were not mentioned before. It's good to see BPF
making progress.

>> 5) kdebug.kprobe("SyS_futex", function () { print(pid) })
>> ktap register event through function call, not change any core vm,
>> obviously BPF cannot support this flexible callback mechanism.
>
> I'm missing a 'callback' point here.
> seems you're attaching to futex and printing pid.
> That's supported.
>

The key point is that ktap implements this without changing the object
file format, whereas BPF needs to change it. Anyway, I don't think
hardcoding it in a section is a big problem, but it should be
self-described in the script, without needing another "cat" command line.

>> 6). time.profile { print(stack()) }
>> print kernel stack in timer manner. Note ktap implement this by library
>> function, not change any bytecode object file format.
>
> I don't understand what 'time.profile' event is.
> Isn't this the same as attaching bpf program to some periodic
> event and printing stack? That's supported.
> Note: nothing stops the user from writing a bpf program that is attached
> to an in-kernel periodic event like a timer.
> I just don't want a built-in mechanism for timers, since it's a can
> of worms from security point of view.
>

Sorry, the syntax is:

    profile-10us {...}

It means a timer fires on each CPU (possibly an NMI timer), which is
needed to get the real kernel stack.
Similarly, tick-10us means the timer fires on only one CPU (stap and
dtrace both support this kind of timer mode). It does not mean attaching
to an in-kernel periodic event; the user needs to be able to set their
own timer interval.

Actually, I don't understand why you object to this timer event. You
say it is a security issue, but what about perf? perf also lets users
set the timer frequency. Does perf have a security issue too?

>> 7). trace_end
>> Note there may have execute logic in trace_end part, not just only
>> dump everything as you said, so I don't understand why BPF
>> want to move trace_end to userspace, Dtrace/stap both support
>> this, why BPF object this?
>> And ktap implement trace_end by function call, not change
>> any core vm design, hope BPF can do this without introduce any
>> change in BPF object file format.
>
> in case of schedule/schedtimes.kp example
> trace_end event should be part of userspace, since it walks
> potentially very large tables.
> At the same time there is a 'fini' event that in-kernel bpf program
> can attach to.
> If one of the bpf programs in obj_file is attached to 'init' event
> it gets called upon obj_file loading. Similar with 'fini'.
>

Again, this is the first time I've heard that BPF will support "init"
and "fini" functions. Good move.

>> 8) call user defined function
>> It seems BPF cannot call user defined function(not inlined),
>> user defined function is useful when dynamic tracing solution
>> support tapset in future(IMO it's hard to avoid user defined tapset).
>
> completely the opposite.
> bpf_call instruction is the key difference between new bpf and
> classic bpf.
>

Perhaps you misunderstood: I mean calling a function defined in the
script, not one predefined in the kernel. That is needed for tapsets,
which I think cannot be avoided in dynamic tracing tools
(ktap/stap/dtrace). I don't think it's a big issue for the first step,
though. Just keep in mind that a script may need to call another
function defined in the same script.

>> in summary, three key issues in BPF:
>>
>> 1) BPF couples table in compiler/validation program.
>> Similar with table design, I think if BPF want to support aggregation
>> in future, it must need to change compiler and validation, and
>> will keep changes if BPF support more features.
>
> it should be clear that tables, bpf execution engine,
> kernel functions are decoupled building blocks.

I don't think "decoupled" is the right word when there is a "table"
section in the object format. If they were truly decoupled, that section
would not be there.

Actually, I think we can find a way to move the table section out of the
object file without hurting safety. Note that BPF may need to support
aggregation someday, which is another kind of table, and I don't think
adding more sections is a good idea. But let's finish the current issues
first and go through the table design last.

> verifier brings things together by allowing fixed set of kernel
> functions to be called from bpf program.
> Obviously we cannot allow arbitrary function call from
> programs. It's not safe.
>
>> 2) BPF don't allow execute in main context
>> This is the main issue to for ktap integration, ktap allow
>> assign global variable, call allowed function before register
>> events to initiate things, this is mandatory for ktap, and
>> IMO it is mandatory for all generic dynamic tracing tools.
>
> not true. see all of the above.
>

Again, you only proposed this solution (init and fini sections) in this
mail, not before.

>> 3) BPF mix event register logic in object format file
>> ktap object file don't aware any event logic, it's just a normal
>> function all in ktap, but in BPF object file, there even have a "event"
>> section.
>
> hmm. I'm missing 'issue' here.
> I think it's a feature not an issue.
> bpf program is a function. Like C function it doesn't embed
> in itself where it's supposed to be called.
> Separate 'section' in obj_file needs to describe relation
> between event and function.
>

As I said above, if the event section can be self-described, then that's
fine, even though ktap does this more cleanly without touching the object
file format. IMO we need some string form (syscalls:*) or an event id
(as perf/ktap do) to represent event registration; either is fine with me.

Anyway, I'm glad to see we already have some agreement on what BPF needs
to extend.

Thanks.

Jovi
* Re: ktap and ebpf integration
From: Andi Kleen @ 2014-04-05 17:50 UTC
To: Jovi Zhangwei
Cc: Alexei Starovoitov, Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

> 4) var i = 0; trace *:* { i += 1}
> Assign global variable in here, there also can assign other
> value not 0, please show me how BPF do this.
> (See complex global usage example in samples/schedule/schedtimes.kp)

That's what I meant. BPF is essentially a statically typed language.
KTAP is dynamically typed. It's a very different model.

Yes they are both Turing complete and likely could be somehow
translated into each other, but would it be efficient and simple?
No.

[essentially it's the "UNCOL problem" -- see
http://en.wikipedia.org/wiki/UNCOL
One size doesn't fit all in intermediate languages]

-Andi
* Re: ktap and ebpf integration
From: Andi Kleen @ 2014-04-04 14:20 UTC
To: Jovi Zhangwei
Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

> Anyway, I think there will don't have any necessary to upstream
> ktap any more, I still enjoy the simplicity and flexibility given

Not sure how you got to that conclusion.

You were asked to evaluate if EBPF is an alternative for ktap.
It looks like the answer is no.

So the original KTAP VM design is back on the table.
You can continue pursuing a merge of that. No reason to give up.

BTW I agree that EBPF won't work for ktap. The models
(static vs dynamic typing etc.) are just too different.
But it's good that it was studied in detail.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: ktap and ebpf integration
From: Ingo Molnar @ 2014-04-04 7:27 UTC
To: Alexei Starovoitov
Cc: Jovi Zhangwei, Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

* Alexei Starovoitov <ast@plumgrid.com> wrote:

> On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> > Hi Alexei,
> >
> > We talked a lot on ktap and ebpf integration in these days,
> > Now I think we can put into deeply to thinking out some
> > technical issues in there.
> >
> > Firstly, I want to make sure you are support this ktap and
> > ebpf integration direction, I aware you have ongoing 'bpf filter'
> > patch set work, which actually overlapping with ktap integration
> > efforts (IMO the interface should be unified and simple for user,
> > so I think filter debugfs file is not a good interface), so please let
> > me know your answer about this.
>
> I think the more choices users have the better.
> I'll continue with C based filters and you can continue with ktap
> syntax. That's ok. We can share all kernel pieces.

I'd somewhat agree with that if this wasn't about the kernel, but I
think that it's evidently useful to have one syntax for the kernel
(both the scheduler and drivers are written in C) - and probing the
kernel is really very close to the kernel source itself so it's just
an extension of that same principle.

Look at the advantages: people who learn how to write C syntax ktaps
would only be a very small step away from writing actual kernel
patches and becoming contributors. With some random weird new syntax
(be it Lua, C# or Java or any other simplified syntax) that has no
relation to kernel source syntax, there's no such synergy!

If the 'ktap syntax' lives purely in user space, and the kernel bits
are largely shared and reused, which your suggested design is, then
I have no fundamental objections to that: other than I think it's a
mistake to not harmonize with the syntax of the probed project! But as
long as the other design aspects are fixed it's not a big showstopper
as the mistake is not propagated to the kernel.

> [ design suggestions ]

I fully agree with your suggestions so far, that looks like a workable
way to address my concerns.

Thanks,

	Ingo
Thread overview: 13 messages
2014-04-04  1:21 ktap and ebpf integration Jovi Zhangwei
2014-04-04  6:26 ` Alexei Starovoitov
2014-04-04  7:26 ` Jovi Zhangwei
2014-04-04  7:48 ` Ingo Molnar
2014-04-04  8:46 ` Jovi Zhangwei
2014-04-04 15:57 ` Alexei Starovoitov
2014-04-04 17:28 ` Alexei Starovoitov
2014-04-05 14:23 ` Jovi Zhangwei
2014-04-05 17:22 ` Alexei Starovoitov
2014-04-05 21:26 ` Jovi Zhangwei
2014-04-05 17:50 ` Andi Kleen
2014-04-04 14:20 ` Andi Kleen
2014-04-04  7:27 ` Ingo Molnar