ktap and ebpf integration

linux-kernel.vger.kernel.org archive mirror

* ktap and ebpf integration
@ 2014-04-04  1:21 Jovi Zhangwei
  2014-04-04  6:26 ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Jovi Zhangwei @ 2014-04-04  1:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Ingo Molnar
  Cc: Steven Rostedt, Masami Hiramatsu, Greg KH, Andi Kleen, LKML

Hi Alexei,

We have talked a lot about ktap and ebpf integration these days.
Now I think we can dig deeper and think through some of the
technical issues.

Firstly, I want to make sure you support this ktap and
ebpf integration direction. I am aware you have an ongoing 'bpf filter'
patch set, which actually overlaps with the ktap integration
effort (IMO the interface should be unified and simple for the user,
so I think the filter debugfs file is not a good interface), so please let
me know your answer on this.

If the answer is yes, then we can go through ebpf core
improvements, for example:
- support global variable access
  This is mandatory for dynamic tracing; otherwise it is not
  possible to run even a simple script such as measuring function
  execution time.
- support timers in the kernel
  The final solution must support kernel timers for profiling
  and stack sampling.
- support registering multiple events in one script
- support trace_end

If the answer to the first question is no, and you still believe your
"bpf filter" solution is the correct way, that means there is no need to
integrate ktap and ebpf, and no need for any ktap upstream effort.
I'm fine with that; then I can make another technical plan for ktap.

Thank you.

Jovi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ktap and ebpf integration
  2014-04-04  1:21 ktap and ebpf integration Jovi Zhangwei
@ 2014-04-04  6:26 ` Alexei Starovoitov
  2014-04-04  7:26   ` Jovi Zhangwei
  2014-04-04  7:27   ` Ingo Molnar
  0 siblings, 2 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2014-04-04  6:26 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH,
	Andi Kleen, LKML

On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> Hi Alexei,
>
> We talked a lot on ktap and ebpf integration in these days,
> Now I think we can put into deeply to thinking out some
> technical issues in there.
>
> Firstly, I want to make sure you are support this ktap and
> ebpf integration direction, I aware you have ongoing 'bpf filter'
> patch set work, which actually overlapping with ktap integration
> efforts (IMO the interface should be unified and simple for user,
>  so I think filter debugfs file is not a good interface), so please let
> me know your answer about this.

I think the more choices users have the better.
I'll continue with C based filters and you can continue with ktap
syntax. That's ok. We can share all kernel pieces.
Like:
1.
user: C -> llvm -> obj_file
kernel: obj_file -> ibpf_verifier -> ibpf execution engine
2.
user: ktap language -> ktap_compiler -> obj_file
kernel: obj_file -> ibpf_verifier -> ibpf execution engine

> If the answer is yes, then we can go through ebpf core
> improvement, for example:

In the architecture I'm proposing there are three main pieces:
- user facing language and userspace compiler into ibpf
  instruction set stored into object file format like ELF
  or something simpler
- in kernel loader of that object file, license and instruction verifier
- ibpf execution engine

ibpf execution engine can do all requested features already.
It's a matter of loader and verifier to accept them.
For example:

> - support global variable access

From the execution engine's point of view a global or stack variable
makes no difference. It's a 'ld rY, word ptr [rX]' instruction,
where register rX is pointing to the stack or to some memory location.
In my old patch set 'verifier' was proving correctness of stack
and table accesses only, since I didn't see the need for global
pointers yet, but we can add it.

>   this is mandatory for dynamic tracing, otherwise, there have
>   no possible to run a simple script like get function execution
>   time.

I don't understand the correlation between measuring function
execution time and global variables.
I think userspace should be measuring script execution time.
Time sampling within kernel can be done from ibpf program
by calling ktime_get().
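
As a rough illustration of timing a function with a table instead of a
global variable (a Python sketch, not ibpf code and not taken from any
posted patch): an entry probe stores a timestamp keyed by thread id and
a return probe subtracts it. The names on_entry, on_return and start_ns
are hypothetical, and time.monotonic_ns() merely stands in for
ktime_get().

```python
import time

# Hypothetical stand-in for a kernel-owned table: thread id -> entry time (ns).
start_ns = {}

def on_entry(tid, now_ns=None):
    """Entry probe: remember when this thread entered the function."""
    start_ns[tid] = time.monotonic_ns() if now_ns is None else now_ns

def on_return(tid, now_ns=None):
    """Return probe: elapsed nanoseconds, or None if the entry was missed."""
    began = start_ns.pop(tid, None)
    if began is None:
        return None
    end = time.monotonic_ns() if now_ns is None else now_ns
    return end - began

# Explicit timestamps make the example deterministic.
on_entry(tid=42, now_ns=1_000)
print(on_return(tid=42, now_ns=1_750))  # prints 750
```

Keying by thread id keeps concurrent callers from overwriting each
other's start time, which a single global variable would not.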

> - support timer in kernel
>   The final solution must need to support kernel timer for profiling,
>   and sampling stack.

we can let programs be executed in kernel by timer events, but
I think it's a userspace task.
If userspace can do it without hurting performance, it probably
should do it.

For example to do systemtap 'iotop.stp' which looks like:
probe vfs.read.return {
    reads[execname()] += bytes_read
}
probe vfs.write.return {
    writes[execname()] += bytes_written
}
# print top 10 IO processes every 5 seconds
probe timer.s(5) {
    foreach (name in writes)
        total_io[name] += writes[name]
    foreach (name in reads)
        total_io[name] += reads[name]
    printf ("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
...
}
first two probe functions belong in kernel as two independent
ibpf programs that access 'reads' and 'writes' tables,
and 'timer.s' really belongs in userspace.
Every 5 seconds it can access 'reads' and 'writes' tables, sort them,
print them, etc.
The important concept here is a user/kernel shared table.
ibpf program can read/write to it from kernel.
userspace component can read/write it in parallel.

Back in September I posted patches for this style of table
access via netlink.
Note that ibpf program doesn't own memory.
It can call 'bpf_table_update' to store key/value pair
into kernel table. Think of it as small in kernel database
that ibpf program can store data to and user space can
read/write data at the same time.
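
The split described above can be modelled with a small Python sketch.
This is only a toy under stated assumptions: plain dicts stand in for
the kernel-owned tables, table_lookup/table_update are hypothetical
helpers loosely mirroring the lookup/update calls mentioned in this
thread, and top_io plays the role of the userspace 5-second timer body.

```python
# Hypothetical stand-ins for kernel-owned tables shared with userspace.
reads = {}
writes = {}

def table_lookup(table, key):
    """Toy lookup helper: missing keys read as 0."""
    return table.get(key, 0)

def table_update(table, key, value):
    """Toy update helper: store a key/value pair."""
    table[key] = value

def vfs_read_return(execname, bytes_read):
    """Models the in-kernel program attached to vfs.read.return."""
    table_update(reads, execname, table_lookup(reads, execname) + bytes_read)

def vfs_write_return(execname, bytes_written):
    """Models the in-kernel program attached to vfs.write.return."""
    table_update(writes, execname, table_lookup(writes, execname) + bytes_written)

def top_io(limit=10):
    """Models the userspace side: read both tables, sort by total IO."""
    names = set(reads) | set(writes)
    rows = [(n, table_lookup(reads, n), table_lookup(writes, n)) for n in names]
    rows.sort(key=lambda row: (-(row[1] + row[2]), row[0]))
    return rows[:limit]

vfs_read_return("cat", 4096)
vfs_write_return("dd", 8192)
vfs_read_return("dd", 1024)
print("%16s\t%10s\t%10s" % ("Process", "KB Read", "KB Written"))
for name, r, w in top_io():
    print("%16s\t%10d\t%10d" % (name, r // 1024, w // 1024))
```

The probe functions never sort or print; they only touch the tables,
which is what lets the reporting logic live outside the kernel.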

> - support register multi-event in one script

I think it should be clear now, that it's already supported.
one ibpf program == one function.
object file may contain multiple programs that attach to
different kprobe events and store key/value pairs into
the same or different tables.
From the verifier's point of view these two programs are disjoint.
They cannot call each other. Verifier checks them
independently.

> - support trace_end

if you mean the final print out of everything,
then it's a userspace task.

Thanks
Alexei

* Re: ktap and ebpf integration
  2014-04-04  6:26 ` Alexei Starovoitov
@ 2014-04-04  7:26   ` Jovi Zhangwei
  2014-04-04  7:48     ` Ingo Molnar
  2014-04-04 14:20     ` Andi Kleen
  2014-04-04  7:27   ` Ingo Molnar
  1 sibling, 2 replies; 13+ messages in thread
From: Jovi Zhangwei @ 2014-04-04  7:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Steven Rostedt, Masami Hiramatsu, Greg KH,
	Andi Kleen, LKML

On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> Hi Alexei,
>>
>> We talked a lot on ktap and ebpf integration in these days,
>> Now I think we can put into deeply to thinking out some
>> technical issues in there.
>>
>> Firstly, I want to make sure you are support this ktap and
>> ebpf integration direction, I aware you have ongoing 'bpf filter'
>> patch set work, which actually overlapping with ktap integration
>> efforts (IMO the interface should be unified and simple for user,
>>  so I think filter debugfs file is not a good interface), so please let
>> me know your answer about this.
>
> I think the more choices users have the better.
> I'll continue with C based filters and you can continue with ktap
> syntax. That's ok. We can share all kernel pieces.

Now I understand that there is no way to integrate ktap and ibpf
from a technical point of view: the kernel side and the interface are
completely different, and obviously you don't want to change the
current per-event filter file based interface and kernel part, which
makes it impossible for ktap to integrate or share with ibpf.

Anyway, I think there is no longer any need to upstream
ktap. I still enjoy the simplicity and flexibility given
by ktap, and hope there will be a kernel built-in alternative
solution in the future.

Special thanks to the people who put effort into reviewing ktap.

Jovi

* Re: ktap and ebpf integration
  2014-04-04  7:26   ` Jovi Zhangwei
@ 2014-04-04  7:48     ` Ingo Molnar
  2014-04-04  8:46       ` Jovi Zhangwei
  2014-04-04 14:20     ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2014-04-04  7:48 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML


* Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:

> On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> > On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> >> Hi Alexei,
> >>
> >> We talked a lot on ktap and ebpf integration in these days,
> >> Now I think we can put into deeply to thinking out some
> >> technical issues in there.
> >>
> >> Firstly, I want to make sure you are support this ktap and
> >> ebpf integration direction, I aware you have ongoing 'bpf filter'
> >> patch set work, which actually overlapping with ktap integration
> >> efforts (IMO the interface should be unified and simple for user,
> >>  so I think filter debugfs file is not a good interface), so please let
> >> me know your answer about this.
> >
> > I think the more choices users have the better.
> > I'll continue with C based filters and you can continue with ktap
> > syntax. That's ok. We can share all kernel pieces.
> 
> Now I understand that there is no way to integrate ktap and ibpf in 
> technical point of view, the kernel side and interface is completely 
> different, and obviously you don't want to change current per-event 
> filter file based interface and kernel part, that make impossible to 
> let ktap could integrate or share with ibpf.

In my reading that's not what Alexei wrote: he just suggested that as 
long as the kernel bits are largely shared, the user-space bits 
(syntax, etc.) can stay completely orthogonal and independent.

It also does not mean that ktap is forced to use the per event filter 
file based interface to pass BPF scripts to the kernel. BPF is already 
used by various facilities in the kernel, with different user-space 
APIs to interface with it.

So the main technical question is: why should ktap have its own 
separate in-kernel code execution engine, if we already have the BPF 
virtual machine (which is well-maintained, has excellent performance 
through JIT, etc.), which could be reused and/or enhanced?

Is there any aspect of ktap's virtual machine that BPF does not have?

Thanks,

	Ingo

* Re: ktap and ebpf integration
  2014-04-04  7:48     ` Ingo Molnar
@ 2014-04-04  8:46       ` Jovi Zhangwei
  2014-04-04 15:57         ` Alexei Starovoitov
  2014-04-04 17:28         ` Alexei Starovoitov
  0 siblings, 2 replies; 13+ messages in thread
From: Jovi Zhangwei @ 2014-04-04  8:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 3:48 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>
>> On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> > On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> >> Hi Alexei,
>> >>
>> >> We talked a lot on ktap and ebpf integration in these days,
>> >> Now I think we can put into deeply to thinking out some
>> >> technical issues in there.
>> >>
>> >> Firstly, I want to make sure you are support this ktap and
>> >> ebpf integration direction, I aware you have ongoing 'bpf filter'
>> >> patch set work, which actually overlapping with ktap integration
>> >> efforts (IMO the interface should be unified and simple for user,
>> >>  so I think filter debugfs file is not a good interface), so please let
>> >> me know your answer about this.
>> >
>> > I think the more choices users have the better.
>> > I'll continue with C based filters and you can continue with ktap
>> > syntax. That's ok. We can share all kernel pieces.
>>
>> Now I understand that there is no way to integrate ktap and ibpf in
>> technical point of view, the kernel side and interface is completely
>> different, and obviously you don't want to change current per-event
>> filter file based interface and kernel part, that make impossible to
>> let ktap could integrate or share with ibpf.
>
> In my reading that's not what Alexei wrote: he just suggested that as
> long as the kernel bits are largely shared, the user-space bits
> (syntax, etc.) can stay completely orthogonal and independent.
>
Actually I agree with this too: the kernel part should be unified and
well designed. I also agree that the userspace part should converge on
a unified program in the long term; we can start from C initially, and
make some parts simpler and more flexible for the end user (like
providing associative array and aggregation syntax as an add-on to C
syntax).

> It also does not mean that ktap is forced to use the per event filter
> file based interface to pass BPF scripts to the kernel. BPF is already
> used by various facilities in the kernel, with different user-space
> APIs to interface with it.
>
The issue is the one-event-to-one-program mapping in the BPF design,
which Alexei already stated clearly. I really don't like this design.
How about supporting multiple events with the same probe callback?
ktap currently supports this: "trace *:* {}" traces all tracepoint
events. This ktap design is consistent with what perf does now, but it
strongly conflicts with the current BPF "one-event mapping with
one-program" design.

This is why the interface really matters.

> So the main technical question is: why should ktap have its own
> separate in-kernel code execution engine, if we already have the BPF
> virtual machine (which is well-maintained, has excellent performance
> through JIT, etc.), which could be reused and/or enhanced?
>
I already mentioned that I agree with maintaining one bytecode engine
in the kernel.

> Is there any aspect of ktap's virtual machine that BPF does not have?
>
As I already said, I don't want to bring the ktap virtual machine into
the kernel, even though I like it and put endless effort into it. What
I really want is a well designed dynamic tracing framework, so I hope I
can bring ktap "features" (not the bytecode engine and ktap compiler)
to enhance BPF:

- Tracing framework unified with perf (making it possible to integrate
with perf someday)
  trace *:* {}
  trace syscalls:* {}
  trace probe:libc.so:* {}
  trace ftrace:function {}

  This basic framework is well designed and loved by ktap end users.
  (This design heavily conflicts with BPF's one-event one-program model.)

- timer event
  BPF insists timers should move to userspace. I doubt that; it would
  even make BPF unable to profile the kernel (or userspace) stack. The
  timer event must fire in kernel space to get the stack. This is easy
  to understand, but BPF objects.

- Global variable access
  I also doubt how global variables can be accessed if BPF uses a
  one-event one-function one-program design.

- Flexible associative array (kernel part)
- Ring buffer (based on ftrace rb)
- Library and built-in functions
  Those parts could also be reused by moving them to BPF.

- Samples
  The samples could be reused (of course the syntax needs to change).
  I think BPF should look at more ktap samples before settling its
  final design.

These are what I want to bring to the current BPF design and
implementation. But obviously these "features" could not work on BPF;
the kernel part cannot be shared between ktap and BPF, which forces
ktap to move away from BPF.

I guess I have already stated my concerns about the BPF design clearly.

Thanks.

Jovi

* Re: ktap and ebpf integration
  2014-04-04  8:46       ` Jovi Zhangwei
@ 2014-04-04 15:57         ` Alexei Starovoitov
  2014-04-04 17:28         ` Alexei Starovoitov
  1 sibling, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2014-04-04 15:57 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 1:46 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> On Fri, Apr 4, 2014 at 3:48 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>>
>>> On Fri, Apr 4, 2014 at 2:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> > On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>>> >> Hi Alexei,
>>> >>
>>> >> We talked a lot on ktap and ebpf integration in these days,
>>> >> Now I think we can put into deeply to thinking out some
>>> >> technical issues in there.
>>> >>
>>> >> Firstly, I want to make sure you are support this ktap and
>>> >> ebpf integration direction, I aware you have ongoing 'bpf filter'
>>> >> patch set work, which actually overlapping with ktap integration
>>> >> efforts (IMO the interface should be unified and simple for user,
>>> >>  so I think filter debugfs file is not a good interface), so please let
>>> >> me know your answer about this.
>>> >
>>> > I think the more choices users have the better.
>>> > I'll continue with C based filters and you can continue with ktap
>>> > syntax. That's ok. We can share all kernel pieces.
>>>
>>> Now I understand that there is no way to integrate ktap and ibpf in
>>> technical point of view, the kernel side and interface is completely
>>> different, and obviously you don't want to change current per-event
>>> filter file based interface and kernel part, that make impossible to
>>> let ktap could integrate or share with ibpf.
>>
>> In my reading that's not what Alexei wrote: he just suggested that as
>> long as the kernel bits are largely shared, the user-space bits
>> (syntax, etc.) can stay completely orthogonal and independent.
>>
> Actually I also agree this, kernel part should be unified and well
> designed, I also agree that userspace part should have unified program
> in long term, we can start from C initially, and make some part
> more simile and flexible for end user(like provide associative array
> and aggregation syntax, that's addon for C syntax)
>
>> It also does not mean that ktap is forced to use the per event filter
>> file based interface to pass BPF scripts to the kernel. BPF is already
>> used by various facilities in the kernel, with different user-space
>> APIs to interface with it.
>>
> The issue is one-event mapping with one-program design in BPF, which
> Alexei already mentioned clear on this, I'm really don't like this design,
> how about support multi-events with same probe callback? current
> ktap support this: "trace *:* {}", it means it trace all tracepoints events,
> this ktap design is constantly match with perf does now, but it will
> be strongly conflicts current BPF "one-event mapping with one-program" design

I didn't say that.
I said 'one bpf program = one function'.
'bpf program' terminology comes from old days.
the code is full of 'prog' structures and variables.
Here it may be confusing. That's why I keep saying
'bpf program = function'
Obviously nothing prevents the same program from being attached to
multiple events.
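
A toy Python model of "one program = one function, attached to many
events" might look like the following. It is a sketch only: the event
names, the attach/fire helpers and the glob matching via fnmatchcase
are illustrative assumptions, not the interface of any posted patch.

```python
from fnmatch import fnmatchcase

# Hypothetical event names, for illustration only.
EVENTS = ["syscalls:sys_enter_open", "syscalls:sys_exit_open", "sched:sched_switch"]

attachments = {}  # event name -> list of attached programs (functions)

def attach(pattern, prog, events=EVENTS):
    """Attach one program to every event whose name matches the glob."""
    matched = [e for e in events if fnmatchcase(e, pattern)]
    for event in matched:
        attachments.setdefault(event, []).append(prog)
    return matched

def fire(event, ctx):
    """Run every program attached to the event and collect the results."""
    return [prog(event, ctx) for prog in attachments.get(event, [])]

def count_prog(event, ctx):
    """A single 'program'; it does not care which event invoked it."""
    ctx[event] = ctx.get(event, 0) + 1
    return ctx[event]

attach("syscalls:*", count_prog)
counts = {}
fire("syscalls:sys_enter_open", counts)
fire("syscalls:sys_exit_open", counts)
fire("sched:sched_switch", counts)  # nothing attached, so nothing runs
print(counts)
```

Attaching with the pattern "*:*" instead would register the same
function on all three events, which is the shape of the ktap
"trace *:* {}" example.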

> This is why the interface really matters.
>
>> So the main technical question is: why should ktap have its own
>> separate in-kernel code execution engine, if we already have the BPF
>> virtual machine (which is well-maintained, has excellent performance
>> through JIT, etc.), which could be reused and/or enhanced?
>>
> I already mentioned I agree maintain one bytecode engine in kernel.
>
>> Is there any aspect of ktap's virtual machine that BPF does not have?
>>
> Already said, I don't want to bring ktap virtual machine to kernel even though
> I like it and putted endless effort on it, what I really want is we should
> have a well designed dynamic tracing framework, so I hope I can bring
> ktap "features"(not bytecode engine and ktap compiler) to enhance BPF:
>
> - Tracing framework which unified with perf(make possible to integrate
> with perf someday)
>   trace *:* {}
>   trace syscalls:* {}
>   trace probe:libc.so:* {}
>   trace ftrace:function {}
>
>   This basic framework is well designed and loved by ktap end user.
>   (This design heavily conflicts with BPF one-event one program.)

You misunderstood the proposed architecture.

> - timer event
>   BPF insist timer should move to userspace, I doubt that, that even make
>   BPF can not profiling kernel(userspace) stack, time event must fired
>   in kernel space to get stack, this is so easy to understand, but BPF object.

please point me to the ktap script that does what you have in mind
and I can show how it can be done with ibpf

> - Global variable access
>   I also doubt how to access global variable if BPF use one-event one-function
>   one-program design.

seems you misunderstood it again.

> - Flexible associative array(kernel part)
> - Ring buffer (based on ftrace rb)
> - Library and built-in functions
>   Those part also could be reused by move to BPF.

I don't think we're on the same page yet.
ring buffer is a part of tracing. I hope you're not proposing to copy
it to ktapvm or ibpf.
bpf program can add events to ring buffer via function call.
In my earlier patches I've demonstrated it.

> - Samples
>   samples could be reuse (of course syntax need to change),
>   I think BPF should look more ktap samples before form its finial design.
>
> These is what I want to bring to current BPF design and implementation.
> But obviously these "features" could not work on BPF, the kernel part
> cannot shared between ktap and BPF, this make ktap have to
> leave away from BPF.

You're drawing far-fetched conclusions without even trying to understand
what bpf can do.
I think the way to move forward is:
you post ktap script, I show how it's done with bpf

* Re: ktap and ebpf integration
  2014-04-04  8:46       ` Jovi Zhangwei
  2014-04-04 15:57         ` Alexei Starovoitov
@ 2014-04-04 17:28         ` Alexei Starovoitov
  2014-04-05 14:23           ` Jovi Zhangwei
  1 sibling, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2014-04-04 17:28 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Fri, Apr 4, 2014 at 7:20 AM, Andi Kleen <andi@firstfloor.org> wrote:

> BTW I agree that EBPF won't work for ktap. The models
> (static vs dynamic typing etc.) are just too different.

If you meant 'static vs dynamic safety checking' then yes.
This is a main difference between bpf and ktap approach to safety.
bpf engine and checker are disjoint.
Interpreter is dumb and just executes instructions.
ktap interpreter has to do all sorts of checking, since it cannot
trust instructions it sees.
In this sense, loops are not supported by ibpf today, since they
require run-time checks. I can think of a way to add such
support, but would rather not. Such 'anti-feature' is not needed.
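
The static-checking idea can be sketched in Python with a made-up
three-opcode instruction set (this is not the ibpf instruction set or
its verifier, only an illustration of the principle). verify() runs
once at load time and rejects backward jumps, so run() can execute
without any safety checks and is still guaranteed to terminate.

```python
def verify(prog):
    """Load-time check: known opcodes, forward in-range jumps, final exit."""
    if not prog or prog[-1] != ("exit",):
        return False
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "exit":
            if len(insn) != 1:
                return False
        elif op in ("add", "jmp", "jz"):
            if len(insn) != 2 or not isinstance(insn[1], int):
                return False
            if op != "add":
                target = pc + 1 + insn[1]
                if insn[1] < 0 or target >= len(prog):
                    return False  # a loop, or a jump out of the program
        else:
            return False  # unknown opcode
    return True

def run(prog, acc=0):
    """'Dumb' interpreter: trusts the verifier, performs no checks itself."""
    pc = 0
    while True:
        insn = prog[pc]
        op = insn[0]
        if op == "exit":
            return acc
        if op == "add":
            acc += insn[1]
            pc += 1
        elif op == "jmp":
            pc += 1 + insn[1]
        else:  # "jz": skip ahead only when the accumulator is zero
            pc += 1 + (insn[1] if acc == 0 else 0)

ok = [("jz", 1), ("add", 5), ("add", 1), ("exit",)]
loop = [("add", 1), ("jmp", -2), ("exit",)]
print(verify(ok), run(ok, acc=0))
print(verify(loop))
```

Because every accepted jump moves strictly forward, the program counter
only increases, so a verified program always reaches an exit.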

'ktap syntax' from user space point of view, can use ibpf as-is.
Show me the script and I can show how ibpf can run it.

* Re: ktap and ebpf integration
  2014-04-04 17:28         ` Alexei Starovoitov
@ 2014-04-05 14:23           ` Jovi Zhangwei
  2014-04-05 17:22             ` Alexei Starovoitov
  2014-04-05 17:50             ` Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Jovi Zhangwei @ 2014-04-05 14:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Apr 4, 2014 at 7:20 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
>> BTW I agree that EBPF won't work for ktap. The models
>> (static vs dynamic typing etc.) are just too different.
>
> If you meant 'static vs dynamic safety checking' then yes.
> This is a main difference between bpf and ktap approach to safety.
> bpf engine and checker are disjoint.
> Interpreter is dumb and just executes instructions.
> ktap interpreter has to do all sorts of checking, since it cannot
> trust instructions it sees.
> In this sense, loops are not supported by ibpf today, since they
> require run-time checks. I can think of a way to add such
> support, but rather not. Such 'anti-feature' is not needed.
>
> 'ktap syntax' from user space point of view, can use ibpf as-is.
> Show me the script and I can show how ibpf can run it.

Well, please don't bring 'ktap syntax' into this. If you think
"integration" only means the ktap compiler compiling ktap syntax
into BPF bytecode, then you have entirely misunderstood what
the real problem is. Some ktap samples below:

1). trace syscalls:* { print(argstr) }
Registers many events.
I posted this script in a previous mail, but didn't get an answer on
how to support this in BPF.
Note that ktap implements this with a library function
(kdebug.trace_by_id), without changing the object file. Can BPF do this?

2). print("hello world")
This is the simplest hello world script in ktap. Note that the
execution context is not probe context but the main ktap
context; the BPF main context only allows declaring tables,
nothing else.
(You may think this hello world script is not useful, but that is not
true: many scripts don't have to run in probe context. For
example, a script may just want to read some global variable in the
kernel.)

3). var s = {}; trace *:* { s[probename] += 1 }
The table variable s is allocated in the main context, same as above.
BPF doesn't allow allocating a table in this flexible way. ktap also
allows assigning table entries before registering events, which BPF
doesn't support either.

4) var i = 0; trace *:* { i += 1}
A global variable is assigned here; it could also be assigned a
value other than 0. Please show me how BPF does this.
(See the complex global usage example in samples/schedule/schedtimes.kp)

5) kdebug.kprobe("SyS_futex", function () { print(pid) })
ktap registers the event through a function call, without changing the
core vm; obviously BPF cannot support this flexible callback mechanism.

6). time.profile { print(stack()) }
Prints the kernel stack periodically on a timer. Note that ktap
implements this with a library function, without changing the bytecode
object file format.

7). trace_end
Note that there may be execution logic in the trace_end part, not just
a dump of everything as you said, so I don't understand why BPF
wants to move trace_end to userspace. DTrace/stap both support
this; why does BPF object to it?
ktap implements trace_end as a function call, without changing
the core vm design; I hope BPF can do this without introducing any
change to the BPF object file format.

8) call user defined function
It seems BPF cannot call user defined functions (not inlined).
User defined functions are useful if a dynamic tracing solution
supports tapsets in the future (IMO it's hard to avoid user defined
tapsets).

Note that none of the above ktap examples change
the core ktap virtual machine or object file format;
tables and event registration are both implemented by libraries. ktap
decouples features and the vm very well. Table/aggregation should
be a feature, not part of the core vm, but BPF glues everything
together. In summary, there are three key issues in BPF:

1) BPF couples tables into the compiler/validation program.
As with the table design, I think if BPF wants to support aggregation
in the future, it will need to change the compiler and validation, and
will keep changing them as BPF supports more features.

2) BPF doesn't allow execution in the main context
This is the main issue for ktap integration. ktap allows
assigning global variables and calling allowed functions before
registering events to initialize things. This is mandatory for ktap,
and IMO it is mandatory for all generic dynamic tracing tools.

3) BPF mixes event registration logic into the object file format
The ktap object file isn't aware of any event logic; event registration
is just a normal function call in ktap. But the BPF object file even
has an "event" section.

IMO, the BPF engine should be a simple and generic script engine,
focused only on the script engine, not on features (table/aggregation/
event registration/trace_end/timer/etc). This is why ktap is so simple
and flexible, and this is what I really want BPF to do. We
have different opinions on those features; if they are decoupled from
the core BPF vm and object file design, then everything will be solved.
Let each part implement its specific features through its own library
functions. This is not only useful for ktap, but may also benefit other
kernel subsystems and external modules as well.

All these issues mean we cannot run ktap on the BPF engine, because
of the current limited and specific BPF design.

Thanks.

Jovi

* Re: ktap and ebpf integration
  2014-04-05 14:23           ` Jovi Zhangwei
@ 2014-04-05 17:22             ` Alexei Starovoitov
  2014-04-05 21:26               ` Jovi Zhangwei
  2014-04-05 17:50             ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2014-04-05 17:22 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>
>> 'ktap syntax' from user space point of view, can use ibpf as-is.
>> Show me the script and I can show how ibpf can run it.
>
> Well, please don't engage 'ktap syntax' in here, if you think
> "Integration" only means ktap compiler compiles ktap syntax
> into BPF bytecode, then that's entirely misunderstood what's
> the real problem in there, some ktap samples in below:

Great. Nice examples.
To better understand how they map to bpf architecture
consider what bpf is:
- bpf instruction set = assembler code
- one bpf program = one function
- obj_file generated by ktap or C compiler consists of multiple
  bpf programs (functions) and each one attaches to one or
  multiple events
- events are [ku]probe/tracepoint events including
  init and fini events
- bpf program cannot have loops or call other bpf programs,
  though it can call safe kernel functions like bpf_printk,
  bpf_gettimeofday, bpf_getpid, etc
- one of such calls is 'bpf_load_pointer' = non-faulting access
  to any memory
- another call is 'bpf_table_lookup' that does table lookup
- bpf tables are not part of execution engine.
  tables are owned by kernel. User space can access them
  via netlink and maybe through other mechanisms (like debugfs)
  Normal kernel C functions (like bpf_table_update) can access
  them in parallel.
  'tables' is a mechanism to pass data between bpf programs
  and between bpf program and userspace

> 1). trace syscalls:* { print(argstr) }
> Register many events.
> I posted this script in previous mail, but don't get the answer
> how to support this in BPF.
> Note ktap implement this by library function(kdebug,trace_by_id),
> not change object file, can BPF does this?

yes. should be clear from above explanation.

> 2). print("hello world")
> This is simplest hello world script in ktap, note that the
> executing context is not probe context, but in main ktap
> context, BPF main context only allow declare table,
> nothing else.
> (You may think this helloworld script is not useful, but not
> true, many script don't have to run in probe context, for
> example, the script just want to read some global variable in kernel)

yes. see above.

> 3). var s = {}; trace *:* { s[probename] += 1 }
> variable table s is allocated in main context, same as above,
> BPF disallow allocate table in this flexible way, ktap allow
> assign table entries before register events, BPF also don't support.

already supported.
's' is a table where key = probe_id, value = 4-byte integer
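
[Editor's sketch] The table layout described above (key = probe_id,
value = 4-byte integer) can be modelled in user space roughly as follows.
`CounterTable` is a hypothetical name, and treating the value as unsigned,
little-endian, and wrapping on overflow is an assumption made for the
illustration; the mail only says "4-byte integer".

```python
import struct


class CounterTable:
    """Hypothetical table: key = probe_id, value = packed 4-byte integer."""

    def __init__(self):
        self._slots = {}

    def get(self, probe_id):
        """Return the counter for probe_id, or 0 if the key is absent."""
        raw = self._slots.get(probe_id)
        return struct.unpack("<I", raw)[0] if raw is not None else 0

    def add(self, probe_id, delta=1):
        """Add delta to the counter, wrapping to 32 bits (assumed)."""
        value = (self.get(probe_id) + delta) & 0xFFFFFFFF
        self._slots[probe_id] = struct.pack("<I", value)

    def value_size(self, probe_id):
        """Size in bytes of the stored value for an existing key."""
        return len(self._slots[probe_id])


s = CounterTable()
for probe_id in (3, 3, 5):  # three events fire: probe 3 twice, probe 5 once
    s.add(probe_id)
```

This corresponds to `s[probename] += 1` in the ktap script, with the probe
name replaced by a numeric id.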

> 4) var i = 0; trace *:* { i += 1}
> Assign global variable in here, there also can assign other
> value not 0, please show me how BPF do this.
> (See complex global usage example in samples/schedule/schedtimes.kp)

hmm. schedtimes.kp example doesn't have any global variables.
RUNNING = 0 and SLEEPING = 2 are constants.
as far as I can see even that complex example maps to bpf just fine

> 5) kdebug.kprobe("SyS_futex", function () { print(pid) })
> ktap register event through function call, not change any core vm,
> obviously BPF cannot support this flexible callback mechanism.

I'm missing a 'callback' point here.
seems you're attaching to futex and printing pid.
That's supported.

> 6). time.profile { print(stack()) }
> print kernel stack in timer manner. Note ktap implement this by library
> function, not change any bytecode object file format.

I don't understand what 'time.profile' event is.
Isn't this the same as attaching bpf program to some periodic
event and printing stack? That's supported.
Note: nothing stops the user to write bpf program that is attached
to in kernel periodic event like timer.
I just don't want a built-in mechanism for timers, since it's a can
of worms from security point of view.

> 7). trace_end
> Note there may have execute logic in trace_end part, not just only
> dump everything as you said, so I don't understand why BPF
> want to move trace_end to userspace, Dtrace/stap both support
> this, why BPF object this?
> And ktap implement trace_end by function call, not change
> any core vm design, hope BPF can do this without introduce any
> change in BPF object file format.

in case of schedule/schedtimes.kp example
trace_end event should be part of userspace, since it walks
potentially very large tables.
At the same time there is a 'fini' event that in-kernel bpf program
can attach to.
If one of the bpf programs in obj_file is attached to 'init' event
it gets called upon obj_file loading. Similar with 'fini'.
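
[Editor's sketch] The 'init' and 'fini' behaviour described above can be
pictured with the following user-space Python sketch. `LoadedObj`, `load`,
`unload`, `fire` and the three handler functions are hypothetical names,
and the starting value 10 is arbitrary; the only behaviour taken from the
mail is that a program attached to 'init' runs when the obj_file is loaded
and that 'fini' works similarly.

```python
class LoadedObj:
    """Hypothetical loader: runs 'init' on load and 'fini' on unload."""

    def __init__(self, programs):
        # programs: dict mapping event name -> function(tables)
        self.programs = programs
        self.tables = {}
        self.log = []

    def fire(self, event):
        """Run the program attached to `event`, if there is one."""
        program = self.programs.get(event)
        if program is not None:
            program(self.tables)
            self.log.append(event)

    def load(self):
        self.fire("init")

    def unload(self):
        self.fire("fini")


def on_init(tables):
    # Set up state before any probe fires, e.g. a non-zero starting counter.
    tables["i"] = 10


def on_probe(tables):
    tables["i"] += 1


def on_fini(tables):
    # Small in-kernel summary; walking large tables stays in user space.
    tables["summary"] = "i=%d" % tables["i"]


obj = LoadedObj({"init": on_init, "probe": on_probe, "fini": on_fini})
obj.load()
obj.fire("probe")
obj.fire("probe")
obj.unload()
```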

> 8) call user defined function
> It seems BPF cannot call user defined function(not inlined),
> user defined function is useful when dynamic tracing solution
> support tapset in future(IMO it's hard to avoid user defined tapset).

completely the opposite.
bpf_call instruction is the key difference between new bpf and
classic bpf.

> in summary, three key issues in BPF:
>
> 1) BPF couples table in compiler/validation program.
> Similar with table design, I think if BPF want to support aggregation
> in future, it must need to change compiler and validation, and
> will keep changes if BPF support more features.

it should be clear that tables, bpf execution engine,
kernel functions are decoupled building blocks.
verifier brings things together by allowing fixed set of kernel
functions to be called from bpf program.
Obviously we cannot allow arbitrary function call from
programs. It's not safe.

> 2) BPF don't allow execute in main context
> This is the main issue to for ktap integration, ktap allow
> assign global variable, call allowed function before register
> events to initiate things, this is mandatory for ktap, and
> IMO it is mandatory for all generic dynamic tracing tools.

not true. see all of the above.

> 3) BPF mix event register logic in object format file
> ktap object file don't aware any event logic, it's just a normal
> function all in ktap, but in BPF object file, there even have a "event"
> section.

hmm. I'm missing 'issue' here.
I think it's a feature not an issue.
bpf program is a function. Like C function it doesn't embed
in itself where it's supposed to be called.
Separate 'section' in obj_file needs to describe relation
between event and function.

> IMO, BPF engine should be a simple and generic script engine,
> just focus on the script engine, not features(table/aggregation/

we won't be able to go too far while you keep thinking
of "execution engine" as "script engine"

> All these issues make we cannot let ktap run on BPF engine because
> of current BPF limited and specific design.

imo that sounds like you're just trying to find an excuse to do your own
"script engine".
You probably got an impression that I'm shutting down all of your
'bpf extension' requests. Not at all.
If things are missing, let's add them.

Thanks
Alexei


* Re: ktap and ebpf integration
  2014-04-05 17:22             ` Alexei Starovoitov
@ 2014-04-05 21:26               ` Jovi Zhangwei
  0 siblings, 0 replies; 13+ messages in thread
From: Jovi Zhangwei @ 2014-04-05 21:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ingo Molnar, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

On Sun, Apr 6, 2014 at 1:22 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
>> On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>> 'ktap syntax' from user space point of view, can use ibpf as-is.
>>> Show me the script and I can show how ibpf can run it.
>>
>> Well, please don't engage 'ktap syntax' in here, if you think
>> "Integration" only means ktap compiler compiles ktap syntax
>> into BPF bytecode, then that's entirely misunderstood what's
>> the real problem in there, some ktap samples in below:
>
> Great. Nice examples.
> To better understand how they map to bpf architecture
> consider what bpf is:
> - bpf instruction set = assembler code
> - one bpf program = one function
> - obj_file generated by ktap or C compiler consists of multiple
>   bpf programs (functions) and each one attaches to one or
>   multiple events
> - events are [ku]probe/tracepoint events including
>   init and fini events
> - bpf program cannot have loops or call other bpf programs,
>   though it can call safe kernel functions like bpf_printk,
>   bpf_gettimeofday, bpf_getpid, etc
> - one of such calls is 'bpf_load_pointer' = non-faulting access
>   to any memory
> - another call is 'bpf_table_lookup' that does table lookup
> - bpf tables are not part of execution engine.
>   tables are owned by kernel. User space can access them
>   via netlink and may be through other mechanisms (like debugfs)
>   Normal kernel C functions (like bpf_table_update) can access
>   them in parallel.
>   'tables' is a mechanism to pass data between bpf programs
>   and between bpf program and userspace
>

It seems you are mixing up "it's already supported" and
"it will be supported" in the answers below.

>> 1). trace syscalls:* { print(argstr) }
>> Register many events.
>> I posted this script in previous mail, but don't get the answer
>> how to support this in BPF.
>> Note ktap implement this by library function(kdebug,trace_by_id),
>> not change object file, can BPF does this?
>
> yes. should be clear from above explanation.
>
You still haven't given me a clear answer on how to register multiple
events in one script. You use the term "attach" for event registration, so
I guess it means using "cat *.bpf > /sys/kernel/***/event/filter" to
attach, right? In my thinking, event registration should be self-described
in the script; why is another command line needed for event registration?
Does that mean the user needs to "cat" many times to register multiple
events?

>> 2). print("hello world")
>> This is simplest hello world script in ktap, note that the
>> executing context is not probe context, but in main ktap
>> context, BPF main context only allow declare table,
>> nothing else.
>> (You may think this helloworld script is not useful, but not
>> true, many script don't have to run in probe context, for
>> example, the script just want to read some global variable in kernel)
>
> yes. see above.
>
Already supported, or will be supported?
I didn't find a way to support this based on your patchset.

>> 3). var s = {}; trace *:* { s[probename] += 1 }
>> variable table s is allocated in main context, same as above,
>> BPF disallow allocate table in this flexible way, ktap allow
>> assign table entries before register events, BPF also don't support.
>
> already supported.
> 's' is a table where key = probe_id, value = 4-byte integer
>
>> 4) var i = 0; trace *:* { i += 1}
>> Assign global variable in here, there also can assign other
>> value not 0, please show me how BPF do this.
>> (See complex global usage example in samples/schedule/schedtimes.kp)
>
> hmm. schedtimes.kp example doesn't have any global variables.
> RUNNING = 0 and SLEEPING = 2 are constants.
> as far as I can see even that complex example maps to bpf just fine
>
Firstly, I want to say that accessing global variables is not supported in
your patchset; the compiler reports this clearly. So if you think it is
already supported, it should be "will be supported".

Anyway, I'm glad to see BPF is going to support this.

If I guess right, you plan to initialize global variables in the 'init'
section. That's fine with me, but again, it's "will support", not
"supported", and this mail is the first time I have heard of the 'init'
and 'fini' sections; they were not mentioned before. It's good to see BPF
making progress.

>> 5) kdebug.kprobe("SyS_futex", function () { print(pid) })
>> ktap register event through function call, not change any core vm,
>> obviously BPF cannot support this flexible callback mechanism.
>
> I'm missing a 'callback' point here.
> seems you're attaching to futex and printing pid.
> That's supported.
>
The key point is that ktap implements this without changing the object file
format, while BPF needs to. Anyway, I don't think hardcoding it in a
section is a big problem, but it should be self-described in the script,
without needing another cat command line.

>> 6). time.profile { print(stack()) }
>> print kernel stack in timer manner. Note ktap implement this by library
>> function, not change any bytecode object file format.
>
> I don't understand what 'time.profile' event is.
> Isn't this the same as attaching bpf program to some periodic
> event and printing stack? That's supported.
> Note: nothing stops the user to write bpf program that is attached
> to in kernel periodic event like timer.
> I just don't want a built-in mechanism for timers, since it's a can
> of worms from security point of view.
>
Sorry, the syntax is: profile-10us {...}
It means the timer fires on each cpu; the timer may be an NMI, which is
needed to get the real kernel stack. Similarly, tick-10us means the timer
fires on only one cpu. (stap/dtrace both support these timer modes.)

It does not mean attaching to an in-kernel periodic event; the user needs
the right to set their own specific timer interval.

Actually, I don't know why you object to this timer event. You describe
it as a security issue, but what about perf? perf also allows the user to
set the timer frequency; does perf also have a security issue?
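
[Editor's sketch] The difference between the two timer modes described
above (profile fires on every cpu, tick fires on one cpu) can be shown
with a small calculation. `timer_firings` is a hypothetical helper, and
the cpu count, interval and duration are arbitrary example values; the
sketch ignores jitter and NMI details and only counts expected events.

```python
def timer_firings(mode, n_cpus, interval_us, duration_us):
    """Count expected timer events for a hypothetical run.

    mode "profile": the timer fires on every cpu each interval.
    mode "tick":    the timer fires on only one cpu each interval.
    """
    if mode not in ("profile", "tick"):
        raise ValueError("unknown timer mode: %r" % (mode,))
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    per_cpu = duration_us // interval_us
    return per_cpu * (n_cpus if mode == "profile" else 1)


# Example: 4 cpus, 10us interval, 1000us of tracing.
profile_events = timer_firings("profile", n_cpus=4, interval_us=10, duration_us=1000)
tick_events = timer_firings("tick", n_cpus=4, interval_us=10, duration_us=1000)
```

With these example numbers the profile mode produces n_cpus times as many
events as the tick mode, which is why the interval needs to be under the
user's control.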

>> 7). trace_end
>> Note there may have execute logic in trace_end part, not just only
>> dump everything as you said, so I don't understand why BPF
>> want to move trace_end to userspace, Dtrace/stap both support
>> this, why BPF object this?
>> And ktap implement trace_end by function call, not change
>> any core vm design, hope BPF can do this without introduce any
>> change in BPF object file format.
>
> in case of schedule/schedtimes.kp example
> trace_end event should be part of userspace, since it walks
> potentially very large tables.
> At the same time there is a 'fini' event that in-kernel bpf program
> can attach to.
> If one of the bpf programs in obj_file is attached to 'init' event
> it gets called upon obj_file loading. Similar with 'fini'.
>
Again, this is the first time I have heard that BPF will support "init"
and "fini" functions; good move.

>> 8) call user defined function
>> It seems BPF cannot call user defined function(not inlined),
>> user defined function is useful when dynamic tracing solution
>> support tapset in future(IMO it's hard to avoid user defined tapset).
>
> completely the opposite.
> bpf_call instruction is the key difference between new bpf and
> classic bpf.
>
Perhaps you misunderstood; I mean calling a function defined in the script,
not one pre-defined in the kernel.

That's needed for tapsets, which I think cannot be avoided in these
dynamic tracing tools (ktap/stap/dtrace). But I don't think it's a
big issue for the first step; just keep in mind that a script may need
to call another function in the script.

>> in summary, three key issues in BPF:
>>
>> 1) BPF couples table in compiler/validation program.
>> Similar with table design, I think if BPF want to support aggregation
>> in future, it must need to change compiler and validation, and
>> will keep changes if BPF support more features.
>
> it should be clear that tables, bpf execution engine,
> kernel functions are decoupled building blocks.

I don't think it's proper to say "decoupled" when there is a
"table" section in the object format; it should not be there if it's
truly decoupled.

Actually, I think we can find a way to move the table section out
of the object file without hurting safety (note that BPF may need to
support aggregation someday, which is another kind of table, and I don't
think it's a good idea to add more sections). But let's finish the current
issues first; then we can go through the table design last.

> verifier brings things together by allowing fixed set of kernel
> functions to be called from bpf program.
> Obviously we cannot allow arbitrary function call from
> programs. It's not safe.
>
>> 2) BPF don't allow execute in main context
>> This is the main issue to for ktap integration, ktap allow
>> assign global variable, call allowed function before register
>> events to initiate things, this is mandatory for ktap, and
>> IMO it is mandatory for all generic dynamic tracing tools.
>
> not true. see all of the above.
>
Again, you only raised this solution (init and fini sections) in this mail,
not before.

>> 3) BPF mix event register logic in object format file
>> ktap object file don't aware any event logic, it's just a normal
>> function all in ktap, but in BPF object file, there even have a "event"
>> section.
>
> hmm. I'm missing 'issue' here.
> I think it's a feature not an issue.
> bpf program is a function. Like C function it doesn't embed
> in itself where it's supposed to be called.
> Separate 'section' in obj_file needs to describe relation
> between event and function.
>
As said above, if the event section can be self-described, then that's
fine, even though ktap does this more cleanly without touching the object
file format.

IMO we need to use either some form of string (syscalls:*) or an event id
(like perf/ktap do) to represent event registration; either is fine with me.
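
[Editor's sketch] The two options above (a string such as syscalls:* or a
numeric event id) can be related to each other by expanding the string
into ids. The sketch below does this in user space with Python's standard
`fnmatch` module. `KNOWN_EVENTS` and `resolve_events` are hypothetical,
and the ids are made up for the example; how ktap or BPF would actually
resolve events is not specified in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical event catalogue: name -> numeric event id.
KNOWN_EVENTS = {
    "syscalls:sys_enter_open": 1,
    "syscalls:sys_exit_open": 2,
    "sched:sched_switch": 3,
}


def resolve_events(spec, catalogue=KNOWN_EVENTS):
    """Expand a glob-style spec such as 'syscalls:*' into sorted event ids."""
    return sorted(
        event_id
        for name, event_id in catalogue.items()
        if fnmatchcase(name, spec)
    )


syscall_ids = resolve_events("syscalls:*")
all_ids = resolve_events("*:*")
```

A self-describing script could carry the string form, with the expansion
to ids done once at load time.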

Anyway, I'm glad to see we already have some agreement on what
BPF needs to extend.

Thanks.

Jovi


* Re: ktap and ebpf integration
  2014-04-05 14:23           ` Jovi Zhangwei
  2014-04-05 17:22             ` Alexei Starovoitov
@ 2014-04-05 17:50             ` Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2014-04-05 17:50 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Alexei Starovoitov, Ingo Molnar, Ingo Molnar, Steven Rostedt,
	Masami Hiramatsu, Greg KH, Andi Kleen, LKML

> 4) var i = 0; trace *:* { i += 1}
> Assign global variable in here, there also can assign other
> value not 0, please show me how BPF do this.
> (See complex global usage example in samples/schedule/schedtimes.kp)

That's what I meant. BPF is essentially a statically typed
language. KTAP is dynamically typed. It's a very different
model.

Yes they are both Turing complete and likely could 
be somehow translated into each other, but would
it be efficient and simple? No.

[essentially it's the "UNCOL problem" -- see
http://en.wikipedia.org/wiki/UNCOL
One size doesn't fit all in intermediate languages]

-Andi



* Re: ktap and ebpf integration
  2014-04-04  7:26   ` Jovi Zhangwei
  2014-04-04  7:48     ` Ingo Molnar
@ 2014-04-04 14:20     ` Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2014-04-04 14:20 UTC (permalink / raw)
  To: Jovi Zhangwei
  Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

> Anyway, I think there will don't have any necessary to upstream
> ktap any more, I still enjoy the simplicity and flexibility given

Not sure how you got to that conclusion.

You were asked to evaluate if EBPF is an alternative
for ktap. It looks like the answer is no. So the original KTAP VM design
is back on the table. You can continue pursuing to merge that.
No reason to give up.

BTW I agree that EBPF won't work for ktap. The models
(static vs dynamic typing etc.) are just too different.
But it's good that it was studied in detail.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: ktap and ebpf integration
  2014-04-04  6:26 ` Alexei Starovoitov
  2014-04-04  7:26   ` Jovi Zhangwei
@ 2014-04-04  7:27   ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2014-04-04  7:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jovi Zhangwei, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	Greg KH, Andi Kleen, LKML

* Alexei Starovoitov <ast@plumgrid.com> wrote:

> On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhangwei@gmail.com> wrote:
> > Hi Alexei,
> >
> > We talked a lot on ktap and ebpf integration in these days,
> > Now I think we can put into deeply to thinking out some
> > technical issues in there.
> >
> > Firstly, I want to make sure you are support this ktap and
> > ebpf integration direction, I aware you have ongoing 'bpf filter'
> > patch set work, which actually overlapping with ktap integration
> > efforts (IMO the interface should be unified and simple for user,
> >  so I think filter debugfs file is not a good interface), so please let
> > me know your answer about this.
> 
> I think the more choices users have the better.
> I'll continue with C based filters and you can continue with ktap
> syntax. That's ok. We can share all kernel pieces.

I'd somewhat agree with that if this wasn't about the kernel, but I 
think that it's evidently useful to have one syntax for the kernel 
(both the scheduler and drivers are written in C) - and probing the 
kernel is really very close to the kernel source itself so it's just 
an extension of that same principle.

Look at the advantages: people who learn how to write C syntax ktaps 
would only be a very small step away from writing actual kernel 
patches and becoming contributors.

With some random weird new syntax (be it Lua, C# or Java or any other 
simplified syntax) that has no relation to kernel source syntax, 
there's no such synergy!

If the 'ktap syntax' lives purely in user space, and the kernel bits 
are largely shared and reused, which is what your suggested design does, 
then I have no fundamental objections to that: other than I think it's a 
mistake to not harmonize with the syntax of the probed project! But as 
long as the other design aspects are fixed it's not a big showstopper 
as the mistake is not propagated to the kernel.

> [ design suggestions ]

I fully agree with your suggestions so far, that looks like a workable 
way to address my concerns.

Thanks,

	Ingo


end of thread, other threads:[~2014-04-05 21:26 UTC | newest]

Thread overview: 13+ messages
2014-04-04  1:21 ktap and ebpf integration Jovi Zhangwei
2014-04-04  6:26 ` Alexei Starovoitov
2014-04-04  7:26   ` Jovi Zhangwei
2014-04-04  7:48     ` Ingo Molnar
2014-04-04  8:46       ` Jovi Zhangwei
2014-04-04 15:57         ` Alexei Starovoitov
2014-04-04 17:28         ` Alexei Starovoitov
2014-04-05 14:23           ` Jovi Zhangwei
2014-04-05 17:22             ` Alexei Starovoitov
2014-04-05 21:26               ` Jovi Zhangwei
2014-04-05 17:50             ` Andi Kleen
2014-04-04 14:20     ` Andi Kleen
2014-04-04  7:27   ` Ingo Molnar
