public inbox for netdev@vger.kernel.org
From: Jiri Olsa <jolsa@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andriin@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>, "Daniel Xu" <dxu@dxuuu.xyz>,
	"Jesper Brouer" <jbrouer@redhat.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Viktor Malik" <vmalik@redhat.com>
Subject: Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
Date: Thu, 15 Apr 2021 19:39:45 +0200
Message-ID: <YHh6YeOPh0HIlb3e@krava>
In-Reply-To: <20210415111002.324b6bfa@gandalf.local.home>

On Thu, Apr 15, 2021 at 11:10:02AM -0400, Steven Rostedt wrote:

SNIP

> > > heya,
> > > I had some initial prototypes trying this way, but always ended up
> > > in complicated code, that's why I turned to ftrace_ops.
> > >
> > > let's see if it'll make any sense to you ;-)
> > >
> > > 1) so let's say we have extra trampoline for the program (which
> > > also seems a bit of waste since there will be just single record  
> > 
> > BPF trampoline does more than just calls BPF program. At the very
> > least it saves input arguments for fexit program to be able to access
> > it. But given it's one BPF trampoline attached to thousands of
> > functions, I don't see any problem there.
> 
> Note, there's a whole infrastructure that does similar things in ftrace.
> I wrote the direct call to jump to individual trampolines, because ftrace
> was too generic. The only way at the time to get to the arguments was via
> the ftrace_regs_caller, which did a full save of regs, because this was
> what kprobes needed, and was too expensive for BPF.
> 
> I now regret writing the direct callers, and instead should have just done
> what I did afterward, which was to make ftrace default to a lightweight
> trampoline that only saves enough for getting access to the arguments of
> the function. And have BPF use that. But I was under the impression that
> BPF needed fast access to a single function, and it would not become a
> generic trampoline for multiple functions, because that was the argument
> used to not enhance ftrace.
> 
> Today, ftrace by default (on x86) implements a generic way to get the
> arguments, and just the arguments which is exactly what BPF would need for
> multiple functions. And yes, you even have access to the return code if you
> want to "hijack" it. And since it was originally for individual functions
> (and not a batch), I created the direct caller for BPF. But the direct
> caller will not be enhanced for multiple functions, as that's not its
> purpose. If you want a trampoline to be called back to multiple functions,
> then use the infrastructure that was designed for that. Which is what Jiri
> had proposed here.
> 
> And because the direct caller can mess with the return code, it breaks
> function graph tracing. As a temporary workaround, we just made function
> graph ignore any function that has a direct caller attached to it.
> 
> If you want batch processing of BPF programs, you need to first fix the
> function graph tracing issue, and allow both BPF attached callers and
> function graph to work on the same functions.
> 
> I don't know how the BPF code does it, but if you are tracing the exit
> of a function, I'm assuming that you hijack the return pointer and replace
> it with a call to a trampoline that has access to the arguments. To do

hi,
it's a bit different: the trampoline makes use of the fact that the
call to the trampoline is at the very beginning of the function, so
it can call the original function with a 'call function + 5' instruction.

so in a nutshell the trampoline does:

  call entry_progs
  call original_func+5
  call exit_progs
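
The '+5' comes from the size of the x86-64 near call instruction (opcode
0xE8 followed by a 32-bit displacement relative to the next instruction),
so skipping 5 bytes lands on the first instruction of the function body.
A minimal user-space sketch of that arithmetic follows; the helper names
encode_call_rel32 and body_after_call are hypothetical and are not taken
from the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Size of an x86-64 "call rel32": 1 opcode byte + 4 displacement bytes. */
#define CALL_INSN_SIZE 5

/*
 * Hypothetical helper: encode "call target" as it would appear at
 * insn_addr. The displacement is relative to the address of the
 * instruction that follows the call. Returns 0 on success, -1 if the
 * target is out of rel32 range.
 */
static int encode_call_rel32(uint8_t out[CALL_INSN_SIZE],
			     uint64_t insn_addr, uint64_t target)
{
	int64_t disp = (int64_t)(target - (insn_addr + CALL_INSN_SIZE));
	uint32_t rel;
	int i;

	if (disp < INT32_MIN || disp > INT32_MAX)
		return -1;

	rel = (uint32_t)(int32_t)disp;
	out[0] = 0xE8;				/* call rel32 opcode */
	for (i = 0; i < 4; i++)			/* little-endian displacement */
		out[1 + i] = (uint8_t)(rel >> (8 * i));
	return 0;
}

/*
 * Hypothetical helper: address the trampoline calls to run the function
 * body, i.e. the first instruction after the patched 5-byte call.
 */
static uint64_t body_after_call(uint64_t func_addr)
{
	return func_addr + CALL_INSN_SIZE;
}
```

For example, a call placed at 0x1000 that targets 0x2000 encodes a
displacement of 0x2000 - 0x1005, and the function body resumes at 0x1005.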

you can check this in arch/x86/net/bpf_jit_comp.c in more detail:

 * The assembly code when eth_type_trans is called from trampoline:
 *
 * push rbp
 * mov rbp, rsp
 * sub rsp, 24                     // space for skb, dev, return value
 * push rbx                        // temp regs to pass start time
 * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
 * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev

entry program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
 * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
 * call eth_type_trans+5           // execute body of eth_type_trans

original function called ^^^

 * mov qword ptr [rbp - 8], rax    // save return value
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value

exit program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
 * pop rbx
 * leave
 * add rsp, 8                      // skip eth_type_trans's frame
 * ret                             // return to its caller
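
The control flow of the listing above can be modelled in plain C. This is
only a user-space sketch under stated assumptions: all names
(trampoline_model, run_trampoline, the sample_* functions) are
hypothetical, the __bpf_prog_enter/__bpf_prog_exit bookkeeping is left
out, and the ctx array stands in for the three stack slots at
[rbp - 24], [rbp - 16] and [rbp - 8]:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARGS 2	/* two arguments, as in the eth_type_trans example */

typedef uint64_t (*target_fn)(uint64_t a, uint64_t b);
typedef void (*prog_fn)(uint64_t *ctx);

/* Hypothetical model of one trampoline; not a kernel structure. */
struct trampoline_model {
	prog_fn fentry;		/* stands in for the jited FENTRY prog */
	prog_fn fexit;		/* stands in for the jited FEXIT prog */
	target_fn body;		/* stands in for "call original_func+5" */
};

/* Values recorded by the sample programs so a caller can inspect them. */
static uint64_t seen_entry_arg0;
static uint64_t seen_exit_ret;

static void sample_fentry(uint64_t *ctx)
{
	seen_entry_arg0 = ctx[0];		/* entry prog sees the arguments */
}

static void sample_fexit(uint64_t *ctx)
{
	seen_exit_ret = ctx[MAX_ARGS];		/* exit prog also sees the return value */
}

static uint64_t sample_body(uint64_t a, uint64_t b)
{
	return a + b;
}

static uint64_t run_trampoline(const struct trampoline_model *t,
			       uint64_t a, uint64_t b)
{
	/* Saved arguments followed by a slot for the return value. */
	uint64_t ctx[MAX_ARGS + 1] = { a, b, 0 };

	assert(t->body);

	if (t->fentry)
		t->fentry(ctx);				/* call entry_progs */

	ctx[MAX_ARGS] = t->body(ctx[0], ctx[1]);	/* call original_func+5 */

	if (t->fexit)
		t->fexit(ctx);				/* call exit_progs */

	return ctx[MAX_ARGS];				/* restore return value */
}
```

Calling run_trampoline() with sample_fentry, sample_fexit and sample_body
runs the three steps in order, and the caller gets back the value the
body returned.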

> this you need a shadow stack to save the real return as well as the
> parameters of the function. This is something that I have patches that do
> similar things with function graph.
> 
> If you want this feature, let's work together and make this work for both
> BPF and ftrace.

it's been some time since I looked at the graph tracer; is there a way
to make it access input arguments and make it available through the
ftrace_ops interface?

thanks,
jirka

