From: Alexei Starovoitov <ast@plumgrid.com>
To: Wang Nan <wangnan0@huawei.com>,
davem@davemloft.net, acme@kernel.org, mingo@redhat.com,
a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com,
jolsa@kernel.org
Cc: linux-kernel@vger.kernel.org, pi3orama@163.com,
hekuang@huawei.com, bgregg@netflix.com
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Date: Mon, 04 May 2015 22:49:01 -0700
Message-ID: <554859CD.4090206@plumgrid.com>
In-Reply-To: <55484A11.7070603@huawei.com>
On 5/4/15 9:41 PM, Wang Nan wrote:
>
> That's great. Could you please add a description of 'llvm -s' to your README
> or comments? Dumping eBPF instructions cost me a lot of time, so I decided to
> add it to perf...
sure. it's just the -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work as the normal 'clang -S file.c' once a few more
llvm commits are accepted upstream.
>>> My colleague He Kuang is working on variable access. Probing inside a function body
>>> and accessing its local variables will be supported like this:
>>>
>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>         // vara is the value of localvara of function func_name
>>> }
>>
>> that would be great. I'm not sure though how you can achieve that
>> without changing C front-end ?
>
> It's not very difficult. He is trying to generate the loader of vara
> as a prologue, then paste the prologue and the main eBPF program together.
> From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
> prologue program fetches the value of vara and puts it into the proper register,
> then the main program runs.
got it. I think that's much cleaner than what I was proposing.
The only question is then:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
since prologue would need to assign into r2.
Otherwise I don't see where you find out about 'vara' inside
compiled bpf code.
Would be nice if this can be done without debug info.
Like in tracex2_kern.c I have:

SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
        long wr_size = ctx->dx; /* arg3 */

with your prologue generator the above can be rewritten as:

SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
        /* use wr_size */

that will improve ease of use a lot.
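
To make the split between a generated prologue and the user's main program concrete, here is a minimal user-space sketch. It is not kernel or perf code: struct fake_pt_regs, prologue() and user_prog() are hypothetical names, the register layout only models the three x86-64 argument registers (di, si, dx), and a real loader would emit eBPF instructions rather than call a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for x86-64 struct pt_regs (assumption: only the first three
 * argument registers are modeled; this is not the kernel definition). */
struct fake_pt_regs {
	unsigned long di; /* arg1 */
	unsigned long si; /* arg2 */
	unsigned long dx; /* arg3 */
};

/* Records the last wr_size the main program saw, for inspection. */
size_t seen_wr_size;

/* "Main program": what the user writes. It never touches the registers. */
int user_prog(struct fake_pt_regs *unused, int fd, char *buf, size_t wr_size)
{
	(void)unused;
	(void)fd;
	(void)buf;
	seen_wr_size = wr_size;
	return wr_size > 512; /* hypothetical filter: only large writes */
}

/* "Prologue": what the loader would generate. The verifier-visible entry
 * point still takes a single ctx argument; the prologue fetches each
 * value and hands it to the main program as an extra argument. */
int prologue(struct fake_pt_regs *ctx)
{
	assert(ctx != NULL);
	return user_prog(ctx,
			 (int)ctx->di,
			 (char *)(uintptr_t)ctx->si,
			 (size_t)ctx->dx);
}
```

Calling prologue() with dx set to 4096 reaches user_prog() with wr_size == 4096, which is the effect the rewritten bpf_prog() signature above relies on.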
> Another possible solution is to change the protocol between kprobes and the eBPF
> program: make kprobes call the fetchers and pass the results to the eBPF program as
> a second param (grouping all varx together).
> A prologue may still be needed in this case to load each param into the correct
> register.
you mean grouping varx together in some other struct and embedding it
together with pt_regs into new container struct?
doable, but your first approach is quite clean already. why bother.
> Could you please consider the following problem?
>
> We find that several __lock_page() calls last a very long time. We are going
> to find the corresponding __unlock_page() so we can know what blocks them. We want to
> insert an eBPF program before io_schedule() in __lock_page(), and also add an eBPF program
> at the entry of __unlock_page(), so we can compute the interval between page locking and
> unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sample
> so we get its call stack. In this case, the eBPF program acts as a trace filter.
all makes sense and your use case fits quite well into the existing
bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
A problem of how to display that call stack from perf?
I would say it fits better as a sample than a trace.
If you dump it as a trace, it won't be easy to decipher, whereas if you
treat it as a sampling event, the perf record/report facility will pick it up
and display it nicely. Meaning that one sample == lock_page/unlock_page
latency > N. Then the existing sample_callchain flag should work.
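
The "one sample == latency > N" filter can be sketched in user space as two probe handlers sharing a table keyed by page address. This is only an illustration under stated assumptions: the fixed-size array stands in for a bpf hash map, timestamps are passed in by the caller instead of being read from a kernel clock, and on_lock_wait(), on_unlock(), SLOTS and demo() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* hypothetical capacity of the stand-in map */

struct slot {
	uintptr_t page;    /* key: address of the page being waited on */
	uint64_t start_ns; /* value: when the wait began */
	int used;
};

static struct slot table[SLOTS];

/* Linear scan for an existing entry; fine for a sketch. */
static struct slot *lookup(uintptr_t page)
{
	for (size_t i = 0; i < SLOTS; i++)
		if (table[i].used && table[i].page == page)
			return &table[i];
	return NULL;
}

/* Probe placed before io_schedule() in __lock_page():
 * remember when waiting on this page started. */
void on_lock_wait(uintptr_t page, uint64_t now_ns)
{
	struct slot *s = lookup(page);

	for (size_t i = 0; i < SLOTS && !s; i++)
		if (!table[i].used)
			s = &table[i];
	if (!s)
		return; /* table full: drop this event */
	s->used = 1;
	s->page = page;
	s->start_ns = now_ns;
}

/* Probe at the entry of the unlock function: returns 1 when the wait
 * exceeded threshold_ns (emit a sample), 0 otherwise (filter it out). */
int on_unlock(uintptr_t page, uint64_t now_ns, uint64_t threshold_ns)
{
	struct slot *s = lookup(page);
	uint64_t delta;

	if (!s)
		return 0; /* no matching lock wait was recorded */
	delta = now_ns - s->start_ns;
	s->used = 0; /* entry consumed */
	return delta > threshold_ns;
}

/* Usage example: a 5000 ns wait passes a 1000 ns threshold. */
void demo(void)
{
	on_lock_wait(0x1000, 100);
	assert(on_unlock(0x1000, 5100, 1000) == 1);
}
```

A non-zero return corresponds to the case where the event should be recorded as a sample with its call chain; a zero return means the event is dropped.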