Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.


From: Wang Nan <wangnan0@huawei.com>
To: Alexei Starovoitov <ast@plumgrid.com>, <davem@davemloft.net>,
	<acme@kernel.org>, <mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
	<hekuang@huawei.com>, <bgregg@netflix.com>
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Date: Tue, 5 May 2015 12:41:53 +0800
Message-ID: <55484A11.7070603@huawei.com>
In-Reply-To: <554832AA.5050503@plumgrid.com>

On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do the following work in the next version (based on my experience and feedback):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. The current stuff should reside in 'perf bpf load';
>>
>> 3. Extract the eBPF ELF walking and collecting work into a separate library to help others.
> 
> that's a good list.
> 
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would be great to do the following:
> first try to load with
> attr.log_buf = NULL;
> attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
> 

Will do.
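
A minimal sketch of the "quiet first, verbose on failure" loading order
suggested above. The names prog_load_fn, load_program_quiet_first,
stub_load_reject and LOG_BUF_SIZE are hypothetical and not taken from the
patches; the callback stands in for whatever wraps the bpf(BPF_PROG_LOAD)
syscall and fills attr.log_buf / attr.log_size / attr.log_level.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader callback: returns an fd >= 0 on success, < 0 on
 * failure. A real one would wrap bpf(BPF_PROG_LOAD) and copy the log
 * parameters into union bpf_attr. */
typedef int (*prog_load_fn)(const void *insns, size_t insn_cnt,
			    char *log_buf, size_t log_size, int log_level);

#define LOG_BUF_SIZE 65536	/* assumed size, not from the patches */

/*
 * Try a quiet load first. Only if it fails, allocate a log buffer and
 * repeat with log_level = 1. If the second attempt also fails and log_out
 * is non-NULL, *log_out receives the buffer and the caller must free() it.
 */
static int load_program_quiet_first(prog_load_fn load, const void *insns,
				    size_t insn_cnt, char **log_out)
{
	char *log_buf;
	int fd;

	if (log_out)
		*log_out = NULL;

	/* Fast path: no log buffer, no verifier verbosity. */
	fd = load(insns, insn_cnt, NULL, 0, 0);
	if (fd >= 0)
		return fd;

	/* Slow path: retry so the verifier can explain the failure. */
	log_buf = calloc(1, LOG_BUF_SIZE);
	if (!log_buf)
		return fd;

	fd = load(insns, insn_cnt, log_buf, LOG_BUF_SIZE, 1);
	if (fd < 0 && log_out)
		*log_out = log_buf;	/* hand the verifier log to the caller */
	else
		free(log_buf);
	return fd;
}

/* Stub loader for demonstration only: always rejects the program and
 * writes a message when a log buffer is supplied. */
static int stub_calls;

static int stub_load_reject(const void *insns, size_t insn_cnt,
			    char *log_buf, size_t log_size, int log_level)
{
	(void)insns;
	(void)insn_cnt;
	stub_calls++;
	if (log_buf && log_size && log_level > 0)
		snprintf(log_buf, log_size, "stub verifier: rejected");
	return -1;
}
```

With the stub, the first call runs without a buffer and the second call
fills the log, so the cost of verbose loading is paid only for programs
that fail to load.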

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
> 

That's great. Could you please add a description of 'llvm -s' to your README
or comments? Dumping eBPF instructions cost me a lot of time, which is why I
decided to add it to perf...

>> My colleague He Kuang is working on variable accessing. Probing inside a function body
>> and accessing its local variables will be supported like this:
>>
>>   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>   int prog(struct pt_regs *ctx, unsigned long vara) {
>>      // vara is the value of localvara of function func_name
>>   }
> 
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara
as a prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx);
the prologue fetches the value of vara and puts it into the proper register,
then the main program runs.

Another possible solution is to change the protocol between kprobes and the
eBPF program, so that kprobes calls the fetchers and passes the results to the
eBPF program as a second param (grouping all varx together).
A prologue may still be needed in this case to load each param into the
correct register.
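
To illustrate the "paste the prologue and the main program together" idea,
here is a rough sketch, not He Kuang's implementation. struct insn is a
stand-in with the same 8-byte size as the kernel's struct bpf_insn, and
prepend_prologue is a hypothetical helper. It relies on BPF jump offsets
being relative, so offsets inside each part stay valid after concatenation
as long as the prologue falls through (it must not end with an exit).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct bpf_insn: opcode, packed dst/src registers,
 * signed offset and signed immediate (8 bytes in total). */
struct insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/*
 * Build one program out of a generated prologue followed by the user's
 * main program. Returns a malloc()ed array the caller must free(), or
 * NULL on allocation failure. *n_out receives the total insn count.
 */
static struct insn *prepend_prologue(const struct insn *prologue, size_t n_pro,
				     const struct insn *prog, size_t n_prog,
				     size_t *n_out)
{
	struct insn *out = malloc((n_pro + n_prog) * sizeof(*out));

	if (!out)
		return NULL;

	/* The prologue runs first and leaves the fetched values in registers. */
	memcpy(out, prologue, n_pro * sizeof(*out));
	/* The main program follows unchanged. */
	memcpy(out + n_pro, prog, n_prog * sizeof(*out));

	*n_out = n_pro + n_prog;
	return out;
}
```

The verifier then sees a single program whose only argument is ctx.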

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
>   I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
>   one. That's what we're trying to do. So far we have most of normal
>   syntax. The problem with our approach is that we cannot easily do
>   #include of existing .h files. We're working on that.
>   It's still too experimental. Maybe we will drop it and go back to the
>   first approach.
> 
> The reason for extending front-end is your example above, where
> the user would want to write:
>    int prog(struct pt_regs *ctx, unsigned long vara) {
>     // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lots of things to brainstorm. So please share your progress soon.
> 
>> And I want to discuss with you and others about:
>>
>>   1. How to make eBPF output its tracing and aggregation results to perf?
> 
> well, the output of a bpf program is data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
> 

Well, it looks like in your design BPF programs are used to produce
aggregation results. On my side, I also want them to act as trace filters.

Could you please consider the following problem?

We find that several __lock_page() calls last a very long time. We want to
find the corresponding __unlock_page() calls so we know what blocks them. We
want to insert an eBPF program before io_schedule() in __lock_page(), and
another at the entry of __unlock_page(), so we can compute the interval between
page locking and unlocking. If the interval is longer than a threshold,
__unlock_page() triggers a perf sample so we get its call stack. In this case,
the eBPF program acts as a trace filter.
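
The filter logic can be sketched in plain user-space C as below. This is a
simulation under stated assumptions, not an eBPF program: in a real program
the table would be a BPF map keyed by the page address and the timestamps
would come from bpf_ktime_get_ns(). The names on_lock_wait, on_unlock,
find_slot and TABLE_SIZE are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024		/* power of two; assumed capacity */

struct slot {
	uintptr_t page;		/* page address used as the key */
	uint64_t start_ns;	/* 0 means no lock wait is pending */
	int used;
};

static struct slot table[TABLE_SIZE];

/* Open addressing with linear probing. Slots are never released, so the
 * sketch only handles up to TABLE_SIZE distinct pages. */
static struct slot *find_slot(uintptr_t page)
{
	size_t i, idx = (page >> 6) & (TABLE_SIZE - 1);

	for (i = 0; i < TABLE_SIZE; i++) {
		struct slot *s = &table[(idx + i) & (TABLE_SIZE - 1)];

		if (!s->used || s->page == page)
			return s;
	}
	return NULL;
}

/* Probe before io_schedule() in __lock_page(): record when waiting began. */
static void on_lock_wait(uintptr_t page, uint64_t now_ns)
{
	struct slot *s = find_slot(page);

	if (!s)
		return;
	s->page = page;
	s->used = 1;
	s->start_ns = now_ns;
}

/* Probe at the entry of __unlock_page(): return 1 if a sample should be
 * taken (the interval exceeded the threshold), 0 to filter the event out. */
static int on_unlock(uintptr_t page, uint64_t now_ns, uint64_t threshold_ns)
{
	struct slot *s = find_slot(page);
	uint64_t delta;

	if (!s || !s->used || s->start_ns == 0)
		return 0;

	delta = now_ns - s->start_ns;
	s->start_ns = 0;	/* consume the pending entry */
	return delta > threshold_ns;
}
```

The return value of on_unlock() plays the role of the filter decision:
only unlocks that end a long wait would trigger a perf sample.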

Thank you.


Thread overview: 46+ messages
2015-04-30 10:52 [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs Wang Nan
2015-04-30 10:52 ` [RFC PATCH 01/22] perf: probe: avoid segfault if passed with '' Wang Nan
2015-05-05 14:09   ` Masami Hiramatsu
2015-05-05 15:26     ` Arnaldo Carvalho de Melo
2015-05-05 16:33       ` Masami Hiramatsu
2015-04-30 10:52 ` [RFC PATCH 02/22] perf: bpf: prepare: add __aligned_u64 to types.h Wang Nan
2015-04-30 10:52 ` [RFC PATCH 03/22] perf: add bpf common operations Wang Nan
2015-04-30 10:52 ` [RFC PATCH 04/22] perf tools: Add new 'perf bpf' command Wang Nan
2015-05-11  6:28   ` Namhyung Kim
2015-04-30 10:52 ` [RFC PATCH 05/22] perf bpf: open eBPF object file and do basic validation Wang Nan
2015-04-30 10:52 ` [RFC PATCH 06/22] perf bpf: check swap according to EHDR Wang Nan
2015-04-30 10:52 ` [RFC PATCH 07/22] perf bpf: iterater over elf sections to collect information Wang Nan
2015-04-30 10:52 ` [RFC PATCH 08/22] perf bpf: collect version and license from ELF Wang Nan
2015-04-30 10:52 ` [RFC PATCH 09/22] perf bpf: collect map definitions Wang Nan
2015-05-11  6:32   ` Namhyung Kim
2015-04-30 10:52 ` [RFC PATCH 10/22] perf bpf: collect config section in object Wang Nan
2015-04-30 10:52 ` [RFC PATCH 11/22] perf bpf: collect symbol table in object files Wang Nan
2015-04-30 10:52 ` [RFC PATCH 12/22] perf bpf: collect bpf programs from " Wang Nan
2015-04-30 10:52 ` [RFC PATCH 13/22] perf bpf: collects relocation sections from object file Wang Nan
2015-04-30 10:52 ` [RFC PATCH 14/22] perf bpf: config eBPF programs based on their names Wang Nan
2015-04-30 10:52 ` [RFC PATCH 15/22] perf bpf: config eBPF programs using config section Wang Nan
2015-04-30 10:52 ` [RFC PATCH 16/22] perf bpf: create maps needed by object file Wang Nan
2015-04-30 10:52 ` [RFC PATCH 17/22] perf bpf: relocation programs Wang Nan
2015-04-30 10:52 ` [RFC PATCH 18/22] perf bpf: load eBPF programs into kernel Wang Nan
2015-04-30 10:52 ` [RFC PATCH 19/22] perf bpf: dump eBPF program before loading Wang Nan
2015-04-30 10:52 ` [RFC PATCH 20/22] perf bpf: clean elf memory after loading Wang Nan
2015-04-30 10:52 ` [RFC PATCH 21/22] perf bpf: probe at kprobe points Wang Nan
2015-05-05 16:34   ` Masami Hiramatsu
2015-05-06  2:36     ` Wang Nan
2015-04-30 10:52 ` [RFC PATCH 22/22] perf bpf: attaches eBPF program to perf fd Wang Nan
2015-05-01  4:37 ` [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs Alexei Starovoitov
2015-05-01 11:06   ` Peter Zijlstra
2015-05-01 11:49     ` Ingo Molnar
2015-05-01 16:56       ` Alexei Starovoitov
2015-05-01 17:06         ` Ingo Molnar
2015-05-05 15:39         ` Arnaldo Carvalho de Melo
2015-05-02  7:19   ` Wang Nan
2015-05-05  3:02     ` Alexei Starovoitov
2015-05-05  4:41       ` Wang Nan [this message]
2015-05-05  5:49         ` Alexei Starovoitov
2015-05-05  6:14           ` Wang Nan
2015-05-06  4:46             ` Wang Nan
2015-05-06  4:56               ` Alexei Starovoitov
2015-05-06  5:00                 ` Wang Nan
2015-05-01  7:16 ` Ingo Molnar
2015-05-05 21:52 ` Brendan Gregg
