public inbox for linux-kernel@vger.kernel.org
From: Wang Nan <wangnan0@huawei.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: <linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
	Li Zefan <lizefan@huawei.com>
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Date: Wed, 6 May 2015 12:46:40 +0800
Message-ID: <55499CB0.1090400@huawei.com>
In-Reply-To: <55485FBF.20306@huawei.com>

Hi Alexei Starovoitov,

Have you had a chance to read this mail?

I'm very interested in triggering a perf sample from BPF code.
You said it is not a problem. Could you please give me some
further information?

Thank you.

On 2015/5/5 14:14, Wang Nan wrote:
> On 2015/5/5 13:49, Alexei Starovoitov wrote:
>> On 5/4/15 9:41 PM, Wang Nan wrote:
>>>
>>> That's great. Could you please add a description of 'llvm -s' to your README
>>> or comments? Dumping eBPF instructions cost me a lot of time, so I decided to
>>> add it to perf...
>>
>> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
>> Eventually it will work as normal 'clang -S file.c' once a few more
>> llvm commits are accepted upstream.
>>
>>>>> My colleague He Kuang is working on variable access. Probing inside a function body
>>>>> and accessing its local variables will be supported like this:
>>>>>
>>>>>    SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>>>>    int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>>       // vara is the value of localvara of function func_name
>>>>>    }
>>>>
>>>> that would be great. I'm not sure though how you can achieve that
>>>> without changing C front-end ?
>>>
>>> It's not very difficult. He is trying to generate the loader of vara
>>> as a prologue, then paste the prologue and the main eBPF program together.
>>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
>>> prologue fetches the value of vara and puts it into the proper register,
>>> then the main program runs.
>>
>> got it. I think that's much cleaner than what I was proposing.
>> The only question is then:
>> char _prog_config[] = "prog: func_name:1234 vara=localvara";
>> should actually be something like "... r2=localvara", right?
>> since prologue would need to assign into r2.
>> Otherwise I don't see where you find out about 'vara' inside
>> compiled bpf code.
>>
> 
> I think the calling convention can tell us which var should go to which
> register. In the case of
> 
>  SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
>  int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }
> 
> llvm should compile 'prog' according to the calling convention. The body of that
> program should assume vara is in r2 and varb is in r3. The prologue also puts the vars into
> r2 and r3 according to the calling convention. Therefore, after pasting them together, the final
> program should run properly. There is no need to describe register numbers explicitly.
> What do you think?
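
[Editorial sketch] The prologue-pasting idea above can be modeled in plain
userspace C. This is only a toy register machine, not the real eBPF instruction
encoding and not the perf implementation; every toy_* name is hypothetical. The
one fact it relies on is the eBPF calling convention: r1..r5 carry arguments and
r0 carries the return value. The generated prologue loads the i-th configured
variable into r(2+i), and the compiled body simply assumes its second and third
arguments are already in r2 and r3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy opcodes (hypothetical, NOT real eBPF encodings). */
enum toy_op { TOY_LOAD_VAR, TOY_ADD, TOY_EXIT };

struct toy_insn {
	enum toy_op op;
	int dst;	/* destination register */
	int src;	/* source register (TOY_ADD) or variable slot (TOY_LOAD_VAR) */
};

#define TOY_MAX_INSNS 32

/* Emit a prologue that loads variable slot i into register r(2+i). */
static size_t toy_gen_prologue(struct toy_insn *out, size_t nr_vars)
{
	assert(nr_vars <= 4);	/* only r2..r5 are available for extra args */
	for (size_t i = 0; i < nr_vars; i++)
		out[i] = (struct toy_insn){ TOY_LOAD_VAR, 2 + (int)i, (int)i };
	return nr_vars;
}

/* Paste prologue and main body into one program. */
static size_t toy_paste(struct toy_insn *out,
			const struct toy_insn *pro, size_t npro,
			const struct toy_insn *body, size_t nbody)
{
	assert(npro + nbody <= TOY_MAX_INSNS);
	memcpy(out, pro, npro * sizeof(*pro));
	memcpy(out + npro, body, nbody * sizeof(*body));
	return npro + nbody;
}

/* Interpret the pasted program; vars[] stands in for values fetched at the probe site. */
static uint64_t toy_run(const struct toy_insn *prog, size_t n,
			uint64_t ctx, const uint64_t *vars)
{
	uint64_t r[6] = { 0 };

	r[1] = ctx;	/* the only argument the "verifier" knows about */
	for (size_t pc = 0; pc < n; pc++) {
		const struct toy_insn *in = &prog[pc];

		switch (in->op) {
		case TOY_LOAD_VAR: r[in->dst] = vars[in->src]; break;
		case TOY_ADD:      r[in->dst] += r[in->src];   break;
		case TOY_EXIT:     return r[0];
		}
	}
	return r[0];
}

/* Body equivalent to: return vara + varb; with vara in r2 and varb in r3. */
static uint64_t toy_demo(uint64_t vara, uint64_t varb)
{
	const struct toy_insn body[] = {
		{ TOY_ADD, 0, 2 }, { TOY_ADD, 0, 3 }, { TOY_EXIT, 0, 0 },
	};
	const uint64_t vars[2] = { vara, varb };
	struct toy_insn pro[4], full[TOY_MAX_INSNS];
	size_t npro = toy_gen_prologue(pro, 2);
	size_t n = toy_paste(full, pro, npro, body, 3);

	return toy_run(full, n, 0, vars);
}
```

Neither the prologue generator nor the body names a register for 'vara' or
'varb' explicitly; both sides derive r2 and r3 from argument order alone.
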
> 
> 
>> Would be nice if this can be done without debug info.
>> Like in tracex2_kern.c I have:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *ctx)
>> {
>>         long wr_size = ctx->dx; /* arg3 */
>>
>> with your prolog generator the above can be rewritten as:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
>> {
>>         /* use wr_size */
>>
>> that will improve ease of use a lot.
>>
> 
> It is possible when probing on the entry of a function. However, when probing inside a
> function body, there still needs to be a way to pass the list of variables required by the
> program to perf so that it can generate the correct prologue. We'd like to implement
> the generic one (list vars in the config string) first, then add function
> parameter access as syntactic sugar.
> 
>>> Another possible solution is to change the protocol between kprobe and the eBPF
>>> program: make kprobes call the fetchers and pass the results to the eBPF program as
>>> a second param (grouping all varx together).
>>> A prologue may still be needed in this case to load each param into the correct
>>> register.
>>
>> you mean grouping varx together in some other struct and embedding it
>> together with pt_regs into new container struct?
>> doable, but your first approach is quite clean already. why bother.
>>
> 
> The second approach lets us reuse the fetcher code which is already in the
> kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
> for PMU counters), we support them automatically.
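
[Editorial sketch] One way to picture the second approach is a container that
holds the register snapshot next to the values produced by the fetchers. The
layout below is purely an assumption for illustration: struct toy_pt_regs is a
trimmed stand-in for the real struct pt_regs, and struct toy_probe_ctx,
toy_fetcher_t and toy_fill_ctx() are hypothetical names, not existing kernel
interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-in for the architecture's struct pt_regs (hypothetical). */
struct toy_pt_regs {
	uint64_t ip;
	uint64_t sp;
};

#define TOY_MAX_FETCHED 4

/* Hypothetical container: registers plus all fetched "varx" values grouped together. */
struct toy_probe_ctx {
	struct toy_pt_regs regs;
	uint32_t nr_fetched;
	uint64_t fetched[TOY_MAX_FETCHED];
};

/* A fetcher computes one value from the register snapshot. */
typedef uint64_t (*toy_fetcher_t)(const struct toy_pt_regs *regs);

/* The probe side runs every fetcher and stores the results in the container. */
static void toy_fill_ctx(struct toy_probe_ctx *c, const struct toy_pt_regs *regs,
			 const toy_fetcher_t *fetchers, uint32_t nr)
{
	assert(nr <= TOY_MAX_FETCHED);
	c->regs = *regs;
	c->nr_fetched = nr;
	for (uint32_t i = 0; i < nr; i++)
		c->fetched[i] = fetchers[i](regs);
}

static uint64_t toy_fetch_ip(const struct toy_pt_regs *regs) { return regs->ip; }
static uint64_t toy_fetch_sp(const struct toy_pt_regs *regs) { return regs->sp; }

/* Demo: fill a context with two fetchers and return the sum of the fetched values. */
static uint64_t toy_ctx_demo(uint64_t ip, uint64_t sp)
{
	const toy_fetcher_t fetchers[] = { toy_fetch_ip, toy_fetch_sp };
	const struct toy_pt_regs regs = { ip, sp };
	struct toy_probe_ctx c;

	toy_fill_ctx(&c, &regs, fetchers, 2);
	return c.fetched[0] + c.fetched[1];
}
```

Adding a new kind of fetcher in this model only means adding another function
pointer; the container and the program that reads fetched[] stay unchanged.
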
> 
>>> Could you please consider the following problem?
>>>
>>> We find that several __lock_page() calls last a very long time. We want
>>> to find the corresponding __unlock_page() so we can know what blocks them. We want to
>>> insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program
>>> on the entry of __unlock_page(), so we can compute the interval between page locking and
>>> unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sample
>>> so we get its call stack. In this case, the eBPF program acts as a trace filter.
>>
>> all makes sense and your use case fits quite well into existing
>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>> A problem of how to display that call stack from perf?
>> I would say it fits better as a sample than a trace.
>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>> treat it as a sampling event, the perf record/report facility will pick it up
>> and display it nicely. Meaning that one sample == lock_page/unlock_page
>> latency > N. Then the existing sample_callchain flag should work.
>>
> 
> Quite well. Do we have an eBPF function like
> 
> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_perf_sample;
> 
> so we can use it in the program probed in the body of __unlock_page() like this:
> 
>  ...
>  if (latency > 0.5s)
>     bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
>  ...
> 
> Thank you.
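
[Editorial sketch] The filter logic discussed above (record a timestamp at lock
time, compute the latency at unlock time, emit a sample only above a threshold)
can be modeled in userspace C. This is a simulation under stated assumptions:
the fixed-size table stands in for a BPF hash map keyed by page address, the
caller-supplied now_ns stands in for a kernel timestamp, and all toy_* names
are hypothetical. Since eBPF has no floating point, the 0.5 s threshold is
written in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS        64
#define TOY_THRESHOLD_NS 500000000ULL	/* 0.5 s expressed in nanoseconds */

/* One table entry: page address -> time at which the lock attempt started. */
struct toy_slot {
	uint64_t page;
	uint64_t start_ns;
	bool used;
};

static struct toy_slot toy_map[TOY_SLOTS];

/* Linear lookup; optionally claims the first free slot when the key is absent. */
static struct toy_slot *toy_lookup(uint64_t page, bool create)
{
	struct toy_slot *free_slot = NULL;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (toy_map[i].used && toy_map[i].page == page)
			return &toy_map[i];
		if (!toy_map[i].used && !free_slot)
			free_slot = &toy_map[i];
	}
	if (create && free_slot) {
		free_slot->used = true;
		free_slot->page = page;
		return free_slot;
	}
	return NULL;
}

/* Program attached on the lock side: remember when waiting started. */
static void toy_on_lock_page(uint64_t page, uint64_t now_ns)
{
	struct toy_slot *s = toy_lookup(page, true);

	if (s)
		s->start_ns = now_ns;
}

/* Program attached on the unlock side: returns true when a sample should be emitted. */
static bool toy_on_unlock_page(uint64_t page, uint64_t now_ns, uint64_t *latency_ns)
{
	struct toy_slot *s = toy_lookup(page, false);

	assert(latency_ns != NULL);
	if (!s)
		return false;	/* no matching lock was recorded */
	*latency_ns = now_ns - s->start_ns;
	s->used = false;	/* drop the entry once the pair is matched */
	return *latency_ns > TOY_THRESHOLD_NS;
}
```

A true return value marks the point where the program would ask perf for a
sample, so that one sample corresponds to one lock/unlock pair whose latency
exceeded the threshold.
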
> 



