* Re: perf bpf examples
       [not found]       ` <CAE40pddH3gTzJd6A_DLV6ftEDZhyew4Mbj=BLMMtNJAU87DmGw@mail.gmail.com>
@ 2016-07-08  4:18         ` Wangnan (F)
  2016-07-08  7:57           ` Brendan Gregg
From: Wangnan (F) @ 2016-07-08  4:18 UTC
To: Brendan Gregg
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
    Arnaldo Carvalho de Melo, Alexei Starovoitov

On 2016/7/8 1:58, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
> <brendan.d.gregg@gmail.com> wrote:
>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>>
>>> On 2016/7/7 4:29, Brendan Gregg wrote:
>>>> G'Day,
>>>>
>>>> Are perf bpf examples shared anywhere? I've seen many posted to lkml
>>>> (by Wang Nan), but don't see them in the linux source, or
>>>> documentation. Would be very handy to throw them all up somewhere for
>>>> searching/learning, if that hasn't already happened, eg, github.
>>>>
>>>> I was also looking to see if perf bpf supports sampling yet, but I
>>>> don't think it does. Eg, imagine a:
>>>>
>>>> perf record -F 99 -e bpf_process_samples.c -a -- sleep 10
>>>>
>>>> which would require BPF attaching to perf_swevent_hrtimer()/etc, and
>>>> also emitting a map (eg, sampled instruction pointer counts). I don't
>>>> think perf currently does either, but was hoping for a collection of
>>>> examples to double check.
>>>
>>> Currently perf-bpf doesn't support dumping the resulting maps, but
>>> we are working on it. I think you have read our uBPF approach:
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2203717
>>>
>>> and
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2253579
>>>
>>> In them we embedded a uBPF virtual machine in perf and gave it
>>> the ability to operate on the results in maps.
>>>
>>> Now we are trying another approach: introduce LLVM to perf,
>>> compile data analysis and report to code. It would be much
>>> more powerful.
>>
>> Great, thanks!
>>
>> But what about a set of examples covering the existing perf+bpf
>> capabilities so far? I know you've emailed them to lkml, but has
>> someone put them all in one place yet? If not, I can go through lkml
>> and at least put them on github so we can search and learn from them.

Great. Thanks a lot.

> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
> Thanks,

Theoretically, a BPF program is an additional filter that decides
whether an event should be filtered out or passed to perf. -F 99 is
another filter, which drops samples to enforce the frequency. The
filters work together. The full graph should be:

 BPF --> traditional filter --> proc (system wide or proc specific) --> period

See the example at the end of this mail. The BPF program returns 0 for
half of the events, so the results for the two halves should be
symmetrical. Without -F, both runs give similar results:

# ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
[ perf record: Woken up 28 times to write data ]
[ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
#
root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]

With -F99 added:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
# ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

However, there must be something I don't understand. The record takes
nearly 10 seconds to finish, so at 99 samples per second we should get
nearly 1000 samples. Sometimes I can get about 500 samples:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60536 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.431 MB perf.data (555 samples) ]

/////////////////////////////////////////////////////////////////

#include <uapi/linux/bpf.h>

#define SEC(NAME) __attribute__((section(NAME), used))

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* One-element array holding the running count of sys_read() calls. */
struct bpf_map_def SEC("maps") m = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1,
};

static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
	(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
	(void *)BPF_FUNC_trace_printk;

char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;

/* Build with exactly one of -DCATCH_ODD or -DCATCH_EVEN. */
#ifdef CATCH_ODD
# define RET_ODD 1
# define RET_EVEN 0
#endif

#ifdef CATCH_EVEN
# define RET_ODD 0
# define RET_EVEN 1
#endif

SEC("func=sys_read")
int func(void *ctx)
{
	int key = 0, *v;

	v = map_lookup_elem(&m, &key);
	if (!v)
		return 0;

	/* Returning 0 filters the event out; nonzero passes it to perf. */
	__sync_fetch_and_add(v, 1);
	if (*v & 1)
		return RET_ODD;
	return RET_EVEN;
}

> Brendan
* Re: perf bpf examples
  2016-07-08  4:18 ` perf bpf examples Wangnan (F)
@ 2016-07-08  7:57   ` Brendan Gregg
  2016-07-08 10:46     ` Wangnan (F)
From: Brendan Gregg @ 2016-07-08  7:57 UTC
To: Wangnan (F)
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
    Arnaldo Carvalho de Melo, Alexei Starovoitov

On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>
> On 2016/7/8 1:58, Brendan Gregg wrote:
>>
>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>> <brendan.d.gregg@gmail.com> wrote:
>>>
>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
[...]
>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>> Thanks,
>
> Theoretically, a BPF program is an additional filter that decides
> whether an event should be filtered out or passed to perf. -F 99 is
> another filter, which drops samples to enforce the frequency. The
> filters work together. The full graph should be:
>
>  BPF --> traditional filter --> proc (system wide or proc specific) --> period
>
> See the example at the end of this mail. The BPF program returns 0 for
> half of the events, so the results for the two halves should be
> symmetrical. Without -F, both runs give similar results:
>
> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
> [ perf record: Woken up 28 times to write data ]
> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
> #
> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
> [ perf record: Woken up 54 times to write data ]
> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>
> With -F99 added:
>
> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

That looks like it's doing two different things: -F99, and a
sampling.c script (SEC("func=sys_read")).

I mean just an -F99 that executes a BPF program on each sample. My
most common use for perf is:

perf record -F 99 -a -g -- sleep 30
perf report (or perf script, for making flame graphs)

But this uses perf.data as an intermediate file. With the recent
BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
kernel context, and just dump a report. That would be much more
efficient, and would improve a very common perf one-liner.

Brendan
* Re: perf bpf examples
  2016-07-08  7:57 ` Brendan Gregg
@ 2016-07-08 10:46   ` Wangnan (F)
  2016-07-08 16:35     ` Brendan Gregg
From: Wangnan (F) @ 2016-07-08 10:46 UTC
To: Brendan Gregg
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
    Arnaldo Carvalho de Melo, Alexei Starovoitov

On 2016/7/8 15:57, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>
>> On 2016/7/8 1:58, Brendan Gregg wrote:
>>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>>> Thanks,
>>
>> [...]
>
> That looks like it's doing two different things: -F99, and a
> sampling.c script (SEC("func=sys_read")).
>
> I mean just an -F99 that executes a BPF program on each sample. My
> most common use for perf is:
>
> perf record -F 99 -a -g -- sleep 30
> perf report (or perf script, for making flame graphs)
>
> But this uses perf.data as an intermediate file. With the recent
> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
> kernel context, and just dump a report. Much more efficient. And
> improving a very common perf one-liner.

You can't attach a BPF script to anything other than kprobes and
tracepoints. When you use 'perf record -F99 -a -g -- sleep 30', you are
sampling on the 'cycles:ppp' event, which is a hardware PMU event.

If we can find a kprobe or tracepoint event that is triggered 99 times
per second, we can use BPF_MAP_TYPE_STACK_TRACE and bpf_get_stackid().

Thank you.
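
Editorial note: a rough sketch of what counting stacks with BPF_MAP_TYPE_STACK_TRACE and bpf_get_stackid() might look like, written in the style of the sampling.c program posted earlier in this thread. It is not from the original mails and has not been run. The probe target some_periodic_function is hypothetical, as are the map names stacks and counts, the MAX_STACK_DEPTH constant and the map sizes. It is kernel-side code meant to be built by perf's clang integration, not a standalone program, and since perf-bpf does not yet dump maps (as noted earlier in the thread), reading the counts back would need some other mechanism.

```c
#include <uapi/linux/bpf.h>

#define SEC(NAME) __attribute__((section(NAME), used))

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Assumed to match the kernel's default maximum stack depth. */
#define MAX_STACK_DEPTH 127

/* Stack storage: bpf_get_stackid() saves a stack here and returns its ID. */
struct bpf_map_def SEC("maps") stacks = {
	.type = BPF_MAP_TYPE_STACK_TRACE,
	.key_size = sizeof(unsigned int),
	.value_size = MAX_STACK_DEPTH * sizeof(unsigned long long),
	.max_entries = 1024,
};

/* Stack ID -> number of times that stack was seen. */
struct bpf_map_def SEC("maps") counts = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(int),
	.value_size = sizeof(unsigned long long),
	.max_entries = 1024,
};

static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
	(void *)BPF_FUNC_map_lookup_elem;
static int (*map_update_elem)(struct bpf_map_def *, void *, void *,
			      unsigned long long) =
	(void *)BPF_FUNC_map_update_elem;
static int (*get_stackid)(void *ctx, struct bpf_map_def *,
			  unsigned long long flags) =
	(void *)BPF_FUNC_get_stackid;

char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;

SEC("func=some_periodic_function")	/* hypothetical probe point */
int func(void *ctx)
{
	unsigned long long one = 1, *count;
	int id;

	/* Pass the program's own ctx pointer through unchanged. */
	id = get_stackid(ctx, &stacks, BPF_F_REUSE_STACKID);
	if (id < 0)
		return 0;

	count = map_lookup_elem(&counts, &id);
	if (count)
		__sync_fetch_and_add(count, 1);
	else
		map_update_elem(&counts, &id, &one, BPF_ANY);

	return 0;	/* filter the event out: nothing goes to perf.data */
}
```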
* Re: perf bpf examples
  2016-07-08 10:46 ` Wangnan (F)
@ 2016-07-08 16:35   ` Brendan Gregg
From: Brendan Gregg @ 2016-07-08 16:35 UTC
To: Wangnan (F)
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
    Arnaldo Carvalho de Melo, Alexei Starovoitov

On Fri, Jul 8, 2016 at 3:46 AM, Wangnan (F) <wangnan0@huawei.com> wrote:
>
> On 2016/7/8 15:57, Brendan Gregg wrote:
>> [...]
>> I mean just an -F99 that executes a BPF program on each sample. My
>> most common use for perf is:
>>
>> perf record -F 99 -a -g -- sleep 30
>> perf report (or perf script, for making flame graphs)
>>
>> But this uses perf.data as an intermediate file. With the recent
>> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
>> kernel context, and just dump a report. Much more efficient. And
>> improving a very common perf one-liner.
>
> You can't attach BPF script to samples other than kprobe and tracepoints.
> When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
> 'cycles:ppp' event. This is a hardware PMU event.

Sure, either cycles:ppp or cpu-clock (my Xen guests have no PMU,
sadly). But these are ultimately calling perf_swevent_hrtimer()/etc,
so I was wondering if someone was already looking at enhancing this
code to support BPF? Ie, BPF should be able to attach to kprobes,
uprobes, tracepoints, and timer-based samples.

> If we find a kprobe or tracepoint event which would be triggered 99 times
> in each second, we can utilize BPF_MAP_TYPE_STACK_TRACE and
> bpf_get_stackid().

Yes, that should be a workaround. It's annoying, as some functions
like perf_swevent_hrtimer() can't be kprobed (inlined?). I found that
perf_misc_flags(struct pt_regs *regs) was called, but passing that
regs to bpf_get_stackid() returned "type=inv expected=ctx" errors,
despite casting. I'm guessing the BPF ctx type is special and can't be
cast to, but I need to dig more.

Brendan