* Re: perf bpf examples
[not found] ` <CAE40pddH3gTzJd6A_DLV6ftEDZhyew4Mbj=BLMMtNJAU87DmGw@mail.gmail.com>
@ 2016-07-08 4:18 ` Wangnan (F)
2016-07-08 7:57 ` Brendan Gregg
0 siblings, 1 reply; 4+ messages in thread
From: Wangnan (F) @ 2016-07-08 4:18 UTC
To: Brendan Gregg
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo, Alexei Starovoitov
On 2016/7/8 1:58, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
> <brendan.d.gregg@gmail.com> wrote:
>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>>
>>>
>>> On 2016/7/7 4:29, Brendan Gregg wrote:
>>>> G'Day,
>>>>
>>>> Are perf bpf examples shared anywhere? I've seen many posted to lkml
>>>> (by Wang Nan), but don't see them in the linux source, or
>>>> documentation. Would be very handy to throw them all up somewhere for
>>>> searching/learning, if that hasn't already happened, eg, github.
>>>>
>>>> I was also looking to see if perf bpf supports sampling yet, but I
>>>> don't think it does. Eg, imagine a:
>>>>
>>>> perf record -F 99 -e bpf_process_samples.c -a -- sleep 10
>>>>
>>>> which would require BPF attaching to perf_swevent_hrtimer()/etc, and
>>>> also emitting a map (eg, sampled instruction pointer counts). I don't
>>>> think perf currently does either, but was hoping for a collection of
>>>> examples to double check.
>>>
>>> Currently perf-bpf doesn't support dumping the resulting maps, but
>>> we are working on it. I think you have read our uBPF approach:
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2203717
>>>
>>> and
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2253579
>>>
>>> In them we embedded a uBPF virtual machine in perf and gave it
>>> the ability to operate on the results in maps.
>>>
>>> Now we are trying another approach: introduce LLVM into perf and
>>> compile the data analysis and reporting to code. It would be much
>>> more powerful.
>>
>> Great, thanks!
>>
>> But what about a set of examples covering the existing perf+bpf
>> capabilities so far? I know you've emailed them to lkml, but has
>> someone put them all in one place yet? If not, I can go through lkml
>> and at least put them on github so we can search and learn from them.
Great. Thanks a lot.
> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet? Thanks,
In theory, a BPF program is an additional filter that decides whether
an event should be filtered out or passed on to perf. -F 99 is another
filter, which drops samples to maintain the requested frequency. The
filters work together. The full chain should be:

  BPF --> traditional filter --> proc (system-wide or process-specific)
      --> period

See the example at the end of this mail. The BPF program returns 0 for
half of the events, so the results should be symmetrical. Without -F we
get similar results for both halves:
# ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
[ perf record: Woken up 28 times to write data ]
[ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
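For reference, the parity filter can be modelled in user space. This is
only a sketch: count_passed() is a hypothetical helper, not part of perf
or sampling.c, and it assumes a single counter that starts at zero and is
never updated concurrently. It mirrors the increment-then-test order of
func() in the example at the end of this mail.

```c
#include <stdint.h>

/*
 * User-space model of the filter in sampling.c (hypothetical helper).
 * The counter is incremented first and then tested, as in func().
 * catch_odd != 0 models -DCATCH_ODD, catch_odd == 0 models -DCATCH_EVEN.
 * Returns how many of n_events would be passed on to perf.
 */
uint64_t count_passed(uint64_t n_events, int catch_odd)
{
	uint64_t counter = 0, passed = 0;

	for (uint64_t i = 0; i < n_events; i++) {
		counter++;                      /* models __sync_fetch_and_add(v, 1) */
		int is_odd = (int)(counter & 1);
		if (is_odd == (catch_odd != 0)) /* keep only the selected half */
			passed++;
	}
	return passed;
}
```

For the 8388480 reads issued by dd, each variant passes 4194240 events in
this model. The recorded counts above are slightly higher, presumably
because -a also captures sys_read calls from other processes.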
With -F99 added:
# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
# ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
However, there must be something I don't understand. The record takes
nearly 10 seconds, so at 99 Hz we should get nearly 1000 samples.
Sometimes I do get about 500 samples:
# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60536 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.431 MB perf.data (555 samples) ]
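The expectation of nearly 1000 samples is simply frequency times
duration. A minimal sketch of that arithmetic, assuming a single CPU and
an event that fires often enough for perf to reach the requested rate;
nominal_samples() is a hypothetical helper, not a perf function.

```c
/*
 * Nominal sample count in frequency mode (hypothetical helper):
 * freq_hz samples per second over duration_s seconds on one CPU.
 */
double nominal_samples(double freq_hz, double duration_s)
{
	return freq_hz * duration_s;
}
```

For the 9.60126 s run at 99 Hz this gives about 950, so the 35 and 37
samples recorded earlier are far below the nominal figure, and even 555
falls short of it.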
/////////////////////////////////////////////////////////////////
#include <uapi/linux/bpf.h>

#define SEC(NAME) __attribute__((section(NAME), used))

/* Map definition, placed in the "maps" section for the loader. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Single-slot array holding the global event counter. */
struct bpf_map_def SEC("maps") m = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1,
};

static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
	(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
	(void *)BPF_FUNC_trace_printk;

char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;

/* Select which half of the events is passed to perf. */
#ifdef CATCH_ODD
# define RET_ODD 1
# define RET_EVEN 0
#endif
#ifdef CATCH_EVEN
# define RET_ODD 0
# define RET_EVEN 1
#endif
#ifndef RET_ODD
# error "build with --clang-opt '-DCATCH_ODD' or '-DCATCH_EVEN'"
#endif

SEC("func=sys_read")
int func(void *ctx)
{
	int key = 0, *v;

	v = map_lookup_elem(&m, &key);
	if (!v)
		return 0;

	/* Count this event, then pass it only if the parity matches. */
	__sync_fetch_and_add(v, 1);
	if (*v & 1)
		return RET_ODD;
	return RET_EVEN;
}
> Brendan
* Re: perf bpf examples
2016-07-08 4:18 ` perf bpf examples Wangnan (F)
@ 2016-07-08 7:57 ` Brendan Gregg
2016-07-08 10:46 ` Wangnan (F)
0 siblings, 1 reply; 4+ messages in thread
From: Brendan Gregg @ 2016-07-08 7:57 UTC
To: Wangnan (F)
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo, Alexei Starovoitov
On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>
>
> On 2016/7/8 1:58, Brendan Gregg wrote:
>>
>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>> <brendan.d.gregg@gmail.com> wrote:
>>>
>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
[...]
>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>> Thanks,
>
>
> Theoretically, BPF program is an additional filter to
> decide whetier an event should be filtered out or pass to perf. -F 99
> is another filter, which drops samples to ensure the frequence.
> Filters works together. The full graph should be:
>
> BPF --> traditional filter --> proc (system wide of proc specific) -->
> period
>
> See the example at the end of this mail. The BPF program returns 0 for half
> of
> the events, and the result should be symmetrical. We can get similar result
> without
> -F:
>
> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero
> of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
> [ perf record: Woken up 28 times to write data ]
> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
> #
> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e
> ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
> [ perf record: Woken up 54 times to write data ]
> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>
>
> With -F99 added:
>
> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd
> if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd
> if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
That looks like it's doing two different things: -F99, and a
sampling.c script (SEC("func=sys_read")).
I mean just an -F99 that executes a BPF program on each sample. My
most common use for perf is:
perf record -F 99 -a -g -- sleep 30
perf report (or perf script, for making flame graphs)
But this uses perf.data as an intermediate file. With the recent
BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
kernel context, and just dump a report. Much more efficient. And
improving a very common perf one-liner.
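
The aggregation proposed here can be sketched in plain user-space C. This
is only a model of the idea, not BPF code and not something perf
provides: in the kernel the stack id would come from bpf_get_stackid()
against a BPF_MAP_TYPE_STACK_TRACE map, and the counts would live in a
second BPF map. The names stack_table, stack_table_hit() and
stack_table_count(), and the table size, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STACKS 64          /* hypothetical table size for this sketch */

struct stack_count {
	int32_t  stack_id;     /* id as a stack-trace map would hand it out */
	uint64_t count;        /* number of samples seen with this stack */
};

struct stack_table {
	struct stack_count entries[MAX_STACKS];
	size_t used;
};

/* Count one sample; returns 0 on success, -1 if the table is full. */
int stack_table_hit(struct stack_table *t, int32_t stack_id)
{
	for (size_t i = 0; i < t->used; i++) {
		if (t->entries[i].stack_id == stack_id) {
			t->entries[i].count++;
			return 0;
		}
	}
	if (t->used == MAX_STACKS)
		return -1;
	t->entries[t->used].stack_id = stack_id;
	t->entries[t->used].count = 1;
	t->used++;
	return 0;
}

/* Look up the count for a stack id; 0 if it was never seen. */
uint64_t stack_table_count(const struct stack_table *t, int32_t stack_id)
{
	for (size_t i = 0; i < t->used; i++)
		if (t->entries[i].stack_id == stack_id)
			return t->entries[i].count;
	return 0;
}
```

Only the (stack id, count) pairs would need to be dumped at the end,
rather than every sample being written to perf.data.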
Brendan
* Re: perf bpf examples
2016-07-08 7:57 ` Brendan Gregg
@ 2016-07-08 10:46 ` Wangnan (F)
2016-07-08 16:35 ` Brendan Gregg
0 siblings, 1 reply; 4+ messages in thread
From: Wangnan (F) @ 2016-07-08 10:46 UTC
To: Brendan Gregg
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo, Alexei Starovoitov
On 2016/7/8 15:57, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>
>> On 2016/7/8 1:58, Brendan Gregg wrote:
>>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>>> <brendan.d.gregg@gmail.com> wrote:
>>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
> [...]
>>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>>> Thanks,
>>
>> Theoretically, BPF program is an additional filter to
>> decide whetier an event should be filtered out or pass to perf. -F 99
>> is another filter, which drops samples to ensure the frequence.
>> Filters works together. The full graph should be:
>>
>> BPF --> traditional filter --> proc (system wide of proc specific) -->
>> period
>>
>> See the example at the end of this mail. The BPF program returns 0 for half
>> of
>> the events, and the result should be symmetrical. We can get similar result
>> without
>> -F:
>>
>> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero
>> of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
>> [ perf record: Woken up 28 times to write data ]
>> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
>> #
>> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e
>> ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
>> [ perf record: Woken up 54 times to write data ]
>> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>>
>>
>> With -F99 added:
>>
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd
>> if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd
>> if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
> That looks like it's doing two different things: -F99, and a
> sampling.c script (SEC("func=sys_read")).
>
> I mean just an -F99 that executes a BPF program on each sample. My
> most common use for perf is:
>
> perf record -F 99 -a -g -- sleep 30
> perf report (or perf script, for making flame graphs)
>
> But this uses perf.data as an intermediate file. With the recent
> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
> kernel context, and just dump a report. Much more efficient. And
> improving a very common perf one-liner.
You can't attach a BPF script to events other than kprobes and tracepoints.
When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on the
'cycles:ppp' event, which is a hardware PMU event.
If we can find a kprobe or tracepoint event that is triggered 99 times
per second, we can use BPF_MAP_TYPE_STACK_TRACE and bpf_get_stackid().
Thank you.
* Re: perf bpf examples
2016-07-08 10:46 ` Wangnan (F)
@ 2016-07-08 16:35 ` Brendan Gregg
0 siblings, 0 replies; 4+ messages in thread
From: Brendan Gregg @ 2016-07-08 16:35 UTC
To: Wangnan (F)
Cc: linux-perf-use., linux-kernel@vger.kernel.org,
Arnaldo Carvalho de Melo, Alexei Starovoitov
On Fri, Jul 8, 2016 at 3:46 AM, Wangnan (F) <wangnan0@huawei.com> wrote:
>
>
> On 2016/7/8 15:57, Brendan Gregg wrote:
>>
[...]
>> I mean just an -F99 that executes a BPF program on each sample. My
>> most common use for perf is:
>>
>> perf record -F 99 -a -g -- sleep 30
>> perf report (or perf script, for making flame graphs)
>>
>> But this uses perf.data as an intermediate file. With the recent
>> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
>> kernel context, and just dump a report. Much more efficient. And
>> improving a very common perf one-liner.
>
>
> You can't attach BPF script to samples other than kprobe and tracepoints.
> When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
> 'cycles:ppp' event. This is a hardware PMU event.
Sure, either cycles:ppp or cpu-clock (my Xen guests have no PMU,
sadly). But these ultimately call perf_swevent_hrtimer()/etc, so I was
wondering whether someone is already looking at enhancing this code to
support BPF. I.e., BPF should be able to attach to kprobes, uprobes,
tracepoints, and timer-based samples.
> If we find a kprobe or tracepoint event which would be triggered 99 times
> in each second, we can utilize BPF_MAP_TYPE_STACK_TRACE and
> bpf_get_stackid().
Yes, that should work as a workaround. It's annoying, as some functions
like perf_swevent_hrtimer() can't be kprobed (inlined?). I found that
perf_misc_flags(struct pt_regs *regs) was called, but passing those
regs to bpf_get_stackid() returned "type=inv expected=ctx" errors,
despite casting. I'm guessing the BPF ctx type is special and can't be
cast, but I need to dig more.
Brendan
end of thread, other threads:[~2016-07-08 16:37 UTC | newest]
Thread overview: 4+ messages
[not found] <CAE40pdfDNi-+80az4YNa9asLGyOJc2e5RcPgFxz5FK9VwgRUtw@mail.gmail.com>
[not found] ` <577DB50D.3040204@huawei.com>
[not found] ` <CAE40pdc7+gv2L7u+7CfFjmDQUv3rsV13XkAGDSrCq_FSOqLFdA@mail.gmail.com>
[not found] ` <CAE40pddH3gTzJd6A_DLV6ftEDZhyew4Mbj=BLMMtNJAU87DmGw@mail.gmail.com>
2016-07-08 4:18 ` perf bpf examples Wangnan (F)
2016-07-08 7:57 ` Brendan Gregg
2016-07-08 10:46 ` Wangnan (F)
2016-07-08 16:35 ` Brendan Gregg