From: Daniel Borkmann <daniel@iogearbox.net>
To: Kaixu Xia <xiakaixu@huawei.com>,
ast@plumgrid.com, davem@davemloft.net, acme@kernel.org,
mingo@redhat.com, a.p.zijlstra@chello.nl,
masami.hiramatsu.pt@hitachi.com, jolsa@kernel.org
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org,
pi3orama@163.com, hekuang@huawei.com
Subject: Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Fri, 24 Jul 2015 01:33:47 +0200
Message-ID: <55B179DB.4080308@iogearbox.net>
In-Reply-To: <1437552572-84748-1-git-send-email-xiakaixu@huawei.com>
On 07/22/2015 10:09 AM, Kaixu Xia wrote:
> Previous patch v1 url:
> https://lkml.org/lkml/2015/7/17/287
[ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
the core BPF changes. More below ... ]
> This patchset allows users to read PMU events in the following way:
> 1. Open the PMU using perf_event_open() (for each CPU or for
> each process they would like to watch);
> 2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
> 3. Insert FDs into the map with some key-value mapping scheme
> (e.g. cpuid -> event on that CPU);
> 4. Load and attach eBPF programs as usual;
> 5. In the eBPF program, get the perf_event_map_fd and key (e.g.
> the cpuid obtained from bpf_get_smp_processor_id()), then use
> bpf_perf_event_read() to read from it;
> 6. Do anything they want.
>
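[ For readers following along: the user-space side of steps 1 and 3 above could
be sketched roughly as below. This is only an illustration under assumptions,
not code from the series. The fd_table type, MAX_CPUS and the helper names are
hypothetical, and the plain in-memory table merely stands in for the proposed
BPF_MAP_TYPE_PERF_EVENT_ARRAY map so that the "cpuid -> event on that CPU"
keying is visible without the bpf() syscall. Opening the counter normally needs
suitable privileges, so open_counter_on_cpu() may well return -1. ]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

#define MAX_CPUS 64	/* hypothetical table size for this sketch */

/* Stand-in for the proposed map: key = cpu id, value = perf event FD. */
struct fd_table {
	int fd[MAX_CPUS];
};

/* Step 1 (setup): describe a hardware CPU-cycles counter. */
void init_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/* Step 1 (open): pid = -1 with cpu >= 0 counts all tasks on that CPU. */
int open_counter_on_cpu(struct perf_event_attr *attr, int cpu)
{
	/* Arguments: attr, pid, cpu, group_fd, flags. */
	return (int)syscall(__NR_perf_event_open, attr, -1, cpu, -1, 0UL);
}

/* Mark every slot as empty. */
void fd_table_init(struct fd_table *t)
{
	int i;

	for (i = 0; i < MAX_CPUS; i++)
		t->fd[i] = -1;
}

/* Step 3: record "cpuid -> event on that CPU"; rejects bad keys and FDs. */
int fd_table_set(struct fd_table *t, int cpu, int fd)
{
	if (cpu < 0 || cpu >= MAX_CPUS || fd < 0)
		return -1;
	t->fd[cpu] = fd;
	return 0;
}

/* Look up by key; -1 means no event is stored for that CPU. */
int fd_table_get(const struct fd_table *t, int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return -1;
	return t->fd[cpu];
}
```

[ A caller would loop over the online CPUs, call open_counter_on_cpu() for each,
and store every successful FD with fd_table_set(&table, cpu, fd). With the real
map, that last call would be a map update instead. ]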
> changes in V2:
> - put atomic_long_inc_not_zero() between fdget() and fdput();
> - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
> - Only read the event counter on current CPU or on current
> process;
> - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
> pointer to the struct perf_event;
> - according to the perf_event_map_fd and key, the function
> bpf_perf_event_read() can get the Hardware PMU counter value;
>
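[ The two restrictions listed for V2 (event type limited to PERF_TYPE_RAW and
PERF_TYPE_HARDWARE, and reads only on the current CPU or in the current
process) can be modelled in user space as below. This is a sketch of one
possible reading of the cover letter, not the checks the patches actually
implement. The event_scope struct and both function names are hypothetical. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/perf_event.h>

/* Hypothetical description of where an event was opened. */
struct event_scope {
	int cpu;	/* >= 0: bound to that CPU; -1: not CPU-bound */
	int pid;	/* > 0: bound to that task; -1: not task-bound */
};

/* Only raw and generic hardware events are accepted. */
bool event_type_allowed(unsigned int type)
{
	return type == PERF_TYPE_RAW || type == PERF_TYPE_HARDWARE;
}

/*
 * A read is allowed only if every binding the event has matches the
 * caller: a CPU-bound event must be read on that CPU, a task-bound
 * event must be read from that task. An event with no binding at all
 * is rejected in this model.
 */
bool read_allowed(const struct event_scope *ev, int cur_cpu, int cur_pid)
{
	if (ev->cpu >= 0 && ev->cpu != cur_cpu)
		return false;
	if (ev->pid > 0 && ev->pid != cur_pid)
		return false;
	return ev->cpu >= 0 || ev->pid > 0;
}
```

[ In this model a counter opened on CPU 2 can be read by a program running on
CPU 2 but not by one running on CPU 9, whichever task triggers it. ]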
> Patch 5/5 is a simple example that shows how to use this new eBPF
> program ability. The PMU counter data can be read from
> /sys/kernel/debug/tracing/trace (or trace_pipe). The output below shows
> the cycles PMU value sampled at 'kprobe/sys_write':
>
> $ cat /sys/kernel/debug/tracing/trace_pipe
> $ ./tracex6
> ...
> cat-677 [002] d..1 210.299270: : bpf count: CPU-2 5316659
> cat-677 [002] d..1 210.299316: : bpf count: CPU-2 5378639
> cat-677 [002] d..1 210.299362: : bpf count: CPU-2 5440654
> cat-677 [002] d..1 210.299408: : bpf count: CPU-2 5503211
> cat-677 [002] d..1 210.299454: : bpf count: CPU-2 5565438
> cat-677 [002] d..1 210.299500: : bpf count: CPU-2 5627433
> cat-677 [002] d..1 210.299547: : bpf count: CPU-2 5690033
> cat-677 [002] d..1 210.299593: : bpf count: CPU-2 5752184
> cat-677 [002] d..1 210.299639: : bpf count: CPU-2 5814543
> <...>-548 [009] d..1 210.299667: : bpf count: CPU-9 605418074
> <...>-548 [009] d..1 210.299692: : bpf count: CPU-9 605452692
> cat-677 [002] d..1 210.299700: : bpf count: CPU-2 5896319
> <...>-548 [009] d..1 210.299710: : bpf count: CPU-9 605477824
> <...>-548 [009] d..1 210.299728: : bpf count: CPU-9 605501726
> <...>-548 [009] d..1 210.299745: : bpf count: CPU-9 605525279
> <...>-548 [009] d..1 210.299762: : bpf count: CPU-9 605547817
> <...>-548 [009] d..1 210.299778: : bpf count: CPU-9 605570433
> <...>-548 [009] d..1 210.299795: : bpf count: CPU-9 605592743
> ...
>
> The details of the patches are as follows:
>
> Patch 1/5 introduces a new bpf map type. This map only stores the
> pointer to struct perf_event;
>
> Patch 2/5 introduces a map_traverse_elem() function for later use;
>
> Patch 3/5 converts event file descriptors into perf_event structures when
> adding a new element to the map;
So far all the map backends are of generic nature, knowing absolutely nothing
about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
tail call is a bit special, but nevertheless generic for each user and [very]
useful, so it makes sense to inherit from the array map and move the code there.
I don't really like that we start adding new _special_-cased maps here into the
eBPF core code, it seems quite hacky. :( From your rather terse commit description
where you introduce the maps, I failed to see a detailed elaboration on this, i.e.
why can it not be abstracted any differently?
> Patch 4/5 implements the function bpf_perf_event_read(), which gets the
> selected hardware PMU counter;
>
> Patch 5/5 gives a simple example.
>
> Kaixu Xia (5):
> bpf: Add new bpf map type to store the pointer to struct perf_event
> bpf: Add function map->ops->map_traverse_elem() to traverse map elems
> bpf: Save the pointer to struct perf_event to map
> bpf: Implement function bpf_perf_event_read() that get the selected
> hardware PMU conuter
> samples/bpf: example of get selected PMU counter value
>
> include/linux/bpf.h | 6 +++
> include/linux/perf_event.h | 5 ++-
> include/uapi/linux/bpf.h | 3 ++
> kernel/bpf/arraymap.c | 110 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/bpf/helpers.c | 42 +++++++++++++++++
> kernel/bpf/syscall.c | 26 +++++++++++
> kernel/events/core.c | 30 ++++++++++++-
> kernel/trace/bpf_trace.c | 2 +
> samples/bpf/Makefile | 4 ++
> samples/bpf/bpf_helpers.h | 2 +
> samples/bpf/tracex6_kern.c | 27 +++++++++++
> samples/bpf/tracex6_user.c | 67 +++++++++++++++++++++++++++
> 12 files changed, 321 insertions(+), 3 deletions(-)
> create mode 100644 samples/bpf/tracex6_kern.c
> create mode 100644 samples/bpf/tracex6_user.c
>