From: xiakaixu <xiakaixu@huawei.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: <ast@plumgrid.com>, <davem@davemloft.net>, <acme@kernel.org>,
	<mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>,
	<wangnan0@huawei.com>, <linux-kernel@vger.kernel.org>,
	<pi3orama@163.com>, <hekuang@huawei.com>
Subject: Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Sat, 25 Jul 2015 10:14:59 +0800
Message-ID: <55B2F123.4060908@huawei.com>
In-Reply-To: <55B179DB.4080308@iogearbox.net>

On 2015/7/24 7:33, Daniel Borkmann wrote:
> On 07/22/2015 10:09 AM, Kaixu Xia wrote:
>> Previous patch v1 url:
>> https://lkml.org/lkml/2015/7/17/287
> 
> [ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
>   the core BPF changes. More below ... ]

Sorry about that; I will add you to the Cc list. :) Your comments are welcome.
> 
>> This patchset allows the user to read PMU events in the following way:
>>   1. Open the PMU using perf_event_open() (for each CPU or for
>>      each process he/she would like to watch);
>>   2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>>   3. Insert the FDs into the map with some key-value mapping scheme
>>      (e.g. cpuid -> event on that CPU);
>>   4. Load and attach eBPF programs as usual;
>>   5. In the eBPF program, take the perf_event_map_fd and a key (e.g.
>>      the cpuid obtained from bpf_get_smp_processor_id()), then use
>>      bpf_perf_event_read() to read the counter;
>>   6. Do whatever he/she wants with the value.
>>
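
For steps 1 to 3, a minimal user-space sketch could look like the following.
It is only an illustration under assumptions: the map is assumed to take a
4-byte key (the CPU id) and a 4-byte value (the perf event FD), which this
cover letter does not spell out, and BPF_MAP_TYPE_PERF_EVENT_ARRAY is assumed
to be present in the installed <linux/bpf.h>. The helper names
(init_cycles_attr, open_cycles_on_cpu, create_perf_event_array,
store_event_fd) are made up for this example; only the syscalls and the
structs are real.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a CPU-cycles hardware counter. */
void init_cycles_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/* Step 1: open the counter on one CPU for all tasks (pid = -1).
 * Returns an FD, or -1 on error (e.g. insufficient privileges). */
int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr;

	init_cycles_attr(&attr);
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0UL);
}

/* Step 2: create the map. The 4-byte key/value sizes are an assumption. */
int create_perf_event_array(unsigned int max_entries)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
	attr.key_size = sizeof(uint32_t);
	attr.value_size = sizeof(uint32_t);
	attr.max_entries = max_entries;
	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Step 3: store event_fd in the map under the key "cpu". */
int store_event_fd(int map_fd, uint32_t cpu, int event_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(unsigned long)&cpu;
	attr.value = (uint64_t)(unsigned long)&event_fd;
	attr.flags = BPF_ANY;
	return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

A caller would loop over the online CPUs, call open_cycles_on_cpu(cpu) and
then store_event_fd(map_fd, cpu, fd) for each one, checking every return
value for -1 before going on to load the eBPF program.
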
>> changes in V2:
>>   - put atomic_long_inc_not_zero() between fdget() and fdput();
>>   - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>>   - Only read the event counter on current CPU or on current
>>     process;
>>   - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
>>     pointer to the struct perf_event;
>>   - according to the perf_event_map_fd and key, the function
>>     bpf_perf_event_read() can get the Hardware PMU counter value;
>>
>> Patch 5/5 is a simple example that shows how to use this new eBPF
>> program ability. The PMU counter data can be found in
>> /sys/kernel/debug/tracing/trace (or trace_pipe). The output below
>> shows the cycles PMU value sampled on 'kprobe/sys_write':
>>
>>    $ cat /sys/kernel/debug/tracing/trace_pipe
>>    $ ./tracex6
>>         ...
>>               cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
>>               cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
>>               cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
>>               cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
>>               cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
>>               cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
>>               cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
>>               cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
>>               cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
>>             <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
>>             <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
>>               cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
>>             <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
>>             <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
>>             <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
>>             <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
>>             <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
>>             <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
>>         ...
>>
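
To post-process output like this, each line can be reduced to a (CPU, counter)
pair, and consecutive readings on the same CPU can be subtracted to get the
cycles elapsed between two sys_write hits. A small sketch in C, assuming the
record keeps exactly the "bpf count: CPU-<n>  <value>" layout shown above;
parse_bpf_count is a hypothetical helper and not part of the patchset:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the CPU id and counter value from one
 * trace_pipe line. Returns 1 on success, 0 if the line carries no
 * "bpf count" record.
 */
int parse_bpf_count(const char *line, int *cpu, unsigned long long *count)
{
	const char *p;

	assert(line != NULL && cpu != NULL && count != NULL);

	/* Skip the task, CPU, flags and timestamp columns. */
	p = strstr(line, "bpf count: CPU-");
	if (p == NULL)
		return 0;

	/* Whitespace in the format matches any run of blanks in the input. */
	return sscanf(p, "bpf count: CPU-%d %llu", cpu, count) == 2;
}
```

For the first two cat-677 lines above this gives CPU 2 with 5316659 and
5378639, i.e. 61980 cycles between the two samples.
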
>> The patches in detail:
>>
>> Patch 1/5 introduces a new bpf map type. This map only stores the
>> pointer to struct perf_event;
>>
>> Patch 2/5 introduces a map_traverse_elem() function for further use;
>>
>> Patch 3/5 converts event file descriptors into perf_event structures when
>> a new element is added to the map;
> 
> So far all the map backends are of generic nature, knowing absolutely nothing
> about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
> tail call is a bit special, but nevertheless generic for each user and [very]
> useful, so it makes sense to inherit from the array map and move the code there.
> 
> I don't really like that we start adding new _special_-cased maps here into the
> eBPF core code, it seems quite hacky. :( From your rather terse commit description
> where you introduce the maps, I failed to see a detailed elaboration on this, i.e.
> why can it not be abstracted any differently?

Giving eBPF programs the ability to access hardware PMU counters will be very
useful, as I mentioned in the V1 commit message.
Admittedly, V2 has some special-case code for creating the perf_event type map,
but you will find less of it in the next version (V3), where I have reused most
of the prog_array map implementation. We can make the perf_event array map more
generic in the future.

BR.
>  
>> Patch 4/5 implements the function bpf_perf_event_read(), which gets the
>> selected hardware PMU counter;
>>
>> Patch 5/5 gives a simple example.
>>
>> Kaixu Xia (5):
>>    bpf: Add new bpf map type to store the pointer to struct perf_event
>>    bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>>    bpf: Save the pointer to struct perf_event to map
>>    bpf: Implement function bpf_perf_event_read() that get the selected
>>      hardware PMU conuter
>>    samples/bpf: example of get selected PMU counter value
>>
>>   include/linux/bpf.h        |   6 +++
>>   include/linux/perf_event.h |   5 ++-
>>   include/uapi/linux/bpf.h   |   3 ++
>>   kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
>>   kernel/bpf/helpers.c       |  42 +++++++++++++++++
>>   kernel/bpf/syscall.c       |  26 +++++++++++
>>   kernel/events/core.c       |  30 ++++++++++++-
>>   kernel/trace/bpf_trace.c   |   2 +
>>   samples/bpf/Makefile       |   4 ++
>>   samples/bpf/bpf_helpers.h  |   2 +
>>   samples/bpf/tracex6_kern.c |  27 +++++++++++
>>   samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
>>   12 files changed, 321 insertions(+), 3 deletions(-)
>>   create mode 100644 samples/bpf/tracex6_kern.c
>>   create mode 100644 samples/bpf/tracex6_user.c
>>