From: xiakaixu <xiakaixu@huawei.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: <ast@plumgrid.com>, <davem@davemloft.net>, <acme@kernel.org>,
	<mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>,
	<wangnan0@huawei.com>, <linux-kernel@vger.kernel.org>,
	<pi3orama@163.com>, <hekuang@huawei.com>
Subject: Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Sat, 25 Jul 2015 10:14:59 +0800
Message-ID: <55B2F123.4060908@huawei.com>
In-Reply-To: <55B179DB.4080308@iogearbox.net>

On 2015/7/24 7:33, Daniel Borkmann wrote:
> On 07/22/2015 10:09 AM, Kaixu Xia wrote:
>> Previous patch v1 url:
>> https://lkml.org/lkml/2015/7/17/287
> 
> [ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
>   the core BPF changes. More below ... ]

Sorry about that; I will add you to the Cc list. :) Your comments are welcome.
> 
>> This patchset allows users to read PMU events in the following way:
>>   1. Open the PMU using perf_event_open() (for each CPU or for
>>      each process they would like to watch);
>>   2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>>   3. Insert FDs into the map with some key-value mapping scheme
>>      (e.g. cpuid -> event on that CPU);
>>   4. Load and attach eBPF programs as usual;
>>   5. In the eBPF program, take the perf_event_map_fd and a key (e.g.
>>      the cpuid returned by bpf_get_smp_processor_id()), then use
>>      bpf_perf_event_read() to read from it;
>>   6. Do anything they want with the value.
>>
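
Step 1 above can be sketched in user space roughly as follows. This is only an illustration, not code from the patch set: the helper names init_cycles_attr() and open_cycles_counter() are made up here, and the CPU cycles event is chosen only because the tracex6 output quoted below reports cycles.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: describe a hardware CPU-cycles counter. */
void init_cycles_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;
	attr->size = sizeof(*attr);
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/*
 * Hypothetical helper: open the counter for all tasks on one CPU.
 * glibc provides no perf_event_open() wrapper, so use syscall(2).
 * Returns a file descriptor, or -1 with errno set (for example when
 * the caller lacks the privileges for CPU-wide counting).
 */
int open_cycles_counter(int cpu)
{
	struct perf_event_attr attr;

	init_cycles_attr(&attr);
	return (int)syscall(__NR_perf_event_open, &attr,
			    -1 /* pid: any task */, cpu,
			    -1 /* group_fd: no group */, 0UL /* flags */);
}
```

The descriptor returned for a given CPU is what step 3 would then insert into the BPF_MAP_TYPE_PERF_EVENT_ARRAY map under the key for that CPU.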
>> Changes in V2:
>>   - put atomic_long_inc_not_zero() between fdget() and fdput();
>>   - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>>   - only read the event counter on the current CPU or for the
>>     current process;
>>   - add a new map type, BPF_MAP_TYPE_PERF_EVENT_ARRAY, to store the
>>     pointer to the struct perf_event;
>>   - given the perf_event_map_fd and key, the function
>>     bpf_perf_event_read() can get the hardware PMU counter value.
>>
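
The "current CPU only" restriction in the V2 changes above can be pictured with a small user-space model. Nothing here is kernel code: struct model_event, struct model_event_array, model_event_read() and the error codes it returns are assumptions made purely for illustration, and the "current process" case is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct perf_event; only the fields the model needs. */
struct model_event {
	int cpu;        /* CPU the event is bound to */
	uint64_t count; /* counter value the model hands back */
};

/* Stand-in for the perf event array map: key -> event pointer. */
struct model_event_array {
	size_t max_entries;
	struct model_event **slots;
};

/*
 * Model of a read through the map: look up the slot for 'key' and
 * refuse events that are not bound to the CPU we are running on.
 * Returns 0 and stores the value in *out, or a negative errno
 * (the specific errno values are this sketch's own choice).
 */
int model_event_read(const struct model_event_array *map, uint32_t key,
		     int current_cpu, uint64_t *out)
{
	const struct model_event *ev;

	assert(map != NULL && out != NULL);
	if (key >= map->max_entries)
		return -E2BIG;
	ev = map->slots[key];
	if (ev == NULL)
		return -ENOENT;
	if (ev->cpu != current_cpu)
		return -EINVAL;
	*out = ev->count;
	return 0;
}
```

With a cpuid -> event mapping, a program that uses its own CPU id as the key always passes the CPU check in this model; any other key is rejected rather than read remotely.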
>> Patch 5/5 is a simple example that shows how to use this new eBPF
>> ability. The PMU counter data can be found in
>> /sys/kernel/debug/tracing/trace (or trace_pipe). The values below are
>> the cycles PMU counter sampled at 'kprobe/sys_write'.
>>
>>    $ cat /sys/kernel/debug/tracing/trace_pipe
>>    $ ./tracex6
>>         ...
>>               cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
>>               cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
>>               cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
>>               cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
>>               cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
>>               cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
>>               cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
>>               cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
>>               cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
>>             <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
>>             <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
>>               cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
>>             <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
>>             <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
>>             <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
>>             <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
>>             <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
>>             <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
>>         ...
>>
>> The patches in detail:
>>
>> Patch 1/5 introduces a new bpf map type. This map only stores the
>> pointer to struct perf_event;
>>
>> Patch 2/5 introduces a map_traverse_elem() function for further use;
>>
>> Patch 3/5 converts event file descriptors into perf_event structures when
>> a new element is added to the map;
> 
> So far all the map backends are of generic nature, knowing absolutely nothing
> about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
> tail call is a bit special, but nevertheless generic for each user and [very]
> useful, so it makes sense to inherit from the array map and move the code there.
> 
> I don't really like that we start adding new _special_-cased maps here in the
> eBPF core code; it seems quite hacky. :( From your rather terse commit description
> where you introduce the maps, I failed to see a detailed elaboration on this, i.e.
> why can it not be abstracted any differently?

Giving eBPF programs the ability to access hardware PMU counters will be
very useful, as I mentioned in the V1 commit message.
Of course, V2 has some special-case code for creating the perf_event map
type, but you will find less of it in the next version (V3), where I have
reused most of the prog_array map implementation. We can make the perf_event
array map more generic in the future.

BR.
>  
>> Patch 4/5 implements the function bpf_perf_event_read(), which gets the
>> selected hardware PMU counter;
>>
>> Patch 5/5 gives a simple example.
>>
>> Kaixu Xia (5):
>>    bpf: Add new bpf map type to store the pointer to struct perf_event
>>    bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>>    bpf: Save the pointer to struct perf_event to map
>>    bpf: Implement function bpf_perf_event_read() that get the selected
>>      hardware PMU conuter
>>    samples/bpf: example of get selected PMU counter value
>>
>>   include/linux/bpf.h        |   6 +++
>>   include/linux/perf_event.h |   5 ++-
>>   include/uapi/linux/bpf.h   |   3 ++
>>   kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
>>   kernel/bpf/helpers.c       |  42 +++++++++++++++++
>>   kernel/bpf/syscall.c       |  26 +++++++++++
>>   kernel/events/core.c       |  30 ++++++++++++-
>>   kernel/trace/bpf_trace.c   |   2 +
>>   samples/bpf/Makefile       |   4 ++
>>   samples/bpf/bpf_helpers.h  |   2 +
>>   samples/bpf/tracex6_kern.c |  27 +++++++++++
>>   samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
>>   12 files changed, 321 insertions(+), 3 deletions(-)
>>   create mode 100644 samples/bpf/tracex6_kern.c
>>   create mode 100644 samples/bpf/tracex6_user.c
>>
> 
> 
> .
> 



Thread overview: 18+ messages
2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
2015-07-23  0:48   ` Alexei Starovoitov
2015-07-23  1:08     ` Wangnan (F)
2015-07-23  1:39       ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
2015-07-23  1:00   ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
2015-07-23  1:08   ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter Kaixu Xia
2015-07-23  1:14   ` Alexei Starovoitov
2015-07-23  2:12     ` xiakaixu
2015-07-23  2:22       ` Alexei Starovoitov
2015-07-23  2:39         ` xiakaixu
2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
2015-07-23  1:16   ` Alexei Starovoitov
2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
2015-07-25  2:14   ` xiakaixu [this message]
