From: Alexei Starovoitov <ast@plumgrid.com>
To: kaixu xia <xiakaixu@huawei.com>,
davem@davemloft.net, acme@kernel.org, mingo@redhat.com,
a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com,
jolsa@kernel.org
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org,
pi3orama@163.com, hekuang@huawei.com
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Fri, 17 Jul 2015 15:56:08 -0700 [thread overview]
Message-ID: <55A98808.9010307@plumgrid.com> (raw)
In-Reply-To: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com>
On 7/17/15 3:43 AM, kaixu xia wrote:
> There are many useful PMUs provided by X86 and other architectures. By
> combining PMU, kprobe and eBPF program together, many interesting things
> can be done. For example, by probing at sched:sched_switch we can
> measure IPC changing between different processes by watching 'cycle' PMU
> counter; by probing at entry and exit points of a kernel function we are
> able to compute cache miss rate for a function by collecting
> 'cache-misses' counter and see the differences. In summary, we can
> define the begin and end points of a procedure, insert kprobes on them,
> attach two BPF programs and let them collect specific PMU counter.
That would definitely be a useful feature.

As far as the overall design goes, I think it should be done slightly
differently. The addition of 'flags' to all maps is a bit hacky and it
seems to have a few holes. It's better to reuse the 'store fds into
maps' code that prog_array is doing: you can add a new map type
BPF_MAP_TYPE_PERF_EVENT_ARRAY and reuse most of the arraymap.c code.
The program also wouldn't need to do lookup+read_pmu, so instead of:

  r0 = 0 (the chosen key: CPU-0)
  *(u32 *)(fp - 4) = r0
  value = bpf_map_lookup_elem(map_fd, fp - 4);
  count = bpf_read_pmu(value);

you will be able to do:

  count = bpf_perf_event_read(perf_event_array_map_fd, index)

which will be faster.
Note, I'd prefer the name 'bpf_perf_event_read' for the helper.
Then inside the helper we really cannot take a mutex, sleep, or do an
smp_call, but since programs are always executed with preemption
disabled and never from NMI context, I think something like the
following should work:
u64 bpf_perf_event_read(u64 r1, u64 index, ...)
{
	struct bpf_perf_event_array *array = (void *)(long) r1;
	struct perf_event *event;

	if (unlikely(index >= array->map.max_entries))
		return -EINVAL;

	event = array->events[index];

	/* only read counters that are active on this CPU */
	if (event->state != PERF_EVENT_STATE_ACTIVE)
		return -EINVAL;
	if (event->oncpu != raw_smp_processor_id())
		return -EINVAL;

	__perf_event_read(event);
	return perf_event_count(event);
}
I am not sure whether we need to disable irqs around
__perf_event_read(); I think it should be OK without.

Also, during the store of an FD into the perf_event_array you'd need
to filter out all the crazy events. I would limit it to a few basic
types first.
Btw, make sure you run your tests with lockdep and the other debug
options enabled. And for the sample code please use C for the bpf
program. Not many people can read bpf asm ;)
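For reference, a sample in restricted C might look roughly like this.
This is only a sketch: it assumes the proposed map type and helper land
with these names, takes the map by reference (&pmu_counters) following
the usual samples/bpf SEC()/bpf_helpers.h conventions, and needs
clang's BPF target to build, so it is illustrative rather than tested:

```c
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") pmu_counters = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,	/* proposed map type */
	.key_size = sizeof(u32),
	.value_size = sizeof(u32),
	.max_entries = 32,			/* one slot per CPU */
};

SEC("kprobe/sys_write")
int read_cycles(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	/* proposed helper: read the counter stored at slot 'cpu' */
	u64 count = bpf_perf_event_read(&pmu_counters, cpu);

	/* a real sample would stash 'count' in another map and let
	 * userspace compute deltas between probe hits */
	(void) count;
	return 0;
}

char _license[] SEC("license") = "GPL";
```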