From: Alexei Starovoitov <ast@plumgrid.com>
To: kaixu xia <xiakaixu@huawei.com>,
davem@davemloft.net, acme@kernel.org, mingo@redhat.com,
a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com,
jolsa@kernel.org
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org,
pi3orama@163.com, hekuang@huawei.com
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Fri, 17 Jul 2015 15:56:08 -0700 [thread overview]
Message-ID: <55A98808.9010307@plumgrid.com> (raw)
In-Reply-To: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com>
On 7/17/15 3:43 AM, kaixu xia wrote:
> There are many useful PMUs provided by X86 and other architectures. By
> combining PMU, kprobe and eBPF program together, many interesting things
> can be done. For example, by probing at sched:sched_switch we can
> measure IPC changing between different processes by watching 'cycle' PMU
> counter; by probing at entry and exit points of a kernel function we are
> able to compute cache miss rate for a function by collecting
> 'cache-misses' counter and see the differences. In summary, we can
> define the begin and end points of a procedure, insert kprobes on them,
> attach two BPF programs and let them collect specific PMU counter.
That would definitely be a useful feature.

As far as the overall design goes, I think it should be done slightly
differently. The addition of 'flags' to all maps is a bit hacky and it
seems to have a few holes. It's better to reuse the 'store fds into
maps' code that prog_array is doing: you can add a new map type
BPF_MAP_TYPE_PERF_EVENT_ARRAY and reuse most of the arraymap.c code.
The program also wouldn't need to do lookup+read_pmu, so instead of:

  r0 = 0 (the chosen key: CPU-0)
  *(u32 *)(fp - 4) = r0
  value = bpf_map_lookup_elem(map_fd, fp - 4);
  count = bpf_read_pmu(value);

you will be able to do:

  count = bpf_perf_event_read(perf_event_array_map_fd, index)

which will be faster.
Note, I'd prefer the name 'bpf_perf_event_read' for the helper.
Then inside the helper we really cannot take a mutex, sleep, or do an
smp_call, but since programs are always executed with preemption
disabled and never from NMI context, I think something like the
following should work:
u64 bpf_perf_event_read(u64 r1, u64 index, ...)
{
	struct bpf_perf_event_array *array = (void *)(long) r1;
	struct perf_event *event;

	if (unlikely(index >= array->map.max_entries))
		return -EINVAL;

	event = array->events[index];

	/* only read counters that are active on this CPU */
	if (event->state != PERF_EVENT_STATE_ACTIVE)
		return -EINVAL;
	if (event->oncpu != raw_smp_processor_id())
		return -EINVAL;

	__perf_event_read(event);
	return perf_event_count(event);
}
I am not sure whether we need to disable irqs around
__perf_event_read(); I think it should be OK without.

Also, during the store of an FD into the perf_event_array you'd need
to filter out all the crazy events. I would limit it to a few basic
types first.
Btw, make sure you run your tests with lockdep and the other debug
options enabled. And for the sample code please use C for the bpf
program. Not many people can read bpf asm ;)
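For reference, a sample in restricted C might look roughly like this.
This is only a sketch: it assumes the proposed map type and helper land
with these names, takes the map by reference (&pmu_counters) following
the usual samples/bpf SEC()/bpf_helpers.h conventions, and needs
clang's BPF target to build, so it is illustrative rather than tested:

```c
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") pmu_counters = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,	/* proposed map type */
	.key_size = sizeof(u32),
	.value_size = sizeof(u32),
	.max_entries = 32,			/* one slot per CPU */
};

SEC("kprobe/sys_write")
int read_cycles(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	/* proposed helper: read the counter stored at slot 'cpu' */
	u64 count = bpf_perf_event_read(&pmu_counters, cpu);

	/* a real sample would stash 'count' in another map and let
	 * userspace compute deltas between probe hits */
	(void) count;
	return 0;
}

char _license[] SEC("license") = "GPL";
```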