From: Alexei Starovoitov <ast@plumgrid.com>
To: kaixu xia <xiakaixu@huawei.com>,
	davem@davemloft.net, acme@kernel.org, mingo@redhat.com,
	a.p.zijlstra@chello.nl, masami.hiramatsu.pt@hitachi.com,
	jolsa@kernel.org
Cc: wangnan0@huawei.com, linux-kernel@vger.kernel.org,
	pi3orama@163.com, hekuang@huawei.com
Subject: Re: [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Fri, 17 Jul 2015 15:56:08 -0700
Message-ID: <55A98808.9010307@plumgrid.com>
In-Reply-To: <1437129816-13176-1-git-send-email-xiakaixu@huawei.com>

On 7/17/15 3:43 AM, kaixu xia wrote:
> There are many useful PMUs provided by X86 and other architectures. By
> combining PMU, kprobe and eBPF program together, many interesting things
> can be done. For example, by probing at sched:sched_switch we can
> measure IPC changing between different processes by watching 'cycle' PMU
> counter; by probing at entry and exit points of a kernel function we are
> able to compute cache miss rate for a function by collecting
> 'cache-misses' counter and see the differences. In summary, we can
> define the begin and end points of a procedure, insert kprobes on them,
> attach two BPF programs and let them collect specific PMU counter.

That would definitely be a useful feature.
As for the overall design, I think it should be done slightly differently.
Adding 'flags' to all maps is a bit hacky and seems to have a few
holes. It's better to reuse the 'store fds into maps' code that prog_array
already uses. You can add a new map type, BPF_MAP_TYPE_PERF_EVENT_ARRAY,
and reuse most of the arraymap.c code.
The program also wouldn't need to do lookup+read_pmu, so instead of:
   r0 = 0 (the chosen key: CPU-0)
   *(u32 *)(fp - 4) = r0
   value = bpf_map_lookup_elem(map_fd, fp - 4);
   count = bpf_read_pmu(value);
you will be able to do:
   count = bpf_perf_event_read(perf_event_array_map_fd, index)
which will be faster, since it avoids the separate map lookup.
Note: I'd prefer the name 'bpf_perf_event_read' for the helper.

Inside the helper we really cannot take a mutex, sleep or do an smp_call,
but since programs are always executed with preemption disabled
and never from NMI, I think something like the following should work:
u64 bpf_perf_event_read(u64 r1, u64 index,...)
{
   struct bpf_perf_event_array *array = (void *) (long) r1;
   struct perf_event *event;

   if (unlikely(index >= array->map.max_entries))
      return -EINVAL;
   event = array->events[index];
   /* the slot is empty if no FD was stored at this index */
   if (unlikely(!event))
      return -EINVAL;
   /* only read a counter that is active on the current CPU */
   if (event->state != PERF_EVENT_STATE_ACTIVE)
      return -EINVAL;
   if (event->oncpu != raw_smp_processor_id())
      return -EINVAL;
   __perf_event_read(event);
   return perf_event_count(event);
}
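
The order of the checks in the helper above can be tried out in userspace
with a small model. This is only a sketch, not kernel code: every
model_* name is a hypothetical stand-in, current_cpu is passed in where
the kernel would call raw_smp_processor_id(), a stored count replaces
the real PMU read, and the guard for an empty slot is an assumption of
the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum model_state { MODEL_STATE_OFF = 0, MODEL_STATE_ACTIVE = 1 };

struct model_event {
	enum model_state state; /* plays the role of event->state */
	int oncpu;              /* CPU the counter is currently running on */
	uint64_t count;         /* value a real PMU read would produce */
};

struct model_event_array {
	uint32_t max_entries;
	struct model_event **events; /* a slot is NULL when nothing is stored */
};

/*
 * Model of the proposed helper. Errors are returned as a negative errno
 * converted to u64, matching the u64 return type used in the email.
 */
static uint64_t model_perf_event_read(const struct model_event_array *array,
				      uint64_t index, int current_cpu)
{
	const struct model_event *event;

	assert(array != NULL);
	if (index >= array->max_entries)        /* bounds check first */
		return (uint64_t)-EINVAL;
	event = array->events[index];
	if (event == NULL)                      /* empty slot */
		return (uint64_t)-EINVAL;
	if (event->state != MODEL_STATE_ACTIVE) /* counter not running */
		return (uint64_t)-EINVAL;
	if (event->oncpu != current_cpu)        /* running on another CPU */
		return (uint64_t)-EINVAL;
	return event->count;
}
```

Because the return type is unsigned, a caller of this model has to compare
against (uint64_t)-EINVAL rather than test for a negative value.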
I'm not sure whether we need to disable irqs around __perf_event_read;
I think it should be ok without.
Also, when storing an FD into the perf_event_array you'd need
to filter out all the crazy events. I would limit it to a few
basic types first.
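
A store-time filter of that kind could look roughly like the sketch
below. It is a userspace illustration only: the sketch_* names are
hypothetical, the enum values are copied from the PERF_TYPE_* constants
in the perf_event uapi header so the block stands alone, and the choice
of which types count as "basic" (hardware and raw here) is purely an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numbering as PERF_TYPE_* in the perf_event uapi header. */
enum sketch_perf_type {
	SKETCH_TYPE_HARDWARE   = 0,
	SKETCH_TYPE_SOFTWARE   = 1,
	SKETCH_TYPE_TRACEPOINT = 2,
	SKETCH_TYPE_HW_CACHE   = 3,
	SKETCH_TYPE_RAW        = 4,
	SKETCH_TYPE_BREAKPOINT = 5,
};

/* Hypothetical cut-down attribute: only the field the filter looks at. */
struct sketch_event_attr {
	uint32_t type;
};

/*
 * Allow-list check meant to run when an fd is stored into the array.
 * Anything not explicitly listed is rejected, so new or unusual event
 * types stay out until someone decides they are safe to read.
 */
static bool sketch_event_allowed(const struct sketch_event_attr *attr)
{
	switch (attr->type) {
	case SKETCH_TYPE_HARDWARE: /* assumed "basic" type */
	case SKETCH_TYPE_RAW:      /* assumed "basic" type */
		return true;
	default:
		return false;
	}
}
```

An allow-list is used rather than a deny-list so that the default answer
for an unknown type is "no".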

btw, make sure you run your tests with lockdep and the other debug
options enabled. And for the sample code, please write the bpf program
in C. Not many people can read bpf asm ;)

