public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
@ 2015-07-17 10:43 kaixu xia
From: kaixu xia @ 2015-07-17 10:43 UTC
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

This patch series gives eBPF programs the ability to read hardware PMU
counters. Previous discussion on this subject:
https://lkml.org/lkml/2015/5/27/1027

x86 and other architectures provide many useful PMUs. Combining PMU
counters, kprobes and eBPF programs enables a number of interesting
measurements. For example, by probing at sched:sched_switch we can
measure how IPC changes between processes by watching the 'cycles' PMU
counter; by probing at the entry and exit points of a kernel function
we can compute the cache miss rate of that function by reading the
'cache-misses' counter at both points and taking the difference. In
general, we can define the begin and end points of a procedure, insert
kprobes on them, attach a BPF program to each, and let the two programs
read a specific PMU counter. Further, by reading those PMU counters a
BPF program can provide hints to resource schedulers.
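
The begin/end measurement described above boils down to subtracting two
counter snapshots. A minimal sketch in plain C follows; the struct and
function names are hypothetical and not part of this series, and the
'cache-references' counter is an assumption added here so that a rate
can be formed. Integer arithmetic is used because eBPF has no floating
point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of two counters taken at one probe point. */
struct pmu_snapshot {
	uint64_t cache_misses;
	uint64_t cache_references;	/* assumed second counter */
};

/* Events counted between two readings; unsigned subtraction stays
 * correct even if the 64-bit counter wrapped once in between. */
static uint64_t pmu_delta(uint64_t begin, uint64_t end)
{
	return end - begin;
}

/* Cache misses per 1000 references between the begin and end probes.
 * Returns 0 when no references were counted. The multiplication can
 * overflow for deltas above roughly 1.8e16, which this sketch ignores. */
static uint64_t miss_rate_permille(const struct pmu_snapshot *begin,
				   const struct pmu_snapshot *end)
{
	uint64_t misses, refs;

	assert(begin && end);
	misses = pmu_delta(begin->cache_misses, end->cache_misses);
	refs = pmu_delta(begin->cache_references, end->cache_references);
	if (refs == 0)
		return 0;
	return misses * 1000 / refs;
}
```

With snapshots {100, 1000} at entry and {150, 3000} at exit, this gives
50 misses over 2000 references, i.e. 25 per thousand.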

This patchset lets a user read PMU events in the following way:
 1. Open the PMU events using perf_event_open() (one for each CPU or
    for each process to be watched);
 2. Create a BPF map with BPF_MAP_FLAG_PERF_EVENT set in its
    type field;
 3. Insert the FDs into the map with some key-value mapping scheme
    (e.g. cpuid -> event on that CPU);
 4. Load and attach eBPF programs as usual;
 5. In the eBPF program, fetch the perf_event from the map by key
    (e.g. the cpuid obtained from bpf_get_smp_processor_id()), then
    use bpf_read_pmu() to read from it;
 6. Do whatever is needed with the counter value.
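
Step 1 can be sketched in user space with the existing
perf_event_open() syscall, as below. The helper names are hypothetical.
Steps 2, 3 and 5 depend on BPF_MAP_FLAG_PERF_EVENT and bpf_read_pmu()
as proposed by this series, so they are deliberately left out rather
than guessed at.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Fill in an attr describing the hardware CPU cycles counter. */
static void init_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;
	attr->size = sizeof(*attr);
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
}

/* Open a cycles counter bound to one CPU (pid = -1 means all tasks on
 * that CPU). Returns the event FD, or -1 with errno set, for example
 * when the caller lacks permission for CPU-wide events. */
static int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr;

	assert(cpu >= 0);
	init_cycles_attr(&attr);
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}
```

A caller would loop over the online CPUs, call open_cycles_on_cpu() for
each, and store the returned FD in the map under that CPU's id (step 3).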

This patchset contains the necessary kernel-space changes. perf will
be the usual user-space tool, building on
https://lkml.org/lkml/2015/7/8/823 (perf tools: filtering events
using eBPF programs) and https://lkml.org/lkml/2015/7/13/831
(Make eBPF programs output data to perf); the corresponding patches
are on the way.

Patch 6/6 is a simple example that shows how to use this new eBPF
capability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace (the value of the cycles counter
sampled at 'kprobe/sys_write'):

  $ ./bpf_pmu_test
  $ cat /sys/kernel/debug/tracing/trace
       ...
       syslog-ng-555   [001] dn.1 10189.004626: : bpf count: CPU-0  9935764297
       syslog-ng-555   [001] d..1 10189.053776: : bpf count: CPU-0  10000706398
       syslog-ng-555   [001] dn.1 10189.102972: : bpf count: CPU-0  10067117321
       syslog-ng-555   [001] d..1 10189.152925: : bpf count: CPU-0  10134551505
       syslog-ng-555   [001] dn.1 10189.202043: : bpf count: CPU-0  10200869299
       syslog-ng-555   [001] d..1 10189.251167: : bpf count: CPU-0  10267179481
       syslog-ng-555   [001] dn.1 10189.300285: : bpf count: CPU-0  10333493522
       syslog-ng-555   [001] d..1 10189.349410: : bpf count: CPU-0  10399808073
       syslog-ng-555   [001] dn.1 10189.398528: : bpf count: CPU-0  10466121583
       syslog-ng-555   [001] d..1 10189.447645: : bpf count: CPU-0  10532433368
       syslog-ng-555   [001] d..1 10189.496841: : bpf count: CPU-0  10598841104
       syslog-ng-555   [001] d..1 10189.546891: : bpf count: CPU-0  10666410564
       syslog-ng-555   [001] dn.1 10189.596016: : bpf count: CPU-0  10732729739
       syslog-ng-555   [001] d..1 10189.645146: : bpf count: CPU-0  12884941186
       syslog-ng-555   [001] d..1 10189.694263: : bpf count: CPU-0  12951249903
       syslog-ng-555   [001] dn.1 10189.743382: : bpf count: CPU-0  13017561470
       syslog-ng-555   [001] d..1 10189.792506: : bpf count: CPU-0  13083873521
       syslog-ng-555   [001] d..1 10189.841631: : bpf count: CPU-0  13150190416
       syslog-ng-555   [001] d..1 10189.890749: : bpf count: CPU-0  13216505962
       syslog-ng-555   [001] d..1 10189.939945: : bpf count: CPU-0  13282913062
       ...
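
The values above are cumulative counts, so the number of cycles between
two samples is the difference of consecutive lines. A small sketch of a
parser for this output follows; the function name is hypothetical, and
the line format is taken only from the sample shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract the CPU index and counter value from a trace line that
 * contains "bpf count: CPU-<n>  <value>".
 * Returns 0 on success, -1 if the line does not match. */
static int parse_bpf_count(const char *line, int *cpu, uint64_t *value)
{
	const char *p;
	unsigned long long v;

	assert(line && cpu && value);
	p = strstr(line, "bpf count: CPU-");
	if (!p)
		return -1;
	/* A space in the format matches any run of whitespace. */
	if (sscanf(p, "bpf count: CPU-%d %llu", cpu, &v) != 2)
		return -1;
	*value = v;
	return 0;
}
```

Applied to the first two lines of the sample, the parsed values are
9935764297 and 10000706398, a difference of 64942101 cycles.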

The patches are as follows:

Patch 1/6 introduces a map flag. The flag bit is encoded into the type
field passed through attr;

Patch 2/6 introduces a map_traverse_elem() function for later use;

Patch 3/6 converts event file descriptors into perf_event structures
when a new element is added to a map with the flag set;

Patch 4/6 introduces a BPF program function argument constraint for
PMU maps;

Patch 5/6 implements the function bpf_read_pmu(), which reads the
selected hardware PMU counter;

Patch 6/6 gives a simple example.

kaixu xia (6):
  bpf: Add new flags that specify the value type stored in map
  bpf: Add function map->ops->map_traverse_elem() to traverse map elems
  bpf: Save the pointer to struct perf_event to map
  bpf: Add a bpf program function argument constraint for PMU map
  bpf: Implement function bpf_read_pmu() that get the selected hardware
    PMU conuter
  samples/bpf: example of get selected PMU counter value

 include/linux/bpf.h        |    7 +++
 include/linux/perf_event.h |    2 +
 include/uapi/linux/bpf.h   |   16 +++++
 kernel/bpf/arraymap.c      |   17 ++++++
 kernel/bpf/hashtab.c       |   27 +++++++++
 kernel/bpf/helpers.c       |   27 +++++++++
 kernel/bpf/syscall.c       |   81 ++++++++++++++++++++++++-
 kernel/bpf/verifier.c      |    9 +++
 kernel/events/core.c       |   22 +++++++
 kernel/trace/bpf_trace.c   |    2 +
 samples/bpf/bpf_helpers.h  |    2 +
 samples/bpf/bpf_pmu_test.c |  143 ++++++++++++++++++++++++++++++++++++++++++++
 12 files changed, 353 insertions(+), 2 deletions(-)
 create mode 100644 samples/bpf/bpf_pmu_test.c

-- 
1.7.10.4



Thread overview: 36+ messages
2015-07-17 10:43 [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter kaixu xia
2015-07-17 10:43 ` [RFC PATCH 1/6] bpf: Add new flags that specify the value type stored in map kaixu xia
2015-07-17 10:43 ` [RFC PATCH 2/6] bpf: Add function map->ops->map_traverse_elem() to traverse map elems kaixu xia
2015-07-17 10:43 ` [RFC PATCH 3/6] bpf: Save the pointer to struct perf_event to map kaixu xia
2015-07-17 11:06   ` Peter Zijlstra
2015-07-17 11:21     ` Wangnan (F)
2015-07-17 11:34       ` Wangnan (F)
2015-07-17 11:40         ` Peter Zijlstra
2015-07-17 11:54           ` Wangnan (F)
2015-07-17 12:02             ` Peter Zijlstra
2015-07-17 12:07               ` Wangnan (F)
2015-07-17 11:37   ` Peter Zijlstra
2015-07-17 10:43 ` [RFC PATCH 4/6] bpf: Add a bpf program function argument constraint for PMU map kaixu xia
2015-07-17 10:43 ` [RFC PATCH 5/6] bpf: Implement function bpf_read_pmu() that get the selected hardware PMU conuter kaixu xia
2015-07-17 11:05   ` Peter Zijlstra
2015-07-17 11:29     ` Wangnan (F)
2015-07-17 11:39       ` Peter Zijlstra
2015-07-17 11:45         ` Wangnan (F)
2015-07-17 11:55           ` Peter Zijlstra
2015-07-17 11:56             ` Peter Zijlstra
2015-07-17 12:01               ` Wangnan (F)
2015-07-17 12:04                 ` Wangnan (F)
2015-07-17 12:18                 ` Peter Zijlstra
2015-07-17 12:27                   ` Wangnan (F)
2015-07-17 12:45                     ` Peter Zijlstra
2015-07-17 12:46                       ` Peter Zijlstra
2015-07-17 12:57                       ` pi3orama
2015-07-17 13:26                         ` Peter Zijlstra
2015-07-17 13:45                           ` pi3orama
2015-07-17 11:33     ` Peter Zijlstra
2015-07-17 10:43 ` [RFC PATCH 6/6] samples/bpf: example of get selected PMU counter value kaixu xia
2015-07-17 22:56 ` [RFC PATCH 0/6] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Alexei Starovoitov
2015-07-17 23:27   ` pi3orama
2015-07-18  0:42     ` Alexei Starovoitov
2015-07-18  1:02       ` pi3orama
2015-07-18  1:22         ` Alexei Starovoitov
