* [PATCH V4 0/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
@ 2015-10-19 10:37 Kaixu Xia
2015-10-19 10:37 ` [PATCH V4 1/1] " Kaixu Xia
0 siblings, 1 reply; 4+ messages in thread
From: Kaixu Xia @ 2015-10-19 10:37 UTC (permalink / raw)
To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
daniel
Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang, netdev
Previous patch V3 url:
https://lkml.org/lkml/2015/10/16/101
This patchset introduces a new perf_event_attr attribute
'soft_disable'. The existing 'disabled' flag does not meet the
requirement: enabling/disabling an event through it goes via
cpu_function_call(), which is too heavy to do from a bpf program,
so we control the perf events stored in the maps in a soft way
instead. With only the 'disabled' flag we cannot enable/disable
the perf events from bpf programs at all.
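For reference, a minimal userspace sketch (not part of this patchset;
the period, sample type and syscall wrapper are illustrative only) of
opening a per-cpu cycles event with the new bit set, so that its output
stays off until a bpf program turns it on:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_soft_disabled_cycles(int cpu)
{
	struct perf_event_attr attr = {
		.type		= PERF_TYPE_HARDWARE,
		.size		= sizeof(attr),
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.sample_period	= 100000,
		.sample_type	= PERF_SAMPLE_IP,
		.soft_disable	= 1,	/* new bit added by this patchset */
	};

	/* pid == -1, one event per cpu, as 'perf record -a' does */
	return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}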
changes in V4:
- make the naming more proper;
- fix a bug in setting the initial value from attr->soft_disable;
- add unlikely() to the check of event->soft_enable;
- squash the 2nd patch into the 1st patch;
changes in V3:
- make the flag name and condition check consistent;
- check only bit 0 of the bpf helper flag and verify that all other
  bits are reserved;
- use atomic_dec_if_positive() and atomic_inc_unless_negative();
- make bpf_perf_event_dump_control_proto static;
- remove the ioctl PERF_EVENT_IOC_SET_ENABLER and the 'enabler' event;
- implement controlling all the perf events stored in PERF_EVENT_ARRAY
  maps by setting the parameter 'index' to the map's max_entries;
changes in V2:
- rebase the whole patch set onto the net-next tree (4b418bf);
- remove the added flag perf_sample_disable in bpf_map;
- move the added fields in struct perf_event to a proper place to
  avoid cacheline misses;
- use a counter based flag instead of a 0/1 switch to account for
  reentering events;
- use a single helper bpf_perf_event_sample_control() to enable/
  disable events;
- implement a light-weight solution to control the trace data output
  on the current cpu;
- create a new ioctl PERF_EVENT_IOC_SET_ENABLER to enable/disable
  a set of events;
Before this patch,
$ ./perf record -e cycles -a sleep 1
$ ./perf report --stdio
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 527 of event 'cycles'
# Event count (approx.): 87824857
...
After this patch,
$ ./perf record -e pmux=cycles --event perf-bpf.o/my_cycles_map=pmux/ -a sleep 1
$ ./perf report --stdio
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 22 of event 'cycles'
# Event count (approx.): 4213922
...
The bpf program example:

struct bpf_map_def SEC("maps") my_cycles_map = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = 32,
};

SEC("enter=sys_write")
int bpf_prog_1(struct pt_regs *ctx)
{
	bpf_perf_event_control(&my_cycles_map, 0, 2);
	return 0;
}

SEC("exit=sys_write%return")
int bpf_prog_2(struct pt_regs *ctx)
{
	bpf_perf_event_control(&my_cycles_map, 0, 3);
	return 0;
}
When controlling sampling at function level like this, the window
between enabling and disabling the perf event on the current cpu is
short, so a high sample frequency has to be set for trace data to
actually be dumped inside that window.
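For example (an illustrative command line only; the frequency value is
a placeholder, not a recommendation):

$ ./perf record -e pmux=cycles -F 99999 --event perf-bpf.o/my_cycles_map=pmux/ -a sleep 1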
Kaixu Xia (1):
bpf: control events stored in PERF_EVENT_ARRAY maps trace data output
when perf sampling
include/linux/perf_event.h | 1 +
include/uapi/linux/bpf.h | 19 +++++++++++++++
include/uapi/linux/perf_event.h | 3 ++-
kernel/bpf/verifier.c | 3 ++-
kernel/events/core.c | 13 +++++++++++
kernel/trace/bpf_trace.c | 51 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 88 insertions(+), 2 deletions(-)
--
1.8.3.4
^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH V4 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
  2015-10-19 10:37 [PATCH V4 0/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling Kaixu Xia
@ 2015-10-19 10:37 ` Kaixu Xia
  2015-10-20  2:14   ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Kaixu Xia @ 2015-10-19 10:37 UTC (permalink / raw)
To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	daniel
Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang, netdev

This patch adds the flag soft_enable to control the trace data output
process when perf sampling. By setting this flag and integrating it
with ebpf, we can control the data output process and get the samples
we are most interested in.

The bpf helper bpf_perf_event_control() can control either the perf
event on the current cpu or all the perf events stored in the maps,
depending on the third parameter 'flag'.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/bpf.h        | 19 +++++++++++++++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/bpf/verifier.c           |  3 ++-
 kernel/events/core.c            | 13 +++++++++++
 kernel/trace/bpf_trace.c        | 51 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 092a0e8..bb3bf87 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -472,6 +472,7 @@ struct perf_event {
 	struct irq_work			pending;

 	atomic_t			event_limit;
+	atomic_t			soft_enable;

 	void (*destroy)(struct perf_event *);
 	struct rcu_head			rcu_head;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 564f1f0..a2b0d9d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -132,6 +132,20 @@ enum bpf_prog_type {
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
 #define BPF_EXIST	2 /* update existing element */

+/* flags for PERF_EVENT_ARRAY maps*/
+enum {
+	BPF_EVENT_CTL_BIT_CUR = 0,
+	BPF_EVENT_CTL_BIT_ALL = 1,
+	__NR_BPF_EVENT_CTL_BITS,
+};
+
+#define	BPF_CTL_BIT_FLAG_MASK \
+	((1ULL << __NR_BPF_EVENT_CTL_BITS) - 1)
+#define	BPF_CTL_BIT_DUMP_CUR \
+	(1ULL << BPF_EVENT_CTL_BIT_CUR)
+#define	BPF_CTL_BIT_DUMP_ALL \
+	(1ULL << BPF_EVENT_CTL_BIT_ALL)
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -287,6 +301,11 @@ enum bpf_func_id {
 	 * Return: realm if != 0
 	 */
 	BPF_FUNC_get_route_realm,
+
+	/**
+	 * u64 bpf_perf_event_control(&map, index, flag)
+	 */
+	BPF_FUNC_perf_event_control,
 	__BPF_FUNC_MAX_ID,
 };

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 2881145..a791b03 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -331,7 +331,8 @@ struct perf_event_attr {
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
-				__reserved_1   : 37;
+				soft_disable   :  1, /* output data on samples by default */
+				__reserved_1   : 36;

 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d6b97b..ffec14b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ static const struct {
 } func_limit[] = {
 	{BPF_MAP_TYPE_PROG_ARRAY, BPF_FUNC_tail_call},
 	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_read},
+	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_control},
 };

 static void print_verifier_state(struct verifier_env *env)
@@ -910,7 +911,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		 * don't allow any other map type to be passed into
		 * the special func;
		 */
-		if (bool_map != bool_func)
+		if (bool_func && bool_map != bool_func)
 			return -EINVAL;
 	}

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b11756f..5219635 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending);
 	}

+	if (unlikely(!atomic_read(&event->soft_enable)))
+		return 0;
+
 	if (event->overflow_handler)
 		event->overflow_handler(event, data, regs);
 	else
@@ -7709,6 +7712,14 @@ static void account_event(struct perf_event *event)
 	account_event_cpu(event, event->cpu);
 }

+static void perf_event_check_dump_flag(struct perf_event *event)
+{
+	if (event->attr.soft_disable == 1)
+		atomic_set(&event->soft_enable, 0);
+	else
+		atomic_set(&event->soft_enable, 1);
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -7840,6 +7851,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		}
 	}

+	perf_event_check_dump_flag(event);
+
 	return event;

 err_per_task:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0fe96c7..d26f3d4 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -215,6 +215,55 @@ const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };

+static u64 bpf_perf_event_control(u64 r1, u64 index, u64 flag, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct perf_event *event;
+	int i;
+
+	if (unlikely(index >= array->map.max_entries))
+		return -E2BIG;
+
+	if (flag & (~BPF_CTL_BIT_FLAG_MASK))
+		return -EINVAL;
+
+	if (flag & BPF_CTL_BIT_DUMP_ALL) {
+		bool dump_control = flag & BPF_CTL_BIT_DUMP_CUR;
+
+		for (i = 0; i < array->map.max_entries; i++) {
+			event = (struct perf_event *)array->ptrs[i];
+			if (!event)
+				continue;
+
+			if (dump_control)
+				atomic_dec_if_positive(&event->soft_enable);
+			else
+				atomic_inc_unless_negative(&event->soft_enable);
+		}
+		return 0;
+	}
+
+	event = (struct perf_event *)array->ptrs[index];
+	if (!event)
+		return -ENOENT;
+
+	if (flag & BPF_CTL_BIT_DUMP_CUR)
+		atomic_dec_if_positive(&event->soft_enable);
+	else
+		atomic_inc_unless_negative(&event->soft_enable);
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_perf_event_control_proto = {
+	.func		= bpf_perf_event_control,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -242,6 +291,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_perf_event_read:
 		return &bpf_perf_event_read_proto;
+	case BPF_FUNC_perf_event_control:
+		return &bpf_perf_event_control_proto;
 	default:
 		return NULL;
 	}
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread
* Re: [PATCH V4 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
  2015-10-19 10:37 ` [PATCH V4 1/1] " Kaixu Xia
@ 2015-10-20  2:14   ` Alexei Starovoitov
  2015-10-20  2:35     ` xiakaixu
  0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2015-10-20  2:14 UTC (permalink / raw)
To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
Cc: wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 10/19/15 3:37 AM, Kaixu Xia wrote:
> +/* flags for PERF_EVENT_ARRAY maps*/
> +enum {
> +	BPF_EVENT_CTL_BIT_CUR = 0,
> +	BPF_EVENT_CTL_BIT_ALL = 1,
> +	__NR_BPF_EVENT_CTL_BITS,
> +};
> +
> +#define	BPF_CTL_BIT_FLAG_MASK \
> +	((1ULL << __NR_BPF_EVENT_CTL_BITS) - 1)
> +#define	BPF_CTL_BIT_DUMP_CUR \
> +	(1ULL << BPF_EVENT_CTL_BIT_CUR)
> +#define	BPF_CTL_BIT_DUMP_ALL \
> +	(1ULL << BPF_EVENT_CTL_BIT_ALL)
> +

the above shouldn't be part of uapi header. It can stay in bpf_trace.c
Just document these bits next to helper similar to skb_store_bytes()

The rest looks ok.
It still needs an ack from Peter for perf_event bits

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH V4 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
  2015-10-20  2:14   ` Alexei Starovoitov
@ 2015-10-20  2:35     ` xiakaixu
  0 siblings, 0 replies; 4+ messages in thread
From: xiakaixu @ 2015-10-20  2:35 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	daniel, wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 2015/10/20 10:14, Alexei Starovoitov wrote:
> On 10/19/15 3:37 AM, Kaixu Xia wrote:
>> +/* flags for PERF_EVENT_ARRAY maps*/
>> +enum {
>> +	BPF_EVENT_CTL_BIT_CUR = 0,
>> +	BPF_EVENT_CTL_BIT_ALL = 1,
>> +	__NR_BPF_EVENT_CTL_BITS,
>> +};
>> +
>> +#define	BPF_CTL_BIT_FLAG_MASK \
>> +	((1ULL << __NR_BPF_EVENT_CTL_BITS) - 1)
>> +#define	BPF_CTL_BIT_DUMP_CUR \
>> +	(1ULL << BPF_EVENT_CTL_BIT_CUR)
>> +#define	BPF_CTL_BIT_DUMP_ALL \
>> +	(1ULL << BPF_EVENT_CTL_BIT_ALL)
>> +
>
> the above shouldn't be part of uapi header. It can stay in bpf_trace.c
> Just document these bits next to helper similar to skb_store_bytes()
>
> The rest looks ok.
> It still needs an ack from Peter for perf_event bits

Thanks for your comments!
This part will be moved to bpf_trace.c in the next version.

^ permalink raw reply	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-10-20  2:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-19 10:37 [PATCH V4 0/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling Kaixu Xia
2015-10-19 10:37 ` [PATCH V4 1/1] " Kaixu Xia
2015-10-20  2:14   ` Alexei Starovoitov
2015-10-20  2:35     ` xiakaixu