From mboxrd@z Thu Jan 1 00:00:00 1970
From: xiakaixu
Subject: Re: [PATCH V3 1/2] bpf: control the trace data output on current cpu when perf sampling
Date: Mon, 19 Oct 2015 10:48:12 +0800
Message-ID: <562459EC.20700@huawei.com>
References: <1444981333-70429-1-git-send-email-xiakaixu@huawei.com> <1444981333-70429-2-git-send-email-xiakaixu@huawei.com> <562174CE.9070900@plumgrid.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
To: Alexei Starovoitov
In-Reply-To: <562174CE.9070900@plumgrid.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On 2015/10/17 6:06, Alexei Starovoitov wrote:
> On 10/16/15 12:42 AM, Kaixu Xia wrote:
>> This patch adds the flag dump_enable to control the trace data
>> output process when perf sampling. By setting this flag and
>> integrating with ebpf, we can control the data output process and
>> get the samples we are most interested in.
>>
>> The bpf helper bpf_perf_event_dump_control() can control the
>> perf_event on current cpu.
>>
>> Signed-off-by: Kaixu Xia
>> ---
>>   include/linux/perf_event.h      |  1 +
>>   include/uapi/linux/bpf.h        |  5 +++++
>>   include/uapi/linux/perf_event.h |  3 ++-
>>   kernel/bpf/verifier.c           |  3 ++-
>>   kernel/events/core.c            | 13 ++++++++++++
>>   kernel/trace/bpf_trace.c        | 44 +++++++++++++++++++++++++++++++++++++++++
>>   6 files changed, 67 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 092a0e8..2af527e 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -472,6 +472,7 @@ struct perf_event {
>>       struct irq_work pending;
>>
>>       atomic_t event_limit;
>> +     atomic_t dump_enable;
>
> The naming is the hardest...
> How about calling it 'soft_enable' instead?
>
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -287,6 +287,11 @@ enum bpf_func_id {
>>        * Return: realm if != 0
>>        */
>>       BPF_FUNC_get_route_realm,
>> +
>> +     /**
>> +      * u64 bpf_perf_event_dump_control(&map, index, flag)
>> +      */
>> +     BPF_FUNC_perf_event_dump_control,
>
> and this one is too long.
> May be bpf_perf_event_control() ?
>
> Daniel, any thoughts on naming?
>
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -331,7 +331,8 @@ struct perf_event_attr {
>>               comm_exec : 1, /* flag comm events that are due to an exec */
>>               use_clockid : 1, /* use @clockid for time fields */
>>               context_switch : 1, /* context switch data */
>> -             __reserved_1 : 37;
>> +             dump_enable : 1, /* don't output data on samples */
>
> either comment or name is wrong.
> how about calling this one 'soft_disable',
> since you want zero to be default and the event should be on.
>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index b11756f..74a16af 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
>>               irq_work_queue(&event->pending);
>>       }
>>
>> +     if (!atomic_read(&event->dump_enable))
>> +             return ret;
>
> I'm not an expert in this piece of perf, but should it be 'return 0'
> instead ?
> and may be moved to is_sampling_event() check?
> Also please add unlikely().
>
>> +static void perf_event_check_dump_flag(struct perf_event *event)
>> +{
>> +     if (event->attr.dump_enable == 1)
>> +             atomic_set(&event->dump_enable, 1);
>> +     else
>> +             atomic_set(&event->dump_enable, 0);
>
> that looks like it breaks perf, since default for bits is zero
> and all events will be soft-disabled?
> How did you test it?
> Please add a test to samples/bpf/ for this feature.

Adding a test to samples/bpf/ is really hard.
We would need to reimplement most of the 'perf record/report' functionality
from tools/perf/, such as mmap() handling and trace dumping. The
perf_event_open syscall alone is not enough.

Actually, this patch set covers only the kernel side; it still needs the
perf user-space side. You can find the necessary patches in Wang Nan's git
tree [1]. Based on Wang Nan's tree, we can configure BPF maps through the
perf command line. We also need to configure attr->soft_disable on the perf
user-space side, based on tree [1], so it was not included in this patch
set. I will send out the perf user-space part after this patch set is
applied.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux.git perf/ebpf
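
To make the default-value concern raised in the review concrete, here is a
minimal user-space sketch in C. It is only a model, not kernel code: the
names model_attr, model_event, model_event_init(), model_event_control() and
model_event_should_output() are hypothetical, and it assumes the
'soft_disable' / 'soft_enable' naming suggested in the review rather than
what the final patch may use. The point is that a zero-initialized attr
leaves sample output on.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the attr bit discussed in the review. */
struct model_attr {
	unsigned int soft_disable : 1; /* 0 (default): samples are output */
};

/* Hypothetical stand-in for the per-event runtime state. */
struct model_event {
	struct model_attr attr;
	atomic_int soft_enable; /* toggled at run time, e.g. by a BPF helper */
};

/* Derive the runtime flag from the attr bit: zero attr means enabled. */
static void model_event_init(struct model_event *ev, struct model_attr attr)
{
	ev->attr = attr;
	atomic_init(&ev->soft_enable, attr.soft_disable ? 0 : 1);
}

/* Models what a control helper would do to the event on the current cpu. */
static void model_event_control(struct model_event *ev, bool enable)
{
	atomic_store(&ev->soft_enable, enable ? 1 : 0);
}

/* Models the check made in the overflow path before writing a sample. */
static bool model_event_should_output(struct model_event *ev)
{
	return atomic_load(&ev->soft_enable) != 0;
}
```

With this polarity, an existing perf user that never sets the new bit keeps
getting samples, while a user that sets soft_disable starts with output off
until model_event_control(ev, true) is called.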