From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexei Starovoitov Subject: Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Date: Thu, 21 Sep 2017 14:53:48 -0700 Message-ID: <222d5d19-a843-348f-8cf4-bf0669ae6c91@fb.com> References: <20170918233836.1817062-1-yhs@fb.com> <20170920214109.11002b7c@vmware.local.home> <94f5a61e-1a88-f285-224b-66c92dc3da7b@fb.com> <9e968490-87ae-7a79-9e59-0dcc840a93f5@fb.com> <20170921111706.343om7252gcagco6@hirez.programming.kicks-ass.net> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Cc: Steven Rostedt , , , To: Peter Zijlstra , Yonghong Song Return-path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:43640 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751763AbdIUVyh (ORCPT ); Thu, 21 Sep 2017 17:54:37 -0400 In-Reply-To: <20170921111706.343om7252gcagco6@hirez.programming.kicks-ass.net> Sender: netdev-owner@vger.kernel.org List-ID: On 9/21/17 4:17 AM, Peter Zijlstra wrote: > On Wed, Sep 20, 2017 at 10:20:13PM -0700, Yonghong Song wrote: >>> (2). trace_event_call->perf_events are per cpu data structure, that >>> means, some filtering logic is needed to avoid the same perf_event prog >>> is executing twice. >> >> What I mean here is that the trace_event_call->perf_events need to be >> checked on ALL cpus since bpf prog should be executed regardless of >> cpu affiliation. It is possible that the same perf_event in different >> per_cpu bucket and hence filtering is needed to avoid the same >> perf_event bpf_prog is executed twice. > > An event will only ever be on a single CPU's list at any one time IIRC. yes, but doing for_each_cpu there is not an option. too slow. struct trace_event_call is the only stable argument in perf_trace_##call(), so we gotta have a pointer there for stuff we need to run. This patch added another annoying pointer, since it's the simplest bugfix for stable. For net-next we're going to remove it, since we're working on multi-prog support for kprobes/tracepoints. (right now there is only one prog allowed and that's very limiting) With multi-prog that bpf_prog_owner pointer will be removed and existing 'struct bpf_prog *prog' pointer will be replaced with something else. > Now, hysterically perf_event_set_bpf_prog used the tracepoint crud > because that already had bpf bits in. But it might make sense to look at > unifying the bpf stuff across all the different event types. Have them > all use event->prog. it sounds good in theory, but in practice we need a separate 'stuff to run' pointer in both perf_event and trace_even_call, since that's what being passed to overflow_handle and perf_trace_##call. > I suspect that would break a fair bunch of bpf proglets, since the data > access to the trace data would be completely different, but it would be > much nicer to not have this distinction based on event type. such things are certainly an abi. kprobe+bpf has to see struct pt_regs perf_event+bpf has to see struct bpf_perf_event_data and tracepoint+bpf has to see struct foo { fields } The fields will change every time tracepoint is changed. That's fine. But we cannot unify kprobe with tracepoints with perf_event prog types. And frankly I don't see the need. Note that in pt_regs we don't need to populate everything. The 'optimized fprobe' we were talking about at plumbers we would populate di,si,dx,cx,sp since most of the kprobe+bpf progs don't care about the other regs and especially cpu flags. So plenty of room for tweaks and optimizations.