From: Yonghong Song <yhs@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: <peterz@infradead.org>, <ast@fb.com>, <daniel@iogearbox.net>,
<netdev@vger.kernel.org>, <kernel-team@fb.com>
Subject: Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
Date: Wed, 20 Sep 2017 22:20:13 -0700
Message-ID: <9e968490-87ae-7a79-9e59-0dcc840a93f5@fb.com>
In-Reply-To: <94f5a61e-1a88-f285-224b-66c92dc3da7b@fb.com>
On 9/20/17 10:17 PM, Yonghong Song wrote:
>
>
> On 9/20/17 6:41 PM, Steven Rostedt wrote:
>> On Mon, 18 Sep 2017 16:38:36 -0700
>> Yonghong Song <yhs@fb.com> wrote:
>>
>>> This patch fixes a bug exhibited by the following scenario:
>>> 1. fd1 = perf_event_open with attr.config = ID1
>>> 2. attach bpf program prog1 to fd1
>>> 3. fd2 = perf_event_open with attr.config = ID1
>>> <this will be successful>
>>> 4. user program closes fd2 and prog1 is detached from the tracepoint.
>>> 5. the user program holding fd1 no longer works properly, as the
>>> tracepoint produces no output any more.
>>>
>>> The issue happens at step 4. Multiple perf_event_open calls can
>>> succeed for the same tracepoint, but there is only one bpf prog
>>> pointer in the tp_event. In the current logic, releasing any fd for
>>> the same tp_event frees tp_event->prog.
>>>
>>> The fix is to free tp_event->prog only when the closing fd
>>> corresponds to the one which registered the program.
>>>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>> Additional context: discussed with Alexei internally but did not find
>>> a solution which can avoid introducing the additional field in
>>> trace_event_call structure.
>>>
>>> Peter, could you take a look as well? Maybe you have a better
>>> alternative. Thanks!
>>>
>>> include/linux/trace_events.h | 1 +
>>> kernel/events/core.c | 3 ++-
>>> 2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>>> index 7f11050..2e0f222 100644
>>> --- a/include/linux/trace_events.h
>>> +++ b/include/linux/trace_events.h
>>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>>  	int perf_refcount;
>>>  	struct hlist_head __percpu *perf_events;
>>>  	struct bpf_prog *prog;
>>> +	struct perf_event *bpf_prog_owner;
>>
>> Does this have to be in the trace_event_call structure? Hmm, I'm
>> wondering if the prog needs to be there (I should look to see if we can
>> move it from it). The trace_event_call is created for *every* event,
>> and there's thousands of them now. Every byte to this structure adds
>> 1000s of bytes to the kernel. Would it be possible to attach the prog
>> and the owner to the perf_event?
>
> Regarding whether we could move the prog and the owner to the
> perf_event: it is possible. There is already a "prog" field in the
> perf_event structure for the overflow handler. We could reuse that
> "prog" field, and then we would not need bpf_prog_owner any more. We
> can iterate through trace_event_call->perf_events to find the events
> which have a prog and execute it. This would also support attaching
> multiple progs to the same trace_event_call.
>
> This approach may need careful evaluation though.
> (1). It adds runtime overhead, although the overhead should be small
> since the number of perf_events attached to the same trace_event_call
> should be small.
> (2). trace_event_call->perf_events is a per-cpu data structure, which
> means some filtering logic is needed to avoid executing the same
> perf_event prog twice.
What I mean here is that trace_event_call->perf_events needs to be
checked on ALL cpus, since the bpf prog should be executed regardless
of cpu affiliation. The same perf_event may sit in more than one
per-cpu bucket, so filtering is needed to avoid executing the same
perf_event's bpf prog twice.
> (3). Since the list is traversed, locking (rcu?) may be required to
> protect the list. Not sure whether additional locking is needed or not.
>
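
To make point (2) more concrete, here is a rough userspace sketch of
the filtering I have in mind. It is not kernel code: the model_*
names, the fixed-size per-cpu arrays and the last_seq field are all
made up for illustration, and locking (point 3) is not modeled at
all. Each tracepoint hit gets a sequence number, and an event whose
last_seq already matches is skipped even if it shows up in another
cpu's bucket.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS    4	/* hypothetical cpu count for the model */
#define MODEL_MAX_EVENTS 8	/* hypothetical per-cpu bucket capacity */

/* Stand-in for struct perf_event; all fields are illustrative only. */
struct model_event {
	int has_prog;		/* nonzero if a bpf prog is attached */
	unsigned long last_seq;	/* last tracepoint hit that ran the prog */
	int runs;		/* how many times the prog was "executed" */
};

/* Stand-in for struct trace_event_call with per-cpu event buckets. */
struct model_call {
	struct model_event *percpu[MODEL_NR_CPUS][MODEL_MAX_EVENTS];
	size_t count[MODEL_NR_CPUS];
	unsigned long seq;	/* incremented once per tracepoint hit */
};

/* Put an event into one cpu's bucket (silently ignored when full). */
static void model_add(struct model_call *call, int cpu,
		      struct model_event *e)
{
	if (call->count[cpu] < MODEL_MAX_EVENTS)
		call->percpu[cpu][call->count[cpu]++] = e;
}

/*
 * Walk the buckets of ALL cpus and "run" each attached prog exactly
 * once per hit. Returns the number of progs executed for this hit.
 */
static int model_tracepoint_hit(struct model_call *call)
{
	int cpu, executed = 0;
	size_t i;

	call->seq++;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		for (i = 0; i < call->count[cpu]; i++) {
			struct model_event *e = call->percpu[cpu][i];

			/* Skip events without a prog or already run. */
			if (!e->has_prog || e->last_seq == call->seq)
				continue;
			e->last_seq = call->seq;
			e->runs++;
			executed++;
		}
	}
	return executed;
}
```

With the same event added to the buckets of cpu 0 and cpu 2, one call
to model_tracepoint_hit() runs its prog once, not twice. The extra
walk over every cpu's bucket is the runtime overhead from point (1).
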
> Alternatively, instead of using trace_event_call->perf_events, we may
> replace "struct bpf_prog *prog" with a list (or some other list-like
> data structure) that records just the perf events which have bpf progs
> attached. But this will add memory overhead to the trace_event_call
> data structure.
>
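
A minimal sketch of that alternative, again as a userspace model and
not a proposed patch. The alt_* names and the integer prog_id are
invented for illustration; a real version would hold a struct
bpf_prog pointer and would need the locking discussed in (3). The
point is only that a close removes the closing event's own entry and
nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a perf_event that has a bpf prog attached. */
struct alt_event {
	int prog_id;		/* hypothetical stand-in for the prog */
	struct alt_event *next;	/* link in the per-call list */
};

/* Stand-in for trace_event_call: a list replaces the single pointer. */
struct alt_call {
	struct alt_event *progs;
};

/* Attach: push the event onto the call's list. */
static void alt_attach(struct alt_call *call, struct alt_event *e)
{
	e->next = call->progs;
	call->progs = e;
}

/* Detach on close: unlink only this event, leave the others alone. */
static void alt_detach(struct alt_call *call, struct alt_event *e)
{
	struct alt_event **pp;

	for (pp = &call->progs; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
	/* Event had no prog attached here: nothing to do. */
}

/* Number of events that currently have a prog attached. */
static int alt_count(const struct alt_call *call)
{
	const struct alt_event *e;
	int n = 0;

	for (e = call->progs; e; e = e->next)
		n++;
	return n;
}
```

Closing an event that never attached a prog leaves the list
untouched, which is the behavior the original patch gets from the
owner check. The cost is the list head in every trace_event_call.
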
>>
>> -- Steve
>>
>>
>>>  	int (*perf_perm)(struct trace_event_call *,
>>>  			 struct perf_event *);
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 3e691b7..6bc21e2 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>>>  		}
>>>  	}
>>>  	event->tp_event->prog = prog;
>>> +	event->tp_event->bpf_prog_owner = event;
>>>  	return 0;
>>>  }
>>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>>>  		return;
>>>  	prog = event->tp_event->prog;
>>> -	if (prog) {
>>> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>>>  		event->tp_event->prog = NULL;
>>>  		bpf_prog_put(prog);
>>>  	}
>>
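
For reference, the effect of the owner check in the patch above can
be replayed in a small userspace model. The struct layouts are cut
down to the two fields involved, the model_* functions and the refcnt
field are stand-ins I made up (refcnt-- stands in for
bpf_prog_put()), and the early return on an already attached prog is
assumed to mirror the existing one-prog-per-tp_event behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (illustrative only). */
struct bpf_prog { int refcnt; };
struct perf_event;
struct trace_event_call {
	struct bpf_prog *prog;
	struct perf_event *bpf_prog_owner;	/* field added by the patch */
};
struct perf_event { struct trace_event_call *tp_event; };

/* Models perf_event_set_bpf_prog(): one prog per tp_event. */
static int model_set_bpf_prog(struct perf_event *event,
			      struct bpf_prog *prog)
{
	if (event->tp_event->prog)
		return -1;	/* assumed: a prog is already attached */
	prog->refcnt++;
	event->tp_event->prog = prog;
	event->tp_event->bpf_prog_owner = event;
	return 0;
}

/* Models perf_event_free_bpf_prog() with the owner check. */
static void model_free_bpf_prog(struct perf_event *event)
{
	struct bpf_prog *prog = event->tp_event->prog;

	if (prog && event->tp_event->bpf_prog_owner == event) {
		event->tp_event->prog = NULL;
		prog->refcnt--;	/* stands in for bpf_prog_put() */
	}
}

/* Replays steps 1-4; returns 1 if prog1 survives closing fd2. */
static int model_scenario(void)
{
	struct trace_event_call tp = { NULL, NULL };
	struct bpf_prog prog1 = { 0 };
	struct perf_event fd1 = { &tp }, fd2 = { &tp };

	model_set_bpf_prog(&fd1, &prog1);	/* step 2 */
	model_free_bpf_prog(&fd2);		/* step 4: close fd2 */
	return tp.prog == &prog1;
}
```

Dropping the bpf_prog_owner comparison from model_free_bpf_prog()
makes model_scenario() return 0, which is the bug described in the
commit message.
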
Thread overview: 8+ messages
2017-09-18 23:38 [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Yonghong Song
2017-09-20 21:12 ` David Miller
2017-09-21 1:41 ` Steven Rostedt
2017-09-21 5:17 ` Yonghong Song
2017-09-21 5:20 ` Yonghong Song [this message]
2017-09-21 11:17 ` Peter Zijlstra
2017-09-21 14:02 ` Steven Rostedt
2017-09-21 21:53 ` Alexei Starovoitov