Netdev List
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: <peterz@infradead.org>, <ast@fb.com>, <daniel@iogearbox.net>,
	<netdev@vger.kernel.org>, <kernel-team@fb.com>
Subject: Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
Date: Wed, 20 Sep 2017 22:20:13 -0700	[thread overview]
Message-ID: <9e968490-87ae-7a79-9e59-0dcc840a93f5@fb.com> (raw)
In-Reply-To: <94f5a61e-1a88-f285-224b-66c92dc3da7b@fb.com>



On 9/20/17 10:17 PM, Yonghong Song wrote:
> 
> 
> On 9/20/17 6:41 PM, Steven Rostedt wrote:
>> On Mon, 18 Sep 2017 16:38:36 -0700
>> Yonghong Song <yhs@fb.com> wrote:
>>
>>> This patch fixes a bug exhibited by the following scenario:
>>>    1. fd1 = perf_event_open with attr.config = ID1
>>>    2. attach bpf program prog1 to fd1
>>>    3. fd2 = perf_event_open with attr.config = ID1
>>>       <this will be successful>
>>>    4. user program closes fd2 and prog1 is detached from the tracepoint.
>>>    5. user program with fd1 does not work properly as tracepoint
>>>       no output any more.
>>>
>>> The issue happens at step 4. Multiple perf_event_open can be called
>>> successfully, but only one bpf prog pointer in the tp_event. In the
>>> current logic, any fd release for the same tp_event will free
>>> the tp_event->prog.
>>>
>>> The fix is to free tp_event->prog only when the closing fd
>>> corresponds to the one which registered the program.
>>>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   Additional context: discussed with Alexei internally but did not find
>>>   a solution which can avoid introducing the additional field in
>>>   trace_event_call structure.
>>>
>>>   Peter, could you take a look as well and maybe you could have better
>>>   alternative? Thanks!
>>>
>>>   include/linux/trace_events.h | 1 +
>>>   kernel/events/core.c         | 3 ++-
>>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>>> index 7f11050..2e0f222 100644
>>> --- a/include/linux/trace_events.h
>>> +++ b/include/linux/trace_events.h
>>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>>       int                perf_refcount;
>>>       struct hlist_head __percpu    *perf_events;
>>>       struct bpf_prog            *prog;
>>> +    struct perf_event        *bpf_prog_owner;
>>
>> Does this have to be in the trace_event_call structure? Hmm, I'm
>> wondering if the prog needs to be there (I should look to see if we can
>> move it from it). The trace_event_call is created for *every* event,
>> and there's thousands of them now. Every byte to this structure adds
>> 1000s of bytes to the kernel. Would it be possible to attach the prog
>> and the owner to the perf_event?
> 
> Regarding whether we could move the prog and the owner to the 
> perf_event. It is possible. There is already a "prog" field in the
> perf_event structure for overflow handler. We could reuse the "prog"
> field and we do not need bpf_prog_owner any more. We can iterate
> through trace_event_call->perf_events to find the event which has the
> prog and executes it. This will support multiple prog's attaching to
> the same trace_event_call as well.
> 
> This approach may need careful evaluation though.
> (1). It adds runtime overhead although the overhead should be small
> since perf_event attaching to the same trace_event_call should be small.
> (2). trace_event_call->perf_events are per cpu data structure, that 
> means, some filtering logic is needed to avoid the same perf_event prog 
> is executing twice.

What I mean here is that the trace_event_call->perf_events need to be
checked on ALL cpus since bpf prog should be executed regardless of
cpu affiliation. It is possible that the same perf_event in different
per_cpu bucket and hence filtering is needed to avoid the same
perf_event bpf_prog is executed twice.

> (3). since the list is traversed, the locking (rcu?) may be required to
> pretect the list. Not sure whether additional locking is needed or not.
> 
> Alternative to using trace_event_call->perf_events, we may replace
> "struct bpf_prog *prog" to a list (or some other list like data 
> structure) to just record perf events which have bpf progs attached.
> But this will add memory overhead to trace_event_call data structure.
> 
>>
>> -- Steve
>>
>>
>>>       int    (*perf_perm)(struct trace_event_call *,
>>>                    struct perf_event *);
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 3e691b7..6bc21e2 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct 
>>> perf_event *event, u32 prog_fd)
>>>           }
>>>       }
>>>       event->tp_event->prog = prog;
>>> +    event->tp_event->bpf_prog_owner = event;
>>>       return 0;
>>>   }
>>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct 
>>> perf_event *event)
>>>           return;
>>>       prog = event->tp_event->prog;
>>> -    if (prog) {
>>> +    if (prog && event->tp_event->bpf_prog_owner == event) {
>>>           event->tp_event->prog = NULL;
>>>           bpf_prog_put(prog);
>>>       }
>>

  reply	other threads:[~2017-09-21  5:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-18 23:38 [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Yonghong Song
2017-09-20 21:12 ` David Miller
2017-09-21  1:41 ` Steven Rostedt
2017-09-21  5:17   ` Yonghong Song
2017-09-21  5:20     ` Yonghong Song [this message]
2017-09-21 11:17       ` Peter Zijlstra
2017-09-21 14:02         ` Steven Rostedt
2017-09-21 21:53         ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e968490-87ae-7a79-9e59-0dcc840a93f5@fb.com \
    --to=yhs@fb.com \
    --cc=ast@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox