[PATCH net] bpf: one perf event close won't free bpf program attached by another perf event

Netdev List
 help / color / mirror / Atom feed

* [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
@ 2017-09-18 23:38 Yonghong Song
  2017-09-20 21:12 ` David Miller
  2017-09-21  1:41 ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Yonghong Song @ 2017-09-18 23:38 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as tracepoint
     no output any more.

The issue happens at step 4. Multiple perf_event_open can be called
successfully, but only one bpf prog pointer in the tp_event. In the
current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 Additional context: discussed with Alexei internally but did not find
 a solution which can avoid introducing the additional field in
 trace_event_call structure.

 Peter, could you take a look as well and maybe you could have better
 alternative? Thanks!

 include/linux/trace_events.h | 1 +
 kernel/events/core.c         | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 7f11050..2e0f222 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -272,6 +272,7 @@ struct trace_event_call {
 	int				perf_refcount;
 	struct hlist_head __percpu	*perf_events;
 	struct bpf_prog			*prog;
+	struct perf_event		*bpf_prog_owner;
 
 	int	(*perf_perm)(struct trace_event_call *,
 			     struct perf_event *);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e691b7..6bc21e2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
 		}
 	}
 	event->tp_event->prog = prog;
+	event->tp_event->bpf_prog_owner = event;
 
 	return 0;
 }
@@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
 		return;
 
 	prog = event->tp_event->prog;
-	if (prog) {
+	if (prog && event->tp_event->bpf_prog_owner == event) {
 		event->tp_event->prog = NULL;
 		bpf_prog_put(prog);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-18 23:38 [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Yonghong Song
@ 2017-09-20 21:12 ` David Miller
  2017-09-21  1:41 ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2017-09-20 21:12 UTC (permalink / raw)
  To: yhs; +Cc: peterz, rostedt, ast, daniel, netdev, kernel-team

From: Yonghong Song <yhs@fb.com>
Date: Mon, 18 Sep 2017 16:38:36 -0700

> This patch fixes a bug exhibited by the following scenario:
>   1. fd1 = perf_event_open with attr.config = ID1
>   2. attach bpf program prog1 to fd1
>   3. fd2 = perf_event_open with attr.config = ID1
>      <this will be successful>
>   4. user program closes fd2 and prog1 is detached from the tracepoint.
>   5. user program with fd1 does not work properly as tracepoint
>      no output any more.
> 
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
> 
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

I've applied this and queued it up for -stable as it looks good
to me and 2 days is enough time for waiting for any other reviews.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-18 23:38 [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Yonghong Song
  2017-09-20 21:12 ` David Miller
@ 2017-09-21  1:41 ` Steven Rostedt
  2017-09-21  5:17   ` Yonghong Song
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2017-09-21  1:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: peterz, ast, daniel, netdev, kernel-team

On Mon, 18 Sep 2017 16:38:36 -0700
Yonghong Song <yhs@fb.com> wrote:

> This patch fixes a bug exhibited by the following scenario:
>   1. fd1 = perf_event_open with attr.config = ID1
>   2. attach bpf program prog1 to fd1
>   3. fd2 = perf_event_open with attr.config = ID1
>      <this will be successful>
>   4. user program closes fd2 and prog1 is detached from the tracepoint.
>   5. user program with fd1 does not work properly as tracepoint
>      no output any more.
> 
> The issue happens at step 4. Multiple perf_event_open can be called
> successfully, but only one bpf prog pointer in the tp_event. In the
> current logic, any fd release for the same tp_event will free
> the tp_event->prog.
> 
> The fix is to free tp_event->prog only when the closing fd
> corresponds to the one which registered the program.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  Additional context: discussed with Alexei internally but did not find
>  a solution which can avoid introducing the additional field in
>  trace_event_call structure.
> 
>  Peter, could you take a look as well and maybe you could have better
>  alternative? Thanks!
> 
>  include/linux/trace_events.h | 1 +
>  kernel/events/core.c         | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 7f11050..2e0f222 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -272,6 +272,7 @@ struct trace_event_call {
>  	int				perf_refcount;
>  	struct hlist_head __percpu	*perf_events;
>  	struct bpf_prog			*prog;
> +	struct perf_event		*bpf_prog_owner;

Does this have to be in the trace_event_call structure? Hmm, I'm
wondering if the prog needs to be there (I should look to see if we can
move it from it). The trace_event_call is created for *every* event,
and there's thousands of them now. Every byte to this structure adds
1000s of bytes to the kernel. Would it be possible to attach the prog
and the owner to the perf_event?

-- Steve


>  
>  	int	(*perf_perm)(struct trace_event_call *,
>  			     struct perf_event *);
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3e691b7..6bc21e2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>  		}
>  	}
>  	event->tp_event->prog = prog;
> +	event->tp_event->bpf_prog_owner = event;
>  
>  	return 0;
>  }
> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>  		return;
>  
>  	prog = event->tp_event->prog;
> -	if (prog) {
> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>  		event->tp_event->prog = NULL;
>  		bpf_prog_put(prog);
>  	}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-21  1:41 ` Steven Rostedt
@ 2017-09-21  5:17   ` Yonghong Song
  2017-09-21  5:20     ` Yonghong Song
  0 siblings, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2017-09-21  5:17 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: peterz, ast, daniel, netdev, kernel-team



On 9/20/17 6:41 PM, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 16:38:36 -0700
> Yonghong Song <yhs@fb.com> wrote:
> 
>> This patch fixes a bug exhibited by the following scenario:
>>    1. fd1 = perf_event_open with attr.config = ID1
>>    2. attach bpf program prog1 to fd1
>>    3. fd2 = perf_event_open with attr.config = ID1
>>       <this will be successful>
>>    4. user program closes fd2 and prog1 is detached from the tracepoint.
>>    5. user program with fd1 does not work properly as tracepoint
>>       no output any more.
>>
>> The issue happens at step 4. Multiple perf_event_open can be called
>> successfully, but only one bpf prog pointer in the tp_event. In the
>> current logic, any fd release for the same tp_event will free
>> the tp_event->prog.
>>
>> The fix is to free tp_event->prog only when the closing fd
>> corresponds to the one which registered the program.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   Additional context: discussed with Alexei internally but did not find
>>   a solution which can avoid introducing the additional field in
>>   trace_event_call structure.
>>
>>   Peter, could you take a look as well and maybe you could have better
>>   alternative? Thanks!
>>
>>   include/linux/trace_events.h | 1 +
>>   kernel/events/core.c         | 3 ++-
>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 7f11050..2e0f222 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>   	int				perf_refcount;
>>   	struct hlist_head __percpu	*perf_events;
>>   	struct bpf_prog			*prog;
>> +	struct perf_event		*bpf_prog_owner;
> 
> Does this have to be in the trace_event_call structure? Hmm, I'm
> wondering if the prog needs to be there (I should look to see if we can
> move it from it). The trace_event_call is created for *every* event,
> and there's thousands of them now. Every byte to this structure adds
> 1000s of bytes to the kernel. Would it be possible to attach the prog
> and the owner to the perf_event?

Regarding whether we could move the prog and the owner to the 
perf_event. It is possible. There is already a "prog" field in the
perf_event structure for overflow handler. We could reuse the "prog"
field and we do not need bpf_prog_owner any more. We can iterate
through trace_event_call->perf_events to find the event which has the
prog and executes it. This will support multiple prog's attaching to
the same trace_event_call as well.

This approach may need careful evaluation though.
(1). It adds runtime overhead although the overhead should be small
since perf_event attaching to the same trace_event_call should be small.
(2). trace_event_call->perf_events are per cpu data structure, that 
means, some filtering logic is needed to avoid the same perf_event prog 
is executing twice.
(3). since the list is traversed, the locking (rcu?) may be required to
pretect the list. Not sure whether additional locking is needed or not.

Alternative to using trace_event_call->perf_events, we may replace
"struct bpf_prog *prog" to a list (or some other list like data 
structure) to just record perf events which have bpf progs attached.
But this will add memory overhead to trace_event_call data structure.

> 
> -- Steve
> 
> 
>>   
>>   	int	(*perf_perm)(struct trace_event_call *,
>>   			     struct perf_event *);
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 3e691b7..6bc21e2 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>>   		}
>>   	}
>>   	event->tp_event->prog = prog;
>> +	event->tp_event->bpf_prog_owner = event;
>>   
>>   	return 0;
>>   }
>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>>   		return;
>>   
>>   	prog = event->tp_event->prog;
>> -	if (prog) {
>> +	if (prog && event->tp_event->bpf_prog_owner == event) {
>>   		event->tp_event->prog = NULL;
>>   		bpf_prog_put(prog);
>>   	}
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-21  5:17   ` Yonghong Song
@ 2017-09-21  5:20     ` Yonghong Song
  2017-09-21 11:17       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2017-09-21  5:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: peterz, ast, daniel, netdev, kernel-team



On 9/20/17 10:17 PM, Yonghong Song wrote:
> 
> 
> On 9/20/17 6:41 PM, Steven Rostedt wrote:
>> On Mon, 18 Sep 2017 16:38:36 -0700
>> Yonghong Song <yhs@fb.com> wrote:
>>
>>> This patch fixes a bug exhibited by the following scenario:
>>>    1. fd1 = perf_event_open with attr.config = ID1
>>>    2. attach bpf program prog1 to fd1
>>>    3. fd2 = perf_event_open with attr.config = ID1
>>>       <this will be successful>
>>>    4. user program closes fd2 and prog1 is detached from the tracepoint.
>>>    5. user program with fd1 does not work properly as tracepoint
>>>       no output any more.
>>>
>>> The issue happens at step 4. Multiple perf_event_open can be called
>>> successfully, but only one bpf prog pointer in the tp_event. In the
>>> current logic, any fd release for the same tp_event will free
>>> the tp_event->prog.
>>>
>>> The fix is to free tp_event->prog only when the closing fd
>>> corresponds to the one which registered the program.
>>>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   Additional context: discussed with Alexei internally but did not find
>>>   a solution which can avoid introducing the additional field in
>>>   trace_event_call structure.
>>>
>>>   Peter, could you take a look as well and maybe you could have better
>>>   alternative? Thanks!
>>>
>>>   include/linux/trace_events.h | 1 +
>>>   kernel/events/core.c         | 3 ++-
>>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>>> index 7f11050..2e0f222 100644
>>> --- a/include/linux/trace_events.h
>>> +++ b/include/linux/trace_events.h
>>> @@ -272,6 +272,7 @@ struct trace_event_call {
>>>       int                perf_refcount;
>>>       struct hlist_head __percpu    *perf_events;
>>>       struct bpf_prog            *prog;
>>> +    struct perf_event        *bpf_prog_owner;
>>
>> Does this have to be in the trace_event_call structure? Hmm, I'm
>> wondering if the prog needs to be there (I should look to see if we can
>> move it from it). The trace_event_call is created for *every* event,
>> and there's thousands of them now. Every byte to this structure adds
>> 1000s of bytes to the kernel. Would it be possible to attach the prog
>> and the owner to the perf_event?
> 
> Regarding whether we could move the prog and the owner to the 
> perf_event. It is possible. There is already a "prog" field in the
> perf_event structure for overflow handler. We could reuse the "prog"
> field and we do not need bpf_prog_owner any more. We can iterate
> through trace_event_call->perf_events to find the event which has the
> prog and executes it. This will support multiple prog's attaching to
> the same trace_event_call as well.
> 
> This approach may need careful evaluation though.
> (1). It adds runtime overhead although the overhead should be small
> since perf_event attaching to the same trace_event_call should be small.
> (2). trace_event_call->perf_events are per cpu data structure, that 
> means, some filtering logic is needed to avoid the same perf_event prog 
> is executing twice.

What I mean here is that the trace_event_call->perf_events need to be
checked on ALL cpus since bpf prog should be executed regardless of
cpu affiliation. It is possible that the same perf_event in different
per_cpu bucket and hence filtering is needed to avoid the same
perf_event bpf_prog is executed twice.

> (3). since the list is traversed, the locking (rcu?) may be required to
> pretect the list. Not sure whether additional locking is needed or not.
> 
> Alternative to using trace_event_call->perf_events, we may replace
> "struct bpf_prog *prog" to a list (or some other list like data 
> structure) to just record perf events which have bpf progs attached.
> But this will add memory overhead to trace_event_call data structure.
> 
>>
>> -- Steve
>>
>>
>>>       int    (*perf_perm)(struct trace_event_call *,
>>>                    struct perf_event *);
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 3e691b7..6bc21e2 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct 
>>> perf_event *event, u32 prog_fd)
>>>           }
>>>       }
>>>       event->tp_event->prog = prog;
>>> +    event->tp_event->bpf_prog_owner = event;
>>>       return 0;
>>>   }
>>> @@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct 
>>> perf_event *event)
>>>           return;
>>>       prog = event->tp_event->prog;
>>> -    if (prog) {
>>> +    if (prog && event->tp_event->bpf_prog_owner == event) {
>>>           event->tp_event->prog = NULL;
>>>           bpf_prog_put(prog);
>>>       }
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-21  5:20     ` Yonghong Song
@ 2017-09-21 11:17       ` Peter Zijlstra
  2017-09-21 14:02         ` Steven Rostedt
  2017-09-21 21:53         ` Alexei Starovoitov
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-21 11:17 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Steven Rostedt, ast, daniel, netdev, kernel-team

On Wed, Sep 20, 2017 at 10:20:13PM -0700, Yonghong Song wrote:
> > (2). trace_event_call->perf_events are per cpu data structure, that
> > means, some filtering logic is needed to avoid the same perf_event prog
> > is executing twice.
> 
> What I mean here is that the trace_event_call->perf_events need to be
> checked on ALL cpus since bpf prog should be executed regardless of
> cpu affiliation. It is possible that the same perf_event in different
> per_cpu bucket and hence filtering is needed to avoid the same
> perf_event bpf_prog is executed twice.

An event will only ever be on a single CPU's list at any one time IIRC.

Now, hysterically perf_event_set_bpf_prog used the tracepoint crud
because that already had bpf bits in. But it might make sense to look at
unifying the bpf stuff across all the different event types. Have them
all use event->prog.

I suspect that would break a fair bunch of bpf proglets, since the data
access to the trace data would be completely different, but it would be
much nicer to not have this distinction based on event type.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-21 11:17       ` Peter Zijlstra
@ 2017-09-21 14:02         ` Steven Rostedt
  2017-09-21 21:53         ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2017-09-21 14:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Yonghong Song, ast, daniel, netdev, kernel-team

On Thu, 21 Sep 2017 13:17:06 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> I suspect that would break a fair bunch of bpf proglets, since the data
> access to the trace data would be completely different, but it would be
> much nicer to not have this distinction based on event type.

Maybe this would be a good test to see if bpf is considered an ABI or
not.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
  2017-09-21 11:17       ` Peter Zijlstra
  2017-09-21 14:02         ` Steven Rostedt
@ 2017-09-21 21:53         ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2017-09-21 21:53 UTC (permalink / raw)
  To: Peter Zijlstra, Yonghong Song; +Cc: Steven Rostedt, daniel, netdev, kernel-team

On 9/21/17 4:17 AM, Peter Zijlstra wrote:
> On Wed, Sep 20, 2017 at 10:20:13PM -0700, Yonghong Song wrote:
>>> (2). trace_event_call->perf_events are per cpu data structure, that
>>> means, some filtering logic is needed to avoid the same perf_event prog
>>> is executing twice.
>>
>> What I mean here is that the trace_event_call->perf_events need to be
>> checked on ALL cpus since bpf prog should be executed regardless of
>> cpu affiliation. It is possible that the same perf_event in different
>> per_cpu bucket and hence filtering is needed to avoid the same
>> perf_event bpf_prog is executed twice.
>
> An event will only ever be on a single CPU's list at any one time IIRC.

yes, but doing for_each_cpu there is not an option. too slow.
struct trace_event_call is the only stable argument in
perf_trace_##call(), so we gotta have a pointer there for stuff
we need to run.
This patch added another annoying pointer, since it's the simplest
bugfix for stable. For net-next we're going to remove it, since
we're working on multi-prog support for kprobes/tracepoints.
(right now there is only one prog allowed and that's very limiting)
With multi-prog that bpf_prog_owner pointer will be removed and
existing 'struct bpf_prog *prog' pointer will be replaced with
something else.

> Now, hysterically perf_event_set_bpf_prog used the tracepoint crud
> because that already had bpf bits in. But it might make sense to look at
> unifying the bpf stuff across all the different event types. Have them
> all use event->prog.

it sounds good in theory, but in practice we need a separate
'stuff to run' pointer in both perf_event and trace_even_call,
since that's what being passed to overflow_handle and perf_trace_##call.

> I suspect that would break a fair bunch of bpf proglets, since the data
> access to the trace data would be completely different, but it would be
> much nicer to not have this distinction based on event type.

such things are certainly an abi.
kprobe+bpf has to see struct pt_regs
perf_event+bpf has to see struct bpf_perf_event_data and
tracepoint+bpf has to see struct foo { fields }
The fields will change every time tracepoint is changed.
That's fine.
But we cannot unify kprobe with tracepoints with perf_event prog types.
And frankly I don't see the need.
Note that in pt_regs we don't need to populate everything.
The 'optimized fprobe' we were talking about at plumbers we
would populate di,si,dx,cx,sp since most of the kprobe+bpf progs
don't care about the other regs and especially cpu flags.
So plenty of room for tweaks and optimizations.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-21 21:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-18 23:38 [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event Yonghong Song
2017-09-20 21:12 ` David Miller
2017-09-21  1:41 ` Steven Rostedt
2017-09-21  5:17   ` Yonghong Song
2017-09-21  5:20     ` Yonghong Song
2017-09-21 11:17       ` Peter Zijlstra
2017-09-21 14:02         ` Steven Rostedt
2017-09-21 21:53         ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox