* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
       [not found] ` <20200716225933.196342-2-songliubraving@fb.com>
@ 2020-07-21 19:10   ` Alexei Starovoitov
  2020-07-21 22:40     ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-07-21 19:10 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, bpf, netdev, ast, daniel, kernel-team,
	john.fastabend, kpsingh, brouer, peterz

On Thu, Jul 16, 2020 at 03:59:32PM -0700, Song Liu wrote:
> +
> +BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> +	   struct bpf_map *, map, u64, flags)
> +{
> +	struct perf_event *event = ctx->event;
> +	struct perf_callchain_entry *trace;
> +	bool has_kernel, has_user;
> +	bool kernel, user;
> +
> +	/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> +	if (!(event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))

what if event was not created with PERF_SAMPLE_CALLCHAIN ?
Calling the helper will still cause crashes, no?

> +		return bpf_get_stackid((unsigned long)(ctx->regs),
> +				       (unsigned long) map, flags, 0, 0);
> +
> +	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> +			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
> +		return -EINVAL;
> +
> +	user = flags & BPF_F_USER_STACK;
> +	kernel = !user;
> +
> +	has_kernel = !event->attr.exclude_callchain_kernel;
> +	has_user = !event->attr.exclude_callchain_user;
> +
> +	if ((kernel && !has_kernel) || (user && !has_user))
> +		return -EINVAL;

this will break existing users in a way that will be very hard for them to debug.
If they happen to set exclude_callchain_* flags during perf_event_open,
the helpers will start failing at run-time.
One can argue that bpf_get_stack is already broken when precise_ip=1, but
this is still a change in behavior.
It also seems to be broken when PERF_SAMPLE_CALLCHAIN was not set at event
creation time, but precise_ip=1 was.
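
(For illustration only: a hypothetical perf_event_attr of the kind meant
above, with invented values. A program attached to such an event that calls
bpf_get_stackid() with default flags used to go through the helper's own
unwind, however unreliable that is on PEBS; with the check in the hunk above
it would now get -EINVAL at run-time.)

/* hypothetical example, needs <linux/perf_event.h>:
 * a PEBS event whose callchain excludes the kernel part */
struct perf_event_attr attr = {
	.type			  = PERF_TYPE_HARDWARE,
	.config			  = PERF_COUNT_HW_CPU_CYCLES,
	.sample_type		  = PERF_SAMPLE_CALLCHAIN,
	.precise_ip		  = 2,
	.exclude_callchain_kernel = 1,	/* kernel && !has_kernel -> -EINVAL */
};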

> +
> +	trace = ctx->data->callchain;
> +	if (unlikely(!trace))
> +		return -EFAULT;
> +
> +	if (has_kernel && has_user) {

shouldn't it be || ?

> +		__u64 nr_kernel = count_kernel_ip(trace);
> +		int ret;
> +
> +		if (kernel) {
> +			__u64 nr = trace->nr;
> +
> +			trace->nr = nr_kernel;
> +			ret = __bpf_get_stackid(map, trace, flags);
> +
> +			/* restore nr */
> +			trace->nr = nr;
> +		} else { /* user */
> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +
> +			skip += nr_kernel;
> +			if (skip > BPF_F_SKIP_FIELD_MASK)
> +				return -EFAULT;
> +
> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
> +			ret = __bpf_get_stackid(map, trace, flags);
> +		}
> +		return ret;
> +	}
> +	return __bpf_get_stackid(map, trace, flags);
...
> +	if (has_kernel && has_user) {
> +		__u64 nr_kernel = count_kernel_ip(trace);
> +		int ret;
> +
> +		if (kernel) {
> +			__u64 nr = trace->nr;
> +
> +			trace->nr = nr_kernel;
> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
> +					      size, flags);
> +
> +			/* restore nr */
> +			trace->nr = nr;
> +		} else { /* user */
> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +
> +			skip += nr_kernel;
> +			if (skip > BPF_F_SKIP_FIELD_MASK)
> +				goto clear;
> +
> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
> +					      size, flags);
> +		}

Looks like copy-paste. I think there should be a way to make it
into a common helper.

I think the main issue is wrong interaction with the event attr flags.
I think the verifier should detect that bpf_get_stack/bpf_get_stackid
were used and prevent attaching to perf_event with attr.precise_ip=1
and PERF_SAMPLE_CALLCHAIN is not specified.
I was thinking whether attaching bpf to event can force setting of
PERF_SAMPLE_CALLCHAIN, but that would be a surprising behavior,
so not a good idea.
So the only thing left is to reject attach when bpf_get_stack is used
in two cases:
if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is not set.
  (since it will lead to crashes)
if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is set,
but exclude_callchain_[user|kernel]=1 is set too.
  (since it will lead to surprising behavior of bpf_get_stack)
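
Roughly, such an attach-time check might look like this (an untested sketch;
"call_get_stack" is a hypothetical per-program flag the verifier would set
whenever bpf_get_stack or bpf_get_stackid is used):

/* sketch, e.g. in perf_event_set_bpf_prog() after looking up prog */
if (event->attr.precise_ip && prog->call_get_stack &&
    (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) ||
     event->attr.exclude_callchain_kernel ||
     event->attr.exclude_callchain_user)) {
	/* bpf_get_stack/bpf_get_stackid would crash the unwinder or
	 * silently misbehave on this event, so refuse the attach
	 * instead of failing at run-time */
	bpf_prog_put(prog);
	return -EPROTO;
}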

Other ideas?


* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
  2020-07-21 19:10   ` [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF Alexei Starovoitov
@ 2020-07-21 22:40     ` Song Liu
  2020-07-21 22:43       ` Alexei Starovoitov
  2020-07-22 15:40       ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Song Liu @ 2020-07-21 22:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john.fastabend@gmail.com, kpsingh@chromium.org,
	brouer@redhat.com, peterz@infradead.org



> On Jul 21, 2020, at 12:10 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Thu, Jul 16, 2020 at 03:59:32PM -0700, Song Liu wrote:
>> +
>> +BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
>> +	   struct bpf_map *, map, u64, flags)
>> +{
>> +	struct perf_event *event = ctx->event;
>> +	struct perf_callchain_entry *trace;
>> +	bool has_kernel, has_user;
>> +	bool kernel, user;
>> +
>> +	/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
>> +	if (!(event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
> 
> what if event was not created with PERF_SAMPLE_CALLCHAIN ?
> Calling the helper will still cause crashes, no?

Yeah, it may still crash. Somehow I messed up this logic...

> 
>> +		return bpf_get_stackid((unsigned long)(ctx->regs),
>> +				       (unsigned long) map, flags, 0, 0);
>> +
>> +	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
>> +			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
>> +		return -EINVAL;
>> +
>> +	user = flags & BPF_F_USER_STACK;
>> +	kernel = !user;
>> +
>> +	has_kernel = !event->attr.exclude_callchain_kernel;
>> +	has_user = !event->attr.exclude_callchain_user;
>> +
>> +	if ((kernel && !has_kernel) || (user && !has_user))
>> +		return -EINVAL;
> 
> this will break existing users in a way that will be very hard for them to debug.
> If they happen to set exclude_callchain_* flags during perf_event_open,
> the helpers will start failing at run-time.
> One can argue that bpf_get_stack is already broken when precise_ip=1, but
> this is still a change in behavior.
> It also seems to be broken when PERF_SAMPLE_CALLCHAIN was not set at event
> creation time, but precise_ip=1 was.
> 
>> +
>> +	trace = ctx->data->callchain;
>> +	if (unlikely(!trace))
>> +		return -EFAULT;
>> +
>> +	if (has_kernel && has_user) {
> 
> shouldn't it be || ?

It should be &&. We only need to adjust the attached callchain when it has both
kernel and user stacks.
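
(As background: perf stores the kernel part of the callchain first, then a
PERF_CONTEXT_USER marker, then the user part, so counting the kernel entries
is a scan for that marker. This is roughly what count_kernel_ip() in the
patch does, paraphrased here rather than quoted:)

static __u64 count_kernel_ip(struct perf_callchain_entry *trace)
{
	__u64 nr_kernel = 0;

	/* kernel entries come first; stop at the user-context marker */
	while (nr_kernel < trace->nr) {
		if (trace->ip[nr_kernel] == PERF_CONTEXT_USER)
			break;
		nr_kernel++;
	}
	return nr_kernel;
}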

> 
>> +		__u64 nr_kernel = count_kernel_ip(trace);
>> +		int ret;
>> +
>> +		if (kernel) {
>> +			__u64 nr = trace->nr;
>> +
>> +			trace->nr = nr_kernel;
>> +			ret = __bpf_get_stackid(map, trace, flags);
>> +
>> +			/* restore nr */
>> +			trace->nr = nr;
>> +		} else { /* user */
>> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +
>> +			skip += nr_kernel;
>> +			if (skip > BPF_F_SKIP_FIELD_MASK)
>> +				return -EFAULT;
>> +
>> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +			ret = __bpf_get_stackid(map, trace, flags);
>> +		}
>> +		return ret;
>> +	}
>> +	return __bpf_get_stackid(map, trace, flags);
> ...
>> +	if (has_kernel && has_user) {
>> +		__u64 nr_kernel = count_kernel_ip(trace);
>> +		int ret;
>> +
>> +		if (kernel) {
>> +			__u64 nr = trace->nr;
>> +
>> +			trace->nr = nr_kernel;
>> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
>> +					      size, flags);
>> +
>> +			/* restore nr */
>> +			trace->nr = nr;
>> +		} else { /* user */
>> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +
>> +			skip += nr_kernel;
>> +			if (skip > BPF_F_SKIP_FIELD_MASK)
>> +				goto clear;
>> +
>> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
>> +					      size, flags);
>> +		}
> 
> Looks like copy-paste. I think there should be a way to make it
> into a common helper.

I thought about moving this logic to a helper. But we are calling
__bpf_get_stackid() above, and __bpf_get_stack() here. So we can't 
easily put all the logic in a big helper. Multiple small helpers 
look messy (to me).
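
(One possible shape, purely illustrative with invented names: a helper that
only does the trimming and flag adjustment, leaving the final
__bpf_get_stackid()/__bpf_get_stack() call and the trace->nr save/restore to
each caller. Whether that is actually less messy than the two inline
branches is debatable.)

/* hypothetical refactor sketch: returns the trace->nr the caller
 * should temporarily use, or a negative error */
static long stack_map_trim_pe_trace(struct perf_callchain_entry *trace,
				    bool kernel, u64 *flags)
{
	__u64 nr_kernel = count_kernel_ip(trace);
	u64 skip;

	if (kernel)
		return nr_kernel;	/* truncate to the kernel part */

	/* user stack: keep the full trace but skip the kernel entries */
	skip = (*flags & BPF_F_SKIP_FIELD_MASK) + nr_kernel;
	if (skip > BPF_F_SKIP_FIELD_MASK)
		return -EFAULT;

	*flags = (*flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
	return trace->nr;
}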

> 
> I think the main issue is wrong interaction with the event attr flags.
> I think the verifier should detect that bpf_get_stack/bpf_get_stackid
> were used and prevent attaching to perf_event with attr.precise_ip=1
> and PERF_SAMPLE_CALLCHAIN is not specified.
> I was thinking whether attaching bpf to event can force setting of
> PERF_SAMPLE_CALLCHAIN, but that would be a surprising behavior,
> so not a good idea.
> So the only thing left is to reject attach when bpf_get_stack is used
> in two cases:
> if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is not set.
>  (since it will lead to crashes)

We only need to block precise_ip >= 2. precise_ip == 1 is OK. 

> if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is set,
> but exclude_callchain_[user|kernel]=1 is set too.
>  (since it will lead to surprising behavior of bpf_get_stack)
> 
> Other ideas?

Yes, this sounds good. 

Thanks,
Song



* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
  2020-07-21 22:40     ` Song Liu
@ 2020-07-21 22:43       ` Alexei Starovoitov
  2020-07-21 22:51         ` Song Liu
  2020-07-22 15:40       ` Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-07-21 22:43 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john.fastabend@gmail.com, kpsingh@chromium.org,
	brouer@redhat.com, peterz@infradead.org

On Tue, Jul 21, 2020 at 3:40 PM Song Liu <songliubraving@fb.com> wrote:
>
> We only need to block precise_ip >= 2. precise_ip == 1 is OK.

Are you sure?
intel_pmu_hw_config() has:
if (event->attr.precise_ip) {
    if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
            event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
}


* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
  2020-07-21 22:43       ` Alexei Starovoitov
@ 2020-07-21 22:51         ` Song Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Song Liu @ 2020-07-21 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john.fastabend@gmail.com, kpsingh@chromium.org,
	brouer@redhat.com, peterz@infradead.org



> On Jul 21, 2020, at 3:43 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Jul 21, 2020 at 3:40 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> We only need to block precise_ip >= 2. precise_ip == 1 is OK.
> 
> Are you sure?
> intel_pmu_hw_config() has:
> if (event->attr.precise_ip) {
>    if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
>            event->attr.sample_type |= __PERF_SAMPLE_CALLCHAIN_EARLY;
> }

The bit that breaks the unwinder is in setup_pebs_fixed_sample_data():

                if (x86_pmu.intel_cap.pebs_format >= 2) {
                        set_linear_ip(regs, pebs->real_ip);
                        regs->flags |= PERF_EFLAGS_EXACT;
                } 

"real_ip" causes the issue. 

But on second thought, it is probably better to also block precise_ip == 1,
to match the logic in intel_pmu_hw_config().

Thanks,
Song


* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
  2020-07-21 22:40     ` Song Liu
  2020-07-21 22:43       ` Alexei Starovoitov
@ 2020-07-22 15:40       ` Peter Zijlstra
  2020-07-22 16:49         ` Song Liu
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2020-07-22 15:40 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, open list, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	john.fastabend@gmail.com, kpsingh@chromium.org, brouer@redhat.com

On Tue, Jul 21, 2020 at 10:40:19PM +0000, Song Liu wrote:

> We only need to block precise_ip >= 2. precise_ip == 1 is OK. 

Uuuh, how? Anything PEBS would have the same problem. Sure, precise_ip
== 1 will not correct the IP, but the stack will not match regardless.

You need IP,SP(,BP) to be a consistent set _AND_ have it match the
current stack; PEBS simply cannot do that, because the regs get recorded
(much) earlier than the PMI and the stack can have changed in the
meantime.



* Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for perf events BPF
  2020-07-22 15:40       ` Peter Zijlstra
@ 2020-07-22 16:49         ` Song Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Song Liu @ 2020-07-22 16:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, open list, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	john.fastabend@gmail.com, kpsingh@chromium.org, brouer@redhat.com



> On Jul 22, 2020, at 8:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Jul 21, 2020 at 10:40:19PM +0000, Song Liu wrote:
> 
>> We only need to block precise_ip >= 2. precise_ip == 1 is OK. 
> 
> Uuuh, how? Anything PEBS would have the same problem. Sure, precise_ip
> == 1 will not correct the IP, but the stack will not match regardless.
> 
> You need IP,SP(,BP) to be a consistent set _AND_ have it match the
> current stack; PEBS simply cannot do that, because the regs get recorded
> (much) earlier than the PMI and the stack can have changed in the
> meantime.
> 

By "OK", I meant unwinder will not report error (in my tests). For 
accurate stack, we should do the same for precise_ip == 1. 

Thanks,
Song

