[BUG] perf stat: explicit grouping yields unexpected results

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [BUG] perf stat: explicit grouping yields unexpected results
@ 2013-11-14 20:50 Stephane Eranian
  2013-11-15  6:34 ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-11-14 20:50 UTC (permalink / raw)
  To: LKML
  Cc: Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

Jiri,

I was trying the grouping support in perf stat and I was surprised
to see that if I create a group that is too big to be scheduled, and
where only N out of P events can fit, perf stat still yields counts
for the N events. I was expecting 0 counts or <not supported>.

The kernel semantic is to schedule all the events in a group or none.
Perf does something different and this is confusing. If you use explicit
grouping then I think you want to group to fail if not all the events can
be scheduled:

On an IvyBridge:
$ perf stat --g -e
'{cycles,instructions,branches,branches,branches,branches,branches}'
noploop 1
     3 229 417 079 cycles
     3 223 919 023 instructions              #    1,00  insns per cycle
     3 220 868 098 branches
     3 220 868 098 branches
     3 220 868 098 branches
     3 220 868 098 branches
   <not supported> branches

I think it should be: <not supported> for all events.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-14 20:50 [BUG] perf stat: explicit grouping yields unexpected results Stephane Eranian
@ 2013-11-15  6:34 ` Ingo Molnar
  2013-11-15  9:24   ` Stephane Eranian
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ingo Molnar @ 2013-11-15  6:34 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim


* Stephane Eranian <eranian@google.com> wrote:

> Jiri,
> 
> I was trying the grouping support in perf stat and I was surprised 
> to see that if I create a group that is too big to be scheduled, and 
> where only N out of P events can fit, perf stat still yields counts 
> for the N events. I was expecting 0 counts or <not supported>.
> 
> The kernel semantic is to schedule all the events in a group or 
> none. Perf does something different and this is confusing. If you 
> use explicit grouping then I think you want to group to fail if not 
> all the events can be scheduled:
> 
> On an IvyBridge:
> $ perf stat --g -e
> '{cycles,instructions,branches,branches,branches,branches,branches}'
> noploop 1
>      3 229 417 079 cycles
>      3 223 919 023 instructions              #    1,00  insns per cycle
>      3 220 868 098 branches
>      3 220 868 098 branches
>      3 220 868 098 branches
>      3 220 868 098 branches
>    <not supported> branches
> 
> I think it should be: <not supported> for all events.

Btw., does the kernel side currently support discovery of such 
impossible group scheduling constraints at group setup time? If not 
then it probably should and it should reject them straight away.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15  6:34 ` Ingo Molnar
@ 2013-11-15  9:24   ` Stephane Eranian
  2013-11-15 10:34     ` Ingo Molnar
  2013-11-15 10:05   ` Peter Zijlstra
  2013-11-15 15:08   ` Vince Weaver
  2 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15  9:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 7:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> Jiri,
>>
>> I was trying the grouping support in perf stat and I was surprised
>> to see that if I create a group that is too big to be scheduled, and
>> where only N out of P events can fit, perf stat still yields counts
>> for the N events. I was expecting 0 counts or <not supported>.
>>
>> The kernel semantic is to schedule all the events in a group or
>> none. Perf does something different and this is confusing. If you
>> use explicit grouping then I think you want to group to fail if not
>> all the events can be scheduled:
>>
>> On an IvyBridge:
>> $ perf stat --g -e
>> '{cycles,instructions,branches,branches,branches,branches,branches}'
>> noploop 1
>>      3 229 417 079 cycles
>>      3 223 919 023 instructions              #    1,00  insns per cycle
>>      3 220 868 098 branches
>>      3 220 868 098 branches
>>      3 220 868 098 branches
>>      3 220 868 098 branches
>>    <not supported> branches
>>
>> I think it should be: <not supported> for all events.
>
> Btw., does the kernel side currently support discovery of such
> impossible group scheduling constraints at group setup time? If not
> then it probably should and it should reject them straight away.
>
The kernel does validate events as they are added to a group.
That's why we have validate_event(), validate_group() and
the fake_cpuc mode.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15  9:24   ` Stephane Eranian
@ 2013-11-15 10:34     ` Ingo Molnar
  2013-11-15 10:41       ` Stephane Eranian
                         ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Ingo Molnar @ 2013-11-15 10:34 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim


* Stephane Eranian <eranian@google.com> wrote:

> On Fri, Nov 15, 2013 at 7:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Stephane Eranian <eranian@google.com> wrote:
> >
> >> Jiri,
> >>
> >> I was trying the grouping support in perf stat and I was surprised
> >> to see that if I create a group that is too big to be scheduled, and
> >> where only N out of P events can fit, perf stat still yields counts
> >> for the N events. I was expecting 0 counts or <not supported>.
> >>
> >> The kernel semantic is to schedule all the events in a group or
> >> none. Perf does something different and this is confusing. If you
> >> use explicit grouping then I think you want to group to fail if not
> >> all the events can be scheduled:
> >>
> >> On an IvyBridge:
> >> $ perf stat --g -e
> >> '{cycles,instructions,branches,branches,branches,branches,branches}'
> >> noploop 1
> >>      3 229 417 079 cycles
> >>      3 223 919 023 instructions              #    1,00  insns per cycle
> >>      3 220 868 098 branches
> >>      3 220 868 098 branches
> >>      3 220 868 098 branches
> >>      3 220 868 098 branches
> >>    <not supported> branches
> >>
> >> I think it should be: <not supported> for all events.
> >
> > Btw., does the kernel side currently support discovery of such
> > impossible group scheduling constraints at group setup time? If not
> > then it probably should and it should reject them straight away.
>
> The kernel does validate events as they are added to a group. That's 
> why we have validate_event(), validate_group() and the fake_cpuc 
> mode.

So the problem here isn't really that the kernel doesn't tell 
userspace about it, but that perf stat does not interpret the 
validation result properly.

That brings up an interesting question: what is better for users, if 
we schedule as many as we can and say 'not supported' to the rest 
(current behavior), or if we fail the whole group?

I'd say that the default behavior should be what Jiri implemented: get 
the most out of the situation and inform. But you are right in that 
'forcing' all elements of a group to be valid should be possible as 
well - if a special perf stat option or event format is used.

Even in that second case it shouldn't say <unsupported> for everything 
in the result, but should deny the run immediately and return with an 
error, and should tell the user how many events in the group fit and 
which ones didn't.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:34     ` Ingo Molnar
@ 2013-11-15 10:41       ` Stephane Eranian
  2013-11-15 10:50       ` Peter Zijlstra
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 10:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 11:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Nov 15, 2013 at 7:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > * Stephane Eranian <eranian@google.com> wrote:
>> >
>> >> Jiri,
>> >>
>> >> I was trying the grouping support in perf stat and I was surprised
>> >> to see that if I create a group that is too big to be scheduled, and
>> >> where only N out of P events can fit, perf stat still yields counts
>> >> for the N events. I was expecting 0 counts or <not supported>.
>> >>
>> >> The kernel semantic is to schedule all the events in a group or
>> >> none. Perf does something different and this is confusing. If you
>> >> use explicit grouping then I think you want to group to fail if not
>> >> all the events can be scheduled:
>> >>
>> >> On an IvyBridge:
>> >> $ perf stat --g -e
>> >> '{cycles,instructions,branches,branches,branches,branches,branches}'
>> >> noploop 1
>> >>      3 229 417 079 cycles
>> >>      3 223 919 023 instructions              #    1,00  insns per cycle
>> >>      3 220 868 098 branches
>> >>      3 220 868 098 branches
>> >>      3 220 868 098 branches
>> >>      3 220 868 098 branches
>> >>    <not supported> branches
>> >>
>> >> I think it should be: <not supported> for all events.
>> >
>> > Btw., does the kernel side currently support discovery of such
>> > impossible group scheduling constraints at group setup time? If not
>> > then it probably should and it should reject them straight away.
>>
>> The kernel does validate events as they are added to a group. That's
>> why we have validate_event(), validate_group() and the fake_cpuc
>> mode.
>
> So the problem here isn't really that the kernel doesn't tell
> userspace about it, but that perf stat does not interpret the
> validation result properly.
>
Correct.

> That brings up an interesting question: what is better for users, if
> we schedule as many as we can and say 'not supported' to the rest
> (current behavior), or if we fail the whole group?
>
> I'd say that the default behavior should be what Jiri implemented: get
> the most out of the situation and inform. But you are right in that
> 'forcing' all elements of a group to be valid should be possible as
> well - if a special perf stat option or event format is used.
>
> Even in that second case it shouldn't say <unsupported> for everything
> in the result, but should deny the run immediately and return with an
> error, and should tell the user how many events in the group fit and
> which ones didn't.
>
Fine with me.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:34     ` Ingo Molnar
  2013-11-15 10:41       ` Stephane Eranian
@ 2013-11-15 10:50       ` Peter Zijlstra
  2013-11-15 11:52       ` Peter Zijlstra
  2013-11-17  3:41       ` Andi Kleen
  3 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 10:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
> That brings up an interesting question: what is better for users, if 
> we schedule as many as we can and say 'not supported' to the rest 
> (current behavior), or if we fail the whole group?

Obviously fail the whole group :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:34     ` Ingo Molnar
  2013-11-15 10:41       ` Stephane Eranian
  2013-11-15 10:50       ` Peter Zijlstra
@ 2013-11-15 11:52       ` Peter Zijlstra
  2013-11-15 11:58         ` Stephane Eranian
  2013-11-17  3:41       ` Andi Kleen
  3 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 11:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
> That brings up an interesting question: what is better for users, if 
> we schedule as many as we can and say 'not supported' to the rest 
> (current behavior), or if we fail the whole group?
> 
> I'd say that the default behavior should be what Jiri implemented: get 
> the most out of the situation and inform. But you are right in that 
> 'forcing' all elements of a group to be valid should be possible as 
> well - if a special perf stat option or event format is used.

So I don't agree, but if you want to keep this IMO weird behaviour at
least WARN about it in big blinking neon letters that the user isn't
getting what he asked for.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 11:52       ` Peter Zijlstra
@ 2013-11-15 11:58         ` Stephane Eranian
  0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 11:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 12:52 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
>> That brings up an interesting question: what is better for users, if
>> we schedule as many as we can and say 'not supported' to the rest
>> (current behavior), or if we fail the whole group?
>>
>> I'd say that the default behavior should be what Jiri implemented: get
>> the most out of the situation and inform. But you are right in that
>> 'forcing' all elements of a group to be valid should be possible as
>> well - if a special perf stat option or event format is used.
>
> So I don't agree, but if you want to keep this IMO weird behaviour at
> least WARN about it in big blinking neon letters that the user isn't
> getting what he asked for.

Agreed.
When I tested this, I already knew what to expect, that's why I was
surprised not to get any warning or error messages.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:34     ` Ingo Molnar
                         ` (2 preceding siblings ...)
  2013-11-15 11:52       ` Peter Zijlstra
@ 2013-11-17  3:41       ` Andi Kleen
  2013-11-29 13:33         ` Jiri Olsa
  3 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2013-11-17  3:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

> I'd say that the default behavior should be what Jiri implemented: get 
> the most out of the situation and inform. But you are right in that 
> 'forcing' all elements of a group to be valid should be possible as 
> well - if a special perf stat option or event format is used.

When something is multiplexed it can have a very 
large measurement error. For workloads that fluctuate quite a bit, and the
fluctuations do not line up well with the multiplexing interval,
the default scaling does not give good results.

So you expect to get good data, but you get very bad data.

When collecting data for a large number of events it is important
to group them correctly, so that events that are directly dependent
on each other in equations are properly grouped.

When explicit groups were added the user likely considered this 
problem, so it's not good to silently override the choices.

If a user doesn't care they can always not use groups.

> Even in that second case it shouldn't say <unsupported> for everything 
> in the result, but should deny the run immediately and return with an 
> error, and should tell the user how many events in the group fit and 
> which ones didn't.

Returning this information would be great, but it would really 
need an extended errno, or just a error string reported out.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-17  3:41       ` Andi Kleen
@ 2013-11-29 13:33         ` Jiri Olsa
  2013-11-29 13:43           ` Stephane Eranian
  2013-12-02 15:23           ` Andi Kleen
  0 siblings, 2 replies; 21+ messages in thread
From: Jiri Olsa @ 2013-11-29 13:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Stephane Eranian, LKML, Peter Zijlstra,
	mingo@elte.hu, David Ahern, Arnaldo Carvalho de Melo,
	Namhyung Kim

On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
> > I'd say that the default behavior should be what Jiri implemented: get 
> > the most out of the situation and inform. But you are right in that 
> > 'forcing' all elements of a group to be valid should be possible as 
> > well - if a special perf stat option or event format is used.
> 
> When something is multiplexed it can have a very 
> large measurement error. For workloads that fluctuate quite a bit, and the
> fluctuations do not line up well with the multiplexing interval,
> the default scaling does not give good results.
> 
> So you expect to get good data, but you get very bad data.
> 
> When collecting data for a large number of events it is important
> to group them correctly, so that events that are directly dependent
> on each other in equations are properly grouped.
> 
> When explicit groups were added the user likely considered this 
> problem, so it's not good to silently override the choices.
> 
> If a user doesn't care they can always not use groups.
> 
> > Even in that second case it shouldn't say <unsupported> for everything 
> > in the result, but should deny the run immediately and return with an 
> > error, and should tell the user how many events in the group fit and 
> > which ones didn't.
> 
> Returning this information would be great, but it would really 
> need an extended errno, or just a error string reported out.

(sry for late reply, I was still ooo, and missed this conversation)

I agree, when the last event fails sys_perf_event_open
due to the validate_group check, we will get just EINVAL

Was there any discussion about the error (or erorr string)
propagation from sys_perf_event_open?

Something like below? user space supply buffer for error string.

jirka


---
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1802d6..a827870 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -331,8 +331,8 @@ struct perf_event_attr {
 	 */
 	__u32	sample_stack_user;
 
-	/* Align to u64. */
-	__u32	__reserved_2;
+	__u32	errstr_size;
+	char	*errstr;
 };
 
 #define perf_flags(attr)	(*(&(attr)->read_format + 1))


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-29 13:33         ` Jiri Olsa
@ 2013-11-29 13:43           ` Stephane Eranian
  2013-11-29 13:52             ` Jiri Olsa
  2013-12-02 15:23           ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-11-29 13:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
>> > I'd say that the default behavior should be what Jiri implemented: get
>> > the most out of the situation and inform. But you are right in that
>> > 'forcing' all elements of a group to be valid should be possible as
>> > well - if a special perf stat option or event format is used.
>>
>> When something is multiplexed it can have a very
>> large measurement error. For workloads that fluctuate quite a bit, and the
>> fluctuations do not line up well with the multiplexing interval,
>> the default scaling does not give good results.
>>
>> So you expect to get good data, but you get very bad data.
>>
>> When collecting data for a large number of events it is important
>> to group them correctly, so that events that are directly dependent
>> on each other in equations are properly grouped.
>>
>> When explicit groups were added the user likely considered this
>> problem, so it's not good to silently override the choices.
>>
>> If a user doesn't care they can always not use groups.
>>
>> > Even in that second case it shouldn't say <unsupported> for everything
>> > in the result, but should deny the run immediately and return with an
>> > error, and should tell the user how many events in the group fit and
>> > which ones didn't.
>>
>> Returning this information would be great, but it would really
>> need an extended errno, or just a error string reported out.
>
> (sry for late reply, I was still ooo, and missed this conversation)
>
> I agree, when the last event fails sys_perf_event_open
> due to the validate_group check, we will get just EINVAL
>
> Was there any discussion about the error (or erorr string)
> propagation from sys_perf_event_open?
>
> Something like below? user space supply buffer for error string.
>
No. Why do you need kernel changes for that.
Perf gets the error, knows it is grouping and prints an appropriate
error message. Why do you need kernel for this?

> jirka
>
>
> ---
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index e1802d6..a827870 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -331,8 +331,8 @@ struct perf_event_attr {
>          */
>         __u32   sample_stack_user;
>
> -       /* Align to u64. */
> -       __u32   __reserved_2;
> +       __u32   errstr_size;
> +       char    *errstr;
>  };
>
>  #define perf_flags(attr)       (*(&(attr)->read_format + 1))
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-29 13:43           ` Stephane Eranian
@ 2013-11-29 13:52             ` Jiri Olsa
  2013-11-29 14:01               ` Stephane Eranian
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2013-11-29 13:52 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 29, 2013 at 02:43:35PM +0100, Stephane Eranian wrote:
> On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
> >> > I'd say that the default behavior should be what Jiri implemented: get
> >> > the most out of the situation and inform. But you are right in that
> >> > 'forcing' all elements of a group to be valid should be possible as
> >> > well - if a special perf stat option or event format is used.
> >>
> >> When something is multiplexed it can have a very
> >> large measurement error. For workloads that fluctuate quite a bit, and the
> >> fluctuations do not line up well with the multiplexing interval,
> >> the default scaling does not give good results.
> >>
> >> So you expect to get good data, but you get very bad data.
> >>
> >> When collecting data for a large number of events it is important
> >> to group them correctly, so that events that are directly dependent
> >> on each other in equations are properly grouped.
> >>
> >> When explicit groups were added the user likely considered this
> >> problem, so it's not good to silently override the choices.
> >>
> >> If a user doesn't care they can always not use groups.
> >>
> >> > Even in that second case it shouldn't say <unsupported> for everything
> >> > in the result, but should deny the run immediately and return with an
> >> > error, and should tell the user how many events in the group fit and
> >> > which ones didn't.
> >>
> >> Returning this information would be great, but it would really
> >> need an extended errno, or just a error string reported out.
> >
> > (sry for late reply, I was still ooo, and missed this conversation)
> >
> > I agree, when the last event fails sys_perf_event_open
> > due to the validate_group check, we will get just EINVAL
> >
> > Was there any discussion about the error (or erorr string)
> > propagation from sys_perf_event_open?
> >
> > Something like below? user space supply buffer for error string.
> >
> No. Why do you need kernel changes for that.
> Perf gets the error, knows it is grouping and prints an appropriate

how does perf know it's grouping and not something else?

> error message. Why do you need kernel for this?

like how would you differentiate EINVAL from validate_group or say
from set_ext_hw_attr (got by using unsupported cache event) ?

jirka

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-29 13:52             ` Jiri Olsa
@ 2013-11-29 14:01               ` Stephane Eranian
  0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-29 14:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 29, 2013 at 2:52 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Fri, Nov 29, 2013 at 02:43:35PM +0100, Stephane Eranian wrote:
>> On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> > On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
>> >> > I'd say that the default behavior should be what Jiri implemented: get
>> >> > the most out of the situation and inform. But you are right in that
>> >> > 'forcing' all elements of a group to be valid should be possible as
>> >> > well - if a special perf stat option or event format is used.
>> >>
>> >> When something is multiplexed it can have a very
>> >> large measurement error. For workloads that fluctuate quite a bit, and the
>> >> fluctuations do not line up well with the multiplexing interval,
>> >> the default scaling does not give good results.
>> >>
>> >> So you expect to get good data, but you get very bad data.
>> >>
>> >> When collecting data for a large number of events it is important
>> >> to group them correctly, so that events that are directly dependent
>> >> on each other in equations are properly grouped.
>> >>
>> >> When explicit groups were added the user likely considered this
>> >> problem, so it's not good to silently override the choices.
>> >>
>> >> If a user doesn't care they can always not use groups.
>> >>
>> >> > Even in that second case it shouldn't say <unsupported> for everything
>> >> > in the result, but should deny the run immediately and return with an
>> >> > error, and should tell the user how many events in the group fit and
>> >> > which ones didn't.
>> >>
>> >> Returning this information would be great, but it would really
>> >> need an extended errno, or just a error string reported out.
>> >
>> > (sry for late reply, I was still ooo, and missed this conversation)
>> >
>> > I agree, when the last event fails sys_perf_event_open
>> > due to the validate_group check, we will get just EINVAL
>> >
>> > Was there any discussion about the error (or erorr string)
>> > propagation from sys_perf_event_open?
>> >
>> > Something like below? user space supply buffer for error string.
>> >
>> No. Why do you need kernel changes for that.
>> Perf gets the error, knows it is grouping and prints an appropriate
>
> how does perf know it's grouping and not something else?
>
Because the group_fd on this syscall is not -1.

>> error message. Why do you need kernel for this?
>
> like how would you differentiate EINVAL from validate_group or say
> from set_ext_hw_attr (got by using unsupported cache event) ?
>
If you cannot, simply abort and print something like:
if (group_fd != -1 && ret == EINVAL)
  warnx("cannot program event X in group. You may be overcommitting an
event group, try
            reducing the number of events/group"

Or something like that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-29 13:33         ` Jiri Olsa
  2013-11-29 13:43           ` Stephane Eranian
@ 2013-12-02 15:23           ` Andi Kleen
  2013-12-03  2:52             ` Stephane Eranian
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2013-12-02 15:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Stephane Eranian, LKML, Peter Zijlstra,
	mingo@elte.hu, David Ahern, Arnaldo Carvalho de Melo,
	Namhyung Kim

> Something like below? user space supply buffer for error string.

That would work, although I was thinking of making it a more
generic mechanism (store it into task_struct, have a extra
syscall to retrieve)

-Andi

> 
> jirka
> 
> 
> ---
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index e1802d6..a827870 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -331,8 +331,8 @@ struct perf_event_attr {
>  	 */
>  	__u32	sample_stack_user;
>  
> -	/* Align to u64. */
> -	__u32	__reserved_2;
> +	__u32	errstr_size;
> +	char	*errstr;
>  };
>  
>  #define perf_flags(attr)	(*(&(attr)->read_format + 1))
> 

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-12-02 15:23           ` Andi Kleen
@ 2013-12-03  2:52             ` Stephane Eranian
  2013-12-03 23:44               ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-12-03  2:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jiri Olsa, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

On Mon, Dec 2, 2013 at 4:23 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> Something like below? user space supply buffer for error string.
>
> That would work, although I was thinking of making it a more
> generic mechanism (store it into task_struct, have a extra
> syscall to retrieve)
>
I don't understand why this need to be so complicated.
Maybe just change the error code in case of group
overcommitment? That way, the tool could distinguish
and report the appropriate error message.


>> jirka
>>
>>
>> ---
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index e1802d6..a827870 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -331,8 +331,8 @@ struct perf_event_attr {
>>        */
>>       __u32   sample_stack_user;
>>
>> -     /* Align to u64. */
>> -     __u32   __reserved_2;
>> +     __u32   errstr_size;
>> +     char    *errstr;
>>  };
>>
>>  #define perf_flags(attr)     (*(&(attr)->read_format + 1))
>>
>
> --
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-12-03  2:52             ` Stephane Eranian
@ 2013-12-03 23:44               ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2013-12-03 23:44 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Jiri Olsa, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
	David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim

Stephane Eranian <eranian@google.com> writes:
>>
> I don't understand why this need to be so complicated.
> Maybe just change the error code in case of group
> overcommitment? That way, the tool could distinguish
> and report the appropriate error message.

The existing errno is an ABI.

Also that would handle this case, but there are a bazillion other 
failure cases in perf. And we can't invent new errnos for all of them.

A string error interface would cover it all.

Usually when something strange like this happens in perf I have to apply
this patch
https://lkml.org/lkml/2013/3/8/432
and use ftrace to trace it down.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15  6:34 ` Ingo Molnar
  2013-11-15  9:24   ` Stephane Eranian
@ 2013-11-15 10:05   ` Peter Zijlstra
  2013-11-15 10:13     ` Stephane Eranian
  2013-11-15 15:08   ` Vince Weaver
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 10:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 07:34:57AM +0100, Ingo Molnar wrote:
> Btw., does the kernel side currently support discovery of such 
> impossible group scheduling constraints at group setup time? 

Up to a point.

> If not 
> then it probably should and it should reject them straight away.

We do I think, for the case where its obvious it can never fit.

That said, if you have a pinned cpu event, it all comes apart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:05   ` Peter Zijlstra
@ 2013-11-15 10:13     ` Stephane Eranian
  2013-11-15 10:49       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 10:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 11:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Nov 15, 2013 at 07:34:57AM +0100, Ingo Molnar wrote:
>> Btw., does the kernel side currently support discovery of such
>> impossible group scheduling constraints at group setup time?
>
> Up to a point.
>
It is only looking at the group itself  not the overall condition
of the system, e.g., the other HT thread is case of shared
resources.
I think all it guarantees is that if the events in the group
are compatible with each other. And I think it covers the case
where the events use different counters but the same shared
resource, e.g., offcore_response on Intel X86.

>> If not
>> then it probably should and it should reject them straight away.
>
> We do I think, for the case where its obvious it can never fit.
>
> That said, if you have a pinned cpu event, it all comes apart.

You mean in the group?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 10:13     ` Stephane Eranian
@ 2013-11-15 10:49       ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 10:49 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Ingo Molnar, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
	ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim

On Fri, Nov 15, 2013 at 11:13:27AM +0100, Stephane Eranian wrote:
> On Fri, Nov 15, 2013 at 11:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Nov 15, 2013 at 07:34:57AM +0100, Ingo Molnar wrote:
> >> Btw., does the kernel side currently support discovery of such
> >> impossible group scheduling constraints at group setup time?
> >
> > Up to a point.
> >
> It is only looking at the group itself  not the overall condition
> of the system, e.g., the other HT thread is case of shared
> resources.
> I think all it guarantees is that if the events in the group
> are compatible with each other. And I think it covers the case
> where the events use different counters but the same shared
> resource, e.g., offcore_response on Intel X86.
> 
> >> If not
> >> then it probably should and it should reject them straight away.
> >
> > We do I think, for the case where its obvious it can never fit.
> >
> > That said, if you have a pinned cpu event, it all comes apart.
> 
> You mean in the group?

No, outside of the group, so that the above validation doesn't guarantee
actual schedulability. Yes it could all fit on the PMU in one go, but
since we _have_ to also fit this pinned task, it really will never get
scheduled.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15  6:34 ` Ingo Molnar
  2013-11-15  9:24   ` Stephane Eranian
  2013-11-15 10:05   ` Peter Zijlstra
@ 2013-11-15 15:08   ` Vince Weaver
  2013-11-15 22:52     ` Stephane Eranian
  2 siblings, 1 reply; 21+ messages in thread
From: Vince Weaver @ 2013-11-15 15:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu,
	David Ahern, ak@linux.intel.com, Arnaldo Carvalho de Melo,
	Namhyung Kim

On Fri, 15 Nov 2013, Ingo Molnar wrote:
> 
> Btw., does the kernel side currently support discovery of such 
> impossible group scheduling constraints at group setup time? If not 
> then it probably should and it should reject them straight away.

It does not, or at least I'm pretty sure it can't if the NMI watchdog is 
enabled (Stephane, are you doing your tests with NMI watchdog disabled?)

This is a big problem with PAPI.  If the NMI watchdog is enabled you can't 
tell if a group will fail until the first read, the kernel can't tell you 
at setup time.

Vince

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG] perf stat: explicit grouping yields unexpected results
  2013-11-15 15:08   ` Vince Weaver
@ 2013-11-15 22:52     ` Stephane Eranian
  0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 22:52 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu,
	David Ahern, ak@linux.intel.com, Arnaldo Carvalho de Melo,
	Namhyung Kim

On Fri, Nov 15, 2013 at 4:08 PM, Vince Weaver <vince@deater.net> wrote:
> On Fri, 15 Nov 2013, Ingo Molnar wrote:
>>
>> Btw., does the kernel side currently support discovery of such
>> impossible group scheduling constraints at group setup time? If not
>> then it probably should and it should reject them straight away.
>
> It does not, or at least I'm pretty sure it can't if the NMI watchdog is
> enabled (Stephane, are you doing your tests with NMI watchdog disabled?)
>
I think my tests were with watchdog disabled.

> This is a big problem with PAPI.  If the NMI watchdog is enabled you can't
> tell if a group will fail until the first read, the kernel can't tell you
> at setup time.
>
Unless, you build the group and measure in for a short bit of time to validate
runtime execution. But then there is no guarantee either. Because the
perf_events
situation may change, e.g., you may get another pinned event. So I don't think
you will ever get that guarantee at setup time. I goes back to the core design
idea behind perf_events.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-12-03 23:44 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-14 20:50 [BUG] perf stat: explicit grouping yields unexpected results Stephane Eranian
2013-11-15  6:34 ` Ingo Molnar
2013-11-15  9:24   ` Stephane Eranian
2013-11-15 10:34     ` Ingo Molnar
2013-11-15 10:41       ` Stephane Eranian
2013-11-15 10:50       ` Peter Zijlstra
2013-11-15 11:52       ` Peter Zijlstra
2013-11-15 11:58         ` Stephane Eranian
2013-11-17  3:41       ` Andi Kleen
2013-11-29 13:33         ` Jiri Olsa
2013-11-29 13:43           ` Stephane Eranian
2013-11-29 13:52             ` Jiri Olsa
2013-11-29 14:01               ` Stephane Eranian
2013-12-02 15:23           ` Andi Kleen
2013-12-03  2:52             ` Stephane Eranian
2013-12-03 23:44               ` Andi Kleen
2013-11-15 10:05   ` Peter Zijlstra
2013-11-15 10:13     ` Stephane Eranian
2013-11-15 10:49       ` Peter Zijlstra
2013-11-15 15:08   ` Vince Weaver
2013-11-15 22:52     ` Stephane Eranian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox