* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-15 10:34 ` Ingo Molnar
@ 2013-11-15 10:41 ` Stephane Eranian
2013-11-15 10:50 ` Peter Zijlstra
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 10:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu, David Ahern,
ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 15, 2013 at 11:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Nov 15, 2013 at 7:34 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > * Stephane Eranian <eranian@google.com> wrote:
>> >
>> >> Jiri,
>> >>
>> >> I was trying the grouping support in perf stat and I was surprised
>> >> to see that if I create a group that is too big to be scheduled, and
>> >> where only N out of P events can fit, perf stat still yields counts
>> >> for the N events. I was expecting 0 counts or <not supported>.
>> >>
>> >> The kernel semantic is to schedule all the events in a group or
>> >> none. Perf does something different and this is confusing. If you
>> >> use explicit grouping then I think you want to group to fail if not
>> >> all the events can be scheduled:
>> >>
>> >> On an IvyBridge:
>> >> $ perf stat --g -e
>> >> '{cycles,instructions,branches,branches,branches,branches,branches}'
>> >> noploop 1
>> >> 3 229 417 079 cycles
>> >> 3 223 919 023 instructions # 1,00 insns per cycle
>> >> 3 220 868 098 branches
>> >> 3 220 868 098 branches
>> >> 3 220 868 098 branches
>> >> 3 220 868 098 branches
>> >> <not supported> branches
>> >>
>> >> I think it should be: <not supported> for all events.
>> >
>> > Btw., does the kernel side currently support discovery of such
>> > impossible group scheduling constraints at group setup time? If not
>> > then it probably should and it should reject them straight away.
>>
>> The kernel does validate events as they are added to a group. That's
>> why we have validate_event(), validate_group() and the fake_cpuc
>> mode.
>
> So the problem here isn't really that the kernel doesn't tell
> userspace about it, but that perf stat does not interpret the
> validation result properly.
>
Correct.
> That brings up an interesting question: what is better for users, if
> we schedule as many as we can and say 'not supported' to the rest
> (current behavior), or if we fail the whole group?
>
> I'd say that the default behavior should be what Jiri implemented: get
> the most out of the situation and inform. But you are right in that
> 'forcing' all elements of a group to be valid should be possible as
> well - if a special perf stat option or event format is used.
>
> Even in that second case it shouldn't say <unsupported> for everything
> in the result, but should deny the run immediately and return with an
> error, and should tell the user how many events in the group fit and
> which ones didn't.
>
Fine with me.
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-15 10:34 ` Ingo Molnar
2013-11-15 10:41 ` Stephane Eranian
@ 2013-11-15 10:50 ` Peter Zijlstra
2013-11-15 11:52 ` Peter Zijlstra
2013-11-17 3:41 ` Andi Kleen
3 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 10:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Stephane Eranian, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
> That brings up an interesting question: what is better for users, if
> we schedule as many as we can and say 'not supported' to the rest
> (current behavior), or if we fail the whole group?
Obviously fail the whole group :-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-15 10:34 ` Ingo Molnar
2013-11-15 10:41 ` Stephane Eranian
2013-11-15 10:50 ` Peter Zijlstra
@ 2013-11-15 11:52 ` Peter Zijlstra
2013-11-15 11:58 ` Stephane Eranian
2013-11-17 3:41 ` Andi Kleen
3 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2013-11-15 11:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: Stephane Eranian, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
> That brings up an interesting question: what is better for users, if
> we schedule as many as we can and say 'not supported' to the rest
> (current behavior), or if we fail the whole group?
>
> I'd say that the default behavior should be what Jiri implemented: get
> the most out of the situation and inform. But you are right in that
> 'forcing' all elements of a group to be valid should be possible as
> well - if a special perf stat option or event format is used.
So I don't agree, but if you want to keep this IMO weird behaviour at
least WARN about it in big blinking neon letters that the user isn't
getting what he asked for.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-15 11:52 ` Peter Zijlstra
@ 2013-11-15 11:58 ` Stephane Eranian
0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-15 11:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, LKML, Jiri Olsa, mingo@elte.hu, David Ahern,
ak@linux.intel.com, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 15, 2013 at 12:52 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Nov 15, 2013 at 11:34:05AM +0100, Ingo Molnar wrote:
>> That brings up an interesting question: what is better for users, if
>> we schedule as many as we can and say 'not supported' to the rest
>> (current behavior), or if we fail the whole group?
>>
>> I'd say that the default behavior should be what Jiri implemented: get
>> the most out of the situation and inform. But you are right in that
>> 'forcing' all elements of a group to be valid should be possible as
>> well - if a special perf stat option or event format is used.
>
> So I don't agree, but if you want to keep this IMO weird behaviour at
> least WARN about it in big blinking neon letters that the user isn't
> getting what he asked for.
Agreed.
When I tested this, I already knew what to expect, that's why I was
surprised not to get any warning or error messages.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-15 10:34 ` Ingo Molnar
` (2 preceding siblings ...)
2013-11-15 11:52 ` Peter Zijlstra
@ 2013-11-17 3:41 ` Andi Kleen
2013-11-29 13:33 ` Jiri Olsa
3 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2013-11-17 3:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Stephane Eranian, LKML, Jiri Olsa, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
> I'd say that the default behavior should be what Jiri implemented: get
> the most out of the situation and inform. But you are right in that
> 'forcing' all elements of a group to be valid should be possible as
> well - if a special perf stat option or event format is used.
When something is multiplexed it can have a very
large measurement error. For workloads that fluctuate quite a bit, and the
fluctuations do not line up well with the multiplexing interval,
the default scaling does not give good results.
So you expect to get good data, but you get very bad data.
When collecting data for a large number of events it is important
to group them correctly, so that events that are directly dependent
on each other in equations are properly grouped.
When explicit groups were added the user likely considered this
problem, so it's not good to silently override the choices.
If a user doesn't care they can always not use groups.
> Even in that second case it shouldn't say <unsupported> for everything
> in the result, but should deny the run immediately and return with an
> error, and should tell the user how many events in the group fit and
> which ones didn't.
Returning this information would be great, but it would really
need an extended errno, or just a error string reported out.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-17 3:41 ` Andi Kleen
@ 2013-11-29 13:33 ` Jiri Olsa
2013-11-29 13:43 ` Stephane Eranian
2013-12-02 15:23 ` Andi Kleen
0 siblings, 2 replies; 21+ messages in thread
From: Jiri Olsa @ 2013-11-29 13:33 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, Stephane Eranian, LKML, Peter Zijlstra,
mingo@elte.hu, David Ahern, Arnaldo Carvalho de Melo,
Namhyung Kim
On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
> > I'd say that the default behavior should be what Jiri implemented: get
> > the most out of the situation and inform. But you are right in that
> > 'forcing' all elements of a group to be valid should be possible as
> > well - if a special perf stat option or event format is used.
>
> When something is multiplexed it can have a very
> large measurement error. For workloads that fluctuate quite a bit, and the
> fluctuations do not line up well with the multiplexing interval,
> the default scaling does not give good results.
>
> So you expect to get good data, but you get very bad data.
>
> When collecting data for a large number of events it is important
> to group them correctly, so that events that are directly dependent
> on each other in equations are properly grouped.
>
> When explicit groups were added the user likely considered this
> problem, so it's not good to silently override the choices.
>
> If a user doesn't care they can always not use groups.
>
> > Even in that second case it shouldn't say <unsupported> for everything
> > in the result, but should deny the run immediately and return with an
> > error, and should tell the user how many events in the group fit and
> > which ones didn't.
>
> Returning this information would be great, but it would really
> need an extended errno, or just a error string reported out.
(sry for late reply, I was still ooo, and missed this conversation)
I agree, when the last event fails sys_perf_event_open
due to the validate_group check, we will get just EINVAL
Was there any discussion about the error (or erorr string)
propagation from sys_perf_event_open?
Something like below? user space supply buffer for error string.
jirka
---
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1802d6..a827870 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -331,8 +331,8 @@ struct perf_event_attr {
*/
__u32 sample_stack_user;
- /* Align to u64. */
- __u32 __reserved_2;
+ __u32 errstr_size;
+ char *errstr;
};
#define perf_flags(attr) (*(&(attr)->read_format + 1))
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-29 13:33 ` Jiri Olsa
@ 2013-11-29 13:43 ` Stephane Eranian
2013-11-29 13:52 ` Jiri Olsa
2013-12-02 15:23 ` Andi Kleen
1 sibling, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-11-29 13:43 UTC (permalink / raw)
To: Jiri Olsa
Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
>> > I'd say that the default behavior should be what Jiri implemented: get
>> > the most out of the situation and inform. But you are right in that
>> > 'forcing' all elements of a group to be valid should be possible as
>> > well - if a special perf stat option or event format is used.
>>
>> When something is multiplexed it can have a very
>> large measurement error. For workloads that fluctuate quite a bit, and the
>> fluctuations do not line up well with the multiplexing interval,
>> the default scaling does not give good results.
>>
>> So you expect to get good data, but you get very bad data.
>>
>> When collecting data for a large number of events it is important
>> to group them correctly, so that events that are directly dependent
>> on each other in equations are properly grouped.
>>
>> When explicit groups were added the user likely considered this
>> problem, so it's not good to silently override the choices.
>>
>> If a user doesn't care they can always not use groups.
>>
>> > Even in that second case it shouldn't say <unsupported> for everything
>> > in the result, but should deny the run immediately and return with an
>> > error, and should tell the user how many events in the group fit and
>> > which ones didn't.
>>
>> Returning this information would be great, but it would really
>> need an extended errno, or just a error string reported out.
>
> (sry for late reply, I was still ooo, and missed this conversation)
>
> I agree, when the last event fails sys_perf_event_open
> due to the validate_group check, we will get just EINVAL
>
> Was there any discussion about the error (or erorr string)
> propagation from sys_perf_event_open?
>
> Something like below? user space supply buffer for error string.
>
No. Why do you need kernel changes for that.
Perf gets the error, knows it is grouping and prints an appropriate
error message. Why do you need kernel for this?
> jirka
>
>
> ---
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index e1802d6..a827870 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -331,8 +331,8 @@ struct perf_event_attr {
> */
> __u32 sample_stack_user;
>
> - /* Align to u64. */
> - __u32 __reserved_2;
> + __u32 errstr_size;
> + char *errstr;
> };
>
> #define perf_flags(attr) (*(&(attr)->read_format + 1))
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-29 13:43 ` Stephane Eranian
@ 2013-11-29 13:52 ` Jiri Olsa
2013-11-29 14:01 ` Stephane Eranian
0 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2013-11-29 13:52 UTC (permalink / raw)
To: Stephane Eranian
Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 29, 2013 at 02:43:35PM +0100, Stephane Eranian wrote:
> On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> > On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
> >> > I'd say that the default behavior should be what Jiri implemented: get
> >> > the most out of the situation and inform. But you are right in that
> >> > 'forcing' all elements of a group to be valid should be possible as
> >> > well - if a special perf stat option or event format is used.
> >>
> >> When something is multiplexed it can have a very
> >> large measurement error. For workloads that fluctuate quite a bit, and the
> >> fluctuations do not line up well with the multiplexing interval,
> >> the default scaling does not give good results.
> >>
> >> So you expect to get good data, but you get very bad data.
> >>
> >> When collecting data for a large number of events it is important
> >> to group them correctly, so that events that are directly dependent
> >> on each other in equations are properly grouped.
> >>
> >> When explicit groups were added the user likely considered this
> >> problem, so it's not good to silently override the choices.
> >>
> >> If a user doesn't care they can always not use groups.
> >>
> >> > Even in that second case it shouldn't say <unsupported> for everything
> >> > in the result, but should deny the run immediately and return with an
> >> > error, and should tell the user how many events in the group fit and
> >> > which ones didn't.
> >>
> >> Returning this information would be great, but it would really
> >> need an extended errno, or just a error string reported out.
> >
> > (sry for late reply, I was still ooo, and missed this conversation)
> >
> > I agree, when the last event fails sys_perf_event_open
> > due to the validate_group check, we will get just EINVAL
> >
> > Was there any discussion about the error (or erorr string)
> > propagation from sys_perf_event_open?
> >
> > Something like below? user space supply buffer for error string.
> >
> No. Why do you need kernel changes for that.
> Perf gets the error, knows it is grouping and prints an appropriate
how does perf know it's grouping and not something else?
> error message. Why do you need kernel for this?
like how would you differentiate EINVAL from validate_group or say
from set_ext_hw_attr (got by using unsupported cache event) ?
jirka
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-29 13:52 ` Jiri Olsa
@ 2013-11-29 14:01 ` Stephane Eranian
0 siblings, 0 replies; 21+ messages in thread
From: Stephane Eranian @ 2013-11-29 14:01 UTC (permalink / raw)
To: Jiri Olsa
Cc: Andi Kleen, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
On Fri, Nov 29, 2013 at 2:52 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Fri, Nov 29, 2013 at 02:43:35PM +0100, Stephane Eranian wrote:
>> On Fri, Nov 29, 2013 at 2:33 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>> > On Sat, Nov 16, 2013 at 07:41:34PM -0800, Andi Kleen wrote:
>> >> > I'd say that the default behavior should be what Jiri implemented: get
>> >> > the most out of the situation and inform. But you are right in that
>> >> > 'forcing' all elements of a group to be valid should be possible as
>> >> > well - if a special perf stat option or event format is used.
>> >>
>> >> When something is multiplexed it can have a very
>> >> large measurement error. For workloads that fluctuate quite a bit, and the
>> >> fluctuations do not line up well with the multiplexing interval,
>> >> the default scaling does not give good results.
>> >>
>> >> So you expect to get good data, but you get very bad data.
>> >>
>> >> When collecting data for a large number of events it is important
>> >> to group them correctly, so that events that are directly dependent
>> >> on each other in equations are properly grouped.
>> >>
>> >> When explicit groups were added the user likely considered this
>> >> problem, so it's not good to silently override the choices.
>> >>
>> >> If a user doesn't care they can always not use groups.
>> >>
>> >> > Even in that second case it shouldn't say <unsupported> for everything
>> >> > in the result, but should deny the run immediately and return with an
>> >> > error, and should tell the user how many events in the group fit and
>> >> > which ones didn't.
>> >>
>> >> Returning this information would be great, but it would really
>> >> need an extended errno, or just a error string reported out.
>> >
>> > (sry for late reply, I was still ooo, and missed this conversation)
>> >
>> > I agree, when the last event fails sys_perf_event_open
>> > due to the validate_group check, we will get just EINVAL
>> >
>> > Was there any discussion about the error (or erorr string)
>> > propagation from sys_perf_event_open?
>> >
>> > Something like below? user space supply buffer for error string.
>> >
>> No. Why do you need kernel changes for that.
>> Perf gets the error, knows it is grouping and prints an appropriate
>
> how does perf know it's grouping and not something else?
>
Because the group_fd on this syscall is not -1.
>> error message. Why do you need kernel for this?
>
> like how would you differentiate EINVAL from validate_group or say
> from set_ext_hw_attr (got by using unsupported cache event) ?
>
If you cannot, simply abort and print something like:
if (group_fd != -1 && ret == EINVAL)
warnx("cannot program event X in group. You may be overcommitting an
event group, try
reducing the number of events/group"
Or something like that.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-11-29 13:33 ` Jiri Olsa
2013-11-29 13:43 ` Stephane Eranian
@ 2013-12-02 15:23 ` Andi Kleen
2013-12-03 2:52 ` Stephane Eranian
1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2013-12-02 15:23 UTC (permalink / raw)
To: Jiri Olsa
Cc: Ingo Molnar, Stephane Eranian, LKML, Peter Zijlstra,
mingo@elte.hu, David Ahern, Arnaldo Carvalho de Melo,
Namhyung Kim
> Something like below? user space supply buffer for error string.
That would work, although I was thinking of making it a more
generic mechanism (store it into task_struct, have a extra
syscall to retrieve)
-Andi
>
> jirka
>
>
> ---
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index e1802d6..a827870 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -331,8 +331,8 @@ struct perf_event_attr {
> */
> __u32 sample_stack_user;
>
> - /* Align to u64. */
> - __u32 __reserved_2;
> + __u32 errstr_size;
> + char *errstr;
> };
>
> #define perf_flags(attr) (*(&(attr)->read_format + 1))
>
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-12-02 15:23 ` Andi Kleen
@ 2013-12-03 2:52 ` Stephane Eranian
2013-12-03 23:44 ` Andi Kleen
0 siblings, 1 reply; 21+ messages in thread
From: Stephane Eranian @ 2013-12-03 2:52 UTC (permalink / raw)
To: Andi Kleen
Cc: Jiri Olsa, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
On Mon, Dec 2, 2013 at 4:23 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> Something like below? user space supply buffer for error string.
>
> That would work, although I was thinking of making it a more
> generic mechanism (store it into task_struct, have a extra
> syscall to retrieve)
>
I don't understand why this need to be so complicated.
Maybe just change the error code in case of group
overcommitment? That way, the tool could distinguish
and report the appropriate error message.
>> jirka
>>
>>
>> ---
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index e1802d6..a827870 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -331,8 +331,8 @@ struct perf_event_attr {
>> */
>> __u32 sample_stack_user;
>>
>> - /* Align to u64. */
>> - __u32 __reserved_2;
>> + __u32 errstr_size;
>> + char *errstr;
>> };
>>
>> #define perf_flags(attr) (*(&(attr)->read_format + 1))
>>
>
> --
> ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [BUG] perf stat: explicit grouping yields unexpected results
2013-12-03 2:52 ` Stephane Eranian
@ 2013-12-03 23:44 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2013-12-03 23:44 UTC (permalink / raw)
To: Stephane Eranian
Cc: Jiri Olsa, Ingo Molnar, LKML, Peter Zijlstra, mingo@elte.hu,
David Ahern, Arnaldo Carvalho de Melo, Namhyung Kim
Stephane Eranian <eranian@google.com> writes:
>>
> I don't understand why this need to be so complicated.
> Maybe just change the error code in case of group
> overcommitment? That way, the tool could distinguish
> and report the appropriate error message.
The existing errno is an ABI.
Also that would handle this case, but there are a bazillion other
failure cases in perf. And we can't invent new errnos for all of them.
A string error interface would cover it all.
Usually when something strange like this happens in perf I have to apply
this patch
https://lkml.org/lkml/2013/3/8/432
and use ftrace to trace it down.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 21+ messages in thread