* [PATCH 0/4] perf_event_open.2 Linux 3.12 updates
@ 2013-11-06 18:26 Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:26 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
Hello
here is a patch series bringing perf_event_open.2 documentation
in line with the recent Linux 3.12 release.
This replaces a patch I sent previously that made similar changes.
[The only real change from that patch is that "mmap2" sample support
was dropped at the last minute. The defines are still in the
perf_event.h header file, but using them will always return EINVAL
until the kernel developers finalize the interface].
Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[parent not found: <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>]
* [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-06 18:28 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:30 ` [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER Vince Weaver
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:28 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

Support for the PERF_COUNT_SW_DUMMY event type was added in Linux 3.12.

Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -468,6 +468,13 @@ This counts the number of emulation faults.
 The kernel sometimes traps on unimplemented instructions
 and emulates them for user space.
 This can negatively impact performance.
+.TP
+.BR PERF_COUNT_SW_DUMMY " (Since Linux 3.12)"
+This is a placeholder event that counts nothing.
+Informational sample record types such as mmap or comm
+must be associated with an active event.
+This dummy event allows gathering such records without requiring
+a counting event.
 .RE
 
 .RS
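As an illustration of the text added by this patch, here is a minimal sketch of how a dummy event might be opened so that mmap and comm records can be gathered without a counting event. The helper name open_dummy_event is hypothetical, and the sketch assumes kernel headers from Linux 3.12 or later; on older kernels or without sufficient permissions the call is expected to fail with -1.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): open a PERF_COUNT_SW_DUMMY
 * event that counts nothing but can carry mmap and comm records.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_dummy_event(pid_t pid, int cpu)
{
    struct perf_event_attr attr;

    assert(cpu >= -1);              /* -1 means "any CPU" */

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_SW_DUMMY;
    attr.mmap = 1;                  /* request PERF_RECORD_MMAP records */
    attr.comm = 1;                  /* request PERF_RECORD_COMM records */
    attr.disabled = 1;              /* caller enables the event later */

    /* glibc provides no wrapper, so the raw system call is used. */
    return (int)syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0UL);
}
```

A caller would typically pass pid 0 and cpu -1 to follow the current process, then mmap the returned descriptor to read the records. Passing -1 for both pid and cpu is rejected by the kernel.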
[parent not found: <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>]
* Re: [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support
[not found] ` <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:11 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:11 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 11/07/13 07:28, Vince Weaver wrote:
>
> Support for the PERF_COUNT_SW_DUMMY event type was added in Linux 3.12.

Thanks, Vince. Applied.

Cheers,

Michael

> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:28 ` [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support Vince Weaver
@ 2013-11-06 18:30 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:31 ` [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID Vince Weaver
2013-11-06 18:34 ` [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap Vince Weaver
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:30 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

A new PERF_SAMPLE_IDENTIFIER sample type was added in Linux 3.12.

Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -680,6 +687,27 @@ Records the data source: where in the memory hierarchy
 the data associated with the sampled instruction came from.
 This is only available if the underlying hardware
 supports this feature.
+.TP
+.BR PERF_SAMPLE_IDENTIFIER " (Since Linux 3.12)"
+Places the SAMPLE_ID value in a fixed position in the record,
+either at the beginning (for sample events) or at the end
+(if a non-sample event).
+
+This was necessary because a sample stream may have
+records from various different event sources with different
+.I sample_type
+settings.
+Parsing the event stream properly was not possible because the
+format of the record was needed to find SAMPLE_ID, but
+the format could not be found without knowing what
+event the sample belonged to (causing a circular
+dependency).
+
+This new
+.B PERF_SAMPLE_IDENTIFIER
+setting makes the event stream always parsable
+by putting SAMPLE_ID in a fixed location, even though
+it means having duplicate SAMPLE_ID values in records.
 .RE
 .TP
 .IR "read_format"
@@ -860,12 +888,33 @@ field, but enables including data mmap events
 in the ring-buffer.
 .TP
 .IR "sample_id_all" " (Since Linux 2.6.38)"
-If set, then TID, TIME, ID, CPU, and STREAM_ID can
+If set, then TID, TIME, ID, STREAM_ID, and CPU can
 additionally be included in
 .RB non- PERF_RECORD_SAMPLE s
 if the corresponding
 .I sample_type
 is selected.
+
+If
+.B PERF_SAMPLE_IDENTIFIER
+is specified then an additional ID value is included
+as the last value to ease parsing the record stream.
+This may lead to the
+.I id
+value appearing twice.
+
+The layout is described by this pseudo-structure:
+.in +4n
+.nf
+struct sample_id {
+    { u32 pid, tid; }   /* if PERF_SAMPLE_TID set */
+    { u64 time; }       /* if PERF_SAMPLE_TIME set */
+    { u64 id; }         /* if PERF_SAMPLE_ID set */
+    { u64 stream_id;}   /* if PERF_SAMPLE_STREAM_ID set */
+    { u32 cpu, res; }   /* if PERF_SAMPLE_CPU set */
+    { u64 id; }         /* if PERF_SAMPLE_IDENTIFIER set */
+};
+.fi
 .TP
 .IR "exclude_host" " (Since Linux 3.2)"
 Do not measure time spent in VM host.
@@ -1385,6 +1510,7 @@ The values in the corresponding record (that follows the header)
 depend on the
 .I type
 selected as shown.
+
 .RS
 .TP 4
 .B PERF_RECORD_MMAP
@@ -1416,6 +1542,7 @@ struct {
     struct perf_event_header header;
     u64 id;
     u64 lost;
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1437,6 +1564,7 @@ struct {
     struct perf_event_header header;
     u32 pid, tid;
     char comm[];
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1451,6 +1579,7 @@ struct {
     u32 pid, ppid;
     u32 tid, ptid;
     u64 time;
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1465,6 +1594,7 @@ struct {
     u64 time;
     u64 id;
     u64 stream_id;
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1479,6 +1609,7 @@ struct {
     u32 pid, ppid;
     u32 tid, ptid;
     u64 time;
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1492,6 +1623,7 @@ struct {
     struct perf_event_header header;
     u32 pid, tid;
     struct read_format values;
+    struct sample_id sample_id;
 };
 .fi
 .in
@@ -1503,6 +1635,7 @@ This record indicates a sample.
 .nf
 struct {
     struct perf_event_header header;
+    u64 sample_id;  /* if PERF_SAMPLE_IDENTIFIER */
     u64 ip;         /* if PERF_SAMPLE_IP */
     u32 pid, tid;   /* if PERF_SAMPLE_TID */
     u64 time;       /* if PERF_SAMPLE_TIME */
@@ -1531,6 +1664,16 @@ struct {
 .fi
 .RS 4
 .TP 4
+.I sample_id
+If
+.B PERF_SAMPLE_IDENTIFIER
+is enabled, a 64-bit unique ID is included.
+This is a duplication of the
+.B PERF_SAMPLE_ID
+.I id
+value, but included at the beginning of the sample
+so parsers can easily obtain the value.
+.TP
 .I ip
 If
 .B PERF_SAMPLE_IP
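The fixed-position rule described in this patch can be sketched as a small parser helper. This is only an illustration under stated assumptions: every event feeding the stream was opened with PERF_SAMPLE_IDENTIFIER and sample_id_all set, the record is contiguous in memory, and the names rec_header, REC_SAMPLE and record_id are hypothetical local stand-ins (rec_header mirrors struct perf_event_header so the sketch compiles without kernel headers).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct perf_event_header from <linux/perf_event.h>. */
struct rec_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;      /* total record size in bytes, header included */
};

#define REC_SAMPLE 9    /* numeric value of PERF_RECORD_SAMPLE */

/*
 * Hypothetical helper: fetch the event ID from one record.
 * Sample records carry the ID as the first field after the header;
 * all other record types carry it as the last 8 bytes of the record.
 * Returns 0 on success, -1 if the record is too short to hold an ID.
 */
int record_id(const void *rec, uint64_t *id)
{
    const unsigned char *p = rec;
    struct rec_header hdr;

    assert(rec != NULL && id != NULL);

    memcpy(&hdr, p, sizeof(hdr));
    if (hdr.size < sizeof(hdr) + sizeof(*id))
        return -1;

    if (hdr.type == REC_SAMPLE)
        memcpy(id, p + sizeof(hdr), sizeof(*id));
    else
        memcpy(id, p + hdr.size - sizeof(*id), sizeof(*id));
    return 0;
}
```

Once the ID is known, a parser can look up the sample_type of the event it belongs to and decode the remaining fields, which is the circular dependency the patch text says the new flag removes.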
[parent not found: <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>]
* Re: [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER
[not found] ` <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:13 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:13 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 11/07/13 07:30, Vince Weaver wrote:
>
> A new PERF_SAMPLE_IDENTIFIER sample type was added in Linux 3.12

Thanks, Vince. Applied.

Cheers,

Michael

> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:28 ` [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support Vince Weaver
2013-11-06 18:30 ` [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER Vince Weaver
@ 2013-11-06 18:31 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:34 ` [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap Vince Weaver
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:31 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

A new perf_event related ioctl, PERF_EVENT_IOC_ID, was added
in Linux 3.12.

Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -2000,6 +2160,12 @@ output should be ignored.
 This adds an ftrace filter to this event.
 
 The argument is a pointer to the desired ftrace filter.
+.TP
+.BR PERF_EVENT_IOC_ID " (Since Linux 3.12)"
+Returns the event ID value for the given event fd.
+
+The argument is a pointer to a 64-bit unsigned integer
+to hold the result.
 .SS Using prctl
 A process can enable or disable all the event groups that are
 attached to it using the
[parent not found: <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>]
* Re: [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID
[not found] ` <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:13 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:13 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 11/07/13 07:31, Vince Weaver wrote:
>
> A new perf_event related ioctl, PERF_EVENT_IOC_ID, was added
> in Linux 3.12.

Thanks, Vince. Applied.

Cheers,

Michael

> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
` (2 preceding siblings ...)
2013-11-06 18:31 ` [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID Vince Weaver
@ 2013-11-06 18:34 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:34 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

It turns out that the perf_event mmap page rdpmc/time setting was
broken, dating back to the introduction of the feature. Due
to a mistake with a bitfield, two different values mapped to
the same feature bit.

A new somewhat backwards compatible interface was introduced
in Linux 3.12. A much longer report on the issue can be found
here:
https://lwn.net/Articles/567894/

Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -1142,8 +1196,13 @@ struct perf_event_mmap_page {
     __u64 time_running; /* time event on CPU */
     union {
         __u64 capabilities;
-        __u64 cap_usr_time : 1,
-              cap_usr_rdpmc : 1,
+        struct {
+            __u64 cap_usr_time / cap_usr_rdpmc / cap_bit0 : 1,
+                  cap_bit0_is_deprecated : 1,
+                  cap_user_rdpmc : 1,
+                  cap_user_time : 1,
+                  cap_user_time_zero : 1,
+        };
     };
     __u16 pmc_width;
     __u16 time_shift;
@@ -1173,8 +1232,9 @@ A seqlock for synchronization.
 A unique hardware counter identifier.
 .TP
 .I offset
-.\" FIXME clarify
-Add this to hardware counter value??
+When using rdpmc for reads this offset value
+must be added to the one returned by rdpmc to get
+the current total event count.
 .TP
 .I time_enabled
 Time the event was active.
@@ -1182,10 +1242,45 @@ Time the event was active.
 .I time_running
 Time the event was running.
 .TP
+.IR cap_usr_time " / " cap_usr_rdpmc " / " cap_bit0 " (Since Linux 3.4)"
+There was a bug in the definition of
 .I cap_usr_time
-User time capability.
+and
+.I cap_usr_rdpmc
+from Linux 3.4 until Linux 3.11.
+Both bits were defined to point to the same location, so it was
+impossible to know if
+.I cap_usr_time
+or
+.I cap_usr_rdpmc
+were actually set.
+
+Starting with 3.12 these are renamed to
+.I cap_bit0
+and you should use the new
+.I cap_user_time
+and
+.I cap_user_rdpmc
+fields instead.
+
 .TP
+.IR cap_bit0_is_deprecated " (Since Linux 3.12)"
+If set this bit indicates that the kernel supports
+the properly separated
+.I cap_user_time
+and
+.I cap_user_rdpmc
+bits.
+
+If not-set, it indicates an older kernel where
+.I cap_usr_time
+and
 .I cap_usr_rdpmc
+map to the same bit and thus both features should
+be used with caution.
+
+.TP
+.IR cap_user_rdpmc " (Since Linux 3.12)"
 If the hardware supports user-space read of performance counters
 without syscall (this is the "rdpmc" instruction on x86), then
 the following code can be used to do a read:
@@ -1195,7 +1290,6 @@ the following code can be used to do a read:
 u32 seq, time_mult, time_shift, idx, width;
 u64 count, enabled, running;
 u64 cyc, time_offset;
-s64 pmc = 0;
 
 do {
     seq = pc\->lock;
@@ -1215,7 +1309,7 @@ do {
 
     if (pc\->cap_usr_rdpmc && idx) {
         width = pc\->pmc_width;
-        pmc = rdpmc(idx \- 1);
+        count += rdpmc(idx \- 1);
     }
 
     barrier();
@@ -1223,6 +1317,16 @@ do {
 .fi
 .in
 .TP
+.IR cap_user_time " (Since Linux 3.12)"
+This bit indicates the hardware has a constant, non-stop
+timestamp counter (TSC on x86).
+.TP
+.IR cap_user_time_zero " (Since Linux 3.12)"
+Indicates the presence of
+.I time_zero
+which allows mapping timestamp values to
+the hardware clock.
+.TP
 .I pmc_width
 If
 .IR cap_usr_rdpmc ,
@@ -1274,6 +1378,27 @@ enabled and possible running (if idx), improving the scaling:
     count = quot * enabled + (rem * enabled) / running;
 .fi
 .TP
+.IR time_zero " (Since Linux 3.12)"
+
+If
+.I cap_user_time_zero
+is set then the hardware clock (the TSC timestamp counter on x86)
+can be calculated from the
+.IR time_zero ", " time_mult ", and " time_shift " values:"
+.nf
+    time = timestamp - time_zero;
+    quot = time / time_mult;
+    rem = time % time_mult;
+    cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
+.fi
+And vice versa:
+.nf
+    quot = cyc >> time_shift;
+    rem = cyc & ((1 << time_shift) - 1);
+    timestamp = time_zero + quot * time_mult +
+        ((rem * time_mult) >> time_shift);
+.fi
+.TP
 .I data_head
 This points to the head of the data section.
 The value continuously increases, it does not wrap.
@@ -2221,6 +2387,17 @@ ioctl argument was broken and would repeatedly operate
 on the event specified rather than iterating across
 all sibling events in a group.
 
+From Linux 3.4 to Linux 3.11 the mmap
+.I cap_usr_rdpmc
+and
+.I cap_usr_time
+bits mapped to the same location.
+Code should migrate to the new
+.I cap_user_rdpmc
+and
+.I cap_user_time
+fields instead.
+
 Always double-check your results!
 Various generalized events have had wrong values.
 For example, retired branches measured
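The capability check and the time_zero formulas from this patch can be sketched in plain C as follows. This is an illustration, not the kernel's or the man page's code: the function names are hypothetical, and the CAP_* bit positions assume the bitfield in perf_event_mmap_page is allocated starting from the least significant bit of the capabilities word, as is usual with GCC on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions within perf_event_mmap_page.capabilities. */
#define CAP_BIT0                (1ULL << 0)  /* old cap_usr_time/cap_usr_rdpmc */
#define CAP_BIT0_IS_DEPRECATED  (1ULL << 1)
#define CAP_USER_RDPMC          (1ULL << 2)
#define CAP_USER_TIME           (1ULL << 3)
#define CAP_USER_TIME_ZERO      (1ULL << 4)

/*
 * Returns 1 if rdpmc is reported as usable, 0 if it is not, and -1 if the
 * answer is ambiguous (Linux 3.4 to 3.11, where bit 0 was shared).
 */
int rdpmc_supported(uint64_t caps)
{
    if (caps & CAP_BIT0_IS_DEPRECATED)
        return (caps & CAP_USER_RDPMC) ? 1 : 0;
    return (caps & CAP_BIT0) ? -1 : 0;
}

/* Perf timestamp to hardware clock cycles, following the patch text. */
uint64_t timestamp_to_cycles(uint64_t timestamp, uint64_t time_zero,
                             uint32_t time_mult, uint16_t time_shift)
{
    uint64_t time, quot, rem;

    assert(time_mult != 0);

    time = timestamp - time_zero;
    quot = time / time_mult;
    rem = time % time_mult;
    return (quot << time_shift) + (rem << time_shift) / time_mult;
}

/* Hardware clock cycles back to a perf timestamp. */
uint64_t cycles_to_timestamp(uint64_t cyc, uint64_t time_zero,
                             uint32_t time_mult, uint16_t time_shift)
{
    uint64_t quot = cyc >> time_shift;
    uint64_t rem = cyc & (((uint64_t)1 << time_shift) - 1);

    return time_zero + quot * time_mult + ((rem * time_mult) >> time_shift);
}
```

In real use the inputs would be read from the mmap page inside the seqlock loop shown in the man page, and the conversion is only meaningful when the cap_user_time_zero bit is set.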
[parent not found: <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>]
* Re: [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap
[not found] ` <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:14 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:14 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man-u79uwXL29TY76Z2rM5mHXA

On 11/07/13 07:34, Vince Weaver wrote:
>
> It turns out that the perf_event mmap page rdpmc/time setting was
> broken, dating back to the introduction of the feature. Due
> to a mistake with a bitfield, two different values mapped to
> the same feature bit.
>
> A new somewhat backwards compatible interface was introduced
> in Linux 3.12. A much longer report on the issue can be found
> here:
> https://lwn.net/Articles/567894/
>
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>

Thanks, Vince. Applied.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Thread overview: 9+ messages
2013-11-06 18:26 [PATCH 0/4] perf_event_open.2 Linux 3.12 updates Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:28 ` [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-07 17:11 ` Michael Kerrisk (man-pages)
2013-11-06 18:30 ` [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-07 17:13 ` Michael Kerrisk (man-pages)
2013-11-06 18:31 ` [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-07 17:13 ` Michael Kerrisk (man-pages)
2013-11-06 18:34 ` [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-07 17:14 ` Michael Kerrisk (man-pages)