* [PATCH 0/4] perf_event_open.2 Linux 3.12 updates
@ 2013-11-06 18:26 Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:26 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
Hello
here is a patch series bringing perf_event_open.2 documentation
in line with the recent Linux 3.12 release.
This replaces a patch I sent previously that made similar changes.
[The only real change from that patch is that "mmap2" sample support
was dropped at the last minute. The defines are still in the
perf_event.h header file, but using them will always return EINVAL
until the kernel developers finalize the interface.]
Vince
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-06 18:28 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:30 ` [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER Vince Weaver
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:28 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
Support for the PERF_COUNT_SW_DUMMY event type was added in Linux 3.12.
Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -468,6 +468,13 @@ This counts the number of emulation faults.
The kernel sometimes traps on unimplemented instructions
and emulates them for user space.
This can negatively impact performance.
+.TP
+.BR PERF_COUNT_SW_DUMMY " (Since Linux 3.12)"
+This is a placeholder event that counts nothing.
+Informational sample record types such as mmap or comm
+must be associated with an active event.
+This dummy event allows gathering such records without requiring
+a counting event.
.RE
.RS
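For illustration only (not part of the patch): a minimal sketch of how
an attr structure for the dummy event might be filled in so that mmap
and comm records can be gathered without a counting event. The helper
name init_dummy_attr() is hypothetical, and the sketch assumes kernel
headers from Linux 3.12 or later.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fill in a perf_event_attr for the placeholder event.
 * init_dummy_attr() is a hypothetical helper, not a kernel or libc API.
 */
static void init_dummy_attr(struct perf_event_attr *attr)
{
    assert(attr != NULL);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY; /* counts nothing */
    attr->mmap = 1;                     /* request PERF_RECORD_MMAP records */
    attr->comm = 1;                     /* request PERF_RECORD_COMM records */
    attr->disabled = 1;                 /* caller enables the event later */
}
```

The filled-in structure would then be passed to perf_event_open() and
the resulting fd mmap'ed to read the records; both steps need suitable
permissions and are left out here.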
* [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:28 ` [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support Vince Weaver
@ 2013-11-06 18:30 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:31 ` [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID Vince Weaver
2013-11-06 18:34 ` [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap Vince Weaver
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:30 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
A new PERF_SAMPLE_IDENTIFIER sample type was added in Linux 3.12.
Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -680,6 +687,27 @@ Records the data source: where in the memory hierarchy
the data associated with the sampled instruction came from.
This is only available if the underlying hardware
supports this feature.
+.TP
+.BR PERF_SAMPLE_IDENTIFIER " (Since Linux 3.12)"
+Places the SAMPLE_ID value in a fixed position in the record,
+either at the beginning (for sample events) or at the end
+(if a non-sample event).
+
+This was necessary because a sample stream may have
+records from various different event sources with different
+.I sample_type
+settings.
+Parsing the event stream properly was not possible because the
+format of the record was needed to find SAMPLE_ID, but
+the format could not be found without knowing what
+event the sample belonged to (causing a circular
+dependency).
+
+This new
+.B PERF_SAMPLE_IDENTIFIER
+setting makes the event stream always parsable
+by putting SAMPLE_ID in a fixed location, even though
+it means having duplicate SAMPLE_ID values in records.
.RE
.TP
.IR "read_format"
@@ -860,12 +888,33 @@ field, but enables including data mmap events
in the ring-buffer.
.TP
.IR "sample_id_all" " (Since Linux 2.6.38)"
-If set, then TID, TIME, ID, CPU, and STREAM_ID can
+If set, then TID, TIME, ID, STREAM_ID, and CPU can
additionally be included in
.RB non- PERF_RECORD_SAMPLE s
if the corresponding
.I sample_type
is selected.
+
+If
+.B PERF_SAMPLE_IDENTIFIER
+is specified, then an additional ID value is included
+as the last value to ease parsing the record stream.
+This may lead to the
+.I id
+value appearing twice.
+
+The layout is described by this pseudo-structure:
+.in +4n
+.nf
+struct sample_id {
+ { u32 pid, tid; } /* if PERF_SAMPLE_TID set */
+ { u64 time; } /* if PERF_SAMPLE_TIME set */
+ { u64 id; } /* if PERF_SAMPLE_ID set */
+ { u64 stream_id;} /* if PERF_SAMPLE_STREAM_ID set */
+ { u32 cpu, res; } /* if PERF_SAMPLE_CPU set */
+ { u64 id; } /* if PERF_SAMPLE_IDENTIFIER set */
+};
+.fi
.TP
.IR "exclude_host" " (Since Linux 3.2)"
Do not measure time spent in VM host.
@@ -1385,6 +1510,7 @@ The values in the corresponding record (that follows the header)
depend on the
.I type
selected as shown.
+
.RS
.TP 4
.B PERF_RECORD_MMAP
@@ -1416,6 +1542,7 @@ struct {
struct perf_event_header header;
u64 id;
u64 lost;
+ struct sample_id sample_id;
};
.fi
.in
@@ -1437,6 +1564,7 @@ struct {
struct perf_event_header header;
u32 pid, tid;
char comm[];
+ struct sample_id sample_id;
};
.fi
.in
@@ -1451,6 +1579,7 @@ struct {
u32 pid, ppid;
u32 tid, ptid;
u64 time;
+ struct sample_id sample_id;
};
.fi
.in
@@ -1465,6 +1594,7 @@ struct {
u64 time;
u64 id;
u64 stream_id;
+ struct sample_id sample_id;
};
.fi
.in
@@ -1479,6 +1609,7 @@ struct {
u32 pid, ppid;
u32 tid, ptid;
u64 time;
+ struct sample_id sample_id;
};
.fi
.in
@@ -1492,6 +1623,7 @@ struct {
struct perf_event_header header;
u32 pid, tid;
struct read_format values;
+ struct sample_id sample_id;
};
.fi
.in
@@ -1503,6 +1635,7 @@ This record indicates a sample.
.nf
struct {
struct perf_event_header header;
+ u64 sample_id; /* if PERF_SAMPLE_IDENTIFIER */
u64 ip; /* if PERF_SAMPLE_IP */
u32 pid, tid; /* if PERF_SAMPLE_TID */
u64 time; /* if PERF_SAMPLE_TIME */
@@ -1531,6 +1664,16 @@ struct {
.fi
.RS 4
.TP 4
+.I sample_id
+If
+.B PERF_SAMPLE_IDENTIFIER
+is enabled, a 64-bit unique ID is included.
+This is a duplication of the
+.B PERF_SAMPLE_ID
+.I id
+value, but included at the beginning of the sample
+so parsers can easily obtain the value.
+.TP
.I ip
If
.B PERF_SAMPLE_IP
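For illustration only (not part of the patch): a sketch of how a parser
might use the fixed position of the ID. It assumes every event feeding
the buffer was opened with PERF_SAMPLE_IDENTIFIER in sample_type and
with sample_id_all set. The struct rec_header type, the REC_TYPE_SAMPLE
constant and the record_id() helper are hypothetical local names; the
header layout and the value 9 mirror struct perf_event_header and
PERF_RECORD_SAMPLE from linux/perf_event.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the kernel's struct perf_event_header (assumed layout). */
struct rec_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;  /* total record size in bytes, header included */
};

#define REC_TYPE_SAMPLE 9  /* mirrors PERF_RECORD_SAMPLE */

/*
 * Return the event ID of one record without knowing its sample_type.
 * Sample records carry the ID right after the header; all other
 * records carry it as the last u64 of the trailing sample_id block.
 */
static uint64_t record_id(const unsigned char *rec)
{
    struct rec_header hdr;
    uint64_t id;

    memcpy(&hdr, rec, sizeof(hdr));
    assert(hdr.size >= sizeof(hdr) + sizeof(id));

    if (hdr.type == REC_TYPE_SAMPLE)
        memcpy(&id, rec + sizeof(hdr), sizeof(id));
    else
        memcpy(&id, rec + hdr.size - sizeof(id), sizeof(id));

    return id;
}
```

Once the ID is known, the parser can look up the sample_type of the
event that produced the record and decode the remaining fields.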
* [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:28 ` [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support Vince Weaver
2013-11-06 18:30 ` [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER Vince Weaver
@ 2013-11-06 18:31 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
2013-11-06 18:34 ` [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap Vince Weaver
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:31 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
A new perf_event related ioctl, PERF_EVENT_IOC_ID, was added
in Linux 3.12.
Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -2000,6 +2160,12 @@ output should be ignored.
This adds an ftrace filter to this event.
The argument is a pointer to the desired ftrace filter.
+.TP
+.BR PERF_EVENT_IOC_ID " (Since Linux 3.12)"
+Returns the event ID value for the given event fd.
+
+The argument is a pointer to a 64-bit unsigned integer
+to hold the result.
.SS Using prctl
A process can enable or disable all the event groups that are
attached to it using the
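For illustration only (not part of the patch): a minimal sketch of
calling the new ioctl. The wrapper name get_event_id() is hypothetical;
the sketch assumes kernel headers from Linux 3.12 or later, which
define PERF_EVENT_IOC_ID.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Fetch the event ID for a perf event fd.
 * Returns the ioctl() result: 0 on success with *id filled in,
 * or -1 with errno set (for example EBADF for an invalid fd).
 * get_event_id() is a hypothetical helper name.
 */
static int get_event_id(int fd, uint64_t *id)
{
    assert(id != NULL);
    return ioctl(fd, PERF_EVENT_IOC_ID, id);
}
```

The value stored in *id is the same ID that appears in records when
PERF_SAMPLE_ID or PERF_SAMPLE_IDENTIFIER is requested, so it can be
used to match records back to the fd that produced them.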
* [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap
[not found] ` <alpine.DEB.2.10.1311061324070.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
` (2 preceding siblings ...)
2013-11-06 18:31 ` [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID Vince Weaver
@ 2013-11-06 18:34 ` Vince Weaver
[not found] ` <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
3 siblings, 1 reply; 9+ messages in thread
From: Vince Weaver @ 2013-11-06 18:34 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA
It turns out that the perf_event mmap page rdpmc/time setting was
broken, dating back to the introduction of the feature. Due
to a mistake with a bitfield, two different values mapped to
the same feature bit.
A new somewhat backwards compatible interface was introduced
in Linux 3.12. A much longer report on the issue can be found
here:
https://lwn.net/Articles/567894/
Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 4ff9690..a443b6e 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -1142,8 +1196,13 @@ struct perf_event_mmap_page {
__u64 time_running; /* time event on CPU */
union {
__u64 capabilities;
- __u64 cap_usr_time : 1,
- cap_usr_rdpmc : 1,
+ struct {
+ __u64 cap_usr_time / cap_usr_rdpmc / cap_bit0 : 1,
+ cap_bit0_is_deprecated : 1,
+ cap_user_rdpmc : 1,
+ cap_user_time : 1,
+ cap_user_time_zero : 1,
+ };
};
__u16 pmc_width;
__u16 time_shift;
@@ -1173,8 +1232,9 @@ A seqlock for synchronization.
A unique hardware counter identifier.
.TP
.I offset
-.\" FIXME clarify
-Add this to hardware counter value??
+When using rdpmc for reads this offset value
+must be added to the one returned by rdpmc to get
+the current total event count.
.TP
.I time_enabled
Time the event was active.
@@ -1182,10 +1242,45 @@ Time the event was active.
.I time_running
Time the event was running.
.TP
+.IR cap_usr_time " / " cap_usr_rdpmc " / " cap_bit0 " (Since Linux 3.4)"
+There was a bug in the definition of
.I cap_usr_time
-User time capability.
+and
+.I cap_usr_rdpmc
+from Linux 3.4 until Linux 3.11.
+Both bits were defined to point to the same location, so it was
+impossible to know if
+.I cap_usr_time
+or
+.I cap_usr_rdpmc
+were actually set.
+
+Starting with 3.12 these are renamed to
+.I cap_bit0
+and you should use the new
+.I cap_user_time
+and
+.I cap_user_rdpmc
+fields instead.
+
.TP
+.IR cap_bit0_is_deprecated " (Since Linux 3.12)"
+If set this bit indicates that the kernel supports
+the properly separated
+.I cap_user_time
+and
+.I cap_user_rdpmc
+bits.
+
+If not set, it indicates an older kernel where
+.I cap_usr_time
+and
.I cap_usr_rdpmc
+map to the same bit and thus both features should
+be used with caution.
+
+.TP
+.IR cap_user_rdpmc " (Since Linux 3.12)"
If the hardware supports user-space read of performance counters
without syscall (this is the "rdpmc" instruction on x86), then
the following code can be used to do a read:
@@ -1195,7 +1290,6 @@ the following code can be used to do a read:
u32 seq, time_mult, time_shift, idx, width;
u64 count, enabled, running;
u64 cyc, time_offset;
-s64 pmc = 0;
do {
seq = pc\->lock;
@@ -1215,7 +1309,7 @@ do {
if (pc\->cap_usr_rdpmc && idx) {
width = pc\->pmc_width;
- pmc = rdpmc(idx \- 1);
+ count += rdpmc(idx \- 1);
}
barrier();
@@ -1223,6 +1317,16 @@ do {
.fi
.in
.TP
+.IR cap_user_time " (Since Linux 3.12)"
+This bit indicates the hardware has a constant, non-stop
+timestamp counter (TSC on x86).
+.TP
+.IR cap_user_time_zero " (Since Linux 3.12)"
+Indicates the presence of
+.I time_zero
+which allows mapping timestamp values to
+the hardware clock.
+.TP
.I pmc_width
If
.IR cap_usr_rdpmc ,
@@ -1274,6 +1378,27 @@ enabled and possible running (if idx), improving the scaling:
count = quot * enabled + (rem * enabled) / running;
.fi
.TP
+.IR time_zero " (Since Linux 3.12)"
+
+If
+.I cap_user_time_zero
+is set then the hardware clock (the TSC timestamp counter on x86)
+can be calculated from the
+.IR time_zero ", " time_mult ", and " time_shift " values:"
+.nf
+ time = timestamp - time_zero;
+ quot = time / time_mult;
+ rem = time % time_mult;
+ cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
+.fi
+And vice versa:
+.nf
+ quot = cyc >> time_shift;
+ rem = cyc & ((1 << time_shift) - 1);
+ timestamp = time_zero + quot * time_mult +
+ ((rem * time_mult) >> time_shift);
+.fi
+.TP
.I data_head
This points to the head of the data section.
The value continuously increases, it does not wrap.
@@ -2221,6 +2387,17 @@ ioctl argument was broken and would repeatedly operate
on the event specified rather than iterating across
all sibling events in a group.
+From Linux 3.4 to Linux 3.11 the mmap
+.I cap_usr_rdpmc
+and
+.I cap_usr_time
+bits mapped to the same location.
+Code should migrate to the new
+.I cap_user_rdpmc
+and
+.I cap_user_time
+fields instead.
+
Always double-check your results!
Various generalized events have had wrong values.
For example, retired branches measured
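For illustration only (not part of the patch): a sketch of how code
might decide whether the capability bits can be trusted. The helper
names are hypothetical, and treating the ambiguous bit 0 of older
kernels as "not supported" is a conservative choice made here, not
something the kernel requires. The sketch assumes kernel headers from
Linux 3.12 or later and that pc points at the first page of the mmap'ed
event buffer.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True only if the kernel exports the separated bits (3.12 and later)
 * and reports that user-space rdpmc is allowed. On older kernels
 * bit 0 is ambiguous, so this sketch refuses to rely on it.
 */
static bool user_rdpmc_allowed(const struct perf_event_mmap_page *pc)
{
    assert(pc != NULL);
    if (!pc->cap_bit0_is_deprecated)
        return false;
    return pc->cap_user_rdpmc != 0;
}

/* Same policy for the user-space time capability. */
static bool user_time_allowed(const struct perf_event_mmap_page *pc)
{
    assert(pc != NULL);
    if (!pc->cap_bit0_is_deprecated)
        return false;
    return pc->cap_user_time != 0;
}
```

Code that must keep using rdpmc on 3.4 to 3.11 kernels would need its
own fallback for the case where cap_bit0_is_deprecated is clear.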
* Re: [PATCH 1/4] perf_event_open.2 PERF_COUNT_SW_DUMMY support
[not found] ` <alpine.DEB.2.10.1311061327180.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:11 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:11 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
On 11/07/13 07:28, Vince Weaver wrote:
>
> Support for the PERF_COUNT_SW_DUMMY event type was added in Linux 3.12.
Thanks, Vince. Applied.
Cheers,
Michael
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
>
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 4ff9690..a443b6e 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -468,6 +468,13 @@ This counts the number of emulation faults.
> The kernel sometimes traps on unimplemented instructions
> and emulates them for user space.
> This can negatively impact performance.
> +.TP
> +.BR PERF_COUNT_SW_DUMMY " (Since Linux 3.12)"
> +This is a placeholder event that counts nothing.
> +Informational sample record types such as mmap or comm
> +must be associated with an active event.
> +This dummy event allows gathering such records without requiring
> +a counting event.
> .RE
>
> .RS
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* Re: [PATCH 2/4] perf_event_open.2 Linux 3.12 PERF_SAMPLE_IDENTIFIER
[not found] ` <alpine.DEB.2.10.1311061329120.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:13 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:13 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
On 11/07/13 07:30, Vince Weaver wrote:
>
> A new PERF_SAMPLE_IDENTIFIER sample type was added in Linux 3.12.
Thanks, Vince. Applied.
Cheers,
Michael
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
>
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 4ff9690..a443b6e 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -680,6 +687,27 @@ Records the data source: where in the memory hierarchy
> the data associated with the sampled instruction came from.
> This is only available if the underlying hardware
> supports this feature.
> +.TP
> +.BR PERF_SAMPLE_IDENTIFIER " (Since Linux 3.12)"
> +Places the SAMPLE_ID value in a fixed position in the record,
> +either at the beginning (for sample events) or at the end
> +(if a non-sample event).
> +
> +This was necessary because a sample stream may have
> +records from various different event sources with different
> +.I sample_type
> +settings.
> +Parsing the event stream properly was not possible because the
> +format of the record was needed to find SAMPLE_ID, but
> +the format could not be found without knowing what
> +event the sample belonged to (causing a circular
> +dependency).
> +
> +This new
> +.B PERF_SAMPLE_IDENTIFIER
> +setting makes the event stream always parsable
> +by putting SAMPLE_ID in a fixed location, even though
> +it means having duplicate SAMPLE_ID values in records.
> .RE
> .TP
> .IR "read_format"
> @@ -860,12 +888,33 @@ field, but enables including data mmap events
> in the ring-buffer.
> .TP
> .IR "sample_id_all" " (Since Linux 2.6.38)"
> -If set, then TID, TIME, ID, CPU, and STREAM_ID can
> +If set, then TID, TIME, ID, STREAM_ID, and CPU can
> additionally be included in
> .RB non- PERF_RECORD_SAMPLE s
> if the corresponding
> .I sample_type
> is selected.
> +
> +If
> +.B PERF_SAMPLE_IDENTIFIER
> +is specified, then an additional ID value is included
> +as the last value to ease parsing the record stream.
> +This may lead to the
> +.I id
> +value appearing twice.
> +
> +The layout is described by this pseudo-structure:
> +.in +4n
> +.nf
> +struct sample_id {
> + { u32 pid, tid; } /* if PERF_SAMPLE_TID set */
> + { u64 time; } /* if PERF_SAMPLE_TIME set */
> + { u64 id; } /* if PERF_SAMPLE_ID set */
> + { u64 stream_id;} /* if PERF_SAMPLE_STREAM_ID set */
> + { u32 cpu, res; } /* if PERF_SAMPLE_CPU set */
> + { u64 id; } /* if PERF_SAMPLE_IDENTIFIER set */
> +};
> +.fi
> .TP
> .IR "exclude_host" " (Since Linux 3.2)"
> Do not measure time spent in VM host.
> @@ -1385,6 +1510,7 @@ The values in the corresponding record (that follows the header)
> depend on the
> .I type
> selected as shown.
> +
> .RS
> .TP 4
> .B PERF_RECORD_MMAP
> @@ -1416,6 +1542,7 @@ struct {
> struct perf_event_header header;
> u64 id;
> u64 lost;
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1437,6 +1564,7 @@ struct {
> struct perf_event_header header;
> u32 pid, tid;
> char comm[];
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1451,6 +1579,7 @@ struct {
> u32 pid, ppid;
> u32 tid, ptid;
> u64 time;
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1465,6 +1594,7 @@ struct {
> u64 time;
> u64 id;
> u64 stream_id;
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1479,6 +1609,7 @@ struct {
> u32 pid, ppid;
> u32 tid, ptid;
> u64 time;
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1492,6 +1623,7 @@ struct {
> struct perf_event_header header;
> u32 pid, tid;
> struct read_format values;
> + struct sample_id sample_id;
> };
> .fi
> .in
> @@ -1503,6 +1635,7 @@ This record indicates a sample.
> .nf
> struct {
> struct perf_event_header header;
> + u64 sample_id; /* if PERF_SAMPLE_IDENTIFIER */
> u64 ip; /* if PERF_SAMPLE_IP */
> u32 pid, tid; /* if PERF_SAMPLE_TID */
> u64 time; /* if PERF_SAMPLE_TIME */
> @@ -1531,6 +1664,16 @@ struct {
> .fi
> .RS 4
> .TP 4
> +.I sample_id
> +If
> +.B PERF_SAMPLE_IDENTIFIER
> +is enabled, a 64-bit unique ID is included.
> +This is a duplication of the
> +.B PERF_SAMPLE_ID
> +.I id
> +value, but included at the beginning of the sample
> +so parsers can easily obtain the value.
> +.TP
> .I ip
> If
> .B PERF_SAMPLE_IP
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* Re: [PATCH 3/4] perf_event_open.2 Linux 3.12 PERF_EVENT_IOC_ID
[not found] ` <alpine.DEB.2.10.1311061330470.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:13 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:13 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
On 11/07/13 07:31, Vince Weaver wrote:
>
> A new perf_event related ioctl, PERF_EVENT_IOC_ID, was added
> in Linux 3.12.
Thanks, Vince. Applied.
Cheers,
Michael
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
>
>
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 4ff9690..a443b6e 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -2000,6 +2160,12 @@ output should be ignored.
> This adds an ftrace filter to this event.
>
> The argument is a pointer to the desired ftrace filter.
> +.TP
> +.BR PERF_EVENT_IOC_ID " (Since Linux 3.12)"
> +Returns the event ID value for the given event fd.
> +
> +The argument is a pointer to a 64-bit unsigned integer
> +to hold the result.
> .SS Using prctl
> A process can enable or disable all the event groups that are
> attached to it using the
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
* Re: [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap
[not found] ` <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
@ 2013-11-07 17:14 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 9+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-11-07 17:14 UTC (permalink / raw)
To: Vince Weaver
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
linux-man-u79uwXL29TY76Z2rM5mHXA
On 11/07/13 07:34, Vince Weaver wrote:
>
> It turns out that the perf_event mmap page rdpmc/time setting was
> broken, dating back to the introduction of the feature. Due
> to a mistake with a bitfield, two different values mapped to
> the same feature bit.
>
> A new somewhat backwards compatible interface was introduced
> in Linux 3.12. A much longer report on the issue can be found
> here:
> https://lwn.net/Articles/567894/
>
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
Thanks, Vince. Applied.
Cheers,
Michael
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 4ff9690..a443b6e 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -1142,8 +1196,13 @@ struct perf_event_mmap_page {
> __u64 time_running; /* time event on CPU */
> union {
> __u64 capabilities;
> - __u64 cap_usr_time : 1,
> - cap_usr_rdpmc : 1,
> + struct {
> + __u64 cap_usr_time / cap_usr_rdpmc / cap_bit0 : 1,
> + cap_bit0_is_deprecated : 1,
> + cap_user_rdpmc : 1,
> + cap_user_time : 1,
> + cap_user_time_zero : 1,
> + };
> };
> __u16 pmc_width;
> __u16 time_shift;
> @@ -1173,8 +1232,9 @@ A seqlock for synchronization.
> A unique hardware counter identifier.
> .TP
> .I offset
> -.\" FIXME clarify
> -Add this to hardware counter value??
> +When using rdpmc for reads this offset value
> +must be added to the one returned by rdpmc to get
> +the current total event count.
> .TP
> .I time_enabled
> Time the event was active.
> @@ -1182,10 +1242,45 @@ Time the event was active.
> .I time_running
> Time the event was running.
> .TP
> +.IR cap_usr_time " / " cap_usr_rdpmc " / " cap_bit0 " (Since Linux 3.4)"
> +There was a bug in the definition of
> .I cap_usr_time
> -User time capability.
> +and
> +.I cap_usr_rdpmc
> +from Linux 3.4 until Linux 3.11.
> +Both bits were defined to point to the same location, so it was
> +impossible to know if
> +.I cap_usr_time
> +or
> +.I cap_usr_rdpmc
> +were actually set.
> +
> +Starting with 3.12 these are renamed to
> +.I cap_bit0
> +and you should use the new
> +.I cap_user_time
> +and
> +.I cap_user_rdpmc
> +fields instead.
> +
> .TP
> +.IR cap_bit0_is_deprecated " (Since Linux 3.12)"
> +If set this bit indicates that the kernel supports
> +the properly separated
> +.I cap_user_time
> +and
> +.I cap_user_rdpmc
> +bits.
> +
> +If not set, it indicates an older kernel where
> +.I cap_usr_time
> +and
> .I cap_usr_rdpmc
> +map to the same bit and thus both features should
> +be used with caution.
> +
> +.TP
> +.IR cap_user_rdpmc " (Since Linux 3.12)"
> If the hardware supports user-space read of performance counters
> without syscall (this is the "rdpmc" instruction on x86), then
> the following code can be used to do a read:
> @@ -1195,7 +1290,6 @@ the following code can be used to do a read:
> u32 seq, time_mult, time_shift, idx, width;
> u64 count, enabled, running;
> u64 cyc, time_offset;
> -s64 pmc = 0;
>
> do {
> seq = pc\->lock;
> @@ -1215,7 +1309,7 @@ do {
>
> if (pc\->cap_usr_rdpmc && idx) {
> width = pc\->pmc_width;
> - pmc = rdpmc(idx \- 1);
> + count += rdpmc(idx \- 1);
> }
>
> barrier();
> @@ -1223,6 +1317,16 @@ do {
> .fi
> .in
> .TP
> +.IR cap_user_time " (Since Linux 3.12)"
> +This bit indicates the hardware has a constant, non-stop
> +timestamp counter (TSC on x86).
> +.TP
> +.IR cap_user_time_zero " (Since Linux 3.12)"
> +Indicates the presence of
> +.I time_zero
> +which allows mapping timestamp values to
> +the hardware clock.
> +.TP
> .I pmc_width
> If
> .IR cap_usr_rdpmc ,
> @@ -1274,6 +1378,27 @@ enabled and possible running (if idx), improving the scaling:
> count = quot * enabled + (rem * enabled) / running;
> .fi
> .TP
> +.IR time_zero " (Since Linux 3.12)"
> +
> +If
> +.I cap_user_time_zero
> +is set then the hardware clock (the TSC timestamp counter on x86)
> +can be calculated from the
> +.IR time_zero ", " time_mult ", and " time_shift " values:"
> +.nf
> + time = timestamp - time_zero;
> + quot = time / time_mult;
> + rem = time % time_mult;
> + cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
> +.fi
> +And vice versa:
> +.nf
> + quot = cyc >> time_shift;
> + rem = cyc & ((1 << time_shift) - 1);
> + timestamp = time_zero + quot * time_mult +
> + ((rem * time_mult) >> time_shift);
> +.fi
> +.TP
> .I data_head
> This points to the head of the data section.
> The value continuously increases, it does not wrap.
> @@ -2221,6 +2387,17 @@ ioctl argument was broken and would repeatedly operate
> on the event specified rather than iterating across
> all sibling events in a group.
>
> +From Linux 3.4 to Linux 3.11 the mmap
> +.I cap_usr_rdpmc
> +and
> +.I cap_usr_time
> +bits mapped to the same location.
> +Code should migrate to the new
> +.I cap_user_rdpmc
> +and
> +.I cap_user_time
> +fields instead.
> +
> Always double-check your results!
> Various generalized events have had wrong values.
> For example, retired branches measured
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/