From: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH 4/4] perf_event_open.2 Linux 3.12 rdpmc/mmap
Date: Fri, 08 Nov 2013 06:14:48 +1300
Message-ID: <527BCA88.90107@gmail.com>
In-Reply-To: <alpine.DEB.2.10.1311061332040.26649-6xBS8L8d439fDsnSvq7Uq4Se7xf15W0s1dQoKJhdanU@public.gmane.org>
On 11/07/13 07:34, Vince Weaver wrote:
>
> It turns out that the perf_event mmap page rdpmc/time setting was
> broken, dating back to the introduction of the feature. Due
> to a mistake with a bitfield, two different values mapped to
> the same feature bit.
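The aliasing is easy to reproduce outside the kernel. The following is a minimal illustrative sketch, not kernel code, that mirrors the old and new declarations with uint64_t standing in for __u64. The cap_reserved filler field and the helper function names are made up for the example. Bitfields declared directly in a union each start at the union's first bit, so cap_usr_time and cap_usr_rdpmc land on the same bit. Wrapping the flags in an anonymous struct (C11) gives each one its own bit. Bitfield placement is implementation-defined, so the bit positions in the comments assume GCC on a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-3.12 declaration: each bitfield is a separate union member,
 * so both start at bit 0 and alias each other. */
union old_caps {
    uint64_t capabilities;
    uint64_t cap_usr_time  : 1,
             cap_usr_rdpmc : 1;
};

/* 3.12 declaration: the anonymous struct lays the flags out one
 * after another. cap_reserved is illustrative filler only. */
union new_caps {
    uint64_t capabilities;
    struct {
        uint64_t cap_bit0               : 1,
                 cap_bit0_is_deprecated : 1,
                 cap_user_rdpmc         : 1,
                 cap_user_time          : 1,
                 cap_user_time_zero     : 1,
                 cap_reserved           : 59;
    };
};

/* All the flags must still fit in the single 64-bit word. */
static_assert(sizeof(union new_caps) == sizeof(uint64_t),
              "capability flags must fit in 64 bits");

/* Old layout: set only the rdpmc flag, then read back the time flag. */
static unsigned old_time_after_setting_rdpmc(void)
{
    union old_caps c = { .capabilities = 0 };
    c.cap_usr_rdpmc = 1;
    return (unsigned) c.cap_usr_time;   /* 1 with GCC: same bit */
}

/* New layout: set only cap_user_rdpmc and return the raw word. */
static uint64_t new_raw_after_setting_rdpmc(void)
{
    union new_caps c = { .capabilities = 0 };
    c.cap_user_rdpmc = 1;
    return c.capabilities;              /* bit 2 (0x4) with GCC on x86 */
}
```

With GCC on x86, old_time_after_setting_rdpmc() returns 1 even though only the rdpmc flag was written, which is exactly the ambiguity described above.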
>
> A new, somewhat backward-compatible interface was introduced
> in Linux 3.12. A much longer report on the issue can be found
> here:
> https://lwn.net/Articles/567894/
>
> Signed-off-by: Vince Weaver <vincent.weaver-e7X0jjDqjFGHXe+LvDLADg@public.gmane.org>
Thanks, Vince. Applied.
Cheers,
Michael
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 4ff9690..a443b6e 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -1142,8 +1196,13 @@ struct perf_event_mmap_page {
> __u64 time_running; /* time event on CPU */
> union {
> __u64 capabilities;
> - __u64 cap_usr_time : 1,
> - cap_usr_rdpmc : 1,
> + struct {
> + __u64 cap_usr_time / cap_usr_rdpmc / cap_bit0 : 1,
> + cap_bit0_is_deprecated : 1,
> + cap_user_rdpmc : 1,
> + cap_user_time : 1,
> + cap_user_time_zero : 1,
> + };
> };
> __u16 pmc_width;
> __u16 time_shift;
> @@ -1173,8 +1232,9 @@ A seqlock for synchronization.
> A unique hardware counter identifier.
> .TP
> .I offset
> -.\" FIXME clarify
> -Add this to hardware counter value??
> +When using rdpmc for reads, this offset value
> +must be added to the value returned by rdpmc to get
> +the current total event count.
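The raw value returned by rdpmc is only pmc_width bits wide, so it has to be sign-extended to 64 bits before it is added to offset. The kernel's <linux/perf_event.h> comments do this with a left shift followed by a signed right shift. A small sketch of that arithmetic follows; the helper name total_count is illustrative only, and the code assumes GCC's behavior for the signed conversion and the arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the mmap page offset with a raw counter value that is
 * `width` bits wide (pmc_width on the mmap page). */
static int64_t total_count(int64_t offset, uint64_t raw, unsigned width)
{
    assert(width > 0 && width <= 64);

    /* Move the counter's top bit up to bit 63 ... */
    int64_t pmc = (int64_t) (raw << (64 - width));
    /* ... then shift back arithmetically to sign-extend it. */
    pmc >>= 64 - width;

    return offset + pmc;
}
```

For example, a 48-bit counter reading 0xFFFFFFFFFFFF is -1 after sign extension, so an offset of 100 yields a total of 99 rather than a huge positive number.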
> .TP
> .I time_enabled
> Time the event was active.
> @@ -1182,10 +1242,45 @@ Time the event was active.
> .I time_running
> Time the event was running.
> .TP
> +.IR cap_usr_time " / " cap_usr_rdpmc " / " cap_bit0 " (Since Linux 3.4)"
> +There was a bug in the definition of
> .I cap_usr_time
> -User time capability.
> +and
> +.I cap_usr_rdpmc
> +from Linux 3.4 until Linux 3.11.
> +Both bits were defined to point to the same location, so it was
> +impossible to know if
> +.I cap_usr_time
> +or
> +.I cap_usr_rdpmc
> +was actually set.
> +
> +Starting with Linux 3.12, these bits are renamed to
> +.IR cap_bit0 ,
> +and you should use the new
> +.I cap_user_time
> +and
> +.I cap_user_rdpmc
> +fields instead.
> +
> .TP
> +.IR cap_bit0_is_deprecated " (Since Linux 3.12)"
> +If set, this bit indicates that the kernel supports
> +the properly separated
> +.I cap_user_time
> +and
> +.I cap_user_rdpmc
> +bits.
> +
> +If not set, it indicates an older kernel where
> +.I cap_usr_time
> +and
> .I cap_usr_rdpmc
> +map to the same bit and thus both features should
> +be used with caution.
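Code that must run on both old and new kernels can key its decision off cap_bit0_is_deprecated. The sketch below works on the raw capabilities word. The CAP_* mask names and helper functions are hypothetical, and the bit positions assume the GCC little-endian bitfield layout of the 3.12 struct (bit 0 cap_bit0, bit 1 cap_bit0_is_deprecated, bit 2 cap_user_rdpmc, bit 3 cap_user_time). Refusing to trust bit 0 on an old kernel is one conservative reading of "use with caution", not the only possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within `capabilities` (GCC, little-endian). */
#define CAP_BIT0               (UINT64_C(1) << 0)
#define CAP_BIT0_IS_DEPRECATED (UINT64_C(1) << 1)
#define CAP_USER_RDPMC         (UINT64_C(1) << 2)
#define CAP_USER_TIME          (UINT64_C(1) << 3)

/* The two 3.12 feature bits must be distinct, unlike the old alias. */
static_assert((CAP_USER_RDPMC & CAP_USER_TIME) == 0,
              "rdpmc and time flags must not overlap");

/* On 3.12+ kernels the separate bits are authoritative. On older
 * kernels bit 0 cannot be attributed to either feature, so this
 * policy declines to rely on it. */
static bool can_use_rdpmc(uint64_t caps)
{
    if (caps & CAP_BIT0_IS_DEPRECATED)
        return (caps & CAP_USER_RDPMC) != 0;
    return false;
}

static bool can_use_user_time(uint64_t caps)
{
    if (caps & CAP_BIT0_IS_DEPRECATED)
        return (caps & CAP_USER_TIME) != 0;
    return false;
}
```

A caller would pass the page's capabilities field, e.g. can_use_rdpmc(pc->capabilities), and fall back to read(2) on the event file descriptor when it returns false.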
> +
> +.TP
> +.IR cap_user_rdpmc " (Since Linux 3.12)"
> If the hardware supports user-space read of performance counters
> without syscall (this is the "rdpmc" instruction on x86), then
> the following code can be used to do a read:
> @@ -1195,7 +1290,6 @@ the following code can be used to do a read:
> u32 seq, time_mult, time_shift, idx, width;
> u64 count, enabled, running;
> u64 cyc, time_offset;
> -s64 pmc = 0;
>
> do {
> seq = pc\->lock;
> @@ -1215,7 +1309,7 @@ do {
>
> if (pc\->cap_usr_rdpmc && idx) {
> width = pc\->pmc_width;
> - pmc = rdpmc(idx \- 1);
> + count += rdpmc(idx \- 1);
> }
>
> barrier();
> @@ -1223,6 +1317,16 @@ do {
> .fi
> .in
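To exercise the control flow of that read loop without real hardware, here is a self-contained version. It is a sketch under several assumptions: struct mmap_page_subset is a hypothetical cut-down stand-in for struct perf_event_mmap_page, fake_rdpmc is a made-up replacement for the rdpmc instruction, and barrier() is a GCC/Clang compiler barrier. It tests the 3.12 cap_user_rdpmc flag rather than the ambiguous cap_usr_rdpmc, and it sign-extends the raw value to pmc_width bits instead of adding it to the count directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for struct perf_event_mmap_page:
 * only the fields this loop reads, not the real layout. */
struct mmap_page_subset {
    volatile uint32_t lock;
    volatile uint32_t index;
    volatile int64_t  offset;
    volatile uint16_t pmc_width;
    volatile uint32_t cap_user_rdpmc;
};

/* Compiler barrier, standing in for the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Made-up replacement for rdpmc so the sketch runs anywhere:
 * every counter reads as -1 in 48-bit two's complement. */
static uint64_t fake_rdpmc(uint32_t counter)
{
    (void) counter;
    return UINT64_C(0xFFFFFFFFFFFF);
}

/* Seqlock-protected read: retry until `lock` is unchanged across the
 * reads, so offset and the counter value form a consistent snapshot. */
static int64_t read_count(const struct mmap_page_subset *pc,
                          uint64_t (*read_pmc)(uint32_t counter))
{
    uint32_t seq, idx, width;
    int64_t count;

    do {
        seq = pc->lock;
        barrier();

        idx = pc->index;
        count = pc->offset;
        if (pc->cap_user_rdpmc && idx) {
            width = pc->pmc_width;
            assert(width > 0 && width <= 64);

            /* Sign-extend the width-bit raw value before adding it. */
            int64_t pmc = (int64_t) (read_pmc(idx - 1) << (64 - width));
            pmc >>= 64 - width;
            count += pmc;
        }

        barrier();
    } while (pc->lock != seq);

    return count;
}
```

On real x86 hardware one would pass a function that executes rdpmc in place of fake_rdpmc; that part is not shown here.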
> .TP
> +.IR cap_user_time " (Since Linux 3.12)"
> +This bit indicates the hardware has a constant, non-stop
> +timestamp counter (TSC on x86).
> +.TP
> +.IR cap_user_time_zero " (Since Linux 3.12)"
> +Indicates the presence of
> +.IR time_zero ,
> +which allows mapping timestamp values to
> +the hardware clock.
> +.TP
> .I pmc_width
> If
> .IR cap_usr_rdpmc ,
> @@ -1274,6 +1378,27 @@ enabled and possible running (if idx), improving the scaling:
> count = quot * enabled + (rem * enabled) / running;
> .fi
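The quotient/remainder split in that formula keeps the intermediate product small: computing count * enabled directly can overflow 64 bits for long-running events. A minimal C version (scale_count is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* Scale count by enabled/running. Dividing first means the only
 * product formed is rem * enabled with rem < running, which lowers
 * the risk of 64-bit overflow compared with count * enabled. */
static uint64_t scale_count(uint64_t count, uint64_t enabled,
                            uint64_t running)
{
    assert(running != 0);

    uint64_t quot = count / running;
    uint64_t rem  = count % running;
    return quot * enabled + (rem * enabled) / running;
}
```

For example, scale_count(7, 3, 2) returns 10, the truncated value of 7 * 3 / 2.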
> .TP
> +.IR time_zero " (Since Linux 3.12)"
> +
> +If
> +.I cap_user_time_zero
> +is set then the hardware clock (the TSC timestamp counter on x86)
> +can be calculated from the
> +.IR time_zero ", " time_mult ", and " time_shift " values:"
> +.nf
> + time = timestamp \- time_zero;
> + quot = time / time_mult;
> + rem = time % time_mult;
> + cyc = (quot << time_shift) + (rem << time_shift) / time_mult;
> +.fi
> +And vice versa:
> +.nf
> + quot = cyc >> time_shift;
> + rem = cyc & (((u64) 1 << time_shift) \- 1);
> + timestamp = time_zero + quot * time_mult +
> + ((rem * time_mult) >> time_shift);
> +.fi
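The two conversions can be written as a pair of helper functions and checked for round-tripping. This sketch uses uint32_t and uint16_t to match the __u32 time_mult and __u16 time_shift fields of struct perf_event_mmap_page; the function names are illustrative, and the mask constant is widened to 64 bits so that large shift values stay well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Perf timestamp -> hardware clock cycles (e.g. TSC on x86). */
static uint64_t timestamp_to_cyc(uint64_t timestamp, uint64_t time_zero,
                                 uint32_t time_mult, uint16_t time_shift)
{
    uint64_t time = timestamp - time_zero;
    uint64_t quot = time / time_mult;
    uint64_t rem  = time % time_mult;
    return (quot << time_shift) + (rem << time_shift) / time_mult;
}

/* Hardware clock cycles -> perf timestamp. */
static uint64_t cyc_to_timestamp(uint64_t cyc, uint64_t time_zero,
                                 uint32_t time_mult, uint16_t time_shift)
{
    uint64_t quot = cyc >> time_shift;
    uint64_t rem  = cyc & ((UINT64_C(1) << time_shift) - 1);
    return time_zero + quot * time_mult + ((rem * time_mult) >> time_shift);
}
```

With time_mult = 512 and time_shift = 10 (0.5 ns per cycle), a timestamp 1000 ns after time_zero maps to 2000 cycles, and converting 2000 cycles back yields time_zero + 1000.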
> +.TP
> .I data_head
> This points to the head of the data section.
> The value continuously increases, it does not wrap.
> @@ -2221,6 +2387,17 @@ ioctl argument was broken and would repeatedly operate
> on the event specified rather than iterating across
> all sibling events in a group.
>
> +From Linux 3.4 to Linux 3.11, the mmap
> +.I cap_usr_rdpmc
> +and
> +.I cap_usr_time
> +bits mapped to the same location.
> +Code should migrate to the new
> +.I cap_user_rdpmc
> +and
> +.I cap_user_time
> +fields instead.
> +
> Always double-check your results!
> Various generalized events have had wrong values.
> For example, retired branches measured
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/