* [PATCH 1/2] perf_event_open.2: Document PERF_EVENT_IOC_PAUSE_OUTPUT
From: Wang Nan @ 2016-10-21 11:38 UTC
To: mtk.manpages
Cc: wangnan0, pi3orama, linux-kernel, linux-man, lizefan,
vincent.weaver
Linux 4.7 (86e7972f690c1017fd086cdfe53d8524e68c661c) introduces the
PERF_EVENT_IOC_PAUSE_OUTPUT feature. Document it.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
man2/perf_event_open.2 | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index fade28c..2d3acad 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -2865,7 +2865,18 @@ The argument is a BPF program file descriptor that was created by
a previous
.BR bpf (2)
system call.
-.SS Using prctl(2)
+.TP
+.BR PERF_EVENT_IOC_PAUSE_OUTPUT " (since Linux 4.7)"
+.\" commit 86e7972f690c1017fd086cdfe53d8524e68c661c
+This allows pausing and resuming the event's ring-buffer. A
+paused ring-buffer does not prevent samples generation, but simply
+discards them. The discarded samples are considered lost, causes
+.BR PERF_RECORD_LOST
+to be generated when possible.
+
+The argument is an integer. Nonzero value pauses the ring-buffer,
+zero value resumes the ring-buffer.
+.SS Using prctl
A process can enable or disable all the event groups that are
attached to it using the
.BR prctl (2)
--
2.10.1
* [PATCH 2/2] perf_event_open.2: Document write_backward
From: Wang Nan @ 2016-10-21 11:38 UTC
To: mtk.manpages
Cc: wangnan0, pi3orama, linux-kernel, linux-man, lizefan,
vincent.weaver
Linux 4.7 (9ecda41acb971ebd07c8fb35faf24005c0baea12) introduces the write_backward
attribute to perf_event_attr. Document this feature.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
man2/perf_event_open.2 | 56 +++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 3 deletions(-)
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 2d3acad..e5fdfec 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -244,8 +244,8 @@ struct perf_event_attr {
due to exec */
use_clockid : 1, /* use clockid for time fields */
context_switch : 1, /* context switch data */
-
- __reserved_1 : 37;
+ write_backward : 1, /* Write ring buffer from end to beginning */
+ __reserved_1 : 36;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1127,6 +1127,29 @@ The advantage of this method is that it will give full
information even with strict
.I perf_event_paranoid
settings.
+.IR "write_backward" " (since Linux 4.6)"
+.\" commit 9ecda41acb971ebd07c8fb35faf24005c0baea12
+This makes the resuling event use a backward ring-buffer, which
+writes samples from the end of the ring-buffer.
+
+It is not allowed to connect events with backward and forward
+ring-buffer settings together using
+.BR PERF_EVENT_IOC_SET_OUTPUT .
+
+Backward ring-buffer is useful when the ring-buffer is overwritable
+(created by readonly
+.BR mmap (2)
+). In this case,
+.IR data_tail
+is unused, and
+.IR data_head
+points to the head of the most recent sample in a backward
+ring-buffer. It is easy to iterate over the whole ring-buffer by reading
+samples one by one, because the size of each sample can be found by decoding
+its header. In contrast, in a forward overwritable ring-buffer, the only
+information available is the end of the most recent sample, pointed to by
+.IR data_head ,
+and the size of a sample can't be determined from its end.
.TP
.IR "wakeup_events" ", " "wakeup_watermark"
This union sets how many samples
@@ -1671,7 +1694,9 @@ And vice versa:
.TP
.I data_head
This points to the head of the data section.
-The value continuously increases, it does not wrap.
+The value continuously increases (or decreases if
+.IR write_backward
+is set); it does not wrap.
The value needs to be manually wrapped by the size of the mmap buffer
before accessing the samples.
@@ -2727,6 +2752,24 @@ Starting with Linux 3.18,
.B POLL_HUP
is indicated if the event being monitored is attached to a different
process and that process exits.
+.SS Reading from an overwritable ring-buffer
+The reader is unable to update
+.IR data_tail
+if the mapping is not
+.BR PROT_WRITE .
+In this case, the kernel overwrites data regardless of whether it
+has been read, so the ring-buffer is overwritable and
+behaves like a flight recorder. To read from an overwritable
+ring-buffer, setting
+.IR write_backward
+is recommended; otherwise it is hard to find a proper position at which to start
+decoding. In addition, the ring-buffer should be paused before reading
+by using
+.BR ioctl (2)
+with
+.B PERF_EVENT_IOC_PAUSE_OUTPUT
+to avoid races between the kernel and the reader. The ring-buffer should be resumed
+after reading is finished.
.SS rdpmc instruction
Starting with Linux 3.4 on x86, you can use the
.\" commit c7206205d00ab375839bd6c7ddb247d600693c09
@@ -2839,6 +2882,13 @@ The file descriptors must all be on the same CPU.
The argument specifies the desired file descriptor, or \-1 if
output should be ignored.
+
+Two events with different
+.IR write_backward
+settings are not allowed to be connected together using
+.BR PERF_EVENT_IOC_SET_OUTPUT .
+.B EINVAL
+is returned in this case.
.TP
.BR PERF_EVENT_IOC_SET_FILTER " (since Linux 2.6.33)"
.\" commit 6fb2915df7f0747d9044da9dbff5b46dc2e20830
--
2.10.1
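The flight-recorder usage described in patch 2 can be sketched in C as follows. This is a minimal sketch under stated assumptions: Linux 4.7+ headers that declare the `write_backward` bit, a software CPU-clock event and sample period chosen purely for illustration, and hypothetical helper names (`init_backward_attr`, `open_backward_event`, `snapshot_backward`). The mapping omits PROT_WRITE, so user space cannot update data_tail, and output is paused around the copy as the "Reading from overwritable ring-buffer" text recommends:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

#ifndef PERF_EVENT_IOC_PAUSE_OUTPUT
#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
#endif

/* Hypothetical helper: describe a sampling event whose ring buffer is
 * written from end to beginning (write_backward needs Linux 4.7+). */
void init_backward_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;          /* illustrative event choice */
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_period = 100000;             /* illustrative period */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr->write_backward = 1;
}

/* Hypothetical helper: open the event for the calling thread and map one
 * metadata page plus 2^n data pages without PROT_WRITE, which makes the
 * buffer overwritable.  Returns the mapping, or MAP_FAILED. */
void *open_backward_event(unsigned int n, int *fd_out)
{
    struct perf_event_attr attr;
    init_backward_attr(&attr);

    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    *fd_out = fd;
    if (fd == -1)
        return MAP_FAILED;

    size_t len = (size_t)sysconf(_SC_PAGESIZE) * ((1u << n) + 1);
    void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        *fd_out = -1;
    }
    return base;
}

/* Hypothetical helper: pause output, record data_head, copy the data
 * area (data_size bytes, starting one page after the metadata page)
 * into 'out', then resume.  Returns 0 on success, -1 on failure. */
int snapshot_backward(int fd, const void *base, unsigned char *out,
                      size_t data_size, __u64 *head_out)
{
    const struct perf_event_mmap_page *pc = base;
    const unsigned char *data =
        (const unsigned char *)base + sysconf(_SC_PAGESIZE);

    if (ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 1UL) == -1)
        return -1;
    /* Acquire ordering pairs with the kernel's update of data_head. */
    *head_out = __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
    memcpy(out, data, data_size);
    return ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 0UL);
}
```

Because the buffer is paused only for the duration of the copy, the window in which samples are discarded stays short; the copied bytes can then be decoded at leisure starting from the saved data_head.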
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH 1/2] perf_event_open.2: Document PERF_EVENT_IOC_PAUSE_OUTPUT
From: Vince Weaver @ 2016-10-21 21:16 UTC
To: Wang Nan
Cc: mtk.manpages, pi3orama, linux-kernel, linux-man, lizefan,
vincent.weaver
On Fri, 21 Oct 2016, Wang Nan wrote:
> -.SS Using prctl(2)
> +.SS Using prctl
why this change?
> +.BR PERF_EVENT_IOC_PAUSE_OUTPUT " (since Linux 4.7)"
> +.\" commit 86e7972f690c1017fd086cdfe53d8524e68c661c
> +This allows pausing and resuming the event's ring-buffer. A
> +paused ring-buffer does not prevent samples generation, but simply
> +discards them. The discarded samples are considered lost, causes
> +.BR PERF_RECORD_LOST
> +to be generated when possible.
I don't know if it's worth mentioning that the reason to add this is to
allow reading the ring-buffer without having to worry about data being
overwritten.
There are a few odd wording choices (mostly plural nouns) but otherwise
looks fine to me.
Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
* Re: [PATCH 2/2] perf_event_open.2: Document write_backward
From: Vince Weaver @ 2016-10-21 21:25 UTC
To: Wang Nan
Cc: mtk.manpages, pi3orama, linux-kernel, linux-man, lizefan,
vincent.weaver
On Fri, 21 Oct 2016, Wang Nan wrote:
> context_switch : 1, /* context switch data */
> -
> - __reserved_1 : 37;
> + write_backward : 1, /* Write ring buffer from end to beginning */
> + __reserved_1 : 36;
This removes a blank line, not sure if intentional or not.
> +.IR "write_backward" " (since Linux 4.6)"
It wasn't committed until Linux 4.7, from what I can tell?
> +This makes the resuling event use a backward ring-buffer, which
resulting
> +writes samples from the end of the ring-buffer.
> +
> +It is not allowed to connect events with backward and forward
> +ring-buffer settings together using
> +.B PERF_EVENT_IOC_SET_OUTPUT.
> +
> +Backward ring-buffer is useful when the ring-buffer is overwritable
> +(created by readonly
> +.BR mmap (2)
> +).
A ring buffer is over-writable when it is mmapped readonly?
Is this a hard requirement?
Can you set the read-backwards bit if not mapped readonly?
Otherwise the documentation seems reasonable.
Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
* Re: [PATCH 1/2] perf_event_open.2: Document PERF_EVENT_IOC_PAUSE_OUTPUT
From: Michael Kerrisk (man-pages) @ 2016-10-22 10:00 UTC
To: Vince Weaver, Wang Nan
Cc: mtk.manpages, pi3orama, linux-kernel, linux-man, lizefan
On 10/21/2016 11:16 PM, Vince Weaver wrote:
> On Fri, 21 Oct 2016, Wang Nan wrote:
>
>
>> -.SS Using prctl(2)
>> +.SS Using prctl
>
> why this change?
I suspect a diff against a slightly stale version of the page,
since I added the '(2)' just a few days ago. Wang Nan, please
do pull the latest version of the page :-).
>> +.BR PERF_EVENT_IOC_PAUSE_OUTPUT " (since Linux 4.7)"
>> +.\" commit 86e7972f690c1017fd086cdfe53d8524e68c661c
>> +This allows pausing and resuming the event's ring-buffer. A
>> +paused ring-buffer does not prevent samples generation, but simply
>> +discards them. The discarded samples are considered lost, causes
>> +.BR PERF_RECORD_LOST
>> +to be generated when possible.
>
> I don't know if it's worth mentioning that the reason to add this is to
> allow reading the ring-buffer without having to worry about data being
> overwritten.
Wang Nan, what do you think? Should this be added?
> There are a few odd wording choices (mostly plural nouns) but otherwise
> looks fine to me.
>
> Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
Wang Nan, I'll send a few wording corrections. Could you please include
Vince's Reviewed-by: tag in your next revision?
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: [PATCH 1/2] perf_event_open.2: Document PERF_EVENT_IOC_PAUSE_OUTPUT
From: Michael Kerrisk (man-pages) @ 2016-10-22 10:02 UTC
To: Wang Nan
Cc: mtk.manpages, pi3orama, linux-kernel, linux-man, lizefan,
vincent.weaver
Hello Wang Nan
Thanks for this patch! A few comments below.
On 10/21/2016 01:38 PM, Wang Nan wrote:
> Linux 4.7 (86e7972f690c1017fd086cdfe53d8524e68c661c) introduces
> PERF_EVENT_IOC_PAUSE_OUTPUT feature. Document it.
>
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> ---
> man2/perf_event_open.2 | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index fade28c..2d3acad 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -2865,7 +2865,18 @@ The argument is a BPF program file descriptor that was created by
> a previous
> .BR bpf (2)
> system call.
> -.SS Using prctl(2)
> +.TP
> +.BR PERF_EVENT_IOC_PAUSE_OUTPUT " (since Linux 4.7)"
> +.\" commit 86e7972f690c1017fd086cdfe53d8524e68c661c
> +This allows pausing and resuming the event's ring-buffer. A
> +paused ring-buffer does not prevent samples generation, but simply
s/samples generation/generation of samples/
> +discards them. The discarded samples are considered lost, causes
s/them/the samples/
s/causes/causing/
> +.BR PERF_RECORD_LOST
> +to be generated when possible.
> +
> +The argument is an integer. Nonzero value pauses the ring-buffer,
s/Nonzero/a nonzero/
> +zero value resumes the ring-buffer.
s/zero value/zero/
> +.SS Using prctl
As noted by Vince, the change to this SS line should not be part of this patch.
> A process can enable or disable all the event groups that are
> attached to it using the
> .BR prctl (2)
Thanks,
Michael
* Re: [PATCH 2/2] perf_event_open.2: Document write_backward
From: Michael Kerrisk (man-pages) @ 2016-10-22 10:05 UTC
To: Vince Weaver, Wang Nan
Cc: mtk.manpages, pi3orama, linux-kernel, linux-man, lizefan
On 10/21/2016 11:25 PM, Vince Weaver wrote:
> On Fri, 21 Oct 2016, Wang Nan wrote:
>
>> context_switch : 1, /* context switch data */
>> -
>> - __reserved_1 : 37;
>> + write_backward : 1, /* Write ring buffer from end to beginning */
>> + __reserved_1 : 36;
>
> This removes a blank line, not sure if intentional or not.
Maybe it would be better to keep it. I don't feel too strongly about
this though.
>> +.IR "write_backward" " (since Linux 4.6)"
>
> It wasn't committed until Linux 4.7, from what I can tell?
Yes, that's my recollection too.
>
>> +This makes the resuling event use a backward ring-buffer, which
> resulting
>
>> +writes samples from the end of the ring-buffer.
>> +
>> +It is not allowed to connect events with backward and forward
>> +ring-buffer settings together using
>> +.B PERF_EVENT_IOC_SET_OUTPUT.
>> +
>> +Backward ring-buffer is useful when the ring-buffer is overwritable
>> +(created by readonly
>> +.BR mmap (2)
>> +).
>
> A ring buffer is over-writable when it is mmapped readonly?
> Is this a hard requirement?
> Can you set the read-backwards bit if not mapped readonly?
Wang Nan, could you perhaps clarify this in the next version of the patch?
>
> Otherwise the documentation seems reasonable.
>
> Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
Thanks for reviewing both patches, Vince. Wang Nan, please include the
Reviewed-by: in the next patch iteration.
Cheers,
Michael
* Re: [PATCH 2/2] perf_event_open.2: Document write_backward
From: Wangnan (F) @ 2016-10-24 6:44 UTC
To: Michael Kerrisk (man-pages), Vince Weaver
Cc: pi3orama, linux-kernel, linux-man, lizefan
On 2016/10/22 18:05, Michael Kerrisk (man-pages) wrote:
> On 10/21/2016 11:25 PM, Vince Weaver wrote:
>> On Fri, 21 Oct 2016, Wang Nan wrote:
>>
>>> context_switch : 1, /* context switch data */
>>> -
>>> - __reserved_1 : 37;
>>> + write_backward : 1, /* Write ring buffer from end to beginning */
>>> + __reserved_1 : 36;
>> This removes a blank line, not sure if intentional or not.
> Maybe it would be better to keep it. I don't feel too strongly about
> this though.
>
>>> +.IR "write_backward" " (since Linux 4.6)"
>> It wasn't committed until Linux 4.7, from what I can tell?
> Yes, that's my recollection too.
>
>>> +This makes the resuling event use a backward ring-buffer, which
>> resulting
>>
>>> +writes samples from the end of the ring-buffer.
>>> +
>>> +It is not allowed to connect events with backward and forward
>>> +ring-buffer settings together using
>>> +.B PERF_EVENT_IOC_SET_OUTPUT.
>>> +
>>> +Backward ring-buffer is useful when the ring-buffer is overwritable
>>> +(created by readonly
>>> +.BR mmap (2)
>>> +).
>> A ring buffer is over-writable when it is mmapped readonly?
>> Is this a hard requirement?
I'd like to explain the over-writable ring buffer in patch 1/1 like this:
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index fade28c..561331c 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -1687,6 +1687,17 @@ the
.I data_tail
value should be written by user space to reflect the last read data.
In this case, the kernel will not overwrite unread data.
+
+When the mapping is read only (without
+.BR PROT_WRITE ),
+updating
+.I data_tail
+is not possible.
+In this case, the kernel will overwrite data as new samples arrive, unless
+the ring buffer is paused by a
+.B PERF_EVENT_IOC_PAUSE_OUTPUT
+.BR ioctl (2)
+system call before reading.
.TP
.IR data_offset " (since Linux 4.1)"
.\" commit e8c6deac69629c0cb97c3d3272f8631ef17f8f0f
The ring buffer becomes over-writable because, when it is mmapped read-only,
there is no way to tell the kernel the position of the last read data.
>> Can you set the read-backwards bit if not mapped readonly?
I don't understand why we need read-backwards.
Mapping with PROT_WRITE is the *default* setting. In this case, a user program
such as perf can tell the kernel its reading position by writing to
'data_tail', so the kernel won't overwrite unread data, and the program
reads forward.
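For contrast with the read-only case, the default PROT_WRITE mode can be sketched in C as a forward consumer that publishes data_tail after reading. This is a minimal sketch, assuming the caller has already located the metadata page and the data area; `consume_forward()` and `record_fn` are hypothetical names, and the GCC/Clang `__atomic` builtins stand in for the read and write barriers that the ring-buffer protocol requires:

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Hypothetical callback type: receives each record's header and its
 * full bytes (header included), reassembled contiguously. */
typedef void (*record_fn)(const struct perf_event_header *hdr,
                          const unsigned char *rec, void *ctx);

/*
 * Hypothetical helper: consume every record between data_tail and
 * data_head of a forward ring buffer mapped PROT_READ | PROT_WRITE,
 * then store the new data_tail so the kernel may reuse that space.
 * 'data' is the data area and 'data_size' its size (a power of two);
 * 'scratch' must hold the largest record (65535 bytes always suffices,
 * since perf_event_header.size is 16 bits).  'fn' may be NULL.
 * Returns the number of records consumed.
 */
size_t consume_forward(struct perf_event_mmap_page *pc,
                       const unsigned char *data, uint64_t data_size,
                       unsigned char *scratch, record_fn fn, void *ctx)
{
    uint64_t mask = data_size - 1;
    /* Acquire: records written before data_head are now visible. */
    uint64_t head = __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = pc->data_tail;  /* written only by user space */
    size_t n = 0;

    while (tail != head) {
        struct perf_event_header hdr;
        unsigned char *h = (unsigned char *)&hdr;

        /* Byte-wise copies reassemble records that wrap at the end. */
        for (size_t i = 0; i < sizeof hdr; i++)
            h[i] = data[(tail + i) & mask];
        if (hdr.size < sizeof hdr || hdr.size > head - tail)
            break;                  /* malformed or incomplete record */
        for (size_t i = 0; i < hdr.size; i++)
            scratch[i] = data[(tail + i) & mask];
        if (fn)
            fn(&hdr, scratch, ctx);
        tail += hdr.size;
        n++;
    }
    /* Release: all reads above complete before the kernel sees data_tail. */
    __atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
    return n;
}
```

The final store to data_tail is exactly the step that a read-only mapping cannot perform, which is why such a mapping leaves the kernel free to overwrite old data.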
Or do you think the naming is confusing? The name 'write_backward' is
kernel-centric: it adjusts the kernel's behavior. The kernel *writes* the
data, so I called it 'write_backward'. The name 'read-backwards' is
user-centric, because the user *reads* the data.
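To make the backward layout concrete: with write_backward set, the kernel moves data_head downward, so data_head marks the start of the newest record, and a reader walks *forward* from it, using each header's size to reach the next-older record. A minimal C sketch over a buffer that has been paused or copied follows. `count_backward_records()` and `ring_copy()` are hypothetical names, and the stop conditions (zeroed space, one full lap, a partially overwritten oldest record) are assumptions about how such a reader might behave, not a documented API:

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Copy len bytes starting at ring position pos, wrapping at the end. */
static void ring_copy(void *dst, const unsigned char *data, uint64_t mask,
                      uint64_t pos, size_t len)
{
    unsigned char *out = dst;
    for (size_t i = 0; i < len; i++)
        out[i] = data[(pos + i) & mask];
}

/*
 * Hypothetical helper: count the complete records in a write_backward
 * ring buffer.  Reading starts at data_head (the newest record) and
 * proceeds forward.  Stops at never-written (zeroed) space, after one
 * full lap, or before a record whose tail has been overwritten.
 */
size_t count_backward_records(const unsigned char *data, uint64_t data_size,
                              uint64_t head)
{
    uint64_t mask = data_size - 1;   /* data_size must be a power of two */
    uint64_t pos = head;
    size_t n = 0;

    while (pos - head < data_size) {
        struct perf_event_header hdr;
        ring_copy(&hdr, data, mask, pos, sizeof hdr);
        if (hdr.size == 0)
            break;                   /* reached unused space */
        if (pos - head + hdr.size > data_size)
            break;                   /* oldest record partly overwritten */
        pos += hdr.size;
        n++;
    }
    return n;
}
```

For example, after three records of 16, 24 and 8 bytes are written backward into an otherwise zeroed 64-byte buffer, the walk finds three records and stops at the unused space. A forward overwritable buffer offers no equivalent starting point, since only the end of the newest record is known.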
Thank you.