* [PATCH 01/16] perf, mmap: Factor out ring_buffer_detach_all()
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 02/16] perf, mmap: Factor out try_get_event()/put_event() Jean Pihet
` (16 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Factor out a function to detach all events from a ringbuffer. No
functional changes.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/core.c | 82 ++++++++++++++++++++++++++++------------------------
1 file changed, 44 insertions(+), 38 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 661951a..8867236 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3878,6 +3878,49 @@ static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb)
spin_unlock_irqrestore(&rb->event_lock, flags);
}
+static void ring_buffer_detach_all(struct ring_buffer *rb)
+{
+ struct perf_event *event;
+again:
+ rcu_read_lock();
+ list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
+ if (!atomic_long_inc_not_zero(&event->refcount)) {
+ /*
+ * This event is en-route to free_event() which will
+ * detach it and remove it from the list.
+ */
+ continue;
+ }
+ rcu_read_unlock();
+
+ mutex_lock(&event->mmap_mutex);
+ /*
+ * Check we didn't race with perf_event_set_output() which can
+ * swizzle the rb from under us while we were waiting to
+ * acquire mmap_mutex.
+ *
+ * If we find a different rb; ignore this event, a next
+ * iteration will no longer find it on the list. We have to
+ * still restart the iteration to make sure we're not now
+ * iterating the wrong list.
+ */
+ if (event->rb == rb) {
+ rcu_assign_pointer(event->rb, NULL);
+ ring_buffer_detach(event, rb);
+ ring_buffer_put(rb); /* can't be last, we still have one */
+ }
+ mutex_unlock(&event->mmap_mutex);
+ put_event(event);
+
+ /*
+ * Restart the iteration; either we're on the wrong list or
+ * destroyed its integrity by doing a deletion.
+ */
+ goto again;
+ }
+ rcu_read_unlock();
+}
+
static void ring_buffer_wakeup(struct perf_event *event)
{
struct ring_buffer *rb;
@@ -3970,44 +4013,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
* into the now unreachable buffer. Somewhat complicated by the
* fact that rb::event_lock otherwise nests inside mmap_mutex.
*/
-again:
- rcu_read_lock();
- list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
- if (!atomic_long_inc_not_zero(&event->refcount)) {
- /*
- * This event is en-route to free_event() which will
- * detach it and remove it from the list.
- */
- continue;
- }
- rcu_read_unlock();
-
- mutex_lock(&event->mmap_mutex);
- /*
- * Check we didn't race with perf_event_set_output() which can
- * swizzle the rb from under us while we were waiting to
- * acquire mmap_mutex.
- *
- * If we find a different rb; ignore this event, a next
- * iteration will no longer find it on the list. We have to
- * still restart the iteration to make sure we're not now
- * iterating the wrong list.
- */
- if (event->rb == rb) {
- rcu_assign_pointer(event->rb, NULL);
- ring_buffer_detach(event, rb);
- ring_buffer_put(rb); /* can't be last, we still have one */
- }
- mutex_unlock(&event->mmap_mutex);
- put_event(event);
-
- /*
- * Restart the iteration; either we're on the wrong list or
- * destroyed its integrity by doing a deletion.
- */
- goto again;
- }
- rcu_read_unlock();
+ ring_buffer_detach_all(rb);
/*
* It could be there's still a few 0-ref events on the list; they'll
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 02/16] perf, mmap: Factor out try_get_event()/put_event()
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
2014-04-07 15:04 ` [PATCH 01/16] perf, mmap: Factor out ring_buffer_detach_all() Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 03/16] perf, mmap: Factor out perf_alloc/free_rb() Jean Pihet
` (15 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Implement try_get_event() as counter part to put_event(). Put both in
internal.h to make it available to other perf files.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/core.c | 9 +++------
kernel/events/internal.h | 12 ++++++++++++
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8867236..5eaba42 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3311,13 +3311,10 @@ EXPORT_SYMBOL_GPL(perf_event_release_kernel);
/*
* Called when the last reference to the file is gone.
*/
-static void put_event(struct perf_event *event)
+void __put_event(struct perf_event *event)
{
struct task_struct *owner;
- if (!atomic_long_dec_and_test(&event->refcount))
- return;
-
rcu_read_lock();
owner = ACCESS_ONCE(event->owner);
/*
@@ -3884,7 +3881,7 @@ static void ring_buffer_detach_all(struct ring_buffer *rb)
again:
rcu_read_lock();
list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
- if (!atomic_long_inc_not_zero(&event->refcount)) {
+ if (!try_get_event(event)) {
/*
* This event is en-route to free_event() which will
* detach it and remove it from the list.
@@ -7606,7 +7603,7 @@ inherit_event(struct perf_event *parent_event,
if (IS_ERR(child_event))
return child_event;
- if (!atomic_long_inc_not_zero(&parent_event->refcount)) {
+ if (!try_get_event(parent_event)) {
free_event(child_event);
return NULL;
}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 569b2187..3bd89d4 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -195,4 +195,16 @@ static inline bool arch_perf_have_user_stack_dump(void)
#define perf_user_stack_pointer(regs) 0
#endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */
+static inline bool try_get_event(struct perf_event *event)
+{
+ return atomic_long_inc_not_zero(&event->refcount) != 0;
+}
+extern void __put_event(struct perf_event *event);
+static inline void put_event(struct perf_event *event)
+{
+ if (!atomic_long_dec_and_test(&event->refcount))
+ return;
+ __put_event(event);
+}
+
#endif /* _KERNEL_EVENTS_INTERNAL_H */
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 03/16] perf, mmap: Factor out perf_alloc/free_rb()
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
2014-04-07 15:04 ` [PATCH 01/16] perf, mmap: Factor out ring_buffer_detach_all() Jean Pihet
2014-04-07 15:04 ` [PATCH 02/16] perf, mmap: Factor out try_get_event()/put_event() Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-22 14:25 ` Peter Zijlstra
2014-04-07 15:04 ` [PATCH 04/16] perf, mmap: Factor out perf_get_fd() Jean Pihet
` (14 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Factor out code to allocate and deallocate ringbuffers. We need this
later to setup the sampling buffer for persistent events.
While at this, replacing get_current_user() with get_uid(user).
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/core.c | 77 +++++++++++++++++++++++++++++-------------------
kernel/events/internal.h | 3 ++
2 files changed, 50 insertions(+), 30 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5eaba42..22ec8f0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3193,7 +3193,45 @@ static void free_event_rcu(struct rcu_head *head)
}
static void ring_buffer_put(struct ring_buffer *rb);
+static void ring_buffer_attach(struct perf_event *event, struct ring_buffer *rb);
static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
+static void perf_event_init_userpage(struct perf_event *event);
+
+/*
+ * Must be called with &event->mmap_mutex held. event->rb must be
+ * NULL. perf_alloc_rb() requires &event->mmap_count to be incremented
+ * on success which corresponds to &rb->mmap_count that is initialized
+ * with 1.
+ */
+int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags)
+{
+ struct ring_buffer *rb;
+
+ rb = rb_alloc(nr_pages,
+ event->attr.watermark ? event->attr.wakeup_watermark : 0,
+ event->cpu, flags);
+ if (!rb)
+ return -ENOMEM;
+
+ atomic_set(&rb->mmap_count, 1);
+ ring_buffer_attach(event, rb);
+ rcu_assign_pointer(event->rb, rb);
+
+ perf_event_init_userpage(event);
+ perf_event_update_userpage(event);
+
+ return 0;
+}
+
+/* Must be called with &event->mmap_mutex held. event->rb must be set. */
+void perf_free_rb(struct perf_event *event)
+{
+ struct ring_buffer *rb = event->rb;
+
+ rcu_assign_pointer(event->rb, NULL);
+ ring_buffer_detach(event, rb);
+ ring_buffer_put(rb);
+}
static void unaccount_event_cpu(struct perf_event *event, int cpu)
{
@@ -3246,6 +3284,7 @@ static void __free_event(struct perf_event *event)
call_rcu(&event->rcu_head, free_event_rcu);
}
+
static void free_event(struct perf_event *event)
{
irq_work_sync(&event->pending);
@@ -3253,8 +3292,6 @@ static void free_event(struct perf_event *event)
unaccount_event(event);
if (event->rb) {
- struct ring_buffer *rb;
-
/*
* Can happen when we close an event with re-directed output.
*
@@ -3262,12 +3299,8 @@ static void free_event(struct perf_event *event)
* over us; possibly making our ring_buffer_put() the last.
*/
mutex_lock(&event->mmap_mutex);
- rb = event->rb;
- if (rb) {
- rcu_assign_pointer(event->rb, NULL);
- ring_buffer_detach(event, rb);
- ring_buffer_put(rb); /* could be last */
- }
+ if (event->rb)
+ perf_free_rb(event);
mutex_unlock(&event->mmap_mutex);
}
@@ -3901,11 +3934,8 @@ again:
* still restart the iteration to make sure we're not now
* iterating the wrong list.
*/
- if (event->rb == rb) {
- rcu_assign_pointer(event->rb, NULL);
- ring_buffer_detach(event, rb);
- ring_buffer_put(rb); /* can't be last, we still have one */
- }
+ if (event->rb == rb)
+ perf_free_rb(event);
mutex_unlock(&event->mmap_mutex);
put_event(event);
@@ -4041,7 +4071,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
unsigned long user_locked, user_lock_limit;
struct user_struct *user = current_user();
unsigned long locked, lock_limit;
- struct ring_buffer *rb;
unsigned long vma_size;
unsigned long nr_pages;
long user_extra, extra;
@@ -4125,28 +4154,16 @@ again:
if (vma->vm_flags & VM_WRITE)
flags |= RING_BUFFER_WRITABLE;
- rb = rb_alloc(nr_pages,
- event->attr.watermark ? event->attr.wakeup_watermark : 0,
- event->cpu, flags);
-
- if (!rb) {
- ret = -ENOMEM;
+ ret = perf_alloc_rb(event, nr_pages, flags);
+ if (ret)
goto unlock;
- }
- atomic_set(&rb->mmap_count, 1);
- rb->mmap_locked = extra;
- rb->mmap_user = get_current_user();
+ event->rb->mmap_locked = extra;
+ event->rb->mmap_user = get_uid(user);
atomic_long_add(user_extra, &user->locked_vm);
vma->vm_mm->pinned_vm += extra;
- ring_buffer_attach(event, rb);
- rcu_assign_pointer(event->rb, rb);
-
- perf_event_init_userpage(event);
- perf_event_update_userpage(event);
-
unlock:
if (!ret)
atomic_inc(&event->mmap_count);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 3bd89d4..e9007ff 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -207,4 +207,7 @@ static inline void put_event(struct perf_event *event)
__put_event(event);
}
+extern int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags);
+extern void perf_free_rb(struct perf_event *event);
+
#endif /* _KERNEL_EVENTS_INTERNAL_H */
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 03/16] perf, mmap: Factor out perf_alloc/free_rb()
2014-04-07 15:04 ` [PATCH 03/16] perf, mmap: Factor out perf_alloc/free_rb() Jean Pihet
@ 2014-04-22 14:25 ` Peter Zijlstra
0 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2014-04-22 14:25 UTC (permalink / raw)
To: Jean Pihet
Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel, Robert Richter, Robert Richter
On Mon, Apr 07, 2014 at 05:04:25PM +0200, Jean Pihet wrote:
> From: Robert Richter <robert.richter@linaro.org>
>
> Factor out code to allocate and deallocate ringbuffers. We need this
> later to setup the sampling buffer for persistent events.
>
> While at this, replacing get_current_user() with get_uid(user).
>
> Signed-off-by: Robert Richter <robert.richter@linaro.org>
> Signed-off-by: Robert Richter <rric@kernel.org>
> Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
> ---
> kernel/events/core.c | 77 +++++++++++++++++++++++++++++-------------------
> kernel/events/internal.h | 3 ++
> 2 files changed, 50 insertions(+), 30 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5eaba42..22ec8f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3193,7 +3193,45 @@ static void free_event_rcu(struct rcu_head *head)
> }
>
> static void ring_buffer_put(struct ring_buffer *rb);
> +static void ring_buffer_attach(struct perf_event *event, struct ring_buffer *rb);
> static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
> +static void perf_event_init_userpage(struct perf_event *event);
> +
> +/*
> + * Must be called with &event->mmap_mutex held. event->rb must be
> + * NULL. perf_alloc_rb() requires &event->mmap_count to be incremented
> + * on success which corresponds to &rb->mmap_count that is initialized
> + * with 1.
> + */
> +int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags)
> +{
> + struct ring_buffer *rb;
lockdep_assert_held(&event->mmap_mutex);
WARN_ON(event->rb);
> +
> + rb = rb_alloc(nr_pages,
> + event->attr.watermark ? event->attr.wakeup_watermark : 0,
> + event->cpu, flags);
> + if (!rb)
> + return -ENOMEM;
> +
> + atomic_set(&rb->mmap_count, 1);
> + ring_buffer_attach(event, rb);
> + rcu_assign_pointer(event->rb, rb);
> +
> + perf_event_init_userpage(event);
> + perf_event_update_userpage(event);
> +
> + return 0;
> +}
> +
> +/* Must be called with &event->mmap_mutex held. event->rb must be set. */
> +void perf_free_rb(struct perf_event *event)
> +{
> + struct ring_buffer *rb = event->rb;
lockdep_assert_held(&event->mmap_mutex);
> + rcu_assign_pointer(event->rb, NULL);
> + ring_buffer_detach(event, rb);
> + ring_buffer_put(rb);
> +}
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (2 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 03/16] perf, mmap: Factor out perf_alloc/free_rb() Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-22 14:27 ` Peter Zijlstra
2014-04-07 15:04 ` [PATCH 05/16] perf: Add persistent events Jean Pihet
` (13 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
This new function creates a new fd for an event. We need this later to
get a fd from a persistent event.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/core.c | 14 ++++++++------
kernel/events/internal.h | 1 +
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 22ec8f0..9857475 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4206,6 +4206,11 @@ static const struct file_operations perf_fops = {
.fasync = perf_fasync,
};
+int perf_get_fd(struct perf_event *event, int f_flags)
+{
+ return anon_inode_getfd("[perf_event]", &perf_fops, event, f_flags);
+}
+
/*
* Perf event wakeup
*
@@ -7028,7 +7033,6 @@ SYSCALL_DEFINE5(perf_event_open,
struct perf_event *event, *sibling;
struct perf_event_attr attr;
struct perf_event_context *ctx;
- struct file *event_file = NULL;
struct fd group = {NULL, 0};
struct task_struct *task = NULL;
struct pmu *pmu;
@@ -7189,10 +7193,9 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_context;
}
- event_file = anon_inode_getfile("[perf_event]", &perf_fops, event,
- f_flags);
- if (IS_ERR(event_file)) {
- err = PTR_ERR(event_file);
+ event_fd = perf_get_fd(event, f_flags);
+ if (event_fd < 0) {
+ err = event_fd;
goto err_context;
}
@@ -7257,7 +7260,6 @@ SYSCALL_DEFINE5(perf_event_open,
* perf_group_detach().
*/
fdput(group);
- fd_install(event_fd, event_file);
return event_fd;
err_context:
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index e9007ff..5f1f92d 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -209,5 +209,6 @@ static inline void put_event(struct perf_event *event)
extern int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags);
extern void perf_free_rb(struct perf_event *event);
+extern int perf_get_fd(struct perf_event *event, int f_flags);
#endif /* _KERNEL_EVENTS_INTERNAL_H */
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-07 15:04 ` [PATCH 04/16] perf, mmap: Factor out perf_get_fd() Jean Pihet
@ 2014-04-22 14:27 ` Peter Zijlstra
2014-04-25 13:54 ` Robert Richter
0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2014-04-22 14:27 UTC (permalink / raw)
To: Jean Pihet
Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel, Robert Richter, Robert Richter
On Mon, Apr 07, 2014 at 05:04:26PM +0200, Jean Pihet wrote:
> From: Robert Richter <robert.richter@linaro.org>
>
> This new function creates a new fd for an event. We need this later to
> get a fd from a persistent event.
>
> Signed-off-by: Robert Richter <robert.richter@linaro.org>
> Signed-off-by: Robert Richter <rric@kernel.org>
> Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
> ---
> kernel/events/core.c | 14 ++++++++------
> kernel/events/internal.h | 1 +
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 22ec8f0..9857475 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4206,6 +4206,11 @@ static const struct file_operations perf_fops = {
> .fasync = perf_fasync,
> };
>
> +int perf_get_fd(struct perf_event *event, int f_flags)
> +{
> + return anon_inode_getfd("[perf_event]", &perf_fops, event, f_flags);
> +}
> +
> /*
> * Perf event wakeup
> *
> @@ -7028,7 +7033,6 @@ SYSCALL_DEFINE5(perf_event_open,
> struct perf_event *event, *sibling;
> struct perf_event_attr attr;
> struct perf_event_context *ctx;
> - struct file *event_file = NULL;
> struct fd group = {NULL, 0};
> struct task_struct *task = NULL;
> struct pmu *pmu;
> @@ -7189,10 +7193,9 @@ SYSCALL_DEFINE5(perf_event_open,
> goto err_context;
> }
>
> - event_file = anon_inode_getfile("[perf_event]", &perf_fops, event,
> - f_flags);
> - if (IS_ERR(event_file)) {
> - err = PTR_ERR(event_file);
> + event_fd = perf_get_fd(event, f_flags);
> + if (event_fd < 0) {
> + err = event_fd;
> goto err_context;
> }
>
> @@ -7257,7 +7260,6 @@ SYSCALL_DEFINE5(perf_event_open,
> * perf_group_detach().
> */
> fdput(group);
> - fd_install(event_fd, event_file);
> return event_fd;
>
> err_context:
ISTR there being a good reason we only install the FD at the very last
moment. Of course I cannot quite recall it :/
I suspect it had something to do with not being able to fudge the state
while its being set-up or somesuch.
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-22 14:27 ` Peter Zijlstra
@ 2014-04-25 13:54 ` Robert Richter
2014-04-25 14:43 ` Peter Zijlstra
0 siblings, 1 reply; 41+ messages in thread
From: Robert Richter @ 2014-04-25 13:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jean Pihet, Borislav Petkov, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Al Viro
On 22.04.14 16:27:59, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 05:04:26PM +0200, Jean Pihet wrote:
> > From: Robert Richter <robert.richter@linaro.org>
> >
> > This new function creates a new fd for an event. We need this later to
> > get a fd from a persistent event.
> >
> > Signed-off-by: Robert Richter <robert.richter@linaro.org>
> > Signed-off-by: Robert Richter <rric@kernel.org>
> > Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
> > ---
> > kernel/events/core.c | 14 ++++++++------
> > kernel/events/internal.h | 1 +
> > 2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 22ec8f0..9857475 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -4206,6 +4206,11 @@ static const struct file_operations perf_fops = {
> > .fasync = perf_fasync,
> > };
> >
> > +int perf_get_fd(struct perf_event *event, int f_flags)
> > +{
> > + return anon_inode_getfd("[perf_event]", &perf_fops, event, f_flags);
> > +}
> > +
> > /*
> > * Perf event wakeup
> > *
> > @@ -7028,7 +7033,6 @@ SYSCALL_DEFINE5(perf_event_open,
> > struct perf_event *event, *sibling;
> > struct perf_event_attr attr;
> > struct perf_event_context *ctx;
> > - struct file *event_file = NULL;
> > struct fd group = {NULL, 0};
> > struct task_struct *task = NULL;
> > struct pmu *pmu;
> > @@ -7189,10 +7193,9 @@ SYSCALL_DEFINE5(perf_event_open,
> > goto err_context;
> > }
> >
> > - event_file = anon_inode_getfile("[perf_event]", &perf_fops, event,
> > - f_flags);
> > - if (IS_ERR(event_file)) {
> > - err = PTR_ERR(event_file);
> > + event_fd = perf_get_fd(event, f_flags);
> > + if (event_fd < 0) {
> > + err = event_fd;
> > goto err_context;
> > }
> >
> > @@ -7257,7 +7260,6 @@ SYSCALL_DEFINE5(perf_event_open,
> > * perf_group_detach().
> > */
> > fdput(group);
> > - fd_install(event_fd, event_file);
> > return event_fd;
> >
> > err_context:
>
> ISTR there being a good reason we only install the FD at the very last
> moment. Of course I cannot quite recall it :/
>
> I suspect it had something to do with not being able to fudge the state
> while its being set-up or somesuch.
I guess its this patch from Al you referring to:
ea635c6 Fix racy use of anon_inode_getfd() in perf_event.c
I think we do not introduce it back due to:
a6fa941 perf_event: Switch to internal refcount, fix race with close()
which removes the user of event_file.
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-25 13:54 ` Robert Richter
@ 2014-04-25 14:43 ` Peter Zijlstra
2014-04-25 14:52 ` Peter Zijlstra
0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2014-04-25 14:43 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Borislav Petkov, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Al Viro
On Fri, Apr 25, 2014 at 03:54:13PM +0200, Robert Richter wrote:
> I guess its this patch from Al you referring to:
>
> ea635c6 Fix racy use of anon_inode_getfd() in perf_event.c
>
> I think we do not introduce it back due to:
>
> a6fa941 perf_event: Switch to internal refcount, fix race with close()
>
> which removes the user of event_file.
linux-2.6# git show a6fa941
error: short SHA1 a6fa941 is ambiguous.
error: short SHA1 a6fa941 is ambiguous.
fatal: ambiguous argument 'a6fa941': unknown revision or path not in the working tree.
Which is I think why Linus makes us use 12 chars instead of the git
default of 8.
Of course its also entirely useless of git to not list the full IDs for
those it did find to be ambiguous.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-25 14:43 ` Peter Zijlstra
@ 2014-04-25 14:52 ` Peter Zijlstra
2014-04-29 15:34 ` Robert Richter
0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2014-04-25 14:52 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Borislav Petkov, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Al Viro
On Fri, Apr 25, 2014 at 04:43:01PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 25, 2014 at 03:54:13PM +0200, Robert Richter wrote:
> > I guess its this patch from Al you referring to:
> >
> > ea635c6 Fix racy use of anon_inode_getfd() in perf_event.c
> >
> > I think we do not introduce it back due to:
> >
> > a6fa941 perf_event: Switch to internal refcount, fix race with close()
> >
> > which removes the user of event_file.
>
> linux-2.6# git show a6fa941
> error: short SHA1 a6fa941 is ambiguous.
> error: short SHA1 a6fa941 is ambiguous.
> fatal: ambiguous argument 'a6fa941': unknown revision or path not in the working tree.
>
> Which is I think why Linus makes us use 12 chars instead of the git
> default of 8.
>
> Of course its also entirely useless of git to not list the full IDs for
> those it did find to be ambiguous.
ok, so its a6fa941d94b4.
But no, I don't think that helps, its still true that the moment you get
a fd another thread can immediately close(). That would drop the last
ref and free it, meanwhile perf_event_open() is happily poking at it.
Now I think you could cure this by adding an extra ref before calling
your perf_get_fd() and dropping that extra ref at the end, where we used
to have fd_install().
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 04/16] perf, mmap: Factor out perf_get_fd()
2014-04-25 14:52 ` Peter Zijlstra
@ 2014-04-29 15:34 ` Robert Richter
0 siblings, 0 replies; 41+ messages in thread
From: Robert Richter @ 2014-04-29 15:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jean Pihet, Borislav Petkov, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Al Viro
On 25.04.14 16:52:05, Peter Zijlstra wrote:
> But no, I don't think that helps, its still true that the moment you get
> a fd another thread can immediately close(). That would drop the last
> ref and free it, meanwhile perf_event_open() is happily poking at it.
>
> Now I think you could cure this by adding an extra ref before calling
> your perf_get_fd() and dropping that extra ref at the end, where we used
> to have fd_install().
Yes, right. I have a solution now which increments the event's ref
count before creating the file descriptor using try_get_event()/
put_event().
The patch also does not remove get_unused_fd_flags() and the err_fd
error handler.
Have an update already of a rebase version but still need to test it.
Would it be ok to split the patch set and send in a first step only
the first 4 patches that refactor the perf mmap code?
Thanks,
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 05/16] perf: Add persistent events
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (3 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 04/16] perf, mmap: Factor out perf_get_fd() Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 06/16] mce, x86: Enable " Jean Pihet
` (12 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Fengguang Wu, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Add the needed pieces for persistent events which makes them
process-agnostic. Also, make their buffers read-only when mmaping them
from userspace.
Add a barebones implementation for registering persistent events with
perf. For that, we don't destroy the buffers when they're unmapped;
also, we map them read-only so that multiple agents can access them.
Also, we allocate the event buffers at event init time and not at mmap
time so that we can log samples into them regardless of whether there
are readers in userspace or not.
Multiple events from different cpus may map to a single persistent
event entry which has a unique identifier. The identifier allows to
access the persistent event with the perf_event_open() syscall. For
this the new event type PERF_TYPE_PERSISTENT must be set with its id
specified in attr.config. Currently there is only support for per-cpu
events. Also, root access is required.
Since the buffers are shared, the set_output ioctl may not be used in
conjunction with persistent events.
This patch only supports trace_points, support for all event types is
implemented in a later patch.
Based on patch set from Borislav Petkov <bp@alien8.de>.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
include/linux/perf_event.h | 12 ++-
include/uapi/linux/perf_event.h | 4 +-
kernel/events/Makefile | 2 +-
kernel/events/core.c | 38 +++++--
kernel/events/internal.h | 2 +
kernel/events/persistent.c | 221 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 266 insertions(+), 13 deletions(-)
create mode 100644 kernel/events/persistent.c
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..c368e9c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -436,6 +436,8 @@ struct perf_event {
struct perf_cgroup *cgrp; /* cgroup event is attach to */
int cgrp_defer_enabled;
#endif
+ struct list_head pevent_entry; /* persistent event */
+ int pevent_id;
#endif /* CONFIG_PERF_EVENTS */
};
@@ -770,7 +772,7 @@ extern void perf_event_enable(struct perf_event *event);
extern void perf_event_disable(struct perf_event *event);
extern int __perf_event_disable(void *info);
extern void perf_event_task_tick(void);
-#else
+#else /* !CONFIG_PERF_EVENTS */
static inline void
perf_event_task_sched_in(struct task_struct *prev,
struct task_struct *task) { }
@@ -810,7 +812,7 @@ static inline void perf_event_enable(struct perf_event *event) { }
static inline void perf_event_disable(struct perf_event *event) { }
static inline int __perf_event_disable(void *info) { return -1; }
static inline void perf_event_task_tick(void) { }
-#endif
+#endif /* !CONFIG_PERF_EVENTS */
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
extern bool perf_event_can_stop_tick(void);
@@ -824,6 +826,12 @@ extern void perf_restore_debug_store(void);
static inline void perf_restore_debug_store(void) { }
#endif
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_EVENT_TRACING)
+extern int perf_add_persistent_tp(struct ftrace_event_call *tp);
+#else
+static inline int perf_add_persistent_tp(void *tp) { return -ENOENT; }
+#endif
+
#define perf_output_put(handle, x) perf_output_copy((handle), &(x), sizeof(x))
/*
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..a3f2761 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -32,6 +32,7 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE = 3,
PERF_TYPE_RAW = 4,
PERF_TYPE_BREAKPOINT = 5,
+ PERF_TYPE_PERSISTENT = 6,
PERF_TYPE_MAX, /* non-ABI */
};
@@ -301,8 +302,9 @@ struct perf_event_attr {
exclude_callchain_kernel : 1, /* exclude kernel callchains */
exclude_callchain_user : 1, /* exclude user callchains */
mmap2 : 1, /* include mmap with inode data */
+ persistent : 1, /* always-on event */
- __reserved_1 : 40;
+ __reserved_1 : 39;
union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..70990d5 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_core.o = -pg
endif
-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o persistent.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9857475..80ada8e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4087,6 +4087,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;
+ if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
+ return -EACCES;
+
vma_size = vma->vm_end - vma->vm_start;
nr_pages = (vma_size / PAGE_SIZE) - 1;
@@ -4112,6 +4115,11 @@ again:
goto unlock;
}
+ if (!event->rb->overwrite && vma->vm_flags & VM_WRITE) {
+ ret = -EACCES;
+ goto unlock;
+ }
+
if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
* Raced against perf_mmap_close() through
@@ -5995,7 +6003,7 @@ static struct pmu perf_tracepoint = {
.event_idx = perf_swevent_event_idx,
};
-static inline void perf_tp_register(void)
+static inline void perf_register_tp(void)
{
perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
}
@@ -6025,18 +6033,14 @@ static void perf_event_free_filter(struct perf_event *event)
#else
-static inline void perf_tp_register(void)
-{
-}
+static inline void perf_register_tp(void) { }
static int perf_event_set_filter(struct perf_event *event, void __user *arg)
{
return -ENOENT;
}
-static void perf_event_free_filter(struct perf_event *event)
-{
-}
+static void perf_event_free_filter(struct perf_event *event) { }
#endif /* CONFIG_EVENT_TRACING */
@@ -6729,7 +6733,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
INIT_LIST_HEAD(&event->rb_entry);
INIT_LIST_HEAD(&event->active_entry);
INIT_HLIST_NODE(&event->hlist_entry);
-
+ INIT_LIST_HEAD(&event->pevent_entry);
init_waitqueue_head(&event->waitq);
init_irq_work(&event->pending, perf_pending_event);
@@ -6991,6 +6995,13 @@ set:
goto unlock;
}
+ /* Don't redirect read-only (persistent) events. */
+ ret = -EACCES;
+ if (old_rb && !old_rb->overwrite)
+ goto unlock;
+ if (rb && !rb->overwrite)
+ goto unlock;
+
if (old_rb)
ring_buffer_detach(event, old_rb);
@@ -7049,6 +7060,14 @@ SYSCALL_DEFINE5(perf_event_open,
if (err)
return err;
+ /* return fd for an existing persistent event */
+ if (attr.type == PERF_TYPE_PERSISTENT)
+ return perf_get_persistent_event_fd(cpu, attr.config);
+
+ /* put event into persistent state (not yet supported) */
+ if (attr.persistent)
+ return -EOPNOTSUPP;
+
if (!attr.exclude_kernel) {
if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -7990,7 +8009,8 @@ void __init perf_event_init(void)
perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
perf_pmu_register(&perf_cpu_clock, NULL, -1);
perf_pmu_register(&perf_task_clock, NULL, -1);
- perf_tp_register();
+ perf_register_tp();
+ perf_register_persistent();
perf_cpu_notifier(perf_cpu_notify);
register_reboot_notifier(&perf_reboot_notifier);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 5f1f92d..6b9a11d 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -210,5 +210,7 @@ static inline void put_event(struct perf_event *event)
extern int perf_alloc_rb(struct perf_event *event, int nr_pages, int flags);
extern void perf_free_rb(struct perf_event *event);
extern int perf_get_fd(struct perf_event *event, int f_flags);
+extern int perf_get_persistent_event_fd(int cpu, int id);
+extern void __init perf_register_persistent(void);
#endif /* _KERNEL_EVENTS_INTERNAL_H */
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
new file mode 100644
index 0000000..78fdd74
--- /dev/null
+++ b/kernel/events/persistent.c
@@ -0,0 +1,221 @@
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+#include <linux/ftrace_event.h>
+
+#include "internal.h"
+
+/* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
+#define CPU_BUFFER_NR_PAGES ((512 * 1024) / PAGE_SIZE)
+
+struct pevent {
+ char *name;
+ int id;
+};
+
+static DEFINE_PER_CPU(struct list_head, pevents);
+static DEFINE_PER_CPU(struct mutex, pevents_lock);
+
+/* Must be protected with pevents_lock. */
+static struct perf_event *__pevent_find(int cpu, int id)
+{
+ struct perf_event *event;
+
+ list_for_each_entry(event, &per_cpu(pevents, cpu), pevent_entry) {
+ if (event->pevent_id == id)
+ return event;
+ }
+
+ return NULL;
+}
+
+static int pevent_add(struct pevent *pevent, struct perf_event *event)
+{
+ int ret = -EEXIST;
+ int cpu = event->cpu;
+
+ mutex_lock(&per_cpu(pevents_lock, cpu));
+
+ if (__pevent_find(cpu, pevent->id))
+ goto unlock;
+
+ if (event->pevent_id)
+ goto unlock;
+
+ ret = 0;
+ event->pevent_id = pevent->id;
+ list_add_tail(&event->pevent_entry, &per_cpu(pevents, cpu));
+unlock:
+ mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+ return ret;
+}
+
+static struct perf_event *pevent_del(struct pevent *pevent, int cpu)
+{
+ struct perf_event *event;
+
+ mutex_lock(&per_cpu(pevents_lock, cpu));
+
+ event = __pevent_find(cpu, pevent->id);
+ if (event) {
+ list_del(&event->pevent_entry);
+ event->pevent_id = 0;
+ }
+
+ mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+ return event;
+}
+
+static void persistent_event_release(struct perf_event *event)
+{
+ /*
+ * Safe since we hold &event->mmap_count. The ringbuffer is
+ * released with put_event() if there are no other references.
+ * In this case there are also no other mmaps.
+ */
+ atomic_dec(&event->rb->mmap_count);
+ atomic_dec(&event->mmap_count);
+ put_event(event);
+}
+
+static int persistent_event_open(int cpu, struct pevent *pevent,
+ struct perf_event_attr *attr, int nr_pages)
+{
+ struct perf_event *event;
+ int ret;
+
+ event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
+ if (IS_ERR(event))
+ return PTR_ERR(event);
+
+ if (nr_pages < 0)
+ nr_pages = CPU_BUFFER_NR_PAGES;
+
+ ret = perf_alloc_rb(event, nr_pages, 0);
+ if (ret)
+ goto fail;
+
+ ret = pevent_add(pevent, event);
+ if (ret)
+ goto fail;
+
+ atomic_inc(&event->mmap_count);
+
+ /* All workie, enable event now */
+ perf_event_enable(event);
+
+ return ret;
+fail:
+ perf_event_release_kernel(event);
+ return ret;
+}
+
+static void persistent_event_close(int cpu, struct pevent *pevent)
+{
+ struct perf_event *event = pevent_del(pevent, cpu);
+ if (event)
+ persistent_event_release(event);
+}
+
+static int __maybe_unused
+persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
+{
+ struct pevent *pevent;
+ char id_buf[32];
+ int cpu;
+ int ret = 0;
+
+ pevent = kzalloc(sizeof(*pevent), GFP_KERNEL);
+ if (!pevent)
+ return -ENOMEM;
+
+ pevent->id = attr->config;
+
+ if (!name) {
+ snprintf(id_buf, sizeof(id_buf), "%d", pevent->id);
+ name = id_buf;
+ }
+
+ pevent->name = kstrdup(name, GFP_KERNEL);
+ if (!pevent->name) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ for_each_possible_cpu(cpu) {
+ ret = persistent_event_open(cpu, pevent, attr, nr_pages);
+ if (ret)
+ goto fail;
+ }
+
+ return 0;
+fail:
+ for_each_possible_cpu(cpu)
+ persistent_event_close(cpu, pevent);
+ kfree(pevent->name);
+ kfree(pevent);
+
+ pr_err("%s: Error adding persistent event: %d\n",
+ __func__, ret);
+
+ return ret;
+}
+
+#ifdef CONFIG_EVENT_TRACING
+
+int perf_add_persistent_tp(struct ftrace_event_call *tp)
+{
+ struct perf_event_attr attr;
+
+ memset(&attr, 0, sizeof(attr));
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+ attr.sample_type = PERF_SAMPLE_RAW;
+ attr.persistent = 1;
+ attr.config = tp->event.type;
+ attr.type = PERF_TYPE_TRACEPOINT;
+ attr.size = sizeof(attr);
+
+ return persistent_open(tp->name, &attr, -1);
+}
+
+#endif /* CONFIG_EVENT_TRACING */
+
+int perf_get_persistent_event_fd(int cpu, int id)
+{
+ struct perf_event *event;
+ int event_fd = 0;
+
+ if ((unsigned)cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ /* Must be root for persistent events */
+ if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
+ mutex_lock(&per_cpu(pevents_lock, cpu));
+ event = __pevent_find(cpu, id);
+ if (!event || !try_get_event(event))
+ event_fd = -ENOENT;
+ mutex_unlock(&per_cpu(pevents_lock, cpu));
+
+ if (event_fd)
+ return event_fd;
+
+ event_fd = perf_get_fd(event, O_RDWR);
+ if (event_fd < 0)
+ put_event(event);
+
+ return event_fd;
+}
+
+void __init perf_register_persistent(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ INIT_LIST_HEAD(&per_cpu(pevents, cpu));
+ mutex_init(&per_cpu(pevents_lock, cpu));
+ }
+}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 06/16] mce, x86: Enable persistent events
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (4 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 05/16] perf: Add persistent events Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 07/16] perf, persistent: Implementing a persistent pmu Jean Pihet
` (11 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Borislav Petkov
From: Borislav Petkov <bp@suse.de>
... for MCEs collection.
Signed-off-by: Borislav Petkov <bp@suse.de>
[ rric: Fix build error for no-tracepoints configs ]
[ rric: Return proper error code. ]
[ rric: No error message if perf is disabled. ]
Signed-off-by: Robert Richter <rric@kernel.org>
---
arch/x86/kernel/cpu/mcheck/mce.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4d5419b..92b0d45 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2018,6 +2018,25 @@ int __init mcheck_init(void)
return 0;
}
+#ifdef CONFIG_EVENT_TRACING
+
+int __init mcheck_init_tp(void)
+{
+ int ret = perf_add_persistent_tp(&event_mce_record);
+
+ if (ret && ret != -ENOENT)
+ pr_err("Error adding MCE persistent event: %d\n", ret);
+
+ return ret;
+}
+/*
+ * We can't run earlier because persistent events uses anon_inode_getfile and
+ * its anon_inode_mnt gets initialized as a fs_initcall.
+ */
+fs_initcall_sync(mcheck_init_tp);
+
+#endif /* CONFIG_EVENT_TRACING */
+
/*
* mce_syscore: PM support
*/
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 07/16] perf, persistent: Implementing a persistent pmu
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (5 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 06/16] mce, x86: Enable " Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 08/16] perf, persistent: Exposing persistent events using sysfs Jean Pihet
` (10 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
We want to use the kernel's pmu design to later expose persistent
events via sysfs to userland. Initially implement a persistent pmu.
The format syntax is introduced allowing to set bits anywhere in
struct perf_event_attr. This is used in this case to set the
persistent flag (attr5:24). The syntax is attr<num> where num is the
index of the u64 array in struct perf_event_attr. Otherwise syntax is
same as for config<num>.
Patches that implement this functionality for perf tools are sent in a
separate patchset.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/persistent.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 78fdd74..6b039f5 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -12,6 +12,7 @@ struct pevent {
int id;
};
+static struct pmu persistent_pmu;
static DEFINE_PER_CPU(struct list_head, pevents);
static DEFINE_PER_CPU(struct mutex, pevents_lock);
@@ -210,10 +211,43 @@ int perf_get_persistent_event_fd(int cpu, int id)
return event_fd;
}
+PMU_FORMAT_ATTR(persistent, "attr5:24");
+
+static struct attribute *persistent_format_attrs[] = {
+ &format_attr_persistent.attr,
+ NULL,
+};
+
+static struct attribute_group persistent_format_group = {
+ .name = "format",
+ .attrs = persistent_format_attrs,
+};
+
+static const struct attribute_group *persistent_attr_groups[] = {
+ &persistent_format_group,
+ NULL,
+};
+
+static int persistent_pmu_init(struct perf_event *event)
+{
+ if (persistent_pmu.type != event->attr.type)
+ return -ENOENT;
+
+ /* Not a persistent event. */
+ return -EFAULT;
+}
+
+static struct pmu persistent_pmu = {
+ .event_init = persistent_pmu_init,
+ .attr_groups = persistent_attr_groups,
+};
+
void __init perf_register_persistent(void)
{
int cpu;
+ perf_pmu_register(&persistent_pmu, "persistent", PERF_TYPE_PERSISTENT);
+
for_each_possible_cpu(cpu) {
INIT_LIST_HEAD(&per_cpu(pevents, cpu));
mutex_init(&per_cpu(pevents_lock, cpu));
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 08/16] perf, persistent: Exposing persistent events using sysfs
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (6 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 07/16] perf, persistent: Implementing a persistent pmu Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 09/16] perf, persistent: Use unique event ids Jean Pihet
` (9 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Namhyung Kim, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Expose persistent events in the system to userland using sysfs. Perf
tools are able to read existing pmu events from sysfs. Now we use a
persistent pmu as an event container containing all registered
persistent events of the system. This patch adds dynamically
registration of persistent events to sysfs. E.g. something like this:
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
/sys/bus/event_source/devices/persistent/format/persistent:attr5:23
Perf tools need to support the attr<num> syntax that is added in a
separate patch set. With it we are able to run perf tool commands to
read persistent events, e.g.:
# perf record -e persistent/mce_record/ sleep 10
# perf top -e persistent/mce_record/
[ Jiri: Document attr<index> syntax in sysfs ABI ]
[ Namhyung: Fix sysfs registration with lockdep enabled ]
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
.../testing/sysfs-bus-event_source-devices-format | 43 ++++++++++++----
kernel/events/persistent.c | 60 ++++++++++++++++++++++
2 files changed, 92 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
index 77f47ff..744d53a 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
@@ -1,13 +1,14 @@
-Where: /sys/bus/event_source/devices/<dev>/format
+Where: /sys/bus/event_source/devices/<pmu>/format/<name>
Date: January 2012
-Kernel Version: 3.3
+Kernel Version: 3.3
+ 3.xx (added attr<index>:<bits>)
Contact: Jiri Olsa <jolsa@redhat.com>
-Description:
- Attribute group to describe the magic bits that go into
- perf_event_attr::config[012] for a particular pmu.
- Each attribute of this group defines the 'hardware' bitmask
- we want to export, so that userspace can deal with sane
- name/value pairs.
+
+Description: Define formats for bit ranges in perf_event_attr
+
+ Attribute group to describe the magic bits that go
+ into struct perf_event_attr for a particular pmu. Bit
+ range may be any bit mask of an u64 (bits 0 to 63).
Userspace must be prepared for the possibility that attributes
define overlapping bit ranges. For example:
@@ -15,6 +16,26 @@ Description:
attr2 = 'config:0-7'
attr3 = 'config:12-35'
- Example: 'config1:1,6-10,44'
- Defines contents of attribute that occupies bits 1,6-10,44 of
- perf_event_attr::config1.
+ Syntax Description
+
+ config[012]*:<bits> Each attribute of this group
+ defines the 'hardware' bitmask
+ we want to export, so that
+ userspace can deal with sane
+ name/value pairs.
+
+ attr<index>:<bits> Set any field of the event
+ attribute. The index is a
+ decimal number that specifies
+ the u64 value to be set within
+ struct perf_event_attr.
+
+ Examples:
+
+ 'config1:1,6-10,44' Defines contents of attribute
+ that occupies bits 1,6-10,44
+ of perf_event_attr::config1.
+
+ 'attr5:24' Define the persistent event
+ flag (bit 24 of the attribute
+ flags)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 6b039f5..6578acf 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -8,6 +8,7 @@
#define CPU_BUFFER_NR_PAGES ((512 * 1024) / PAGE_SIZE)
struct pevent {
+ struct perf_pmu_events_attr sysfs;
char *name;
int id;
};
@@ -119,6 +120,8 @@ static void persistent_event_close(int cpu, struct pevent *pevent)
persistent_event_release(event);
}
+static int pevent_sysfs_register(struct pevent *event);
+
static int __maybe_unused
persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
{
@@ -144,12 +147,18 @@ persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
goto fail;
}
+ pevent->sysfs.id = pevent->id;
+
for_each_possible_cpu(cpu) {
ret = persistent_event_open(cpu, pevent, attr, nr_pages);
if (ret)
goto fail;
}
+ ret = pevent_sysfs_register(pevent);
+ if (ret)
+ goto fail;
+
return 0;
fail:
for_each_possible_cpu(cpu)
@@ -223,10 +232,61 @@ static struct attribute_group persistent_format_group = {
.attrs = persistent_format_attrs,
};
+#define MAX_EVENTS 16
+
+static struct attribute *pevents_attr[MAX_EVENTS + 1] = { };
+
+static struct attribute_group pevents_group = {
+ .name = "events",
+ .attrs = pevents_attr,
+};
+
static const struct attribute_group *persistent_attr_groups[] = {
&persistent_format_group,
+ NULL, /* placeholder: &pevents_group */
NULL,
};
+#define EVENTS_GROUP_PTR (&persistent_attr_groups[1])
+
+static ssize_t pevent_sysfs_show(struct device *dev,
+ struct device_attribute *__attr, char *page)
+{
+ struct perf_pmu_events_attr *attr =
+ container_of(__attr, struct perf_pmu_events_attr, attr);
+ return sprintf(page, "persistent,config=%lld",
+ (unsigned long long)attr->id);
+}
+
+static int pevent_sysfs_register(struct pevent *pevent)
+{
+ struct perf_pmu_events_attr *sysfs = &pevent->sysfs;
+ struct attribute *attr = &sysfs->attr.attr;
+ struct device *dev = persistent_pmu.dev;
+ const struct attribute_group **group = EVENTS_GROUP_PTR;
+ int idx;
+
+ sysfs->id = pevent->id;
+ sysfs->attr = (struct device_attribute)
+ __ATTR(, 0444, pevent_sysfs_show, NULL);
+ attr->name = pevent->name;
+ sysfs_attr_init(attr);
+
+ /* add sysfs attr to events: */
+ for (idx = 0; idx < MAX_EVENTS; idx++) {
+ if (!cmpxchg(pevents_attr + idx, NULL, attr))
+ break;
+ }
+
+ if (idx >= MAX_EVENTS)
+ return -ENOSPC;
+ if (!idx)
+ *group = &pevents_group;
+ if (!dev)
+ return 0; /* sysfs not yet initialized */
+ if (idx)
+ return sysfs_add_file_to_group(&dev->kobj, attr, (*group)->name);
+ return sysfs_create_group(&persistent_pmu.dev->kobj, *group);
+}
static int persistent_pmu_init(struct perf_event *event)
{
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 09/16] perf, persistent: Use unique event ids
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (7 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 08/16] perf, persistent: Exposing persistent events using sysfs Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 10/16] perf, persistent: Implement reference counter for events Jean Pihet
` (8 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Tracepoints have a unique attr.config value. But, this is not
sufficient to support all event types. For this we need to generate
unique event ids.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/persistent.c | 40 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 6578acf..d04827c 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -1,6 +1,7 @@
#include <linux/slab.h>
#include <linux/perf_event.h>
#include <linux/ftrace_event.h>
+#include <linux/idr.h>
#include "internal.h"
@@ -13,10 +14,37 @@ struct pevent {
int id;
};
+static struct idr event_idr;
+static struct mutex event_lock;
static struct pmu persistent_pmu;
static DEFINE_PER_CPU(struct list_head, pevents);
static DEFINE_PER_CPU(struct mutex, pevents_lock);
+static inline struct pevent *find_event(int id)
+{
+ struct pevent *pevent;
+ rcu_read_lock();
+ pevent = idr_find(&event_idr, id);
+ rcu_read_lock();
+ return pevent;
+}
+
+static inline int get_event_id(struct pevent *pevent)
+{
+ int event_id;
+ mutex_lock(&event_lock);
+ event_id = idr_alloc(&event_idr, pevent, 1, INT_MAX, GFP_KERNEL);
+ mutex_unlock(&event_lock);
+ return event_id;
+}
+
+static inline void put_event_id(int id)
+{
+ mutex_lock(&event_lock);
+ idr_remove(&event_idr, id);
+ mutex_unlock(&event_lock);
+}
+
/* Must be protected with pevents_lock. */
static struct perf_event *__pevent_find(int cpu, int id)
{
@@ -128,13 +156,16 @@ persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
struct pevent *pevent;
char id_buf[32];
int cpu;
- int ret = 0;
+ int ret;
pevent = kzalloc(sizeof(*pevent), GFP_KERNEL);
if (!pevent)
return -ENOMEM;
- pevent->id = attr->config;
+ ret = get_event_id(pevent);
+ if (ret < 0)
+ goto fail;
+ pevent->id = ret;
if (!name) {
snprintf(id_buf, sizeof(id_buf), "%d", pevent->id);
@@ -163,6 +194,9 @@ persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
fail:
for_each_possible_cpu(cpu)
persistent_event_close(cpu, pevent);
+
+ if (pevent->id)
+ put_event_id(pevent->id);
kfree(pevent->name);
kfree(pevent);
@@ -306,6 +340,8 @@ void __init perf_register_persistent(void)
{
int cpu;
+ idr_init(&event_idr);
+ mutex_init(&event_lock);
perf_pmu_register(&persistent_pmu, "persistent", PERF_TYPE_PERSISTENT);
for_each_possible_cpu(cpu) {
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 10/16] perf, persistent: Implement reference counter for events
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (8 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 09/16] perf, persistent: Use unique event ids Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 11/16] perf, persistent: Dynamically resize list of sysfs entries Jean Pihet
` (7 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
We need this later for proper event removal.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/persistent.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index d04827c..fcbb4f8 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -9,6 +9,7 @@
#define CPU_BUFFER_NR_PAGES ((512 * 1024) / PAGE_SIZE)
struct pevent {
+ atomic_t refcount;
struct perf_pmu_events_attr sysfs;
char *name;
int id;
@@ -130,6 +131,7 @@ static int persistent_event_open(int cpu, struct pevent *pevent,
if (ret)
goto fail;
+ atomic_inc(&pevent->refcount);
atomic_inc(&event->mmap_count);
/* All workie, enable event now */
@@ -144,8 +146,11 @@ fail:
static void persistent_event_close(int cpu, struct pevent *pevent)
{
struct perf_event *event = pevent_del(pevent, cpu);
- if (event)
+ if (event) {
+ /* Safe, the caller holds &pevent->refcount too. */
+ atomic_dec(&pevent->refcount);
persistent_event_release(event);
+ }
}
static int pevent_sysfs_register(struct pevent *event);
@@ -162,6 +167,8 @@ persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
if (!pevent)
return -ENOMEM;
+ atomic_set(&pevent->refcount, 1);
+
ret = get_event_id(pevent);
if (ret < 0)
goto fail;
@@ -187,21 +194,21 @@ persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
}
ret = pevent_sysfs_register(pevent);
- if (ret)
- goto fail;
-
- return 0;
+ if (!ret)
+ goto out;
fail:
for_each_possible_cpu(cpu)
persistent_event_close(cpu, pevent);
- if (pevent->id)
- put_event_id(pevent->id);
- kfree(pevent->name);
- kfree(pevent);
-
pr_err("%s: Error adding persistent event: %d\n",
__func__, ret);
+out:
+ if (atomic_dec_and_test(&pevent->refcount)) {
+ if (pevent->id)
+ put_event_id(pevent->id);
+ kfree(pevent->name);
+ kfree(pevent);
+ }
return ret;
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 11/16] perf, persistent: Dynamically resize list of sysfs entries
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (9 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 10/16] perf, persistent: Implement reference counter for events Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 12/16] perf, persistent: ioctl functions to control persistency Jean Pihet
` (6 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
There was a limitation of the total number of persistent events to be
registered in sysfs due to the lack of dynamically list allocation.
This patch implements memory reallocation in case an event is added or
removed from the list.
While at this also implement pevent_sysfs_unregister() which we need
later for proper event removal.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
kernel/events/persistent.c | 115 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 99 insertions(+), 16 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index fcbb4f8..49bf889 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -154,6 +154,7 @@ static void persistent_event_close(int cpu, struct pevent *pevent)
}
static int pevent_sysfs_register(struct pevent *event);
+static void pevent_sysfs_unregister(struct pevent *event);
static int __maybe_unused
persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
@@ -204,6 +205,7 @@ fail:
__func__, ret);
out:
if (atomic_dec_and_test(&pevent->refcount)) {
+ pevent_sysfs_unregister(pevent);
if (pevent->id)
put_event_id(pevent->id);
kfree(pevent->name);
@@ -273,13 +275,12 @@ static struct attribute_group persistent_format_group = {
.attrs = persistent_format_attrs,
};
-#define MAX_EVENTS 16
-
-static struct attribute *pevents_attr[MAX_EVENTS + 1] = { };
+static struct mutex sysfs_lock;
+static int sysfs_nr_entries;
static struct attribute_group pevents_group = {
.name = "events",
- .attrs = pevents_attr,
+ .attrs = NULL, /* dynamically allocated */
};
static const struct attribute_group *persistent_attr_groups[] = {
@@ -288,6 +289,7 @@ static const struct attribute_group *persistent_attr_groups[] = {
NULL,
};
#define EVENTS_GROUP_PTR (&persistent_attr_groups[1])
+#define EVENTS_ATTRS_PTR (&pevents_group.attrs)
static ssize_t pevent_sysfs_show(struct device *dev,
struct device_attribute *__attr, char *page)
@@ -304,7 +306,9 @@ static int pevent_sysfs_register(struct pevent *pevent)
struct attribute *attr = &sysfs->attr.attr;
struct device *dev = persistent_pmu.dev;
const struct attribute_group **group = EVENTS_GROUP_PTR;
- int idx;
+ struct attribute ***attrs_ptr = EVENTS_ATTRS_PTR;
+ struct attribute **attrs;
+ int ret = 0;
sysfs->id = pevent->id;
sysfs->attr = (struct device_attribute)
@@ -312,21 +316,99 @@ static int pevent_sysfs_register(struct pevent *pevent)
attr->name = pevent->name;
sysfs_attr_init(attr);
- /* add sysfs attr to events: */
- for (idx = 0; idx < MAX_EVENTS; idx++) {
- if (!cmpxchg(pevents_attr + idx, NULL, attr))
- break;
+ mutex_lock(&sysfs_lock);
+
+ /*
+ * Keep old list if no new one is available. Need this for
+ * device_remove_attrs() if unregistering pmu.
+ */
+ attrs = __krealloc(*attrs_ptr, (sysfs_nr_entries + 2) * sizeof(*attrs),
+ GFP_KERNEL);
+
+ if (!attrs) {
+ ret = -ENOMEM;
+ goto unlock;
}
- if (idx >= MAX_EVENTS)
- return -ENOSPC;
- if (!idx)
+ attrs[sysfs_nr_entries++] = attr;
+ attrs[sysfs_nr_entries] = NULL;
+
+ if (!*group)
*group = &pevents_group;
+
+ if (!dev)
+ goto out; /* sysfs not yet initialized */
+
+ if (sysfs_nr_entries == 1)
+ ret = sysfs_create_group(&dev->kobj, *group);
+ else
+ ret = sysfs_add_file_to_group(&dev->kobj, attr, (*group)->name);
+
+ if (ret) {
+ /* roll back */
+ sysfs_nr_entries--;
+ if (!sysfs_nr_entries)
+ *group = NULL;
+ if (*attrs_ptr != attrs)
+ kfree(attrs);
+ else
+ attrs[sysfs_nr_entries] = NULL;
+ goto unlock;
+ }
+out:
+ if (*attrs_ptr != attrs) {
+ kfree(*attrs_ptr);
+ *attrs_ptr = attrs;
+ }
+unlock:
+ mutex_unlock(&sysfs_lock);
+
+ return ret;
+}
+
+static void pevent_sysfs_unregister(struct pevent *pevent)
+{
+ struct attribute *attr = &pevent->sysfs.attr.attr;
+ struct device *dev = persistent_pmu.dev;
+ const struct attribute_group **group = EVENTS_GROUP_PTR;
+ struct attribute ***attrs_ptr = EVENTS_ATTRS_PTR;
+ struct attribute **attrs, **dest;
+
+ mutex_lock(&sysfs_lock);
+
+ for (dest = *attrs_ptr; *dest; dest++) {
+ if (*dest == attr)
+ break;
+ }
+
+ if (!*dest)
+ goto unlock;
+
+ sysfs_nr_entries--;
+
+ *dest = (*attrs_ptr)[sysfs_nr_entries];
+ (*attrs_ptr)[sysfs_nr_entries] = NULL;
+
if (!dev)
- return 0; /* sysfs not yet initialized */
- if (idx)
- return sysfs_add_file_to_group(&dev->kobj, attr, (*group)->name);
- return sysfs_create_group(&persistent_pmu.dev->kobj, *group);
+ goto out; /* sysfs not yet initialized */
+
+ if (!sysfs_nr_entries)
+ sysfs_remove_group(&dev->kobj, *group);
+ else
+ sysfs_remove_file_from_group(&dev->kobj, attr, (*group)->name);
+out:
+ if (!sysfs_nr_entries)
+ *group = NULL;
+
+ attrs = __krealloc(*attrs_ptr, (sysfs_nr_entries + 1) * sizeof(*attrs),
+ GFP_KERNEL);
+
+ if (!attrs && *attrs_ptr != attrs) {
+ kfree(*attrs_ptr);
+ *attrs_ptr = attrs;
+ }
+unlock:
+ mutex_unlock(&sysfs_lock);
}
static int persistent_pmu_init(struct perf_event *event)
@@ -349,6 +431,7 @@ void __init perf_register_persistent(void)
idr_init(&event_idr);
mutex_init(&event_lock);
+ mutex_init(&sysfs_lock);
perf_pmu_register(&persistent_pmu, "persistent", PERF_TYPE_PERSISTENT);
for_each_possible_cpu(cpu) {
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 12/16] perf, persistent: ioctl functions to control persistency
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (10 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 11/16] perf, persistent: Dynamically resize list of sysfs entries Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 13/16] perf tools: Rename flex conditions to avoid name conflicts Jean Pihet
` (5 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Vince Weaver, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Implementing ioctl functions to control persistent events. There are
functions to unclaim or claim an event to or from a process. The
PERF_EVENT_IOC_UNCLAIM ioctl call makes an event persistent. After
closing the event's fd it runs then in the background of the system
without the need of a controlling process. The perf_event_open()
syscall can be used to reopen the event by any process. The
PERF_EVENT_IOC_CLAIM ioctl attaches the event again so that it is
removed after closing the event's fd.
This is for Linux man-pages:
type ...
PERF_TYPE_PERSISTENT (Since Linux 3.xx)
This indicates a persistent event. There is a unique
identifier for each persistent event that needs to be
specified in the event's attribute config field.
Persistent events are listed under:
/sys/bus/event_source/devices/persistent/
...
persistent : 41, /* always-on event */
...
persistent: (Since Linux 3.xx)
Put event into persistent state after opening. After closing
the event's fd the event is persistent in the system and
continues to run.
perf_event ioctl calls
PERF_EVENT_IOC_UNCLAIM (Since Linux 3.xx)
Unclaim the event specified by the file descriptor from the
process and make it persistent in the system. After
closing the fd the event will continue to run. An unique
identifier for the persistent event is returned or an
error otherwise. The following allows to connect to the
event again:
pe.type = PERF_TYPE_PERSISTENT;
pe.config = <pevent_id>;
...
fd = perf_event_open(...);
The event must be reopened on the same cpu.
PERF_EVENT_IOC_CLAIM (Since Linux 3.xx)
Claim the event specified by the file descriptor to the
current process. The event is no longer persistent in the
system and will be removed after all users disconnected
from the event. Thus, if there are no other users the
event will be closed too after closing its file
descriptor, the event then no longer exists.
[ Jean Pihet: renamed PERF_EVENT_IOC_ATTACH/DETACH to PERF_EVENT_IOC_CLAIM/UNCLAIM ]
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
include/uapi/linux/perf_event.h | 2 +
kernel/events/core.c | 6 ++
kernel/events/internal.h | 2 +
kernel/events/persistent.c | 178 +++++++++++++++++++++++++++++++++-------
4 files changed, 160 insertions(+), 28 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index a3f2761..b2f1943 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -350,6 +350,8 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
#define PERF_EVENT_IOC_ID _IOR('$', 7, __u64 *)
+#define PERF_EVENT_IOC_UNCLAIM _IO ('$', 8)
+#define PERF_EVENT_IOC_CLAIM _IO ('$', 9)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 80ada8e..bdf3895 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3707,6 +3707,12 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_SET_FILTER:
return perf_event_set_filter(event, (void __user *)arg);
+ case PERF_EVENT_IOC_UNCLAIM:
+ return perf_event_unclaim(event);
+
+ case PERF_EVENT_IOC_CLAIM:
+ return perf_event_claim(event);
+
default:
return -ENOTTY;
}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 6b9a11d..9a871b5 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -212,5 +212,7 @@ extern void perf_free_rb(struct perf_event *event);
extern int perf_get_fd(struct perf_event *event, int f_flags);
extern int perf_get_persistent_event_fd(int cpu, int id);
extern void __init perf_register_persistent(void);
+extern int perf_event_unclaim(struct perf_event *event);
+extern int perf_event_claim(struct perf_event *event);
#endif /* _KERNEL_EVENTS_INTERNAL_H */
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 49bf889..0d796c6 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -59,6 +59,49 @@ static struct perf_event *__pevent_find(int cpu, int id)
return NULL;
}
+static void pevent_free(struct pevent *pevent)
+{
+ if (pevent->id)
+ put_event_id(pevent->id);
+
+ kfree(pevent->name);
+ kfree(pevent);
+}
+
+static struct pevent *pevent_alloc(char *name)
+{
+ struct pevent *pevent;
+ char id_buf[32];
+ int ret;
+
+ pevent = kzalloc(sizeof(*pevent), GFP_KERNEL);
+ if (!pevent)
+ return ERR_PTR(-ENOMEM);
+
+ atomic_set(&pevent->refcount, 1);
+
+ ret = get_event_id(pevent);
+ if (ret < 0)
+ goto fail;
+ pevent->id = ret;
+
+ if (!name) {
+ snprintf(id_buf, sizeof(id_buf), "%d", pevent->id);
+ name = id_buf;
+ }
+
+ pevent->name = kstrdup(name, GFP_KERNEL);
+ if (!pevent->name) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ return pevent;
+fail:
+ pevent_free(pevent);
+ return ERR_PTR(ret);
+}
+
static int pevent_add(struct pevent *pevent, struct perf_event *event)
{
int ret = -EEXIST;
@@ -74,6 +117,7 @@ static int pevent_add(struct pevent *pevent, struct perf_event *event)
ret = 0;
event->pevent_id = pevent->id;
+ event->attr.persistent = 1;
list_add_tail(&event->pevent_entry, &per_cpu(pevents, cpu));
unlock:
mutex_unlock(&per_cpu(pevents_lock, cpu));
@@ -91,6 +135,7 @@ static struct perf_event *pevent_del(struct pevent *pevent, int cpu)
if (event) {
list_del(&event->pevent_entry);
event->pevent_id = 0;
+ event->attr.persistent = 0;
}
mutex_unlock(&per_cpu(pevents_lock, cpu));
@@ -160,33 +205,12 @@ static int __maybe_unused
persistent_open(char *name, struct perf_event_attr *attr, int nr_pages)
{
struct pevent *pevent;
- char id_buf[32];
int cpu;
int ret;
- pevent = kzalloc(sizeof(*pevent), GFP_KERNEL);
- if (!pevent)
- return -ENOMEM;
-
- atomic_set(&pevent->refcount, 1);
-
- ret = get_event_id(pevent);
- if (ret < 0)
- goto fail;
- pevent->id = ret;
-
- if (!name) {
- snprintf(id_buf, sizeof(id_buf), "%d", pevent->id);
- name = id_buf;
- }
-
- pevent->name = kstrdup(name, GFP_KERNEL);
- if (!pevent->name) {
- ret = -ENOMEM;
- goto fail;
- }
-
- pevent->sysfs.id = pevent->id;
+ pevent = pevent_alloc(name);
+ if (IS_ERR(pevent))
+ return PTR_ERR(pevent);
for_each_possible_cpu(cpu) {
ret = persistent_event_open(cpu, pevent, attr, nr_pages);
@@ -206,10 +230,7 @@ fail:
out:
if (atomic_dec_and_test(&pevent->refcount)) {
pevent_sysfs_unregister(pevent);
- if (pevent->id)
- put_event_id(pevent->id);
- kfree(pevent->name);
- kfree(pevent);
+ pevent_free(pevent);
}
return ret;
@@ -439,3 +460,104 @@ void __init perf_register_persistent(void)
mutex_init(&per_cpu(pevents_lock, cpu));
}
}
+
+/*
+ * Unclaim an event from a process. The event will remain in the system
+ * after closing the event's fd, it becomes persistent.
+ */
+int perf_event_unclaim(struct perf_event *event)
+{
+ struct pevent *pevent;
+ int cpu;
+ int ret;
+
+ if (!try_get_event(event))
+ return -ENOENT;
+
+ /* task events not yet supported: */
+ cpu = event->cpu;
+ if ((unsigned)cpu >= nr_cpu_ids) {
+ ret = -EINVAL;
+ goto fail_rb;
+ }
+
+ /*
+ * Avoid grabbing an id, later checked again in pevent_add()
+ * with mmap_mutex held.
+ */
+ if (event->pevent_id) {
+ ret = -EEXIST;
+ goto fail_rb;
+ }
+
+ mutex_lock(&event->mmap_mutex);
+ if (event->rb)
+ ret = -EBUSY;
+ else
+ ret = perf_alloc_rb(event, CPU_BUFFER_NR_PAGES, 0);
+ mutex_unlock(&event->mmap_mutex);
+
+ if (ret)
+ goto fail_rb;
+
+ pevent = pevent_alloc(NULL);
+ if (IS_ERR(pevent)) {
+ ret = PTR_ERR(pevent);
+ goto fail_pevent;
+ }
+
+ ret = pevent_add(pevent, event);
+ if (ret)
+ goto fail_add;
+
+ ret = pevent_sysfs_register(pevent);
+ if (ret)
+ goto fail_sysfs;
+
+ atomic_inc(&event->mmap_count);
+
+ return pevent->id;
+fail_sysfs:
+ pevent_del(pevent, cpu);
+fail_add:
+ pevent_free(pevent);
+fail_pevent:
+ mutex_lock(&event->mmap_mutex);
+ if (event->rb)
+ perf_free_rb(event);
+ mutex_unlock(&event->mmap_mutex);
+fail_rb:
+ put_event(event);
+ return ret;
+}
+
+/*
+ * Claim an event from a process. The event will be removed after all
+ * users disconnected from it, it's no longer persistent in the
+ * system.
+ */
+int perf_event_claim(struct perf_event *event)
+{
+ int cpu = event->cpu;
+ struct pevent *pevent;
+
+ if ((unsigned)cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ pevent = find_event(event->pevent_id);
+ if (!pevent)
+ return -EINVAL;
+
+ event = pevent_del(pevent, cpu);
+ if (!event)
+ return -EINVAL;
+
+ if (atomic_dec_and_test(&pevent->refcount)) {
+ pevent_sysfs_unregister(pevent);
+ pevent_free(pevent);
+ }
+
+ persistent_event_release(event);
+
+ return 0;
+}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 13/16] perf tools: Rename flex conditions to avoid name conflicts
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (11 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 12/16] perf, persistent: ioctl functions to control persistency Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 14/16] perf tools: Modify event parser to update event attribute by index Jean Pihet
` (4 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
These define's may cause conflicts with other definitions:
#define INITIAL 0
#define mem 1
#define config 2
#define event 3
Prefix them with cond_* to avoid this.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
tools/perf/util/parse-events.l | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 3432995..a5808d1 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -69,9 +69,9 @@ static int term(yyscan_t scanner, int type)
%}
-%x mem
-%s config
-%x event
+%x cond_mem
+%s cond_config
+%x cond_event
group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
@@ -95,9 +95,9 @@ modifier_bp [rwx]{1,3}
start_token = parse_events_get_extra(yyscanner);
if (start_token == PE_START_TERMS)
- BEGIN(config);
+ BEGIN(cond_config);
else if (start_token == PE_START_EVENTS)
- BEGIN(event);
+ BEGIN(cond_event);
if (start_token) {
parse_events_set_extra(NULL, yyscanner);
@@ -106,7 +106,7 @@ modifier_bp [rwx]{1,3}
}
%}
-<event>{
+<cond_event>{
{group} {
BEGIN(INITIAL); yyless(0);
@@ -126,7 +126,7 @@ modifier_bp [rwx]{1,3}
}
-<config>{
+<cond_config>{
config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); }
config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
@@ -138,7 +138,7 @@ branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE
{name_minus} { return str(yyscanner, PE_NAME); }
}
-<mem>{
+<cond_mem>{
{modifier_bp} { return str(yyscanner, PE_MODIFIER_BP); }
: { return ':'; }
{num_dec} { return value(yyscanner, 10); }
@@ -146,7 +146,7 @@ branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE
/*
* We need to separate 'mem:' scanner part, in order to get specific
* modifier bits parsed out. Otherwise we would need to handle PE_NAME
- * and we'd need to parse it manually. During the escape from <mem>
+ * and we'd need to parse it manually. During the escape from <cond_mem>
* state we need to put the escaping char back, so we dont miss it.
*/
. { unput(*yytext); BEGIN(INITIAL); }
@@ -193,18 +193,18 @@ speculative-read|speculative-load |
refs|Reference|ops|access |
misses|miss { return str(yyscanner, PE_NAME_CACHE_OP_RESULT); }
-mem: { BEGIN(mem); return PE_PREFIX_MEM; }
+mem: { BEGIN(cond_mem); return PE_PREFIX_MEM; }
r{num_raw_hex} { return raw(yyscanner); }
{num_dec} { return value(yyscanner, 10); }
{num_hex} { return value(yyscanner, 16); }
{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
{name} { return str(yyscanner, PE_NAME); }
-"/" { BEGIN(config); return '/'; }
+"/" { BEGIN(cond_config); return '/'; }
- { return '-'; }
-, { BEGIN(event); return ','; }
+, { BEGIN(cond_event); return ','; }
: { return ':'; }
-"{" { BEGIN(event); return '{'; }
+"{" { BEGIN(cond_event); return '{'; }
"}" { return '}'; }
= { return '='; }
\n { }
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 14/16] perf tools: Modify event parser to update event attribute by index
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (12 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 13/16] perf tools: Rename flex conditions to avoid name conflicts Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 15/16] perf tools: Add attr<num> syntax to event parser Jean Pihet
` (3 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
In a later patch we want to introduce a syntax that allows updating
attribute fields by an index pointing to a certain u64 entry of struct
perf_event_attr. We need this to expose any event via sysfs that is
available in the system where especially flag fields need to be set.
Reworking the event parser to use an attribute index. This is done by
introducing type PARSE_EVENTS__TERM_TYPE_ATTR for all numeric values
to be added to the event attribute fields. We use an index to specify
the corresponding u64 attr value.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
tools/perf/tests/parse-events.c | 12 ++++++---
tools/perf/util/parse-events.c | 59 +++++++++++++++++++----------------------
tools/perf/util/parse-events.h | 12 ++++-----
tools/perf/util/parse-events.l | 18 ++++++-------
tools/perf/util/parse-events.y | 24 ++++++++++-------
5 files changed, 65 insertions(+), 60 deletions(-)
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 8605ff5..c15660c 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -463,7 +463,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config=10 */
term = list_entry(terms->next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
- term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG);
+ term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+ TEST_ASSERT_VAL("wrong type idx",
+ term->idx == 1);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 10);
@@ -472,7 +474,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config1 */
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
- term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG1);
+ term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+ TEST_ASSERT_VAL("wrong type idx",
+ term->idx == 7);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 1);
@@ -481,7 +485,9 @@ static int test__checkterms_simple(struct list_head *terms)
/* config2=3 */
term = list_entry(term->list.next, struct parse_events_term, list);
TEST_ASSERT_VAL("wrong type term",
- term->type_term == PARSE_EVENTS__TERM_TYPE_CONFIG2);
+ term->type_term == PARSE_EVENTS__TERM_TYPE_ATTR);
+ TEST_ASSERT_VAL("wrong type idx",
+ term->idx == 8);
TEST_ASSERT_VAL("wrong type val",
term->type_val == PARSE_EVENTS__TERM_TYPE_NUM);
TEST_ASSERT_VAL("wrong val", term->val.num == 3);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1e15df1..8fd2542 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -541,6 +541,19 @@ int parse_events_add_breakpoint(struct list_head *list, int *idx,
return add_event(list, idx, &attr, NULL);
}
+int parse_events__set_attr(struct perf_event_attr *__attr, u64 idx, u64 val)
+{
+ __u64 *attr = (__u64 *)__attr;
+
+ if (idx * sizeof(*attr) >= sizeof(*__attr))
+ return -EINVAL;
+
+ attr += idx;
+ *attr |= val;
+
+ return 0;
+}
+
static int config_term(struct perf_event_attr *attr,
struct parse_events_term *term)
{
@@ -551,36 +564,17 @@ do { \
} while (0)
switch (term->type_term) {
- case PARSE_EVENTS__TERM_TYPE_CONFIG:
+ case PARSE_EVENTS__TERM_TYPE_ATTR:
CHECK_TYPE_VAL(NUM);
- attr->config = term->val.num;
- break;
- case PARSE_EVENTS__TERM_TYPE_CONFIG1:
- CHECK_TYPE_VAL(NUM);
- attr->config1 = term->val.num;
- break;
- case PARSE_EVENTS__TERM_TYPE_CONFIG2:
- CHECK_TYPE_VAL(NUM);
- attr->config2 = term->val.num;
- break;
- case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
- CHECK_TYPE_VAL(NUM);
- attr->sample_period = term->val.num;
- break;
- case PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE:
- /*
- * TODO uncomment when the field is available
- * attr->branch_sample_type = term->val.num;
- */
- break;
+ return parse_events__set_attr(attr, term->idx, term->val.num);
case PARSE_EVENTS__TERM_TYPE_NAME:
CHECK_TYPE_VAL(STR);
- break;
+ return 0;
default:
- return -EINVAL;
+ break;
}
- return 0;
+ return -EINVAL;
#undef CHECK_TYPE_VAL
}
@@ -1277,7 +1271,7 @@ int parse_events__is_hardcoded_term(struct parse_events_term *term)
static int new_term(struct parse_events_term **_term, int type_val,
int type_term, char *config,
- char *str, u64 num)
+ char *str, u64 num, u64 idx)
{
struct parse_events_term *term;
@@ -1288,7 +1282,8 @@ static int new_term(struct parse_events_term **_term, int type_val,
INIT_LIST_HEAD(&term->list);
term->type_val = type_val;
term->type_term = type_term;
- term->config = config;
+ term->config = config;
+ term->idx = idx;
switch (type_val) {
case PARSE_EVENTS__TERM_TYPE_NUM:
@@ -1307,17 +1302,17 @@ static int new_term(struct parse_events_term **_term, int type_val,
}
int parse_events_term__num(struct parse_events_term **term,
- int type_term, char *config, u64 num)
+ int type_term, char *config, u64 num, u64 idx)
{
return new_term(term, PARSE_EVENTS__TERM_TYPE_NUM, type_term,
- config, NULL, num);
+ config, NULL, num, idx);
}
int parse_events_term__str(struct parse_events_term **term,
int type_term, char *config, char *str)
{
return new_term(term, PARSE_EVENTS__TERM_TYPE_STR, type_term,
- config, str, 0);
+ config, str, 0, 0);
}
int parse_events_term__sym_hw(struct parse_events_term **term,
@@ -1331,18 +1326,18 @@ int parse_events_term__sym_hw(struct parse_events_term **term,
if (config)
return new_term(term, PARSE_EVENTS__TERM_TYPE_STR,
PARSE_EVENTS__TERM_TYPE_USER, config,
- (char *) sym->symbol, 0);
+ (char *) sym->symbol, 0, 0);
else
return new_term(term, PARSE_EVENTS__TERM_TYPE_STR,
PARSE_EVENTS__TERM_TYPE_USER,
- (char *) "event", (char *) sym->symbol, 0);
+ (char *) "event", (char *) sym->symbol, 0, 0);
}
int parse_events_term__clone(struct parse_events_term **new,
struct parse_events_term *term)
{
return new_term(new, term->type_val, term->type_term, term->config,
- term->val.str, term->val.num);
+ term->val.str, term->val.num, term->idx);
}
void parse_events__free_terms(struct list_head *terms)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f1cb4c4..61bbe88 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -35,6 +35,8 @@ extern int parse_events_terms(struct list_head *terms, const char *str);
extern int parse_filter(const struct option *opt, const char *str, int unset);
#define EVENTS_HELP_MAX (128*1024)
+#define PERF_ATTR_IDX(MEMBER) \
+ (offsetof(struct perf_event_attr, MEMBER) / sizeof(__u64))
enum {
PARSE_EVENTS__TERM_TYPE_NUM,
@@ -43,12 +45,8 @@ enum {
enum {
PARSE_EVENTS__TERM_TYPE_USER,
- PARSE_EVENTS__TERM_TYPE_CONFIG,
- PARSE_EVENTS__TERM_TYPE_CONFIG1,
- PARSE_EVENTS__TERM_TYPE_CONFIG2,
+ PARSE_EVENTS__TERM_TYPE_ATTR,
PARSE_EVENTS__TERM_TYPE_NAME,
- PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
- PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
};
struct parse_events_term {
@@ -59,6 +57,7 @@ struct parse_events_term {
} val;
int type_val;
int type_term;
+ u64 idx;
struct list_head list;
};
@@ -74,7 +73,7 @@ struct parse_events_terms {
int parse_events__is_hardcoded_term(struct parse_events_term *term);
int parse_events_term__num(struct parse_events_term **_term,
- int type_term, char *config, u64 num);
+ int type_term, char *config, u64 num, u64 idx);
int parse_events_term__str(struct parse_events_term **_term,
int type_term, char *config, char *str);
int parse_events_term__sym_hw(struct parse_events_term **term,
@@ -97,6 +96,7 @@ int parse_events_add_breakpoint(struct list_head *list, int *idx,
int parse_events_add_pmu(struct list_head *list, int *idx,
char *pmu , struct list_head *head_config);
void parse_events__set_leader(char *name, struct list_head *list);
+int parse_events__set_attr(struct perf_event_attr *__attr, u64 idx, u64 val);
void parse_events_update_lists(struct list_head *list_event,
struct list_head *list_all);
void parse_events_error(void *data, void *scanner, char const *msg);
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index a5808d1..8ef3f6c 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -59,12 +59,12 @@ static int sym(yyscan_t scanner, int type, int config)
return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
}
-static int term(yyscan_t scanner, int type)
+static int attr(yyscan_t scanner, u64 idx)
{
YYSTYPE *yylval = parse_events_get_lval(scanner);
- yylval->num = type;
- return PE_TERM;
+ yylval->num = idx;
+ return PE_TERM_ATTR;
}
%}
@@ -127,12 +127,12 @@ modifier_bp [rwx]{1,3}
}
<cond_config>{
-config { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG); }
-config1 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG1); }
-config2 { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
-name { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
-period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
-branch_type { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
+config { return attr(yyscanner, PERF_ATTR_IDX(config)); }
+config1 { return attr(yyscanner, PERF_ATTR_IDX(config1)); }
+config2 { return attr(yyscanner, PERF_ATTR_IDX(config2)); }
+period { return attr(yyscanner, PERF_ATTR_IDX(sample_period)); }
+branch_type { return attr(yyscanner, PERF_ATTR_IDX(branch_sample_type)); }
+name { return PE_TERM_NAME; }
, { return ','; }
"/" { BEGIN(INITIAL); return '/'; }
{name_minus} { return str(yyscanner, PE_NAME); }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 4eb67ec..2bd285e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -40,7 +40,8 @@ static inc_group_count(struct list_head *list,
%}
%token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
+%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW
+%token PE_TERM_NAME PE_TERM_ATTR
%token PE_EVENT_NAME
%token PE_NAME
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
@@ -51,7 +52,7 @@ static inc_group_count(struct list_head *list,
%type <num> PE_VALUE_SYM_HW
%type <num> PE_VALUE_SYM_SW
%type <num> PE_RAW
-%type <num> PE_TERM
+%type <num> PE_TERM_ATTR
%type <str> PE_NAME
%type <str> PE_NAME_CACHE_TYPE
%type <str> PE_NAME_CACHE_OP_RESULT
@@ -375,7 +376,7 @@ PE_NAME '=' PE_VALUE
struct parse_events_term *term;
ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
- $1, $3));
+ $1, $3, 0));
$$ = term;
}
|
@@ -393,7 +394,7 @@ PE_NAME
struct parse_events_term *term;
ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_USER,
- $1, 1));
+ $1, 1, 0));
$$ = term;
}
|
@@ -406,27 +407,30 @@ PE_VALUE_SYM_HW
$$ = term;
}
|
-PE_TERM '=' PE_NAME
+PE_TERM_NAME '=' PE_NAME
{
struct parse_events_term *term;
- ABORT_ON(parse_events_term__str(&term, (int)$1, NULL, $3));
+ ABORT_ON(parse_events_term__str(&term, PARSE_EVENTS__TERM_TYPE_NAME,
+ NULL, $3));
$$ = term;
}
|
-PE_TERM '=' PE_VALUE
+PE_TERM_ATTR '=' PE_VALUE
{
struct parse_events_term *term;
- ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, $3));
+ ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_ATTR,
+ NULL, $3, $1));
$$ = term;
}
|
-PE_TERM
+PE_TERM_ATTR
{
struct parse_events_term *term;
- ABORT_ON(parse_events_term__num(&term, (int)$1, NULL, 1));
+ ABORT_ON(parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_ATTR,
+ NULL, 1, $1));
$$ = term;
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 15/16] perf tools: Add attr<num> syntax to event parser
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (13 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 14/16] perf tools: Modify event parser to update event attribute by index Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-07 15:04 ` [PATCH 16/16] perf tools: Retry mapping buffers readonly on EACCES Jean Pihet
` (2 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
The event parser is limited to update only a subset of all fields in
struct perf_event_attr (config*, period, branch_type). We are not able
to set other attr fields, esp. flags.
Introducing a new syntax to set any field of the event attribute by
using an index to the u64 value to be used within struct
perf_event_attr. The new syntax attr<num> is similar to config<num>,
but <num> specifies the index to be used. E.g. attr5:24 sets bit 24 of
the flag field of attr.
The persistent event implementation is a use case of the above. In
this case sysfs provides:
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
/sys/bus/event_source/devices/persistent/format/persistent:attr5:24
Persistent events are exposed via sysfs and need to set the persistent
flag (bit 24 of the flag field). With the sysfs entry above we are
able to define the persistent flag format and then may setup a
mce_record event with that flag set.
In general we are now flexible to describe with sysfs any event to be
setup by perf tools.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
tools/perf/util/parse-events.l | 14 ++++++++++++++
tools/perf/util/pmu.c | 32 ++++++--------------------------
tools/perf/util/pmu.h | 9 ++-------
tools/perf/util/pmu.l | 1 +
tools/perf/util/pmu.y | 18 ++++++++++++++----
5 files changed, 37 insertions(+), 37 deletions(-)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 8ef3f6c..707ce83 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -67,6 +67,19 @@ static int attr(yyscan_t scanner, u64 idx)
return PE_TERM_ATTR;
}
+static int attr_parse(yyscan_t scanner)
+{
+ YYSTYPE *yylval = parse_events_get_lval(scanner);
+ char *text = parse_events_get_text(scanner);
+
+ errno = 0;
+ yylval->num = strtoull(text + 4, NULL, 10);
+ if (errno)
+ return PE_ERROR;
+
+ return PE_TERM_ATTR;
+}
+
%}
%x cond_mem
@@ -127,6 +140,7 @@ modifier_bp [rwx]{1,3}
}
<cond_config>{
+attr[0-9]* { return attr_parse(yyscanner); }
config { return attr(yyscanner, PERF_ATTR_IDX(config)); }
config1 { return attr(yyscanner, PERF_ATTR_IDX(config1)); }
config2 { return attr(yyscanner, PERF_ATTR_IDX(config2)); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 00a7dcb..d36d750 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -21,8 +21,8 @@ struct perf_pmu_alias {
};
struct perf_pmu_format {
- char *name;
- int value;
+ char *name;
+ u64 idx;
DECLARE_BITMAP(bits, PERF_PMU_FORMAT_BITS);
struct list_head list;
};
@@ -512,7 +512,6 @@ static int pmu_config_term(struct list_head *formats,
struct parse_events_term *term)
{
struct perf_pmu_format *format;
- __u64 *vp;
/*
* Support only for hardcoded and numnerial terms.
@@ -529,27 +528,8 @@ static int pmu_config_term(struct list_head *formats,
if (!format)
return -EINVAL;
- switch (format->value) {
- case PERF_PMU_FORMAT_VALUE_CONFIG:
- vp = &attr->config;
- break;
- case PERF_PMU_FORMAT_VALUE_CONFIG1:
- vp = &attr->config1;
- break;
- case PERF_PMU_FORMAT_VALUE_CONFIG2:
- vp = &attr->config2;
- break;
- default:
- return -EINVAL;
- }
-
- /*
- * XXX If we ever decide to go with string values for
- * non-hardcoded terms, here's the place to translate
- * them into value.
- */
- *vp |= pmu_format_value(format->bits, term->val.num);
- return 0;
+ return parse_events__set_attr(attr, format->idx,
+ pmu_format_value(format->bits, term->val.num));
}
int perf_pmu__config_terms(struct list_head *formats,
@@ -678,7 +658,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
}
int perf_pmu__new_format(struct list_head *list, char *name,
- int config, unsigned long *bits)
+ __u64 idx, unsigned long *bits)
{
struct perf_pmu_format *format;
@@ -687,7 +667,7 @@ int perf_pmu__new_format(struct list_head *list, char *name,
return -ENOMEM;
format->name = strdup(name);
- format->value = config;
+ format->idx = idx;
memcpy(format->bits, bits, sizeof(format->bits));
list_add_tail(&format->list, list);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8b64125..10269f7 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -4,12 +4,7 @@
#include <linux/bitops.h>
#include <linux/perf_event.h>
#include <stdbool.h>
-
-enum {
- PERF_PMU_FORMAT_VALUE_CONFIG,
- PERF_PMU_FORMAT_VALUE_CONFIG1,
- PERF_PMU_FORMAT_VALUE_CONFIG2,
-};
+#include "parse-events.h"
#define PERF_PMU_FORMAT_BITS 64
@@ -36,7 +31,7 @@ int perf_pmu_wrap(void);
void perf_pmu_error(struct list_head *list, char *name, char const *msg);
int perf_pmu__new_format(struct list_head *list, char *name,
- int config, unsigned long *bits);
+ __u64 idx, unsigned long *bits);
void perf_pmu__set_format(unsigned long *bits, long from, long to);
int perf_pmu__format_parse(char *dir, struct list_head *head);
diff --git a/tools/perf/util/pmu.l b/tools/perf/util/pmu.l
index a15d9fb..9d5aa62 100644
--- a/tools/perf/util/pmu.l
+++ b/tools/perf/util/pmu.l
@@ -26,6 +26,7 @@ num_dec [0-9]+
%%
{num_dec} { return value(10); }
+attr { return PP_ATTR; }
config { return PP_CONFIG; }
config1 { return PP_CONFIG1; }
config2 { return PP_CONFIG2; }
diff --git a/tools/perf/util/pmu.y b/tools/perf/util/pmu.y
index bfd7e85..fb86df8 100644
--- a/tools/perf/util/pmu.y
+++ b/tools/perf/util/pmu.y
@@ -20,7 +20,7 @@ do { \
%}
-%token PP_CONFIG PP_CONFIG1 PP_CONFIG2
+%token PP_ATTR PP_CONFIG PP_CONFIG1 PP_CONFIG2
%token PP_VALUE PP_ERROR
%type <num> PP_VALUE
%type <bits> bit_term
@@ -40,24 +40,34 @@ format format_term
format_term
format_term:
+PP_ATTR ':' bits
+{
+ ABORT_ON(perf_pmu__new_format(format, name, 0, $3));
+}
+|
+PP_ATTR PP_VALUE ':' bits
+{
+ ABORT_ON(perf_pmu__new_format(format, name, $2, $4));
+}
+|
PP_CONFIG ':' bits
{
ABORT_ON(perf_pmu__new_format(format, name,
- PERF_PMU_FORMAT_VALUE_CONFIG,
+ PERF_ATTR_IDX(config),
$3));
}
|
PP_CONFIG1 ':' bits
{
ABORT_ON(perf_pmu__new_format(format, name,
- PERF_PMU_FORMAT_VALUE_CONFIG1,
+ PERF_ATTR_IDX(config1),
$3));
}
|
PP_CONFIG2 ':' bits
{
ABORT_ON(perf_pmu__new_format(format, name,
- PERF_PMU_FORMAT_VALUE_CONFIG2,
+ PERF_ATTR_IDX(config2),
$3));
}
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* [PATCH 16/16] perf tools: Retry mapping buffers readonly on EACCES
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (14 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 15/16] perf tools: Add attr<num> syntax to event parser Jean Pihet
@ 2014-04-07 15:04 ` Jean Pihet
2014-04-17 12:44 ` [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
2014-05-06 12:39 ` Robert Richter
17 siblings, 0 replies; 41+ messages in thread
From: Jean Pihet @ 2014-04-07 15:04 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Robert Richter
Cc: Robert Richter, Jean Pihet
From: Robert Richter <robert.richter@linaro.org>
Persistent event buffers may only be mmapped readonly. Thus, retry
mapping it readonly if mmap returns EACCES after trying to mmap
writable.
[ namhyung: Don't write to readonly mmap'ed buffers. ]
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Jean Pihet <jean.pihet@linaro.org>
---
tools/perf/builtin-record.c | 11 ++++++++---
tools/perf/builtin-top.c | 8 ++++++--
tools/perf/perf.h | 1 +
3 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index eb524f9..ede3af1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -133,8 +133,8 @@ static int record__mmap_read(struct record *rec, struct perf_mmap *md)
}
md->prev = old;
- perf_mmap__write_tail(md, old);
-
+ if (!rec->opts.mmap_ro)
+ perf_mmap__write_tail(md, old);
out:
return rc;
}
@@ -207,7 +207,12 @@ try_again:
goto out;
}
- if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+try_again2:
+ if (perf_evlist__mmap(evlist, opts->mmap_pages, opts->mmap_ro) < 0) {
+ if (!opts->mmap_ro && errno == EACCES) {
+ opts->mmap_ro = true;
+ goto try_again2;
+ }
if (errno == EPERM) {
pr_err("Permission error mapping pages.\n"
"Consider increasing "
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 65aaa5b..479356f 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -868,8 +868,12 @@ try_again:
goto out_err;
}
}
-
- if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+try_again2:
+ if (perf_evlist__mmap(evlist, opts->mmap_pages, opts->mmap_ro) < 0) {
+ if (!opts->mmap_ro && errno == EACCES) {
+ opts->mmap_ro = true;
+ goto try_again2;
+ }
ui__error("Failed to mmap with %d (%s)\n",
errno, strerror(errno));
goto out_err;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1138c41..a8ffaba4 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -282,6 +282,7 @@ struct record_opts {
bool sample_weight;
bool sample_time;
bool period;
+ bool mmap_ro;
unsigned int freq;
unsigned int mmap_pages;
unsigned int user_freq;
--
1.7.11.7
^ permalink raw reply related [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (15 preceding siblings ...)
2014-04-07 15:04 ` [PATCH 16/16] perf tools: Retry mapping buffers readonly on EACCES Jean Pihet
@ 2014-04-17 12:44 ` Jean Pihet
2014-04-17 12:50 ` Borislav Petkov
2014-05-06 12:39 ` Robert Richter
17 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-17 12:44 UTC (permalink / raw)
To: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel@vger.kernel.org,
Robert Richter
Cc: Jean Pihet
Hi,
Ping on this series. Are there some opinions or interest in this?
Regards,
Jean
On 7 April 2014 17:04, Jean Pihet <jean.pihet@linaro.org> wrote:
> Jean Pihet <jean.pihet@linaro.org>:
> - In order to restart the discussion on the topic, here is a rebased
> version of Robert's latest patches (v3) on acme/perf/core.
> It has been compiled and lightly tested on ARM64.
>
> - From the latest discussion on ML the ioctls are renamed from
> PERF_EVENT_IOC_ATTACH/DETACH to PERF_EVENT_IOC_CLAIM/UNCLAIM.
>
>
> Robert Richter <rric@kernel.org>:
> This patch set implements the necessary kernel changes for persistent
> events.
>
> Persistent events run standalone in the system without the need of a
> controlling process that holds an event's file descriptor. The events
> are always enabled and collect data samples in a ring buffer.
> Processes may connect to existing persistent events using the
> perf_event_open() syscall. For this the syscall must be configured
> using the new PERF_TYPE_PERSISTENT event type and a unique event
> identifier specified in attr.config. The id is propagated in sysfs or
> using ioctl (see below).
>
> Persistent event buffers may be accessed with mmap() in the same way
> as for any other event. Since the buffers may be used by multiple
> processes at the same time, there is only read-only access to them.
> Currently there is only support for per-cpu events, thus root access
> is needed too.
>
> Persistent events are visible in sysfs. They are added or removed
> dynamically. With the information in sysfs userland knows about how to
> setup the perf_event attribute of a persistent event. Since a
> persistent event always has the persistent flag set, a way is needed
> to express this in sysfs. A new syntax is used for this. With
> 'attr<num>:<mask>' any bit in the attribute structure may be set in a
> similar way as using 'config<num>', but <num> is an index that points
> to the u64 value to change within the attribute.
>
> For persistent events the persistent flag (bit 24 of flag field in
> struct perf_event_attr) needs to be set which is expressed in sysfs
> with "attr5:24". E.g. the mce_record event is described in sysfs as
> follows:
>
> /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
> /sys/bus/event_source/devices/persistent/format/persistent:attr5:24
>
> Note that perf tools need to support the 'attr<num>' syntax that is
> added in a separate patch set. With it we are able to run perf tool
> commands to read persistent events, e.g.:
>
> # perf record -e persistent/mce_record/ sleep 10
> # perf top -e persistent/mce_record/
>
> In general the new syntax is flexible to describe with sysfs any event
> to be setup by perf tools.
>
> There are ioctl functions to control persistent events that can be
> used to detach or attach an event to or from a process. The
> PERF_EVENT_IOC_UNCLAIM ioctl call makes an event persistent. The
> perf_event_open() syscall can be used to re-open the event by any
> process. The PERF_EVENT_IOC_CLAIM ioctl attaches the event again so
> that it is removed after closing the event's fd.
>
> The patches base on the originally work from Borislav Petkov.
>
> This version 3 of the patch set is a complete rework of the code.
> There are the following major changes:
>
> * new event type PERF_TYPE_PERSISTENT introduced,
>
> * support for all type of events,
>
> * unique event ids,
>
> * improvements in reference counting and locking,
>
> * ioctl functions are added to control persistency,
>
> * the sysfs implementation now uses variable list size.
>
> This should address most issues discussed during last review of
> version 2. The following is unresolved yet and can be added later on
> top of this patches, if necessary:
>
> * support for per-task events (also allowing non-root access),
>
> * creation of persistent events for disabled cpus,
>
> * make event persistent with already open (mmap'ed) buffers,
>
> * make event persistent while creating it.
>
> First patches contain some rework of the perf mmap code to reuse it
> for persistent events.
>
> Also note that patch 12 (ioctl functions to control persistency) is
> RFC and untested. A perf tools implementation for this is missing and
> some ideas are needed how this could be integrated, esp. in something
> like perf trace or so.
>
> All v3 patches can be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v3
>
> Note: I will resent the perf tools patch necessary to use persistent
> events.
>
> -Robert
>
>
> Borislav Petkov (1):
> mce, x86: Enable persistent events
>
> Robert Richter (15):
> perf, mmap: Factor out ring_buffer_detach_all()
> perf, mmap: Factor out try_get_event()/put_event()
> perf, mmap: Factor out perf_alloc/free_rb()
> perf, mmap: Factor out perf_get_fd()
> perf: Add persistent events
> perf, persistent: Implementing a persistent pmu
> perf, persistent: Exposing persistent events using sysfs
> perf, persistent: Use unique event ids
> perf, persistent: Implement reference counter for events
> perf, persistent: Dynamically resize list of sysfs entries
> perf, persistent: ioctl functions to control persistency
> perf tools: Rename flex conditions to avoid name conflicts
> perf tools: Modify event parser to update event attribute by index
> perf tools: Add attr<num> syntax to event parser
> perf tools: Retry mapping buffers readonly on EACCES
>
> .../testing/sysfs-bus-event_source-devices-format | 43 +-
> arch/x86/kernel/cpu/mcheck/mce.c | 19 +
> include/linux/perf_event.h | 12 +-
> include/uapi/linux/perf_event.h | 6 +-
> kernel/events/Makefile | 2 +-
> kernel/events/core.c | 214 +++++---
> kernel/events/internal.h | 20 +
> kernel/events/persistent.c | 563 +++++++++++++++++++++
> tools/perf/builtin-record.c | 11 +-
> tools/perf/builtin-top.c | 8 +-
> tools/perf/perf.h | 1 +
> tools/perf/tests/parse-events.c | 12 +-
> tools/perf/util/parse-events.c | 59 +--
> tools/perf/util/parse-events.h | 12 +-
> tools/perf/util/parse-events.l | 58 ++-
> tools/perf/util/parse-events.y | 24 +-
> tools/perf/util/pmu.c | 32 +-
> tools/perf/util/pmu.h | 9 +-
> tools/perf/util/pmu.l | 1 +
> tools/perf/util/pmu.y | 18 +-
> 20 files changed, 911 insertions(+), 213 deletions(-)
> create mode 100644 kernel/events/persistent.c
>
> --
> 1.7.11.7
>
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-17 12:44 ` [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
@ 2014-04-17 12:50 ` Borislav Petkov
2014-04-17 13:17 ` Jean Pihet
0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-04-17 12:50 UTC (permalink / raw)
To: Jean Pihet
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel@vger.kernel.org, Robert Richter
On Thu, Apr 17, 2014 at 02:44:26PM +0200, Jean Pihet wrote:
> Hi,
>
> Ping on this series. Are there some opinions or interest in this?
Just checking whether you've read my mail from 2-ish weeks ago:
https://lkml.kernel.org/r/20140329094250.GA11987@nazgul.tnic
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-17 12:50 ` Borislav Petkov
@ 2014-04-17 13:17 ` Jean Pihet
2014-04-17 13:21 ` Borislav Petkov
0 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-17 13:17 UTC (permalink / raw)
To: Borislav Petkov, Ingo Molnar
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel@vger.kernel.org, Robert Richter
Hi Borislav,
On 17 April 2014 14:50, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 17, 2014 at 02:44:26PM +0200, Jean Pihet wrote:
>> Hi,
>>
>> Ping on this series. Are there some opinions or interest in this?
>
> Just checking whether you've read my mail from 2-ish weeks ago:
>
> https://lkml.kernel.org/r/20140329094250.GA11987@nazgul.tnic
Yes indeed!
Ingo,
Is the perf support for persistent event needed, or can this series be
reviewed as is?
In the meantime I am now working on the perf tool:
- add persistent events,
- factor out the code and provide a library (libperf) for other tool
to use it (e.g. monitoring daemon).
I surely will let you know!
Thx,
Jean
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-17 13:17 ` Jean Pihet
@ 2014-04-17 13:21 ` Borislav Petkov
2014-04-22 8:20 ` Jean Pihet
0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-04-17 13:21 UTC (permalink / raw)
To: Jean Pihet
Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel@vger.kernel.org, Robert Richter
Hi Jean,
On Thu, Apr 17, 2014 at 03:17:01PM +0200, Jean Pihet wrote:
> Yes indeed!
>
> Ingo,
> Is the perf support for persistent event needed, or can this series be
> reviewed as is?
>
> In the meantime I am now working on the perf tool:
> - add persistent events,
> - factor out the code and provide a library (libperf) for other tool
> to use it (e.g. monitoring daemon).
Ok, good. I'm doing a bit too:
https://lkml.org/lkml/2014/4/13/20
so let's synchronize and get this thing going. I'll add you to my CC
list on future submissions.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-17 13:21 ` Borislav Petkov
@ 2014-04-22 8:20 ` Jean Pihet
2014-04-22 10:07 ` Jean Pihet
0 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-22 8:20 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
linux-kernel@vger.kernel.org, Robert Richter
Hi Borislav,
On 17 April 2014 15:21, Borislav Petkov <bp@alien8.de> wrote:
> Hi Jean,
>
> On Thu, Apr 17, 2014 at 03:17:01PM +0200, Jean Pihet wrote:
>> Yes indeed!
>>
>> Ingo,
>> Is the perf support for persistent event needed, or can this series be
>> reviewed as is?
>>
>> In the meantime I am now working on the perf tool:
>> - add persistent events,
>> - factor out the code and provide a library (libperf) for other tool
>> to use it (e.g. monitoring daemon).
>
> Ok, good. I'm doing a bit too:
>
> https://lkml.org/lkml/2014/4/13/20
>
> so let's synchronize and get this thing going. I'll add you to my CC
> list on future submissions.
Ok, great!
About libperf and the persistent events, do you know more details
about the status and the proposed API? I have no info about the work
that was started previously.
Regards,
Jean
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-22 8:20 ` Jean Pihet
@ 2014-04-22 10:07 ` Jean Pihet
2014-04-22 10:20 ` Borislav Petkov
0 siblings, 1 reply; 41+ messages in thread
From: Jean Pihet @ 2014-04-22 10:07 UTC (permalink / raw)
To: Borislav Petkov, Fu Wei, Jiri Olsa, Arnaldo Carvalho de Melo,
Robert Richter
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
Hi,
-- adding Fu Wei in the loop--
Fu Wei is working on the RAS daemon, which is a user of libperf. It is
crucial to agree on the libperf API in order to start with the correct
implementation.
Jean
On 22 April 2014 10:20, Jean Pihet <jean.pihet@linaro.org> wrote:
> Hi Borislav,
>
> On 17 April 2014 15:21, Borislav Petkov <bp@alien8.de> wrote:
>> Hi Jean,
>>
>> On Thu, Apr 17, 2014 at 03:17:01PM +0200, Jean Pihet wrote:
>>> Yes indeed!
>>>
>>> Ingo,
>>> Is the perf support for persistent event needed, or can this series be
>>> reviewed as is?
>>>
>>> In the meantime I am now working on the perf tool:
>>> - add persistent events,
>>> - factor out the code and provide a library (libperf) for other tool
>>> to use it (e.g. monitoring daemon).
>>
>> Ok, good. I'm doing a bit too:
>>
>> https://lkml.org/lkml/2014/4/13/20
>>
>> so let's synchronize and get this thing going. I'll add you to my CC
>> list on future submissions.
> Ok, great!
>
> About libperf and the persistent events, do you know more details
> about the status and the proposed API? I have no info about the work
> that was started previously.
>
> Regards,
> Jean
>
>>
>> Thanks.
>>
>> --
>> Regards/Gruss,
>> Boris.
>>
>> Sent from a fat crate under my desk. Formatting is fine.
>> --
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-22 10:07 ` Jean Pihet
@ 2014-04-22 10:20 ` Borislav Petkov
0 siblings, 0 replies; 41+ messages in thread
From: Borislav Petkov @ 2014-04-22 10:20 UTC (permalink / raw)
To: Jean Pihet
Cc: Fu Wei, Jiri Olsa, Arnaldo Carvalho de Melo, Robert Richter,
Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
On Tue, Apr 22, 2014 at 12:07:07PM +0200, Jean Pihet wrote:
> Hi,
>
> -- adding Fu Wei in the loop--
>
> Fu Wei is working on the RAS daemon, which is a user of libperf. It is
Good. :)
> crucial to agree on the libperf API in order to start with the correct
> implementation.
Again, it should be a libperf library which contains only the
perf-specific bits. I've done a rough carve out, see the mail below.
However, then we started a conversation in a smaller thread which lead
to
https://lkml.kernel.org/r/20140329094250.GA11987@nazgul.tnic
In any case, the basic perf tool functionality which we need right
now is in the last 4 patches of these 16 - we want to be able to
open/enable/close/... persistent events in the ras daemon without the
perf tool, i.e. link to as many libraries as needed in tools/.
Also, libperf should not be exposed to external users... for now.
And yes, carving out perf tool functionality is very tedious work -
that's why I've dropped to sending only a couple of patches at once so
that merge conflicts can be avoided.
What else,... hmm, yeah, that should be it. Just take a look at those
mails and let me know. Also, this whole thing is not the final design -
we're just hashing out ideas and changing things as we go.
Thanks!
--
>From bp@alien8.de Fri Nov 15 21:29:18 2013
Date: Fri, 15 Nov 2013 21:28:48 +0100
From: Borislav Petkov <bp@alien8.de>
To: Ingo Molnar <mingo@kernel.org>, Arnaldo Carvalho de Melo
<acme@infradead.org>
Cc: Robert Richter <rric@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
Jiri Olsa <jolsa@redhat.com>
Subject: Re: persistent events: Tooling support; perf tool splitup
Message-ID: <20131115202848.GK29277@pd.tnic>
User-Agent: Mutt/1.5.21 (2010-09-15)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8
Status: RO
Hi guys,
I'm finally ready with the next attempt at a carve out of the facilities
in tools/perf/ into libraries for external use. I'm adding the diffstat
of the patches below with comments so that you can see how it would look
like. Once we all agree on the structure, I'll start feeding acme 1-2
patches at a time so that there are no big merge issues.
Comments and suggestions are, as always, very welcome and needed. :-)
* The first group are the kernel headers any tools in tools/
could/are/would be using. I've put them into tools/include/ because it
leaves the #include <linux...> directives unchanged.
But since we're making a liblk kernel library, I can imagine putting
them into tools/lib/lk/ and making all part of that generic library.
tools/{perf/util => }/include/asm/bug.h | 6 +-
tools/{perf/util => }/include/linux/bitmap.h | 7 +-
tools/{perf/util => }/include/linux/bitops.h | 12 +-
tools/{perf/util => }/include/linux/compiler.h | 8 +-
tools/{perf/util => }/include/linux/kernel.h | 6 +-
tools/{perf/util => }/include/linux/list.h | 9 +-
tools/{perf/util => }/include/linux/types.h | 6 +-
tools/include/asm/byteorder.h | 2 +
tools/include/linux/export.h | 6 +
tools/include/linux/hash.h | 8 +
tools/include/linux/poison.h | 1 +
tools/include/linux/prefetch.h | 6 +
tools/include/linux/rbtree.h | 3 +
tools/include/linux/rbtree_augmented.h | 2 +
This is tools/lib/lk/liblk.a which contains only compilation units which
contain generic code. The idea behind this library is to be used by all
tools in tools/.
tools/{perf/util => lib/lk}/color.c | 69 +-
tools/{perf/util => lib/lk}/color.h | 34 +-
tools/{perf/util => lib/lk}/cpumap.c | 9 +-
tools/{perf/util => lib/lk}/cpumap.h | 4 +
tools/{perf/util => lib/lk}/ctype.c | 2 +-
tools/{perf/util => lib/lk}/hweight.c | 1 +
tools/{perf/util => lib/lk}/include/linux/magic.h | 0
tools/{perf/util => lib/lk}/parse-options.c | 6 +-
tools/{perf/util => lib/lk}/parse-options.h | 13 +-
tools/{perf/util => lib/lk}/rblist.c | 0
tools/{perf/util => lib/lk}/rblist.h | 6 +-
tools/{perf/util => lib/lk}/strbuf.c | 3 +-
tools/{perf/util => lib/lk}/strbuf.h | 15 +-
tools/{perf/util => lib/lk}/string.c | 42 +-
tools/{perf/util => lib/lk}/strlist.c | 0
tools/{perf/util => lib/lk}/strlist.h | 6 +-
tools/{perf/util => lib/lk}/sysfs.c | 4 +
tools/{perf/util => lib/lk}/thread_map.c | 0
tools/{perf/util => lib/lk}/thread_map.h | 6 +-
tools/{perf/util => lib/lk}/types.h | 6 +-
tools/{perf/util => lib/lk}/usage.c | 2 +-
tools/{perf/util => lib/lk}/wrapper.c | 3 +-
tools/{perf/util => lib/lk}/xyarray.c | 0
tools/{perf/util => lib/lk}/xyarray.h | 6 +-
tools/lib/lk/Makefile | 62 +-
tools/lib/lk/config.c | 370 ++++++
tools/lib/lk/config.h | 13 +
tools/lib/lk/ctype.h | 37 +
tools/lib/lk/hweight.h | 13 +
tools/lib/lk/string.h | 21 +
tools/lib/lk/sysfs.h | 12 +
tools/lib/lk/usage.h | 34 +
tools/lib/lk/util.c | 18 +
tools/lib/lk/util.h | 38 +
tools/lib/lk/wrapper.h | 27 +
This is lib/perf/libperfevent.a which got initiated from exporting
evsel.* and evlist.* which I'm going to need for other tools to use.
Diffstat shows what moves in there, I've tried to keep it as small as
possible.
tools/{perf/util => lib/perf}/callchain.c | 4 +-
tools/{perf/util => lib/perf}/callchain.h | 12 +-
tools/{perf/util => lib/perf}/cgroup.c | 8 +-
tools/{perf/util => lib/perf}/cgroup.h | 7 +-
tools/{perf/util => lib/perf}/debug.c | 29 +-
tools/{perf/util => lib/perf}/debug.h | 19 +-
tools/{perf/util => lib/perf}/dso.c | 8 +-
tools/{perf/util => lib/perf}/dso.h | 11 +-
tools/{perf/util => lib/perf}/event.c | 109 +-
tools/{perf/util => lib/perf}/event.h | 12 +-
tools/{perf/util => lib/perf}/evlist.c | 37 +-
tools/{perf/util => lib/perf}/evlist.h | 15 +-
tools/{perf/util => lib/perf}/evsel.c | 38 +-
tools/{perf/util => lib/perf}/evsel.h | 33 +-
tools/{perf/util => lib/perf}/map.c | 10 +-
tools/{perf/util => lib/perf}/map.h | 9 +-
tools/{perf/util => lib/perf}/path.c | 28 +-
tools/{perf/util => lib/perf}/perf_regs.h | 6 +-
tools/{perf/util => lib/perf}/symbol.c | 6 +-
tools/{perf/util => lib/perf}/symbol.h | 10 +-
tools/{perf/util => lib/perf}/target.c | 3 -
tools/{perf/util => lib/perf}/target.h | 12 +-
tools/{perf/util => lib/perf}/thread.h | 7 +-
tools/{perf/util => lib/perf}/tool.h | 6 +-
tools/{perf/util => lib/perf}/vdso.c | 4 +-
tools/{perf/util => lib/perf}/vdso.h | 6 +-
tools/lib/perf/Makefile | 83 ++
tools/lib/perf/arch.h | 112 ++
tools/lib/perf/build-id.c | 102 ++
tools/lib/perf/build-id.h | 32 +
tools/lib/perf/hist.h | 69 ++
tools/lib/perf/machine.c | 1268 ++++++++++++++++++++
tools/lib/perf/machine.h | 168 +++
tools/lib/perf/path.h | 29 +
tools/lib/perf/perf.c | 29 +
tools/lib/perf/perf.h | 105 ++
tools/lib/perf/thread.c | 187 +++
Rest is changes to tools/perf/
tools/perf/MANIFEST | 3 -
tools/perf/Makefile | 136 +--
tools/perf/arch/common.c | 4 +-
tools/perf/arch/x86/include/perf_regs.h | 2 +-
tools/perf/arch/x86/util/header.c | 2 +-
tools/perf/arch/x86/util/tsc.c | 4 +-
tools/perf/arch/x86/util/tsc.h | 2 +-
tools/perf/bench/mem-memcpy.c | 8 +-
tools/perf/bench/mem-memset.c | 8 +-
tools/perf/bench/numa.c | 6 +-
tools/perf/bench/sched-messaging.c | 4 +-
tools/perf/bench/sched-pipe.c | 4 +-
tools/perf/builtin-annotate.c | 21 +-
tools/perf/builtin-bench.c | 4 +-
tools/perf/builtin-buildid-cache.c | 12 +-
tools/perf/builtin-buildid-list.c | 19 +-
tools/perf/builtin-diff.c | 14 +-
tools/perf/builtin-evlist.c | 8 +-
tools/perf/builtin-help.c | 15 +-
tools/perf/builtin-inject.c | 18 +-
tools/perf/builtin-kmem.c | 16 +-
tools/perf/builtin-kvm.c | 20 +-
tools/perf/builtin-lock.c | 20 +-
tools/perf/builtin-mem.c | 6 +-
tools/perf/builtin-probe.c | 10 +-
tools/perf/builtin-record.c | 162 ++-
tools/perf/builtin-report.c | 26 +-
tools/perf/builtin-sched.c | 21 +-
tools/perf/builtin-script.c | 18 +-
tools/perf/builtin-stat.c | 29 +-
tools/perf/builtin-timechart.c | 21 +-
tools/perf/builtin-top.c | 27 +-
tools/perf/builtin-trace.c | 41 +-
tools/perf/builtin.h | 2 +-
tools/perf/config/Makefile | 4 +-
tools/perf/perf.c | 18 +-
tools/perf/perf.h | 187 +--
tools/perf/tests/attr.c | 5 +-
tools/perf/tests/bp_signal.c | 2 +-
tools/perf/tests/bp_signal_overflow.c | 2 +-
tools/perf/tests/builtin-test.c | 15 +-
tools/perf/tests/code-reading.c | 14 +-
tools/perf/tests/dso-data.c | 6 +-
tools/perf/tests/evsel-roundtrip-name.c | 6 +-
tools/perf/tests/evsel-tp-sched.c | 2 +-
tools/perf/tests/hists_link.c | 12 +-
tools/perf/tests/keep-tracking.c | 8 +-
tools/perf/tests/mmap-basic.c | 9 +-
tools/perf/tests/open-syscall-all-cpus.c | 8 +-
tools/perf/tests/open-syscall-tp-fields.c | 6 +-
tools/perf/tests/open-syscall.c | 6 +-
tools/perf/tests/parse-events.c | 9 +-
tools/perf/tests/parse-no-sample-id-all.c | 4 +-
tools/perf/tests/perf-record.c | 6 +-
tools/perf/tests/perf-time-to-tsc.c | 8 +-
tools/perf/tests/python-use.c | 2 -
tools/perf/tests/rdpmc.c | 4 +-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/tests/sw-clock.c | 8 +-
tools/perf/tests/task-exit.c | 8 +-
tools/perf/tests/tests.h | 4 +
tools/perf/tests/vmlinux-kallsyms.c | 7 +-
tools/perf/ui/browser.c | 2 +-
tools/perf/ui/browser.h | 2 +-
tools/perf/ui/browsers/annotate.c | 8 +-
tools/perf/ui/browsers/hists.c | 6 +-
tools/perf/ui/browsers/map.c | 4 +-
tools/perf/ui/browsers/scripts.c | 7 +-
tools/perf/ui/gtk/annotate.c | 4 +-
tools/perf/ui/gtk/browser.c | 6 +-
tools/perf/ui/gtk/helpline.c | 2 +-
tools/perf/ui/gtk/hists.c | 6 +-
tools/perf/ui/gtk/setup.c | 2 +-
tools/perf/ui/gtk/util.c | 2 +-
tools/perf/ui/helpline.c | 2 +-
tools/perf/ui/helpline.h | 1 -
tools/perf/ui/hist.c | 4 +-
tools/perf/ui/progress.h | 2 +-
tools/perf/ui/setup.c | 4 +-
tools/perf/ui/stdio/hist.c | 4 +-
tools/perf/ui/tui/helpline.c | 2 +-
tools/perf/ui/tui/setup.c | 3 +-
tools/perf/ui/tui/util.c | 2 +-
tools/perf/ui/ui.h | 4 +-
tools/perf/ui/util.c | 2 +-
tools/perf/util/abspath.c | 2 +
tools/perf/util/alias.c | 2 +-
tools/perf/util/annotate.c | 14 +-
tools/perf/util/annotate.h | 4 +-
tools/perf/util/build-id.c | 129 +-
tools/perf/util/build-id.h | 24 +-
tools/perf/util/cache.h | 45 +-
tools/perf/util/config.c | 463 +------
tools/perf/util/dwarf-aux.c | 2 +-
tools/perf/util/exec_cmd.c | 10 +-
tools/perf/util/exec_cmd.h | 6 +-
tools/perf/util/header.c | 30 +-
tools/perf/util/header.h | 16 +-
tools/perf/util/help.c | 2 +-
tools/perf/util/hist.c | 70 +-
tools/perf/util/hist.h | 195 ++-
tools/perf/util/include/asm/byteorder.h | 2 -
tools/perf/util/include/asm/hweight.h | 8 -
tools/perf/util/include/linux/export.h | 6 -
tools/perf/util/include/linux/hash.h | 5 -
tools/perf/util/include/linux/poison.h | 1 -
tools/perf/util/include/linux/prefetch.h | 6 -
tools/perf/util/include/linux/rbtree.h | 2 -
tools/perf/util/include/linux/rbtree_augmented.h | 2 -
tools/perf/util/include/linux/string.h | 4 -
tools/perf/util/intlist.h | 2 +-
tools/perf/util/machine.c | 1258 +------------------
tools/perf/util/machine.h | 163 +--
tools/perf/util/pager.c | 15 +-
tools/perf/util/parse-events.c | 12 +-
tools/perf/util/parse-events.h | 3 +-
tools/perf/util/parse-events.y | 2 +-
tools/perf/util/pmu.c | 4 +-
tools/perf/util/probe-event.c | 14 +-
tools/perf/util/probe-event.h | 2 +-
tools/perf/util/probe-finder.c | 7 +-
tools/perf/util/probe-finder.h | 2 +
tools/perf/util/python-ext-sources | 22 +-
tools/perf/util/python.c | 12 +-
tools/perf/util/record.c | 8 +-
.../perf/util/scripting-engines/trace-event-perl.c | 6 +-
.../util/scripting-engines/trace-event-python.c | 6 +-
tools/perf/util/session.c | 17 +-
tools/perf/util/session.h | 8 +-
tools/perf/util/setup.py | 3 +-
tools/perf/util/sort.c | 8 +-
tools/perf/util/sort.h | 83 +-
tools/perf/util/stat.h | 2 +-
tools/perf/util/strfilter.c | 2 +
tools/perf/util/svghelper.h | 2 +-
tools/perf/util/symbol-elf.c | 4 +-
tools/perf/util/symbol-minimal.c | 2 +-
tools/perf/util/sysfs.h | 6 -
tools/perf/util/thread.c | 91 --
tools/perf/util/top.c | 10 +-
tools/perf/util/top.h | 6 +-
tools/perf/util/trace-event-info.c | 2 +-
tools/perf/util/trace-event-read.c | 2 -
tools/perf/util/unwind.c | 4 +-
tools/perf/util/unwind.h | 6 +-
tools/perf/util/util.c | 98 +-
tools/perf/util/util.h | 193 ++-
tools/perf/util/values.h | 2 +-
238 files changed, 4299 insertions(+), 3722 deletions(-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
>
> Jean
>
> On 22 April 2014 10:20, Jean Pihet <jean.pihet@linaro.org> wrote:
> > Hi Borislav,
> >
> > On 17 April 2014 15:21, Borislav Petkov <bp@alien8.de> wrote:
> >> Hi Jean,
> >>
> >> On Thu, Apr 17, 2014 at 03:17:01PM +0200, Jean Pihet wrote:
> >>> Yes indeed!
> >>>
> >>> Ingo,
> >>> Is the perf support for persistent event needed, or can this series be
> >>> reviewed as is?
> >>>
> >>> In the meantime I am now working on the perf tool:
> >>> - add persistent events,
> >>> - factor out the code and provide a library (libperf) for other tool
> >>> to use it (e.g. monitoring daemon).
> >>
> >> Ok, good. I'm doing a bit too:
> >>
> >> https://lkml.org/lkml/2014/4/13/20
> >>
> >> so let's synchronize and get this thing going. I'll add you to my CC
> >> list on future submissions.
> > Ok, great!
> >
> > About libperf and the persistent events, do you know more details
> > about the status and the proposed API? I have no info about the work
> > that was started previously.
> >
> > Regards,
> > Jean
> >
> >>
> >> Thanks.
> >>
> >> --
> >> Regards/Gruss,
> >> Boris.
> >>
> >> Sent from a fat crate under my desk. Formatting is fine.
> >> --
>
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-04-07 15:04 [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
` (16 preceding siblings ...)
2014-04-17 12:44 ` [PATCH v4 00/16] perf, persistent: Add persistent events Jean Pihet
@ 2014-05-06 12:39 ` Robert Richter
2014-05-06 18:50 ` Borislav Petkov
` (2 more replies)
17 siblings, 3 replies; 41+ messages in thread
From: Robert Richter @ 2014-05-06 12:39 UTC (permalink / raw)
To: Jean Pihet
Cc: Borislav Petkov, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Tomasz Nowicki
> Robert Richter <rric@kernel.org>:
> This patch set implements the necessary kernel changes for persistent
> events.
I was reviewing the code again after a while. It demonstrates how
persistent events can be used esp. for kernel enabled tracing. It also
allows people to play with it. The initial use case for this was
memory/hardware error detection. But for general-purpose uses of
persistent events I think the design must be improved. I want to start
a discussion on this. Sorry for the long text, but I hope my
description is detailed enough to show you the challenge.
The implementation of persistent events (see description in original
mail below about this) in the kernel is the right direction, basically
increase the refcount to not free events and its buffers when a
process terminates. Then, have a mechanism to reopen the event with
the syscall which can be done by the same or any other
process. Buffers can also be shared and have multiple users.
Now, the interesting question is how userland should all handle
this. There are ioctls implemented to control persistency of
events. But there are some open questions we need to discuss and
address since once the API is defined it can not be changed easily
later. Esp. the identification of events and its grouping of the same
kind of events running on all cpus (I don't mean group events).
We need to cover the following use cases for persistent events:
* create,
* reopen,
* delete,
* list and name them in the system.
The last is the most tricky.
I want to explain the current implementation with pros and cons and
possible alternatives.
Creation of events basically does (function arguments are a bit pseudo
code):
attr.type = pmu_type;
attr.config = event_config;
event = perf_event_open(attr, cpu, ...);
id = ioctl(event, PERF_EVENT_IOC_UNCLAIM, 0);
close(event);
To reopen it again:
attr.type = PERF_TYPE_PERSISTENT;
attr.config = id;
event = perf_event_open(attr, cpu, ...);
do_something(event); /* access buffers etc.: */
close(event);
Id must be known from sysfs or stored somewhere.
Events could also be shared transparently. This means, if there is
already an event running with the same attr, it could be reused. Not
sure if this makes sense much and is also feasible. Most events are
opened with writable buffers and thus can not be shared anyway.
Removing it:
... /* same as reopen */
id = ioctl(event, PERF_EVENT_IOC_CLAIM, 0);
close(event);
I rather would change the ioctl to
id = ioctl(PERF_EVENT_IOC_SET_PERSIST, arg);
arg != 0: create persistent event (unclaim)
arg == 0: delete persistent event (claim)
This has the advantage that the naming is better and arg can be used
as parameter (e.g. event id to share a namespace).
The unclaim ioctl *creates* a buffer with 512kB default size. The
reopening process must mmap with the same buffer size. This is a
problem as in this implementation the buffer size is fix and can not
be adjusted. We could let create the process the buffers and make the
event persistent including the current buffers.
Variable buffer size is a must, so the reopening process also must be
able to detect buffer size. Mmap buffer size could be detected by
mmap'ing only the header page, reading the buffer size from the header
and then remapping the buffer with adjusted size.
It would be good to have a perf tool option -P that puts events in
persistent state instead of starting a command:
# perf record -e ... -a -P <namespace> # create pers. events
# perf record -e persistent/<namespace>/ -a # read existing buffers
# perf record -e persistent/<namespace>/ -a -k # read and kill existing events
Note that no args for a command are given.
Persistent events are listed in sysfs, e.g.:
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
/sys/bus/event_source/devices/persistent/format/persistent:attr5:24
Thus perf tool is able to setup events that connect to persistent
events. The list is dynamically updated.
The persistent flag is actually ignored and thus not needed for
setup. It is used internally to mark the event as persistent. (We
modify event attr, is this ok?) It is planned to use the persistent
flag to instantly make the event persistent during its creation, which
is not yet implemented (-EOPNOTSUPP). It is unclear if this is
possible if the buffers must be created by the process. It might be
better to drop the persistent flag then. In this case we need an
alternative flag to be used internally. The following sysfs entries
should be enough and attrX support is obsolete and no longer
necessary, though it is still useful in my point of view:
/sys/bus/event_source/devices/persistent/events/mce_record:config=<id>
This is nice but raises some questions about the handling of names and
ids. First, the 'mce_record' string is chosen by the kernel function
that created the events (perf_add_persistent_tp()). In user space we
can't choose a name, the string is created from the unique persistent
id (see pevent_alloc()). Hmm, this (a number) is not parsable by the
perf tool as event name which must be name_minus (see parse-events.l:
[a-zA-Z_*?][a-zA-Z0-9\-_*?]*). A name could be provided with an
ioctl():
err = ioctl(event, PERF_EVENT_IOC_SET_NAME, name);
Do we need to check the name's syntax, or can we expect userspace
doing it right?
If there is a namespace, other events must be able to be added to it.
Events could be created like this:
u64 id = ~0;
for_each_cpu(cpu, mask) {
event = perf_event_open(attr, cpu, ...)
err = ioctl(event, PERF_EVENT_IOC_SET_NAME, name);
...
/*
* if id == ~0 create own name, otherwise use id's
* name space, id is updated by the ioctl
*/
err = ioctl(event, PERF_EVENT_IOC_SET_PERSIST, &id)
...
}
We might require to set a name before issuing SET_PERSIST. Another
problem is that the name must not yet exist and is created new the
first time. Otherwise there might be namespace collisions.
Second, persistent events run in system wide mode. Thus, multiple
events (one per cpu) are involved. Should there one id for all or
separate ids for every single event? In this case we could use
(u64)event->id. There is already a syscall to read the id. In sysfs
all events on all cpus would be listed (each id). This is too much and
also not useful since sysfs event entries (.../devices/persistent/
events/*) do not handle cpu numbers. So, it would be better to combine
the events on all cpus then, at least for the sysfs entries. But if
the id is shared, we need a solution for the case where an event is
not running on all cpus. Cpu hotplug must also be considered.
Suppose there is one single name for a collection of events that are
persistent, this is how all events could be reopened (similar to
getdents()):
/* not actual an event: */
fd = perf_event_open(type = PERF_TYPE_PERSISTENT, config = ~0, ...)
err = ioctl(fd, PERF_EVENT_IOC_SET_NAME, name);
/* maintain an internal iteration state bound to fd */
while(1) {
err = ioctl(fd, PERF_EVENT_IOC_GET_PERSIST, &id);
if (!id)
break;
event = perf_event_open(PERF_TYPE_PERSISTENT, config = id);
...
}
close(fd)
A problem is the cpu number here, needs to be solved somehow.
Events could be listed and controlled with the perf ctl command:
$ perf ctl -l
<event name1>
<event name2>
...
$ perf ctl -d <event name>
The first lists all events in the system, the -d option removes an
event from it. This raises the question from above, whether events on
all cpus should be listed or not.
I hope this shows a bit what the current kernel/user interaction
misses and starts a discussion about how this could be solved. There
is no final solution yet. Not all questions mentioned need to be
answered, but we need to be aware that we do not break API with later
changes.
Please ask me questions, I might not have explained it good
enough. Any ideas are appreciated.
Thanks,
-Robert
> Persistent events run standalone in the system without the need of a
> controlling process that holds an event's file descriptor. The events
> are always enabled and collect data samples in a ring buffer.
> Processes may connect to existing persistent events using the
> perf_event_open() syscall. For this the syscall must be configured
> using the new PERF_TYPE_PERSISTENT event type and a unique event
> identifier specified in attr.config. The id is propagated in sysfs or
> using ioctl (see below).
>
> Persistent event buffers may be accessed with mmap() in the same way
> as for any other event. Since the buffers may be used by multiple
> processes at the same time, there is only read-only access to them.
> Currently there is only support for per-cpu events, thus root access
> is needed too.
>
> Persistent events are visible in sysfs. They are added or removed
> dynamically. With the information in sysfs userland knows about how to
> setup the perf_event attribute of a persistent event. Since a
> persistent event always has the persistent flag set, a way is needed
> to express this in sysfs. A new syntax is used for this. With
> 'attr<num>:<mask>' any bit in the attribute structure may be set in a
> similar way as using 'config<num>', but <num> is an index that points
> to the u64 value to change within the attribute.
>
> For persistent events the persistent flag (bit 24 of flag field in
> struct perf_event_attr) needs to be set which is expressed in sysfs
> with "attr5:24". E.g. the mce_record event is described in sysfs as
> follows:
>
> /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
> /sys/bus/event_source/devices/persistent/format/persistent:attr5:24
>
> Note that perf tools need to support the 'attr<num>' syntax that is
> added in a separate patch set. With it we are able to run perf tool
> commands to read persistent events, e.g.:
>
> # perf record -e persistent/mce_record/ sleep 10
> # perf top -e persistent/mce_record/
>
> In general the new syntax is flexible to describe with sysfs any event
> to be setup by perf tools.
>
> There are ioctl functions to control persistent events that can be
> used to detach or attach an event to or from a process. The
> PERF_EVENT_IOC_UNCLAIM ioctl call makes an event persistent. The
> perf_event_open() syscall can be used to re-open the event by any
> process. The PERF_EVENT_IOC_CLAIM ioctl attaches the event again so
> that it is removed after closing the event's fd.
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-06 12:39 ` Robert Richter
@ 2014-05-06 18:50 ` Borislav Petkov
2014-05-06 18:53 ` Borislav Petkov
2014-05-06 18:58 ` Borislav Petkov
2 siblings, 0 replies; 41+ messages in thread
From: Borislav Petkov @ 2014-05-06 18:50 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Tue, May 06, 2014 at 02:39:07PM +0200, Robert Richter wrote:
> Creation of events basically does (function arguments are a bit pseudo
> code):
>
> attr.type = pmu_type;
> attr.config = event_config;
> event = perf_event_open(attr, cpu, ...);
> id = ioctl(event, PERF_EVENT_IOC_UNCLAIM, 0);
Remind me again pls why we decided for the CLAIM/UNCLAIM part? It is
very unintuitive. What was against we detach from the event (we're
implicitly attached to it in the normal case, when it is not persistent)
and then reattach?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-06 12:39 ` Robert Richter
2014-05-06 18:50 ` Borislav Petkov
@ 2014-05-06 18:53 ` Borislav Petkov
2014-05-07 16:44 ` Robert Richter
2014-05-06 18:58 ` Borislav Petkov
2 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-05-06 18:53 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Tue, May 06, 2014 at 02:39:07PM +0200, Robert Richter wrote:
> Events could also be shared transparently. This means, if there is
> already an event running with the same attr, it could be reused. Not
> sure if this makes sense much and is also feasible. Most events are
> opened with writable buffers and thus can not be shared anyway.
Well, this is simple: sharing events implicitly says they're read-only -
you can't share them otherwise. If you want the buffers to be writable,
then the events cannot be shared.
I think this is nicely simple.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-06 18:53 ` Borislav Petkov
@ 2014-05-07 16:44 ` Robert Richter
2014-05-08 18:23 ` Borislav Petkov
0 siblings, 1 reply; 41+ messages in thread
From: Robert Richter @ 2014-05-07 16:44 UTC (permalink / raw)
To: Borislav Petkov
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On 06.05.14 20:53:59, Borislav Petkov wrote:
> On Tue, May 06, 2014 at 02:39:07PM +0200, Robert Richter wrote:
> > Events could also be shared transparently. This means, if there is
> > already an event running with the same attr, it could be reused. Not
> > sure if this makes sense much and is also feasible. Most events are
> > opened with writable buffers and thus can not be shared anyway.
>
> Well, this is simple: sharing events implicitly says they're read-only -
> you can't share them otherwise. If you want the buffers to be writable,
> then the events cannot be shared.
>
> I think this is nicely simple.
With transparently I mean that the process even does not know that the
same event is already running by another process. The kernel detects
this and maps the request to that event and buffer. Of course the
event's buffer must be at least readonly to be shared for this.
This could be a mechanism to connect to persistent events. The kernel
detects by type and attr that the requested event is already running
persistent and maps to it.
But at the moment persistent events can only be shared using
attr.type = PERF_TYPE_PERSISTENT
attr.config = id
So the above is more an alternative to connect to persistent events
and the question is, which one to use. Presumable the easiest first,
which is the current implementation.
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-07 16:44 ` Robert Richter
@ 2014-05-08 18:23 ` Borislav Petkov
2014-05-09 9:17 ` Robert Richter
0 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-05-08 18:23 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Wed, May 07, 2014 at 06:44:07PM +0200, Robert Richter wrote:
> With transparently I mean that the process even does not know that the
> same event is already running by another process. The kernel detects
> this and maps the request to that event and buffer. Of course the
> event's buffer must be at least readonly to be shared for this.
Yes.
> This could be a mechanism to connect to persistent events. The kernel
> detects by type and attr that the requested event is already running
> persistent and maps to it.
>
> But at the moment persistent events can only be shared using
>
> attr.type = PERF_TYPE_PERSISTENT
> attr.config = id
>
> So the above is more an alternative to connect to persistent events
> and the question is, which one to use. Presumable the easiest first,
> which is the current implementation.
Well, there is no trivial way to share event buffers if they're not
read-only AFAICT.
But in questions like this, we always have to step one step back and ask
ourselves: what are the use cases for shared events and after we have
enumerated them, to design the kernel side so that it supports them.
So, do we want anything else besides shared, read-only events?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-08 18:23 ` Borislav Petkov
@ 2014-05-09 9:17 ` Robert Richter
0 siblings, 0 replies; 41+ messages in thread
From: Robert Richter @ 2014-05-09 9:17 UTC (permalink / raw)
To: Borislav Petkov
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On 08.05.14 20:23:44, Borislav Petkov wrote:
> On Wed, May 07, 2014 at 06:44:07PM +0200, Robert Richter wrote:
> > With transparently I mean that the process even does not know that the
> > same event is already running by another process. The kernel detects
> > this and maps the request to that event and buffer. Of course the
> > event's buffer must be at least readonly to be shared for this.
>
> Yes.
>
> > This could be a mechanism to connect to persistent events. The kernel
> > detects by type and attr that the requested event is already running
> > persistent and maps to it.
> >
> > But at the moment persistent events can only be shared using
> >
> > attr.type = PERF_TYPE_PERSISTENT
> > attr.config = id
> >
> > So the above is more an alternative to connect to persistent events
> > and the question is, which one to use. Presumable the easiest first,
> > which is the current implementation.
>
> Well, there is no trivial way to share event buffers if they're not
> read-only AFAICT.
>
> But in questions like this, we always have to step one step back and ask
> ourselves: what are the use cases for shared events and after we have
> enumerated them, to design the kernel side so that it supports them.
>
> So, do we want anything else besides shared, read-only events?
I only talk about events that are sharable since they are read-only.
The question I am asking here is how to connect to an event that is
sharable. This could be done transparently.
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-06 12:39 ` Robert Richter
2014-05-06 18:50 ` Borislav Petkov
2014-05-06 18:53 ` Borislav Petkov
@ 2014-05-06 18:58 ` Borislav Petkov
2014-05-07 17:01 ` Robert Richter
2 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-05-06 18:58 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Tue, May 06, 2014 at 02:39:07PM +0200, Robert Richter wrote:
> I rather would change the ioctl to
>
> id = ioctl(PERF_EVENT_IOC_SET_PERSIST, arg);
>
> arg != 0: create persistent event (unclaim)
> arg == 0: delete persistent event (claim)
>
> This has the advantage that the naming is better and arg can be used
> as parameter (e.g. event id to share a namespace).
Yep, this is better than the CLAIM/UNCLAIM thing.
> The unclaim ioctl *creates* a buffer with 512kB default size. The
> reopening process must mmap with the same buffer size. This is a
> problem as in this implementation the buffer size is fix and can not
> be adjusted.
Why do we need it to be adjustable?
> We could let create the process the buffers and make the
> event persistent including the current buffers.
>
> Variable buffer size is a must,
Yeah, why? I'm just asking for my own understanding.
> so the reopening process also must be
> able to detect buffer size. Mmap buffer size could be detected by
> mmap'ing only the header page, reading the buffer size from the header
> and then remapping the buffer with adjusted size.
>
> It would be good to have a perf tool option -P that puts events in
> persistent state instead of starting a command:
>
> # perf record -e ... -a -P <namespace> # create pers. events
> # perf record -e persistent/<namespace>/ -a # read existing buffers
> # perf record -e persistent/<namespace>/ -a -k # read and kill existing events
>
> Note that no args for a command are given.
Yep, looks sane to me.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-06 18:58 ` Borislav Petkov
@ 2014-05-07 17:01 ` Robert Richter
2014-05-08 18:36 ` Borislav Petkov
2014-05-09 10:17 ` Borislav Petkov
0 siblings, 2 replies; 41+ messages in thread
From: Robert Richter @ 2014-05-07 17:01 UTC (permalink / raw)
To: Borislav Petkov
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On 06.05.14 20:58:26, Borislav Petkov wrote:
> On Tue, May 06, 2014 at 02:39:07PM +0200, Robert Richter wrote:
> > I rather would change the ioctl to
> >
> > id = ioctl(PERF_EVENT_IOC_SET_PERSIST, arg);
> >
> > arg != 0: create persistent event (unclaim)
> > arg == 0: delete persistent event (claim)
> >
> > This has the advantage that the naming is better and arg can be used
> > as parameter (e.g. event id to share a namespace).
>
> Yep, this is better than the CLAIM/UNCLAIM thing.
Yeah, I am wondering why we didn't get this earlier. ;)
> > The unclaim ioctl *creates* a buffer with 512kB default size. The
> > reopening process must mmap with the same buffer size. This is a
> > problem as in this implementation the buffer size is fix and can not
> > be adjusted.
>
> Why do we need it to be adjustable?
I am fine with a fixed size. But this design might not be changable in
the future without breaking the API. So its worth to think about this.
Maybe we allow the process also to create the buffer, there are 2
variants then:
perf_event_open(...);
mmap(..., fd, ...);
ioctl(fd, set_persistent, 1); /* Use existing buffer. */
close();
fd = perf_event_open(...);
ioctl(fd, set_persistent, 1); /* Create buffer with default size. */
close();
This would solve the problem how we adjust buffer size.
This requires a way to detect buffer size. See my mail how this could
be done.
>
> > We could let create the process the buffers and make the
> > event persistent including the current buffers.
> >
> > Variable buffer size is a must,
>
> Yeah, why? I'm just asking for my own understanding.
For general-purpose... Why 512k buffers are ok? Depending on the
event, smaller buffers are maybe good enough, esp. since they are
permanently enabled and use system resources. Or, at some point, 512k
is not sufficient anymore. You don't want 512k be carved in stone.
> > so the reopening process also must be
> > able to detect buffer size. Mmap buffer size could be detected by
> > mmap'ing only the header page, reading the buffer size from the header
> > and then remapping the buffer with adjusted size.
> >
> > It would be good to have a perf tool option -P that puts events in
> > persistent state instead of starting a command:
> >
> > # perf record -e ... -a -P <namespace> # create pers. events
> > # perf record -e persistent/<namespace>/ -a # read existing buffers
> > # perf record -e persistent/<namespace>/ -a -k # read and kill existing events
> >
> > Note that no args for a command are given.
>
> Yep, looks sane to me.
Ok.
Thanks for your commments.
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-07 17:01 ` Robert Richter
@ 2014-05-08 18:36 ` Borislav Petkov
2014-05-09 8:52 ` Robert Richter
2014-05-09 10:17 ` Borislav Petkov
1 sibling, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2014-05-08 18:36 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Wed, May 07, 2014 at 07:01:55PM +0200, Robert Richter wrote:
> For general-purpose... Why 512k buffers are ok? Depending on the
> event, smaller buffers are maybe good enough, esp. since they are
> permanently enabled and use system resources. Or, at some point, 512k
> is not sufficient anymore. You don't want 512k be carved in stone.
Well, if it turns out that 512K is not enough, it will be changed in
perf itself, right?
And we still don't care because no matter the size, the persistent
buffers overwrite themselves on fill up. If they'd stopped, they're not
really persistent, right?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-08 18:36 ` Borislav Petkov
@ 2014-05-09 8:52 ` Robert Richter
0 siblings, 0 replies; 41+ messages in thread
From: Robert Richter @ 2014-05-09 8:52 UTC (permalink / raw)
To: Borislav Petkov
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On 08.05.14 20:36:29, Borislav Petkov wrote:
> On Wed, May 07, 2014 at 07:01:55PM +0200, Robert Richter wrote:
> > For general-purpose... Why 512k buffers are ok? Depending on the
> > event, smaller buffers are maybe good enough, esp. since they are
> > permanently enabled and use system resources. Or, at some point, 512k
> > is not sufficient anymore. You don't want 512k be carved in stone.
>
> Well, if it turns out that 512K is not enough, it will be changed in
> perf itself, right?
This requires patching the kernel.
>
> And we still don't care because no matter the size, the persistent
> buffers overwrite themselves on fill up. If they'd stopped, they're not
> really persistent, right?
If the buffer size does not fit it might either use too much resources
in the system or overwrite its own samples before reading it.
Fixed buffers might work and are good enough in the beginning. If all
users can live with it, fine. But if not, the design does not allow
changes in API without breaking it. What's that I care about and
what's we should think about while designing the i/f.
-Robert
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v4 00/16] perf, persistent: Add persistent events
2014-05-07 17:01 ` Robert Richter
2014-05-08 18:36 ` Borislav Petkov
@ 2014-05-09 10:17 ` Borislav Petkov
1 sibling, 0 replies; 41+ messages in thread
From: Borislav Petkov @ 2014-05-09 10:17 UTC (permalink / raw)
To: Robert Richter
Cc: Jean Pihet, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Jiri Olsa, linux-kernel, Tomasz Nowicki
On Wed, May 07, 2014 at 07:01:55PM +0200, Robert Richter wrote:
> I am fine with a fixed size. But this design might not be changable in
> the future without breaking the API. So its worth to think about this.
> Maybe we allow the process also to create the buffer, there are 2
> variants then:
>
> perf_event_open(...);
> mmap(..., fd, ...);
> ioctl(fd, set_persistent, 1); /* Use existing buffer. */
> close();
>
> fd = perf_event_open(...);
> ioctl(fd, set_persistent, 1); /* Create buffer with default size. */
> close();
>
> This would solve the problem how we adjust buffer size.
>
> This requires a way to detect buffer size. See my mail how this could
> be done.
Yep, this makes sense and fits nicely with the perf interface so
persistent buffer sizes will be established at event creation.
This is all fine then.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 41+ messages in thread