* ye olde task_ctx_sched_out trace. @ 2014-05-21 15:06 Dave Jones 2014-05-21 15:16 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Dave Jones @ 2014-05-21 15:06 UTC (permalink / raw) To: Linux Kernel; +Cc: peterz I thought we had this nailed down a while ago, but it still keeps popping up... WARNING: CPU: 3 PID: 32310 at kernel/events/core.c:2384 task_ctx_sched_out+0x6b/0x80() CPU: 3 PID: 32310 Comm: trinity-c185 Not tainted 3.15.0-rc5+ #214 0000000000000009 000000003d7dfb5c ffff880019671df8 ffffffff9371a1fd 0000000000000000 ffff880019671e30 ffffffff9306d5dd ffff88024d0d6d48 ffff88010c4944e8 0000000000000286 ffff880243b82d00 ffff88010c4944e8 Call Trace: [<ffffffff9371a1fd>] dump_stack+0x4e/0x7a [<ffffffff9306d5dd>] warn_slowpath_common+0x7d/0xa0 [<ffffffff9306d70a>] warn_slowpath_null+0x1a/0x20 [<ffffffff931430bb>] task_ctx_sched_out+0x6b/0x80 [<ffffffff93146138>] perf_event_comm+0xc8/0x220 [<ffffffff930a19cd>] ? get_parent_ip+0xd/0x50 [<ffffffff931c025f>] set_task_comm+0x4f/0xc0 [<ffffffff93085b23>] SyS_prctl+0x1d3/0x480 [<ffffffff9372cf9f>] tracesys+0xdd/0xe2 There was on perf activity at all going on at the time. I had told trinity to do -g vm which excludes all non-VM related syscalls. What is perf_event_comm doing ? Is that storing some state in case I later decide to run perf ? Dave ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-21 15:06 ye olde task_ctx_sched_out trace Dave Jones @ 2014-05-21 15:16 ` Peter Zijlstra 2014-05-21 15:30 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2014-05-21 15:16 UTC (permalink / raw) To: Dave Jones, Linux Kernel On Wed, May 21, 2014 at 11:06:13AM -0400, Dave Jones wrote: > I thought we had this nailed down a while ago, but it still keeps > popping up... > > WARNING: CPU: 3 PID: 32310 at kernel/events/core.c:2384 task_ctx_sched_out+0x6b/0x80() > CPU: 3 PID: 32310 Comm: trinity-c185 Not tainted 3.15.0-rc5+ #214 > 0000000000000009 000000003d7dfb5c ffff880019671df8 ffffffff9371a1fd > 0000000000000000 ffff880019671e30 ffffffff9306d5dd ffff88024d0d6d48 > ffff88010c4944e8 0000000000000286 ffff880243b82d00 ffff88010c4944e8 > Call Trace: > [<ffffffff9371a1fd>] dump_stack+0x4e/0x7a > [<ffffffff9306d5dd>] warn_slowpath_common+0x7d/0xa0 > [<ffffffff9306d70a>] warn_slowpath_null+0x1a/0x20 > [<ffffffff931430bb>] task_ctx_sched_out+0x6b/0x80 > [<ffffffff93146138>] perf_event_comm+0xc8/0x220 > [<ffffffff930a19cd>] ? get_parent_ip+0xd/0x50 > [<ffffffff931c025f>] set_task_comm+0x4f/0xc0 > [<ffffffff93085b23>] SyS_prctl+0x1d3/0x480 > [<ffffffff9372cf9f>] tracesys+0xdd/0xe2 > > There was on perf activity at all going on at the time. > I had told trinity to do -g vm which excludes all non-VM related syscalls. > > What is perf_event_comm doing ? Is that storing some state in case > I later decide to run perf ? So we use perf_event_comm() to trigger start_on_exec, which in turn pretty much assumes .tsk=current. Now some people have advanced set_task_comm() usage far beyond this point and we can now pretty much call it on random tasks at random times in order to make 'top' look pretty or similar useless things. So I think I should separate this and add perf_event_exec() and leave perf_event_comm() for just reporting task->comm changes. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-21 15:16 ` Peter Zijlstra @ 2014-05-21 15:30 ` Peter Zijlstra 2014-05-21 15:32 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2014-05-21 15:30 UTC (permalink / raw) To: Dave Jones, Linux Kernel On Wed, May 21, 2014 at 05:16:55PM +0200, Peter Zijlstra wrote: > On Wed, May 21, 2014 at 11:06:13AM -0400, Dave Jones wrote: > > I thought we had this nailed down a while ago, but it still keeps > > popping up... > > > > WARNING: CPU: 3 PID: 32310 at kernel/events/core.c:2384 task_ctx_sched_out+0x6b/0x80() > > CPU: 3 PID: 32310 Comm: trinity-c185 Not tainted 3.15.0-rc5+ #214 > > 0000000000000009 000000003d7dfb5c ffff880019671df8 ffffffff9371a1fd > > 0000000000000000 ffff880019671e30 ffffffff9306d5dd ffff88024d0d6d48 > > ffff88010c4944e8 0000000000000286 ffff880243b82d00 ffff88010c4944e8 > > Call Trace: > > [<ffffffff9371a1fd>] dump_stack+0x4e/0x7a > > [<ffffffff9306d5dd>] warn_slowpath_common+0x7d/0xa0 > > [<ffffffff9306d70a>] warn_slowpath_null+0x1a/0x20 > > [<ffffffff931430bb>] task_ctx_sched_out+0x6b/0x80 > > [<ffffffff93146138>] perf_event_comm+0xc8/0x220 > > [<ffffffff930a19cd>] ? get_parent_ip+0xd/0x50 > > [<ffffffff931c025f>] set_task_comm+0x4f/0xc0 > > [<ffffffff93085b23>] SyS_prctl+0x1d3/0x480 > > [<ffffffff9372cf9f>] tracesys+0xdd/0xe2 > > > > There was on perf activity at all going on at the time. > > I had told trinity to do -g vm which excludes all non-VM related syscalls. > > > > What is perf_event_comm doing ? Is that storing some state in case > > I later decide to run perf ? > > So we use perf_event_comm() to trigger start_on_exec, which in turn > pretty much assumes .tsk=current. > > Now some people have advanced set_task_comm() usage far beyond this > point and we can now pretty much call it on random tasks at random times > in order to make 'top' look pretty or similar useless things. > > So I think I should separate this and add perf_event_exec() and leave > perf_event_comm() for just reporting task->comm changes. A little something like so I suppose. --- fs/exec.c | 1 + include/linux/perf_event.h | 1 + kernel/events/core.c | 28 ++++++++++++++++------------ 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 476f3ebf437e..8d51d7ce3dcf 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1111,6 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, suid_dumpable); set_task_comm(current, kbasename(bprm->filename)); + perf_event_exec(); /* Set the new mm task size. We have to do that late because it may * depend on TIF_32BIT which is only updated in flush_thread() on diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index af6dcf1d9e47..5975d68a3fe6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -694,6 +694,7 @@ extern struct perf_guest_info_callbacks *perf_guest_cbs; extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); +extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); diff --git a/kernel/events/core.c b/kernel/events/core.c index 7ab734fbaeeb..e46ae0635dca 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2973,6 +2973,22 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx) local_irq_restore(flags); } +void perf_event_exec(void) +{ + struct perf_event_context *ctx; + int ctxn; + + rcu_read_lock(); + for_each_task_context_nr(ctxn) { + ctx = task->perf_event_ctxp[ctxn]; + if (!ctx) + continue; + + perf_event_enable_on_exec(ctx); + } + rcu_read_unlock(); +} + /* * Cross CPU call to read the hardware event */ @@ -5070,18 +5086,6 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) void perf_event_comm(struct task_struct *task) { struct perf_comm_event comm_event; - struct perf_event_context *ctx; - int ctxn; - - rcu_read_lock(); - for_each_task_context_nr(ctxn) { - ctx = task->perf_event_ctxp[ctxn]; - if (!ctx) - continue; - - perf_event_enable_on_exec(ctx); - } - rcu_read_unlock(); if (!atomic_read(&nr_comm_events)) return; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-21 15:30 ` Peter Zijlstra @ 2014-05-21 15:32 ` Peter Zijlstra 2014-05-22 6:52 ` Adrian Hunter 2014-06-06 12:19 ` [tip:perf/core] perf: Fix perf_event_comm() vs. exec() assumption tip-bot for Peter Zijlstra 0 siblings, 2 replies; 12+ messages in thread From: Peter Zijlstra @ 2014-05-21 15:32 UTC (permalink / raw) To: Dave Jones, Linux Kernel On Wed, May 21, 2014 at 05:30:13PM +0200, Peter Zijlstra wrote: > A little something like so I suppose. > > --- > +void perf_event_exec(void) > +{ > + struct perf_event_context *ctx; > + int ctxn; > + > + rcu_read_lock(); > + for_each_task_context_nr(ctxn) { > + ctx = task->perf_event_ctxp[ctxn]; s/task/current/ > + if (!ctx) > + continue; > + > + perf_event_enable_on_exec(ctx); > + } > + rcu_read_unlock(); > +} It would help if I pasted the one that actually compiles I suppose.. --- fs/exec.c | 1 + include/linux/perf_event.h | 1 + kernel/events/core.c | 28 ++++++++++++++++------------ 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 476f3ebf437e..8d51d7ce3dcf 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1111,6 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, suid_dumpable); set_task_comm(current, kbasename(bprm->filename)); + perf_event_exec(); /* Set the new mm task size. We have to do that late because it may * depend on TIF_32BIT which is only updated in flush_thread() on diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index af6dcf1d9e47..5975d68a3fe6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -694,6 +694,7 @@ extern struct perf_guest_info_callbacks *perf_guest_cbs; extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); +extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); diff --git a/kernel/events/core.c b/kernel/events/core.c index 7ab734fbaeeb..59382cfc87a1 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2973,6 +2973,22 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx) local_irq_restore(flags); } +void perf_event_exec(void) +{ + struct perf_event_context *ctx; + int ctxn; + + rcu_read_lock(); + for_each_task_context_nr(ctxn) { + ctx = current->perf_event_ctxp[ctxn]; + if (!ctx) + continue; + + perf_event_enable_on_exec(ctx); + } + rcu_read_unlock(); +} + /* * Cross CPU call to read the hardware event */ @@ -5070,18 +5086,6 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) void perf_event_comm(struct task_struct *task) { struct perf_comm_event comm_event; - struct perf_event_context *ctx; - int ctxn; - - rcu_read_lock(); - for_each_task_context_nr(ctxn) { - ctx = task->perf_event_ctxp[ctxn]; - if (!ctx) - continue; - - perf_event_enable_on_exec(ctx); - } - rcu_read_unlock(); if (!atomic_read(&nr_comm_events)) return; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-21 15:32 ` Peter Zijlstra @ 2014-05-22 6:52 ` Adrian Hunter 2014-05-22 7:04 ` Peter Zijlstra ` (2 more replies) 2014-06-06 12:19 ` [tip:perf/core] perf: Fix perf_event_comm() vs. exec() assumption tip-bot for Peter Zijlstra 1 sibling, 3 replies; 12+ messages in thread From: Adrian Hunter @ 2014-05-22 6:52 UTC (permalink / raw) To: Peter Zijlstra, Dave Jones, Linux Kernel On 05/21/2014 06:32 PM, Peter Zijlstra wrote: > On Wed, May 21, 2014 at 05:30:13PM +0200, Peter Zijlstra wrote: >> A little something like so I suppose. >> >> --- > >> +void perf_event_exec(void) >> +{ >> + struct perf_event_context *ctx; >> + int ctxn; >> + >> + rcu_read_lock(); >> + for_each_task_context_nr(ctxn) { >> + ctx = task->perf_event_ctxp[ctxn]; > > s/task/current/ > >> + if (!ctx) >> + continue; >> + >> + perf_event_enable_on_exec(ctx); >> + } >> + rcu_read_unlock(); >> +} > > > It would help if I pasted the one that actually compiles I suppose.. > > --- > fs/exec.c | 1 + > include/linux/perf_event.h | 1 + > kernel/events/core.c | 28 ++++++++++++++++------------ > 3 files changed, 18 insertions(+), 12 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 476f3ebf437e..8d51d7ce3dcf 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1111,6 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) > set_dumpable(current->mm, suid_dumpable); > > set_task_comm(current, kbasename(bprm->filename)); > + perf_event_exec(); Shouldn't that be the other way around i.e. + perf_event_exec(); set_task_comm(current, kbasename(bprm->filename)); Also what about flagging the comm event that corresponds to an exec e.g. diff --git a/fs/exec.c b/fs/exec.c index 476f3eb..28d69a1 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1046,13 +1046,13 @@ EXPORT_SYMBOL_GPL(get_task_comm); * so that a new one can be started */ -void set_task_comm(struct task_struct *tsk, const char *buf) +void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) { task_lock(tsk); trace_task_rename(tsk, buf); strlcpy(tsk->comm, buf, sizeof(tsk->comm)); task_unlock(tsk); - perf_event_comm(tsk); + perf_event_comm(tsk, exec); } int flush_old_exec(struct linux_binprm * bprm) @@ -1110,7 +1110,7 @@ void setup_new_exec(struct linux_binprm * bprm) else set_dumpable(current->mm, suid_dumpable); - set_task_comm(current, kbasename(bprm->filename)); + __set_task_comm(current, kbasename(bprm->filename), true); /* Set the new mm task size. We have to do that late because it may * depend on TIF_32BIT which is only updated in flush_thread() on diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index af6dcf1..26614312 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -694,7 +694,7 @@ extern struct perf_guest_info_callbacks *perf_guest_cbs; extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); -extern void perf_event_comm(struct task_struct *tsk); +extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_fork(struct task_struct *tsk); /* Callchains */ @@ -801,7 +801,7 @@ static inline int perf_unregister_guest_info_callbacks (struct perf_guest_info_callbacks *callbacks) { return 0; } static inline void perf_event_mmap(struct vm_area_struct *vma) { } -static inline void perf_event_comm(struct task_struct *tsk) { } +static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_fork(struct task_struct *tsk) { } static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void) { return -1; } diff --git a/include/linux/sched.h b/include/linux/sched.h index 25f54c7..cadcf38 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2376,7 +2376,11 @@ extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, i struct task_struct *fork_idle(int); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); -extern void set_task_comm(struct task_struct *tsk, const char *from); +extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec); +static inline void set_task_comm(struct task_struct *tsk, const char *from) +{ + __set_task_comm(tsk, from, false); +} extern char *get_task_comm(char *to, struct task_struct *tsk); #ifdef CONFIG_SMP diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e3fc8f0..8ae875a 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -501,7 +501,12 @@ struct perf_event_mmap_page { #define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0) #define PERF_RECORD_MISC_GUEST_USER (5 << 0) +/* + * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on + * different events so can reuse the same bit position. + */ #define PERF_RECORD_MISC_MMAP_DATA (1 << 13) +#define PERF_RECORD_MISC_COMM_EXEC (1 << 13) /* * Indicates that the content of PERF_SAMPLE_IP points to * the actual instruction that triggered the event. See also diff --git a/kernel/events/core.c b/kernel/events/core.c index ed50b09..760abd0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5067,7 +5067,7 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) NULL); } -void perf_event_comm(struct task_struct *task) +void perf_event_comm(struct task_struct *task, bool exec) { struct perf_comm_event comm_event; struct perf_event_context *ctx; @@ -5093,7 +5093,7 @@ void perf_event_comm(struct task_struct *task) .event_id = { .header = { .type = PERF_RECORD_COMM, - .misc = 0, + .misc = exec ? PERF_RECORD_MISC_COMM_EXEC : 0, /* .size */ }, /* .pid */ ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-22 6:52 ` Adrian Hunter @ 2014-05-22 7:04 ` Peter Zijlstra 2014-05-22 7:10 ` Adrian Hunter 2014-05-22 7:22 ` Peter Zijlstra 2014-06-06 12:19 ` [tip:perf/core] perf: Differentiate exec() and non-exec() comm events tip-bot for Adrian Hunter 2 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2014-05-22 7:04 UTC (permalink / raw) To: Adrian Hunter; +Cc: Dave Jones, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 769 bytes --] On Thu, May 22, 2014 at 09:52:46AM +0300, Adrian Hunter wrote: > > diff --git a/fs/exec.c b/fs/exec.c > > index 476f3ebf437e..8d51d7ce3dcf 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -1111,6 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) > > set_dumpable(current->mm, suid_dumpable); > > > > set_task_comm(current, kbasename(bprm->filename)); > > + perf_event_exec(); > > Shouldn't that be the other way around i.e. > > + perf_event_exec(); > set_task_comm(current, kbasename(bprm->filename)); I suppose so indeed. > Also what about flagging the comm event that corresponds to an exec e.g. I think it was a mistake to conflate the two concepts, and separating them into different functions makes things clearer. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-22 7:04 ` Peter Zijlstra @ 2014-05-22 7:10 ` Adrian Hunter 2014-05-22 7:20 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Adrian Hunter @ 2014-05-22 7:10 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel On 05/22/2014 10:04 AM, Peter Zijlstra wrote: > On Thu, May 22, 2014 at 09:52:46AM +0300, Adrian Hunter wrote: >>> diff --git a/fs/exec.c b/fs/exec.c >>> index 476f3ebf437e..8d51d7ce3dcf 100644 >>> --- a/fs/exec.c >>> +++ b/fs/exec.c >>> @@ -1111,6 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) >>> set_dumpable(current->mm, suid_dumpable); >>> >>> set_task_comm(current, kbasename(bprm->filename)); >>> + perf_event_exec(); >> >> Shouldn't that be the other way around i.e. >> >> + perf_event_exec(); >> set_task_comm(current, kbasename(bprm->filename)); > > I suppose so indeed. > >> Also what about flagging the comm event that corresponds to an exec e.g. > > I think it was a mistake to conflate the two concepts, and separating > them into different functions makes things clearer. My patch was not related to that. It was to get effectively an "exec" event, by piggybacking the comm event. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-22 7:10 ` Adrian Hunter @ 2014-05-22 7:20 ` Peter Zijlstra 0 siblings, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2014-05-22 7:20 UTC (permalink / raw) To: Adrian Hunter; +Cc: Dave Jones, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 219 bytes --] On Thu, May 22, 2014 at 10:10:04AM +0300, Adrian Hunter wrote: > > My patch was not related to that. It was to get effectively an "exec" > event, by piggybacking the comm event. Lemme wake up and try again :-) [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-22 6:52 ` Adrian Hunter 2014-05-22 7:04 ` Peter Zijlstra @ 2014-05-22 7:22 ` Peter Zijlstra 2014-05-28 8:02 ` Adrian Hunter 2014-06-06 12:19 ` [tip:perf/core] perf: Differentiate exec() and non-exec() comm events tip-bot for Adrian Hunter 2 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2014-05-22 7:22 UTC (permalink / raw) To: Adrian Hunter; +Cc: Dave Jones, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1273 bytes --] On Thu, May 22, 2014 at 09:52:46AM +0300, Adrian Hunter wrote: > +/* > + * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on > + * different events so can reuse the same bit position. > + */ > #define PERF_RECORD_MISC_MMAP_DATA (1 << 13) > +#define PERF_RECORD_MISC_COMM_EXEC (1 << 13) > /* > * Indicates that the content of PERF_SAMPLE_IP points to > * the actual instruction that triggered the event. See also > diff --git a/kernel/events/core.c b/kernel/events/core.c > index ed50b09..760abd0 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5067,7 +5067,7 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) > NULL); > } > > -void perf_event_comm(struct task_struct *task) > +void perf_event_comm(struct task_struct *task, bool exec) > { > struct perf_comm_event comm_event; > struct perf_event_context *ctx; > @@ -5093,7 +5093,7 @@ void perf_event_comm(struct task_struct *task) > .event_id = { > .header = { > .type = PERF_RECORD_COMM, > - .misc = 0, > + .misc = exec ? PERF_RECORD_MISC_COMM_EXEC : 0, > /* .size */ > }, > /* .pid */ OK, now that you pointed out the obvious, yeah, I suppose we can do that :-) [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ye olde task_ctx_sched_out trace. 2014-05-22 7:22 ` Peter Zijlstra @ 2014-05-28 8:02 ` Adrian Hunter 0 siblings, 0 replies; 12+ messages in thread From: Adrian Hunter @ 2014-05-28 8:02 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel On 05/22/2014 10:22 AM, Peter Zijlstra wrote: > On Thu, May 22, 2014 at 09:52:46AM +0300, Adrian Hunter wrote: >> +/* >> + * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on >> + * different events so can reuse the same bit position. >> + */ >> #define PERF_RECORD_MISC_MMAP_DATA (1 << 13) >> +#define PERF_RECORD_MISC_COMM_EXEC (1 << 13) >> /* >> * Indicates that the content of PERF_SAMPLE_IP points to >> * the actual instruction that triggered the event. See also >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index ed50b09..760abd0 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -5067,7 +5067,7 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) >> NULL); >> } >> >> -void perf_event_comm(struct task_struct *task) >> +void perf_event_comm(struct task_struct *task, bool exec) >> { >> struct perf_comm_event comm_event; >> struct perf_event_context *ctx; >> @@ -5093,7 +5093,7 @@ void perf_event_comm(struct task_struct *task) >> .event_id = { >> .header = { >> .type = PERF_RECORD_COMM, >> - .misc = 0, >> + .misc = exec ? PERF_RECORD_MISC_COMM_EXEC : 0, >> /* .size */ >> }, >> /* .pid */ > > OK, now that you pointed out the obvious, yeah, I suppose we can do that :-) > One problem is how user space can figure out if the kernel supports it. For my purposes, I don't need to know in advance, but I can imagine some users might want to. So I suggest adding: diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e3fc8f0..67cec3e 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -301,8 +301,9 @@ struct perf_event_attr { exclude_callchain_kernel : 1, /* exclude kernel callchains */ exclude_callchain_user : 1, /* exclude user callchains */ mmap2 : 1, /* include mmap with inode data */ + exec : 1, /* flag comm events that are due to an exec */ - __reserved_1 : 40; + __reserved_1 : 39; union { __u32 wakeup_events; /* wakeup every n events */ Commit message: perf tools like 'perf report' can aggregate samples by comm strings, which generally works. However, there are other potential use-cases. For example, to pair up 'calls' with 'returns' accurately (from branch events like Intel BTS) it is necessary to identify whether the process has exec'd. Although a comm event is generated when an 'exec' happens it is also generated whenever the comm string is changed on a whim (e.g. by prctl PR_SET_NAME). This patch adds a flag to the comm event to differentiate one case from the other. In order to determine whether the kernel supports the new flag, a selection bit named 'exec' is added to struct perf_event_attr. The bit does nothing but will cause perf_event_open() to fail if the bit is set on kernels that do not have it defined. ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [tip:perf/core] perf: Differentiate exec() and non-exec() comm events 2014-05-22 6:52 ` Adrian Hunter 2014-05-22 7:04 ` Peter Zijlstra 2014-05-22 7:22 ` Peter Zijlstra @ 2014-06-06 12:19 ` tip-bot for Adrian Hunter 2 siblings, 0 replies; 12+ messages in thread From: tip-bot for Adrian Hunter @ 2014-06-06 12:19 UTC (permalink / raw) To: linux-tip-commits Cc: paulus, linux-kernel, hpa, mingo, torvalds, peterz, acme, davej, viro, jolsa, dsahern, adrian.hunter, tglx Commit-ID: 82b897782d10fcc4930c9d4a15b175348fdd2871 Gitweb: http://git.kernel.org/tip/82b897782d10fcc4930c9d4a15b175348fdd2871 Author: Adrian Hunter <adrian.hunter@intel.com> AuthorDate: Wed, 28 May 2014 11:45:04 +0300 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Fri, 6 Jun 2014 07:56:22 +0200 perf: Differentiate exec() and non-exec() comm events perf tools like 'perf report' can aggregate samples by comm strings, which generally works. However, there are other potential use-cases. For example, to pair up 'calls' with 'returns' accurately (from branch events like Intel BTS) it is necessary to identify whether the process has exec'd. Although a comm event is generated when an 'exec' happens it is also generated whenever the comm string is changed on a whim (e.g. by prctl PR_SET_NAME). This patch adds a flag to the comm event to differentiate one case from the other. In order to determine whether the kernel supports the new flag, a selection bit named 'exec' is added to struct perf_event_attr. The bit does nothing but will cause perf_event_open() to fail if the bit is set on kernels that do not have it defined. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com Cc: Paul Mackerras <paulus@samba.org> Cc: Dave Jones <davej@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> --- fs/exec.c | 6 +++--- include/linux/perf_event.h | 4 ++-- include/linux/sched.h | 6 +++++- include/uapi/linux/perf_event.h | 9 +++++++-- kernel/events/core.c | 4 ++-- 5 files changed, 19 insertions(+), 10 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index a038a41..a3d33fe 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1046,13 +1046,13 @@ EXPORT_SYMBOL_GPL(get_task_comm); * so that a new one can be started */ -void set_task_comm(struct task_struct *tsk, const char *buf) +void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) { task_lock(tsk); trace_task_rename(tsk, buf); strlcpy(tsk->comm, buf, sizeof(tsk->comm)); task_unlock(tsk); - perf_event_comm(tsk); + perf_event_comm(tsk, exec); } int flush_old_exec(struct linux_binprm * bprm) @@ -1111,7 +1111,7 @@ void setup_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, suid_dumpable); perf_event_exec(); - set_task_comm(current, kbasename(bprm->filename)); + __set_task_comm(current, kbasename(bprm->filename), true); /* Set the new mm task size. We have to do that late because it may * depend on TIF_32BIT which is only updated in flush_thread() on diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index b4c1d46..707617a 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -707,7 +707,7 @@ extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks * extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern void perf_event_exec(void); -extern void perf_event_comm(struct task_struct *tsk); +extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_fork(struct task_struct *tsk); /* Callchains */ @@ -815,7 +815,7 @@ static inline int perf_unregister_guest_info_callbacks static inline void perf_event_mmap(struct vm_area_struct *vma) { } static inline void perf_event_exec(void) { } -static inline void perf_event_comm(struct task_struct *tsk) { } +static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_fork(struct task_struct *tsk) { } static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void) { return -1; } diff --git a/include/linux/sched.h b/include/linux/sched.h index 221b2bd..ad86e1d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2379,7 +2379,11 @@ extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, i struct task_struct *fork_idle(int); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); -extern void set_task_comm(struct task_struct *tsk, const char *from); +extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec); +static inline void set_task_comm(struct task_struct *tsk, const char *from) +{ + __set_task_comm(tsk, from, false); +} extern char *get_task_comm(char *to, struct task_struct *tsk); #ifdef CONFIG_SMP diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index d9cd853..5312fae 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -302,8 +302,8 @@ struct perf_event_attr { exclude_callchain_kernel : 1, /* exclude kernel callchains */ exclude_callchain_user : 1, /* exclude user callchains */ mmap2 : 1, /* include mmap with inode data */ - - __reserved_1 : 40; + comm_exec : 1, /* flag comm events that are due to an exec */ + __reserved_1 : 39; union { __u32 wakeup_events; /* wakeup every n events */ @@ -502,7 +502,12 @@ struct perf_event_mmap_page { #define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0) #define PERF_RECORD_MISC_GUEST_USER (5 << 0) +/* + * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on + * different events so can reuse the same bit position. + */ #define PERF_RECORD_MISC_MMAP_DATA (1 << 13) +#define PERF_RECORD_MISC_COMM_EXEC (1 << 13) /* * Indicates that the content of PERF_SAMPLE_IP points to * the actual instruction that triggered the event. See also diff --git a/kernel/events/core.c b/kernel/events/core.c index 8fac205..7da5e56 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5090,7 +5090,7 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) NULL); } -void perf_event_comm(struct task_struct *task) +void perf_event_comm(struct task_struct *task, bool exec) { struct perf_comm_event comm_event; @@ -5104,7 +5104,7 @@ void perf_event_comm(struct task_struct *task) .event_id = { .header = { .type = PERF_RECORD_COMM, - .misc = 0, + .misc = exec ? PERF_RECORD_MISC_COMM_EXEC : 0, /* .size */ }, /* .pid */ ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [tip:perf/core] perf: Fix perf_event_comm() vs. exec() assumption 2014-05-21 15:32 ` Peter Zijlstra 2014-05-22 6:52 ` Adrian Hunter @ 2014-06-06 12:19 ` tip-bot for Peter Zijlstra 1 sibling, 0 replies; 12+ messages in thread From: tip-bot for Peter Zijlstra @ 2014-06-06 12:19 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, torvalds, peterz, acme, davej, viro, tglx Commit-ID: e041e328c4b41e1db79bfe5ba9992c2ed771ad19 Gitweb: http://git.kernel.org/tip/e041e328c4b41e1db79bfe5ba9992c2ed771ad19 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Wed, 21 May 2014 17:32:19 +0200 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Fri, 6 Jun 2014 07:54:02 +0200 perf: Fix perf_event_comm() vs. exec() assumption perf_event_comm() assumes that set_task_comm() is only called on exec(), and in particular that its only called on current. Neither are true, as Dave reported a WARN triggered by set_task_comm() being called on !current. Separate the exec() hook from the comm hook. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140521153219.GH5226@laptop.programming.kicks-ass.net [ Build fix. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> --- fs/exec.c | 1 + include/linux/perf_event.h | 4 +++- kernel/events/core.c | 28 ++++++++++++++++------------ 3 files changed, 20 insertions(+), 13 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 238b7aa..a038a41 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1110,6 +1110,7 @@ void setup_new_exec(struct linux_binprm * bprm) else set_dumpable(current->mm, suid_dumpable); + perf_event_exec(); set_task_comm(current, kbasename(bprm->filename)); /* Set the new mm task size. We have to do that late because it may diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 3ef6ea1..9b5cd19 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -695,6 +695,7 @@ extern struct perf_guest_info_callbacks *perf_guest_cbs; extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); +extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); @@ -772,7 +773,7 @@ extern void perf_event_enable(struct perf_event *event); extern void perf_event_disable(struct perf_event *event); extern int __perf_event_disable(void *info); extern void perf_event_task_tick(void); -#else +#else /* !CONFIG_PERF_EVENTS: */ static inline void perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { } @@ -802,6 +803,7 @@ static inline int perf_unregister_guest_info_callbacks (struct perf_guest_info_callbacks *callbacks) { return 0; } static inline void perf_event_mmap(struct vm_area_struct *vma) { } +static inline void perf_event_exec(void) { } static inline void perf_event_comm(struct task_struct *tsk) { } static inline void perf_event_fork(struct task_struct *tsk) { } static inline void perf_event_init(void) { } diff --git a/kernel/events/core.c b/kernel/events/core.c index 440eefc..647698f 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2970,6 +2970,22 @@ out: local_irq_restore(flags); } +void perf_event_exec(void) +{ + struct perf_event_context *ctx; + int ctxn; + + rcu_read_lock(); + for_each_task_context_nr(ctxn) { + ctx = current->perf_event_ctxp[ctxn]; + if (!ctx) + continue; + + perf_event_enable_on_exec(ctx); + } + rcu_read_unlock(); +} + /* * Cross CPU call to read the hardware event */ @@ -5057,18 +5073,6 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event) void perf_event_comm(struct task_struct *task) { struct perf_comm_event comm_event; - struct perf_event_context *ctx; - int ctxn; - - rcu_read_lock(); - for_each_task_context_nr(ctxn) { - ctx = task->perf_event_ctxp[ctxn]; - if (!ctx) - continue; - - perf_event_enable_on_exec(ctx); - } - rcu_read_unlock(); if (!atomic_read(&nr_comm_events)) return; ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-06-06 12:21 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-21 15:06 ye olde task_ctx_sched_out trace Dave Jones 2014-05-21 15:16 ` Peter Zijlstra 2014-05-21 15:30 ` Peter Zijlstra 2014-05-21 15:32 ` Peter Zijlstra 2014-05-22 6:52 ` Adrian Hunter 2014-05-22 7:04 ` Peter Zijlstra 2014-05-22 7:10 ` Adrian Hunter 2014-05-22 7:20 ` Peter Zijlstra 2014-05-22 7:22 ` Peter Zijlstra 2014-05-28 8:02 ` Adrian Hunter 2014-06-06 12:19 ` [tip:perf/core] perf: Differentiate exec() and non-exec() comm events tip-bot for Adrian Hunter 2014-06-06 12:19 ` [tip:perf/core] perf: Fix perf_event_comm() vs. exec() assumption tip-bot for Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).