From: Jiri Olsa <jolsa@redhat.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@elte.hu>
Cc: oleg@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value
Date: Thu, 24 Mar 2011 17:44:36 +0100 [thread overview]
Message-ID: <20110324164436.GC1930@jolsa.brq.redhat.com> (raw)
hi,
following program triggers panic in the current tip tree.
run it like: "while [ 1 ]; do ./ctx ; done"
--- reproducer
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
static int trace_event__id(const char *evname)
{
char *filename;
int err = -1, fd;
if (asprintf(&filename,
"/sys/kernel/debug/tracing/events/syscalls/%s/id",
evname) < 0)
return -1;
fd = open(filename, O_RDONLY);
if (fd >= 0) {
char id[16];
if (read(fd, id, sizeof(id)) > 0)
err = atoi(id);
close(fd);
}
free(filename);
return err;
}
static inline int
sys_perf_event_open(struct perf_event_attr *attr,
pid_t pid, int cpu, int group_fd,
unsigned long flags)
{
attr->size = sizeof(*attr);
return syscall(__NR_perf_event_open, attr, pid, cpu,
group_fd, flags);
}
int main(int argc, char **argv)
{
struct perf_event_attr attr;
#define CPU_MAX 2
int fd[CPU_MAX], cpu, i, l;
int id = trace_event__id("sys_enter_open");
if (id < 0) {
printf("is debugfs mounted on /sys/kernel/debug?\n");
return -1;
}
memset(&attr, 0, sizeof(attr));
attr.type = PERF_TYPE_TRACEPOINT;
attr.config = id;
for(l = 0; l < 5; l++) {
for(cpu = 0; cpu < CPU_MAX; cpu++)
fd[cpu] = sys_perf_event_open(&attr,
0, cpu, -1, 0);
for(i = 0; i < 1000; i++) {
close(open("krava", 0));
}
for(cpu = 0; cpu < CPU_MAX; cpu++)
close(fd[cpu]);
}
return 0;
}
---
--- panic msg
[ 157.737618] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 157.737618] IP: [<ffffffff81104e9b>] perf_ctx_adjust_freq+0x5b/0x130
[ 157.737618] PGD 7835c067 PUD 7be57067 PMD 0
[ 157.737618] Oops: 0000 [#1] SMP
[ 157.737618] last sysfs file: /sys/devices/system/cpu/online
[ 157.737618] CPU 1
[ 157.737618] Modules linked in:
[ 157.737618]
[ 157.737618] Pid: 2112, comm: perf Not tainted 2.6.38-tip+ #602 Intel Corporation Montevina platform/To be filled by O.E.M.
[ 157.737618] RIP: 0010:[<ffffffff81104e9b>] [<ffffffff81104e9b>] perf_ctx_adjust_freq+0x5b/0x130
[ 157.737618] RSP: 0018:ffff88007b483d68 EFLAGS: 00010087
[ 157.737618] RAX: 0000000000000000 RBX: ffff880077a811c0 RCX: ffffffff818b32ab
[ 157.737618] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff880077a811cc
[ 157.737618] RBP: ffff88007b483da8 R08: 0000000000000000 R09: 0000000000000000
[ 157.737618] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000f41a8
[ 157.737618] R13: fffffffffffffff0 R14: ffff880077a81210 R15: ffff88007b483d70
[ 157.737618] FS: 00007fba6834e720(0000) GS:ffff88007b480000(0000) knlGS:0000000000000000
[ 157.737618] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 157.737618] CR2: 0000000000000000 CR3: 000000007bf93000 CR4: 00000000000406e0
[ 157.737618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 157.737618] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 157.737618] Process perf (pid: 2112, threadinfo ffff88007be02000, task ffff880076088400)
[ 157.737618] Stack:
[ 157.737618] ffff88007b483da8 0000000000000000 0000000000000000 ffff88007b4940d0
[ 157.737618] ffff88007b48e578 ffff880077a811c0 ffff88007b4941a8 0000000000000000
[ 157.737618] ffff88007b483e18 ffffffff811050e7 ffff88007b491540 ffff880000000001
[ 157.737618] Call Trace:
[ 157.737618] <IRQ>
[ 157.737618] [<ffffffff811050e7>] perf_event_task_tick+0x177/0x330
[ 157.737618] [<ffffffff8107b408>] scheduler_tick+0xd8/0x300
[ 157.737618] [<ffffffff81091abe>] update_process_times+0x6e/0x90
[ 157.737618] [<ffffffff810b49a3>] tick_sched_timer+0x93/0xe0
[ 157.737618] [<ffffffff810a7ed1>] __run_hrtimer+0x121/0x250
[ 157.737618] [<ffffffff810b4910>] ? tick_sched_timer+0x0/0xe0
[ 157.737618] [<ffffffff810a832e>] hrtimer_interrupt+0xee/0x210
[ 157.737618] [<ffffffff81514f5b>] smp_apic_timer_interrupt+0x6b/0x9b
[ 157.737618] [<ffffffff81514393>] apic_timer_interrupt+0x13/0x20
[ 157.737618] <EOI>
[ 157.737618] [<ffffffff8108166b>] ? vprintk+0x22b/0x480
[ 157.737618] [<ffffffff810fe02a>] ? perf_event_read+0x11a/0x120
[ 157.737618] [<ffffffff81508226>] printk+0x41/0x43
[ 157.737618] [<ffffffff810ff5d4>] perf_release+0x34/0xe0
[ 157.737618] [<ffffffff8114e644>] fput+0x124/0x2d0
[ 157.737618] [<ffffffff8114a66b>] filp_close+0x7b/0xd0
[ 157.737618] [<ffffffff8114a75d>] sys_close+0x9d/0x130
[ 157.737618] [<ffffffff81513c02>] tracesys+0xd0/0xd5
[ 157.737618] Code: 48 8d 7b 0c e8 d7 66 40 00 48 8b 43 50 48 89 45 c8 4c 8b 6d c8 49 83 ed 10 eb 10 0f 1f 44 00 00 48 89 45 c8 4d 8b 2f 49 83 ed 10
[ 157.737618] 8b 45 10 49 8d 55 10 49 39 d6 0f 18 08 74 55 41 83 7d 58 01
[ 157.737618] RIP [<ffffffff81104e9b>] perf_ctx_adjust_freq+0x5b/0x130
[ 157.737618] RSP <ffff88007b483d68>
[ 157.737618] CR2: 0000000000000000
[ 157.737618] ---[ end trace 7175df90a55db905 ]---
---
The reason is task_ctx (struct perf_cpu_context) is holding an invalid
pointer inside perf_rotate_context function.
I think I found the reason and can workaround the issue with attached
patch, but I dont think it's proper solution, and I'm not sure how to fix
this properly ;)
Now here's what I think is happening..
- once an event is created by sys_perf_event_open, task context
is created and it stays even if the event is closed, until the task
is finished ... thats what I see in code and I assume it's correct
- when the task opens event, perf_sched_events jump label is incremented
and following callbacks are started from scheduler
__perf_event_task_sched_in
__perf_event_task_sched_out
These callback *in/out set/unset cpuctx->task_ctx value to the task
context.
- close is called on event on CPU 0:
- the task is scheduled on CPU 0
- __perf_event_task_sched_in is called
- cpuctx->task_ctx is set
- perf_sched_events jump label is decremented and == 0
- __perf_event_task_sched_out is not called
- cpuctx->task_ctx on CPU 0 stays set
- exit is called on CPU 1:
- the task is scheduled on CPU 1
- perf_event_exit_task is called
- task_ctx_sched_out unsets cpuctx->task_ctx on CPU 1
- put_ctx destroys the context
- another call of perf_rotate_context on CPU 0 will use invalid
task_ctx pointer, and eventualy panic
The attached workaround makes sure that the task_ctx is not set
when the context is being removed. As I said it's not ment to be
fix.
any ideas?
thanks,
jirka
---
kernel/perf_event.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index c75925c..25baf13 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -596,6 +596,19 @@ static void free_ctx(struct rcu_head *head)
static void put_ctx(struct perf_event_context *ctx)
{
if (atomic_dec_and_test(&ctx->refcount)) {
+
+ if (!atomic_read(&perf_sched_events)) {
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = per_cpu_ptr(ctx->pmu->pmu_cpu_context, cpu);
+ if (cpuctx->task_ctx == ctx)
+ cpuctx->task_ctx = NULL;
+ }
+ }
+
if (ctx->parent_ctx)
put_ctx(ctx->parent_ctx);
if (ctx->task)
--
1.7.1
next reply other threads:[~2011-03-24 16:44 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-24 16:44 Jiri Olsa [this message]
2011-03-25 19:10 ` [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Oleg Nesterov
2011-03-26 15:37 ` Peter Zijlstra
2011-03-26 16:13 ` Oleg Nesterov
2011-03-26 16:38 ` Peter Zijlstra
2011-03-26 17:09 ` Oleg Nesterov
2011-03-26 17:35 ` Oleg Nesterov
2011-03-26 18:29 ` Peter Zijlstra
2011-03-26 18:49 ` Oleg Nesterov
2011-03-28 13:30 ` Oleg Nesterov
2011-03-28 14:57 ` Peter Zijlstra
2011-03-28 15:00 ` Peter Zijlstra
2011-03-28 15:15 ` Oleg Nesterov
2011-03-28 16:27 ` Peter Zijlstra
2011-03-28 15:39 ` Oleg Nesterov
2011-03-28 15:49 ` Peter Zijlstra
2011-03-28 16:56 ` Oleg Nesterov
2011-03-29 8:32 ` Peter Zijlstra
2011-03-29 10:49 ` Peter Zijlstra
2011-03-29 16:28 ` Oleg Nesterov
2011-03-29 19:01 ` Peter Zijlstra
2011-03-30 13:09 ` Jiri Olsa
2011-03-30 14:51 ` Peter Zijlstra
2011-03-30 16:37 ` Oleg Nesterov
2011-03-30 18:30 ` Paul E. McKenney
2011-03-30 19:53 ` Oleg Nesterov
2011-03-30 21:26 ` Peter Zijlstra
2011-03-30 21:35 ` Oleg Nesterov
2011-03-31 10:32 ` Jiri Olsa
2011-03-31 12:41 ` [tip:perf/urgent] perf: Fix task context scheduling tip-bot for Peter Zijlstra
2011-03-31 13:28 ` [PATCH,RFC] perf: panic due to inclied cpu context task_ctx value Oleg Nesterov
2011-03-31 13:51 ` Peter Zijlstra
2011-03-31 14:10 ` Oleg Nesterov
2011-04-04 16:20 ` Oleg Nesterov
2011-03-30 15:32 ` Oleg Nesterov
2011-03-30 15:40 ` Peter Zijlstra
2011-03-30 15:52 ` Oleg Nesterov
2011-03-30 15:57 ` Peter Zijlstra
2011-03-30 16:11 ` Peter Zijlstra
2011-03-30 17:13 ` Oleg Nesterov
2011-03-26 17:09 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110324164436.GC1930@jolsa.brq.redhat.com \
--to=jolsa@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.