All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maneesh Soni <maneesh@in.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH][GIT PULL] tracing/wakeup: move access to wakeup_cpu into spinlock
Date: Fri, 27 Mar 2009 18:15:32 +0530	[thread overview]
Message-ID: <20090327124532.GA9963@in.ibm.com> (raw)
In-Reply-To: <20090327084144.GA23318@in.ibm.com>

On Fri, Mar 27, 2009 at 02:11:44PM +0530, Maneesh Soni wrote:
> On Thu, Mar 26, 2009 at 10:40:47AM -0400, Steven Rostedt wrote:
> > 
> > Ingo,
> > 
> > I believe this is the fix for the oops that Maneesh saw. What he explains 
> > he did sounds like it would trigger the race. Reading the trace output
> > resets wakeup_task to NULL and wakeup_cpu to -1. And he hit the bug by 
> > just reading the trace in a while loop.
> > 
> > Please pull the latest tip/tracing/ftrace-1 tree, which can be found at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> > tip/tracing/ftrace-1
> > 
> > 
> > Steven Rostedt (1):
> >       tracing/wakeup: move access to wakeup_cpu into spinlock
> > 
> > ----
> >  kernel/trace/trace_sched_wakeup.c |    6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> > ---------------------------
> > commit 8ac070ca952d14c57e3d772e203fd117161cc596
> > Author: Steven Rostedt <srostedt@redhat.com>
> > Date:   Thu Mar 26 10:25:24 2009 -0400
> > 
> >     tracing/wakeup: move access to wakeup_cpu into spinlock
> >     
> >     Impact: fix for race condition
> >     
> >     The code had the following outside the lock:
> >     
> >             if (next != wakeup_task)
> >                     return;
> >     
> >             pc = preempt_count();
> >     
> >             /* The task we are waiting for is waking up */
> >             data = wakeup_trace->data[wakeup_cpu];
> >     
> >     On initialization, wakeup_task is NULL and wakeup_cpu -1. This code
> >     is not under a lock. If wakeup_task is set on another CPU as that
> >     task is waking up, we can see the wakeup_task before wakeup_cpu is
> >     set. If we read wakeup_cpu while it is still -1 then we will have
> >     a bad data pointer.
> >     
> >     This patch moves the reading of wakeup_cpu within the protection of
> >     the spinlock used to protect the writing of wakeup_cpu and wakeup_task.
> >     
> >     Reported-by: Maneesh Soni <maneesh@in.ibm.com>
> >     Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> > 
> > diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> > index 3c5ad6b..9e4ce4c 100644
> > --- a/kernel/trace/trace_sched_wakeup.c
> > +++ b/kernel/trace/trace_sched_wakeup.c
> > @@ -138,9 +138,6 @@ probe_wakeup_sched_switch(struct rq *rq, struct task_struct *prev,
> > 
> >  	pc = preempt_count();
> > 
> > -	/* The task we are waiting for is waking up */
> > -	data = wakeup_trace->data[wakeup_cpu];
> > -
> >  	/* disable local data, not wakeup_cpu data */
> >  	cpu = raw_smp_processor_id();
> >  	disabled = atomic_inc_return(&wakeup_trace->data[cpu]->disabled);
> > @@ -154,6 +151,9 @@ probe_wakeup_sched_switch(struct rq *rq, struct task_struct *prev,
> >  	if (unlikely(!tracer_enabled || next != wakeup_task))
> >  		goto out_unlock;
> > 
> > +	/* The task we are waiting for is waking up */
> > +	data = wakeup_trace->data[wakeup_cpu];
> > +
> >  	trace_function(wakeup_trace, CALLER_ADDR1, CALLER_ADDR2, flags, pc);
> >  	tracing_sched_switch_trace(wakeup_trace, prev, next, flags, pc);
> > 
> > 
> 
> Hi Steven,
> 
> I still the oops with the patch,
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff802d2259>] probe_wakeup_sched_switch+0xd2/0x1ac
> PGD 1d68e9067 PUD 1e6cdf067 PMD 0 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/pci0000:0c/0000:0c:00.0/irq
> CPU 5 
> Modules linked in: autofs4 hidp rfcomm l2cap bluetooth iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod sbs sbshc battery ac parport_pc lp parport sg sr_mod ide_cd_mod cdrom serio_raw acpi_memhotplug button tg3 libphy i2c_piix4 i2c_core pcspkr usb_storage uhci_hcd ohci_hcd ehci_hcd aacraid sd_mod scsi_mod ext3 jbd
> Pid: 6095, comm: cat Not tainted 2.6.29-tip-test #2 eserver xSeries 366-[88632RA]-
> RIP: 0010:[<ffffffff802d2259>]  [<ffffffff802d2259>] probe_wakeup_sched_switch+0xd2/0x1ac
> RSP: 0000:ffff8801e6d03ea0  EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff8800281989a0 RCX: 0000000000000046
> RDX: ffff8800281989a0 RSI: ffffffff8020d56d RDI: ffffffff811de480
> RBP: ffff8801e6d03ed0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000046
> R13: ffff88022f232820 R14: 0000000000000000 R15: 0000000000000005
> FS:  00007f98b94cb6e0(0000) GS:ffff88002818b000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 0000000224595000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process cat (pid: 6095, threadinfo ffff8801e6d02000, task ffff8801d6935040)
> Stack:
>  ffff8801d6935040 ffff880224981230 ffff88002819dd40 0000003b4ae00aa0
>  00007fffc14df830 ffff88002819dd40 ffff8801e6d03f70 ffffffff806f8264
>  0000000000000002 ffff8801e6d03f48 ffff8801d6935040 ffff88022f232820
> Call Trace:
>  [<ffffffff806f8264>] schedule+0xd99/0x10f9
>  [<ffffffff806fb61b>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>  [<ffffffff8027d2cb>] ? delayed_work_timer_fn+0x0/0x38
>  [<ffffffff8020d56d>] retint_careful+0x12/0x2e
> Code: 2d ad de f2 00 75 77 48 63 05 ac de f2 00 48 8b 4d 00 45 89 f0 48 8b 3d 8e de f2 00 48 8b 71 08 48 8b 5c c7 28 48 8b 01 4c 89 e1 <48> 8b 50 08 e8 f6 9f ff ff 48 8b 3d 6f de f2 00 48 8b 75 d0 45 
> RIP  [<ffffffff802d2259>] probe_wakeup_sched_switch+0xd2/0x1ac
>  RSP <ffff8801e6d03ea0>
> CR2: 0000000000000008
> 

I could get a kdump in my next attempt. Let me know if you want me to upload the dump
somewhere. (700 MB compressed)

Some info from the dump using gdb & crash-utility.

(gdb) bt full
#0  0xffffffff802d2259 in probe_wakeup_sched_switch (rq=<value optimized out>, 
    prev=Cannot access memory at address 0xffff8801d05c7ea0
) at kernel/trace/trace_sched_wakeup.c:157
	data = (struct trace_array_cpu *) 0xffff8800281629a0
	T0 = <value optimized out>
	T1 = <value optimized out>
	delta = <value optimized out>
	flags = 70
	pc = 0
Cannot access memory at address 0xffff8801d05c7ed8
(gdb) print wakeup_trace
$7 = (struct trace_array *) 0xffffffff811de480
(gdb) print *(struct trace_array *) 0xffffffff811de480
$8 = {buffer = 0xffff88022f090440, entries = 1441792, cpu = 0, 
  time_start = 437360893965, waiter = 0x0, data = {0xffff8800281119a0, 
    0xffff88002812c9a0, 0xffff8800281479a0, 0xffff8800281629a0, 
    0xffff88002817d9a0, 0xffff8800281989a0, 0xffff8800281b39a0, 
    0xffff8800281ce9a0, 0x0 <repeats 247 times>}}
(gdb) print tracer_enabled
$9 = 1
(gdb) print wakeup_cpu
$10 = 3
(gdb) info threads
  8 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
  7 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
  6 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
  5 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
* 4 process 5773  0xffffffff802d2259 in probe_wakeup_sched_switch (
    rq=<value optimized out>, prev=Cannot access memory at address 0xffff8801d05c7ea0
) at kernel/trace/trace_sched_wakeup.c:157
  3 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
  2 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
  1 process 0  0xffffffff80217024 in mwait_idle ()
    at /home/maneesh/linux-2.6-tip/arch/x86/include/asm/processor.h:732
(gdb) info registers
rax            0x0	0
rbx            0xffff8800281629a0	-131940722792032
rcx            0x46	70
rdx            0xffff8800281629a0	-131940722792032
rsi            0xffffffff8020d56d	-2145331859
rdi            0xffffffff811de480	-2128747392
rbp            0xffff8801d05c7ed0	0xffff8801d05c7ed0
rsp            0xffff8801d05c7ea0	0xffff8801d05c7ea0
r8             0x0	0
r9             0x0	0
r10            0xffff8801d05c7eb0	-131933604643152
r11            0x1	1
r12            0x46	70
r13            0xffff88022f21d820	-131932014651360
r14            0x0	0
r15            0x3	3
rip            0xffffffff802d2259	0xffffffff802d2259 <probe_wakeup_sched_switch+210>
eflags         0x10046	[ PF ZF RF ]
cs             0x10	16
ss             0x0	0
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
fctrl          0x0	0
fstat          0x0	0
ftag           0x0	0
fiseg          0x0	0
fioff          0x0	0
foseg          0x0	0
fooff          0x0	0
fop            0x0	0
mxcsr          0x0	[ ]
(gdb) 
...
...
0xffffffff802d2208 <probe_wakeup_sched_switch+129>:	lock xadd %ax,0xf2debf(%rip)        # 0xffffffff812000d0 <wakeup_lock>
0xffffffff802d2211 <probe_wakeup_sched_switch+138>:	cmp    %ah,%al
0xffffffff802d2213 <probe_wakeup_sched_switch+140>:	je     0xffffffff802d221f <probe_wakeup_sched_switch+152>
0xffffffff802d2215 <probe_wakeup_sched_switch+142>:	pause  
0xffffffff802d2217 <probe_wakeup_sched_switch+144>:	mov    0xf2deb3(%rip),%al        # 0xffffffff812000d0 <wakeup_lock>
0xffffffff802d221d <probe_wakeup_sched_switch+150>:	jmp    0xffffffff802d2211 <probe_wakeup_sched_switch+138>
0xffffffff802d221f <probe_wakeup_sched_switch+152>:	cmpl   $0x0,0x9fb02a(%rip)        # 0xffffffff80ccd250 <tracer_enabled>
0xffffffff802d2226 <probe_wakeup_sched_switch+159>:	je     0xffffffff802d22ac <probe_wakeup_sched_switch+293>
0xffffffff802d222c <probe_wakeup_sched_switch+165>:	cmp    0xf2dead(%rip),%r13        # 0xffffffff812000e0 <wakeup_task>
0xffffffff802d2233 <probe_wakeup_sched_switch+172>:	jne    0xffffffff802d22ac <probe_wakeup_sched_switch+293>
0xffffffff802d2235 <probe_wakeup_sched_switch+174>:	movslq 0xf2deac(%rip),%rax        # 0xffffffff812000e8 <wakeup_cpu>
0xffffffff802d223c <probe_wakeup_sched_switch+181>:	mov    0x0(%rbp),%rcx
0xffffffff802d2240 <probe_wakeup_sched_switch+185>:	mov    %r14d,%r8d
0xffffffff802d2243 <probe_wakeup_sched_switch+188>:	mov    0xf2de8e(%rip),%rdi        # 0xffffffff812000d8 <wakeup_trace>
0xffffffff802d224a <probe_wakeup_sched_switch+195>:	mov    0x8(%rcx),%rsi
0xffffffff802d224e <probe_wakeup_sched_switch+199>:	mov    0x28(%rdi,%rax,8),%rbx
0xffffffff802d2253 <probe_wakeup_sched_switch+204>:	mov    (%rcx),%rax
0xffffffff802d2256 <probe_wakeup_sched_switch+207>:	mov    %r12,%rcx
0xffffffff802d2259 <probe_wakeup_sched_switch+210>:	mov    0x8(%rax),%rdx   <<<<<<<<<===================== Here.
0xffffffff802d225d <probe_wakeup_sched_switch+214>:	callq  0xffffffff802cc258 <trace_function>
0xffffffff802d2262 <probe_wakeup_sched_switch+219>:	mov    0xf2de6f(%rip),%rdi        # 0xffffffff812000d8 <wakeup_trace>
0xffffffff802d2269 <probe_wakeup_sched_switch+226>:	mov    -0x30(%rbp),%rsi
0xffffffff802d226d <probe_wakeup_sched_switch+230>:	mov    %r14d,%r8d
0xffffffff802d2270 <probe_wakeup_sched_switch+233>:	mov    %r13,%rdx
0xffffffff802d2273 <probe_wakeup_sched_switch+236>:	mov    %r12,%rcx
0xffffffff802d2276 <probe_wakeup_sched_switch+239>:	callq  0xffffffff802cd5b---Type <return> to continue, or q <return> to quit---
2 <tracing_sched_switch_trace>


Some stack from rbp

crash> rd 0xffff8801d05c7ed0 50
ffff8801d05c7ed0:  ffff8801d05c7f70 ffffffff806f8264   p.\.....d.o.....
ffff8801d05c7ee0:  0000000000000002 ffff8801d05c7f48   ........H.\.....
ffff8801d05c7ef0:  ffff8801d043d040 ffff88022f21d820   @.C..... .!/....
ffff8801d05c7f00:  ffff8801d043d3e8 0000000300000000   ..C.............
ffff8801d05c7f10:  00007fff656c34d0 0000000000000000   .4le............
ffff8801d05c7f20:  0000000000000000 0000000000000000   ................
ffff8801d05c7f30:  ffffffff806fb61b 00000000000cf7bd   ..o.............
ffff8801d05c7f40:  0000000000000857 00000000008c3fc0   W........?......
ffff8801d05c7f50:  0000000000000000 00007fff656c34d0   .........4le....
ffff8801d05c7f60:  0000000000000000 0000000000000000   ................
ffff8801d05c7f70:  0000000000000000 ffffffff8020d56d   ........m. .....
ffff8801d05c7f80:  000000000000fe2e 0000000000000202   ................
ffff8801d05c7f90:  0000003b4b151a30 0000000000000003   0..K;...........
ffff8801d05c7fa0:  0000000000000004 00000000008c3fc0   .........?......
ffff8801d05c7fb0:  00000000008c30b0 0000000000000000   .0..............
ffff8801d05c7fc0:  00007f8e5a102de8 0000000000000000   .-.Z............
ffff8801d05c7fd0:  ffffffffffffffff 00000000004533f6   .........3E.....
ffff8801d05c7fe0:  0000000000000033 0000000000010206   3...............
ffff8801d05c7ff0:  00007fff656c3270 000000000000002b   p2le....+.......
ffff8801d05c8000:  0000000000000000 0000000000000000   ................

and some relvant text addresses
crash> sym ffffffff806f8264
ffffffff806f8264 (T) schedule+3481  include/trace/sched_event_types.h: 149
crash> sym ffffffff8020d56d
ffffffff8020d56d (t) retint_careful+18  include/linux/kdev_t.h: 52
crash> sym ffffffff806fb61b
ffffffff806fb61b (T) trace_hardirqs_on_thunk+58  include/linux/hrtimer.h: 199
crash> sym ffff880028167d40

It seems CALLER_ADDR is not correct? 

Thanks
Maneesh


-- 
Maneesh Soni
Linux Technology Center
IBM India Systems and Technology Lab,
Bangalore, India.

  reply	other threads:[~2009-03-27 12:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-26 12:28 [-tip] ftrace: oops at probe_wakeup_sched_switch+0xd1/0x1ae Maneesh Soni
2009-03-26 13:27 ` Frederic Weisbecker
2009-03-26 13:49   ` Steven Rostedt
2009-03-26 13:59     ` Steven Rostedt
2009-03-26 14:40     ` [PATCH][GIT PULL] tracing/wakeup: move access to wakeup_cpu into spinlock Steven Rostedt
2009-03-27  8:41       ` Maneesh Soni
2009-03-27 12:45         ` Maneesh Soni [this message]
2009-03-27 13:15           ` Steven Rostedt
2009-03-27 13:41             ` Frederic Weisbecker
2009-03-27 14:28               ` Steven Rostedt
2009-03-27 17:41             ` Maneesh Soni
2009-04-01 23:42           ` Steven Rostedt
2009-04-02  6:02             ` Maneesh Soni
2009-04-02 13:18               ` Steven Rostedt
2009-04-07 10:02                 ` Maneesh Soni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090327124532.GA9963@in.ibm.com \
    --to=maneesh@in.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.