* 2.6.31-rc6: Oops in fsnotify
@ 2009-08-20 15:48 Grant Wilson
2009-08-21 2:22 ` Eric Paris
0 siblings, 1 reply; 6+ messages in thread
From: Grant Wilson @ 2009-08-20 15:48 UTC
To: lkml
Hi,
Hit this oops while web-surfing:
[485211.962472] BUG: unable to handle kernel NULL pointer dereference at (null)
[485211.962488] IP: [<ffffffff8111bf93>] fsnotify+0x93/0x150
[485211.962503] PGD 120c58067 PUD 120c3c067 PMD 0
[485211.962518] Oops: 0000 [#1] PREEMPT SMP
[485211.962537] last sysfs file: /sys/devices/platform/coretemp.3/temp1_input
[485211.962544] CPU 0
[485211.962553] Modules linked in:
[485211.962614] Pid: 4889, comm: konqueror Not tainted 2.6.31-rc6 #1 Maximus Formula
[485211.962619] RIP: 0010:[<ffffffff8111bf93>] [<ffffffff8111bf93>] fsnotify+0x93/0x150
[485211.962628] RSP: 0018:ffff880120c89d88 EFLAGS: 00010246
[485211.962633] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[485211.962638] RDX: ffff88012b8aadcc RSI: 0000000000000000 RDI: ffff88012b8aadc8
[485211.962643] RBP: ffff880120c89df8 R08: 0000000000000000 R09: 0000000000000001
[485211.962648] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000200
[485211.962653] R13: 0000000008000200 R14: ffff8801286bf760 R15: 0000000000000000
[485211.962658] FS: 00007f10fd749750(0000) GS:ffff88002f000000(0000) knlGS:0000000000000000
[485211.962663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[485211.962668] CR2: 0000000000000000 CR3: 0000000120c59000 CR4: 00000000000026f0
[485211.962673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[485211.962679] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[485211.962684] Process konqueror (pid: 4889, threadinfo ffff880120c88000, task ffff880120c80000)
[485211.962689] Stack:
[485211.962693] ffff8801000000d0 0000000000000246 ffff880120c89dd8 0000000000000246
[485211.962706] <0> ffff88004a0295e0 000000024a029508 0000000000000000 ffff8800634d3be0
[485211.962723] <0> 0000000000000200 ffff88012278f800 ffff8801286bf760 ffff88004a029500
[485211.962742] Call Trace:
[485211.962749] [<ffffffff8111c232>] __fsnotify_parent+0xb2/0x110
[485211.962757] [<ffffffff810fee4f>] d_delete+0xaf/0xe0
[485211.962763] [<ffffffff810f51e4>] vfs_unlink+0xe4/0xf0
[485211.962770] [<ffffffff810f75ab>] do_unlinkat+0x18b/0x1c0
[485211.962778] [<ffffffff8102e5fa>] ? sysret_check+0x2e/0x69
[485211.962786] [<ffffffff81095d6d>] ? trace_hardirqs_on_caller+0x14d/0x1a0
[485211.962795] [<ffffffff815849c6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[485211.962897] [<ffffffff810f75f1>] sys_unlink+0x11/0x20
[485211.962904] [<ffffffff8102e5c2>] system_call_fastpath+0x16/0x1b
[485211.962909] Code: f0 4c 8b 7d f8 c9 c3 90 48 c7 c7 60 b3 3f 82 e8 94 ce f6 ff 48 8b 1d 3d b5 6d 00 89 45 ac 48 c7 45 c8 00 00 00 00 eb 03 48 8b 1b <48> 8b 03 48 81 fb c0 74 7f 81 0f 18 08 74 36 44 85 63 10 74 e8
[485211.963003] RIP [<ffffffff8111bf93>] fsnotify+0x93/0x150
[485211.963003] RSP <ffff880120c89d88>
[485211.963003] CR2: 0000000000000000
[485211.964056] ---[ end trace 28e6054427343f5a ]---
Current config available at http://swandive.no-ip.com/linux/config
Cheers,
Grant
* Re: 2.6.31-rc6: Oops in fsnotify
2009-08-20 15:48 2.6.31-rc6: Oops in fsnotify Grant Wilson
@ 2009-08-21 2:22 ` Eric Paris
2009-08-21 6:25 ` Grant Wilson
0 siblings, 1 reply; 6+ messages in thread
From: Eric Paris @ 2009-08-21 2:22 UTC
To: Grant Wilson, aviro; +Cc: lkml, eparis
I'll take a close look in the morning. I don't offhand see how this
is possible without calling vfs_unlink on a negative dentry (does that
even make sense?)
What was the filesystem you are dealing with? Odd, very, odd (as with
all the problems people find in my notify code)
-Eric
* Re: 2.6.31-rc6: Oops in fsnotify
2009-08-21 2:22 ` Eric Paris
@ 2009-08-21 6:25 ` Grant Wilson
2009-08-22 17:11 ` Eric Paris
2009-08-22 22:27 ` Eric Paris
0 siblings, 2 replies; 6+ messages in thread
From: Grant Wilson @ 2009-08-21 6:25 UTC
To: Eric Paris; +Cc: aviro, lkml, eparis
On Thu, 20 Aug 2009 22:22:34 -0400
Eric Paris <eparis@parisplace.org> wrote:
> I'll take a close look in the morning. I don't offhand see how this
> is possible without calling vfs_unlink on a negative dentry (does that
> even make sense?)
>
> What was the filesystem you are dealing with? Odd, very, odd (as with
> all the problems people find in my notify code)
>
The filesystem is ext4 (converted from ext3).
Grant
* Re: 2.6.31-rc6: Oops in fsnotify
2009-08-21 6:25 ` Grant Wilson
@ 2009-08-22 17:11 ` Eric Paris
2009-08-22 22:27 ` Eric Paris
1 sibling, 0 replies; 6+ messages in thread
From: Eric Paris @ 2009-08-22 17:11 UTC
To: Grant Wilson; +Cc: Eric Paris, aviro, lkml
On Fri, 2009-08-21 at 07:25 +0100, Grant Wilson wrote:
> On Thu, 20 Aug 2009 22:22:34 -0400
> Eric Paris <eparis@parisplace.org> wrote:
>
> > I'll take a close look in the morning. I don't offhand see how this
> > is possible without calling vfs_unlink on a negative dentry (does that
> > even make sense?)
> >
> > What was the filesystem you are dealing with? Odd, very, odd (as with
> > all the problems people find in my notify code)
Do you still have the vmlinux file you built? Would you mind sending it
to me along with your .config? It's actually quite normal to get a
negative dentry down in that code and it shouldn't cause a problem. I
don't see any likely candidates, but I would like to try to figure out
exactly where in fsnotify it hit that NULL....
-Eric
* Re: 2.6.31-rc6: Oops in fsnotify
2009-08-21 6:25 ` Grant Wilson
2009-08-22 17:11 ` Eric Paris
@ 2009-08-22 22:27 ` Eric Paris
2009-08-28 15:32 ` Paul E. McKenney
1 sibling, 1 reply; 6+ messages in thread
From: Eric Paris @ 2009-08-22 22:27 UTC
To: Grant Wilson; +Cc: Eric Paris, aviro, lkml
On Fri, 2009-08-21 at 07:25 +0100, Grant Wilson wrote:
> On Thu, 20 Aug 2009 22:22:34 -0400
> Eric Paris <eparis@parisplace.org> wrote:
>
> > I'll take a close look in the morning. I don't offhand see how this
> > is possible without calling vfs_unlink on a negative dentry (does that
> > even make sense?)
> >
> > What was the filesystem you are dealing with? Odd, very, odd (as with
> > all the problems people find in my notify code)
> >
> The filesystem is ext4 (converted from ext3).
I got the assembly from Grant. It's actually very easy to map it back to
the code, but I don't see any problems!
RIP: 0010:[<ffffffff8111bf93>] [<ffffffff8111bf93>] fsnotify+0x93/0x150
RSP: 0018:ffff880120c89d88 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88012b8aadcc RSI: 0000000000000000 RDI: ffff88012b8aadc8
RBP: ffff880120c89df8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000200
R13: 0000000008000200 R14: ffff8801286bf760 R15: 0000000000000000
FS: 00007f10fd749750(0000) GS:ffff88002f000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000120c59000 CR4: 00000000000026f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
ffffffff8111bf77: e8 94 ce f6 ff callq ffffffff81088e10 <srcu_read_lock>
ffffffff8111bf7c: 48 8b 1d 3d b5 6d 00 mov 0x6db53d(%rip),%rbx # ffffffff817f74c0 <fsnotify_groups>
ffffffff8111bf83: 89 45 ac mov %eax,-0x54(%rbp)
ffffffff8111bf86: 48 c7 45 c8 00 00 00 movq $0x0,-0x38(%rbp)
ffffffff8111bf8d: 00
ffffffff8111bf8e: eb 03 jmp ffffffff8111bf93 <fsnotify+0x93>
ffffffff8111bf90: 48 8b 1b mov (%rbx),%rbx
ffffffff8111bf93: 48 8b 03 mov (%rbx),%rax <----------------------IP IS HERE
ffffffff8111bf96: 48 81 fb c0 74 7f 81 cmp $0xffffffff817f74c0,%rbx
ffffffff8111bf9d: 0f 18 08 prefetcht0 (%rax)
ffffffff8111bfa0: 74 36 je ffffffff8111bfd8 <fsnotify+0xd8>
ffffffff8111bfa2: 44 85 63 10 test %r12d,0x10(%rbx) <--- [if (test_mask &group->mask)]
ffffffff8111bfa6: 74 e8 je ffffffff8111bf90 <fsnotify+0x90>
ffffffff8111bfa8: 48 8b 43 20 mov 0x20(%rbx),%rax
ffffffff8111bfac: 44 89 ea mov %r13d,%edx
ffffffff8111bfaf: 4c 89 f6 mov %r14,%rsi
ffffffff8111bfb2: 48 89 df mov %rbx,%rdi
ffffffff8111bfb5: ff 10 callq *(%rax) <-------------- [group->ops->should_send_event()]
ffffffff8111bfb7: 84 c0 test %al,%al
ffffffff8111bfb9: 74 d5 je ffffffff8111bf90 <fsnotify+0x90>
ffffffff8111bfbb: 48 83 7d c8 00 cmpq $0x0,-0x38(%rbp)
ffffffff8111bfc0: 74 46 je ffffffff8111c008 <fsnotify+0x108> <- [fsnotify_create_event()]
ffffffff8111bfc2: 48 8b 43 20 mov 0x20(%rbx),%rax
ffffffff8111bfc6: 48 8b 75 c8 mov -0x38(%rbp),%rsi
ffffffff8111bfca: 48 89 df mov %rbx,%rdi
ffffffff8111bfcd: ff 50 08 callq *0x8(%rax) <----------- [group->ops->handle_event()]
ffffffff8111bfd0: eb be jmp ffffffff8111bf90 <fsnotify+0x90>
ffffffff8111bfd2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
ffffffff8111bfd8: 8b 75 ac mov -0x54(%rbp),%esi
ffffffff8111bfdb: 48 c7 c7 60 b3 3f 82 mov $0xffffffff823fb360,%rdi
ffffffff8111bfe2: e8 c9 cd f6 ff callq ffffffff81088db0 <srcu_read_unlock>
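The addresses in the annotated listing above can be cross-checked with a little arithmetic. What follows is only a sketch: the helper names `rel_target()` and `func_base()` are made up for illustration, and the displacements are read off the instruction bytes shown in the disassembly. On x86-64, a RIP-relative operand or a relative branch is resolved against the address of the *next* instruction.

```c
#include <stdint.h>

/*
 * Target of a RIP-relative reference or relative jump/call:
 * address of the instruction, plus its length, plus the signed
 * displacement encoded in the instruction.
 * (Hypothetical helper, not a kernel or binutils function.)
 */
uint64_t rel_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Start of a function given an address reported as "symbol+offset". */
uint64_t func_base(uint64_t addr, uint64_t offset)
{
    return addr - offset;
}
```

With these, `rel_target(0xffffffff8111bf7c, 7, 0x6db53d)` gives ffffffff817f74c0 (the fsnotify_groups list head loaded into %rbx), `rel_target(0xffffffff8111bf8e, 2, 3)` gives ffffffff8111bf93 (the faulting instruction, reached on the first pass through the loop), and `func_base(0xffffffff8111bf93, 0x93)` gives ffffffff8111bf00 as the start of fsnotify().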
Here is the code segment from fsnotify():
    idx = srcu_read_lock(&fsnotify_grp_srcu);
    list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
        if (test_mask & group->mask) {
            if (!group->ops->should_send_event(group, to_tell, mask))
                continue;
            if (!event) {
                event = fsnotify_create_event(to_tell, mask, data,
                                              data_is, file_name, cookie,
                                              GFP_KERNEL);
                /* shit, we OOM'd and now we can't tell, maybe
                 * someday someone else will want to do something
                 * here */
                if (!event)
                    break;
            }
            group->ops->handle_event(group, event);
        }
    }
    srcu_read_unlock(&fsnotify_grp_srcu, idx);
So the IP is clearly inside the list_for_each_entry_rcu() loop. This means
that somehow a ->next pointer is NULL. The anchor for this list
(fsnotify_groups) is declared as LIST_HEAD(fsnotify_groups); so it can't
point to NULL. So I have to look at things that can get added/removed.
fsnotify_obtain_group() does the only addition (using list_add_rcu())
while holding the fsnotify_grp_mutex. __fsnotify_evict_group() does a
list_del_rcu() (which doesn't change the forward pointer) and also holds
the correct mutex. These are the only 2 manipulation sites for objects
on the list and they are clearly protected by the fsnotify_grp_mutex.
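To make the "doesn't change the forward pointer" point concrete, here is a minimal userspace sketch of a circular doubly linked list. It is not the kernel's list.h: the names `struct node`, `add_front()`, `del_keep_next()` and `steps_to_head()` are hypothetical, and setting ->prev to NULL merely stands in for the poison value the kernel uses. The idea it illustrates is that a reader still standing on an unlinked node can keep following ->next back to the list head, whereas a NULL ->next leaves it nowhere to go.

```c
#include <stddef.h>

struct node {
    struct node *next, *prev;
};

/* An empty circular list: the head points at itself, never at NULL. */
void init_head(struct node *h)
{
    h->next = h;
    h->prev = h;
}

void add_front(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Unlink n but leave n->next intact, in the spirit of list_del_rcu(). */
void del_keep_next(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->prev = NULL; /* stand-in for a poison value */
}

/*
 * Follow ->next from pos until the head is reached. Returns the number
 * of steps, or -1 if a NULL pointer is met on the way. The real loop has
 * no such check, so it would fault at that point instead.
 */
int steps_to_head(const struct node *pos, const struct node *head)
{
    int steps = 0;

    while (pos != head) {
        if (pos == NULL)
            return -1;
        pos = pos->next;
        steps++;
    }
    return steps;
}
```

In this sketch a walk that starts on a node removed with `del_keep_next()` still terminates at the head. Only after something overwrites that node's ->next with NULL does the walk fail, which is the situation the oops suggests.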
So if all of the list manipulation has proper synchronization, the
concern then becomes whether an object was removed from the list (properly)
and freed (improperly) and had its ->next pointer set to NULL before the
SRCU grace period ended. But in fsnotify_put_group() I have:
    __fsnotify_evict_group(group);
    mutex_unlock(&fsnotify_grp_mutex);
    synchronize_srcu(&fsnotify_grp_srcu);
    fsnotify_destroy_group(group);
So the group is taken off of the list in __fsnotify_evict_group() and
then we immediately wait on the fsnotify_grp_srcu, which is protecting
the read section where we hit the bug. So it doesn't look possible that
the object could have been freed while it was still in use by this list.
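The unlink / wait / destroy ordering can be pictured with a toy, single-threaded model. This is an assumption-laden sketch and not the kernel's SRCU: `struct toy_srcu` and the `toy_*` functions are hypothetical, and the model keeps just two reader counters selected by an index, which is enough to show why the updater must wait for readers that started before the removal but not for ones that start afterwards.

```c
struct toy_srcu {
    int completed;  /* low bit selects the counter new readers use */
    int readers[2]; /* readers currently inside, per index */
};

int toy_read_lock(struct toy_srcu *sp)
{
    int idx = sp->completed & 1;

    sp->readers[idx]++;
    return idx;
}

void toy_read_unlock(struct toy_srcu *sp, int idx)
{
    sp->readers[idx]--;
}

/* Start of a grace period: flip the index, return the one to drain. */
int toy_start_gp(struct toy_srcu *sp)
{
    int old = sp->completed & 1;

    sp->completed++;
    return old;
}

/* The grace period may end once no pre-existing reader remains. */
int toy_gp_done(const struct toy_srcu *sp, int old)
{
    return sp->readers[old] == 0;
}

/* Walk through the timeline; report whether freeing would be allowed. */
void toy_scenario(int *ok_while_reader_inside, int *ok_after_reader_left)
{
    struct toy_srcu s = { 0, { 0, 0 } };
    int idx, old, late;

    idx = toy_read_lock(&s);     /* reader is walking the list        */
    old = toy_start_gp(&s);      /* updater has unlinked the group    */
    *ok_while_reader_inside = toy_gp_done(&s, old);

    late = toy_read_lock(&s);    /* new reader, cannot see the group  */
    toy_read_unlock(&s, idx);    /* first reader leaves               */
    *ok_after_reader_left = toy_gp_done(&s, old);

    toy_read_unlock(&s, late);
}
```

In this model the destroy step is not permitted while the first reader is still inside, and becomes permitted as soon as it leaves, even though a later reader is still active.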
Maybe someone a lot smarter than me or someone who understands {s,}rcu a
lot better than I do can show me something that I've done wrong, but I'm
having problems seeing anything. The only thing I can imagine is that
something was scribbling in random memory it didn't own. I'm sorry
Grant, but I don't see what is wrong here! If you keep getting bugs,
especially if they happen in random places, I'd suggest either dying RAM
or blame some other random subsystem :)
-Eric
* Re: 2.6.31-rc6: Oops in fsnotify
2009-08-22 22:27 ` Eric Paris
@ 2009-08-28 15:32 ` Paul E. McKenney
0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2009-08-28 15:32 UTC
To: Eric Paris; +Cc: Grant Wilson, Eric Paris, aviro, lkml
On Sat, Aug 22, 2009 at 06:27:06PM -0400, Eric Paris wrote:
> Maybe someone a lot smarter than me or someone who understands {s,}rcu a
> lot better than I do can show me something that I've done wrong, but I'm
> having problems seeing anything. The only thing I can imagine is that
> something was scribbling in random memory it didn't own. I'm sorry
> Grant, but I don't see what is wrong here! If you keep getting bugs,
> especially if they happen in random places, I'd suggest either dying RAM
> or blame some other random subsystem :)
I don't immediately see any of these potential problems, but thought
I should list them for completeness...
1.  Executing an srcu_read_unlock() without the matching
    srcu_read_lock(). This has the same bad effect as a
    stray decrement of a reference counter.
2.  Indefinite nesting of srcu_read_lock(), which will eventually
    overflow the (int) counter, which is just as bad as overflowing
    a reference counter.
    #1 and #2 can be checked for by occasionally checking the
    value of srcu_readers_active(). If this ever shows up negative
    or if it increases indefinitely, check your code carefully.
3.  Passing to srcu_read_unlock() some value other than that
    returned by the matching srcu_read_lock(). I am presuming that
    "idx" above is a local variable -- attempting to share the "idx"
    variable between multiple readers would be fatal. Please also
    note that when I say "matching", I really mean it. The following,
    for example, is buggy, and could result in the type of failures
    you are seeing:
        idx1 = srcu_read_lock(&my_srcu);
        ...
        idx2 = srcu_read_lock(&my_srcu);
        ...
        srcu_read_unlock(&my_srcu, idx1); /* BUGGY!!! */
        ...
        srcu_read_unlock(&my_srcu, idx2); /* BUGGY!!! */
4.  Obviously, there might be bugs in SRCU itself. I did
    re-inspect it, and didn't see any bugs, but you might want to
    run rcutorture with torture_type=srcu on the hardware on which
    the failure occurs.
5.  Whatever the bug really is. ;-)
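Items 1 and 3 can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: `struct toy_srcu` and every `toy_*` function are hypothetical stand-ins that mimic the idea of index-selected reader counters, not the real SRCU code, and `toy_readers_active()` merely plays the role that srcu_readers_active() has in the checks described above.

```c
struct toy_srcu {
    int completed;  /* low bit selects the counter new readers use */
    int readers[2];
};

int toy_read_lock(struct toy_srcu *sp)
{
    int idx = sp->completed & 1;

    sp->readers[idx]++;
    return idx;
}

void toy_read_unlock(struct toy_srcu *sp, int idx)
{
    sp->readers[idx]--;
}

/* Sum over both counters; should never be negative. */
int toy_readers_active(const struct toy_srcu *sp)
{
    return sp->readers[0] + sp->readers[1];
}

/* Item 1: an unlock with no matching lock drives the count negative. */
int toy_stray_unlock(void)
{
    struct toy_srcu s = { 0, { 0, 0 } };

    toy_read_unlock(&s, 0);
    return toy_readers_active(&s);
}

/*
 * Item 3: nested readers with an updater's index flip in between.
 * Returns 1 if the updater would be allowed to finish (and free)
 * while the outer read-side section is still running.
 */
int toy_nested(int mismatched)
{
    struct toy_srcu s = { 0, { 0, 0 } };
    int idx1, idx2, old, freed_early;

    idx1 = toy_read_lock(&s);         /* outer section, index 0   */
    old = s.completed & 1;
    s.completed++;                    /* updater flips the index  */
    idx2 = toy_read_lock(&s);         /* inner section, index 1   */

    /* End of the inner section: should release idx2. */
    toy_read_unlock(&s, mismatched ? idx1 : idx2);
    freed_early = (s.readers[old] == 0);

    /* End of the outer section. */
    toy_read_unlock(&s, mismatched ? idx2 : idx1);
    return freed_early;
}
```

In this model `toy_nested(0)` keeps the updater waiting until the outer section ends, while `toy_nested(1)` lets it proceed with the outer reader still inside, which is the kind of premature free that could leave a reader looking at a clobbered ->next pointer.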
Thanx, Paul