Re: BUG: 2.6.26-rc1-git8: NULL reference in drop

linux-mm.kvack.org archive mirror

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
       [not found] <20080511105429.a5e40721.randy.dunlap@oracle.com>
@ 2008-05-12  6:23 ` Andrew Morton
  2008-05-12 15:04   ` Randy Dunlap
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2008-05-12  6:23 UTC
  To: Randy Dunlap; +Cc: lkml, viro, Jan Kara, linux-mm

On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote:

> On x86_64, during testing using "stress" package:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000


> IP: [<ffffffff802ad273>] drop_buffers+0x2f/0xfb
> PGD 1ee8ad067 PUD 26f19a067 PMD 0
> Oops: 0000 [1] SMP
> CPU 3
> Modules linked in: parport_pc lp parport tg3 cciss ehci_hcd ohci_hcd uhci_hcd
> Pid: 16860, comm: stress Not tainted 2.6.26-rc1-git8 #1
> RIP: 0010:[<ffffffff802ad273>]  [<ffffffff802ad273>] drop_buffers+0x2f/0xfb
> RSP: 0000:ffff81026bc03a08  EFLAGS: 00010203
> RAX: 0000000000000000 RBX: ffffe20008bae680 RCX: ffff81027f490f00
> RDX: 0000000000000000 RSI: ffff81026bc03a58 RDI: ffffe20008bae680
> RBP: ffff81026bc03a38 R08: ffff81026bc03b78 R09: ffff810001103780
> R10: ffff81026bc03a08 R11: ffff81026bc03c88 R12: ffffe20008bae680
> R13: ffff81027c412850 R14: ffff81026bc03d58 R15: ffff81026bc03a58
> FS:  00007fa9e7e416f0(0000) GS:ffff81027f806980(0000) knlGS:00000000f7f856c0
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000027f973000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process stress (pid: 16860, threadinfo ffff81026bc02000, task ffff81027e424c50)
> Stack:  ffffe20008bafa68 ffffe20008bae680 ffff81027f490f00 ffff81026bc03a58
>  ffff81026bc03d58 ffff81026bc03c88 ffff81026bc03a78 ffffffff802ad39f
>  ffff81027f490f00 ffffe20008b14060 0000000000000000 ffff81027f490f00
> Call Trace:
>  [<ffffffff802ad39f>] try_to_free_buffers+0x60/0xa2
>  [<ffffffff80267f98>] try_to_release_page+0x3b/0x41
>  [<ffffffff802719bc>] shrink_page_list+0x457/0x562
>  [<ffffffff80271bed>] shrink_inactive_list+0x126/0x361
>  [<ffffffff80271f0d>] shrink_zone+0xe5/0x10a
>  [<ffffffff8027227d>] try_to_free_pages+0x1ef/0x326
>  [<ffffffff80270f4b>] ? isolate_pages_global+0x0/0x34
>  [<ffffffff8026d843>] __alloc_pages_internal+0x25a/0x3ad
>  [<ffffffff8026d9ac>] __alloc_pages+0xb/0xd
>  [<ffffffff80277759>] handle_mm_fault+0x238/0x6d0
>  [<ffffffff8053d9c4>] do_page_fault+0x438/0x7de
>  [<ffffffff8053b999>] error_exit+0x0/0x51
> 
> 
> Code: 41 57 49 89 f7 41 56 41 55 41 54 49 89 fc 53 48 83 ec 08 48 8b 07 25 00 08 00 00 48 85 c0 75 04 0f 0b eb fe 4c 8b 6f 10 4c 89 ea <48> 8b 02 25 00 08 00 00 48 85 c0 74 10 49 8b 44 24 18 48 85 c0
> RIP  [<ffffffff802ad273>] drop_buffers+0x2f/0xfb
>  RSP <ffff81026bc03a08>
> CR2: 0000000000000000
> Kernel panic - not syncing: Fatal exception

Seems that local variable `bh' is NULL.
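A user-space sketch of where this oops lands, for illustration only. It assumes the 2.6.26 `page_buffers()` definition (`BUG_ON(!PagePrivate(page))`, then treat `page->private` as the head of the buffer_head ring). It also relies on our own reading of the `Code:` bytes: `48 8b 07` / `25 00 08 00 00` test `page->flags` against 0x800 (assumed to be PG_private, bit 11), `0f 0b` is the BUG, `4c 8b 6f 10` loads `page->private` from offset 0x10, and the faulting `<48> 8b 02` is the first read of `bh->b_state`. All struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. */
#define MODEL_PG_PRIVATE 0x800UL /* assumed PG_private mask (bit 11) */

struct buffer_head_model {
	unsigned long b_state;
	struct buffer_head_model *b_this_page; /* circular list of the page's buffers */
};

struct page_model {
	unsigned long flags; /* page->flags */
	uintptr_t private;   /* page->private: head of the buffer_head ring */
};

enum drop_result { DROP_OK, DROP_BUG_ON, DROP_NULL_DEREF };

/*
 * Follows the first steps of drop_buffers(): the page_buffers() sanity check,
 * then the first read through bh. Instead of faulting, it reports which of
 * the two failure points the kernel would have reached.
 */
static enum drop_result model_drop_buffers_first_step(const struct page_model *page)
{
	const struct buffer_head_model *bh;

	if (!(page->flags & MODEL_PG_PRIVATE))
		return DROP_BUG_ON; /* BUG_ON(!PagePrivate(page)) would fire */

	bh = (const struct buffer_head_model *)page->private;
	if (bh == NULL)
		return DROP_NULL_DEREF; /* reading bh->b_state would oops with CR2 = 0 */

	(void)bh->b_state; /* the access that faulted in the report */
	return DROP_OK;
}
```

In this model the crashed page is in the `DROP_NULL_DEREF` state: the PG_private check passed, but `page->private` was zero.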

I wonder what the heck we did to cause that.  Which filesystems were in
use?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12  6:23 ` BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers Andrew Morton
@ 2008-05-12 15:04   ` Randy Dunlap
  2008-05-12 16:59     ` Jan Kara
  2008-05-12 17:15     ` Chris Mason
  0 siblings, 2 replies; 7+ messages in thread
From: Randy Dunlap @ 2008-05-12 15:04 UTC
  To: Andrew Morton; +Cc: lkml, viro, Jan Kara, linux-mm

Andrew Morton wrote:
> On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote:
> 
>> On x86_64, during testing using "stress" package:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> [...]
> 
> Seems that local variable `bh' is NULL.
> 
> I wonder what the heck we did to cause that.  Which filesystems were in
> use?

ext3, nfs, and the usual procfs, sysfs, and tmpfs.

Also in the kernel:  debugfs, usbfs, inotifyfs, configfs, ramfs,
hugetlbfs, msdos, vfat, iso9660, and rootfs.

-- 
~Randy

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12 15:04   ` Randy Dunlap
@ 2008-05-12 16:59     ` Jan Kara
  2008-05-12 17:15     ` Chris Mason
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2008-05-12 16:59 UTC
  To: Randy Dunlap; +Cc: Andrew Morton, lkml, viro, Jan Kara, linux-mm

> Andrew Morton wrote:
> >On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote:
> >
> >>On x86_64, during testing using "stress" package:
> >>
> >>BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> >>[...]
> >
> >Seems that local variable `bh' is NULL.
> >
> >I wonder what the heck we did to cause that.  Which filesystems were in
> >use?
> 
> ext3, nfs, and the usual procfs, sysfs, and tmpfs.
  Hmm, the page doesn't look like one from ext3 or nfs, because they have
their own releasepage() methods... In theory it could also be a single-bit
error setting the PagePrivate bit, but that's just a last-resort hope ;)
I don't have a better explanation, though.
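The first point rests on how reclaim chooses a release routine. A simplified sketch, assuming the 2.6.26 `try_to_release_page()` logic (its PageWriteback check is left out): a page whose mapping supplies `->releasepage()`, as ext3 and nfs do, goes there, and only pages without one fall through to `try_to_free_buffers()`. The second helper models the bit-flip alternative: toggling bit 11 (assumed to be PG_private) on its own makes a page with no buffers read as PagePrivate. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PG_PRIVATE_BIT 11u /* assumed PG_private bit number */

/* Hypothetical, simplified stand-ins for address_space_operations / address_space. */
struct aops_model {
	int (*releasepage)(void *page);
};
struct mapping_model {
	const struct aops_model *a_ops;
};

enum release_path { PATH_FS_RELEASEPAGE, PATH_TRY_TO_FREE_BUFFERS };

/* Placeholder for a filesystem's own ->releasepage(), e.g. ext3's or nfs's. */
static int fs_releasepage_stub(void *page)
{
	(void)page;
	return 1;
}

static const struct aops_model fs_with_releasepage = { fs_releasepage_stub };
static const struct aops_model fs_without_releasepage = { NULL };

/* Which routine reclaim would call for a page with this mapping. */
static enum release_path model_release_path(const struct mapping_model *mapping)
{
	if (mapping != NULL && mapping->a_ops->releasepage != NULL)
		return PATH_FS_RELEASEPAGE;
	return PATH_TRY_TO_FREE_BUFFERS;
}

/* A single-bit error in page->flags: true if PagePrivate now reads as set. */
static bool model_flip_sets_private(unsigned long flags, unsigned int flipped_bit)
{
	unsigned long corrupted = flags ^ (1UL << flipped_bit);

	return (corrupted >> MODEL_PG_PRIVATE_BIT) & 1UL;
}
```

Under these assumptions an ext3 or nfs page still attached to its mapping never reaches `drop_buffers()` directly, which is why the trace points elsewhere.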

> Also in the kernel:  debugfs, usbfs, inotifyfs, configfs, ramfs,
> hugetlbfs, msdos, vfat, iso9660, and rootfs.

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12 15:04   ` Randy Dunlap
  2008-05-12 16:59     ` Jan Kara
@ 2008-05-12 17:15     ` Chris Mason
  2008-05-12 18:36       ` Randy Dunlap
  2008-05-12 21:12       ` Randy Dunlap
  1 sibling, 2 replies; 7+ messages in thread
From: Chris Mason @ 2008-05-12 17:15 UTC
  To: Randy Dunlap; +Cc: Andrew Morton, lkml, viro, Jan Kara, linux-mm

On Monday 12 May 2008, Randy Dunlap wrote:
> Andrew Morton wrote:
> > On Sun, 11 May 2008 10:54:29 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote:
> >> On x86_64, during testing using "stress" package:
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> >> [...]
> >
> > Seems that local variable `bh' is NULL.
> >
> > I wonder what the heck we did to cause that.  Which filesystems were in
> > use?
>
> ext3, nfs, and the usual procfs, sysfs, and tmpfs.
>
> Also in the kernel:  debugfs, usbfs, inotifyfs, configfs, ramfs,
> hugetlbfs, msdos, vfat, iso9660, and rootfs.

If you stand on your head, and race really really hard, 
nfs_inode_remove_request() does this without locking the page:

       set_page_private(req->wb_page, 0);
       ClearPagePrivate(req->wb_page);

That code has been around for a long time though.
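To see why an unlocked pair of stores matters, here is a deterministic, sequentially consistent model. It is a sketch only: it ignores CPU store reordering and all real timing. The writer performs the two stores quoted above; the reclaim side tests PagePrivate and then loads `page->private` as `bh`. The page lock held by reclaim does not order these steps, because this NFS path does not take it. The code enumerates the six schedules that keep each side's own steps in order and counts those that leave the reader holding a NULL `bh` after passing the flag check. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the page fields both sides touch. */
struct race_page {
	bool pg_private;   /* PagePrivate(page) */
	uintptr_t private; /* page_private(page) */
};

/* Number of set bits in a schedule mask (portable, no compiler builtins). */
static int bits_set(unsigned mask)
{
	int n = 0;

	for (; mask != 0; mask >>= 1)
		n += (int)(mask & 1u);
	return n;
}

/*
 * Runs one schedule of four steps. Bit i of writer_slots set means slot i is
 * the writer's next step, otherwise it is the reader's next step:
 *   writer: W1 set_page_private(page, 0); W2 ClearPagePrivate(page)
 *   reader: R1 test PagePrivate;          R2 bh = page_private(page)
 * Returns true if the reader passed R1 and then loaded bh == 0 (the oops state).
 */
static bool schedule_oopses(unsigned writer_slots)
{
	struct race_page page = { true, 0x1000 }; /* arbitrary non-zero buffer pointer */
	int w = 0, r = 0;
	bool flag_seen = false;
	uintptr_t bh = 0x1000;

	for (int slot = 0; slot < 4; slot++) {
		if (writer_slots & (1u << slot)) {
			if (w++ == 0)
				page.private = 0;        /* W1 */
			else
				page.pg_private = false; /* W2 */
		} else {
			if (r++ == 0)
				flag_seen = page.pg_private; /* R1 */
			else
				bh = page.private;           /* R2 */
		}
	}
	return flag_seen && bh == 0;
}

/* Counts the oopsing schedules among the C(4,2) = 6 valid ones. */
static int count_oopsing_schedules(void)
{
	int bad = 0;

	for (unsigned mask = 0; mask < 16u; mask++)
		if (bits_set(mask) == 2 && schedule_oopses(mask))
			bad++;
	return bad;
}
```

Four of the six schedules are bad: any order in which R1 runs before W2 and R2 runs after W1. The count says nothing about likelihood, since real steps are not equal-length and the adjacent stores leave only a narrow window.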

Probably not the droids we're looking for, but it was the only one that jumped 
out at me during a quick search of set_page_private(foo, 0) callers.  

It seems more likely that we got there by an invalidatepage call that left 
PagePrivate set but didn't allow the page to be freed.

The page would turn into the funky anonymous zombie thing meant for buffers 
that had to be written before the page could be freed (PagePrivate set but 
page->mapping == NULL), and eventually find its way to try_to_free_buffers().
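A sketch of the state described above, using simplified, hypothetical types: an invalidate that cannot drop busy buffers detaches the page from its mapping but leaves PG_private and the buffer ring in place. With no mapping there is no `->releasepage()` to call, so reclaim falls back to the generic buffer path. The model also shows the weak spot of this theory: in that state `page->private` is still non-zero, which is not what the oops shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct mapping_model {
	int placeholder;
};

struct zombie_page {
	bool pg_private;               /* PagePrivate(page) */
	uintptr_t private;             /* head of the buffer_head ring */
	struct mapping_model *mapping; /* page->mapping */
};

/* Models an invalidate that could not free the buffers: the page leaves its
 * mapping, but PG_private and page->private are deliberately left as they were. */
static void model_invalidate_keeping_buffers(struct zombie_page *page)
{
	page->mapping = NULL;
}

/* True for the orphaned state: buffers attached but no mapping, so no
 * ->releasepage() is reachable and reclaim would use the generic buffer path. */
static bool model_is_orphaned_buffer_page(const struct zombie_page *page)
{
	return page->pg_private && page->mapping == NULL;
}

/* The state implied by the oops: PG_private set but no buffer pointer. */
static bool model_matches_oops(const struct zombie_page *page)
{
	return page->pg_private && page->private == 0;
}
```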

The problem with that theory is that I would expect page->private to be 
non-null in such a case.  Randy, any chance this can be reproduced?

-chris

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12 17:15     ` Chris Mason
@ 2008-05-12 18:36       ` Randy Dunlap
  2008-05-12 21:12       ` Randy Dunlap
  1 sibling, 0 replies; 7+ messages in thread
From: Randy Dunlap @ 2008-05-12 18:36 UTC
  To: Chris Mason; +Cc: linux-mm, Jan Kara, viro, lkml, Andrew Morton

--- Original Message ---
> On Monday 12 May 2008, Randy Dunlap wrote:
> [...]
>
> The problem with that theory is that I would expect page->private to be
> non-null in such a case.  Randy, any chance this can be reproduced?

No idea.  I'm rerunning the test now.

~Randy

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12 17:15     ` Chris Mason
  2008-05-12 18:36       ` Randy Dunlap
@ 2008-05-12 21:12       ` Randy Dunlap
  2008-05-12 22:44         ` Chris Mason
  1 sibling, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2008-05-12 21:12 UTC
  To: Chris Mason; +Cc: linux-mm, Jan Kara, viro, lkml, Andrew Morton

--- Original Message ---
> On Monday 12 May 2008, Randy Dunlap wrote:
> [...]
>
> The problem with that theory is that I would expect page->private to be
> non-null in such a case.  Randy, any chance this can be reproduced?

It didn't fail when I re-ran the test.

~Randy

* Re: BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers
  2008-05-12 21:12       ` Randy Dunlap
@ 2008-05-12 22:44         ` Chris Mason
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Mason @ 2008-05-12 22:44 UTC
  To: Randy Dunlap; +Cc: linux-mm, Jan Kara, viro, lkml, Andrew Morton

On Monday 12 May 2008, Randy Dunlap wrote:
> --- Original Message ---
>
> > On Monday 12 May 2008, Randy Dunlap wrote:
> > [...]
> >
> > The problem with that theory is that I would expect page->private to be
> > non-null in such a case.  Randy, any chance this can be reproduced?
>
> It didn't fail when I re-ran the test.

So, either the teeny tiny NFS race I saw or a really unfortunate single-bit
flip.  We can put a busy loop into the NFS code to make it easier to trigger,
but I don't think that'll prove it's the bug you hit.

-chris

end of thread

Thread overview: 7+ messages
     [not found] <20080511105429.a5e40721.randy.dunlap@oracle.com>
2008-05-12  6:23 ` BUG: 2.6.26-rc1-git8: NULL reference in drop_buffers Andrew Morton
2008-05-12 15:04   ` Randy Dunlap
2008-05-12 16:59     ` Jan Kara
2008-05-12 17:15     ` Chris Mason
2008-05-12 18:36       ` Randy Dunlap
2008-05-12 21:12       ` Randy Dunlap
2008-05-12 22:44         ` Chris Mason