* Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c [not found] ` <1191534231.6106.99.camel@dyn9047017100.beaverton.ibm.com> @ 2007-10-05 13:41 ` Valerie Clement 2007-10-05 14:54 ` Badari Pulavarty 0 siblings, 1 reply; 5+ messages in thread From: Valerie Clement @ 2007-10-05 13:41 UTC (permalink / raw) To: Badari Pulavarty; +Cc: Linux Kernel Mailing List, ext4 development Badari Pulavarty wrote: > On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote: >> While running ffsb tests on my ext4 filesystem, I got an Oops in >> cache_alloc_refill(). >> I turned on SLAB debugging and here is the message I got: >> >> slab: Internal list corruption detected in cache 'buffer_head'(30), >> slabp ffff81007e100100(1515870810). Hexdump: > > slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ? Hi Badari, Thanks for your answer. I didn't reproduce it without the latest ext4 patches. So I suspect a bug in one of them. But how debugging this? Which other debug traces can I turn on? Valérie ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c 2007-10-05 13:41 ` 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c Valerie Clement @ 2007-10-05 14:54 ` Badari Pulavarty 2007-10-05 20:30 ` Mingming Cao 0 siblings, 1 reply; 5+ messages in thread From: Badari Pulavarty @ 2007-10-05 14:54 UTC (permalink / raw) To: Valerie Clement; +Cc: Linux Kernel Mailing List, ext4 development, cmm On Fri, 2007-10-05 at 15:41 +0200, Valerie Clement wrote: > Badari Pulavarty wrote: > > On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote: > >> While running ffsb tests on my ext4 filesystem, I got an Oops in > >> cache_alloc_refill(). > >> I turned on SLAB debugging and here is the message I got: > >> > >> slab: Internal list corruption detected in cache 'buffer_head'(30), > >> slabp ffff81007e100100(1515870810). Hexdump: > > > > slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ? > > Hi Badari, > Thanks for your answer. > I didn't reproduce it without the latest ext4 patches. So I suspect a > bug in one of them. > But how debugging this? > Which other debug traces can I turn on? Let me understand. You applied latest ext4 patchsets ? If so, Mingming has some slab-cleanup changes in the patchset. You can try backing them out and see. Thanks, Badari ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c 2007-10-05 14:54 ` Badari Pulavarty @ 2007-10-05 20:30 ` Mingming Cao 2007-10-05 22:06 ` Mingming Cao 0 siblings, 1 reply; 5+ messages in thread From: Mingming Cao @ 2007-10-05 20:30 UTC (permalink / raw) To: Badari Pulavarty Cc: Valerie Clement, Linux Kernel Mailing List, ext4 development On Fri, 2007-10-05 at 07:54 -0700, Badari Pulavarty wrote: > On Fri, 2007-10-05 at 15:41 +0200, Valerie Clement wrote: > > Badari Pulavarty wrote: > > > On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote: > > >> While running ffsb tests on my ext4 filesystem, I got an Oops in > > >> cache_alloc_refill(). > > >> I turned on SLAB debugging and here is the message I got: > > >> > > >> slab: Internal list corruption detected in cache 'buffer_head'(30), > > >> slabp ffff81007e100100(1515870810). Hexdump: > > > > > > slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ? > > > > Hi Badari, > > Thanks for your answer. > > I didn't reproduce it without the latest ext4 patches. So I suspect a > > bug in one of them. > > But how debugging this? > > Which other debug traces can I turn on? > > Let me understand. You applied latest ext4 patchsets ? If so, Mingming > has some slab-cleanup changes in the patchset. You can try backing them > out and see. > It's unlikely to be the jbd_slab_cleanup.patch, which actually get rid of slab allocation for buffers passing down to disk IO, and replace with get_free_page directly. Could you send me the profile used for ffsb test? Thanks, Mingming ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c 2007-10-05 20:30 ` Mingming Cao @ 2007-10-05 22:06 ` Mingming Cao 2007-10-08 14:12 ` Valerie Clement 0 siblings, 1 reply; 5+ messages in thread From: Mingming Cao @ 2007-10-05 22:06 UTC (permalink / raw) To: Badari Pulavarty; +Cc: Linux Kernel Mailing List, ext4 development -------- Forwarded Message -------- From: Valerie Clement <valerie.clement@bull.net> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Subject: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c Date: Thu, 04 Oct 2007 18:13:46 +0200 While running ffsb tests on my ext4 filesystem, I got an Oops in cache_alloc_refill(). I turned on SLAB debugging and here is the message I got: slab: Internal list corruption detected in cache 'buffer_head'(30), slabp ffff81007e100100(1515870810). Hexdump: =======================> slabp->inuse counter looks corrupted (1515870810), it should not greater than cachep->num looks valid (30) 000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5 070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff 080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff 090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff 0a0: 18 68 50 7d 00 81 ff ff ------------[ cut here ]------------ kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923! invalid opcode: 0000 [1] SMP CPU 2 Modules linked in: qla2xxx Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2 RIP: 0010:[<ffffffff802758b6>] [<ffffffff802758b6>] check_slabp+0xb5/0xc1 RSP: 0018:ffff8100774bb958 EFLAGS: 00010096 RAX: 0000000000000001 RBX: ffff81007e100100 RCX: 0000000000006d20 RDX: 00000000ffffffff RSI: 0000000000000046 RDI: ffff81007e347280 RBP: 00000000000000a8 R08: 0000000000000005 R09: ffffffff8060bb10 R10: 00000000000ae468 R11: 0000000500000002 R12: 00000000000000a8 R13: ffff81007e347280 R14: ffff81007e347280 R15: 0000000000000002 FS: 0000000041802950(0063) GS:ffff81007e0c4728(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000005f83d00c CR3: 0000000078149000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ffsb (pid: 4041, threadinfo ffff8100774ba000, task ffff81007dbdc7a0) Stack: 000000000000000d 000000000000000e ffff81007e100100 ffff81007e342398 ffff81007e078488 ffffffff80277069 0000000000008050 ffff81007e347280 0000000000008050 0000000000000246 ffffffff80299539 fffffffffffff000 Call Trace: [<ffffffff80277069>] cache_alloc_refill+0xc8/0x23f [<ffffffff80299539>] alloc_buffer_head+0x14/0x45 [<ffffffff802774cd>] kmem_cache_alloc+0x94/0xe9 [<ffffffff80299539>] alloc_buffer_head+0x14/0x45 [<ffffffff80299cf7>] alloc_page_buffers+0x38/0xd5 [<ffffffff80299da8>] create_empty_buffers+0x14/0x9b [<ffffffff8029a875>] __block_prepare_write+0x7c/0x45b [<ffffffff802f6e29>] ext4_get_block+0x0/0x139 [<ffffffff8029ac6e>] block_prepare_write+0x1a/0x25 [<ffffffff802f8340>] ext4_prepare_write+0xaf/0x175 [<ffffffff802576c2>] generic_file_buffered_write+0x288/0x631 [<ffffffff80257daa>] __generic_file_aio_write_nolock+0x33f/0x3a9 [<ffffffff8022b7d5>] enqueue_entity+0x17c/0x1a3 [<ffffffff80257e75>] generic_file_aio_write+0x61/0xc1 [<ffffffff8022c512>] __check_preempt_curr_fair+0x56/0x76 [<ffffffff802f4022>] ext4_file_write+0x16/0x91 [<ffffffff8027c4f4>] do_sync_write+0xc9/0x10c [<ffffffff8027d50a>] file_move+0x1d/0x4c [<ffffffff80245992>] autoremove_wake_function+0x0/0x2e [<ffffffff8027b216>] do_filp_open+0x2a/0x38 [<ffffffff80275f7a>] poison_obj+0x26/0x30 [<ffffffff8027cc34>] vfs_write+0xad/0x136 [<ffffffff8027d171>] sys_write+0x45/0x6e [<ffffffff8020b32e>] system_call+0x7e/0x83 =============> The stack track shows ext4_new_block(), is the problem repeatable? Does away without multiple block allocation patch? Mingming ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c 2007-10-05 22:06 ` Mingming Cao @ 2007-10-08 14:12 ` Valerie Clement 0 siblings, 0 replies; 5+ messages in thread From: Valerie Clement @ 2007-10-08 14:12 UTC (permalink / raw) To: cmm; +Cc: ext4 development, Aneesh Kumar K.V Mingming Cao wrote: > kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923! > invalid opcode: 0000 [1] SMP > CPU 2 > Modules linked in: qla2xxx > Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2 > RIP: 0010:[<ffffffff802758b6>] [<ffffffff802758b6>] check_slabp+0xb5/0xc1 > RSP: 0018:ffff8100774bb958 EFLAGS: 00010096 > RAX: 0000000000000001 RBX: ffff81007e100100 RCX: 0000000000006d20 > RDX: 00000000ffffffff RSI: 0000000000000046 RDI: ffff81007e347280 > RBP: 00000000000000a8 R08: 0000000000000005 R09: ffffffff8060bb10 > R10: 00000000000ae468 R11: 0000000500000002 R12: 00000000000000a8 > R13: ffff81007e347280 R14: ffff81007e347280 R15: 0000000000000002 > FS: 0000000041802950(0063) GS:ffff81007e0c4728(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 000000005f83d00c CR3: 0000000078149000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process ffsb (pid: 4041, threadinfo ffff8100774ba000, task ffff81007dbdc7a0) > Stack: 000000000000000d 000000000000000e ffff81007e100100 ffff81007e342398 > ffff81007e078488 ffffffff80277069 0000000000008050 ffff81007e347280 > 0000000000008050 0000000000000246 ffffffff80299539 fffffffffffff000 > Call Trace: > [<ffffffff80277069>] cache_alloc_refill+0xc8/0x23f > [<ffffffff80299539>] alloc_buffer_head+0x14/0x45 > [<ffffffff802774cd>] kmem_cache_alloc+0x94/0xe9 > [<ffffffff80299539>] alloc_buffer_head+0x14/0x45 > [<ffffffff80299cf7>] alloc_page_buffers+0x38/0xd5 > [<ffffffff80299da8>] create_empty_buffers+0x14/0x9b > [<ffffffff8029a875>] __block_prepare_write+0x7c/0x45b > [<ffffffff802f6e29>] ext4_get_block+0x0/0x139 > [<ffffffff8029ac6e>] block_prepare_write+0x1a/0x25 > [<ffffffff802f8340>] ext4_prepare_write+0xaf/0x175 > [<ffffffff802576c2>] generic_file_buffered_write+0x288/0x631 > [<ffffffff80257daa>] __generic_file_aio_write_nolock+0x33f/0x3a9 > [<ffffffff8022b7d5>] enqueue_entity+0x17c/0x1a3 > [<ffffffff80257e75>] generic_file_aio_write+0x61/0xc1 > [<ffffffff8022c512>] __check_preempt_curr_fair+0x56/0x76 > [<ffffffff802f4022>] ext4_file_write+0x16/0x91 > [<ffffffff8027c4f4>] do_sync_write+0xc9/0x10c > [<ffffffff8027d50a>] file_move+0x1d/0x4c > [<ffffffff80245992>] autoremove_wake_function+0x0/0x2e > [<ffffffff8027b216>] do_filp_open+0x2a/0x38 > [<ffffffff80275f7a>] poison_obj+0x26/0x30 > [<ffffffff8027cc34>] vfs_write+0xad/0x136 > [<ffffffff8027d171>] sys_write+0x45/0x6e > [<ffffffff8020b32e>] system_call+0x7e/0x83 > > > =============> > > The stack track shows ext4_new_block(), is the problem repeatable? Does away without > multiple block allocation patch? The oops was not easily reproductible. I'd got it twice while testing the mballoc feature with the uninit_groups option but the running tests were different. Since I made the change I sent friday to Aneesh (in the uninitialized-block-groups patch), I could not reproduce it. Could the oops be related to this other problem I found ? Valérie ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-10-08 14:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4705113A.7030908@bull.net>
[not found] ` <1191534231.6106.99.camel@dyn9047017100.beaverton.ibm.com>
2007-10-05 13:41 ` 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c Valerie Clement
2007-10-05 14:54 ` Badari Pulavarty
2007-10-05 20:30 ` Mingming Cao
2007-10-05 22:06 ` Mingming Cao
2007-10-08 14:12 ` Valerie Clement
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).