From: Mike Christie <michaelc@cs.wisc.edu>
To: Chuck Ebbert <76306.1226@compuserve.com>
Cc: James Lamanna <jlamanna@gmail.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
James Bottomley <James.Bottomley@steeleye.com>
Subject: Re: [OOPS] amrestore dies in kmem_cache_free 2.6.16.18 - cannot restore backups!
Date: Sat, 27 May 2006 10:22:35 -0500
Message-ID: <44786EBB.50300@cs.wisc.edu>
In-Reply-To: <200605260029_MC3-1-C0CF-C67C@compuserve.com>

Chuck Ebbert wrote:
> In-Reply-To: <aa4c40ff0605231824j55c998c3oe427dec2404afba0@mail.gmail.com>
>
> On Tue, 23 May 2006 18:24:14 -0700, James Lamanna wrote:
>
>> So I was able to recreate this problem on a vanilla 2.6.16.18 with the
>> following oops..
>> I'd say this is a serious regression since I cannot restore backups
>> anymore (I could with 2.6.14.x, but that kernel series had other
>> issues...)
>
>> Unable to handle kernel paging request at ffff82bc81000030 RIP: <ffffffff801657d9>{kmem_cache_free+82}
>> PGD 0
>> Oops: 0000 [1] SMP
>> CPU 1
>> Modules linked in:
>> Pid: 5814, comm: amrestore Not tainted 2.6.16.18 #2
>> RIP: 0010:[<ffffffff801657d9>] <ffffffff801657d9>{kmem_cache_free+82}
>> RSP: 0018:ffff81007d4afcd8 EFLAGS: 00010086
>> RAX: ffff82bc81000000 RBX: ffff81004119d800 RCX: 000000000000001e
>> RDX: ffff81000000c000 RSI: 0000000000000000 RDI: 00000007f0000000
>> RBP: ffff81007ff0c800 R08: 0000000000000000 R09: 0000000000000400
>> R10: 0000000000000000 R11: ffffffff8014b3d6 R12: ffff810041311480
>> R13: 0000000000000400 R14: 0000000000000400 R15: ffff81007e676748
>> FS: 00002b7f39708020(0000) GS:ffff810041173bc0(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: ffff82bc81000030 CR3: 000000007de09000 CR4: 00000000000006e0
>> Process amrestore (pid: 5814, threadinfo ffff81007d4ae000, task ffff81007e2f8ae0)
>> Stack: 0000000000000000 0000000000000246 ffff8100413c9bc0 ffff81007ff0c800
>> ffff8100413c9bc0 ffffffff8016dfdc ffff8100413c9bc0 ffff81007fe25408
>> 00000000ffffffea ffffffff803187e7
>> Call Trace: <ffffffff8016dfdc>{bio_free+48} <ffffffff803187e7>{scsi_execute_async+640}
>> <ffffffff8035d8d2>{st_do_scsi+422} <ffffffff8035d6e2>{st_sleep_done+0}
>> <ffffffff80362950>{st_read+855} <ffffffff8013e1ca>{autoremove_wake_function+0}
>> <ffffffff80169d7c>{vfs_read+171} <ffffffff8016a0af>{sys_read+69}
>> <ffffffff8010a93e>{system_call+126}
>>
>> Code: 48 8b 48 30 0f b7 51 28 65 8b 04 25 30 00 00 00 39 c2 0f 84
>> RIP <ffffffff801657d9>{kmem_cache_free+82} RSP <ffff81007d4afcd8>
>> CR2: ffff82bc81000030
>
> First of all, to really see what is happening you need to recompile your kernel
> after adding some debug options:
>
> Kernel Hacking --->
> [*] Kernel debugging
> [*] Debug memory allocations
> [*] Compile the kernel with frame pointers
>
> (Frame pointers won't give an exact trace but they'll prevent the tail merging
> that makes it so hard to follow.)
>
> Then reproduce the error and send the oops and any new error messages you see.
> Don't send the whole boot log and .config again -- we have them already.
>
> The bug is happening here, in __cache_free, in code that's only included
> on NUMA machines:
>
> static inline void __cache_free(struct kmem_cache *cachep, void *objp)
> {
>         struct array_cache *ac = cpu_cache_get(cachep);
>
>         check_irq_off();
>         objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
>
>         /* Make sure we are not freeing a object from another
>          * node to the array cache on this cpu.
>          */
> #ifdef CONFIG_NUMA
>         {
>                 struct slab *slabp;
>                 slabp = virt_to_slab(objp);     <==== OOPS
>                 if (unlikely(slabp->nodeid != numa_node_id())) {
>                         struct array_cache *alien = NULL;
>                         int nodeid = slabp->nodeid;
>
>
> Tracing through the nested inline functions, we have:
>
> static inline struct slab *virt_to_slab(const void *obj)
> {
>         struct page *page = virt_to_page(obj);
>         return page_get_slab(page);     <==== OOPS
> }
>
> static inline struct slab *page_get_slab(struct page *page)
> {
>         return (struct slab *)page->lru.prev;     <==== OOPS
> }
>
>
> virt_to_page() returned a struct page * that pointed to unmapped memory.
>
>
> This all came from scsi_execute_async, possibly through this path:
>
> scsi_execute_async
> scsi_rq_map_sg: some kind of error occurred?
> bio_endio
> bio->bi_end_io ==> scsi_bi_end_io
> bio_put
> bio->bi_destructor ==> bio_fs_destructor
> bio_free
> mempool_free
> kmem_cache_free
>
> scsi_execute_async and scsi_rq_map_sg were rewritten last December, so may have
> new bugs.
>
>
Sorry for the late reply; I have been traveling.
Maybe I messed up the bounce code usage. Are you using st's direct I/O
feature?