Re: [OOPS] amrestore dies in kmem_cache_free 2.6.16.18 - cannot restore backups!

public inbox for linux-kernel@vger.kernel.org

From: Mike Christie <michaelc@cs.wisc.edu>
To: Chuck Ebbert <76306.1226@compuserve.com>
Cc: James Lamanna <jlamanna@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	James Bottomley <James.Bottomley@steeleye.com>
Subject: Re: [OOPS] amrestore dies in kmem_cache_free 2.6.16.18 -  cannot restore backups!
Date: Sat, 27 May 2006 10:22:35 -0500
Message-ID: <44786EBB.50300@cs.wisc.edu>
In-Reply-To: <200605260029_MC3-1-C0CF-C67C@compuserve.com>

Chuck Ebbert wrote:
> In-Reply-To: <aa4c40ff0605231824j55c998c3oe427dec2404afba0@mail.gmail.com>
> 
> On Tue, 23 May 2006 18:24:14 -0700, James Lamanna wrote:
> 
>> So I was able to recreate this problem on a vanilla 2.6.16.18 with the
>> following oops..
>> I'd say this is a serious regression since I cannot restore backups
>> anymore (I could with 2.6.14.x, but that kernel series had other
>> issues...)
> 
>> Unable to handle kernel paging request at ffff82bc81000030 RIP: <ffffffff801657d9>{kmem_cache_free+82}
>> PGD 0
>> Oops: 0000 [1] SMP
>> CPU 1
>> Modules linked in:
>> Pid: 5814, comm: amrestore Not tainted 2.6.16.18 #2
>> RIP: 0010:[<ffffffff801657d9>] <ffffffff801657d9>{kmem_cache_free+82}
>> RSP: 0018:ffff81007d4afcd8  EFLAGS: 00010086
>> RAX: ffff82bc81000000 RBX: ffff81004119d800 RCX: 000000000000001e
>> RDX: ffff81000000c000 RSI: 0000000000000000 RDI: 00000007f0000000
>> RBP: ffff81007ff0c800 R08: 0000000000000000 R09: 0000000000000400
>> R10: 0000000000000000 R11: ffffffff8014b3d6 R12: ffff810041311480
>> R13: 0000000000000400 R14: 0000000000000400 R15: ffff81007e676748
>> FS:  00002b7f39708020(0000) GS:ffff810041173bc0(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: ffff82bc81000030 CR3: 000000007de09000 CR4: 00000000000006e0
>> Process amrestore (pid: 5814, threadinfo ffff81007d4ae000, task ffff81007e2f8ae0)
>> Stack: 0000000000000000 0000000000000246 ffff8100413c9bc0 ffff81007ff0c800
>>        ffff8100413c9bc0 ffffffff8016dfdc ffff8100413c9bc0 ffff81007fe25408
>>        00000000ffffffea ffffffff803187e7
>> Call Trace: <ffffffff8016dfdc>{bio_free+48} <ffffffff803187e7>{scsi_execute_async+640}
>>        <ffffffff8035d8d2>{st_do_scsi+422} <ffffffff8035d6e2>{st_sleep_done+0}
>>        <ffffffff80362950>{st_read+855} <ffffffff8013e1ca>{autoremove_wake_function+0}
>>        <ffffffff80169d7c>{vfs_read+171} <ffffffff8016a0af>{sys_read+69}
>>        <ffffffff8010a93e>{system_call+126}
>>
>> Code: 48 8b 48 30 0f b7 51 28 65 8b 04 25 30 00 00 00 39 c2 0f 84
>> RIP <ffffffff801657d9>{kmem_cache_free+82} RSP <ffff81007d4afcd8>
>> CR2: ffff82bc81000030
> 
> First of all, to really see what is happening you need to recompile your kernel
> after adding some debug options:
> 
> Kernel Hacking --->
>    [*] Kernel debugging
>    [*]   Debug memory allocations
>    [*]   Compile the kernel with frame pointers
> 
> (Frame pointers won't give an exact trace but they'll prevent the tail merging
> that makes it so hard to follow.)
> 
> Then reproduce the error and send the oops and any new error messages you see.
> Don't send the whole boot log and .config again -- we have them already.
> 
> The bug is happening here, in __cache_free, in code that's only included
> on NUMA machines:
> 
> static inline void __cache_free(struct kmem_cache *cachep, void *objp)
> {
>         struct array_cache *ac = cpu_cache_get(cachep);
> 
>         check_irq_off();
>         objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
> 
>         /* Make sure we are not freeing a object from another
>          * node to the array cache on this cpu.
>          */
> #ifdef CONFIG_NUMA
>         {
>                 struct slab *slabp;
>                 slabp = virt_to_slab(objp);                      <==== OOPS
>                 if (unlikely(slabp->nodeid != numa_node_id())) {
>                         struct array_cache *alien = NULL;
>                         int nodeid = slabp->nodeid;
> 
> 
> Tracing through the nested inline functions, we have:
> 
> static inline struct slab *virt_to_slab(const void *obj)
> {
>         struct page *page = virt_to_page(obj);
>         return page_get_slab(page);                              <==== OOPS
> }
> 
> static inline struct slab *page_get_slab(struct page *page)
> {
>         return (struct slab *)page->lru.prev;                    <==== OOPS
> }
> 
> 
> virt_to_page() returned a struct page * that pointed to unmapped memory.
> 
> 
> This all came from scsi_execute_async, possibly through this path:
> 
> scsi_execute_async
>     scsi_rq_map_sg: some kind of error occurred?
>         bio_endio
>             bio->bi_end_io ==> scsi_bi_end_io
>                 bio_put
>                     bio->bi_destructor ==> bio_fs_destructor
>                         bio_free
>                             mempool_free
>                                 kmem_cache_free
> 
> scsi_execute_async and scsi_rq_map_sg were rewritten last December, so they
> may have new bugs.
> 
> 

Sorry for the late reply. I have been traveling.

Maybe I messed up the bounce code usage. Are you using st's direct I/O
feature?
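
A note on the register dump quoted above: the first bytes of the Code: line,
48 8b 48 30, decode to mov 0x30(%rax),%rcx, and CR2 is exactly RAX + 0x30.
That is consistent with a load of page->lru.prev through a bad struct page
pointer held in RAX. The sketch below is a user-space illustration only.
mock_page, mock_list_head, lru_prev_offset and fault_offset are hypothetical
names, and the field layout is an assumption about a 64-bit 2.6.16-era
struct page, not something taken from the reporter's build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of a kernel list_head: two 64-bit pointers. */
struct mock_list_head {
        uint64_t next;
        uint64_t prev;
};

/* Hypothetical mock of the leading fields of struct page on a 64-bit
 * kernel. Field order and sizes are an assumption for illustration. */
struct mock_page {
        uint64_t flags;
        int32_t  count;
        int32_t  mapcount;
        uint64_t private_data;
        uint64_t mapping;
        uint64_t index;
        struct mock_list_head lru;
};

/* Byte offset of lru.prev inside the mock page descriptor. */
static size_t lru_prev_offset(void)
{
        return offsetof(struct mock_page, lru) +
               offsetof(struct mock_list_head, prev);
}

/* Distance between the faulting address (CR2) and the base register. */
static uint64_t fault_offset(uint64_t base, uint64_t cr2)
{
        return cr2 - base;
}
```

With RAX = 0xffff82bc81000000 and CR2 = 0xffff82bc81000030 from the oops,
fault_offset() gives 0x30, and under the assumed layout lru_prev_offset()
gives the same value.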
