Re: [OOPS] amrestore dies in kmem_cache_free 2.6.16.18 - cannot restore backups!

From: Mike Christie <michaelc@cs.wisc.edu>
To: Chuck Ebbert <76306.1226@compuserve.com>
Cc: James Lamanna <jlamanna@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	James Bottomley <James.Bottomley@steeleye.com>
Subject: Re: [OOPS] amrestore dies in kmem_cache_free 2.6.16.18 -  cannot restore backups!
Date: Sat, 27 May 2006 10:22:35 -0500
Message-ID: <44786EBB.50300@cs.wisc.edu>
In-Reply-To: <200605260029_MC3-1-C0CF-C67C@compuserve.com>

Chuck Ebbert wrote:
> In-Reply-To: <aa4c40ff0605231824j55c998c3oe427dec2404afba0@mail.gmail.com>
> 
> On Tue, 23 May 2006 18:24:14 -0700, James Lamanna wrote:
> 
>> So I was able to recreate this problem on a vanilla 2.6.16.18 with the
>> following oops..
>> I'd say this is a serious regression since I cannot restore backups
>> anymore (I could with 2.6.14.x, but that kernel series had other
>> issues...)
> 
>> Unable to handle kernel paging request at ffff82bc81000030 RIP: <ffffffff801657d9>{kmem_cache_free+82}
>> PGD 0
>> Oops: 0000 [1] SMP
>> CPU 1
>> Modules linked in:
>> Pid: 5814, comm: amrestore Not tainted 2.6.16.18 #2
>> RIP: 0010:[<ffffffff801657d9>] <ffffffff801657d9>{kmem_cache_free+82}
>> RSP: 0018:ffff81007d4afcd8  EFLAGS: 00010086
>> RAX: ffff82bc81000000 RBX: ffff81004119d800 RCX: 000000000000001e
>> RDX: ffff81000000c000 RSI: 0000000000000000 RDI: 00000007f0000000
>> RBP: ffff81007ff0c800 R08: 0000000000000000 R09: 0000000000000400
>> R10: 0000000000000000 R11: ffffffff8014b3d6 R12: ffff810041311480
>> R13: 0000000000000400 R14: 0000000000000400 R15: ffff81007e676748
>> FS:  00002b7f39708020(0000) GS:ffff810041173bc0(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: ffff82bc81000030 CR3: 000000007de09000 CR4: 00000000000006e0
>> Process amrestore (pid: 5814, threadinfo ffff81007d4ae000, task ffff81007e2f8ae0)
>> Stack: 0000000000000000 0000000000000246 ffff8100413c9bc0 ffff81007ff0c800
>>        ffff8100413c9bc0 ffffffff8016dfdc ffff8100413c9bc0 ffff81007fe25408
>>        00000000ffffffea ffffffff803187e7
>> Call Trace: <ffffffff8016dfdc>{bio_free+48} <ffffffff803187e7>{scsi_execute_async+640}
>>        <ffffffff8035d8d2>{st_do_scsi+422} <ffffffff8035d6e2>{st_sleep_done+0}
>>        <ffffffff80362950>{st_read+855} <ffffffff8013e1ca>{autoremove_wake_function+0}
>>        <ffffffff80169d7c>{vfs_read+171} <ffffffff8016a0af>{sys_read+69}
>>        <ffffffff8010a93e>{system_call+126}
>>
>> Code: 48 8b 48 30 0f b7 51 28 65 8b 04 25 30 00 00 00 39 c2 0f 84
>> RIP <ffffffff801657d9>{kmem_cache_free+82} RSP <ffff81007d4afcd8>
>> CR2: ffff82bc81000030
> 
> First of all, to really see what is happening you need to recompile your kernel
> after adding some debug options:
> 
> Kernel Hacking --->
>    [*] Kernel debugging
>    [*]   Debug memory allocations
>    [*]   Compile the kernel with frame pointers
> 
> (Frame pointers won't give an exact trace but they'll prevent the tail merging
> that makes it so hard to follow.)
> 
> Then reproduce the error and send the oops and any new error messages you see.
> Don't send the whole boot log and .config again -- we have them already.
> 
> The bug is happening here, in __cache_free, in code that's only included
> on NUMA machines:
> 
> static inline void __cache_free(struct kmem_cache *cachep, void *objp)
> {
>         struct array_cache *ac = cpu_cache_get(cachep);
> 
>         check_irq_off();
>         objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
> 
>         /* Make sure we are not freeing a object from another
>          * node to the array cache on this cpu.
>          */
> #ifdef CONFIG_NUMA
>         {
>                 struct slab *slabp;
>                 slabp = virt_to_slab(objp);                      <==== OOPS
>                 if (unlikely(slabp->nodeid != numa_node_id())) {
>                         struct array_cache *alien = NULL;
>                         int nodeid = slabp->nodeid;
> 
> 
> Tracing through the nested inline functions, we have:
> 
> static inline struct slab *virt_to_slab(const void *obj)
> {
>         struct page *page = virt_to_page(obj);
>         return page_get_slab(page);                              <==== OOPS
> }
> 
> static inline struct slab *page_get_slab(struct page *page)
> {
>         return (struct slab *)page->lru.prev;                    <==== OOPS
> }
> 
> 
> virt_to_page() returned a struct page * that pointed to unmapped memory.
> 
> 
> This all came from scsi_execute_async, possibly through this path:
> 
> scsi_execute_async
>     scsi_rq_map_sg: some kind of error occurred?
>         bio_endio
>             bio->bi_end_io ==> scsi_bi_end_io
>                 bio_put
>                     bio->bi_destructor ==> bio_fs_destructor
>                         bio_free
>                             mempool_free
>                                 kmem_cache_free
> 
> scsi_execute_async and scsi_rq_map_sg were rewritten last December, so may have
> new bugs.
> 
> 

Sorry for the late reply. I have been traveling.

Maybe I messed up on the bounce code usage. Are you using st's direct IO
feature?

