From: Simon Kirby <sim@hostway.ca>
To: Pekka Enberg <penberg@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Christoph Lameter <cl@linux.com>,
Chris Mason <chris.mason@fusionio.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
Date: Mon, 19 Aug 2013 13:17:17 -0700 [thread overview]
Message-ID: <20130819201717.GA23608@hostway.ca> (raw)
In-Reply-To: <CAOJsxLGA+Aha8JcPsWyNd3BLQ6L0g7iznN0o2fG5yL5Xw0s8Lg@mail.gmail.com>
On Sat, Jul 06, 2013 at 11:27:38AM +0300, Pekka Enberg wrote:
> On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby <sim@hostway.ca> wrote:
> > We saw two Oopses overnight on two separate boxes that seem possibly
> > related, but both are weird. These boxes typically run btrfs for rsync
> > snapshot backups (and usually Oops in btrfs ;), but not this time!
> > backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03
> > was running 3.10 release plus btrfs-next from yesterday. Full kern.log
> > and .config at http://0x.ca/sim/ref/3.10/
> >
> > backup02's first Oops:
> >
> > BUG: unable to handle kernel paging request at 0000000100000000
> > IP: [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110
> > PGD 1f54f7067 PUD 0
> > Oops: 0000 [#1] SMP
> > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode serio_raw bnx2 evdev
> > CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46
> > Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010
> > task: ffff8802c3f08000 ti: ffff8801b4876000 task.ti: ffff8801b4876000
> > RIP: 0010:[<ffffffff81124beb>] [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110
> > RSP: 0018:ffff8801b4877e88 EFLAGS: 00010206
> > RAX: 0000000000000000 RBX: ffff8802c3f08000 RCX: 00000000017f040e
> > RDX: 00000000017f040d RSI: 00000000000000d0 RDI: ffffffff8107a503
> > RBP: ffff8801b4877ec8 R08: 0000000000016a80 R09: 0000000000000000
> > R10: 00007fff025fe120 R11: 0000000000000246 R12: 00000000000000d0
> > R13: ffff88042d8019c0 R14: 0000000100000000 R15: 00007fc3588ee97f
> > FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000100000000 CR3: 0000000409d68000 CR4: 00000000000007f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Stack:
> > ffff8801b4877ed8 ffffffff8112a1bc ffff8800985acd20 ffff8802c3f08000
> > 0000000000000001 00007fc3588ee334 00007fc358af5758 00007fc3588ee97f
> > ffff8801b4877ee8 ffffffff8107a503 ffff8801b4877ee8 ffffffffffffffea
> > Call Trace:
> > [<ffffffff8112a1bc>] ? __fput+0x12c/0x240
> > [<ffffffff8107a503>] prepare_creds+0x23/0x150
> > [<ffffffff811272d4>] SyS_faccessat+0x34/0x1f0
> > [<ffffffff811274a3>] SyS_access+0x13/0x20
> > [<ffffffff8179e7a9>] system_call_fastpath+0x16/0x1b
> > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63
> > RIP [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110
> > RSP <ffff8801b4877e88>
> > CR2: 0000000100000000
> > ---[ end trace 744477356cd98306 ]---
> >
> > backup03's first Oops:
> >
> > BUG: unable to handle kernel paging request at ffff880502efc240
> > IP: [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110
> > PGD 1d3a067 PUD 0
> > Oops: 0000 [#1] SMP
> > Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode serio_raw evdev
> > CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2
> > Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012
> > task: ffff88040111c3b0 ti: ffff8803c23ae000 task.ti: ffff8803c23ae000
> > RIP: 0010:[<ffffffff81124c4b>] [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110
> > RSP: 0018:ffff8803c23afd90 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffff88040111c3b0 RCX: 000000000002a76e
> > RDX: 000000000002a76d RSI: 00000000000000d0 RDI: ffffffff8107a4e3
> > RBP: ffff8803c23afdd0 R08: 0000000000016a80 R09: 00000000ffffffff
> > R10: fffffffffffffffe R11: ffffffffffffffd0 R12: 00000000000000d0
> > R13: ffff88041d403980 R14: ffff880502efc240 R15: ffff88010e375a40
> > FS: 00007f2cae496700(0000) GS:ffff88041f2c0000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffff880502efc240 CR3: 00000001e0ced000 CR4: 00000000000007e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Stack:
> > ffff8803c23afe98 ffff8803c23afdb8 ffffffff81133811 ffff88040111c3b0
> > ffff88010e375a40 0000000001200011 00007f2cae4969d0 ffff88010e375a40
> > ffff8803c23afdf0 ffffffff8107a4e3 ffffffff81b49b80 0000000001200011
> > Call Trace:
> > [<ffffffff81133811>] ? final_putname+0x21/0x50
> > [<ffffffff8107a4e3>] prepare_creds+0x23/0x150
> > [<ffffffff8107ab11>] copy_creds+0x31/0x160
> > [<ffffffff8101a97b>] ? unlazy_fpu+0x9b/0xb0
> > [<ffffffff8104ef09>] copy_process.part.49+0x239/0x1390
> > [<ffffffff81143312>] ? __alloc_fd+0x42/0x100
> > [<ffffffff81050134>] do_fork+0xa4/0x320
> > [<ffffffff81131b77>] ? __do_pipe_flags+0x77/0xb0
> > [<ffffffff81143426>] ? __fd_install+0x26/0x60
> > [<ffffffff81050431>] SyS_clone+0x11/0x20
> > [<ffffffff817ad849>] stub_clone+0x69/0x90
> > [<ffffffff817ad569>] ? system_call_fastpath+0x16/0x1b
> > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63
> > RIP [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110
> > RSP <ffff8803c23afd90>
> > CR2: ffff880502efc240
> > ---[ end trace 956d153150ecc57f ]---
>
> Looks like slab corruption to me.
>
> Please try reproducing with "slub_debug" passed as a kernel parameter.
> It should give us some more debug output for catching the caller
> that's messing up slab.
>
> Btw, there are some btrfs related lockup warnings in the logs so I'm
> also CC'ing Chris.
So, with slub_debug, we are seeing "Poison overwritten" on two separate
boxes (we have four running roughly the same NFS-to-btrfs snapshot backup
tasks). One does it about weekly, the other has only done it once. The
alloc/free traces are always the same -- always alloc_pipe_info and
free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4:
=============================================================================
BUG kmalloc-192 (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0xffff880090f19e7c-0xffff880090f19e7c. First byte 0x6c instead of 0x6b
INFO: Allocated in alloc_pipe_info+0x1f/0xb0 age=15 cpu=6 pid=21914
__slab_alloc.constprop.66+0x35b/0x3a0
kmem_cache_alloc_trace+0xa0/0x100
alloc_pipe_info+0x1f/0xb0
create_pipe_files+0x41/0x1f0
__do_pipe_flags+0x3c/0xb0
SyS_pipe2+0x1b/0xa0
SyS_pipe+0xb/0x10
system_call_fastpath+0x16/0x1b
INFO: Freed in free_pipe_info+0x6a/0x70 age=14 cpu=6 pid=21914
__slab_free+0x2d/0x2df
kfree+0xfd/0x130
free_pipe_info+0x6a/0x70
pipe_release+0x94/0xf0
__fput+0xa7/0x230
____fput+0x9/0x10
task_work_run+0x97/0xd0
do_notify_resume+0x66/0x70
int_signal+0x12/0x17
INFO: Slab 0xffffea000243c600 objects=31 used=31 fp=0x (null) flags=0x4000000000004080
INFO: Object 0xffff880090f19e78 @offset=7800 fp=0xffff880090f1b6d8
Bytes b4 ffff880090f19e68: 11 a2 b0 07 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk
Object ffff880090f19e88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19e98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19ea8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19eb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19ec8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19ed8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19ee8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19ef8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19f08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19f18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff880090f19f28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone ffff880090f19f38: bb bb bb bb bb bb bb bb ........
Padding ffff880090f1a078: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 6 PID: 21914 Comm: perl Tainted: G B 3.11.0-rc4-hw+ #48
Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012
ffff880090f19e78 ffff8800a0f03c98 ffffffff817af54c 0000000000000007
ffff88041d404900 ffff8800a0f03cc8 ffffffff81131c89 ffff880090f19e7d
ffff88041d404900 000000000000006b ffff880090f19e78 ffff8800a0f03d18
Call Trace:
[<ffffffff817af54c>] dump_stack+0x46/0x58
[<ffffffff81131c89>] print_trailer+0xf9/0x160
[<ffffffff81131e22>] check_bytes_and_report+0xe2/0x120
[<ffffffff81132027>] check_object+0x1c7/0x240
[<ffffffff8113fd9f>] ? alloc_pipe_info+0x1f/0xb0
[<ffffffff817abaae>] alloc_debug_processing+0x153/0x168
[<ffffffff817abe1e>] __slab_alloc.constprop.66+0x35b/0x3a0
[<ffffffff8113fd9f>] ? alloc_pipe_info+0x1f/0xb0
[<ffffffff811333a0>] kmem_cache_alloc_trace+0xa0/0x100
[<ffffffff8114f26d>] ? inode_init_always+0xed/0x1b0
[<ffffffff8113fd9f>] alloc_pipe_info+0x1f/0xb0
[<ffffffff811402c1>] create_pipe_files+0x41/0x1f0
[<ffffffff811404ac>] __do_pipe_flags+0x3c/0xb0
[<ffffffff81152206>] ? __fd_install+0x26/0x60
[<ffffffff8114057b>] SyS_pipe2+0x1b/0xa0
[<ffffffff8114060b>] SyS_pipe+0xb/0x10
[<ffffffff817bce69>] system_call_fastpath+0x16/0x1b
FIX kmalloc-192: Restoring 0xffff880090f19e7c-0xffff880090f19e7c=0x6b
FIX kmalloc-192: Marking all objects used
This and more traces posted here: http://0x.ca/sim/ref/3.11-rc4/
Is there anything more we should turn on to get more information?
CONFIG_EFENCE? :)
Simon-
next prev parent reply other threads:[~2013-08-19 20:36 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-06 0:09 [3.10] Oopses in kmem_cache_allocate() via prepare_creds() Simon Kirby
2013-07-06 8:27 ` Pekka Enberg
2013-08-19 20:17 ` Simon Kirby [this message]
2013-08-19 20:29 ` Christoph Lameter
2013-08-19 21:16 ` Linus Torvalds
2013-08-19 21:24 ` Chris Mason
2013-08-19 23:31 ` Simon Kirby
2013-09-03 20:43 ` Simon Kirby
2013-08-20 4:06 ` Al Viro
2013-08-20 7:17 ` Ian Applegate
2013-08-20 7:21 ` Al Viro
2013-08-20 7:51 ` Ian Applegate
2013-11-26 0:44 ` Simon Kirby
2013-11-26 23:16 ` Linus Torvalds
2013-11-26 23:44 ` Linus Torvalds
2013-11-30 9:43 ` Simon Kirby
2013-11-30 17:25 ` Linus Torvalds
2013-11-30 21:04 ` Simon Kirby
2013-11-30 21:08 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130819201717.GA23608@hostway.ca \
--to=sim@hostway.ca \
--cc=chris.mason@fusionio.com \
--cc=cl@linux.com \
--cc=linux-kernel@vger.kernel.org \
--cc=penberg@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).