* [3.10] Oopses in kmem_cache_allocate() via prepare_creds() @ 2013-07-06 0:09 Simon Kirby 2013-07-06 8:27 ` Pekka Enberg 0 siblings, 1 reply; 19+ messages in thread From: Simon Kirby @ 2013-07-06 0:09 UTC (permalink / raw) To: linux-kernel We saw two Oopses overnight on two separate boxes that seem possibly related, but both are weird. These boxes typically run btrfs for rsync snapshot backups (and usually Oops in btrfs ;), but not this time! backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03 was running 3.10 release plus btrfs-next from yesterday. Full kern.log and .config at http://0x.ca/sim/ref/3.10/ backup02's first Oops: BUG: unable to handle kernel paging request at 0000000100000000 IP: [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 PGD 1f54f7067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode serio_raw bnx2 evdev CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46 Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 task: ffff8802c3f08000 ti: ffff8801b4876000 task.ti: ffff8801b4876000 RIP: 0010:[<ffffffff81124beb>] [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 RSP: 0018:ffff8801b4877e88 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff8802c3f08000 RCX: 00000000017f040e RDX: 00000000017f040d RSI: 00000000000000d0 RDI: ffffffff8107a503 RBP: ffff8801b4877ec8 R08: 0000000000016a80 R09: 0000000000000000 R10: 00007fff025fe120 R11: 0000000000000246 R12: 00000000000000d0 R13: ffff88042d8019c0 R14: 0000000100000000 R15: 00007fc3588ee97f FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000100000000 CR3: 0000000409d68000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff8801b4877ed8 ffffffff8112a1bc ffff8800985acd20 ffff8802c3f08000 0000000000000001 00007fc3588ee334 00007fc358af5758 00007fc3588ee97f ffff8801b4877ee8 ffffffff8107a503 ffff8801b4877ee8 ffffffffffffffea Call Trace: [<ffffffff8112a1bc>] ? __fput+0x12c/0x240 [<ffffffff8107a503>] prepare_creds+0x23/0x150 [<ffffffff811272d4>] SyS_faccessat+0x34/0x1f0 [<ffffffff811274a3>] SyS_access+0x13/0x20 [<ffffffff8179e7a9>] system_call_fastpath+0x16/0x1b Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 RIP [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 RSP <ffff8801b4877e88> CR2: 0000000100000000 ---[ end trace 744477356cd98306 ]--- backup03's first Oops: BUG: unable to handle kernel paging request at ffff880502efc240 IP: [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 PGD 1d3a067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode serio_raw evdev CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2 Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 task: ffff88040111c3b0 ti: ffff8803c23ae000 task.ti: ffff8803c23ae000 RIP: 0010:[<ffffffff81124c4b>] [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 RSP: 0018:ffff8803c23afd90 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88040111c3b0 RCX: 000000000002a76e RDX: 000000000002a76d RSI: 00000000000000d0 RDI: ffffffff8107a4e3 RBP: ffff8803c23afdd0 R08: 0000000000016a80 R09: 00000000ffffffff R10: fffffffffffffffe R11: ffffffffffffffd0 R12: 00000000000000d0 R13: ffff88041d403980 R14: ffff880502efc240 R15: ffff88010e375a40 FS: 00007f2cae496700(0000) GS:ffff88041f2c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880502efc240 CR3: 00000001e0ced000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff8803c23afe98 ffff8803c23afdb8 ffffffff81133811 ffff88040111c3b0 ffff88010e375a40 0000000001200011 00007f2cae4969d0 ffff88010e375a40 ffff8803c23afdf0 ffffffff8107a4e3 ffffffff81b49b80 0000000001200011 Call Trace: [<ffffffff81133811>] ? final_putname+0x21/0x50 [<ffffffff8107a4e3>] prepare_creds+0x23/0x150 [<ffffffff8107ab11>] copy_creds+0x31/0x160 [<ffffffff8101a97b>] ? unlazy_fpu+0x9b/0xb0 [<ffffffff8104ef09>] copy_process.part.49+0x239/0x1390 [<ffffffff81143312>] ? __alloc_fd+0x42/0x100 [<ffffffff81050134>] do_fork+0xa4/0x320 [<ffffffff81131b77>] ? __do_pipe_flags+0x77/0xb0 [<ffffffff81143426>] ? __fd_install+0x26/0x60 [<ffffffff81050431>] SyS_clone+0x11/0x20 [<ffffffff817ad849>] stub_clone+0x69/0x90 [<ffffffff817ad569>] ? system_call_fastpath+0x16/0x1b Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 RIP [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 RSP <ffff8803c23afd90> CR2: ffff880502efc240 ---[ end trace 956d153150ecc57f ]--- Simon- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-07-06 0:09 [3.10] Oopses in kmem_cache_allocate() via prepare_creds() Simon Kirby @ 2013-07-06 8:27 ` Pekka Enberg 2013-08-19 20:17 ` Simon Kirby 0 siblings, 1 reply; 19+ messages in thread From: Pekka Enberg @ 2013-07-06 8:27 UTC (permalink / raw) To: Simon Kirby; +Cc: LKML, Christoph Lameter, Chris Mason, Linus Torvalds On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby <sim@hostway.ca> wrote: > We saw two Oopses overnight on two separate boxes that seem possibly > related, but both are weird. These boxes typically run btrfs for rsync > snapshot backups (and usually Oops in btrfs ;), but not this time! > backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03 > was running 3.10 release plus btrfs-next from yesterday. Full kern.log > and .config at http://0x.ca/sim/ref/3.10/ > > backup02's first Oops: > > BUG: unable to handle kernel paging request at 0000000100000000 > IP: [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > PGD 1f54f7067 PUD 0 > Oops: 0000 [#1] SMP > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode serio_raw bnx2 evdev > CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46 > Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 > task: ffff8802c3f08000 ti: ffff8801b4876000 task.ti: ffff8801b4876000 > RIP: 0010:[<ffffffff81124beb>] [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > RSP: 0018:ffff8801b4877e88 EFLAGS: 00010206 > RAX: 0000000000000000 RBX: ffff8802c3f08000 RCX: 00000000017f040e > RDX: 00000000017f040d RSI: 00000000000000d0 RDI: ffffffff8107a503 > RBP: ffff8801b4877ec8 R08: 0000000000016a80 R09: 0000000000000000 > R10: 00007fff025fe120 R11: 0000000000000246 R12: 00000000000000d0 > R13: ffff88042d8019c0 R14: 0000000100000000 R15: 00007fc3588ee97f > FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 0000000100000000 CR3: 0000000409d68000 CR4: 00000000000007f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Stack: > ffff8801b4877ed8 ffffffff8112a1bc ffff8800985acd20 ffff8802c3f08000 > 0000000000000001 00007fc3588ee334 00007fc358af5758 00007fc3588ee97f > ffff8801b4877ee8 ffffffff8107a503 ffff8801b4877ee8 ffffffffffffffea > Call Trace: > [<ffffffff8112a1bc>] ? __fput+0x12c/0x240 > [<ffffffff8107a503>] prepare_creds+0x23/0x150 > [<ffffffff811272d4>] SyS_faccessat+0x34/0x1f0 > [<ffffffff811274a3>] SyS_access+0x13/0x20 > [<ffffffff8179e7a9>] system_call_fastpath+0x16/0x1b > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 > RIP [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > RSP <ffff8801b4877e88> > CR2: 0000000100000000 > ---[ end trace 744477356cd98306 ]--- > > backup03's first Oops: > > BUG: unable to handle kernel paging request at ffff880502efc240 > IP: [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > PGD 1d3a067 PUD 0 > Oops: 0000 [#1] SMP > Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode serio_raw evdev > CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2 > Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 > task: ffff88040111c3b0 ti: ffff8803c23ae000 task.ti: ffff8803c23ae000 > RIP: 0010:[<ffffffff81124c4b>] [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > RSP: 0018:ffff8803c23afd90 EFLAGS: 00010282 > RAX: 0000000000000000 RBX: ffff88040111c3b0 RCX: 000000000002a76e > RDX: 000000000002a76d RSI: 00000000000000d0 RDI: ffffffff8107a4e3 > RBP: ffff8803c23afdd0 R08: 0000000000016a80 R09: 00000000ffffffff > R10: fffffffffffffffe R11: ffffffffffffffd0 R12: 00000000000000d0 > R13: ffff88041d403980 R14: ffff880502efc240 R15: ffff88010e375a40 > FS: 00007f2cae496700(0000) GS:ffff88041f2c0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffff880502efc240 CR3: 00000001e0ced000 CR4: 00000000000007e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Stack: > ffff8803c23afe98 ffff8803c23afdb8 ffffffff81133811 ffff88040111c3b0 > ffff88010e375a40 0000000001200011 00007f2cae4969d0 ffff88010e375a40 > ffff8803c23afdf0 ffffffff8107a4e3 ffffffff81b49b80 0000000001200011 > Call Trace: > [<ffffffff81133811>] ? final_putname+0x21/0x50 > [<ffffffff8107a4e3>] prepare_creds+0x23/0x150 > [<ffffffff8107ab11>] copy_creds+0x31/0x160 > [<ffffffff8101a97b>] ? unlazy_fpu+0x9b/0xb0 > [<ffffffff8104ef09>] copy_process.part.49+0x239/0x1390 > [<ffffffff81143312>] ? __alloc_fd+0x42/0x100 > [<ffffffff81050134>] do_fork+0xa4/0x320 > [<ffffffff81131b77>] ? __do_pipe_flags+0x77/0xb0 > [<ffffffff81143426>] ? __fd_install+0x26/0x60 > [<ffffffff81050431>] SyS_clone+0x11/0x20 > [<ffffffff817ad849>] stub_clone+0x69/0x90 > [<ffffffff817ad569>] ? system_call_fastpath+0x16/0x1b > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 > RIP [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > RSP <ffff8803c23afd90> > CR2: ffff880502efc240 > ---[ end trace 956d153150ecc57f ]--- Looks like slab corruption to me. Please try reproducing with "slub_debug" passed as a kernel parameter. It should give us some more debug output for catching the caller that's messing up slab. Btw, there are some btrfs related lockup warnings in the logs so I'm also CC'ing Chris. Pekka ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-07-06 8:27 ` Pekka Enberg @ 2013-08-19 20:17 ` Simon Kirby 2013-08-19 20:29 ` Christoph Lameter 0 siblings, 1 reply; 19+ messages in thread From: Simon Kirby @ 2013-08-19 20:17 UTC (permalink / raw) To: Pekka Enberg; +Cc: LKML, Christoph Lameter, Chris Mason, Linus Torvalds On Sat, Jul 06, 2013 at 11:27:38AM +0300, Pekka Enberg wrote: > On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby <sim@hostway.ca> wrote: > > We saw two Oopses overnight on two separate boxes that seem possibly > > related, but both are weird. These boxes typically run btrfs for rsync > > snapshot backups (and usually Oops in btrfs ;), but not this time! > > backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03 > > was running 3.10 release plus btrfs-next from yesterday. Full kern.log > > and .config at http://0x.ca/sim/ref/3.10/ > > > > backup02's first Oops: > > > > BUG: unable to handle kernel paging request at 0000000100000000 > > IP: [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > > PGD 1f54f7067 PUD 0 > > Oops: 0000 [#1] SMP > > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode serio_raw bnx2 evdev > > CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46 > > Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 > > task: ffff8802c3f08000 ti: ffff8801b4876000 task.ti: ffff8801b4876000 > > RIP: 0010:[<ffffffff81124beb>] [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > > RSP: 0018:ffff8801b4877e88 EFLAGS: 00010206 > > RAX: 0000000000000000 RBX: ffff8802c3f08000 RCX: 00000000017f040e > > RDX: 00000000017f040d RSI: 00000000000000d0 RDI: ffffffff8107a503 > > RBP: ffff8801b4877ec8 R08: 0000000000016a80 R09: 0000000000000000 > > R10: 00007fff025fe120 R11: 0000000000000246 R12: 00000000000000d0 > > R13: ffff88042d8019c0 R14: 0000000100000000 R15: 00007fc3588ee97f > > FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: 0000000100000000 CR3: 0000000409d68000 CR4: 00000000000007f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Stack: > > ffff8801b4877ed8 ffffffff8112a1bc ffff8800985acd20 ffff8802c3f08000 > > 0000000000000001 00007fc3588ee334 00007fc358af5758 00007fc3588ee97f > > ffff8801b4877ee8 ffffffff8107a503 ffff8801b4877ee8 ffffffffffffffea > > Call Trace: > > [<ffffffff8112a1bc>] ? __fput+0x12c/0x240 > > [<ffffffff8107a503>] prepare_creds+0x23/0x150 > > [<ffffffff811272d4>] SyS_faccessat+0x34/0x1f0 > > [<ffffffff811274a3>] SyS_access+0x13/0x20 > > [<ffffffff8179e7a9>] system_call_fastpath+0x16/0x1b > > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 > > RIP [<ffffffff81124beb>] kmem_cache_alloc+0x4b/0x110 > > RSP <ffff8801b4877e88> > > CR2: 0000000100000000 > > ---[ end trace 744477356cd98306 ]--- > > > > backup03's first Oops: > > > > BUG: unable to handle kernel paging request at ffff880502efc240 > > IP: [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > > PGD 1d3a067 PUD 0 > > Oops: 0000 [#1] SMP > > Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode serio_raw evdev > > CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2 > > Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 > > task: ffff88040111c3b0 ti: ffff8803c23ae000 task.ti: ffff8803c23ae000 > > RIP: 0010:[<ffffffff81124c4b>] [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > > RSP: 0018:ffff8803c23afd90 EFLAGS: 00010282 > > RAX: 0000000000000000 RBX: ffff88040111c3b0 RCX: 000000000002a76e > > RDX: 000000000002a76d RSI: 00000000000000d0 RDI: ffffffff8107a4e3 > > RBP: ffff8803c23afdd0 R08: 0000000000016a80 R09: 00000000ffffffff > > R10: fffffffffffffffe R11: ffffffffffffffd0 R12: 00000000000000d0 > > R13: ffff88041d403980 R14: ffff880502efc240 R15: ffff88010e375a40 > > FS: 00007f2cae496700(0000) GS:ffff88041f2c0000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > CR2: ffff880502efc240 CR3: 00000001e0ced000 CR4: 00000000000007e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > Stack: > > ffff8803c23afe98 ffff8803c23afdb8 ffffffff81133811 ffff88040111c3b0 > > ffff88010e375a40 0000000001200011 00007f2cae4969d0 ffff88010e375a40 > > ffff8803c23afdf0 ffffffff8107a4e3 ffffffff81b49b80 0000000001200011 > > Call Trace: > > [<ffffffff81133811>] ? final_putname+0x21/0x50 > > [<ffffffff8107a4e3>] prepare_creds+0x23/0x150 > > [<ffffffff8107ab11>] copy_creds+0x31/0x160 > > [<ffffffff8101a97b>] ? unlazy_fpu+0x9b/0xb0 > > [<ffffffff8104ef09>] copy_process.part.49+0x239/0x1390 > > [<ffffffff81143312>] ? __alloc_fd+0x42/0x100 > > [<ffffffff81050134>] do_fork+0xa4/0x320 > > [<ffffffff81131b77>] ? __do_pipe_flags+0x77/0xb0 > > [<ffffffff81143426>] ? __fd_install+0x26/0x60 > > [<ffffffff81050431>] SyS_clone+0x11/0x20 > > [<ffffffff817ad849>] stub_clone+0x69/0x90 > > [<ffffffff817ad569>] ? system_call_fastpath+0x16/0x1b > > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 > > RIP [<ffffffff81124c4b>] kmem_cache_alloc+0x4b/0x110 > > RSP <ffff8803c23afd90> > > CR2: ffff880502efc240 > > ---[ end trace 956d153150ecc57f ]--- > > Looks like slab corruption to me. > > Please try reproducing with "slub_debug" passed as a kernel parameter. > It should give us some more debug output for catching the caller > that's messing up slab. > > Btw, there are some btrfs related lockup warnings in the logs so I'm > also CC'ing Chris. So, with slub_debug, we are seeing "Poison overwritten" on two separate boxes (we have four running roughly the same NFS-to-btrfs snapshot backup tasks). One does it about weekly, the other has only done it once. The alloc/free traces are always the same -- always alloc_pipe_info and free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: ============================================================================= BUG kmalloc-192 (Not tainted): Poison overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0xffff880090f19e7c-0xffff880090f19e7c. First byte 0x6c instead of 0x6b INFO: Allocated in alloc_pipe_info+0x1f/0xb0 age=15 cpu=6 pid=21914 __slab_alloc.constprop.66+0x35b/0x3a0 kmem_cache_alloc_trace+0xa0/0x100 alloc_pipe_info+0x1f/0xb0 create_pipe_files+0x41/0x1f0 __do_pipe_flags+0x3c/0xb0 SyS_pipe2+0x1b/0xa0 SyS_pipe+0xb/0x10 system_call_fastpath+0x16/0x1b INFO: Freed in free_pipe_info+0x6a/0x70 age=14 cpu=6 pid=21914 __slab_free+0x2d/0x2df kfree+0xfd/0x130 free_pipe_info+0x6a/0x70 pipe_release+0x94/0xf0 __fput+0xa7/0x230 ____fput+0x9/0x10 task_work_run+0x97/0xd0 do_notify_resume+0x66/0x70 int_signal+0x12/0x17 INFO: Slab 0xffffea000243c600 objects=31 used=31 fp=0x (null) flags=0x4000000000004080 INFO: Object 0xffff880090f19e78 @offset=7800 fp=0xffff880090f1b6d8 Bytes b4 ffff880090f19e68: 11 a2 b0 07 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk Object ffff880090f19e88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19e98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19ea8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19eb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19ec8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19ed8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19ee8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19ef8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19f08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19f18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff880090f19f28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. Redzone ffff880090f19f38: bb bb bb bb bb bb bb bb ........ Padding ffff880090f1a078: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 6 PID: 21914 Comm: perl Tainted: G B 3.11.0-rc4-hw+ #48 Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 ffff880090f19e78 ffff8800a0f03c98 ffffffff817af54c 0000000000000007 ffff88041d404900 ffff8800a0f03cc8 ffffffff81131c89 ffff880090f19e7d ffff88041d404900 000000000000006b ffff880090f19e78 ffff8800a0f03d18 Call Trace: [<ffffffff817af54c>] dump_stack+0x46/0x58 [<ffffffff81131c89>] print_trailer+0xf9/0x160 [<ffffffff81131e22>] check_bytes_and_report+0xe2/0x120 [<ffffffff81132027>] check_object+0x1c7/0x240 [<ffffffff8113fd9f>] ? alloc_pipe_info+0x1f/0xb0 [<ffffffff817abaae>] alloc_debug_processing+0x153/0x168 [<ffffffff817abe1e>] __slab_alloc.constprop.66+0x35b/0x3a0 [<ffffffff8113fd9f>] ? alloc_pipe_info+0x1f/0xb0 [<ffffffff811333a0>] kmem_cache_alloc_trace+0xa0/0x100 [<ffffffff8114f26d>] ? inode_init_always+0xed/0x1b0 [<ffffffff8113fd9f>] alloc_pipe_info+0x1f/0xb0 [<ffffffff811402c1>] create_pipe_files+0x41/0x1f0 [<ffffffff811404ac>] __do_pipe_flags+0x3c/0xb0 [<ffffffff81152206>] ? __fd_install+0x26/0x60 [<ffffffff8114057b>] SyS_pipe2+0x1b/0xa0 [<ffffffff8114060b>] SyS_pipe+0xb/0x10 [<ffffffff817bce69>] system_call_fastpath+0x16/0x1b FIX kmalloc-192: Restoring 0xffff880090f19e7c-0xffff880090f19e7c=0x6b FIX kmalloc-192: Marking all objects used This and more traces posted here: http://0x.ca/sim/ref/3.11-rc4/ Is there anything more we should turn on to get more information? CONFIG_EFENCE? :) Simon- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 20:17 ` Simon Kirby @ 2013-08-19 20:29 ` Christoph Lameter 2013-08-19 21:16 ` Linus Torvalds 0 siblings, 1 reply; 19+ messages in thread From: Christoph Lameter @ 2013-08-19 20:29 UTC (permalink / raw) To: Simon Kirby; +Cc: Pekka Enberg, LKML, Chris Mason, Linus Torvalds On Mon, 19 Aug 2013, Simon Kirby wrote: > Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk This looks like an increment after free in the second 32 bit value of the structure. First 32 bit value's poison is unchanged. > CONFIG_EFENCE? :) I think the developers should have enough information now to fix the issue. Find the struct definition. Look for the second 32 bit value and find out where that value is incremented. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 20:29 ` Christoph Lameter @ 2013-08-19 21:16 ` Linus Torvalds 2013-08-19 21:24 ` Chris Mason 2013-08-20 4:06 ` Al Viro 0 siblings, 2 replies; 19+ messages in thread From: Linus Torvalds @ 2013-08-19 21:16 UTC (permalink / raw) To: Christoph Lameter, Al Viro; +Cc: Simon Kirby, Pekka Enberg, LKML, Chris Mason On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: > On Mon, 19 Aug 2013, Simon Kirby wrote: > >> [... ] The >> alloc/free traces are always the same -- always alloc_pipe_info and >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: >> >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk > > This looks like an increment after free in the second 32 bit value of the > structure. First 32 bit value's poison is unchanged. Ugh. If that is "struct pipe_inode_info" and I read it right, that's the "wait_lock" spinlock that is part of the mutex. Doing a "spin_lock()" could indeed cause an increment operation. But it still sounds like a very odd case. And even for some wild pointer I'd then expect the spin_unlock to also happen, and to then increment the next byte (or word) too. More importantly, for a mutex, I'd expect the *other* fields to be corrupted too (the "waiter" field etc). That is, unless we're still spinning waiting for the mutex, but with that value we shouldn't, as far as I can see. But it kind of does match at least one of your oopses that you had before using slab debugging: one of them had a pointer that should have been NULL that was 0000000100000000. Which again is "increment the second 32-bit word", and could be explained by the slab entry being re-used for another allocation that just happened to have a pointer in the first 8 bytes instead. And I think the timing is interesting, and there is data to back up the fact that it is that mutex field: the field was introduced by commit 72b0d9aacb89 ("pipe: don't use ->i_mutex"), which was merged into 3.10-rc1. So it matches the timing Simon sees. So while I think the pipe mutex spinlock field is a bit odd, Al Viro added to the participants list. Because that pipe->mutex->mutex_lock corruption doesn't really make sense to me, but there are certainly interesting coincidences wrt timing. Simon - it *might* be interesting to do this with DEBUG_PAGEALLOC, and make the pipe_inode_info allocations use a full page instead of a kmalloc() in order to trigger that way. So now it uses pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL); and kfree(pipe); in alloc_pipe_info/free_pipe_info respectively, could you make it use pipe = (void *)get_zeroed_page(GFP_KERNEL); and free_page((unsigned long)pipe); instead respectively, and then enable DEBUG_PAGEALLOC? That *should* trigger an exception on the actual bad access, if it really is this pipe_inode_info that is having problems.. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 21:16 ` Linus Torvalds @ 2013-08-19 21:24 ` Chris Mason 2013-08-19 23:31 ` Simon Kirby 2013-08-20 4:06 ` Al Viro 1 sibling, 1 reply; 19+ messages in thread From: Chris Mason @ 2013-08-19 21:24 UTC (permalink / raw) To: Linus Torvalds, Christoph Lameter, Al Viro Cc: Simon Kirby, Pekka Enberg, LKML Quoting Linus Torvalds (2013-08-19 17:16:36) > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: > > On Mon, 19 Aug 2013, Simon Kirby wrote: > > > >> [... ] The > >> alloc/free traces are always the same -- always alloc_pipe_info and > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: > >> > >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk > > > > This looks like an increment after free in the second 32 bit value of the > > structure. First 32 bit value's poison is unchanged. > > Ugh. If that is "struct pipe_inode_info" and I read it right, that's > the "wait_lock" spinlock that is part of the mutex. > > Doing a "spin_lock()" could indeed cause an increment operation. But > it still sounds like a very odd case. And even for some wild pointer > I'd then expect the spin_unlock to also happen, and to then increment > the next byte (or word) too. More importantly, for a mutex, I'd expect > the *other* fields to be corrupted too (the "waiter" field etc). That > is, unless we're still spinning waiting for the mutex, but with that > value we shouldn't, as far as I can see. > Simon, is this box doing btrfs send/receive? If so, it's probably where this pipe is coming from. Linus' CONFIG_DEBUG_PAGE_ALLOC suggestions are going to be the fastest way to find it, I can give you a patch if it'll help. It would be nice if you could trigger this on plain 3.11-rcX instead of btrfs-next. -chris ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 21:24 ` Chris Mason @ 2013-08-19 23:31 ` Simon Kirby 2013-09-03 20:43 ` Simon Kirby 0 siblings, 1 reply; 19+ messages in thread From: Simon Kirby @ 2013-08-19 23:31 UTC (permalink / raw) To: Chris Mason Cc: Linus Torvalds, Christoph Lameter, Al Viro, Pekka Enberg, LKML On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote: > Quoting Linus Torvalds (2013-08-19 17:16:36) > > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: > > > On Mon, 19 Aug 2013, Simon Kirby wrote: > > > > > >> [... ] The > > >> alloc/free traces are always the same -- always alloc_pipe_info and > > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: > > >> > > >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk > > > > > > This looks like an increment after free in the second 32 bit value of the > > > structure. First 32 bit value's poison is unchanged. > > > > Ugh. If that is "struct pipe_inode_info" and I read it right, that's > > the "wait_lock" spinlock that is part of the mutex. > > > > Doing a "spin_lock()" could indeed cause an increment operation. But > > it still sounds like a very odd case. And even for some wild pointer > > I'd then expect the spin_unlock to also happen, and to then increment > > the next byte (or word) too. More importantly, for a mutex, I'd expect > > the *other* fields to be corrupted too (the "waiter" field etc). That > > is, unless we're still spinning waiting for the mutex, but with that > > value we shouldn't, as far as I can see. > > > > Simon, is this box doing btrfs send/receive? If so, it's probably where > this pipe is coming from. No, not for some time (a few kernel versions ago). > Linus' CONFIG_DEBUG_PAGE_ALLOC suggestions are going to be the fastest > way to find it, I can give you a patch if it'll help. I presume it's just: diff --git a/fs/pipe.c b/fs/pipe.c index d2c45e1..30d5b8d 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -780,7 +780,7 @@ struct pipe_inode_info *alloc_pipe_info(void) { struct pipe_inode_info *pipe; - pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL); + pipe = (void *)get_zeroed_page(GFP_KERNEL); if (pipe) { pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL); if (pipe->bufs) { @@ -790,7 +790,7 @@ struct pipe_inode_info *alloc_pipe_info(void) mutex_init(&pipe->mutex); return pipe; } - kfree(pipe); + free_page((unsigned long)pipe); } return NULL; @@ -808,7 +808,7 @@ void free_pipe_info(struct pipe_inode_info *pipe) if (pipe->tmp_page) __free_page(pipe->tmp_page); kfree(pipe->bufs); - kfree(pipe); + free_page((unsigned long)pipe); } static struct vfsmount *pipe_mnt __read_mostly; ...and CONFIG_DEBUG_PAGEALLOC enabled. > It would be nice if you could trigger this on plain 3.11-rcX instead of > btrfs-next. On 3.10 it was with some btrfs-next pulled in, but the 3.11-rc4 traces were from 3.11-rc4 with just some of our local patches: > git diff --stat v3.11-rc4..master firmware/Makefile | 4 +- firmware/bnx2/bnx2-mips-06-6.2.3.fw.ihex | 5804 ++++++++++++++++++++++ firmware/bnx2/bnx2-mips-09-6.2.1b.fw.ihex | 6496 +++++++++++++++++++++++++ kernel/acct.c | 21 +- net/sunrpc/auth.c | 2 +- net/sunrpc/clnt.c | 10 + net/sunrpc/xprt.c | 8 +- 7 files changed, 12335 insertions(+), 10 deletions(-) None of them look relevant, but I'm building vanilla -rc4 with CONFIG_DEBUG_PAGEALLOC and the patch above. Simon- ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 23:31 ` Simon Kirby @ 2013-09-03 20:43 ` Simon Kirby 0 siblings, 0 replies; 19+ messages in thread From: Simon Kirby @ 2013-09-03 20:43 UTC (permalink / raw) To: Chris Mason Cc: Linus Torvalds, Christoph Lameter, Al Viro, Pekka Enberg, LKML On Mon, Aug 19, 2013 at 04:31:38PM -0700, Simon Kirby wrote: > On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote: > > > Quoting Linus Torvalds (2013-08-19 17:16:36) > > > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: > > > > On Mon, 19 Aug 2013, Simon Kirby wrote: > > > > > > > >> [... ] The > > > >> alloc/free traces are always the same -- always alloc_pipe_info and > > > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: > > > >> > > > >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk > > > > > > > > This looks like an increment after free in the second 32 bit value of the > > > > structure. First 32 bit value's poison is unchanged. > > > > > > Ugh. If that is "struct pipe_inode_info" and I read it right, that's > > > the "wait_lock" spinlock that is part of the mutex. > > > > > > Doing a "spin_lock()" could indeed cause an increment operation. But > > > it still sounds like a very odd case. And even for some wild pointer > > > I'd then expect the spin_unlock to also happen, and to then increment > > > the next byte (or word) too. More importantly, for a mutex, I'd expect > > > the *other* fields to be corrupted too (the "waiter" field etc). That > > > is, unless we're still spinning waiting for the mutex, but with that > > > value we shouldn't, as far as I can see. > > > > > > > Simon, is this box doing btrfs send/receive? If so, it's probably where > > this pipe is coming from. > > No, not for some time (a few kernel versions ago). > > > Linus' CONFIG_DEBUG_PAGE_ALLOC suggestions are going to be the fastest > > way to find it, I can give you a patch if it'll help. > > I presume it's just: > > diff --git a/fs/pipe.c b/fs/pipe.c > index d2c45e1..30d5b8d 100644 > --- a/fs/pipe.c > +++ b/fs/pipe.c > @@ -780,7 +780,7 @@ struct pipe_inode_info *alloc_pipe_info(void) > { > struct pipe_inode_info *pipe; > > - pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL); > + pipe = (void *)get_zeroed_page(GFP_KERNEL); > if (pipe) { > pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL); > if (pipe->bufs) { > @@ -790,7 +790,7 @@ struct pipe_inode_info *alloc_pipe_info(void) > mutex_init(&pipe->mutex); > return pipe; > } > - kfree(pipe); > + free_page((unsigned long)pipe); > } > > return NULL; > @@ -808,7 +808,7 @@ void free_pipe_info(struct pipe_inode_info *pipe) > if (pipe->tmp_page) > __free_page(pipe->tmp_page); > kfree(pipe->bufs); > - kfree(pipe); > + free_page((unsigned long)pipe); > } > > static struct vfsmount *pipe_mnt __read_mostly; > > ...and CONFIG_DEBUG_PAGEALLOC enabled. > > > It would be nice if you could trigger this on plain 3.11-rcX instead of > > btrfs-next. > > On 3.10 it was with some btrfs-next pulled in, but the 3.11-rc4 traces > were from 3.11-rc4 with just some of our local patches: > > > git diff --stat v3.11-rc4..master > firmware/Makefile | 4 +- > firmware/bnx2/bnx2-mips-06-6.2.3.fw.ihex | 5804 ++++++++++++++++++++++ > firmware/bnx2/bnx2-mips-09-6.2.1b.fw.ihex | 6496 +++++++++++++++++++++++++ > kernel/acct.c | 21 +- > net/sunrpc/auth.c | 2 +- > net/sunrpc/clnt.c | 10 + > net/sunrpc/xprt.c | 8 +- > 7 files changed, 12335 insertions(+), 10 deletions(-) > > None of them look relevant, but I'm building vanilla -rc4 with > CONFIG_DEBUG_PAGEALLOC and the patch above. Stock 3.11-rc4 plus the above get_zeroed_page() for pipe allocations has been running since August 19th on a few btrfs boxes. It has been quiet until a few days ago, where we hit this: BUG: soft lockup - CPU#5 stuck for 22s! [btrfs-cleaner:5871] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe serio_raw bnx2 evdev CPU: 5 PID: 5871 Comm: btrfs-cleaner Not tainted 3.11.0-rc4-hw+ #48 Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 task: ffff8804261117d0 ti: ffff8804120d8000 task.ti: ffff8804120d8000 RIP: 0010:[<ffffffff817b4c8c>] [<ffffffff817b4c8c>] _raw_spin_unlock_irqrestore+0xc/0x20 RSP: 0018:ffff8804120d98b8 EFLAGS: 00000296 RAX: ffff88042d800a80 RBX: ffffffff81130ffc RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000296 RDI: 0000000000000296 RBP: ffff8804120d98b8 R08: ffff8804120d9840 R09: 0000000000000000 R10: ffff8803c6f47400 R11: ffff88042d807180 R12: ffff8803c6f474a8 R13: 00000000000000cc R14: ffffffff8101f6da R15: ffff8804120d9828 FS: 0000000000000000(0000) GS:ffff88043fd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffff600400 CR3: 00000001a9fc4000 CR4: 00000000000007e0 Stack: ffff8804120d9958 ffffffff817abea8 ffff8804120d98f8 ffffffff813733d0 00000034ffffffff 0000000000000296 ffff88004edd99d0 0000000000000010 ffff8804120d9918 ffffffff81319211 0000000000000004 ffff88004edd99d0 Call Trace: [<ffffffff817abea8>] __slab_free+0x45/0x2df [<ffffffff813733d0>] ? btrfs_set_lock_blocking_rw+0xa0/0xe0 [<ffffffff81319211>] ? btrfs_set_path_blocking+0x31/0x70 [<ffffffff8131f15b>] ? btrfs_search_old_slot+0x75b/0x980 [<ffffffff8139899e>] ? ulist_free+0x1e/0x30 [<ffffffff811336ed>] kfree+0xfd/0x130 [<ffffffff8139899e>] ulist_free+0x1e/0x30 [<ffffffff81396b96>] __resolve_indirect_refs+0x436/0x5b0 [<ffffffff8139757c>] find_parent_nodes+0x47c/0x6c0 [<ffffffff81397851>] btrfs_find_all_roots+0x91/0x100 [<ffffffff8139b7e8>] btrfs_qgroup_account_ref+0x148/0x500 [<ffffffff81328016>] btrfs_delayed_refs_qgroup_accounting+0xa6/0x100 [<ffffffff8132f6f9>] btrfs_run_delayed_refs+0x49/0x5b0 [<ffffffff81328460>] ? walk_down_proc+0x120/0x2d0 [<ffffffff8135e359>] ? free_extent_buffer+0x59/0xb0 [<ffffffff8132c741>] ? walk_up_tree+0xe1/0x1e0 [<ffffffff8133f03a>] btrfs_should_end_transaction+0x4a/0x70 [<ffffffff8132df2b>] btrfs_drop_snapshot+0x3db/0x6a0 [<ffffffff8138e726>] ? btrfs_kill_all_delayed_nodes+0x56/0xf0 [<ffffffff817b4bee>] ? _raw_spin_lock+0xe/0x30 [<ffffffff81340b06>] btrfs_clean_one_deleted_snapshot+0xc6/0x130 [<ffffffff81337ab0>] cleaner_kthread+0x120/0x180 [<ffffffff81337990>] ? transaction_kthread+0x210/0x210 [<ffffffff8107af5b>] kthread+0xbb/0xc0 [<ffffffff81010000>] ? xen_smp_intr_init+0x100/0x220 [<ffffffff8107aea0>] ? kthread_freezable_should_stop+0x60/0x60 [<ffffffff817bcdbc>] ret_from_fork+0x7c/0xb0 [<ffffffff8107aea0>] ? kthread_freezable_should_stop+0x60/0x60 Code: 66 0f c1 07 0f b6 d4 38 c2 74 0d 0f 1f 40 00 f3 90 0f b6 07 38 d0 75 f7 5d c3 0f 1f 44 00 00 55 48 89 e5 80 07 01 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 Perhaps something is still be wrong with the btrfs ulist stuff, even with 35f0399db6658f465b00893bdd13b992a0acfef0? I was thinking of going back to the regular slab pipe allocations and DEBUG_MUTEXES, but it may not be pipe-related at this point... Simon- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-19 21:16 ` Linus Torvalds 2013-08-19 21:24 ` Chris Mason @ 2013-08-20 4:06 ` Al Viro 2013-08-20 7:17 ` Ian Applegate 1 sibling, 1 reply; 19+ messages in thread From: Al Viro @ 2013-08-20 4:06 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Lameter, Simon Kirby, Pekka Enberg, LKML, Chris Mason On Mon, Aug 19, 2013 at 02:16:36PM -0700, Linus Torvalds wrote: > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: > > On Mon, 19 Aug 2013, Simon Kirby wrote: > > > >> [... ] The > >> alloc/free traces are always the same -- always alloc_pipe_info and > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: > >> > >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk > > > > This looks like an increment after free in the second 32 bit value of the > > structure. First 32 bit value's poison is unchanged. > > Ugh. If that is "struct pipe_inode_info" and I read it right, that's > the "wait_lock" spinlock that is part of the mutex. > > Doing a "spin_lock()" could indeed cause an increment operation. But > it still sounds like a very odd case. And even for some wild pointer > I'd then expect the spin_unlock to also happen, and to then increment > the next byte (or word) too. More importantly, for a mutex, I'd expect > the *other* fields to be corrupted too (the "waiter" field etc). That > is, unless we're still spinning waiting for the mutex, but with that > value we shouldn't, as far as I can see. Point... I would probably start with CONFIG_DEBUG_MUTEXES and see if it screams about mutex magic, etc. having been buggered. FWIW, pipe is neither a FIFO nor an internal per-task one - it's a usual pipe(2) one, which should've excluded most of the weird codepaths... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-20 4:06 ` Al Viro @ 2013-08-20 7:17 ` Ian Applegate 2013-08-20 7:21 ` Al Viro 0 siblings, 1 reply; 19+ messages in thread From: Ian Applegate @ 2013-08-20 7:17 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Christoph Lameter, Simon Kirby, Pekka Enberg, LKML, Chris Mason We are also seeing this or a similar issue. On a fairly widespread deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the order of 36 days (combined MTBF.) [28974.739774] ------------[ cut here ]------------ [28974.744980] kernel BUG at mm/slub.c:3352! [28974.749502] invalid opcode: 0000 [#1] SMP [28974.754143] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler dm_mod md_mod nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter xt_tcpudp xt_CT nf_conntrack xt_multiport iptable_raw ip_tables x_tables rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 fuse nfsv3 nfs_acl nfs fscache lockd sunrpc bonding tcp_cubic sg sfc(O) mtd mdio igb dca i2c_algo_bit i2c_core ptp pps_core coretemp kvm_intel kvm crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 acpi_cpufreq evdev sd_mod crc_t10dif snd_pcm tpm_tis microcode tpm snd_timer tpm_bios snd soundcore snd_page_alloc pcspkr ahci libahci uhci_hcd ehci_pci ehci_hcd lpc_ich libata mfd_core usbcore usb_common hpsa scsi_mod mperf button processor thermal_sys [28974.835407] CPU: 17 PID: 21400 Comm: nginx-fl Tainted: G O 3.10.1-cloudflare-trace+ #2 [28974.845307] Hardware name: HP ProLiant DL180 G6 , BIOS O20 01/24/2011 [28974.852653] task: ffff8805bea7b390 ti: ffff8809a5f0e000 task.ti: ffff8809a5f0e000 [28974.861095] RIP: 0010:[<ffffffff810ebcb2>] [<ffffffff810ebcb2>] kfree+0x59/0xe7 [28974.869453] RSP: 0018:ffff8809a5f0fe30 EFLAGS: 00010246 [28974.875434] RAX: 0000000000000000 RBX: ffff880b21e79d40 RCX: 0000000000000028 [28974.883458] RDX: 006ffc0000080068 RSI: 000000000001183c RDI: ffffea002c879e40 [28974.891483] RBP: ffffea002c879e40 R08: 0000000000017100 R09: ffffea0009ea0480 [28974.899507] R10: ffff880ae9746250 R11: 0000000000000000 R12: ffffffff81119456 [28974.907533] R13: 000000000000000a R14: ffff8809a5f0ff48 R15: 0000000000013b80 [28974.915558] FS: 00007fb36c115710(0000) GS:ffff880627d60000(0000) knlGS:0000000000000000 [28974.924681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [28974.931149] CR2: ffffffffff600400 CR3: 0000000761b93000 CR4: 00000000000007e0 [28974.939173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [28974.947190] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [28974.955205] Stack: [28974.957486] ffff880a3e72a700 000000000001183c 0000000000000000 ffffffff81119456 [28974.965873] ffff8809a5f0fe48 0000000000000000 ffff880b21e79d40 000000000000c350 [28974.974253] ffff8809a5f0fec0 ffff8802b4781bc0 0000000000000200 ffffffff811d96eb [28974.982646] Call Trace: [28974.985419] [<ffffffff81119456>] ? do_readv_writev+0xfc/0x135 [28974.991987] [<ffffffff811d96eb>] ? ep_poll+0x215/0x286 [28974.997875] [<ffffffff8112ccb4>] ? fget_light+0x2e/0x7c [28975.003861] [<ffffffff811182fd>] ? fdget+0x16/0x1c [28975.009359] [<ffffffff8111956c>] ? SyS_readv+0x5a/0xb0 [28975.015245] [<ffffffff811da1ae>] ? SyS_epoll_wait+0x86/0xc1 [28975.021622] [<ffffffff814e4d92>] ? system_call_fastpath+0x16/0x1b [28975.028578] Code: 48 83 fb 10 0f 86 aa 00 00 00 48 89 df e8 8c de ff ff 48 89 c7 48 89 c5 e8 b5 d7 ff ff 85 c0 75 22 48 f7 45 00 00 c0 00 00 75 02 <0f> 0b 48 89 ef e8 a8 d7 ff ff 5b 48 89 ef 89 c6 5d 41 5c e9 9a [28975.050651] RIP [<ffffffff810ebcb2>] kfree+0x59/0xe7 [28975.056354] RSP <ffff8809a5f0fe30> [28975.062479] ---[ end trace 29f90372a2c3b0ac ]--- The machine hobbles along until all processes crash at this point. [14064.855658] ============================================================================= [14064.864884] BUG kmalloc-192 (Tainted: G W ): Poison overwritten [14064.872424] ----------------------------------------------------------------------------- [14064.872424] [14064.883322] Disabling lock debugging due to kernel taint [14064.889304] INFO: 0xffff880930a54824-0xffff880930a54824. First byte 0x6c instead of 0x6b [14064.898433] INFO: Allocated in alloc_pipe_info+0x17/0x94 age=80 cpu=8 pid=28897 [14064.906684] set_track+0x5c/0xd7 [14064.910332] alloc_debug_processing+0x76/0xfd [14064.915250] __slab_alloc+0x3e0/0x417 [14064.919386] alloc_inode+0x26/0x6c [14064.923232] alloc_pipe_info+0x17/0x94 [14064.927467] kmem_cache_alloc_trace+0xbe/0x14e [14064.932486] alloc_pipe_info+0x17/0x94 [14064.936721] alloc_pipe_info+0x17/0x94 [14064.940956] create_pipe_files+0x3c/0x1e5 [14064.945481] __do_pipe_flags+0x23/0xa7 [14064.949718] SyS_pipe2+0x18/0x86 [14064.953374] system_call_fastpath+0x16/0x1b [14064.958098] INFO: Freed in pipe_release+0xc4/0xcd age=76 cpu=23 pid=28897 [14064.965740] set_track+0x5c/0xd7 [14064.969389] free_debug_processing+0x11d/0x1a9 [14064.974391] __slab_free+0x32/0x30a [14064.978334] free_pages_prepare+0x104/0x128 [14064.983053] pipe_release+0xc4/0xcd [14064.986991] __fput+0xe1/0x1e4 [14064.990443] task_work_run+0x7b/0x8f [14064.994482] do_notify_resume+0x59/0x68 [14064.998816] int_signal+0x12/0x17 [14065.002561] INFO: Slab 0xffffea0024c29500 objects=31 used=28 fp=0xffff880930a570c0 flags=0x6ffc0000004080 [14065.013341] INFO: Object 0xffff880930a54820 @offset=2080 fp=0xffff880930a55040 [14065.023177] Bytes b4 ffff880930a54810: 4d fa 14 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a M.......ZZZZZZZZ [14065.033864] Object ffff880930a54820: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk [14065.044354] Object ffff880930a54830: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.054845] Object ffff880930a54840: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.065335] Object ffff880930a54850: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.075827] Object ffff880930a54860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.086321] Object ffff880930a54870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.096806] Object ffff880930a54880: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.107298] Object ffff880930a54890: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.117788] Object ffff880930a548a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.128278] Object ffff880930a548b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.138770] Object ffff880930a548c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [14065.149262] Object ffff880930a548d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [14065.159751] Redzone ffff880930a548e0: bb bb bb bb bb bb bb bb ........ [14065.169559] Padding ffff880930a54a20: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [14065.179372] CPU: 10 PID: 7969 Comm: nginx-cache Tainted: G B W 3.10.6-cloudflare-debug+ #5 [14065.189469] Hardware name: HP ProLiant DL180 G6 , BIOS O20 01/24/2011 [14065.196807] 0000000000000000 ffffffff810ee8d2 ffff880930a54820 ffffffff81827fe7 [14065.205185] ffffea0024c29500 ffff880627804900 ffff880930a54820 ffffea0024c29500 [14065.213576] ffffffff81223ebb ffff880930a54820 ffffea0024c29500 ffffffff810ee9fe [14065.221965] Call Trace: [14065.224743] [<ffffffff810ee8d2>] ? check_bytes_and_report+0xa7/0xf4 [14065.231902] [<ffffffff81223ebb>] ? kmem_alloc+0x43/0x6d [14065.237890] [<ffffffff810ee9fe>] ? check_object+0xdf/0x19e [14065.244170] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.250154] [<ffffffff814e834d>] ? alloc_debug_processing+0x5c/0xfd [14065.257310] [<ffffffff814e87ce>] ? __slab_alloc+0x3e0/0x417 [14065.263687] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.269671] [<ffffffff810efe41>] ? __kmalloc+0x15b/0x191 [14065.275755] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.281733] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.287709] [<ffffffff8124ce2f>] ? xfs_idata_realloc+0xd5/0xf7 [14065.294378] [<ffffffff810efde0>] ? __kmalloc+0xfa/0x191 [14065.300366] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.306354] [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d [14065.312340] [<ffffffff8125c4ab>] ? xfs_log_commit_cil+0xe8/0x3ce [14065.319202] [<ffffffff812589a1>] ? xfs_trans_commit+0x5c/0x1c5 [14065.325873] [<ffffffff81222ed4>] ? xfs_create+0x404/0x527 [14065.332055] [<ffffffff8121d07d>] ? xfs_vn_mknod+0xb4/0x15e [14065.338334] [<ffffffff8112d010>] ? vfs_create+0x5f/0x84 [14065.344318] [<ffffffff8112e0f6>] ? do_last+0x54b/0x9b7 [14065.350206] [<ffffffff8112e624>] ? path_openat+0xc2/0x327 [14065.356386] [<ffffffff81123b89>] ? do_readv_writev+0x119/0x135 [14065.363053] [<ffffffff8112eb2c>] ? do_filp_open+0x2a/0x6e [14065.369232] [<ffffffff8112bda0>] ? getname_flags.part.32+0x22/0x10a [14065.376386] [<ffffffff8113791b>] ? __alloc_fd+0xcc/0xdb [14065.382373] [<ffffffff81122803>] ? do_sys_open+0x5c/0xe0 [14065.388457] [<ffffffff814f43d2>] ? system_call_fastpath+0x16/0x1b [14065.395413] FIX kmalloc-192: Restoring 0xffff880930a54824-0xffff880930a54824=0x6b [14065.395413] [14065.405533] FIX kmalloc-192: Marking all objects used On Mon, Aug 19, 2013 at 9:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Mon, Aug 19, 2013 at 02:16:36PM -0700, Linus Torvalds wrote: >> On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@gentwo.org> wrote: >> > On Mon, 19 Aug 2013, Simon Kirby wrote: >> > >> >> [... ] The >> >> alloc/free traces are always the same -- always alloc_pipe_info and >> >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4: >> >> >> >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk >> > >> > This looks like an increment after free in the second 32 bit value of the >> > structure. First 32 bit value's poison is unchanged. >> >> Ugh. If that is "struct pipe_inode_info" and I read it right, that's >> the "wait_lock" spinlock that is part of the mutex. >> >> Doing a "spin_lock()" could indeed cause an increment operation. But >> it still sounds like a very odd case. And even for some wild pointer >> I'd then expect the spin_unlock to also happen, and to then increment >> the next byte (or word) too. More importantly, for a mutex, I'd expect >> the *other* fields to be corrupted too (the "waiter" field etc). That >> is, unless we're still spinning waiting for the mutex, but with that >> value we shouldn't, as far as I can see. > > Point... I would probably start with CONFIG_DEBUG_MUTEXES and see if > it screams about mutex magic, etc. having been buggered. FWIW, pipe is > neither a FIFO nor an internal per-task one - it's a usual pipe(2) one, > which should've excluded most of the weird codepaths... > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-20 7:17 ` Ian Applegate @ 2013-08-20 7:21 ` Al Viro 2013-08-20 7:51 ` Ian Applegate 0 siblings, 1 reply; 19+ messages in thread From: Al Viro @ 2013-08-20 7:21 UTC (permalink / raw) To: Ian Applegate Cc: Linus Torvalds, Christoph Lameter, Simon Kirby, Pekka Enberg, LKML, Chris Mason On Tue, Aug 20, 2013 at 12:17:52AM -0700, Ian Applegate wrote: > We are also seeing this or a similar issue. On a fairly widespread > deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the > order of 36 days (combined MTBF.) Do you have any boxen with CONFIG_DEBUG_MUTEXES among those? What kind of setup do you have on those, BTW? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-20 7:21 ` Al Viro @ 2013-08-20 7:51 ` Ian Applegate 2013-11-26 0:44 ` Simon Kirby 0 siblings, 1 reply; 19+ messages in thread From: Ian Applegate @ 2013-08-20 7:51 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Christoph Lameter, Simon Kirby, Pekka Enberg, LKML, Chris Mason Unfortunately no boxen with CONFIG_DEBUG_MUTEXES among them. I can enable on a few and should have some results within the day. These mainly serve (quite a bit of) HTTP/S cache traffic. On Tue, Aug 20, 2013 at 12:21 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Aug 20, 2013 at 12:17:52AM -0700, Ian Applegate wrote: >> We are also seeing this or a similar issue. On a fairly widespread >> deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the >> order of 36 days (combined MTBF.) > > Do you have any boxen with CONFIG_DEBUG_MUTEXES among those? What > kind of setup do you have on those, BTW? > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-08-20 7:51 ` Ian Applegate @ 2013-11-26 0:44 ` Simon Kirby 2013-11-26 23:16 ` Linus Torvalds 0 siblings, 1 reply; 19+ messages in thread From: Simon Kirby @ 2013-11-26 0:44 UTC (permalink / raw) To: Ian Applegate Cc: Al Viro, Linus Torvalds, Christoph Lameter, Pekka Enberg, LKML, Chris Mason On Tue, Aug 20, 2013 at 12:51:11AM -0700, Ian Applegate wrote: > Unfortunately no boxen with CONFIG_DEBUG_MUTEXES among them. I can > enable on a few and should have some results within the day. These > mainly serve (quite a bit of) HTTP/S cache traffic. > > On Tue, Aug 20, 2013 at 12:21 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Tue, Aug 20, 2013 at 12:17:52AM -0700, Ian Applegate wrote: > >> We are also seeing this or a similar issue. On a fairly widespread > >> deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the > >> order of 36 days (combined MTBF.) > > > > Do you have any boxen with CONFIG_DEBUG_MUTEXES among those? What > > kind of setup do you have on those, BTW? Hmm. So, we went through a few months of running with Linus' suggested culprit-catching patch w/DEBUG_PAGE_ALLOC, but it never tripped. We also ran with DEBUG_MUTEXES, but that never seem to catch anything, either. Ian, is it true that what you saw involved no btrfs? I was still guessing this is related to btrfs, as we are only seeing this on boxes doing btrfs rsync-snapshot backups. I don't know what else is interesting about our workload there, since we're not doing anything exotic. Meanwhille, with DEBUG_LIST on 3.12-rc, we found list corruption, which Josef fixed in 93858769172c4e3678917810e9d5de360eb991cc. This missed 3.12, unfortunately, so I built a 3.12 with Josef's btrfs-next merged (to 54563d41a58be77e9bd9ef7af1ea4026cf0e7e07, which contained that fix). I was hoping this or something else by 3.12 would have fixed it, so after testing we deployed this everywhere and turned off the rest of the debug options. I missed slub_debug on one server, though...and it just hit another case of overwritten poison. Is it true that with slub_debug, aliasing of equal-sized objects is turned off, and so they shouldn't be immediately side-by-side? In other words, would there be similar scrawling victim chances as allocating pipe_inode_info with pages instead of slabs? "slabinfo -a" is empty. [158037.526662] ============================================================================= [158037.528014] BUG kmalloc-192 (Not tainted): Poison overwritten [158037.528014] ----------------------------------------------------------------------------- [158037.528014] [158037.528014] Disabling lock debugging due to kernel taint [158037.528014] INFO: 0xffff88013af3da6c-0xffff88013af3da6c. First byte 0x6c instead of 0x6b [158037.528014] INFO: Allocated in alloc_pipe_info+0x1f/0xb0 age=22 cpu=3 pid=26402 [158037.528014] __slab_alloc.constprop.63+0x35b/0x3a0 [158037.528014] kmem_cache_alloc_trace+0xab/0x110 [158037.528014] alloc_pipe_info+0x1f/0xb0 [158037.528014] create_pipe_files+0x41/0x1f0 [158037.528014] __do_pipe_flags+0x3c/0xb0 [158037.528014] SyS_pipe2+0x1b/0xa0 [158037.528014] SyS_pipe+0xb/0x10 [158037.528014] system_call_fastpath+0x16/0x1b [158037.528014] INFO: Freed in free_pipe_info+0x6a/0x70 age=39 cpu=1 pid=26402 [158037.528014] __slab_free+0x2d/0x2d4 [158037.528014] kfree+0xfd/0x130 [158037.528014] free_pipe_info+0x6a/0x70 [158037.528014] pipe_release+0x94/0xf0 [158037.528014] __fput+0xa7/0x230 [158037.528014] ____fput+0x9/0x10 [158037.528014] task_work_run+0x97/0xd0 [158037.528014] do_notify_resume+0x66/0x70 [158037.528014] int_signal+0x12/0x17 [158037.528014] INFO: Slab 0xffffea0004ebcf00 objects=31 used=29 fp=0xffff88013af3e080 flags=0x8000000000004080 [158037.528014] INFO: Object 0xffff88013af3da68 @offset=6760 fp=0xffff88013af3ca28 [158037.528014] [158037.528014] Bytes b4 ffff88013af3da58: 97 b8 59 02 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ..Y.....ZZZZZZZZ [158037.528014] Object ffff88013af3da68: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkklkkkkkkkkkkk [158037.528014] Object ffff88013af3da78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3da88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3da98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3daa8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3dab8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3dac8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3dad8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3dae8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3daf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3db08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [158037.528014] Object ffff88013af3db18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [158037.528014] Redzone ffff88013af3db28: bb bb bb bb bb bb bb bb ........ [158037.528014] Padding ffff88013af3dc68: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [158037.528014] CPU: 3 PID: 26402 Comm: perl Tainted: G B 3.12.0-hw+ #75 [158037.528014] Hardware name: Dell Inc. PowerEdge 1950/0UR033, BIOS 2.0.1 10/30/2007 [158037.528014] ffff88013af3da68 ffff88012252fc98 ffffffff817f6c6e 0000000000000007 [158037.528014] ffff88042d804900 ffff88012252fcc8 ffffffff81135099 ffff88013af3da6d [158037.528014] ffff88042d804900 000000000000006b ffff88013af3da68 ffff88012252fd18 [158037.528014] Call Trace: [158037.528014] [<ffffffff817f6c6e>] dump_stack+0x46/0x58 [158037.528014] [<ffffffff81135099>] print_trailer+0xf9/0x160 [158037.528014] [<ffffffff81135232>] check_bytes_and_report+0xe2/0x120 [158037.528014] [<ffffffff81135437>] check_object+0x1c7/0x240 [158037.528014] [<ffffffff8114325f>] ? alloc_pipe_info+0x1f/0xb0 [158037.528014] [<ffffffff817f2d0c>] alloc_debug_processing+0x153/0x168 [158037.528014] [<ffffffff817f307c>] __slab_alloc.constprop.63+0x35b/0x3a0 [158037.528014] [<ffffffff8114325f>] ? alloc_pipe_info+0x1f/0xb0 [158037.528014] [<ffffffff811369ab>] kmem_cache_alloc_trace+0xab/0x110 [158037.528014] [<ffffffff81152f2d>] ? inode_init_always+0xed/0x1b0 [158037.528014] [<ffffffff8114325f>] alloc_pipe_info+0x1f/0xb0 [158037.528014] [<ffffffff81143781>] create_pipe_files+0x41/0x1f0 [158037.528014] [<ffffffff8114396c>] __do_pipe_flags+0x3c/0xb0 [158037.528014] [<ffffffff8107808c>] ? task_work_run+0x9c/0xd0 [158037.528014] [<ffffffff81143a3b>] SyS_pipe2+0x1b/0xa0 [158037.528014] [<ffffffff818045f0>] ? int_signal+0x12/0x17 [158037.528014] [<ffffffff81143acb>] SyS_pipe+0xb/0x10 [158037.528014] [<ffffffff818043b9>] system_call_fastpath+0x16/0x1b [158037.528014] FIX kmalloc-192: Restoring 0xffff88013af3da6c-0xffff88013af3da6c=0x6b [158037.528014] [158037.528014] FIX kmalloc-192: Marking all objects used There was a crash from btrfs memcpy shortly after this, which may or may not be related, followed by another poison overwritten. Writeback was wedged at this point. Full kernel log: http://0x.ca/sim/ref/3.12/dmesg Other stuff: http://0x.ca/sim/ref/3.12/ Simon- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-26 0:44 ` Simon Kirby @ 2013-11-26 23:16 ` Linus Torvalds 2013-11-26 23:44 ` Linus Torvalds 2013-11-30 9:43 ` Simon Kirby 0 siblings, 2 replies; 19+ messages in thread From: Linus Torvalds @ 2013-11-26 23:16 UTC (permalink / raw) To: Simon Kirby Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason On Mon, Nov 25, 2013 at 4:44 PM, Simon Kirby <sim@hostway.ca> wrote: > > I was hoping this or something else by 3.12 would have fixed it, so after > testing we deployed this everywhere and turned off the rest of the debug > options. I missed slub_debug on one server, though...and it just hit > another case of overwritten poison. Your thing is *very* consistent, it's once more four bytes into that pipe-info. And it's once more that exact same "increment second word in the allocation" pattern. But the fact that it goes away when you enable other debug options is really really annoying. It's consistent only if you don't have the options on that might help us debug it further. Damn. > Is it true that with slub_debug, aliasing of equal-sized objects is > turned off, and so they shouldn't be immediately side-by-side? In other > words, would there be similar scrawling victim chances as allocating > pipe_inode_info with pages instead of slabs? "slabinfo -a" is empty. So the thing is, with slub debugging, slub shouldn't be merging different slab caches. HOWEVER. The pipe-info structure isn't using its own slab cache, it's just using "kmalloc()". So it by definition will merge with all other kmalloc() allocations of the same size (or, to be exact, of "similar enough size to hit the same size bucket"). In your case it's the 192-byte-sized bucket. But all the debugging code talks purely about pipe_info allocations - both the previous kmalloc/kfree _and_ the kmalloc() that actually sees the slub debugging error. So if it's mixing with something else, I'm not seeing what that would be. It would have to be an older allocation (as it "it got re-allocated to a pipe in between") or another type that was the similar size. Which doesn't look all that likely. Not when your problems are so consistent, and seem to be *always* about that pipe_inode_info. But dammit, that it such a simple set of allocations. I still don't see how they could be to blame. And if it's some suspicious access to the pipe mutex (that second word is still the "wait_lock" spinlock in the pipe inode mutex) I really would have expected the mutex debugging to have screamed loudly. Or the DEBUG_PAGEALLOC. I'm really not very happy with the whole pipe locking logic (or the refcounting we do, separately from the "struct inode"), and in that sense I'm perfectly willing to blame that code for doing bad things. But the fact that it all goes away with debugging makes me very very unhappy. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-26 23:16 ` Linus Torvalds @ 2013-11-26 23:44 ` Linus Torvalds 2013-11-30 9:43 ` Simon Kirby 1 sibling, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2013-11-26 23:44 UTC (permalink / raw) To: Simon Kirby Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason [-- Attachment #1: Type: text/plain, Size: 1397 bytes --] On Tue, Nov 26, 2013 at 3:16 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'm really not very happy with the whole pipe locking logic (or the > refcounting we do, separately from the "struct inode"), and in that > sense I'm perfectly willing to blame that code for doing bad things. > But the fact that it all goes away with debugging makes me very very > unhappy. Al, I really hate the "pipe_lock()" function that tests for "pipe->files" being non-zero. I don't think there are any valid cases where pipe->file ever *could* be zero, and if there are, they are fundamentally racy. Is there really any reason for that test? It used to test for "pipe->inode" and that kind of made sense as the pipe didn't even have a lock (it re-used the inode one). But these days that test makes zero sense, except as a "don't even bother locking if this is some fake internal pipe", but is that even valid? Also, why does "pipe_release()" still use i_pipe. I'd much rather it used "file->private_data" like everything else (and then it unconditionally clear it). We are, after all, talking about releasing the *file*, and we shouldn't be mixing up that inode in there. IOW, what is wrong with the attached patch? Then we could/should - make the free_pipe_info() happen from the drop_inode() - delete pipe->files counter entirely because it has no valid use Hmm? Linus [-- Attachment #2: patch.diff --] [-- Type: text/plain, Size: 1113 bytes --] fs/pipe.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index d2c45e14e6d8..719214ed5e5e 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -56,8 +56,8 @@ unsigned int pipe_min_size = PAGE_SIZE; static void pipe_lock_nested(struct pipe_inode_info *pipe, int subclass) { - if (pipe->files) - mutex_lock_nested(&pipe->mutex, subclass); + WARN_ON_ONCE(!pipe->files); + mutex_lock_nested(&pipe->mutex, subclass); } void pipe_lock(struct pipe_inode_info *pipe) @@ -71,8 +71,8 @@ EXPORT_SYMBOL(pipe_lock); void pipe_unlock(struct pipe_inode_info *pipe) { - if (pipe->files) - mutex_unlock(&pipe->mutex); + WARN_ON_ONCE(!pipe->files); + mutex_unlock(&pipe->mutex); } EXPORT_SYMBOL(pipe_unlock); @@ -729,9 +729,10 @@ pipe_poll(struct file *filp, poll_table *wait) static int pipe_release(struct inode *inode, struct file *file) { - struct pipe_inode_info *pipe = inode->i_pipe; + struct pipe_inode_info *pipe = file->private_data; int kill = 0; + file->private_data = NULL; __pipe_lock(pipe); if (file->f_mode & FMODE_READ) pipe->readers--; ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-26 23:16 ` Linus Torvalds 2013-11-26 23:44 ` Linus Torvalds @ 2013-11-30 9:43 ` Simon Kirby 2013-11-30 17:25 ` Linus Torvalds 2013-11-30 21:08 ` Linus Torvalds 1 sibling, 2 replies; 19+ messages in thread From: Simon Kirby @ 2013-11-30 9:43 UTC (permalink / raw) To: Linus Torvalds Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason On Tue, Nov 26, 2013 at 03:16:09PM -0800, Linus Torvalds wrote: > On Mon, Nov 25, 2013 at 4:44 PM, Simon Kirby <sim@hostway.ca> wrote: > > > > I was hoping this or something else by 3.12 would have fixed it, so after > > testing we deployed this everywhere and turned off the rest of the debug > > options. I missed slub_debug on one server, though...and it just hit > > another case of overwritten poison. > > Your thing is *very* consistent, it's once more four bytes into that > pipe-info. And it's once more that exact same "increment second word > in the allocation" pattern. > > > Is it true that with slub_debug, aliasing of equal-sized objects is > > turned off, and so they shouldn't be immediately side-by-side? In other > > words, would there be similar scrawling victim chances as allocating > > pipe_inode_info with pages instead of slabs? "slabinfo -a" is empty. > > So the thing is, with slub debugging, slub shouldn't be merging > different slab caches. > > HOWEVER. > > The pipe-info structure isn't using its own slab cache, it's just > using "kmalloc()". So it by definition will merge with all other > kmalloc() allocations of the same size (or, to be exact, of "similar > enough size to hit the same size bucket"). In your case it's the > 192-byte-sized bucket. I turned on kmalloc-192 tracing to find what else is using it: struct nfs_fh, struct bio, and struct cred. Poking around those, struct bio has bi_cnt, but it is way down in the struct. struct cred has "usage", but it comes first. Hmm. Nevertheless, I set: CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LIST=y CONFIG_DEBUG_CREDENTIALS=y And tried: diff --git a/include/linux/bio.h b/include/linux/bio.h index ec48bac..216dc43 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -168,7 +168,7 @@ static inline void *bio_data(struct bio *bio) * returns. and then bio would be freed memory when if (bio->bi_flags ...) * runs */ -#define bio_get(bio) atomic_inc(&(bio)->bi_cnt) +#define bio_get(bio) WARN_ON(atomic_inc_return(&(bio)->bi_cnt) == 0x6c) #if defined(CONFIG_BLK_DEV_INTEGRITY) /* diff --git a/include/linux/cred.h b/include/linux/cred.h index 04421e8..2646fe9 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -205,7 +205,9 @@ static inline void validate_process_creds(void) */ static inline struct cred *get_new_cred(struct cred *cred) { - atomic_inc(&cred->usage); + if (atomic_inc_return(&cred->usage) == 0x6c) { + WARN_ON(cred->uid == 0x6b); + } return cred; } On the same server, this last hunk warned fairly quickly: [ 850.303535] ------------[ cut here ]------------ [ 850.312774] WARNING: CPU: 3 PID: 6169 at include/linux/cred.h:209 get_empty_filp+0x109/0x1b0() [ 850.329974] Modules linked in: ipmi_devintf aoe ipmi_si bnx2 ipmi_msghandler evdev serio_raw [ 850.346913] CPU: 3 PID: 6169 Comm: omreport Not tainted 3.12.0-hw-debug-mutexes+ #83 [ 850.362374] Hardware name: Dell Inc. PowerEdge 1950/0UR033, BIOS 2.0.1 10/30/2007 [ 850.377316] 0000000000000009 ffff880428d0fd28 ffffffff817f2407 ffff88043fccf9e8 [ 850.392134] 0000000000000000 ffff880428d0fd68 ffffffff8105a537 ffff880428d0fd58 [ 850.406936] ffff880428d89e00 ffff88042960f480 ffff880428d0ff24 ffff88042a190000 [ 850.421746] Call Trace: [ 850.426627] [<ffffffff817f2407>] dump_stack+0x46/0x58 [ 850.436888] [<ffffffff8105a537>] warn_slowpath_common+0x87/0xb0 [ 850.448878] [<ffffffff8105a575>] warn_slowpath_null+0x15/0x20 [ 850.460523] [<ffffffff8113c7c9>] get_empty_filp+0x109/0x1b0 [ 850.471818] [<ffffffff811499c3>] path_openat+0x43/0x660 [ 850.482426] [<ffffffff8118595b>] ? fcntl_setlk+0x5b/0x2d0 [ 850.493391] [<ffffffff8114a38e>] do_filp_open+0x3e/0xa0 [ 850.504008] [<ffffffff81157bc4>] ? mntput_no_expire+0x44/0x130 [ 850.515842] [<ffffffff81156032>] ? __alloc_fd+0x42/0x110 [ 850.526630] [<ffffffff81139e9c>] do_sys_open+0x13c/0x230 [ 850.537428] [<ffffffff81187946>] compat_SyS_open+0x16/0x20 [ 850.548579] [<ffffffff81802268>] sysenter_dispatch+0x7/0x25 [ 850.559888] ---[ end trace acdbea3e141dbaec ]--- All traces are the same, and all Comms are "omreport", which is from the Dell OpenManage tools blob, executed regularly for RAID monitoring. Running it directly does not seem to cause the warning. kern.log shows it seems to warn every 20 minutes. No warnings from CONFIG_DEBUG_CREDENTIALS magic checking at all. Is there anything interesting about this tool? It is 32-bit. I can hook path_openat() and check for the cred contents there to print the path, if that would help. Simon- ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-30 9:43 ` Simon Kirby @ 2013-11-30 17:25 ` Linus Torvalds 2013-11-30 21:04 ` Simon Kirby 2013-11-30 21:08 ` Linus Torvalds 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2013-11-30 17:25 UTC (permalink / raw) To: Simon Kirby Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby <sim@hostway.ca> wrote: > On Tue, Nov 26, 2013 at 03:16:09PM -0800, Linus Torvalds wrote: >> >> The pipe-info structure isn't using its own slab cache, it's just >> using "kmalloc()". So it by definition will merge with all other >> kmalloc() allocations of the same size (or, to be exact, of "similar >> enough size to hit the same size bucket"). In your case it's the >> 192-byte-sized bucket. > > I turned on kmalloc-192 tracing to find what else is using it: struct > nfs_fh, struct bio, and struct cred. Poking around those, struct bio has > bi_cnt, but it is way down in the struct. struct cred has "usage", but it > comes first. Hmm. Nevertheless, I set: > > CONFIG_DEBUG_MUTEXES=y > CONFIG_DEBUG_LIST=y > CONFIG_DEBUG_CREDENTIALS=y > > And tried: > > diff --git a/include/linux/cred.h b/include/linux/cred.h > index 04421e8..2646fe9 100644 > --- a/include/linux/cred.h > +++ b/include/linux/cred.h > @@ -205,7 +205,9 @@ static inline void validate_process_creds(void) > */ > static inline struct cred *get_new_cred(struct cred *cred) > { > - atomic_inc(&cred->usage); > + if (atomic_inc_return(&cred->usage) == 0x6c) { > + WARN_ON(cred->uid == 0x6b); Oh, damn, I thought you had found it, and got very excited and already wrote a long email about things I wanted you to try. And then I started looking closer... That test is wrong. Both of those fields are 32-bit, so testing them against 0x6b/0x6c is bogus: you're just catching real cases. The reason it catches omreport is presumably because omreport runs as some special user that happens to have uid 107 (on my machine that happens to be qemu). And having a usage count of 108 isn't particularly strange either - creds get a lot of re-use. So close. It *might* still be one of those cases, but it doesn't really sound very likely. "bi_cnt" is deep inside the struct bio, and "usage" is at offset 0, not offset 4. And the ns_fh isn't very interesting. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-30 17:25 ` Linus Torvalds @ 2013-11-30 21:04 ` Simon Kirby 0 siblings, 0 replies; 19+ messages in thread From: Simon Kirby @ 2013-11-30 21:04 UTC (permalink / raw) To: Linus Torvalds Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason On Sat, Nov 30, 2013 at 09:25:33AM -0800, Linus Torvalds wrote: > On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby <sim@hostway.ca> wrote: > > > I turned on kmalloc-192 tracing to find what else is using it: struct > > nfs_fh, struct bio, and struct cred. Poking around those, struct bio has > > bi_cnt, but it is way down in the struct. struct cred has "usage", but it > > comes first. Hmm. Nevertheless, I set: > > > > CONFIG_DEBUG_MUTEXES=y > > CONFIG_DEBUG_LIST=y > > CONFIG_DEBUG_CREDENTIALS=y > > > > And tried: > > > > diff --git a/include/linux/cred.h b/include/linux/cred.h > > index 04421e8..2646fe9 100644 > > --- a/include/linux/cred.h > > +++ b/include/linux/cred.h > > @@ -205,7 +205,9 @@ static inline void validate_process_creds(void) > > */ > > static inline struct cred *get_new_cred(struct cred *cred) > > { > > - atomic_inc(&cred->usage); > > + if (atomic_inc_return(&cred->usage) == 0x6c) { > > + WARN_ON(cred->uid == 0x6b); > > Oh, damn, I thought you had found it, and got very excited and already > wrote a long email about things I wanted you to try. And then I > started looking closer... > > That test is wrong. Both of those fields are 32-bit, so testing them > against 0x6b/0x6c is bogus: you're just catching real cases. The > reason it catches omreport is presumably because omreport runs as some > special user that happens to have uid 107 (on my machine that happens > to be qemu). And having a usage count of 108 isn't particularly > strange either - creds get a lot of re-use. > > So close. It *might* still be one of those cases, but it doesn't > really sound very likely. "bi_cnt" is deep inside the struct bio, and > "usage" is at offset 0, not offset 4. And the ns_fh isn't very > interesting. *head smack* Too much 8-bit AVR coding... Makes sense now: uid=107(nagios) gid=109(nagios) groups=109(nagios) Well, the chances of atomic_inc intentionally incrementing 0x6b6b6b6b are probably pretty rare. I'll try that. Simon- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds() 2013-11-30 9:43 ` Simon Kirby 2013-11-30 17:25 ` Linus Torvalds @ 2013-11-30 21:08 ` Linus Torvalds 1 sibling, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2013-11-30 21:08 UTC (permalink / raw) To: Simon Kirby Cc: Ian Applegate, Al Viro, Christoph Lameter, Pekka Enberg, LKML, Chris Mason [-- Attachment #1: Type: text/plain, Size: 752 bytes --] On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby <sim@hostway.ca> wrote: > > I turned on kmalloc-192 tracing to find what else is using it: struct > nfs_fh, struct bio, and struct cred. Poking around those, struct bio has > bi_cnt, but it is way down in the struct. struct cred has "usage", but it > comes first. You could also try to avoid the whole aliasing by using a separate slab for just the pipe_inode_info. Something like the attached patch (which also enables debugging just for that slab). I still don't see what could be wrong with the pipe_inode_info thing, but the fact that it's been so consistent in your traces does make me suspect it really is *that* particular slab. But I dunno, and this would just confirm it. Linus [-- Attachment #2: patch.diff --] [-- Type: text/plain, Size: 1901 bytes --] fs/pipe.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index d2c45e14e6d8..7cfd6cdf0147 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -729,7 +729,7 @@ pipe_poll(struct file *filp, poll_table *wait) static int pipe_release(struct inode *inode, struct file *file) { - struct pipe_inode_info *pipe = inode->i_pipe; + struct pipe_inode_info *pipe = file->private_data; int kill = 0; __pipe_lock(pipe); @@ -776,11 +776,13 @@ pipe_fasync(int fd, struct file *filp, int on) return retval; } +static struct kmem_cache *pipe_info_slab; + struct pipe_inode_info *alloc_pipe_info(void) { struct pipe_inode_info *pipe; - pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL); + pipe = kmem_cache_zalloc(pipe_info_slab, GFP_KERNEL); if (pipe) { pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL); if (pipe->bufs) { @@ -790,7 +792,7 @@ struct pipe_inode_info *alloc_pipe_info(void) mutex_init(&pipe->mutex); return pipe; } - kfree(pipe); + kmem_cache_free(pipe_info_slab, pipe); } return NULL; @@ -808,7 +810,7 @@ void free_pipe_info(struct pipe_inode_info *pipe) if (pipe->tmp_page) __free_page(pipe->tmp_page); kfree(pipe->bufs); - kfree(pipe); + kmem_cache_free(pipe_info_slab, pipe); } static struct vfsmount *pipe_mnt __read_mostly; @@ -1311,8 +1313,17 @@ static struct file_system_type pipe_fs_type = { static int __init init_pipe_fs(void) { - int err = register_filesystem(&pipe_fs_type); + int err; + + pipe_info_slab = kmem_cache_create("pipe-inode-info", + sizeof(struct pipe_inode_info), + 0, + SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | SLAB_DEBUG_FREE, + NULL); + if (!pipe_info_slab) + return -ENOMEM; + err = register_filesystem(&pipe_fs_type); if (!err) { pipe_mnt = kern_mount(&pipe_fs_type); if (IS_ERR(pipe_mnt)) { ^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2013-11-30 21:08 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-07-06 0:09 [3.10] Oopses in kmem_cache_allocate() via prepare_creds() Simon Kirby 2013-07-06 8:27 ` Pekka Enberg 2013-08-19 20:17 ` Simon Kirby 2013-08-19 20:29 ` Christoph Lameter 2013-08-19 21:16 ` Linus Torvalds 2013-08-19 21:24 ` Chris Mason 2013-08-19 23:31 ` Simon Kirby 2013-09-03 20:43 ` Simon Kirby 2013-08-20 4:06 ` Al Viro 2013-08-20 7:17 ` Ian Applegate 2013-08-20 7:21 ` Al Viro 2013-08-20 7:51 ` Ian Applegate 2013-11-26 0:44 ` Simon Kirby 2013-11-26 23:16 ` Linus Torvalds 2013-11-26 23:44 ` Linus Torvalds 2013-11-30 9:43 ` Simon Kirby 2013-11-30 17:25 ` Linus Torvalds 2013-11-30 21:04 ` Simon Kirby 2013-11-30 21:08 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).