* kernel crash when using libnuma
@ 2012-02-06 8:57 Trevor Kramer
From: Trevor Kramer @ 2012-02-06 8:57 UTC
To: linux-numa
I have a program that can allocate memory either with numa_alloc_onnode()
from libnuma or with malloc(). In malloc mode everything works fine, but
in libnuma mode I get consistent kernel panics with the following traces.
This only happens when multiple threads are running. Has anyone seen this
before, or does anyone have recommendations on how to debug it further?
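
In case it helps, the allocation pattern is roughly the sketch below.
The thread count, allocation size, node choice and iteration count are
illustrative only, not the values from the real program:

/*
 * Minimal sketch of the allocation pattern (illustrative values only).
 * Hypothetical build line: gcc -O2 -pthread repro.c -lnuma
 */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS   8
#define ALLOC_SZ   (4UL << 20)          /* 4 MB per allocation */
#define ITERATIONS 1000

static void *worker(void *arg)
{
        int node = (int)(long)arg % (numa_max_node() + 1);
        int i;

        for (i = 0; i < ITERATIONS; i++) {
                /* node-bound allocation instead of malloc() */
                void *p = numa_alloc_onnode(ALLOC_SZ, node);

                if (!p)
                        continue;
                memset(p, 0xa5, ALLOC_SZ);      /* fault the pages in */
                numa_free(p, ALLOC_SZ);         /* ends up in munmap() */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        long i;

        if (numa_available() < 0) {
                fprintf(stderr, "NUMA not available\n");
                return 1;
        }
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        return 0;
}

The backtrace from the crash utility: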
crash> bt
PID: 62333 TASK: ffff883ff5698b40 CPU: 17 COMMAND: "test"
#0 [ffff883ff58378f0] machine_kexec at ffffffff810310cb
#1 [ffff883ff5837950] crash_kexec at ffffffff810b6392
#2 [ffff883ff5837a20] oops_end at ffffffff814de670
#3 [ffff883ff5837a50] die at ffffffff8100f2eb
#4 [ffff883ff5837a80] do_trap at ffffffff814ddf64
#5 [ffff883ff5837ae0] do_invalid_op at ffffffff8100ceb5
#6 [ffff883ff5837b80] invalid_op at ffffffff8100bf5b
[exception RIP: split_huge_page+2021]
RIP: ffffffff8116c605 RSP: ffff883ff5837c38 RFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff880ff704bc38 RCX: 000000000000fe9e
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff883ff5837d08 R8: 0000000000000000 R9: 0000000000000004
R10: 0000000000000001 R11: ffff880ff6fb7906 R12: ffff880ff84b7aa8
R13: fffffffffffffff2 R14: ffffea006c34c000 R15: ffffea006c34c000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff883ff5837c30] split_huge_page at ffffffff8116c5aa
#8 [ffff883ff5837d10] __split_huge_page_pmd at ffffffff8116c6d1
#9 [ffff883ff5837d40] unmap_vmas at ffffffff8113559e
#10 [ffff883ff5837e80] unmap_region at ffffffff8113cce1
#11 [ffff883ff5837ef0] do_munmap at ffffffff8113d3a6
#12 [ffff883ff5837f50] sys_munmap at ffffffff8113d4e6
#13 [ffff883ff5837f80] system_call_fastpath at ffffffff8100b172
RIP: 00007f12d33154d2 RSP: 00007f12884731f8 RFLAGS: 00010283
RAX: 000000000000000b RBX: ffffffff8100b172 RCX: 0000000000000020
RDX: 0000000000000000 RSI: 00000000003fe560 RDI: 00007f129f460000
RBP: 00000000003fe560 R8: 00007f1288475300 R9: 00007f1288475300
R10: 0000003d9c0eb3b0 R11: 0000000000000246 R12: 0000003d9c0f1fc0
R13: 0000003d9c0f0e00 R14: 00007f129f460000 R15: 00007f129f460000
ORIG_RAX: 000000000000000b CS: 0033 SS: 002b
The machine is running Red Hat Enterprise Linux Server 6 with kernel
2.6.32-220.4.1.el6.x86_64.
Thanks,
Trevor
* Re: kernel crash when using libnuma
@ 2012-02-06 22:23 Andi Kleen
From: Andi Kleen @ 2012-02-06 22:23 UTC
To: Trevor Kramer; +Cc: linux-numa, aarcange
On Mon, Feb 06, 2012 at 03:57:52AM -0500, Trevor Kramer wrote:
> I have a program that can allocate memory either with numa_alloc_onnode()
> from libnuma or with malloc(). In malloc mode everything works fine, but
> in libnuma mode I get consistent kernel panics with the following traces.
> This only happens when multiple threads are running. Has anyone seen this
> before, or does anyone have recommendations on how to debug it further?
Looks like a THP problem.

For RHEL issues you normally need to talk to Red Hat; these lists
are more for mainline kernels.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: kernel crash when using libnuma
@ 2012-02-06 22:43 Andrea Arcangeli
From: Andrea Arcangeli @ 2012-02-06 22:43 UTC
To: Andi Kleen; +Cc: Trevor Kramer, linux-numa
On Mon, Feb 06, 2012 at 11:23:18PM +0100, Andi Kleen wrote:
> On Mon, Feb 06, 2012 at 03:57:52AM -0500, Trevor Kramer wrote:
> > I have a program that can allocate memory either with numa_alloc_onnode()
> > from libnuma or with malloc(). In malloc mode everything works fine, but
> > in libnuma mode I get consistent kernel panics with the following traces.
> > This only happens when multiple threads are running. Has anyone seen this
> > before, or does anyone have recommendations on how to debug it further?
>
>
> Looks like a THP problem.
>
> For RHEL issues you normally need to talk to Red Hat; these lists
> are more for mainline kernels.
Well, at this point we don't know yet whether this affects mainline
too or not.

To be sure, you can file a bug at bugzilla.redhat.com and we'll fix it
ASAP, and submit the fix upstream if it happens there too.

Best of all would be if you could attach the source of the program
that triggers this to the bugzilla (or send it by email), or create a
small source testcase that reproduces it.
A BUG_ON is triggering, probably the rmap mapcount vs page->mapcount
one. I'm unsure why this is related to libnuma only; a wild guess is
that the vma policy does something wrong to the vmas, to the point
that rmap can't find the pmds (wrong vma splitting or something like
that). But thanks to the BUG_ON there is close to zero risk of data or
memory corruption; it's just an annoyance we need to fix (plus the
program, I assume, requires root to use libnuma).

The last time I saw this (more than a year ago) it was a bug in the
vma splitting. So, again guessing wildly, it could be a missing
vma_adjust_trans_huge() in mempolicy.c or some other place that splits
vmas to create a partial vma policy on an existing vma.
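
If it is that path, something along the lines of the sketch below might
tickle it from userspace. This is only a guess at a testcase, not a
known reproducer; the mapping size, the split point and the node mask
are arbitrary. The idea: fault in a THP-backed anonymous mapping, apply
a policy to only part of it so the vma gets split, then unmap it.

/*
 * Guess at a testcase for the suspected path (all constants arbitrary).
 * Hypothetical build line: gcc -O2 thp-split.c -lnuma
 */
#include <sys/mman.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14                /* may be missing from older headers */
#endif

#define SZ (8UL << 20)                  /* 8 MB anonymous mapping */

int main(void)
{
        unsigned long nodemask = 1;     /* node 0 only */
        char *p;

        p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        madvise(p, SZ, MADV_HUGEPAGE);  /* hint: back it with THP */
        memset(p, 0, SZ);               /* fault in (hopefully huge) pages */

        /* policy over only part of the mapping -> the vma gets split */
        if (mbind(p + SZ / 4, SZ / 2, MPOL_BIND, &nodemask,
                  sizeof(nodemask) * 8, MPOL_MF_MOVE))
                perror("mbind");

        munmap(p, SZ);                  /* the unmap has to split huge pages */
        return 0;
}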
Thanks,
Andrea