* mm: kernel BUG at mm/memory.c:1230
From: Sasha Levin @ 2012-05-24 18:27 UTC
To: viro, oleg, akpm, a.p.zijlstra, mingo
Cc: Dave Jones, linux-kernel@vger.kernel.org, linux-mm

Hi all,

During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:

[ 2043.098949] ------------[ cut here ]------------
[ 2043.099014] kernel BUG at mm/memory.c:1230!
[ 2043.099014] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2043.111029] CPU 3
[ 2043.111029] Pid: 26853, comm: trinity Tainted: G W 3.4.0-next-20120524-sasha-00003-ge89ff01 #281
[ 2043.111029] RIP: 0010:[<ffffffff811f14d2>] [<ffffffff811f14d2>] unmap_page_range+0x232/0x3b0
[ 2043.111029] RSP: 0018:ffff880030349ce8 EFLAGS: 00010246
[ 2043.111029] RAX: ffff880000025000 RBX: ffff8800266bc000 RCX: 00003ffffffff000
[ 2043.111029] RDX: ffff880000000000 RSI: ffff88003028cfc0 RDI: 000000006de001e0
[ 2043.111029] RBP: ffff880030349d68 R08: 0000000100001000 R09: 0000000000000000
[ 2043.111029] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000100000000
[ 2043.111029] R13: 0000000100001000 R14: ffff880030349e08 R15: 0000000100000fff
[ 2043.111029] FS: 0000000000000000(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
[ 2043.111029] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2043.111029] CR2: 0000000000000ffc CR3: 0000000013480000 CR4: 00000000000406e0
[ 2043.111029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2043.111029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2043.111029] Process trinity (pid: 26853, threadinfo ffff880030348000, task ffff88002ed68000)
[ 2043.111029] Stack:
[ 2043.111029] ffffffff811f0b55 0000000100000000 0000000100000fff 0000000100001000
[ 2043.111029] ffff880013480000 0000000100000fff 0000000000000000 ffff88003028cfc0
[ 2043.111029] ffff8800142b0020 0000000100001000 ffff880030349d58 ffff88003028cfc0
[ 2043.111029] Call Trace:
[ 2043.111029] [<ffffffff811f0b55>] ? follow_page+0x315/0x5a0
[ 2043.111029] [<ffffffff811f1719>] unmap_single_vma+0xc9/0xe0
[ 2043.111029] [<ffffffff811f1792>] unmap_vmas+0x62/0xa0
[ 2043.111029] [<ffffffff811f77a9>] exit_mmap+0xc9/0x170
[ 2043.111029] [<ffffffff81225ae5>] ? __khugepaged_exit+0xd5/0x140
[ 2043.111029] [<ffffffff810cf719>] mmput+0x89/0xe0
[ 2043.111029] [<ffffffff810d5f7b>] exit_mm+0x11b/0x130
[ 2043.111029] [<ffffffff82f71b99>] ? _raw_spin_unlock_irq+0x59/0x80
[ 2043.111029] [<ffffffff810d8933>] do_exit+0x263/0x510
[ 2043.111029] [<ffffffff810d8c81>] do_group_exit+0xa1/0xe0
[ 2043.111029] [<ffffffff810d8cd2>] sys_exit_group+0x12/0x20
[ 2043.111029] [<ffffffff82f72bf9>] system_call_fastpath+0x16/0x1b
[ 2043.111029] Code: 00 48 89 f8 66 66 66 90 84 c0 0f 89 89 00 00 00 4c 89 c0 4c 29 e0 48 3d 00 00 20 00 74 5b 49 8b 06 48 83 b8 a8 00 00 00 00 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 8b 3b 48 83 3d 85
[ 2043.111029] RIP [<ffffffff811f14d2>] unmap_page_range+0x232/0x3b0
[ 2043.111029] RSP <ffff880030349ce8>
* Re: mm: kernel BUG at mm/memory.c:1230
From: Andrew Morton @ 2012-05-24 19:07 UTC
To: Sasha Levin
Cc: viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm, Andrea Arcangeli

On Thu, 24 May 2012 20:27:34 +0200
Sasha Levin <levinsasha928@gmail.com> wrote:

> Hi all,
>
> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
>
> [ 2043.098949] ------------[ cut here ]------------
> [ 2043.099014] kernel BUG at mm/memory.c:1230!

That's

	VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));

in zap_pmd_range()?

> [ 2043.099014] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 2043.111029] CPU 3
> [ 2043.111029] Pid: 26853, comm: trinity Tainted: G W 3.4.0-next-20120524-sasha-00003-ge89ff01 #281
> [ 2043.111029] RIP: 0010:[<ffffffff811f14d2>] [<ffffffff811f14d2>] unmap_page_range+0x232/0x3b0
> [ 2043.111029] RSP: 0018:ffff880030349ce8 EFLAGS: 00010246
> [ 2043.111029] RAX: ffff880000025000 RBX: ffff8800266bc000 RCX: 00003ffffffff000
> [ 2043.111029] RDX: ffff880000000000 RSI: ffff88003028cfc0 RDI: 000000006de001e0
> [ 2043.111029] RBP: ffff880030349d68 R08: 0000000100001000 R09: 0000000000000000
> [ 2043.111029] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000100000000
> [ 2043.111029] R13: 0000000100001000 R14: ffff880030349e08 R15: 0000000100000fff
> [ 2043.111029] FS: 0000000000000000(0000) GS:ffff880035a00000(0000) knlGS:0000000000000000
> [ 2043.111029] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2043.111029] CR2: 0000000000000ffc CR3: 0000000013480000 CR4: 00000000000406e0
> [ 2043.111029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2043.111029] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2043.111029] Process trinity (pid: 26853, threadinfo ffff880030348000, task ffff88002ed68000)
> [ 2043.111029] Stack:
> [ 2043.111029] ffffffff811f0b55 0000000100000000 0000000100000fff 0000000100001000
> [ 2043.111029] ffff880013480000 0000000100000fff 0000000000000000 ffff88003028cfc0
> [ 2043.111029] ffff8800142b0020 0000000100001000 ffff880030349d58 ffff88003028cfc0
> [ 2043.111029] Call Trace:
> [ 2043.111029] [<ffffffff811f0b55>] ? follow_page+0x315/0x5a0
> [ 2043.111029] [<ffffffff811f1719>] unmap_single_vma+0xc9/0xe0
> [ 2043.111029] [<ffffffff811f1792>] unmap_vmas+0x62/0xa0
> [ 2043.111029] [<ffffffff811f77a9>] exit_mmap+0xc9/0x170
> [ 2043.111029] [<ffffffff81225ae5>] ? __khugepaged_exit+0xd5/0x140
> [ 2043.111029] [<ffffffff810cf719>] mmput+0x89/0xe0
> [ 2043.111029] [<ffffffff810d5f7b>] exit_mm+0x11b/0x130
> [ 2043.111029] [<ffffffff82f71b99>] ? _raw_spin_unlock_irq+0x59/0x80
> [ 2043.111029] [<ffffffff810d8933>] do_exit+0x263/0x510
> [ 2043.111029] [<ffffffff810d8c81>] do_group_exit+0xa1/0xe0
> [ 2043.111029] [<ffffffff810d8cd2>] sys_exit_group+0x12/0x20
> [ 2043.111029] [<ffffffff82f72bf9>] system_call_fastpath+0x16/0x1b
> [ 2043.111029] Code: 00 48 89 f8 66 66 66 90 84 c0 0f 89 89 00 00 00 4c 89 c0 4c 29 e0 48 3d 00 00 20 00 74 5b 49 8b 06 48 83 b8 a8 00 00 00 00 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 8b 3b 48 83 3d 85
> [ 2043.111029] RIP [<ffffffff811f14d2>] unmap_page_range+0x232/0x3b0
> [ 2043.111029] RSP <ffff880030349ce8>

The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add debug checks for mapcount related invariants"). AFAICT it's just wrong on the exit path. Unclear why it's triggering now...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: mm: kernel BUG at mm/memory.c:1230
From: Sasha Levin @ 2012-05-24 19:14 UTC
To: Andrew Morton
Cc: viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm, Andrea Arcangeli

On Thu, May 24, 2012 at 9:07 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 24 May 2012 20:27:34 +0200
> Sasha Levin <levinsasha928@gmail.com> wrote:
>
>> Hi all,
>>
>> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
>>
>> [ 2043.098949] ------------[ cut here ]------------
>> [ 2043.099014] kernel BUG at mm/memory.c:1230!
>
> That's
>
> VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
>
> in zap_pmd_range()?

Yup.

> The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> debug checks for mapcount related invariants"). AFAICT it's just wrong
> on the exit path. Unclear why it's triggering now...

I'm not sure if that's indeed the issue or not, but note that this is the first time I've managed to trigger that with the fuzzer, and it's not that easy to reproduce. Which is a bit odd for code that was there for 4 months...
* Re: mm: kernel BUG at mm/memory.c:1230
From: Hugh Dickins @ 2012-05-26 20:26 UTC
To: Sasha Levin
Cc: Andrew Morton, viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm, Andrea Arcangeli

On Thu, 24 May 2012, Sasha Levin wrote:
> On Thu, May 24, 2012 at 9:07 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Thu, 24 May 2012 20:27:34 +0200
> > Sasha Levin <levinsasha928@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >>
> >> [ 2043.098949] ------------[ cut here ]------------
> >> [ 2043.099014] kernel BUG at mm/memory.c:1230!
> >
> > That's
> >
> > VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> >
> > in zap_pmd_range()?
>
> Yup.
>
> > The assertion was added in Jan 2011 by 14d1a55cd26f1860 ("thp: add
> > debug checks for mapcount related invariants"). AFAICT it's just wrong
> > on the exit path. Unclear why it's triggering now...

I've been round this loop before with that particular VM_BUG_ON.

At first I thought like Andrew, that it's glaringly wrong on the exit path; but then changed my mind.

When munmapping, we certainly can arrive here with an unaligned addr and next; but in that case rwsem_is_locked.

Whereas in exiting, rwsem is not locked, but we're going linearly upwards, and whenever we walk into a pmd_trans_huge area, both addr and next should be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.

Other cases equally should not arise: madvise MADV_DONTNEED should have rwsem_is_locked; and truncation or hole-punching shouldn't be possible on a pure-anonymous (!vma->vm_ops) area considered for THP.

But I cannot remember what brought me here before: a crash in testing on one of my machines, which further investigation root-caused elsewhere? or a report from someone else? or noticed when auditing another problem? I'm frustrated not to recall.

>
> I'm not sure if that's indeed the issue or not, but note that this is
> the first time I've managed to trigger that with the fuzzer, and it's
> not that easy to reproduce. Which is a bit odd for code that was there
> for 4 months...

I'm keeping off the linux-next for the moment; I'll worry about this more if it shows up when we try 3.5-rc1. Your fuzzing tells that my logic above is wrong, but maybe it's just a passing defect in next.

Hugh
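Note on the alignment argument above: it can be checked against the posted oops with a small userspace model. This is only a sketch under stated assumptions. It assumes a 2 MiB huge page (the `00 00 20 00` immediate in the Code: bytes decodes to a compare against 0x200000), and it assumes that R12 and R08 in the register dump still hold addr and next at the time of the BUG. The names `HPAGE_PMD_SIZE_MODEL`, `covers_whole_hpage` and `assertion_fires` are made up for illustration and are not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed huge page size: 2 MiB, as on x86-64 with PMD-mapped THP. */
#define HPAGE_PMD_SIZE_MODEL (UINT64_C(1) << 21)

/* True when [addr, next) spans exactly one huge page. */
bool covers_whole_hpage(uint64_t addr, uint64_t next)
{
    return next - addr == HPAGE_PMD_SIZE_MODEL;
}

/*
 * Toy model of the check discussed in the thread: a partial range over a
 * huge pmd is only expected while mmap_sem is held. If the range is
 * partial and the semaphore is not held, the assertion fires.
 */
bool assertion_fires(uint64_t addr, uint64_t next, bool mmap_sem_locked)
{
    return !covers_whole_hpage(addr, next) && !mmap_sem_locked;
}
```

With addr = 0x100000000 and next = 0x100001000 (the assumed reading of R12 and R08), the range is 0x1000 bytes, so `assertion_fires(0x100000000, 0x100001000, false)` is true on the exit path, while the same range with the semaphore held, or a full 2 MiB range, would not trip it.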
* Re: mm: kernel BUG at mm/memory.c:1230
From: Andrea Arcangeli @ 2012-05-26 23:54 UTC
To: Hugh Dickins
Cc: Sasha Levin, Andrew Morton, viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm

Hello everyone,

On Sat, May 26, 2012 at 01:26:48PM -0700, Hugh Dickins wrote:
> I've been round this loop before with that particular VM_BUG_ON.
>
> At first I thought like Andrew, that it's glaringly wrong on the exit
> path; but then changed my mind.
>
> When munmapping, we certainly can arrive here with an unaligned addr
> and next; but in that case rwsem_is_locked.
>
> Whereas in exiting, rwsem is not locked, but we're going linearly upwards,
> and whenever we walk into a pmd_trans_huge area, both addr and next should
> be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.
>
> Other cases equally should not arise: madvise MADV_DONTNEED should
> have rwsem_is_locked; and truncation or hole-punching shouldn't be
> possible on a pure-anonymous (!vma->vm_ops) area considered for THP.
>
> But I cannot remember what brought me here before: a crash in testing
> on one of my machines, which further investigation root-caused elsewhere?
> or a report from someone else? or noticed when auditing another problem?
> I'm frustrated not to recall.

I agree it's not a false positive. The reason I introduced that VM_BUG_ON was to verify if any vma_adjust_trans_huge() was missing anywhere (so that it doesn't crash later in split_huge_page with an obscure mapcount != page_mapcount BUG_ON, there it would be much less obvious to see why it crashed than here).

We should printk addr, end and the vma->vm_start/vm_end to debug this further.

> > I'm not sure if that's indeed the issue or not, but note that this is
> > the first time I've managed to trigger that with the fuzzer, and it's
> > not that easy to reproduce. Which is a bit odd for code that was there
> > for 4 months...
>
> I'm keeping off the linux-next for the moment; I'll worry about this
> more if it shows up when we try 3.5-rc1. Your fuzzing tells that my
> logic above is wrong, but maybe it's just a passing defect in next.

If it's a missing vma_adjust_trans_huge() it shouldn't go unnoticed even with DEBUG_VM=n, so I agree that if it only happens on linux-next it's worth trying to reproduce it with 3.5-rc/3.4 too just in case.

It's actually the first time I hear of this bugcheck triggering.

Thanks!
Andrea
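Note on the debugging suggestion above: the message proposes printing addr, end and the vma bounds. A userspace stand-in for such a debug line might look like the sketch below. It is not kernel code; `format_zap_debug` is a hypothetical helper, the message format is invented for illustration, and in the kernel the equivalent would be a printk at the point of the check.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the four values suggested for debugging
 * into buf. Returns the number of characters snprintf would have
 * written (excluding the terminating NUL).
 */
int format_zap_debug(char *buf, size_t len,
                     unsigned long addr, unsigned long end,
                     unsigned long vm_start, unsigned long vm_end)
{
    return snprintf(buf, len,
                    "zap_pmd_range: addr=%#lx end=%#lx vma=[%#lx,%#lx)",
                    addr, end, vm_start, vm_end);
}
```

Seeing all four values side by side would show at a glance whether the vma bounds themselves are unaligned or whether only the walked sub-range is.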
* Re: mm: kernel BUG at mm/memory.c:1230
From: Sasha Levin @ 2012-05-27 20:45 UTC
To: Hugh Dickins
Cc: Andrew Morton, viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm, Andrea Arcangeli

On Sat, May 26, 2012 at 10:26 PM, Hugh Dickins <hughd@google.com> wrote:
> I'm keeping off the linux-next for the moment; I'll worry about this
> more if it shows up when we try 3.5-rc1. Your fuzzing tells that my
> logic above is wrong, but maybe it's just a passing defect in next.

I have a theory about this, which might explain it.

After a couple of days of not being able to reproduce it, I've decided to revert Mel Gorman's patch related to memory corruption in mbind(). Once I've reverted it, I wasn't able to reproduce this exact case, but did observe several other interesting things:

 - The original mbind() memory corruption.
 - Corruption in eventfd related structures (same dump as the mbind one, but about eventfd structure).
 - Same as above, but with flock.
 - Hit a different BUG() in mm/mempolicy.c (The one at the end of slab_node()).

Is it possible that this issue could be explained by a corruption related to the mbind() issue?
* Re: mm: kernel BUG at mm/memory.c:1230
From: Andrea Arcangeli @ 2012-08-22 1:12 UTC
To: Andrew Morton
Cc: Sasha Levin, viro, oleg, a.p.zijlstra, mingo, Dave Jones, linux-kernel@vger.kernel.org, linux-mm

Hi everyone,

On Thu, May 24, 2012 at 12:07:27PM -0700, Andrew Morton wrote:
> On Thu, 24 May 2012 20:27:34 +0200
> Sasha Levin <levinsasha928@gmail.com> wrote:
>
> > Hi all,
> >
> > During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> >
> > [ 2043.098949] ------------[ cut here ]------------
> > [ 2043.099014] kernel BUG at mm/memory.c:1230!
>
> That's
>
> VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
>
> in zap_pmd_range()?

Originally split_huge_page_address didn't exist. If the vma was split at a non-2M-aligned address by a syscall like madvise that would only mangle the vma and not touch the pagetables (munmap for example was safe), the THP would remain in place and it would lead to a BUG_ON in split_huge_page, where the number of rmaps was different from the page_mapcount, as part of a cascade of side effects of the above bug triggering. It was the most obscure BUG_ON I got in the whole THP development and the hardest bug to fix (it was not easily reproducible either, madvise being not so common).

After I fixed it by adding split_huge_page_address, I also added this VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)). So if I missed any split_huge_page_address invocation I would get a more meaningful VM_BUG_ON, closer to the actual bug, signaling problems in the vma layout, and no longer a misleading BUG_ON in the split_huge_page internals when in fact split_huge_page was perfectly fine.

My previous theory was a bug in the vma mangling of mbind; it could still be that, I haven't reviewed it closely yet. But mbind is one syscall that, like madvise, depends on split_huge_page_address when it does split_vma!

So now I think I found the cause of the above VM_BUG_ON. split_huge_page_address uses pmd_present so it won't run if the hugepage is under splitting. So it's likely the below will fix the above VM_BUG_ON.

The race condition is tiny, it's not a critical bug and it makes sense that only a syscall stresser like trinity can exercise it and not any real app.

static void split_huge_page_address(struct mm_struct *mm,
				    unsigned long address)
{
[..]
	if (!pmd_present(*pmd))
		return;
	/*
	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
	 * materialize from under us.
	 */
	split_huge_page_pmd(mm, pmd);
}

This time I think it is worth fixing pmd_present for good instead of converting it to !pmd_none like I did with most others. I'm well aware pmd_present wasn't ok during split_huge_page, but most callers have been converted and I didn't change what wasn't absolutely necessary in case some lowlevel code depended on the lowlevel semantics of pmd_present (strict _PRESENT check), but now it looks too risky not to fix it.

The below patch isn't well tested yet. Reviews welcome. Especially if you could test it again with trinity over the mbind syscall it'd be wonderful.

Thanks,
Andrea

===
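Note on the race described above: the difference between a strict present check and a "not empty" check can be illustrated with a toy model. This is a sketch only and is not the patch referred to in the message, which is not reproduced here. The flag bits and every `toy_` name are hypothetical and do not match the real x86 page table layout; the one assumption carried over from the message is that a huge pmd under splitting is transiently not "present" in the strict sense while still being non-empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy pmd entry (not the real x86 layout). */
#define TOY_PRESENT   (UINT64_C(1) << 0)
#define TOY_HUGE      (UINT64_C(1) << 1)
#define TOY_SPLITTING (UINT64_C(1) << 2)

typedef uint64_t toy_pmd_t;

/* An entry is "none" only when it is completely empty. */
bool toy_pmd_none(toy_pmd_t pmd)
{
    return pmd == 0;
}

/* Strict check: only the present bit counts. */
bool toy_pmd_present_strict(toy_pmd_t pmd)
{
    return (pmd & TOY_PRESENT) != 0;
}

/* Assumed in-flight split state: present bit cleared, everything else kept. */
toy_pmd_t toy_mark_splitting(toy_pmd_t pmd)
{
    return (pmd & ~TOY_PRESENT) | TOY_SPLITTING;
}

/* Guard shaped like the quoted function: bail out unless strictly present. */
bool toy_guard_strict_proceeds(toy_pmd_t pmd)
{
    return toy_pmd_present_strict(pmd);
}

/* Alternative guard: proceed for any non-empty entry. */
bool toy_guard_nonempty_proceeds(toy_pmd_t pmd)
{
    return !toy_pmd_none(pmd);
}
```

In this model a huge entry that is mid-split makes `toy_guard_strict_proceeds` return false, so the caller would return early and leave the huge mapping in place, whereas `toy_guard_nonempty_proceeds` still returns true. That is the window the message attributes the VM_BUG_ON to.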
Thread overview: 7+ messages in thread
2012-05-24 18:27 mm: kernel BUG at mm/memory.c:1230 Sasha Levin
2012-05-24 19:07 ` Andrew Morton
2012-05-24 19:14   ` Sasha Levin
2012-05-26 20:26     ` Hugh Dickins
2012-05-26 23:54       ` Andrea Arcangeli
2012-05-27 20:45       ` Sasha Levin
2012-08-22  1:12   ` Andrea Arcangeli