linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Sasha Levin <levinsasha928@gmail.com>,
	viro <viro@zeniv.linux.org.uk>,
	oleg@redhat.com, "a.p.zijlstra" <a.p.zijlstra@chello.nl>,
	mingo <mingo@kernel.org>, Dave Jones <davej@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: mm: kernel BUG at mm/memory.c:1230
Date: Wed, 22 Aug 2012 03:12:13 +0200	[thread overview]
Message-ID: <20120822011213.GM29978@redhat.com> (raw)
In-Reply-To: <20120524120727.6eab2f97.akpm@linux-foundation.org>

Hi everyone,

On Thu, May 24, 2012 at 12:07:27PM -0700, Andrew Morton wrote:
> On Thu, 24 May 2012 20:27:34 +0200
> Sasha Levin <levinsasha928@gmail.com> wrote:
> 
> > Hi all,
> > 
> > During fuzzing with trinity inside a KVM tools guest, using latest linux-next, I've stumbled on the following:
> > 
> > [ 2043.098949] ------------[ cut here ]------------
> > [ 2043.099014] kernel BUG at mm/memory.c:1230!
> 
> That's
> 
> 	VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
> 
> in zap_pmd_range()?

Originally split_huge_page_address didn't exist. If the vma was
splitted at a not 2m aligned address by a syscall like madvise that
would only mangle the vma and not touch the pagetables (munmap for
example was safe), the THP would remain in place and it would lead to
a BUG_ON in split_huge_page where the number of rmaps was different
than the page_mapcount for a cascade of side effects of the above bug
triggering. It was a the most more obscure BUG_ON I got in the whole
THP development and the hardest bug to fix (it was not easily
reproducible either, madvise not so common).

After I fixed it adding split_huge_page_address, I also added this
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)). So if I missed any
split_huge_page_address invocation I would get a more meaningful
VM_BUG_ON, closer to the actual bug, signaling problems in the vma
layout and not anymore a misleading BUG_ON in the split_huge_page
internals when in fact split_huge_page was perfectly fine.

My previous theory was a bug in the vma mangling of mbind, it could
still be it, I didn't review it closely yet. But mbind is one syscall
that like madvise depends on split_huge_page_address when it does
split_vma!

So now I think I found the cause of the above
VM_BUG_ON. split_huge_page_address uses pmd_present so it won't run if
the hugepage is under splitting. So it's likely the below will fix the
above VM_BUG_ON. The race condition is tiny, it's not a critical bug
and it makes sense that only a syscall stresser like trinity can
exercise it and not any real app.

static void split_huge_page_address(struct mm_struct *mm,
				    unsigned long address)
{
[..]
	if (!pmd_present(*pmd))
		return;
	/*
	 * Caller holds the mmap_sem write mode, so a huge pmd cannot
	 * materialize from under us.
	 */
	split_huge_page_pmd(mm, pmd);
}

This time I think it is worth to fix pmd_present for good instead of
converting it to !pmd_none like I did with most others.

I'm well aware pmd_present wasn't ok during split_huge_page but most
have been converted and I didn't change what wasn't absolutely
necessary in case some lowlevel code depended on the lowlevel
semantics of pmd_present (strict _PRESENT check) but now it looks to
risky not to fix it.

The below patch isn't well tested yet. Reviews welcome. Especially if
you could test it again with trinity over the mbind syscall it'd be
wonderful.

Thanks,
Andrea

===

      parent reply	other threads:[~2012-08-22  1:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-24 18:27 mm: kernel BUG at mm/memory.c:1230 Sasha Levin
2012-05-24 19:07 ` Andrew Morton
2012-05-24 19:14   ` Sasha Levin
2012-05-26 20:26     ` Hugh Dickins
2012-05-26 23:54       ` Andrea Arcangeli
2012-05-27 20:45       ` Sasha Levin
2012-08-22  1:12   ` Andrea Arcangeli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120822011213.GM29978@redhat.com \
    --to=aarcange@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=levinsasha928@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).