From: Mel Gorman <mgorman@suse.de>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>
Subject: Re: mm: BUG in unmap_page_range
Date: Mon, 8 Sep 2014 18:18:53 +0100
Message-ID: <20140908171853.GN17501@suse.de>
In-Reply-To: <54082B25.9090600@oracle.com>

On Thu, Sep 04, 2014 at 05:04:37AM -0400, Sasha Levin wrote:
> On 08/29/2014 09:23 PM, Sasha Levin wrote:
> > On 08/27/2014 11:26 AM, Mel Gorman wrote:
> >> > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> >> > index 281870f..ffea570 100644
> >> > --- a/include/asm-generic/pgtable.h
> >> > +++ b/include/asm-generic/pgtable.h
> >> > @@ -723,6 +723,9 @@ static inline pte_t pte_mknuma(pte_t pte)
> >> >  
> >> >  	VM_BUG_ON(!(val & _PAGE_PRESENT));
> >> >  
> >> > +	/* debugging only, specific to x86 */
> >> > +	VM_BUG_ON(val & _PAGE_PROTNONE);
> >> > +
> >> >  	val &= ~_PAGE_PRESENT;
> >> >  	val |= _PAGE_NUMA;
> > Triggered again, the first VM_BUG_ON got hit, the second one never did.
> 
> Okay, this bug has reproduced quite a few times since then that I no longer
> suspect it's random memory corruption. I'd be happy to try out more debug
> patches if you have any leads.
> 

The fact that the second one doesn't trigger makes me think that this is not
related to how the helpers are called and is instead related to timing.
I tried reproducing this but got nothing after 3 hours. How long does it
typically take to reproduce in a given run? You mentioned that it takes a
few weeks to hit but maybe the frequency has changed since. I tried today's
linux-next kernel but it didn't even boot, so I used next-20140826 to match
your original report and still got nothing. Can you also send me the config
you used in case that's a factor?
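
For anyone following along, the two checks in the quoted debugging patch can
be modelled in user space roughly as below. This is only a sketch: the bit
positions are illustrative placeholders, not copied from the x86 headers, and
classify_pte() and model_pte_mknuma() are hypothetical names that do not
exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only (assumption); the real definitions
 * live in the x86 page table headers and may differ. */
#define DEMO_PAGE_PRESENT  (UINT64_C(1) << 0)
#define DEMO_PAGE_PROTNONE (UINT64_C(1) << 8)
#define DEMO_PAGE_NUMA     (UINT64_C(1) << 9)

enum demo_check {
	DEMO_CHECK_OK,          /* neither assertion would fire */
	DEMO_CHECK_NOT_PRESENT, /* first VM_BUG_ON: present bit is clear */
	DEMO_CHECK_PROTNONE     /* second VM_BUG_ON: protnone bit is set */
};

/* Report which of the two checks a raw pte value would trip, in the
 * same order as the patched helper evaluates them. */
static enum demo_check classify_pte(uint64_t val)
{
	if (!(val & DEMO_PAGE_PRESENT))
		return DEMO_CHECK_NOT_PRESENT;
	if (val & DEMO_PAGE_PROTNONE)
		return DEMO_CHECK_PROTNONE;
	return DEMO_CHECK_OK;
}

/* Model of the transformation that follows the checks: clear the
 * present bit and set the NUMA hinting bit. */
static uint64_t model_pte_mknuma(uint64_t val)
{
	assert(classify_pte(val) == DEMO_CHECK_OK);
	val &= ~DEMO_PAGE_PRESENT;
	val |= DEMO_PAGE_NUMA;
	return val;
}
```

In this model a value that has already been through model_pte_mknuma()
classifies as DEMO_CHECK_NOT_PRESENT if it is passed in a second time, which
is the same check that keeps firing in the report.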

I had one hunch that this may somehow be related to a collision between
pagetable teardown during exit and the scanner but I could not find a
way that could actually happen. During teardown there should be only one
user of the mm and it can't race with itself.

A worse possibility is that somehow the lock is getting corrupted but
that's also a tough sell considering that the locks should be allocated
from a dedicated cache. I guess I could try breaking that to allocate
one page per lock so DEBUG_PAGEALLOC triggers but I'm not very
optimistic.
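
To illustrate the one-page-per-lock idea, here is a rough user-space analogue,
not kernel code. It assumes a POSIX system with mmap(), mprotect() and fork();
struct demo_lock and the demo_* helpers are hypothetical names. Each lock gets
its own page, and "freeing" it revokes access, so a stray write afterwards
faults instead of silently corrupting a neighbour, which is the effect
DEBUG_PAGEALLOC would give for page-sized allocations.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc with strict -std */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct demo_lock {
	volatile int locked;
};

/* Give the lock a whole page of its own instead of a slab slot. */
static struct demo_lock *demo_lock_alloc(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* "Free" by revoking all access, so any later touch faults. */
static int demo_lock_free(struct demo_lock *l)
{
	return mprotect((void *)l, (size_t)sysconf(_SC_PAGESIZE), PROT_NONE);
}

/* Returns 1 if a write to a freed lock faulted, 0 if it went
 * unnoticed, -1 on setup failure. The write runs in a child process
 * so the fault does not take down the caller. */
static int demo_stray_write_faults(void)
{
	struct demo_lock *l = demo_lock_alloc();
	pid_t pid;
	int status;

	if (!l)
		return -1;
	l->locked = 1;
	if (demo_lock_free(l) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		l->locked = 0; /* stray write after free */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	munmap((void *)l, (size_t)sysconf(_SC_PAGESIZE));

	/* Some systems report SIGBUS rather than SIGSEGV here. */
	return WIFSIGNALED(status) &&
	       (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

Calling demo_stray_write_faults() from a small main() should show the stray
write being caught. The cost is a full page per lock, which is why this would
only make sense as a debugging aid.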

-- 
Mel Gorman
SUSE Labs
