From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Adam Litke <agl@us.ibm.com>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Whitney <eric.whitney@hp.com>
Subject: Re: [BUG] 2.6.25-rc4 hang/softlockups after freeing hugepages
Date: Fri, 07 Mar 2008 09:36:56 -0500
Message-ID: <1204900617.5340.2.camel@localhost>
In-Reply-To: <20080307114849.GC26229@csn.ul.ie>

On Fri, 2008-03-07 at 11:48 +0000, Mel Gorman wrote:
> On (06/03/08 12:23), Lee Schermerhorn didst pronounce:
> > Test platform:  HP Proliant DL585 server - 4 socket, dual core AMD with
> > 32GB memory.
> > 
> > I first saw this on 25-rc2-mm1 with Mel's zonelist patches, while
> > investigating the interaction of hugepages and cpusets.  Thinking that
> > it might be caused by the zonelist patches, I went back to 25-rc2-mm1
> > w/o the patches and saw the same thing.  It sometimes takes a while for
> > the softlockups to start appearing, and I wanted to find a fairly
> > minimal duplicator.  Meanwhile 25-rc3 and rc4 have come out, so I tried
> > the latest upstream kernel and see the same thing.
> > 
> > To duplicate the problem, I need only:
> > 
> > + log into the platform as root in one window and:
> > 
> > 	echo N >/proc/sys/vm/nr_hugepages
> > 	echo 0 >/proc/sys/vm/nr_hugepages
> > 
> 
> Uncool, I am going to try and find a machine to reproduce this one but
> in case I have no luck, can you try setting the following in your
> .config which may rattle out something please?
> 
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> CONFIG_DEBUG_VM=y
> 
> and as you have DEBUG_INFO, can you say what line is ffffffff8027b693 ?

Will test and get back to you with info.  Slightly backed up here...
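
For anyone following along, one way to map that address to a source
line, assuming a vmlinux built with CONFIG_DEBUG_INFO, is binutils'
addr2line.  A rough sketch only: the helper name resolve_addr, the
ADDR2LINE override and the vmlinux path are placeholders of mine, not
anything from this thread.

```shell
# Hypothetical helper: resolve a kernel text address to function and file:line.
# Usage: resolve_addr /path/to/vmlinux ffffffff8027b693
# ADDR2LINE may be set to pick a different binary (e.g. a cross toolchain).
resolve_addr() {
    vmlinux="$1"
    addr="0x${2#0x}"    # accept the address with or without a 0x prefix
    # -f prints the function name, -i expands inlined call chains,
    # -e names the image that carries the debug info.
    "${ADDR2LINE:-addr2line}" -f -i -e "$vmlinux" "$addr"
}
```

The vmlinux passed in has to be the exact build that produced the
trace, otherwise the reported line will be meaningless.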

> 
> > In my case, N=64.  If I look, before echoing 0, I see 16 hugepages
> > allocated on each of the 4 nodes, as expected.
> > 
> > + then in another window, log in again.  
> > 
> > Sometimes it will hang during the 2nd login and I'll never see a shell
> > prompt. 
> 
> My initial guess was that it is something to do with page_table_lock but as
> you didn't get to fault in huge pages, it doesn't make much sense.

Yeah.  Most of my previous tests involved creating a hugetlb segment
[shm or mmap'd hugetlbfs file] and faulting in the pages.  On a whim, I
tried just allocating and freeing huge pages to/from the free list and
see the same behavior...  I'm really hoping this isn't another dumb
operator error :-(.
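
For reference, the allocate-then-free cycle can be sketched as a small
shell function.  This is only an illustration: the name cycle_hugepages
and its optional path argument are hypothetical, added so the sequence
can be dry-run against a scratch file instead of the real sysctl.

```shell
# Hypothetical helper: grow the hugepage pool to N pages, then free it again.
# Usage: cycle_hugepages N [sysctl-path]
# The path defaults to the real sysctl, which needs root; on an affected
# kernel the softlockups may start some time after the pool is freed.
cycle_hugepages() {
    n="$1"
    ctl="${2:-/proc/sys/vm/nr_hugepages}"
    echo "$n" > "$ctl" || return 1
    cat "$ctl"          # pool size after allocation (may be less than N)
    echo 0 > "$ctl" || return 1
    cat "$ctl"          # pool size after freeing, expected to be 0
}
```

With N=64 as above, that would be "cycle_hugepages 64" as root, followed
by a second login from another window.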

Lee