From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org, mgorman@suse.de,
	linuxppc-dev@lists.ozlabs.org, anton@samba.org,
	rientjes@google.com
Subject: Re: Bug in reclaim logic with exhausted nodes?
Date: Thu, 27 Mar 2014 13:33:54 -0700
Message-ID: <20140327203354.GA16651@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.10.1403251323030.26744@nuc>

Hi Christoph,

On 25.03.2014 [13:25:30 -0500], Christoph Lameter wrote:
> On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:
> 
> > On power, very early, we find the 16G pages (gpages in the powerpc arch
> > code) in the device-tree:
> >
> > early_setup ->
> > 	early_init_mmu ->
> > 		htab_initialize ->
> > 			htab_init_page_sizes ->
> > 				htab_dt_scan_hugepage_blocks ->
> > 					memblock_reserve
> > 						which marks the memory
> > 						as reserved
> > 					add_gpage
> > 						which saves the address
> > 						off so future calls for
> > 						alloc_bootmem_huge_page()
> >
> > hugetlb_init ->
> > 		hugetlb_init_hstates ->
> > 			hugetlb_hstate_alloc_pages ->
> > 				alloc_bootmem_huge_page
> >
> > > Not sure if I understand that correctly.
> >
> > Basically this is present memory that is "reserved" for the 16GB usage
> > per the LPAR configuration. We honor that configuration in Linux based
> > upon the contents of the device-tree. It just so happens in the
> > configuration from my original e-mail that a consequence of this is that
> > a NUMA node has memory (topologically), but none of that memory is free,
> > nor will it ever be free.
> 
> Well dont do that
> 
> > Perhaps, in this case, we could just remove that node from the N_MEMORY
> > mask? Memory allocations will never succeed from the node, and we can
> > never free these 16GB pages. It is really not any different than a
> > memoryless node *except* when you are using the 16GB pages.
> 
> That looks to be the correct way to handle things. Maybe mark the node as
> offline or somehow not present so that the kernel ignores it.

SLUB already detects this condition:

mm/slub.c::early_kmem_cache_node_alloc():
...
        page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
...
        if (page_to_nid(page) != node) {
                printk(KERN_ERR "SLUB: Unable to allocate memory from "
                                "node %d\n", node);
                printk(KERN_ERR "SLUB: Allocating a useless per node structure "
                                "in order to be able to continue\n");
        }
...

Since this happens quite early, before the nodemasks are set up, would
it make sense to keep a temporary init-time nodemask, set a bit in it
here for each node that hits this condition, and then "fix up" those
nodes when we set up the real nodemasks?
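
To make the temporary-nodemask idea concrete, here is a rough userspace
sketch, not kernel code. Every name in it (early_exhausted,
node_has_memory, free_pages, alloc_page_near, early_node_alloc,
fixup_node_states) is hypothetical and only models the behaviour
described above: the allocation falls back to another node, the
mismatch is recorded in a temporary mask, and a later pass clears those
nodes from a stand-in for the N_MEMORY mask.

```c
#include <assert.h>

#define MAX_NODES 8

/* Hypothetical stand-ins for kernel state; none of these exist in mm/slub.c. */
static unsigned long node_has_memory;        /* models the N_MEMORY node mask */
static unsigned long early_exhausted;        /* temporary init-time mask */
static unsigned long free_pages[MAX_NODES];  /* free pages per node */

/*
 * Models an allocation that prefers 'node' but may fall back elsewhere.
 * Returns the node the page actually came from, or -1 if nothing is free.
 */
static int alloc_page_near(int node)
{
	assert(node >= 0 && node < MAX_NODES);

	if (free_pages[node] > 0) {
		free_pages[node]--;
		return node;
	}
	for (int n = 0; n < MAX_NODES; n++) {
		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
	}
	return -1;
}

/*
 * Models the page_to_nid(page) != node check: instead of only printing
 * a message, remember the node in the temporary mask.
 */
static void early_node_alloc(int node)
{
	if (alloc_page_near(node) != node)
		early_exhausted |= 1UL << node;
}

/* Models the later fix-up, run once the real nodemasks have been set up. */
static void fixup_node_states(void)
{
	node_has_memory &= ~early_exhausted;
	early_exhausted = 0;
}
```

With node 0 holding free pages and node 1 holding none (all of its
memory reserved for 16GB pages), calling early_node_alloc() for both
nodes and then fixup_node_states() leaves only node 0 in the modelled
N_MEMORY mask. Whether the real fix-up belongs in SLUB or in the arch
code is left open here.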

Thanks,
Nish
