linux-mm.kvack.org archive mirror
From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org, rientjes@google.com,
	linuxppc-dev@lists.ozlabs.org, anton@samba.org, mgorman@suse.de
Subject: Re: Bug in reclaim logic with exhausted nodes?
Date: Mon, 31 Mar 2014 18:33:46 -0700
Message-ID: <20140401013346.GD5144@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.10.1403290038200.24286@nuc>

On 29.03.2014 [00:40:41 -0500], Christoph Lameter wrote:
> On Thu, 27 Mar 2014, Nishanth Aravamudan wrote:
> 
> > > That looks to be the correct way to handle things. Maybe mark the node as
> > > offline or somehow not present so that the kernel ignores it.
> >
> > This is a SLUB condition:
> >
> > mm/slub.c::early_kmem_cache_node_alloc():
> > ...
> >         page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
> > ...
> 
> So the page allocation from the node failed. We have a strange boot
> condition where the OS is aware of a node but allocations on that node
> fail.

Yep. The node exists, it's just fully exhausted at boot (due to the
presence of 16GB pages reserved at boot-time).
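(For reference, this kind of boot-time reservation is requested with the
standard hugepage parameters on the kernel command line; a hypothetical
example, with the page count invented for illustration:

	hugepagesz=16G hugepages=4

On this platform the 16GB pages themselves come out of firmware-reserved
regions, so if a node's memory is entirely covered by them, the node is
already exhausted by the time the page allocator sees it.)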

> >         if (page_to_nid(page) != node) {
> >                 printk(KERN_ERR "SLUB: Unable to allocate memory from "
> >                                 "node %d\n", node);
> >                 printk(KERN_ERR "SLUB: Allocating a useless per node structure "
> >                                 "in order to be able to continue\n");
> >         }
> > ...
> >
> > Since this is quite early, and we have not set up the nodemasks yet,
> > does it make sense to perhaps have a temporary init-time nodemask that
> > we set bits in here, and "fix-up" those nodes when we setup the
> > nodemasks?
> 
> Please take care of this earlier than this. The page allocator in
> general should allow allocations from all nodes with memory during
> boot,

I'd appreciate a bit more guidance. I'm suggesting that in this case the
node functionally has no memory, so the page allocator should not allow
allocations from it -- except, possibly, for userspace accessing the
16GB pages on that node (I still need to investigate this), but I
believe that doesn't go through the page allocator at all; it's handled
entirely by the hugetlb interfaces. It seems to me the bug is in SLUB:
we note that we have a useless per-node structure for a given nid, but
we don't actually prevent requests to that node, or the reclaim those
requests trigger.
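
(To make the nodemask idea above concrete: a hypothetical sketch, with
'slub_nomem_nodes' and the exact hook points invented for illustration,
using only the standard nodemask helpers:

	/* Hypothetical: track nodes whose per-node structure is useless. */
	static nodemask_t slub_nomem_nodes;

	/* In early_kmem_cache_node_alloc(), after the page_to_nid() check: */
	if (page_to_nid(page) != node)
		node_set(node, slub_nomem_nodes);

	/* Later, before honoring a __GFP_THISNODE request for that node: */
	if ((flags & __GFP_THISNODE) && node_isset(node, slub_nomem_nodes))
		return NULL;	/* fail fast instead of triggering hopeless reclaim */

The mask could then be cleared for a node if memory were ever hotplugged
into it.)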

The page allocator is actually fine here, afaict. We've pulled all the
memory out of this node, so even though the node is present, none of its
memory is free; that all works as expected. The problems start when we
"force" (by way of a round-robin page allocation request from
/proc/sys/vm/nr_hugepages) a THISNODE allocation to come from the
exhausted node: with nothing free there, the allocation triggers
reclaim, but reclaim makes progress only on *other* nodes and thus never
alleviates (and cannot alleviate) the allocation failure.
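
(Concretely, the trigger is just something like:

	# echo 100 > /proc/sys/vm/nr_hugepages

which has hugetlb walk the allowed nodes round-robin, attempting on each
one -- sketching the hugetlb path of this era from memory, not quoting
it verbatim -- roughly:

	page = alloc_pages_exact_node(nid,
			htlb_alloc_mask(h) | __GFP_COMP | __GFP_THISNODE |
			__GFP_REPEAT | __GFP_NOWARN,
			huge_page_order(h));

so each attempt is pinned to a single node, including the exhausted
one.)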

I think there is a logical bug here (even if it only shows up in this
particular corner case): when reclaim makes progress on behalf of a
THISNODE allocation, we never check *where* that progress happened, so
we can falsely report progress when the allocation that triggered the
reclaim cannot possibly benefit from it.
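
(To illustrate the shape of the bug, here is a minimal userspace toy
model; everything in it -- node counts, counters, helper names -- is
invented for illustration, and it is not kernel code:

	/*
	 * Toy model of the reclaim-progress bug: reclaim reports only
	 * global progress, and a THISNODE retry loop keeps going as
	 * long as *anything* was reclaimed anywhere.
	 */
	#include <stdio.h>
	#include <stdbool.h>

	#define NR_NODES 2

	static long free_pages[NR_NODES]  = { 100, 0 };	/* node 1: exhausted */
	static long reclaimable[NR_NODES] = {  50, 0 };	/* node 1: nothing to reclaim */

	/* Reclaim scans all nodes but reports only total progress. */
	static long do_reclaim(void)
	{
		long progress = 0;
		int n;

		for (n = 0; n < NR_NODES; n++) {
			if (reclaimable[n] > 0) {
				reclaimable[n]--;
				free_pages[n]++;
				progress++;
			}
		}
		return progress;
	}

	/* A THISNODE allocation: only node 'nid' may satisfy it. */
	static bool alloc_thisnode(int nid, int max_attempts)
	{
		int attempt;

		for (attempt = 0; attempt < max_attempts; attempt++) {
			if (free_pages[nid] > 0) {
				free_pages[nid]--;
				return true;
			}
			/*
			 * The bug: progress on *other* nodes keeps this
			 * loop alive, though it can never help node 'nid'.
			 */
			if (do_reclaim() == 0)
				return false;	/* bails only once all nodes are dry */
		}
		printf("gave up after %d attempts; reclaim progressed, but never on node %d\n",
		       max_attempts, nid);
		return false;
	}

	int main(void)
	{
		if (!alloc_thisnode(1, 10))
			printf("THISNODE allocation on node 1 failed\n");
		return 0;
	}

Run it and the allocation retries until the attempt cap, because reclaim
keeps reporting progress on node 0 that can never help node 1; counting
only progress on the target node when THISNODE is set would let the loop
bail on the first pass.)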

Thanks,
Nish

Thread overview: 14+ messages
2014-03-11 21:06 Bug in reclaim logic with exhausted nodes? Nishanth Aravamudan
2014-03-13 17:01 ` Nishanth Aravamudan
2014-03-24 23:05   ` Nishanth Aravamudan
2014-03-25 16:17     ` Christoph Lameter
2014-03-25 16:23       ` Nishanth Aravamudan
2014-03-25 16:53         ` Christoph Lameter
2014-03-25 18:10           ` Nishanth Aravamudan
2014-03-25 18:25             ` Christoph Lameter
2014-03-25 18:37               ` Nishanth Aravamudan
2014-03-27 20:33               ` Nishanth Aravamudan
2014-03-29  5:40                 ` Christoph Lameter
2014-04-01  1:33                   ` Nishanth Aravamudan [this message]
2014-04-03 16:41                     ` Christoph Lameter
2014-05-12 18:46                       ` Nishanth Aravamudan
