From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>,
	anton@sambar.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, Dan Streetman <ddstreet@ieee.org>
Subject: Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages
Date: Fri, 3 Apr 2015 11:50:39 -0700	[thread overview]
Message-ID: <20150403185039.GB38424@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150403182445.GA31900@dhcp22.suse.cz>

On 03.04.2015 [20:24:45 +0200], Michal Hocko wrote:
> On Fri 03-04-15 10:43:57, Nishanth Aravamudan wrote:
> > On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote:
> [...]
> > > I would expect kswapd would be looping endlessly because the zone
> > > wouldn't be balanced obviously. But I would be wrong... because
> > > pgdat_balanced is doing this:
> > > 		/*
> > > 		 * A special case here:
> > > 		 *
> > > 		 * balance_pgdat() skips over all_unreclaimable after
> > > 		 * DEF_PRIORITY. Effectively, it considers them balanced so
> > > 		 * they must be considered balanced here as well!
> > > 		 */
> > > 		if (!zone_reclaimable(zone)) {
> > > 			balanced_pages += zone->managed_pages;
> > > 			continue;
> > > 		}
> > > 
> > > and zone_reclaimable is false for you as you didn't have any
> > > zone_reclaimable_pages(). But wakeup_kswapd doesn't do this check so it
> > > would see !zone_balanced() AFAICS (build_zonelists doesn't ignore those
> > > zones right?) and so the kswapd would be woken up easily. So it looks
> > > like a mess.
> > 
> > My understanding, and I could easily be wrong, is that kswapd2 (node 2
> > is the exhausted one) spins endlessly, because the reclaim logic sees
> > that we are reclaiming from somewhere but the allocation request for
> > node 2 (which is __GFP_THISNODE for hugepages, not GFP_THISNODE) will
> > never complete, so we just continue to reclaim.
> 
> __GFP_THISNODE would be waking up kswapd2 again and again, that is true.

Right, one idea I had for this was to make reclaim aware of __GFP_THISNODE;
that is, reclaim needs to be targeted at the requested node in order to
actually help satisfy the current allocation. But it got pretty hairy fast
and I didn't want to break the world :)

> I am just wondering whether we will have any __GFP_THISNODE allocations
> for a node without CPUs (numa_node_id() shouldn't return such a node
> AFAICS). Maybe if somebody is bound to Node2 explicitly but I would
> consider this as a misconfiguration.

Right, I'd need to check what happens in our setup if you taskset to
node2 and try to force memory to be local. I think either you'd be
killed immediately, or the kernel would just reject your binding as
invalid (I believe that is what happens, e.g., if you try to bind to a
memoryless node).

Keep in mind that although node2 had no CPUs in my config, that's not a
hard-and-fast requirement. I believe that in a previous iteration of this
bug, the exhausted node had no free memory but did have CPUs assigned to
it.

-Nish


Thread overview:
2015-03-27 19:28 [PATCH] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable zones Nishanth Aravamudan
2015-03-27 19:39 ` Nishanth Aravamudan
2015-03-27 19:58 ` Dan Streetman
2015-03-27 20:17 ` Dave Hansen
2015-03-27 22:23   ` [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages Nishanth Aravamudan
2015-03-31  9:48     ` Michal Hocko
2015-04-03  7:57       ` Vlastimil Babka
2015-04-03 17:45         ` Nishanth Aravamudan
2015-05-05 22:09           ` Nishanth Aravamudan
2015-05-06  9:28             ` Vlastimil Babka
2015-05-08 22:47               ` Andrew Morton
2015-05-08 23:18                 ` Nishanth Aravamudan
2015-04-03 17:43       ` Nishanth Aravamudan
2015-04-03 18:24         ` Michal Hocko
2015-04-03 18:50           ` Nishanth Aravamudan [this message]
