From: Johannes Weiner <hannes@cmpxchg.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 16/27] mm, page_alloc: Consider dirtyable memory in terms of nodes
Date: Sun, 28 Feb 2016 11:17:46 -0500
Message-ID: <20160228161746.GG25622@cmpxchg.org>
In-Reply-To: <20160223151755.GB2854@techsingularity.net>

On Tue, Feb 23, 2016 at 03:17:55PM +0000, Mel Gorman wrote:
> @@ -686,6 +680,12 @@ typedef struct pglist_data {
>  	/* Number of pages migrated during the rate limiting time interval */
>  	unsigned long numabalancing_migrate_nr_pages;
>  #endif
> +	/*
> +	 * This is a per-zone reserve of pages that are not available
> +	 * to userspace allocations.
> +	 */
> +	unsigned long		totalreserve_pages;

"per-node reserve"

> @@ -297,22 +306,11 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
>  	int node;
>  	unsigned long x = 0;
>  
> -	for_each_node_state(node, N_HIGH_MEMORY) {
> -		struct zone *z = &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
> -
> -		x += zone_dirtyable_memory(z);
> -	}
>  	/*
> -	 * Unreclaimable memory (kernel memory or anonymous memory
> -	 * without swap) can bring down the dirtyable pages below
> -	 * the zone's dirty balance reserve and the above calculation
> -	 * will underflow.  However we still want to add in nodes
> -	 * which are below threshold (negative values) to get a more
> -	 * accurate calculation but make sure that the total never
> -	 * underflows.
> +	 * LRU lists are per-node so there is accurate way of accurately
> +	 * calculating dirtyable memory of just the high zone

"no accurate way of calculating"

> @@ -2665,7 +2665,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>  		 * will require awareness of zones in the
>  		 * dirty-throttling and the flusher threads.
>  		 */
> -		if (ac->spread_dirty_pages && !zone_dirty_ok(zone))
> +		if (ac->spread_dirty_pages && !node_dirty_ok(zone->zone_pgdat))
>  			continue;

The comment above this branch can be updated. I'm attaching a diff
below, feel free to use it.

>  		mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> @@ -6333,7 +6333,7 @@ static void calculate_totalreserve_pages(void)
>  			if (max > zone->managed_pages)
>  				max = zone->managed_pages;
>  
> -			zone->totalreserve_pages = max;
> +			pgdat->totalreserve_pages += max;

calculate_totalreserve_pages() can be called repeatedly, so
pgdat->totalreserve_pages needs to be set freshly in this function,
not just added to. Otherwise it grows with every call.
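
To illustrate the point, here is a minimal userspace sketch of the
reset-then-accumulate pattern. The struct and function names
(sketch_zone, sketch_node, sketch_calculate_totalreserve) are
hypothetical stand-ins, not the kernel's types, and the per-zone
reserve is simplified to lowmem reserve plus high watermark.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct zone and pg_data_t. */
struct sketch_zone {
	unsigned long managed_pages;
	unsigned long high_wmark;
	unsigned long max_lowmem_reserve;
};

struct sketch_node {
	struct sketch_zone zones[3];
	size_t nr_zones;
	unsigned long totalreserve_pages;
};

/* Safe to call repeatedly: each node's sum is rebuilt from zero. */
void sketch_calculate_totalreserve(struct sketch_node *nodes, size_t nr_nodes)
{
	for (size_t n = 0; n < nr_nodes; n++) {
		struct sketch_node *node = &nodes[n];

		/* Reset first so earlier calls do not leak into this one. */
		node->totalreserve_pages = 0;

		for (size_t z = 0; z < node->nr_zones; z++) {
			struct sketch_zone *zone = &node->zones[z];
			unsigned long max = zone->max_lowmem_reserve +
					    zone->high_wmark;

			/* A zone cannot reserve more than it manages. */
			if (max > zone->managed_pages)
				max = zone->managed_pages;

			node->totalreserve_pages += max;
		}
	}
}
```

Without the reset, a second call would double the per-node value even
though no zone changed.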

---

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c461a94..fedd0b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2596,28 +2596,21 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 				continue;
 		/*
 		 * When allocating a page cache page for writing, we
-		 * want to get it from a zone that is within its dirty
-		 * limit, such that no single zone holds more than its
+		 * want to get it from a node that is within its dirty
+		 * limit, such that no single node holds more than its
 		 * proportional share of globally allowed dirty pages.
-		 * The dirty limits take into account the zone's
+		 * The dirty limits take into account the node's
 		 * lowmem reserves and high watermark so that kswapd
 		 * should be able to balance it without having to
 		 * write pages from its LRU list.
 		 *
-		 * This may look like it could increase pressure on
-		 * lower zones by failing allocations in higher zones
-		 * before they are full.  But the pages that do spill
-		 * over are limited as the lower zones are protected
-		 * by this very same mechanism.  It should not become
-		 * a practical burden to them.
-		 *
 		 * XXX: For now, allow allocations to potentially
-		 * exceed the per-zone dirty limit in the slowpath
+		 * exceed the per-node dirty limit in the slowpath
 		 * (spread_dirty_pages unset) before going into reclaim,
 		 * which is important when on a NUMA setup the allowed
-		 * zones are together not big enough to reach the
+		 * nodes are together not big enough to reach the
 		 * global limit.  The proper fix for these situations
-		 * will require awareness of zones in the
+		 * will require awareness of nodes in the
 		 * dirty-throttling and the flusher threads.
 		 */
 		if (ac->spread_dirty_pages) {
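
For readers following the comment above, the per-node check can be
sketched roughly as below. This is a simplified userspace model, not
the kernel's node_dirty_ok(): the struct, the function names and the
single percentage ratio are assumptions made for illustration. It
treats a node's dirtyable memory as free pages minus the node reserve
plus file LRU pages, and the node's limit as its proportional share of
that.

```c
#include <stdbool.h>

/* Hypothetical per-node counters; the kernel keeps these in vmstat. */
struct sketch_node_stats {
	unsigned long free_pages;
	unsigned long file_pages;		/* active + inactive file LRU */
	unsigned long totalreserve_pages;	/* not available to userspace */
	unsigned long dirty_pages;		/* dirty + under writeback */
};

/* Pages on this node that may hold dirty page cache. */
unsigned long sketch_node_dirtyable(const struct sketch_node_stats *s)
{
	unsigned long free = s->free_pages;
	unsigned long reserve = s->totalreserve_pages;

	/* Clamp so the subtraction cannot underflow. */
	free -= (reserve < free) ? reserve : free;

	return free + s->file_pages;
}

/* True if the node is still within its share of the dirty limit. */
bool sketch_node_dirty_ok(const struct sketch_node_stats *s,
			  unsigned int dirty_ratio_pct)
{
	unsigned long limit = sketch_node_dirtyable(s) * dirty_ratio_pct / 100;

	return s->dirty_pages <= limit;
}
```

In this model the allocator fast path would skip every zone of a node
for which sketch_node_dirty_ok() returns false, which is what the
zone->zone_pgdat lookup in the quoted hunk amounts to.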

