From: Mel Gorman <mgorman@suse.de>
To: Linux-MM <linux-mm@kvack.org>, Linux-Netdev <netdev@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Mel Gorman <mgorman@suse.de>
Subject: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
Date: Tue, 26 Apr 2011 08:36:53 +0100
Message-ID: <1303803414-5937-13-git-send-email-mgorman@suse.de>
In-Reply-To: <1303803414-5937-1-git-send-email-mgorman@suse.de>
If swap is backed by network storage such as NBD, there is a risk that a
large number of reclaimers can hang the system by consuming all of the
PF_MEMALLOC reserves. Without this patch, the administrator must tune
min_free_kbytes in advance to avoid these hangs. This patch throttles
direct reclaimers when half of the PF_MEMALLOC reserves are in use, as the
system is then at risk of hanging. kswapd wakes the throttled processes
once free pages rise back above that threshold.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 1 +
mm/vmscan.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 0 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e56f835..8e5c627 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -638,6 +638,7 @@ typedef struct pglist_data {
range, including holes */
int node_id;
wait_queue_head_t kswapd_wait;
+ wait_queue_head_t pfmemalloc_wait;
struct task_struct *kswapd;
int kswapd_max_order;
enum zone_type classzone_idx;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0c42ce..13fc246 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4225,6 +4225,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
pgdat_resize_init(pgdat);
pgdat->nr_zones = 0;
init_waitqueue_head(&pgdat->kswapd_wait);
+ init_waitqueue_head(&pgdat->pfmemalloc_wait);
pgdat->kswapd_max_order = 0;
pgdat_page_cgroup_init(pgdat);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b3a569f..8b6da2b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2117,6 +2117,55 @@ out:
return 0;
}
+static bool pfmemalloc_watermark_ok(pg_data_t *pgdat, int high_zoneidx)
+{
+ struct zone *zone;
+ unsigned long pfmemalloc_reserve = 0;
+ unsigned long free_pages = 0;
+ int i;
+
+ for (i = 0; i <= high_zoneidx; i++) {
+ zone = &pgdat->node_zones[i];
+ pfmemalloc_reserve += min_wmark_pages(zone);
+ free_pages += zone_page_state(zone, NR_FREE_PAGES);
+ }
+
+ return free_pages > pfmemalloc_reserve / 2;
+}
+
+/*
+ * Throttle direct reclaimers if the PFMEMALLOC reserve for the preferred
+ * node is getting dangerously depleted, as it can be when swap is backed
+ * by network storage. kswapd will continue to make progress and wake the
+ * processes once free pages rise above half of the reserve.
+ */
+static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
+ nodemask_t *nodemask)
+{
+ struct zone *zone;
+ int high_zoneidx = gfp_zone(gfp_mask);
+ DEFINE_WAIT(wait);
+
+ /* Check if the pfmemalloc reserves are ok */
+ first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
+ prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
+ TASK_INTERRUPTIBLE);
+ if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
+ goto out;
+
+ /* Throttle */
+ do {
+ schedule();
+ finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
+ prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
+ TASK_INTERRUPTIBLE);
+ } while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
+ !fatal_signal_pending(current));
+
+out:
+ finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
+}
+
unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *nodemask)
{
@@ -2133,6 +2182,8 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
.nodemask = nodemask,
};
+ throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
+
trace_mm_vmscan_direct_reclaim_begin(order,
sc.may_writepage,
gfp_mask);
@@ -2488,6 +2539,12 @@ loop_again:
}
}
+
+ /* Wake throttled direct reclaimers once the pfmemalloc reserve recovers */
+ if (waitqueue_active(&pgdat->pfmemalloc_wait) &&
+ pfmemalloc_watermark_ok(pgdat, MAX_NR_ZONES - 1))
+ wake_up_interruptible(&pgdat->pfmemalloc_wait);
+
if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
break; /* kswapd: all done */
/*
--
1.7.3.4