stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.10 14/77] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL
Date: Sat, 28 Jun 2014 10:46:07 -0700	[thread overview]
Message-ID: <20140628174522.352181945@linuxfoundation.org> (raw)
In-Reply-To: <20140628174521.691276402@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 675becce15f320337499bc1a9356260409a5ba29 upstream.

throttle_direct_reclaim() is meant to trigger during swap-over-network
during which the min watermark is treated as a pfmemalloc reserve.  It
throttes on the first node in the zonelist but this is flawed.

The user-visible impact is that a process running on CPU whose local
memory node has no ZONE_NORMAL will stall for prolonged periods of time,
possibly indefintely.  This is due to throttle_direct_reclaim thinking the
pfmemalloc reserves are depleted when in fact they don't exist on that
node.

On a NUMA machine running a 32-bit kernel (I know) allocation requests
from CPUs on node 1 would detect no pfmemalloc reserves and the process
gets throttled.  This patch adjusts throttling of direct reclaim to
throttle based on the first node in the zonelist that has a usable
ZONE_NORMAL or lower zone.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2286,10 +2286,17 @@ static bool pfmemalloc_watermark_ok(pg_d
 
 	for (i = 0; i <= ZONE_NORMAL; i++) {
 		zone = &pgdat->node_zones[i];
+		if (!populated_zone(zone))
+			continue;
+
 		pfmemalloc_reserve += min_wmark_pages(zone);
 		free_pages += zone_page_state(zone, NR_FREE_PAGES);
 	}
 
+	/* If there are no reserves (unexpected config) then do not throttle */
+	if (!pfmemalloc_reserve)
+		return true;
+
 	wmark_ok = free_pages > pfmemalloc_reserve / 2;
 
 	/* kswapd must be awake if processes are being throttled */
@@ -2314,9 +2321,9 @@ static bool pfmemalloc_watermark_ok(pg_d
 static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 					nodemask_t *nodemask)
 {
+	struct zoneref *z;
 	struct zone *zone;
-	int high_zoneidx = gfp_zone(gfp_mask);
-	pg_data_t *pgdat;
+	pg_data_t *pgdat = NULL;
 
 	/*
 	 * Kernel threads should not be throttled as they may be indirectly
@@ -2335,10 +2342,34 @@ static bool throttle_direct_reclaim(gfp_
 	if (fatal_signal_pending(current))
 		goto out;
 
-	/* Check if the pfmemalloc reserves are ok */
-	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
-	pgdat = zone->zone_pgdat;
-	if (pfmemalloc_watermark_ok(pgdat))
+	/*
+	 * Check if the pfmemalloc reserves are ok by finding the first node
+	 * with a usable ZONE_NORMAL or lower zone. The expectation is that
+	 * GFP_KERNEL will be required for allocating network buffers when
+	 * swapping over the network so ZONE_HIGHMEM is unusable.
+	 *
+	 * Throttling is based on the first usable node and throttled processes
+	 * wait on a queue until kswapd makes progress and wakes them. There
+	 * is an affinity then between processes waking up and where reclaim
+	 * progress has been made assuming the process wakes on the same node.
+	 * More importantly, processes running on remote nodes will not compete
+	 * for remote pfmemalloc reserves and processes on different nodes
+	 * should make reasonable progress.
+	 */
+	for_each_zone_zonelist_nodemask(zone, z, zonelist,
+					gfp_mask, nodemask) {
+		if (zone_idx(zone) > ZONE_NORMAL)
+			continue;
+
+		/* Throttle based on the first usable node */
+		pgdat = zone->zone_pgdat;
+		if (pfmemalloc_watermark_ok(pgdat))
+			goto out;
+		break;
+	}
+
+	/* If no zone was usable by the allocation flags then do not throttle */
+	if (!pgdat)
 		goto out;
 
 	/* Account for the throttling */



  parent reply	other threads:[~2014-06-28 17:46 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-28 17:45 [PATCH 3.10 00/77] 3.10.46-stable review Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 01/77] can: peak_pci: prevent use after free at netdev removal Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 02/77] af_iucv: wrong mapping of sent and confirmed skbs Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 03/77] net: cpsw: fix null dereference at probe Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 04/77] extcon: max8997: Fix NULL pointer exception on missing pdata Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 05/77] staging: tidspbridge: check for CONFIG_SND_OMAP_SOC_MCBSP Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 06/77] applicom: dereferencing NULL on error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 07/77] usb: usbtest: fix unlink write error with pattern 1 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 08/77] USB: usbtest: add a timeout for scatter-gather tests Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 09/77] usb: gadget: rename CONFIG_USB_GADGET_PXA25X Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 10/77] usb: dwc3: gadget: clear stall when disabling endpoint Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 11/77] ARM: OMAP: replace checks for CONFIG_USB_GADGET_OMAP Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 12/77] USB: EHCI: avoid BIOS handover on the HASEE E200 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 13/77] USB: option: fix runtime PM handling Greg Kroah-Hartman
2014-06-28 17:46 ` Greg Kroah-Hartman [this message]
2014-06-28 17:46 ` [PATCH 3.10 15/77] mm/memory-failure.c-failure: send right signal code to correct thread Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 16/77] mm/memory-failure.c: dont let collect_procs() skip over processes for MF_ACTION_REQUIRED Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 17/77] mm: fix sleeping function warning from __put_anon_vma Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 18/77] HID: core: fix validation of report id 0 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 19/77] mm: vmscan: clear kswapds special reclaim powers before exiting Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 20/77] ptrace: fix fork event messages across pid namespaces Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 21/77] arm64: ptrace: change fs when passing kernel pointer to regset code Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 22/77] idr: fix overflow bug during maximum ID calculation at maximum height Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 23/77] s390/lowcore: reserve 96 bytes for IRB in lowcore Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 24/77] ext4: fix zeroing of page during writeback Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 25/77] ext4: fix wrong assert in ext4_mb_normalize_request() Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 26/77] matroxfb: perform a dummy read of M_STATUS Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 27/77] USB: usb_wwan: fix urb leak in write error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 28/77] USB: usb_wwan: fix race between write and resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 29/77] USB: usb_wwan: fix write and suspend race Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 30/77] USB: usb_wwan: fix urb leak at shutdown Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 31/77] USB: usb_wwan: fix potential NULL-deref at resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 32/77] USB: usb_wwan: fix potential blocked I/O after resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 33/77] USB: sierra: fix AA deadlock in open error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 34/77] USB: sierra: fix use after free at suspend/resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 35/77] USB: sierra: fix urb and memory leak in resume error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 36/77] USB: sierra: fix urb and memory leak on disconnect Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 37/77] USB: sierra: fix remote wakeup Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 38/77] usb: qcserial: add Netgear AirCard 341U Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 39/77] usb: qcserial: add additional Sierra Wireless QMI devices Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 40/77] USB: serial: fix potential runtime pm imbalance at device remove Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 41/77] media: ivtv: Fix Oops when no firmware is loaded Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 42/77] media: stk1160: Avoid stack-allocated buffer for control URBs Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 43/77] ACPICA: utstring: Check array index bound before use Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 44/77] ACPI: Fix conflict between customized DSDT and DSDT local copy Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 45/77] media: uvcvideo: Fix clock param realtime setting Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 46/77] ARM: stacktrace: avoid listing stacktrace functions in stacktrace Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 47/77] ARM: 8037/1: mm: support big-endian page tables Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 50/77] Target/iser: Bail from accept_np if np_thread is trying to close Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 51/77] Target/iser: Fix hangs in connection teardown Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 52/77] target: Set CMD_T_ACTIVE bit for Task Management Requests Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 53/77] target: Use complete_all for se_cmd->t_transport_stop_comp Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 54/77] iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 55/77] target: Report correct response length for some commands Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 56/77] [PATCH] target: Explicitly clear ramdisk_mcp backend pages Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 57/77] x86-32, espfix: Remove filter for espfix32 due to race Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 58/77] x86, x32: Use compat shims for io_{setup,submit} Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 59/77] genirq: Sanitize spurious interrupt detection of threaded irqs Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 60/77] aio: fix aio request leak when events are reaped by userspace Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 61/77] aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 62/77] skbuff: skb_segment: orphan frags before copying Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 63/77] Btrfs: fix double free in find_lock_delalloc_range Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 64/77] btrfs: Add ctime/mtime update for btrfs device add/remove Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 65/77] Btrfs: output warning instead of error when loading free space cache failed Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 66/77] Btrfs: make sure there are not any read requests before stopping workers Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 67/77] Btrfs: mark mapping with error flag to report errors to userspace Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 68/77] Btrfs: set right total device count for seeding support Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 69/77] Btrfs: send, dont error in the presence of subvols/snapshots Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 70/77] fs: btrfs: volumes.c: Fix for possible null pointer dereference Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 71/77] Btrfs: use right type to get real comparison Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 72/77] Btrfs: fix scrub_print_warning to handle skinny metadata extents Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 73/77] btrfs: fix use of uninit "ret" in end_extent_writepage() Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 74/77] usb: usbtest: Add timetout to simple_io() Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 75/77] Target/iser: Improve cm events handling Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 76/77] Target/iser: Wait for proper cleanup before unloading Greg Kroah-Hartman
2014-06-28 22:30 ` [PATCH 3.10 00/77] 3.10.46-stable review Guenter Roeck
2014-06-30 16:17 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140628174522.352181945@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).