stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.14 023/110] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL
Date: Sat, 28 Jun 2014 10:46:20 -0700
Message-ID: <20140628174546.603703167@linuxfoundation.org>
In-Reply-To: <20140628174545.354748696@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 675becce15f320337499bc1a9356260409a5ba29 upstream.

throttle_direct_reclaim() is meant to trigger during swap-over-network,
during which the min watermark is treated as a pfmemalloc reserve.  It
throttles on the first node in the zonelist but this is flawed.

The user-visible impact is that a process running on a CPU whose local
memory node has no ZONE_NORMAL will stall for prolonged periods of time,
possibly indefinitely.  This is due to throttle_direct_reclaim() thinking
the pfmemalloc reserves are depleted when in fact they don't exist on that
node.
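
To illustrate why a node without reserves looks "depleted", here is a
minimal userspace sketch of the watermark comparison before and after
this patch.  It is only a model: struct model_zone and both functions are
hypothetical stand-ins for the kernel's zone accessors, and the kswapd
wakeup in pfmemalloc_watermark_ok() is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-zone state the kernel reads. */
struct model_zone {
	bool populated;           /* models populated_zone(zone) */
	unsigned long min_wmark;  /* models min_wmark_pages(zone) */
	unsigned long free_pages; /* models zone_page_state(zone, NR_FREE_PAGES) */
};

/* Model of the check before the patch: every zone is summed unconditionally. */
static bool model_watermark_ok_old(const struct model_zone *zones, size_t nr)
{
	unsigned long reserve = 0, free_pages = 0;

	for (size_t i = 0; i < nr; i++) {
		reserve += zones[i].min_wmark;
		free_pages += zones[i].free_pages;
	}

	/* With no reserves and no free pages this is 0 > 0, i.e. "not ok". */
	return free_pages > reserve / 2;
}

/* Model of the check after the patch. */
static bool model_watermark_ok_new(const struct model_zone *zones, size_t nr)
{
	unsigned long reserve = 0, free_pages = 0;

	for (size_t i = 0; i < nr; i++) {
		/* Skip zones that have no pages at all. */
		if (!zones[i].populated)
			continue;

		reserve += zones[i].min_wmark;
		free_pages += zones[i].free_pages;
	}

	/* No reserves on this node: report "ok" so the caller does not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

For a node whose lower zones are all unpopulated, the old model reports
the watermark as not ok (so the caller would throttle), while the new
model reports ok.  For a populated node both models agree.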

On a NUMA machine running a 32-bit kernel (I know) allocation requests
from CPUs on node 1 would detect no pfmemalloc reserves and the process
gets throttled.  This patch adjusts throttling of direct reclaim to
throttle based on the first node in the zonelist that has a usable
ZONE_NORMAL or lower zone.
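
The node selection described above can be sketched in userspace as
follows.  This is an illustration under stated assumptions, not the
kernel code: struct model_zoneref, the MODEL_ZONE_* values and
model_first_throttle_node() are hypothetical names, the zonelist is a
plain array, and the gfp_mask/nodemask filtering done by
for_each_zone_zonelist_nodemask() is not modelled.

```c
#include <stddef.h>

/* Hypothetical zone indices, ordered like the kernel's (lower zones first). */
enum model_zone_type {
	MODEL_ZONE_DMA,
	MODEL_ZONE_NORMAL,
	MODEL_ZONE_HIGHMEM
};

/* Hypothetical zonelist entry: which node a zone belongs to and its index. */
struct model_zoneref {
	int node;
	enum model_zone_type idx;
};

/*
 * Return the node of the first zonelist entry at ZONE_NORMAL or below,
 * or -1 if there is none (the caller then does not throttle at all).
 */
static int model_first_throttle_node(const struct model_zoneref *zl, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		/* Zones above ZONE_NORMAL are not usable for this purpose. */
		if (zl[i].idx > MODEL_ZONE_NORMAL)
			continue;

		/* Throttle based on the first usable node only. */
		return zl[i].node;
	}

	return -1;
}
```

With a zonelist that starts with a highmem-only local node 1 and then
falls back to node 0, the sketch picks node 0, which is the node whose
reserves actually exist.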

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmscan.c |   43 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2502,10 +2502,17 @@ static bool pfmemalloc_watermark_ok(pg_d
 
 	for (i = 0; i <= ZONE_NORMAL; i++) {
 		zone = &pgdat->node_zones[i];
+		if (!populated_zone(zone))
+			continue;
+
 		pfmemalloc_reserve += min_wmark_pages(zone);
 		free_pages += zone_page_state(zone, NR_FREE_PAGES);
 	}
 
+	/* If there are no reserves (unexpected config) then do not throttle */
+	if (!pfmemalloc_reserve)
+		return true;
+
 	wmark_ok = free_pages > pfmemalloc_reserve / 2;
 
 	/* kswapd must be awake if processes are being throttled */
@@ -2530,9 +2537,9 @@ static bool pfmemalloc_watermark_ok(pg_d
 static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 					nodemask_t *nodemask)
 {
+	struct zoneref *z;
 	struct zone *zone;
-	int high_zoneidx = gfp_zone(gfp_mask);
-	pg_data_t *pgdat;
+	pg_data_t *pgdat = NULL;
 
 	/*
 	 * Kernel threads should not be throttled as they may be indirectly
@@ -2551,10 +2558,34 @@ static bool throttle_direct_reclaim(gfp_
 	if (fatal_signal_pending(current))
 		goto out;
 
-	/* Check if the pfmemalloc reserves are ok */
-	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
-	pgdat = zone->zone_pgdat;
-	if (pfmemalloc_watermark_ok(pgdat))
+	/*
+	 * Check if the pfmemalloc reserves are ok by finding the first node
+	 * with a usable ZONE_NORMAL or lower zone. The expectation is that
+	 * GFP_KERNEL will be required for allocating network buffers when
+	 * swapping over the network so ZONE_HIGHMEM is unusable.
+	 *
+	 * Throttling is based on the first usable node and throttled processes
+	 * wait on a queue until kswapd makes progress and wakes them. There
+	 * is an affinity then between processes waking up and where reclaim
+	 * progress has been made assuming the process wakes on the same node.
+	 * More importantly, processes running on remote nodes will not compete
+	 * for remote pfmemalloc reserves and processes on different nodes
+	 * should make reasonable progress.
+	 */
+	for_each_zone_zonelist_nodemask(zone, z, zonelist,
+					gfp_mask, nodemask) {
+		if (zone_idx(zone) > ZONE_NORMAL)
+			continue;
+
+		/* Throttle based on the first usable node */
+		pgdat = zone->zone_pgdat;
+		if (pfmemalloc_watermark_ok(pgdat))
+			goto out;
+		break;
+	}
+
+	/* If no zone was usable by the allocation flags then do not throttle */
+	if (!pgdat)
 		goto out;
 
 	/* Account for the throttling */


