From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.10 14/77] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL
Date: Sat, 28 Jun 2014 10:46:07 -0700 [thread overview]
Message-ID: <20140628174522.352181945@linuxfoundation.org> (raw)
In-Reply-To: <20140628174521.691276402@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mel Gorman <mgorman@suse.de>
commit 675becce15f320337499bc1a9356260409a5ba29 upstream.
throttle_direct_reclaim() is meant to trigger during swap-over-network
during which the min watermark is treated as a pfmemalloc reserve. It
throttes on the first node in the zonelist but this is flawed.
The user-visible impact is that a process running on CPU whose local
memory node has no ZONE_NORMAL will stall for prolonged periods of time,
possibly indefintely. This is due to throttle_direct_reclaim thinking the
pfmemalloc reserves are depleted when in fact they don't exist on that
node.
On a NUMA machine running a 32-bit kernel (I know) allocation requests
from CPUs on node 1 would detect no pfmemalloc reserves and the process
gets throttled. This patch adjusts throttling of direct reclaim to
throttle based on the first node in the zonelist that has a usable
ZONE_NORMAL or lower zone.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/vmscan.c | 43 +++++++++++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 6 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2286,10 +2286,17 @@ static bool pfmemalloc_watermark_ok(pg_d
for (i = 0; i <= ZONE_NORMAL; i++) {
zone = &pgdat->node_zones[i];
+ if (!populated_zone(zone))
+ continue;
+
pfmemalloc_reserve += min_wmark_pages(zone);
free_pages += zone_page_state(zone, NR_FREE_PAGES);
}
+ /* If there are no reserves (unexpected config) then do not throttle */
+ if (!pfmemalloc_reserve)
+ return true;
+
wmark_ok = free_pages > pfmemalloc_reserve / 2;
/* kswapd must be awake if processes are being throttled */
@@ -2314,9 +2321,9 @@ static bool pfmemalloc_watermark_ok(pg_d
static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
nodemask_t *nodemask)
{
+ struct zoneref *z;
struct zone *zone;
- int high_zoneidx = gfp_zone(gfp_mask);
- pg_data_t *pgdat;
+ pg_data_t *pgdat = NULL;
/*
* Kernel threads should not be throttled as they may be indirectly
@@ -2335,10 +2342,34 @@ static bool throttle_direct_reclaim(gfp_
if (fatal_signal_pending(current))
goto out;
- /* Check if the pfmemalloc reserves are ok */
- first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
- pgdat = zone->zone_pgdat;
- if (pfmemalloc_watermark_ok(pgdat))
+ /*
+ * Check if the pfmemalloc reserves are ok by finding the first node
+ * with a usable ZONE_NORMAL or lower zone. The expectation is that
+ * GFP_KERNEL will be required for allocating network buffers when
+ * swapping over the network so ZONE_HIGHMEM is unusable.
+ *
+ * Throttling is based on the first usable node and throttled processes
+ * wait on a queue until kswapd makes progress and wakes them. There
+ * is an affinity then between processes waking up and where reclaim
+ * progress has been made assuming the process wakes on the same node.
+ * More importantly, processes running on remote nodes will not compete
+ * for remote pfmemalloc reserves and processes on different nodes
+ * should make reasonable progress.
+ */
+ for_each_zone_zonelist_nodemask(zone, z, zonelist,
+ gfp_mask, nodemask) {
+ if (zone_idx(zone) > ZONE_NORMAL)
+ continue;
+
+ /* Throttle based on the first usable node */
+ pgdat = zone->zone_pgdat;
+ if (pfmemalloc_watermark_ok(pgdat))
+ goto out;
+ break;
+ }
+
+ /* If no zone was usable by the allocation flags then do not throttle */
+ if (!pgdat)
goto out;
/* Account for the throttling */
next prev parent reply other threads:[~2014-06-28 17:46 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-28 17:45 [PATCH 3.10 00/77] 3.10.46-stable review Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 01/77] can: peak_pci: prevent use after free at netdev removal Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 02/77] af_iucv: wrong mapping of sent and confirmed skbs Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 03/77] net: cpsw: fix null dereference at probe Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 04/77] extcon: max8997: Fix NULL pointer exception on missing pdata Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 05/77] staging: tidspbridge: check for CONFIG_SND_OMAP_SOC_MCBSP Greg Kroah-Hartman
2014-06-28 17:45 ` [PATCH 3.10 06/77] applicom: dereferencing NULL on error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 07/77] usb: usbtest: fix unlink write error with pattern 1 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 08/77] USB: usbtest: add a timeout for scatter-gather tests Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 09/77] usb: gadget: rename CONFIG_USB_GADGET_PXA25X Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 10/77] usb: dwc3: gadget: clear stall when disabling endpoint Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 11/77] ARM: OMAP: replace checks for CONFIG_USB_GADGET_OMAP Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 12/77] USB: EHCI: avoid BIOS handover on the HASEE E200 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 13/77] USB: option: fix runtime PM handling Greg Kroah-Hartman
2014-06-28 17:46 ` Greg Kroah-Hartman [this message]
2014-06-28 17:46 ` [PATCH 3.10 15/77] mm/memory-failure.c-failure: send right signal code to correct thread Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 16/77] mm/memory-failure.c: dont let collect_procs() skip over processes for MF_ACTION_REQUIRED Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 17/77] mm: fix sleeping function warning from __put_anon_vma Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 18/77] HID: core: fix validation of report id 0 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 19/77] mm: vmscan: clear kswapds special reclaim powers before exiting Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 20/77] ptrace: fix fork event messages across pid namespaces Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 21/77] arm64: ptrace: change fs when passing kernel pointer to regset code Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 22/77] idr: fix overflow bug during maximum ID calculation at maximum height Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 23/77] s390/lowcore: reserve 96 bytes for IRB in lowcore Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 24/77] ext4: fix zeroing of page during writeback Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 25/77] ext4: fix wrong assert in ext4_mb_normalize_request() Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 26/77] matroxfb: perform a dummy read of M_STATUS Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 27/77] USB: usb_wwan: fix urb leak in write error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 28/77] USB: usb_wwan: fix race between write and resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 29/77] USB: usb_wwan: fix write and suspend race Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 30/77] USB: usb_wwan: fix urb leak at shutdown Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 31/77] USB: usb_wwan: fix potential NULL-deref at resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 32/77] USB: usb_wwan: fix potential blocked I/O after resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 33/77] USB: sierra: fix AA deadlock in open error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 34/77] USB: sierra: fix use after free at suspend/resume Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 35/77] USB: sierra: fix urb and memory leak in resume error path Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 36/77] USB: sierra: fix urb and memory leak on disconnect Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 37/77] USB: sierra: fix remote wakeup Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 38/77] usb: qcserial: add Netgear AirCard 341U Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 39/77] usb: qcserial: add additional Sierra Wireless QMI devices Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 40/77] USB: serial: fix potential runtime pm imbalance at device remove Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 41/77] media: ivtv: Fix Oops when no firmware is loaded Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 42/77] media: stk1160: Avoid stack-allocated buffer for control URBs Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 43/77] ACPICA: utstring: Check array index bound before use Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 44/77] ACPI: Fix conflict between customized DSDT and DSDT local copy Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 45/77] media: uvcvideo: Fix clock param realtime setting Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 46/77] ARM: stacktrace: avoid listing stacktrace functions in stacktrace Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 47/77] ARM: 8037/1: mm: support big-endian page tables Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 50/77] Target/iser: Bail from accept_np if np_thread is trying to close Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 51/77] Target/iser: Fix hangs in connection teardown Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 52/77] target: Set CMD_T_ACTIVE bit for Task Management Requests Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 53/77] target: Use complete_all for se_cmd->t_transport_stop_comp Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 54/77] iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 55/77] target: Report correct response length for some commands Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 56/77] [PATCH] target: Explicitly clear ramdisk_mcp backend pages Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 57/77] x86-32, espfix: Remove filter for espfix32 due to race Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 58/77] x86, x32: Use compat shims for io_{setup,submit} Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 59/77] genirq: Sanitize spurious interrupt detection of threaded irqs Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 60/77] aio: fix aio request leak when events are reaped by userspace Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 61/77] aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 62/77] skbuff: skb_segment: orphan frags before copying Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 63/77] Btrfs: fix double free in find_lock_delalloc_range Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 64/77] btrfs: Add ctime/mtime update for btrfs device add/remove Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 65/77] Btrfs: output warning instead of error when loading free space cache failed Greg Kroah-Hartman
2014-06-28 17:46 ` [PATCH 3.10 66/77] Btrfs: make sure there are not any read requests before stopping workers Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 67/77] Btrfs: mark mapping with error flag to report errors to userspace Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 68/77] Btrfs: set right total device count for seeding support Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 69/77] Btrfs: send, dont error in the presence of subvols/snapshots Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 70/77] fs: btrfs: volumes.c: Fix for possible null pointer dereference Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 71/77] Btrfs: use right type to get real comparison Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 72/77] Btrfs: fix scrub_print_warning to handle skinny metadata extents Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 73/77] btrfs: fix use of uninit "ret" in end_extent_writepage() Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 74/77] usb: usbtest: Add timetout to simple_io() Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 75/77] Target/iser: Improve cm events handling Greg Kroah-Hartman
2014-06-28 17:47 ` [PATCH 3.10 76/77] Target/iser: Wait for proper cleanup before unloading Greg Kroah-Hartman
2014-06-28 22:30 ` [PATCH 3.10 00/77] 3.10.46-stable review Guenter Roeck
2014-06-30 16:17 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140628174522.352181945@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).