From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Mel Gorman <mgorman@techsingularity.net>,
Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
David Hildenbrand <david@redhat.com>,
Matt Fleming <matt@codeblueprint.co.uk>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>, Qian Cai <cai@lca.pw>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 12/65] mm, meminit: recalculate pcpu batch and high limits after init completes
Date: Mon, 11 Nov 2019 19:28:12 +0100 [thread overview]
Message-ID: <20191111181343.283339357@linuxfoundation.org> (raw)
In-Reply-To: <20191111181331.917659011@linuxfoundation.org>
From: Mel Gorman <mgorman@techsingularity.net>
commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu
list. As zone->managed_pages is not up to date yet, the pcpu
initialisation calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with
the degree of severity depending on how many CPUs share a local zone and
the size of the zone. A private report indicated that kernel build
times were excessive with extremely high system CPU usage. A perf
profile indicated that a large chunk of time was lost on zone->lock
contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before, this was the breakdown of batch and high values over all zones
was:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
[mgorman@techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2051,6 +2051,14 @@ static void reserve_highatomic_pageblock
unsigned long max_managed, flags;
/*
+ * The number of managed pages has changed due to the initialisation
+ * so the pcpu batch and high limits needs to be updated or the limits
+ * will be artificially small.
+ */
+ for_each_populated_zone(zone)
+ zone_pcp_update(zone);
+
+ /*
* Limit the number reserved to 1 pageblock or roughly 1% of a zone.
* Check is race-prone but harmless.
*/
@@ -7385,7 +7393,6 @@ void free_contig_range(unsigned long pfn
}
#endif
-#ifdef CONFIG_MEMORY_HOTPLUG
/*
* The zone indicated has a new number of managed_pages; batch sizes and percpu
* page high values need to be recalulated.
@@ -7399,7 +7406,6 @@ void __meminit zone_pcp_update(struct zo
per_cpu_ptr(zone->pageset, cpu));
mutex_unlock(&pcp_batch_high_lock);
}
-#endif
void zone_pcp_reset(struct zone *zone)
{
next prev parent reply other threads:[~2019-11-11 19:09 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-11 18:28 [PATCH 4.9 00/65] 4.9.201-stable review Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 01/65] CDC-NCM: handle incomplete transfer of MTU Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 02/65] ipv4: Fix table id reference in fib_sync_down_addr Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 03/65] net: fix data-race in neigh_event_send() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 04/65] net: usb: qmi_wwan: add support for DW5821e with eSIM support Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 05/65] NFC: fdp: fix incorrect free object Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 06/65] nfc: netlink: fix double device reference drop Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 07/65] NFC: st21nfca: fix double free Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 08/65] qede: fix NULL pointer deref in __qede_remove() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 09/65] ALSA: timer: Fix incorrectly assigned timer instance Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 10/65] ALSA: bebob: fix to detect configured source of sampling clock for Focusrite Saffire Pro i/o series Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 11/65] ALSA: hda/ca0132 - Fix possible workqueue stall Greg Kroah-Hartman
2019-11-11 18:28 ` Greg Kroah-Hartman [this message]
2019-11-11 18:28 ` [PATCH 4.9 13/65] mm: thp: handle page cache THP correctly in PageTransCompoundMap Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 14/65] mm, vmstat: hide /proc/pagetypeinfo from normal users Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 15/65] dump_stack: avoid the livelock of the dump_lock Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 16/65] perf tools: Fix time sorting Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 17/65] drm/radeon: fix si_enable_smc_cac() failed issue Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 18/65] ceph: fix use-after-free in __ceph_remove_cap() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 19/65] iio: imu: adis16480: make sure provided frequency is positive Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 20/65] netfilter: nf_tables: Align nft_expr private data to 64-bit Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 21/65] netfilter: ipset: Fix an error code in ip_set_sockfn_get() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 22/65] can: usb_8dev: fix use-after-free on disconnect Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 23/65] can: c_can: c_can_poll(): only read status register after status IRQ Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 24/65] can: peak_usb: fix a potential out-of-sync while decoding packets Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 25/65] can: gs_usb: gs_can_open(): prevent memory leak Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 26/65] can: peak_usb: fix slab info leak Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 27/65] configfs: Fix bool initialization/comparison Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 28/65] configfs: stash the data we need into configfs_buffer at open time Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 29/65] configfs_register_group() shouldnt be (and isnt) called in rmdirable parts Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 30/65] configfs: new object reprsenting tree fragments Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 31/65] configfs: provide exclusion between IO and removals Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 32/65] configfs: fix a deadlock in configfs_symlink() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 33/65] usbip: stub_rx: fix static checker warning on unnecessary checks Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 34/65] usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 35/65] usbip: fix possibility of dereference by NULLL pointer in vhci_hcd.c Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 36/65] drivers: usb: usbip: Add missing break statement to switch Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 37/65] PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 38/65] dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 39/65] HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 40/65] scsi: qla2xxx: fixup incorrect usage of host_byte Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 41/65] scsi: lpfc: Honor module parameter lpfc_use_adisc Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 42/65] ipvs: move old_secure_tcp into struct netns_ipvs Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 43/65] bonding: fix unexpected IFF_BONDING bit unset Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 44/65] usb: fsl: Check memory resource before releasing it Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 45/65] usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 46/65] usb: gadget: composite: Fix possible double free memory bug Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 47/65] usb: gadget: configfs: fix concurrent issue between composite APIs Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 48/65] usb: dwc3: remove the call trace of USBx_GFLADJ Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 49/65] perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 50/65] perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 51/65] USB: Skip endpoints with 0 maxpacket length Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 52/65] RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 53/65] scsi: qla2xxx: stop timer in shutdown path Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 54/65] fjes: Handle workqueue allocation failure Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 55/65] net: hisilicon: Fix "Trying to free already-free IRQ" Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 56/65] NFSv4: Dont allow a cached open with a revoked delegation Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 57/65] net: ethernet: arc: add the missed clk_disable_unprepare Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 58/65] igb: Fix constant media auto sense switching when no cable is connected Greg Kroah-Hartman
2019-11-11 18:28 ` [PATCH 4.9 59/65] e1000: fix memory leaks Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 60/65] x86/apic: Move pending interrupt check code into its own function Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 61/65] x86/apic: Drop logical_smp_processor_id() inline Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 62/65] x86/apic/32: Avoid bogus LDR warnings Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 63/65] can: flexcan: disable completely the ECC mechanism Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 64/65] mm/filemap.c: dont initiate writeback if mapping has no dirty pages Greg Kroah-Hartman
2019-11-11 18:29 ` [PATCH 4.9 65/65] cgroup,writeback: dont switch wbs immediately on dead wbs if the memcg is dead Greg Kroah-Hartman
2019-11-12 0:38 ` [PATCH 4.9 00/65] 4.9.201-stable review kernelci.org bot
2019-11-12 5:27 ` Greg Kroah-Hartman
2019-11-12 12:00 ` Jon Hunter
2019-11-12 13:24 ` Naresh Kamboju
2019-11-12 18:19 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191111181343.283339357@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=cai@lca.pw \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=matt@codeblueprint.co.uk \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).