From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
Matt Fleming <matt@codeblueprint.co.uk>,
Borislav Petkov <bp@alien8.de>, Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 2/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Date: Fri, 18 Oct 2019 11:56:05 +0100
Message-ID: <20191018105606.3249-3-mgorman@techsingularity.net>
In-Reply-To: <20191018105606.3249-1-mgorman@techsingularity.net>

Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but, before that phase finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu list.
As zone->managed_pages is not yet up to date, the pcpu initialisation
calculates inappropriately low batch and high values.

In some cases this increases zone->lock contention severely, the degree
depending on how many CPUs share a local zone and on the size of the zone.
A private report indicated that kernel build times were excessive with
extremely high system CPU usage. A perf profile indicated that a large
chunk of time was lost on zone->lock contention.

This patch recalculates the pcpu batch and high values after deferred
initialisation completes on each node. It was tested on a 2-socket AMD
EPYC 2 machine using a kernel compilation workload -- allmodconfig and
all available CPUs.

mmtests configuration: config-workload-kernbench-max
Configuration was modified to build on a fresh XFS partition.

kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla         resetpcpu-v1r1
Amean     user-256    13249.50 (   0.00%)    15928.40 * -20.22%*
Amean     syst-256    14760.30 (   0.00%)     4551.77 *  69.16%*
Amean     elsp-256      162.42 (   0.00%)      118.46 *  27.06%*
Stddev    user-256       42.97 (   0.00%)       50.83 ( -18.30%)
Stddev    syst-256      336.87 (   0.00%)       33.70 (  90.00%)
Stddev    elsp-256        2.46 (   0.00%)        0.81 (  67.01%)

                       5.4.0-rc3       5.4.0-rc3
                         vanilla  resetpcpu-v1r1
Duration User           39766.24        47802.92
Duration System         44298.10        13671.93
Duration Elapsed          519.11          387.65
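
The patch reduces mean system CPU time by 69.16% and mean elapsed build
time by 27.06%. Mean user time rises by 20.22%, but the drop in system
time more than offsets it. The standard deviation of system CPU time is
also reduced by 90%.
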
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cafe568d36f6..0a0dd74edc83 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1818,6 +1818,14 @@ static int __init deferred_init_memmap(void *data)
 	 */
 	while (spfn < epfn)
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
+	/*
+	 * The number of managed pages has changed due to the initialisation
+	 * so the pcpu batch and high limits need to be updated or the limits
+	 * will be artificially small.
+	 */
+	zone_pcp_update(zone);
+
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);
 
@@ -8516,7 +8524,6 @@ void free_contig_range(unsigned long pfn, unsigned int nr_pages)
 	WARN(count != 0, "%d pages are still in use!\n", count);
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
@@ -8527,7 +8534,6 @@ void __meminit zone_pcp_update(struct zone *zone)
 	__zone_pcp_update(zone);
 	mutex_unlock(&pcp_batch_high_lock);
 }
-#endif
 
 void zone_pcp_reset(struct zone *zone)
 {
--
2.16.4