From: Nicholas Piggin <npiggin@gmail.com>
To: linux-mm@kvack.org
Cc: Nicholas Piggin <npiggin@gmail.com>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 2/2] mm/large system hash: clear hashdist when only one node with memory is booted
Date: Thu, 6 Jun 2019 00:48:14 +1000 [thread overview]
Message-ID: <20190605144814.29319-2-npiggin@gmail.com> (raw)
In-Reply-To: <20190605144814.29319-1-npiggin@gmail.com>
CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally
even when booting on single node machines. This causes the large system
hashes to be allocated with vmalloc, and mapped with small pages.
This change clears hashdist if only one node has come up with memory.
This results in the important large inode and dentry hashes using
memblock allocations. All others are within 4MB size up to about 128GB
of RAM, which allows them to be allocated from the linear map on most
non-NUMA images.
Other big hashes like futex and TCP should eventually be moved over to
the same style of allocation as those vfs caches that use HASH_EARLY if
!hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images.
This brings dTLB misses for linux kernel tree `git diff` from ~45,000 to
~8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
(performance is in the noise, under 1% difference, page tables are
likely to be well cached for this workload).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
mm/page_alloc.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15f46be7d210..cd944f48be9a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7519,10 +7519,28 @@ static int page_alloc_cpu_dead(unsigned int cpu)
return 0;
}
+#ifdef CONFIG_NUMA
+int hashdist = HASHDIST_DEFAULT;
+
+static int __init set_hashdist(char *str)
+{
+ if (!str)
+ return 0;
+ hashdist = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("hashdist=", set_hashdist);
+#endif
+
void __init page_alloc_init(void)
{
int ret;
+#ifdef CONFIG_NUMA
+ if (num_node_state(N_MEMORY) == 1)
+ hashdist = 0;
+#endif
+
ret = cpuhp_setup_state_nocalls(CPUHP_PAGE_ALLOC_DEAD,
"mm/page_alloc:dead", NULL,
page_alloc_cpu_dead);
@@ -7907,19 +7925,6 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
return ret;
}
-#ifdef CONFIG_NUMA
-int hashdist = HASHDIST_DEFAULT;
-
-static int __init set_hashdist(char *str)
-{
- if (!str)
- return 0;
- hashdist = simple_strtoul(str, &str, 0);
- return 1;
-}
-__setup("hashdist=", set_hashdist);
-#endif
-
#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
/*
* Returns the number of pages that arch has reserved but
--
2.20.1
next prev parent reply other threads:[~2019-06-05 14:49 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-05 14:48 [PATCH 1/2] mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist Nicholas Piggin
2019-06-05 14:48 ` Nicholas Piggin [this message]
2019-06-05 21:22 ` Andrew Morton
2019-06-06 2:27 ` Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190605144814.29319-2-npiggin@gmail.com \
--to=npiggin@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).