From: Michal Privoznik <mprivozn@redhat.com>
To: qemu-devel@nongnu.org
Cc: david@redhat.com, imammedo@redhat.com,
marcandre.lureau@redhat.com, berrange@redhat.com
Subject: [PATCH] hostmem: Honour multiple preferred nodes if possible
Date: Fri, 9 Dec 2022 17:29:23 +0100
Message-ID: <ba02465fc48807eddea9ad646fca7cc92f929ae7.1670603308.git.mprivozn@redhat.com>
If a memory-backend is configured with mode
HOST_MEM_POLICY_PREFERRED then
host_memory_backend_memory_complete() calls mbind() as:
mbind(..., MPOL_PREFERRED, nodemask, ...);
Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
to the .host-nodes attribute, so multiple nodes can be
specified. However, the documentation for MPOL_PREFERRED
says:
MPOL_PREFERRED
This mode sets the preferred node for allocation. ...
If nodemask specifies more than one node ID, the first node
in the mask will be selected as the preferred node.
Therefore, only the first node is honoured and the rest are
silently ignored. With recent changes to the kernel and
numactl, we can do better.
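To make the limitation concrete, here is a minimal sketch of which node ends up preferred when several bits are set. It assumes that "the first node in the mask" means the lowest-numbered set bit; first_node_in_mask() and BITS_PER_ULONG are hypothetical names used only for illustration and are not part of QEMU or libnuma.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper (not a QEMU or libnuma function): return the
 * lowest node ID set in 'mask', or -1 if none of the first 'maxnode'
 * bits is set. Assumption: this is the node MPOL_PREFERRED would use.
 */
static int first_node_in_mask(const unsigned long *mask, size_t maxnode)
{
    assert(mask != NULL);

    for (size_t node = 0; node < maxnode; node++) {
        unsigned long word = mask[node / BITS_PER_ULONG];

        if (word & (1UL << (node % BITS_PER_ULONG))) {
            return (int)node;
        }
    }
    return -1;
}
```

With host-nodes=1,3 the mask has bits 1 and 3 set, so under this assumption only node 1 is treated as preferred and node 3 plays no role.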
First, a new mode, MPOL_PREFERRED_MANY, was introduced in the
kernel (v5.15-rc1~107^2~21); it accepts multiple NUMA
nodes.
Then, the numa_has_preferred_many() API was introduced in numactl
(v2.0.15~26), allowing applications to query kernel support.
Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
mbind() call instead and stop silently ignoring the additional nodes.
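The selection logic described above can be sketched in isolation, without linking against libnuma. This is an illustrative sketch, not the code in the patch: pick_mbind_mode(), probe_fn, the two stub probes and the SKETCH_* constants are hypothetical names. The constants mirror the values of MPOL_PREFERRED and MPOL_PREFERRED_MANY in the Linux UAPI header, and the probe callback stands in for numa_has_preferred_many().

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the Linux UAPI values, so the sketch builds anywhere. */
#define SKETCH_MPOL_PREFERRED      1
#define SKETCH_MPOL_PREFERRED_MANY 5

/* Hypothetical stand-in for numa_has_preferred_many(). */
typedef int (*probe_fn)(void);

/* -1 for uninitialized, 0 for unsupported, 1 for supported. */
static int cached_support = -1;

/*
 * Return the mode that would be handed to mbind(): upgrade
 * "preferred" to "preferred many" when the probe reports support,
 * and leave every other policy untouched. The probe runs only once.
 */
static int pick_mbind_mode(int policy, probe_fn probe)
{
    assert(probe != NULL);

    if (cached_support < 0) {
        cached_support = probe() > 0 ? 1 : 0;
    }
    if (policy == SKETCH_MPOL_PREFERRED && cached_support > 0) {
        return SKETCH_MPOL_PREFERRED_MANY;
    }
    return policy;
}

/* Stub probes for demonstration only. */
static int probe_supported(void)   { return 1; }
static int probe_unsupported(void) { return 0; }
```

The value returned by the helper, rather than the configured policy, is what has to reach the mbind() call for the upgrade to take effect.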
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
backends/hostmem.c | 28 ++++++++++++++++++++++++++++
meson.build | 5 +++++
2 files changed, 33 insertions(+)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 8640294c10..e0d6cb6c8a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -23,10 +23,22 @@
#ifdef CONFIG_NUMA
#include <numaif.h>
+#include <numa.h>
QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_DEFAULT != MPOL_DEFAULT);
+/*
+ * HOST_MEM_POLICY_PREFERRED may sometimes also be MPOL_PREFERRED_MANY, see
+ * below.
+ */
QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_PREFERRED != MPOL_PREFERRED);
QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
+
+/*
+ * -1 for uninitialized,
+ * 0 for MPOL_PREFERRED_MANY unsupported,
+ * 1 for supported.
+ */
+static int has_preferred_many = -1;
#endif
char *
@@ -346,6 +358,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
* before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
* this doesn't catch hugepage case. */
unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+ int mode = backend->policy;
/* check for invalid host-nodes and policies and give more verbose
* error messages than mbind(). */
@@ -369,6 +382,21 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
assert(maxnode <= MAX_NODES);
+#ifdef HAVE_NUMA_SET_PREFERRED_MANY
+ if (has_preferred_many < 0) {
+ /* Check whether the kernel supports MPOL_PREFERRED_MANY. */
+ has_preferred_many = numa_has_preferred_many() > 0 ? 1 : 0;
+ }
+
+ if (mode == MPOL_PREFERRED && has_preferred_many > 0) {
+ /*
+ * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+ * silently picks the first node.
+ */
+ mode = MPOL_PREFERRED_MANY;
+ }
+#endif
+
if (maxnode &&
- mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
+ mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1,
flags)) {
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..ebbff7a8ea 100644
--- a/meson.build
+++ b/meson.build
@@ -1858,6 +1858,11 @@ config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
config_host_data.set('CONFIG_NUMA', numa.found())
+if numa.found()
+ config_host_data.set('HAVE_NUMA_SET_PREFERRED_MANY',
+ cc.has_function('numa_set_preferred_many',
+ dependencies: numa))
+endif
config_host_data.set('CONFIG_OPENGL', opengl.found())
config_host_data.set('CONFIG_PROFILER', get_option('profiler'))
config_host_data.set('CONFIG_RBD', rbd.found())
--
2.37.4