[GIT PULL 0/8] Host Memory Backends and Memory devices patches

* [GIT PULL 0/8] Host Memory Backends and Memory devices patches
@ 2022-10-28  9:52 David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 1/8] hw/mem/nvdimm: fix error message for 'unarmed' flag David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand

The following changes since commit 0529245488865038344d64fff7ee05864d3d17f6:

  Merge tag 'pull-target-arm-20221020' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-10-20 14:36:12 -0400)

are available in the Git repository at:

  https://github.com/davidhildenbrand/qemu.git tags/mem-2022-10-28

for you to fetch changes up to bd77c30df984faefa85e6a402939b485d6e05f05:

  vl: Allow ThreadContext objects to be created before the sandbox option (2022-10-27 11:01:09 +0200)

----------------------------------------------------------------
Hi,

"Host Memory Backends" and "Memory devices" queue ("mem"):
- Fix NVDIMM error message
- Add ThreadContext user-creatable object and wire it up for NUMA-aware
  hostmem preallocation

----------------------------------------------------------------
David Hildenbrand (7):
      util: Cleanup and rename os_mem_prealloc()
      util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()
      util: Introduce ThreadContext user-creatable object
      util: Add write-only "node-affinity" property for ThreadContext
      util: Make qemu_prealloc_mem() optionally consume a ThreadContext
      hostmem: Allow for specifying a ThreadContext for preallocation
      vl: Allow ThreadContext objects to be created before the sandbox option

Julia Suvorova (1):
      hw/mem/nvdimm: fix error message for 'unarmed' flag

 backends/hostmem.c            |  13 +-
 hw/mem/nvdimm.c               |   2 +-
 hw/virtio/virtio-mem.c        |   2 +-
 include/qemu/osdep.h          |  19 ++-
 include/qemu/thread-context.h |  57 +++++++
 include/qemu/thread.h         |   4 +
 include/sysemu/hostmem.h      |   2 +
 meson.build                   |  16 ++
 qapi/qom.json                 |  28 ++++
 softmmu/cpus.c                |   2 +-
 softmmu/vl.c                  |  36 ++++-
 util/meson.build              |   1 +
 util/oslib-posix.c            |  39 +++--
 util/oslib-win32.c            |   8 +-
 util/qemu-thread-posix.c      |  70 ++++++++
 util/qemu-thread-win32.c      |  12 ++
 util/thread-context.c         | 362 ++++++++++++++++++++++++++++++++++++++++++
 17 files changed, 642 insertions(+), 31 deletions(-)
 create mode 100644 include/qemu/thread-context.h
 create mode 100644 util/thread-context.c

-- 
2.37.3




* [GIT PULL 1/8] hw/mem/nvdimm: fix error message for 'unarmed' flag
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 2/8] util: Cleanup and rename os_mem_prealloc() David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Julia Suvorova, Stefan Hajnoczi, Pankaj Gupta,
	Philippe Mathieu-Daudé

From: Julia Suvorova <jusual@redhat.com>

In the ACPI specification [1], the 'unarmed' bit is set when a device
cannot accept a persistent write. This means that when a memdev is
read-only, the 'unarmed' flag must be turned on. The logic is already
correct; only the error message changes.

[1] ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM State Flags" Bit 3

Fixes: dbd730e859 ("nvdimm: check -object memory-backend-file, readonly=on option")
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <20221023195812.15523-1-jusual@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/nvdimm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 7c7d777781..31080c22c9 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -149,7 +149,7 @@ static void nvdimm_prepare_memory_region(NVDIMMDevice *nvdimm, Error **errp)
     if (!nvdimm->unarmed && memory_region_is_rom(mr)) {
         HostMemoryBackend *hostmem = dimm->hostmem;
 
-        error_setg(errp, "'unarmed' property must be off since memdev %s "
+        error_setg(errp, "'unarmed' property must be 'on' since memdev %s "
                    "is read-only",
                    object_get_canonical_path_component(OBJECT(hostmem)));
         return;
-- 
2.37.3




* [GIT PULL 2/8] util: Cleanup and rename os_mem_prealloc()
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 1/8] hw/mem/nvdimm: fix error message for 'unarmed' flag David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 3/8] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik

Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
  parameter clearer

... and add function documentation.
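
To illustrate what max_threads means, here is a minimal standalone sketch
of the clamping done in get_memset_num_threads(). It is not the QEMU code:
sketch_prealloc_threads() and SKETCH_MAX_PREALLOC_THREADS are made-up
names, the cap of 16 is an assumption standing in for
MAX_MEM_PREALLOC_THREAD_COUNT, and falling back to one thread for
non-positive inputs is a choice of this sketch.

```c
#include <stddef.h>

/* Assumed cap; stands in for MAX_MEM_PREALLOC_THREAD_COUNT. */
#define SKETCH_MAX_PREALLOC_THREADS 16

static size_t sketch_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/*
 * Number of preallocation threads to use: never more than the online host
 * CPUs, the assumed global cap, the caller's max_threads, or the number of
 * pages to touch. Always returns at least 1.
 */
int sketch_prealloc_threads(long host_procs, size_t numpages, int max_threads)
{
    size_t ret = 1;

    if (host_procs > 0 && max_threads > 0) {
        ret = sketch_min(sketch_min((size_t)host_procs,
                                    SKETCH_MAX_PREALLOC_THREADS),
                         (size_t)max_threads);
    }

    /* Especially with gigantic pages, don't use more threads than pages. */
    if (numpages > 0) {
        ret = sketch_min(ret, numpages);
    }
    return (int)ret;
}
```

With this, max_threads is an upper bound requested by the caller, not the
number of threads that will actually be created.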

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-2-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem.c     |  6 +++---
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   | 17 +++++++++++++++--
 softmmu/cpus.c         |  2 +-
 util/oslib-posix.c     | 24 ++++++++++++------------
 util/oslib-win32.c     |  8 ++++----
 6 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4428e06738..491cb10b97 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -383,8 +383,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          * specified NUMA policy in place.
          */
         if (backend->prealloc) {
-            os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-                            backend->prealloc_threads, &local_err);
+            qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
+                              backend->prealloc_threads, &local_err);
             if (local_err) {
                 goto out;
             }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 30d03e987a..0e9ef4ff19 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
             int fd = memory_region_get_fd(&vmem->memdev->mr);
             Error *local_err = NULL;
 
-            os_mem_prealloc(fd, area, size, 1, &local_err);
+            qemu_prealloc_mem(fd, area, size, 1, &local_err);
             if (local_err) {
                 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b1c161c035..e556e45143 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,8 +568,21 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
-void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
-                     Error **errp);
+/**
+ * qemu_prealloc_mem:
+ * @fd: the fd mapped into the area, -1 for anonymous memory
+ * @area: start address of the area to preallocate
+ * @sz: the size of the area to preallocate
+ * @max_threads: maximum number of threads to use
+ * @errp: returns an error if this function fails
+ *
+ * Preallocate memory (populate/prefault page tables writable) for the virtual
+ * memory area starting at @area with the size of @sz. After a successful call,
+ * each page in the area was faulted in writable at least once, for example,
+ * after allocating file blocks for mapped files.
+ */
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 61b27ff59d..01c94fd298 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -354,7 +354,7 @@ static void qemu_init_sigbus(void)
 
     /*
      * ALERT: when modifying this, take care that SIGBUS forwarding in
-     * os_mem_prealloc() will continue working as expected.
+     * qemu_prealloc_mem() will continue working as expected.
      */
     memset(&action, 0, sizeof(action));
     action.sa_flags = SA_SIGINFO;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 827a7aadba..905cbc27cc 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -329,7 +329,7 @@ static void sigbus_handler(int signal)
         return;
     }
 #endif /* CONFIG_LINUX */
-    warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored");
+    warn_report("qemu_prealloc_mem: unrelated SIGBUS detected and ignored");
 }
 
 static void *do_touch_pages(void *arg)
@@ -399,13 +399,13 @@ static void *do_madv_populate_write_pages(void *arg)
 }
 
 static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
-                                         int smp_cpus)
+                                         int max_threads)
 {
     long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
     int ret = 1;
 
     if (host_procs > 0) {
-        ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
+        ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), max_threads);
     }
 
     /* Especially with gigantic pages, don't create more threads than pages. */
@@ -418,11 +418,11 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                           int smp_cpus, bool use_madv_populate_write)
+                           int max_threads, bool use_madv_populate_write)
 {
     static gsize initialized = 0;
     MemsetContext context = {
-        .num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
+        .num_threads = get_memset_num_threads(hpagesize, numpages, max_threads),
     };
     size_t numpages_per_thread, leftover;
     void *(*touch_fn)(void *);
@@ -494,13 +494,13 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
            errno != EINVAL;
 }
 
-void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp)
 {
     static gsize initialized;
     int ret;
     size_t hpagesize = qemu_fd_getpagesize(fd);
-    size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+    size_t numpages = DIV_ROUND_UP(sz, hpagesize);
     bool use_madv_populate_write;
     struct sigaction act;
 
@@ -530,24 +530,24 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
         if (ret) {
             qemu_mutex_unlock(&sigbus_mutex);
             error_setg_errno(errp, errno,
-                "os_mem_prealloc: failed to install signal handler");
+                "qemu_prealloc_mem: failed to install signal handler");
             return;
         }
     }
 
     /* touch pages simultaneously */
-    ret = touch_all_pages(area, hpagesize, numpages, smp_cpus,
+    ret = touch_all_pages(area, hpagesize, numpages, max_threads,
                           use_madv_populate_write);
     if (ret) {
         error_setg_errno(errp, -ret,
-                         "os_mem_prealloc: preallocating memory failed");
+                         "qemu_prealloc_mem: preallocating memory failed");
     }
 
     if (!use_madv_populate_write) {
         ret = sigaction(SIGBUS, &sigbus_oldact, NULL);
         if (ret) {
             /* Terminate QEMU since it can't recover from error */
-            perror("os_mem_prealloc: failed to reinstall signal handler");
+            perror("qemu_prealloc_mem: failed to reinstall signal handler");
             exit(1);
         }
         qemu_mutex_unlock(&sigbus_mutex);
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 5723d3eb4c..e1cb725ecc 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -268,14 +268,14 @@ int getpagesize(void)
     return system_info.dwPageSize;
 }
 
-void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
-                     Error **errp)
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+                       Error **errp)
 {
     int i;
     size_t pagesize = qemu_real_host_page_size();
 
-    memory = (memory + pagesize - 1) & -pagesize;
-    for (i = 0; i < memory / pagesize; i++) {
+    sz = (sz + pagesize - 1) & -pagesize;
+    for (i = 0; i < sz / pagesize; i++) {
         memset(area + pagesize * i, 0, 1);
     }
 }
-- 
2.37.3




* [GIT PULL 3/8] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 1/8] hw/mem/nvdimm: fix error message for 'unarmed' flag David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 2/8] util: Cleanup and rename os_mem_prealloc() David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 4/8] util: Introduce ThreadContext user-creatable object David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik

Usually, we let upper layers handle CPU pinning, because
pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via
seccomp when starting QEMU with
    -sandbox enable=on,resourcecontrol=deny

However, we want to configure and observe the CPU affinity of threads
from QEMU directly in some cases when the sandbox option is either not
enabled or not active yet.

So let's add a way to configure CPU pinning via
qemu_thread_set_affinity() and obtain CPU affinity via
qemu_thread_get_affinity() and implement them under POSIX using
pthread_setaffinity_np() + pthread_getaffinity_np().

Implementation under Windows is possible using SetProcessAffinityMask()
+ GetProcessAffinityMask(); however, that is left as future work.
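
The conversion between a plain CPU bitmap and a dynamically sized
cpu_set_t can be sketched standalone as below. This assumes Linux with
glibc; the sketch_* names and SKETCH_LONG_BITS are hypothetical and the
code uses calloc() instead of QEMU's bitmap helpers. Note that the
pthread_*affinity_np() calls return 0 or a positive error number.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define SKETCH_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Pin @thread to the CPUs whose bits are set in @host_cpus (@nbits bits). */
int sketch_set_affinity(pthread_t thread, const unsigned long *host_cpus,
                        unsigned long nbits)
{
    const size_t setsize = CPU_ALLOC_SIZE(nbits);
    cpu_set_t *cpuset = CPU_ALLOC(nbits);
    unsigned long cpu;
    int err;

    if (!cpuset) {
        return ENOMEM;
    }
    CPU_ZERO_S(setsize, cpuset);
    for (cpu = 0; cpu < nbits; cpu++) {
        if (host_cpus[cpu / SKETCH_LONG_BITS] &
            (1UL << (cpu % SKETCH_LONG_BITS))) {
            CPU_SET_S(cpu, setsize, cpuset);
        }
    }
    err = pthread_setaffinity_np(thread, setsize, cpuset);
    CPU_FREE(cpuset);
    return err;
}

/* Read the affinity of @thread into a newly calloc()ed bitmap. */
int sketch_get_affinity(pthread_t thread, unsigned long **host_cpus,
                        unsigned long *nbits)
{
    unsigned long tmpbits = CPU_SETSIZE, cpu;
    unsigned long *bitmap;
    cpu_set_t *cpuset;
    size_t setsize;
    int err;

    for (;;) {
        setsize = CPU_ALLOC_SIZE(tmpbits);
        cpuset = CPU_ALLOC(tmpbits);
        if (!cpuset) {
            return ENOMEM;
        }
        err = pthread_getaffinity_np(thread, setsize, cpuset);
        if (!err) {
            break;
        }
        CPU_FREE(cpuset);
        if (err != EINVAL) {
            return err;
        }
        tmpbits *= 2; /* set too small for this kernel, retry with more bits */
    }

    bitmap = calloc((tmpbits + SKETCH_LONG_BITS - 1) / SKETCH_LONG_BITS,
                    sizeof(*bitmap));
    if (!bitmap) {
        CPU_FREE(cpuset);
        return ENOMEM;
    }
    for (cpu = 0; cpu < tmpbits; cpu++) {
        if (CPU_ISSET_S(cpu, setsize, cpuset)) {
            bitmap[cpu / SKETCH_LONG_BITS] |= 1UL << (cpu % SKETCH_LONG_BITS);
        }
    }
    CPU_FREE(cpuset);
    *host_cpus = bitmap;
    *nbits = tmpbits;
    return 0;
}
```

The caller owns the returned bitmap and has to free() it.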

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-3-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/thread.h    |  4 +++
 meson.build              | 16 +++++++++
 util/qemu-thread-posix.c | 70 ++++++++++++++++++++++++++++++++++++++++
 util/qemu-thread-win32.c | 12 +++++++
 4 files changed, 102 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index af19f2b3fc..79e507c7f0 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -185,6 +185,10 @@ void qemu_event_destroy(QemuEvent *ev);
 void qemu_thread_create(QemuThread *thread, const char *name,
                         void *(*start_routine)(void *),
                         void *arg, int mode);
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits);
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits);
 void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
diff --git a/meson.build b/meson.build
index b686dfef75..3e0aa4925d 100644
--- a/meson.build
+++ b/meson.build
@@ -2114,7 +2114,23 @@ config_host_data.set('CONFIG_PTHREAD_CONDATTR_SETCLOCK', cc.links(gnu_source_pre
     pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
     return 0;
   }''', dependencies: threads))
+config_host_data.set('CONFIG_PTHREAD_AFFINITY_NP', cc.links(gnu_source_prefix + '''
+  #include <pthread.h>
 
+  static void *f(void *p) { return NULL; }
+  int main(void)
+  {
+    int setsize = CPU_ALLOC_SIZE(64);
+    pthread_t thread;
+    cpu_set_t *cpuset;
+    pthread_create(&thread, 0, f, 0);
+    cpuset = CPU_ALLOC(64);
+    CPU_ZERO_S(setsize, cpuset);
+    pthread_setaffinity_np(thread, setsize, cpuset);
+    pthread_getaffinity_np(thread, setsize, cpuset);
+    CPU_FREE(cpuset);
+    return 0;
+  }''', dependencies: threads))
 config_host_data.set('CONFIG_SIGNALFD', cc.links(gnu_source_prefix + '''
   #include <sys/signalfd.h>
   #include <stddef.h>
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index ac1d56e673..bae938c670 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -16,6 +16,7 @@
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
 #include "qemu/tsan.h"
+#include "qemu/bitmap.h"
 
 static bool name_threads;
 
@@ -552,6 +553,75 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     pthread_attr_destroy(&attr);
 }
 
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+    const size_t setsize = CPU_ALLOC_SIZE(nbits);
+    unsigned long value;
+    cpu_set_t *cpuset;
+    int err;
+
+    cpuset = CPU_ALLOC(nbits);
+    g_assert(cpuset);
+
+    CPU_ZERO_S(setsize, cpuset);
+    value = find_first_bit(host_cpus, nbits);
+    while (value < nbits) {
+        CPU_SET_S(value, setsize, cpuset);
+        value = find_next_bit(host_cpus, nbits, value + 1);
+    }
+
+    err = pthread_setaffinity_np(thread->thread, setsize, cpuset);
+    CPU_FREE(cpuset);
+    return err;
+#else
+    return -ENOSYS;
+#endif
+}
+
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+    unsigned long tmpbits;
+    cpu_set_t *cpuset;
+    size_t setsize;
+    int i, err;
+
+    tmpbits = CPU_SETSIZE;
+    while (true) {
+        setsize = CPU_ALLOC_SIZE(tmpbits);
+        cpuset = CPU_ALLOC(tmpbits);
+        g_assert(cpuset);
+
+        err = pthread_getaffinity_np(thread->thread, setsize, cpuset);
+        if (err) {
+            CPU_FREE(cpuset);
+            if (err != EINVAL) {
+                return err;
+            }
+            tmpbits *= 2;
+        } else {
+            break;
+        }
+    }
+
+    /* Convert the result into a proper bitmap. */
+    *nbits = tmpbits;
+    *host_cpus = bitmap_new(tmpbits);
+    for (i = 0; i < tmpbits; i++) {
+        if (CPU_ISSET_S(i, setsize, cpuset)) {
+            set_bit(i, *host_cpus);
+        }
+    }
+    CPU_FREE(cpuset);
+    return 0;
+#else
+    return -ENOSYS;
+#endif
+}
+
 void qemu_thread_get_self(QemuThread *thread)
 {
     thread->thread = pthread_self();
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index b9a467d7db..69db254ac7 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -477,6 +477,18 @@ void qemu_thread_create(QemuThread *thread, const char *name,
     thread->data = data;
 }
 
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+                             unsigned long nbits)
+{
+    return -ENOSYS;
+}
+
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits)
+{
+    return -ENOSYS;
+}
+
 void qemu_thread_get_self(QemuThread *thread)
 {
     thread->data = qemu_thread_data;
-- 
2.37.3




* [GIT PULL 4/8] util: Introduce ThreadContext user-creatable object
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (2 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 3/8] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik, Markus Armbruster

Setting the CPU affinity of QEMU threads is a bit problematic, because
QEMU doesn't always have permissions to set the CPU affinity itself,
for example, once QEMU has initialized seccomp via:
    -sandbox enable=on,resourcecontrol=deny

General information about CPU affinities can be found in the man page of
taskset:
    CPU affinity is a scheduler property that "bonds" a process to a given
    set of CPUs on the system. The Linux scheduler will honor the given CPU
    affinity and the process will not run on any other CPUs.

While upper layers already know how to handle CPU affinities for
long-lived threads like iothreads or vcpu threads, short-lived threads,
as used for memory-backend preallocation, are harder to handle. These
threads are created on demand, and upper layers cannot even identify and
configure them.

Introduce the concept of a ThreadContext, which is essentially a thread
used for creating new threads. All threads created via that context
thread inherit the configured CPU affinity. Consequently, it's
sufficient to create a ThreadContext and configure it once, and have all
threads created via that ThreadContext inherit the same CPU affinity.
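
The underlying idea, a long-lived thread that creates other threads on
request so that they inherit its CPU affinity, can be sketched with plain
pthreads and POSIX semaphores as below. This is only an illustration under
the assumption of Linux with glibc, not the code in util/thread-context.c:
all sketch_* names are hypothetical, and unlike the real object there is
no mutex, so only one caller may issue requests at a time.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchContext {
    pthread_t thread;               /* the long-lived context thread */
    sem_t request;                  /* posted when a command is pending */
    sem_t done;                     /* posted when the command was handled */
    bool stop;                      /* ask the context thread to exit */
    pthread_t *new_thread;          /* where to store the new thread */
    void *(*start_routine)(void *);
    void *arg;
    int result;                     /* return value of pthread_create() */
} SketchContext;

static void sketch_sem_wait(sem_t *sem)
{
    while (sem_wait(sem) == -1 && errno == EINTR) {
        /* Interrupted by a signal, retry. */
    }
}

static void *sketch_context_run(void *opaque)
{
    SketchContext *ctx = opaque;

    for (;;) {
        sketch_sem_wait(&ctx->request);
        if (ctx->stop) {
            return NULL;
        }
        /* Created from here, the new thread inherits our CPU affinity. */
        ctx->result = pthread_create(ctx->new_thread, NULL,
                                     ctx->start_routine, ctx->arg);
        sem_post(&ctx->done);
    }
}

int sketch_context_init(SketchContext *ctx)
{
    ctx->stop = false;
    if (sem_init(&ctx->request, 0, 0) || sem_init(&ctx->done, 0, 0)) {
        return errno;
    }
    return pthread_create(&ctx->thread, NULL, sketch_context_run, ctx);
}

/* Create a thread via the context thread and wait until it was created. */
int sketch_context_create_thread(SketchContext *ctx, pthread_t *thread,
                                 void *(*start_routine)(void *), void *arg)
{
    ctx->new_thread = thread;
    ctx->start_routine = start_routine;
    ctx->arg = arg;
    sem_post(&ctx->request);
    sketch_sem_wait(&ctx->done);
    return ctx->result;
}

void sketch_context_destroy(SketchContext *ctx)
{
    ctx->stop = true;
    sem_post(&ctx->request);
    pthread_join(ctx->thread, NULL);
    sem_destroy(&ctx->request);
    sem_destroy(&ctx->done);
}

/* Example start routine: store the calling thread's affinity in @arg. */
void *sketch_report_affinity(void *arg)
{
    pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), arg);
    return NULL;
}
```

Pinning the context thread once, either from inside the process or
externally by thread id, is then enough for every thread created through
sketch_context_create_thread() to start with that affinity.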

The CPU affinity of a ThreadContext can be configured in two ways:

(1) Obtaining the thread id via the "thread-id" property and setting the
    CPU affinity manually (e.g., via taskset).

(2) Setting the "cpu-affinity" property and letting QEMU try to set the
    CPU affinity itself. This will fail if QEMU doesn't have permissions
    to do so anymore after seccomp was initialized.

A simple QEMU example to set the CPU affinity to host CPU 0,1,6,7 would be:
    qemu-system-x86_64 -S \
      -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7

And we can query it via HMP/QMP:
    (qemu) qom-get tc1 cpu-affinity
    [
        0,
        1,
        6,
        7
    ]

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered. We'll wire
this up next to make it work.

In general, the interface behaves like pthread_setaffinity_np(): host
CPU numbers that are currently not available are ignored; only host CPU
numbers that are impossible with the current kernel will fail. If the
list of host CPU numbers does not include a single CPU that is
available, setting the CPU affinity will fail.
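
Before a list of host CPU numbers can be applied, it has to be turned into
a CPU bitmap that is just large enough for the highest CPU number. A
minimal standalone sketch of that step follows; sketch_cpus_to_bitmap()
and SKETCH_BITS_PER_LONG are hypothetical names, and calloc() is used
here in place of QEMU's bitmap helpers.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Build a bitmap with one bit set per entry in @cpus. On success, *nbits is
 * the highest CPU number plus one and the caller must free() the result.
 * Returns NULL for an empty list or if the allocation fails.
 */
unsigned long *sketch_cpus_to_bitmap(const uint16_t *cpus, size_t count,
                                     unsigned long *nbits)
{
    unsigned long max_bits = 0;
    unsigned long *bitmap;
    size_t i;

    if (count == 0) {
        return NULL; /* an empty CPU list is treated as an error */
    }

    for (i = 0; i < count; i++) {
        if ((unsigned long)cpus[i] + 1 > max_bits) {
            max_bits = (unsigned long)cpus[i] + 1;
        }
    }

    bitmap = calloc((max_bits + SKETCH_BITS_PER_LONG - 1) /
                    SKETCH_BITS_PER_LONG, sizeof(*bitmap));
    if (!bitmap) {
        return NULL;
    }
    for (i = 0; i < count; i++) {
        bitmap[cpus[i] / SKETCH_BITS_PER_LONG] |=
            1UL << (cpus[i] % SKETCH_BITS_PER_LONG);
    }

    *nbits = max_bits;
    return bitmap;
}
```

For the host CPUs 0,1,6,7 used in the example above, this yields an 8-bit
bitmap with bits 0, 1, 6 and 7 set.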

A ThreadContext can be reused, simply by reconfiguring the CPU affinity.
Note that the CPU affinity of previously created threads will not get
adjusted.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221014134720.168738-4-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/qemu/thread-context.h |  57 +++++++
 qapi/qom.json                 |  17 +++
 util/meson.build              |   1 +
 util/oslib-posix.c            |   1 +
 util/thread-context.c         | 278 ++++++++++++++++++++++++++++++++++
 5 files changed, 354 insertions(+)
 create mode 100644 include/qemu/thread-context.h
 create mode 100644 util/thread-context.c

diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
new file mode 100644
index 0000000000..2ebd6b7fe1
--- /dev/null
+++ b/include/qemu/thread-context.h
@@ -0,0 +1,57 @@
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef SYSEMU_THREAD_CONTEXT_H
+#define SYSEMU_THREAD_CONTEXT_H
+
+#include "qapi/qapi-types-machine.h"
+#include "qemu/thread.h"
+#include "qom/object.h"
+
+#define TYPE_THREAD_CONTEXT "thread-context"
+OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
+                    THREAD_CONTEXT)
+
+struct ThreadContextClass {
+    ObjectClass parent_class;
+};
+
+struct ThreadContext {
+    /* private */
+    Object parent;
+
+    /* private */
+    unsigned int thread_id;
+    QemuThread thread;
+
+    /* Semaphore to wait for context thread action. */
+    QemuSemaphore sem;
+    /* Semaphore to wait for action in context thread. */
+    QemuSemaphore sem_thread;
+    /* Mutex to synchronize requests. */
+    QemuMutex mutex;
+
+    /* Commands for the thread to execute. */
+    int thread_cmd;
+    void *thread_cmd_data;
+
+    /* CPU affinity bitmap used for initialization. */
+    unsigned long *init_cpu_bitmap;
+    int init_cpu_nbits;
+};
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode);
+
+#endif /* SYSEMU_THREAD_CONTEXT_H */
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b39..8013ba4b82 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,21 @@
             'reduced-phys-bits': 'uint32',
             '*kernel-hashes': 'bool' } }
 
+##
+# @ThreadContextProperties:
+#
+# Properties for thread context objects.
+#
+# @cpu-affinity: the list of host CPU numbers used as CPU affinity for all
+#                threads created in the thread context (default: QEMU main
+#                thread CPU affinity)
+#
+# Since: 7.2
+##
+{ 'struct': 'ThreadContextProperties',
+  'data': { '*cpu-affinity': ['uint16'] } }
+
+
 ##
 # @ObjectType:
 #
@@ -882,6 +897,7 @@
     { 'name': 'secret_keyring',
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
+    'thread-context',
     's390-pv-guest',
     'throttle-group',
     'tls-creds-anon',
@@ -948,6 +964,7 @@
       'secret_keyring':             { 'type': 'SecretKeyringProperties',
                                       'if': 'CONFIG_SECRET_KEYRING' },
       'sev-guest':                  'SevGuestProperties',
+      'thread-context':             'ThreadContextProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/util/meson.build b/util/meson.build
index 5e282130df..e97cd2d779 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,4 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
+util_ss.add(files('thread-context.c'))
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 905cbc27cc..28305cdea3 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -42,6 +42,7 @@
 #include "qemu/cutils.h"
 #include "qemu/compiler.h"
 #include "qemu/units.h"
+#include "qemu/thread-context.h"
 
 #ifdef CONFIG_LINUX
 #include <sys/syscall.h>
diff --git a/util/thread-context.c b/util/thread-context.c
new file mode 100644
index 0000000000..c921905396
--- /dev/null
+++ b/util/thread-context.c
@@ -0,0 +1,278 @@
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/thread-context.h"
+#include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qapi/visitor.h"
+#include "qemu/config-file.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qom/object_interfaces.h"
+#include "qemu/module.h"
+#include "qemu/bitmap.h"
+
+enum {
+    TC_CMD_NONE = 0,
+    TC_CMD_STOP,
+    TC_CMD_NEW,
+};
+
+typedef struct ThreadContextCmdNew {
+    QemuThread *thread;
+    const char *name;
+    void *(*start_routine)(void *);
+    void *arg;
+    int mode;
+} ThreadContextCmdNew;
+
+static void *thread_context_run(void *opaque)
+{
+    ThreadContext *tc = opaque;
+
+    tc->thread_id = qemu_get_thread_id();
+    qemu_sem_post(&tc->sem);
+
+    while (true) {
+        /*
+         * Threads inherit the CPU affinity of the creating thread. For this
+         * reason, we create new (especially short-lived) threads from our
+         * persistent context thread.
+         *
+         * Especially when QEMU is not allowed to set the affinity itself,
+         * management tools can simply set the affinity of the context thread
+         * after creating the context, to have new threads created via
+         * the context inherit the CPU affinity automatically.
+         */
+        switch (tc->thread_cmd) {
+        case TC_CMD_NONE:
+            break;
+        case TC_CMD_STOP:
+            tc->thread_cmd = TC_CMD_NONE;
+            qemu_sem_post(&tc->sem);
+            return NULL;
+        case TC_CMD_NEW: {
+            ThreadContextCmdNew *cmd_new = tc->thread_cmd_data;
+
+            qemu_thread_create(cmd_new->thread, cmd_new->name,
+                               cmd_new->start_routine, cmd_new->arg,
+                               cmd_new->mode);
+            tc->thread_cmd = TC_CMD_NONE;
+            tc->thread_cmd_data = NULL;
+            qemu_sem_post(&tc->sem);
+            break;
+        }
+        default:
+            g_assert_not_reached();
+        }
+        qemu_sem_wait(&tc->sem_thread);
+    }
+}
+
+static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_cpus = NULL;
+    unsigned long *bitmap = NULL;
+    int nbits = 0, ret;
+    Error *err = NULL;
+
+    visit_type_uint16List(v, name, &host_cpus, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_cpus) {
+        error_setg(errp, "CPU list is empty");
+        goto out;
+    }
+
+    for (l = host_cpus; l; l = l->next) {
+        nbits = MAX(nbits, l->value + 1);
+    }
+    bitmap = bitmap_new(nbits);
+    for (l = host_cpus; l; l = l->next) {
+        set_bit(l->value, bitmap);
+    }
+
+    if (tc->thread_id != -1) {
+        /*
+         * Note: we won't be adjusting the affinity of any thread that is still
+         * around, but only the affinity of the context thread.
+         */
+        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+    } else {
+        tc->init_cpu_bitmap = bitmap;
+        bitmap = NULL;
+        tc->init_cpu_nbits = nbits;
+    }
+out:
+    g_free(bitmap);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    unsigned long *bitmap, nbits, value;
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *host_cpus = NULL;
+    uint16List **tail = &host_cpus;
+    int ret;
+
+    if (tc->thread_id == -1) {
+        error_setg(errp, "Object not initialized yet");
+        return;
+    }
+
+    ret = qemu_thread_get_affinity(&tc->thread, &bitmap, &nbits);
+    if (ret) {
+        error_setg(errp, "Getting CPU affinity failed: %s", strerror(ret));
+        return;
+    }
+
+    value = find_first_bit(bitmap, nbits);
+    while (value < nbits) {
+        QAPI_LIST_APPEND(tail, value);
+
+        value = find_next_bit(bitmap, nbits, value + 1);
+    }
+    g_free(bitmap);
+
+    visit_type_uint16List(v, name, &host_cpus, errp);
+    qapi_free_uint16List(host_cpus);
+}
+
+static void thread_context_get_thread_id(Object *obj, Visitor *v,
+                                         const char *name, void *opaque,
+                                         Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint64_t value = tc->thread_id;
+
+    visit_type_uint64(v, name, &value, errp);
+}
+
+static void thread_context_instance_complete(UserCreatable *uc, Error **errp)
+{
+    ThreadContext *tc = THREAD_CONTEXT(uc);
+    char *thread_name;
+    int ret;
+
+    thread_name = g_strdup_printf("TC %s",
+                               object_get_canonical_path_component(OBJECT(uc)));
+    qemu_thread_create(&tc->thread, thread_name, thread_context_run, tc,
+                       QEMU_THREAD_JOINABLE);
+    g_free(thread_name);
+
+    /* Wait until initialization of the thread is done. */
+    while (tc->thread_id == -1) {
+        qemu_sem_wait(&tc->sem);
+    }
+
+    if (tc->init_cpu_bitmap) {
+        ret = qemu_thread_set_affinity(&tc->thread, tc->init_cpu_bitmap,
+                                       tc->init_cpu_nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+        g_free(tc->init_cpu_bitmap);
+        tc->init_cpu_bitmap = NULL;
+    }
+}
+
+static void thread_context_class_init(ObjectClass *oc, void *data)
+{
+    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+    ucc->complete = thread_context_instance_complete;
+    object_class_property_add(oc, "thread-id", "int",
+                              thread_context_get_thread_id, NULL, NULL,
+                              NULL);
+    object_class_property_add(oc, "cpu-affinity", "int",
+                              thread_context_get_cpu_affinity,
+                              thread_context_set_cpu_affinity, NULL, NULL);
+}
+
+static void thread_context_instance_init(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    tc->thread_id = -1;
+    qemu_sem_init(&tc->sem, 0);
+    qemu_sem_init(&tc->sem_thread, 0);
+    qemu_mutex_init(&tc->mutex);
+}
+
+static void thread_context_instance_finalize(Object *obj)
+{
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+
+    if (tc->thread_id != -1) {
+        tc->thread_cmd = TC_CMD_STOP;
+        qemu_sem_post(&tc->sem_thread);
+        qemu_thread_join(&tc->thread);
+    }
+    qemu_sem_destroy(&tc->sem);
+    qemu_sem_destroy(&tc->sem_thread);
+    qemu_mutex_destroy(&tc->mutex);
+}
+
+static const TypeInfo thread_context_info = {
+    .name = TYPE_THREAD_CONTEXT,
+    .parent = TYPE_OBJECT,
+    .class_init = thread_context_class_init,
+    .instance_size = sizeof(ThreadContext),
+    .instance_init = thread_context_instance_init,
+    .instance_finalize = thread_context_instance_finalize,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void thread_context_register_types(void)
+{
+    type_register_static(&thread_context_info);
+}
+type_init(thread_context_register_types)
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+                                  const char *name,
+                                  void *(*start_routine)(void *), void *arg,
+                                  int mode)
+{
+    ThreadContextCmdNew data = {
+        .thread = thread,
+        .name = name,
+        .start_routine = start_routine,
+        .arg = arg,
+        .mode = mode,
+    };
+
+    qemu_mutex_lock(&tc->mutex);
+    tc->thread_cmd = TC_CMD_NEW;
+    tc->thread_cmd_data = &data;
+    qemu_sem_post(&tc->sem_thread);
+
+    while (tc->thread_cmd != TC_CMD_NONE) {
+        qemu_sem_wait(&tc->sem);
+    }
+    qemu_mutex_unlock(&tc->mutex);
+}
-- 
2.37.3




* [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (3 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 4/8] util: Introduce ThreadContext user-creatable object David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2024-02-05 10:14   ` Claudio Fontana
  2022-10-28  9:52 ` [GIT PULL 6/8] util: Make qemu_prealloc_mem() optionally consume a ThreadContext David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik, Markus Armbruster

Let's make it easier to pin threads created via a ThreadContext to
all host CPUs currently belonging to a given set of host NUMA nodes --
which is the common case.

"node-affinity" is simply a shortcut for setting "cpu-affinity" manually
to the list of host CPUs belonging to the set of host nodes. This property
can only be written.

A simple QEMU example to set the CPU affinity to host node 1 on a system
with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to
host node 1:
    qemu-system-x86_64 -S \
      -object thread-context,id=tc1,node-affinity=1

And we can query the cpu-affinity via HMP/QMP:
    (qemu) qom-get tc1 cpu-affinity
    [
        1,
        3,
        5,
        7,
        9,
        11,
        13,
        15,
        17,
        19,
        21,
        23,
        25,
        27,
        29,
        31,
        33,
        35,
        37,
        39,
        41,
        43,
        45,
        47
    ]

We cannot query the node-affinity:
    (qemu) qom-get tc1 node-affinity
    Error: Insufficient permission to perform this operation

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered. We'll wire
this up next to make it work.

Note that if the host CPUs for a host node change due to CPU hot(un)plug
or CPU onlining/offlining (i.e., lscpu output changes) after the
ThreadContext was started, the CPU affinity will not get updated.
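
As a rough, self-contained sketch of what "node-affinity" resolves to: the
CPUs of every requested node are OR-ed into one CPU bitmap, and an empty
result is treated as an error ("The nodes select no CPUs"). The topology
helper cpu_to_node(), MAX_CPUS and the other names below are hypothetical
and only mirror the two-node, odd/even example above; the actual patch asks
libnuma via numa_node_to_cpus() instead.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAX_CPUS 48 /* assumed: two nodes with 24 CPUs each */
#define MAP_LONGS ((MAX_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical topology: odd-numbered CPUs sit on node 1, even ones on node 0. */
static int cpu_to_node(int cpu)
{
    return cpu % 2;
}

static void set_bit_in(unsigned long *map, int nr)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static bool test_bit_in(const unsigned long *map, int nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * OR the CPUs of all requested nodes into 'map', which must be zeroed and
 * hold MAX_CPUS bits. Unknown nodes contribute nothing. Returns the number
 * of selected CPUs; 0 means the node list selected no CPU at all.
 */
static int nodes_to_cpu_bitmap(const int *nodes, size_t n_nodes,
                               unsigned long *map)
{
    int count = 0;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        for (size_t i = 0; i < n_nodes; i++) {
            if (cpu_to_node(cpu) == nodes[i] && !test_bit_in(map, cpu)) {
                set_bit_in(map, cpu);
                count++;
            }
        }
    }
    return count;
}
```

With the node list {1}, this sketch selects the 24 odd-numbered CPUs, which
matches the cpu-affinity list returned by qom-get in the example above.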

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221014134720.168738-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 qapi/qom.json         |  9 ++++-
 util/meson.build      |  2 +-
 util/thread-context.c | 84 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 8013ba4b82..20b5735d78 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -839,10 +839,17 @@
 #                threads created in the thread context (default: QEMU main
 #                thread CPU affinity)
 #
+# @node-affinity: the list of host node numbers that will be resolved to a
+#                 list of host CPU numbers used as CPU affinity. This is a
+#                 shortcut for specifying the list of host CPU numbers
+#                 belonging to the host nodes manually by setting
+#                 @cpu-affinity. (default: QEMU main thread affinity)
+#
 # Since: 7.2
 ##
 { 'struct': 'ThreadContextProperties',
-  'data': { '*cpu-affinity': ['uint16'] } }
+  'data': { '*cpu-affinity': ['uint16'],
+            '*node-affinity': ['uint16'] } }
 
 
 ##
diff --git a/util/meson.build b/util/meson.build
index e97cd2d779..c0a7bc54d4 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,5 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
-util_ss.add(files('thread-context.c'))
+util_ss.add(files('thread-context.c'), numa)
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/thread-context.c b/util/thread-context.c
index c921905396..4138245332 100644
--- a/util/thread-context.c
+++ b/util/thread-context.c
@@ -21,6 +21,10 @@
 #include "qemu/module.h"
 #include "qemu/bitmap.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#endif
+
 enum {
     TC_CMD_NONE = 0,
     TC_CMD_STOP,
@@ -88,6 +92,11 @@ static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
     int nbits = 0, ret;
     Error *err = NULL;
 
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
     visit_type_uint16List(v, name, &host_cpus, &err);
     if (err) {
         error_propagate(errp, err);
@@ -159,6 +168,79 @@ static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
     qapi_free_uint16List(host_cpus);
 }
 
+static void thread_context_set_node_affinity(Object *obj, Visitor *v,
+                                             const char *name, void *opaque,
+                                             Error **errp)
+{
+#ifdef CONFIG_NUMA
+    const int nbits = numa_num_possible_cpus();
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_nodes = NULL;
+    unsigned long *bitmap = NULL;
+    struct bitmask *tmp_cpus;
+    Error *err = NULL;
+    int ret, i;
+
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
+    visit_type_uint16List(v, name, &host_nodes, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_nodes) {
+        error_setg(errp, "Node list is empty");
+        goto out;
+    }
+
+    bitmap = bitmap_new(nbits);
+    tmp_cpus = numa_allocate_cpumask();
+    for (l = host_nodes; l; l = l->next) {
+        numa_bitmask_clearall(tmp_cpus);
+        ret = numa_node_to_cpus(l->value, tmp_cpus);
+        if (ret) {
+            /* We ignore any errors, such as impossible nodes. */
+            continue;
+        }
+        for (i = 0; i < nbits; i++) {
+            if (numa_bitmask_isbitset(tmp_cpus, i)) {
+                set_bit(i, bitmap);
+            }
+        }
+    }
+    numa_free_cpumask(tmp_cpus);
+
+    if (bitmap_empty(bitmap, nbits)) {
+        error_setg(errp, "The nodes select no CPUs");
+        goto out;
+    }
+
+    if (tc->thread_id != -1) {
+        /*
+         * Note: we won't be adjusting the affinity of any thread that is still
+         * around for now, but only the affinity of the context thread.
+         */
+        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
+        if (ret) {
+            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
+        }
+    } else {
+        tc->init_cpu_bitmap = bitmap;
+        bitmap = NULL;
+        tc->init_cpu_nbits = nbits;
+    }
+out:
+    g_free(bitmap);
+    qapi_free_uint16List(host_nodes);
+#else
+    error_setg(errp, "NUMA node affinity is not supported by this QEMU");
+#endif
+}
+
 static void thread_context_get_thread_id(Object *obj, Visitor *v,
                                          const char *name, void *opaque,
                                          Error **errp)
@@ -208,6 +290,8 @@ static void thread_context_class_init(ObjectClass *oc, void *data)
     object_class_property_add(oc, "cpu-affinity", "int",
                               thread_context_get_cpu_affinity,
                               thread_context_set_cpu_affinity, NULL, NULL);
+    object_class_property_add(oc, "node-affinity", "int", NULL,
+                              thread_context_set_node_affinity, NULL, NULL);
 }
 
 static void thread_context_instance_init(Object *obj)
-- 
2.37.3




* [GIT PULL 6/8] util: Make qemu_prealloc_mem() optionally consume a ThreadContext
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (4 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 7/8] hostmem: Allow for specifying a ThreadContext for preallocation David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik

... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.
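
For orientation, the loop in touch_all_pages() that this patch touches hands
each preallocation thread a contiguous slice of pages before the thread is
created. The following is only a sketch of that distribution: the hunk shows
"numpages_per_thread + (i < leftover)", while the division/modulo definitions
and the helper names pages_for_thread() and first_page_for_thread() are
assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Number of pages handled by worker 'i' when 'numpages' pages are split
 * across 'num_threads' workers: the first 'leftover' workers take one
 * extra page each so that no page is left out.
 */
static size_t pages_for_thread(size_t numpages, size_t num_threads, size_t i)
{
    size_t per_thread = numpages / num_threads;
    size_t leftover = numpages % num_threads;

    return per_thread + (i < leftover);
}

/* Index of the first page of worker i's slice (slices are contiguous). */
static size_t first_page_for_thread(size_t numpages, size_t num_threads,
                                    size_t i)
{
    size_t start = 0;

    for (size_t t = 0; t < i; t++) {
        start += pages_for_thread(numpages, num_threads, t);
    }
    return start;
}
```

Which CPU then touches each slice depends only on how the thread is created:
via the ThreadContext (inheriting its affinity) or via a plain
qemu_thread_create().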

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-6-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem.c     |  5 +++--
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   |  4 +++-
 util/oslib-posix.c     | 20 ++++++++++++++------
 util/oslib-win32.c     |  2 +-
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 491cb10b97..76f0394490 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
+                          &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -384,7 +385,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          */
         if (backend->prealloc) {
             qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-                              backend->prealloc_threads, &local_err);
+                              backend->prealloc_threads, NULL, &local_err);
             if (local_err) {
                 goto out;
             }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 0e9ef4ff19..ed170def48 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa,
             int fd = memory_region_get_fd(&vmem->memdev->mr);
             Error *local_err = NULL;
 
-            qemu_prealloc_mem(fd, area, size, 1, &local_err);
+            qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
             if (local_err) {
                 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index e556e45143..625298c8bc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,6 +568,8 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
+typedef struct ThreadContext ThreadContext;
+
 /**
  * qemu_prealloc_mem:
  * @fd: the fd mapped into the area, -1 for anonymous memory
@@ -582,7 +584,7 @@ void qemu_set_tty_echo(int fd, bool echo);
  * after allocating file blocks for mapped files.
  */
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp);
+                       ThreadContext *tc, Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 28305cdea3..59a891b6a8 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -419,7 +419,8 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-                           int max_threads, bool use_madv_populate_write)
+                           int max_threads, ThreadContext *tc,
+                           bool use_madv_populate_write)
 {
     static gsize initialized = 0;
     MemsetContext context = {
@@ -458,9 +459,16 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
         context.threads[i].numpages = numpages_per_thread + (i < leftover);
         context.threads[i].hpagesize = hpagesize;
         context.threads[i].context = &context;
-        qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
-                           touch_fn, &context.threads[i],
-                           QEMU_THREAD_JOINABLE);
+        if (tc) {
+            thread_context_create_thread(tc, &context.threads[i].pgthread,
+                                         "touch_pages",
+                                         touch_fn, &context.threads[i],
+                                         QEMU_THREAD_JOINABLE);
+        } else {
+            qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
+                               touch_fn, &context.threads[i],
+                               QEMU_THREAD_JOINABLE);
+        }
         addr += context.threads[i].numpages * hpagesize;
     }
 
@@ -496,7 +504,7 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
 }
 
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp)
+                       ThreadContext *tc, Error **errp)
 {
     static gsize initialized;
     int ret;
@@ -537,7 +545,7 @@ void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
     }
 
     /* touch pages simultaneously */
-    ret = touch_all_pages(area, hpagesize, numpages, max_threads,
+    ret = touch_all_pages(area, hpagesize, numpages, max_threads, tc,
                           use_madv_populate_write);
     if (ret) {
         error_setg_errno(errp, -ret,
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e1cb725ecc..a67cb3822e 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -269,7 +269,7 @@ int getpagesize(void)
 }
 
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       Error **errp)
+                       ThreadContext *tc, Error **errp)
 {
     int i;
     size_t pagesize = qemu_real_host_page_size();
-- 
2.37.3




* [GIT PULL 7/8] hostmem: Allow for specifying a ThreadContext for preallocation
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (5 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 6/8] util: Make qemu_prealloc_mem() optionally consume a ThreadContext David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-28  9:52 ` [GIT PULL 8/8] vl: Allow ThreadContext objects to be created before the sandbox option David Hildenbrand
  2022-10-31 10:14 ` [GIT PULL 0/8] Host Memory Backends and Memory devices patches Stefan Hajnoczi
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik

Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallocation threads will be created via the
thread context -- inheriting the same CPU affinity as the thread
context.

Pinning preallocation threads to CPUs can heavily increase performance
in NUMA setups, because preallocation from a CPU close to the target
NUMA node(s) is faster than preallocation from a more remote CPU,
simply because of the memory bandwidth available for initializing memory
with zeroes. This is especially relevant for very large VMs backed by
huge/gigantic pages, whereby preallocation is mandatory.

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-7-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 backends/hostmem.c       | 12 +++++++++---
 include/sysemu/hostmem.h |  2 ++
 qapi/qom.json            |  4 ++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 76f0394490..8640294c10 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,8 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
-                          &local_err);
+        qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+                          backend->prealloc_context, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -385,7 +385,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          */
         if (backend->prealloc) {
             qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-                              backend->prealloc_threads, NULL, &local_err);
+                              backend->prealloc_threads,
+                              backend->prealloc_context, &local_err);
             if (local_err) {
                 goto out;
             }
@@ -493,6 +494,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "prealloc-threads",
         "Number of CPU threads to use for prealloc");
+    object_class_property_add_link(oc, "prealloc-context",
+        TYPE_THREAD_CONTEXT, offsetof(HostMemoryBackend, prealloc_context),
+        object_property_allow_set_link, OBJ_PROP_LINK_STRONG);
+    object_class_property_set_description(oc, "prealloc-context",
+        "Context to use for creating CPU threads for preallocation");
     object_class_property_add(oc, "size", "int",
         host_memory_backend_get_size,
         host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 9ff5c16963..39326f1d4f 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -18,6 +18,7 @@
 #include "qom/object.h"
 #include "exec/memory.h"
 #include "qemu/bitmap.h"
+#include "qemu/thread-context.h"
 
 #define TYPE_MEMORY_BACKEND "memory-backend"
 OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass,
@@ -66,6 +67,7 @@ struct HostMemoryBackend {
     bool merge, dump, use_canonical_path;
     bool prealloc, is_mapped, share, reserve;
     uint32_t prealloc_threads;
+    ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
 
diff --git a/qapi/qom.json b/qapi/qom.json
index 20b5735d78..87fcad2423 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -578,6 +578,9 @@
 #
 # @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
 #
+# @prealloc-context: thread context to use for creation of preallocation threads
+#                    (default: none) (since 7.2)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 #         (default: false)
 #
@@ -608,6 +611,7 @@
             '*policy': 'HostMemPolicy',
             '*prealloc': 'bool',
             '*prealloc-threads': 'uint32',
+            '*prealloc-context': 'str',
             '*share': 'bool',
             '*reserve': 'bool',
             'size': 'size',
-- 
2.37.3




* [GIT PULL 8/8] vl: Allow ThreadContext objects to be created before the sandbox option
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (6 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 7/8] hostmem: Allow for specifying a ThreadContext for preallocation David Hildenbrand
@ 2022-10-28  9:52 ` David Hildenbrand
  2022-10-31 10:14 ` [GIT PULL 0/8] Host Memory Backends and Memory devices patches Stefan Hajnoczi
  8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2022-10-28  9:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	David Hildenbrand, Michal Privoznik

Currently, there is no way to configure a CPU affinity inside QEMU when
the sandbox option disables it for QEMU as a whole, for example, via:
    -sandbox enable=on,resourcecontrol=deny

While ThreadContext objects can be created on the QEMU commandline and
the CPU affinity can be configured externally via the thread-id, this is
insufficient if a ThreadContext with a certain CPU affinity is already
required during QEMU startup, before we can intercept QEMU and
configure the CPU affinity.

Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp:
add resourcecontrol argument to command line"), "to avoid any bigger of the
process". However, we only care about this once QEMU is running, not when
the instance starting QEMU explicitly requests a certain CPU affinity
on the QEMU commandline.

Right now, for NUMA-aware preallocation of memory backends used for initial
machine RAM, one has to:

1) Start QEMU with the memory-backend with "prealloc=off"
2) Pause QEMU before it starts the guest (-S)
3) Create ThreadContext, configure the CPU affinity using the thread-id
4) Configure the ThreadContext as "prealloc-context" of the memory
   backend
5) Trigger preallocation by setting "prealloc=on"

To simplify this handling especially for initial machine RAM,
allow creation of ThreadContext objects before parsing sandbox options,
such that the CPU affinity requested on the QEMU commandline alongside the
sandbox option can be set. As ThreadContext objects essentially only create
a persistent context thread and set the CPU affinity, this is easily
possible.
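
The resulting ordering can be pictured as three creation phases, with every
object type falling into exactly one of them. This is a simplified,
self-contained sketch rather than the code in softmmu/vl.c: the enum and
phase_of() are hypothetical, the type list is reduced to a few names, and the
assumption that chardev-dependent types such as "rng-egd" are deferred to the
late phase comes from upstream code that is not fully visible in this patch.

```c
#include <stdbool.h>
#include <string.h>

enum object_phase {
    PHASE_PRE_SANDBOX, /* before the sandbox options are activated */
    PHASE_EARLY,       /* before other QEMU data types are created */
    PHASE_LATE,        /* after e.g. chardevs exist */
};

/* Reason: the sandbox may forbid setting the CPU affinity of threads. */
static bool create_pre_sandbox(const char *type)
{
    return strcmp(type, "thread-context") == 0;
}

static bool create_early(const char *type)
{
    /* Reason: already created in the pre-sandbox phase. */
    if (create_pre_sandbox(type)) {
        return false;
    }
    /* Assumed: types with a "chardev" property must wait for chardevs. */
    if (strcmp(type, "rng-egd") == 0 || strcmp(type, "qtest") == 0) {
        return false;
    }
    return true;
}

static bool create_late(const char *type)
{
    return !create_early(type) && !create_pre_sandbox(type);
}

/* Hypothetical helper: report the single phase a type is created in. */
static enum object_phase phase_of(const char *type)
{
    if (create_pre_sandbox(type)) {
        return PHASE_PRE_SANDBOX;
    }
    return create_early(type) ? PHASE_EARLY : PHASE_LATE;
}
```

Without the extra check in create_late(), "thread-context" would qualify as
a late object as well (it is not "early") and would be created twice.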

To make "-name debug-threads=on" keep working as expected for the
context threads, perform earlier parsing of "-name".

With this change, we can create a ThreadContext with a CPU affinity on
the QEMU commandline and use it for preallocation of memory backends
glued to the machine (simplified example):

qemu-system-x86_64 -m 1G \
 -object thread-context,id=tc1,cpu-affinity=3-4 \
 -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \
 -machine memory-backend=pc.ram \
 -S -monitor stdio -sandbox enable=on,resourcecontrol=deny

And while we can query the current CPU affinity:
  (qemu) qom-get tc1 cpu-affinity
  [
      3,
      4
  ]

We can no longer change it from QEMU directly:
  (qemu) qom-set tc1 cpu-affinity 1-2
  Error: Setting CPU affinity failed: Operation not permitted

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20221014134720.168738-8-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/vl.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index b464da25bc..b5a23420ac 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1759,6 +1759,27 @@ static void object_option_parse(const char *optarg)
     visit_free(v);
 }
 
+/*
+ * Very early object creation, before the sandbox options have been activated.
+ */
+static bool object_create_pre_sandbox(const char *type)
+{
+    /*
+     * Objects should in general not get initialized "too early" without
+     * a reason. If you add one, state the reason in a comment!
+     */
+
+    /*
+     * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU
+     * affinity of threads.
+     */
+    if (g_str_equal(type, "thread-context")) {
+        return true;
+    }
+
+    return false;
+}
+
 /*
  * Initial object creation happens before all other
  * QEMU data types are created. The majority of objects
@@ -1773,6 +1794,11 @@ static bool object_create_early(const char *type)
      * add one, state the reason in a comment!
      */
 
+    /* Reason: already created. */
+    if (object_create_pre_sandbox(type)) {
+        return false;
+    }
+
     /* Reason: property "chardev" */
     if (g_str_equal(type, "rng-egd") ||
         g_str_equal(type, "qtest")) {
@@ -1895,7 +1921,7 @@ static void qemu_create_early_backends(void)
  */
 static bool object_create_late(const char *type)
 {
-    return !object_create_early(type);
+    return !object_create_early(type) && !object_create_pre_sandbox(type);
 }
 
 static void qemu_create_late_backends(void)
@@ -2351,6 +2377,11 @@ static int process_runstate_actions(void *opaque, QemuOpts *opts, Error **errp)
 
 static void qemu_process_early_options(void)
 {
+    qemu_opts_foreach(qemu_find_opts("name"),
+                      parse_name, NULL, &error_fatal);
+
+    object_option_foreach_add(object_create_pre_sandbox);
+
 #ifdef CONFIG_SECCOMP
     QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
     if (olist) {
@@ -2358,9 +2389,6 @@ static void qemu_process_early_options(void)
     }
 #endif
 
-    qemu_opts_foreach(qemu_find_opts("name"),
-                      parse_name, NULL, &error_fatal);
-
     if (qemu_opts_foreach(qemu_find_opts("action"),
                           process_runstate_actions, NULL, &error_fatal)) {
         exit(1);
-- 
2.37.3




* Re: [GIT PULL 0/8] Host Memory Backends and Memory devices patches
  2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
                   ` (7 preceding siblings ...)
  2022-10-28  9:52 ` [GIT PULL 8/8] vl: Allow ThreadContext objects to be created before the sandbox option David Hildenbrand
@ 2022-10-31 10:14 ` Stefan Hajnoczi
  8 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2022-10-31 10:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Igor Mammedov, Xiao Guangrong, Richard Henderson,
	Stefan Weil, David Hildenbrand

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any user-visible changes.


* Re: [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext
  2022-10-28  9:52 ` [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext David Hildenbrand
@ 2024-02-05 10:14   ` Claudio Fontana
  2024-02-05 14:15     ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Claudio Fontana @ 2024-02-05 10:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	Michal Privoznik, Markus Armbruster, qemu-devel

Hi,

Turning pages back in time: I noticed that recent qemu-img binaries include
an ELF dependency on libnuma.so that seems unused.

I think it stems from this commit:

commit 10218ae6d006f76410804cc4dc690085b3d008b5
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Oct 14 15:47:17 2022 +0200

    util: Add write-only "node-affinity" property for ThreadContext


possibly this hunk?

diff --git a/util/meson.build b/util/meson.build
index e97cd2d779..c0a7bc54d4 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,5 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
-util_ss.add(files('thread-context.c'))
+util_ss.add(files('thread-context.c'), numa)
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif


I wonder if there is some conditional we could use to avoid the apparently useless dependency on libnuma in the qemu-img binary?

Ciao,

Claudio 


On 10/28/22 11:52, David Hildenbrand wrote:
> Let's make it easier to pin threads created via a ThreadContext to
> all host CPUs currently belonging to a given set of host NUMA nodes --
> which is the common case.
> 
> "node-affinity" is simply a shortcut for setting "cpu-affinity" manually
> to the list of host CPUs belonging to the set of host nodes. This property
> can only be written.
> 
> A simple QEMU example to set the CPU affinity to host node 1 on a system
> with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to
> host node 1:
>     qemu-system-x86_64 -S \
>       -object thread-context,id=tc1,node-affinity=1
> 
> And we can query the cpu-affinity via HMP/QMP:
>     (qemu) qom-get tc1 cpu-affinity
>     [
>         1,
>         3,
>         5,
>         7,
>         9,
>         11,
>         13,
>         15,
>         17,
>         19,
>         21,
>         23,
>         25,
>         27,
>         29,
>         31,
>         33,
>         35,
>         37,
>         39,
>         41,
>         43,
>         45,
>         47
>     ]
> 
> We cannot query the node-affinity:
>     (qemu) qom-get tc1 node-affinity
>     Error: Insufficient permission to perform this operation
> 
> But note that due to dynamic library loading this example will not work
> before we actually make use of thread_context_create_thread() in QEMU
> code, because the type will otherwise not get registered. We'll wire
> this up next to make it work.
> 
> Note that if the host CPUs for a host node change due to CPU hot(un)plug or
> CPU onlining/offlining (i.e., lscpu output changes) after the ThreadContext
> was started, the CPU affinity will not get updated.
> 
> Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Message-Id: <20221014134720.168738-5-david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  qapi/qom.json         |  9 ++++-
>  util/meson.build      |  2 +-
>  util/thread-context.c | 84 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 93 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 8013ba4b82..20b5735d78 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -839,10 +839,17 @@
>  #                threads created in the thread context (default: QEMU main
>  #                thread CPU affinity)
>  #
> +# @node-affinity: the list of host node numbers that will be resolved to a
> +#                 list of host CPU numbers used as CPU affinity. This is a
> +#                 shortcut for specifying the list of host CPU numbers
> +#                 belonging to the host nodes manually by setting
> +#                 @cpu-affinity. (default: QEMU main thread affinity)
> +#
>  # Since: 7.2
>  ##
>  { 'struct': 'ThreadContextProperties',
> -  'data': { '*cpu-affinity': ['uint16'] } }
> +  'data': { '*cpu-affinity': ['uint16'],
> +            '*node-affinity': ['uint16'] } }
>  
>  
>  ##
> diff --git a/util/meson.build b/util/meson.build
> index e97cd2d779..c0a7bc54d4 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -1,5 +1,5 @@
>  util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
> -util_ss.add(files('thread-context.c'))
> +util_ss.add(files('thread-context.c'), numa)
>  if not config_host_data.get('CONFIG_ATOMIC64')
>    util_ss.add(files('atomic64.c'))
>  endif
> diff --git a/util/thread-context.c b/util/thread-context.c
> index c921905396..4138245332 100644
> --- a/util/thread-context.c
> +++ b/util/thread-context.c
> @@ -21,6 +21,10 @@
>  #include "qemu/module.h"
>  #include "qemu/bitmap.h"
>  
> +#ifdef CONFIG_NUMA
> +#include <numa.h>
> +#endif
> +
>  enum {
>      TC_CMD_NONE = 0,
>      TC_CMD_STOP,
> @@ -88,6 +92,11 @@ static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
>      int nbits = 0, ret;
>      Error *err = NULL;
>  
> +    if (tc->init_cpu_bitmap) {
> +        error_setg(errp, "Mixing CPU and node affinity not supported");
> +        return;
> +    }
> +
>      visit_type_uint16List(v, name, &host_cpus, &err);
>      if (err) {
>          error_propagate(errp, err);
> @@ -159,6 +168,79 @@ static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
>      qapi_free_uint16List(host_cpus);
>  }
>  
> +static void thread_context_set_node_affinity(Object *obj, Visitor *v,
> +                                             const char *name, void *opaque,
> +                                             Error **errp)
> +{
> +#ifdef CONFIG_NUMA
> +    const int nbits = numa_num_possible_cpus();
> +    ThreadContext *tc = THREAD_CONTEXT(obj);
> +    uint16List *l, *host_nodes = NULL;
> +    unsigned long *bitmap = NULL;
> +    struct bitmask *tmp_cpus;
> +    Error *err = NULL;
> +    int ret, i;
> +
> +    if (tc->init_cpu_bitmap) {
> +        error_setg(errp, "Mixing CPU and node affinity not supported");
> +        return;
> +    }
> +
> +    visit_type_uint16List(v, name, &host_nodes, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
> +    if (!host_nodes) {
> +        error_setg(errp, "Node list is empty");
> +        goto out;
> +    }
> +
> +    bitmap = bitmap_new(nbits);
> +    tmp_cpus = numa_allocate_cpumask();
> +    for (l = host_nodes; l; l = l->next) {
> +        numa_bitmask_clearall(tmp_cpus);
> +        ret = numa_node_to_cpus(l->value, tmp_cpus);
> +        if (ret) {
> +            /* We ignore any errors, such as impossible nodes. */
> +            continue;
> +        }
> +        for (i = 0; i < nbits; i++) {
> +            if (numa_bitmask_isbitset(tmp_cpus, i)) {
> +                set_bit(i, bitmap);
> +            }
> +        }
> +    }
> +    numa_free_cpumask(tmp_cpus);
> +
> +    if (bitmap_empty(bitmap, nbits)) {
> +        error_setg(errp, "The nodes select no CPUs");
> +        goto out;
> +    }
> +
> +    if (tc->thread_id != -1) {
> +        /*
> +         * Note: we won't be adjusting the affinity of any thread that is still
> +         * around for now, but only the affinity of the context thread.
> +         */
> +        ret = qemu_thread_set_affinity(&tc->thread, bitmap, nbits);
> +        if (ret) {
> +            error_setg(errp, "Setting CPU affinity failed: %s", strerror(ret));
> +        }
> +    } else {
> +        tc->init_cpu_bitmap = bitmap;
> +        bitmap = NULL;
> +        tc->init_cpu_nbits = nbits;
> +    }
> +out:
> +    g_free(bitmap);
> +    qapi_free_uint16List(host_nodes);
> +#else
> +    error_setg(errp, "NUMA node affinity is not supported by this QEMU");
> +#endif
> +}
> +
>  static void thread_context_get_thread_id(Object *obj, Visitor *v,
>                                           const char *name, void *opaque,
>                                           Error **errp)
> @@ -208,6 +290,8 @@ static void thread_context_class_init(ObjectClass *oc, void *data)
>      object_class_property_add(oc, "cpu-affinity", "int",
>                                thread_context_get_cpu_affinity,
>                                thread_context_set_cpu_affinity, NULL, NULL);
> +    object_class_property_add(oc, "node-affinity", "int", NULL,
> +                              thread_context_set_node_affinity, NULL, NULL);
>  }
>  
>  static void thread_context_instance_init(Object *obj)
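
For readers following the quoted patch: the core of thread_context_set_node_affinity() is a union of the CPU sets of the requested nodes, with unknown nodes skipped and an error if nothing is selected. Below is a minimal, self-contained sketch of that logic in C. It does not use libnuma; sketch_node_to_cpus() is a hypothetical stand-in for numa_node_to_cpus(), and the topology (2 nodes, 48 CPUs, odd-numbered CPUs on node 1) is assumed from the example in the commit message. All sketch_* names and the uint64_t mask are illustrative only, not QEMU API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed topology, taken from the commit message example:
 * 2 host nodes, 48 host CPUs, CPU n belongs to node (n % 2). */
#define SKETCH_NR_CPUS  48
#define SKETCH_NR_NODES 2

/*
 * Hypothetical stand-in for libnuma's numa_node_to_cpus().
 * Returns 0 and fills *cpus on success, -1 for a node that does not exist.
 */
static int sketch_node_to_cpus(unsigned node, uint64_t *cpus)
{
    if (node >= SKETCH_NR_NODES) {
        return -1;
    }
    *cpus = 0;
    for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (cpu % SKETCH_NR_NODES == node) {
            *cpus |= UINT64_C(1) << cpu;
        }
    }
    return 0;
}

/*
 * Union the CPUs of all listed nodes into *mask.
 * Nodes that cannot be resolved are ignored, as in the patch.
 * Returns false if the node list is empty or the nodes select no CPUs;
 * *mask is left untouched in that case.
 */
static bool sketch_nodes_to_cpu_mask(const uint16_t *nodes, size_t n,
                                     uint64_t *mask)
{
    uint64_t result = 0;
    uint64_t tmp;

    if (n == 0) {
        return false; /* "Node list is empty" */
    }
    for (size_t i = 0; i < n; i++) {
        if (sketch_node_to_cpus(nodes[i], &tmp) != 0) {
            continue; /* ignore errors such as impossible nodes */
        }
        result |= tmp;
    }
    if (result == 0) {
        return false; /* "The nodes select no CPUs" */
    }
    *mask = result;
    return true;
}
```

With the assumed topology, passing the single node 1 sets bits 1, 3, ..., 47 of the mask, which corresponds to the cpu-affinity list shown by qom-get in the commit message. The real code uses a QEMU bitmap sized by numa_num_possible_cpus() rather than a fixed 64-bit word.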




* Re: [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext
  2024-02-05 10:14   ` Claudio Fontana
@ 2024-02-05 14:15     ` David Hildenbrand
  2024-02-05 16:13       ` Claudio Fontana
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2024-02-05 14:15 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	Michal Privoznik, Markus Armbruster, qemu-devel

On 05.02.24 11:14, Claudio Fontana wrote:
> Hi,

Hi Claudio,

> 
> turning pages back in time,
> 
> noticed that in recent qemu-img binaries we include an ELF dependency on libnuma.so that seems unused.
> 
> I think it stems from this commit:
> 
> commit 10218ae6d006f76410804cc4dc690085b3d008b5
> Author: David Hildenbrand <david@redhat.com>
> Date:   Fri Oct 14 15:47:17 2022 +0200
> 
>      util: Add write-only "node-affinity" property for ThreadContext
> 
> 
> possibly this hunk?
> 
> diff --git a/util/meson.build b/util/meson.build
> index e97cd2d779..c0a7bc54d4 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -1,5 +1,5 @@
>   util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
> -util_ss.add(files('thread-context.c'))
> +util_ss.add(files('thread-context.c'), numa)
>   if not config_host_data.get('CONFIG_ATOMIC64')
>     util_ss.add(files('atomic64.c'))
>   endif
> 
> 
> I wonder if there is some conditional we could use to avoid the apparently useless dependency to libnuma in the qemu-img binary?

The simplest change is probably to move the thread-context code out of
util (as you say, it's currently only used by QEMU itself).

-- 
Cheers,

David / dhildenb




* Re: [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext
  2024-02-05 14:15     ` David Hildenbrand
@ 2024-02-05 16:13       ` Claudio Fontana
  2024-02-05 17:55         ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Claudio Fontana @ 2024-02-05 16:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	Michal Privoznik, Markus Armbruster, qemu-devel,
	Philippe Mathieu-Daudé

Hello David,

It seems to me that much of the calling code, qemu_prealloc_mem() for example,
should be sysemu-only and used neither by the tools nor by user mode, right?

And thread-context.c itself should also be sysemu-only, correct?

Thanks,

Claudio

On 2/5/24 15:15, David Hildenbrand wrote:
> On 05.02.24 11:14, Claudio Fontana wrote:
>> Hi,
> 
> Hi Claudio,
> 
>>
>> turning pages back in time,
>>
>> noticed that in recent qemu-img binaries we include an ELF dependency on libnuma.so that seems unused.
>>
>> I think it stems from this commit:
>>
>> commit 10218ae6d006f76410804cc4dc690085b3d008b5
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Fri Oct 14 15:47:17 2022 +0200
>>
>>      util: Add write-only "node-affinity" property for ThreadContext
>>
>>
>> possibly this hunk?
>>
>> diff --git a/util/meson.build b/util/meson.build
>> index e97cd2d779..c0a7bc54d4 100644
>> --- a/util/meson.build
>> +++ b/util/meson.build
>> @@ -1,5 +1,5 @@
>>   util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
>> -util_ss.add(files('thread-context.c'))
>> +util_ss.add(files('thread-context.c'), numa)
>>   if not config_host_data.get('CONFIG_ATOMIC64')
>>     util_ss.add(files('atomic64.c'))
>>   endif
>>
>>
>> I wonder if there is some conditional we could use to avoid the apparently useless dependency to libnuma in the qemu-img binary?
> 
> the simplest change is probably moving the thread-context stuff out of 
> util (as you say, it's currently only used by QEMU itself).
> 




* Re: [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext
  2024-02-05 16:13       ` Claudio Fontana
@ 2024-02-05 17:55         ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2024-02-05 17:55 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Igor Mammedov, Xiao Guangrong, Richard Henderson, Stefan Weil,
	Michal Privoznik, Markus Armbruster, qemu-devel,
	Philippe Mathieu-Daudé

On 05.02.24 17:13, Claudio Fontana wrote:
> Hello David,
> 

Hi,

> It would seem to me that a lot of the calling code like qemu_prealloc_mem for example
> should be sysemu-only, not used for tools, or user mode either right?
> 
> And the thread_context.c itself should also be sysemu-only, correct?

Yes, both should currently get used for sysemu only. Memory
backends and virtio-mem are sysemu-only.

-- 
Cheers,

David / dhildenb




end of thread, other threads:[~2024-02-05 17:56 UTC | newest]

Thread overview: 14+ messages
2022-10-28  9:52 [GIT PULL 0/8] Host Memory Backends and Memory devices patches David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 1/8] hw/mem/nvdimm: fix error message for 'unarmed' flag David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 2/8] util: Cleanup and rename os_mem_prealloc() David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 3/8] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 4/8] util: Introduce ThreadContext user-creatable object David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 5/8] util: Add write-only "node-affinity" property for ThreadContext David Hildenbrand
2024-02-05 10:14   ` Claudio Fontana
2024-02-05 14:15     ` David Hildenbrand
2024-02-05 16:13       ` Claudio Fontana
2024-02-05 17:55         ` David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 6/8] util: Make qemu_prealloc_mem() optionally consume a ThreadContext David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 7/8] hostmem: Allow for specifying a ThreadContext for preallocation David Hildenbrand
2022-10-28  9:52 ` [GIT PULL 8/8] vl: Allow ThreadContext objects to be created before the sandbox option David Hildenbrand
2022-10-31 10:14 ` [GIT PULL 0/8] Host Memory Backends and Memory devices patches Stefan Hajnoczi
