From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: qemu-devel@nongnu.org, "Michal Privoznik" <mprivozn@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefan Weil" <sw@weilnetz.de>
Subject: Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
Date: Mon, 10 Oct 2022 11:40:25 +0100
Message-ID: <Y0P2mQcHpXlXbEY1@work-vm>
In-Reply-To: <20221010091117.88603-1-david@redhat.com>
* David Hildenbrand (david@redhat.com) wrote:
> This is a follow-up on "util: NUMA aware memory preallocation" [1] by
> Michal.
>
> Setting the CPU affinity of threads from inside QEMU usually isn't
> easily possible, because we don't want QEMU -- once started and running
> guest code -- to be able to mess up the system. QEMU disallows relevant
> syscalls using seccomp, such that any such invocation will fail.
>
> Especially for memory preallocation in memory backends, the CPU affinity
> can significantly affect guest startup time, for example, when running
> large VMs backed by huge/gigantic pages, because of NUMA effects. For
> NUMA-aware preallocation, we have to set the CPU affinity, however:
>
> (1) Once preallocation threads are created during preallocation, management
> tools can no longer intervene to change the affinity. These threads
> are created automatically on demand.
> (2) QEMU cannot easily set the CPU affinity itself.
> (3) The CPU affinity derived from the NUMA bindings of the memory backend
> might not necessarily be exactly the CPUs we actually want to use
> (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
>
> There is an easy "workaround". If we have a thread with the right CPU
> affinity, we can simply create new threads on demand via that prepared
> context. So, all we have to do is set up and create such a context ahead
> of time, to then configure preallocation to create new threads via that
> environment.
>
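
A minimal sketch of the "workaround" described above, in Python rather
than QEMU's C and assuming Linux, where a newly created thread inherits
the CPU affinity mask of the thread that created it. The helper name
spawn_from_pinned_context is hypothetical, and unlike the proposed
thread-context object the context thread here is short-lived:

```python
import os
import threading


def spawn_from_pinned_context(cpus, n_workers):
    """Create n_workers threads via a context thread pinned to `cpus`.

    Returns the CPU affinity set each worker observed. Hypothetical helper,
    Linux-only (relies on sched_setaffinity/sched_getaffinity).
    """
    observed = []
    lock = threading.Lock()

    def worker():
        # pid 0 refers to the calling thread on Linux.
        affinity = os.sched_getaffinity(0)
        with lock:
            observed.append(affinity)

    def context():
        # Pin only the context thread; other threads keep their affinity.
        os.sched_setaffinity(0, cpus)
        # Threads created from here inherit the context thread's affinity.
        workers = [threading.Thread(target=worker) for _ in range(n_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

    ctx = threading.Thread(target=context)
    ctx.start()
    ctx.join()
    return observed
```

The workers never call sched_setaffinity() themselves, which is the
point: only the context thread needs the right affinity, and that can be
arranged before such syscalls are blocked.
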
> So, let's introduce a user-creatable "thread-context" object that
> essentially consists of a context thread used to create new threads.
> QEMU can either try setting the CPU affinity itself ("cpu-affinity",
> "node-affinity" property), or upper layers can extract the thread id
> ("thread-id" property) to configure it externally.
>
> Make memory-backends consume a thread-context object
> (via the "prealloc-context" property) and use it when preallocating to
> create new threads with the desired CPU affinity. Further, to make it
> easier to use, allow creation of "thread-context" objects, including
> setting the CPU affinity directly from QEMU, before enabling the
> sandbox option.
>
>
> Quick test on a system with 2 NUMA nodes:
>
> Without CPU affinity:
> time qemu-system-x86_64 \
> -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \
> -nographic -monitor stdio
>
> real 0m5.383s
> real 0m3.499s
> real 0m5.129s
> real 0m4.232s
> real 0m5.220s
> real 0m4.288s
> real 0m3.582s
> real 0m4.305s
> real 0m5.421s
> real 0m4.502s
>
> -> It heavily depends on the scheduler CPU selection
>
> With CPU affinity:
> time qemu-system-x86_64 \
> -object thread-context,id=tc1,node-affinity=0 \
> -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \
> -sandbox enable=on,resourcecontrol=deny \
> -nographic -monitor stdio
>
> real 0m1.959s
> real 0m1.942s
> real 0m1.943s
> real 0m1.941s
> real 0m1.948s
> real 0m1.964s
> real 0m1.949s
> real 0m1.948s
> real 0m1.941s
> real 0m1.937s
>
> On reasonably large VMs, the speedup can be quite significant.
>
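
To put a number on the two sets of timings quoted above, here is a small
sketch that summarizes them. The sample values are copied from the quoted
runs; the names summarize, UNPINNED and PINNED are made up for this
illustration:

```python
from statistics import mean

# "real" wall-clock times in seconds, copied from the quoted runs.
UNPINNED = [5.383, 3.499, 5.129, 4.232, 5.220,
            4.288, 3.582, 4.305, 5.421, 4.502]
PINNED = [1.959, 1.942, 1.943, 1.941, 1.948,
          1.964, 1.949, 1.948, 1.941, 1.937]


def summarize(samples):
    """Return mean, min, max and spread (max - min) of the samples."""
    lo, hi = min(samples), max(samples)
    return {"mean": mean(samples), "min": lo, "max": hi, "spread": hi - lo}


unpinned = summarize(UNPINNED)
pinned = summarize(PINNED)
speedup = unpinned["mean"] / pinned["mean"]

print(f"without affinity: mean {unpinned['mean']:.3f}s, "
      f"spread {unpinned['spread']:.3f}s")
print(f"with affinity:    mean {pinned['mean']:.3f}s, "
      f"spread {pinned['spread']:.3f}s")
print(f"mean speedup:     {speedup:.2f}x")
```

For these ten runs each, that is a mean of roughly 4.56s versus 1.95s,
i.e. about 2.3x faster, and the run-to-run spread drops from about 1.9s
to under 0.03s.
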
> While this concept is currently only used for short-lived preallocation
> threads, nothing major speaks against reusing the concept for other
> threads that are harder to identify/configure -- except that
> we need additional (idle) context threads that are otherwise left unused.
>
> This series does not yet tackle concurrent preallocation of memory
> backends. Memory backend objects are created and memory is preallocated one
> memory backend at a time -- and there is currently no way to do
> preallocation asynchronously.

Since you seem to have a full set of r-b's, do you intend to merge this
as-is or do the concurrent preallocation first?

Dave

> [1] https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mprivozn@redhat.com
>
> v1 -> v2:
> * Fixed some minor style nits
> * "util: Introduce ThreadContext user-creatable object"
> -> Improve documentation and patch description. [Markus]
> * "util: Add write-only "node-affinity" property for ThreadContext"
> -> Improve documentation and patch description. [Markus]
>
> RFC -> v1:
> * "vl: Allow ThreadContext objects to be created before the sandbox option"
> -> Move parsing of the "name" property before object_create_pre_sandbox
> * Added RB's
>
> Cc: Michal Privoznik <mprivozn@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Daniel P. Berrangé" <berrange@redhat.com>
> Cc: Eduardo Habkost <eduardo@habkost.net>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Eric Blake <eblake@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Stefan Weil <sw@weilnetz.de>
>
> David Hildenbrand (7):
>   util: Cleanup and rename os_mem_prealloc()
>   util: Introduce qemu_thread_set_affinity() and
>     qemu_thread_get_affinity()
>   util: Introduce ThreadContext user-creatable object
>   util: Add write-only "node-affinity" property for ThreadContext
>   util: Make qemu_prealloc_mem() optionally consume a ThreadContext
>   hostmem: Allow for specifying a ThreadContext for preallocation
>   vl: Allow ThreadContext objects to be created before the sandbox
>     option
>
> backends/hostmem.c | 13 +-
> hw/virtio/virtio-mem.c | 2 +-
> include/qemu/osdep.h | 19 +-
> include/qemu/thread-context.h | 57 ++++++
> include/qemu/thread.h | 4 +
> include/sysemu/hostmem.h | 2 +
> meson.build | 16 ++
> qapi/qom.json | 28 +++
> softmmu/cpus.c | 2 +-
> softmmu/vl.c | 36 +++-
> util/meson.build | 1 +
> util/oslib-posix.c | 39 ++--
> util/oslib-win32.c | 8 +-
> util/qemu-thread-posix.c | 70 +++++++
> util/qemu-thread-win32.c | 12 ++
> util/thread-context.c | 362 ++++++++++++++++++++++++++++++++++
> 16 files changed, 641 insertions(+), 30 deletions(-)
> create mode 100644 include/qemu/thread-context.h
> create mode 100644 util/thread-context.c
>
> --
> 2.37.3
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK