From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: qemu-devel@nongnu.org, "Michal Privoznik" <mprivozn@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefan Weil" <sw@weilnetz.de>
Subject: Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
Date: Tue, 11 Oct 2022 10:02:45 +0100
Message-ID: <Y0UxNX5Y2dgZsUyN@work-vm>
In-Reply-To: <23dd0ce0-5393-3aa0-affe-11277c6a123b@redhat.com>
* David Hildenbrand (david@redhat.com) wrote:
> On 10.10.22 12:40, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > This is a follow-up on "util: NUMA aware memory preallocation" [1] by
> > > Michal.
> > >
> > > Setting the CPU affinity of threads from inside QEMU usually isn't
> > > easily possible, because we don't want QEMU -- once started and running
> > > guest code -- to be able to mess up the system. QEMU disallows relevant
> > > syscalls using seccomp, such that any such invocation will fail.
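> > >
> > > For illustration, with the sandbox active (hedged sketch; the exact
> > > failure mode depends on the seccomp configuration):
> > >
> > >   qemu-system-x86_64 -sandbox enable=on,resourcecontrol=deny ...
> > >   # any later sched_setaffinity() call from inside QEMU is now
> > >   # rejected by the seccomp filter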
> > >
> > > Especially for memory preallocation in memory backends, the CPU
> > > affinity of the preallocation threads matters: a poor placement can
> > > significantly increase guest startup time, for example, when running
> > > large VMs backed by huge/gigantic pages, because of NUMA effects. For
> > > NUMA-aware preallocation, we therefore have to set the CPU affinity,
> > > however:
> > >
> > > (1) Once preallocation threads are created during preallocation, management
> > > tools can no longer intervene to change the affinity. These threads
> > > are created automatically on demand.
> > > (2) QEMU cannot easily set the CPU affinity itself.
> > > (3) The CPU affinity derived from the NUMA bindings of the memory backend
> > > might not necessarily be exactly the CPUs we actually want to use
> > > (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
> > >
> > > There is an easy "workaround": if we have a thread with the right CPU
> > > affinity, we can simply create new threads on demand via that prepared
> > > context. So, all we have to do is set up such a context ahead of time,
> > > and then configure preallocation to create new threads via that
> > > environment.
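> > >
> > > The mechanism is simply that a new thread starts with the CPU affinity
> > > of the thread that created it. A minimal standalone sketch of that idea
> > > (plain pthreads, not the actual QEMU code):
> > >
> > >   #define _GNU_SOURCE
> > >   #include <pthread.h>
> > >   #include <sched.h>
> > >   #include <stdio.h>
> > >
> > >   /* Worker created *from* the context thread: it starts with the
> > >    * creator's affinity; no affinity syscall needed at creation time. */
> > >   static void *worker(void *arg)
> > >   {
> > >       cpu_set_t set;
> > >       pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
> > >       printf("worker pinned to CPU 0: %d\n", CPU_ISSET(0, &set));
> > >       return NULL;
> > >   }
> > >
> > >   /* Context thread: affinity is configured once, up front (e.g.,
> > >    * before seccomp kicks in); later thread-creation requests are
> > >    * served from here. */
> > >   static void *context_thread(void *arg)
> > >   {
> > >       cpu_set_t set;
> > >       CPU_ZERO(&set);
> > >       CPU_SET(0, &set);                        /* pin to CPU 0 */
> > >       pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
> > >
> > >       pthread_t t;
> > >       pthread_create(&t, NULL, worker, NULL);  /* inherits affinity */
> > >       pthread_join(t, NULL);
> > >       return NULL;
> > >   }
> > >
> > >   int main(void)
> > >   {
> > >       pthread_t ctx;
> > >       pthread_create(&ctx, NULL, context_thread, NULL);
> > >       pthread_join(ctx, NULL);
> > >       return 0;
> > >   }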
> > >
> > > So, let's introduce a user-creatable "thread-context" object that
> > > essentially consists of a context thread used to create new threads.
> > > QEMU can either try setting the CPU affinity itself ("cpu-affinity",
> > > "node-affinity" property), or upper layers can extract the thread id
> > > ("thread-id" property) to configure it externally.
> > >
> > > Make memory-backends consume a thread-context object
> > > (via the "prealloc-context" property) and use it when preallocating to
> > > create new threads with the desired CPU affinity. Further, to make it
> > > easier to use, allow creation of "thread-context" objects, including
> > > setting the CPU affinity directly from QEMU, before enabling the
> > > sandbox option.
> > >
> > >
> > > Quick test on a system with 2 NUMA nodes:
> > >
> > > Without CPU affinity:
> > > time qemu-system-x86_64 \
> > > -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \
> > > -nographic -monitor stdio
> > >
> > > real 0m5.383s
> > > real 0m3.499s
> > > real 0m5.129s
> > > real 0m4.232s
> > > real 0m5.220s
> > > real 0m4.288s
> > > real 0m3.582s
> > > real 0m4.305s
> > > real 0m5.421s
> > > real 0m4.502s
> > >
> > > -> It depends heavily on the scheduler's CPU selection
> > >
> > > With CPU affinity:
> > > time qemu-system-x86_64 \
> > > -object thread-context,id=tc1,node-affinity=0 \
> > > -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \
> > > -sandbox enable=on,resourcecontrol=deny \
> > > -nographic -monitor stdio
> > >
> > > real 0m1.959s
> > > real 0m1.942s
> > > real 0m1.943s
> > > real 0m1.941s
> > > real 0m1.948s
> > > real 0m1.964s
> > > real 0m1.949s
> > > real 0m1.948s
> > > real 0m1.941s
> > > real 0m1.937s
> > >
> > > On reasonably large VMs, the speedup can be quite significant.
> > >
> > > While this concept is currently only used for short-lived preallocation
> > > threads, nothing major prevents reusing it for other threads that are
> > > harder to identify/configure -- except that we need additional (idle)
> > > context threads that are otherwise left unused.
> > >
> > > This series does not yet tackle concurrent preallocation of memory
> > > backends. Memory backend objects are created and memory is preallocated one
> > > memory backend at a time -- and there is currently no way to do
> > > preallocation asynchronously.
>
> Hi Dave,
>
> >
> > Since you seem to have a full set of r-b's - do you intend to merge this
> > as-is or do the concurrent preallocation first?
>
> I intend to merge this as is, as it provides a benefit as it stands, and
> concurrent preallocation might not require user interface changes.
Yep, that's fair enough.
> I do have some ideas on how to implement concurrent preallocation, but it
> needs more thought (and more importantly, time).
Yep, it would be nice for the really huge VMs.
Dave
> --
> Thanks,
>
> David / dhildenb
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK