From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Michal Privoznik <mprivozn@redhat.com>
Cc: qemu-devel@nongnu.org, david@redhat.com
Subject: Re: [PATCH] util: NUMA aware memory preallocation
Date: Wed, 11 May 2022 10:19:59 +0100
Message-ID: <Ynt/v2SHmnO2afg4@redhat.com>
In-Reply-To: <ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mprivozn@redhat.com>
On Tue, May 10, 2022 at 08:55:33AM +0200, Michal Privoznik wrote:
> When allocating large amounts of memory the task is offloaded
> onto threads. These threads then use various techniques to
> allocate the memory fully (madvise(), writing into the memory).
> However, these threads are free to run on any CPU, which becomes
> problematic on NUMA machines because it may happen that a thread
> is running on a distant node.
>
> Ideally, this is something that a management application would
> resolve, but we are not anywhere close to that. Firstly, memory
> allocation happens before monitor socket is even available. But
> okay, that's what -preconfig is for. But then the problem is that
> 'object-add' would not return until all memory is preallocated.
Is the delay to 'object-add' actually a problem?
Currently we're cold plugging the memory backends, so prealloc
happens before QMP is available. So we have a delay immediately
at startup. Switching to -preconfig plus 'object-add' would
not be making the delay worse, merely moving it ever so slightly
later.
From the POV of an application using libvirt, this is the same.
If virDomainCreate takes 1 hour, it takes 1 hour regardless of
whether the allocation delay happens before QMP is available or
during -preconfig object-add execution.
> Long story short, management application has no way of learning
> TIDs of allocator threads so it can't make them run NUMA aware.
This feels like the key issue. The preallocation threads are
invisible to libvirt, regardless of whether we're doing coldplug
or hotplug of memory-backends. Indeed the threads are invisible
to all of QEMU, except the memory backend code.
Conceptually we need 1 or more explicit worker threads, that we
can assign CPU affinity to, and then QEMU can place jobs on them.
I/O threads serve this role, but are limited to blockdev work. We
need a generalization of I/O threads, for arbitrary jobs that
QEMU might want to farm out to specific NUMA nodes.
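To make the idea concrete, here is a minimal sketch of a worker
thread that is pinned to a host CPU before it runs a preallocation
job. It uses plain pthreads on Linux and is not QEMU code: the
struct prealloc_job type and the run_prealloc_job() helper are
hypothetical names invented for illustration. The assumption is
that, under the kernel's default local allocation policy, pages
are normally placed on the node of the CPU that first touches them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical job description; none of these names exist in QEMU. */
struct prealloc_job {
    char *addr;        /* start of the region to populate */
    size_t len;        /* region length in bytes */
    size_t page_size;  /* stride between touches */
    int cpu;           /* host CPU the worker is pinned to */
    size_t touched;    /* out: number of pages touched */
    int pin_err;       /* out: 0 if pinning worked, else an error number */
};

/* Worker body: pin the thread first, then touch one byte per page. */
static void *prealloc_worker(void *opaque)
{
    struct prealloc_job *job = opaque;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(job->cpu, &set);
    job->pin_err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (job->pin_err != 0) {
        return NULL;
    }

    for (size_t off = 0; off < job->len; off += job->page_size) {
        volatile char *p = job->addr + off;
        *p = *p;  /* read then write: faults the page in, keeps its contents */
        job->touched++;
    }
    return NULL;
}

/* Run one job on a dedicated, pinned thread. Returns 0 on success. */
int run_prealloc_job(struct prealloc_job *job)
{
    pthread_t thread;
    int err;

    assert(job->page_size > 0);
    assert(job->cpu >= 0 && job->cpu < CPU_SETSIZE);

    job->touched = 0;
    job->pin_err = 0;
    err = pthread_create(&thread, NULL, prealloc_worker, job);
    if (err != 0) {
        return err;
    }
    pthread_join(thread, NULL);
    return job->pin_err;
}
```

A caller would fill in addr, len, page_size and cpu, call
run_prealloc_job(), and check for a 0 return. In this sketch the
worker pins itself, which sidesteps the visibility problem; in the
model described here the affinity would instead be assigned from
outside QEMU.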
In a guest spanning multiple host NUMA nodes, libvirt would
have to configure 1 or more worker threads for QEMU, learn
their TIDs, then add the memory backends in -preconfig, which
would farm out preallocation to the worker threads, with
job placement matching the worker's affinity.
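The sequence above (create worker, learn its TID, pin it from
outside, only then give it work) could look roughly like the
following sketch. It is an illustration under assumptions, not a
proposed QEMU interface: struct worker, worker_start(),
pin_tid_to_cpu() and worker_run_and_join() are hypothetical names,
and the "manager" role that libvirt would play is just the calling
thread here. It relies on the Linux behaviour that
sched_setaffinity() accepts a thread ID as returned by gettid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical worker state; the names are illustrative only. */
struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pid_t tid;       /* kernel TID, published by the worker itself */
    int released;    /* set once the manager has applied affinity */
    int ran_on_cpu;  /* out: CPU on which the job executed */
};

static void *worker_main(void *opaque)
{
    struct worker *w = opaque;

    pthread_mutex_lock(&w->lock);
    w->tid = (pid_t)syscall(SYS_gettid);   /* make the TID discoverable */
    pthread_cond_broadcast(&w->cond);
    while (!w->released) {                 /* wait until we have been pinned */
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);

    /* A real job (e.g. preallocation) would run here. */
    w->ran_on_cpu = sched_getcpu();
    return NULL;
}

/* Start the worker and block until its TID is known; -1 on failure. */
pid_t worker_start(struct worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->tid = 0;
    w->released = 0;
    w->ran_on_cpu = -1;
    if (pthread_create(&w->thread, NULL, worker_main, w) != 0) {
        return -1;
    }
    pthread_mutex_lock(&w->lock);
    while (w->tid == 0) {
        pthread_cond_wait(&w->cond, &w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return w->tid;
}

/* What a management application would do once it knows the TID. */
int pin_tid_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Release the worker, wait for it, and report where the job ran. */
int worker_run_and_join(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->released = 1;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    return w->ran_on_cpu;
}
```

The point of holding the worker back until it is released is that
no job can start before the affinity has been applied, so job
placement always matches the worker's affinity.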
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|