From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: thuth@redhat.com, frankja@linux.ibm.com, david@redhat.com,
    cohuck@redhat.com, qemu-devel@nongnu.org, borntraeger@de.ibm.com,
    pbonzini@redhat.com
Subject: Re: [PATCH v1 1/1] osdep: asynchronous teardown for shutdown on Linux
Date: Mon, 6 Dec 2021 12:27:15 +0000
Message-ID: <Ya4Bo+AsD7NdaXG9@redhat.com>
In-Reply-To: <20211206131514.02801337@p-imbrenda>
On Mon, Dec 06, 2021 at 01:15:14PM +0100, Claudio Imbrenda wrote:
> On Mon, 6 Dec 2021 11:47:55 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> > On Mon, Dec 06, 2021 at 12:43:12PM +0100, Claudio Imbrenda wrote:
> > > On Mon, 6 Dec 2021 11:21:10 +0000
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > > On Mon, Dec 06, 2021 at 12:06:11PM +0100, Claudio Imbrenda wrote:
> > > > > This patch adds support for asynchronously tearing down a VM on Linux.
> > > > >
> > > > > When qemu terminates, either naturally or because of a fatal signal,
> > > > > the VM is torn down. If the VM is huge, it can take a considerable
> > > > > amount of time for it to be cleaned up. In case of a protected VM, it
> > > > > might take even longer than a non-protected VM (this is the case on
> > > > > s390x, for example).
> > > > >
> > > > > Some users might want to shut down a VM and restart it immediately,
> > > > > without having to wait. This is especially true if management
> > > > > infrastructure like libvirt is used.
> > > > >
> > > > > This patch implements a simple trick on Linux to allow qemu to return
> > > > > immediately, with the teardown of the VM being performed
> > > > > asynchronously.
> > > > >
> > > > > If the new commandline option -async-teardown is used, a new process is
> > > > > spawned from qemu using the clone syscall, so that it will share its
> > > > > address space with qemu.
> > > > >
> > > > > The new process will then wait until qemu terminates, and then it will
> > > > > exit itself.
> > > > >
> > > > > This allows qemu to terminate quickly, without having to wait for the
> > > > > whole address space to be torn down. The teardown process will exit
> > > > > after qemu, so it will be the last user of the address space, and
> > > > > therefore it will take care of the actual teardown.
> > > > >
> > > > > The teardown process will share the same cgroups as qemu, so both
> > > > > memory usage and cpu time will be accounted properly.
> > > >
> > > > If this suggested workaround has any benefit to the shutdown of a VM
> > > > with libvirt, then it is a bug in libvirt IMHO.
> > > >
> > > > When libvirt tears down a QEMU VM, it should be waiting for *every*
> > > > process in the VM's cgroup to be terminated before it reports that
> > > > the VM is shutoff. IOW, the fact that this workaround lets the main
> > > > QEMU process exit quickly should not matter. libvirt should still
> > > > be blocked in exactly the same place in its code, waiting for the
> > > > "async" cleanup process to exit. IOW, this should not be async at
> > > > all from libvirt's POV.
> > >
> > > interesting, I did not know that about libvirt.
> > >
> > > maybe libvirt could be fixed/improved to allow this patch to work?
> >
> > That would not be desirable. When libvirt reports a VM as shutoff,
> > it is expected that all resources associated with the VM have been
> > fully released, such that they are available for launching a new
> > VM. We can't allow resources to be asynchronously released as that
> > violates app's expectation that the resources are released once the
> > VM is shutoff.
>
> what about people who do not use libvirt? should those also be
> prevented from taking advantage of this feature only because libvirt
> can't use it?
Do we have any other mgmt tools that won't ultimately have the
same issue ? I'd expect the same to apply to anything that is using
cgroups for managing resources used by a QEMU process at least.
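To illustrate the cgroup point, here is a minimal sketch of a mgmt tool waiting for a VM's cgroup to become empty before treating the VM as shut off. It assumes the tool polls the kernel's `cgroup.procs` file. The helper names `cgroup_pids` and `wait_for_cgroup_empty` and the timeout values are hypothetical, and this is not libvirt's actual code. Because the teardown process stays in the same cgroup as QEMU, a wait like this only returns once that process has exited too.

```python
import os
import time


def cgroup_pids(cgroup_dir):
    """Return the PIDs currently listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def wait_for_cgroup_empty(cgroup_dir, timeout=30.0, poll_interval=0.1):
    """Poll until no process remains in the cgroup or the timeout expires.

    Returns True if the cgroup became empty, False on timeout.
    (Hypothetical helper, for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if not cgroup_pids(cgroup_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With a wait of this kind, the early exit of the main QEMU process does not shorten the time until the VM is reported as shut off.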
> > > surely without this patch an asynchronous teardown will not be possible
> > > at all
> >
> > I appreciate that the current slow teardown is a pain, but async
> > teardown does not sound like an appealing alternative given that
> > the app can't use the resources again until the teardown is
> > complete.
>
> when a VM starts, it will not use all of the memory at once. it will
> start using it a little at a time. time during which the asynchronous
> process can complete the teardown.
How quickly it uses memory will depend on various factors. If it tries
to use more memory before the async cleanup has released enough, then
this looks like it risks putting the host into overcommit / OOM killer
scenarios surely ?
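To make the concern concrete, here is a toy model. It assumes, purely for illustration, that the old VM's memory is released at a constant rate and that the new VM touches memory at a constant rate from the moment it starts. The function names and all numbers are hypothetical, and real workloads will not be this linear.

```python
def peak_combined_gib(old_gib, release_rate, new_gib, touch_rate):
    """Peak memory (GiB) held by the old and new VM together.

    Assumes linear release and linear touch rates (GiB/s), both > 0.
    The combined usage is piecewise linear, so its maximum is reached
    at t=0 or at one of the two times where a rate saturates.
    """
    def combined(t):
        old = max(0.0, old_gib - release_rate * t)  # not yet torn down
        new = min(new_gib, touch_rate * t)          # already touched
        return old + new

    breakpoints = (0.0, old_gib / release_rate, new_gib / touch_rate)
    return max(combined(t) for t in breakpoints)


def would_overcommit(host_gib, old_gib, release_rate, new_gib, touch_rate):
    """True if the modelled peak exceeds the host's memory."""
    return peak_combined_gib(old_gib, release_rate, new_gib, touch_rate) > host_gib
```

For example, with a 128 GiB host and two 100 GiB VMs, a new VM touching memory at 20 GiB/s against a teardown releasing 10 GiB/s peaks at 150 GiB in this model, which exceeds the host. With the rates swapped, the peak stays at 100 GiB.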
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|