From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: Peter Xu <peterx@redhat.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
David Hildenbrand <david@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH V2 00/11] Live update: cpr-exec
Date: Thu, 5 Sep 2024 10:30:11 +0100 [thread overview]
Message-ID: <Ztl6I21FOHZRiK5e@redhat.com> (raw)
In-Reply-To: <d45761d3-6bee-42ac-9752-1192b3bae6ef@oracle.com>
On Fri, Aug 16, 2024 at 01:09:23PM -0400, Steven Sistare wrote:
> On 8/16/2024 12:17 PM, Peter Xu wrote:
> > On Fri, Aug 16, 2024 at 05:00:32PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Aug 16, 2024 at 11:34:10AM -0400, Peter Xu wrote:
> > > > On Fri, Aug 16, 2024 at 04:16:50PM +0100, Daniel P. Berrangé wrote:
> > > > > On Fri, Aug 16, 2024 at 11:06:10AM -0400, Peter Xu wrote:
> > > > > > On Thu, Aug 15, 2024 at 04:55:20PM -0400, Steven Sistare wrote:
> > > > > > > On 8/13/2024 3:46 PM, Peter Xu wrote:
> > > > > > > > On Tue, Aug 06, 2024 at 04:56:18PM -0400, Steven Sistare wrote:
> > > > > > > > > > The flipside, however, is that localhost migration via 2 separate QEMU
> > > > > > > > > > processes has issues where both QEMUs want to be opening the very same
> > > > > > > > > > file, and only 1 of them can ever have them open.
> > > > > > > >
> > > > > > > > I thought we used to have similar issue with block devices, but I assume
> > > > > > > > it's solved for years (and whoever owns it will take proper file lock,
> > > > > > > > IIRC, and QEMU migration should properly serialize the time window on who's
> > > > > > > > going to take the file lock).
> > > > > > > >
> > > > > > > > Maybe this is about something else?
> > > > > > >
> > > > > > > I don't have an example where this fails.
> > > > > > >
> > > > > > > I can cause "Failed to get "write" lock" errors if two qemu instances open
> > > > > > > the same block device, but the error is suppressed if you add the -incoming
> > > > > > > argument, due to this code:
> > > > > > >
> > > > > > > blk_attach_dev()
> > > > > > > if (runstate_check(RUN_STATE_INMIGRATE))
> > > > > > > blk->disable_perm = true;
> > > > > >
> > > > > > Yep, this one is pretty much expected.
> > > > > >
> > > > > > >
> > > > > > > > > Indeed, and "files" includes unix domain sockets.
> > > > > > >
> > > > > > > More on this -- the second qemu to bind a unix domain socket for listening
> > > > > > > wins, and the first qemu loses it (because second qemu unlinks and recreates
> > > > > > > the socket path before binding on the assumption that it is stale).
> > > > > > >
> > > > > > > One must use a different name for the socket for second qemu, and clients
> > > > > > > that wish to connect must be aware of the new port.
> > > > > > >
> > > > > > > > > Network ports also conflict.
> > > > > > > > > cpr-exec avoids such problems, and is one of the advantages of the method that
> > > > > > > > > I forgot to promote.
> > > > > > > >
> > > > > > > > I was thinking that's fine, as the host ports should be the backend of the
> > > > > > > > VM ports only anyway so they don't need to be identical on both sides?
> > > > > > > >
> > > > > > > > IOW, my understanding is it's the guest IP/ports/... which should still be
> > > > > > > > stable across migrations, where the host ports can be different as long as
> > > > > > > > the host ports can forward guest port messages correctly?
> > > > > > >
> > > > > > > Yes, one must use a different host port number for the second qemu, and clients
> > > > > > > that wish to connect must be aware of the new port.
> > > > > > >
> > > > > > > That is my point -- cpr-transfer requires fiddling with such things.
> > > > > > > cpr-exec does not.
> > > > > >
> > > > > > Right, and my understanding is all these facilities are already there, so
> > > > > > no new code should be needed on reconnect issues if to support cpr-transfer
> > > > > > in Libvirt or similar management layers that supports migrations.
> > > > >
> > > > > Note Libvirt explicitly blocks localhost migration today because
> > > > > solving all these clashing resource problems is a huge can of worms
> > > > > and it can't be made invisible to the user of libvirt in any practical
> > > > > way.
> > > >
> > > > Ahhh, OK. I'm pretty surprised by this, as I thought at least kubevirt
> > > > supported local migration somehow on top of libvirt.
> > >
> > > Since kubevirt runs inside a container, "localhost" migration
> > > is effectively migrating between 2 completely separate OS installs
> > > (containers), that happen to be on the same physical host. IOW, it
> > > is a cross-host migration from Libvirt & QEMU's POV, and there are
> > > no clashing resources to worry about.
> >
> > OK, makes sense.
> >
> > Then do you think it's possible to support cpr-transfer in that scenario
> > from Libvirt POV?
> >
> > >
> > > > Does it mean that cpr-transfer is a no-go in this case at least for Libvirt
> > > > to consume it (as cpr-* is only for local host migrations so far)? Even if
> > > > all the rest issues we're discussing with cpr-exec, is that the only way to
> > > > go for Libvirt, then?
> > >
> > > cpr-exec is certainly appealing from the POV of avoiding the clashing
> > > resources problem in libvirt.
> > >
> > > It has own issues though, because libvirt runs all QEMU processes with
> > > seccomp filters that block 'execve', as we consider QEMU to be untrustworthy
> > > and thus don't want to allow it to exec anything !
> > >
> > > I don't know which is the lesser evil from libvirt's POV.
> > >
> > > Personally I see security controls as an overriding requirement for
> > > everything.
> >
> > One thing I am aware of is cpr-exec is not the only one who might start to
> > use exec() in QEMU. TDX fundamentally will need to create another key VM to
> > deliver the keys and the plan seems to be using exec() too. However in
> > that case per my understanding the exec() is optional - the key VM can also
> > be created by Libvirt.
> >
> > IOW, it looks like we can still stick with execve() being blocked yet so
> > far except cpr-exec().
> >
> > Hmm, this makes the decision harder to make. We need to figure out a way
> > on knowing how to consume this feature for at least open source virt
> > stack.. So far it looks like it's only possible (if we take seccomp high
> > priority) we use cpr-transfer but only in a container.
>
> libvirt starts qemu with the -sandbox spawn=deny option which blocks fork, exec,
> and change namespace operations. I have a patch in my workspace to be submitted
> later called "seccomp: fine-grained control of fork, exec, and namespace" that allows
> libvirt to block fork and namespace but allow exec.
IMHO this significantly undermines the protection offered. fork(), without
execve() is relatively benign from a security POV, mostly a slightly greater
resource consumption issue, compared to spawning threads, which is always
allowed. Blocking execve() is the key security benefit, as that is a way to
pick up new privileges (through setuid), or bring new binary code into
memory (via the new ELF images loaded), or pick up new MAC policy through
transition rules, etc.
IOW, if you're going to allow 'exec', there's little point in blocking fork
IMHO, and as such this doesn't sound very appealing as something to add to
libvirt.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
prev parent reply other threads:[~2024-09-05 9:31 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-30 19:40 [PATCH V2 00/11] Live update: cpr-exec Steve Sistare
2024-06-30 19:40 ` [PATCH V2 01/11] machine: alloc-anon option Steve Sistare
2024-07-15 17:52 ` Fabiano Rosas
2024-07-16 9:19 ` Igor Mammedov
2024-07-17 19:24 ` Peter Xu
2024-07-18 15:43 ` Steven Sistare
2024-07-18 16:22 ` Peter Xu
2024-07-20 20:35 ` Steven Sistare
2024-08-04 16:20 ` Peter Xu
2024-07-20 20:28 ` Steven Sistare
2024-07-22 9:10 ` David Hildenbrand
2024-07-29 12:29 ` Igor Mammedov
2024-08-08 18:32 ` Steven Sistare
2024-08-12 18:37 ` Steven Sistare
2024-08-13 15:35 ` Peter Xu
2024-08-13 17:00 ` Alex Williamson
2024-08-13 18:45 ` Peter Xu
2024-08-13 18:56 ` Steven Sistare
2024-08-13 18:46 ` Steven Sistare
2024-08-13 18:49 ` Steven Sistare
2024-08-13 17:34 ` Steven Sistare
2024-08-13 19:02 ` Peter Xu
2024-06-30 19:40 ` [PATCH V2 02/11] migration: cpr-state Steve Sistare
2024-07-17 18:39 ` Fabiano Rosas
2024-07-19 15:03 ` Peter Xu
2024-07-20 19:53 ` Steven Sistare
2024-06-30 19:40 ` [PATCH V2 03/11] migration: save cpr mode Steve Sistare
2024-07-17 18:39 ` Fabiano Rosas
2024-07-18 15:47 ` Steven Sistare
2024-06-30 19:40 ` [PATCH V2 04/11] migration: stop vm earlier for cpr Steve Sistare
2024-07-17 18:59 ` Fabiano Rosas
2024-07-20 20:00 ` Steven Sistare
2024-07-22 13:42 ` Fabiano Rosas
2024-08-06 20:52 ` Steven Sistare
2024-06-30 19:40 ` [PATCH V2 05/11] physmem: preserve ram blocks " Steve Sistare
2024-06-30 19:40 ` [PATCH V2 06/11] migration: fix mismatched GPAs during cpr Steve Sistare
2024-07-19 16:28 ` Peter Xu
2024-07-20 21:28 ` Steven Sistare
2024-08-07 21:04 ` Steven Sistare
2024-08-13 20:43 ` Peter Xu
2024-08-15 20:54 ` Steven Sistare
2024-08-16 14:43 ` Peter Xu
2024-08-16 17:10 ` Steven Sistare
2024-08-21 16:57 ` Peter Xu
2024-06-30 19:40 ` [PATCH V2 07/11] oslib: qemu_clear_cloexec Steve Sistare
2024-06-30 19:40 ` [PATCH V2 08/11] vl: helper to request exec Steve Sistare
2024-06-30 19:40 ` [PATCH V2 09/11] migration: cpr-exec-command parameter Steve Sistare
2024-06-30 19:40 ` [PATCH V2 10/11] migration: cpr-exec save and load Steve Sistare
2024-06-30 19:40 ` [PATCH V2 11/11] migration: cpr-exec mode Steve Sistare
2024-07-18 15:56 ` [PATCH V2 00/11] Live update: cpr-exec Peter Xu
2024-07-20 21:26 ` Steven Sistare
2024-08-04 16:10 ` Peter Xu
2024-08-07 19:47 ` Steven Sistare
2024-08-13 20:12 ` Peter Xu
2024-08-20 16:28 ` [PATCH V2 00/11] Live update: cpr-exec (reconnections) Steven Sistare
2024-07-22 8:59 ` [PATCH V2 00/11] Live update: cpr-exec David Hildenbrand
2024-08-04 15:43 ` Peter Xu
2024-08-05 9:52 ` David Hildenbrand
2024-08-05 10:06 ` David Hildenbrand
2024-08-05 10:01 ` Daniel P. Berrangé
2024-08-06 20:56 ` Steven Sistare
2024-08-13 19:46 ` Peter Xu
2024-08-15 20:55 ` Steven Sistare
2024-08-16 15:06 ` Peter Xu
2024-08-16 15:16 ` Daniel P. Berrangé
2024-08-16 15:19 ` Steven Sistare
2024-08-16 15:34 ` Peter Xu
2024-08-16 16:00 ` Daniel P. Berrangé
2024-08-16 16:17 ` Peter Xu
2024-08-16 16:28 ` Daniel P. Berrangé
2024-08-16 17:09 ` Steven Sistare
2024-08-21 18:34 ` Peter Xu
2024-09-04 20:58 ` Steven Sistare
2024-09-04 22:23 ` Peter Xu
2024-09-05 9:49 ` Daniel P. Berrangé
2024-09-05 9:43 ` Daniel P. Berrangé
2024-09-05 9:30 ` Daniel P. Berrangé [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ztl6I21FOHZRiK5e@redhat.com \
--to=berrange@redhat.com \
--cc=armbru@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=marcel.apfelbaum@gmail.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=steven.sistare@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).