qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: "Steve Sistare" <steven.sistare@oracle.com>,
	qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH V3 0/9] Live update: cpr-exec
Date: Fri, 5 Sep 2025 17:09:05 +0000
Message-ID: <aLsZMXHDc4uKMkyx@gallifrey>
In-Reply-To: <aLsUQWjW8gyZjySs@x1.local>

* Peter Xu (peterx@redhat.com) wrote:
> Add Vladimir and Dan.
> 
> On Thu, Aug 14, 2025 at 10:17:14AM -0700, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.  
> > 
> > The new user-visible interfaces are:
> >   * cpr-exec (MigMode migration parameter)
> >   * cpr-exec-command (migration parameter)
> > 
> > cpr-exec mode is similar in most respects to cpr-transfer mode, with the 
> > primary difference being that old QEMU directly exec's new QEMU.  The user
> > specifies the command to exec new QEMU in the migration parameter
> > cpr-exec-command.
> > 
> > Why?
> > 
> > In a containerized QEMU environment, cpr-exec reuses an existing QEMU
> > container and its assigned resources.  By contrast, cpr-transfer mode
> > requires a new container to be created on the same host as the target of
> > the CPR operation.  Resources must be reserved for the new container, while
> > the old container still reserves resources until the operation completes.
> > Avoiding over commitment requires extra work in the management layer.
> 
> Can we spell out what are these resources?
> 
> CPR definitely relies on completely shared memory.  That's already not a
> concern.
> 
> CPR resolves resources that are bound to devices like VFIO by passing over
> FDs; these are not overcommitted either.
> 
> Is it accounting QEMU/KVM process overhead?  That would really be trivial,
> IMHO, but maybe something else?
> 
> > This is one reason why a cloud provider may prefer cpr-exec.  A second reason
> > is that the container may include agents with their own connections to the
> > outside world, and such connections remain intact if the container is reused.
> 
> We discussed this one.  Personally I still cannot understand why this
> is a concern if the agents can be trivially started as a new instance.  But
> I admit I may not know the whole picture.  To me, the above point is more
> persuasive, but I'll need to understand which part is overcommitted in a
> way that can be a problem.

> After all, cloud hosts should reserve some extra memory anyway to cover
> dynamic resource allocations at all times (e.g., when live migration
> starts, KVM pgtables can drastically increase if huge pages are enabled,
> for PAGE_SIZE tracking).  I assumed the overcommitted portion should be
> less than that, and when it's also temporary (src QEMU will release all
> resources after live upgrade) it looks manageable.

k8s used to find it very hard to change the amount of memory allocated to a
container after launch (although I heard that's getting fixed); so you'd
need more excess at the start even if your peak during handover is only
very short.

Dave
> 
> > 
> > How?
> > 
> > cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
> > and by sending the unique name and value of each descriptor to new QEMU
> > via CPR state.
> > 
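
As a rough illustration of the descriptor-preservation step quoted above,
here is a minimal Python sketch. It is not QEMU's code (QEMU is written in
C), and the helper names clear_cloexec, is_cloexec and preserve_fds are made
up for this example. It clears FD_CLOEXEC with fcntl so a descriptor stays
open across exec, and returns the name-to-fd table that a successor process
would need in order to find each descriptor again.

```python
import fcntl
import os


def clear_cloexec(fd):
    """Clear FD_CLOEXEC so that fd stays open across exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Return True if fd would be closed automatically on exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def preserve_fds(named_fds):
    """Clear CLOEXEC on every descriptor in a {name: fd} mapping.

    Returns a copy of the mapping. In this sketch the copy stands in for
    the state that would be handed to the exec'd process, so that it can
    look up each inherited descriptor by its unique name.
    """
    for fd in named_fds.values():
        clear_cloexec(fd)
    return dict(named_fds)


if __name__ == "__main__":
    # Python creates pipe descriptors with CLOEXEC set by default.
    r, w = os.pipe()
    table = preserve_fds({"example-pipe-read": r})
    print(table, "cloexec:", is_cloexec(r))
    os.close(r)
    os.close(w)
```

Only the descriptors named in the table lose CLOEXEC; everything else is
still closed on exec, which keeps unrelated descriptors from leaking into
the new process.
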
> > CPR state cannot be sent over the normal migration channel, because devices
> > and backends are created prior to reading the channel, so this mode sends
> > CPR state over a second migration channel that is not visible to the user.
> > New QEMU reads the second channel prior to creating devices or backends.
> > 
> > The exec itself is trivial.  After writing to the migration channels, the
> > migration code calls a new main-loop hook to perform the exec.
> > 
> > Example:
> > 
> > In this example, we simply restart the same version of QEMU, but in
> > a real scenario one would use a new QEMU binary path in cpr-exec-command.
> > 
> >   # qemu-kvm -monitor stdio
> >   -object memory-backend-memfd,id=ram0,size=1G
> >   -machine memory-backend=ram0 -machine aux-ram-share=on ...
> > 
> >   QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> >   (qemu) migrate_set_parameter mode cpr-exec
> >   (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming file:vm.state
> >   (qemu) migrate -d file:vm.state
> >   (qemu) QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> > 
> > Steve Sistare (9):
> >   migration: multi-mode notifier
> >   migration: add cpr_walk_fd
> >   oslib: qemu_clear_cloexec
> >   vl: helper to request exec
> >   migration: cpr-exec-command parameter
> >   migration: cpr-exec save and load
> >   migration: cpr-exec mode
> >   migration: cpr-exec docs
> >   vfio: cpr-exec mode
> 
> The other thing is, as Vladimir is working on (looks like) a cleaner way of
> passing FDs fully relying on unix sockets, I want to better understand
> the relationship between his work and the exec model.
> 
> I still personally think we should always stick with unix sockets, but I'm
> open to being convinced about the above limitations.  If exec is better than
> cpr-transfer in any way, the hope is more people can and should adopt it.
> 
> We also have no answer yet on how cpr-exec can work in a container world
> where seccomp forbids exec.  I guess that's a no-go.  It's definitely a
> downside.  Better to mention that in the cover letter.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

