From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: "Steve Sistare" <steven.sistare@oracle.com>,
qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>,
"Markus Armbruster" <armbru@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH V3 0/9] Live update: cpr-exec
Date: Fri, 5 Sep 2025 17:09:05 +0000
Message-ID: <aLsZMXHDc4uKMkyx@gallifrey>
In-Reply-To: <aLsUQWjW8gyZjySs@x1.local>

* Peter Xu (peterx@redhat.com) wrote:
> Add Vladimir and Dan.
>
> On Thu, Aug 14, 2025 at 10:17:14AM -0700, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.
> >
> > The new user-visible interfaces are:
> > * cpr-exec (MigMode migration parameter)
> > * cpr-exec-command (migration parameter)
> >
> > cpr-exec mode is similar in most respects to cpr-transfer mode, with the
> > primary difference being that old QEMU directly exec's new QEMU. The user
> > specifies the command to exec new QEMU in the migration parameter
> > cpr-exec-command.
> >
> > Why?
> >
> > In a containerized QEMU environment, cpr-exec reuses an existing QEMU
> > container and its assigned resources. By contrast, cpr-transfer mode
> > requires a new container to be created on the same host as the target of
> > the CPR operation. Resources must be reserved for the new container, while
> > the old container still reserves resources until the operation completes.
> > Avoiding over commitment requires extra work in the management layer.
>
> Can we spell out what these resources are?
>
> CPR definitely relies on completely shared memory. That's already not a
> concern.
>
> CPR resolves resources that are bound to devices like VFIO by passing over
> FDs; these are not overcommitted either.
>
> Is it accounting QEMU/KVM process overhead? That would really be trivial,
> IMHO, but maybe something else?
>
> > This is one reason why a cloud provider may prefer cpr-exec. A second reason
> > is that the container may include agents with their own connections to the
> > outside world, and such connections remain intact if the container is reused.
>
> We discussed this one. Personally I still cannot understand why this
> is a concern if the agents can be trivially started as a new instance. But
> I admit I may not know the whole picture. To me, the above point is more
> persuasive, but I'll need to understand which over-committed part
> can be a problem.
> After all, cloud hosts should reserve some extra memory anyway to make
> sure dynamic resource allocations can succeed at any time (e.g., when live
> migration starts, KVM pgtables can drastically increase if huge pages are
> enabled, for PAGE_SIZE tracking). I assumed the over-commit portion should
> be less than that, and when it's also temporary (src QEMU will release all
> resources after live upgrade) it looks manageable.

k8s used to find it very hard to change the amount of memory allocated to a
container after launch (although I heard that's getting fixed), so you'd
need more excess at the start even if your peak during handover is only
very short.

Dave

>
> >
> > How?
> >
> > cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
> > and by sending the unique name and value of each descriptor to new QEMU
> > via CPR state.
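
For readers unfamiliar with the mechanism described above, here is a minimal
sketch of clearing the close-on-exec flag with fcntl(2) so that a descriptor
stays open in the exec'd process. The helper names clear_cloexec() and
has_cloexec() are hypothetical and are not taken from the series; the actual
qemu_clear_cloexec() in patch 3/9 may well differ. Passing the descriptor's
name and value to new QEMU via CPR state is not shown.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (an assumption, not QEMU's code): clear FD_CLOEXEC
 * on fd so the descriptor survives a later exec().
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    /* Keep any other descriptor flags; drop only FD_CLOEXEC. */
    if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: report whether FD_CLOEXEC is set on fd.
 * Returns 1 if set, 0 if clear, -1 on error.
 */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A caller would run clear_cloexec() on each descriptor it wants to preserve
just before the exec, then tell the new process which descriptor number
corresponds to which resource.
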
> >
> > CPR state cannot be sent over the normal migration channel, because devices
> > and backends are created prior to reading the channel, so this mode sends
> > CPR state over a second migration channel that is not visible to the user.
> > New QEMU reads the second channel prior to creating devices or backends.
> >
> > The exec itself is trivial. After writing to the migration channels, the
> > migration code calls a new main-loop hook to perform the exec.
> >
> > Example:
> >
> > In this example, we simply restart the same version of QEMU, but in
> > a real scenario one would use a new QEMU binary path in cpr-exec-command.
> >
> > # qemu-kvm -monitor stdio
> > -object memory-backend-memfd,id=ram0,size=1G
> > -machine memory-backend=ram0 -machine aux-ram-share=on ...
> >
> > QEMU 10.1.50 monitor - type 'help' for more information
> > (qemu) info status
> > VM status: running
> > (qemu) migrate_set_parameter mode cpr-exec
> > (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming file:vm.state
> > (qemu) migrate -d file:vm.state
> > (qemu) QEMU 10.1.50 monitor - type 'help' for more information
> > (qemu) info status
> > VM status: running
> >
> > Steve Sistare (9):
> > migration: multi-mode notifier
> > migration: add cpr_walk_fd
> > oslib: qemu_clear_cloexec
> > vl: helper to request exec
> > migration: cpr-exec-command parameter
> > migration: cpr-exec save and load
> > migration: cpr-exec mode
> > migration: cpr-exec docs
> > vfio: cpr-exec mode
>
> The other thing is, as Vladimir is working on (what looks like) a cleaner
> way of passing FDs fully relying on unix sockets, I want to better
> understand the relationship between his work and the exec model.
>
> I still personally think we should always stick with unix sockets, but I'm
> open to be convinced on above limitations. If exec is better than
> cpr-transfer in any way, the hope is more people can and should adopt it.
>
> We also have no answer yet on how cpr-exec can work in the container world
> with seccomp forbidding exec. I guess that's a no-go. It's definitely a
> downside. Better mention that in the cover letter.
>
> Thanks,
>
> --
> Peter Xu
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/