From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: "Steve Sistare" <steven.sistare@oracle.com>,
qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>,
"Markus Armbruster" <armbru@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH V3 0/9] Live update: cpr-exec
Date: Fri, 5 Sep 2025 17:09:05 +0000
Message-ID: <aLsZMXHDc4uKMkyx@gallifrey>
In-Reply-To: <aLsUQWjW8gyZjySs@x1.local>

* Peter Xu (peterx@redhat.com) wrote:
> Add Vladimir and Dan.
>
> On Thu, Aug 14, 2025 at 10:17:14AM -0700, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.
> >
> > The new user-visible interfaces are:
> > * cpr-exec (MigMode migration parameter)
> > * cpr-exec-command (migration parameter)
> >
> > cpr-exec mode is similar in most respects to cpr-transfer mode, with the
> > primary difference being that old QEMU directly exec's new QEMU. The user
> > specifies the command to exec new QEMU in the migration parameter
> > cpr-exec-command.
> >
> > Why?
> >
> > In a containerized QEMU environment, cpr-exec reuses an existing QEMU
> > container and its assigned resources. By contrast, cpr-transfer mode
> > requires a new container to be created on the same host as the target of
> > the CPR operation. Resources must be reserved for the new container, while
> > the old container still reserves resources until the operation completes.
> > Avoiding over commitment requires extra work in the management layer.
>
> Can we spell out what are these resources?
>
> CPR definitely relies on completely shared memory. That's already not a
> concern.
>
> CPR resolves resources that are bound to devices like VFIO by passing over
> FDs; these are not overcommitted either.
>
> Is it accounting QEMU/KVM process overhead? That would really be trivial,
> IMHO, but maybe something else?
>
> > This is one reason why a cloud provider may prefer cpr-exec. A second reason
> > is that the container may include agents with their own connections to the
> > outside world, and such connections remain intact if the container is reused.
>
> We discussed this one. Personally I still cannot understand why this
> is a concern if the agents can be trivially started as a new instance. But
> I admit I may not know the whole picture. To me, the above point is more
> persuasive, but I'll need to understand which part is overcommitted in a
> way that can be a problem.
>
> After all, cloud hosts should reserve some extra memory anyway to make
> sure dynamic resource allocations succeed at all times (e.g., when live
> migration starts, KVM page tables can grow drastically if huge pages are
> enabled, for PAGE_SIZE tracking). I assumed the overcommitted portion
> should be less than those, and when it's also temporary (src QEMU will
> release all resources after the live upgrade) it looks manageable.

k8s used to find it very hard to change the amount of memory allocated to a
container after launch (although I heard that's getting fixed), so you'd
need more excess at the start even if your peak during handover is only
very short.

Dave

>
> >
> > How?
> >
> > cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
> > and by sending the unique name and value of each descriptor to new QEMU
> > via CPR state.
> >
> > CPR state cannot be sent over the normal migration channel, because devices
> > and backends are created prior to reading the channel, so this mode sends
> > CPR state over a second migration channel that is not visible to the user.
> > New QEMU reads the second channel prior to creating devices or backends.
> >
> > The exec itself is trivial. After writing to the migration channels, the
> > migration code calls a new main-loop hook to perform the exec.
> >
> > Example:
> >
> > In this example, we simply restart the same version of QEMU, but in
> > a real scenario one would use a new QEMU binary path in cpr-exec-command.
> >
> > # qemu-kvm -monitor stdio
> > -object memory-backend-memfd,id=ram0,size=1G
> > -machine memory-backend=ram0 -machine aux-ram-share=on ...
> >
> > QEMU 10.1.50 monitor - type 'help' for more information
> > (qemu) info status
> > VM status: running
> > (qemu) migrate_set_parameter mode cpr-exec
> > (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming file:vm.state
> > (qemu) migrate -d file:vm.state
> > (qemu) QEMU 10.1.50 monitor - type 'help' for more information
> > (qemu) info status
> > VM status: running
> >
> > Steve Sistare (9):
> > migration: multi-mode notifier
> > migration: add cpr_walk_fd
> > oslib: qemu_clear_cloexec
> > vl: helper to request exec
> > migration: cpr-exec-command parameter
> > migration: cpr-exec save and load
> > migration: cpr-exec mode
> > migration: cpr-exec docs
> > vfio: cpr-exec mode
>
> The other thing is, as Vladimir is working on (what looks like) a cleaner
> way of passing FDs relying fully on unix sockets, I want to better
> understand the relationship between his work and the exec model.
>
> I still personally think we should always stick with unix sockets, but I'm
> open to being convinced by the above limitations. If exec is better than
> cpr-transfer in any way, the hope is more people can and should adopt it.
>
> We also have no answer yet on how cpr-exec can work in the container world
> with seccomp forbidding exec. I guess that's a no-go there. It's definitely
> a downside. Better to mention that in the cover letter.
>
> Thanks,
>
> --
> Peter Xu
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
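
For readers unfamiliar with the mechanism in the quoted "How?" section (preserving descriptors across exec by clearing the CLOEXEC flag), here is a minimal sketch in C. It is not the code from the series: clear_cloexec() and has_cloexec() are hypothetical helper names used only for illustration, and the only thing assumed is the standard POSIX fcntl() interface (F_GETFD, F_SETFD, FD_CLOEXEC).

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the series' qemu_clear_cloexec): clear the
 * FD_CLOEXEC flag on fd so the descriptor stays open across execve().
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    /* Keep any other descriptor flags, drop only close-on-exec. */
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: report whether FD_CLOEXEC is currently set.
 * Returns 1 if set, 0 if clear, -1 on error.
 */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

Once clear_cloexec(fd) succeeds, the same descriptor number remains valid in the program started by exec. The new program still has to be told which number corresponds to which resource; per the cover letter, that mapping (the unique name and value of each descriptor) is what gets sent via CPR state.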