From: Steven Sistare <steven.sistare@oracle.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
Fabiano Rosas <farosas@suse.de>,
David Hildenbrand <david@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>
Subject: Re: [PATCH V5 15/23] migration: cpr-transfer mode
Date: Tue, 28 Jan 2025 16:19:06 -0500 [thread overview]
Message-ID: <631f048e-22f7-43c8-bf19-7ecaf3072f7e@oracle.com> (raw)
In-Reply-To: <878qqvgnj8.fsf@pond.sub.org>
On 1/28/2025 6:56 AM, Markus Armbruster wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
>> On 1/17/2025 8:44 AM, Markus Armbruster wrote:
>>> Steven Sistare <steven.sistare@oracle.com> writes:
>>>
>>>> On 1/7/2025 7:05 AM, Markus Armbruster wrote:
>>>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>>>
>>>>>> Add the cpr-transfer migration mode, which allows the user to transfer
>>>>>> a guest to a new QEMU instance on the same host with minimal guest pause
>>>>>> time, by preserving guest RAM in place, albeit with new virtual addresses
>>>>>> in new QEMU, and by preserving device file descriptors. Pages that were
>>>>>> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
>>>>>> descriptor of the device that locked them remains open.
>>>>>>
>>>>>> cpr-transfer preserves memory and devices descriptors by sending them to
>>>>>> new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
>>>>>> be sent over the normal migration channel, because devices and backends
>>>>>> are created prior to reading the channel,
>>>>>
>>>>> Is that an artifact of the way QEMU starts up, or is it fundamental?
>>>>
>>>> Hi Markus, welcome back and Happy New Year!
>>>
>>> Thank you! A late happy new year for you, too!
>>
>> Sorry for the delay, I have been heads down preparing a new series.
>
> No sweat :)
>
>>>> This is a deeply ingrained artifact. Changing it would require radically
>>>> refactoring the information passed on the command line vs the information
>>>> passed via monitor.
>>>
>>> More on this below.
>>>
>>>>>> so this mode sends CPR state
>>>>>> over a second "cpr" migration channel. New QEMU reads the cpr channel
>>>>>> prior to creating devices or backends. The user specifies the cpr channel
>>>>>> in the channel arguments on the outgoing side, and in a second -incoming
>>>>>> command-line parameter on the incoming side.
>>>>>>
>>>>>> The user must start old QEMU with the the '-machine aux-ram-share=on' option,
>>>>>> which allows anonymous memory to be transferred in place to the new process
>>>>>> by transferring a memory descriptor for each ram block. Memory-backend
>>>>>> objects must have the share=on attribute, but memory-backend-epc is not
>>>>>> supported.
>>>>>>
>>>>>> The user starts new QEMU on the same host as old QEMU, with command-line
>>>>>> arguments to create the same machine, plus the -incoming option for the
>>>>>> main migration channel, like normal live migration. In addition, the user
>>>>>> adds a second -incoming option with channel type "cpr". The CPR channel
>>>>>> address must be a type, such as unix socket, that supports SCM_RIGHTS.
>>>>>>
>>>>>> To initiate CPR, the user issues a migrate command to old QEMU, adding
>>>>>> a second migration channel of type "cpr" in the channels argument.
>>>>>> Old QEMU stops the VM, saves state to the migration channels, and enters
>>>>>> the postmigrate state. New QEMU mmap's memory descriptors, and execution
>>>>>> resumes.
>>>>>>
>>>>>> The implementation splits qmp_migrate into start and finish functions.
>>>>>> Start sends CPR state to new QEMU, which responds by closing the CPR
>>>>>> channel. Old QEMU detects the HUP then calls finish, which connects the
>>>>>> main migration channel.
>>>>>>
>>>>>> In summary, the usage is:
>>>>>>
>>>>>> qemu-system-$arch -machine aux-ram-share=on ...
>>>>>>
>>>>>> start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>>>>>>
>>>>>> Issue commands to old QEMU:
>>>>>> migrate_set_parameter mode cpr-transfer
>>>>>>
>>>>>> {"execute": "migrate", ...
>>>>>> {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> [...]
>>>>>
>>>>>> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
>>>>>> index f31deb3..2210f0c 100644
>>>>>> --- a/migration/vmstate-types.c
>>>>>> +++ b/migration/vmstate-types.c
>>>>>> @@ -15,6 +15,7 @@
>>>>>> #include "qemu-file.h"
>>>>>> #include "migration.h"
>>>>>> #include "migration/vmstate.h"
>>>>>> +#include "migration/client-options.h"
>>>>>> #include "qemu/error-report.h"
>>>>>> #include "qemu/queue.h"
>>>>>> #include "trace.h"
>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>> index a605dc2..35309dc 100644
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -614,9 +614,47 @@
>>>>>> # or COLO.
>>>>>> #
>>>>>> # (since 8.2)
>>>>>> +#
>>>>>> +# @cpr-transfer: This mode allows the user to transfer a guest to a
>>>>>> +# new QEMU instance on the same host with minimal guest pause
>>>>>> +# time, by preserving guest RAM in place, albeit with new virtual
>>>>>> +# addresses in new QEMU. Devices and their pinned pages will also
>>>>>> +# be preserved in a future QEMU release.
>>>>>
>>>>> Maybe "@cpr-transfer: Checkpoint and restart migration mode minimizes
>>>>> guest pause time by transferring guest RAM without copying it."
>>>>
>>>> "Checkpoint and restart migration mode" is ambiguous. It would be
>>>> "Checkpoint and restart transfer migration mode". That's a mouthful!
>>>> "This mode" is unambiguous, and matches the concise style of describing
>>>> options in this file. Few if any of the options in this file repeat the
>>>> name of the option in the description.
>>>
>>> True. But will readers understand what "CPR" stands for? Do they need
>>> to understand?
>>
>> No, IMO they do not need to know the full spelling of the acronym to use the
>> functionality. It is spelled out in a few places now. It could be spelled
>> out in more places in the future.
>
> Alright.
>
>>>>> If you want to mention the guest RAM mapping differs between old and new
>>>>> QEMU, that's fine, but it's also detail, so I'd do it further down.
>>>>
>>>> I'll strike it. I agree the user does not need to know.
>>>>
>>>>>> +#
>>>>>> +# The user starts new QEMU on the same host as old QEMU, with
>>>>>> +# command-line arguments to create the same machine, plus the
>>>>>> +# -incoming option for the main migration channel, like normal
>>>>>> +# live migration. In addition, the user adds a second -incoming
>>>>>> +# option with channel type "cpr". The CPR channel address must
>>>>>> +# be a type, such as unix socket, that supports SCM_RIGHTS.
>>>>>
>>>>> Permit me to indulge in a bit of pedantry... A channel address doesn't
>>>>> support SCM_RIGHTS, only a channel may. A channel supports it when it
>>>>> is backed by a UNIX domain socket. The channel's socket's transport
>>>>> type need not be 'unix' for that, it could also be 'fd'.
>>>>>
>>>>> Suggest something like "This CPR channel must be a UNIX domain socket."
>>>>>
>>>>> If you want to say why, that's fine: "This CPR channel must support file
>>>>> descriptor transfer, i.e. it must be a UNIX domain socket."
>>>>>
>>>>> If you want to mention SCM_RIGHTS, that's fine, too: "This CPR channel
>>>>> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
>>>>> UNIX domain socket."
>>>>
>>>> OK.
>>>>
>>>>>> +#
>>>>>> +# To initiate CPR, the user issues a migrate command to old QEMU,
>>>>>> +# adding a second migration channel of type "cpr" in the channels
>>>>>
>>>>> in the channel's
>>>>>
>>>>>> +# argument. Old QEMU stops the VM, saves state to the migration
>>>>>> +# channels, and enters the postmigrate state. Execution resumes
>>>>>> +# in new QEMU.
>>>>>> +#
>>>>>> +# New QEMU reads the CPR channel before opening a monitor, hence
>>>>>> +# the CPR channel cannot be specified in the list of channels for
>>>>>> +# a migrate-incoming command. It may only be specified on the
>>>>>> +# command line.
>>>>>
>>>>> This is a restriction that could conceivably be lifted in the future,
>>>>> right?
>>>>
>>>> Yes, but lifting it requires the big command-line vs monitor restructuring
>>>> I mentioned earlier. IMO that is far enough in the future (and possibly never)
>>>> that adding a "Currently" disclaimer would be deceptive.
>>>
>>> Now I'm confused.
>>>
>>> Earlier, you explained why we can't simply send CPR state via the normal
>>> migration channel: we create devices and backends much earlier long
>>> before we receive from the migration channel. Correct?
>>
>> Correct.
>>
>>> Here, you're documenting that the CPR channel can only be specified on
>>> the command line, not with migrate-incoming. Isn't that a different
>>> problem?
>>
>> They are entangled problems.
>>
>> * cpr state must loaded before backends are created
>
> I understand this is fundamental, i.e. CPR state is *required* to create
> backends. Correct?
Correct.
>> * monitor must be created after backends are created
>> (because the chardevs that define a monitor are backends).
>
> This is an artifact of the way we configure monitors (use of a character
> backend) and create character backends (all at once at a certain point
> in startup).
>
> Some management applications would prefer to use the command line just
> for setting up QMP, and do everything else in QMP. Understandable!
> They need to use QMP anyway, and mostly eliminating the command line
> would simplify things. Moreover, only QMP provides introspection.
>
> To enable this, we need to provide a way to start a QMP monitor early
> enough to set up everything else. If we pull this off, "monitor must be
> created after backends are created" is no longer true.
Yes. It's a big change, which is why I ventured "unlikely to change any
time soon".
> Aside: in my view, QMP and command line are transports wrapping around
> an abstract interface. I'd like to have them wrap around the *same*
> abstract interface.
>
>> * migrate-incoming must come after the monitor is created
>
> Fundamental.
>
>> ==> migrate-incoming cannot specify the cpr-state channel
>>
>> This restriction is unlikely to change any time soon, if ever.
>
> I wouldn't bet my own money on this :)
OK, "if ever" is too pessimistic.
>> I have documented it as is, without speculating about future state.
>> If users do not like it they can request changes.
>
> I'm not objecting to your doc text. I asked to better understand your
> thinking, because I needed that for a competent review.
>
>>>>> What happens if a user tries to specify it with migrate-incoming? Fails
>>>>> cleanly? What's the error message?
>>>>
>>>> It fails cleanly with a pre-existing error message that could be more
>>>> descriptive, so I will improve it, thanks.
>>>>
>>>>> Maybe simply:
>>>>>
>>>>> Currently, the CPR channel can only be specified on the command
>>>>> line, not with the migrate-incoming command.
>>>>>
>>>>> with a big, fat comment explaining the restriction next to the spot
>>>>> that reports the error.
>>>>>
>>>>>> +#
>>>>>> +# The main channel address cannot be a file type, and for the tcp
>>>>>> +# type, the port cannot be 0 (meaning dynamically choose a port).
>>>>>
>>>>> What's "the tcp type"? URI "tcp:..." / channel
>>>>> addr.transport=socket,addr.type=inet?
>>>>
>>>> I will clarify.
>>>>
>>>>>> +#
>>>>>> +# Memory-backend objects must have the share=on attribute, but
>>>>>> +# memory-backend-epc is not supported. The VM must be started
>>>>>> +# with the '-machine aux-ram-share=on' option.
>>>>>> +#
>>>>>> +# When using -incoming defer, you must issue the migrate command
>>>>>> +# to old QEMU before issuing any monitor commands to new QEMU.
>>>>>> +# However, new QEMU does not open and read the migration stream
>>>>>> +# until you issue the migrate incoming command.
>>>>>
>>>>> I think some (all?) instances of "old QEMU" and "new QEMU" would read
>>>>> better as "the old QEMU" and "the new QEMU".
>>>>
>>>> Maybe slightly,
>>>
>>> A second opinion from a native speaker would be nice.
>>
>> I have appreciated all your feedback and made many changes, and it has made
>> the code and documentation better. Thanks for that. But right now, the V9
>> patches are queued, pass CI, and are ready to roll. I would hope at this point
>> that no one would consider "old QEMU" vs "the old QEMU" to be a showstopper
>> that requires new patches to be posted.
>
> Certainly not a blocker. If we decide it's an improvement we want, we
> can easily do it on top.
>
> I routinely point out lots of little things that aren't blockers. I try
> to be clear what is a blocker and what isn't, but I'm far from perfect
> there. Thanks for your patience!
Thanks. Sounds like we're good, and Fabiano can proceed with integration.
- Steve
next prev parent reply other threads:[~2025-01-28 21:20 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-24 16:16 [PATCH V5 00/23] Live update: cpr-transfer Steve Sistare
2024-12-24 16:16 ` [PATCH V5 01/23] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2024-12-24 16:56 ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 02/23] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
2024-12-24 17:18 ` Peter Xu
2025-01-02 18:36 ` Steven Sistare
2025-01-02 19:48 ` Peter Xu
2025-01-02 20:03 ` Steven Sistare
2024-12-24 16:16 ` [PATCH V5 03/23] physmem: fd-based shared memory Steve Sistare
2024-12-24 17:27 ` Peter Xu
2025-01-02 18:34 ` Steven Sistare
2024-12-24 16:16 ` [PATCH V5 04/23] memory: add RAM_PRIVATE Steve Sistare
2024-12-24 16:16 ` [PATCH V5 05/23] machine: aux-ram-share option Steve Sistare
2024-12-24 16:16 ` [PATCH V5 06/23] migration: cpr-state Steve Sistare
2024-12-24 16:16 ` [PATCH V5 07/23] physmem: preserve ram blocks for cpr Steve Sistare
2024-12-24 17:32 ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 08/23] hostmem-memfd: preserve " Steve Sistare
2024-12-24 16:16 ` [PATCH V5 09/23] hostmem-shm: " Steve Sistare
2024-12-24 16:16 ` [PATCH V5 10/23] migration: enhance migrate_uri_parse Steve Sistare
2024-12-24 17:48 ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 11/23] migration: incoming channel Steve Sistare
2024-12-24 17:51 ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 12/23] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-12-24 16:16 ` [PATCH V5 13/23] migration: VMSTATE_FD Steve Sistare
2024-12-24 16:16 ` [PATCH V5 14/23] migration: cpr-transfer save and load Steve Sistare
2024-12-24 16:17 ` [PATCH V5 15/23] migration: cpr-transfer mode Steve Sistare
2024-12-24 19:24 ` Peter Xu
2025-01-02 19:21 ` Steven Sistare
2025-01-02 19:57 ` Peter Xu
2025-01-02 20:05 ` Steven Sistare
2025-01-07 12:05 ` Markus Armbruster
2025-01-07 15:38 ` Steven Sistare
2025-01-17 13:44 ` Markus Armbruster
2025-01-27 16:35 ` Steven Sistare
2025-01-28 11:56 ` Markus Armbruster
2025-01-28 21:19 ` Steven Sistare [this message]
2025-01-28 21:30 ` Steven Sistare
2025-01-29 6:19 ` Markus Armbruster
2024-12-24 16:17 ` [PATCH V5 16/23] migration-test: memory_backend Steve Sistare
2024-12-24 16:17 ` [PATCH V5 17/23] tests/qtest: optimize migrate_set_ports Steve Sistare
2024-12-24 19:26 ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 18/23] tests/qtest: defer connection Steve Sistare
2024-12-24 19:27 ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 19/23] migration-test: " Steve Sistare
2024-12-24 16:17 ` [PATCH V5 20/23] tests/qtest: enhance migration channels Steve Sistare
2024-12-24 19:48 ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 21/23] tests/qtest: assert qmp_ready Steve Sistare
2024-12-24 19:54 ` Peter Xu
2025-01-02 18:36 ` Steven Sistare
2024-12-24 16:17 ` [PATCH V5 22/23] migration-test: cpr-transfer Steve Sistare
2024-12-24 20:01 ` Peter Xu
2024-12-24 20:06 ` Peter Xu
2025-01-02 18:35 ` Steven Sistare
2025-01-02 20:11 ` Peter Xu
2025-01-02 18:35 ` Steven Sistare
2025-01-02 20:09 ` Peter Xu
2025-01-02 20:12 ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 23/23] migration: cpr-transfer documentation Steve Sistare
2024-12-24 20:02 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=631f048e-22f7-43c8-bf19-7ecaf3072f7e@oracle.com \
--to=steven.sistare@oracle.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=marcel.apfelbaum@gmail.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).