qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>
Subject: Re: [PATCH V5 15/23] migration: cpr-transfer mode
Date: Tue, 28 Jan 2025 16:19:06 -0500	[thread overview]
Message-ID: <631f048e-22f7-43c8-bf19-7ecaf3072f7e@oracle.com> (raw)
In-Reply-To: <878qqvgnj8.fsf@pond.sub.org>

On 1/28/2025 6:56 AM, Markus Armbruster wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
>> On 1/17/2025 8:44 AM, Markus Armbruster wrote:
>>> Steven Sistare <steven.sistare@oracle.com> writes:
>>>
>>>> On 1/7/2025 7:05 AM, Markus Armbruster wrote:
>>>>> Steve Sistare <steven.sistare@oracle.com> writes:
>>>>>
>>>>>> Add the cpr-transfer migration mode, which allows the user to transfer
>>>>>> a guest to a new QEMU instance on the same host with minimal guest pause
>>>>>> time, by preserving guest RAM in place, albeit with new virtual addresses
>>>>>> in new QEMU, and by preserving device file descriptors.  Pages that were
>>>>>> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
>>>>>> descriptor of the device that locked them remains open.
>>>>>>
>>>>>> cpr-transfer preserves memory and devices descriptors by sending them to
>>>>>> new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
>>>>>> be sent over the normal migration channel, because devices and backends
>>>>>> are created prior to reading the channel,
>>>>>
>>>>> Is that an artifact of the way QEMU starts up, or is it fundamental?
>>>>
>>>> Hi Markus, welcome back and Happy New Year!
>>>
>>> Thank you!  A late happy new year for you, too!
>>
>> Sorry for the delay, I have been heads down preparing a new series.
> 
> No sweat :)
> 
>>>> This is a deeply ingrained artifact.  Changing it would require radically
>>>> refactoring the information passed on the command line vs the information
>>>> passed via monitor.
>>>
>>> More on this below.
>>>
>>>>>>                                               so this mode sends CPR state
>>>>>> over a second "cpr" migration channel.  New QEMU reads the cpr channel
>>>>>> prior to creating devices or backends.  The user specifies the cpr channel
>>>>>> in the channel arguments on the outgoing side, and in a second -incoming
>>>>>> command-line parameter on the incoming side.
>>>>>>
>>>>>> The user must start old QEMU with the the '-machine aux-ram-share=on' option,
>>>>>> which allows anonymous memory to be transferred in place to the new process
>>>>>> by transferring a memory descriptor for each ram block.  Memory-backend
>>>>>> objects must have the share=on attribute, but memory-backend-epc is not
>>>>>> supported.
>>>>>>
>>>>>> The user starts new QEMU on the same host as old QEMU, with command-line
>>>>>> arguments to create the same machine, plus the -incoming option for the
>>>>>> main migration channel, like normal live migration.  In addition, the user
>>>>>> adds a second -incoming option with channel type "cpr".  The CPR channel
>>>>>> address must be a type, such as unix socket, that supports SCM_RIGHTS.
>>>>>>
>>>>>> To initiate CPR, the user issues a migrate command to old QEMU, adding
>>>>>> a second migration channel of type "cpr" in the channels argument.
>>>>>> Old QEMU stops the VM, saves state to the migration channels, and enters
>>>>>> the postmigrate state.  New QEMU mmap's memory descriptors, and execution
>>>>>> resumes.
>>>>>>
>>>>>> The implementation splits qmp_migrate into start and finish functions.
>>>>>> Start sends CPR state to new QEMU, which responds by closing the CPR
>>>>>> channel.  Old QEMU detects the HUP then calls finish, which connects the
>>>>>> main migration channel.
>>>>>>
>>>>>> In summary, the usage is:
>>>>>>
>>>>>>      qemu-system-$arch -machine aux-ram-share=on ...
>>>>>>
>>>>>>      start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>>>>>>
>>>>>>      Issue commands to old QEMU:
>>>>>>        migrate_set_parameter mode cpr-transfer
>>>>>>
>>>>>>        {"execute": "migrate", ...
>>>>>>            {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> [...]
>>>>>
>>>>>> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
>>>>>> index f31deb3..2210f0c 100644
>>>>>> --- a/migration/vmstate-types.c
>>>>>> +++ b/migration/vmstate-types.c
>>>>>> @@ -15,6 +15,7 @@
>>>>>>     #include "qemu-file.h"
>>>>>>     #include "migration.h"
>>>>>>     #include "migration/vmstate.h"
>>>>>> +#include "migration/client-options.h"
>>>>>>     #include "qemu/error-report.h"
>>>>>>     #include "qemu/queue.h"
>>>>>>     #include "trace.h"
>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>> index a605dc2..35309dc 100644
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -614,9 +614,47 @@
>>>>>>     #     or COLO.
>>>>>>     #
>>>>>>     #     (since 8.2)
>>>>>> +#
>>>>>> +# @cpr-transfer: This mode allows the user to transfer a guest to a
>>>>>> +#     new QEMU instance on the same host with minimal guest pause
>>>>>> +#     time, by preserving guest RAM in place, albeit with new virtual
>>>>>> +#     addresses in new QEMU.  Devices and their pinned pages will also
>>>>>> +#     be preserved in a future QEMU release.
>>>>>
>>>>> Maybe "@cpr-transfer: Checkpoint and restart migration mode minimizes
>>>>> guest pause time by transferring guest RAM without copying it."
>>>>
>>>> "Checkpoint and restart migration mode" is ambiguous.  It would be
>>>> "Checkpoint and restart transfer migration mode".  That's a mouthful!
>>>> "This mode" is unambiguous, and matches the concise style of describing
>>>> options in this file.  Few if any of the options in this file repeat the
>>>> name of the option in the description.
>>>
>>> True.  But will readers understand what "CPR" stands for?  Do they need
>>> to understand?
>>
>> No, IMO they do not need to know the full spelling of the acronym to use the
>> functionality.  It is spelled out in a few places now.  It could be spelled
>> out in more places in the future.
> 
> Alright.
> 
>>>>> If you want to mention the guest RAM mapping differs between old and new
>>>>> QEMU, that's fine, but it's also detail, so I'd do it further down.
>>>>
>>>> I'll strike it.  I agree the user does not need to know.
>>>>
>>>>>> +#
>>>>>> +#     The user starts new QEMU on the same host as old QEMU, with
>>>>>> +#     command-line arguments to create the same machine, plus the
>>>>>> +#     -incoming option for the main migration channel, like normal
>>>>>> +#     live migration.  In addition, the user adds a second -incoming
>>>>>> +#     option with channel type "cpr".  The CPR channel address must
>>>>>> +#     be a type, such as unix socket, that supports SCM_RIGHTS.
>>>>>
>>>>> Permit me to indulge in a bit of pedantry...  A channel address doesn't
>>>>> support SCM_RIGHTS, only a channel may.  A channel supports it when it
>>>>> is backed by a UNIX domain socket.  The channel's socket's transport
>>>>> type need not be 'unix' for that, it could also be 'fd'.
>>>>>
>>>>> Suggest something like "This CPR channel must be a UNIX domain socket."
>>>>>
>>>>> If you want to say why, that's fine: "This CPR channel must support file
>>>>> descriptor transfer, i.e. it must be a UNIX domain socket."
>>>>>
>>>>> If you want to mention SCM_RIGHTS, that's fine, too: "This CPR channel
>>>>> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
>>>>> UNIX domain socket."
>>>>
>>>> OK.
>>>>
>>>>>> +#
>>>>>> +#     To initiate CPR, the user issues a migrate command to old QEMU,
>>>>>> +#     adding a second migration channel of type "cpr" in the channels
>>>>>
>>>>> in the channel's
>>>>>
>>>>>> +#     argument.  Old QEMU stops the VM, saves state to the migration
>>>>>> +#     channels, and enters the postmigrate state.  Execution resumes
>>>>>> +#     in new QEMU.
>>>>>> +#
>>>>>> +#     New QEMU reads the CPR channel before opening a monitor, hence
>>>>>> +#     the CPR channel cannot be specified in the list of channels for
>>>>>> +#     a migrate-incoming command.  It may only be specified on the
>>>>>> +#     command line.
>>>>>
>>>>> This is a restriction that could conceivably be lifted in the future,
>>>>> right?
>>>>
>>>> Yes, but lifting it requires the big command-line vs monitor restructuring
>>>> I mentioned earlier.  IMO that is far enough in the future (and possibly never)
>>>> that adding a "Currently" disclaimer would be deceptive.
>>>
>>> Now I'm confused.
>>>
>>> Earlier, you explained why we can't simply send CPR state via the normal
>>> migration channel: we create devices and backends much earlier long
>>> before we receive from the migration channel.  Correct?
>>
>> Correct.
>>
>>> Here, you're documenting that the CPR channel can only be specified on
>>> the command line, not with migrate-incoming.  Isn't that a different
>>> problem?
>>
>> They are entangled problems.
>>
>> * cpr state must loaded before backends are created
> 
> I understand this is fundamental, i.e. CPR state is *required* to create
> backends.  Correct?

Correct.

>> * monitor must be created after backends are created
>>     (because the chardevs that define a monitor are backends).
> 
> This is an artifact of the way we configure monitors (use of a character
> backend) and create character backends (all at once at a certain point
> in startup).
> 
> Some management applications would prefer to use the command line just
> for setting up QMP, and do everything else in QMP.  Understandable!
> They need to use QMP anyway, and mostly eliminating the command line
> would simplify things.  Moreover, only QMP provides introspection.
> 
> To enable this, we need to provide a way to start a QMP monitor early
> enough to set up everything else.  If we pull this off, "monitor must be
> created after backends are created" is no longer true.

Yes.  It's a big change, which is why I ventured "unlikely to change any
time soon".

> Aside: in my view, QMP and command line are transports wrapping around
> an abstract interface.  I'd like to have them wrap around the *same*
> abstract interface.
> 
>> * migrate-incoming must come after the monitor is created
> 
> Fundamental.
> 
>> ==> migrate-incoming cannot specify the cpr-state channel
>>
>> This restriction is unlikely to change any time soon, if ever.
> 
> I wouldn't bet my own money on this :)

OK, "if ever" is too pessimistic.

>> I have documented it as is, without speculating about future state.
>> If users do not like it they can request changes.
> 
> I'm not objecting to your doc text.  I asked to better understand your
> thinking, because I needed that for a competent review.
> 
>>>>> What happens if a user tries to specify it with migrate-incoming?  Fails
>>>>> cleanly?  What's the error message?
>>>>
>>>> It fails cleanly with a pre-existing error message that could be more
>>>> descriptive, so I will improve it, thanks.
>>>>
>>>>> Maybe simply:
>>>>>
>>>>>             Currently, the CPR channel can only be specified on the command
>>>>>             line, not with the migrate-incoming command.
>>>>>
>>>>> with a big, fat comment explaining the restriction next to the spot
>>>>> that reports the error.
>>>>>
>>>>>> +#
>>>>>> +#     The main channel address cannot be a file type, and for the tcp
>>>>>> +#     type, the port cannot be 0 (meaning dynamically choose a port).
>>>>>
>>>>> What's "the tcp type"?  URI "tcp:..." / channel
>>>>> addr.transport=socket,addr.type=inet?
>>>>
>>>> I will clarify.
>>>>
>>>>>> +#
>>>>>> +#     Memory-backend objects must have the share=on attribute, but
>>>>>> +#     memory-backend-epc is not supported.  The VM must be started
>>>>>> +#     with the '-machine aux-ram-share=on' option.
>>>>>> +#
>>>>>> +#     When using -incoming defer, you must issue the migrate command
>>>>>> +#     to old QEMU before issuing any monitor commands to new QEMU.
>>>>>> +#     However, new QEMU does not open and read the migration stream
>>>>>> +#     until you issue the migrate incoming command.
>>>>>
>>>>> I think some (all?) instances of "old QEMU" and "new QEMU" would read
>>>>> better as "the old QEMU" and "the new QEMU".
>>>>
>>>> Maybe slightly,
>>>
>>> A second opinion from a native speaker would be nice.
>>
>> I have appreciated all your feedback and made many changes, and it has made
>> the code and documentation better.  Thanks for that. But right now, the V9
>> patches are queued, pass CI, and are ready to roll. I would hope at this point
>> that no one would consider "old QEMU" vs "the old QEMU" to be a showstopper
>> that requires new patches to be posted.
> 
> Certainly not a blocker.  If we decide it's an improvement we want, we
> can easily do it on top.
> 
> I routinely point out lots of little things that aren't blockers.  I try
> to be clear what is a blocker and what isn't, but I'm far from perfect
> there.  Thanks for your patience!

Thanks.  Sounds like we're good, and Fabiano can proceed with integration.

- Steve



  reply	other threads:[~2025-01-28 21:20 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-24 16:16 [PATCH V5 00/23] Live update: cpr-transfer Steve Sistare
2024-12-24 16:16 ` [PATCH V5 01/23] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2024-12-24 16:56   ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 02/23] physmem: qemu_ram_alloc_from_fd extensions Steve Sistare
2024-12-24 17:18   ` Peter Xu
2025-01-02 18:36     ` Steven Sistare
2025-01-02 19:48       ` Peter Xu
2025-01-02 20:03         ` Steven Sistare
2024-12-24 16:16 ` [PATCH V5 03/23] physmem: fd-based shared memory Steve Sistare
2024-12-24 17:27   ` Peter Xu
2025-01-02 18:34     ` Steven Sistare
2024-12-24 16:16 ` [PATCH V5 04/23] memory: add RAM_PRIVATE Steve Sistare
2024-12-24 16:16 ` [PATCH V5 05/23] machine: aux-ram-share option Steve Sistare
2024-12-24 16:16 ` [PATCH V5 06/23] migration: cpr-state Steve Sistare
2024-12-24 16:16 ` [PATCH V5 07/23] physmem: preserve ram blocks for cpr Steve Sistare
2024-12-24 17:32   ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 08/23] hostmem-memfd: preserve " Steve Sistare
2024-12-24 16:16 ` [PATCH V5 09/23] hostmem-shm: " Steve Sistare
2024-12-24 16:16 ` [PATCH V5 10/23] migration: enhance migrate_uri_parse Steve Sistare
2024-12-24 17:48   ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 11/23] migration: incoming channel Steve Sistare
2024-12-24 17:51   ` Peter Xu
2024-12-24 16:16 ` [PATCH V5 12/23] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-12-24 16:16 ` [PATCH V5 13/23] migration: VMSTATE_FD Steve Sistare
2024-12-24 16:16 ` [PATCH V5 14/23] migration: cpr-transfer save and load Steve Sistare
2024-12-24 16:17 ` [PATCH V5 15/23] migration: cpr-transfer mode Steve Sistare
2024-12-24 19:24   ` Peter Xu
2025-01-02 19:21     ` Steven Sistare
2025-01-02 19:57       ` Peter Xu
2025-01-02 20:05         ` Steven Sistare
2025-01-07 12:05   ` Markus Armbruster
2025-01-07 15:38     ` Steven Sistare
2025-01-17 13:44       ` Markus Armbruster
2025-01-27 16:35         ` Steven Sistare
2025-01-28 11:56           ` Markus Armbruster
2025-01-28 21:19             ` Steven Sistare [this message]
2025-01-28 21:30             ` Steven Sistare
2025-01-29  6:19   ` Markus Armbruster
2024-12-24 16:17 ` [PATCH V5 16/23] migration-test: memory_backend Steve Sistare
2024-12-24 16:17 ` [PATCH V5 17/23] tests/qtest: optimize migrate_set_ports Steve Sistare
2024-12-24 19:26   ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 18/23] tests/qtest: defer connection Steve Sistare
2024-12-24 19:27   ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 19/23] migration-test: " Steve Sistare
2024-12-24 16:17 ` [PATCH V5 20/23] tests/qtest: enhance migration channels Steve Sistare
2024-12-24 19:48   ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 21/23] tests/qtest: assert qmp_ready Steve Sistare
2024-12-24 19:54   ` Peter Xu
2025-01-02 18:36     ` Steven Sistare
2024-12-24 16:17 ` [PATCH V5 22/23] migration-test: cpr-transfer Steve Sistare
2024-12-24 20:01   ` Peter Xu
2024-12-24 20:06     ` Peter Xu
2025-01-02 18:35       ` Steven Sistare
2025-01-02 20:11         ` Peter Xu
2025-01-02 18:35     ` Steven Sistare
2025-01-02 20:09       ` Peter Xu
2025-01-02 20:12   ` Peter Xu
2024-12-24 16:17 ` [PATCH V5 23/23] migration: cpr-transfer documentation Steve Sistare
2024-12-24 20:02   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=631f048e-22f7-43c8-bf19-7ecaf3072f7e@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).