From: Juraj Marcin <jmarcin@redhat.com>
To: qemu-devel@nongnu.org
Cc: Juraj Marcin <jmarcin@redhat.com>,
Jiri Denemark <jdenemar@redhat.com>, Stefan Weil <sw@weilnetz.de>,
Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
Fabiano Rosas <farosas@suse.de>
Subject: [RFC PATCH 0/4] migration: Introduce postcopy-setup capability and state
Date: Thu, 7 Aug 2025 13:49:08 +0200
Message-ID: <20250807114922.1013286-1-jmarcin@redhat.com>
When postcopy migration starts, the source side sends all
non-postcopiable device data in one package command and immediately
transitions to a "postcopy-active" state. However, if the destination
side fails to load the device data or crashes while loading it, the
source side stays paused indefinitely with no way to recover.
This series introduces a new "postcopy-setup" state during which the
destination side is guaranteed not to have been started yet, so the
source side can recover and resume, and the destination side can exit
gracefully.
The key element of this feature is isolating the postcopy-run command
from non-postcopiable data and sending it only after the destination
side acknowledges that it has loaded all devices and is ready to be
started. This is necessary because once the postcopy-run command is
sent, the source side cannot be sure whether the destination is running,
and therefore whether it can safely resume in case of a failure.
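
To make the ordering above concrete, here is a minimal sketch of the
source-side transitions as a toy state machine. All enum and function
names are hypothetical and chosen for illustration only; they are not
the identifiers used in the patches, and SRC_CANNOT_RESUME merely models
the "source cannot safely resume" outcome rather than an actual
migration state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical source-side states; not QEMU's MigrationStatus values. */
typedef enum {
    SRC_DEVICE,          /* non-postcopiable device data is being sent */
    SRC_POSTCOPY_SETUP,  /* package sent, postcopy-run NOT sent yet */
    SRC_POSTCOPY_ACTIVE, /* postcopy-run sent, destination may be running */
    SRC_RESUMED,         /* source VM resumed after a failure */
    SRC_CANNOT_RESUME,   /* destination might be running, unsafe to resume */
} SrcState;

/* Hypothetical events observed by the source side. */
typedef enum {
    EV_PACKAGE_SENT,     /* device data package has been sent */
    EV_DEST_READY_ACK,   /* destination loaded all devices and acked */
    EV_FAILURE,          /* destination or channel failure detected */
} SrcEvent;

/* The source may resume only while postcopy-run has not been sent. */
static bool src_may_resume(SrcState s)
{
    return s == SRC_POSTCOPY_SETUP;
}

static SrcState src_step(SrcState s, SrcEvent ev)
{
    switch (s) {
    case SRC_DEVICE:
        if (ev == EV_PACKAGE_SENT) {
            return SRC_POSTCOPY_SETUP;
        }
        break;
    case SRC_POSTCOPY_SETUP:
        if (ev == EV_DEST_READY_ACK) {
            /* Only at this point would postcopy-run be sent. */
            return SRC_POSTCOPY_ACTIVE;
        }
        if (ev == EV_FAILURE) {
            assert(src_may_resume(s));
            return SRC_RESUMED;
        }
        break;
    case SRC_POSTCOPY_ACTIVE:
        if (ev == EV_FAILURE) {
            return SRC_CANNOT_RESUME;
        }
        break;
    case SRC_RESUMED:
    case SRC_CANNOT_RESUME:
        break;
    }
    return s; /* events that do not apply leave the state unchanged */
}
```

In this sketch, a failure is recoverable exactly when it arrives before
the acknowledgement, which is the property the new state is meant to
provide.
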
Reusing the existing ping/pong messages was also considered, since
PING 3 is sent right before the postcopy-run command. However, there
are two reasons why the PING 3 reply might not be delivered to the
source side:
1. the destination machine failed, so it is not running and the source
side can resume,
2. there is a network failure, so delivery of the PING 3 reply fails,
but until TCP or another transport times out, the destination could
process the postcopy-run command and start, in which case the source
side cannot resume.
The source side cannot tell these two cases apart.
Furthermore, this series contains two more patches required for the
implementation of this feature, which make the listen thread joinable
for graceful cleanup and detach it explicitly otherwise, and one patch
fixing state transitions inside postcopy_start().
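
For readers less familiar with the join/detach distinction, the
following is a rough sketch using plain POSIX threads as a stand-in for
QEMU's thread wrappers. The function names and the completion flag are
hypothetical and do not come from the patches; the sketch only shows
that a joinable thread must be either joined (graceful cleanup) or
explicitly detached so that its resources are released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the listen thread body. */
static void *listen_thread_fn(void *opaque)
{
    atomic_int *done = opaque;

    atomic_store(done, 1); /* pretend the incoming stream was handled */
    return NULL;
}

/*
 * Create a joinable thread, then either wait for it (graceful cleanup)
 * or detach it so nobody has to join it later.
 * The caller must keep *done alive for the lifetime of the thread.
 * Returns 0 on success or a pthread error number.
 */
static int start_listen_thread(atomic_int *done, bool join_for_cleanup)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, listen_thread_fn, done);

    if (err != 0) {
        return err;
    }
    if (join_for_cleanup) {
        return pthread_join(tid, NULL);
    }
    return pthread_detach(tid);
}
```

Building this typically requires -pthread. When the thread is joined,
the flag is guaranteed to be set once start_listen_thread() returns; in
the detached case there is no such guarantee.
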
Such a feature (or a similar one) could also be useful for normal
(precopy-only) migration with return-path, to prevent issues when a
network failure happens just as the destination side shuts the
return-path. When I tested such a scenario (by filtering out the SHUT
command), the destination started and reported a successful migration,
while the source side reported a failed migration and tried to resume,
but exited because it failed to acquire the disk image file lock.
Another suggestion from Peter that I would like to discuss is that,
instead of introducing a new state, we could move the boundary between
the "device" and "postcopy-active" states to the point where the
postcopy-run command is actually sent (in this series, the boundary
between "postcopy-setup" and "postcopy-active"). However, I am not sure
whether such a change would have any unwanted implications.
Juraj Marcin (4):
  qemu-thread: Introduce qemu_thread_detach()
  migration: Fix state transition in postcopy_start() error handling
  migration: Make listen thread joinable
  migration: Introduce postcopy-setup capability and state

 include/qemu/thread.h                  |  1 +
 migration/migration.c                  | 77 +++++++++++++++++++++++---
 migration/migration.h                  |  7 +++
 migration/options.c                    | 16 ++++++
 migration/options.h                    |  1 +
 migration/postcopy-ram.c               |  7 +++
 migration/savevm.c                     | 53 ++++++++++++++++--
 qapi/migration.json                    | 19 ++++++-
 tests/qtest/migration/postcopy-tests.c | 55 ++++++++++++++++++
 tests/qtest/migration/precopy-tests.c  |  3 +-
 util/qemu-thread-posix.c               |  8 +++
 util/qemu-thread-win32.c               | 10 ++++
 12 files changed, 241 insertions(+), 16 deletions(-)
--
2.50.1