Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework


From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: "Lukas Straub" <lukasstraub2@web.de>,
	qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Lukáš Doktor" <ldoktor@redhat.com>,
	"Juan Quintela" <quintela@trasno.org>,
	"Zhang Chen" <zhangckid@gmail.com>,
	zhanghailiang@xfusion.com, "Li Zhijian" <lizhijian@fujitsu.com>,
	"Jason Wang" <jasowang@redhat.com>
Subject: Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Date: Wed, 21 Jan 2026 17:31:32 +0000
Message-ID: <aXENdA6DP5j0ETIU@gallifrey>
In-Reply-To: <aXEG73I8tJyhpn69@x1.local>

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jan 21, 2026 at 01:25:32AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:
> > 
> > <snip>
> > 
> > > > >   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
> > > > >       whole checkpoint is applied.
> > > > > 
> > > > >       To be explicit, consider qemu_load_device_state() when the process of
> > > > >       colo_incoming_process_checkpoint() failed.  It means SVM applied
> > > > >       partial of PVM's checkpoint, I think it should mean PVM is completely
> > > > >       corrupted.
> > > > 
> > > > As long as the SVM has got the entire checkpoint, then it *can* apply it all
> > > > and carry on from that point.
> > > 
> > > Does it mean we assert() that qemu_load_device_state() will always succeed
> > > for COLO syncs?
> > 
> > Not sure; I'd expect if that load fails then the SVM fails; if that happens
> > on a periodic checkpoint then the PVM should carry on.
> 
> Hmm right, if qemu_load_device_state() failed, likely PVM is still alive.
> 
> > 
> > > Logically post_load() can invoke anything and I'm not sure if something can
> > > start to fail, but I confess I don't know an existing device that can
> > > trigger it.
> > 
> > Like a postcopy, it shouldn't fail unless there's an underlying failure
> > (e.g. storage died)
> 
> Postcopy can definitely fail at post_load()..  Actually Juraj just fixed it
> for 10.2 here so postcopy can now fail properly while save/load device
> states (we used to hang):
> 
> https://lore.kernel.org/r/20251103183301.3840862-1-jmarcin@redhat.com

Ah good.

> The two major causes that can fail postcopy vmstate load that I hit (while
> looking at bugs after you left; I wished you are still here!):
> 
> (1) KVM put() failures due to kernel version mismatch, or,
> 
> (2) virtio post_load() failures due to e.g. virtio feature unsupported.
> 
> Both of them fall into "unsupported dest kernel version" realm, though, so
> indeed it may not affect COLO, as I expect COLO should have two hosts to
> run the same kernel.

Right.

> > > Lukas told me something was broken though with pc machine type, on
> > > post_load() not re-entrant.  I think it might be possible though when
> > > post_load() is relevant to some device states (that guest driver can change
> > > between two checkpoint loads), but that's still only theoretical.  So maybe
> > > we can indeed assert it here.
> > 
> > I don't understand that non re-entrant bit?
> 
> It may not be the exact wording, the message is here:
> 
> https://lore.kernel.org/r/20260115233500.26fd1628@penguin
> 
>         There is a bug in the emulated ahci disk controller which crashes
>         when its vmstate is loaded more than once.
> 
> I was expecting it's a post_load() because normal scalar vmstates should be
> fine to be loaded more than once.  I didn't look deeper.

Oh I see, multiple calls to post-load rather than calls nested inside each other;
yeh that makes sense - some things aren't expecting that.
But again, you're likely to find that out pretty quickly either way; it's not
something that is made worse by regular checkpointing.

<snip>

> > Oh, I think I've remembered why it's necessary to split it into RAM and non-RAM;
> > you can't parse a non-RAM stream and know when you've got an EOF flag in the stream;
> > especially for stuff that's open coded (like some of virtio);   so there's
> 
> Shouldn't customized get()/put() at least still be wrapped with a
> QEMU_VM_SECTION_FULL section?

Yes - but the VM_SECTION wrapper doesn't tell you how long the data in the
section is; you have to walk your vmstate structures, decoding the data
(and possibly doing magic get()/put()'s) and at the end hoping
you hit a VMS_END (which I added just to spot screwups in this process).
So there's no way to 'read the whole of a VM_SECTION' - because you don't
know you've hit the end until you've decoded it.
(And some of those get() calls are open-coded list storage, which look something
like

  do {
      x = get();
      if (x & flag)
          break;

      read more data
  } while (...)

so on those you're really hoping you hit the flag.)
I did turn some get()/put()'s into vmstate a while back; but those open
coded loops are really hard, there's a lot of variation.
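
To make the "you don't know you've hit the end until you've decoded it"
point concrete, here is a minimal sketch of a flag-terminated list. The
encoding, TOY_END_FLAG and toy_list_len() are all made up for illustration;
this is not QEMU's wire format or API. Each entry is a header byte followed
by (header & 0x7f) payload bytes, and an entry with the flag set ends the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_END_FLAG 0x80   /* hypothetical end-of-list marker */

/*
 * Walk a toy open-coded list and return how many bytes it occupies,
 * or -1 if the buffer runs out before the end flag is seen.
 * The length is only known once every entry has been decoded.
 */
static long toy_list_len(const uint8_t *buf, size_t avail)
{
    size_t pos = 0;

    assert(buf != NULL);
    while (pos < avail) {
        uint8_t hdr = buf[pos++];

        if (hdr & TOY_END_FLAG) {
            return (long)pos;       /* only now do we know where it ends */
        }

        size_t n = hdr & 0x7f;      /* payload length of this entry */
        if (n > avail - pos) {
            return -1;              /* truncated payload */
        }
        pos += n;                   /* the "read more data" step */
    }
    return -1;                      /* never hit the flag */
}
```

Note that payload bytes may themselves have the top bit set, so scanning
the raw bytes for the flag value without decoding would give the wrong
answer; the reader has to follow the structure entry by entry.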

> > no way to write a 'load until EOF' into a simple RAM buffer; you need to be
> > given an explicit size to know how much to expect.
> > 
> > You could do it for the RAM, but you'd need to write a protocol parser
> > to follow the stream to watch for the EOF.  It's actually harder with multifd;
> > how would you make a temporary buffer with multiple streams like that?
> 
> My understanding is postcopy must need a buffer because postcopy needs page
> request to work even during loading vmstates.  I don't see it required for
> COLO, though..

Right, that's true for postcopy; but then the only way to load the stream into
that buffer is to load it all at once because of the vmstate problem above.
(and because in the original postcopy we needed the original fd free
for page requests; you might be able to avoid that with multifd now)
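
For contrast, a rough sketch of why an explicit size makes buffering easy:
if the sender prefixes the blob with its length, the receiver can copy it
into a temporary buffer without understanding the contents at all. The
4-byte big-endian prefix and toy_read_frame() are assumptions for
illustration only, not what QEMU actually puts on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy one length-prefixed frame out of 'in' (in_len bytes available).
 * Hypothetical framing: 4-byte big-endian payload size, then the payload.
 * Returns a malloc'd copy of the payload and stores its size in *out_len,
 * or returns NULL if the frame is incomplete or allocation fails.
 */
static uint8_t *toy_read_frame(const uint8_t *in, size_t in_len,
                               size_t *out_len)
{
    assert(in != NULL && out_len != NULL);

    if (in_len < 4) {
        return NULL;                /* size prefix not complete yet */
    }

    uint32_t size = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (size > in_len - 4) {
        return NULL;                /* payload not complete yet */
    }

    uint8_t *buf = malloc(size ? size : 1);
    if (!buf) {
        return NULL;
    }
    memcpy(buf, in + 4, size);      /* no decoding of the payload needed */
    *out_len = size;
    return buf;
}
```

The caller owns the returned buffer and frees it once the payload has been
parsed; the point is only that the copy step needs the size and nothing else.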

> I'll try to see if I can change COLO to use the generic precopy way of
> dumping vmstate, then I'll know if I missed something, and what I've
> missed..

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


