From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Peter Xu <peterx@redhat.com>
Cc: "Lukas Straub" <lukasstraub2@web.de>,
qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
"Fabiano Rosas" <farosas@suse.de>,
"Markus Armbruster" <armbru@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Lukáš Doktor" <ldoktor@redhat.com>,
"Juan Quintela" <quintela@trasno.org>,
"Zhang Chen" <zhangckid@gmail.com>,
zhanghailiang@xfusion.com, "Li Zhijian" <lizhijian@fujitsu.com>,
"Jason Wang" <jasowang@redhat.com>
Subject: Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Date: Fri, 16 Jan 2026 00:37:46 +0000
Message-ID: <aWmIWrXjxLsqwLd6@gallifrey>
In-Reply-To: <aWl6ixQpHaMJhV_E@x1.local>
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > Nack.
> > > >
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > >
> > > Please then rework that series and consider include the following (I
> > > believe I pointed out a long time ago somewhere..):
> > >
> >
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > For example, in your cluster deployment, using multifd can improve XXX
> > > by YYY. Please describe the use case and improvements.
> >
> > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > the snapshotting the less overhead there is.
>
> Thanks for chiming in, Dave. I can explain why I want to ask for some
> numbers.
>
> Firstly, numbers normally prove it's used in a real system. It's at least
> being used and seriously tested.
Fair.
> Secondly, per my very limited understanding of COLO... the two VMs in most
> cases should already be in sync when both sides generate the same
> network packets.
(It's about a decade since I did any serious COLO work, so I'll try and remember)
> Another sync (where multifd can start to take effect) is only needed when
> there are packet misalignments, but IIUC it should be rare. I don't know
> how rare it is; it would be good if Lukas could share some of those
> numbers from his deployment to help us understand COLO better if we'll need
> to keep it.
In reality misalignments are pretty common, although it's very workload
dependent. Any randomness in the order of execution in a multi-threaded
guest, for example, or in when a timer fires, can change the packet
generation. Once a mismatched packet is detected you can't transmit it
until the migration has completed, so the migration time becomes a
latency issue.
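To put a rough shape on that latency point, here is a back-of-envelope
sketch in Python. It is not QEMU code and none of the figures are
measurements: the dirty-page count, the per-thread throughput and the link
speed are all hypothetical, and it assumes the only effect of extra multifd
channels is to raise send throughput until the link is saturated.

```python
def checkpoint_latency_ms(dirty_pages, channels=1, page_size=4096,
                          per_thread_gbps=4.0, link_gbps=10.0):
    """Idealised time (ms) to ship one checkpoint's dirty pages.

    All parameters are illustrative assumptions, not QEMU defaults.
    Throughput scales with the number of channels but is capped by the link.
    """
    if channels < 1:
        raise ValueError("need at least one channel")
    dirty_bits = dirty_pages * page_size * 8
    effective_gbps = min(link_gbps, per_thread_gbps * channels)
    bits_per_ms = effective_gbps * 1e9 / 1000.0
    return dirty_bits / bits_per_ms


# 25600 pages of 4 KiB is 100 MiB of dirty memory (made-up workload).
single = checkpoint_latency_ms(25600, channels=1)
multi = checkpoint_latency_ms(25600, channels=4)
print(f"1 channel: {single:.1f} ms, 4 channels: {multi:.1f} ms")
```

Under those assumptions the mismatched packet is held back for the whole of
that time, which is why the snapshot speed shows up directly as guest network
latency.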
I think you still need to send a regular stream of snapshots even without
having *yet* received a packet difference. Now, I'm trying to remember the
reasoning; for a start, if you leave it too long between snapshots the
migration snapshot gets larger (which I think needs to be stored in RAM on
the dest?), and the chance of getting a packet difference from randomness
also increases.
I seem to remember there were clever schemes for working out the optimal
snapshot schedule.
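The simplest version of such a scheme, as I understand the idea, is
"snapshot on a mismatch, or when a timer expires, whichever comes first".
A minimal Python sketch of that policy follows; the function names and the
10 second interval are hypothetical and are not taken from the QEMU source.

```python
def should_checkpoint(now_ms, last_checkpoint_ms, mismatch_seen,
                      max_interval_ms=10_000):
    """Return True when a new snapshot should be started.

    Hypothetical policy, not QEMU code:
      * a detected packet mismatch forces a snapshot immediately;
      * otherwise snapshot once max_interval_ms has elapsed, so the
        accumulated difference stays bounded.
    """
    if mismatch_seen:
        return True
    return now_ms - last_checkpoint_ms >= max_interval_ms


def run(events, max_interval_ms=10_000):
    """Replay (time_ms, mismatch_seen) samples; return the snapshot times."""
    last = 0
    taken = []
    for now_ms, mismatch in events:
        if should_checkpoint(now_ms, last, mismatch, max_interval_ms):
            taken.append(now_ms)
            last = now_ms
    return taken


samples = [(1000, False), (4000, True), (9000, False),
           (14000, False), (15000, False)]
print(run(samples))
```

A cleverer scheme would adapt max_interval_ms to the workload rather than
keeping it fixed, but that is beyond what this sketch tries to show.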
> IIUC, the critical path of COLO shouldn't be migration on its own? It
> should be when heartbeat gets lost; that normally should happen when two
> VMs are in sync. In this path, I don't see how multifd helps.. because
> there's no migration happening, only the src recording what has changed.
> Hence I think some number with description of the measurements may help us
> understand how important multifd is to COLO.
There's more than one critical path:
a) Time to recovery when one host fails
b) Overhead when both hosts are happy.
> Supporting multifd will cause new COLO functions to inject into core
> migration code paths (even if not much..). I want to make sure such (new)
> complexity is justified. I also want to avoid introducing a feature only
> because "we have XXX, then let's support XXX in COLO too, maybe some day
> it'll be useful".
I can't remember where the COLO code got into the main migration paths;
is that the reception side storing the received differences somewhere else?
> After these days, I found removing code is sometimes harder than writing
> new..
Haha yes.
Dave
> Thanks,
>
> >
> > Lukas: Given COLO has a bunch of different features (i.e. the block
> > replication, the clever network comparison etc) do you know which ones
> > are used in the setups you are aware of?
> >
> > I'd guess the tricky part of a test would be the network side; I'm
> > not too sure how you'd set that in a test.
>
> --
> Peter Xu
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/