All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Straub <lukasstraub2@web.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Laurent Vivier <lvivier@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Zhang Chen <zhangckid@gmail.com>,
	Hailiang Zhang <zhanghailiang@xfusion.com>,
	Markus Armbruster <armbru@redhat.com>,
	Juan Quintela <quintela@trasno.org>
Subject: Re: [PATCH v2 4/8] multifd: Add COLO support
Date: Wed, 21 Jan 2026 20:00:30 +0100	[thread overview]
Message-ID: <20260121200030.30e8537c@penguin> (raw)
In-Reply-To: <aW-31rQky14G8_lq@x1.local>

[-- Attachment #1: Type: text/plain, Size: 9684 bytes --]

On Tue, 20 Jan 2026 12:13:58 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> > Like in the normal ram_load() path, put the received pages into the
> > colo cache and mark the pages in the bitmap so that they will be
> > flushed to the guest later.
> > 
> > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > for latency sensitive workloads. In such workloads the worst-case latency
> > is especially important.
> > 
> > Also, multifd migration is the preferred way to do migration nowadays and this
> > allows to use multifd compression with COLO.
> > 
> > Benchmark:
> > Cluster nodes
> >  - Intel Xenon E5-2630 v3
> >  - 48Gb RAM
> >  - 10G Ethernet
> > Guest
> >  - Windows Server 2016
> >  - 6Gb RAM
> >  - 4 cores
> > Workload
> >  - Upload a file to the guest with SMB to simulate moderate
> >    memory dirtying
> >  - Measure the memory transfer time portion of each checkpoint
> >  - 600ms COLO checkpoint interval
> > 
> > Results
> > Plain
> >  idle mean: 4.50ms 99per: 10.33ms
> >  load mean: 24.30ms 99per: 78.05ms
> > Multifd-4
> >  idle mean: 6.48ms 99per: 10.41ms
> >  load mean: 14.12ms 99per: 31.27ms  
> 
> Thanks for the numbers.  They're persuasive at least from 1st look.
> 
> Said that, one major question is, multifd should only help with throughput
> when cpu is a bottleneck sending, in your case it's 10Gbps NIC.  Normally
> any decent cpu should be able to push closer to 10Gbps even without
> multifd.
> 
> Per my previous experiences, multifd can only show a difference when the
> hosts have at least 25GBps+ bandwidth available.
> 
> Maybe you turned on compression already?  If so, worth stating the
> compressor methods chosen / parameters.

No, it's just the old Haswell CPU I guess. My clients also use it on
embedded platforms with embedded/mobile CPUs, so not necessarily the
fastest CPUs.

Btw, I forgot to mention, it's already worth it for the precopy phase
as it helps with converging. And also for my other patch that sends
dirty ram in the time between the checkpoints.

> 
> > 
> > Evaluation
> > While multifd has slightly higher latency when the guest idles, it is
> > 10ms faster under load and more importantly it's worst case latency is
> > less than 1/2 of plain under load as can be seen in the 99. Percentile.
> > 
> > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  MAINTAINERS              |  1 +
> >  migration/meson.build    |  2 +-
> >  migration/multifd-colo.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/multifd-colo.h | 26 +++++++++++++++++++++++++
> >  migration/multifd.c      | 12 ++++++++++++
> >  migration/multifd.h      |  1 +
> >  6 files changed, 90 insertions(+), 1 deletion(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 563804345fec68ee72793dbb7c1b7e5be4c32083..dbb217255c2cf35dc0ce971c2021b130fac5469b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3837,6 +3837,7 @@ COLO Framework
> >  M: Lukas Straub <lukasstraub2@web.de>
> >  S: Maintained
> >  F: migration/colo*
> > +F: migration/multifd-colo.*
> >  F: include/migration/colo.h
> >  F: include/migration/failover.h
> >  F: docs/COLO-FT.txt
> > diff --git a/migration/meson.build b/migration/meson.build
> > index 16909d54c5110fc5d8187fd3a68c4a5b08b59ea7..1e59fe4f1f0bbfffed90df38e8f39fa87bceb9b9 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -40,7 +40,7 @@ system_ss.add(files(
> >  ), gnutls, zlib)
> >  
> >  if get_option('replication').allowed()
> > -  system_ss.add(files('colo-failover.c', 'colo.c'))
> > +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
> >  else
> >    system_ss.add(files('colo-stubs.c'))
> >  endif
> > diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d8d98e79b12ed52c41f341052a682d7786e221b5
> > --- /dev/null
> > +++ b/migration/multifd-colo.c
> > @@ -0,0 +1,49 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo implementation
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "exec/target_page.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "ram.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "io/channel-socket.h"
> > +#include "migration/colo.h"
> > +#include "multifd-colo.h"
> > +#include "system/ramblock.h"
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> > +{
> > +    assert(p->block->colo_cache);
> > +
> > +    /*
> > +     * While we're still in precopy state (not yet in colo state), we copy
> > +     * received pages to both guest and cache. No need to set dirty bits,
> > +     * since guest and cache memory are in sync.
> > +     */
> > +    if (migration_incoming_in_colo_state()) {
> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> > +    }
> > +    p->host = p->block->colo_cache;  
> 
> I should have mentioned it while reviewing the previous version, anyway..
> 
> IMHO it would be better to have one place setting p->host instead of
> overwritting it.
> 
> So instead of hooking before ->recv(), we should do it in
> multifd_ram_unfill_packet(), moving the p->host update to the end of
> function and hook it there with COLO (so that you can still record the
> bitmaps, only after normal[]).

Okay, will fix that in the next version.

> 
> Another thing, which might be more important: you seem to have ignored
> zero[], but I think you need it.  zero[] keeps all pages that are zeros
> (which may not used to be zeros).  So IMHO you'll need to record them as
> dirty too in COLO's dest bitmap, otherwise you may hit hard to debug RAM
> corruptions.

Yes, that is a good catch. I missed that during forward porting.

> 
> > +}
> > +
> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
> > +{
> > +    if (!migration_incoming_in_colo_state()) {
> > +        for (int i = 0; i < p->normal_num; i++) {
> > +            void *guest = p->block->host + p->normal[i];
> > +            void *cache = p->host + p->normal[i];
> > +            memcpy(guest, cache, multifd_ram_page_size());
> > +        }
> > +    }
> > +    p->host = p->block->host;  
> 
> Is resetting the pointer required?  If not, we can skip it.

Not that I can see, i will remove this in the next version.

> 
> > +}
> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> > --- /dev/null
> > +++ b/migration/multifd-colo.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo header
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
> > +
> > +#ifdef CONFIG_REPLICATION
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
> > +
> > +#else
> > +
> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> > +
> > +#endif
> > +#endif
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 8e71171fb7a17726ba7eb0705e293c41e8aa32ec..6c85acec3bac134e85cfcee0d32057134f5af8d1 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -29,6 +29,7 @@
> >  #include "qemu-file.h"
> >  #include "trace.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "threadinfo.h"
> >  #include "options.h"
> >  #include "qemu/yank.h"
> > @@ -1269,7 +1270,18 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> >  {
> >      int ret;
> >  
> > +    if (migrate_colo()) {
> > +        multifd_colo_prepare_recv(p);
> > +    }
> > +
> >      ret = multifd_recv_state->ops->recv(p, errp);
> > +    if (ret != 0) {
> > +        return ret;
> > +    }
> > +
> > +    if (migrate_colo()) {
> > +        multifd_colo_process_recv(p);
> > +    }
> >  
> >      return ret;
> >  }
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 9b6d81e7ede024f05d4cd235de95e73840d0bbc4..7036f438fade1baed2442bfdcf8b5d6397c4a448 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -280,6 +280,7 @@ typedef struct {
> >      /* ramblock */
> >      RAMBlock *block;
> >      /* ramblock host address */
> > +    /* or points to the corresponding address in the colo cache */  
> 
> Nit: we can merge it with /* ... */, and some wording change:
> 
>        /*
>         * Normally, it points to ramblock's host address.  When COLO
>         * enabled, it points to the mirror cache for the ramblock.
>         */

Okay, will fix this.

> 
> >      uint8_t *host;
> >      /* buffers to recv */
> >      struct iovec *iov;
> > 
> > -- 
> > 2.39.5
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2026-01-21 19:01 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-17 14:09 [PATCH v2 0/8] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-01-17 14:09 ` [PATCH v2 1/8] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-01-20 17:32   ` Peter Xu
2026-01-22  9:54     ` Zhang Chen
2026-01-17 14:09 ` [PATCH v2 2/8] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
2026-01-20 17:32   ` Peter Xu
2026-01-22  9:54     ` Zhang Chen
2026-01-17 14:09 ` [PATCH v2 3/8] Move ram state receive into multifd_ram_state_recv() Lukas Straub
2026-01-20 17:14   ` Peter Xu
2026-01-17 14:09 ` [PATCH v2 4/8] multifd: Add COLO support Lukas Straub
2026-01-20 17:13   ` Peter Xu
2026-01-20 18:05     ` Daniel P. Berrangé
2026-01-20 19:18       ` Peter Xu
2026-01-21 19:00     ` Lukas Straub [this message]
2026-01-17 14:09 ` [PATCH v2 5/8] migration-test: Add COLO migration unit test Lukas Straub
2026-01-20 17:23   ` Peter Xu
2026-01-21 19:37     ` Lukas Straub
2026-01-25 17:18       ` Lukas Straub
2026-01-26 15:28         ` Peter Xu
2026-01-17 14:09 ` [PATCH v2 6/8] Convert colo main documentation to restructuredText Lukas Straub
2026-01-20 17:26   ` Peter Xu
2026-01-21 19:44     ` Lukas Straub
2026-01-17 14:09 ` [PATCH v2 7/8] qemu-colo.rst: Miscellaneous changes Lukas Straub
2026-01-20 17:30   ` Peter Xu
2026-01-17 14:09 ` [PATCH v2 8/8] qemu-colo.rst: Simplify the block replication setup Lukas Straub
2026-01-20 17:32   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260121200030.30e8537c@penguin \
    --to=lukasstraub2@web.de \
    --cc=armbru@redhat.com \
    --cc=farosas@suse.de \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@trasno.org \
    --cc=zhangckid@gmail.com \
    --cc=zhanghailiang@xfusion.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.