All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Straub <lukasstraub2@web.de>
To: Fabiano Rosas <farosas@suse.de>
Cc: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Zhang Chen <zhangckid@gmail.com>,
	Hailiang Zhang <zhanghailiang@xfusion.com>,
	Markus Armbruster <armbru@redhat.com>,
	Li Zhijian <lizhijian@fujitsu.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>,
	Juan Quintela <quintela@trasno.org>
Subject: Re: [PATCH v3 04/10] multifd: Add COLO support
Date: Tue, 3 Feb 2026 10:47:54 +0100	[thread overview]
Message-ID: <20260203104754.17e7baf6@penguin> (raw)
In-Reply-To: <87v7glgbrz.fsf@suse.de>

[-- Attachment #1: Type: text/plain, Size: 5345 bytes --]

On Wed, 28 Jan 2026 09:30:24 -0300
Fabiano Rosas <farosas@suse.de> wrote:

> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jan 26, 2026 at 06:37:31PM -0300, Fabiano Rosas wrote:  
> >> Lukas Straub <lukasstraub2@web.de> writes:
> >>   
> >> >> [...]
> >> >>   
> >> >> > +        for (int i = 0; i < p->zero_num; i++) {
> >> >> > +            void *guest = p->block->host + p->zero[i];
> >> >> > +            memset(guest, 0, multifd_ram_page_size());
> >> >> > +        }    
> >> >> 
> >> >> At multifd_nocomp_recv, there will be a call to
> >> >> multifd_recv_zero_page_process(), which by that point will have p->host
> >> >> == p->block->colo_cache, so it looks like that function will do some
> >> >> zero page processing in the colo_cache, setting the rb->receivedmap for
> >> >> pages in the colo_cache and potentially also doing a memcpy. Is this
> >> >> intended?  
> >> >
> >> > rb->receivedmap is only for postcopy, right? So it doesn't apply with
> >> > colo.
> >> >  
> >> 
> >> It's not anymore since commit 5ef7e26bdb ("migration/multifd: solve zero
> >> page causing multiple page faults"). So it seems we might be doing extra
> >> work on top of the colo_cache.  
> >
> > IIUC not extra, but exactly what will be needed.
> >
> > The logic was about "in a vanilla precopy, if we see one page arriving the
> > 1st time we don't need to zero the buffer because the buffer should be zero
> > allocated".
> >
> > In COLO's case, COLO always puts RAM data into colo_cache, hence it should
> > apply to colo_cache too, avoiding unnecessary memset() for colo_cache
> > instead.
> >
> > E.g. colo_cache is allocated from qemu_anon_ram_alloc(), it's also
> > guaranteed to be zeros when never touched.
> >  
> >>   
> >> >> 
> >> >> I'm thinking that maybe it would overall be better to hook colo directly
> >> >> in to multifd_nocomp_recv:  
> >> >
> >> > But then it will only work for nocomp, right? It feels like the wrong
> >> > level of abstraction to me.
> >> >  
> >> 
> >> Ah, nocomp != ram indeed.
> >>   
> >> >> 
> >> >> > [...]
> >> >> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> >> >> > index 9be79b3b8e00371ebff9e112766c225bec260bf7..9f7a792fa761b3bc30b971b35f464103a61787f0 100644
> >> >> > --- a/migration/multifd-nocomp.c
> >> >> > +++ b/migration/multifd-nocomp.c
> >> >> > @@ -16,6 +16,7 @@
> >> >> >  #include "file.h"
> >> >> >  #include "migration-stats.h"
> >> >> >  #include "multifd.h"
> >> >> > +#include "multifd-colo.h"
> >> >> >  #include "options.h"
> >> >> >  #include "migration.h"
> >> >> >  #include "qapi/error.h"
> >> >> > @@ -269,7 +270,6 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          return -1;
> >> >> >      }
> >> >> >  
> >> >> > -    p->host = p->block->host;
> >> >> >      for (i = 0; i < p->normal_num; i++) {
> >> >> >          uint64_t offset = be64_to_cpu(packet->offset[i]);
> >> >> >  
> >> >> > @@ -294,6 +294,14 @@ int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >> >> >          p->zero[i] = offset;
> >> >> >      }
> >> >> >  
> >> >> > +    if (migrate_colo()) {
> >> >> > +        multifd_colo_prepare_recv(p);
> >> >> > +        assert(p->block->colo_cache);
> >> >> > +        p->host = p->block->colo_cache;    
> >> >> 
> >> >> Can't you just use p->block->colo_cache later? I don't see why p->host
> >> >> needs to be set beforehand even in the non-colo case.  
> >> >
> >> > We should not touch the guest ram directly while in colo state, since
> >> > the incoming guest is running and we either want to receive and apply a
> >> > whole checkpoint with all ram into colo cache and all device state,
> >> > or if anything goes wrong during checkpointing, keep the currently
> >> > running guest on the incoming side in pristine state.
> >> >  
> >> 
> >> I was asking about setting p->host at this specific point. I don't think
> >> any of this fits the unfill function. However, I see those were
> >> suggested by Peter so let's not go back and forth.  
> >
> > Actually I don't know why p->host existed before this work; IIUC we could
> > have always used p->block->host.  Maybe when Juan was developing this Juan
> > kept COLO in mind; or maybe Juan wanted to avoid frequent p->block pointer
> > reference.
> >  
> 
> Maybe p->block was being reset at some point and p->host was passed
> being the point where the (whatever) lock was release. I checked and
> today there's no such thing. The p->mutex seems to be there just to
> protect against this in multifd_recv_sync_main:
> 
> WITH_QEMU_LOCK_GUARD(&p->mutex) {
>     if (multifd_recv_state->packet_num < p->packet_num) {
>         multifd_recv_state->packet_num = p->packet_num;
>     }
> }
> 
> > IIUC, we could remove p->host, but when we need to access "the buffer of
> > the ramblock" we'll need to call a helper to fetch that (either ramblock's
> > buffer, or colo_cache, per migrate_colo()).  And it might be slightly
> > slower than p->host indeed.
> >  
> 
> Yeah, let's keep it, the compression code also uses it, there's no point
> removing it now.
> 

Actually p->host was there first p->block was added later for COLO in
5d1d1fcf4 multifd: Add the ramblock to MultiFDRecvParams

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2026-02-03  9:49 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-25 20:40 [PATCH v3 00/10] migration: Add COLO multifd support and COLO migration unit test Lukas Straub
2026-01-25 20:40 ` [PATCH v3 01/10] MAINTAINERS: Add myself as maintainer for COLO migration framework Lukas Straub
2026-01-25 20:40 ` [PATCH v3 02/10] MAINTAINERS: Remove Hailiang Zhang from " Lukas Straub
2026-01-25 20:40 ` [PATCH v3 03/10] Move ram state receive into multifd_ram_state_recv() Lukas Straub
2026-01-26 12:51   ` Fabiano Rosas
2026-01-25 20:40 ` [PATCH v3 04/10] multifd: Add COLO support Lukas Straub
2026-01-26 10:36   ` Zhang Chen
2026-01-26 11:13     ` Lukas Straub
2026-01-26 14:33   ` Fabiano Rosas
2026-01-26 19:33     ` Lukas Straub
2026-01-26 21:37       ` Fabiano Rosas
2026-01-27 20:36         ` Peter Xu
2026-01-28 12:30           ` Fabiano Rosas
2026-01-28 14:09             ` Peter Xu
2026-01-28 20:02               ` Fabiano Rosas
2026-02-03  9:47             ` Lukas Straub [this message]
2026-01-25 20:40 ` [PATCH v3 05/10] colo: Fix crash during device vmstate load Lukas Straub
2026-01-27 20:38   ` Peter Xu
2026-01-30 12:49     ` Lukas Straub
2026-02-02 14:12       ` Peter Xu
2026-02-03  9:25         ` Lukas Straub
2026-01-25 20:40 ` [PATCH v3 06/10] migration-test: Add COLO migration unit test Lukas Straub
2026-01-26 14:40   ` Fabiano Rosas
2026-01-27 20:49   ` Peter Xu
2026-01-30 10:24     ` Lukas Straub
2026-02-02 14:26       ` Peter Xu
2026-02-03  9:18         ` Lukas Straub
2026-02-03 21:21           ` Peter Xu
2026-02-06 19:11             ` Lukas Straub
2026-01-28 12:32   ` Fabiano Rosas
2026-01-25 20:40 ` [PATCH v3 07/10] Convert colo main documentation to restructuredText Lukas Straub
2026-01-25 20:40 ` [PATCH v3 08/10] qemu-colo.rst: Miscellaneous changes Lukas Straub
2026-01-26 10:21   ` Zhang Chen
2026-01-26 10:56     ` Lukas Straub
2026-01-25 20:40 ` [PATCH v3 09/10] qemu-colo.rst: Add my copyright Lukas Straub
2026-01-26 10:23   ` Zhang Chen
2026-01-25 20:40 ` [PATCH v3 10/10] qemu-colo.rst: Simplify the block replication setup Lukas Straub

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260203104754.17e7baf6@penguin \
    --to=lukasstraub2@web.de \
    --cc=armbru@redhat.com \
    --cc=dave@treblig.org \
    --cc=farosas@suse.de \
    --cc=lizhijian@fujitsu.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@trasno.org \
    --cc=zhangckid@gmail.com \
    --cc=zhanghailiang@xfusion.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.