From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: Li Zhijian via <qemu-devel@nongnu.org>,
	Laurent Vivier <lvivier@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Li Zhijian <lizhijian@fujitsu.com>
Subject: Re: [PATCH 1/2] migration: Prioritize RDMA in ram_save_target_page()
Date: Tue, 18 Feb 2025 17:03:35 -0500
Message-ID: <Z7UDtxdNSS-Jqm-y@x1.local>
In-Reply-To: <8734gb9erz.fsf@suse.de>

On Tue, Feb 18, 2025 at 05:30:40PM -0300, Fabiano Rosas wrote:
> Li Zhijian via <qemu-devel@nongnu.org> writes:
> 
> > Address an error in RDMA-based migration by ensuring RDMA is prioritized
> > when saving pages in `ram_save_target_page()`.
> >
> > Previously, the RDMA protocol's page-saving step was placed after other
> > protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
> > failures characterized by unknown control messages and state loading errors:
> > destination:
> > (qemu) qemu-system-x86_64: Unknown control message QEMU FILE
> > qemu-system-x86_64: error while loading state section id 1(ram)
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> > source:
> > (qemu) qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
> > qemu-system-x86_64: failed to save SaveStateEntry with id(name): 1(ram): -1
> > qemu-system-x86_64: rdma migration: recv polling control error!
> > qemu-system-x86_64: warning: Early error. Sending error.
> > qemu-system-x86_64: warning: rdma migration: send polling control error
> >
> > RDMA migration implements its own protocol/method to send pages to the
> > destination side, so hand over to RDMA first to prevent pages from being
> > saved by another protocol.
> >
> > Fixes: bc38dc2f5f3 ("migration: refactor ram_save_target_page functions")
> > Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> > ---
> >  migration/ram.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 6f460fd22d2..635a2fe443a 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1964,6 +1964,11 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
> >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> >      int res;
> >  
> > +    /* Hand over to RDMA first */
> > +    if (control_save_page(pss, offset, &res)) {
> > +        return res;
> > +    }
> > +
> 
> Can we hoist that migrate_rdma() check out of the function, since the
> other paths already check first before calling their functions?

If we're talking about hoisting, and if we want to go slightly further, I
wonder if we could also drop RAM_SAVE_CONTROL_NOT_SUPP.

    if (!migrate_rdma() || migration_in_postcopy()) {
        return RAM_SAVE_CONTROL_NOT_SUPP;
    }

We should make sure rdma_control_save_page() won't get invoked at all in
either case above. For postcopy, maybe we could fail the QMP migrate /
migrate_incoming commands in migration_channels_and_transport_compatible().
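
A rough sketch of the idea in C, only to illustrate the shape. Everything
here is a stand-in: the *_model functions, the two state flags, and the
signature of rdma_postcopy_compatible() are assumptions made for this
example and are not the real QEMU code. The caller checks migrate_rdma()
itself, the RDMA+postcopy combination is rejected at setup time, and the
RDMA helper only asserts that it was called correctly instead of returning
a "not supported" value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for migration state; not QEMU's real definitions. */
static bool rdma_enabled;
static bool postcopy_active;

static bool migrate_rdma(void) { return rdma_enabled; }
static bool migration_in_postcopy(void) { return postcopy_active; }

/*
 * Model of a setup-time check in the spirit of
 * migration_channels_and_transport_compatible(): reject RDMA combined
 * with postcopy before migration starts. Signature invented for this sketch.
 */
static bool rdma_postcopy_compatible(bool want_rdma, bool want_postcopy)
{
    return !(want_rdma && want_postcopy);
}

/*
 * Model of rdma_control_save_page() once the checks are hoisted: no
 * "not supported" return value, only an assertion that callers got it right.
 */
static int rdma_control_save_page_model(unsigned long offset)
{
    assert(migrate_rdma() && !migration_in_postcopy());
    (void)offset;
    return 1; /* pretend one page was handed to RDMA */
}

/* Placeholder for the non-RDMA path. */
static int ram_save_page_model(unsigned long offset)
{
    (void)offset;
    return 2;
}

/* The caller decides up front which path owns the page. */
static int save_target_page_model(unsigned long offset)
{
    if (migrate_rdma()) {
        return rdma_control_save_page_model(offset);
    }
    return ram_save_page_model(offset);
}
```

With rdma_enabled set, save_target_page_model() always takes the RDMA
path; with it clear, the RDMA helper is never reached, so the assertion
documents the invariant rather than enforcing it at runtime.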

> 
> >      if (!migrate_multifd()
> >          || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> >          if (save_zero_page(rs, pss, offset)) {
> > @@ -1976,10 +1981,6 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
> >          return ram_save_multifd_page(block, offset);
> >      }
> >  
> > -    if (control_save_page(pss, offset, &res)) {
> > -        return res;
> > -    }
> > -
> >      return ram_save_page(rs, pss);
> >  }
> 
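
For reference, a toy model in C of the ordering issue the patch describes,
as I read the diff. It is not QEMU code: the enum, struct cfg, and the
pick_*_order() functions are made up for illustration, and the condition
guarding the multifd branch is assumed to be a multifd check because the
hunk elides it. The point is only that with the old order a zero page is
claimed by the zero-page path before RDMA sees it, while with the new
order RDMA gets every page.

```c
#include <assert.h>
#include <stdbool.h>

/* Which path ends up handling a page (illustrative names only). */
enum handler {
    HANDLER_RDMA,
    HANDLER_ZERO_PAGE,
    HANDLER_MULTIFD,
    HANDLER_PLAIN,
};

/* Hypothetical configuration flags for the model. */
struct cfg {
    bool rdma;
    bool multifd;
    bool legacy_zero_detect;
};

/* Order before the patch: RDMA is consulted after zero-page and multifd. */
static enum handler pick_old_order(struct cfg c, bool page_is_zero)
{
    if ((!c.multifd || c.legacy_zero_detect) && page_is_zero) {
        return HANDLER_ZERO_PAGE;
    }
    if (c.multifd) { /* assumed guard; elided in the quoted hunk */
        return HANDLER_MULTIFD;
    }
    if (c.rdma) {
        return HANDLER_RDMA;
    }
    return HANDLER_PLAIN;
}

/* Order after the patch: RDMA is consulted first. */
static enum handler pick_new_order(struct cfg c, bool page_is_zero)
{
    if (c.rdma) {
        return HANDLER_RDMA;
    }
    if ((!c.multifd || c.legacy_zero_detect) && page_is_zero) {
        return HANDLER_ZERO_PAGE;
    }
    if (c.multifd) {
        return HANDLER_MULTIFD;
    }
    return HANDLER_PLAIN;
}

/* Usage example: an RDMA migration without multifd, saving a zero page. */
static void demo_ordering(void)
{
    struct cfg c = { .rdma = true, .multifd = false,
                     .legacy_zero_detect = false };

    /* Old order: the zero-page path takes the page away from RDMA. */
    assert(pick_old_order(c, true) == HANDLER_ZERO_PAGE);
    /* New order: RDMA handles it. */
    assert(pick_new_order(c, true) == HANDLER_RDMA);
}
```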

-- 
Peter Xu


