From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: Hao Xiang <hao.xiang@bytedance.com>,
	pbonzini@redhat.com, berrange@redhat.com, eduardo@habkost.net,
	eblake@redhat.com, armbru@redhat.com, thuth@redhat.com,
	lvivier@redhat.com, qemu-devel@nongnu.org, jdenemar@redhat.com
Subject: Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
Date: Fri, 23 Feb 2024 10:33:29 +0800
Message-ID: <ZdgD-eqUYxbQMJQN@x1n>
In-Reply-To: <875xyhbk8k.fsf@suse.de>

On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> Hao Xiang <hao.xiang@bytedance.com> writes:
> 
> > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
> 
> nit: Add a dedicated handler...
> 
> Usually "this patch/change" is used only when necessary to avoid
> ambiguity.
> 
> > multifd live migration. Now zero page checking can be done in the multifd threads
> > and this becomes the default configuration. We still provide backward compatibility
> > where zero page checking is done from the migration main thread.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  migration/multifd.c |  1 +
> >  migration/options.c |  2 +-
> >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> >  3 files changed, 42 insertions(+), 14 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index fbb40ea10b..ef5dad1019 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -13,6 +13,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/cutils.h"
> 
> This include...
> 
> >  #include "qemu/rcu.h"
> > +#include "qemu/cutils.h"
> 
> is there already.
> 
> >  #include "exec/target_page.h"
> >  #include "sysemu/sysemu.h"
> >  #include "exec/ramblock.h"
> > diff --git a/migration/options.c b/migration/options.c
> > index 3c603391b0..3c79b6ccd4 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> >                        MIG_MODE_NORMAL),
> >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> >                         parameters.zero_page_detection,
> > -                       ZERO_PAGE_DETECTION_LEGACY),
> > +                       ZERO_PAGE_DETECTION_MULTIFD),
> 
> I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> enabled. Otherwise it will go along happily until we get data corruption
> because the new QEMU didn't send any zero pages on the migration thread
> and the old QEMU did not look for them in the multifd packet.

It could be even worse: since the new QEMU only attaches the "normal"
pages after the multifd packet, the old QEMU could read more than was
actually sent, expecting data for all pages..
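
Roughly the mismatch looks like this (a sketch only; the field layout
here is illustrative, not the exact wire format):

    new sender:   [multifd packet header | data for normal (non-zero) pages]
    old receiver: expects [header | data for *all* pages in the packet],
                  so it can try to read past what was actually sent.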

> 
> Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> in use. We'd just need to fix the test in the new QEMU to check
> (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).

IMHO we don't need to change MULTIFD_VERSION yet; what we need is perhaps
a compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
We should make sure that when "legacy" is set, multifd runs the old
protocol (zero_num will always be 0, and will be ignored by old QEMUs,
IIUC).
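
For reference, something like this (a minimal, untested sketch; it assumes
the parameter is exposed as a property named "zero-page-detection" on the
migration object):

    /* hw/core/machine.c -- sketch: entry appended to the existing array */
    GlobalProperty hw_compat_8_2[] = {
        /* ... existing 8.2 compat entries ... */
        { "migration", "zero-page-detection", "legacy" },
    };

With that, 8.2-and-older machine types would default back to the legacy
behaviour, while new machine types keep the multifd default.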

One more comment: when reposting, please consider splitting this patch
into two; the new ram_save_target_page_multifd() hook can be done in a
separate patch, AFAIU.

> 
> >  
> >      /* Migration capabilities */
> >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5ece9f042e..b088c5a98c 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> >      QEMUFile *file = pss->pss_channel;
> >      int len = 0;
> >  
> > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > -        return 0;
> > -    }
> 
> How does 'none' work now?
> 
> > -
> >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> >          return 0;
> >      }
> > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> >  
> >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> >  {
> > +    assert(migrate_multifd());
> > +    assert(!migrate_compress());
> > +    assert(!migration_in_postcopy());
> 
> Drop these, please. Keep only the asserts that are likely to trigger
> during development, such as the existing ones at multifd_send_pages.
> 
> > +
> >      if (!multifd_queue_page(block, offset)) {
> >          return -1;
> >      }
> > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> >   */
> >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >  {
> > -    RAMBlock *block = pss->block;
> >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> >      int res;
> >  
> > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >          return 1;
> >      }
> >  
> > +    return ram_save_page(rs, pss);
> 
> Look at where git put this! Are you using the default diff algorithm? If
> not, try using --patience to see if it improves the diff.
> 
> > +}
> > +
> > +/**
> > + * ram_save_target_page_multifd: save one target page
> > + *
> > + * Returns the number of pages written
> 
> We could be more precise here:
> 
>  ram_save_target_page_multifd: send one target page to multifd workers
>  
>  Returns 1 if the page was queued, -1 otherwise.
> 
> > + *
> > + * @rs: current RAM state
> > + * @pss: data about the page we want to send
> > + */
> > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > +{
> > +    RAMBlock *block = pss->block;
> > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > +
> > +    /* Multifd is not compatible with old compression. */
> > +    assert(!migrate_compress());
> 
> This should already be enforced at options.c.
> 
> > +
> > +    /* Multifd is not compatible with postcopy. */
> > +    assert(!migration_in_postcopy());
> 
> Same here.
> 
> > +
> >      /*
> > -     * Do not use multifd in postcopy as one whole host page should be
> > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > -     * if host page size == guest page size the dest guest during run may
> > -     * still see partially copied pages which is data corruption.
> > +     * Backward compatibility support. While using multifd live
> > +     * migration, we still need to handle zero page checking on the
> > +     * migration main thread.
> >       */
> > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > -        return ram_save_multifd_page(block, offset);
> > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > +        if (save_zero_page(rs, pss, offset)) {
> > +            return 1;
> > +        }
> >      }
> >  
> > -    return ram_save_page(rs, pss);
> > +    return ram_save_multifd_page(block, offset);
> >  }
> >  
> >  /* Should be called before sending a host page */
> > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >      }
> >  
> >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +
> > +    if (migrate_multifd()) {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > +    } else {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +    }
> >  
> >      bql_unlock();
> >      ret = multifd_send_sync_main();
> 

-- 
Peter Xu



