qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Hao Xiang <hao.xiang@bytedance.com>
To: Peter Xu <peterx@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	pbonzini@redhat.com, berrange@redhat.com,  eduardo@habkost.net,
	eblake@redhat.com, armbru@redhat.com, thuth@redhat.com,
	lvivier@redhat.com, qemu-devel@nongnu.org, jdenemar@redhat.com
Subject: Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
Date: Sat, 24 Feb 2024 15:03:15 -0800	[thread overview]
Message-ID: <CAAYibXgp-NGqE5ATby_Y6=s7WR5yToTxWQbdeVydv0Jez98iEQ@mail.gmail.com> (raw)
In-Reply-To: <CAAYibXjBX8CeCL3-9BcUoGxY6UY9-N8sriJ7N_GUzVPUX1y3YQ@mail.gmail.com>

On Thu, Feb 22, 2024 at 10:02 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>
> On Thu, Feb 22, 2024 at 6:33 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> > > Hao Xiang <hao.xiang@bytedance.com> writes:
> > >
> > > > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
> > >
> > > nit: Add a dedicated handler...
> > >
> > > Usually "this patch/change" is used only when necessary to avoid
> > > ambiguity.
> > >
> > > > multifd live migration. Now zero page checking can be done in the multifd threads
> > > > and this becomes the default configuration. We still provide backward compatibility
> > > > where zero page checking is done from the migration main thread.
> > > >
> > > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > > ---
> > > >  migration/multifd.c |  1 +
> > > >  migration/options.c |  2 +-
> > > >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> > > >  3 files changed, 42 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > > index fbb40ea10b..ef5dad1019 100644
> > > > --- a/migration/multifd.c
> > > > +++ b/migration/multifd.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/cutils.h"
> > >
> > > This include...
> > >
> > > >  #include "qemu/rcu.h"
> > > > +#include "qemu/cutils.h"
> > >
> > > is there already.
> > >
> > > >  #include "exec/target_page.h"
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "exec/ramblock.h"
> > > > diff --git a/migration/options.c b/migration/options.c
> > > > index 3c603391b0..3c79b6ccd4 100644
> > > > --- a/migration/options.c
> > > > +++ b/migration/options.c
> > > > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> > > >                        MIG_MODE_NORMAL),
> > > >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> > > >                         parameters.zero_page_detection,
> > > > -                       ZERO_PAGE_DETECTION_LEGACY),
> > > > +                       ZERO_PAGE_DETECTION_MULTIFD),
> > >
> > > I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> > > enabled. Otherwise it will go along happily until we get data corruption
> > > because the new QEMU didn't send any zero pages on the migration thread
> > > and the old QEMU did not look for them in the multifd packet.
> >
> > It could be even worse, as the new QEMU will only attach "normal" pages
> > after the multifd packet, the old QEMU could read more than it could,
> > expecting all pages..
> >
> > >
> > > Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> > > in use. We'd just need to fix the test in the new QEMU to check
> > > (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
> >
> > IMHO we don't need yet to change MULTIFD_VERSION, what we need is perhaps a
> > compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
> > We should make sure when "legacy" is set, multifd ran the old protocol
> > (zero_num will always be 0, and will be ignored by old QEMUs, IIUC).
> >
> > One more comment is, when repost please consider split this patch into two;
> > The new ram_save_target_page_multifd() hook can be done in another patch,
> > AFAIU.
>
> Sorry, I kept missing this. I will keep telling myself, compatibility
> is king. I will set the hw_compat_8_2 setting and make sure to test
> migration 9.0 -> 8.2 fails with "multifd" option set.
> Will split patches.

So I just want to make sure I am coding the right solution. I added
setting "zero-page-detection" to "legacy" in hw_compat_8_2 and tested
it. The behavior is that if I set machine type to pc-q35-8.2,
zero-page-detection will automatically be set to "legacy". But if I
set the machine type to pc-q35-9.0, zero-page-detection will be the
default value "multifd". However, this doesn't seem to be a hard
requirement because I can still override zero-page-detection to
multifd on machine type pc-q35-8.2. Is this OK?

>
> >
> > >
> > > >
> > > >      /* Migration capabilities */
> > > >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 5ece9f042e..b088c5a98c 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> > > >      QEMUFile *file = pss->pss_channel;
> > > >      int len = 0;
> > > >
> > > > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > > > -        return 0;
> > > > -    }
> > >
> > > How does 'none' work now?
> > >
> > > > -
> > > >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > > >          return 0;
> > > >      }
> > > > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> > > >
> > > >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> > > >  {
> > > > +    assert(migrate_multifd());
> > > > +    assert(!migrate_compress());
> > > > +    assert(!migration_in_postcopy());
> > >
> > > Drop these, please. Keep only the asserts that are likely to trigger
> > > during development, such as the existing ones at multifd_send_pages.
> > >
> > > > +
> > > >      if (!multifd_queue_page(block, offset)) {
> > > >          return -1;
> > > >      }
> > > > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> > > >   */
> > > >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > > >  {
> > > > -    RAMBlock *block = pss->block;
> > > >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > >      int res;
> > > >
> > > > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > > >          return 1;
> > > >      }
> > > >
> > > > +    return ram_save_page(rs, pss);
> > >
> > > Look at where git put this! Are you using the default diff algorithm? If
> > > not try using --patience to see if it improves the diff.
> > >
> > > > +}
> > > > +
> > > > +/**
> > > > + * ram_save_target_page_multifd: save one target page
> > > > + *
> > > > + * Returns the number of pages written
> > >
> > > We could be more precise here:
> > >
> > >  ram_save_target_page_multifd: send one target page to multifd workers
> > >
> > >  Returns 1 if the page was queued, -1 otherwise.
> > >
> > > > + *
> > > > + * @rs: current RAM state
> > > > + * @pss: data about the page we want to send
> > > > + */
> > > > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > > > +{
> > > > +    RAMBlock *block = pss->block;
> > > > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > > +
> > > > +    /* Multifd is not compatible with old compression. */
> > > > +    assert(!migrate_compress());
> > >
> > > This should already be enforced at options.c.
> > >
> > > > +
> > > > +    /* Multifd is not compabible with postcopy. */
> > > > +    assert(!migration_in_postcopy());
> > >
> > > Same here.
> > >
> > > > +
> > > >      /*
> > > > -     * Do not use multifd in postcopy as one whole host page should be
> > > > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > > > -     * if host page size == guest page size the dest guest during run may
> > > > -     * still see partially copied pages which is data corruption.
> > > > +     * Backward compatibility support. While using multifd live
> > > > +     * migration, we still need to handle zero page checking on the
> > > > +     * migration main thread.
> > > >       */
> > > > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > > > -        return ram_save_multifd_page(block, offset);
> > > > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > > > +        if (save_zero_page(rs, pss, offset)) {
> > > > +            return 1;
> > > > +        }
> > > >      }
> > > >
> > > > -    return ram_save_page(rs, pss);
> > > > +    return ram_save_multifd_page(block, offset);
> > > >  }
> > > >
> > > >  /* Should be called before sending a host page */
> > > > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> > > >      }
> > > >
> > > >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > > > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > > +
> > > > +    if (migrate_multifd()) {
> > > > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > > > +    } else {
> > > > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > > +    }
> > > >
> > > >      bql_unlock();
> > > >      ret = multifd_send_sync_main();
> > >
> >
> > --
> > Peter Xu
> >


  reply	other threads:[~2024-02-24 23:04 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
2024-02-21 12:03   ` Markus Armbruster
2024-02-23  4:22     ` [External] " Hao Xiang
2024-02-21 13:58   ` Elena Ufimtseva
2024-02-23  4:37     ` [External] " Hao Xiang
2024-02-22 10:36   ` Peter Xu
2024-02-26  7:18   ` Wang, Lei
2024-02-26 19:45     ` [External] " Hao Xiang
2024-02-16 22:39 ` [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format Hao Xiang
2024-02-21 15:37   ` Elena Ufimtseva
2024-02-23  4:18     ` [External] " Hao Xiang
2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
2024-02-16 23:49   ` Richard Henderson
2024-02-23  4:38     ` [External] " Hao Xiang
2024-02-24 19:06       ` Hao Xiang
2024-02-21 12:04   ` Markus Armbruster
2024-02-21 16:00   ` Elena Ufimtseva
2024-02-23  4:59     ` [External] " Hao Xiang
2024-02-21 21:04   ` Fabiano Rosas
2024-02-23  2:20     ` Peter Xu
2024-02-23  5:15       ` [External] " Hao Xiang
2024-02-24 22:56         ` Hao Xiang
2024-02-26  1:30           ` Peter Xu
2024-02-23  5:18     ` Hao Xiang
2024-02-23 14:47       ` Fabiano Rosas
2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
2024-02-21 16:11   ` Elena Ufimtseva
2024-02-23  5:24     ` [External] " Hao Xiang
2024-02-21 21:06   ` Fabiano Rosas
2024-02-23  2:33     ` Peter Xu
2024-02-23  6:02       ` [External] " Hao Xiang
2024-02-24 23:03         ` Hao Xiang [this message]
2024-02-26  1:43           ` Peter Xu
2024-02-23  5:47     ` Hao Xiang
2024-02-23 14:38       ` Fabiano Rosas
2024-02-16 22:40 ` [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking Hao Xiang
2024-02-21 20:59   ` Fabiano Rosas
2024-02-23  4:20     ` [External] " Hao Xiang
2024-02-16 22:40 ` [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface Hao Xiang
2024-02-21 12:07   ` Markus Armbruster
2024-02-16 22:40 ` [PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration Hao Xiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAYibXgp-NGqE5ATby_Y6=s7WR5yToTxWQbdeVydv0Jez98iEQ@mail.gmail.com' \
    --to=hao.xiang@bytedance.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=jdenemar@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).