qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Wei Wang <wei.w.wang@intel.com>
Cc: virtio-dev@lists.oasis-open.org, quintela@redhat.com,
	liliang.opensource@gmail.com, mst@redhat.com,
	qemu-devel@nongnu.org, dgilbert@redhat.com, pbonzini@redhat.com,
	nilal@redhat.com
Subject: Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy
Date: Thu, 29 Nov 2018 13:47:22 +0800	[thread overview]
Message-ID: <20181129054722.GD29246@xz-x1> (raw)
In-Reply-To: <20181129051014.GC29246@xz-x1>

On Thu, Nov 29, 2018 at 01:10:14PM +0800, Peter Xu wrote:
> On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> > On 11/28/2018 05:32 PM, Peter Xu wrote:
> > > 
> > > So what I am worrying here are corner cases where we might forget to
> > > stop the hinting.  I'm fabricating one example sequence of events:
> > > 
> > >    (start migration)
> > >    START_MIGRATION
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    ...
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    (some SaveStateEntry failed rather than RAM, then
> > >     migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> > >     fail the migration, however when running the previous
> > >     ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> > >     any error so no ERROR event detected)
> > > 
> > > Then it seems the hinting will last forever.  Considering that now I'm
> > > not sure whether this can be done ram-only, since even if you capture
> > > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > > won't be called at all in this case.
> > > 
> > > Could this happen?
> > 
> > Thanks, indeed this case could happen if we add PRECOPY_END in
> > ram_save_complete.
> > 
> > How about putting PRECOPY_END in ram_save_cleanup?
> > I think it would be called in any case.
> 
> Sounds good.
> 
> > 
> > I'm also thinking probably we don't need PRECOPY_ERR when we have
> > PRECOPY_END,
> > and what do you think of the notifier names below:
> > 
> > +typedef enum PrecopyNotifyReason {
> > +    PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> > +    PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> > +    PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> > +    PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> > +    PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> > +} PrecopyNotifyReason;
> 
> (please see below [1]...)
> 
> > 
> > 
> > > 
> > > > 
> > > > > [1]
> > > > > 
> > > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > > like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
> > > > > > > It might help in some cases:
> > > > > > > 
> > > > > > >      - then you don't need to trickily export the migrate_postcopy()
> > > > > > >        since you'll notify that before postcopy starts
> > > > > > I'm thinking probably we don't need to export migrate_postcopy even now.
> > > > > > It's more like a sanity check, and not needed because now we have the
> > > > > > notifier registered to the precopy specific callchain, which has ensured
> > > > > > that
> > > > > > it is invoked via precopy.
> > > > > But postcopy will always start with precopy, no?
> > > > Yes, but I think we could add the check in precopy_notify()
> > > I'm not sure that's good.  If the notifier could potentially have
> > > other user, they might still work with postcopy, and they might expect
> > > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > > precopy stage of a postcopy.
> > 
> > I think this precopy notifier callchain is expected to be used only for
> > the precopy mode. Postcopy has its dedicated notifier callchain that
> > users could use.
> > 
> > How about changing the migrate_postcopy() check to "ms->start_postcopy":
> > 
> > bool migration_postcopy_start(void)
> > {
> >     MigrationState *s;
> > 
> >     s = migrate_get_current();
> > 
> >     return atomic_read(&s->start_postcopy);
> > }
> > 
> > 
> > static void precopy_notify(PrecopyNotifyReason reason)
> > {
> >     if (migration_postcopy_start())
> >         return;
> > 
> >     notifier_list_notify(&precopy_notifier_list, &reason);
> > }
> > 
> > If postcopy started with precopy, the precopy optimization feature
> > could still be used until it switches to the postcopy mode.
> 
> I'm not sure we can use start_postcopy.  It's a variable being set in
> the QMP handler but it does not mean postcopy has started.  I'm afraid
> there can be race where it's still precopy but the variable is set so
> event could be missed...
> 
> IMHO the problem is not that complicated.  How about this proposal:
> 
> [1]
> 
>   typedef enum PrecopyNotifyReason {
>     PRECOPY_NOTIFY_RAM_START,
>     PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
>     PRECOPY_NOTIFY_RAM_AFTER_SYNC,
>     PRECOPY_NOTIFY_COMPLETE,
>     PRECOPY_NOTIFY_RAM_CLEANUP,
>   };
> 
> The first three keep the same as your old ones.  Notify RAM_CLEANUP in
> ram_save_cleanup() to make sure it'll always be cleaned up (the same
> as PRECOPY_END, just another name).  Notify COMPLETE in
> qemu_savevm_state_complete_precopy() to show that precopy is
> completed.  Meanwhile on balloon side you should stop the hinting for
> either RAM_CLEANUP or COMPLETE event.  Then either:
> 
>   - precopy is switching to postcopy, or
>   - precopy completed, or
>   - precopy failed/cancelled
> 
> You should always get at least a notification to stop the balloon.
> Though you could also get one RAM_CLEANUP after one COMPLETE, but
> the balloon should easily handle it (stop the hinting twice).
> 
> Here maybe you can even remove the "RAM_" in both RAM_START and
> RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
> not only limited to RAM.

Oh maybe we can remove all the RAM_ prefix to make it precopy
general...

  typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
  };

Then we can just hook everything with the corresponding names:

  SETUP:    hooks with qemu_savevm_state_setup
  COMPLETE: hooks with qemu_savevm_state_complete_precopy
  CLEANUP:  hooks with qemu_savevm_state_cleanup

I'm not sure whether you'll need another hook in ram_state_reset in
the future but for now I don't see it necessary since I don't thnk
ram_list.version would change during migration for now, so
ram_state_reset should only be called during setup.

> 
> Another suggestion is that you can add an Error into the notify hooks,
> please refer to the postcopy one:
> 
>   int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
> 
> So the hook functions have a way to even stop the migration (though
> for balloon hinting it'll be always optional so no error should be
> reported...), then the two interfaces are matched.
> 
> > 
> > 
> > 
> > > In that sense I still feel the
> > > PRECOPY_END is better (so contantly call it at the end of precopy, no
> > > matter whether there's another postcopy afterwards).  It sounds like a
> > > cleaner interface.
> > 
> > Probably I still haven't got the point how PRECOPY_END could help above yet.
> 
> Please have a look at above proposal.  Thanks,
> 
> -- 
> Peter Xu
> 

Regards,

-- 
Peter Xu

  reply	other threads:[~2018-11-29  5:47 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-15 10:07 [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support Wei Wang
2018-11-15 10:07 ` [Qemu-devel] [PATCH v9 1/8] bitmap: fix bitmap_count_one Wei Wang
2018-11-15 10:07 ` [Qemu-devel] [PATCH v9 2/8] bitmap: bitmap_count_one_with_offset Wei Wang
2018-11-15 10:07 ` [Qemu-devel] [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty Wei Wang
2018-11-27  5:40   ` Peter Xu
2018-11-27  6:02     ` Wei Wang
2018-11-27  6:12       ` [Qemu-devel] [virtio-dev] " Wei Wang
2018-11-27  7:41         ` Peter Xu
2018-11-27 10:17           ` Wei Wang
2018-11-15 10:08 ` [Qemu-devel] [PATCH v9 4/8] migration: API to clear bits of guest free pages from the dirty bitmap Wei Wang
2018-11-27  6:06   ` Peter Xu
2018-11-27  6:52     ` Wei Wang
2018-11-27  7:43       ` Peter Xu
2018-11-15 10:08 ` [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy Wei Wang
2018-11-27  7:38   ` Peter Xu
2018-11-27 10:25     ` Wei Wang
2018-11-28  5:26       ` Peter Xu
2018-11-28  9:01         ` Wei Wang
2018-11-28  9:32           ` Peter Xu
2018-11-29  3:40             ` Wei Wang
2018-11-29  5:10               ` Peter Xu
2018-11-29  5:47                 ` Peter Xu [this message]
2018-11-29  6:30                 ` Wei Wang
2018-11-30  5:05                 ` Wei Wang
2018-11-30  5:57                   ` Peter Xu
2018-11-30  7:09                     ` Wei Wang
2018-11-15 10:08 ` [Qemu-devel] [PATCH v9 6/8] migration/ram.c: add a function to disable the bulk stage Wei Wang
2018-11-15 10:08 ` [Qemu-devel] [PATCH v9 7/8] migration: move migrate_postcopy() to include/migration/misc.h Wei Wang
2018-11-15 10:08 ` [Qemu-devel] [PATCH v9 8/8] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-11-15 18:50 ` [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support no-reply
2018-11-16  1:38   ` Wei Wang
2018-11-27  3:11 ` Wei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181129054722.GD29246@xz-x1 \
    --to=peterx@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=liliang.opensource@gmail.com \
    --cc=mst@redhat.com \
    --cc=nilal@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).