From: Peter Xu <peterx@redhat.com>
To: Wei Wang <wei.w.wang@intel.com>
Cc: virtio-dev@lists.oasis-open.org, quintela@redhat.com,
liliang.opensource@gmail.com, mst@redhat.com,
qemu-devel@nongnu.org, dgilbert@redhat.com, pbonzini@redhat.com,
nilal@redhat.com
Subject: Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy
Date: Thu, 29 Nov 2018 13:47:22 +0800
Message-ID: <20181129054722.GD29246@xz-x1>
In-Reply-To: <20181129051014.GC29246@xz-x1>
On Thu, Nov 29, 2018 at 01:10:14PM +0800, Peter Xu wrote:
> On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> > On 11/28/2018 05:32 PM, Peter Xu wrote:
> > >
> > > So what I am worrying here are corner cases where we might forget to
> > > stop the hinting. I'm fabricating one example sequence of events:
> > >
> > > (start migration)
> > > START_MIGRATION
> > > BEFORE_SYNC
> > > AFTER_SYNC
> > > ...
> > > BEFORE_SYNC
> > > AFTER_SYNC
> > > (some SaveStateEntry failed rather than RAM, then
> > > migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> > > fail the migration, however when running the previous
> > > ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> > > any error so no ERROR event detected)
> > >
> > > Then it seems the hinting will last forever. Considering that now I'm
> > > not sure whether this can be done ram-only, since even if you capture
> > > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > > won't be called at all in this case.
> > >
> > > Could this happen?
> >
> > Thanks, indeed this case could happen if we add PRECOPY_END in
> > ram_save_complete.
> >
> > How about putting PRECOPY_END in ram_save_cleanup?
> > I think it would be called in any case.
>
> Sounds good.
>
> >
> > I'm also thinking probably we don't need PRECOPY_ERR when we have
> > PRECOPY_END,
> > and what do you think of the notifier names below:
> >
> > +typedef enum PrecopyNotifyReason {
> > +    PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> > +    PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> > +    PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> > +    PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> > +    PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> > +} PrecopyNotifyReason;
>
> (please see below [1]...)
>
> >
> >
> > >
> > > >
> > > > > [1]
> > > > >
> > > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > > like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
> > > > > > > It might help in some cases:
> > > > > > >
> > > > > > > - then you don't need to trickily export the migrate_postcopy()
> > > > > > > since you'll notify that before postcopy starts
> > > > > > I'm thinking probably we don't need to export migrate_postcopy even now.
> > > > > > It's more like a sanity check, and not needed because now we have the
> > > > > > notifier registered to the precopy specific callchain, which has ensured
> > > > > > that
> > > > > > it is invoked via precopy.
> > > > > But postcopy will always start with precopy, no?
> > > > Yes, but I think we could add the check in precopy_notify()
> > > I'm not sure that's good. If the notifier could potentially have
> > > other users, they might still work with postcopy, and they might expect
> > > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > > precopy stage of a postcopy.
> >
> > I think this precopy notifier callchain is expected to be used only for
> > the precopy mode. Postcopy has its dedicated notifier callchain that
> > users could use.
> >
> > How about changing the migrate_postcopy() check to "ms->start_postcopy":
> >
> > bool migration_postcopy_start(void)
> > {
> >     MigrationState *s;
> >
> >     s = migrate_get_current();
> >
> >     return atomic_read(&s->start_postcopy);
> > }
> >
> >
> > static void precopy_notify(PrecopyNotifyReason reason)
> > {
> >     if (migration_postcopy_start()) {
> >         return;
> >     }
> >
> >     notifier_list_notify(&precopy_notifier_list, &reason);
> > }
> >
> > If postcopy started with precopy, the precopy optimization feature
> > could still be used until it switches to the postcopy mode.
>
> I'm not sure we can use start_postcopy. It's a variable that is set in
> the QMP handler, but that does not mean postcopy has started. I'm afraid
> there can be a race where it's still precopy but the variable is set, so
> an event could be missed...
>
> IMHO the problem is not that complicated. How about this proposal:
>
> [1]
>
> typedef enum PrecopyNotifyReason {
>     PRECOPY_NOTIFY_RAM_START,
>     PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
>     PRECOPY_NOTIFY_RAM_AFTER_SYNC,
>     PRECOPY_NOTIFY_COMPLETE,
>     PRECOPY_NOTIFY_RAM_CLEANUP,
> } PrecopyNotifyReason;
>
> The first three stay the same as your old ones. Notify RAM_CLEANUP in
> ram_save_cleanup() to make sure it'll always be cleaned up (the same
> as PRECOPY_END, just another name). Notify COMPLETE in
> qemu_savevm_state_complete_precopy() to show that precopy is
> completed. Meanwhile, on the balloon side you should stop the hinting
> on either the RAM_CLEANUP or the COMPLETE event. Then either:
>
> - precopy is switching to postcopy, or
> - precopy completed, or
> - precopy failed/cancelled
>
> You should always get at least one notification to stop the balloon.
> You could also get a RAM_CLEANUP after a COMPLETE, but the balloon
> should easily handle that (stopping the hinting twice).
>
> Here maybe you can even remove the "RAM_" in both RAM_START and
> RAM_CLEANUP if we're going to have COMPLETE, since after all it won't
> be limited to RAM.
Oh, maybe we can remove all the RAM_ prefixes to make it general to
precopy...

  typedef enum PrecopyNotifyReason {
      PRECOPY_NOTIFY_SETUP,
      PRECOPY_NOTIFY_BEFORE_SYNC,
      PRECOPY_NOTIFY_AFTER_SYNC,
      PRECOPY_NOTIFY_COMPLETE,
      PRECOPY_NOTIFY_CLEANUP,
  } PrecopyNotifyReason;

Then we can just hook everything with the corresponding names:

  SETUP:    hooks with qemu_savevm_state_setup
  COMPLETE: hooks with qemu_savevm_state_complete_precopy
  CLEANUP:  hooks with qemu_savevm_state_cleanup

I'm not sure whether you'll need another hook in ram_state_reset in
the future, but for now I don't see it as necessary: I don't think
ram_list.version would change during migration, so ram_state_reset
should only be called during setup.
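
To make the idea concrete, here is a minimal, self-contained sketch in
plain C (not QEMU code). Only the five event names come from the enum
proposed here; the fixed-size notifier table, BalloonState and its
fields are made up for illustration, and starting the hinting at SETUP
is an assumption. The point it shows is that a handler which stops on
either COMPLETE or CLEANUP tolerates receiving both:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names as proposed in this mail. */
typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
} PrecopyNotifyReason;

/* Hypothetical stand-in for a notifier list; not QEMU's NotifierList. */
typedef void (*PrecopyNotifyFn)(PrecopyNotifyReason reason, void *opaque);

#define MAX_NOTIFIERS 8

static struct {
    PrecopyNotifyFn fn;
    void *opaque;
} notifiers[MAX_NOTIFIERS];
static size_t nr_notifiers;

static void precopy_add_notifier(PrecopyNotifyFn fn, void *opaque)
{
    assert(nr_notifiers < MAX_NOTIFIERS);
    notifiers[nr_notifiers].fn = fn;
    notifiers[nr_notifiers].opaque = opaque;
    nr_notifiers++;
}

/* Deliver one event to every registered notifier, in order. */
static void precopy_notify(PrecopyNotifyReason reason)
{
    for (size_t i = 0; i < nr_notifiers; i++) {
        notifiers[i].fn(reason, notifiers[i].opaque);
    }
}

/* Hypothetical balloon-side state. */
typedef struct BalloonState {
    bool hinting_active;
    int stop_calls;     /* how many times hinting was really stopped */
} BalloonState;

static void balloon_precopy_notify(PrecopyNotifyReason reason, void *opaque)
{
    BalloonState *s = opaque;

    switch (reason) {
    case PRECOPY_NOTIFY_SETUP:
        /* Assumption: hinting starts when precopy is set up. */
        s->hinting_active = true;
        break;
    case PRECOPY_NOTIFY_COMPLETE:
    case PRECOPY_NOTIFY_CLEANUP:
        /* Either event stops hinting; a second stop is a no-op. */
        if (s->hinting_active) {
            s->hinting_active = false;
            s->stop_calls++;
        }
        break;
    default:
        break;
    }
}
```

With this shape, a successful migration delivers COMPLETE followed by
CLEANUP and the hinting is stopped once, while a failed or cancelled
migration delivers only CLEANUP and the hinting is still stopped.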
>
> Another suggestion is that you can add an Error into the notify hooks,
> please refer to the postcopy one:
>
> int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
>
> So the hook functions have a way to even stop the migration (though
> for balloon hinting it'll be always optional so no error should be
> reported...), then the two interfaces are matched.
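
For reference, the error-reporting shape could look roughly like the
sketch below. It is self-contained plain C, not QEMU code: SketchError,
the handler table and both handlers are placeholders for illustration,
and QEMU's real Error type is richer than this. It only shows a notify
call that stops at the first failing hook and hands the error back:

```c
#include <assert.h>
#include <stddef.h>

typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
} PrecopyNotifyReason;

/* Placeholder error object (hypothetical). */
typedef struct SketchError {
    const char *msg;
} SketchError;

/* A hook returns 0 on success, non-zero to veto the migration. */
typedef int (*PrecopyNotifyFn)(PrecopyNotifyReason reason, SketchError *err);

#define MAX_HANDLERS 4

static PrecopyNotifyFn handlers[MAX_HANDLERS];
static size_t nr_handlers;

static void precopy_add_handler(PrecopyNotifyFn fn)
{
    assert(nr_handlers < MAX_HANDLERS);
    handlers[nr_handlers++] = fn;
}

/* Returns 0, or the first non-zero value returned by a hook. */
static int precopy_notify(PrecopyNotifyReason reason, SketchError *err)
{
    for (size_t i = 0; i < nr_handlers; i++) {
        int ret = handlers[i](reason, err);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Balloon hinting is optional, so this hook never reports an error. */
static int balloon_handler(PrecopyNotifyReason reason, SketchError *err)
{
    (void)reason;
    (void)err;
    return 0;
}

/* Hypothetical hook that refuses to let precopy start. */
static int strict_handler(PrecopyNotifyReason reason, SketchError *err)
{
    if (reason == PRECOPY_NOTIFY_SETUP) {
        err->msg = "device not ready for precopy";
        return -1;
    }
    return 0;
}
```

The caller would check the return value at each hook point and fail
the migration when it is non-zero.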
>
> >
> >
> >
> > > In that sense I still feel the
> > > PRECOPY_END is better (so constantly call it at the end of precopy, no
> > > matter whether there's another postcopy afterwards). It sounds like a
> > > cleaner interface.
> >
> > Probably I still haven't got the point how PRECOPY_END could help above yet.
>
> Please have a look at above proposal. Thanks,
>
> --
> Peter Xu
>
Regards,
--
Peter Xu