From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Fei Li <fli@suse.com>,
qemu-devel@nongnu.org, armbru@redhat.com, famz@redhat.com,
quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels
Date: Fri, 2 Nov 2018 16:33:35 +0000 [thread overview]
Message-ID: <20181102163334.GC2387@work-vm> (raw)
In-Reply-To: <20181102033241.GF7804@xz-x1>
* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Nov 02, 2018 at 11:00:24AM +0800, Fei Li wrote:
> >
> >
> > On 11/02/2018 10:37 AM, Peter Xu wrote:
> > > On Thu, Nov 01, 2018 at 06:17:11PM +0800, Fei Li wrote:
> > > > Set the migration state to "failed" instead of "setup" when failing
> > > > to send packet via some channel.
> > > Could you please provide more information in the commit message?
> > > E.g., what will happen if without this patch? Will it crash the
> > > source or stall the source migration or others? Otherwise it's a bit
> > > hard for me to understand what's this patch for.
> > Sorry for the inadequate description , I was intended to say that when
> > failing
> > to do the live migration using multifd, e.g. sending less channels, the src
> > status displays "setup" when running `info migrate`. I assume we should tell
> > users that the "Migration status" is "failed" now (and along with the
> > failure reason).
> >
> > The current src status when failed inmultifd_new_send_channel_async():
> >
> >
> > (qemu) migrate_set_capability x-multifd on
> > (qemu) migrate_set_parameter x-multifd-channels 4
> > (qemu) migrate -d tcp:192.168.190.98:4444
> > (qemu) qemu-system-x86_64: failed in multifd_new_send_channel_async due to
> > ...
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks:
> > off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
> > block: off return-path: off pause-before-switchover: off x-multifd: on
> > dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
> > Migration status: setup
> > total time: 0 milliseconds
>
> Thanks for the information.
>
> I had a quick look. For now we do this:
>
> multifd_save_setup (without waiting for channels to be ready)
> create thread migration_thread
> (in thread)
> ram_save_setup
> multifd_send_sync_main (wait for the channels)
>
> The thing is that we didn't get the notification when one of the
> multifd channel is failed. IMHO instead of setting the global
> migration state in a per-channel function, we should just report the
> error upwards, then the main thread should decide how to change the
> state machine of the migration.
Best to wait for Juan on that; I've got vague memories that reporting
errors among the threads was a bit tricky.
Dave
> And we have set it in migrate_set_error() after all so the main thread
> should be able to know somehow (though IMHO I'll even prefer to have a
> per-channel variable to keep the state of the channel, then the
> per-channel functions won't touch any globals which offers better
> isolation).
>
> I'm not sure how Juan thinks about it, but I'd prefer some work to
> provide such isolation and also some mechanism to allow the main
> thread to detect the per-channel errors not only during setup phase
> but also during the migration (e.g., when network is suddenly down).
> Then we don't touch any globals (e.g., we shouldn't call
> migrate_get_current in any per-channel function like
> multifd_new_send_channel_async).
>
> >
> > >
> > > Normally I would prefer to not touch global states in feature specific
> > > code path, but I'd like to know the problem more first...
> > >
> > > Thanks,
> > >
> > > > Cc: Peter Xu <peterx@redhat.com>
> > > > Signed-off-by: Fei Li <fli@suse.com>
> > > > ---
> > > > migration/ram.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 4db3b3e8f4..c84d164fc8 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1072,6 +1072,7 @@ out:
> > > > static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > > > {
> > > > MultiFDSendParams *p = opaque;
> > > > + MigrationState *s = migrate_get_current();
> > > > QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
> > > > Error *local_err = NULL;
> > > > @@ -1083,6 +1084,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > > > if (multifd_save_cleanup(&local_err) != 0) {
> > > > migrate_set_error(migrate_get_current(), local_err);
> > > > }
> > > > + migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> > > > } else {
> > > > p->c = QIO_CHANNEL(sioc);
> > > > qio_channel_set_delay(p->c, false);
> > > > --
> > > > 2.13.7
> > > >
> > > Regards,
> > >
> >
>
> Regards,
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2018-11-02 16:33 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-01 10:17 [Qemu-devel] [PATCH RFC v7 0/9] qemu_thread_create: propagate errors to callers to check Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 1/9] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-11-05 13:32 ` Juan Quintela
2018-11-06 5:08 ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 2/9] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
2018-11-05 13:34 ` Juan Quintela
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 3/9] qemu_thread_join: fix segmentation fault Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 4/9] migration: fix some segmentation faults when using multifd Fei Li
2018-11-02 2:31 ` Peter Xu
2018-11-02 6:03 ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels Fei Li
2018-11-02 2:37 ` Peter Xu
2018-11-02 3:00 ` Fei Li
2018-11-02 3:32 ` Peter Xu
2018-11-02 7:13 ` Fei Li
2018-11-02 7:32 ` Peter Xu
2018-11-02 16:33 ` Dr. David Alan Gilbert [this message]
2018-11-12 4:43 ` Fei Li
2018-12-04 7:32 ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 6/9] migration: fix the multifd code when receiving " Fei Li
2018-11-02 2:46 ` Peter Xu
2018-11-06 5:29 ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 7/9] migration: remove unused &local_err parameter in migrate_set_error Fei Li
2018-11-05 13:59 ` Juan Quintela
2018-11-06 4:51 ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 8/9] migration: add more error handling for postcopy_ram_enable_notify Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 9/9] qemu_thread_create: propagate the error to callers to handle Fei Li
2018-11-05 13:53 ` Juan Quintela
2018-11-06 7:15 ` Fei Li
2018-11-03 18:09 ` [Qemu-devel] [PATCH RFC v7 0/9] qemu_thread_create: propagate errors to callers to check no-reply
2018-11-05 4:57 ` Fei Li
2018-11-05 18:19 ` no-reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181102163334.GC2387@work-vm \
--to=dgilbert@redhat.com \
--cc=armbru@redhat.com \
--cc=famz@redhat.com \
--cc=fli@suse.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).