qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fei Li <fli@suse.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com, dgilbert@redhat.com,
	famz@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels
Date: Fri, 2 Nov 2018 11:32:41 +0800	[thread overview]
Message-ID: <20181102033241.GF7804@xz-x1> (raw)
In-Reply-To: <39af00cd-fb54-de7d-154e-d9047156ab43@suse.com>

On Fri, Nov 02, 2018 at 11:00:24AM +0800, Fei Li wrote:
> 
> 
> On 11/02/2018 10:37 AM, Peter Xu wrote:
> > On Thu, Nov 01, 2018 at 06:17:11PM +0800, Fei Li wrote:
> > > Set the migration state to "failed" instead of "setup" when failing
> > > to send packet via some channel.
> > Could you please provide more information in the commit message?
> > E.g., what will happen if without this patch?  Will it crash the
> > source or stall the source migration or others?  Otherwise it's a bit
> > hard for me to understand what's this patch for.
> Sorry for the inadequate description , I was intended to say that when
> failing
> to do the live migration using multifd, e.g. sending less channels, the src
> status displays "setup" when running `info migrate`. I assume we should tell
> users that the "Migration status" is "failed" now (and along with the
> failure reason).
> 
> The current src status when failed inmultifd_new_send_channel_async():
> 
> 
> (qemu) migrate_set_capability x-multifd on
> (qemu) migrate_set_parameter x-multifd-channels 4
> (qemu) migrate -d tcp:192.168.190.98:4444
> (qemu) qemu-system-x86_64: failed in multifd_new_send_channel_async due to
> ...
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks:
> off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
> block: off return-path: off pause-before-switchover: off x-multifd: on
> dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
> Migration status: setup
> total time: 0 milliseconds

Thanks for the information.

I had a quick look.  For now we do this:

        multifd_save_setup (without waiting for channels to be ready)
        create thread migration_thread
                (in thread)
                ram_save_setup
                        multifd_send_sync_main (wait for the channels)

The thing is that we didn't get the notification when one of the
multifd channel is failed.  IMHO instead of setting the global
migration state in a per-channel function, we should just report the
error upwards, then the main thread should decide how to change the
state machine of the migration.

And we have set it in migrate_set_error() after all so the main thread
should be able to know somehow (though IMHO I'll even prefer to have a
per-channel variable to keep the state of the channel, then the
per-channel functions won't touch any globals which offers better
isolation).

I'm not sure how Juan thinks about it, but I'd prefer some work to
provide such isolation and also some mechanism to allow the main
thread to detect the per-channel errors not only during setup phase
but also during the migration (e.g., when network is suddenly down).
Then we don't touch any globals (e.g., we shouldn't call
migrate_get_current in any per-channel function like
multifd_new_send_channel_async).

> 
> > 
> > Normally I would prefer to not touch global states in feature specific
> > code path, but I'd like to know the problem more first...
> > 
> > Thanks,
> > 
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Signed-off-by: Fei Li <fli@suse.com>
> > > ---
> > >   migration/ram.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 4db3b3e8f4..c84d164fc8 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1072,6 +1072,7 @@ out:
> > >   static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >   {
> > >       MultiFDSendParams *p = opaque;
> > > +    MigrationState *s = migrate_get_current();
> > >       QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
> > >       Error *local_err = NULL;
> > > @@ -1083,6 +1084,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > >           if (multifd_save_cleanup(&local_err) != 0) {
> > >               migrate_set_error(migrate_get_current(), local_err);
> > >           }
> > > +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> > >       } else {
> > >           p->c = QIO_CHANNEL(sioc);
> > >           qio_channel_set_delay(p->c, false);
> > > -- 
> > > 2.13.7
> > > 
> > Regards,
> > 
> 

Regards,

-- 
Peter Xu

  reply	other threads:[~2018-11-02  3:32 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-01 10:17 [Qemu-devel] [PATCH RFC v7 0/9] qemu_thread_create: propagate errors to callers to check Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 1/9] Fix segmentation fault when qemu_signal_init fails Fei Li
2018-11-05 13:32   ` Juan Quintela
2018-11-06  5:08     ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 2/9] qemu_init_vcpu: add a new Error parameter to propagate Fei Li
2018-11-05 13:34   ` Juan Quintela
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 3/9] qemu_thread_join: fix segmentation fault Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 4/9] migration: fix some segmentation faults when using multifd Fei Li
2018-11-02  2:31   ` Peter Xu
2018-11-02  6:03     ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels Fei Li
2018-11-02  2:37   ` Peter Xu
2018-11-02  3:00     ` Fei Li
2018-11-02  3:32       ` Peter Xu [this message]
2018-11-02  7:13         ` Fei Li
2018-11-02  7:32           ` Peter Xu
2018-11-02 16:33         ` Dr. David Alan Gilbert
2018-11-12  4:43           ` Fei Li
2018-12-04  7:32             ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 6/9] migration: fix the multifd code when receiving " Fei Li
2018-11-02  2:46   ` Peter Xu
2018-11-06  5:29     ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 7/9] migration: remove unused &local_err parameter in migrate_set_error Fei Li
2018-11-05 13:59   ` Juan Quintela
2018-11-06  4:51     ` Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 8/9] migration: add more error handling for postcopy_ram_enable_notify Fei Li
2018-11-01 10:17 ` [Qemu-devel] [PATCH RFC v7 9/9] qemu_thread_create: propagate the error to callers to handle Fei Li
2018-11-05 13:53   ` Juan Quintela
2018-11-06  7:15     ` Fei Li
2018-11-03 18:09 ` [Qemu-devel] [PATCH RFC v7 0/9] qemu_thread_create: propagate errors to callers to check no-reply
2018-11-05  4:57   ` Fei Li
2018-11-05 18:19 ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181102033241.GF7804@xz-x1 \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=famz@redhat.com \
    --cc=fli@suse.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).