From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, berrange@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling
Date: Wed, 20 Dec 2017 11:10:57 +0800 [thread overview]
Message-ID: <20171220031057.GC22308@xz-mi> (raw)
In-Reply-To: <20171219113321.GD2730@work-vm>
On Tue, Dec 19, 2017 at 11:33:21AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Dec 19, 2017 at 10:14:08AM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Fri, Dec 15, 2017 at 05:16:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > >
> > > > > Hi,
> > > > > Where a channel fails asynchronously during connect, call
> > > > > back through the migration code so it can clean up.
> > > > > In particular this causes the transition of a 'cancelling' state
> > > > > to 'cancelled' in the case of:
> > > > >
> > > > > migrate -d tcp:deadhost:port
> > > > > <host tries to connect>
> > > > > migrate_cancel
> > > > >
> > > > > previously the status would get stuck in cancelling because
> > > > > the final cleanup didn't happen.
> > > > >
> > > > > This is the second part of the fix for:
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1525899
> > > >
> > > > IIUC this series tries to deliver the connection error a long way
> > > > until migrate_fd_connect() to handle it. But, haven't we already have
> > > > a function migrate_fd_error() to do that (which is faster, and
> > > > simpler)?
> > > >
> > > > void migrate_fd_error(MigrationState *s, const Error *error)
> > > > {
> > > > trace_migrate_fd_error(error_get_pretty(error));
> > > > assert(s->to_dst_file == NULL);
> > > > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > > MIGRATION_STATUS_FAILED);
> > > > migrate_set_error(s, error);
> > > > notifier_list_notify(&migration_state_notifiers, s);
> > > > block_cleanup_parameters(s);
> > > > }
> > > >
> > > > I think it's not handling the case when cancelling. If we let it to
> > > > handle the cancelling case well, would it be a simpler fix?
> > > >
> > > > Moreover, I think this is another good example that migration is not
> > > > handling the cleanup "cleanly" in general... I really hope we can do
> > > > this better in 2.12. I'll see whether I can give it a shot, but in
> > > > all cases it'll be after the merging of existing patches since there
> > > > are already quite a lot of dangling patches.
> > >
> > > No, I think migrate_fd_error is the cause of the problem here, not the
> > > answer.
> >
> > Could I ask why migrate_fd_error() is problematic? Yeah I agree that
> > we should have a single point to clean things up, then can we call
> > migrate_fd_cleanup() somehow inside migrate_fd_error()?
> >
> > The thing I don't really understand is: why we want the error be
> > delivered via those functions (migration_channel_connect,
> > migrate_fd_connect, etc.) to finally be cleaned up.
>
> migrate_fd_cleanup has a lot of code for dealing with different cases:
> a) Closing the to_dst_file
> b) joining the thread if its already running
> c) Cleaning up multifd (stub)
> d) finishing of cancel
> e) notification
> f) block cleanup
>
> we seem to have copied some of those into migrate_fd_error - but not all
> of them. In this case the one we're missing is (d) got finishing
> cancel;
> when you issue a cancel command we move from whatever state we were in
> to the 'cancelling' state, various things get cleaned up and eventually
> we move to cancelled; that wasn't happening because we missed the
> code in migrate_fd_cleanup out. So we could copy that code (copied code
> is bad) or we could just make sure migrate_fd_cleanup is called like
> it normally is.
I fully agree on above.
>
> The other thinking is that at the moment the migration/socket.c and tls.c
> etc code has to choose between callbacks into the main migration code,
> either migration_channel_connet or migrate_fd_error - now it's simpler,
> once you've asked for an outgoing migration you always get a callback
> to migration_channel_connect. Much more predictable.
This is the point I still don't understand, on why we must go into
migrate_fd_connect(), even if error happens before that point.
What I meant is pasted at the end. Again, I don't know whether it
works, just want to show what I meant. I'm fine with current approach
too. Thanks,
----------------------------------
diff --git a/migration/migration.c b/migration/migration.c
index 4de3b551fe..fd9b509ab1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1074,8 +1074,10 @@ static void migrate_fd_cleanup(void *opaque)
{
MigrationState *s = opaque;
- qemu_bh_delete(s->cleanup_bh);
- s->cleanup_bh = NULL;
+ if (s->cleanup_bh) {
+ qemu_bh_delete(s->cleanup_bh);
+ s->cleanup_bh = NULL;
+ }
if (s->to_dst_file) {
Error *local_err = NULL;
@@ -1124,11 +1126,15 @@ void migrate_fd_error(MigrationState *s, const Error *error)
{
trace_migrate_fd_error(error_get_pretty(error));
assert(s->to_dst_file == NULL);
+ /*
+ * If we are still in setup, switch to failure. It's also
+ * possible that the migration has been cancelled, then we do
+ * nothing here.
+ */
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
migrate_set_error(s, error);
- notifier_list_notify(&migration_state_notifiers, s);
- block_cleanup_parameters(s);
+ migrate_fd_cleanup(s);
}
static void migrate_fd_cancel(MigrationState *s)
--
Peter Xu
next prev parent reply other threads:[~2017-12-20 3:11 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-15 17:16 [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling Dr. David Alan Gilbert (git)
2017-12-15 17:16 ` [Qemu-devel] [PATCH 1/2] migration: Allow migrate_fd_connect to take an Error * Dr. David Alan Gilbert (git)
2018-01-28 18:53 ` Juan Quintela
2017-12-15 17:16 ` [Qemu-devel] [PATCH 2/2] migration: Route errors down through migration_channel_connect Dr. David Alan Gilbert (git)
2018-01-28 18:53 ` Juan Quintela
2017-12-19 5:16 ` [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling Peter Xu
2017-12-19 10:14 ` Dr. David Alan Gilbert
2017-12-19 11:21 ` Peter Xu
2017-12-19 11:33 ` Dr. David Alan Gilbert
2017-12-20 3:10 ` Peter Xu [this message]
2017-12-20 18:15 ` Dr. David Alan Gilbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171220031057.GC22308@xz-mi \
--to=peterx@redhat.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).