All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, berrange@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling
Date: Wed, 20 Dec 2017 11:10:57 +0800	[thread overview]
Message-ID: <20171220031057.GC22308@xz-mi> (raw)
In-Reply-To: <20171219113321.GD2730@work-vm>

On Tue, Dec 19, 2017 at 11:33:21AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Dec 19, 2017 at 10:14:08AM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Fri, Dec 15, 2017 at 05:16:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Hi,
> > > > >   Where a channel fails asynchronously during connect, call
> > > > > back through the migration code so it can clean up.
> > > > >   In particular this causes the transition of a 'cancelling' state
> > > > > to 'cancelled' in the case of:
> > > > > 
> > > > >     migrate -d tcp:deadhost:port
> > > > >      <host tries to connect>
> > > > >     migrate_cancel
> > > > > 
> > > > > previously the status would get stuck in cancelling because
> > > > > the final cleanup didn't happen.
> > > > > 
> > > > >   This is the second part of the fix for:
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1525899
> > > > 
> > > > IIUC this series tries to deliver the connection error a long way
> > > > until migrate_fd_connect() to handle it. But, haven't we already have
> > > > a function migrate_fd_error() to do that (which is faster, and
> > > > simpler)?
> > > > 
> > > > void migrate_fd_error(MigrationState *s, const Error *error)
> > > > {
> > > >     trace_migrate_fd_error(error_get_pretty(error));
> > > >     assert(s->to_dst_file == NULL);
> > > >     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > >                       MIGRATION_STATUS_FAILED);
> > > >     migrate_set_error(s, error);
> > > >     notifier_list_notify(&migration_state_notifiers, s);
> > > >     block_cleanup_parameters(s);
> > > > }
> > > > 
> > > > I think it's not handling the case when cancelling.  If we let it to
> > > > handle the cancelling case well, would it be a simpler fix?
> > > >
> > > > Moreover, I think this is another good example that migration is not
> > > > handling the cleanup "cleanly" in general... I really hope we can do
> > > > this better in 2.12.  I'll see whether I can give it a shot, but in
> > > > all cases it'll be after the merging of existing patches since there
> > > > are already quite a lot of dangling patches.
> > > 
> > > No, I think migrate_fd_error is the cause of the problem here, not the
> > > answer.
> > 
> > Could I ask why migrate_fd_error() is problematic?  Yeah I agree that
> > we should have a single point to clean things up, then can we call
> > migrate_fd_cleanup() somehow inside migrate_fd_error()?
> > 
> > The thing I don't really understand is: why we want the error be
> > delivered via those functions (migration_channel_connect,
> > migrate_fd_connect, etc.) to finally be cleaned up.
> 
> migrate_fd_cleanup has a lot of code for dealing with different cases:
>    a) Closing the to_dst_file
>    b) joining the thread if its already running
>    c) Cleaning up multifd (stub)
>    d) finishing of cancel
>    e) notification
>    f) block cleanup
> 
> we seem to have copied some of those into migrate_fd_error - but not all
> of them.  In this case the one we're missing is (d) got finishing
> cancel;
> when you issue a cancel command we move from whatever state we were in
> to the 'cancelling' state, various things get cleaned up and eventually
> we move to cancelled; that wasn't happening because we missed the
> code in migrate_fd_cleanup out.  So we could copy that code (copied code
> is bad) or we could just make sure migrate_fd_cleanup is called like
> it normally is.

I fully agree on above.

> 
> The other thinking is that at the moment the migration/socket.c and tls.c
> etc code has to choose between callbacks into the main migration code,
> either migration_channel_connet or migrate_fd_error - now it's simpler,
> once you've asked for an outgoing migration you always get a callback
> to migration_channel_connect.  Much more predictable.

This is the point I still don't understand, on why we must go into
migrate_fd_connect(), even if error happens before that point.

What I meant is pasted at the end.  Again, I don't know whether it
works, just want to show what I meant.  I'm fine with current approach
too.  Thanks,

----------------------------------

diff --git a/migration/migration.c b/migration/migration.c                     
index 4de3b551fe..fd9b509ab1 100644    
--- a/migration/migration.c            
+++ b/migration/migration.c            
@@ -1074,8 +1074,10 @@ static void migrate_fd_cleanup(void *opaque)            
 {                                     
     MigrationState *s = opaque;       
                                       
-    qemu_bh_delete(s->cleanup_bh);    
-    s->cleanup_bh = NULL;             
+    if (s->cleanup_bh) {              
+        qemu_bh_delete(s->cleanup_bh);
+        s->cleanup_bh = NULL;         
+    }                                 
                                       
     if (s->to_dst_file) {             
         Error *local_err = NULL;      
@@ -1124,11 +1126,15 @@ void migrate_fd_error(MigrationState *s, const Error *error)                                                                          
 {                                     
     trace_migrate_fd_error(error_get_pretty(error));                          
     assert(s->to_dst_file == NULL);   
+    /*                                
+     * If we are still in setup, switch to failure.  It's also                
+     * possible that the migration has been cancelled, then we do             
+     * nothing here.                  
+     */                               
     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,                      
                       MIGRATION_STATUS_FAILED);                               
     migrate_set_error(s, error);      
-    notifier_list_notify(&migration_state_notifiers, s);                      
-    block_cleanup_parameters(s);      
+    migrate_fd_cleanup(s);            
 }                                     
                                       
 static void migrate_fd_cancel(MigrationState *s)                              

-- 
Peter Xu

  reply	other threads:[~2017-12-20  3:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-15 17:16 [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling Dr. David Alan Gilbert (git)
2017-12-15 17:16 ` [Qemu-devel] [PATCH 1/2] migration: Allow migrate_fd_connect to take an Error * Dr. David Alan Gilbert (git)
2018-01-28 18:53   ` Juan Quintela
2017-12-15 17:16 ` [Qemu-devel] [PATCH 2/2] migration: Route errors down through migration_channel_connect Dr. David Alan Gilbert (git)
2018-01-28 18:53   ` Juan Quintela
2017-12-19  5:16 ` [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling Peter Xu
2017-12-19 10:14   ` Dr. David Alan Gilbert
2017-12-19 11:21     ` Peter Xu
2017-12-19 11:33       ` Dr. David Alan Gilbert
2017-12-20  3:10         ` Peter Xu [this message]
2017-12-20 18:15           ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171220031057.GC22308@xz-mi \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.