From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start
Date: Tue, 26 Jun 2018 20:11:51 +0100
Message-ID: <20180626191151.GD2505@work-vm>
In-Reply-To: <7b6640d1-9e93-4692-500b-f1bce22852ab@virtuozzo.com>
* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> 25.06.2018 21:03, John Snow wrote:
> >
> > On 06/25/2018 01:50 PM, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > 15.06.2018 15:06, Dr. David Alan Gilbert wrote:
> > > > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > > > Invalidate cache before source start in case of failed migration.
> > > > > > >
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > > Why doesn't the code at the bottom of migration_completion,
> > > > > > fail_invalidate: and the code in migrate_fd_cancel handle this?
> > > > > >
> > > > > > What case did you see it in that those didn't handle?
> > > > > > (Also I guess it probably should set s->block_inactive = false)
> > > > > on source I see:
> > > > >
> > > > > 81392@1529065750.766289:migrate_set_state new state 7
> > > > > 81392@1529065750.766330:migration_thread_file_err
> > > > > 81392@1529065750.766332:migration_thread_after_loop
> > > > >
> > > > > so, we are leaving loop on
> > > > > >     if (qemu_file_get_error(s->to_dst_file)) {
> > > > > >         migrate_set_state(&s->state, current_active_state,
> > > > > >                           MIGRATION_STATUS_FAILED);
> > > > > >         trace_migration_thread_file_err();
> > > > > >         break;
> > > > > >     }
> > > > >
> > > > > and skip migration_completion()
> > > > Yeh, OK; I'd seen something else a few days ago, where a cancellation
> > > > test that had previously ended with a 'cancelled' state has now ended up
> > > > in 'failed' (which is the state 7 you have above).
> > > > I suspect there's something else going on as well; I think what is
> > > > supposed to happen in the case of 'cancel' is that it spins in 'cancelling' for
> > > > a while in migrate_fd_cancel and then at the bottom of migrate_fd_cancel
> > > > it does the recovery, but because it's going to failed instead, then
> > > > it's jumping over that recovery.
> > > Going back and actually looking at the patch again;
> > > can I ask for 1 small change;
> > > Can you set s->block_inactive = false in the case where you
> > > don't get the local_err (Like we do at the bottom of migrate_fd_cancel)
> > >
> > >
> > > Does that make sense?
> > >
> > > Thanks,
> > >
> > > Dave
> > >
> > Vladimir, one more question for you because I'm not as familiar with
> > this code:
> >
> > In the normal case we need to invalidate the qcow2 cache as a way to
> > re-engage the disk (yes?) when we have failed during the late-migration
> > steps.
> >
> > In this case, we seem to be observing a failure during the bulk transfer
> > loop. Why is it important to invalidate the cache at this step -- would
> > the disk have been inactivated yet? It shouldn't, because it's in the
> > bulk transfer phase -- or am I missing something?
> >
> > I feel like this code is behaving in a way that's fairly surprising for
> > a casual reader so I was hoping you could elaborate for me.
> >
> > --js
>
> In my case, the source was already in postcopy state when the error occurred,
> so it was already inactivated.
Ah, that explains why I couldn't understand the path that got you there;
I never think about restarting the source once we're in postcopy -
because once the destination is running all is lost.
But you might be in the gap before management has actually started
the destination, so it does need fixing.
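
To make that gap concrete, here is a minimal sketch of the failure path as I
understand it from this thread. It is a toy model and not QEMU code: every
Sketch*/sketch_* name is hypothetical, the reactivation helper merely stands
in for the real cache invalidation, and refusing to restart the source when
reactivation fails is an assumption of the sketch, not something the patch is
known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the migration states mentioned in this thread. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_POSTCOPY_ACTIVE,
    SKETCH_STATUS_FAILED
} SketchStatus;

typedef struct {
    SketchStatus state;
    bool block_inactive; /* true once the source's block devices were inactivated */
    bool stream_error;   /* models qemu_file_get_error() reporting an error */
    bool vm_running;     /* true while the source guest is executing */
} SketchMigration;

/*
 * Hypothetical stand-in for re-activating the block devices.
 * Returns 0 on success, -1 on (simulated) failure.
 */
int sketch_reactivate_blocks(bool simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/*
 * Models leaving the migration loop on a stream error, which skips the
 * normal completion path, and then recovering the source.
 * Returns 0 if the source may keep or resume running, -1 otherwise.
 */
int sketch_recover_source(SketchMigration *s, bool simulate_failure)
{
    assert(s != NULL);

    if (s->stream_error) {
        s->state = SKETCH_STATUS_FAILED;
    }
    if (s->state != SKETCH_STATUS_FAILED) {
        return 0; /* nothing to recover */
    }

    if (s->block_inactive) {
        /* The source was already in postcopy, so its disks are inactive. */
        if (sketch_reactivate_blocks(simulate_failure) != 0) {
            return -1; /* assumption: do not restart on inactive disks */
        }
        /* No error: clear the flag, as at the bottom of migrate_fd_cancel. */
        s->block_inactive = false;
    }

    s->vm_running = true;
    return 0;
}
```

With a state that starts as SKETCH_STATUS_POSTCOPY_ACTIVE, block_inactive set
and stream_error set, sketch_recover_source() ends in the failed state with the
flag cleared and the guest running again; if reactivation fails, the flag stays
set and the guest stays stopped.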
Dave
> --
> Best regards,
> Vladimir
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thread overview: 12+ messages
2018-06-09 14:14 [Qemu-devel] [PATCH] migration: invalidate cache before source start Vladimir Sementsov-Ogievskiy
2018-06-15 12:06 ` Dr. David Alan Gilbert
2018-06-15 12:33 ` Vladimir Sementsov-Ogievskiy
2018-06-22 20:54 ` John Snow
2018-06-25 17:14 ` Dr. David Alan Gilbert
2018-06-25 17:50 ` Dr. David Alan Gilbert
2018-06-25 18:03 ` John Snow
2018-06-26 7:31 ` Vladimir Sementsov-Ogievskiy
2018-06-26 19:11 ` Dr. David Alan Gilbert [this message]
2018-06-26 8:44 ` Vladimir Sementsov-Ogievskiy
2018-10-08 15:36 ` Vladimir Sementsov-Ogievskiy
2018-10-08 20:21 ` John Snow