From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start
Date: Tue, 26 Jun 2018 20:11:51 +0100
Message-ID: <20180626191151.GD2505@work-vm>
In-Reply-To: <7b6640d1-9e93-4692-500b-f1bce22852ab@virtuozzo.com>
* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> 25.06.2018 21:03, John Snow wrote:
> >
> > On 06/25/2018 01:50 PM, Dr. David Alan Gilbert wrote:
> > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > 15.06.2018 15:06, Dr. David Alan Gilbert wrote:
> > > > > > * Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> > > > > > > Invalidate cache before source start in case of failed migration.
> > > > > > >
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > > > Why doesn't the code at the bottom of migration_completion,
> > > > > > fail_invalidate: and the code in migrate_fd_cancel handle this?
> > > > > >
> > > > > > What case did you see it in that those didn't handle?
> > > > > > (Also I guess it probably should set s->block_inactive = false)
> > > > > on source I see:
> > > > >
> > > > > 81392@1529065750.766289:migrate_set_state new state 7
> > > > > 81392@1529065750.766330:migration_thread_file_err
> > > > > 81392@1529065750.766332:migration_thread_after_loop
> > > > >
> > > > > so, we are leaving loop on
> > > > > if (qemu_file_get_error(s->to_dst_file)) {
> > > > > migrate_set_state(&s->state, current_active_state,
> > > > > MIGRATION_STATUS_FAILED);
> > > > > trace_migration_thread_file_err();
> > > > > break;
> > > > > }
> > > > >
> > > > > and skip migration_completion()
> > > > Yeh, OK; I'd seen something else a few days ago, where a cancellation
> > > > test that had previously ended with a 'cancelled' state has now ended up
> > > > in 'failed' (which is the state 7 you have above).
> > > > I suspect there's something else going on as well; I think what is
> > > > supposed to happen in the case of 'cancel' is that it spins in 'cancelling' for
> > > > a while in migrate_fd_cancel and then at the bottom of migrate_fd_cancel
> > > > it does the recovery, but because it's going to failed instead, then
> > > > it's jumping over that recovery.
> > > Going back and actually looking at the patch again;
> > > can I ask for 1 small change;
> > > Can you set s->block_inactive = false in the case where you
> > > don't get the local_err (Like we do at the bottom of migrate_fd_cancel)
> > >
> > >
> > > Does that make sense?
> > >
> > > Thanks,
> > >
> > > Dave
> > >
> > Vladimir, one more question for you because I'm not as familiar with
> > this code:
> >
> > In the normal case we need to invalidate the qcow2 cache as a way to
> > re-engage the disk (yes?) when we have failed during the late-migration
> > steps.
> >
> > In this case, we seem to be observing a failure during the bulk transfer
> > loop. Why is it important to invalidate the cache at this step -- would
> > the disk have been inactivated yet? It shouldn't, because it's in the
> > bulk transfer phase -- or am I missing something?
> >
> > I feel like this code is behaving in a way that's fairly surprising for
> > a casual reader so I was hoping you could elaborate for me.
> >
> > --js
>
> In my case, source is already in postcopy state, when error occurred, so it
> is inactivated.
Ah, that explains why I couldn't understand the path that got you there;
I never think about restarting the source once we're in postcopy -
because once the destination is running all is lost.
But, you might be in the gap before management has actually started
the destination so it does need fixing.
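
For reference, the shape of the change asked for earlier in the thread
(clear s->block_inactive only when the invalidate succeeds, as at the
bottom of migrate_fd_cancel) is roughly the following. This is only a
standalone sketch, not the actual patch: Error and MigrationState here
are cut-down stand-ins, and mock_bdrv_invalidate_cache_all(), mock_fail
and recover_source_block_devices() are hypothetical names made up for
illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Cut-down stand-ins for the QEMU types; assumptions, not the real ones. */
typedef struct Error {
    const char *msg;
} Error;

typedef struct MigrationState {
    bool block_inactive;   /* true once the source's block devices are inactivated */
} MigrationState;

/* Hypothetical switch to make the mock below report an error. */
static bool mock_fail = false;

/*
 * Hypothetical mock standing in for the block-layer call that
 * re-activates the images; sets *errp only on failure.
 */
static void mock_bdrv_invalidate_cache_all(Error **errp)
{
    static Error err = { "mock invalidate failure" };

    if (mock_fail) {
        *errp = &err;
    }
}

/*
 * Recovery step before the source is allowed to run again after a
 * failed migration: re-activate the block devices, and only mark them
 * active if that worked.
 */
static bool recover_source_block_devices(MigrationState *s)
{
    Error *local_err = NULL;

    mock_bdrv_invalidate_cache_all(&local_err);
    if (local_err) {
        fprintf(stderr, "invalidate failed: %s\n", local_err->msg);
        return false;      /* leave block_inactive set; disks are not usable */
    }

    s->block_inactive = false;
    return true;
}
```

The point is just that the flag is left alone on the error path, so later
code still knows the disks were never re-activated.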
Dave
> --
> Best regards,
> Vladimir
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK