From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	yanghongyang <yanghongyang@huawei.com>,
	Huangzhichao <huangzhichao@huawei.com>,
	jdenemar@redhat.com
Subject: Re: [Qemu-devel] [Bug?] BQL about live migration
Date: Fri, 3 Mar 2017 15:03:42 +0000
Message-ID: <20170303150342.GE2439@work-vm>
In-Reply-To: <4b0de5de-aec7-0388-0a68-cf9e02b48b1d@redhat.com>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>
> >>
> >> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
> >>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>>>
> >>>>
> >>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
> >>>>> Ouch, that's pretty nasty; I remember Paolo explaining to me a while ago that
> >>>>> there were times when run_on_cpu would have to drop the BQL and I worried about it,
> >>>>> but this is the first time I've seen an error due to it.
> >>>>>
> >>>>> Do you know what the migration state was at that point? Was it MIGRATION_STATUS_CANCELLING?
> >>>>> I'm thinking perhaps we should stop 'cont' from continuing while migration is in
> >>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED - so that
> >>>>> perhaps libvirt could avoid sending the 'cont' until then?
> >>>>
> >>>> No, there's no event, though I thought libvirt would poll until
> >>>> "query-migrate" returns the cancelled state.  Of course that is a small
> >>>> consolation, because a segfault is unacceptable.
> >>>
> >>> I think you might get an event if you set the new migrate capability called
> >>> 'events' on!
> >>>
> >>> void migrate_set_state(int *state, int old_state, int new_state)
> >>> {
> >>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
> >>>         trace_migrate_set_state(new_state);
> >>>         migrate_generate_event(new_state);
> >>>     }
> >>> }
> >>>
> >>> static void migrate_generate_event(int new_state)
> >>> {
> >>>     if (migrate_use_events()) {
> >>>         qapi_event_send_migration(new_state, &error_abort); 
> >>>     }
> >>> }
> >>>
> >>> That event feature went in sometime after 2.3.0.
> >>>
> >>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
> >>>> resume it (with add_migration_state_change_notifier) when we hit the
> >>>> CANCELLED state.  I'm not sure what the latency would be between the end
> >>>> of migrate_fd_cancel and finally reaching CANCELLED.
> >>>
> >>> I don't like suspending monitors; it can potentially take quite a significant
> >>> time to do a cancel.
> >>> How about making 'cont' fail if we're in CANCELLING?
> >>
> >> Actually I thought that would be the case already (in fact CANCELLING is
> >> internal only; the outside world sees it as "active" in query-migrate).
> >>
> >> Lei, what is the runstate?  (That is, why did cont succeed at all)?
> > 
> > I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device
> > save, and that's what we get at the end of a migrate and it's legal to restart
> > from there.
> 
> Yeah, but I think we get there at the end of a failed migrate only.  So
> perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
> "cont" from finish-migrate (only allow it from failed-migrate)?

OK, I was wrong in my previous statement; we actually go FINISH_MIGRATE->POSTMIGRATE
so no new state is needed; you shouldn't be restarting the cpu in FINISH_MIGRATE.

My preference is to get libvirt to wait for the transition to POSTMIGRATE before
it issues the 'cont'.  I'd rather not block the monitor with 'cont' but I'm
not sure how we'd cleanly make cont fail without breaking existing libvirts
that usually don't hit this race. (cc'ing in Jiri).
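
To make the "cont fails" alternative concrete, here is a rough standalone
sketch. It is not QEMU code: the SketchRunState enum and the sketch_*
helpers are hypothetical names invented for illustration, and the real
runstate handling has more states and transitions than shown here. It only
assumes what is said above, namely that 'cont' should be refused in
FINISH_MIGRATE and may be accepted once we reach POSTMIGRATE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the runstates discussed above. */
typedef enum {
    SKETCH_RUN_STATE_RUNNING,
    SKETCH_RUN_STATE_PAUSED,
    SKETCH_RUN_STATE_FINISH_MIGRATE,
    SKETCH_RUN_STATE_POSTMIGRATE,
} SketchRunState;

/* 'cont' is refused while the device state is still being saved. */
static bool sketch_cont_allowed(SketchRunState s)
{
    return s != SKETCH_RUN_STATE_FINISH_MIGRATE;
}

/*
 * Returns 0 and moves to RUNNING on success, or -1 without touching the
 * state if the caller has to wait and retry later.
 */
static int sketch_cont(SketchRunState *s)
{
    assert(s != NULL);
    if (!sketch_cont_allowed(*s)) {
        return -1;
    }
    *s = SKETCH_RUN_STATE_RUNNING;
    return 0;
}
```

A management layer that sees the -1 would have to wait for the transition
out of FINISH_MIGRATE and retry, which is exactly the behaviour change that
could trip up existing libvirts.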

Dave

> Paolo
> 
> >> Paolo
> >>
> >>> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
> >>> we really need all of the rest of the devices to stay quiesced at times.
> >>
> >> That's not really possible, because of how condition variables work. :(
> > 
> > *Really* we need to find a solution to that - there's probably lots of 
> > other things that can spring up in that small window other than the
> > 'cont'.
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
