From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Gonglei (Arei)" <arei.gonglei@huawei.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	yanghongyang <yanghongyang@huawei.com>,
	Huangzhichao <huangzhichao@huawei.com>
Subject: Re: [Qemu-devel] [Bug?] BQL about live migration
Date: Fri, 3 Mar 2017 13:26:54 +0000
Message-ID: <20170303132653.GD2439@work-vm>
In-Reply-To: <bcaabb9b-44a6-ee7d-14b9-eade207e1538@redhat.com>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>
> >>
> >> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
> >>> Ouch, that's pretty nasty; I remember Paolo explaining to me a while ago that
> >>> there were times when run_on_cpu would have to drop the BQL and I worried about it,
> >>> but this is the first time I've seen an error due to it.
> >>>
> >>> Do you know what the migration state was at that point? Was it MIGRATION_STATUS_CANCELLING?
> >>> I'm thinking perhaps we should stop 'cont' from continuing while migration is in
> >>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED - so that
> >>> perhaps libvirt could avoid sending the 'cont' until then?
> >>
> >> No, there's no event, though I thought libvirt would poll until
> >> "query-migrate" returns the cancelled state.  Of course that is a small
> >> consolation, because a segfault is unacceptable.
> > 
> > I think you might get an event if you set the new migrate capability called
> > 'events' on!
> > 
> > void migrate_set_state(int *state, int old_state, int new_state)
> > {
> >     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
> >         trace_migrate_set_state(new_state);
> >         migrate_generate_event(new_state);
> >     }
> > }
> > 
> > static void migrate_generate_event(int new_state)
> > {
> >     if (migrate_use_events()) {
> >         qapi_event_send_migration(new_state, &error_abort); 
> >     }
> > }
> > 
> > That event feature went in sometime after 2.3.0.
> > 
> >> One possibility is to suspend the monitor in qmp_migrate_cancel and
> >> resume it (with add_migration_state_change_notifier) when we hit the
> >> CANCELLED state.  I'm not sure what the latency would be between the end
> >> of migrate_fd_cancel and finally reaching CANCELLED.
> > 
> > I don't like suspending monitors; it can potentially take quite a significant
> > time to do a cancel.
> > How about making 'cont' fail if we're in CANCELLING?
> 
> Actually I thought that would be the case already (in fact CANCELLING is
> internal only; the outside world sees it as "active" in query-migrate).
> 
> Lei, what is the runstate?  (That is, why did cont succeed at all)?

I suspect it's RUN_STATE_FINISH_MIGRATE. We set that before we do the device
save, it's the state we end up in at the end of a migrate, and it's legal to
restart from there.
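
To make the 'cont fails in CANCELLING' idea concrete, here's a rough sketch
of the kind of check I mean. The enums and cont_check() are made-up
stand-ins for illustration, not the real QEMU types or the real qmp_cont
path; the point is only that the runstate alone can't reject the 'cont',
so the migration state has to be looked at as well.

```c
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's migration status and runstate. */
typedef enum {
    MIG_NONE,
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_COMPLETED,
} MigState;

typedef enum {
    RS_RUNNING,
    RS_PAUSED,
    RS_FINISH_MIGRATE,
} RunState;

/*
 * Hypothetical guard for 'cont'.
 * Returns NULL if the guest may be restarted, otherwise a short reason.
 */
const char *cont_check(RunState rs, MigState ms)
{
    if (rs == RS_RUNNING) {
        return NULL;    /* already running, nothing to refuse */
    }
    /*
     * RS_FINISH_MIGRATE is a legal state to restart from, so the
     * runstate check passes; only the migration state tells us that
     * a cancel is still in flight.
     */
    if (ms == MIG_CANCELLING) {
        return "migration is being cancelled";
    }
    return NULL;
}
```

With that, cont_check(RS_FINISH_MIGRATE, MIG_CANCELLING) refuses, while the
same runstate with MIG_CANCELLED is allowed through.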

> Paolo
> 
> > I'd really love to see the 'run_on_cpu' being more careful about the BQL;
> > we really need all of the rest of the devices to stay quiesced at times.
> 
> That's not really possible, because of how condition variables work. :(

*Really* we need to find a solution to that - there are probably lots of
other things besides the 'cont' that can spring up in that small window.
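
For anyone following along, the window I'm worried about can be shown with
plain pthreads. This is only an illustration with made-up names (big_lock
standing in for the BQL, other_thread standing in for anything like the
monitor's 'cont'); it is not how run_on_cpu is actually written. The caller
believes it holds the lock for the whole operation, but pthread_cond_wait
has to release the mutex while it sleeps, so another thread can take it
and change state in the middle.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* "BQL" */
static pthread_cond_t work_done = PTHREAD_COND_INITIALIZER;
static bool done;
static bool state_changed_behind_our_back;

/* Stand-in for any other thread that wants the big lock. */
static void *other_thread(void *arg)
{
    (void)arg;
    /* Blocks until the waiter drops big_lock inside pthread_cond_wait. */
    pthread_mutex_lock(&big_lock);
    state_changed_behind_our_back = true;   /* e.g. a 'cont' */
    done = true;
    pthread_cond_signal(&work_done);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Takes big_lock, waits for the other thread, and reports whether state
 * was modified while we nominally "held" the lock.
 */
bool window_observed(void)
{
    pthread_t t;
    bool seen;

    pthread_mutex_lock(&big_lock);
    done = false;
    state_changed_behind_our_back = false;

    if (pthread_create(&t, NULL, other_thread, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return false;
    }

    /* The wait atomically releases big_lock and re-takes it on wakeup. */
    while (!done) {
        pthread_cond_wait(&work_done, &big_lock);
    }
    seen = state_changed_behind_our_back;

    pthread_mutex_unlock(&big_lock);
    pthread_join(t, NULL);
    return seen;
}
```

The other thread can only get in once the waiter is inside
pthread_cond_wait, which is exactly the gap where devices are no longer
guaranteed to stay quiesced.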

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
