From: "Daniel P. Berrange" <berrange@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Felipe Franciosi <felipe@nutanix.com>, Mike Cui <cui@nutanix.com>,
Kevin Wolf <kwolf@redhat.com>,
Juan Quintela <quintela@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] Live migration without bdrv_drain_all()
Date: Tue, 27 Sep 2016 10:51:36 +0100
Message-ID: <20160927095136.GG3967@redhat.com>
In-Reply-To: <20160927092712.GA563@stefanha-x1.localdomain>
On Tue, Sep 27, 2016 at 10:27:12AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 29, 2016 at 06:56:42PM +0000, Felipe Franciosi wrote:
> > Heya!
> >
> > > On 29 Aug 2016, at 08:06, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > >
> > > At KVM Forum an interesting idea was proposed to avoid
> > > bdrv_drain_all() during live migration. Mike Cui and Felipe Franciosi
> > > mentioned running at queue depth 1. It needs more thought to make it
> > > workable but I want to capture it here for discussion and to archive
> > > it.
> > >
> > > bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> > > requests hang. We should find a better way of quiescing I/O that is
> > > not synchronous. Up until now I thought we should simply add a
> > > timeout to bdrv_drain_all() so it can at least fail (and live
> > > migration would fail) if I/O is stuck instead of hanging the VM. But
> > > the following approach is also interesting...
> > >
> > > During the iteration phase of live migration we could limit the queue
> > > depth so that points with no I/O requests in flight can be identified.
> > > At these points the migration algorithm has the opportunity to move to
> > > the next phase without requiring bdrv_drain_all(), since no requests
> > > are pending.
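[The queue-depth cap described above can be modelled with a small in-flight counter. This is only an illustrative sketch; BlockState, request_start() and friends are made-up names, not QEMU APIs:]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the idea: submission is capped at queue_depth
 * during the iteration phase, and the migration loop polls for a moment
 * with zero requests in flight. */
typedef struct {
    int in_flight;      /* requests submitted but not yet completed */
    int queue_depth;    /* cap enforced during the iteration phase */
} BlockState;

static bool can_submit(const BlockState *bs)
{
    return bs->in_flight < bs->queue_depth;
}

static void request_start(BlockState *bs)
{
    assert(can_submit(bs));
    bs->in_flight++;
}

static void request_complete(BlockState *bs)
{
    assert(bs->in_flight > 0);
    bs->in_flight--;
}

/* Migration may move to the next phase at any point where this is true,
 * skipping the synchronous bdrv_drain_all(). */
static bool quiesced(const BlockState *bs)
{
    return bs->in_flight == 0;
}
```

[With queue_depth == 1 the cap also acts as the QD1 throttle Mike and Felipe proposed: at most one request is ever outstanding, so quiescent points recur after every completion.]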
> >
> > I actually think that this "io quiesced state" is highly unlikely to _just_ happen on a busy guest. The main idea behind running at QD1 is to naturally throttle the guest and make it easier to "force quiesce" the VQs.
> >
> > In other words, if the guest is busy and we run at QD1, I would expect the rings to be quite full of pending (i.e. unprocessed) requests. At the same time, I would expect that a call to bdrv_drain_all() (as part of do_vm_stop()) should complete much more quickly.
> >
> > Nevertheless, you mentioned that this is still problematic as that single outstanding IO could block, leaving the VM paused for longer.
> >
> > My suggestion is therefore that we leave the vCPUs running, but stop picking up requests from the VQs. Provided nothing blocks, you should reach the "io quiesced state" fairly quickly. If you don't, then the VM is at least still running (despite seeing no progress on its VQs).
> >
> > Thoughts on that?
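[Felipe's "stop picking up requests from the VQs" idea can be sketched as a toy model: a paused flag gates popping from the guest ring, so the ring backlog grows while in-flight requests drain naturally to zero. All names here (VirtQueueModel, try_pop, ...) are illustrative, not the real virtio code:]

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int ring_pending;   /* requests the guest has queued, not yet popped */
    int in_flight;      /* popped and submitted to the backend */
    bool paused;        /* set while migration tries to converge */
} VirtQueueModel;

static void guest_submit(VirtQueueModel *vq)
{
    vq->ring_pending++;
}

/* Returns true if a request was popped and dispatched to the backend.
 * While paused, nothing is popped, so no new I/O enters flight. */
static bool try_pop(VirtQueueModel *vq)
{
    if (vq->paused || vq->ring_pending == 0) {
        return false;
    }
    vq->ring_pending--;
    vq->in_flight++;
    return true;
}

static void backend_complete(VirtQueueModel *vq)
{
    assert(vq->in_flight > 0);
    vq->in_flight--;
}

/* The "io quiesced state": nothing in flight, even though the guest's
 * vCPUs keep running and keep filling the ring. */
static bool io_quiesced(const VirtQueueModel *vq)
{
    return vq->in_flight == 0;
}
```

[The key property is that pausing only stops dispatch, not the guest: requests accumulate in the ring, and provided the backend does not block, in_flight reaches zero without any synchronous drain.]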
>
> If the guest experiences a hung disk it may enter error recovery. QEMU
> should avoid this so the guest doesn't remount file systems read-only.
>
> This can be solved by only quiescing the disk for, say, 30 seconds at a
> time. If we don't reach a point where live migration can proceed during
> those 30 seconds then the disk will service requests again temporarily
> to avoid upsetting the guest.
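[Stefan's bounded-quiesce idea could look something like the following: pause dispatch for at most a fixed window, and if no quiescent point is found before the deadline, resume servicing requests so the guest never sees a hung disk. The names and the 30-second window are taken from the discussion, but the control structure is a sketch, not QEMU code:]

```c
#include <assert.h>
#include <stdbool.h>

#define QUIESCE_WINDOW_NS (30LL * 1000 * 1000 * 1000)   /* 30 seconds */

typedef enum { RUNNING, QUIESCING } Phase;

typedef struct {
    Phase phase;
    long long deadline_ns;
} QuiesceCtl;

static void quiesce_begin(QuiesceCtl *q, long long now_ns)
{
    q->phase = QUIESCING;
    q->deadline_ns = now_ns + QUIESCE_WINDOW_NS;
}

/* Called periodically from the migration loop. Returns true once
 * migration may move to the next phase without bdrv_drain_all(). */
static bool quiesce_poll(QuiesceCtl *q, long long now_ns, int in_flight)
{
    if (q->phase != QUIESCING) {
        return false;
    }
    if (in_flight == 0) {
        return true;                /* quiescent point found in time */
    }
    if (now_ns >= q->deadline_ns) {
        q->phase = RUNNING;         /* give up, service the guest again */
    }
    return false;
}
```

[On expiry the migration code would service requests again for a while and then retry the window, so the guest's I/O timeout never fires even when convergence is slow.]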
What is the actual trigger for guest error recovery? If you are in a
situation where bdrv_drain_all() could hang, then even if you start
processing requests again after 30 seconds, you may still be unable to
complete those requests for a long time, because the drain still has
outstanding work blocking the new requests you just accepted from the
guest.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Thread overview: 10+ messages
2016-08-29 15:06 [Qemu-devel] Live migration without bdrv_drain_all() Stefan Hajnoczi
2016-08-29 18:56 ` Felipe Franciosi
2016-09-27 9:27 ` Stefan Hajnoczi
2016-09-27 9:51 ` Daniel P. Berrange [this message]
2016-09-27 9:54 ` Dr. David Alan Gilbert
2016-09-28 9:03 ` Juan Quintela
2016-09-28 10:00 ` Felipe Franciosi
2016-09-28 10:23 ` Daniel P. Berrange
2016-09-27 9:48 ` Daniel P. Berrange
2016-10-12 13:09 ` Stefan Hajnoczi