Date: Tue, 27 Sep 2016 10:51:36 +0100
From: "Daniel P. Berrange"
Message-ID: <20160927095136.GG3967@redhat.com>
In-Reply-To: <20160927092712.GA563@stefanha-x1.localdomain>
References: <03BF752A-0E6A-4AAD-A310-DFACDF0B8339@nutanix.com> <20160927092712.GA563@stefanha-x1.localdomain>
Subject: Re: [Qemu-devel] Live migration without bdrv_drain_all()
To: Stefan Hajnoczi
Cc: Felipe Franciosi, Mike Cui, Kevin Wolf, Juan Quintela, qemu-devel,
    "Dr. David Alan Gilbert", Paolo Bonzini

On Tue, Sep 27, 2016 at 10:27:12AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 29, 2016 at 06:56:42PM +0000, Felipe Franciosi wrote:
> > Heya!
> >
> > > On 29 Aug 2016, at 08:06, Stefan Hajnoczi wrote:
> > >
> > > At KVM Forum an interesting idea was proposed to avoid
> > > bdrv_drain_all() during live migration. Mike Cui and Felipe
> > > Franciosi mentioned running at queue depth 1. It needs more thought
> > > to make it workable but I want to capture it here for discussion
> > > and to archive it.
> > >
> > > bdrv_drain_all() is synchronous and can cause VM downtime if I/O
> > > requests hang. We should find a better way of quiescing I/O that is
> > > not synchronous.
> > > Up until now I thought we should simply add a timeout to
> > > bdrv_drain_all() so it can at least fail (and live migration would
> > > fail) if I/O is stuck instead of hanging the VM. But the following
> > > approach is also interesting...
> > >
> > > During the iteration phase of live migration we could limit the
> > > queue depth so points with no I/O requests in-flight are
> > > identified. At these points the migration algorithm has the
> > > opportunity to move to the next phase without requiring
> > > bdrv_drain_all() since no requests are pending.
> >
> > I actually think that this "io quiesced state" is highly unlikely to
> > _just_ happen on a busy guest. The main idea behind running at QD1 is
> > to naturally throttle the guest and make it easier to "force quiesce"
> > the VQs.
> >
> > In other words, if the guest is busy and we run at QD1, I would
> > expect the rings to be quite full of pending (ie. unprocessed)
> > requests. At the same time, I would expect that a call to
> > bdrv_drain_all() (as part of do_vm_stop()) should complete much
> > quicker.
> >
> > Nevertheless, you mentioned that this is still problematic as that
> > single outstanding IO could block, leaving the VM paused for longer.
> >
> > My suggestion is therefore that we leave the vCPUs running, but stop
> > picking up requests from the VQs. Provided nothing blocks, you should
> > reach the "io quiesced state" fairly quickly. If you don't, then the
> > VM is at least still running (despite seeing no progress on its VQs).
> >
> > Thoughts on that?
>
> If the guest experiences a hung disk it may enter error recovery. QEMU
> should avoid this so the guest doesn't remount file systems read-only.
>
> This can be solved by only quiescing the disk for, say, 30 seconds at a
> time. If we don't reach a point where live migration can proceed during
> those 30 seconds then the disk will service requests again temporarily
> to avoid upsetting the guest.

What is the actual trigger for guest error recovery?
If you have a situation where bdrv_drain_all() could hang, then surely
even if you start processing requests again after 30 seconds, you might
not actually be able to complete those requests for a long time, because
the drain still has outstanding work blocking the new requests you just
accepted from the guest?

Regards,
Daniel
-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org   -o-  http://virt-manager.org                 :|
|: http://autobuild.org -o-  http://search.cpan.org/~danberr/        :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc       :|