From: Stefan Hajnoczi
Date: Mon, 29 Aug 2016 11:06:48 -0400
Subject: [Qemu-devel] Live migration without bdrv_drain_all()
To: qemu-devel
Cc: cui@nutanix.com, felipe@nutanix.com, Kevin Wolf, Paolo Bonzini

At KVM Forum an interesting idea was proposed to avoid
bdrv_drain_all() during live migration.  Mike Cui and Felipe
Franciosi mentioned running at queue depth 1.  It needs more thought
to make it workable, but I want to capture it here for discussion and
to archive it.

bdrv_drain_all() is synchronous and can cause VM downtime if I/O
requests hang.  We should find a better way of quiescing I/O that is
not synchronous.  Up until now I thought we should simply add a
timeout to bdrv_drain_all() so it can at least fail (and live
migration would fail) if I/O is stuck, instead of hanging the VM.
But the following approach is also interesting...

During the iteration phase of live migration we could limit the queue
depth so that points with no I/O requests in flight can be
identified.  At these points the migration algorithm has the
opportunity to move to the next phase without requiring
bdrv_drain_all(), since no requests are pending.  Unprocessed
requests are left in the virtio-blk/virtio-scsi virtqueues so that
the destination QEMU can process them after migration completes.

Unfortunately this approach makes convergence harder, because the VM
might also be dirtying memory pages during the iteration phase.  Now
we need to reach a spot where no I/O is in flight *and* dirty memory
is under the threshold.

Thoughts?

Stefan
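
P.S. To keep the discussion concrete, here is a rough sketch of the
queue-depth-1 idea in C.  None of the names below are real QEMU APIs;
the struct and the two functions are made-up stand-ins for the
concepts above (a queue depth limit, an in-flight request counter,
and a dirty page threshold):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state, not the real MigrationState from
 * migration/migration.h. */
typedef struct MigrationState {
    uint64_t dirty_pages;        /* pages dirtied since the last sync */
    uint64_t dirty_threshold;    /* convergence threshold             */
    unsigned in_flight_requests; /* I/O requests submitted to disk    */
    unsigned queue_depth_limit;  /* normally unlimited                */
} MigrationState;

/* Called when the iteration phase starts: throttle the block layer
 * so only one request is outstanding at a time.  Requests the guest
 * has queued but we have not popped stay in the virtio-blk/
 * virtio-scsi virtqueues and are picked up by the destination QEMU
 * after migration completes. */
void migration_begin_iterate(MigrationState *s)
{
    s->queue_depth_limit = 1;
}

/* Called each iteration: can we move to the completion phase without
 * bdrv_drain_all()? */
bool migration_can_complete(const MigrationState *s)
{
    /* Nothing in flight means there is nothing to drain... */
    if (s->in_flight_requests > 0) {
        return false;
    }

    /* ...but the remaining dirty RAM must also fit in the downtime
     * budget, which is what makes convergence harder. */
    return s->dirty_pages <= s->dirty_threshold;
}

At queue depth 1 the block layer keeps dropping back to zero requests
in flight between submissions, so a check like the one above gets
regular chances to succeed without ever calling bdrv_drain_all().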