From: Stefan Hajnoczi
Date: Mon, 29 Aug 2016 11:06:48 -0400
Subject: [Qemu-devel] Live migration without bdrv_drain_all()
To: qemu-devel
Cc: cui@nutanix.com, felipe@nutanix.com, Kevin Wolf, Paolo Bonzini

At KVM Forum an interesting idea was proposed to avoid
bdrv_drain_all() during live migration.  Mike Cui and Felipe
Franciosi mentioned running at queue depth 1.  It needs more thought
to make it workable, but I want to capture it here for discussion and
to archive it.

bdrv_drain_all() is synchronous and can cause VM downtime if I/O
requests hang.  We should find a better way of quiescing I/O that is
not synchronous.  Up until now I thought we should simply add a
timeout to bdrv_drain_all() so it can at least fail (and live
migration would fail) if I/O is stuck, instead of hanging the VM.
But the following approach is also interesting...

During the iteration phase of live migration we could limit the queue
depth so that points with no I/O requests in flight can be
identified.  At these points the migration algorithm has the
opportunity to move to the next phase without requiring
bdrv_drain_all(), since no requests are pending.  Unprocessed
requests are left in the virtio-blk/virtio-scsi virtqueues so that
the destination QEMU can process them after migration completes.

Unfortunately this approach makes convergence harder, because the VM
might also be dirtying memory pages during the iteration phase.  Now
we need to reach a spot where no I/O is in flight *and* dirty memory
is under the threshold.

Thoughts?

Stefan
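
P.S. To keep the discussion concrete, here is a rough sketch of the
queue-depth-1 idea in C.  None of the names below are real QEMU APIs;
the struct and the two functions are made-up stand-ins for the
concepts above (a queue depth limit, an in-flight request counter,
and a dirty page threshold):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state, not the real MigrationState from
 * migration/migration.h. */
typedef struct MigrationState {
    uint64_t dirty_pages;        /* pages dirtied since the last sync */
    uint64_t dirty_threshold;    /* convergence threshold             */
    unsigned in_flight_requests; /* I/O requests submitted to disk    */
    unsigned queue_depth_limit;  /* normally unlimited                */
} MigrationState;

/* Called when the iteration phase starts: throttle the block layer
 * so only one request is outstanding at a time.  Requests the guest
 * has queued but we have not popped stay in the virtio-blk/
 * virtio-scsi virtqueues and are picked up by the destination QEMU
 * after migration completes. */
void migration_begin_iterate(MigrationState *s)
{
    s->queue_depth_limit = 1;
}

/* Called each iteration: can we move to the completion phase without
 * bdrv_drain_all()? */
bool migration_can_complete(const MigrationState *s)
{
    /* Nothing in flight means there is nothing to drain... */
    if (s->in_flight_requests > 0) {
        return false;
    }

    /* ...but the remaining dirty RAM must also fit in the downtime
     * budget, which is what makes convergence harder. */
    return s->dirty_pages <= s->dirty_threshold;
}

At queue depth 1 the block layer keeps dropping back to zero requests
in flight between submissions, so a check like the one above gets
regular chances to succeed without ever calling bdrv_drain_all().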