From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: [Qemu-devel] Live migration broken when under heavy IO
Date: Mon, 15 Jun 2009 17:48:52 -0300
Message-ID: <20090615204852.GA6693@poweredge.glommer>
References: <4A36B025.2080602@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "qemu-devel@nongnu.org" , kvm-devel
To: Anthony Liguori
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:59061 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751000AbZFOUmr (ORCPT ); Mon, 15 Jun 2009 16:42:47 -0400
Content-Disposition: inline
In-Reply-To: <4A36B025.2080602@us.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, Jun 15, 2009 at 03:33:41PM -0500, Anthony Liguori wrote:
> The basic issue is that:
>
> migrate_fd_put_ready(): bdrv_flush_all();
>
> Does:
>
> block.c:
>
> foreach block driver:
>     drv->flush(bs);
>
> Which in the case of raw, is just fsync(s->fd).
>
> Any submitted request is not queued or flushed which will lead to the
> request being dropped after the live migration.
You mean any request submitted _after_ that is not queued, right?

>
> Is anyone working on fixing this? Does anyone have a clever idea how to
> fix this without just waiting for all IO requests to complete?
If I understood you correctly, we could do something along the lines of
dirty tracking for I/O devices: use register_savevm_live() instead of
register_savevm() for those devices, and keep doing passes until we reach
stage 3, according to some criterion. We can then flush the remaining
requests on that device and mark[1] it somewhere. After that, we can
either stop the device, so that new requests never arrive, or stop the
VM entirely.

[1] By mark, I mean the verb "to mark", not our dear friend Mark
McLoughlin.