From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 15 Jun 2009 17:48:52 -0300
From: Glauber Costa <glommer@redhat.com>
Subject: Re: [Qemu-devel] Live migration broken when under heavy IO
Message-ID: <20090615204852.GA6693@poweredge.glommer>
References: <4A36B025.2080602@us.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4A36B025.2080602@us.ibm.com>
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori <aliguori@us.ibm.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, kvm-devel <kvm@vger.kernel.org>

On Mon, Jun 15, 2009 at 03:33:41PM -0500, Anthony Liguori wrote:
> The basic issue is that:
>
> migrate_fd_put_ready():    bdrv_flush_all();
>
> Does:
>
> block.c:
>
> foreach block driver:
>   drv->flush(bs);
>
> Which in the case of raw, is just fsync(s->fd).
>
> Any submitted request is not queued or flushed which will lead to the  
> request being dropped after the live migration.
You mean any request submitted _after_ that point is not queued, right?

>
> Is anyone working on fixing this?  Does anyone have a clever idea how to  
> fix this without just waiting for all IO requests to complete?
If I understood you correctly, we could do something along the lines of
dirty tracking for I/O devices.

Use register_savevm_live() instead of register_savevm() for those devices,
and keep doing passes until some criterion tells us to move to stage 3. We
can then just flush the remaining requests on that device and mark[1] it
somewhere. After that, we can either stop that device, so that new requests
never arrive, or stop the VM entirely.

[1] By mark, I mean the verb "to mark", not our dear friend Mark McLaughing.
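
To make the staged idea above a bit more concrete, here is a minimal,
self-contained sketch in C. It is a toy model only, not QEMU code: ToyDevice,
toy_submit(), toy_save_live() and toy_migrate() are hypothetical names, and
the "criterion" for entering stage 3 is assumed to be a simple threshold on
the number of pending requests. The handler only loosely mirrors the
(stage, opaque) shape of a live savevm handler.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model; none of these names are real QEMU APIs. */
enum { STAGE_START = 1, STAGE_ITERATE = 2, STAGE_FINAL = 3 };

typedef struct ToyDevice {
    int pending;  /* requests submitted but not yet transferred */
    int sent;     /* requests already handed to the destination */
    int stopped;  /* nonzero once the device refuses new requests */
    int marked;   /* nonzero once the device is marked as fully flushed */
} ToyDevice;

/* Guest submits n requests; a stopped device accepts none. */
static int toy_submit(ToyDevice *d, int n)
{
    if (d->stopped) {
        return 0;
    }
    d->pending += n;
    return n;
}

/*
 * One pass of the live handler. In stages 1 and 2 it transfers at most
 * `batch` pending requests and returns 1 once the backlog is at or below
 * `threshold` (the assumed criterion for moving to stage 3). In stage 3
 * it stops the device, flushes whatever is left, and marks it.
 */
static int toy_save_live(ToyDevice *d, int stage, int batch, int threshold)
{
    assert(stage >= STAGE_START && stage <= STAGE_FINAL);

    if (stage == STAGE_FINAL) {
        d->stopped = 1;          /* no new requests can arrive from here on */
        d->sent += d->pending;   /* flush the remaining requests */
        d->pending = 0;
        d->marked = 1;
        return 1;
    }

    int n = d->pending < batch ? d->pending : batch;
    d->pending -= n;
    d->sent += n;
    return d->pending <= threshold;
}

/*
 * Drive the stages while the guest keeps submitting `guest_rate` requests
 * per pass. Returns the number of stage-2 passes that were needed.
 */
static int toy_migrate(ToyDevice *d, int batch, int threshold,
                       int guest_rate, int max_passes)
{
    int passes = 0;
    int done = toy_save_live(d, STAGE_START, batch, threshold);

    while (!done && passes < max_passes) {
        toy_submit(d, guest_rate);   /* guest is still running */
        done = toy_save_live(d, STAGE_ITERATE, batch, threshold);
        passes++;
    }

    toy_save_live(d, STAGE_FINAL, batch, threshold);
    return passes;
}
```

In this model the loop only converges if the batch size exceeds the rate at
which the guest submits requests, which is why max_passes bounds it; a real
implementation would need a similar escape hatch, such as stopping the VM.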