From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: [Qemu-devel] Live migration broken when under heavy IO
Date: Mon, 15 Jun 2009 17:48:52 -0300
Message-ID: <20090615204852.GA6693@poweredge.glommer>
References: <4A36B025.2080602@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "qemu-devel@nongnu.org" , kvm-devel
To: Anthony Liguori
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:59061 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751000AbZFOUmr (ORCPT ); Mon, 15 Jun 2009 16:42:47 -0400
Content-Disposition: inline
In-Reply-To: <4A36B025.2080602@us.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, Jun 15, 2009 at 03:33:41PM -0500, Anthony Liguori wrote:
> The basic issue is that:
>
> migrate_fd_put_ready(): bdrv_flush_all();
>
> Does:
>
> block.c:
>
> foreach block driver:
>     drv->flush(bs);
>
> Which in the case of raw, is just fsync(s->fd).
>
> Any submitted request is not queued or flushed which will lead to the
> request being dropped after the live migration.
You mean any request submitted _after_ that is not queued, right?

>
> Is anyone working on fixing this? Does anyone have a clever idea how to
> fix this without just waiting for all IO requests to complete?
If I understood you correctly, we could do something along the lines of
dirty tracking for I/O devices: use register_savevm_live() instead of
register_savevm() for those devices, and keep doing passes until we reach
stage 3, according to some criterion. We can then flush the remaining
requests on that device and mark[1] it somewhere. After that, we can
either stop the device, so that new requests never arrive, or stop the
VM entirely.

[1] By mark, I mean the verb "to mark", not our dear friend Mark
McLoughlin.