From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38588) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgTje-0004t4-Qf for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:20:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XgTjY-0004OF-JS for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:20:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57269) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgTjY-0004O9-7R for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:19:56 -0400 Date: Tue, 21 Oct 2014 09:19:49 +0200 From: Kevin Wolf Message-ID: <20141021071949.GA4409@noname.redhat.com> References: <201410202148499451733@sangfor.com> <20141020135955.GL3585@noname.redhat.com> <544525AA.6030609@gmail.com> <20141020160724.GS3585@noname.redhat.com> <201410210913396197278@sangfor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201410210913396197278@sangfor.com> Subject: Re: [Qemu-devel] [question] savevm/delvm: Is it necessary to perform bdrv_drain_all before savevm and delvm? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Zhang Haoyu Cc: Stefan Hajnoczi , qemu-devel , Zhang Haoyu , Max Reitz Am 21.10.2014 um 03:13 hat Zhang Haoyu geschrieben: > >> >> Hi, > >> >> > >> >> I noticed that bdrv_drain_all is performed in load_vmstate before > >> bdrv_snapshot_goto, > >> >> and bdrv_drain_all is performed in qmp_transaction before > >> internal_snapshot_prepare, > >> >> so is it also neccesary to perform bdrv_drain_all in savevm and delvm? > >> > > >> >Definitely yes for savevm. do_savevm() calls it indirectly via > >> >vm_stop(), so that part looks okay. > >> > > >> Yes, you are right. > >> > >> >delvm doesn't affect the currently running VM, and therefore doesn't > >> >interfere with guest requests that are in flight. So I think that a > >> >bdrv_drain_all() isn't needed there. > >> > > >> I'm worried about that there are still pending IOs while deleting snapshot, > >> then is it possible that there is concurrency problem between the > >> process of deleting snapshot > >> and the coroutine of io read/write(bdrv_co_do_rw) invoked by the > >> pending IOs? > >> This coroutine is also in main thread. > >> Am I missing something? > > > >What kind of concurrency problem are you thinking of? > > > I have encountered two problem, > 1) double-free of Qcow2DiscardRegion in qcow2_process_discards > please see the discussing mail: [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards > 2) qcow2 image is truncated to very huge size, but the size shown by qemu-img info is normal > please see the discussing mail: > [question] is it possible that big-endian l1 table offset referenced by other I/O while updating l1 table offset in qcow2_update_snapshot_refcount? Did you verify that the invalid value actually makes sense if byteswapped? For example, that there is no reserved bit set then? > I suspect that both of the two problems are concurrency problem mentioned above, > please see the discussing mail. > > > >I do see that there might be a chance of concurrency, but that doesn't > >automatically mean the requests are conflicting. > > > >Would you feel better with taking s->lock in qcow2_snapshot_delete()? > Both deleting snapshot and the coroutine of pending io read/write(bdrv_co_do_rw) > are performed in main thread, could BDRVQcowState.lock work? Yes. s->lock is not a mutex for threads, but a coroutine based one. The problem, however, is that qcow2_snapshot_delete() isn't execute in a coroutine, so we can't take s->lock here. We would either need to move it into a coroutine or add a bdrv_drain_all() indeed. This also means that we probably need to review all other cases where non-coroutine callbacks from BlockDriver might interfere with running requests. The original assumption that they are safe as long as they are not running in a coroutine seems to be wrong. Kevin