From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37068)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhanghy@sangfor.com>) id 1XgVlY-0000CK-9H
	for qemu-devel@nongnu.org; Tue, 21 Oct 2014 05:30:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhanghy@sangfor.com>) id 1XgVlS-00037S-5Y
	for qemu-devel@nongnu.org; Tue, 21 Oct 2014 05:30:08 -0400
Received: from [58.251.49.30] (port=60018 helo=mail.sangfor.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhanghy@sangfor.com>) id 1XgVlR-00035B-Dj
	for qemu-devel@nongnu.org; Tue, 21 Oct 2014 05:30:02 -0400
Date: Tue, 21 Oct 2014 15:51:32 +0800
From: "Zhang Haoyu" <zhanghy@sangfor.com>
References: <201410202148499451733@sangfor.com>,
	<20141020135955.GL3585@noname.redhat.com>,
	<544525AA.6030609@gmail.com>,
	<20141020160724.GS3585@noname.redhat.com>,
	<201410210913396197278@sangfor.com>,
	<20141021071949.GA4409@noname.redhat.com>
Message-ID: <201410211551209223378@sangfor.com>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="gb2312"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [question] savevm/delvm: Is it necessary to
	performbdrv_drain_all before savevm and delvm?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, Zhang Haoyu <ahzhanghaoyu@gmail.com>, Max Reitz <mreitz@redhat.com>

> >> >> Hi,
> >> >>
> >> >> I noticed that bdrv_drain_all is performed in load_vmstate before
> >> bdrv_snapshot_goto,
>> >> >> and bdrv_drain_all is performed in qmp_transaction before
>> >> internal_snapshot_prepare,
>> >> >> so is it also neccesary to perform bdrv_drain_all in savevm and delvm?
>> >> >
>> >> >Definitely yes for savevm. do_savevm() calls it indirectly via
>> >> >vm_stop(), so that part looks okay.
>> >> >
>> >> Yes, you are right.
>> >> 
>> >> >delvm doesn't affect the currently running VM, and therefore doesn't
>> >> >interfere with guest requests that are in flight. So I think that a
>> >> >bdrv_drain_all() isn't needed there.
>> >> >
>> >> I'm worried about that there are still pending IOs while deleting snapshot,
>> >> then is it possible that there is concurrency problem between the
>> >> process of deleting snapshot
>> >> and the coroutine of io read/write(bdrv_co_do_rw) invoked by the
>> >> pending IOs?
>> >> This coroutine is also in main thread.
>> >> Am I missing something?
>> >
>> >What kind of concurrency problem are you thinking of?
>> >
>> I have encountered two problem,
>> 1) double-free of Qcow2DiscardRegion in qcow2_process_discards
>>    please see the discussing mail: [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards
>> 2) qcow2 image is truncated to very huge size, but the size shown by qemu-img info is normal
>>    please see the discussing mail:  
>>    [question] is it possible that big-endian l1 table offset referenced by other I/O while updating l1 table offset in qcow2_update_snapshot_refcount?
>
>Did you verify that the invalid value actually makes sense if
>byteswapped? For example, that there is no reserved bit set then?
>
Yes, exactly, I have verified that those l2 table offset are invalid value if byte-swapped.

>> I suspect that both of the two problems are concurrency problem mentioned above, 
>> please see the discussing mail.
>> 
>> 
>> >I do see that there might be a chance of concurrency, but that doesn't
>> >automatically mean the requests are conflicting.
>> >
>> >Would you feel better with taking s->lock in qcow2_snapshot_delete()?
>> Both deleting snapshot and the coroutine of pending io read/write(bdrv_co_do_rw)
>> are performed in main thread, could BDRVQcowState.lock work?
>
>Yes. s->lock is not a mutex for threads, but a coroutine based one.
>
Yes, you are right.

>The problem, however, is that qcow2_snapshot_delete() isn't execute in a
>coroutine, so we can't take s->lock here. We would either need to move
>it into a coroutine or add a bdrv_drain_all() indeed.
>
I'm inclined to add bdrv_drain_all(), just keeping consistent with the other 
snapshot-related operations, like savevm, loadvm, internal_snapshot_prepare, etc.

Thanks,
Zhang Haoyu

>This also means that we probably need to review all other cases where
>non-coroutine callbacks from BlockDriver might interfere with running
>requests. The original assumption that they are safe as long as they are
>not running in a coroutine seems to be wrong.
Agreed.
>
>Kevin