Date: Tue, 07 Oct 2014 17:18:28 +0200
From: Max Reitz
To: Stefan Hajnoczi
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
Message-ID: <54340444.7070608@redhat.com>
In-Reply-To: <20141006093041.GB2694@stefanha-thinkpad.redhat.com>

On 06.10.2014 11:30, Stefan Hajnoczi wrote:
> On Sat, Oct 04, 2014 at 11:28:22PM +0200, Max Reitz wrote:
>> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
>>> The commit block job must run in the BlockDriverState AioContext so
>>> that it works with dataplane.
>>>
>>> Acquire the AioContext in blockdev.c so starting the block job is
>>> safe. One detail here is that the bdrv_drain_all() must be moved
>>> inside the aio_context_acquire() region so requests cannot sneak in
>>> between the drain and acquire.
>> Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should
>> never be called outside of the main loop (at least that's how it
>> appeared to me). Wouldn't it be enough to use bdrv_drain() on the
>> source BDS, like in patch 9?
> There is no contradiction here because qmp_block_commit() is invoked
> by the QEMU monitor from the main loop.
>
> The problem with bdrv_drain_all() is that it acquires AioContexts. If
> called outside the main loop without taking special care, it could
> result in lock ordering problems (e.g. two threads trying to acquire
> all AioContexts at the same time while already holding their
> respective contexts).
>
> qmp_block_commit() is just traditional QEMU global mutex code, so it
> is allowed to call bdrv_drain_all().

Hm, okay then.

>>> @@ -140,27 +173,14 @@ wait:
>>>          ret = 0;
>>> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
>>> -        /* success */
>>> -        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
>>> +out:
>>> +    if (buf) {
>>> +        qemu_vfree(buf);
>>>      }
>> Is this new condition really necessary? However, it won't hurt, so:
> This was a mistake. Since commit
> 94c8ff3a01d9bd1005f066a0ee3fe43c842a43b7 ("w32: Make qemu_vfree()
> accept NULL like the POSIX implementation") it is no longer necessary
> to check for NULL pointers.
>
> You can't teach an old dog new tricks :).
>
> Thanks, will fix in the next revision!
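
For reference, once the NULL check goes away the cleanup shrinks to
something like this (just a sketch of the shape; the exact code is of
course up to the next revision):

    out:
        qemu_vfree(buf);    /* qemu_vfree(NULL) is a no-op since 94c8ff3a */
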
>
>> Reviewed-by: Max Reitz
>>
>> A general question regarding the assertions here and in patch 8: I
>> tried to break them, but I couldn't find a way. The way I tried was
>> by creating two devices in different threads with just one qcow2
>> behind each of them, and then trying to attach one of those qcow2
>> BDSs to the other as a backing file. I couldn't find out how, but I
>> guess this is something we might want to support in the future. Can
>> we actually be sure that all of the BDSs in one tree are always
>> running in the same AIO context? Are we already enforcing this?
> bdrv_set_aio_context() is recursive, so it also sets all the child
> nodes. That is the only mechanism to ensure the AioContext is
> consistent across nodes.
>
> When the BDS graph is manipulated (e.g. attaching new roots, swapping
> nodes, etc.) we don't perform checks today.
>
> Markus has asked that I add the appropriate assertions so errors are
> caught early. I haven't done that yet, but it's a good next step.

Okay, seems good to me. It's not possible to break them now, and if it
ever becomes possible, the assertions will at least catch it.

> As far as I'm aware, these patches don't introduce cases where we
> would make the AioContext in the graph inconsistent. So I see the
> AioContext consistency assertions as a separate patch series (which I
> will work on next... hopefully without discovering horrible
> problems!).
>
>> And furthermore, basically all the calls to acquire an AIO context
>> are of the form "aio_context = bdrv_get_aio_context(bs);
>> aio_context_acquire(aio_context);". It is *extremely* unlikely, if
>> possible at all, but wouldn't it be possible to change the BDS's AIO
>> context from another thread after the first function has returned
>> and before the lock is acquired? If that is really the case, I think
>> we should have some atomic bdrv_acquire_aio_context() function.
> No, because only the main loop calls bdrv_set_aio_context(). At the
> moment the case you mentioned cannot happen.
>
> Ultimately, we should move away from "this only works in the main
> loop" constraints. In order to provide an atomic BDS AioContext
> acquire, we need a global root that is thread-safe. That doesn't
> exist today - bdrv_states is protected by the QEMU global mutex only.
>
> I thought about adding the infrastructure in this patch series, but
> it is not necessary yet and would make the series more complicated.
>
> The idea is:
>
> * Add bdrv_states_lock to protect the global list of
>   BlockDriverStates
> * Integrate bdrv_ref()/bdrv_unref() as well as bdrv_get_aio_context()
>   so they are atomic and protected by the bdrv_states_lock
>
> So bdrv_find() and other functions that access bdrv_states become the
> entry points to acquiring BlockDriverStates in a thread-safe fashion.
> bdrv_unref() will need rethinking too, to prevent races between
> freeing a BDS and bdrv_find().
>
> Can you think of a place where we need this today? I haven't found
> one yet, but would like one to develop the code against.

No, I can't think of anything, as long as QMP commands always arrive
through the main loop.

Thank you for your explanations!

Max
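
P.S.: For readers following along, the acquire pattern we discussed
above is the one used throughout blockdev.c; a minimal sketch using
the real bdrv_get_aio_context()/aio_context_acquire()/
aio_context_release() API (qmp_example() is a made-up caller, and
error handling is omitted):

    void qmp_example(BlockDriverState *bs)
    {
        /* Fetch the context first... */
        AioContext *aio_context = bdrv_get_aio_context(bs);

        /* ...then lock it.  The window between these two calls is the
         * one I was worried about; it is safe today only because
         * bdrv_set_aio_context() runs exclusively in the main loop. */
        aio_context_acquire(aio_context);

        /* operate on bs while holding its AioContext */

        aio_context_release(aio_context);
    }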