Date: Tue, 07 Oct 2014 17:18:28 +0200
From: Max Reitz
To: Stefan Hajnoczi
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
Message-ID: <54340444.7070608@redhat.com>
In-Reply-To: <20141006093041.GB2694@stefanha-thinkpad.redhat.com>

On 06.10.2014 11:30, Stefan Hajnoczi wrote:
> On Sat, Oct 04, 2014 at 11:28:22PM +0200, Max Reitz wrote:
>> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
>>> The commit block job must run in the BlockDriverState AioContext so
>>> that it works with dataplane.
>>>
>>> Acquire the AioContext in blockdev.c so starting the block job is
>>> safe. One detail here is that the bdrv_drain_all() must be moved
>>> inside the aio_context_acquire() region so requests cannot sneak in
>>> between the drain and acquire.
>> Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should
>> never be called outside of the main loop (at least that's how it
>> appeared to me). Wouldn't it be enough to use bdrv_drain() on the
>> source BDS, like in patch 9?
> There is no contradiction here because qmp_block_commit() is invoked
> by the QEMU monitor from the main loop.
>
> The problem with bdrv_drain_all() is that it acquires AioContexts. If
> called outside the main loop without taking special care, it could
> result in lock ordering problems (e.g. two threads trying to acquire
> all AioContexts at the same time while already holding their
> respective contexts).
>
> qmp_block_commit() is just traditional QEMU global mutex code, so it
> is allowed to call bdrv_drain_all().

Hm, okay then.

>>> @@ -140,27 +173,14 @@ wait:
>>>          ret = 0;
>>> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
>>> -        /* success */
>>> -        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
>>> +out:
>>> +    if (buf) {
>>> +        qemu_vfree(buf);
>>>      }
>> Is this new condition really necessary? However, it won't hurt, so:
> This was a mistake. Since commit
> 94c8ff3a01d9bd1005f066a0ee3fe43c842a43b7 ("w32: Make qemu_vfree()
> accept NULL like the POSIX implementation") it is no longer necessary
> to check for NULL pointers.
>
> You can't teach an old dog new tricks :).
>
> Thanks, will fix in the next revision!
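
For reference, once the NULL check goes away the cleanup shrinks to
something like this (just a sketch of the shape; the exact code is of
course up to the next revision):

    out:
        qemu_vfree(buf);    /* qemu_vfree(NULL) is a no-op since 94c8ff3a */
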
>
>> Reviewed-by: Max Reitz
>>
>> A general question regarding the assertions here and in patch 8: I
>> tried to break them, but I couldn't find a way. The way I tried was
>> by creating two devices in different threads with just one qcow2
>> behind each of them, and then trying to attach one of those qcow2
>> BDSs to the other as a backing file. I couldn't find out how, but I
>> guess this is something we might want to support in the future. Can
>> we actually be sure that all of the BDSs in one tree are always
>> running in the same AIO context? Are we already enforcing this?
> bdrv_set_aio_context() is recursive, so it also sets all the child
> nodes. That is the only mechanism to ensure the AioContext is
> consistent across nodes.
>
> When the BDS graph is manipulated (e.g. attaching new roots, swapping
> nodes, etc.) we don't perform checks today.
>
> Markus has asked that I add the appropriate assertions so errors are
> caught early. I haven't done that yet, but it's a good next step.

Okay, seems good to me. It's not possible to break them now, and if it
ever becomes possible, the assertions will at least catch it.

> As far as I'm aware, these patches don't introduce cases where we
> would make the AioContext in the graph inconsistent. So I see the
> AioContext consistency assertions as a separate patch series (which I
> will work on next... hopefully without discovering horrible
> problems!).
>
>> And furthermore, basically all the calls to acquire an AIO context
>> are of the form "aio_context = bdrv_get_aio_context(bs);
>> aio_context_acquire(aio_context);". It is *extremely* unlikely, if
>> possible at all, but wouldn't it be possible to change the BDS's AIO
>> context from another thread after the first function has returned
>> and before the lock is acquired? If that is really the case, I think
>> we should have some atomic bdrv_acquire_aio_context() function.
> No, because only the main loop calls bdrv_set_aio_context(). At the
> moment the case you mentioned cannot happen.
>
> Ultimately, we should move away from "this only works in the main
> loop" constraints. In order to provide an atomic BDS AioContext
> acquire, we need a global root that is thread-safe. That doesn't
> exist today - bdrv_states is protected by the QEMU global mutex only.
>
> I thought about adding the infrastructure in this patch series, but
> it is not necessary yet and would make the series more complicated.
>
> The idea is:
>
> * Add bdrv_states_lock to protect the global list of
>   BlockDriverStates
> * Integrate bdrv_ref()/bdrv_unref() as well as bdrv_get_aio_context()
>   so they are atomic and protected by the bdrv_states_lock
>
> So bdrv_find() and other functions that access bdrv_states become the
> entry points to acquiring BlockDriverStates in a thread-safe fashion.
> bdrv_unref() will need rethinking too, to prevent races between
> freeing a BDS and bdrv_find().
>
> Can you think of a place where we need this today? I haven't found
> one yet, but would like one to develop the code against.

No, I can't think of anything, as long as QMP commands always arrive
through the main loop.

Thank you for your explanations!

Max
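
P.S.: For readers following along, the acquire pattern we discussed
above is the one used throughout blockdev.c; a minimal sketch using
the real bdrv_get_aio_context()/aio_context_acquire()/
aio_context_release() API (qmp_example() is a made-up caller, and
error handling is omitted):

    void qmp_example(BlockDriverState *bs)
    {
        /* Fetch the context first... */
        AioContext *aio_context = bdrv_get_aio_context(bs);

        /* ...then lock it.  The window between these two calls is the
         * one I was worried about; it is safe today only because
         * bdrv_set_aio_context() runs exclusively in the main loop. */
        aio_context_acquire(aio_context);

        /* operate on bs while holding its AioContext */

        aio_context_release(aio_context);
    }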