Date: Wed, 18 Nov 2015 10:52:59 +0800
From: Fam Zheng
Message-ID: <20151118025259.GB9000@ad.usersys.redhat.com>
References: <1447108773-6836-1-git-send-email-mreitz@redhat.com>
 <1447108773-6836-23-git-send-email-mreitz@redhat.com>
 <20151112073427.GM4082@ad.usersys.redhat.com>
 <564A0119.7080704@redhat.com>
In-Reply-To: <564A0119.7080704@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v7 22/24] block: Rewrite bdrv_close_all()
To: Max Reitz
Cc: Kevin Wolf, Alberto Garcia, qemu-block@nongnu.org, Markus Armbruster,
 qemu-devel@nongnu.org, Stefan Hajnoczi, Paolo Bonzini, John Snow

On Mon, 11/16 17:15, Max Reitz wrote:
> >> @@ -1971,13 +1969,44 @@ static void bdrv_close(BlockDriverState *bs)
> >>  void bdrv_close_all(void)
> >>  {
> >>      BlockDriverState *bs;
> >> +    AioContext *aio_context;
> >> +    int original_refcount = 0;
> >>
> >> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> >> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> >> +    /* Drop references from requests still in flight, such as canceled block
> >> +     * jobs whose AIO context has not been polled yet */
> >> +    bdrv_drain_all();
> >>
> >> -        aio_context_acquire(aio_context);
> >> -        bdrv_close(bs);
> >> -        aio_context_release(aio_context);
> >> +    blockdev_close_all_bdrv_states();
> >> +    blk_remove_all_bs();
> >> +
> >> +    /* Cancel all block jobs */
> >> +    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
> >> +        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
> >> +            aio_context = bdrv_get_aio_context(bs);
> >> +
> >> +            aio_context_acquire(aio_context);
> >> +            if (bs->job) {
> >> +                /* So we can safely query the current refcount */
> >> +                bdrv_ref(bs);
> >> +                original_refcount = bs->refcnt;
> >> +
> >> +                block_job_cancel_sync(bs->job);
> >> +                aio_context_release(aio_context);
> >> +                break;
> >> +            }
> >> +            aio_context_release(aio_context);
> >> +        }
> >> +
> >> +        /* All the remaining BlockDriverStates are referenced directly or
> >> +         * indirectly from block jobs, so there needs to be at least one BDS
> >> +         * directly used by a block job */
> >> +        assert(bs);
> >> +
> >> +        /* Wait for the block job to release its reference */
> >> +        while (bs->refcnt >= original_refcount) {
> >> +            aio_poll(aio_context, true);
> >> +        }
> >> +        bdrv_unref(bs);
> >
> > If at this point bs->refcnt is greater than 1, why don't we care where
> > the remaining references come from?
>
> We do care. A BDS will not be removed from all_bdrv_states until it is
> deleted (i.e. its refcount becomes 0). Therefore, this loop will
> continue until all BDSs have been deleted.
>
> So where might additional references come from? Since this loop only
> cares about direct or indirect references from block jobs, that's
> exactly it:
>
> (1) You might have multiple block jobs running on a BDS in the future.
>     Then, you'll cancel them one by one, and after having canceled the
>     first one, the refcount will still be greater than one before the
>     bdrv_unref().
>
> (2) Imagine a BDS A with a child BDS B.
>     There are block jobs running on both of them. Now, B is referenced
>     by both its block job and by A (indirectly by the block job
>     referencing A). If we cancel the job on B before the one on A, then
>     the refcount on B will still be greater than 1 before bdrv_unref()
>     because it is still referenced by its parent (until the block job
>     on A is canceled, too).
>
> The first cannot happen right now, but the second one may; I'm not sure
> (depending on whether op blockers allow it).

OK, that makes sense. The all_bdrv_states list is the central place for
making sure all refcounts reach 0.

> And we do make sure that there are no additional references besides:
> - directly from the monitor (monitor-owned BDSs)
> - from a BB
> - from block jobs
> - (+ everything transitively through the respective BDS tree)
>
> If there were additional references, the inner loop would at some point
> no longer find a BDS with a block job while all_bdrv_states is still
> not empty. That's what the "assert(bs)" after the inner loop is for.
>
> If you can imagine another place a reference to a BDS may come from,
> that would be a bug in this patch and we would have to make sure to
> respect that case, too.

I think the only reference I'm not sure about is in xen_disk: blk_unref
is called in the .free and .disconnect callbacks, but I have no idea
whether they are called before bdrv_close_all.

Otherwise, this patch looks good to me.

Thanks,
Fam
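
To make scenario (2) above concrete, here is a minimal, self-contained
sketch of the reference-counting behaviour being discussed. The toy BDS
struct and the bds_unref()/cancel_job() helpers are invented stand-ins
for illustration, not the real QEMU APIs; the point is only that
cancelling B's job does not delete B while its parent A is still alive,
which is why the patch's outer loop must keep iterating over
all_bdrv_states until the list is empty.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for BlockDriverState; not the real QEMU type. */
typedef struct BDS BDS;
struct BDS {
    int refcnt;    /* deleted only when this drops to 0 */
    bool has_job;  /* stand-in for bs->job */
    BDS *child;    /* a parent holds one reference on its child */
};

static void bds_unref(BDS *bs)
{
    assert(bs->refcnt > 0);
    if (--bs->refcnt == 0 && bs->child) {
        /* Deleting a parent releases its reference on the child. */
        bds_unref(bs->child);
    }
}

static void cancel_job(BDS *bs)
{
    bs->has_job = false;
    bds_unref(bs);  /* drop the job's own reference */
}

int main(void)
{
    /* Scenario (2): A is the parent of B, and both run a block job.
     * B's refcount is 2 (its own job + parent A); A's is 1 (its job). */
    BDS b = { .refcnt = 2, .has_job = true, .child = NULL };
    BDS a = { .refcnt = 1, .has_job = true, .child = &b };

    cancel_job(&b);         /* B's job is gone ...                     */
    assert(b.refcnt == 1);  /* ... but A still holds a reference on B,
                             * so a wait loop on B would keep polling  */

    cancel_job(&a);         /* cancelling A's job deletes A ...        */
    assert(a.refcnt == 0);
    assert(b.refcnt == 0);  /* ... which finally releases B as well    */

    printf("all refcounts reached 0\n");
    return 0;
}

Cancelling B's job first leaves b.refcnt at 1, which is exactly the state
in which the patch's inner wait loop would still be spinning in aio_poll();
only cancelling A's job lets every refcount reach 0.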