From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38730) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZyMRN-0003pj-FG for qemu-devel@nongnu.org; Mon, 16 Nov 2015 11:15:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZyMRM-0006Sq-4j for qemu-devel@nongnu.org; Mon, 16 Nov 2015 11:15:37 -0500 References: <1447108773-6836-1-git-send-email-mreitz@redhat.com> <1447108773-6836-23-git-send-email-mreitz@redhat.com> <20151112073427.GM4082@ad.usersys.redhat.com> From: Max Reitz Message-ID: <564A0119.7080704@redhat.com> Date: Mon, 16 Nov 2015 17:15:21 +0100 MIME-Version: 1.0 In-Reply-To: <20151112073427.GM4082@ad.usersys.redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="iCoQvT0bd99CCmIgXdVq1EOLUfUQxrSIL" Subject: Re: [Qemu-devel] [PATCH v7 22/24] block: Rewrite bdrv_close_all() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: Kevin Wolf , Alberto Garcia , qemu-block@nongnu.org, Markus Armbruster , qemu-devel@nongnu.org, Stefan Hajnoczi , Paolo Bonzini , John Snow This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --iCoQvT0bd99CCmIgXdVq1EOLUfUQxrSIL Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 12.11.2015 08:34, Fam Zheng wrote: > On Mon, 11/09 23:39, Max Reitz wrote: >> This patch rewrites bdrv_close_all(): Until now, all root BDSs have be= en >> force-closed. This is bad because it can lead to cached data not being= >> flushed to disk. >> >> Instead, try to make all reference holders relinquish their reference >> voluntarily: >> >> 1. All BlockBackend users are handled by making all BBs simply eject >> their BDS tree. Since a BDS can never be on top of a BB, this will >> not cause any of the issues as seen with the force-closing of BDSs.= >> The references will be relinquished and any further access to the B= B >> will fail gracefully. >> 2. All BDSs which are owned by the monitor itself (because they do not= >> have a BB) are relinquished next. >> 3. Besides BBs and the monitor, block jobs and other BDSs are the only= >> things left that can hold a reference to BDSs. After every remainin= g >> block job has been canceled, there should not be any BDSs left (and= >> the loop added here will always terminate (as long as NDEBUG is not= >> defined), because either all_bdrv_states will be empty or there wil= l >> not be any block job left to cancel, failing the assertion). >> >> Signed-off-by: Max Reitz >> Reviewed-by: Kevin Wolf >> --- >> block.c | 45 +++++++++++++++++++++++++++++++++++++-------- >> 1 file changed, 37 insertions(+), 8 deletions(-) >> >> diff --git a/block.c b/block.c >> index b935dff..e3d5aea 100644 >> --- a/block.c >> +++ b/block.c >> @@ -1905,9 +1905,7 @@ static void bdrv_close(BlockDriverState *bs) >> { >> BdrvAioNotifier *ban, *ban_next; >> =20 >> - if (bs->job) { >> - block_job_cancel_sync(bs->job); >> - } >> + assert(!bs->job); >> =20 >> /* Disable I/O limits and drain all pending throttled requests */= >> if (bs->throttle_state) { >> @@ -1971,13 +1969,44 @@ static void bdrv_close(BlockDriverState *bs) >> void bdrv_close_all(void) >> { >> BlockDriverState *bs; >> + AioContext *aio_context; >> + int original_refcount =3D 0; >> =20 >> - QTAILQ_FOREACH(bs, &bdrv_states, device_list) { >> - AioContext *aio_context =3D bdrv_get_aio_context(bs); >> + /* Drop references from requests still in flight, such as cancele= d block >> + * jobs whose AIO context has not been polled yet */ >> + bdrv_drain_all(); >> =20 >> - aio_context_acquire(aio_context); >> - bdrv_close(bs); >> - aio_context_release(aio_context); >> + blockdev_close_all_bdrv_states(); >> + blk_remove_all_bs(); >> + >> + /* Cancel all block jobs */ >> + while (!QTAILQ_EMPTY(&all_bdrv_states)) { >> + QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) { >> + aio_context =3D bdrv_get_aio_context(bs); >> + >> + aio_context_acquire(aio_context); >> + if (bs->job) { >> + /* So we can safely query the current refcount */ >> + bdrv_ref(bs); >> + original_refcount =3D bs->refcnt; >> + >> + block_job_cancel_sync(bs->job); >> + aio_context_release(aio_context); >> + break; >> + } >> + aio_context_release(aio_context); >> + } >> + >> + /* All the remaining BlockDriverStates are referenced directl= y or >> + * indirectly from block jobs, so there needs to be at least = one BDS >> + * directly used by a block job */ >> + assert(bs); >> + >> + /* Wait for the block job to release its reference */ >> + while (bs->refcnt >=3D original_refcount) { >> + aio_poll(aio_context, true); >> + } >> + bdrv_unref(bs); >=20 > If at this point bs->refcnt is greater than 1, why don't we care where = are the > remaining references from? We do care. A BDS will not be removed from all_bdrv_states until it is deleted (i.e. its refcount becomes 0). Therefore, this loop will continue until all BDSs have been deleted. So where might additional references come from? Since this loop only cares about direct or indirect references from block jobs, that's exactly it: (1) You might have multiple block jobs running on a BDS in the future. Then, you'll cancel them one by one, and after having canceled the first one, the refcount will still be greater than one before the bdrv_unref(). (2) Imagine a BDS A with a parent BDS B. There are block jobs running on both of them. Now, B is referenced by both its block job and by A (indirectly by the block job referencing A). If we cancel the job on B before the one on A, then the refcount on B will still be greater than 1 before bdrv_unref() because it is still referenced by its parent (until the block job on A is canceled, too). The first cannot happen right now, but the second one may, I'm not sure (depending on whether op blockers allow it). And we do make sure that there are no additional references besides: - directly from the monitor (monitor-owned BDSs) - from a BB - from block jobs - (+ everything transitively through the respective BDS tree) If there were additional references, the inner loop would at some point no longer find a BDS with a block job while all_bdrv_states is still not empty. That's what the "assert(bs)" after the inner loop is for. If you can imagine another way where a reference to a BDS may come from, that would be a bug in this patch and we have to make sure to respect that case, too. Max --iCoQvT0bd99CCmIgXdVq1EOLUfUQxrSIL Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBCAAGBQJWSgEZAAoJEDuxQgLoOKytik0H/Am0XPhU25/Cma4ocTVk4OTo gaDUtQQ7rycfLbEpUuuGWmMgqJt+NRsIHeAAcy7Teej2g+Ah2bX/SnxjzxQYevNx OeEFZ1QMxhEyHKX+GBxFCCkIH38PyBBz36lTN0Gp6w048ZT1bvCT3i7Yri+0sbDM I9fdVq0TYhORVUG58sq0vMVQ7CqkyqM2SWyidVIYyj4gTuPk5rXzfn/hg84yzBWl z2HOOwdK1HGDQbdUCMueb0dDr4cB/OmN7zANLny6hCLP8g04bZt2FftZ9xGH5wjm P2W6WL3kJMHqCPhCGtMwDPlMlaF7H7iKB90eHt6kaXY7gT8xTgBKivph+el869A= =gCxq -----END PGP SIGNATURE----- --iCoQvT0bd99CCmIgXdVq1EOLUfUQxrSIL--