From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51847) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1anNSv-0007wX-A6 for qemu-devel@nongnu.org; Tue, 05 Apr 2016 05:40:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1anNSu-0006x9-1h for qemu-devel@nongnu.org; Tue, 05 Apr 2016 05:40:05 -0400 Date: Tue, 5 Apr 2016 10:39:56 +0100 From: Stefan Hajnoczi Message-ID: <20160405093956.GC2015@stefanha-x1.localdomain> References: <1459519058-29864-1-git-send-email-famz@redhat.com> <20160404115734.GA10964@stefanha-x1.localdomain> <20160405012739.GA670@ad.usersys.redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TYecfFk8j8mZq+dy" Content-Disposition: inline In-Reply-To: <20160405012739.GA670@ad.usersys.redhat.com> Subject: Re: [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: Kevin Wolf , lvivier@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, pbonzini@redhat.com --TYecfFk8j8mZq+dy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Apr 05, 2016 at 09:27:39AM +0800, Fam Zheng wrote: > On Mon, 04/04 12:57, Stefan Hajnoczi wrote: > > On Fri, Apr 01, 2016 at 09:57:38PM +0800, Fam Zheng wrote: > > > + BdrvCoDrainData data; > > > + > > > + assert(qemu_in_coroutine()); > > > + data =3D (BdrvCoDrainData) { > > > + .co =3D qemu_coroutine_self(), > > > + .bs =3D bs, > > > + .done =3D false, > > > + .bh =3D aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_b= h_cb, &data), > > > + }; > > > + qemu_bh_schedule(data.bh); > > > + > > > + qemu_coroutine_yield(); > > > + /* If we are resumed from some other event (such as an aio compl= etion or a > > > + * timer callback), it is a bug in the caller that should be fix= ed. */ > > > + assert(data.done); > > > +} > > > + > > > /* > > > * Wait for pending requests to complete on a single BlockDriverStat= e subtree, > > > * and suspend block driver's internal I/O until next request arrive= s. > > > @@ -269,6 +306,10 @@ void bdrv_drain(BlockDriverState *bs) > > > bool busy =3D true; > > > =20 > > > bdrv_drain_recurse(bs); > > > + if (qemu_in_coroutine()) { > > > + bdrv_co_drain(bs); > > > + return; > > > + } > > > while (busy) { > > > /* Keep iterating */ > > > bdrv_flush_io_queue(bs); > > > --=20 > > > 2.7.4 > >=20 > > block/mirror.c should call bdrv_co_drain() explicitly and bdrv_drain() > > should assert(!qemu_in_coroutine()). > >=20 > > The reason why existing bdrv_read() and friends detect coroutine context > > at runtime is because it is meant for legacy code that runs in both > > coroutine and non-coroutine contexts. >=20 > blk_unref/bdrv_unref are called in coroutine (.bdrv_create implementation= s), > and this doesn't just work with the assertion. Should I clean up this "le= gacy" > code first, i.e. move bdrv_unref calls to BHs in the callers and > assert(!qemu_in_coroutine()) there too? I didn't think this because it > complicates the code somehow. This is a messy problem. In general I don't like introducing yields into non-coroutine_fn functions because it can lead to bugs when the caller didn't expect a yield point. For example, I myself wouldn't have expected bdrv_unref() to be a yield point. So maybe coroutine code I've written would be vulnerable to interference (I won't call it a race condition) from another coroutine across the bdrv_unref() call. This could mean that another coroutine now sees intermediate state that would never be visible without the new yield point. I think attempting to invoke qemu_co_queue_run_restart() directly instead of scheduling a BH and yielding does not improve the situation. It's also a layering violation since qemu_co_queue_run_restart() is just meant for the core coroutine code and isn't a public interface. Anyway, let's consider bdrv_drain() legacy code that can call if (qemu_in_coroutine()) but please make bdrv_co_drain() public so block/mirror.c can at least call it directly. Stefan --TYecfFk8j8mZq+dy Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJXA4fsAAoJEJykq7OBq3PICscIAKTToysDJ9+JTQ/BEPQm/zqP Hqv7RzV6w89/TwVLjQ9/TC4B03mjMD6UFjR5vpUipUcBL4TPDuNQFPGDSgM/Xp/u wNtF7Zw4QKfoVPgjVYBZt9zNgOR3ZGmQ3n4LHGYVzqHx1i1H9w7kKI5UBJyLFrid 9Gs4b2mTJI5Izk+yMDVrKE2lnMxDThUKuC82DdEeA1HztPGNOt5BdKf9PYrcTGv+ bttpf2MX1wpWgRhHqhv26WV09BDLD9ZNvMjBcOsRJ5xU01POnvMO4LrZMyXZUm4V MoPncWUeKi7o3V09urMTjPsk1XVYGRpNa4koFEaEAebIDNsAn1e8hNSSPGdN3fI= =xfYG -----END PGP SIGNATURE----- --TYecfFk8j8mZq+dy--