From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:40805)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1XZhkc-0006xX-4u for qemu-devel@nongnu.org;
 Thu, 02 Oct 2014 10:53:08 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1XZhkV-0004A9-Du for qemu-devel@nongnu.org;
 Thu, 02 Oct 2014 10:53:02 -0400
Received: from mail-wi0-x234.google.com ([2a00:1450:400c:c05::234]:53194)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1XZhkU-00049a-Ua for qemu-devel@nongnu.org;
 Thu, 02 Oct 2014 10:52:55 -0400
Received: by mail-wi0-f180.google.com with SMTP id em10so4328689wid.13
 for ; Thu, 02 Oct 2014 07:52:54 -0700 (PDT)
Date: Thu, 2 Oct 2014 15:52:51 +0100
From: Stefan Hajnoczi
Message-ID: <20141002145251.GD6250@stefanha-thinkpad.redhat.com>
References: <1412239972-23493-1-git-send-email-aik@ozlabs.ru>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="3Gf/FFewwPeBMqCJ"
Content-Disposition: inline
In-Reply-To: <1412239972-23493-1-git-send-email-aik@ozlabs.ru>
Subject: Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache
 invalidate for incoming migration
To: Alexey Kardashevskiy
Cc: Kevin Wolf , Paolo Bonzini , qemu-devel@nongnu.org, Stefan Hajnoczi

--3Gf/FFewwPeBMqCJ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 02, 2014 at 06:52:52PM +1000, Alexey Kardashevskiy wrote:
> When migrated using libvirt with "--copy-storage-all", at the end of
> migration there is race between NBD mirroring task trying to do flush
> and migration completion, both end up invalidating cache. Since qcow2
> driver does not handle this situation very well, random crashes happen.
>
> This disables the BDRV_O_INCOMING flag for the block device being migrated
> and restores it when NBD task is done.
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>
>
> The commit log is not full and most likely incorrect as well
> as the patch :) Please, help. Thanks!
>
> The patch seems to fix the initial problem though.
>
>
> btw is there any easy way to migrate one QEMU to another
> using NBD (i.e. not using "migrate -b") and not using libvirt?
> What would the command line be? Debugging with libvirt is real
> pain :(
>
>
> ---
>  block.c     | 17 ++++-------------
>  migration.c |  1 -
>  nbd.c       | 11 +++++++++++
>  3 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/block.c b/block.c
> index c5a251c..ed72e0a 100644
> --- a/block.c
> +++ b/block.c
> @@ -5073,6 +5073,10 @@ void bdrv_invalidate_cache_all(Error **errp)
>      QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
>  
> +        if (!(bs->open_flags & BDRV_O_INCOMING)) {
> +            continue;
> +        }
> +

We shouldn't touch bs before acquiring the AioContext.  Acquiring the
AioContext is basically the "lock" for the BDS.  It needs to be moved...

>          aio_context_acquire(aio_context);

...in here.
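
To illustrate the ordering I mean, here is a minimal standalone sketch.
It is not QEMU code: the types and functions are mock stand-ins that only
share names with the real ones, BDRV_O_INCOMING has a placeholder value,
and bs_is_incoming() and invalidate_incoming() are hypothetical helpers
invented for the example.  The mocks assert that the lock is held
whenever the flags are read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BDRV_O_INCOMING 0x1   /* placeholder value, not QEMU's */

/* Mock stand-ins; the real QEMU structures are far richer. */
typedef struct AioContext {
    int lock_depth;           /* > 0 while "acquired" */
} AioContext;

typedef struct BlockDriverState {
    AioContext *aio_context;
    int open_flags;
    int invalidate_calls;     /* mock bookkeeping only */
} BlockDriverState;

static AioContext *bdrv_get_aio_context(BlockDriverState *bs)
{
    return bs->aio_context;
}

static void aio_context_acquire(AioContext *ctx)
{
    ctx->lock_depth++;
}

static void aio_context_release(AioContext *ctx)
{
    assert(ctx->lock_depth > 0);
    ctx->lock_depth--;
}

/* Mock: only counts calls, and insists that the caller holds the lock. */
static void bdrv_invalidate_cache(BlockDriverState *bs)
{
    assert(bs->aio_context->lock_depth > 0);
    bs->invalidate_calls++;
}

/* Hypothetical helper: reading open_flags is only valid under the lock. */
static bool bs_is_incoming(BlockDriverState *bs)
{
    assert(bs->aio_context->lock_depth > 0);
    return (bs->open_flags & BDRV_O_INCOMING) != 0;
}

/* Hypothetical loop body: the flag check sits between acquire and release. */
static void invalidate_incoming(BlockDriverState **all, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        BlockDriverState *bs = all[i];
        AioContext *aio_context = bdrv_get_aio_context(bs);

        aio_context_acquire(aio_context);
        if (bs_is_incoming(bs)) {
            bdrv_invalidate_cache(bs);
        }
        aio_context_release(aio_context);
    }
}
```

Skipping a BDS now happens by falling through to the release rather than
by "continue", so the acquire and release stay paired on every iteration.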

>          bdrv_invalidate_cache(bs, &local_err);
>          aio_context_release(aio_context);
> @@ -5083,19 +5087,6 @@ void bdrv_invalidate_cache_all(Error **errp)
>      }
>  }
>  
> -void bdrv_clear_incoming_migration_all(void)
> -{
> -    BlockDriverState *bs;
> -
> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> -
> -        aio_context_acquire(aio_context);
> -        bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
> -        aio_context_release(aio_context);
> -    }
> -}
> -
>  int bdrv_flush(BlockDriverState *bs)
>  {
>      Coroutine *co;
> diff --git a/migration.c b/migration.c
> index 8d675b3..c49a05a 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque)
>      }
>      qemu_announce_self();
>  
> -    bdrv_clear_incoming_migration_all();
>      /* Make sure all file formats flush their mutable metadata */
>      bdrv_invalidate_cache_all(&local_err);

BDRV_O_INCOMING needs to be cleared, otherwise the block drivers will
think that incoming migration is still taking place and treat the file
as effectively read-only during open.

On IRC I suggested doing away with the bdrv_invalidate_cache_all() name
since it no longer calls bdrv_invalidate_cache() on all BDSes.

Combine both clearing BDRV_O_INCOMING and calling
bdrv_invalidate_cache() if BDRV_O_INCOMING was previously set into one
function - you could reuse bdrv_clear_incoming_migration_all() for that.

>      if (local_err) {
> diff --git a/nbd.c b/nbd.c
> index e9b539b..7b479c0 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -106,6 +106,7 @@ struct NBDExport {
>      off_t dev_offset;
>      off_t size;
>      uint32_t nbdflags;
> +    bool restore_incoming;

Not needed, it does not make sense to restore BDRV_O_INCOMING because
once we've written to a file it cannot be in use by another host at the
same time.
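
A rough standalone sketch of the per-BDS step of the combined function
suggested above, under stated assumptions: the struct is a mock,
BDRV_O_INCOMING has a placeholder value, the mock
bdrv_invalidate_cache() merely records whether it saw the flag, and
clear_incoming_and_invalidate() is a hypothetical name.  AioContext
locking is left out for brevity; real code would hold the lock around
all of this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BDRV_O_INCOMING 0x1   /* placeholder value, not QEMU's */

/* Mock stand-in for the real BlockDriverState. */
typedef struct BlockDriverState {
    int open_flags;
    bool opened_read_write;   /* mock: outcome of the last re-open */
    int invalidate_calls;     /* mock bookkeeping only */
} BlockDriverState;

/*
 * Mock: models a driver whose open path treats the image as read-only
 * while BDRV_O_INCOMING is still set.
 */
static void bdrv_invalidate_cache(BlockDriverState *bs)
{
    bs->opened_read_write = !(bs->open_flags & BDRV_O_INCOMING);
    bs->invalidate_calls++;
}

/*
 * Hypothetical combined step: act only if the flag was set, and clear
 * it before invalidating so the re-open sees a writable image.
 * Returns true if this call did the work.
 */
static bool clear_incoming_and_invalidate(BlockDriverState *bs)
{
    assert(bs != NULL);

    if (!(bs->open_flags & BDRV_O_INCOMING)) {
        return false;         /* already handled by an earlier caller */
    }
    bs->open_flags &= ~BDRV_O_INCOMING;
    bdrv_invalidate_cache(bs);
    return true;
}
```

Because the flag is cleared on the first call, a second caller (the NBD
export or migration completion, whichever comes later) becomes a no-op
and there is nothing left to restore afterwards.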

>      QTAILQ_HEAD(, NBDClient) clients;
>      QTAILQ_ENTRY(NBDExport) next;
>  
> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
>      exp->ctx = bdrv_get_aio_context(bs);
>      bdrv_ref(bs);
>      bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
> +
> +    if (bs->open_flags & BDRV_O_INCOMING) {

I think the flag has to be cleared before calling
bdrv_invalidate_cache() because the .bdrv_open() function looks at the
flag.

--3Gf/FFewwPeBMqCJ
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJULWbDAAoJEJykq7OBq3PIuKwIAMmRMZN7CxVcbcj/qtLS8QaL
bDi6gsN9dG8Gm+wZw0BoHGOR+z7vldn9aA2KPWGcdrcm+vfCW+xVnTE7vFFnJjQx
kpQVJ6No/JT+MgpTOHO1v5nyyNoYtiU75DADbFUjODjNZnp6bh4vzS9cECjW3Gjo
hC97KxdeR8xL4R0y18tWwb5qMc7s9Qw/12qCRtQrBfixMO5J3bAHPvzZOUz6pUw6
dASOQ8MCSVZn3lSTqmx7IZ5zvGjQumopvqbeBw7aBKjwowOZCNhRVNyVfmTREfnI
+2Eh/eabfI7FctX6Z9SjpV7w6S91JjHGugKItDKMFAum5XcNc4esYeeax87tvUk=
=+eRB
-----END PGP SIGNATURE-----

--3Gf/FFewwPeBMqCJ--