From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42091) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAkHy-0001Rg-Rv for qemu-devel@nongnu.org; Wed, 08 Jun 2016 16:41:24 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bAkHu-000837-KC for qemu-devel@nongnu.org; Wed, 08 Jun 2016 16:41:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36951) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bAkHu-00082y-Bi for qemu-devel@nongnu.org; Wed, 08 Jun 2016 16:41:18 -0400 References: <1465409584-16308-1-git-send-email-haris.phnx@gmail.com> From: Eric Blake Message-ID: <575882EC.1090906@redhat.com> Date: Wed, 8 Jun 2016 14:41:16 -0600 MIME-Version: 1.0 In-Reply-To: <1465409584-16308-1-git-send-email-haris.phnx@gmail.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="fe0fmQfqvm8ljF63hMTU9Om511qqA6bCd" Subject: Re: [Qemu-devel] [Qemu-devel [RFC] [WIP] v2] Keeping the Source side alive incase of network failure (Migration recovery from network failure) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Md Haris Iqbal , qemu-devel@nongnu.org Cc: dgilbert@redhat.com This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --fe0fmQfqvm8ljF63hMTU9Om511qqA6bCd Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 06/08/2016 12:13 PM, Md Haris Iqbal wrote: The subject line is long, and has a typo (s/incase/in case/). Also, the mailing list automatically prepends [Qemu-devel], so you shouldn't repeat it manually. Better might have been a short subject line then a longer commit body: migration: keep source alive on network failure Details about what was failing, and why this code improves it Missing a Signed-off-by: attribution; without that, we can't take it. > --- You marked this patch as v2, but in the same minute sent another email with subject line v1, and didn't say what changed to need a v2. Here after the --- divider is a good place for that. > include/migration/migration.h | 1 + > migration/migration.c | 76 +++++++++++++++++++++++++++++++++++= +++++--- > qapi-schema.json | 11 +++++-- > vl.c | 4 +++ > 4 files changed, 85 insertions(+), 7 deletions(-) >=20 > @@ -1726,11 +1755,32 @@ static void *migration_thread(void *opaque) > } > } > =20 > - if (qemu_file_get_error(s->to_dst_file)) { > - migrate_set_state(&s->state, current_active_state, > - MIGRATION_STATUS_FAILED); > - trace_migration_thread_file_err(); > - break; > + if ((ret =3D qemu_file_get_error(s->to_dst_file))) { > + fprintf(stderr, "1 : Error %s %d\n", strerror(-ret), -ret)= ; fprintf() is rather awkward for errors; can we use qemu's Error mechanism= ? > + > + /* This check is based on how the error is set during the= network > + * recv(). When recv() returns 0 (i.e. no data to read), = the error > + * is set to -EIO. For all other network errors, it is se= t > + * according to the return value received. > + */ > + if (ret !=3D -EIO && s->state =3D=3D MIGRATION_STATUS_POST= COPY_ACTIVE) { > + /* Network Failure during postcopy */ > + > + current_active_state =3D MIGRATION_STATUS_POSTCOPY_REC= OVERY; > + runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY); > + fprintf(stderr, "1.1 : Error %s %d\n", strerror(-ret),= -ret); Does the end user really need to see "1.1 :" > +++ b/qapi-schema.json > @@ -154,12 +154,14 @@ > # @watchdog: the watchdog action is configured to pause and has been t= riggered > # > # @guest-panicked: guest has been panicked as a result of guest OS pan= ic > +# > +# @postmigrate-recovery: guest is paused for recovery after a network = failure Not your fault that the overall enum is missing an overall line: # Since: 1.4 nor that guest-panicked is missing a "(since 1.5)" hint, but at least your addition should have a "(since 2.7)" hint. > ## > { 'enum': 'RunState', > 'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'pause= d', > 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm'= , > 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',= > - 'guest-panicked' ] } > + 'guest-panicked', 'postmigrate-recovery' ] } Adding new enums can cause existing clients like libvirt to do weird things if they aren't expecting the new state. Are we sure we want to do it? Is it a state that cannot be entered by default, but only in response to a client request that proves the client is new enough to expect the new state? > =20 > ## > # @StatusInfo: > @@ -434,12 +436,15 @@ > # > # @failed: some error occurred during migration process. > # > +# @postcopy-recovery: in recovery mode, after a network failure. > +# Missing a "(since 2.7)" hint. > # Since: 2.3 > # > ## > { 'enum': 'MigrationStatus', > 'data': [ 'none', 'setup', 'cancelling', 'cancelled', > - 'active', 'postcopy-active', 'completed', 'failed' ] } > + 'active', 'postcopy-active', 'completed', 'failed', > + 'postcopy-recovery' ] } > =20 > ## > # @MigrationInfo > @@ -2058,6 +2063,8 @@ > # > # @uri: the Uniform Resource Identifier of the destination VM > # > +# @recover: #optional recover from a broken migration > +# I don't see any 'recover' parameter added to the 'migrate' command to match this added documentation. > # @blk: #optional do block migration (full disk copy) > # > # @inc: #optional incremental disk copy migration > diff --git a/vl.c b/vl.c > index 5fd22cb..c237140 100644 > --- a/vl.c > +++ b/vl.c > @@ -618,6 +618,10 @@ static const RunStateTransition runstate_transitio= ns_def[] =3D { > { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING }, > { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE }, > { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH }, > + { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY }, > + > + { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE }, > + { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN }, > =20 > { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING }, > { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH }, >=20 --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --fe0fmQfqvm8ljF63hMTU9Om511qqA6bCd Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJXWILsAAoJEKeha0olJ0NqSUAH/ia7r6GtUubV1iFLwTYqPSRF hu+Bo7LkUVuN0RUlWPAHINzynFgpGgtUcC5JgNtBjAykmHK01Vk6levJOdAJej0M Ok7qseHNBZI7jI0JY7evAuJmu15CPFs72NHDtFwMTDyURegOs60h9SlchwYTVyOQ gToeosMQ8gDOQAi2xmJ1LK9YuKKmSIEGxcULdWz3Yg/DJgUBvT0F5L/tykxT3QTj yKmI5+KdBBicplmngsUJnosiaP4pvHrgREbUj+ZXRin4ra2QcHZjBuW5ajFtlvbC hRMP7kLZRsr7RGZLw07uJoODo2Fs+y6TsU1/fcyBQelIWBkaedpBx7zlMShkyUs= =1ju3 -----END PGP SIGNATURE----- --fe0fmQfqvm8ljF63hMTU9Om511qqA6bCd--