From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50655) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XlDMt-0004jb-On for qemu-devel@nongnu.org; Mon, 03 Nov 2014 03:52:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XlDMr-00026a-LT for qemu-devel@nongnu.org; Mon, 03 Nov 2014 03:52:07 -0500 Received: from ozlabs.org ([2401:3900:2:1::2]:42897) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XlDMr-00026V-1u for qemu-devel@nongnu.org; Mon, 03 Nov 2014 03:52:05 -0500 Date: Mon, 3 Nov 2014 16:51:09 +1100 From: David Gibson Message-ID: <20141103055109.GQ8949@voom.redhat.com> References: <1412358473-31398-1-git-send-email-dgilbert@redhat.com> <1412358473-31398-22-git-send-email-dgilbert@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="yFH6hwN92mUA4HK5" Content-Disposition: inline In-Reply-To: <1412358473-31398-22-git-send-email-dgilbert@redhat.com> Subject: Re: [Qemu-devel] [PATCH v4 21/47] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages. List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert (git)" Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, lilei@linux.vnet.ibm.com, quintela@redhat.com, cristian.klein@cs.umu.se, qemu-devel@nongnu.org, amit.shah@redhat.com, yanghy@cn.fujitsu.com --yFH6hwN92mUA4HK5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Oct 03, 2014 at 06:47:27PM +0100, Dr. David Alan Gilbert (git) wrot= e: > From: "Dr. David Alan Gilbert" >=20 > Add state variable showing current incoming postcopy state. This appears to implement a lot more than just adding a state variable... > Signed-off-by: Dr. David Alan Gilbert > --- > include/migration/migration.h | 8 + > include/sysemu/sysemu.h | 20 +++ > savevm.c | 335 ++++++++++++++++++++++++++++++++++++= ++++++ > 3 files changed, 363 insertions(+) >=20 > diff --git a/include/migration/migration.h b/include/migration/migration.h > index 0d9f62d..2c078c4 100644 > --- a/include/migration/migration.h > +++ b/include/migration/migration.h > @@ -61,6 +61,14 @@ typedef struct MigrationState MigrationState; > struct MigrationIncomingState { > QEMUFile *file; > =20 > + volatile enum { What's the reason for the volatile? I think that really needs a comment. > + POSTCOPY_RAM_INCOMING_NONE =3D 0, /* Initial state - no postcop= y */ > + POSTCOPY_RAM_INCOMING_ADVISE, > + POSTCOPY_RAM_INCOMING_LISTENING, > + POSTCOPY_RAM_INCOMING_RUNNING, > + POSTCOPY_RAM_INCOMING_END > + } postcopy_ram_state; > + > QEMUFile *return_path; > QemuMutex rp_mutex; /* We send replies from multiple threads= */ > }; > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h > index ad96f2a..102dd93 100644 > --- a/include/sysemu/sysemu.h > +++ b/include/sysemu/sysemu.h > @@ -88,6 +88,16 @@ enum qemu_vm_cmd { > QEMU_VM_CMD_OPENRP, /* Tell the dest to open the Return path = */ > QEMU_VM_CMD_REQACK, /* Request an ACK on the RP */ > =20 > + QEMU_VM_CMD_POSTCOPY_RAM_ADVISE =3D 20, /* Prior to any page transf= ers, just > + warn we might want to do P= C */ > + QEMU_VM_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard= that > + were previously sent during > + precopy but are dirty. */ > + QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, /* Start listening for incomi= ng > + pages as it's running. */ > + QEMU_VM_CMD_POSTCOPY_RAM_RUN, /* Start execution */ > + QEMU_VM_CMD_POSTCOPY_RAM_END, /* Postcopy is finished. */ > + > QEMU_VM_CMD_AFTERLASTVALID > }; > =20 > @@ -102,6 +112,16 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu= _vm_cmd command, > uint16_t len, uint8_t *data); > void qemu_savevm_send_reqack(QEMUFile *f, uint32_t value); > void qemu_savevm_send_openrp(QEMUFile *f); > +void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f); > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, > + uint16_t len, uint8_t offset, > + uint64_t *addrlist, > + uint32_t *masklist); > + > +void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f); > +void qemu_savevm_send_postcopy_ram_run(QEMUFile *f); > +void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status); > + > int qemu_loadvm_state(QEMUFile *f); > =20 > /* SLIRP */ > diff --git a/savevm.c b/savevm.c > index 7236232..b942e8c 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -39,6 +39,7 @@ > #include "exec/memory.h" > #include "qmp-commands.h" > #include "trace.h" > +#include "qemu/bitops.h" > #include "qemu/iov.h" > #include "block/snapshot.h" > #include "block/qapi.h" > @@ -624,6 +625,92 @@ void qemu_savevm_send_openrp(QEMUFile *f) > { > qemu_savevm_command_send(f, QEMU_VM_CMD_OPENRP, 0, NULL); > } > + > +/* Send prior to any RAM transfer */ > +void qemu_savevm_send_postcopy_ram_advise(QEMUFile *f) > +{ > + DPRINTF("send postcopy-ram-advise"); > + uint64_t tmp[2]; > + tmp[0] =3D cpu_to_be64(sysconf(_SC_PAGESIZE)); > + tmp[1] =3D cpu_to_be64(1ul << qemu_target_page_bits()); > + > + qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_ADVISE, 16, > + (uint8_t *)tmp); > +} > + > +/* Prior to running, to cause pages that have been dirtied after precopy > + * started to be discarded on the destination. > + * CMD_POSTCOPY_RAM_DISCARD consist of: > + * 3 byte header (filled in by qemu_savevm_send_postcopy_ram_discard) > + * byte version (0) > + * byte offset into the 1st data word containing 1st page of RAMB= lock I'm not able to follow what that description means. > + * byte Length of name field > + * n x byte RAM block name (NOT 0 terminated) > + * n x > + * be64 Page addresses for start of an invalidation range > + * be32 mask of 32 pages, '1' to discard' > + * > + * Hopefully this is pretty sparse so we don't get too many entries, > + * and using the mask should deal with most pagesize differences > + * just ending up as a single full mask > + * > + * The mask is always 32bits irrespective of the long size > + * > + * name: RAMBlock name that these entries are part of > + * len: Number of page entries > + * addrlist: 'len' addresses > + * masklist: 'len' masks (corresponding to the addresses) > + */ > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, > + uint16_t len, uint8_t offset, > + uint64_t *addrlist, > + uint32_t *masklist) > +{ > + uint8_t *buf; > + uint16_t tmplen; > + uint16_t t; > + > + DPRINTF("send postcopy-ram-discard"); > + buf =3D g_malloc0(len*12 + strlen(name) + 3); > + buf[0] =3D 0; /* Version */ > + buf[1] =3D offset; > + assert(strlen(name) < 256); > + buf[2] =3D strlen(name); > + memcpy(buf+3, name, strlen(name)); > + tmplen =3D 3+strlen(name); > + > + for (t =3D 0; t < len; t++) { > + cpu_to_be64w((uint64_t *)(buf + tmplen), addrlist[t]); > + tmplen +=3D 8; > + cpu_to_be32w((uint32_t *)(buf + tmplen), masklist[t]); > + tmplen +=3D 4; > + } > + qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_DISCARD, > + tmplen, buf); > + g_free(buf); > +} > + > +/* Get the destination into a state where it can receive page data. */ > +void qemu_savevm_send_postcopy_ram_listen(QEMUFile *f) > +{ > + DPRINTF("send postcopy-ram-listen"); > + qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_LISTEN, 0, NULL= ); > +} > + > +/* Kick the destination into running */ > +void qemu_savevm_send_postcopy_ram_run(QEMUFile *f) > +{ > + DPRINTF("send postcopy-ram-run"); > + qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_RUN, 0, NULL); > +} > + > +/* End of postcopy - with a status byte; 0 is good, anything else is a f= ail */ > +void qemu_savevm_send_postcopy_ram_end(QEMUFile *f, uint8_t status) > +{ > + DPRINTF("send postcopy-ram-end"); > + qemu_savevm_command_send(f, QEMU_VM_CMD_POSTCOPY_RAM_END, 1, &status= ); > +} > + > bool qemu_savevm_state_blocked(Error **errp) > { > SaveStateEntry *se; > @@ -935,6 +1022,220 @@ static LoadStateEntry_Head loadvm_handlers =3D > static int qemu_loadvm_state_main(QEMUFile *f, > LoadStateEntry_Head *loadvm_handlers); > =20 > +/* ------ incoming postcopy-ram messages ------ */ > +/* 'advise' arrives before any RAM transfers just to tell us that a post= copy > + * *might* happen - it might be skipped if precopy transferred everything > + * quickly. > + */ > +static int loadvm_postcopy_ram_handle_advise(MigrationIncomingState *mis, > + uint64_t remote_hps, > + uint64_t remote_tps) > +{ > + DPRINTF("%s", __func__); > + if (mis->postcopy_ram_state !=3D POSTCOPY_RAM_INCOMING_NONE) { > + error_report("CMD_POSTCOPY_RAM_ADVISE in wrong postcopy state (%= d)", > + mis->postcopy_ram_state); > + return -1; > + } > + > + if (remote_hps !=3D sysconf(_SC_PAGESIZE)) { > + /* > + * Some combinations of mismatch are probably possible but it ge= ts > + * a bit more complicated. In particular we need to place whole > + * host pages on the dest at once, and we need to ensure that we > + * handle dirtying to make sure we never end up sending part of > + * a hostpage on it's own. > + */ > + error_report("Postcopy needs matching host page sizes (s=3D%d d= =3D%d)", > + (int)remote_hps, (int)sysconf(_SC_PAGESIZE)); > + return -1; > + } > + > + if (remote_tps !=3D (1ul << qemu_target_page_bits())) { > + /* > + * Again, some differences could be dealt with, but for now keep= it > + * simple. > + */ > + error_report("Postcopy needs matching target page sizes (s=3D%d = d=3D%d)", > + (int)remote_tps, 1 << qemu_target_page_bits()); > + return -1; > + } > + > + mis->postcopy_ram_state =3D POSTCOPY_RAM_INCOMING_ADVISE; > + > + /* > + * Postcopy will be sending lots of small messages along the return = path > + * that it needs quick answers to. > + */ > + socket_set_nodelay(qemu_get_fd(mis->return_path)); So, here you break the QEMUFile abstraction and assume you have a socket. > + return 0; > +} > + > +/* After postcopy we will be told to throw some pages away since they're > + * dirty and will have to be demand fetched. Must happen before CPU is > + * started. > + * There can be 0..many of these messages, each encoding multiple pages. > + * Bits set in the message represent a page in the source VMs bitmap, but > + * since the guest/target page sizes can be different on s/d then we have > + * to convert. > + */ > +static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mi= s, > + uint16_t len) > +{ > + int tmp; > + unsigned int first_bit_offset; > + char ramid[256]; > + > + DPRINTF("%s", __func__); > + > + if (mis->postcopy_ram_state !=3D POSTCOPY_RAM_INCOMING_ADVISE) { > + error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (= %d)", > + mis->postcopy_ram_state); > + return -1; > + } > + /* We're expecting a > + * 3 byte header, > + * a RAM ID string > + * then at least 1 12 byte chunks > + */ > + if (len < 16) { > + error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len= ); > + return -1; > + } > + > + tmp =3D qemu_get_byte(mis->file); > + if (tmp !=3D 0) { > + error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tm= p); > + return -1; > + } > + first_bit_offset =3D qemu_get_byte(mis->file); > + > + if (qemu_get_counted_string(mis->file, (uint8_t *)ramid)) { > + error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock I= D"); > + return -1; > + } > + > + len -=3D 3+strlen(ramid); > + if (len % 12) { > + error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len= ); > + return -1; > + } > + while (len) { > + uint64_t startaddr; > + uint32_t mask; > + /* > + * We now have pairs of address, mask > + * The mask is 32 bits of bitmask starting at 'startaddr'-offs= et > + * RAMBlock; e.g. if the RAMBlock started at 8k where TPS=3D4k > + * then first_bit_offset=3D2 and the 1st 2 bits of the mask > + * aren't relevant to this RAMBlock, and bit 2 corresponds > + * to the 1st page of this RAMBlock Um.. yeah.. can't make much snse of this comment either. > + */ > + startaddr =3D qemu_get_be64(mis->file); > + mask =3D qemu_get_be32(mis->file); > + > + len -=3D 12; > + > + while (mask) { > + /* mask=3D .....?10...0 */ > + /* ^fs */ > + int firstset =3D ctz32(mask); > + > + /* tmp32=3D.....?11...1 */ > + /* ^fs */ > + uint32_t tmp32 =3D mask | ((((uint32_t)1)< + > + /* mask=3D .?01..10...0 */ > + /* ^fz ^fs */ > + int firstzero =3D cto32(tmp32); > + > + if ((startaddr =3D=3D 0) && (firstset < first_bit_offset)) { > + error_report("CMD_POSTCOPY_RAM_DISCARD bad data; bit set" > + " prior to block; block=3D%s offset=3D%d" > + " firstset=3D%d\n", ramid, first_bit_offs= et, > + firstzero); > + return -1; > + } > + > + /* > + * we know there must be at least 1 bit set due to the loop = entry > + * If there is no 0 firstzero will be 32 > + */ > + /* TODO - ram_discard_range gets added in a later patch > + int ret =3D ram_discard_range(mis, ramid, > + startaddr + firstset - first_bit_offset, > + startaddr + (firstzero - 1) - first_bit_= offset); > + ret =3D -1; > + if (ret) { > + return ret; > + } > + */ > + > + /* mask=3D .?0000000000 */ > + /* ^fz ^fs */ > + if (firstzero !=3D 32) { > + mask &=3D (((uint32_t)-1) << firstzero); > + } else { > + mask =3D 0; > + } > + } > + } > + DPRINTF("%s finished", __func__); > + > + return 0; > +} > + > +/* After this message we must be able to immediately receive page data */ The purpose of the listen message from a protocol point of view isn't really clear to me. I understand why the destination needs to set up the postcopy handling before processing the device data, but why does it need an assertion from the source to start this, rather than just an internal detail on the load path. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --yFH6hwN92mUA4HK5 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJUVxfNAAoJEGw4ysog2bOSvYEQAMw7Ni8mYMA5cBd+oO4kSPPZ FH10nuxDIPbDhLMVbI7k7VPDjf4oMiVfSVt/ehvDwW9W54kviEeVUUK0w7cnH0Fa AE+mfiV8n6n4ZREI0Fq5nXfH+0p2EHEN+tv37aB7n4B/gSyWEra7ksYosl46HLn+ Ck1Um+IKi3Zzpo55iG5Nu6etQS8dzZXLQCVVjnT/yObN6XynJ78caQtEbdBpJDJ6 0tZ1miUOdv/oyv3O8fsmzZ/Bp6I/QfxyS+v/IJVx5DPpE8QQIJKllSTBiA8bu35N tPME1Oche6PxYaJbpvBjdI09jNf+6lJZUtfOhi3tsmbwVVh/ClY6VPps9DSi8ZpM ffCXrx3YWrPdw7uUxZ5Rmvis15FspHA+gqtEUf3PTZdCV7LsAV9+x/8Hv1aAJjwq FnWMP3nmbRzz2DqjIR8tLeOcgJuF3CnFbk5muXXgXRSnHmVOwVZeels33AuI4zw/ HGCbSpcrPj9NJxvhOGMRCri79aeq00ujj6DDmxi+ZKr1pY3LOQ5pb0U6aoDGBPkh s5Q26BUBD1LcZ0Ts+20OYq28a7L4Md3kfcJ6/jdZZKjYR5ijHRgUaubIroQUqgu5 ALb0Xss5mBKNCMdkL+V7z1I4TP3nSJBJTAIGSx5v8cxF3tRPsI8JZWzOvv2mtRHx QFmsa/uRvFUJdN05B7Oo =itCH -----END PGP SIGNATURE----- --yFH6hwN92mUA4HK5--