From: Eric Blake
Date: Wed, 10 Apr 2013 20:43:14 -0600
Message-ID: <51662342.8090802@redhat.com>
References: <1365632901-15470-1-git-send-email-mrhines@linux.vnet.ibm.com> <1365632901-15470-13-git-send-email-mrhines@linux.vnet.ibm.com>
In-Reply-To: <1365632901-15470-13-git-send-email-mrhines@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v1: 12/13] updated protocol documentation
To: mrhines@linux.vnet.ibm.com
Cc: aliguori@us.ibm.com, mst@redhat.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

On 04/10/2013 04:28 PM, mrhines@linux.vnet.ibm.com wrote:
> From: "Michael R. Hines"
>
> Full documentation on the rdma protocol: docs/rdma.txt
>
> Signed-off-by: Michael R. Hines
> ---
>  docs/rdma.txt |  331 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 331 insertions(+)
>  create mode 100644 docs/rdma.txt
>
> diff --git a/docs/rdma.txt b/docs/rdma.txt
> new file mode 100644
> index 0000000..ae68d2f
> --- /dev/null
> +++ b/docs/rdma.txt
> @@ -0,0 +1,331 @@
> +Changes since v6:
> +
> +(Thanks, Paolo - things look much cleaner now.)
> +
> +- Try to get patch-ordering correct =)
> +- Much cleaner use of QEMUFileOps
> +- Much fewer header files changes
> +- Convert zero check capability to QMP command instead
> +- Updated documentation

The above text probably shouldn't be in the file.

> +
> +Wiki: http://wiki.qemu.org/Features/RDMALiveMigration
> +Github: git@github.com:hinesmr/qemu.git
> +Contact: Michael R. Hines, mrhines@us.ibm.com

Missing a copyright statement, but that's just following the example of
other docs, so I guess it's okay?

> +
> +RDMA Live Migration Specification, Version # 1
> +
> +Contents:
> +=================================
> +* Running
> +* RDMA Protocol Description
> +* Versioning and Capabilities
> +* QEMUFileRDMA Interface
> +* Migration of pc.ram
> +* Error handling
> +* TODO
> +* Performance
> +

No high-level overview of what the acronym RDMA even stands for?

> +RUNNING:
> +===============================
> +
> +First, decide if you want dynamic page registration on the server-side.
> +This always happens on the primary-VM side, but is optional on the server.
> +Doing this allows you to support overcommit (such as cgroups or ballooning)
> +with a smaller footprint on the server-side without having to register the
> +entire VM memory footprint.
> +NOTE: This significantly slows down RDMA throughput (about 30% slower).
> +
> +$ virsh qemu-monitor-command --hmp \
> +    --cmd "migrate_set_capability chunk_register_destination off" # enabled by default

'virsh qemu-monitor-command' is documented as unsupported by libvirt
(it's intended solely as a development/debugging aid); but I guess until
libvirt learns to expose RDMA support by default, this is okay for a
first cut of documentation.  Furthermore, you are missing a domain argument.

Do you really want to be requiring the user to do everything through
libvirt?  This is qemu documentation, so you should document how things
work without needing libvirt in the picture.

> +
> +Next, if you decided *not* to use chunked registration on the server,
> +it is recommended to also disable zero page detection. While this is not
> +strictly necessary, zero page detection also significantly slows down
> +throughput on higher-performance links (by about 50%), like 40 gbps infiniband cards:
> +
> +$ virsh qemu-monitor-command --hmp \
> +    --cmd "migrate_check_for_zero off" # enabled by default

Missing a domain argument.

> +
> +Finally, set the migration speed to match your hardware's capabilities:
> +
> +$ virsh qemu-monitor-command --hmp \
> +    --cmd "migrate_set_speed 40g" # or whatever is the MAX of your RDMA device

This modifies qemu state behind libvirt's back, and won't necessarily do
what you want if libvirt tries to change things back to the speed it
thought it was managing.  Instead, use 'virsh migrate-setspeed $dom 40'.

> +
> +Finally, perform the actual migration:
> +
> +$ virsh migrate domain rdma:xx.xx.xx.xx:port

That's not quite valid syntax for 'virsh migrate'.

Again, do you really want to be documenting libvirt's interface, or
qemu's interface?
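
To make the "missing a domain argument" comments concrete, here is a
rough sketch of how the invocations might look with a domain supplied.
It only prints the command lines instead of running them, so it can be
read without a libvirt host. Assumptions: "guest1" is a hypothetical
domain name, hmp_cmd is a helper made up for this illustration, and the
monitor command names are copied from the quoted RFC text, so they may
not match what a released qemu accepts.

```shell
#!/bin/sh
# Sketch only: prints virsh command lines rather than executing them.
# "guest1" is a hypothetical domain name; the HMP command names come from
# the quoted RFC patch and are not verified against a released qemu.

# Build one 'virsh qemu-monitor-command' line: the domain comes first,
# then --hmp, then the monitor command as a single quoted argument.
hmp_cmd() {
    dom=$1
    shift
    printf 'virsh qemu-monitor-command %s --hmp "%s"\n' "$dom" "$*"
}

DOM=${DOM:-guest1}

hmp_cmd "$DOM" migrate_set_capability chunk_register_destination off
hmp_cmd "$DOM" migrate_check_for_zero off

# The speed is set through libvirt itself, as suggested in the review,
# so libvirt's view of the bandwidth stays consistent with qemu's.
printf 'virsh migrate-setspeed %s 40\n' "$DOM"
```

Setting DOM in the environment selects a different domain. The final
migrate step is left out on purpose, since the right 'virsh migrate'
syntax for an rdma: URI is not settled in this thread.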
> +
> +RDMA Protocol Description:
> +=================================

Aesthetics: match the length of === to the line above it.

I'm not reviewing technical content, just face value...

> +
> +These two functions are very short and simply used the protocol
> +describe above to deliver bytes without changing the upper-level
> +users of QEMUFile that depend on a bytstream abstraction.

s/bytstream/bytestream/

...
> +
> +After pinning, an RDMA Write is generated and tramsmitted
> +for the entire chunk.

s/tramsmitted/transmitted/

> +5. Also, some form of balloon-device usage tracking would also
> +   help aleviate some of these issues.

s/aleviate/alleviate/

> +
> +PERFORMANCE
> +===================
> +
> +Using a 40gbps infinband link performing a worst-case stress test:

s/infinband/infiniband/

> +
> +RDMA Throughput With $ stress --vm-bytes 1024M --vm 1 --vm-keep
> +Approximately 30 gpbs (little better than the paper)

which paper?  Call that out in your high-level summary

...
> +
> +An *exhaustive* paper (2010) shows additional performance details
> +linked on the QEMU wiki:

Missing the actual reference?  And it would help to mention it at the
beginning of the file.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org