From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com,
pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Mon, 18 Mar 2013 23:26:46 +0200
Message-ID: <20130318212646.GB20406@redhat.com>
In-Reply-To: <5147780C.1080800@linux.vnet.ibm.com>
On Mon, Mar 18, 2013 at 04:24:44PM -0400, Michael R. Hines wrote:
> On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote:
> >I think there are two things here: API documentation and protocol
> >documentation. The protocol documentation still needs some more work.
> >Also, if what I understand from this document is correct, this
> >breaks memory overcommit on the destination, which needs to be fixed.
> >
> >I think something chunk-based on the destination side is required
> >as well. You also can't trust the source to tell you the chunk
> >size: it could be malicious and ask for too much. Maybe the source
> >gives a chunk size hint and the destination responds with what it
> >wants to use.
>
> Do we allow ballooning *during* the live migration? Is that necessary?
Probably, but I haven't mentioned ballooning at all.
Memory overcommit != ballooning.
> Would it be sufficient to inform the destination which pages are ballooned
> and then only register the ones that the VM actually owns?
I haven't thought about it.
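
To make the chunk-size idea above concrete: the source only hints, and
the destination decides. The following is a minimal sketch, not code
from the patch. The limits DEST_MIN_CHUNK_SHIFT and DEST_MAX_CHUNK_SHIFT
and both function names are hypothetical, picked only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits chosen by the destination; not taken from the patch. */
#define DEST_MIN_CHUNK_SHIFT 12u  /* 4 KiB */
#define DEST_MAX_CHUNK_SHIFT 20u  /* 1 MiB */

/*
 * The source proposes a chunk size as log2(bytes). The destination treats
 * the value as a hint only and clamps it to the range it is willing to use,
 * so a malicious source cannot force an oversized registration.
 */
static uint32_t dest_pick_chunk_shift(uint32_t hint_shift)
{
    if (hint_shift < DEST_MIN_CHUNK_SHIFT) {
        return DEST_MIN_CHUNK_SHIFT;
    }
    if (hint_shift > DEST_MAX_CHUNK_SHIFT) {
        return DEST_MAX_CHUNK_SHIFT;
    }
    return hint_shift;
}

/* Number of chunks needed to cover a RAM block of block_len bytes. */
static uint64_t chunks_for_block(uint64_t block_len, uint32_t chunk_shift)
{
    assert(chunk_shift >= DEST_MIN_CHUNK_SHIFT &&
           chunk_shift <= DEST_MAX_CHUNK_SHIFT);
    uint64_t chunk_size = UINT64_C(1) << chunk_shift;
    return block_len / chunk_size + (block_len % chunk_size != 0);
}
```

The destination would send the value returned by dest_pick_chunk_shift()
back to the source, and both sides would use it from then on.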
> >Is there any feature and/or version negotiation? How are we going to
> >handle compatibility when we extend the protocol?
> You mean, on top of the protocol versioning that's already
> builtin to QEMUFile? inside qemu_savevm_state_begin()?
I mean for protocol things like credit negotiation, which are unrelated
to high level QEMUFile.
> Should I piggy-back an additional protocol version number
> before QEMUFile sends its version number?
CM can exchange a bit of data during connection setup, maybe use that?
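
A rough sketch of what could travel in the CM private data (the
private_data / private_data_len fields of struct rdma_conn_param in
librdmacm) is below. The RDMACaps layout, the 8-byte wire size and the
"lowest version, common flags" rule are assumptions for illustration;
nothing here is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability record; the layout is an assumption. */
typedef struct {
    uint32_t version;  /* highest protocol version the sender speaks */
    uint32_t flags;    /* bitmask of optional features */
} RDMACaps;

#define RDMA_CAPS_WIRE_LEN 8

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize big-endian so both ends agree regardless of host byte order. */
static size_t rdma_caps_encode(const RDMACaps *c, uint8_t *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    if (len < RDMA_CAPS_WIRE_LEN) {
        return 0;
    }
    put_be32(buf, c->version);
    put_be32(buf + 4, c->flags);
    return RDMA_CAPS_WIRE_LEN;
}

/* Returns 0 on success, -1 if the peer sent too few bytes. */
static int rdma_caps_decode(RDMACaps *c, const uint8_t *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    if (len < RDMA_CAPS_WIRE_LEN) {
        return -1;
    }
    c->version = get_be32(buf);
    c->flags = get_be32(buf + 4);
    return 0;
}

/* Agreed parameters: the lower version, and only features both sides have. */
static RDMACaps rdma_caps_negotiate(RDMACaps local, RDMACaps remote)
{
    RDMACaps out;
    out.version = local.version < remote.version ? local.version
                                                 : remote.version;
    out.flags = local.flags & remote.flags;
    return out;
}
```

The connecting side would place the encoded bytes in its connect request,
and the accepting side would answer with the negotiated result in its
accept reply, so nothing extra has to be sent after the connection is up.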
> >So how does destination know it's ok to send anything to source? I
> >suspect this is wrong. When using CM you must post on RQ before
> >completing the connection negotiation, not after it's done.
>
> This is already handled by the RDMA connection manager (librdmacm).
>
> The library already has functions like listen() and accept() the same
> way that TCP does.
>
> Once these functions return success, we have a guarantee that both
> sides of the connection have already posted the appropriate work
> requests sufficient for driving the migration.
Not if you don't post anything. librdmacm does not post requests. So
everyone posts 1 buffer on RQ during connection setup?
OK, though this is not what the document said; I was under the impression
this is done after connection setup.
>
> >>+2. We transmit an empty SEND to let the sender know that
> >>+ we are *ready* to receive some bytes from QEMUFileRDMA.
> >>+ These bytes will come in the form of another SEND.
> >Using an empty message seems somewhat hacky; a fixed header in the
> >message would let you do more things if the protocol is ever extended.
>
> Great idea....... I'll add a struct RDMAHeader to each send
> message in the next RFC which includes a version number.
>
> (Until now, there were *only* QEMUFile bytes, nothing else,
> so I didn't have any reason for a formal structure.)
>
>
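
For illustration, a fixed header on every SEND could look roughly like
the sketch below. Only the name RDMAHeader and the idea of a version
number come from this thread. The field widths, the message types, the
8-byte big-endian layout and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RDMA_HDR_VERSION 1u
#define RDMA_HDR_LEN     8u

/* Hypothetical message types; READY replaces the empty SEND. */
enum { RDMA_MSG_READY = 1, RDMA_MSG_QEMU_FILE = 2 };

typedef struct {
    uint16_t version;      /* protocol version of this message */
    uint16_t type;         /* one of RDMA_MSG_* */
    uint32_t payload_len;  /* bytes following the header */
} RDMAHeader;

/* Write the header in big-endian at the start of a SEND buffer. */
static void rdma_header_write(uint8_t *buf, uint16_t type,
                              uint32_t payload_len)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(RDMA_HDR_VERSION >> 8);
    buf[1] = (uint8_t)(RDMA_HDR_VERSION & 0xff);
    buf[2] = (uint8_t)(type >> 8);
    buf[3] = (uint8_t)(type & 0xff);
    buf[4] = (uint8_t)(payload_len >> 24);
    buf[5] = (uint8_t)(payload_len >> 16);
    buf[6] = (uint8_t)(payload_len >> 8);
    buf[7] = (uint8_t)payload_len;
}

/*
 * Parse and validate a received message of `received` bytes.
 * Returns 0 on success, -1 on a short message, an unknown version,
 * or a payload length larger than what actually arrived.
 */
static int rdma_header_read(RDMAHeader *h, const uint8_t *buf,
                            size_t received)
{
    assert(h != NULL && buf != NULL);
    if (received < RDMA_HDR_LEN) {
        return -1;
    }
    h->version = (uint16_t)((buf[0] << 8) | buf[1]);
    h->type = (uint16_t)((buf[2] << 8) | buf[3]);
    h->payload_len = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                     ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
    if (h->version != RDMA_HDR_VERSION) {
        return -1;
    }
    if (h->payload_len > received - RDMA_HDR_LEN) {
        return -1;
    }
    return 0;
}
```

With something like this, the "ready" signal becomes a header-only
message of type RDMA_MSG_READY, and QEMUFile bytes follow a header of
type RDMA_MSG_QEMU_FILE.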
> >OK to summarize flow control: at any time there's either 0 or 1
> >outstanding buffers in RQ. At each time only one side can talk.
> >Destination always goes first, then source, etc. At each time a
> >single send message can be passed. Just FYI, this means you are
> >often at 0 buffers in RQ and IIRC 0 buffers is a worst-case path
> >for InfiniBand. It's better to keep at least 1 buffer in RQ at
> >all times, so prepost 2 initially so it would fluctuate between 1
> >and 2.
>
> That's correct. Having 0 buffers is not possible - sending
> a message with 0 buffers would throw an error. The "protocol"
> as I described ensures that there is always one buffer posted
> before waiting for another message to arrive.
So # of buffers goes 0 -> 1 -> 0 -> 1.
What I am saying is you should have an extra buffer
so it goes 1 -> 2 -> 1 -> 2
otherwise you keep hitting the slow path in RQ processing:
each time you consume the last buffer, IIRC the receiver sends
an ACK to the sender saying "hey this is the last buffer, slow down".
You don't want that.
> I avoided "better" flow control because the non-live state
> is so small in comparison to the pc.ram contents that would be sent.
> The non-live state is in the range of kilobytes, so it seemed silly to
> have more rigorous flow control....
I think it's good enough, just add an extra unused buffer to make
hardware happy.
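
The difference between preposting 1 and 2 buffers can be seen in a toy
model. This is only a counter simulation and does not talk to real
hardware. ToyRQ and its functions are hypothetical names, and the
"slow path" here simply means "the last posted buffer was consumed".

```c
#include <assert.h>

/* Toy model of one side's receive queue. */
typedef struct {
    int posted;     /* receive buffers currently posted */
    int hit_empty;  /* how many times the last buffer was consumed */
} ToyRQ;

/* One incoming SEND consumes a buffer; the receiver then reposts one. */
static void toy_rq_receive_and_repost(ToyRQ *rq)
{
    assert(rq->posted > 0);  /* a SEND with no posted buffer is an error */
    rq->posted--;
    if (rq->posted == 0) {
        rq->hit_empty++;     /* queue drained: the case to avoid */
    }
    rq->posted++;
}

/* Count how often the queue drains to 0 over a run of messages. */
static int toy_rq_empty_hits(int preposted, int messages)
{
    ToyRQ rq = { preposted, 0 };
    for (int i = 0; i < messages; i++) {
        toy_rq_receive_and_repost(&rq);
    }
    return rq.hit_empty;
}
```

With 1 preposted buffer every message drains the queue (1 -> 0 -> 1);
with 2 preposted the depth only moves between 2 and 1 and never reaches 0.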
> >>+Migration of pc.ram:
> >>+===============================
> >>+
> >>+At the beginning of the migration, (migration-rdma.c),
> >>+the sender and the receiver populate the list of RAMBlocks
> >>+to be registered with each other into a structure.
> >Could you add the packet format here as well please?
> >Need to document endian-ness etc.
>
> There is no packet format for pc.ram.
The 'structure' above is passed using SEND so there is
a format.
> It's just bytes - raw RDMA
> writes of each 4K page, because the memory must be registered
> before the RDMA write can begin.
>
> (As discussed, there will be a format for SEND, though - so I'll
> take care of that in my next RFC).
>
> > Yes but we also need to report errors detected during migration.
> >Need to document how this is done. We also need to report success.
> Acknowledged - I'll add more verbosity to the different error conditions.
>
> - Michael R. Hines