From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
	abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Mon, 18 Mar 2013 23:26:46 +0200
Message-ID: <20130318212646.GB20406@redhat.com>
In-Reply-To: <5147780C.1080800@linux.vnet.ibm.com>

On Mon, Mar 18, 2013 at 04:24:44PM -0400, Michael R. Hines wrote:
> On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote:
> >I think there are two things here: API documentation and protocol
> >documentation. The protocol documentation still needs some more work.
> >Also, if what I understand from this document is correct, this
> >breaks memory overcommit on the destination, which needs to be fixed.
> >
> >I think something chunk-based on the destination side is required
> >as well. You also can't trust the source to tell you the chunk
> >size: it could be malicious and ask for too much. Maybe the source
> >gives a chunk size hint and the destination responds with what it
> >wants to use.
> 
> Do we allow ballooning *during* the live migration? Is that necessary?

Probably, but I haven't mentioned ballooning at all.

memory overcommit != ballooning

> Would it be sufficient to inform the destination which pages are ballooned
> and then only register the ones that the VM actually owns?

I haven't thought about it.

> >Is there any feature and/or version negotiation? How are we going to
> >handle compatibility when we extend the protocol?
> You mean, on top of the protocol versioning that's already
> built into QEMUFile, inside qemu_savevm_state_begin()?

I mean for protocol things like credit negotiation, which are unrelated
to the high-level QEMUFile layer.

> Should I piggy-back an additional protocol version number
> before QEMUFile sends its version number?

CM can exchange a bit of data during connection setup, maybe use that?
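
To make that concrete, here is a rough sketch of the negotiation logic
only. librdmacm's struct rdma_conn_param has private_data and
private_data_len fields that could carry such a record, but the record
itself, its field names, and rdma_caps_answer() are hypothetical and not
part of the posted patches. The destination never takes the source's
chunk size on trust; it answers with what it is willing to use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability record exchanged at connection setup. */
struct rdma_caps {
    uint32_t version;     /* highest protocol version supported */
    uint32_t chunk_size;  /* chunk size in bytes; 0 means "no hint" */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Destination side: combine the source's offer with local limits.
 * The answer uses the lower of the two versions and never exceeds
 * the destination's own maximum chunk size.
 */
static struct rdma_caps rdma_caps_answer(struct rdma_caps offer,
                                         struct rdma_caps local_max)
{
    struct rdma_caps r;

    assert(local_max.chunk_size > 0);
    r.version = min_u32(offer.version, local_max.version);
    r.chunk_size = offer.chunk_size
                 ? min_u32(offer.chunk_size, local_max.chunk_size)
                 : local_max.chunk_size;
    return r;
}
```

A source asking for version 3 and 1 GB chunks from a destination that
supports version 1 and 1 MB chunks would get back version 1 and 1 MB.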

> >So how does destination know it's ok to send anything to source? I
> >suspect this is wrong. When using CM you must post on RQ before
> >completing the connection negotiation, not after it's done.
> 
> This is already handled by the RDMA connection manager (librdmacm).
> 
> The library already has functions like listen() and accept() the same
> way that TCP does.
> 
> Once these functions return success, we have a guarantee that both
> sides of the connection have already posted the appropriate work
> requests sufficient for driving the migration.

Not if you don't post anything. librdmacm does not post requests. So
everyone posts 1 buffer on RQ during connection setup?
OK, though this is not what the document said; I was under the
impression this is done after connection setup.

> 
> >>+2. We transmit an empty SEND to let the sender know that
> >>+   we are *ready* to receive some bytes from QEMUFileRDMA.
> >>+   These bytes will come in the form of another SEND.
> >Using an empty message seems somewhat hacky, a fixed header in the
> >message would let you do more things if protocol is ever extended.
> 
> Great idea....... I'll add a struct RDMAHeader to each send
> message in the next RFC which includes a version number.
> 
> (Until now, there were *only* QEMUFile bytes, nothing else,
> so I didn't have any reason for a formal structure.)
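
For illustration, a fixed header along these lines might look like the
sketch below. Only the name RDMAHeader and the version field come from
the discussion; the type and len fields, the message type values, the
12-byte big-endian layout, and the helper names are assumptions. The
"ready" notification then becomes a header with len == 0 instead of an
empty SEND.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types. */
enum { RDMA_MSG_READY = 1, RDMA_MSG_QEMU_FILE = 2 };

typedef struct RDMAHeader {
    uint32_t version;  /* protocol version */
    uint32_t type;     /* one of the RDMA_MSG_* values */
    uint32_t len;      /* payload bytes following the header */
} RDMAHeader;

#define RDMA_HEADER_WIRE_LEN 12u

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the header into buf, which must hold RDMA_HEADER_WIRE_LEN bytes. */
static void rdma_header_write(uint8_t *buf, const RDMAHeader *h)
{
    assert(buf != NULL && h != NULL);
    put_be32(buf, h->version);
    put_be32(buf + 4, h->type);
    put_be32(buf + 8, h->len);
}

/*
 * Parse a received message of `received` bytes. Returns 0 on success,
 * -1 if the message is shorter than the header or than the payload
 * length the header claims.
 */
static int rdma_header_read(const uint8_t *buf, size_t received,
                            RDMAHeader *h)
{
    if (received < RDMA_HEADER_WIRE_LEN) {
        return -1;
    }
    h->version = get_be32(buf);
    h->type = get_be32(buf + 4);
    h->len = get_be32(buf + 8);
    if (h->len > received - RDMA_HEADER_WIRE_LEN) {
        return -1;
    }
    return 0;
}
```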
> 
> 
> >OK to summarize flow control: at any time there's either 0 or 1
> >outstanding buffers in RQ. At each time only one side can talk.
> >Destination always goes first, then source, etc. At each time a
> >single send message can be passed. Just FYI, this means you are
> >often at 0 buffers in RQ and IIRC 0 buffers is a worst-case path
> >for infiniband. It's better to keep at least 1 buffer in RQ at
> >all times, so prepost 2 initially so it would fluctuate between 1
> >and 2.
> 
> That's correct. Having 0 buffers is not possible - sending
> a message with 0 buffers would throw an error. The "protocol"
> as I described ensures that there is always one buffer posted
> before waiting for another message to arrive.

So # of buffers goes 0 -> 1 -> 0 -> 1.
What I am saying is you should have an extra buffer
so it goes 1 -> 2 -> 1 -> 2,
otherwise you keep hitting the slow path in RQ processing:
each time you consume the last buffer, IIRC the receiver sends
an ACK to the sender saying "hey this is the last buffer, slow down".
You don't want that.

> I avoided "better" flow control because the non-live state
> is so small in comparison to the pc.ram contents that would be sent.
> The non-live state is in the range of kilobytes, so it seemed silly to
> have more rigorous flow control....

I think it's good enough, just add an extra unused buffer to make
hardware happy.
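
A toy model of the buffer accounting, just to show the arithmetic. It
does not touch any RDMA API; rq_min_depth() is a made-up helper that
counts how low the receive queue gets when each incoming SEND consumes
one buffer and the receiver reposts one afterwards.

```c
#include <assert.h>

/*
 * Simulate `rounds` incoming messages on a receive queue that starts
 * with `prepost` buffers. Returns the lowest depth observed, or -1 if
 * a message arrives while no buffer is posted.
 */
static int rq_min_depth(int prepost, int rounds)
{
    int depth = prepost;
    int min = prepost;

    assert(prepost >= 0 && rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        if (depth == 0) {
            return -1;      /* nothing posted: the message cannot land */
        }
        depth--;            /* incoming SEND consumes one buffer */
        if (depth < min) {
            min = depth;
        }
        depth++;            /* receiver reposts one buffer */
    }
    return min;
}
```

With one preposted buffer the queue bottoms out at 0 on every message;
with two it never drops below 1.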

> >>+Migration of pc.ram:
> >>+===============================
> >>+
> >>+At the beginning of the migration, (migration-rdma.c),
> >>+the sender and the receiver populate the list of RAMBlocks
> >>+to be registered with each other into a structure.
> >Could you add the packet format here as well please?
> >Need to document endian-ness etc.
> 
> There is no packet format for pc.ram.

The 'structure' above is passed using SEND so there is
a format.
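
As an example of the kind of thing the document should pin down, here
is one possible way to write a block descriptor with explicit
endianness. The fields, the 32-byte name, and the big-endian choice are
all assumptions made for illustration; the actual structure in the
patch may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_NAME_LEN 32u                       /* hypothetical */
#define BLOCK_WIRE_LEN (8u + 8u + BLOCK_NAME_LEN)

/* Hypothetical descriptor for one RAMBlock. */
struct block_desc {
    uint64_t offset;            /* start of the block */
    uint64_t length;            /* size in bytes */
    char name[BLOCK_NAME_LEN];  /* NUL-padded, e.g. "pc.ram" */
};

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Wire layout: offset (be64), length (be64), name (32 bytes). */
static void block_desc_write(uint8_t *out, const struct block_desc *b)
{
    assert(out != NULL && b != NULL);
    put_be64(out, b->offset);
    put_be64(out + 8, b->length);
    memcpy(out + 16, b->name, BLOCK_NAME_LEN);
}

/* Returns 0 on success, -1 if fewer than BLOCK_WIRE_LEN bytes arrived. */
static int block_desc_read(const uint8_t *in, size_t len,
                           struct block_desc *b)
{
    if (len < BLOCK_WIRE_LEN) {
        return -1;
    }
    b->offset = get_be64(in);
    b->length = get_be64(in + 8);
    memcpy(b->name, in + 16, BLOCK_NAME_LEN);
    b->name[BLOCK_NAME_LEN - 1] = '\0';  /* don't trust the peer to terminate */
    return 0;
}
```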

> It's just bytes - raw RDMA
> writes of each 4K page, because the memory must be registered
> before the RDMA write can begin.
> 
> (As discussed, there will be a format for SEND, though - so I'll
> take care of that in my next RFC).
> 
> >Yes, but we also need to report errors detected during migration.
> >Need to document how this is done. We also need to report success.
> Acknowledged - I'll add more verbosity to the different error conditions.
> 
> - Michael R. Hines
