From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
	abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Mon, 18 Mar 2013 23:26:46 +0200
Message-ID: <20130318212646.GB20406@redhat.com>
In-Reply-To: <5147780C.1080800@linux.vnet.ibm.com>

On Mon, Mar 18, 2013 at 04:24:44PM -0400, Michael R. Hines wrote:
> On 03/18/2013 06:40 AM, Michael S. Tsirkin wrote:
> >I think there are two things here, API documentation and protocol
> >documentation, protocol documentation still needs some more work.
> >Also if what I understand from this document is correct this
> >breaks memory overcommit on destination which needs to be fixed.
> >
> >I think something chunk-based on the destination side is required
> >as well. You also can't trust the source to tell you the chunk
> >size: it could be malicious and ask for too much. Maybe source
> >gives chunk size hint and destination responds with what it wants
> >to use.
> 
> Do we allow ballooning *during* the live migration? Is that necessary?

Probably, but I haven't mentioned ballooning at all.

memory overcommit != ballooning

> Would it be sufficient to inform the destination which pages are ballooned
> and then only register the ones that the VM actually owns?

I haven't thought about it.

> >Is there any feature and/or version negotiation? How are we going to
> >handle compatibility when we extend the protocol?
> You mean, on top of the protocol versioning that's already
> builtin to QEMUFile? inside qemu_savevm_state_begin()?

I mean for protocol things like credit negotiation, which are unrelated
to high level QEMUFile.

> Should I piggy-back an additional protocol version number
> before QEMUFile sends its version number?

CM can exchange a bit of data during connection setup, maybe use that?
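
To make that concrete, here is a rough sketch of what such an exchange
could carry. It assumes the blob travels in the private_data /
private_data_len fields of struct rdma_conn_param, which have a small,
transport-dependent size limit. The struct rdma_caps layout, the field
names and the "take the minimum" policy are all hypothetical, nothing in
the patch defines them. The sketch only shows the byte-level encoding
and the destination clamping the source's chunk size hint to its own
limit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability blob; none of these names come from the patch. */
struct rdma_caps {
    uint32_t version;     /* highest protocol version the sender speaks */
    uint32_t chunk_shift; /* requested chunk size as log2(bytes), a hint */
};

#define RDMA_CAPS_WIRE_LEN 8

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode caps into buf in big-endian order; returns bytes written. */
static size_t rdma_caps_encode(const struct rdma_caps *c, uint8_t *buf)
{
    assert(c != NULL && buf != NULL);
    put_be32(buf, c->version);
    put_be32(buf + 4, c->chunk_shift);
    return RDMA_CAPS_WIRE_LEN;
}

/* Decode caps from buf; returns 0 on success, -1 if the blob is too short. */
static int rdma_caps_decode(const uint8_t *buf, size_t len,
                            struct rdma_caps *c)
{
    assert(buf != NULL && c != NULL);
    if (len < RDMA_CAPS_WIRE_LEN) {
        return -1;
    }
    c->version = get_be32(buf);
    c->chunk_shift = get_be32(buf + 4);
    return 0;
}

/*
 * Destination side: never trust the peer's request as-is. Use the lower
 * of the two versions and clamp the chunk size to the local limit.
 */
static struct rdma_caps rdma_caps_negotiate(struct rdma_caps peer,
                                            struct rdma_caps local)
{
    struct rdma_caps out;

    out.version = peer.version < local.version ?
                  peer.version : local.version;
    out.chunk_shift = peer.chunk_shift < local.chunk_shift ?
                      peer.chunk_shift : local.chunk_shift;
    return out;
}
```

The destination would send the negotiated values back in its accept
reply, so the source learns what was actually chosen.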

> >So how does destination know it's ok to send anything to source? I
> >suspect this is wrong. When using CM you must post on RQ before
> >completing the connection negotiation, not after it's done.
> 
> This is already handled by the RDMA connection manager (librdmacm).
> 
> The library already has functions like listen() and accept() the same
> way that TCP does.
> 
> Once these functions return success, we have a guarantee that both
> sides of the connection have already posted the appropriate work
> requests sufficient for driving the migration.

Not if you don't post anything. librdmacm does not post requests. So
everyone posts 1 buffer on RQ during connection setup?
OK, though this is not what the document said; I was under the impression
this is done after connection setup.

> 
> >>+2. We transmit an empty SEND to let the sender know that
> >>+   we are *ready* to receive some bytes from QEMUFileRDMA.
> >>+   These bytes will come in the form of another SEND.
> >Using an empty message seems somewhat hacky, a fixed header in the
> >message would let you do more things if protocol is ever extended.
> 
> Great idea....... I'll add a struct RDMAHeader to each send
> message in the next RFC which includes a version number.
> 
> (Until now, there were *only* QEMUFile bytes, nothing else,
> so I didn't have any reason for a formal structure.)
> 
> 
> >OK to summarize flow control: at any time there's either 0 or 1
> >outstanding buffers in RQ. At each time only one side can talk.
> >Destination always goes first, then source, etc. At each time a
> >single send message can be passed. Just FYI, this means you are
> >often at 0 buffers in RQ and IIRC 0 buffers is a worst-case path
> >for infiniband. It's better to keep at least 1 buffer in RQ at
> >all times, so prepost 2 initially so it would fluctuate between 1
> >and 2.
> 
> That's correct. Having 0 buffers is not possible - sending
> a message with 0 buffers would throw an error. The "protocol"
> as I described ensures that there is always one buffer posted
> before waiting for another message to arrive.

So # of buffers goes 0 -> 1 -> 0 -> 1.
What I am saying is you should have an extra buffer
so it goes 1 -> 2 -> 1 -> 2,
otherwise you keep hitting the slow path in RQ processing:
each time you consume the last buffer, IIRC the receiver sends
an ACK to the sender saying "hey this is the last buffer, slow down".
You don't want that.
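
A toy model of the buffer count, in case it helps. This is not verbs
code: rq_min_depth() is a made-up helper that only counts buffers,
assuming each incoming SEND consumes one receive buffer and the receiver
reposts one before it replies:

```c
#include <assert.h>

/*
 * Toy model of a ping-pong receive queue. Returns the lowest number of
 * posted receive buffers seen while 'messages' SENDs arrive, starting
 * with 'preposted' buffers in the RQ.
 */
static int rq_min_depth(int preposted, int messages)
{
    int depth = preposted;
    int min_depth = preposted;

    assert(preposted > 0 && messages >= 0);
    for (int i = 0; i < messages; i++) {
        depth--;               /* incoming SEND consumes one buffer */
        if (depth < min_depth) {
            min_depth = depth;
        }
        depth++;               /* receiver reposts before replying */
    }
    return min_depth;
}
```

With one preposted buffer the minimum is 0, so every message drains the
queue. With two it never drops below 1.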

> I avoided "better" flow control because the non-live state
> is so small in comparison to the pc.ram contents that would be sent.
> The non-live state is in the range of kilobytes, so it seemed silly to
> have more rigorous flow control....

I think it's good enough, just add an extra unused buffer to make
hardware happy.

> >>+Migration of pc.ram:
> >>+===============================
> >>+
> >>+At the beginning of the migration, (migration-rdma.c),
> >>+the sender and the receiver populate the list of RAMBlocks
> >>+to be registered with each other into a structure.
> >Could you add the packet format here as well please?
> >Need to document endian-ness etc.
> 
> There is no packet format for pc.ram.

The 'structure' above is passed using SEND so there is
a format.
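
For example, documenting it could be as simple as the following. The
two fields here are my guess at the minimum a RAMBlock description would
need; the real structure in migration-rdma.c is not shown in this
thread, so struct ramblock_desc and its helpers are hypothetical. The
point is only that each field gets a fixed width and a fixed byte order
(big-endian below) instead of being sent as a raw host struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-the-wire description of one RAMBlock. */
struct ramblock_desc {
    uint64_t offset; /* block offset in the RAM address space */
    uint64_t length; /* block length in bytes */
};

#define RAMBLOCK_DESC_WIRE_LEN 16

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Serialize d into buf, which must hold RAMBLOCK_DESC_WIRE_LEN bytes. */
static void ramblock_desc_encode(const struct ramblock_desc *d, uint8_t *buf)
{
    assert(d != NULL && buf != NULL);
    put_be64(buf, d->offset);
    put_be64(buf + 8, d->length);
}

/* Parse RAMBLOCK_DESC_WIRE_LEN bytes from buf into d. */
static void ramblock_desc_decode(const uint8_t *buf, struct ramblock_desc *d)
{
    assert(buf != NULL && d != NULL);
    d->offset = get_be64(buf);
    d->length = get_be64(buf + 8);
}
```

Encoding byte by byte like this gives the same result on little-endian
and big-endian hosts, which is what the document needs to spell out.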

> It's just bytes - raw RDMA
> writes of each 4K page, because the memory must be registered
> before the RDMA write can begin.
> 
> (As discussed, there will be a format for SEND, though - so I'll
> take care of that in my next RFC).
> 
> >Yes, but we also need to report errors detected during migration.
> >Need to document how this is done. We also need to report success.
> Acknowledged - I'll add more verbosity to the different error conditions.
> 
> - Michael R. Hines
