From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Eric Blake <eblake@redhat.com>
Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org,
	owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com,
	gokul@us.ibm.com, pbonzini@redhat.com, chegu_vinod@hp.com,
	knoel@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 resend/cleanup 1/8] rdma: update documentation to reflect new unpin support
Date: Fri, 12 Jul 2013 13:26:05 -0400
Message-ID: <51E03C2D.9020206@linux.vnet.ibm.com>
In-Reply-To: <51E0382E.5030209@redhat.com>

On 07/12/2013 01:09 PM, Eric Blake wrote:
> On 07/12/2013 08:40 AM, mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> As requested, the protocol now includes memory unpinning support.
>> This has been implemented in a non-optimized manner, in such a way
>> that one could devise an LRU or other workload-specific information
>> on top of the basic mechanism to influence the way unpinning happens
>> during runtime.
>>
>> The feature is not yet user-facing, and thus can only be enabled
>> at compile-time.
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
>> ---
>>   docs/rdma.txt |   51 ++++++++++++++++++++++++++++++---------------------
>>   1 file changed, 30 insertions(+), 21 deletions(-)
> I suggest splitting this patch into two; and cc-ing the first of the two
> patches through qemu-trivial (since formatting cleanups can be applied
> now, even while still waiting for a comprehensive review of the
> algorithm in the rest of the series)

My understanding is that the reviews are already complete,
including a very extensive test series with both virt-test and
non-virt-test results from Chegu and me.

Am I mistaken?


>
>> diff --git a/docs/rdma.txt b/docs/rdma.txt
>> index 45a4b1d..45d1c8a 100644
>> --- a/docs/rdma.txt
>> +++ b/docs/rdma.txt
>> @@ -35,7 +35,7 @@ memory tracked during each live migration iteration round cannot keep pace
>>   with the rate of dirty memory produced by the workload.
>>   
>>   RDMA currently comes in two flavors: both Ethernet based (RoCE, or RDMA
>> -over Convered Ethernet) as well as Infiniband-based. This implementation of
>> +over Converged Ethernet) as well as Infiniband-based. This implementation of
> Trivial
>
>>   migration using RDMA is capable of using both technologies because of
>>   the use of the OpenFabrics OFED software stack that abstracts out the
>>   programming model irrespective of the underlying hardware.
>> @@ -188,9 +188,9 @@ header portion and a data portion (but together are transmitted
>>   as a single SEND message).
>>   
>>   Header:
>> -    * Length  (of the data portion, uint32, network byte order)
>> -    * Type    (what command to perform, uint32, network byte order)
>> -    * Repeat  (Number of commands in data portion, same type only)
>> +    * Length               (of the data portion, uint32, network byte order)
>> +    * Type                 (what command to perform, uint32, network byte order)
>> +    * Repeat               (Number of commands in data portion, same type only)
> trivial
>
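
For readers following the format, the control header in this hunk can be
sketched as below. This is a minimal Python illustration, not the QEMU
code. It assumes the three fields are packed in the order listed with no
padding, and that 'Repeat' is also a uint32 in network byte order (the
text does not state its width). The names pack_control_message and
unpack_control_message are hypothetical.

```python
import struct

# Length, Type, Repeat in network byte order. Treating 'Repeat' as a
# uint32 and assuming no padding are assumptions of this sketch.
HEADER_FORMAT = "!III"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAX_REPEAT = 4096  # hard-coded limit stated in docs/rdma.txt


def pack_control_message(cmd_type, data=b"", repeat=1):
    """Build header + data as they would travel in one SEND message."""
    if not 0 <= repeat <= MAX_REPEAT:
        raise ValueError("repeat out of range: %d" % repeat)
    header = struct.pack(HEADER_FORMAT, len(data), cmd_type, repeat)
    return header + data


def unpack_control_message(message):
    """Split a received message into (type, repeat, data)."""
    length, cmd_type, repeat = struct.unpack_from(HEADER_FORMAT, message)
    data = message[HEADER_SIZE:HEADER_SIZE + length]
    if len(data) != length:
        raise ValueError("truncated data portion")
    return cmd_type, repeat, data
```

Keeping 'Length' first lets a receiver read a fixed-size header and
then know exactly how many data bytes follow.
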
>>   
>>   The 'Repeat' field is here to support future multiple page registrations
>>   in a single message without any need to change the protocol itself
>> @@ -202,17 +202,19 @@ The maximum number of repeats is hard-coded to 4096. This is a conservative
>>   limit based on the maximum size of a SEND message along with emperical
>>   observations on the maximum future benefit of simultaneous page registrations.
>>   
>> -The 'type' field has 10 different command values:
>> -    1. Unused
>> -    2. Error              (sent to the source during bad things)
>> -    3. Ready              (control-channel is available)
>> -    4. QEMU File          (for sending non-live device state)
>> -    5. RAM Blocks request (used right after connection setup)
>> -    6. RAM Blocks result  (used right after connection setup)
>> -    7. Compress page      (zap zero page and skip registration)
>> -    8. Register request   (dynamic chunk registration)
>> -    9. Register result    ('rkey' to be used by sender)
>> -    10. Register finished  (registration for current iteration finished)
>> +The 'type' field has 12 different command values:
>> +     1. Unused
>> +     2. Error                      (sent to the source during bad things)
>> +     3. Ready                      (control-channel is available)
>> +     4. QEMU File                  (for sending non-live device state)
>> +     5. RAM Blocks request         (used right after connection setup)
>> +     6. RAM Blocks result          (used right after connection setup)
>> +     7. Compress page              (zap zero page and skip registration)
>> +     8. Register request           (dynamic chunk registration)
>> +     9. Register result            ('rkey' to be used by sender)
>> +    10. Register finished          (registration for current iteration finished)
> reformatting is trivial,
>
>> +    11. Unregister request         (unpin previously registered memory)
>> +    12. Unregister finished        (confirmation that unpin completed)
> addition belongs in the second patch (so that we don't have to wade
> through that much trivial stuff to find the real changes)
>
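
As a quick reference, the twelve command values in the new list can be
written as an enumeration. This is only a sketch: the member names are
hypothetical, and the numeric values simply follow the list positions in
the documentation, which may not match the values used on the wire.

```python
from enum import IntEnum


class ControlType(IntEnum):
    # Values mirror the numbered list in docs/rdma.txt; whether these
    # are the actual on-wire values is an assumption of this sketch.
    UNUSED = 1
    ERROR = 2
    READY = 3
    QEMU_FILE = 4
    RAM_BLOCKS_REQUEST = 5
    RAM_BLOCKS_RESULT = 6
    COMPRESS_PAGE = 7
    REGISTER_REQUEST = 8
    REGISTER_RESULT = 9
    REGISTER_FINISHED = 10
    UNREGISTER_REQUEST = 11   # new: unpin previously registered memory
    UNREGISTER_FINISHED = 12  # new: confirmation that unpin completed


def describe(value):
    """Return a printable name for a command value, known or not."""
    try:
        return ControlType(value).name
    except ValueError:
        return "UNKNOWN({})".format(value)
```
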
>>   
>>   A single control message, as hinted above, can contain within the data
>>   portion an array of many commands of the same type. If there is more than
>> @@ -243,7 +245,7 @@ qemu_rdma_exchange_send(header, data, optional response header & data):
>>      from the receiver to tell us that the receiver
>>      is *ready* for us to transmit some new bytes.
>>   2. Optionally: if we are expecting a response from the command
>> -   (that we have no yet transmitted), let's post an RQ
>> +   (that we have not yet transmitted), let's post an RQ
> trivial
>
>>      work request to receive that data a few moments later.
>>   3. When the READY arrives, librdmacm will
>>      unblock us and we immediately post a RQ work request
>> @@ -293,8 +295,10 @@ librdmacm provides the user with a 'private data' area to be exchanged
>>   at connection-setup time before any infiniband traffic is generated.
>>   
>>   Header:
>> -    * Version (protocol version validated before send/recv occurs), uint32, network byte order
>> -    * Flags   (bitwise OR of each capability), uint32, network byte order
>> +    * Version (protocol version validated before send/recv occurs),
>> +                                               uint32, network byte order
>> +    * Flags   (bitwise OR of each capability),
>> +                                               uint32, network byte order
> trivial
>
>>   
>>   There is no data portion of this header right now, so there is
>>   no length field. The maximum size of the 'private data' section
>> @@ -313,7 +317,7 @@ If the version is invalid, we throw an error.
>>   If the version is new, we only negotiate the capabilities that the
>>   requested version is able to perform and ignore the rest.
>>   
>> -Currently there is only *one* capability in Version #1: dynamic page registration
>> +Currently there is only one capability in Version #1: dynamic page registration
> trivial
>
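
The version check and Flags negotiation described in this hunk can be
sketched roughly as follows. This is a hedged illustration rather than
the actual implementation: the bit value chosen for dynamic page
registration, the table of capabilities per version, and the function
names are all hypothetical, and the sketch assumes the destination
answers by clearing any flag bit it does not support.

```python
import struct

# Version and Flags, each a uint32 in network byte order.
CAP_FORMAT = "!II"

# Hypothetical bit assignment for the single Version #1 capability.
CAP_DYNAMIC_REGISTRATION = 0x01

# Capabilities each known protocol version is able to perform
# (assumed table for this sketch).
CAPS_BY_VERSION = {1: CAP_DYNAMIC_REGISTRATION}


def pack_capabilities(version, flags):
    """Build the 'private data' header sent at connection setup."""
    return struct.pack(CAP_FORMAT, version, flags)


def negotiate(private_data, supported_flags):
    """Return the flags the destination agrees to for this connection."""
    version, requested = struct.unpack_from(CAP_FORMAT, private_data)
    if version not in CAPS_BY_VERSION:
        raise ValueError("unsupported protocol version %d" % version)
    # Keep only bits that the requested version defines and that the
    # destination supports; everything else is ignored.
    return requested & CAPS_BY_VERSION[version] & supported_flags
```

For example, a source requesting `CAP_DYNAMIC_REGISTRATION` from a
destination that passes `supported_flags=0` would get back 0 and would
then treat the capability as unavailable.
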
>>   
>>   Finally: Negotiation happens with the Flags field: If the primary-VM
>>   sets a flag, but the destination does not support this capability, it
>> @@ -326,8 +330,8 @@ QEMUFileRDMA Interface:
>>   
>>   QEMUFileRDMA introduces a couple of new functions:
>>   
>> -1. qemu_rdma_get_buffer()  (QEMUFileOps rdma_read_ops)
>> -2. qemu_rdma_put_buffer()  (QEMUFileOps rdma_write_ops)
>> +1. qemu_rdma_get_buffer()               (QEMUFileOps rdma_read_ops)
>> +2. qemu_rdma_put_buffer()               (QEMUFileOps rdma_write_ops)
> trivial
>
>>   
>>   These two functions are very short and simply use the protocol
>>   describe above to deliver bytes without changing the upper-level
>> @@ -413,3 +417,8 @@ TODO:
>>      the use of KSM and ballooning while using RDMA.
>>   4. Also, some form of balloon-device usage tracking would also
>>      help alleviate some issues.
>> +5. Move UNREGISTER requests to a separate thread.
>> +6. Use LRU to provide more fine-grained direction of UNREGISTER
>> +   requests for unpinning memory in an overcommitted environment.
>> +7. Expose UNREGISTER support to the user by way of workload-specific
>> +   hints about application behavior.
>>
> new content
>
