qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org,
	owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com,
	gokul@us.ibm.com, pbonzini@redhat.com, chegu_vinod@hp.com,
	knoel@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 resend 4/8] rdma: core logic
Date: Thu, 18 Jul 2013 08:07:13 -0400	[thread overview]
Message-ID: <51E7DA71.2040400@linux.vnet.ibm.com> (raw)
In-Reply-To: <1374132600.2000.12.camel@localhost.localdomain>

On 07/18/2013 03:30 AM, Marcel Apfelbaum wrote:
> On Tue, 2013-07-16 at 12:48 -0400, mrhines@linux.vnet.ibm.com wrote:
>> From: "Michael R. Hines" <mrhines@us.ibm.com>
>>
>> Code that does need to be visible is kept
>> well contained inside this file and this is the only
>> new additional file to the entire patch.
>>
>> This file includes the entire protocol and interfaces
>> required to perform RDMA migration.
>>
>> Also, the configure and Makefile modifications to link
>> this file are included.
>>
>> Full documentation is in docs/rdma.txt
>>
> This patch is too big (in my opinion).
> I would split it into at least 3 patches:
> 1. Generic RDMA code (this part can be reused by everyone who will need RDMA in the future)
> 2. RDMA transfer protocol (separating this will give us possibility for optimization without touching the rest of the code)
> 3. Migration related code
>

Don't let the "v3" mislead you =). The patch actually *used* to look 
just like what
you described (3 different ones), but after more than a dozen reviews 
since January
I was told to join all the code into a single file by the reviewers.

>
>> + */
>> +#define RDMA_WRID_TYPE_SHIFT  0UL
>> +#define RDMA_WRID_BLOCK_SHIFT 16UL
>> +#define RDMA_WRID_CHUNK_SHIFT 30UL
> If I understand correctly each RDMA write is exactly 1MB.
> I think that it is crucial from the performance point of view to make this configurable.
> Sometimes the time to register the MR(and unregister) on both machines may be the same as transmission of 1MB.
> Bottom line, this cannot be hard-coded because it depends on the connection speed.

Yes, that's correct, but you probably missed reading the whole file 
(it's big, I know). Search for "mergable" in the patch, and you'll see 
the function.

Chunks are only 1MB *contiguous*. That means that if ram_save_block() 
does not gives us contiguous
pages up to 1MB, then we do not send 1MB - we only send what we were 
given. For example: if we were
given 0.5 of contiguous transfers to perform and then the next page 
address is far far away, then we transmit
that "chunk" immediately without worrying about waiting for anything else.


>
>
>> +/*
>> + * RDMA requires memory registration (mlock/pinning), but this is not good for
>> + * overcommitment.
>> + *
>> + * In preparation for the future where LRU information or workload-specific
>> + * writable writable working set memory access behavior is available to QEMU
>> + * it would be nice to have in place the ability to UN-register/UN-pin
>> + * particular memory regions from the RDMA hardware when it is determine that
>> + * those regions of memory will likely not be accessed again in the near future.
>> + *
>> + * While we do not yet have such information right now, the following
>> + * compile-time option allows us to perform a non-optimized version of this
>> + * behavior.
>> + *
>> + * By uncommenting this option, you will cause *all* RDMA transfers to be
>> + * unregistered immediately after the transfer completes on both sides of the
>> + * connection. This has no effect in 'rdma-pin-all' mode, only regular mode.
>> + *
>> + * This will have a terrible impact on migration performance, so until future
>> + * workload information or LRU information is available, do not attempt to use
>> + * this feature except for basic testing.
>> + */
>> +//#define RDMA_UNREGISTRATION_EXAMPLE
>> +
>> +/*
>> + * Perform a non-optimized memory unregistration after every transfer
>> + * for demonsration purposes, only if pin-all is not requested.
>> + *
>> + * Potential optimizations:
>> + * 1. Start a new thread to run this function continuously
>> +        - for bit clearing
>> +        - and for receipt of unregister messages
>> + * 2. Use an LRU.
>> + * 3. Use workload hints.
>> + */
> I think that if we work with chunks large enough, in such way that the time to pass
> the pages between hosts is much bigger the MR registration/unregistration, it may solve
> the issue as least partially (it will not be such an impact).
> After that, optimization is a good idea.
>
>
> Nice work!
> Marcel
>

Agreed. Thank you.

- Michael

  reply	other threads:[~2013-07-18 12:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-16 16:48 [Qemu-devel] [PATCH v3 resend 0/8] rdma: core logic mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 1/8] rdma: update documentation to reflect new unpin support mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 2/8] rdma: bugfix: ram_control_save_page() mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 3/8] rdma: introduce ram_handle_compressed() mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 4/8] rdma: core logic mrhines
2013-07-18  7:30   ` Marcel Apfelbaum
2013-07-18 12:07     ` Michael R. Hines [this message]
2013-07-18 12:22       ` Paolo Bonzini
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 5/8] rdma: send pc.ram mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 6/8] rdma: allow state transitions between other states besides ACTIVE mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 7/8] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition mrhines
2013-07-16 16:48 ` [Qemu-devel] [PATCH v3 resend 8/8] rdma: account for the time spent in MIG_STATE_SETUP through QMP mrhines
  -- strict thread matches above, loose matches on Subject: below --
2013-07-22 14:01 [Qemu-devel] [PATCH v3 resend 0/8] rdma: core logic mrhines
2013-07-22 14:01 ` [Qemu-devel] [PATCH v3 resend 4/8] " mrhines

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E7DA71.2040400@linux.vnet.ibm.com \
    --to=mrhines@linux.vnet.ibm.com \
    --cc=abali@us.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=chegu_vinod@hp.com \
    --cc=gokul@us.ibm.com \
    --cc=knoel@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mrhines@us.ibm.com \
    --cc=owasserm@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).