From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org,
owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com,
gokul@us.ibm.com, chegu_vinod@hp.com, knoel@redhat.com
Subject: Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
Date: Wed, 26 Jun 2013 10:09:07 -0400
Message-ID: <51CAF603.7080809@linux.vnet.ibm.com>
In-Reply-To: <51CAE117.4020300@redhat.com>
On 06/26/2013 08:39 AM, Paolo Bonzini wrote:
> On 26/06/2013 14:37, Michael R. Hines wrote:
>> On 06/26/2013 02:37 AM, Paolo Bonzini wrote:
>>> On 26/06/2013 02:31, Michael R. Hines wrote:
>>>> On 06/25/2013 05:06 PM, Paolo Bonzini wrote:
>>>>> On 25/06/2013 22:56, Michael R. Hines wrote:
>>>>>> I was wrong - this does require a protocol extension.
>>>>>>
>>>>>> This is because the RDMA transfers are asynchronous, and thus
>>>>>> we cannot know in advance that it is safe to unregister the memory
>>>>>> associated with each individual transfer before the transfer actually
>>>>>> completes.
>>>>>>
>>>>>> While the destination currently uses the protocol to participate in
>>>>>> *registering* the page, the destination does not participate in the
>>>>>> RDMA transfers themselves, only the source does, and thus would
>>>>>> require a new exchange of messages to block and instruct the
>>>>>> destination to unpin the memory.
>>>>> Yes, that's what I recalled too (really what mst told me :)). Does it
>>>>> need to be blocking though? As long as the pinning is blocking, and
>>>>> messages are processed in order, the source can proceed immediately
>>>>> after sending an unpin message. This assumes of course that the chunk
>>>>> is not being transmitted, and I am not sure how easy the source can
>>>>> determine that.
>>>> No, they're not processed in order. In fact, not only does the device
>>>> write out of order, but also the PCI bus writes out of order.
>>>> This was such a problem in fact, that I fixed several bugs as a result
>>>> a few weeks ago (v7 of the patch with an in-depth description).
>>>>
>>>> The destination simply cannot assume whatsoever what the ordering
>>>> of the writes are - that's really the whole point of using RDMA in the
>>>> first place so that the software can get out of the way of the transfer
>>>> process to lower the latency of each transfer.
>>> The memory is processed out of order, but what about the messages?
>>> Those must be in order.
>>>
>>> Note that I wrote above "This assumes of course that the chunk is not
>>> being transmitted". Can the source know when an asynchronous transfer
>>> finished, and delay the unpinning until that time?
>> Yes, the source does know. There's no problem unpinning on the source.
>>
>> But both sides must do the unpinning, not just the source.
>>
>> Did I misunderstand you? Are you suggesting *only* unpinning on the source?
> I'm suggesting (if possible) that the source only tells the destination
> to unpin once it knows it is safe for the destination to do it. As long
> as unpin and pin messages are not reordered, it should be possible to do
> it with an asynchronous message from the source to the destination.
>
> Paolo
>
Oh, certainly. I agree. That's not the trivial patch we were
originally shooting for, though.
(I'll list the steps below on the QEMU wiki, for the record).
*This requires some steps:*
1. First, maintain a new data structure: something like
"These memory ranges are 'being unpinned'" - block all potential writes
to these addresses until the unpinning completes.
2. Once the source unpin completes, send the asynchronous control
channel message to the other side for unpinning.
3. Mark the data structure, return, and allow the migration to continue
with the next RDMA write.
4. Upon completion of the unpinning on the destination,
respond to the source that it has finished.
5. Source then clears the data structure for the successfully unpinned
memory ranges.
6. At this point, one or more writes may (or may not) be blocking on the
unpinned memory areas; they will poll the data structure and find that
the unpinning has completed.
7. Then issue the new writes and proceed as normal.
8. Repeat step 1.
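
A rough sketch of the source-side bookkeeping the steps above describe,
written in Python rather than QEMU's C purely for brevity. Every name here
(UnpinTracker, begin_unpin, can_write, on_unpin_ack, destination_process)
is hypothetical, and the deque only stands in for the control channel;
none of it comes from the patch series, and it assumes control messages
are delivered in order.

```python
from collections import deque


class UnpinTracker:
    """Hypothetical source-side record of ranges that are 'being unpinned'."""

    def __init__(self):
        self.pending = set()            # chunk ids whose unpin is in flight
        self.control_channel = deque()  # stand-in for the async control channel

    def begin_unpin(self, chunk):
        # The source has already unpinned locally: mark the chunk, queue the
        # asynchronous message, and return without waiting for a reply.
        self.pending.add(chunk)
        self.control_channel.append(("UNPIN", chunk))

    def can_write(self, chunk):
        # A new RDMA write polls here and must wait while the unpin is in flight.
        return chunk not in self.pending

    def on_unpin_ack(self, chunk):
        # The destination reported completion: clear the entry.
        self.pending.discard(chunk)


def destination_process(tracker, dest_pinned):
    # The destination handles control messages in order, unpins, and acks.
    while tracker.control_channel:
        kind, chunk = tracker.control_channel.popleft()
        if kind == "UNPIN":
            dest_pinned.discard(chunk)
            tracker.on_unpin_ack(chunk)


tracker = UnpinTracker()
dest_pinned = {7, 8}                    # chunks pinned on the destination

tracker.begin_unpin(7)                  # source returns immediately
blocked_before_ack = not tracker.can_write(7)
unrelated_ok = tracker.can_write(8)     # other chunks are unaffected

destination_process(tracker, dest_pinned)
writable_after_ack = tracker.can_write(7)
```

The point of the sketch is that only writes touching a chunk in the
pending set wait; the migration keeps issuing writes to every other
chunk while the unpin message and its acknowledgement are in flight.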