From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51CA3646.8040409@linux.vnet.ibm.com>
Date: Tue, 25 Jun 2013 20:31:02 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
References: <1372125485-11795-1-git-send-email-mrhines@linux.vnet.ibm.com> <1372125485-11795-15-git-send-email-mrhines@linux.vnet.ibm.com> <8761x21pvx.fsf@elfo.elfo> <51C96D39.2020603@redhat.com> <51C99EC6.2050008@linux.vnet.ibm.com> <51C9A0D8.6050800@redhat.com> <51C9AF15.8000404@linux.vnet.ibm.com> <51C9AF56.9030800@redhat.com> <51CA03FF.1000806@linux.vnet.ibm.com> <51CA0640.1040507@redhat.com>
In-Reply-To: <51CA0640.1040507@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v11 14/15] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition
To: Paolo Bonzini
Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, chegu_vinod@hp.com, knoel@redhat.com

On 06/25/2013 05:06 PM, Paolo Bonzini wrote:
> On 25/06/2013 22:56, Michael R. Hines wrote:
>> I was wrong - this does require a protocol extension.
>>
>> This is because the RDMA transfers are asynchronous, and thus
>> we cannot know in advance that it is safe to unregister the memory
>> associated with each individual transfer before the transfer actually
>> completes.
>>
>> While the destination currently uses the protocol to participate in
>> *registering* the page, the destination does not participate in the
>> RDMA transfers themselves, only the source does, and thus would
>> require a new exchange of messages to block and instruct the
>> destination to unpin the memory.
> Yes, that's what I recalled too (really what mst told me :)). Does it
> need to be blocking though? As long as the pinning is blocking, and
> messages are processed in order, the source can proceed immediately
> after sending an unpin message.
> This assumes of course that the chunk
> is not being transmitted, and I am not sure how easy the source can
> determine that.
>
> Paolo
>

No, they're not processed in order. In fact, not only does the device
write out of order, but the PCI bus also writes out of order. This was
enough of a problem that I fixed several bugs as a result a few weeks
ago (see v7 of the patch, which has an in-depth description).

The destination simply cannot assume anything about the ordering of the
writes. That's really the whole point of using RDMA in the first place:
the software gets out of the way of the transfer process to lower the
latency of each transfer.

The only option is to send a blocking message to the other side to
request the unpinning (in addition to unpinning on the source first,
upon completion of the original transfer).

As you can expect, this would be very expensive, and we must ensure that
we have *very* good a priori information that this memory will not need
to be re-registered anytime in the near future.

- Michael
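
To make the sequence described above concrete, here is a minimal sketch in plain C. It is a simulation only, not code from the patch series: the Chunk struct and every function name below are hypothetical, and the "blocking message" is modeled as an ordinary function call that returns once the destination has acted. It only illustrates the ordering constraint: wait for the chunk's writes to complete, unpin on the source, then block on the destination's unpin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-chunk bookkeeping; not taken from the RDMA patches. */
typedef struct {
    int  writes_in_flight;   /* RDMA writes posted but not yet completed */
    bool pinned_on_source;
    bool pinned_on_dest;
} Chunk;

/* Source posts an asynchronous write touching this chunk. */
static void post_write(Chunk *c)
{
    c->writes_in_flight++;
}

/* Source observes a completion for one earlier write. */
static void complete_write(Chunk *c)
{
    assert(c->writes_in_flight > 0);
    c->writes_in_flight--;
}

/* Stand-in for the destination handling an unpin request. */
static void dest_handle_unpin(Chunk *c)
{
    c->pinned_on_dest = false;
}

/*
 * Stand-in for a blocking control message: it returns only after the
 * destination has unpinned, which plays the role of the acknowledgement.
 */
static bool send_unpin_and_wait_for_ack(Chunk *c)
{
    dest_handle_unpin(c);
    return true;
}

/*
 * Returns true if the chunk was unpinned on both sides, or false if it
 * still has writes in flight and therefore must stay pinned.
 */
static bool try_unpin_chunk(Chunk *c)
{
    if (c->writes_in_flight > 0) {
        return false;                       /* transfer not complete yet */
    }
    c->pinned_on_source = false;            /* 1. unpin locally first */
    return send_unpin_and_wait_for_ack(c);  /* 2. then block on the peer */
}
```

In this sketch the source is the only side that tracks completions, which mirrors the point that the destination does not participate in the transfers and so cannot decide on its own when unpinning is safe. The real cost of the blocking round trip is not modeled here.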