From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:56564)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQJNt-00018A-Iy
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 11:26:00 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQJHR-0000DZ-0G
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 11:19:25 -0400
Received: from e7.ny.us.ibm.com ([32.97.182.137]:37429)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQJHQ-0000DS-Qh
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 11:19:16 -0400
Received: from /spool/local
	by e7.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
	Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <mrhines@linux.vnet.ibm.com>;
	Thu, 11 Apr 2013 11:19:15 -0400
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])
	by d01dlp01.pok.ibm.com (Postfix) with ESMTP id E875338C8091
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 11:19:10 -0400 (EDT)
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	r3BFJAmt341680
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 11:19:10 -0400
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1])
	by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id r3BFJ6K0009317
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 09:19:07 -0600
Message-ID: <5166D460.2070106@linux.vnet.ibm.com>
Date: Thu, 11 Apr 2013 11:18:56 -0400
From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <20130410052714.GB12777@redhat.com>
	<5165636C.1090908@linux.vnet.ibm.com>
	<20130410133448.GA18128@redhat.com>
	<51658554.2000909@linux.vnet.ibm.com>
	<20130410174107.GB32247@redhat.com>
	<5165C60E.20006@linux.vnet.ibm.com>
	<20130411071927.GA17063@redhat.com>
	<5166B6B1.2030003@linux.vnet.ibm.com>
	<20130411134820.GA24942@redhat.com>
	<5166C19A.1040402@linux.vnet.ibm.com>
	<20130411143718.GC24942@redhat.com>
In-Reply-To: <20130411143718.GC24942@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive
 protocol documentation
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

First of all, this whole argument should not even exist, for the
following reason:

Page registrations are supposed to be *rare* - once a page is registered, it
is registered for life. There is nothing in the design that says a page must
be "unregistered" and I do not believe anybody is proposing that.

Second, this means that my previous analysis showing reduced performance
was also incorrect: most of the RDMA transfers in that test were against
pages sent during the bulk phase round, when pages are still being
registered for the first time, which unfairly makes dynamic page
registration look bad.
I should have done more testing *after* the bulk phase round,
and I apologize for not doing that.

Indeed, when I run such a test (with the 'stress' command), the cost of
page registration disappears, because most of the registrations
completed long before.
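
To illustrate the point, here is a toy model (not QEMU code; the function
name, the page counts and the dirty fraction are all made up for this
sketch). It assumes the bulk round sends every page once and that a page,
once registered, is never unregistered:

```python
import random

def registrations_per_round(num_pages=1000, rounds=5, dirty_fraction=0.2, seed=0):
    """Count *new* page registrations in each migration round.

    Hypothetical model: round 0 is the bulk phase and sends every page;
    later rounds resend a random subset of dirtied pages.
    """
    rng = random.Random(seed)
    registered = set()   # pages registered so far; never unregistered
    counts = []
    for rnd in range(rounds):
        if rnd == 0:
            to_send = range(num_pages)                    # bulk phase
        else:
            k = int(num_pages * dirty_fraction)
            to_send = rng.sample(range(num_pages), k)     # dirtied pages
        new = 0
        for page in to_send:
            if page not in registered:
                registered.add(page)                      # pay the cost once
                new += 1
        counts.append(new)
    return counts

print(registrations_per_round())   # [1000, 0, 0, 0, 0]
```

In this model all of the registration cost lands in the bulk round, so a
benchmark dominated by that round overstates the cost of dynamic
registration.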

Thanks, Paolo, for reminding us about the bulk-phase behavior to begin with.

Third, this means that optimizing this protocol would not be helpful and
that we should follow the "keep it simple" approach, because during the
steady-state phase of the migration most of the pages should already
have been registered.

- Michael


On 04/11/2013 10:37 AM, Michael S. Tsirkin wrote:
> Answer above.
>
> Here's how things are supposed to work in a pipeline:
>
> req -> registration request
> res -> response
> done -> rdma done notification (remote can unregister)
> pgX  -> page, or chunk, or whatever unit is used
>          for registration
> rdma -> one or more rdma write requests
>
>
>
> pg1 ->  pin -> req -> res -> rdma -> done
>          pg2 ->  pin -> req -> res -> rdma -> done
>                  pg3 -> pin -> req -> res -> rdma -> done
>                         pg4 -> pin -> req -> res -> rdma -> done
>                                pg5 -> pin -> req -> res -> rdma -> done
>
>
>
> It's like an assembly line, see?  So while software does the registration
> roundtrip dance, hardware is processing rdma requests for previous
> chunks.
>
> ....
>
> When do you have to stall? When you run out of rx buffer credits, so you
> cannot start a new req.  Your protocol has 2 outstanding buffers,
> so you can only have one req in the air. Do more and
> you will not need to stall - possibly at all.
>
> One other minor point is that your protocol requires extra explicit
> ready commands. You can pass the number of rx buffers as extra payload
> in the traffic you are sending anyway, and reduce that overhead.
>
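
For reference, the pipelining argument quoted above can be sketched as a
small timing model. This is only a rough illustration under assumptions
that are not in the thread: issuing a request costs no time, every
registration round trip takes a fixed `rtt`, the hardware performs one
RDMA write at a time in `xfer` time units, and `max_inflight` stands in
for the number of requests the rx buffer credits allow in the air. All
names are hypothetical.

```python
def pipeline_time(chunks, rtt, xfer, max_inflight):
    """Total time to register and RDMA-write `chunks` chunks.

    Toy model: at most `max_inflight` registration requests may be
    outstanding; a chunk's write starts once its response has arrived
    and the hardware has finished the previous write.
    """
    resp_times = []   # arrival time of each registration response
    hw_free = 0       # time at which the RDMA hardware is next idle
    for i in range(chunks):
        # A new request needs a free slot, i.e. the response for
        # request (i - max_inflight) must already have arrived.
        start = 0 if i < max_inflight else resp_times[i - max_inflight]
        resp = start + rtt
        resp_times.append(resp)
        # The write waits for both the response and the hardware.
        hw_free = max(hw_free, resp) + xfer
    return hw_free

for n in (1, 2, 4):
    print(n, pipeline_time(chunks=8, rtt=4, xfer=1, max_inflight=n))
# prints: 1 33, then 2 18, then 4 12
```

With these made-up numbers, one request in the air takes 33 time units,
while four in the air takes 12, which is one round trip plus the eight
writes back to back: the hardware never waits on software after the
first response.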