From: "Michael R. Hines"
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
 abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
Date: Thu, 11 Apr 2013 09:12:17 -0400
Message-ID: <5166B6B1.2030003@linux.vnet.ibm.com>
In-Reply-To: <20130411071927.GA17063@redhat.com>
References: <1365476681-31593-1-git-send-email-mrhines@linux.vnet.ibm.com>
 <1365476681-31593-4-git-send-email-mrhines@linux.vnet.ibm.com>
 <20130410052714.GB12777@redhat.com>
 <5165636C.1090908@linux.vnet.ibm.com>
 <20130410133448.GA18128@redhat.com>
 <51658554.2000909@linux.vnet.ibm.com>
 <20130410174107.GB32247@redhat.com>
 <5165C60E.20006@linux.vnet.ibm.com>
 <20130411071927.GA17063@redhat.com>

On 04/11/2013 03:19 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 10, 2013 at 04:05:34PM -0400, Michael R. Hines wrote:
> Maybe we should just say "RDMA is incompatible with memory overcommit"
> and be done with it then. But see below.
>
>> I would like to propose a compromise:
>>
>> How about we *keep* the registration capability and leave it enabled
>> by default?
>>
>> This gives management tools the ability to get performance if they
>> want to, but it also satisfies your requirements in case management
>> doesn't know the feature exists - they will just get the default,
>> enabled.
>
> Well, unfortunately the "overcommit" feature as implemented seems
> useless, really. Someone wants to migrate with RDMA but with low
> performance? Why not migrate with TCP then?

Answer below.

>> Either way, I agree that the optimization would be very useful,
>> but I disagree that it is possible for an optimized registration
>> algorithm to perform *as well as* the case when there is no dynamic
>> registration at all.
>>
>> The point is that dynamic registration *only* helps overcommitment.
>>
>> It does nothing for performance - and since that's true, any
>> optimization that improves on dynamic registration will always be
>> sub-optimal compared to turning off dynamic registration in the
>> first place.
>>
>> - Michael
>
> So you've given up on it. Question is, sub-optimal by how much? And
> where's the bottleneck?
>
> Let's do some math. Assume you send a 16-byte registration request and
> get back a 16-byte response for each 4-Kbyte page (is 16 bytes
> enough?). That's 32/4096 < 1% transport overhead. Negligible.
>
> Is it the source CPU then? But the CPU on the source is basically
> doing the same things as with pre-registration: you do not pin all
> memory on the source.
>
> So it must be the destination CPU that does not keep up then?
> But it has to do even less than the source CPU.
>
> I suggest one explanation: the protocol you proposed is inefficient.
> It seems to basically do everything in a single thread:
> get a chunk, pin, wait for control credit, request, response, rdma,
> unpin. There are two round-trips of send/receive here where you are
> not doing anything useful. Why not let migration proceed?
>
> Doesn't all of this sound worth checking before we give up?

First, let me remind you: chunks are already doing this! Perhaps you
don't fully understand how chunks work, or perhaps I should be more
verbose in the documentation.

The protocol is already joining multiple pages into a single chunk
without issuing any writes. Only when the chunk is full does an actual
page registration request occur.

So, basically, what you want to know is what happens if we *change*
the chunk size dynamically? Something like this:

1. Chunk = 1MB, what is the performance?
2. Chunk = 2MB, what is the performance?
3. Chunk = 4MB, what is the performance?
4. Chunk = 8MB, what is the performance?
5. Chunk = 16MB, what is the performance?
6. Chunk = 32MB, what is the performance?
7. Chunk = 64MB, what is the performance?
8. Chunk = 128MB, what is the performance?

I'll get you this table today. Expect an email soon.

- Michael
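
P.S. In case the per-chunk flow is unclear from the documentation, here
is a rough sketch - this is NOT the code from the patch series, and the
helper names (wait_for_control_credit() and friends) are stand-ins for
whatever the real control-channel primitives end up being. It only shows
where the sender blocks today: filling a chunk costs nothing, but
flushing one sits through two control round-trips before the RDMA write.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE        4096
#define CHUNK_SIZE       (1024 * 1024)            /* e.g. 1MB chunks */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE) /* 256 pages */

struct chunk {
    uint8_t *host_addr;   /* start of the chunk in guest RAM */
    int      nb_pages;    /* dirty pages accumulated so far */
    uint32_t rkey;        /* remote key, valid only after registration */
};

/* Placeholder primitives - assumptions for the sketch, not the patch. */
static void     wait_for_control_credit(void) { }
static void     send_register_request(uint8_t *a, size_t n) { (void)a; (void)n; }
static uint32_t recv_register_response(void) { return 0; }
static void     rdma_write_chunk(uint8_t *a, size_t n, uint32_t k) { (void)a; (void)n; (void)k; }
static void     unpin_chunk(struct chunk *c) { (void)c; }

/* Filling a chunk is free: dirty pages are appended with no network
 * traffic at all. Returns true once the chunk is full. */
static bool chunk_add_page(struct chunk *c, uint8_t *page)
{
    if (c->nb_pages == 0) {
        c->host_addr = page;
    }
    c->nb_pages++;
    return c->nb_pages == PAGES_PER_CHUNK;
}

/* Only a *full* chunk costs anything - and today it costs two blocking
 * round-trips on the control channel before the one useful RDMA write. */
static void chunk_flush(struct chunk *c)
{
    wait_for_control_credit();                   /* blocking round-trip 1 */
    send_register_request(c->host_addr, CHUNK_SIZE);
    c->rkey = recv_register_response();          /* blocking round-trip 2 */

    rdma_write_chunk(c->host_addr, CHUNK_SIZE, c->rkey); /* useful work */

    unpin_chunk(c);                              /* unpin for overcommit */
    c->nb_pages = 0;
}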
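
P.P.S. For reading the table when it arrives: the only thing the sweep
really varies is how many registration round-trips each gigabyte of RAM
costs. The arithmetic, as a trivial standalone program:

#include <stdio.h>

/* Registration round-trips per GB of transferred RAM for each chunk
 * size in the sweep above (one request/response pair per chunk). */
int main(void)
{
    const long gb = 1024L * 1024 * 1024;

    for (long chunk_mb = 1; chunk_mb <= 128; chunk_mb *= 2) {
        long chunk_bytes = chunk_mb * 1024 * 1024;
        printf("chunk = %3ld MB -> %4ld registrations per GB\n",
               chunk_mb, gb / chunk_bytes);
    }
    return 0;
}

So 1MB chunks mean 1024 round-trips per GB and 128MB chunks mean 8; the
question the table answers is where that curve stops mattering.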