Date: Thu, 11 Apr 2013 16:48:20 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130411134820.GA24942@redhat.com>
In-Reply-To: <5166B6B1.2030003@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael R. Hines"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
    abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

On Thu, Apr 11, 2013 at 09:12:17AM -0400, Michael R. Hines wrote:
> On 04/11/2013 03:19 AM, Michael S. Tsirkin wrote:
> >On Wed, Apr 10, 2013 at 04:05:34PM -0400, Michael R. Hines wrote:
> >Maybe we should just say "RDMA is incompatible with memory
> >overcommit" and be done with it then. But see below.
> >>I would like to propose a compromise:
> >>
> >>How about we *keep* the registration capability and leave it enabled
> >>by default?
> >>
> >>This gives management tools the ability to get performance if they want
> >>to, but also satisfies your requirements in case management doesn't know
> >>the feature exists - they will just get the default, enabled.
> >Well, unfortunately the "overcommit" feature as implemented seems useless,
> >really. Someone wants to migrate with RDMA but with low performance?
> >Why not migrate with TCP then?
> 
> Answer below.
> 
> >>Either way, I agree that the optimization would be very useful,
> >>but I disagree that it is possible for an optimized registration
> >>algorithm to perform *as well as* the case when there is no dynamic
> >>registration at all.
> >>
> >>The point is that dynamic registration *only* helps overcommitment.
> >>
> >>It does nothing for performance - and since that's true, any
> >>optimizations that improve on dynamic registration will always be
> >>sub-optimal compared to turning off dynamic registration in the
> >>first place.
> >>
> >>- Michael
> >So you've given up on it. Question is, sub-optimal by how much? And
> >where's the bottleneck?
> >
> >Let's do some math. Assume you send a 16-byte registration request and
> >get back a 16-byte response for each 4-Kbyte page (16 bytes enough?).
> >That's 32/4096 < 1% transport overhead. Negligible.
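To make the arithmetic above concrete, here is a minimal standalone
check. The 16-byte request and response sizes are the assumption stated
in the estimate above, not a measured wire format:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed control traffic per page: a 16-byte registration
         * request plus a 16-byte response, for each 4-KiB page. */
        const double msg_bytes  = 16.0 + 16.0;
        const double page_bytes = 4096.0;

        /* Control messages as a fraction of migrated data. */
        printf("overhead = %.2f%%\n", 100.0 * msg_bytes / page_bytes);
        return 0; /* prints: overhead = 0.78% */
    }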
> >Is it the source CPU then? But the CPU on the source is basically doing
> >the same things as with pre-registration: you do not pin all memory on
> >the source.
> >
> >So it must be the destination CPU that does not keep up then?
> >But it has to do even less than the source CPU.
> >
> >I suggest one explanation: the protocol you proposed is inefficient.
> >It seems to basically do everything in a single thread:
> >get a chunk, pin, wait for a control credit, request, response, RDMA,
> >unpin. There are two round-trips of send/receive here where you are not
> >doing anything useful. Why not let migration proceed?
> >
> >Doesn't all of this sound worth checking before we give up?
> 
> First, let me remind you:
> 
> Chunks are already doing this!
> 
> Perhaps you don't fully understand how chunks work, or perhaps I should
> be more verbose in the documentation. The protocol is already joining
> multiple pages into a single chunk without issuing any writes. It is
> only when the chunk is full that an actual page registration request
> occurs.

I think I got that at a high level. But there is a stall between
chunks. If you make chunks smaller, but pipeline registration, then
there will never be any stall.

> So, basically what you want to know is what happens if we *change* the
> chunk size dynamically?

What I wanted to know is where the performance is going. Why is
chunk-based slower? It's not the extra messages on the wire; those take
up negligible BW.

> Something like this:
> 
> 1. Chunk = 1MB, what is the performance?
> 2. Chunk = 2MB, what is the performance?
> 3. Chunk = 4MB, what is the performance?
> 4. Chunk = 8MB, what is the performance?
> 5. Chunk = 16MB, what is the performance?
> 6. Chunk = 32MB, what is the performance?
> 7. Chunk = 64MB, what is the performance?
> 8. Chunk = 128MB, what is the performance?
> 
> I'll get you this table today. Expect an email soon.
> 
> - Michael
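For reference, here is a minimal sketch of the pipelining suggested
above: overlap the registration round-trip for chunk i+1 with the RDMA
write of chunk i, so the sender never stalls between chunks. Every type
and helper below is a hypothetical placeholder, not QEMU's actual RDMA
migration code or a libibverbs API.

    struct chunk;   /* opaque chunk descriptor (placeholder) */

    /* Illustrative placeholders only -- not real QEMU or verbs calls. */
    void pin_chunk(struct chunk *c);
    void post_register_request(struct chunk *c);   /* async send      */
    void wait_register_response(struct chunk *c);  /* rkey round-trip */
    void rdma_write_chunk(struct chunk *c);
    void wait_write_completion(struct chunk *c);
    void unpin_chunk(struct chunk *c);

    void migrate_chunks_pipelined(struct chunk *chunks, int nchunks)
    {
        if (nchunks == 0) {
            return;
        }

        pin_chunk(&chunks[0]);
        post_register_request(&chunks[0]);

        for (int i = 0; i < nchunks; i++) {
            if (i + 1 < nchunks) {
                /* Start the next chunk's registration round-trip
                 * before blocking on the current one, so the wire
                 * stays busy instead of stalling between chunks. */
                pin_chunk(&chunks[i + 1]);
                post_register_request(&chunks[i + 1]);
            }

            wait_register_response(&chunks[i]);
            rdma_write_chunk(&chunks[i]);
            wait_write_completion(&chunks[i]);
            unpin_chunk(&chunks[i]);
        }
    }

With this overlap, the two send/receive round-trips per chunk happen
while the previous chunk's data is on the wire, which is the "let
migration proceed" behavior argued for above.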