Date: Thu, 11 Apr 2013 16:48:20 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130411134820.GA24942@redhat.com>
In-Reply-To: <5166B6B1.2030003@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael R. Hines"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
    abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

On Thu, Apr 11, 2013 at 09:12:17AM -0400, Michael R. Hines wrote:
> On 04/11/2013 03:19 AM, Michael S. Tsirkin wrote:
> >On Wed, Apr 10, 2013 at 04:05:34PM -0400, Michael R. Hines wrote:
> >Maybe we should just say "RDMA is incompatible with memory
> >overcommit" and be done with it then. But see below.
> >>I would like to propose a compromise:
> >>
> >>How about we *keep* the registration capability and leave it enabled
> >>by default?
> >>
> >>This gives management tools the ability to get performance if they want
> >>to, but also satisfies your requirements in case management doesn't know
> >>the feature exists - they will just get the default, enabled.
> >Well, unfortunately the "overcommit" feature as implemented seems useless,
> >really. Someone wants to migrate with RDMA but with low performance?
> >Why not migrate with TCP then?
> 
> Answer below.
> 
> >>Either way, I agree that the optimization would be very useful,
> >>but I disagree that it is possible for an optimized registration
> >>algorithm to perform *as well as* the case when there is no dynamic
> >>registration at all.
> >>
> >>The point is that dynamic registration *only* helps overcommitment.
> >>
> >>It does nothing for performance - and since that's true, any
> >>optimizations that improve on dynamic registration will always be
> >>sub-optimal compared to turning off dynamic registration in the
> >>first place.
> >>
> >>- Michael
> >So you've given up on it. Question is, sub-optimal by how much? And
> >where's the bottleneck?
> >
> >Let's do some math. Assume you send a 16-byte registration request and
> >get back a 16-byte response for each 4-Kbyte page (16 bytes enough?).
> >That's 32/4096 < 1% transport overhead. Negligible.
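To make the arithmetic above concrete, here is a minimal standalone
check. The 16-byte request and response sizes are the assumption stated
in the estimate above, not a measured wire format:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed control traffic per page: a 16-byte registration
         * request plus a 16-byte response, for each 4-KiB page. */
        const double msg_bytes  = 16.0 + 16.0;
        const double page_bytes = 4096.0;

        /* Control messages as a fraction of migrated data. */
        printf("overhead = %.2f%%\n", 100.0 * msg_bytes / page_bytes);
        return 0; /* prints: overhead = 0.78% */
    }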
> >Is it the source CPU then? But the CPU on the source is basically doing
> >the same things as with pre-registration: you do not pin all memory on
> >the source.
> >
> >So it must be the destination CPU that does not keep up then?
> >But it has to do even less than the source CPU.
> >
> >I suggest one explanation: the protocol you proposed is inefficient.
> >It seems to basically do everything in a single thread:
> >get a chunk, pin, wait for a control credit, request, response, RDMA,
> >unpin. There are two round-trips of send/receive here where you are not
> >doing anything useful. Why not let migration proceed?
> >
> >Doesn't all of this sound worth checking before we give up?
> 
> First, let me remind you:
> 
> Chunks are already doing this!
> 
> Perhaps you don't fully understand how chunks work, or perhaps I should
> be more verbose in the documentation. The protocol is already joining
> multiple pages into a single chunk without issuing any writes. It is
> only when the chunk is full that an actual page registration request
> occurs.

I think I got that at a high level. But there is a stall between
chunks. If you make chunks smaller, but pipeline registration, then
there will never be any stall.

> So, basically what you want to know is what happens if we *change* the
> chunk size dynamically?

What I wanted to know is where the performance is going. Why is
chunk-based slower? It's not the extra messages on the wire; those take
up negligible BW.

> Something like this:
> 
> 1. Chunk = 1MB, what is the performance?
> 2. Chunk = 2MB, what is the performance?
> 3. Chunk = 4MB, what is the performance?
> 4. Chunk = 8MB, what is the performance?
> 5. Chunk = 16MB, what is the performance?
> 6. Chunk = 32MB, what is the performance?
> 7. Chunk = 64MB, what is the performance?
> 8. Chunk = 128MB, what is the performance?
> 
> I'll get you this table today. Expect an email soon.
> 
> - Michael
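For reference, here is a minimal sketch of the pipelining suggested
above: overlap the registration round-trip for chunk i+1 with the RDMA
write of chunk i, so the sender never stalls between chunks. Every type
and helper below is a hypothetical placeholder, not QEMU's actual RDMA
migration code or a libibverbs API.

    struct chunk;   /* opaque chunk descriptor (placeholder) */

    /* Illustrative placeholders only -- not real QEMU or verbs calls. */
    void pin_chunk(struct chunk *c);
    void post_register_request(struct chunk *c);   /* async send      */
    void wait_register_response(struct chunk *c);  /* rkey round-trip */
    void rdma_write_chunk(struct chunk *c);
    void wait_write_completion(struct chunk *c);
    void unpin_chunk(struct chunk *c);

    void migrate_chunks_pipelined(struct chunk *chunks, int nchunks)
    {
        if (nchunks == 0) {
            return;
        }

        pin_chunk(&chunks[0]);
        post_register_request(&chunks[0]);

        for (int i = 0; i < nchunks; i++) {
            if (i + 1 < nchunks) {
                /* Start the next chunk's registration round-trip
                 * before blocking on the current one, so the wire
                 * stays busy instead of stalling between chunks. */
                pin_chunk(&chunks[i + 1]);
                post_register_request(&chunks[i + 1]);
            }

            wait_register_response(&chunks[i]);
            rdma_write_chunk(&chunks[i]);
            wait_write_completion(&chunks[i]);
            unpin_chunk(&chunks[i]);
        }
    }

With this overlap, the two send/receive round-trips per chunk happen
while the previous chunk's data is on the wire, which is the "let
migration proceed" behavior argued for above.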