Message-ID: <5166F29E.4020209@linux.vnet.ibm.com>
Date: Thu, 11 Apr 2013 13:27:58 -0400
From: "Michael R.
Hines"
MIME-Version: 1.0
References: <20130410174107.GB32247@redhat.com>
 <5165C60E.20006@linux.vnet.ibm.com>
 <20130411071927.GA17063@redhat.com>
 <5166B6B1.2030003@linux.vnet.ibm.com>
 <20130411134820.GA24942@redhat.com>
 <5166C19A.1040402@linux.vnet.ibm.com>
 <20130411143718.GC24942@redhat.com>
 <5166D460.2070106@linux.vnet.ibm.com>
 <20130411154424.GB22779@redhat.com>
 <5166E048.4090008@linux.vnet.ibm.com>
 <20130411170407.GA23301@redhat.com>
In-Reply-To: <20130411170407.GA23301@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
 abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

On 04/11/2013 01:04 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 11, 2013 at 12:09:44PM -0400, Michael R. Hines wrote:
>>
>> Yes, that's correct. The agony is just delayed. The right thing to do
>> in a future patch would be to pin as much as possible in advance
>> before the bulk phase round even begins (using the pagemap).
> IMHO the right thing is to unpin memory after it's sent.

Based on what, exactly? Would you unpin a hot page? Would you unpin a
cold page that becomes hot again later?

I don't see how we can know in advance the behavior of individual pages
and make the decision to unpin them - we probably don't want to know either.

Trying to build a more complex protocol just for something that's
unpredictable (and probably not the common case) doesn't seem like a
good focus for debate.

Overcommit is really only useful when the "overcommitted" memory
is not expected to fluctuate.

Unpinning pages just so they can be overcommitted later means that
it was probably a bad idea to overcommit those pages in the first place....

What you're asking for is very fine-grained overcommitment, which, in my
experience, is not something QEMU can ever practically decide, because it
cannot know how individual pages will behave.

Memory footprints tend to be either very big or very small, and they stay
that way for a very long time until something comes along to change that.

>> In the meantime, chunk registration performance is still very good
>> so long as total migration time is not the metric you are optimizing for.
> You mean it has better downtime than TCP? Or lower host CPU
> overhead? These are the metrics we care about.

Yes, it does indeed have better downtime, because RDMA latencies are much
lower and *most* of the page registrations will have already occurred
after the bulk phase round has passed in the first iteration.

- Michael

>>> If you mean that registering all memory is a requirement,
>>> then I am not sure I agree: you wrote one slow protocol, this
>>> does not mean that there can't be a fast one.
>>>
>>> But if you mean to say that the current chunk based code
>>> is useless, then I'd have to agree.
>> Answer above.
> I don't see it above. What does "keep it simple" mean?
>

By simple, I mean the argument for a simpler protocol that I made above.

- Michael