From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:36116)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQLIX-0005oF-22
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 13:28:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQLIN-0005ip-IW
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 13:28:32 -0400
Received: from e38.co.us.ibm.com ([32.97.110.159]:58768)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mrhines@linux.vnet.ibm.com>) id 1UQLIN-0005ib-Az
	for qemu-devel@nongnu.org; Thu, 11 Apr 2013 13:28:23 -0400
Received: from /spool/local
	by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
	Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <mrhines@linux.vnet.ibm.com>;
	Thu, 11 Apr 2013 11:28:18 -0600
Received: from d03relay01.boulder.ibm.com (d03relay01.boulder.ibm.com
	[9.17.195.226])
	by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id 25E983E4005B
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 11:27:59 -0600 (MDT)
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by d03relay01.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	r3BHS0Ww120302
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 11:28:00 -0600
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1])
	by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id r3BHS084018535
	for <qemu-devel@nongnu.org>; Thu, 11 Apr 2013 11:28:00 -0600
Message-ID: <5166F29E.4020209@linux.vnet.ibm.com>
Date: Thu, 11 Apr 2013 13:27:58 -0400
From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <20130410174107.GB32247@redhat.com>
	<5165C60E.20006@linux.vnet.ibm.com>
	<20130411071927.GA17063@redhat.com>
	<5166B6B1.2030003@linux.vnet.ibm.com>
	<20130411134820.GA24942@redhat.com>
	<5166C19A.1040402@linux.vnet.ibm.com>
	<20130411143718.GC24942@redhat.com>
	<5166D460.2070106@linux.vnet.ibm.com>
	<20130411154424.GB22779@redhat.com>
	<5166E048.4090008@linux.vnet.ibm.com>
	<20130411170407.GA23301@redhat.com>
In-Reply-To: <20130411170407.GA23301@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive
 protocol documentation
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, pbonzini@redhat.com

On 04/11/2013 01:04 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 11, 2013 at 12:09:44PM -0400, Michael R. Hines wrote:
>>
>> Yes, that's correct. The agony is just delayed. The right thing to do
>> in a future patch would be to pin as much as possible in advance
>> before the bulk phase round even begins (using the pagemap).
> IMHO the right thing is to unpin memory after it's sent.

Based on what, exactly? Would you unpin a hot page? Would you
unpin a cold page that becomes hot again later? I don't see how we can
know in advance the behavior of individual pages and make the decision
to unpin them - we probably don't want to know either.

Trying to build a more complex protocol just for something that's
unpredictable (and probably not the common case) doesn't seem like a
good focus for debate.

Overcommit is really only useful when the "overcommitted" memory
is not expected to fluctuate. Unpinning pages just so they can be
overcommitted later means that it was probably a bad idea to
overcommit those pages in the first place.

What you're asking for is very fine-grained overcommitment, which, in my
experience, requires decisions that QEMU can never really have enough
information to make. Memory footprints tend to be either very big or
very small, and they stay that way for a very long time until something
comes along to change that.

>> In the meantime, chunk registration performance is still very good
>> so long as total migration time is not the metric you are optimizing for.
> You mean it has better downtime than TCP? Or lower host CPU
> overhead? These are the metrics we care about.
Yes, it does indeed have better downtime, because RDMA latencies are much
lower and *most* of the page registrations will already have occurred once
the bulk phase round of the first iteration has passed.
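
To make the registration argument concrete, here is a minimal toy model
of on-demand chunk registration. It is only a sketch under stated
assumptions, not the code from the patch series: the 1 MiB chunk size,
the ChunkRegistry class, and the unpin_after_send flag are all
hypothetical names chosen for illustration, and the "unpin" policy
modelled here is deliberately naive (it drops the registration after
every single page send).

```python
PAGE = 4096
CHUNK_SIZE = 1 << 20  # assumed 1 MiB chunk size, for illustration only


class ChunkRegistry:
    """Toy model of on-demand chunk registration (hypothetical, not QEMU code)."""

    def __init__(self, unpin_after_send=False):
        self.registered = set()      # chunk indexes currently pinned/registered
        self.registrations = 0       # how many times we paid the registration cost
        self.unpin_after_send = unpin_after_send

    def send_page(self, addr):
        chunk = addr // CHUNK_SIZE
        if chunk not in self.registered:
            # First touch of this chunk: register (pin) it.
            self.registered.add(chunk)
            self.registrations += 1
        # The RDMA write for the page would happen here.
        if self.unpin_after_send:
            # Naive policy: give the memory back immediately after sending.
            self.registered.discard(chunk)


def migrate(registry, rounds):
    """Return the number of registrations performed in each round."""
    per_round = []
    for pages in rounds:
        before = registry.registrations
        for addr in pages:
            registry.send_page(addr)
        per_round.append(registry.registrations - before)
    return per_round


# 4 MiB of guest RAM (4 chunks); three pages stay hot in later rounds.
all_pages = [i * PAGE for i in range(1024)]
hot_pages = [0, PAGE, CHUNK_SIZE]
rounds = [all_pages, hot_pages, hot_pages]

keep = migrate(ChunkRegistry(), rounds)
unpin = migrate(ChunkRegistry(unpin_after_send=True), rounds)
print("keep pinned:", keep)
print("unpin after send:", unpin)
```

With registrations kept, the model pays for 4 registrations in the bulk
round and none afterwards. With the naive unpin policy, every hot page
has to be registered again in every later round. A smarter unpinning
policy would batch or defer the unpin, so this only illustrates the
worst case, not what any real implementation would have to do.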


- Michael

>>> If you mean that registering all memory is a requirement,
>>> then I am not sure I agree: you wrote one slow protocol, this
>>> does not mean that there can't be a fast one.
>>>
>>> But if you mean to say that the current chunk based code
>>> is useless, then I'd have to agree.
>> Answer above.
> I don't see it above. What does "keep it simple" mean?
>

By simple, I mean the argument for a simpler protocol that I made above.

- Michael