From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <516AFE23.104@linux.vnet.ibm.com>
Date: Sun, 14 Apr 2013 15:06:11 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
References: <20130411191533.GA25515@redhat.com> <51671DFF.80904@linux.vnet.ibm.com> <20130412104802.GA23467@redhat.com> <5167E797.2050103@redhat.com> <20130412112553.GB23467@redhat.com> <51681DAA.3000503@redhat.com> <20130414115911.GA4923@redhat.com> <516ABCCC.207@linux.vnet.ibm.com> <20130414160327.GB7165@redhat.com> <516ADBEA.5090100@linux.vnet.ibm.com> <20130414183041.GC7165@redhat.com>
In-Reply-To: <20130414183041.GC7165@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini

On 04/14/2013 02:30 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 14, 2013 at 12:40:10PM -0400, Michael R. Hines wrote:
>> On 04/14/2013 12:03 PM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 14, 2013 at 10:27:24AM -0400, Michael R. Hines wrote:
>>>> On 04/14/2013 07:59 AM, Michael S. Tsirkin wrote:
>>>>> On Fri, Apr 12, 2013 at 04:43:54PM +0200, Paolo Bonzini wrote:
>>>>>> On 12/04/2013 13:25, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Apr 12, 2013 at 12:53:11PM +0200, Paolo Bonzini wrote:
>>>>>>>> On 12/04/2013 12:48, Michael S. Tsirkin wrote:
>>>>>>>>> 1. You have two protocols already and this does not make sense in
>>>>>>>>> version 1 of the patch.
>>>>>>>> It makes sense if we consider it experimental (add x- in front of
>>>>>>>> transport and capability) and would like people to play with it.
>>>>>>>>
>>>>>>>> Paolo
>>>>>>> But it's not testable yet. I see problems just reading the
>>>>>>> documentation. Author thinks "ulimit -l 10000000000" on both source and
>>>>>>> destination is just fine. This can easily crash host or cause OOM
>>>>>>> killer to kill QEMU.
>>>>>>> So why is there any need for extra testers? Fix
>>>>>>> the major bugs first.
>>>>>>>
>>>>>>> There's a similar issue with device assignment - we can't fix it there,
>>>>>>> and despite being available for years, this was one of two reasons that
>>>>>>> has kept this feature out of hands of lots of users (and assuming guest
>>>>>>> has lots of zero pages won't work: balloon is not widely used either
>>>>>>> since it depends on a well-behaved guest to work correctly).
>>>>>> I agree assuming guest has lots of zero pages won't work, but I think
>>>>>> you are overstating the importance of overcommit. Let's mark the damn
>>>>>> thing as experimental, and stop making perfect the enemy of good.
>>>>>>
>>>>>> Paolo
>>>>> It looks like we have to decide, before merging, whether migration with
>>>>> rdma that breaks overcommit is worth it or not. Since the author made
>>>>> it very clear he does not intend to make it work with overcommit, ever.
>>>>>
>>>> That depends entirely on what you define as overcommit.
>>> You don't get to define your own terms. Look it up in wikipedia or
>>> something.
>>>
>>>> The pages do get unregistered at the end of the migration =)
>>>>
>>>> - Michael
>>> The limitations are pretty clear, and you really should document them:
>>>
>>> 1. run qemu as root, or under ulimit -l on both source and
>>>    destination
>>>
>>> 2. expect that as much as that amount of memory is pinned
>>>    and unavailable to host kernel and applications for
>>>    arbitrarily long time.
>>>    Make sure you have much more RAM in host or QEMU will get killed.
>>>
>>> To me, especially 1 is an unacceptable security tradeoff.
>>> It is entirely fixable but we both have other priorities,
>>> so it'll stay broken.
>>>
>> I've modified the beginning of docs/rdma.txt to say the following:
> It really should say this, in a very prominent place:
>
> BUGS:

Not a bug. We'll have to agree to disagree. Please drop this.

> 1. You must run qemu as root, or under
>    ulimit -l on both source and destination

Good, will update the documentation now.

> 2. Expect as much as that amount of memory to be locked
>    and unavailable to the host kernel and applications for
>    an arbitrarily long time.
>    Make sure you have much more RAM in the host, otherwise QEMU,
>    or some other arbitrary application on the same host, will get killed.

This is implied already. The docs say "If you don't want pinning, then
use TCP". That's enough warning.

> 3. Migration with RDMA support is experimental and unsupported.
>    In particular, please do not expect it to work across qemu versions,
>    and do not expect the management interface to be stable.

The only correct statement here is that it's experimental. I will update
the docs to reflect that.

>> $ cat docs/rdma.txt
>>
>> ... snip ..
>>
>> BEFORE RUNNING:
>> ===============
>>
>> Use of RDMA requires pinning and registering memory with the
>> hardware. If this is not acceptable for your application or
>> product, then the use of RDMA is strongly discouraged and you
>> should revert back to standard TCP-based migration.
> No one knows or should know what "pinning and registering" means.

I will define it in the docs, then.

> For which applications and products is it appropriate?

That's up to the vendor or user to decide, not us.

> Also, you are talking about current QEMU
> code using RDMA for migration but say "RDMA" generally.

Sure, I will fix the docs.

>> Next, decide if you want dynamic page registration on the server-side.
>> For example, if you have an 8GB RAM virtual machine, but only 1GB
>> is in active use, then disabling this feature will cause all 8GB to
>> be pinned and resident in memory. This feature mostly affects the
>> bulk-phase round of the migration and can be disabled for extremely
>> high-performance RDMA hardware using the following command:
>>
>> QEMU Monitor Command:
>> $ migrate_set_capability chunk_register_destination off # enabled by default
>>
>> Performing this action will cause all 8GB to be pinned, so if that's
>> not what you want, then please ignore this step altogether.
> This does not make it clear what is the benefit of disabling this
> capability. I think it's best to avoid options, just use chunk
> based always.
> If it's here "so people can play with it" then please rename
> it to something like "x-unsupported-chunk_register_destination"
> so people know this is unsupported and not to be used for production.

Again, please drop the request for removing chunking. Paolo already told
me to use "x-rdma" - so that's enough for now.

- Michael