From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <516AFE23.104@linux.vnet.ibm.com>
Date: Sun, 14 Apr 2013 15:06:11 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
References: <20130411191533.GA25515@redhat.com> <51671DFF.80904@linux.vnet.ibm.com> <20130412104802.GA23467@redhat.com> <5167E797.2050103@redhat.com> <20130412112553.GB23467@redhat.com> <51681DAA.3000503@redhat.com> <20130414115911.GA4923@redhat.com> <516ABCCC.207@linux.vnet.ibm.com> <20130414160327.GB7165@redhat.com> <516ADBEA.5090100@linux.vnet.ibm.com> <20130414183041.GC7165@redhat.com>
In-Reply-To: <20130414183041.GC7165@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini

On 04/14/2013 02:30 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 14, 2013 at 12:40:10PM -0400, Michael R. Hines wrote:
>> On 04/14/2013 12:03 PM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 14, 2013 at 10:27:24AM -0400, Michael R. Hines wrote:
>>>> On 04/14/2013 07:59 AM, Michael S. Tsirkin wrote:
>>>>> On Fri, Apr 12, 2013 at 04:43:54PM +0200, Paolo Bonzini wrote:
>>>>>> On 12/04/2013 13:25, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Apr 12, 2013 at 12:53:11PM +0200, Paolo Bonzini wrote:
>>>>>>>> On 12/04/2013 12:48, Michael S. Tsirkin wrote:
>>>>>>>>> 1. You have two protocols already and this does not make sense in
>>>>>>>>> version 1 of the patch.
>>>>>>>> It makes sense if we consider it experimental (add x- in front of
>>>>>>>> transport and capability) and would like people to play with it.
>>>>>>>>
>>>>>>>> Paolo
>>>>>>> But it's not testable yet. I see problems just reading the
>>>>>>> documentation. Author thinks "ulimit -l 10000000000" on both source and
>>>>>>> destination is just fine. This can easily crash host or cause OOM
>>>>>>> killer to kill QEMU.
>>>>>>> So why is there any need for extra testers? Fix
>>>>>>> the major bugs first.
>>>>>>>
>>>>>>> There's a similar issue with device assignment - we can't fix it there,
>>>>>>> and despite being available for years, this was one of two reasons that
>>>>>>> has kept this feature out of hands of lots of users (and assuming guest
>>>>>>> has lots of zero pages won't work: balloon is not widely used either
>>>>>>> since it depends on a well-behaved guest to work correctly).
>>>>>> I agree assuming guest has lots of zero pages won't work, but I think
>>>>>> you are overstating the importance of overcommit. Let's mark the damn
>>>>>> thing as experimental, and stop making perfect the enemy of good.
>>>>>>
>>>>>> Paolo
>>>>> It looks like we have to decide, before merging, whether migration with
>>>>> rdma that breaks overcommit is worth it or not. Since the author made
>>>>> it very clear he does not intend to make it work with overcommit, ever.
>>>>>
>>>> That depends entirely on what you define as overcommit.
>>> You don't get to define your own terms. Look it up in wikipedia or
>>> something.
>>>
>>>> The pages do get unregistered at the end of the migration =)
>>>>
>>>> - Michael
>>> The limitations are pretty clear, and you really should document them:
>>>
>>> 1. run qemu as root, or under ulimit -l on both source and
>>>    destination
>>>
>>> 2. expect that as much as that amount of memory is pinned
>>>    and unavailable to host kernel and applications for
>>>    arbitrarily long time.
>>>    Make sure you have much more RAM in host or QEMU will get killed.
>>>
>>> To me, especially 1 is an unacceptable security tradeoff.
>>> It is entirely fixable but we both have other priorities,
>>> so it'll stay broken.
>>>
>> I've modified the beginning of docs/rdma.txt to say the following:
> It really should say this, in a very prominent place:
>
> BUGS:

Not a bug. We'll have to agree to disagree. Please drop this.

> 1. You must run qemu as root, or under
>    ulimit -l on both source and destination

Good, will update the documentation now.

> 2. Expect as much as that amount of memory to be locked
>    and unavailable to the host kernel and applications for
>    an arbitrarily long time.
>    Make sure you have much more RAM in the host, otherwise QEMU,
>    or some other arbitrary application on the same host, will get killed.

This is implied already. The docs say "If you don't want pinning, then
use TCP". That's enough warning.

> 3. Migration with RDMA support is experimental and unsupported.
>    In particular, please do not expect it to work across qemu versions,
>    and do not expect the management interface to be stable.

The only correct statement here is that it's experimental. I will update
the docs to reflect that.

>> $ cat docs/rdma.txt
>>
>> ... snip ..
>>
>> BEFORE RUNNING:
>> ===============
>>
>> Use of RDMA requires pinning and registering memory with the
>> hardware. If this is not acceptable for your application or
>> product, then the use of RDMA is strongly discouraged and you
>> should revert back to standard TCP-based migration.
> No one knows or should know what "pinning and registering" means.

I will define it in the docs, then.

> For which applications and products is it appropriate?

That's up to the vendor or user to decide, not us.

> Also, you are talking about current QEMU
> code using RDMA for migration but say "RDMA" generally.

Sure, I will fix the docs.

>> Next, decide if you want dynamic page registration on the server-side.
>> For example, if you have an 8GB RAM virtual machine, but only 1GB
>> is in active use, then disabling this feature will cause all 8GB to
>> be pinned and resident in memory. This feature mostly affects the
>> bulk-phase round of the migration and can be disabled for extremely
>> high-performance RDMA hardware using the following command:
>>
>> QEMU Monitor Command:
>> $ migrate_set_capability chunk_register_destination off # enabled by default
>>
>> Performing this action will cause all 8GB to be pinned, so if that's
>> not what you want, then please ignore this step altogether.
> This does not make it clear what is the benefit of disabling this
> capability. I think it's best to avoid options, just use chunk
> based always.
> If it's here "so people can play with it" then please rename
> it to something like "x-unsupported-chunk_register_destination"
> so people know this is unsupported and not to be used for production.

Again, please drop the request for removing chunking. Paolo already told
me to use "x-rdma" - so that's enough for now.

- Michael