Message-ID: <516ADBEA.5090100@linux.vnet.ibm.com>
Date: Sun, 14 Apr 2013 12:40:10 -0400
From: "Michael R. Hines"
References: <20130411145632.GA2280@redhat.com>
 <5166F7AE.8070209@linux.vnet.ibm.com>
 <20130411191533.GA25515@redhat.com>
 <51671DFF.80904@linux.vnet.ibm.com>
 <20130412104802.GA23467@redhat.com>
 <5167E797.2050103@redhat.com>
 <20130412112553.GB23467@redhat.com>
 <51681DAA.3000503@redhat.com>
 <20130414115911.GA4923@redhat.com>
 <516ABCCC.207@linux.vnet.ibm.com>
 <20130414160327.GB7165@redhat.com>
In-Reply-To: <20130414160327.GB7165@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini

On 04/14/2013 12:03 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 14, 2013 at 10:27:24AM -0400, Michael R. Hines wrote:
>> On 04/14/2013 07:59 AM, Michael S. Tsirkin wrote:
>>> On Fri, Apr 12, 2013 at 04:43:54PM +0200, Paolo Bonzini wrote:
>>>> On 12/04/2013 13:25, Michael S. Tsirkin wrote:
>>>>> On Fri, Apr 12, 2013 at 12:53:11PM +0200, Paolo Bonzini wrote:
>>>>>> On 12/04/2013 12:48, Michael S. Tsirkin wrote:
>>>>>>> 1. You have two protocols already and this does not make sense in
>>>>>>> version 1 of the patch.
>>>>>> It makes sense if we consider it experimental (add x- in front of
>>>>>> transport and capability) and would like people to play with it.
>>>>>>
>>>>>> Paolo
>>>>> But it's not testable yet. I see problems just reading the
>>>>> documentation. Author thinks "ulimit -l 10000000000" on both source and
>>>>> destination is just fine. This can easily crash host or cause OOM
>>>>> killer to kill QEMU. So why is there any need for extra testers? Fix
>>>>> the major bugs first.
>>>>>
>>>>> There's a similar issue with device assignment - we can't fix it there,
>>>>> and despite being available for years, this was one of two reasons that
>>>>> has kept this feature out of hands of lots of users (and assuming guest
>>>>> has lots of zero pages won't work: balloon is not widely used either
>>>>> since it depends on a well-behaved guest to work correctly).
>>>> I agree assuming guest has lots of zero pages won't work, but I think
>>>> you are overstating the importance of overcommit. Let's mark the damn
>>>> thing as experimental, and stop making perfect the enemy of good.
>>>>
>>>> Paolo
>>> It looks like we have to decide, before merging, whether migration with
>>> rdma that breaks overcommit is worth it or not. Since the author made
>>> it very clear he does not intend to make it work with overcommit, ever.
>>>
>> That depends entirely on what you define as overcommit.
> You don't get to define your own terms. Look it up in wikipedia or
> something.
>
>> The pages do get unregistered at the end of the migration =)
>>
>> - Michael
> The limitations are pretty clear, and you really should document them:
>
> 1. run qemu as root, or under ulimit -l on both source and
> destination
>
> 2. expect that as much as that amount of memory is pinned
> and unavailable to host kernel and applications for
> arbitrarily long time.
> Make sure you have much more RAM in host or QEMU will get killed.
>
> To me, especially 1 is an unacceptable security tradeoff.
> It is entirely fixable but we both have other priorities,
> so it'll stay broken.
>

I've modified the beginning of docs/rdma.txt to say the following:

$ cat docs/rdma.txt

... snip ..

BEFORE RUNNING:
===============

Use of RDMA requires pinning and registering memory with the
hardware. If this is not acceptable for your application or
product, then the use of RDMA is strongly discouraged and you
should fall back to standard TCP-based migration.

Next, decide if you want dynamic page registration on the server side.
For example, if you have an 8GB RAM virtual machine, but only 1GB
is in active use, then disabling this feature will cause all 8GB to
be pinned and resident in memory. This feature mostly affects the
bulk-phase round of the migration and can be disabled for extremely
high-performance RDMA hardware using the following command:

QEMU Monitor Command:
$ migrate_set_capability chunk_register_destination off # enabled by default

Disabling it will cause all 8GB to be pinned, so if that's
not what you want, then please skip this step altogether.

RUNNING:
========

..... snip ...

I'll group this change into a future patch whenever the current patch
gets pulled, and I will also update the QEMU wiki to make this point clear.

- Michael
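
For anyone who wants to check the locked-memory requirement before trying
this, here is a minimal Python sketch. It is not part of the patch series.
It assumes a Linux host and reads the soft RLIMIT_MEMLOCK, the limit that
"ulimit -l" reports (in bytes here, whereas the shell uses 1024-byte units),
then checks whether the 8GB example guest from the text above would fit under
it. The helper names memlock_limit_bytes and can_pin are hypothetical, and the
check ignores any other memory that QEMU itself may lock.

```python
import resource

GIB = 1024 ** 3


def memlock_limit_bytes():
    """Return the soft RLIMIT_MEMLOCK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def can_pin(guest_ram_bytes, limit_bytes):
    """True if guest_ram_bytes fits under limit_bytes (None means unlimited)."""
    return limit_bytes is None or guest_ram_bytes <= limit_bytes


if __name__ == "__main__":
    guest_ram = 8 * GIB  # the 8GB example guest; adjust for your VM
    limit = memlock_limit_bytes()
    shown = "unlimited" if limit is None else f"{limit} bytes"
    print(f"soft RLIMIT_MEMLOCK: {shown}")
    print(f"whole guest can be pinned: {can_pin(guest_ram, limit)}")
```

If the second line prints False, pinning the whole guest would exceed the
current limit, which is the situation limitation 1 in the quoted text
describes.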