From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <516B529C.5090503@linux.vnet.ibm.com>
Date: Sun, 14 Apr 2013 21:06:36 -0400
From: "Michael R. Hines"
MIME-Version: 1.0
In-Reply-To: <20130414211055.GF7165@redhat.com>
References: <20130412104802.GA23467@redhat.com> <5167E797.2050103@redhat.com>
 <20130412112553.GB23467@redhat.com> <51681DAA.3000503@redhat.com>
 <20130414115911.GA4923@redhat.com> <516ABCCC.207@linux.vnet.ibm.com>
 <20130414160327.GB7165@redhat.com> <516ADBEA.5090100@linux.vnet.ibm.com>
 <20130414183041.GC7165@redhat.com> <516AFE23.104@linux.vnet.ibm.com>
 <20130414211055.GF7165@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael S. Tsirkin"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
 abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini

On 04/14/2013 05:10 PM, Michael S. Tsirkin wrote:
> On Sun, Apr 14, 2013 at 03:06:11PM -0400, Michael R. Hines wrote:
>> On 04/14/2013 02:30 PM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 14, 2013 at 12:40:10PM -0400, Michael R. Hines wrote:
>>>> On 04/14/2013 12:03 PM, Michael S. Tsirkin wrote:
>>>>> On Sun, Apr 14, 2013 at 10:27:24AM -0400, Michael R. Hines wrote:
>>>>>> On 04/14/2013 07:59 AM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Apr 12, 2013 at 04:43:54PM +0200, Paolo Bonzini wrote:
>>>>>>>> On 12/04/2013 13:25, Michael S. Tsirkin wrote:
>>>>>>>>> On Fri, Apr 12, 2013 at 12:53:11PM +0200, Paolo Bonzini wrote:
>>>>>>>>>> On 12/04/2013 12:48, Michael S. Tsirkin wrote:
>>>>>>>>>>> 1. You have two protocols already and this does not make sense in
>>>>>>>>>>> version 1 of the patch.
>>>>>>>>>> It makes sense if we consider it experimental (add x- in front of
>>>>>>>>>> transport and capability) and would like people to play with it.
>>>>>>>>>>
>>>>>>>>>> Paolo
>>>>>>>>> But it's not testable yet. I see problems just reading the
>>>>>>>>> documentation.
>>>>>>>>> Author thinks "ulimit -l 10000000000" on both source and
>>>>>>>>> destination is just fine. This can easily crash host or cause OOM
>>>>>>>>> killer to kill QEMU. So why is there any need for extra testers? Fix
>>>>>>>>> the major bugs first.
>>>>>>>>>
>>>>>>>>> There's a similar issue with device assignment - we can't fix it there,
>>>>>>>>> and despite being available for years, this was one of two reasons that
>>>>>>>>> has kept this feature out of the hands of lots of users (and assuming guest
>>>>>>>>> has lots of zero pages won't work: balloon is not widely used either
>>>>>>>>> since it depends on a well-behaved guest to work correctly).
>>>>>>>> I agree assuming guest has lots of zero pages won't work, but I think
>>>>>>>> you are overstating the importance of overcommit. Let's mark the damn
>>>>>>>> thing as experimental, and stop making perfect the enemy of good.
>>>>>>>>
>>>>>>>> Paolo
>>>>>>> It looks like we have to decide, before merging, whether migration with
>>>>>>> rdma that breaks overcommit is worth it or not. Since the author made
>>>>>>> it very clear he does not intend to make it work with overcommit, ever.
>>>>>>>
>>>>>> That depends entirely on what you define as overcommit.
>>>>> You don't get to define your own terms. Look it up in wikipedia or
>>>>> something.
>>>>>
>>>>>> The pages do get unregistered at the end of the migration =)
>>>>>>
>>>>>> - Michael
>>>>> The limitations are pretty clear, and you really should document them:
>>>>>
>>>>> 1. run qemu as root, or under ulimit -l on both source and
>>>>> destination
>>>>>
>>>>> 2. expect that as much as that amount of memory is pinned
>>>>> and unavailable to host kernel and applications for
>>>>> arbitrarily long time.
>>>>> Make sure you have much more RAM in host or QEMU will get killed.
>>>>>
>>>>> To me, especially 1 is an unacceptable security tradeoff.
>>>>> It is entirely fixable but we both have other priorities,
>>>>> so it'll stay broken.
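For concreteness, limitation 1 above boils down to the locked-memory resource limit, RLIMIT_MEMLOCK. A minimal shell sketch for inspecting and raising it; the "qemuuser" name and the values are placeholders for illustration, not recommendations from this thread:

```shell
# Inspect the current locked-memory (RLIMIT_MEMLOCK) limits, in KB.
# "unlimited" means no cap; a small default such as 64 will make large
# RDMA memory registrations fail unless QEMU runs as root.
ulimit -S -l   # soft limit for the current shell
ulimit -H -l   # hard limit

# To raise the limit for an unprivileged user, an administrator would
# typically add lines like these to /etc/security/limits.conf (assumes
# pam_limits is in use) and have the user log in again:
#   qemuuser  soft  memlock  unlimited
#   qemuuser  hard  memlock  unlimited
```

Note that raising memlock toward "unlimited" buys the ability to register all of guest RAM at exactly the cost being debated here: the host loses its protection against one process locking up its memory.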
>>>>>
>>>> I've modified the beginning of docs/rdma.txt to say the following:
>>> It really should say this, in a very prominent place:
>>>
>>> BUGS:
>> Not a bug. We'll have to agree to disagree. Please drop this.
> It's not a feature, it makes management harder and
> will bite some users who are not careful enough
> to read documentation and know what to expect.

Something that does not exist cannot be a bug. That's called
a non-existent optimization.

>>> 1. You must run qemu as root, or under
>>> ulimit -l on both source and destination
>> Good, will update the documentation now.
>>> 2. Expect as much as that amount of memory to be locked
>>> and unavailable to host kernel and applications for
>>> arbitrarily long time.
>>> Make sure you have much more RAM in host otherwise QEMU,
>>> or some other arbitrary application on same host, will get killed.
>> This is implied already. The docs say "If you don't want pinning,
>> then use TCP".
>> That's enough warning.
> No it's not. Pinning is jargon, and does not mean locking
> up gigabytes. Why are you using jargon?
> Explain the limitation in plain English so people know
> when to expect things to work.

Already done.

>>> 3. Migration with RDMA support is experimental and unsupported.
>>> In particular, please do not expect it to work across qemu versions,
>>> and do not expect the management interface to be stable.
>> The only correct statement here is that it's experimental.
>>
>> I will update the docs to reflect that.
>>
>>>> $ cat docs/rdma.txt
>>>>
>>>> ... snip ..
>>>>
>>>> BEFORE RUNNING:
>>>> ===============
>>>>
>>>> Use of RDMA requires pinning and registering memory with the
>>>> hardware. If this is not acceptable for your application or
>>>> product, then the use of RDMA is strongly discouraged and you
>>>> should revert to standard TCP-based migration.
>>> No one knows, or should know, what "pinning and registering" means.
>> I will define it in the docs, then.
> Keep it simple.
> Just tell people what they need to know.
> It's silly to expect users to understand internals of
> the product before they even try it for the first time.

Agreed.

>>> For which applications and products is it appropriate?
>> That's up to the vendor or user to decide, not us.
> With zero information so far, no one will be
> able to decide.

There is plenty of information, including this email thread.

>>> Also, you are talking about current QEMU
>>> code using RDMA for migration but say "RDMA" generally.
>> Sure, I will fix the docs.
>>
>>>> Next, decide if you want dynamic page registration on the server-side.
>>>> For example, if you have an 8GB RAM virtual machine, but only 1GB
>>>> is in active use, then disabling this feature will cause all 8GB to
>>>> be pinned and resident in memory. This feature mostly affects the
>>>> bulk-phase round of the migration and can be disabled for extremely
>>>> high-performance RDMA hardware using the following command:
>>>>
>>>> QEMU Monitor Command:
>>>> $ migrate_set_capability chunk_register_destination off # enabled by default
>>>>
>>>> Performing this action will cause all 8GB to be pinned, so if that's
>>>> not what you want, then please ignore this step altogether.
>>> This does not make it clear what the benefit of disabling this
>>> capability is. I think it's best to avoid options, just use chunk
>>> based always.
>>> If it's here "so people can play with it" then please rename
>>> it to something like "x-unsupported-chunk_register_destination"
>>> so people know this is unsupported and not to be used for production.
>> Again, please drop the request for removing chunking.
>>
>> Paolo already told me to use "x-rdma" - so that's enough for now.
>>
>> - Michael
> You are adding a new command that's also experimental, so you must tag
> it explicitly too.

The entire migration is experimental - which by extension makes
the capability experimental.
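For anyone experimenting with the capability above, the effect of disabling chunk registration is observable from the host, since Linux accounts locked pages per process. A sketch; `pid=$$` is a placeholder so the snippet runs standalone, and whether RDMA-registered pages are charged to VmLck or VmPin varies by kernel version, so treat the field choice as an assumption to verify:

```shell
# Observe locked/pinned memory accounting for a running process.
# VmLck in /proc/<pid>/status reports mlock()ed memory; newer kernels
# report pinned pages (e.g. from RDMA memory registration) separately
# as VmPin. Substitute the destination QEMU's pid for $$ below.
pid=$$
grep -E 'VmLck|VmPin' /proc/"$pid"/status
```

With chunk_register_destination off, one would expect the destination QEMU's locked/pinned figure to climb toward the full guest RAM size (all 8GB in the example quoted above) during the bulk phase, which is exactly the behavior the warnings in this thread are about.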