Message-ID: <51BA36DD.8050703@linux.vnet.ibm.com>
Date: Thu, 13 Jun 2013 17:17:17 -0400
From: "Michael R. Hines"
References: <1370880226-2208-1-git-send-email-mrhines@linux.vnet.ibm.com> <4168C988EBDF2141B4E0B6475B6A73D10CE2AAC1@G6W2488.americas.hpqcorp.net> <51B60ABA.2070401@linux.vnet.ibm.com> <4168C988EBDF2141B4E0B6475B6A73D10CE2BAE7@G6W2488.americas.hpqcorp.net> <51B7B652.3070905@linux.vnet.ibm.com> <51B85EE5.1050702@hp.com> <51B868B3.9090607@linux.vnet.ibm.com> <51B9A614.2050101@hp.com> <51B9C2D6.30000@linux.vnet.ibm.com> <51B9D6A8.9070007@hp.com> <51B9DD5C.1030409@linux.vnet.ibm.com> <51BA264E.4010701@redhat.com>
In-Reply-To: <51BA264E.4010701@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v7 00/12] rdma: migration support
To: Paolo Bonzini
Cc: Juan Jose Quintela Carreira, "qemu-devel@nongnu.org", Bulent Abali, "mrhines@us.ibm.com", Anthony Liguori, Chegu Vinod

On 06/13/2013 04:06 PM, Paolo Bonzini wrote:
>>
>> (CC-ing qemu-devel).
>>
>> OK, that's good to know. This means that we need to bringup the mlock()
>> problem as a "larger" issue in the linux community instead of the QEMU
>> community.
>>
>> In the meantime, how about I make update to the RDMA patch which does
>> the following:
>>
>> 1. Solution #1:
>>        If user requests "x-rdma-pin-all", then
>>             If QEMU has enabled "-realtime mlock=on"
>>                  Then, allow the capability
>>             Else
>>                  Disallow the capability
>>
>> 2. Solution #2: Create NEW qemu monitor command which locks memory *in
>> advance* before the migrate command occurs, to clearly indicate to the
>> user that the cost of locking memory must be paid before the migration
>> starts.
>>
>> Which solution do you prefer? Or do you have alternative idea?
> Let's just document it in the release notes. There's time to fix it.
>
> Regarding the timestamp problem, it should be fixed in the RDMA code.
> You did find a bug, but xyz_start_outgoing_migration should be
> asynchronous and the pinning should happen in the setup phase. This is
> because the setup phase is already running outside the big QEMU lock and
> the guest would not be frozen.

I think you misunderstood the symptom. The pinning is *already* happening
in the setup phase (xyz_start_outgoing_migration), not inside
migration_thread().

The problem is in Linux: the guest appears to be frozen not because of any
locks, but because the pinning itself (allocating and clearing memory)
slows the virtual machine down so much that it looks like it's not running.

> I think the patches are ready for merging, because incremental work
> makes it easier to discuss the changes(*) but you really need to do two
> things before 1.6, or I would rather revert them.

Yes, could someone go ahead and pull them? They are very well bug-tested.

>
> (1) move the pinning to the setup phase

This is already done in the existing patchset.

>
> (2) add a debug mode where every pass unpins all the memory and
> restarts. Speed doesn't matter, this is so that the protocol supports
> it from the beginning, and any caching heuristics need to be done on the
> source side. As all debug modes, it will be somewhat prone to bitrot,
> but at least there is a reference implementation for anyone who laters
> wants to add caching.
>
> I think (2) is very important so that, for example, during fault
> tolerance you can reduce a bit the pinned size for smaller workloads,
> even without ballooning.

I agree that this is a necessary feature (dynamic source registration),
but it is a lot more complicated than a simple unpin of everything before
every pass.

As you suggested, I would rather not introduce unused code, but rather
wait until someone in the future has a fully functional, testable
implementation.
Actually, I am already working on a fault-tolerance implementation as we
speak and will be posting it soon, so it's likely I will submit a patch to
do something like this at that time.

> (*) for example, why the introduction of acct_update_position? Is
> it a fix for a bug that always existed, or driven by some other
> changes?

This is important because RDMA writes do not happen synchronously.
It is impossible to update the accounting inside of save_live_iterate()
because the RDMA operations are still outstanding. Only after they have
completed can we know what the accounting statistics really are.

- Michael
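
To illustrate the accounting point above, here is a minimal sketch in
Python of why the counters can only be updated at completion time. It is a
toy model, not the QEMU code: MigrationStats, FakeRdmaQueue, post_write,
poll_completions and save_iterate are all hypothetical names invented for
this example, and the 4096-byte page size is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MigrationStats:
    """Hypothetical stand-in for the migration accounting counters."""
    bytes_transferred: int = 0


@dataclass
class FakeRdmaQueue:
    """Toy model of a send queue whose writes complete asynchronously."""
    stats: MigrationStats
    outstanding: deque = field(default_factory=deque)

    def post_write(self, offset: int, length: int) -> None:
        # Posting only queues the request. Nothing is known to have been
        # transferred yet, so the accounting must not change here.
        self.outstanding.append((offset, length))

    def poll_completions(self, max_completions: int) -> int:
        # The counters are updated only once a write is known to be done.
        done = 0
        while self.outstanding and done < max_completions:
            _offset, length = self.outstanding.popleft()
            self.stats.bytes_transferred += length
            done += 1
        return done


def save_iterate(queue: FakeRdmaQueue, dirty_pages, page_size: int = 4096) -> None:
    """Hypothetical iteration step: post the writes, do no accounting."""
    for page in dirty_pages:
        queue.post_write(page * page_size, page_size)


stats = MigrationStats()
queue = FakeRdmaQueue(stats)

save_iterate(queue, dirty_pages=[0, 1, 2])
bytes_after_iterate = stats.bytes_transferred  # writes are still outstanding

queue.poll_completions(max_completions=2)
bytes_after_two = stats.bytes_transferred      # two writes have completed

queue.poll_completions(max_completions=8)
bytes_after_all = stats.bytes_transferred      # all three have completed
```

In this model the counter is untouched when the iteration step returns and
only advances as completions are polled, which is the ordering described
above for the real RDMA operations.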