From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [PATCH] Revert "xen-hvm: increase maxmem before calling xc_domain_populate_physmap" Date: Tue, 16 Jun 2015 16:56:04 +0100 Message-ID: <55804714.8040303@eu.citrix.com> References: <1433940913-19425-1-git-send-email-george.dunlap@eu.citrix.com> <20150616113305.GA30895@zion.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Stefano Stabellini , Wei Liu Cc: Andrew Cooper , Stefano Stabellini , Ian Campbell , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 06/16/2015 04:51 PM, Stefano Stabellini wrote: > On Tue, 16 Jun 2015, Wei Liu wrote: >> On Wed, Jun 10, 2015 at 01:55:13PM +0100, George Dunlap wrote: >>> This reverts commit c1d322e6048796296555dd36fdd102d7fa2f50bf. >>> >>> The original commit fixes a bug when assigning a large number of >>> devices which require option roms to a guest. (One known >>> configuration that needs extra memory is having more than 4 emulated >>> NICs assigned. Three or fewer NICs seems to work without this >>> functionality.) >>> >>> However, by unilaterally increasing maxmem, it introduces two >>> problems. >>> >>> First, now libxl's calculation of the required maxmem during migration >>> is broken -- any guest which exercised this functionality will fail on >>> migration. (Guests which have the default number of devices are not >>> affected.) >>> >>> Secondly, it makes it impossible for a higher-level toolstack or >>> administer to predict how much memory a VM will actually use, making >>> it much more difficult to effectively use all of the memory on a >>> machine. >>> >>> The right solution to the original problem is to figure out a way for >>> qemu to take pages from the existing pool of guest memory, rather than >>> allocating more pages. >>> >>> That fix will take more time to develop than we have until the feature >>> freeze. In the mean time, the simplest way to fix the migration issue >>> is to revert this change. That will re-introduce the original bug, >>> but it's an unusual corner case; and without migration it isn't fully >>> functional yet anyway. >>> >>> Signed-off-by: George Dunlap >>> --- >>> I do think this is the right approach, but I'm mainly sending this is >>> mainly to open up discussion. >>> >>> CC: Stefano Stabellini >>> CC: Wei Liu >>> CC: Ian Campbell >>> CC: Andrew Cooper >> >> Stefano, Andrew, any comments? >> >> If we're to do this we need to do it now. >> >> I think reverting this change in QEMU and relevant changes in libxl >> would be the most viable solution to solve this for this release. > > Reverting this patch doesn't really solve the problem: instead of > breaking on migration when the VM has more than 3 emulated NICs, the VM > simply refuses to start in that case. I guess it can be considered a > small improvement but certainly not a fix. > > Given that the migration issue only happens in an "unusual corner case", > are we really in a hurry to revert this commit and go back to the > failure to start, even before we actually figure out what the proper fix > is? I'm in a hurry to go back to a world where qemu doesn't unexpectedly allocate more RAM to a guest. If the real problem with the original patch was that it broke migration, we could fix that pretty easily; but the real problem (to me) with the original patch is that it unpredicatably allocates more memory to a guest that the toolstack doesn't know about. -George