From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: BUG: failed to save x86 HVM guest with 1TB ram
Date: Mon, 7 Sep 2015 10:48:38 +0100
Message-ID: <55ED5D76.9030808@citrix.com>
References: <8ADDA2EB7601DA429B6B2A43EF4620A51D9345E8@szxeml556-mbs.china.huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1ZYt2b-0006M9-IL for xen-devel@lists.xenproject.org; Mon, 07 Sep 2015 09:48:45 +0000
In-Reply-To: <8ADDA2EB7601DA429B6B2A43EF4620A51D9345E8@szxeml556-mbs.china.huawei.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "wangxin (U)" , "xen-devel@lists.xenproject.org"
Cc: Fanhenglong , "wei.liu2@citrix.com" , "Hanweidong (Randy)"
List-Id: xen-devel@lists.xenproject.org

On 07/09/15 09:09, wangxin (U) wrote:
> Hi,
>
> I'm tring to hibernate an x86 HVM guest with 1TB ram,
> [1.VM config]
> builder = "hvm"
> name = "suse12_sp3"
> memory = 1048576
> vcpus = 16
> boot = "c"
> disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
> device_model_version = "qemu-xen"
> vnc = 1
> vnclisten = '9.51.3.174'
> vncdisplay = 0
>
> but I get the error messages(see below) from XC:
> [2.VM saving] xl save -p suse12_sp3 suse12_sp3.save
> Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
> xc: error: Cannot save this big a guest: Internal error
> libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving domain: \
>     domain did not respond to suspend request: Argument list too long
> libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
>     teardown device for guest with domid 3, rc -8
> Failed to save domain, resuming domain
> xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
> libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
>     for domain 3: Invalid argument
>
> The error in function xc_domain_save in xc_domain_save.c,
>     /* Get the size of the P2M table */
>     dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
>
>     if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
>     {
>         errno = E2BIG;
>         ERROR("Cannot save this big a guest");
>         goto out;
>     }
>
> it may be 1TB ram plus pci-hole space make the MFN wider than limit size.
>
> If I want to save a VM with 1TB ram or larger, what shoud I do? Did anyone
> have tried this before and have some configuration I can refer to?

This is clearly not from Xen 4.6, but the same issue will be present.

The check serves a dual purpose. In the legacy case, it is to avoid clobbering the upper bits of pfn information with pfn type information for 32bit toolstacks; any PFN above 2^28 would have type information clobbering the upper bits.

This has been mitigated somewhat in migration v2, as pfns are strictly 64bit values, still using the upper 4 bits for type information, allowing 60 bits for the PFN itself.

The second purpose is just as a limit on toolstack resources. Migration requires allocating structures which scale linearly with the size of the VM; the biggest of which would be ~1GB for the p2m. Added to this is >1GB for the m2p, and suddenly a 32bit toolstack process is looking scarce on RAM.

During the development of migration v2, I didn't spend any time considering if or how much it was sensible to lift the restriction by, so the check was imported wholesale from the legacy code.

For now, I am going to say that it simply doesn't work. Simply upping the limit is only a stopgap measure; an HVM guest can still mess this up by playing physmap games and mapping a page of ram at a really high (guest) physical address.
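To make the arithmetic behind the check concrete, here is a minimal sketch. The mask value matches the definition in Xen's public domctl.h; the helper names `p2m_too_big` and `p2m_bytes` are hypothetical, and the 4-byte entry size used in the note below is an assumption for a 32bit toolstack, not something taken from the libxc source.

```c
#include <assert.h>
#include <stdint.h>

/* As in Xen's public domctl.h: the page type occupies bits 28..31. */
#define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU << 28)

/*
 * Hypothetical helper mirroring the quoted check in xc_domain_save():
 * returns nonzero when the save would be refused with E2BIG.
 */
int p2m_too_big(uint64_t max_gpfn)
{
    uint64_t p2m_size = max_gpfn + 1;

    /* ~mask is 0x0FFFFFFF, i.e. at most 2^28 - 1 frames are accepted. */
    return p2m_size > (uint64_t)~XEN_DOMCTL_PFINFO_LTAB_MASK;
}

/*
 * Hypothetical helper: bytes needed for a flat p2m array holding one
 * entry of entry_size bytes per guest frame.
 */
uint64_t p2m_bytes(uint64_t p2m_size, unsigned int entry_size)
{
    /* Guard against overflow in the multiplication. */
    assert(entry_size == 0 || p2m_size <= UINT64_MAX / entry_size);
    return p2m_size * entry_size;
}
```

With 4 KiB pages, 1TB of RAM is 2^28 frames, so `p2m_too_big((1ULL << 28) - 1)` is already nonzero even before any PCI hole pushes the maximum gpfn higher. Under the 4-byte-entry assumption, `p2m_bytes(1ULL << 28, 4)` comes to 2^30 bytes, which lines up with the ~1GB figure for the p2m.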

Long-term, we need hypervisor support for getting a compressed view of guest physical address space, so toolstack side resources are proportional to the amount of RAM given to the guest, not to how big a guest decides to make its physmap.

~Andrew