* BUG: failed to save x86 HVM guest with 1TB ram

From: wangxin (U)
Date: 2015-09-07 8:09 UTC
To: xen-devel@lists.xenproject.org
Cc: Fanhenglong, wei.liu2@citrix.com, Hanweidong (Randy)

Hi,

I'm trying to hibernate an x86 HVM guest with 1TB RAM,

[1. VM config]
    builder = "hvm"
    name = "suse12_sp3"
    memory = 1048576
    vcpus = 16
    boot = "c"
    disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
    device_model_version = "qemu-xen"
    vnc = 1
    vnclisten = '9.51.3.174'
    vncdisplay = 0

but I get the error messages below from xc:

[2. VM saving]
    xl save -p suse12_sp3 suse12_sp3.save
    Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
    xc: error: Cannot save this big a guest: Internal error
    libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving domain: \
        domain did not respond to suspend request: Argument list too long
    libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
        teardown device for guest with domid 3, rc -8
    Failed to save domain, resuming domain
    xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
    libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
        for domain 3: Invalid argument

The error comes from the function xc_domain_save in xc_domain_save.c:

    /* Get the size of the P2M table */
    dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;

    if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
    {
        errno = E2BIG;
        ERROR("Cannot save this big a guest");
        goto out;
    }

It may be that 1TB of RAM plus the PCI-hole space pushes the maximum
guest PFN past the limit.

If I want to save a VM with 1TB of RAM or larger, what should I do? Has
anyone tried this before, and is there a configuration I can refer to?

Thank you very much!
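A quick arithmetic sketch shows why this check fires for a 1TB guest
even before the PCI hole is considered (assuming 4 KiB pages and the
legacy 32-bit mask value 0xf0000000 for XEN_DOMCTL_PFINFO_LTAB_MASK;
the program is illustrative, not libxc code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Legacy layout: the top 4 bits of each 32-bit pfn field hold
         * type information, leaving 28 bits for the frame number. */
        const uint32_t ltab_mask = 0xf0000000;     /* XEN_DOMCTL_PFINFO_LTAB_MASK */
        const uint64_t max_p2m_size = ~ltab_mask;  /* 2^28 - 1 = 268435455 */

        /* 1 TiB of RAM in 4 KiB pages is exactly 2^28 frames. */
        const uint64_t ram_frames = (1ULL << 40) >> 12;  /* 268435456 */

        printf("largest p2m_size the check accepts: %llu\n",
               (unsigned long long)max_p2m_size);
        printf("frames needed for 1 TiB of RAM:     %llu\n",
               (unsigned long long)ram_frames);

        /* ram_frames already exceeds max_p2m_size, and any RAM
         * relocated above the PCI hole pushes the maximum GPFN
         * higher still. */
        return 0;
    }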
* Re: BUG: failed to save x86 HVM guest with 1TB ram

From: Andrew Cooper
Date: 2015-09-07 9:48 UTC
To: wangxin (U), xen-devel@lists.xenproject.org
Cc: Fanhenglong, wei.liu2@citrix.com, Hanweidong (Randy)

On 07/09/15 09:09, wangxin (U) wrote:
> Hi,
>
> I'm trying to hibernate an x86 HVM guest with 1TB RAM,
> [1. VM config]
>     builder = "hvm"
>     name = "suse12_sp3"
>     memory = 1048576
>     vcpus = 16
>     boot = "c"
>     disk = [ '/mnt/sda10/vm/SELS_ide_disk.img,raw,xvda,rw' ]
>     device_model_version = "qemu-xen"
>     vnc = 1
>     vnclisten = '9.51.3.174'
>     vncdisplay = 0
>
> but I get the error messages below from xc:
> [2. VM saving]
>     xl save -p suse12_sp3 suse12_sp3.save
>     Saving to suse12_sp3.save new xl format (info 0x1/0x0/1309)
>     xc: error: Cannot save this big a guest: Internal error
>     libxl: error: libxl_dom.c:1875:libxl__xc_domain_save_done: saving domain: \
>         domain did not respond to suspend request: Argument list too long
>     libxl: error: libxl_dom.c:2032:remus_teardown_done: Remus: failed to \
>         teardown device for guest with domid 3, rc -8
>     Failed to save domain, resuming domain
>     xc: error: Dom 3 not suspended: (shutdown 0, reason 255): Internal error
>     libxl: error: libxl.c:508:libxl__domain_resume: xc_domain_resume failed \
>         for domain 3: Invalid argument
>
> The error comes from the function xc_domain_save in xc_domain_save.c:
>     /* Get the size of the P2M table */
>     dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
>
>     if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
>     {
>         errno = E2BIG;
>         ERROR("Cannot save this big a guest");
>         goto out;
>     }
>
> It may be that 1TB of RAM plus the PCI-hole space pushes the maximum
> guest PFN past the limit.
>
> If I want to save a VM with 1TB of RAM or larger, what should I do? Has
> anyone tried this before, and is there a configuration I can refer to?

This is clearly not from Xen 4.6, but the same issue will be present.

The check serves a dual purpose. In the legacy case, it avoids
clobbering the upper bits of pfn information with pfn type information
for 32-bit toolstacks: any PFN above 2^28 would have its upper bits
clobbered by type information. This has been mitigated somewhat in
migration v2, where pfns are strictly 64-bit values, still using the
upper 4 bits for type information, which leaves 60 bits for the PFN
itself.

The second purpose is simply as a limit on toolstack resources.
Migration requires allocating structures that scale linearly with the
size of the VM, the biggest of which is ~1GB for the p2m. Add >1GB for
the m2p on top of that, and suddenly a 32-bit toolstack process is
looking scarce on RAM.

During the development of migration v2, I didn't spend any time
considering whether, or by how much, it would be sensible to lift the
restriction, so the check was imported wholesale from the legacy code.

For now, I am going to say that it simply doesn't work. Simply upping
the limit is only a stopgap measure; an HVM guest can still break it
by playing physmap games and mapping a page of RAM at a really high
(guest) physical address. Long term, we need hypervisor support for
getting a compressed view of the guest physical address space, so that
toolstack-side resources are proportional to the amount of RAM given to
the guest, not to how big a guest decides to make its physmap.

~Andrew
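The 60/4 bit split described above can be pictured with a small sketch
(the helper names are illustrative, not the actual migration v2
identifiers):

    #include <stdint.h>

    /* Migration v2 stream (sketch): pfns are 64-bit fields, with the
     * type in the upper 4 bits and the frame number in the lower 60. */
    #define PFN_TYPE_SHIFT  60
    #define PFN_FRAME_MASK  ((1ULL << PFN_TYPE_SHIFT) - 1)

    static inline uint64_t pfn_encode(uint64_t pfn, unsigned int type)
    {
        return (pfn & PFN_FRAME_MASK) | ((uint64_t)type << PFN_TYPE_SHIFT);
    }

    static inline unsigned int pfn_type(uint64_t field)
    {
        return (unsigned int)(field >> PFN_TYPE_SHIFT);
    }

    static inline uint64_t pfn_frame(uint64_t field)
    {
        return field & PFN_FRAME_MASK;
    }

The resource figures are simple arithmetic too: at 4 bytes per entry in
a 32-bit process, a p2m covering 2^28 frames is already 1 GiB, before
the m2p is mapped on top of it.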
* Re: BUG: failed to save x86 HVM guest with 1TB ram

From: wangxin (U)
Date: 2015-09-08 2:28 UTC
To: Andrew Cooper, xen-devel@lists.xenproject.org
Cc: Fanhenglong, wei.liu2@citrix.com, Hanweidong (Randy)

> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@hermes.cam.ac.uk] On Behalf Of Andrew Cooper
> Sent: Monday, September 07, 2015 5:49 PM
> To: wangxin (U); xen-devel@lists.xenproject.org
> Cc: Fanhenglong; wei.liu2@citrix.com; Hanweidong (Randy)
> Subject: Re: [Xen-devel] BUG: failed to save x86 HVM guest with 1TB ram
>
> On 07/09/15 09:09, wangxin (U) wrote:
> > [...]
> > If I want to save a VM with 1TB of RAM or larger, what should I do? Has
> > anyone tried this before, and is there a configuration I can refer to?
>
> This is clearly not from Xen 4.6, but the same issue will be present.

Yeah, it's Xen 4.5.0; I haven't tried it on Xen 4.6 upstream yet.

> The check serves a dual purpose. In the legacy case, it avoids
> clobbering the upper bits of pfn information with pfn type information
> for 32-bit toolstacks: any PFN above 2^28 would have its upper bits
> clobbered by type information. This has been mitigated somewhat in
> migration v2, where pfns are strictly 64-bit values, still using the
> upper 4 bits for type information, which leaves 60 bits for the PFN
> itself.
>
> The second purpose is simply as a limit on toolstack resources.
> Migration requires allocating structures that scale linearly with the
> size of the VM, the biggest of which is ~1GB for the p2m. Add >1GB for
> the m2p on top of that, and suddenly a 32-bit toolstack process is
> looking scarce on RAM.
>
> During the development of migration v2, I didn't spend any time
> considering whether, or by how much, it would be sensible to lift the
> restriction, so the check was imported wholesale from the legacy code.
>
> For now, I am going to say that it simply doesn't work. Simply upping
> the limit is only a stopgap measure; an HVM guest can still break it

Will the stopgap measure work in Xen 4.5?

> by playing physmap games and mapping a page of RAM at a really high
> (guest) physical address. Long term, we need hypervisor support for
> getting a compressed view of the guest physical address space, so that
> toolstack-side resources are proportional to the amount of RAM given to
> the guest, not to how big a guest decides to make its physmap.

Is that in your future work?

Best regards,
Xin

> ~Andrew
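Whether that stopgap applies to 4.5 is exactly the open question here;
purely as an illustration of the two limits involved, a hypothetical
helper (not code from either saver) contrasting them:

    #include <stdbool.h>
    #include <stdint.h>

    /* Sketch: the legacy stream stores each pfn in 32 bits (4 type
     * bits + 28 frame bits); the v2 stream stores 64 bits (4 type
     * bits + 60 frame bits). This mirrors the shape of the real
     * check, where saving fails if
     * p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK. */
    static bool p2m_size_fits(uint64_t p2m_size, bool stream_v2)
    {
        const unsigned int frame_bits = (stream_v2 ? 64 : 32) - 4;
        const uint64_t frame_mask = (1ULL << frame_bits) - 1;

        return p2m_size <= frame_mask;
    }

With p2m_size just over 2^28 for a 1TB guest, p2m_size_fits(p2m_size,
false) is false while p2m_size_fits(p2m_size, true) would be true,
which is the sense in which upping the limit is only a stopgap: the
toolstack-side allocations still grow with p2m_size.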
* Re: BUG: failed to save x86 HVM guest with 1TB ram

From: Andrew Cooper
Date: 2015-09-09 18:50 UTC
To: wangxin (U), xen-devel@lists.xenproject.org
Cc: Fanhenglong, wei.liu2@citrix.com, Hanweidong (Randy)

On 08/09/15 03:28, wangxin (U) wrote:
>> The check serves a dual purpose. In the legacy case, it avoids
>> clobbering the upper bits of pfn information with pfn type information
>> for 32-bit toolstacks: any PFN above 2^28 would have its upper bits
>> clobbered by type information. This has been mitigated somewhat in
>> migration v2, where pfns are strictly 64-bit values, still using the
>> upper 4 bits for type information, which leaves 60 bits for the PFN
>> itself.
>>
>> The second purpose is simply as a limit on toolstack resources.
>> Migration requires allocating structures that scale linearly with the
>> size of the VM, the biggest of which is ~1GB for the p2m. Add >1GB for
>> the m2p on top of that, and suddenly a 32-bit toolstack process is
>> looking scarce on RAM.
>>
>> During the development of migration v2, I didn't spend any time
>> considering whether, or by how much, it would be sensible to lift the
>> restriction, so the check was imported wholesale from the legacy code.
>>
>> For now, I am going to say that it simply doesn't work. Simply upping
>> the limit is only a stopgap measure; an HVM guest can still break it
> Will the stopgap measure work in Xen 4.5?

I don't know - try it.

>> by playing physmap games and mapping a page of RAM at a really high
>> (guest) physical address. Long term, we need hypervisor support for
>> getting a compressed view of the guest physical address space, so that
>> toolstack-side resources are proportional to the amount of RAM given to
>> the guest, not to how big a guest decides to make its physmap.
> Is that in your future work?

It is on the list, but no idea if or how it would be done at the moment.

~Andrew
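To make the "compressed view" idea concrete, one possible shape for it
(purely hypothetical; no such interface existed at the time of this
thread) is an extent list, so the toolstack's view scales with
populated RAM rather than with the highest GPFN:

    #include <stdint.h>

    /* Hypothetical extent-based view of a guest physmap. */
    struct physmap_extent {
        uint64_t start_gpfn;  /* first populated guest frame */
        uint64_t nr_frames;   /* consecutive populated frames */
    };

    /* A 1 TiB HVM guest with a 1 GiB PCI hole below 4 GiB might
     * compress to just two extents, even though its highest GPFN
     * (0x1003ffff) is already past the 2^28 limit: */
    static const struct physmap_extent example[] = {
        { 0x000000, 0x00c0000 },  /* 0 .. 3 GiB of RAM         */
        { 0x100000, 0xff40000 },  /* RAM relocated above 4 GiB */
    };

Two extents describe the whole guest regardless of where its physmap
ends, which is the proportionality property asked for above.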