* balloon driver broken in 3.12+ after save+restore
From: Marek Marczykowski-Górecki @ 2014-05-22 1:31 UTC
To: xen-devel@lists.xen.org
Hi,
I have a problem with the balloon driver after/during restoring a saved
domain. There are two symptoms:

1. When the domain was 'xl mem-set <some size smaller than initial>' just
before the save, it still needs the initial memory size to restore. Details
below.

2. A restored domain sometimes (most of the time) does not want to balloon
down. For example, when the domain has 3300MB and I mem-set it to 2800MB,
nothing changes immediately (only "target" in sysfs) - both 'xl list' and
'free' inside the VM report the same size (and plenty of free memory in the
VM). After some time it gets ballooned down to ~3000MB, still not 2800MB. I
haven't found any pattern here.
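For reference, this is how I observe the second symptom (a sketch - the
sysfs node names are the ones I believe the Xen balloon driver exposes,
assuming a single xen_memory0 device):
---
# in dom0:
xl mem-set fedora-20-x64-dvm 2800

# inside the guest, right afterwards:
cat /sys/devices/system/xen_memory/xen_memory0/target_kb        # updated at once
cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb  # lags behind
---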
Both of the above worked perfectly in 3.11.
I'm running Xen 4.1.6.1.
Details for the first problem:

Preparation: I start the VM with the config at the end of this email
(memory=400, maxmem=4000), wait some time, then 'xl mem-set' to a size just
above the actually used memory (about 200MB in most cases). Then 'sleep 1'
and 'xl save'.

When I want to restore that domain, I take the initial config file, replace
the memory setting with the size used in 'xl mem-set' above, and call
'xl restore' with that config. It fails with this error:
---
Loading new save file /var/run/qubes/current-savefile (new xl fmt info
0x0/0x0/849)
Savefile contains xl domain config
xc: detail: xc_domain_restore start: p2m_size = fa800
xc: detail: Failed allocation for dom 51: 1024 extents of order 0
xc: error: Failed to allocate memory for batch.!: Internal error
xc: detail: Restore exit with rc=1
libxl: error: libxl_dom.c:313:libxl__domain_restore_common restoring domain:
Resource temporarily unavailable
cannot (re-)build domain: -3
libxl: error: libxl.c:713:libxl_domain_destroy non-existant domain 51
---
With memory set back to 400 (or slightly lower, like 380) the restore
succeeds, but the second problem still happens.
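To make the sequence concrete, the whole reproduction looks roughly like
this (a sketch - the savefile path and domain name are from my setup; the
exact sizes, 'dvm.cfg', and the temporary restore config are illustrative):
---
# save side:
xl create dvm.cfg                        # memory = 400, maxmem = 4000
# ... wait some time for the guest to settle ...
xl mem-set fedora-20-x64-dvm 200         # about the memory actually in use
sleep 1
xl save fedora-20-x64-dvm /var/run/qubes/current-savefile

# restore side, with memory= lowered to match the mem-set value:
sed 's/^memory = .*/memory = 200/' dvm.cfg > dvm-restore.cfg
xl restore dvm-restore.cfg /var/run/qubes/current-savefile   # fails as above
---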
I've bisected the first problem down to this commit:
commit cd9151e26d31048b2b5e00fd02e110e07d2200c9
xen/balloon: set a mapping for ballooned out pages
I've checked that the problem still exists in v3.14.4.
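(For completeness, the bisect itself was the standard procedure - a sketch,
with the endpoints assumed from "worked in 3.11, broken in 3.12+":)
---
git bisect start v3.12 v3.11
# at each step: build and boot the domU kernel, run the mem-set + save +
# restore sequence above, then mark the result:
git bisect good    # restore succeeded
git bisect bad     # restore failed with the allocation error
---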
Any idea how to fix this?
The domain config:
---
kernel="/var/lib/qubes/vm-kernels/3.12.18-1/vmlinuz"
ramdisk="/var/lib/qubes/vm-kernels/3.12.18-1/initramfs"
extra="ro nomodeset console=hvc0 rd_NO_PLYMOUTH nopat"
root="/dev/mapper/dmroot"
tsc_mode = 2
memory = 400
maxmem = 4000
name = "fedora-20-x64-dvm"
disk = [
'script:snapshot:/var/lib/qubes/vm-templates/fedora-20-x64/root.img:/var/lib/qubes/vm-templates/fedora-20-x64/root-cow.img,xvda,r',
'script:file:/var/lib/qubes/appvms/fedora-20-x64-dvm/private.img,xvdb,w',
'script:file:/var/lib/qubes/appvms/fedora-20-x64-dvm/volatile.img,xvdc,w',
'script:file:/var/lib/qubes/vm-kernels/3.12.18-1/modules.img,xvdd,r',
]
vif = [
'mac=00:16:3E:5E:6C:02,script=/etc/xen/scripts/vif-route-qubes,ip=10.137.2.4,backend=firewallvm'
]
pci = [ ]
vcpus = 1
on_poweroff = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'
---
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
* Re: [bisected] balloon driver broken in 3.12+ after save+restore
From: Marek Marczykowski-Górecki @ 2014-06-27 0:42 UTC
To: xen-devel@lists.xen.org
On 22.05.2014 03:31, Marek Marczykowski-Górecki wrote:
> [...]
> I've bisected the first problem down to this commit:
> commit cd9151e26d31048b2b5e00fd02e110e07d2200c9
> xen/balloon: set a mapping for ballooned out pages
>
> I've checked that the problem still exists in v3.14.4.
>
> Any idea how to fix this?
Anyone?
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
* Re: balloon driver broken in 3.12+ after save+restore
From: David Vrabel @ 2014-06-27 9:51 UTC
To: Marek Marczykowski-Górecki, xen-devel@lists.xen.org
On 22/05/14 02:31, Marek Marczykowski-Górecki wrote:
> I have a problem with the balloon driver after/during restoring a saved
> domain. There are two symptoms:
>
> 1. When the domain was 'xl mem-set <some size smaller than initial>' just
> before the save, it still needs the initial memory size to restore.
> [...]
> 2. A restored domain sometimes (most of the time) does not want to balloon
> down.
> [...]
> I've bisected the first problem down to this commit:
> commit cd9151e26d31048b2b5e00fd02e110e07d2200c9
> xen/balloon: set a mapping for ballooned out pages
Sorry for the delay. I somehow missed this.
This is likely caused by the balloon driver creating multiple entries
in the p2m all pointing to the MFNs of the scratch pages. These
duplicates are de-duped on save/restore.
I suspect your 2nd issue may also be caused by this.
Can you try this patch, please?
8<----------------------------------------------
xen/balloon: set ballooned out pages as invalid in p2m
Since cd9151e26d31048b2b5e00fd02e110e07d2200c9 (xen/balloon: set a
mapping for ballooned out pages), a ballooned out page has its entry
in the p2m set to the MFN of one of the scratch pages. This means that
the p2m will contain many entries pointing to the same MFN.

During a domain save, these many-to-one entries are not considered and
the scratch page is saved multiple times. On restore, the ballooned
pages are populated with new frames and the domain may use up its
allocation before all pages can be restored.

Set ballooned out pages as INVALID_P2M_ENTRY in the p2m (as they were
before), preventing them from being saved and re-populated on restore.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/balloon.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index b7a506f..5c660c7 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -426,20 +426,18 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
                 * p2m are consistent.
                 */
                if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-                       unsigned long p;
-                       struct page *scratch_page = get_balloon_scratch_page();
-
                        if (!PageHighMem(page)) {
+                               struct page *scratch_page = get_balloon_scratch_page();
+
                                ret = HYPERVISOR_update_va_mapping(
                                        (unsigned long)__va(pfn << PAGE_SHIFT),
                                        pfn_pte(page_to_pfn(scratch_page),
                                                PAGE_KERNEL_RO), 0);
                                BUG_ON(ret);
-                       }
-                       p = page_to_pfn(scratch_page);
-                       __set_phys_to_machine(pfn, pfn_to_mfn(p));

-                       put_balloon_scratch_page();
+                               put_balloon_scratch_page();
+                       }
+                       __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
                }
 #endif
--
1.7.10.4
* Re: balloon driver broken in 3.12+ after save+restore
From: Marek Marczykowski-Górecki @ 2014-06-27 13:57 UTC
To: David Vrabel, xen-devel@lists.xen.org
On 27.06.2014 11:51, David Vrabel wrote:
> [...]
> This is likely caused by the balloon driver creating multiple entries
> in the p2m all pointing to the MFNs of the scratch pages. These
> duplicates are de-duped on save/restore.
>
> I suspect your 2nd issue may also be caused by this.
>
> Can you try this patch, please?
Looks to be the right fix, thanks!
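For the record, the re-test was essentially the reproduction sequence from
my first mail, run with the patched domU kernel (a sketch; names, sizes, and
the restore config file are illustrative):
---
xl mem-set fedora-20-x64-dvm 200
sleep 1
xl save fedora-20-x64-dvm /var/run/qubes/current-savefile
xl restore dvm-restore.cfg /var/run/qubes/current-savefile   # now succeeds
---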
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?