* [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
@ 2007-04-10 16:47 Zhai, Edwin
2007-04-10 19:16 ` Keir Fraser
0 siblings, 1 reply; 13+ messages in thread
From: Zhai, Edwin @ 2007-04-10 16:47 UTC (permalink / raw)
To: Ian Pratt, Keir Fraser, Tim Deegan; +Cc: xen-devel, edwin.zhai
[PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
VNIF generates many interrupts when a guest is saved/restored under network load, so the suspend handler needs to be kept from being interrupted.
diff -r 2fab1ec4dc74 linux-2.6-xen-sparse/drivers/xen/core/reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Mon Apr 09 16:35:01 2007 +0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/reboot.c Tue Apr 10 16:44:52 2007 +0800
@@ -44,12 +44,14 @@ int __xen_suspend(int fast_suspend)
 int __xen_suspend(int fast_suspend)
 {
 	xenbus_suspend();
+	local_irq_disable();
 	platform_pci_suspend();
 	/* pvdrv sleep in this hyper-call when save */
 	HYPERVISOR_shutdown(SHUTDOWN_suspend);
 	platform_pci_resume();
+	local_irq_enable();
 	xenbus_resume();
 	printk("PV stuff on HVM resume successfully!\n");
* Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
2007-04-10 16:47 [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload Zhai, Edwin
@ 2007-04-10 19:16 ` Keir Fraser
2007-04-11 7:20 ` Zhai, Edwin
0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2007-04-10 19:16 UTC (permalink / raw)
To: Zhai, Edwin, Ian Pratt, Tim Deegan; +Cc: xen-devel
On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
>
> Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
>
> VNIF has many intrs when save/restore with net workload, so need keep handler
> from intrs
What happens if an interrupt is being processed during save/restore? It
would be nice to know what the underlying bug is!
That said, it may well make sense to somehow disable interrupt handling
across save/restore. Unfortunately your patch is insufficient since we could
handle event-channel interrupts on any VCPU (the irq's affinity can be
changed outside our control if it is routed through the virtual IOAPIC, and
if e.g. the userspace irqbalance daemon is running).
I wanted to use stop_machine_run() but unfortunately it isn't exported to
modules. :-( irq_disable() may do the right thing for us though.
-- Keir
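The objection above (masking interrupts on the local CPU does not stop an event-channel interrupt whose affinity has moved to another VCPU) can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct irq_model` and every `model_*` function are hypothetical names invented for the illustration. The line-level mask is modelled loosely on the kernel's `disable_irq()`/`enable_irq()` pair, which is presumably what `irq_disable()` refers to.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model of the two masking levels discussed in the thread. */
struct irq_model {
    bool cpu_irqs_on[MODEL_NR_CPUS]; /* per-CPU flag, as toggled by local_irq_disable() */
    int  line_disable_depth;         /* per-line mask, as counted by disable_irq() */
    int  line_affinity;              /* CPU the interrupt line is currently routed to */
};

static void model_init(struct irq_model *m)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        m->cpu_irqs_on[cpu] = true;
    m->line_disable_depth = 0;
    m->line_affinity = 0;
}

/* Masks interrupts on one CPU only; other CPUs are unaffected. */
static void model_local_irq_disable(struct irq_model *m, int cpu)
{
    m->cpu_irqs_on[cpu] = false;
}

/* Masks the line itself, regardless of which CPU it is routed to. */
static void model_disable_irq(struct irq_model *m)
{
    m->line_disable_depth++;
}

static void model_enable_irq(struct irq_model *m)
{
    assert(m->line_disable_depth > 0);
    m->line_disable_depth--;
}

/* True if the handler for the line could run right now. */
static bool model_handler_can_run(const struct irq_model *m)
{
    return m->line_disable_depth == 0 && m->cpu_irqs_on[m->line_affinity];
}
```

In this model, disabling interrupts on CPU 0 blocks the handler only while the line is routed to CPU 0; once the affinity changes (for example by irqbalance), the handler can run again unless the line itself is masked.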
* Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
2007-04-10 19:16 ` Keir Fraser
@ 2007-04-11 7:20 ` Zhai, Edwin
2007-04-11 7:41 ` Keir Fraser
2007-04-11 8:56 ` Keir Fraser
0 siblings, 2 replies; 13+ messages in thread
From: Zhai, Edwin @ 2007-04-11 7:20 UTC (permalink / raw)
To: Keir Fraser; +Cc: Tim Deegan, Ian Pratt, xen-devel, Zhai, Edwin
On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
> On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>
> > [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
> >
> > Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
> >
> > VNIF has many intrs when save/restore with net workload, so need keep handler
> > from intrs
>
> What happens if an interrupt is being processed during save/restore? It
> would be nice to know what the underlying bug is!
If a pseudo PCI interrupt occurs after xen_suspend on cpu0, there is definitely a
crash. I copied this code from the original PV driver code.
>
> That said, it may well make sense to somehow disable interrupt handling
> across save/restore. Unfortunately your patch is insufficient since we could
> handle event-channel interrupts on any VCPU (the irq's affinity can be
> changed outside our control if it is routed through the virtual IOAPIC, and
> if e.g. the userspace irqbalance daemon is running).
>
> I wanted to use stop_machine_run() but unfortunately it isn't exported to
> modules. :-( irq_disable() may do the right thing for us though.
SMP is a headache for PV driver save/restore on HVM. Even if we disable interrupts
on all CPUs, a PV driver on another CPU may still access low-level services after
xen_suspend on cpu0.
smp_suspend is used for PV drivers in a PV domain, which is not suitable for HVM as
we need transparency to the guest.
Do we need a lightweight stop_machine_run in this case, i.e. make the other CPUs sleep?
>
> -- Keir
>
--
best rgds,
edwin
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
2007-04-11 7:20 ` Zhai, Edwin
@ 2007-04-11 7:41 ` Keir Fraser
2007-04-11 8:56 ` Keir Fraser
1 sibling, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2007-04-11 7:41 UTC (permalink / raw)
To: Zhai, Edwin; +Cc: Tim Deegan, Ian Pratt, xen-devel
On 11/4/07 08:20, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If an pseudo PCI intr occurred after xen_suspend on cpu0, there is definitely
> a crash. I copy this code from original PV driver code.
Yeah, but in that case: (a) it's for a different reason [make sure no
interrupt handler runs that might look at machine addresses in page tables,
mainly]; and (b) it's backed up by the fact that all other CPUs have been
offlined or stop_machined().
Do you have a crash oops message? I'm just a little concerned we may end up
masking a real save/restore bug here, which we may as well fix while you can
repro.
> SMP is a headache for PV drv save/restore on HVM. Even we disable intr on all
> cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0. smp_suspend is used for PV drv on PV domain, which is not
> suitable for HVM as we need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?
I'm thinking irq_disable() of the pci-platform irq, coupled with a
smp_call_function() to make the other CPUs spin with interrupts disabled.
-- Keir
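The rendezvous proposed above (secondary CPUs spin with interrupts disabled until the suspending CPU has resumed) can be sketched as a deterministic, single-threaded model. This is an assumption-laden illustration rather than the change that was actually committed: `struct rendezvous` and all function names are hypothetical, and a real implementation would run `secondary_enter()` on each CPU via `smp_call_function()` and spin on a shared flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

enum cpu_state { CPU_RUNNING, CPU_PARKED };

/* Hypothetical bookkeeping for the rendezvous; CPU 0 is the suspending CPU. */
struct rendezvous {
    enum cpu_state state[MODEL_NR_CPUS];
    int  parked;   /* number of secondary CPUs currently spinning */
    bool release;  /* set by CPU 0 once the suspend hypercall has returned */
};

static void rendezvous_init(struct rendezvous *r)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        r->state[cpu] = CPU_RUNNING;
    r->parked = 0;
    r->release = false;
}

/* What a secondary CPU would do on receiving the cross-call:
 * disable its interrupts and start spinning. */
static void secondary_enter(struct rendezvous *r, int cpu)
{
    if (r->state[cpu] == CPU_RUNNING) {
        r->state[cpu] = CPU_PARKED;
        r->parked++;
    }
}

/* One iteration of the secondary's spin loop: leave only after release. */
static void secondary_poll(struct rendezvous *r, int cpu)
{
    if (r->state[cpu] == CPU_PARKED && r->release) {
        r->state[cpu] = CPU_RUNNING;
        r->parked--;
    }
}

/* CPU 0 may suspend only once every other CPU is parked. */
static bool safe_to_suspend(const struct rendezvous *r)
{
    return r->parked == MODEL_NR_CPUS - 1;
}

/* Called by CPU 0 after resume to let the other CPUs continue. */
static void primary_release(struct rendezvous *r)
{
    r->release = true;
}
```

The point of the model is the ordering: the suspend hypercall is issued only after `safe_to_suspend()` holds, and no parked CPU resumes driver work before `primary_release()`.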
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
2007-04-11 7:20 ` Zhai, Edwin
2007-04-11 7:41 ` Keir Fraser
@ 2007-04-11 8:56 ` Keir Fraser
2007-04-11 16:24 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest " Zhao, Fan
1 sibling, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2007-04-11 8:56 UTC (permalink / raw)
To: Zhai, Edwin; +Cc: Tim Deegan, Ian Pratt, xen-devel
FYI, the next changeset worth testing or fixing is r14795:6e7ef794cdbc. I've
made a *lot* of changes in the last 24 hours. I've tried a few save/restores
under block and net load with no observed problems.
-- Keir
On 11/4/07 08:20, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
>> On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>>
>>> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
>>>
>>> Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
>>>
>>> VNIF has many intrs when save/restore with net workload, so need keep
>>> handler
>>> from intrs
>>
>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If an pseudo PCI intr occurred after xen_suspend on cpu0, there is definitely
> a
> crash. I copy this code from original PV driver code.
>
>>
>> That said, it may well make sense to somehow disable interrupt handling
>> across save/restore. Unfortunately your patch is insufficient since we could
>> handle event-channel interrupts on any VCPU (the irq's affinity can be
>> changed outside our control if it is routed through the virtual IOAPIC, and
>> if e.g. the userspace irqbalance daemon is running).
>>
>> I wanted to use stop_machine_run() but unfortunately it isn't exported to
>> modules. :-( irq_disable() may do the right thing for us though.
>
> SMP is a headache for PV drv save/restore on HVM. Even we disable intr on all
> cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0.
>
> smp_suspend is used for PV drv on PV domain, which is not suitable for HVM as
> we
> need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?
>
>
>>
>> -- Keir
>>
* RE: Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
2007-04-11 8:56 ` Keir Fraser
@ 2007-04-11 16:24 ` Zhao, Fan
2007-04-11 16:54 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith " Steven Hand
2007-04-11 17:32 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with " Keir Fraser
0 siblings, 2 replies; 13+ messages in thread
From: Zhao, Fan @ 2007-04-11 16:24 UTC (permalink / raw)
To: Keir Fraser, Zhai, Edwin; +Cc: Tim Deegan, Ian Pratt, xen-devel
Hi Keir,
I noticed that with cset 14773, if I use xm mem-set to adjust the memory of an HVM guest that has the balloon driver loaded and then save the guest, xm save fails, and so does xm migrate. A white window pops up, and the guest is still listed by xm li. Will your fixes also cover this issue? Thanks!
Best regards,
Fan
-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
Sent: April 11, 2007 16:57
To: Zhai, Edwin
Cc: Tim Deegan; Ian Pratt; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
FYI, the next changeset worth testing or fixing is r14795:6e7ef794cdbc. I've
made a *lot* of changes in the last 24 hours. I've tried a few save/restores
under block and net load with no observed problems.
-- Keir
On 11/4/07 08:20, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
>> On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>>
>>> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
>>>
>>> Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
>>>
>>> VNIF has many intrs when save/restore with net workload, so need keep
>>> handler
>>> from intrs
>>
>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If an pseudo PCI intr occurred after xen_suspend on cpu0, there is definitely
> a
> crash. I copy this code from original PV driver code.
>
>>
>> That said, it may well make sense to somehow disable interrupt handling
>> across save/restore. Unfortunately your patch is insufficient since we could
>> handle event-channel interrupts on any VCPU (the irq's affinity can be
>> changed outside our control if it is routed through the virtual IOAPIC, and
>> if e.g. the userspace irqbalance daemon is running).
>>
>> I wanted to use stop_machine_run() but unfortunately it isn't exported to
>> modules. :-( irq_disable() may do the right thing for us though.
>
> SMP is a headache for PV drv save/restore on HVM. Even we disable intr on all
> cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0.
>
> smp_suspend is used for PV drv on PV domain, which is not suitable for HVM as
> we
> need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?
>
>
>>
>> -- Keir
>>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith heavy workload
2007-04-11 16:24 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest " Zhao, Fan
@ 2007-04-11 16:54 ` Steven Hand
2007-04-12 18:46 ` Mark Williamson
2007-04-11 17:32 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with " Keir Fraser
1 sibling, 1 reply; 13+ messages in thread
From: Steven Hand @ 2007-04-11 16:54 UTC (permalink / raw)
To: Zhao, Fan, Keir Fraser, Zhai, Edwin
Cc: Steven Hand, Tim Deegan, Ian Pratt, xen-devel
Just FYI - this is something I tested successfully last week. Not sure
if anything has been changed in recent changesets but worth checking
up - can you post the output from /var/log/xen/xend.log ?
Secondly: there is a known issue with save/restore of ballooned domains
(HVM or PV) where the ballooning is done from within the guest (e.g.
by echoing to /proc/xen/balloon). Since this doesn't update the memory
target within xenstore, you'll end up 'reverting' the guest memory size to
that last set via xm mem-set. The 'fix' is not to do that, i.e. only use the
xm or XenAPI to request ballooning.
cheers,
S.
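The 'reverting' behaviour described above can be pictured with a toy model. It is a sketch under stated assumptions: `struct guest_mem` and the three functions are hypothetical, and the model simply assumes that after a restore the guest size follows the xenstore memory target again.

```c
#include <assert.h>

/* Hypothetical view of one guest's memory bookkeeping. */
struct guest_mem {
    unsigned long xenstore_target_kb; /* models the memory/target node in xenstore */
    unsigned long current_kb;         /* what the balloon driver has actually settled on */
};

/* Toolstack path (xm mem-set): the target is recorded in xenstore
 * and the balloon driver follows it. */
static void toolstack_mem_set(struct guest_mem *g, unsigned long kb)
{
    g->xenstore_target_kb = kb;
    g->current_kb = kb;
}

/* In-guest path (echo to /proc/xen/balloon): the guest changes size,
 * but nothing is written back to xenstore. */
static void guest_proc_balloon(struct guest_mem *g, unsigned long kb)
{
    g->current_kb = kb;
}

/* Assumed restore behaviour: the guest ends up at the xenstore target. */
static void guest_restore(struct guest_mem *g)
{
    g->current_kb = g->xenstore_target_kb;
}
```

For example, after `toolstack_mem_set(&g, 262144)` followed by `guest_proc_balloon(&g, 131072)`, a restore in this model brings the guest back to 262144 kB, the value last set through the toolstack.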
----- Original Message -----
From: "Zhao, Fan" <fan.zhao@intel.com>
To: "Keir Fraser" <keir@xensource.com>; "Zhai, Edwin" <edwin.zhai@intel.com>
Cc: "Tim Deegan" <Tim.Deegan@xensource.com>; "Ian Pratt"
<Ian.Pratt@cl.cam.ac.uk>; <xen-devel@lists.xensource.com>
Sent: Wednesday, April 11, 2007 5:24 PM
Subject: RE: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on
HVMguestwith heavy workload
Hi Keir,
I noticed that with cset 14773, if I use xm mem-set to adjust the memory of
hvm guest with balloon driver by xm mem-set, and then save the guest, the xm
save will fail, so does xm migrate. A white window will pop up, and the
guest still exists through xm li. So will your great fixes also include the
fixing for this issue? Thanks!
Best regards,
Fan
-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
Sent: April 11, 2007 16:57
To: Zhai, Edwin
Cc: Tim Deegan; Ian Pratt; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on
HVMguest with heavy workload
FYI, the next changeset worth testing or fixing is r14795:6e7ef794cdbc. I've
made a *lot* of changes in the last 24 hours. I've tried a few save/restores
under block and net load with no observed problems.
-- Keir
On 11/4/07 08:20, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
>> On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
>>
>>> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
>>>
>>> Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
>>>
>>> VNIF has many intrs when save/restore with net workload, so need keep
>>> handler
>>> from intrs
>>
>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If an pseudo PCI intr occurred after xen_suspend on cpu0, there is
> definitely
> a
> crash. I copy this code from original PV driver code.
>
>>
>> That said, it may well make sense to somehow disable interrupt handling
>> across save/restore. Unfortunately your patch is insufficient since we
>> could
>> handle event-channel interrupts on any VCPU (the irq's affinity can be
>> changed outside our control if it is routed through the virtual IOAPIC,
>> and
>> if e.g. the userspace irqbalance daemon is running).
>>
>> I wanted to use stop_machine_run() but unfortunately it isn't exported to
>> modules. :-( irq_disable() may do the right thing for us though.
>
> SMP is a headache for PV drv save/restore on HVM. Even we disable intr on
> all
> cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0.
>
> smp_suspend is used for PV drv on PV domain, which is not suitable for HVM
> as
> we
> need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?
>
>
>>
>> -- Keir
>>
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
2007-04-11 16:24 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest " Zhao, Fan
2007-04-11 16:54 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith " Steven Hand
@ 2007-04-11 17:32 ` Keir Fraser
2007-04-12 2:47 ` Zhao, Fan
1 sibling, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2007-04-11 17:32 UTC (permalink / raw)
To: Zhao, Fan, Zhai, Edwin; +Cc: Tim Deegan, Ian Pratt, xen-devel
On 11/4/07 17:24, "Zhao, Fan" <fan.zhao@intel.com> wrote:
> I noticed that with cset 14773, if I use xm mem-set to adjust the memory of
> hvm guest with balloon driver by xm mem-set, and then save the guest, the xm
> save will fail, so does xm migrate. A white window will pop up, and the guest
> still exists through xm li. So will your great fixes also include the fixing
> for this issue? Thanks!
It works for me (Linux guest, initial alloc 256MB, ballooned to 128MB).
Is it definitely the save that fails for you? Is there any interesting
output in xend.log?
-- Keir
* RE: Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
2007-04-11 17:32 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with " Keir Fraser
@ 2007-04-12 2:47 ` Zhao, Fan
2007-04-12 7:06 ` Keir Fraser
0 siblings, 1 reply; 13+ messages in thread
From: Zhao, Fan @ 2007-04-12 2:47 UTC (permalink / raw)
To: Keir Fraser, Zhai, Edwin; +Cc: Tim Deegan, Ian Pratt, xen-devel
Hi Keir,
A correction: the symptom is that the guest cannot be saved, and the guest console prints "PV stuff on HVM resume successfully!" as soon as the xm save command is entered. This happens with cset 14773 only on an ia32e guest. I noticed that it reproduces whenever the PV modules have been inserted in the guest, even without running xm mem-set. With the latest cset 14797, the PV drivers fail to build on the ia32e platform, so I cannot test it.
Xend.log shows:
[2007-04-12 11:40:09 4875] DEBUG (XendDomainInfo:824) Storing domain details: {'console/port': '6', 'cpu/3/availability': 'online', 'name': 'migrating-ExampleHVMDomain', 'console/limit': '1048576', 'cpu/2/availability': 'online', 'vm': '/vm/ba2d6693-56eb-22cc-2b51-cc0643e37d32', 'domid': '1', 'cpu/0/availability': 'online', 'memory/target': '262144', 'control/platform-feature-multiprocessor-suspend': '1', 'store/ring-ref': '65534', 'cpu/1/availability': 'online', 'store/port': '5'}
[2007-04-12 11:40:09 4875] INFO (XendCheckpoint:81) save hvm domain
[2007-04-12 11:40:09 4875] DEBUG (XendCheckpoint:95) [xc_save]: /usr/lib64/xen/bin/xc_save 22 1 0 0 4
[2007-04-12 11:40:09 4875] DEBUG (XendCheckpoint:307) suspend
[2007-04-12 11:40:09 4875] DEBUG (XendCheckpoint:98) In saveInputHandler suspend
[2007-04-12 11:40:09 4875] DEBUG (XendCheckpoint:100) Suspending 1 ...
[2007-04-12 11:40:09 4875] DEBUG (XendDomainInfo:439) XendDomainInfo.shutdown(suspend)
[2007-04-12 11:40:09 4875] INFO (XendCheckpoint:336) xc_hvm_save: dom=1, max_iters=0, max_factor=0, flags=0x4, live=0, debug=0.
[2007-04-12 11:40:09 4875] DEBUG (XendDomainInfo:905) XendDomainInfo.handleShutdownWatch
[2007-04-12 11:40:09 4875] INFO (XendCheckpoint:336) saved hvm domain info: max_memkb=0x44000, nr_pages=0x107e0
[2007-04-12 11:40:09 4875] DEBUG (XendDomainInfo:905) XendDomainInfo.handleShutdownWatch
Best regards,
Fan
-----Original Message-----
From: Keir Fraser [mailto:keir@xensource.com]
Sent: April 12, 2007 1:33
To: Zhao, Fan; Zhai, Edwin
Cc: Tim Deegan; Ian Pratt; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
On 11/4/07 17:24, "Zhao, Fan" <fan.zhao@intel.com> wrote:
> I noticed that with cset 14773, if I use xm mem-set to adjust the memory of
> hvm guest with balloon driver by xm mem-set, and then save the guest, the xm
> save will fail, so does xm migrate. A white window will pop up, and the guest
> still exists through xm li. So will your great fixes also include the fixing
> for this issue? Thanks!
It works for me (Linux guest, initial alloc 256MB, ballooned to 128MB).
Is it definitely the save that fails for you? Is there any interesting
output in xend.log?
-- Keir
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with heavy workload
2007-04-12 2:47 ` Zhao, Fan
@ 2007-04-12 7:06 ` Keir Fraser
0 siblings, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2007-04-12 7:06 UTC (permalink / raw)
To: Zhao, Fan, Zhai, Edwin; +Cc: xen-devel
On 12/4/07 03:47, "Zhao, Fan" <fan.zhao@intel.com> wrote:
> Make a correction, the phenomenon is that the guest could not be saved and the
> guest console print "PV stuff on HVM resume successfully!" as soon as xm save
> command was typed. This happens with cset 14773 only on ia32e guest, and I
> noticed that this will be reproduced when the pv modules have been inserted in
> the guest, and even not need to do the xm mem-set. For the latest cset 14797,
> pv drivers built failure on ia32e platform and I can not try.
r14773 isn't interesting now. Please retry with r14815 or later.
-- Keir
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith heavy workload
2007-04-11 16:54 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith " Steven Hand
@ 2007-04-12 18:46 ` Mark Williamson
2007-04-12 18:51 ` Steven Hand
0 siblings, 1 reply; 13+ messages in thread
From: Mark Williamson @ 2007-04-12 18:46 UTC (permalink / raw)
To: xen-devel
Cc: Tim Deegan, Zhai, Edwin, Steven Hand, Ian Pratt, Zhao, Fan,
Keir Fraser
> Secondly: there is a known issue with save/restore of ballooned domains
> (HVM or PV) where the ballooning is done from within the guest (e.g.
> by echoing to /proc/xen/balloon). Since this doesn't update the memory
> target within xenstore, you'll end up 'reverting' the guest memory size to
> that last set via xm mem-set. The 'fix' is not to do that, i.e. only use
> the xm or XenAPI to request ballooning.
Is there any reason not to have the balloon driver write back to Xenstore if it's
used in this way? Or is it just waiting for a patch to do that?
Cheers,
Mark
>
> cheers,
>
> S.
>
>
>
>
> ----- Original Message -----
> From: "Zhao, Fan" <fan.zhao@intel.com>
> To: "Keir Fraser" <keir@xensource.com>; "Zhai, Edwin"
> <edwin.zhai@intel.com> Cc: "Tim Deegan" <Tim.Deegan@xensource.com>; "Ian
> Pratt"
> <Ian.Pratt@cl.cam.ac.uk>; <xen-devel@lists.xensource.com>
> Sent: Wednesday, April 11, 2007 5:24 PM
> Subject: RE: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on
> HVMguestwith heavy workload
>
>
> Hi Keir,
> I noticed that with cset 14773, if I use xm mem-set to adjust the memory of
> hvm guest with balloon driver by xm mem-set, and then save the guest, the
> xm save will fail, so does xm migrate. A white window will pop up, and the
> guest still exists through xm li. So will your great fixes also include the
> fixing for this issue? Thanks!
>
> Best regards,
> Fan
>
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: April 11, 2007 16:57
> To: Zhai, Edwin
> Cc: Tim Deegan; Ian Pratt; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Re: [PATCH][HVM] fix VNIF restore failure on
> HVMguest with heavy workload
>
>
> FYI, the next changeset worth testing or fixing is r14795:6e7ef794cdbc.
> I've made a *lot* of changes in the last 24 hours. I've tried a few
> save/restores under block and net load with no observed problems.
>
> -- Keir
>
> On 11/4/07 08:20, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> > On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
> >> On 10/4/07 17:47, "Zhai, Edwin" <edwin.zhai@intel.com> wrote:
> >>> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
> >>>
> >>> Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
> >>>
> >>> VNIF has many intrs when save/restore with net workload, so need keep
> >>> handler
> >>> from intrs
> >>
> >> What happens if an interrupt is being processed during save/restore? It
> >> would be nice to know what the underlying bug is!
> >
> > If an pseudo PCI intr occurred after xen_suspend on cpu0, there is
> > definitely
> > a
> > crash. I copy this code from original PV driver code.
> >
> >> That said, it may well make sense to somehow disable interrupt handling
> >> across save/restore. Unfortunately your patch is insufficient since we
> >> could
> >> handle event-channel interrupts on any VCPU (the irq's affinity can be
> >> changed outside our control if it is routed through the virtual IOAPIC,
> >> and
> >> if e.g. the userspace irqbalance daemon is running).
> >>
> >> I wanted to use stop_machine_run() but unfortunately it isn't exported
> >> to modules. :-( irq_disable() may do the right thing for us though.
> >
> > SMP is a headache for PV drv save/restore on HVM. Even we disable intr on
> > all
> > cpus, PV driver on other cpu may still access low level service after
> > xen_suspend on cpu0.
> >
> > smp_suspend is used for PV drv on PV domain, which is not suitable for
> > HVM as
> > we
> > need the transparency to guest.
> >
> > Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> > sleep?
> >
> >> -- Keir
--
Dave: Just a question. What use is a unicyle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith heavy workload
2007-04-12 18:46 ` Mark Williamson
@ 2007-04-12 18:51 ` Steven Hand
2007-04-13 9:42 ` Keir Fraser
0 siblings, 1 reply; 13+ messages in thread
From: Steven Hand @ 2007-04-12 18:51 UTC (permalink / raw)
To: Mark Williamson
Cc: Tim Deegan, Steven.Hand, Zhai, Edwin, Keir Fraser, Ian Pratt,
xen-devel, Zhao, Fan, Steven Hand
>> Secondly: there is a known issue with save/restore of ballooned domains
>> (HVM or PV) where the ballooning is done from within the guest (e.g.
>> by echoing to /proc/xen/balloon). Since this doesn't update the memory
>> target within xenstore, you'll end up 'reverting' the guest memory size to
>> that last set via xm mem-set. The 'fix' is not to do that, i.e. only use
>> the xm or XenAPI to request ballooning.
>
>Any reason not to have the balloon driver write back to Xenstore if it's
>used in this way. Or is it just waiting for a patch to do that?
You also need xend to watch the node and update its internal structures,
but otherwise that'd be fine.
cheers,
S.
* Re: Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith heavy workload
2007-04-12 18:51 ` Steven Hand
@ 2007-04-13 9:42 ` Keir Fraser
0 siblings, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2007-04-13 9:42 UTC (permalink / raw)
To: Steven Hand, Mark Williamson
Cc: Tim Deegan, Zhai, Edwin, Ian Pratt, xen-devel, Zhao, Fan,
Keir Fraser
On 12/4/07 19:51, "Steven Hand" <Steven.Hand@cl.cam.ac.uk> wrote:
>> Any reason not to have the balloon driver write back to Xenstore if it's
>> used in this way. Or is it just waiting for a patch to do that?
>
> You also need xend to watch the node and update its internal structures,
> but otherwise that'd be fine.
We're not sure if this is even sensible in all cases. Should an admin memory
setting be overridable by a setting derived from the guest itself?
One sensible middle ground might be for the balloon driver to ignore the
memory-target field in xenstore if the balloon target has ever been
specified via /proc/xen/balloon. This would indicate that the guest is
taking control for its own memory setting and is a simple resolution of the
conflict over whose setting takes precedence. This might be good enough for
those people who would like us to keep the /proc/xen/balloon method (I was
considering killing it off entirely) -- I suspect people either want to
control their guest memory settings inside the guests *or* via 'xm mem-set':
not both.
-- Keir
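The middle ground suggested above (once the target has been set through /proc/xen/balloon, the driver stops honouring the xenstore memory-target) amounts to a small precedence rule. The following is a minimal sketch of that rule only: `struct balloon_state` and both functions are hypothetical and do not correspond to the actual balloon driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical balloon driver state for the precedence rule. */
struct balloon_state {
    unsigned long target_kb;
    bool guest_owns_target; /* set once /proc/xen/balloon has ever been written */
};

/* Write via /proc/xen/balloon: the guest takes control of its own target. */
static void balloon_proc_write(struct balloon_state *b, unsigned long kb)
{
    b->target_kb = kb;
    b->guest_owns_target = true;
}

/* Update arriving from the xenstore memory-target watch.
 * Returns true if the new target was applied, false if it was ignored. */
static bool balloon_xenstore_update(struct balloon_state *b, unsigned long kb)
{
    if (b->guest_owns_target)
        return false;
    b->target_kb = kb;
    return true;
}
```

Under this rule an admin setting made with 'xm mem-set' is applied until the guest first writes /proc/xen/balloon, and ignored afterwards, which resolves the conflict in favour of whichever side the guest owner chose to use.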
Thread overview: 13+ messages
2007-04-10 16:47 [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload Zhai, Edwin
2007-04-10 19:16 ` Keir Fraser
2007-04-11 7:20 ` Zhai, Edwin
2007-04-11 7:41 ` Keir Fraser
2007-04-11 8:56 ` Keir Fraser
2007-04-11 16:24 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest " Zhao, Fan
2007-04-11 16:54 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguestwith " Steven Hand
2007-04-12 18:46 ` Mark Williamson
2007-04-12 18:51 ` Steven Hand
2007-04-13 9:42 ` Keir Fraser
2007-04-11 17:32 ` Re: [PATCH][HVM] fix VNIF restore failure on HVMguest with " Keir Fraser
2007-04-12 2:47 ` Zhao, Fan
2007-04-12 7:06 ` Keir Fraser