From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zhai, Edwin"
Subject: Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
Date: Wed, 11 Apr 2007 15:20:14 +0800
Message-ID: <20070411072014.GK4593@edwin-srv.sh.intel.com>
References: <20070410164738.GA24587@edwin-gen.ccr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: Tim Deegan, Ian Pratt, xen-devel@lists.xensource.com, "Zhai, Edwin"
List-Id: xen-devel@lists.xenproject.org

On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
> On 10/4/07 17:47, "Zhai, Edwin" wrote:
>
> > [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
> >
> > Signed-off-by: Zhai Edwin
> >
> > VNIF has many intrs when save/restore with net workload, so need keep handler
> > from intrs
>
> What happens if an interrupt is being processed during save/restore? It
> would be nice to know what the underlying bug is!

If a pseudo PCI intr occurs after xen_suspend on cpu0, there is definitely a
crash. I copied this code from the original PV driver code.

>
> That said, it may well make sense to somehow disable interrupt handling
> across save/restore. Unfortunately your patch is insufficient since we could
> handle event-channel interrupts on any VCPU (the irq's affinity can be
> changed outside our control if it is routed through the virtual IOAPIC, and
> if e.g. the userspace irqbalance daemon is running).
>
> I wanted to use stop_machine_run() but unfortunately it isn't exported to
> modules. :-( irq_disable() may do the right thing for us though.

SMP is a headache for PV driver save/restore on HVM. Even if we disable intrs
on all cpus, the PV driver on another cpu may still access low-level services
after xen_suspend on cpu0.
smp_suspend is used for PV drivers in a PV domain, which is not suitable for
HVM as we need transparency to the guest. Do we need a lightweight
stop_machine_run in this case, i.e. make the other cpus sleep?

>
>  -- Keir
>

--
best rgds,
edwin