From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
Date: Wed, 11 Apr 2007 09:56:40 +0100
References: <20070411072014.GK4593@edwin-srv.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20070411072014.GK4593@edwin-srv.sh.intel.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Zhai, Edwin"
Cc: Tim Deegan, Ian Pratt, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

FYI, the next changeset worth testing or fixing is r14795:6e7ef794cdbc. I've
made a *lot* of changes in the last 24 hours. I've tried a few save/restores
under block and net load with no observed problems.

 -- Keir

On 11/4/07 08:20, "Zhai, Edwin" wrote:

> On Tue, Apr 10, 2007 at 08:16:04PM +0100, Keir Fraser wrote:
>> On 10/4/07 17:47, "Zhai, Edwin" wrote:
>>
>>> [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
>>>
>>> Signed-off-by: Zhai Edwin
>>>
>>> VNIF has many intrs when save/restore with net workload, so need keep
>>> handler from intrs
>>
>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If a pseudo PCI intr occurred after xen_suspend on cpu0, there is definitely
> a crash. I copied this code from the original PV driver code.
>
>>
>> That said, it may well make sense to somehow disable interrupt handling
>> across save/restore. Unfortunately your patch is insufficient since we could
>> handle event-channel interrupts on any VCPU (the irq's affinity can be
>> changed outside our control if it is routed through the virtual IOAPIC, and
>> if e.g. the userspace irqbalance daemon is running).
>>
>> I wanted to use stop_machine_run() but unfortunately it isn't exported to
>> modules. :-( irq_disable() may do the right thing for us though.
>
> SMP is a headache for PV drv save/restore on HVM. Even if we disable intr on
> all cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0.
>
> smp_suspend is used for PV drv on PV domain, which is not suitable for HVM as
> we need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?
>
>
>>
>>  -- Keir
>>