From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Re: [PATCH][HVM] fix VNIF restore failure on HVM guest with heavy workload
Date: Wed, 11 Apr 2007 08:41:25 +0100
Message-ID:
References: <20070411072014.GK4593@edwin-srv.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20070411072014.GK4593@edwin-srv.sh.intel.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Zhai, Edwin"
Cc: Tim Deegan , Ian Pratt , xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 11/4/07 08:20, "Zhai, Edwin" wrote:

>> What happens if an interrupt is being processed during save/restore? It
>> would be nice to know what the underlying bug is!
>
> If an pseudo PCI intr occurred after xen_suspend on cpu0, there is definitely
> a crash. I copy this code from original PV driver code.

Yeah, but in that case: (a) it's for a different reason [make sure no
interrupt handler runs that might look at machine addresses in page tables,
mainly]; and (b) it's backed up by the fact that all other CPUs have been
offlined or stop_machined().

Do you have a crash oops message? I'm just a little concerned we may end up
masking a real save/restore bug here, which we may as well fix while you can
repro.

> SMP is a headache for PV drv save/restore on HVM. Even we disable intr on all
> cpus, PV driver on other cpu may still access low level service after
> xen_suspend on cpu0. smp_suspend is used for PV drv on PV domain, which is not
> suitable for HVM as we need the transparency to guest.
>
> Do we need lightweight stop_machine_run in this case, i.e. make other cpu
> sleep?

I'm thinking irq_disable() of the pci-platform irq, coupled with a
smp_call_function() to make the other CPUs spin with interrupts disabled.

 -- Keir
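
To make the "other CPUs spin" idea concrete, here is a rough userspace
analogy in C. It is not kernel code and not the actual patch: pthreads stand
in for the secondary CPUs, a shared flag stands in for the rendezvous that a
function run via smp_call_function() would wait on, and nothing here can
model masking the platform IRQ or running with interrupts disabled. All names
(run_with_others_parked, fake_suspend, MAX_SECONDARY) are hypothetical and
exist only for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_SECONDARY 8            /* hypothetical bound for this sketch */

static atomic_int  parked;         /* secondaries currently spinning */
static atomic_bool release_flag;   /* set by "CPU0" to let them continue */

/* Stand-in for the function each other CPU would be asked to run. */
static void *spin_until_released(void *unused)
{
    (void)unused;
    atomic_fetch_add(&parked, 1);          /* announce: this "CPU" is parked */
    while (!atomic_load(&release_flag))
        sched_yield();                     /* a kernel would use cpu_relax() */
    atomic_fetch_sub(&parked, 1);
    return NULL;
}

/* Example critical section: record how many secondaries are parked now. */
void fake_suspend(void *data)
{
    *(int *)data = atomic_load(&parked);
}

/*
 * Park nsecondary threads, run critical(data) on the calling thread while
 * they spin, then release them. Returns the number of secondaries that were
 * parked just before the critical section started.
 */
int run_with_others_parked(int nsecondary, void (*critical)(void *), void *data)
{
    pthread_t tid[MAX_SECONDARY];
    int i, rc = 0, seen;

    assert(nsecondary >= 0 && nsecondary <= MAX_SECONDARY);
    atomic_store(&parked, 0);
    atomic_store(&release_flag, false);

    for (i = 0; i < nsecondary; i++) {
        rc = pthread_create(&tid[i], NULL, spin_until_released, NULL);
        assert(rc == 0);
    }

    /* Do not start until every secondary has checked in. */
    while (atomic_load(&parked) != nsecondary)
        sched_yield();

    seen = atomic_load(&parked);
    critical(data);                        /* the suspend step would go here */

    atomic_store(&release_flag, true);     /* let the secondaries resume */
    for (i = 0; i < nsecondary; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return seen;
}
```

The point of the check-in counter is that the initiating side waits until
every other "CPU" is known to be spinning before it enters the critical
section, so none of them can touch low-level services while it runs. Calling
run_with_others_parked(3, fake_suspend, &n) should leave n equal to 3.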