From mboxrd@z Thu Jan 1 00:00:00 1970
From: Atom2
Subject: Re: [Xen-users] substantial shutdown delay for PV guests with PCI -passthrough
Date: Wed, 19 Mar 2014 15:03:44 +0100
Message-ID: <5329A3C0.3000609@web2web.at>
References: <5325B828.1060303@web2web.at> <1395050430.4122.29.camel@kazak.uk.xensource.com> <53273B3C.40707@web2web.at> <1395137709.12847.29.camel@kazak.uk.xensource.com> <5328439B.8050807@web2web.at> <1395155249.12847.66.camel@kazak.uk.xensource.com> <5328E403.8010506@web2web.at> <1395228384.10203.65.camel@kazak.uk.xensource.com> <20140319130002.GC8694@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20140319130002.GC8694@phenom.dumpdata.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk , Ian Campbell
Cc: xen-devel , Ian Jackson , Roger Pau Monne , xen-users@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 19.03.14 14:00, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 19, 2014 at 11:26:24AM +0000, Ian Campbell wrote:
>> On Wed, 2014-03-19 at 01:25 +0100, Atom2 wrote:
>>> So it seems that pretty much at the start of the 10s delay the state
>>> changed from 4 to 6 and stays at that value even after the first 10s
>>> delay is over - whatever that means.
>>
>> 4 == Connected
>> 6 == Closed
>>
>> I think what is happening is that the domain is shutting down, which
>> causes pciback to transition to the closed state (because the f.e. went
>> away, so this is a reasonable thing for it to do).
>>
>> The bug appears to be that libxl is trying to "hot unplug" the devices
>> on shutdown when they have already been effectively "cold unplugged" by
>> the domain going down.

I might be wrong, but this behaviour is somewhat reminiscent of (although
not identical to) the bug in the vif-bridge script that I reported some
time ago (see http://xen.markmail.org/thread/auroivzr4vje3bzn ; btw,
discussions there seem to have stalled): the vif-bridge script also tried
to do something (i.e. deleting an i/f from the bridge and bringing down
the i/f) that had obviously already been done by shutting down the guest
domain.

>>
>> Perhaps libxl__device_pci_remove_xenstore should observe that the state
>> is > 4 (hence closing/closed) and not bother doing anything, i.e. only
>> waiting iff the state is <4 (init, connecting etc)? Or unconditionally
>> removing the nodes if state > 4. (perhaps state 7, reconfiguring needs
>> handling here too)
>>
>> Or perhaps the force parameter passed to remove_common (which indicates
>> destroy rather than unplug) ought to be propagated down to this code and
>> $something done with it.
>>
>> Roger, Ian, any thoughts on that?
>
> This reminds me of this bug:
>
> commit 098b1aeaf4d6149953b8f1f8d55c21d85536fbff
> Author: Konrad Rzeszutek Wilk
> Date: Mon Jun 10 16:48:09 2013 -0400
>
> xen/pcifront: Deal with toolstack missing 'XenbusStateClosing' state.
>
> ... snip..
>
> In other words, this 4(Connected)->5(Closing)->4(Connected) state
> was expected, while 4(Connected)->.... anything but 5(Closing)->4(Connected)
> was not. This patch removes that aggressive check and allows
> Xen pcifront to work with the 'xl' toolstack (for one or more
> PCI devices) and with 'xm' toolstack (for more than two PCI
> devices).
>
> But this seems to be a different state issue?
>
> Ariel/Atom2, do you see this behavior with 'xend'? And what is the version of Linux
> kernel you are running as guest?

Hi Konrad - nope, I am using xl; neither xend nor xm is installed on the
machine or involved in any way (I assume that by xend you meant the xm
toolstack as opposed to xl).
The Xen (and xen-tools) version is 4.3.1-r5 and the Linux kernel is
3.11.7-r1 from Gentoo hardened-sources (for both the guest and dom0,
although obviously with different kernel configs). Both the kernel and
xen/xen-tools are the latest stable versions available as ebuilds from
Gentoo.

>>
>> Ian.
>>
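
For illustration only, the check Ian suggests above could be sketched
roughly as follows. This is Python rather than libxl's C and is not taken
from the libxl source. The helper name removal_action and its "wait" /
"remove" return values are made up for this example. The state numbers
follow the XenbusState enumeration in Xen's public io/xenbus.h. Treating
Connected (4) and the reconfiguring states (7/8) as "wait" is an
assumption, since the thread leaves those cases open.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Xenbus device states, numbered as in Xen's public io/xenbus.h."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


def removal_action(state, force=False):
    """Hypothetical helper: decide what a PCI remove path could do.

    Returns "wait" (wait for the backend before cleaning up) or
    "remove" (delete the xenstore nodes right away).
    """
    state = XenbusState(state)
    if force:
        # Destroy rather than unplug: nothing left to wait for.
        return "remove"
    if state in (XenbusState.CLOSING, XenbusState.CLOSED):
        # Backend is already closing or closed (the 4 -> 6 case seen
        # during guest shutdown), so waiting would only add a delay.
        return "remove"
    # Assumption: every other state, including Connected and the
    # reconfiguring states 7/8, keeps the existing wait behaviour.
    return "wait"
```

With this sketch, removal_action(6) gives "remove" while
removal_action(4) gives "wait", which matches the idea of skipping the
10s wait once pciback has already gone to Closed.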