From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Jan Beulich" Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Date: Wed, 01 Jun 2011 15:59:11 +0100 Message-ID: <4DE66FDF0200007800044EC7@vpn.id2.novell.com> References: <4DE60EF8.5060902@redhat.com> <4DE653290200007800044DE4@vpn.id2.novell.com> <4DE648C3.40602@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <4DE648C3.40602@redhat.com> Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Laszlo Ersek Cc: Paolo Bonzini , Drew Jones , "xen-devel@lists.xensource.com" List-Id: xen-devel@lists.xenproject.org >>> On 01.06.11 at 16:12, Laszlo Ersek wrote: > On 06/01/11 14:56, Jan Beulich wrote: >>>>> On 01.06.11 at 12:05, Laszlo Ersek wrote: >=20 >>> domU: igbvf_shutdown() >>> igbvf_suspend() >>> pci_disable_device() >>> pcibios_disable_device() >>> pcibios_disable_resources() >>> pci_write_config_word( >>> dev, PCI_COMMAND, >>> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) >>> ) >>> ... >>> pcifront_bus_write() >>> do_pci_op(XEN_PCI_OP_conf_write) >>> dom0: pciback_do_op() >>> pciback_config_write() >>> conf_space_write() >>> command_write() [via PCI_COMMAND = funcptr] >>> pci_disable_device() >>> disable_msi_mode() >>> dev->msix_enabled =3D 0; >>> >>> The final assignment above precludes c/s 1070 from doing the job. >> >> That seems to be a problem in the PCI subsystem, in that >> disable_msi_mode() is being used from pci_disable_device() (instead >> of pci_disable_msi{,x}()), and does not seem to be done in a similarly >> bad way in newer kernels. >=20 > But does linux-2.6.18-xen avoid the problem? I think its=20 > pci_disable_device() still calls disable_msi_mode(), and the latter = also=20 > only read/writes config words. Yes, and that's certainly broken (in the native base version, not particularly in the Xen pieces). >> So I wonder whether you shouldn't fix >> pci_disable_device() instead, or alternatively move the vector >> de-allocation (and then for MSI and MSI-X) into disable_msi_mode(). >=20 > Yes, I did think of that, however IMO that would introduce exactly = the=20 > problem that you describe below. If either pci_disable_device() or=20 > disable_msi_mode() frees the vector, then re-enabling the device can=20 > face yet another stumbling block. Why - pci_enable_msi{,x}() call msi{,x}_capability_init(), which obtains the vectors. The problem is that calling disable_msi_mode() alone is always a mistake, this should have been a static function just like its counterpart, enable_msi_mode(). >> While the approach you take covers the guest shutdown/restart >> case, what if the guest called its pci_disable_device() at runtime, >> expecting to be able to call pci_enable_{device,msi,msix}() on it >> again later on? Your newly added function would also be called here, >=20 > May I ask how? The function is only called from pciback_reset_device(),= =20 I was just looking at the call tree you provided (and which is still quoted at the top). Oh - you're referring to Dom0 calling the function, whereas I wrote about what happens when guest calls its version of it. >> but in this situation you would have to call into the hypervisor (as >> the domain is still alive), i.e. you could as well call >> msi_remove_pci_irq_vectors(). >=20 > I considered taking c/s 1070 and simply removing the ifs around the=20 > pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by=20 > msi_unmap_pirq(), could find no owner domain for the device and = return=20 > DOMID_SELF. (This is different in upstream I think, see c/s 680.) = Then=20 > msi_unmap_pirq() would try to unmap a PIRQ for dom0. Yeah, when the domain is gone you have this problem. But if you properly removed the vectors while the domain is still there, you shouldn't. >> Additionally, while covering the MSI-X case, I don't see how the >> similar already-mapped case would be cleanly handled for MSI. >=20 > The target is "cleanup after shutdown in such a way that it doesn't = hurt=20 > in other cases either", so: >=20 > - it is assumed the hypervisor takes care of the mappings when the=20 > domain goes away, >=20 > - dom0 has no filtering list for MSI. msi_capability_init() "simply"=20 > asks the hypervisor for a vector, while msix_capability_init() "gets = in=20 > the way". That's all true, but by considering only your one case you may end up making already crappy code worse. > I'm looking for the best (... least wrong) location to place the = cleanup=20 > at; c/s 1070 suggested pciback_reset_device(). For that situation this was the natural place, but you're dealing with an alive domain not being cleaned up after properly during certain operations, and deferring the cleanup until the domain is gone is only going to call for further problems (as outlined earlier, and who knows what other cases we didn't consider). Hence while I realize that from a pragmatic point of view your approach may seem acceptable, it only fixing a particular aspect of a more general problem is not really something I like. Jan