From mboxrd@z Thu Jan 1 00:00:00 1970 From: Laszlo Ersek Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Date: Wed, 01 Jun 2011 16:12:19 +0200 Message-ID: <4DE648C3.40602@redhat.com> References: <4DE60EF8.5060902@redhat.com> <4DE653290200007800044DE4@vpn.id2.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <4DE653290200007800044DE4@vpn.id2.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Jan Beulich Cc: Paolo Bonzini , Drew Jones , "xen-devel@lists.xensource.com" List-Id: xen-devel@lists.xenproject.org On 06/01/11 14:56, Jan Beulich wrote: >>>> On 01.06.11 at 12:05, Laszlo Ersek wrote: >> domU: igbvf_shutdown() >> igbvf_suspend() >> pci_disable_device() >> pcibios_disable_device() >> pcibios_disable_resources() >> pci_write_config_word( >> dev, PCI_COMMAND, >> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) >> ) >> ... >> pcifront_bus_write() >> do_pci_op(XEN_PCI_OP_conf_write) >> dom0: pciback_do_op() >> pciback_config_write() >> conf_space_write() >> command_write() [via PCI_COMMAND funcptr] >> pci_disable_device() >> disable_msi_mode() >> dev->msix_enabled = 0; >> >> The final assignment above precludes c/s 1070 from doing the job. > > That seems to be a problem in the PCI subsystem, in that > disable_msi_mode() is being used from pci_disable_device() (instead > of pci_disable_msi{,x}()), and does not seem to be done in a similarly > bad way in newer kernels. But does linux-2.6.18-xen avoid the problem? I think its pci_disable_device() still calls disable_msi_mode(), and the latter also only read/writes config words. > So I wonder whether you shouldn't fix > pci_disable_device() instead, or alternatively move the vector > de-allocation (and then for MSI and MSI-X) into disable_msi_mode(). Yes, I did think of that, however IMO that would introduce exactly the problem that you describe below. If either pci_disable_device() or disable_msi_mode() frees the vector, then re-enabling the device can face yet another stumbling block. I liken this a bit to the UNIX(R) signals -- the allocation/mapping of the MSI-X vectors is the "signal disposition" (= signal action, signal delivery), while the PCI dev's configuration -- whether it's allowed to generate such interrupts -- is almost like the "signal mask". You can have the vectors allocated / mapped and the device can still be told not to generate those interrupts. In that sense dev->msix_enabled has a split personality (mixed responsibilities). > While the approach you take covers the guest shutdown/restart > case, what if the guest called its pci_disable_device() at runtime, > expecting to be able to call pci_enable_{device,msi,msix}() on it > again later on? Your newly added function would also be called here, May I ask how? The function is only called from pciback_reset_device(), which in turn is called from drivers/xen/pciback/pci_stub.c, pcistub_device_release pcistub_put_pci_dev EXTERN pcistub_init_device pcistub_device_put pcistub_init_devices_late pcistub_seize pcistub_device_get_pci_dev EXTERN pcistub_remove DRIVEROP permissive_add EXTERN DRIVER_ATTR(permissive) pciback_init MODULE_INIT pcistub_probe DRIVEROP pcistub_get_pci_dev_by_slot EXTERN pcistub_get_pci_dev EXTERN I did a cursory search for '\ but in this situation you would have to call into the hypervisor (as > the domain is still alive), i.e. you could as well call > msi_remove_pci_irq_vectors(). I considered taking c/s 1070 and simply removing the ifs around the pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by msi_unmap_pirq(), could find no owner domain for the device and return DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then msi_unmap_pirq() would try to unmap a PIRQ for dom0. > Additionally, while covering the MSI-X case, I don't see how the > similar already-mapped case would be cleanly handled for MSI. The target is "cleanup after shutdown in such a way that it doesn't hurt in other cases either", so: - it is assumed the hypervisor takes care of the mappings when the domain goes away, - dom0 has no filtering list for MSI. msi_capability_init() "simply" asks the hypervisor for a vector, while msix_capability_init() "gets in the way". The sole purpose of the patch is to trim the MSI-X "filtering" list (msi_dev_head) after the domain is gone. I'm not bold enough to yank out the filtering altogether, even though I think it only does damage wrt. reboots -- it must have been originally meant to catch the case when the *same* guest tries to ask for a set of MSI-X entries for a device, in such a way that the requested set is not distinct from entries requested previously for the same device. The fact that the domid of the device-owning domain is not saved anywhere in the data structure (... at least in RHEL-5) seems to confirm this -- the current implementation has no way to detect that the owning domain has changed; I think it's even oblivious to this possiblity. I'm looking for the best (... least wrong) location to place the cleanup at; c/s 1070 suggested pciback_reset_device(). Thanks! lacos