From: Laszlo Ersek <lersek@redhat.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Drew Jones <drjones@redhat.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
Date: Wed, 01 Jun 2011 16:12:19 +0200 [thread overview]
Message-ID: <4DE648C3.40602@redhat.com> (raw)
In-Reply-To: <4DE653290200007800044DE4@vpn.id2.novell.com>
On 06/01/11 14:56, Jan Beulich wrote:
>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com> wrote:
>> domU: igbvf_shutdown()
>> igbvf_suspend()
>> pci_disable_device()
>> pcibios_disable_device()
>> pcibios_disable_resources()
>> pci_write_config_word(
>> dev, PCI_COMMAND,
>> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
>> )
>> ...
>> pcifront_bus_write()
>> do_pci_op(XEN_PCI_OP_conf_write)
>> dom0: pciback_do_op()
>> pciback_config_write()
>> conf_space_write()
>> command_write() [via PCI_COMMAND funcptr]
>> pci_disable_device()
>> disable_msi_mode()
>> dev->msix_enabled = 0;
>>
>> The final assignment above precludes c/s 1070 from doing the job.
>
> That seems to be a problem in the PCI subsystem, in that
> disable_msi_mode() is being used from pci_disable_device() (instead
> of pci_disable_msi{,x}()), and does not seem to be done in a similarly
> bad way in newer kernels.
But does linux-2.6.18-xen avoid the problem? I think its
pci_disable_device() still calls disable_msi_mode(), and the latter also
only read/writes config words.
> So I wonder whether you shouldn't fix
> pci_disable_device() instead, or alternatively move the vector
> de-allocation (and then for MSI and MSI-X) into disable_msi_mode().
Yes, I did think of that, however IMO that would introduce exactly the
problem that you describe below. If either pci_disable_device() or
disable_msi_mode() frees the vector, then re-enabling the device can
face yet another stumbling block.
I liken this a bit to the UNIX(R) signals -- the allocation/mapping of
the MSI-X vectors is the "signal disposition" (= signal action, signal
delivery), while the PCI dev's configuration -- whether it's allowed to
generate such interrupts -- is almost like the "signal mask". You can
have the vectors allocated / mapped and the device can still be told not
to generate those interrupts.
In that sense dev->msix_enabled has a split personality (mixed
responsibilities).
> While the approach you take covers the guest shutdown/restart
> case, what if the guest called its pci_disable_device() at runtime,
> expecting to be able to call pci_enable_{device,msi,msix}() on it
> again later on? Your newly added function would also be called here,
May I ask how? The function is only called from pciback_reset_device(),
which in turn is called from drivers/xen/pciback/pci_stub.c,
pcistub_device_release
pcistub_put_pci_dev EXTERN
pcistub_init_device
pcistub_device_put
pcistub_init_devices_late
pcistub_seize
pcistub_device_get_pci_dev EXTERN
pcistub_remove DRIVEROP
permissive_add EXTERN DRIVER_ATTR(permissive)
pciback_init MODULE_INIT
pcistub_probe DRIVEROP
pcistub_get_pci_dev_by_slot EXTERN
pcistub_get_pci_dev EXTERN
I did a cursory search for '\<pcistub_', and it seems to me all these
functions are only called from functions like
pciback_release_pci_dev (passthrough.c, slot.c and vpci.c)
pciback_release_devices (passthrough.c, slot.c and vpci.c)
In my understanding we have no problems reported about the
insertion/removal of the pciback module, or device (de)assignment to /
from it.
> but in this situation you would have to call into the hypervisor (as
> the domain is still alive), i.e. you could as well call
> msi_remove_pci_irq_vectors().
I considered taking c/s 1070 and simply removing the ifs around the
pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by
msi_unmap_pirq(), could find no owner domain for the device and return
DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then
msi_unmap_pirq() would try to unmap a PIRQ for dom0.
> Additionally, while covering the MSI-X case, I don't see how the
> similar already-mapped case would be cleanly handled for MSI.
The target is "cleanup after shutdown in such a way that it doesn't hurt
in other cases either", so:
- it is assumed the hypervisor takes care of the mappings when the
domain goes away,
- dom0 has no filtering list for MSI. msi_capability_init() "simply"
asks the hypervisor for a vector, while msix_capability_init() "gets in
the way".
The sole purpose of the patch is to trim the MSI-X "filtering" list
(msi_dev_head) after the domain is gone. I'm not bold enough to yank out
the filtering altogether, even though I think it only does damage wrt.
reboots -- it must have been originally meant to catch the case when the
*same* guest tries to ask for a set of MSI-X entries for a device, in
such a way that the requested set is not distinct from entries requested
previously for the same device.
The fact that the domid of the device-owning domain is not saved
anywhere in the data structure (... at least in RHEL-5) seems to confirm
this -- the current implementation has no way to detect that the owning
domain has changed; I think it's even oblivious to this possiblity.
I'm looking for the best (... least wrong) location to place the cleanup
at; c/s 1070 suggested pciback_reset_device().
Thanks!
lacos
next prev parent reply other threads:[~2011-06-01 14:12 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-01 10:05 [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Laszlo Ersek
2011-06-01 11:02 ` Laszlo Ersek
2011-06-01 14:31 ` Konrad Rzeszutek Wilk
2011-06-01 16:18 ` Paolo Bonzini
2011-06-01 17:32 ` Laszlo Ersek
2011-06-01 18:13 ` 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft Konrad Rzeszutek Wilk
2011-06-02 7:42 ` Laszlo Ersek
2011-06-02 20:49 ` Pasi Kärkkäinen
2011-06-01 12:56 ` [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Jan Beulich
2011-06-01 14:12 ` Laszlo Ersek [this message]
2011-06-01 14:59 ` Jan Beulich
2011-06-01 15:27 ` Laszlo Ersek
2011-06-01 16:07 ` Jan Beulich
2011-06-01 16:25 ` Andrew Jones
2011-06-01 16:27 ` Paolo Bonzini
2011-06-01 16:16 ` Paolo Bonzini
2011-06-01 14:51 ` Konrad Rzeszutek Wilk
2011-06-01 15:01 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DE648C3.40602@redhat.com \
--to=lersek@redhat.com \
--cc=JBeulich@novell.com \
--cc=drjones@redhat.com \
--cc=pbonzini@redhat.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.