From: Laszlo Ersek <lersek@redhat.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Drew Jones <drjones@redhat.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
Date: Wed, 01 Jun 2011 16:12:19 +0200 [thread overview]
Message-ID: <4DE648C3.40602@redhat.com> (raw)
In-Reply-To: <4DE653290200007800044DE4@vpn.id2.novell.com>
On 06/01/11 14:56, Jan Beulich wrote:
>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com> wrote:
>> domU: igbvf_shutdown()
>> igbvf_suspend()
>> pci_disable_device()
>> pcibios_disable_device()
>> pcibios_disable_resources()
>> pci_write_config_word(
>> dev, PCI_COMMAND,
>> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
>> )
>> ...
>> pcifront_bus_write()
>> do_pci_op(XEN_PCI_OP_conf_write)
>> dom0: pciback_do_op()
>> pciback_config_write()
>> conf_space_write()
>> command_write() [via PCI_COMMAND funcptr]
>> pci_disable_device()
>> disable_msi_mode()
>> dev->msix_enabled = 0;
>>
>> The final assignment above precludes c/s 1070 from doing the job.
>
> That seems to be a problem in the PCI subsystem, in that
> disable_msi_mode() is being used from pci_disable_device() (instead
> of pci_disable_msi{,x}()), and does not seem to be done in a similarly
> bad way in newer kernels.
But does linux-2.6.18-xen avoid the problem? I think its
pci_disable_device() still calls disable_msi_mode(), and the latter also
only read/writes config words.
> So I wonder whether you shouldn't fix
> pci_disable_device() instead, or alternatively move the vector
> de-allocation (and then for MSI and MSI-X) into disable_msi_mode().
Yes, I did think of that, however IMO that would introduce exactly the
problem that you describe below. If either pci_disable_device() or
disable_msi_mode() frees the vector, then re-enabling the device can
face yet another stumbling block.
I liken this a bit to the UNIX(R) signals -- the allocation/mapping of
the MSI-X vectors is the "signal disposition" (= signal action, signal
delivery), while the PCI dev's configuration -- whether it's allowed to
generate such interrupts -- is almost like the "signal mask". You can
have the vectors allocated / mapped and the device can still be told not
to generate those interrupts.
In that sense dev->msix_enabled has a split personality (mixed
responsibilities).
> While the approach you take covers the guest shutdown/restart
> case, what if the guest called its pci_disable_device() at runtime,
> expecting to be able to call pci_enable_{device,msi,msix}() on it
> again later on? Your newly added function would also be called here,
May I ask how? The function is only called from pciback_reset_device(),
which in turn is called from drivers/xen/pciback/pci_stub.c,
pcistub_device_release
pcistub_put_pci_dev EXTERN
pcistub_init_device
pcistub_device_put
pcistub_init_devices_late
pcistub_seize
pcistub_device_get_pci_dev EXTERN
pcistub_remove DRIVEROP
permissive_add EXTERN DRIVER_ATTR(permissive)
pciback_init MODULE_INIT
pcistub_probe DRIVEROP
pcistub_get_pci_dev_by_slot EXTERN
pcistub_get_pci_dev EXTERN
I did a cursory search for '\<pcistub_', and it seems to me all these
functions are only called from functions like
pciback_release_pci_dev (passthrough.c, slot.c and vpci.c)
pciback_release_devices (passthrough.c, slot.c and vpci.c)
In my understanding we have no problems reported about the
insertion/removal of the pciback module, or device (de)assignment to /
from it.
> but in this situation you would have to call into the hypervisor (as
> the domain is still alive), i.e. you could as well call
> msi_remove_pci_irq_vectors().
I considered taking c/s 1070 and simply removing the ifs around the
pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by
msi_unmap_pirq(), could find no owner domain for the device and return
DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then
msi_unmap_pirq() would try to unmap a PIRQ for dom0.
> Additionally, while covering the MSI-X case, I don't see how the
> similar already-mapped case would be cleanly handled for MSI.
The target is "cleanup after shutdown in such a way that it doesn't hurt
in other cases either", so:
- it is assumed the hypervisor takes care of the mappings when the
domain goes away,
- dom0 has no filtering list for MSI. msi_capability_init() "simply"
asks the hypervisor for a vector, while msix_capability_init() "gets in
the way".
The sole purpose of the patch is to trim the MSI-X "filtering" list
(msi_dev_head) after the domain is gone. I'm not bold enough to yank out
the filtering altogether, even though I think it only does damage wrt.
reboots -- it must have been originally meant to catch the case when the
*same* guest tries to ask for a set of MSI-X entries for a device, in
such a way that the requested set is not distinct from entries requested
previously for the same device.
The fact that the domid of the device-owning domain is not saved
anywhere in the data structure (... at least in RHEL-5) seems to confirm
this -- the current implementation has no way to detect that the owning
domain has changed; I think it's even oblivious to this possiblity.
I'm looking for the best (... least wrong) location to place the cleanup
at; c/s 1070 suggested pciback_reset_device().
Thanks!
lacos
next prev parent reply other threads:[~2011-06-01 14:12 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-01 10:05 [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Laszlo Ersek
2011-06-01 11:02 ` Laszlo Ersek
2011-06-01 14:31 ` Konrad Rzeszutek Wilk
2011-06-01 16:18 ` Paolo Bonzini
2011-06-01 17:32 ` Laszlo Ersek
2011-06-01 18:13 ` 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft Konrad Rzeszutek Wilk
2011-06-02 7:42 ` Laszlo Ersek
2011-06-02 20:49 ` Pasi Kärkkäinen
2011-06-01 12:56 ` [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Jan Beulich
2011-06-01 14:12 ` Laszlo Ersek [this message]
2011-06-01 14:59 ` Jan Beulich
2011-06-01 15:27 ` Laszlo Ersek
2011-06-01 16:07 ` Jan Beulich
2011-06-01 16:25 ` Andrew Jones
2011-06-01 16:27 ` Paolo Bonzini
2011-06-01 16:16 ` Paolo Bonzini
2011-06-01 14:51 ` Konrad Rzeszutek Wilk
2011-06-01 15:01 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DE648C3.40602@redhat.com \
--to=lersek@redhat.com \
--cc=JBeulich@novell.com \
--cc=drjones@redhat.com \
--cc=pbonzini@redhat.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).