xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Drew Jones <drjones@redhat.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up	 (MSI-X vec, entrynr) list when resetting PCI device
Date: Wed, 01 Jun 2011 16:12:19 +0200	[thread overview]
Message-ID: <4DE648C3.40602@redhat.com> (raw)
In-Reply-To: <4DE653290200007800044DE4@vpn.id2.novell.com>

On 06/01/11 14:56, Jan Beulich wrote:
>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com>  wrote:

>> domU: igbvf_shutdown()
>>          igbvf_suspend()
>>            pci_disable_device()
>>              pcibios_disable_device()
>>                pcibios_disable_resources()
>>                  pci_write_config_word(
>>                      dev, PCI_COMMAND,
>>                      orig&  ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
>>                  )
>>                    ...
>>                      pcifront_bus_write()
>>                        do_pci_op(XEN_PCI_OP_conf_write)
>> dom0:                   pciback_do_op()
>>                            pciback_config_write()
>>                              conf_space_write()
>>                                command_write() [via PCI_COMMAND funcptr]
>>                                  pci_disable_device()
>>                                    disable_msi_mode()
>>                                      dev->msix_enabled = 0;
>>
>> The final assignment above precludes c/s 1070 from doing the job.
>
> That seems to be a problem in the PCI subsystem, in that
> disable_msi_mode() is being used from pci_disable_device() (instead
> of pci_disable_msi{,x}()), and does not seem to be done in a similarly
> bad way in newer kernels.

But does linux-2.6.18-xen avoid the problem? I think its 
pci_disable_device() still calls disable_msi_mode(), and the latter also 
only read/writes config words.

> So I wonder whether you shouldn't fix
> pci_disable_device() instead, or alternatively move the vector
> de-allocation (and then for MSI and MSI-X) into disable_msi_mode().

Yes, I did think of that, however IMO that would introduce exactly the 
problem that you describe below. If either pci_disable_device() or 
disable_msi_mode() frees the vector, then re-enabling the device can 
face yet another stumbling block.

I liken this a bit to the UNIX(R) signals -- the allocation/mapping of 
the MSI-X vectors is the "signal disposition" (= signal action, signal 
delivery), while the PCI dev's configuration -- whether it's allowed to 
generate such interrupts -- is almost like the "signal mask". You can 
have the vectors allocated / mapped and the device can still be told not 
to generate those interrupts.

In that sense dev->msix_enabled has a split personality (mixed 
responsibilities).


> While the approach you take covers the guest shutdown/restart
> case, what if the guest called its pci_disable_device() at runtime,
> expecting to be able to call pci_enable_{device,msi,msix}() on it
> again later on? Your newly added function would also be called here,

May I ask how? The function is only called from pciback_reset_device(), 
which in turn is called from drivers/xen/pciback/pci_stub.c,

   pcistub_device_release
   pcistub_put_pci_dev         EXTERN
   pcistub_init_device
   pcistub_device_put
   pcistub_init_devices_late
   pcistub_seize
   pcistub_device_get_pci_dev  EXTERN
   pcistub_remove              DRIVEROP
   permissive_add              EXTERN DRIVER_ATTR(permissive)
   pciback_init                MODULE_INIT
   pcistub_probe               DRIVEROP
   pcistub_get_pci_dev_by_slot EXTERN
   pcistub_get_pci_dev         EXTERN

I did a cursory search for '\<pcistub_', and it seems to me all these 
functions are only called from functions like

   pciback_release_pci_dev (passthrough.c, slot.c and vpci.c)
   pciback_release_devices (passthrough.c, slot.c and vpci.c)

In my understanding we have no problems reported about the 
insertion/removal of the pciback module, or device (de)assignment to / 
from it.

> but in this situation you would have to call into the hypervisor (as
> the domain is still alive), i.e. you could as well call
> msi_remove_pci_irq_vectors().

I considered taking c/s 1070 and simply removing the ifs around the 
pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by 
msi_unmap_pirq(), could find no owner domain for the device and return 
DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then 
msi_unmap_pirq() would try to unmap a PIRQ for dom0.


> Additionally, while covering the MSI-X case, I don't see how the
> similar already-mapped case would be cleanly handled for MSI.

The target is "cleanup after shutdown in such a way that it doesn't hurt 
in other cases either", so:

- it is assumed the hypervisor takes care of the mappings when the 
domain goes away,

- dom0 has no filtering list for MSI. msi_capability_init() "simply" 
asks the hypervisor for a vector, while msix_capability_init() "gets in 
the way".

The sole purpose of the patch is to trim the MSI-X "filtering" list 
(msi_dev_head) after the domain is gone. I'm not bold enough to yank out 
the filtering altogether, even though I think it only does damage wrt. 
reboots -- it must have been originally meant to catch the case when the 
*same* guest tries to ask for a set of MSI-X entries for a device, in 
such a way that the requested set is not distinct from entries requested 
previously for the same device.

The fact that the domid of the device-owning domain is not saved 
anywhere in the data structure (... at least in RHEL-5) seems to confirm 
this -- the current implementation has no way to detect that the owning 
domain has changed; I think it's even oblivious to this possiblity.

I'm looking for the best (... least wrong) location to place the cleanup 
at; c/s 1070 suggested pciback_reset_device().

Thanks!
lacos

  reply	other threads:[~2011-06-01 14:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-01 10:05 [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Laszlo Ersek
2011-06-01 11:02 ` Laszlo Ersek
2011-06-01 14:31   ` Konrad Rzeszutek Wilk
2011-06-01 16:18     ` Paolo Bonzini
2011-06-01 17:32     ` Laszlo Ersek
2011-06-01 18:13       ` 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft Konrad Rzeszutek Wilk
2011-06-02  7:42         ` Laszlo Ersek
2011-06-02 20:49         ` Pasi Kärkkäinen
2011-06-01 12:56 ` [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Jan Beulich
2011-06-01 14:12   ` Laszlo Ersek [this message]
2011-06-01 14:59     ` Jan Beulich
2011-06-01 15:27       ` Laszlo Ersek
2011-06-01 16:07         ` Jan Beulich
2011-06-01 16:25           ` Andrew Jones
2011-06-01 16:27             ` Paolo Bonzini
2011-06-01 16:16       ` Paolo Bonzini
2011-06-01 14:51 ` Konrad Rzeszutek Wilk
2011-06-01 15:01   ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DE648C3.40602@redhat.com \
    --to=lersek@redhat.com \
    --cc=JBeulich@novell.com \
    --cc=drjones@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).