All of lore.kernel.org
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Drew Jones <drjones@redhat.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up	 (MSI-X vec, entrynr) list when resetting PCI device
Date: Wed, 01 Jun 2011 16:12:19 +0200	[thread overview]
Message-ID: <4DE648C3.40602@redhat.com> (raw)
In-Reply-To: <4DE653290200007800044DE4@vpn.id2.novell.com>

On 06/01/11 14:56, Jan Beulich wrote:
>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com>  wrote:

>> domU: igbvf_shutdown()
>>          igbvf_suspend()
>>            pci_disable_device()
>>              pcibios_disable_device()
>>                pcibios_disable_resources()
>>                  pci_write_config_word(
>>                      dev, PCI_COMMAND,
>>                      orig&  ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)
>>                  )
>>                    ...
>>                      pcifront_bus_write()
>>                        do_pci_op(XEN_PCI_OP_conf_write)
>> dom0:                   pciback_do_op()
>>                            pciback_config_write()
>>                              conf_space_write()
>>                                command_write() [via PCI_COMMAND funcptr]
>>                                  pci_disable_device()
>>                                    disable_msi_mode()
>>                                      dev->msix_enabled = 0;
>>
>> The final assignment above precludes c/s 1070 from doing the job.
>
> That seems to be a problem in the PCI subsystem, in that
> disable_msi_mode() is being used from pci_disable_device() (instead
> of pci_disable_msi{,x}()), and does not seem to be done in a similarly
> bad way in newer kernels.

But does linux-2.6.18-xen avoid the problem? I think its 
pci_disable_device() still calls disable_msi_mode(), and the latter also 
only read/writes config words.

> So I wonder whether you shouldn't fix
> pci_disable_device() instead, or alternatively move the vector
> de-allocation (and then for MSI and MSI-X) into disable_msi_mode().

Yes, I did think of that, however IMO that would introduce exactly the 
problem that you describe below. If either pci_disable_device() or 
disable_msi_mode() frees the vector, then re-enabling the device can 
face yet another stumbling block.

I liken this a bit to the UNIX(R) signals -- the allocation/mapping of 
the MSI-X vectors is the "signal disposition" (= signal action, signal 
delivery), while the PCI dev's configuration -- whether it's allowed to 
generate such interrupts -- is almost like the "signal mask". You can 
have the vectors allocated / mapped and the device can still be told not 
to generate those interrupts.

In that sense dev->msix_enabled has a split personality (mixed 
responsibilities).


> While the approach you take covers the guest shutdown/restart
> case, what if the guest called its pci_disable_device() at runtime,
> expecting to be able to call pci_enable_{device,msi,msix}() on it
> again later on? Your newly added function would also be called here,

May I ask how? The function is only called from pciback_reset_device(), 
which in turn is called from drivers/xen/pciback/pci_stub.c,

   pcistub_device_release
   pcistub_put_pci_dev         EXTERN
   pcistub_init_device
   pcistub_device_put
   pcistub_init_devices_late
   pcistub_seize
   pcistub_device_get_pci_dev  EXTERN
   pcistub_remove              DRIVEROP
   permissive_add              EXTERN DRIVER_ATTR(permissive)
   pciback_init                MODULE_INIT
   pcistub_probe               DRIVEROP
   pcistub_get_pci_dev_by_slot EXTERN
   pcistub_get_pci_dev         EXTERN

I did a cursory search for '\<pcistub_', and it seems to me all these 
functions are only called from functions like

   pciback_release_pci_dev (passthrough.c, slot.c and vpci.c)
   pciback_release_devices (passthrough.c, slot.c and vpci.c)

In my understanding we have no problems reported about the 
insertion/removal of the pciback module, or device (de)assignment to / 
from it.

> but in this situation you would have to call into the hypervisor (as
> the domain is still alive), i.e. you could as well call
> msi_remove_pci_irq_vectors().

I considered taking c/s 1070 and simply removing the ifs around the 
pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by 
msi_unmap_pirq(), could find no owner domain for the device and return 
DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then 
msi_unmap_pirq() would try to unmap a PIRQ for dom0.


> Additionally, while covering the MSI-X case, I don't see how the
> similar already-mapped case would be cleanly handled for MSI.

The target is "cleanup after shutdown in such a way that it doesn't hurt 
in other cases either", so:

- it is assumed the hypervisor takes care of the mappings when the 
domain goes away,

- dom0 has no filtering list for MSI. msi_capability_init() "simply" 
asks the hypervisor for a vector, while msix_capability_init() "gets in 
the way".

The sole purpose of the patch is to trim the MSI-X "filtering" list 
(msi_dev_head) after the domain is gone. I'm not bold enough to yank out 
the filtering altogether, even though I think it only does damage wrt. 
reboots -- it must have been originally meant to catch the case when the 
*same* guest tries to ask for a set of MSI-X entries for a device, in 
such a way that the requested set is not distinct from entries requested 
previously for the same device.

The fact that the domid of the device-owning domain is not saved 
anywhere in the data structure (... at least in RHEL-5) seems to confirm 
this -- the current implementation has no way to detect that the owning 
domain has changed; I think it's even oblivious to this possiblity.

I'm looking for the best (... least wrong) location to place the cleanup 
at; c/s 1070 suggested pciback_reset_device().

Thanks!
lacos

  reply	other threads:[~2011-06-01 14:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-01 10:05 [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Laszlo Ersek
2011-06-01 11:02 ` Laszlo Ersek
2011-06-01 14:31   ` Konrad Rzeszutek Wilk
2011-06-01 16:18     ` Paolo Bonzini
2011-06-01 17:32     ` Laszlo Ersek
2011-06-01 18:13       ` 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft Konrad Rzeszutek Wilk
2011-06-02  7:42         ` Laszlo Ersek
2011-06-02 20:49         ` Pasi Kärkkäinen
2011-06-01 12:56 ` [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device Jan Beulich
2011-06-01 14:12   ` Laszlo Ersek [this message]
2011-06-01 14:59     ` Jan Beulich
2011-06-01 15:27       ` Laszlo Ersek
2011-06-01 16:07         ` Jan Beulich
2011-06-01 16:25           ` Andrew Jones
2011-06-01 16:27             ` Paolo Bonzini
2011-06-01 16:16       ` Paolo Bonzini
2011-06-01 14:51 ` Konrad Rzeszutek Wilk
2011-06-01 15:01   ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DE648C3.40602@redhat.com \
    --to=lersek@redhat.com \
    --cc=JBeulich@novell.com \
    --cc=drjones@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.