* [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-01 12:13 UTC
To: linux-pci, qemu-devel, iommu, alex.williamson

Hello,

I may soon have to write the PF driver for an SR-IOV card and make it work with
QEMU/KVM, so I have the following questions.

In an AMD64 setup where QEMU uses VFIO to pass the VFs of an SR-IOV card through
to a guest, will the consequences of a PF FLR be handled correctly by QEMU and
the guest?

I read that pci_reset_function would call pci_restore_state, restoring the SR-IOV
configuration after the reset of the PF.

The way the hardware works means that the VFs would disappear and reappear
within a short lapse of time.

Will these events be handled by the kernel PCI hotplug code?

Given that the PF driver restores the PF config space after the reset, will the
/sys files used by QEMU disappear and reappear, breaking the QEMU VFIO
passthrough, or will it go smoothly?

Best regards

Benoît Canet
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Alex Williamson @ 2013-06-02 14:11 UTC
To: Benoît Canet
Cc: linux-pci, iommu, qemu-devel

On Sat, 2013-06-01 at 14:13 +0200, Benoît Canet wrote:
> Hello,
>
> I may soon have to write the PF driver for an SR-IOV card and make it work with
> QEMU/KVM, so I have the following questions.
>
> In an AMD64 setup where QEMU uses VFIO to pass the VFs of an SR-IOV card through
> to a guest, will the consequences of a PF FLR be handled correctly by QEMU and
> the guest?
>
> I read that pci_reset_function would call pci_restore_state, restoring the SR-IOV
> configuration after the reset of the PF.
>
> The way the hardware works means that the VFs would disappear and reappear
> within a short lapse of time.
>
> Will these events be handled by the kernel PCI hotplug code?
>
> Given that the PF driver restores the PF config space after the reset, will the
> /sys files used by QEMU disappear and reappear, breaking the QEMU VFIO
> passthrough, or will it go smoothly?

On an Intel 82576 SR-IOV NIC, an FLR of the PF does not cause the VFs to
be removed. It's not clear to me that they continue working across the
reset, but there is no hotplug. If there were a hotplug, vfio-pci wouldn't
release the device while it's in use, so the hotplug would be blocked
until the devices become unused, such as when the VM is shut down.
Thanks,

Alex
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-02 15:13 UTC
To: Alex Williamson
Cc: Benoît Canet, linux-pci, iommu, qemu-devel

Thanks a lot for the answer.

Best regards

Benoît
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Don Dutile @ 2013-06-03 18:41 UTC
To: Benoît Canet
Cc: linux-pci, iommu, alex.williamson, qemu-devel

On 06/01/2013 08:13 AM, Benoît Canet wrote:
>
> Hello,
>
> I may soon have to write the PF driver for an SR-IOV card and make it work with
> QEMU/KVM, so I have the following questions.
>
> In an AMD64 setup where QEMU uses VFIO to pass the VFs of an SR-IOV card through
> to a guest, will the consequences of a PF FLR be handled correctly by QEMU and
> the guest?
>
The reset occurs long before the device is passed to the guest.

> I read that pci_reset_function would call pci_restore_state, restoring the SR-IOV
> configuration after the reset of the PF.
>
Correct.

> The way the hardware works means that the VFs would disappear and reappear
> within a short lapse of time.
>
Not sure of your definition of 'disappear'.
If you mean: if I had another thread poking at the device, the device would
appear to be removed, then come back (if the OS poking hasn't crashed from the
device's lack of response).
If you mean the VF gets entirely removed from the PCI tree, then no.
A PCI reset != hot unplug/plug. The device remains in the device tree.

> Will these events be handled by the kernel PCI hotplug code?
>
'These events'? Which events?
FLR is currently done by libvirt & qemu/vfio to ensure assigned devices
are quiesced as they are switched from host->guest domain, and
guest->(back-to-)host domain.

> Given that the PF driver restores the PF config space after the reset, will the
The PF driver doesn't do the config space restore -- it's done in PCI core code.

> /sys files used by QEMU disappear and reappear, breaking the QEMU VFIO
> passthrough, or
As stated above, the devices don't disappear from the device tree, so they don't
get removed/added to the /sys(/bus/pci/...) files.

> will it go smoothly?
>
It goes smoothly today.... :-/

> Best regards
>
> Benoît Canet
>
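The point above, that a reset leaves the VFs in the device tree, can be checked from user space: the PCI core creates one `virtfnN` link per VF in the PF's sysfs directory, so the number of such links should be the same before and after the reset. A minimal sketch in C follows; the helper names `is_virtfn_entry` and `count_virtfn_links` are illustrative only, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name looks like "virtfn<digits>", the per-VF link name found
 * in a PF's sysfs directory; return 0 otherwise. (Hypothetical helper.) */
int is_virtfn_entry(const char *name)
{
	static const char prefix[] = "virtfn";
	const size_t plen = sizeof(prefix) - 1;
	const char *p;

	assert(name != NULL);
	if (strncmp(name, prefix, plen) != 0)
		return 0;
	p = name + plen;
	if (*p == '\0')
		return 0;	/* a bare "virtfn" carries no VF index */
	for (; *p != '\0'; p++)
		if (!isdigit((unsigned char)*p))
			return 0;
	return 1;
}

/* Count the virtfn<N> entries in pf_dir. (Hypothetical helper.)
 * Returns -1 if the directory cannot be opened. */
int count_virtfn_links(const char *pf_dir)
{
	DIR *d;
	struct dirent *e;
	int n = 0;

	assert(pf_dir != NULL);
	d = opendir(pf_dir);
	if (d == NULL)
		return -1;
	while ((e = readdir(d)) != NULL)
		if (is_virtfn_entry(e->d_name))
			n++;
	closedir(d);
	return n;
}
```

Calling `count_virtfn_links("/sys/bus/pci/devices/0000:01:00.0")` (the address is a placeholder) before and after resetting the PF should return the same count if, as described above, the VFs are never hot-unplugged.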
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-03 19:29 UTC
To: Don Dutile
Cc: Benoît Canet, linux-pci, iommu, alex.williamson, qemu-devel

> > to a guest, will the consequences of a PF FLR be handled correctly by QEMU and
> > the guest?
> >
> The reset occurs long before the device is passed to the guest.

I was asking because the PF driver should reset the PF when it stops
responding, while the VFs are in use by VFIO/QEMU.

> The PF driver doesn't do the config space restore -- it's done in PCI core code.
> > /sys files used by QEMU disappear and reappear, breaking the QEMU VFIO
> > passthrough, or
> As stated above, the devices don't disappear from the device tree, so they don't
> get removed/added to the /sys(/bus/pci/...) files.
>
> > will it go smoothly?
> >
> It goes smoothly today.... :-/

Happy to read that; thanks for the answer.

Best regards

Benoît Canet
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Don Dutile @ 2013-06-03 20:55 UTC
To: Benoît Canet
Cc: linux-pci, iommu, alex.williamson, qemu-devel

On 06/03/2013 03:29 PM, Benoît Canet wrote:
>>> to a guest, will the consequences of a PF FLR be handled correctly by QEMU and
>>> the guest?
>>>
>> The reset occurs long before the device is passed to the guest.
>
> I was asking because the PF driver should reset the PF when it stops
> responding, while the VFs are in use by VFIO/QEMU.
>
What your VF does while your PF is being reset is PF (& VF) dependent.
A 'good design' would not impact the VF operation, other than to stall it until
the PF completed reset. My experience, though, is that the PF has to be brought
up to some level of functionality to share the physical resources with the VFs.

>> The PF driver doesn't do the config space restore -- it's done in PCI core code.
>>> /sys files used by QEMU disappear and reappear, breaking the QEMU VFIO
>>> passthrough, or
>> As stated above, the devices don't disappear from the device tree, so they don't
>> get removed/added to the /sys(/bus/pci/...) files.
>>
>>> will it go smoothly?
>>>
>> It goes smoothly today.... :-/
>
> Happy to read that; thanks for the answer.
>
> Best regards
>
> Benoît Canet
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-03 21:27 UTC
To: Don Dutile
Cc: Benoît Canet, linux-pci, iommu, alex.williamson, qemu-devel

> > I was asking because the PF driver should reset the PF when it stops
> > responding, while the VFs are in use by VFIO/QEMU.
> >
> What your VF does while your PF is being reset is PF (& VF) dependent.
> A 'good design' would not impact the VF operation, other than to stall it until
> the PF completed reset. My experience, though, is that the PF has to be brought
> up to some level of functionality to share the physical resources with the VFs.

When the PF does an FLR, the hardware goes back to its default state, the SR-IOV
configuration is gone and the VFs disappear from the bus.
Then the restore-state function of the kernel reset code would bring the SR-IOV
PF configuration back.

The hardware also has a privately owned SR-IOV-related configuration in the PF
configuration space. This configuration is used to configure the VFs' resources
(memory).

Best regards

Benoît Canet
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Don Dutile @ 2013-06-03 21:42 UTC
To: Benoît Canet
Cc: linux-pci, iommu, alex.williamson, qemu-devel

On 06/03/2013 05:27 PM, Benoît Canet wrote:
>>> I was asking because the PF driver should reset the PF when it stops
>>> responding, while the VFs are in use by VFIO/QEMU.
>>>
>> What your VF does while your PF is being reset is PF (& VF) dependent.
>> A 'good design' would not impact the VF operation, other than to stall it until
>> the PF completed reset. My experience, though, is that the PF has to be brought
>> up to some level of functionality to share the physical resources with the VFs.
>
> When the PF does an FLR, the hardware goes back to its default state, the SR-IOV
> configuration is gone and the VFs disappear from the bus.
> Then the restore-state function of the kernel reset code would bring the SR-IOV
> PF configuration back.
>
Ok, now you're a bit misled here.
The configuration header for SRIOV is _not_ put back.
Only the std PCI config header section is put back in place, along with
msi(x), pm-caps.
If the hw wipes out all VF state setup (which it should, IMO), all VF configuration
will be lost in the hw...
*but* the PCI core will still think the VFs exist (not hot-unplugged, no more than
the PF was); trying to set up the VFs again will fail (or worse).

> The hardware also has a privately owned SR-IOV-related configuration in the PF
> configuration space. This configuration is used to configure the VFs' resources
> (memory).
>
Per the SRIOV spec, yes, but that's in PCIe ext cfg space.
That area of the PCI configuration is not saved or restored by dev-reset.

> Best regards
>
> Benoît Canet
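For readers unfamiliar with the layout mentioned above: extended capabilities form a linked list starting at offset 0x100 of the 4 KiB PCIe config space, and the SR-IOV capability has ID 0x0010. A minimal sketch in C that walks such a list in a config-space snapshot held in memory (for example, a buffer read from the device's sysfs `config` file). The names `cfg_read32` and `find_ext_cap` are illustrative; in-kernel code would use `pci_find_ext_capability()` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_EXP_SIZE  4096    /* PCIe extended config space size */
#define EXT_CAP_START       0x100   /* first extended capability header */
#define EXT_CAP_ID_SRIOV    0x0010  /* SR-IOV extended capability ID */

/* Read a little-endian 32-bit value from a config-space snapshot. */
uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) |
	       ((uint32_t)cfg[off + 3] << 24);
}

/* Return the offset of extended capability cap_id in a 4 KiB snapshot,
 * or 0 if it is not present. (Hypothetical helper.) */
unsigned int find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
	unsigned int pos = EXT_CAP_START;
	/* Bound the walk so that a malformed list cannot loop forever. */
	int ttl = (CFG_SPACE_EXP_SIZE - EXT_CAP_START) / 8;

	assert(cfg != NULL);
	while (ttl-- > 0 && pos >= EXT_CAP_START &&
	       pos <= CFG_SPACE_EXP_SIZE - 4) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;			/* no (more) capabilities */
		if ((hdr & 0xffff) == cap_id)	/* bits 15:0 hold the ID */
			return pos;
		pos = (hdr >> 20) & 0xffc;	/* bits 31:20: next offset */
	}
	return 0;
}
```

`find_ext_cap(cfg, EXT_CAP_ID_SRIOV)` returns the offset that the kernel code quoted later in this thread refers to as `iov->pos`.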
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-03 21:58 UTC
To: Don Dutile
Cc: Benoît Canet, linux-pci, iommu, alex.williamson, qemu-devel

> > When the PF does an FLR, the hardware goes back to its default state, the SR-IOV
> > configuration is gone and the VFs disappear from the bus.
> > Then the restore-state function of the kernel reset code would bring the SR-IOV
> > PF configuration back.
> >
> Ok, now you're a bit misled here.
> The configuration header for SRIOV is _not_ put back.
> Only the std PCI config header section is put back in place, along with
> msi(x), pm-caps.
> If the hw wipes out all VF state setup (which it should, IMO), all VF configuration
> will be lost in the hw...
> *but* the PCI core will still think the VFs exist (not hot-unplugged, no more than
> the PF was); trying to set up the VFs again will fail (or worse).

I read the following code in a not so old kernel.

-----------
int pci_reset_function(struct pci_dev *dev)
{
	int rc;

	rc = pci_dev_reset(dev, 1);
	if (rc)
		return rc;

	pci_save_state(dev);

	/*
	 * both INTx and MSI are disabled after the Interrupt Disable bit
	 * is set and the Bus Master bit is cleared.
	 */
	pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);

	rc = pci_dev_reset(dev, 0);

	pci_restore_state(dev);

	return rc;
}
EXPORT_SYMBOL_GPL(pci_reset_function);
-----------

and

-----------
/**
 * pci_restore_state - Restore the saved state of a PCI device
 * @dev: - PCI device that we're dealing with
 */
void pci_restore_state(struct pci_dev *dev)
{
	if (!dev->state_saved)
		return;

	/* PCI Express register must be restored first */
	pci_restore_pcie_state(dev);
	pci_restore_ats_state(dev);

	pci_restore_config_space(dev);

	pci_restore_pcix_state(dev);
	pci_restore_msi_state(dev);
	pci_restore_iov_state(dev);

	dev->state_saved = false;
}
-----------

with pci_restore_iov_state calling sriov_restore_state:

-----------
static void sriov_restore_state(struct pci_dev *dev)
{
	int i;
	u16 ctrl;
	struct pci_sriov *iov = dev->sriov;

	pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &ctrl);
	if (ctrl & PCI_SRIOV_CTRL_VFE)
		return;

	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
		pci_update_resource(dev, i);

	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
		msleep(100);
}
-----------

sriov_restore_state looks as if it does the right thing, but maybe I misread
the code.

> > The hardware also has a privately owned SR-IOV-related configuration in the PF
> > configuration space. This configuration is used to configure the VFs' resources
> > (memory).
> >
> Per the SRIOV spec, yes, but that's in PCIe ext cfg space.
> That area of the PCI configuration is not saved or restored by dev-reset.

Can a callback be added so the PF driver can restore this state?

Best regards

Benoît Canet
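The control flow of sriov_restore_state quoted above can be exercised outside the kernel with a small user-space model. This is only a sketch under stated assumptions: `fake_sriov_regs`, `saved_sriov`, `model_flr` and `model_sriov_restore` are made-up names, the FLR is assumed to clear the whole capability to zero, and the BAR updates and the 100 ms settle delay of the real function are left out. The register offsets are the ones defined in <linux/pci_regs.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SR-IOV capability register offsets and bits, as in <linux/pci_regs.h>. */
#define PCI_SRIOV_CTRL        0x08
#define PCI_SRIOV_CTRL_VFE    0x01	/* VF Enable */
#define PCI_SRIOV_NUM_VF      0x10
#define PCI_SRIOV_SYS_PGSIZE  0x20
#define SRIOV_CAP_LEN         0x40

/* Hypothetical stand-ins for the device registers and struct pci_sriov. */
struct fake_sriov_regs { uint8_t raw[SRIOV_CAP_LEN]; };
struct saved_sriov { uint16_t ctrl; uint16_t num_vfs; uint32_t pgsz; };

uint16_t rd16(const struct fake_sriov_regs *r, unsigned int off)
{
	return (uint16_t)(r->raw[off] | (r->raw[off + 1] << 8));
}

void wr16(struct fake_sriov_regs *r, unsigned int off, uint16_t v)
{
	r->raw[off] = (uint8_t)(v & 0xff);
	r->raw[off + 1] = (uint8_t)(v >> 8);
}

void wr32(struct fake_sriov_regs *r, unsigned int off, uint32_t v)
{
	wr16(r, off, (uint16_t)(v & 0xffff));
	wr16(r, off + 2, (uint16_t)(v >> 16));
}

/* Assumed effect of an FLR on the capability: everything back to zero. */
void model_flr(struct fake_sriov_regs *r)
{
	assert(r != NULL);
	memset(r->raw, 0, sizeof(r->raw));
}

/* Mirrors the quoted logic: skip if VF Enable is still set, otherwise
 * rewrite page size, NumVFs and control from the saved copy.
 * Returns 1 if the registers were rewritten, 0 if nothing was done. */
int model_sriov_restore(struct fake_sriov_regs *r, const struct saved_sriov *s)
{
	assert(r != NULL && s != NULL);
	if (rd16(r, PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE)
		return 0;
	wr32(r, PCI_SRIOV_SYS_PGSIZE, s->pgsz);
	wr16(r, PCI_SRIOV_NUM_VF, s->num_vfs);
	wr16(r, PCI_SRIOV_CTRL, s->ctrl);
	return 1;
}
```

In this model, a restore after `model_flr()` rewrites NumVFs and sets VF Enable again, while a second restore is a no-op because VF Enable is already set, which matches the early return in the quoted function.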
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Don Dutile @ 2013-06-03 22:03 UTC
To: Benoît Canet
Cc: linux-pci, iommu, alex.williamson, qemu-devel

On 06/03/2013 05:58 PM, Benoît Canet wrote:
>>> When the PF does an FLR, the hardware goes back to its default state, the SR-IOV
>>> configuration is gone and the VFs disappear from the bus.
>>> Then the restore-state function of the kernel reset code would bring the SR-IOV
>>> PF configuration back.
>>>
>> Ok, now you're a bit misled here.
>> The configuration header for SRIOV is _not_ put back.
>> Only the std PCI config header section is put back in place, along with
>> msi(x), pm-caps.
>> If the hw wipes out all VF state setup (which it should, IMO), all VF configuration
>> will be lost in the hw...
>> *but* the PCI core will still think the VFs exist (not hot-unplugged, no more than
>> the PF was); trying to set up the VFs again will fail (or worse).
>
> I read the following code in a not so old kernel.
>
> -----------
> int pci_reset_function(struct pci_dev *dev)
> {
> 	int rc;
>
> 	rc = pci_dev_reset(dev, 1);
> 	if (rc)
> 		return rc;
>
> 	pci_save_state(dev);
>
> 	/*
> 	 * both INTx and MSI are disabled after the Interrupt Disable bit
> 	 * is set and the Bus Master bit is cleared.
> 	 */
> 	pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
>
> 	rc = pci_dev_reset(dev, 0);
>
> 	pci_restore_state(dev);
>
> 	return rc;
> }
> EXPORT_SYMBOL_GPL(pci_reset_function);
> -----------
>
> and
>
> -----------
> /**
>  * pci_restore_state - Restore the saved state of a PCI device
>  * @dev: - PCI device that we're dealing with
>  */
> void pci_restore_state(struct pci_dev *dev)
> {
> 	if (!dev->state_saved)
> 		return;
>
> 	/* PCI Express register must be restored first */
> 	pci_restore_pcie_state(dev);
> 	pci_restore_ats_state(dev);
>
> 	pci_restore_config_space(dev);
>
> 	pci_restore_pcix_state(dev);
> 	pci_restore_msi_state(dev);
> 	pci_restore_iov_state(dev);
>
> 	dev->state_saved = false;
> }
> -----------
>
> with pci_restore_iov_state calling sriov_restore_state:
>
> -----------
> static void sriov_restore_state(struct pci_dev *dev)
> {
> 	int i;
> 	u16 ctrl;
> 	struct pci_sriov *iov = dev->sriov;
>
> 	pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &ctrl);
> 	if (ctrl & PCI_SRIOV_CTRL_VFE)
> 		return;
>
> 	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
> 		pci_update_resource(dev, i);
>
> 	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
> 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_NUM_VF, iov->num_VFs);
> 	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> 	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
> 		msleep(100);
> }
> -----------
>
> sriov_restore_state looks as if it does the right thing, but maybe I misread
> the code.
>
/my bad; I forgot about the save|restore_iov_state calls.... doh!
Now it gets down to how well your hw (& driver) works after the reset is done...

>>> The hardware also has a privately owned SR-IOV-related configuration in the PF
>>> configuration space. This configuration is used to configure the VFs' resources
>>> (memory).
>>>
>> Per the SRIOV spec, yes, but that's in PCIe ext cfg space.
>> That area of the PCI configuration is not saved or restored by dev-reset.
>
> Can a callback be added so the PF driver can restore this state?
>
As you pointed out, no need to, unless it's a device-specific
PCIe cap structure. The SRIOV caps are re-instated, as you showed above...

> Best regards
>
> Benoît Canet
* Re: [Qemu-devel] SR-IOV PF reset and QEMU VFs VFIO passthrough
From: Benoît Canet @ 2013-06-04 15:54 UTC
To: Don Dutile
Cc: Benoît Canet, linux-pci, iommu, alex.williamson, qemu-devel

> > > Per the SRIOV spec, yes, but that's in PCIe ext cfg space.
> > > That area of the PCI configuration is not saved or restored by dev-reset.
> >
> > Can a callback be added so the PF driver can restore this state?
> >
> As you pointed out, no need to, unless it's a device-specific
> PCIe cap structure. The SRIOV caps are re-instated, as you showed above...

I think this is a device-specific structure.
It would be annoying only in the case of a PF FLR initiated via /sys or by
something external to the PF driver, because the PF driver could restore this
area itself after an FLR that it initiates.

Best regards

Benoît Canet
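The asymmetry described above, between a reset the PF driver starts itself and one triggered from outside, can be illustrated with a tiny user-space model. Everything here is hypothetical: `pf_model`, `pf_model_flr` and `driver_initiated_reset` are made-up names, the device-specific block is modelled as four dwords, and the FLR is assumed to clear that block to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PRIV_REGS 4	/* size of the hypothetical device-specific block */

/* Hypothetical model of a PF holding one device-specific register block. */
struct pf_model {
	uint32_t priv[PRIV_REGS];
};

/* Model of an FLR arriving from anywhere: the device-specific block goes
 * back to zero (assumed default) and nobody puts it back. */
void pf_model_flr(struct pf_model *pf)
{
	assert(pf != NULL);
	memset(pf->priv, 0, sizeof(pf->priv));
}

/* Reset path owned by the PF driver: it knows the reset is coming, so it
 * can save the block first and write it back afterwards. */
void driver_initiated_reset(struct pf_model *pf)
{
	uint32_t saved[PRIV_REGS];

	assert(pf != NULL);
	memcpy(saved, pf->priv, sizeof(saved));
	pf_model_flr(pf);
	memcpy(pf->priv, saved, sizeof(saved));
}
```

After `driver_initiated_reset()` the block still holds its previous values; after a bare `pf_model_flr()`, which stands in for a reset triggered through /sys, it does not. That second case is the one a restore callback would have to cover.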