* Potential issue with hypervisor_isolated_pci_functions() handling
@ 2025-10-22 11:39 Niklas Schnelle
2025-10-22 14:49 ` Huacai Chen
0 siblings, 1 reply; 6+ messages in thread
From: Niklas Schnelle @ 2025-10-22 11:39 UTC (permalink / raw)
To: Huacai Chen, Jan Kiszka, Bjorn Helgaas
Cc: linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali,
Matthew Rosato, Tianrui Zhao, Gerd Bayer
Hi Huacai, Hi Jan, Hi Bjorn,
I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function
probing to LoongArch") LoongArch now also makes use of the isolated
function probing. First, nice to see this sees more users and that the
interface is useful to you.
Seeing this, I was reminded of an issue I ran into when using this
mechanism with SR-IOV capable devices. In that case VFs with devfn > 7
and PCI_SLOT(devfn) != 0 wouldn't get probed.
This is because if a device is found next_fn() will check whether dev-
>multifunction is set. So if for example devfn == 8 is found (fn == 0)
dev->multifunction won't be set in pci_scan_slot() and for VFs it's
also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9
even though that could very well be there.
Now for s390 this currently doesn't cause issues because for all
multifunction devices we have, we either get a VF alone and then since
commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn
== 0, or we have the parent PF passed-through and then VFs always get
hot plugged after SR-IOV enable, which then uses
pci_scan_single_device(). And for non VFs we always have devfn == 0
and/or devfn == 1 and multifunction via the header. So in a sense the
above commit works around the issue, though that wasn't its primary
intention.
Did either of you also run into this issue or can reproduce it?
Somewhat related, if ARI is enabled this would also break the isolated
function case including on s390 where ARI is used by the platform
firmware, but detected as off by Linux because there is no struct
pci_dev associated with the PCI bus. If I patch Linux to correctly
detect ARI, it no longer finds an isolated PF with devfn == 1.
Thanks,
Niklas Schnelle
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Potential issue with hypervisor_isolated_pci_functions() handling 2025-10-22 11:39 Potential issue with hypervisor_isolated_pci_functions() handling Niklas Schnelle @ 2025-10-22 14:49 ` Huacai Chen 2025-10-22 17:33 ` Niklas Schnelle 2025-10-23 15:25 ` Niklas Schnelle 0 siblings, 2 replies; 6+ messages in thread From: Huacai Chen @ 2025-10-22 14:49 UTC (permalink / raw) To: Niklas Schnelle Cc: Jan Kiszka, Bjorn Helgaas, linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali, Matthew Rosato, Tianrui Zhao, Gerd Bayer Hi, Niklas, On Wed, Oct 22, 2025 at 7:40 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > Hi Huacai, Hi Jan, Hi Bjorn, > > I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function > probing to LoongArch") LoongArch now also makes use of the isolated > function probing. First, nice to see this sees more users and that the > interface is useful to you. > > Seeing this, I was reminded of an issue I ran into when using this > mechanism with SR-IOV capable devices. In that case VFs with devfn > 7 > and PCI_SLOT(devfn) != 0 wouldn't get probed. > This is because if a device is found next_fn() will check whether dev- > >multifunction is set. So if for example devfn == 8 is found (fn == 0) > dev->multifunction won't be set in pci_scan_slot() and for VFs it's > also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9 > even though that could very well be there. > > Now for s390 this currently doesn't cause issues because for all > multifunction devices we have, we either get a VF alone and then since > commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn > == 0, or we have the parent PF passed-through and then VFs always get > hot plugged after SR-IOV enable, which then uses > pci_scan_single_device(). And for non VFs we always have devfn == 0 > and/or devfn == 1 and multifunction via the header. So in a sense the > above commit works around the issue, though that wasn't its primary > intention. > > Did either of you also run into this issue or can reproduce it? > > Somewhat related, if ARI is enabled this would also break the isolated > function case including on s390 where ARI is used by the platform > firmware, but detected as off by Linux because there is no struct > pci_dev associated with the PCI bus. If I patch Linux to correctly > detect ARI, it no longer finds an isolated PF with devfn == 1. LoongArch do have some problems after commit a02fd05661d7 ("PCI: Extend isolated function probing to LoongArch"). Please see: https://lore.kernel.org/linux-pci/20251014074100.2149737-1-chenhuacai@loongson.cn/ Now we don't know what happens exactly, so I haven't answered Bjorn's question... Huacai > > Thanks, > Niklas Schnelle ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Potential issue with hypervisor_isolated_pci_functions() handling 2025-10-22 14:49 ` Huacai Chen @ 2025-10-22 17:33 ` Niklas Schnelle 2025-10-23 15:25 ` Niklas Schnelle 1 sibling, 0 replies; 6+ messages in thread From: Niklas Schnelle @ 2025-10-22 17:33 UTC (permalink / raw) To: Huacai Chen Cc: Jan Kiszka, Bjorn Helgaas, linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali, Matthew Rosato, Tianrui Zhao, Gerd Bayer On Wed, 2025-10-22 at 22:49 +0800, Huacai Chen wrote: > Hi, Niklas, > > On Wed, Oct 22, 2025 at 7:40 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > > > Hi Huacai, Hi Jan, Hi Bjorn, > > > > I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function > > probing to LoongArch") LoongArch now also makes use of the isolated > > function probing. First, nice to see this sees more users and that the > > interface is useful to you. > > > > Seeing this, I was reminded of an issue I ran into when using this > > mechanism with SR-IOV capable devices. In that case VFs with devfn > 7 > > and PCI_SLOT(devfn) != 0 wouldn't get probed. > > This is because if a device is found next_fn() will check whether dev- > > > multifunction is set. So if for example devfn == 8 is found (fn == 0) > > dev->multifunction won't be set in pci_scan_slot() and for VFs it's > > also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9 > > even though that could very well be there. > > > > Now for s390 this currently doesn't cause issues because for all > > multifunction devices we have, we either get a VF alone and then since > > commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn > > == 0, or we have the parent PF passed-through and then VFs always get > > hot plugged after SR-IOV enable, which then uses > > pci_scan_single_device(). And for non VFs we always have devfn == 0 > > and/or devfn == 1 and multifunction via the header. So in a sense the > > above commit works around the issue, though that wasn't its primary > > intention. > > > > Did either of you also run into this issue or can reproduce it? > > > > Somewhat related, if ARI is enabled this would also break the isolated > > function case including on s390 where ARI is used by the platform > > firmware, but detected as off by Linux because there is no struct > > pci_dev associated with the PCI bus. If I patch Linux to correctly > > detect ARI, it no longer finds an isolated PF with devfn == 1. > LoongArch do have some problems after commit a02fd05661d7 ("PCI: > Extend isolated function probing to LoongArch"). Please see: > https://lore.kernel.org/linux-pci/20251014074100.2149737-1-chenhuacai@loongson.cn/ > > Now we don't know what happens exactly, so I haven't answered Bjorn's > question... > > Huacai > Interesting. I'm working on a patch that simplifies and hopefully fixes the isolated function probing by basically going back to basics and just trying every devfn 0-255. It would be interesting if that fixes your issue too. In the linked patch you said that the hole left by function 0 is not supposed to be probed, but with disabled isolated function probing you'd still try to scan function 0 so I can't see how that would explain your issues. On the other hand it doesn't sound like an exact match of what I saw. I'm still working on the cover letter and more testing including testing on x86, but if you're curious the current state is on my git.kernel.org account[0] in the b4/ari_no_bus_dev branch. Thanks, Niklas [0] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/log/?h=b4/ari_no_bus_dev ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Potential issue with hypervisor_isolated_pci_functions() handling 2025-10-22 14:49 ` Huacai Chen 2025-10-22 17:33 ` Niklas Schnelle @ 2025-10-23 15:25 ` Niklas Schnelle 2025-10-24 10:05 ` Huacai Chen 1 sibling, 1 reply; 6+ messages in thread From: Niklas Schnelle @ 2025-10-23 15:25 UTC (permalink / raw) To: Huacai Chen Cc: Jan Kiszka, Bjorn Helgaas, linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali, Matthew Rosato, Tianrui Zhao, Gerd Bayer On Wed, 2025-10-22 at 22:49 +0800, Huacai Chen wrote: > Hi, Niklas, > > On Wed, Oct 22, 2025 at 7:40 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > > > Hi Huacai, Hi Jan, Hi Bjorn, > > > > I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function > > probing to LoongArch") LoongArch now also makes use of the isolated > > function probing. First, nice to see this sees more users and that the > > interface is useful to you. > > > > Seeing this, I was reminded of an issue I ran into when using this > > mechanism with SR-IOV capable devices. In that case VFs with devfn > 7 > > and PCI_SLOT(devfn) != 0 wouldn't get probed. > > This is because if a device is found next_fn() will check whether dev- > > > multifunction is set. So if for example devfn == 8 is found (fn == 0) > > dev->multifunction won't be set in pci_scan_slot() and for VFs it's > > also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9 > > even though that could very well be there. > > > > Now for s390 this currently doesn't cause issues because for all > > multifunction devices we have, we either get a VF alone and then since > > commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn > > == 0, or we have the parent PF passed-through and then VFs always get > > hot plugged after SR-IOV enable, which then uses > > pci_scan_single_device(). And for non VFs we always have devfn == 0 > > and/or devfn == 1 and multifunction via the header. So in a sense the > > above commit works around the issue, though that wasn't its primary > > intention. > > > > Did either of you also run into this issue or can reproduce it? > > > > Somewhat related, if ARI is enabled this would also break the isolated > > function case including on s390 where ARI is used by the platform > > firmware, but detected as off by Linux because there is no struct > > pci_dev associated with the PCI bus. If I patch Linux to correctly > > detect ARI, it no longer finds an isolated PF with devfn == 1. > LoongArch do have some problems after commit a02fd05661d7 ("PCI: > Extend isolated function probing to LoongArch"). Please see: > https://lore.kernel.org/linux-pci/20251014074100.2149737-1-chenhuacai@loongson.cn/ > > Now we don't know what happens exactly, so I haven't answered Bjorn's > question... > > Huacai > Hi Huacai, I have now send[0] a fix proposal together with correctly exposing ARI enablement in s390. I also reproduced both the issue I saw with SR-IOV and with ARI and confirmed that the first patch in the series fixes them. You're on Cc and I'd highly appreciate if you could test whether this fixes your issues as well. Thanks, Niklas [0] https://lore.kernel.org/linux-pci/20251023-ari_no_bus_dev-v4-0-8482e2ed10bd@linux.ibm.com/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Potential issue with hypervisor_isolated_pci_functions() handling 2025-10-23 15:25 ` Niklas Schnelle @ 2025-10-24 10:05 ` Huacai Chen 2025-10-24 10:27 ` Niklas Schnelle 0 siblings, 1 reply; 6+ messages in thread From: Huacai Chen @ 2025-10-24 10:05 UTC (permalink / raw) To: Niklas Schnelle Cc: Jan Kiszka, Bjorn Helgaas, linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali, Matthew Rosato, Tianrui Zhao, Gerd Bayer Hi, Niklas, On Thu, Oct 23, 2025 at 11:25 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > On Wed, 2025-10-22 at 22:49 +0800, Huacai Chen wrote: > > Hi, Niklas, > > > > On Wed, Oct 22, 2025 at 7:40 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > > > > > Hi Huacai, Hi Jan, Hi Bjorn, > > > > > > I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function > > > probing to LoongArch") LoongArch now also makes use of the isolated > > > function probing. First, nice to see this sees more users and that the > > > interface is useful to you. > > > > > > Seeing this, I was reminded of an issue I ran into when using this > > > mechanism with SR-IOV capable devices. In that case VFs with devfn > 7 > > > and PCI_SLOT(devfn) != 0 wouldn't get probed. > > > This is because if a device is found next_fn() will check whether dev- > > > > multifunction is set. So if for example devfn == 8 is found (fn == 0) > > > dev->multifunction won't be set in pci_scan_slot() and for VFs it's > > > also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9 > > > even though that could very well be there. > > > > > > Now for s390 this currently doesn't cause issues because for all > > > multifunction devices we have, we either get a VF alone and then since > > > commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn > > > == 0, or we have the parent PF passed-through and then VFs always get > > > hot plugged after SR-IOV enable, which then uses > > > pci_scan_single_device(). And for non VFs we always have devfn == 0 > > > and/or devfn == 1 and multifunction via the header. So in a sense the > > > above commit works around the issue, though that wasn't its primary > > > intention. > > > > > > Did either of you also run into this issue or can reproduce it? > > > > > > Somewhat related, if ARI is enabled this would also break the isolated > > > function case including on s390 where ARI is used by the platform > > > firmware, but detected as off by Linux because there is no struct > > > pci_dev associated with the PCI bus. If I patch Linux to correctly > > > detect ARI, it no longer finds an isolated PF with devfn == 1. > > LoongArch do have some problems after commit a02fd05661d7 ("PCI: > > Extend isolated function probing to LoongArch"). Please see: > > https://lore.kernel.org/linux-pci/20251014074100.2149737-1-chenhuacai@loongson.cn/ > > > > Now we don't know what happens exactly, so I haven't answered Bjorn's > > question... > > > > Huacai > > > > Hi Huacai, > > I have now send[0] a fix proposal together with correctly exposing ARI > enablement in s390. I also reproduced both the issue I saw with SR-IOV > and with ARI and confirmed that the first patch in the series fixes > them. You're on Cc and I'd highly appreciate if you could test whether > this fixes your issues as well. > > Thanks, > Niklas > > [0] > https://lore.kernel.org/linux-pci/20251023-ari_no_bus_dev-v4-0-8482e2ed10bd@linux.ibm.com/ There are two patches in this series. If LoongArch has a similar problem as S390, only the first patch is enough, or should we need an architectural change similar to the second patch? Huacai ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Potential issue with hypervisor_isolated_pci_functions() handling 2025-10-24 10:05 ` Huacai Chen @ 2025-10-24 10:27 ` Niklas Schnelle 0 siblings, 0 replies; 6+ messages in thread From: Niklas Schnelle @ 2025-10-24 10:27 UTC (permalink / raw) To: Huacai Chen Cc: Jan Kiszka, Bjorn Helgaas, linux-pci, jailhouse-dev, linux-s390, loongarch, Farhan Ali, Matthew Rosato, Tianrui Zhao, Gerd Bayer On Fri, 2025-10-24 at 18:05 +0800, Huacai Chen wrote: > Hi, Niklas, > > On Thu, Oct 23, 2025 at 11:25 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > > > On Wed, 2025-10-22 at 22:49 +0800, Huacai Chen wrote: > > > Hi, Niklas, > > > > > > On Wed, Oct 22, 2025 at 7:40 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote: > > > > > > > > Hi Huacai, Hi Jan, Hi Bjorn, > > > > > > > > I noticed that with commit a02fd05661d7 ("PCI: Extend isolated function > > > > probing to LoongArch") LoongArch now also makes use of the isolated > > > > function probing. First, nice to see this sees more users and that the > > > > interface is useful to you. > > > > > > > > Seeing this, I was reminded of an issue I ran into when using this > > > > mechanism with SR-IOV capable devices. In that case VFs with devfn > 7 > > > > and PCI_SLOT(devfn) != 0 wouldn't get probed. > > > > This is because if a device is found next_fn() will check whether dev- > > > > > multifunction is set. So if for example devfn == 8 is found (fn == 0) > > > > dev->multifunction won't be set in pci_scan_slot() and for VFs it's > > > > also not set via PCI_HEADER_TYPE_MFD. So we won't check for devfn == 9 > > > > even though that could very well be there. > > > > > > > > Now for s390 this currently doesn't cause issues because for all > > > > multifunction devices we have, we either get a VF alone and then since > > > > commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") use devfn > > > > == 0, or we have the parent PF passed-through and then VFs always get > > > > hot plugged after SR-IOV enable, which then uses > > > > pci_scan_single_device(). And for non VFs we always have devfn == 0 > > > > and/or devfn == 1 and multifunction via the header. So in a sense the > > > > above commit works around the issue, though that wasn't its primary > > > > intention. > > > > > > > > Did either of you also run into this issue or can reproduce it? > > > > > > > > Somewhat related, if ARI is enabled this would also break the isolated > > > > function case including on s390 where ARI is used by the platform > > > > firmware, but detected as off by Linux because there is no struct > > > > pci_dev associated with the PCI bus. If I patch Linux to correctly > > > > detect ARI, it no longer finds an isolated PF with devfn == 1. > > > LoongArch do have some problems after commit a02fd05661d7 ("PCI: > > > Extend isolated function probing to LoongArch"). Please see: > > > https://lore.kernel.org/linux-pci/20251014074100.2149737-1-chenhuacai@loongson.cn/ > > > > > > Now we don't know what happens exactly, so I haven't answered Bjorn's > > > question... > > > > > > Huacai > > > > > > > Hi Huacai, > > > > I have now send[0] a fix proposal together with correctly exposing ARI > > enablement in s390. I also reproduced both the issue I saw with SR-IOV > > and with ARI and confirmed that the first patch in the series fixes > > them. You're on Cc and I'd highly appreciate if you could test whether > > this fixes your issues as well. > > > > Thanks, > > Niklas > > > > [0] > > https://lore.kernel.org/linux-pci/20251023-ari_no_bus_dev-v4-0-8482e2ed10bd@linux.ibm.com/ > > There are two patches in this series. If LoongArch has a similar > problem as S390, only the first patch is enough, or should we need an > architectural change similar to the second patch? > > Huacai Just for probing the devices the first patch should be enough. That said if the way you do PCI pass-through means you also end up without a struct pci_dev representing PCI bridges (bus->self is NULL for all PCI busses), then something like the s390 specific part of the second patch might be necessary to correctly detect ARI. Also it shouldn't hurt to test with both patches. Thanks, Niklas ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-10-24 10:27 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-22 11:39 Potential issue with hypervisor_isolated_pci_functions() handling Niklas Schnelle 2025-10-22 14:49 ` Huacai Chen 2025-10-22 17:33 ` Niklas Schnelle 2025-10-23 15:25 ` Niklas Schnelle 2025-10-24 10:05 ` Huacai Chen 2025-10-24 10:27 ` Niklas Schnelle
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).