* Query on ARM SMMUv3 nested support @ 2024-03-13 10:13 Shameerali Kolothum Thodi 2024-03-13 23:50 ` Nicolin Chen 0 siblings, 1 reply; 9+ messages in thread From: Shameerali Kolothum Thodi @ 2024-03-13 10:13 UTC (permalink / raw) To: Nicolin Chen, Jason Gunthorpe Cc: iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer Hi Nicolin, Thanks for the latest repos with basic SMMUv3 nested support enabled[1]. I did some basic sanity runs on a HiSilicon platform and they seems to work as expected. The only problem being we can't assign two devices to the VM if they are on different physical SMMUs. qemu-system-aarch64: -device vfio-pci,host=0000:75:00.1,iommufd=iommufd0: [iommufd=29] error attach 0000:75:00.1 (36) to id=4: Invalid argument qemu-system-aarch64: -device vfio-pci,host=0000:75:00.1,iommufd=iommufd0: Unable to attach dev to stage-2 HW pagetable: -1 Segmentation fault (core dumped) ... I see that on the Qemu side we now allocate a single s2 hwpt and attach that into all the devices . But this will only work currently if all the assigned devices are under the same physical SMMUv3 as we have a check in kernel whether domains are having the same SMMUv3. I remember Jason mentioning that he is planning to relax that. So are the Qemu side changes based on that assumption? And any idea how we are planning to relax that restriction? We do a check for compatibility of the phys SMMUv3s and then allow/restrict in kernel? Do Qemu can then try allocating a separate s2 hwpt for those and attach again? Sorry if this was already discussed elsewhere and I missed that. Thanks, Shameer [1]: https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-03112024/ https://github.com/nicolinc/qemu/commits/wip/iommufd_vsmmu-02292024/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-03-13 10:13 Query on ARM SMMUv3 nested support Shameerali Kolothum Thodi @ 2024-03-13 23:50 ` Nicolin Chen 2024-03-18 16:22 ` Jason Gunthorpe 2024-03-22 15:04 ` Shameerali Kolothum Thodi 0 siblings, 2 replies; 9+ messages in thread From: Nicolin Chen @ 2024-03-13 23:50 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer On Wed, Mar 13, 2024 at 10:13:58AM +0000, Shameerali Kolothum Thodi wrote: > Hi Nicolin, > > Thanks for the latest repos with basic SMMUv3 nested support enabled[1]. > I did some basic sanity runs on a HiSilicon platform and they seems to work as > expected. The only problem being we can't assign two devices to the VM if > they are on different physical SMMUs. > > qemu-system-aarch64: -device > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: [iommufd=29] error attach > 0000:75:00.1 (36) to id=4: Invalid argument > qemu-system-aarch64: -device > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: Unable to attach dev to > stage-2 HW pagetable: -1 > Segmentation fault (core dumped) > ... > > I see that on the Qemu side we now allocate a single s2 hwpt and attach that into > all the devices . But this will only work currently if all the assigned devices > are under the same physical SMMUv3 as we have a check in kernel > whether domains are having the same SMMUv3. I remember Jason mentioning > that he is planning to relax that. So are the Qemu side changes based on that > assumption? And any idea how we are planning to relax that restriction? We > do a check for compatibility of the phys SMMUv3s and then allow/restrict in kernel? > Do Qemu can then try allocating a separate s2 hwpt for those and attach again? > Sorry if this was already discussed elsewhere and I missed that. You are very right about this. I haven't thought about supporting that case yet. Likely I need to spare some time to refine the QEMU part in the coming weeks or so (have been waiting for Zhenzhong's full nesting patches). Also, NVIDIA has an interest to support CMDQ-V accelerator, where we need more vSMMU instances, than just one. I haven't decided how to handle this for both single-vSMMU and multi-vSMMU versions. Yet, some of the ongoing work might help: https://github.com/nicolinc/iommufd/commit/b7520901184fd9fa127abb88c1f0be16b9967cff https://github.com/nicolinc/iommufd/commit/d969ffeac899491d3a6e54557bf6a46d52b95865 So, each device should poll hw_info to get its SMMU ID, if it is not in the S2 list of the vSMMU's, add it and allocate a new S2. Jason is on vacation. I believe we'll have a more solid discussion later. Thanks Nicolin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-03-13 23:50 ` Nicolin Chen @ 2024-03-18 16:22 ` Jason Gunthorpe 2024-03-22 15:04 ` Shameerali Kolothum Thodi 1 sibling, 0 replies; 9+ messages in thread From: Jason Gunthorpe @ 2024-03-18 16:22 UTC (permalink / raw) To: Nicolin Chen Cc: Shameerali Kolothum Thodi, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer On Wed, Mar 13, 2024 at 04:50:58PM -0700, Nicolin Chen wrote: > On Wed, Mar 13, 2024 at 10:13:58AM +0000, Shameerali Kolothum Thodi wrote: > > Hi Nicolin, > > > > Thanks for the latest repos with basic SMMUv3 nested support enabled[1]. > > I did some basic sanity runs on a HiSilicon platform and they seems to work as > > expected. The only problem being we can't assign two devices to the VM if > > they are on different physical SMMUs. > > > > qemu-system-aarch64: -device > > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: [iommufd=29] error attach > > 0000:75:00.1 (36) to id=4: Invalid argument > > qemu-system-aarch64: -device > > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: Unable to attach dev to > > stage-2 HW pagetable: -1 > > Segmentation fault (core dumped) > > ... > > > > I see that on the Qemu side we now allocate a single s2 hwpt and attach that into > > all the devices . But this will only work currently if all the assigned devices > > are under the same physical SMMUv3 as we have a check in kernel > > whether domains are having the same SMMUv3. I remember Jason mentioning > > that he is planning to relax that. Yes, I intend this to be the direction as it makes the most sense > > So are the Qemu side changes based on that assumption? And any > > idea how we are planning to relax that restriction? We do a check > > for compatibility of the phys SMMUv3s and then allow/restrict in > > kernel? Right, so qemu should still handle the case where the HWPT is not compatible and create a new HWPT. Regardless of the kernel side optimization qemu is still missing this logic apparently. > > Do Qemu can then try allocating a separate s2 hwpt for > > those and attach again? Sorry if this was already discussed > > elsewhere and I missed that. Yes, this is the right thing. Jason ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Query on ARM SMMUv3 nested support 2024-03-13 23:50 ` Nicolin Chen 2024-03-18 16:22 ` Jason Gunthorpe @ 2024-03-22 15:04 ` Shameerali Kolothum Thodi 2024-03-29 7:13 ` Nicolin Chen 1 sibling, 1 reply; 9+ messages in thread From: Shameerali Kolothum Thodi @ 2024-03-22 15:04 UTC (permalink / raw) To: Nicolin Chen Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer, baolu.lu@linux.intel.com > -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, March 13, 2024 11:51 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; iommu@lists.linux.dev; Linuxarm > <linuxarm@huawei.com>; Zhangfei Gao <zhangfei.gao@linaro.org>; Michael > Shavit <mshavit@google.com>; Eric Auger <eric.auger@redhat.com>; Moritz > Fischer <mdf@kernel.org> > Subject: Re: Query on ARM SMMUv3 nested support > > On Wed, Mar 13, 2024 at 10:13:58AM +0000, Shameerali Kolothum Thodi > wrote: > > Hi Nicolin, > > > > Thanks for the latest repos with basic SMMUv3 nested support enabled[1]. > > I did some basic sanity runs on a HiSilicon platform and they seems to > work as > > expected. The only problem being we can't assign two devices to the VM if > > they are on different physical SMMUs. > > > > qemu-system-aarch64: -device > > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: [iommufd=29] error > attach > > 0000:75:00.1 (36) to id=4: Invalid argument > > qemu-system-aarch64: -device > > vfio-pci,host=0000:75:00.1,iommufd=iommufd0: Unable to attach dev to > > stage-2 HW pagetable: -1 > > Segmentation fault (core dumped) > > ... > > > > I see that on the Qemu side we now allocate a single s2 hwpt and attach > that into > > all the devices . But this will only work currently if all the assigned devices > > are under the same physical SMMUv3 as we have a check in kernel > > whether domains are having the same SMMUv3. I remember Jason > mentioning > > that he is planning to relax that. So are the Qemu side changes based on > that > > assumption? And any idea how we are planning to relax that restriction? > We > > do a check for compatibility of the phys SMMUv3s and then allow/restrict > in kernel? > > Do Qemu can then try allocating a separate s2 hwpt for those and attach > again? > > Sorry if this was already discussed elsewhere and I missed that. > > You are very right about this. I haven't thought about supporting > that case yet. Likely I need to spare some time to refine the QEMU > part in the coming weeks or so (have been waiting for Zhenzhong's > full nesting patches). > > Also, NVIDIA has an interest to support CMDQ-V accelerator, where > we need more vSMMU instances, than just one. I haven't decided how > to handle this for both single-vSMMU and multi-vSMMU versions. > > Yet, some of the ongoing work might help: > https://github.com/nicolinc/iommufd/commit/b7520901184fd9fa127abb88c > 1f0be16b9967cff > https://github.com/nicolinc/iommufd/commit/d969ffeac899491d3a6e54557 > bf6a46d52b95865 > So, each device should poll hw_info to get its SMMU ID, if it is > not in the S2 list of the vSMMU's, add it and allocate a new S2. Thanks for the reference to above CMDQ-V. Looks interesting. I think we can have a go with retry logic first to allocate a new hwpt if the attach fails. For now, I have a temp fix where I allocate a new hwpt every time. I also noticed another problem with the commit below, 6691a2f("hw/arm/smmu-common: Use sysmem for get_address _space until !!s2_hwpti") This actually breaks a virtio-pci dev assignment when that is the only assigned device to the Guest without any pass-through dev in nested scenario. I have a temp fix for that as well here, https://github.com/hisilicon/qemu/commit/ff0230e9593ba1a172aa7ec9162967d6ae6875b0 (I know it is ugly!). I will revisit that again. Please let me know if you have any ideas. FWIW, I have a working branch for vSVA here, https://github.com/hisilicon/qemu/tree/iommufd_vsmmu-02292024-vsva-wip-v2 I still have that io_uring write issue mentioned in the other thread[1], but for now I use io_uring API only for read and make use of normal write() for updating kernel with page response. I have done some basic sanity runs with our ACC devices and those seems to work(of course needs more testing). Please have a look and let me know if you spot anything. Thanks, Shameer 1. https://lore.kernel.org/all/ad4575588dd247fa8beae60963f36404@huawei.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-03-22 15:04 ` Shameerali Kolothum Thodi @ 2024-03-29 7:13 ` Nicolin Chen 2024-04-02 7:25 ` Shameerali Kolothum Thodi 0 siblings, 1 reply; 9+ messages in thread From: Nicolin Chen @ 2024-03-29 7:13 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer, baolu.lu@linux.intel.com Hi Shameer, Sorry for the late reply. Having been occupied by multiple tasks lately. On Fri, Mar 22, 2024 at 03:04:42PM +0000, Shameerali Kolothum Thodi wrote: > Thanks for the reference to above CMDQ-V. Looks interesting. > I think we can have a go with retry logic first to allocate a new hwpt > if the attach fails. For now, I have a temp fix where I allocate a new > hwpt every time. > > I also noticed another problem with the commit below, > 6691a2f("hw/arm/smmu-common: Use sysmem for get_address _space until !!s2_hwpti") > > This actually breaks a virtio-pci dev assignment when that is the only > assigned device to the Guest without any pass-through dev in nested > scenario. > > I have a temp fix for that as well here, > (I know it is ugly!). I will revisit that again. Please let me know if you have > any ideas. I am not quite getting the case here. Is that virtio-pci device behind a nested-SMMU while there is no S2 hwpt? In that case, why does this virtio-pci device require a "nested-smmuv3"? > I still have that io_uring write issue mentioned in the other thread[1], > but for now I use io_uring API only for read and make use of normal write() > for updating kernel with page response. > > I have done some basic sanity runs with our ACC devices and those seems > to work(of course needs more testing). > > Please have a look and let me know if you spot anything. I'm glad that you include the fault handling support. I'll give that a try. Thanks Nicolin ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Query on ARM SMMUv3 nested support 2024-03-29 7:13 ` Nicolin Chen @ 2024-04-02 7:25 ` Shameerali Kolothum Thodi 2024-04-02 11:28 ` Jason Gunthorpe 2024-04-09 6:12 ` Nicolin Chen 0 siblings, 2 replies; 9+ messages in thread From: Shameerali Kolothum Thodi @ 2024-04-02 7:25 UTC (permalink / raw) To: Nicolin Chen Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer, baolu.lu@linux.intel.com > -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Friday, March 29, 2024 7:14 AM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Jason Gunthorpe <jgg@nvidia.com>; iommu@lists.linux.dev; Linuxarm > <linuxarm@huawei.com>; Zhangfei Gao <zhangfei.gao@linaro.org>; Michael > Shavit <mshavit@google.com>; Eric Auger <eric.auger@redhat.com>; Moritz > Fischer <mdf@kernel.org>; baolu.lu@linux.intel.com > Subject: Re: Query on ARM SMMUv3 nested support > > Hi Shameer, > > Sorry for the late reply. Having been occupied by multiple tasks > lately. > > On Fri, Mar 22, 2024 at 03:04:42PM +0000, Shameerali Kolothum Thodi > wrote: > > > Thanks for the reference to above CMDQ-V. Looks interesting. > > I think we can have a go with retry logic first to allocate a new hwpt > > if the attach fails. For now, I have a temp fix where I allocate a new > > hwpt every time. > > > > I also noticed another problem with the commit below, > > 6691a2f("hw/arm/smmu-common: Use sysmem for get_address _space > until !!s2_hwpti") > > > > This actually breaks a virtio-pci dev assignment when that is the only > > assigned device to the Guest without any pass-through dev in nested > > scenario. > > > > I have a temp fix for that as well here, > > (I know it is ugly!). I will revisit that again. Please let me know if you have > > any ideas. > > I am not quite getting the case here. Is that virtio-pci device > behind a nested-SMMU while there is no S2 hwpt? In that case, why > does this virtio-pci device require a "nested-smmuv3"? Hi Nicolin, Let me try to explain with a use case, Consider a guest boot with below options, ./qemu-system-aarch64 virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \ -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \ -object iommufd,id=iommufd0 \ -bios QEMU_EFI.fd \ -kernel Image \ -device virtio-blk-device,drive=fs \ -drive if=none,file=ubuntu.img,id=fs \ -device ioh3420,id=rp1 \ -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9 \ -fsdev local,id=p9fs,path=p9root,security_model=mapped \ --append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ -net none \ -nographic Here we have specified nested-smmuv3 and have a PCIe root port(ioh3420) and a virtio-pci dev(for virtfs). Once this VM is up and running, both the ioh3420 and virtio-9p-pci will not work as expected as we return sysmem address space for them. And then later when you try to hotplug a vfio dev to the PCIe root port, device_add vfio-pci,host=0000:7d:02.1,bus=rp1,id=net1,iommufd=iommufd0 it won't work either because the address space for ioh3420 is wrong. Taking another look at this, I think we can replace my earlier attempt to fix this by detecting if the dev is vfio or not and add that into the check below, --- a/hw/arm/smmu-common.c +++ b/hw/arm/smmu-common.c @@ -619,8 +619,15 @@ static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn) SMMUState *s = opaque; SMMUPciBus *sbus = smmu_get_sbus(s, bus); SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn); + bool is_vfio = false; + PCIDevice *pdev; - if (s->nested && !s->s2_hwpt) { + pdev = pci_find_device(bus, pci_bus_num(bus), devfn); + if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { + is_vfio = true; + } + + if (is_vfio && s->nested && !s->s2_hwpt) { return &sdev->as_sysmem; } else { return &sdev->as; Please let me know your thoughts on this. Thanks, Shameer ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-04-02 7:25 ` Shameerali Kolothum Thodi @ 2024-04-02 11:28 ` Jason Gunthorpe 2024-04-09 6:12 ` Nicolin Chen 1 sibling, 0 replies; 9+ messages in thread From: Jason Gunthorpe @ 2024-04-02 11:28 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Nicolin Chen, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer, baolu.lu@linux.intel.com On Tue, Apr 02, 2024 at 07:25:56AM +0000, Shameerali Kolothum Thodi wrote: > Consider a guest boot with below options, > > ./qemu-system-aarch64 virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \ > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \ > -object iommufd,id=iommufd0 \ > -bios QEMU_EFI.fd \ > -kernel Image \ > -device virtio-blk-device,drive=fs \ > -drive if=none,file=ubuntu.img,id=fs \ > -device ioh3420,id=rp1 \ > -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9 \ > -fsdev local,id=p9fs,path=p9root,security_model=mapped \ > --append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \ > -net none \ > -nographic > > Here we have specified nested-smmuv3 and have a PCIe root port(ioh3420) and > a virtio-pci dev(for virtfs). Once this VM is up and running, both the ioh3420 > and virtio-9p-pci will not work as expected as we return sysmem address space > for them. A virtio device that does not use the HW SMMU for translation should not be affiliated with the nested smmuv3 in the ACPI. IOW only virtual devices that can be associated with the nested HWPT can be linked to the smmuv3 instance in the ACPI. QEMU needs to track this and build the correct IORT. Jason ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-04-02 7:25 ` Shameerali Kolothum Thodi 2024-04-02 11:28 ` Jason Gunthorpe @ 2024-04-09 6:12 ` Nicolin Chen 2024-04-09 19:47 ` Nicolin Chen 1 sibling, 1 reply; 9+ messages in thread From: Nicolin Chen @ 2024-04-09 6:12 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Eric Auger, Moritz Fischer, baolu.lu@linux.intel.com Sorry for my belated reply. On Tue, Apr 02, 2024 at 07:25:56AM +0000, Shameerali Kolothum Thodi wrote: > Here we have specified nested-smmuv3 and have a PCIe root port(ioh3420) and > a virtio-pci dev(for virtfs). Once this VM is up and running, both the ioh3420 > and virtio-9p-pci will not work as expected as we return sysmem address space > for them. > > And then later when you try to hotplug a vfio dev to the PCIe root port, > device_add vfio-pci,host=0000:7d:02.1,bus=rp1,id=net1,iommufd=iommufd0 > > it won't work either because the address space for ioh3420 is wrong. > > Taking another look at this, I think we can replace my earlier attempt to fix > this by detecting if the dev is vfio or not and add that into the check below, > > --- a/hw/arm/smmu-common.c > +++ b/hw/arm/smmu-common.c > @@ -619,8 +619,15 @@ static AddressSpace *smmu_find_add_as(PCIBus > *bus, void *opaque, int devfn) > SMMUState *s = opaque; > SMMUPciBus *sbus = smmu_get_sbus(s, bus); > SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn); > + bool is_vfio = false; > + PCIDevice *pdev; > > - if (s->nested && !s->s2_hwpt) { > + pdev = pci_find_device(bus, pci_bus_num(bus), devfn); > + if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { > + is_vfio = true; > + } > + > + if (is_vfio && s->nested && !s->s2_hwpt) { > return &sdev->as_sysmem; > } else { > return &sdev->as; > > Please let me know your thoughts on this. Hmm, checking "vfio-pci" here feels a bit of layer violation.. But I understand how you got there... I think that the current way of enabling the "nested" flag for a device is a bit messy: if -machine has "iommu=nested-smmuv3" && -device is "vfio_pci" && -device has "iommufd" Any other device that doesn't meet the latter two conditions is ambiguously added to a soft smmuv3, which feels wrong to me now. Also, the callbacks from hw/vfio/pci.c to iommu are tricky as well: vfio_realize() { pci_device_iommu_address_space() { smmu_find_add_as(); // Can only assume vfio-pci is behind // a nested SMMU. So returns system AS } vfio_attach_device() { ioctl(VFIO_DEVICE_BIND_IOMMUFD); ioctl(VFIO_DEVICE_ATTACH_IOMMUFD_PT); listener_add_address_space() { if (memory_region_is_iommu(section->mr)) { // Skip iommu_notifier_init(IOMMU_NOTIFIER_IOTLB_EVENTS); memory_region_iommu_replay(); } } } } In my view, we should sort this out with a cleaner enabling command first. And then, as Jason pointed out, other devices (non vfio-pci) should probably not be added to the SMMU, i.e. they might need to be added to a separate PCI bus/bridge with out iommu. In this case, the vSMMU module would not need an iommu AS at all when "iommu=nested-smmuv3". Thoughts? Thanks Nicolin ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Query on ARM SMMUv3 nested support 2024-04-09 6:12 ` Nicolin Chen @ 2024-04-09 19:47 ` Nicolin Chen 0 siblings, 0 replies; 9+ messages in thread From: Nicolin Chen @ 2024-04-09 19:47 UTC (permalink / raw) To: Eric Auger, Shameerali Kolothum Thodi Cc: Jason Gunthorpe, iommu@lists.linux.dev, Linuxarm, Zhangfei Gao, Michael Shavit, Moritz Fischer, baolu.lu@linux.intel.com On Mon, Apr 08, 2024 at 11:13:02PM -0700, Nicolin Chen wrote: > Sorry for my belated reply. > > On Tue, Apr 02, 2024 at 07:25:56AM +0000, Shameerali Kolothum Thodi wrote: > > > Here we have specified nested-smmuv3 and have a PCIe root port(ioh3420) and > > a virtio-pci dev(for virtfs). Once this VM is up and running, both the ioh3420 > > and virtio-9p-pci will not work as expected as we return sysmem address space > > for them. > > > > And then later when you try to hotplug a vfio dev to the PCIe root port, > > device_add vfio-pci,host=0000:7d:02.1,bus=rp1,id=net1,iommufd=iommufd0 > > > > it won't work either because the address space for ioh3420 is wrong. > > > > Taking another look at this, I think we can replace my earlier attempt to fix > > this by detecting if the dev is vfio or not and add that into the check below, > > > > --- a/hw/arm/smmu-common.c > > +++ b/hw/arm/smmu-common.c > > @@ -619,8 +619,15 @@ static AddressSpace *smmu_find_add_as(PCIBus > > *bus, void *opaque, int devfn) > > SMMUState *s = opaque; > > SMMUPciBus *sbus = smmu_get_sbus(s, bus); > > SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn); > > + bool is_vfio = false; > > + PCIDevice *pdev; > > > > - if (s->nested && !s->s2_hwpt) { > > + pdev = pci_find_device(bus, pci_bus_num(bus), devfn); > > + if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) { > > + is_vfio = true; > > + } > > + > > + if (is_vfio && s->nested && !s->s2_hwpt) { > > return &sdev->as_sysmem; > > } else { > > return &sdev->as; > > > > Please let me know your thoughts on this. > > Hmm, checking "vfio-pci" here feels a bit of layer violation.. > But I understand how you got there... > > I think that the current way of enabling the "nested" flag for > a device is a bit messy: > if -machine has "iommu=nested-smmuv3" && > -device is "vfio_pci" && > -device has "iommufd" > > Any other device that doesn't meet the latter two conditions > is ambiguously added to a soft smmuv3, which feels wrong to > me now. > > Also, the callbacks from hw/vfio/pci.c to iommu are tricky as > well: > vfio_realize() { > pci_device_iommu_address_space() { > smmu_find_add_as(); // Can only assume vfio-pci is behind > // a nested SMMU. So returns system AS > } > vfio_attach_device() { > ioctl(VFIO_DEVICE_BIND_IOMMUFD); > ioctl(VFIO_DEVICE_ATTACH_IOMMUFD_PT); > listener_add_address_space() { > if (memory_region_is_iommu(section->mr)) { // Skip > iommu_notifier_init(IOMMU_NOTIFIER_IOTLB_EVENTS); > memory_region_iommu_replay(); > } > } > } > } > > In my view, we should sort this out with a cleaner enabling > command first. And then, as Jason pointed out, other devices > (non vfio-pci) should probably not be added to the SMMU, i.e. > they might need to be added to a separate PCI bus/bridge with > out iommu. In this case, the vSMMU module would not need an > iommu AS at all when "iommu=nested-smmuv3". Just recalled that kvm_arch_fixup_msi_route still requires a "vfio-pci" device to return an IOMMUMemoryRegion v.s. system memory. And that's why there's a "!s2_hwpt" condition (hack) when returning the system AS, so that (at the beginning) the vfio attach could bypass the iotlb registeration using system AS and (after s2_hwpt is allocated) MSI could translate the MSI address using iommu AS. Otherwise, such a device could (likely "should" in my view) stay with the system AS, since it doesn't need the emulated iotlb/cache at all. Yet in that case, we would need another pci_device_msi_address_space for MSI.. @Eric, Would it be possible for you shed some light also? In a summary, we have the following opens: 1. How should we enable "nesting" for a passthrough device? - Currently is a bit ambiguous: if -machine has "iommu=nested-smmuv3" && -device is "vfio_pci" && -device has "iommufd" - Actually "-device has 'iommufd'" only guarantees using the IOMMUFD cdev APIs v.s. the legacy one, i.e. it's not necessary to enable nesting supposedly? 2. How should we handle a non passthrough device (no nesting)? - Current is a bit ambiguous again by adding it to the soft SMMUv3. But, is it necessary? It doesn't seem to have any positive performance gain. - Yet not adding it behind the SMMUv3 would require another noiommu PCI bus to dock such a device, I assume? 3. Should we keep a nesting-enabled passthrough device in the system AS v.s. iommu AS? - The reasoning is that it doesn't need an emulated iotlb or cache since it uses the HW one. - Also, with a HW-accelerated VCMDQ on NVIDIA Grace, the emulated iotlb should be turned off because there will be no TLBI/ATC command trapped by QEMU, as all those commands will be issued to the VCMDQ HW directly. - The only usage of an iommu AS is the MSI code, as it's a bit unique requiring for a gIOVA to gPA translation. So, keeping such a device in the system AS would need another pci_device_msi_address_space, which feels reasonable since we need something similar in the kernel too: https://lore.kernel.org/lkml/f65a3d7e3e0e0c7b2f5cfc03ada9618aa8b523bb.1683688960.git.nicolinc@nvidia.com/ Thanks Nicolin ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-04-09 19:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-13 10:13 Query on ARM SMMUv3 nested support Shameerali Kolothum Thodi 2024-03-13 23:50 ` Nicolin Chen 2024-03-18 16:22 ` Jason Gunthorpe 2024-03-22 15:04 ` Shameerali Kolothum Thodi 2024-03-29 7:13 ` Nicolin Chen 2024-04-02 7:25 ` Shameerali Kolothum Thodi 2024-04-02 11:28 ` Jason Gunthorpe 2024-04-09 6:12 ` Nicolin Chen 2024-04-09 19:47 ` Nicolin Chen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.