* [Qemu-devel] VFIO mdev with vIOMMU
From: Tian, Kevin @ 2016-07-28 10:15 UTC
  To: alex.williamson@redhat.com
  Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
      qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai, Lv, Zhiyuan,
      bjsdjshi@linux.vnet.ibm.com, Song, Jike, Kirti Wankhede,
      Xiao, Guangrong, Wang, Zhenyu Z

Hi, Alex,

Along with the recent enhancements to the virtual IOMMU (vIOMMU) in
QEMU, I'm wondering whether there is any issue for mdev in coping with
a vIOMMU. I know that today a VFIO device only works with the PowerPC
IOMMU (someone is enabling VFIO devices with virtual VT-d, but that
work doesn't look complete yet), but it's always good to have the
architecture discussion early. :-)

The VFIO mdev framework maintains a GPA->HPA mapping, which is queried
by the vendor-specific mdev device model for emulation purposes. For
example, guest GPU PTEs may need to be translated into shadow GPU PTEs,
which requires a GPA->HPA conversion.

When a virtual IOMMU is exposed to the guest, the guest may use IOVAs
as DMA addresses, which means a guest PTE now contains an IOVA instead
of a GPA, so the device model wants to know the IOVA->HPA mapping.
After checking the current vIOMMU logic within QEMU, this looks like it
is not a problem: the vIOMMU is expected to notify VFIO of any IOVA
change, and the kernel VFIO driver does receive map requests for IOVA
regions. Thus the mapping structure that VFIO maintains is indeed the
IOVA->HPA mapping required by the device model.

In that case it looks like no further change is required in the
proposed mdev framework to support a vIOMMU. The only thing I'm unsure
about is how QEMU guarantees that IOVA and GPA are mapped exclusively.
I checked that vfio_listener_region_add() initiates map requests for
normal memory regions (which is GPA), and then vfio_iommu_map_notify()
sends map requests for IOVA regions notified through the IOMMU
notifier. I don't think VFIO can cope with both GPA and IOVA map
requests simultaneously, since VFIO doesn't maintain multiple address
spaces for one device. It's not an mdev-specific question, but I have
definitely missed some key point here, since this is assumed to be
working for PowerPC already...

Thanks
Kevin
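For context, the "map requests" Kevin mentions reach the kernel through
VFIO_IOMMU_MAP_DMA ioctls on the container file descriptor; in QEMU the
real path goes through an internal wrapper in hw/vfio/common.c. A
minimal sketch, assuming a type1 IOMMU container and purely
illustrative values and helper name:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map one page into the container so the device can DMA to it.  'iova'
 * is the address the device will use (GPA when no vIOMMU is present,
 * guest IOVA when one is), 'hva' is the host virtual address backing
 * that page.  The kernel pins the page and programs IOVA->HPA into the
 * physical IOMMU. */
static int map_one_page(int container_fd, uint64_t iova, void *hva)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uint64_t)(uintptr_t)hva,
        .iova  = iova,
        .size  = 4096,
    };

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}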
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Alex Williamson @ 2016-07-28 15:41 UTC
  To: Tian, Kevin
  Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
      qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai, Lv, Zhiyuan,
      bjsdjshi@linux.vnet.ibm.com, Song, Jike, Kirti Wankhede,
      Xiao, Guangrong, Wang, Zhenyu Z

On Thu, 28 Jul 2016 10:15:24 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> Hi, Alex,
>
> Along with the recent enhancements to the virtual IOMMU (vIOMMU) in
> QEMU, I'm wondering whether there is any issue for mdev in coping with
> a vIOMMU. I know that today a VFIO device only works with the PowerPC
> IOMMU (someone is enabling VFIO devices with virtual VT-d, but that
> work doesn't look complete yet), but it's always good to have the
> architecture discussion early. :-)
>
> The VFIO mdev framework maintains a GPA->HPA mapping, which is queried
> by the vendor-specific mdev device model for emulation purposes. For
> example, guest GPU PTEs may need to be translated into shadow GPU PTEs,
> which requires a GPA->HPA conversion.
>
> When a virtual IOMMU is exposed to the guest, the guest may use IOVAs
> as DMA addresses, which means a guest PTE now contains an IOVA instead
> of a GPA, so the device model wants to know the IOVA->HPA mapping.
> After checking the current vIOMMU logic within QEMU, this looks like it
> is not a problem: the vIOMMU is expected to notify VFIO of any IOVA
> change, and the kernel VFIO driver does receive map requests for IOVA
> regions. Thus the mapping structure that VFIO maintains is indeed the
> IOVA->HPA mapping required by the device model.
>
> In that case it looks like no further change is required in the
> proposed mdev framework to support a vIOMMU. The only thing I'm unsure
> about is how QEMU guarantees that IOVA and GPA are mapped exclusively.
> I checked that vfio_listener_region_add() initiates map requests for
> normal memory regions (which is GPA), and then vfio_iommu_map_notify()
> sends map requests for IOVA regions notified through the IOMMU
> notifier. I don't think VFIO can cope with both GPA and IOVA map
> requests simultaneously, since VFIO doesn't maintain multiple address
> spaces for one device. It's not an mdev-specific question, but I have
> definitely missed some key point here, since this is assumed to be
> working for PowerPC already...

I prefer not to distinguish GPA vs. IOVA; the device always operates in
the IOVA space. Without a vIOMMU, it just happens to be an identity map
into the GPA space. Think about how this works on real hardware: when
VT-d is not enabled, there's no translation, so IOVA = GPA. The device
interacts directly with system memory, the same as the default case in
QEMU now. When VT-d is enabled, the device is placed into an IOMMU
domain and the IOVA space is now restricted to the translations defined
within that domain. The same is expected to happen with QEMU: all of
the GPA-mapped IOVA space is removed via vfio_listener_region_del() and
a new IOMMU region is added, enabling the vfio_iommu_map_notify()
callbacks. The fact that we can't have both system memory and an IOMMU
active via vfio_listener_region_add() is a property of the VT-d
emulation. Anyway, I think it's handled correctly, but until VT-d
emulation actually starts interacting correctly with the IOMMU map
notifier, we won't know whether there are some lingering bugs. Thanks,

Alex
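The exclusivity Kevin asked about comes down to which branch of the
memory listener a section takes. A simplified sketch of that dispatch,
loosely based on hw/vfio/common.c around mid-2016: bookkeeping,
alignment checks and error handling are omitted, vfio_dma_map() and
vfio_iommu_map_notify() are VFIO's internal helpers in the same file,
and the notifier-registration API has changed in later QEMU versions.

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "hw/vfio/vfio-common.h"

/* Sketch of vfio_listener_region_add(): either hook the IOMMU notifier
 * (vIOMMU case) or identity-map the RAM section (no-vIOMMU case). */
static void vfio_listener_region_add_sketch(MemoryListener *listener,
                                            MemoryRegionSection *section)
{
    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
    hwaddr iova = section->offset_within_address_space;
    uint64_t size = int128_get64(section->size);

    if (memory_region_is_iommu(section->mr)) {
        /* vIOMMU case: nothing is mapped up front; register a notifier
         * so vfio_iommu_map_notify() receives IOVA (un)map events. */
        VFIOGuestIOMMU *giommu = g_new0(VFIOGuestIOMMU, 1);

        giommu->iommu = section->mr;
        giommu->container = container;
        giommu->n.notify = vfio_iommu_map_notify;
        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
        return;
    }

    /* RAM case: map the whole section with IOVA == GPA, backed by the
     * host virtual address of the region. */
    void *vaddr = memory_region_get_ram_ptr(section->mr) +
                  section->offset_within_region;

    vfio_dma_map(container, iova, size, vaddr, section->readonly);
}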
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Tian, Kevin @ 2016-07-28 23:47 UTC
  To: Alex Williamson
  Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
      qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai, Lv, Zhiyuan,
      bjsdjshi@linux.vnet.ibm.com, Song, Jike, Kirti Wankhede,
      Xiao, Guangrong, Wang, Zhenyu Z

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, July 28, 2016 11:42 PM
>
> On Thu, 28 Jul 2016 10:15:24 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
> > Hi, Alex,
> >
> > Along with the recent enhancements to the virtual IOMMU (vIOMMU) in
> > QEMU, I'm wondering whether there is any issue for mdev in coping with
> > a vIOMMU. I know that today a VFIO device only works with the PowerPC
> > IOMMU (someone is enabling VFIO devices with virtual VT-d, but that
> > work doesn't look complete yet), but it's always good to have the
> > architecture discussion early. :-)
> >
> > The VFIO mdev framework maintains a GPA->HPA mapping, which is queried
> > by the vendor-specific mdev device model for emulation purposes. For
> > example, guest GPU PTEs may need to be translated into shadow GPU PTEs,
> > which requires a GPA->HPA conversion.
> >
> > When a virtual IOMMU is exposed to the guest, the guest may use IOVAs
> > as DMA addresses, which means a guest PTE now contains an IOVA instead
> > of a GPA, so the device model wants to know the IOVA->HPA mapping.
> > After checking the current vIOMMU logic within QEMU, this looks like it
> > is not a problem: the vIOMMU is expected to notify VFIO of any IOVA
> > change, and the kernel VFIO driver does receive map requests for IOVA
> > regions. Thus the mapping structure that VFIO maintains is indeed the
> > IOVA->HPA mapping required by the device model.
> >
> > In that case it looks like no further change is required in the
> > proposed mdev framework to support a vIOMMU. The only thing I'm unsure
> > about is how QEMU guarantees that IOVA and GPA are mapped exclusively.
> > I checked that vfio_listener_region_add() initiates map requests for
> > normal memory regions (which is GPA), and then vfio_iommu_map_notify()
> > sends map requests for IOVA regions notified through the IOMMU
> > notifier. I don't think VFIO can cope with both GPA and IOVA map
> > requests simultaneously, since VFIO doesn't maintain multiple address
> > spaces for one device. It's not an mdev-specific question, but I have
> > definitely missed some key point here, since this is assumed to be
> > working for PowerPC already...
>
> I prefer not to distinguish GPA vs. IOVA; the device always operates in
> the IOVA space. Without a vIOMMU, it just happens to be an identity map
> into the GPA space. Think about how this works on real hardware: when
> VT-d is not enabled, there's no translation, so IOVA = GPA. The device
> interacts directly with system memory, the same as the default case in
> QEMU now. When VT-d is enabled, the device is placed into an IOMMU
> domain and the IOVA space is now restricted to the translations defined
> within that domain. The same is expected to happen with QEMU: all of
> the GPA-mapped IOVA space is removed via vfio_listener_region_del() and
> a new IOMMU region is added, enabling the vfio_iommu_map_notify()

Ha, this is exactly the info I was looking for. Can you point me to
where the above logic is implemented? I only saw the latter part, about
adding a new IOMMU region...

And I suppose we also have logic to do the reverse - when the guest
disables the IOMMU, all IOVA mappings are deleted and the GPA-mapped
IOVA space is replayed?

> callbacks. The fact that we can't have both system memory and an IOMMU
> active via vfio_listener_region_add() is a property of the VT-d
> emulation. Anyway, I think it's handled correctly, but until VT-d
> emulation actually starts interacting correctly with the IOMMU map
> notifier, we won't know whether there are some lingering bugs. Thanks,

Thanks
Kevin
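For the disable case asked about above, the VT-d emulation would have
to push unmap notifications for the previously established IOVA ranges
before the GPA identity space could be restored. A hypothetical helper
showing the shape of such a notification follows; the function name and
the single-range form are assumptions, and the VT-d emulation of the
time did not yet wire this up:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "exec/address-spaces.h"

/* Hypothetical helper: tear down one IOVA range when the guest turns
 * the IOMMU off.  On the VFIO side, vfio_iommu_map_notify() turns a
 * perm == IOMMU_NONE entry into a VFIO_IOMMU_UNMAP_DMA on the
 * container, so invoking this across the previously mapped ranges
 * would leave the container empty, ready for the GPA identity map to
 * be re-established. */
static void vtd_unmap_range(MemoryRegion *iommu_mr, hwaddr iova, hwaddr size)
{
    IOMMUTLBEntry entry = {
        .target_as       = &address_space_memory,
        .iova            = iova,
        .translated_addr = 0,
        .addr_mask       = size - 1,   /* size is assumed a power of two */
        .perm            = IOMMU_NONE, /* no permissions == unmap */
    };

    memory_region_notify_iommu(iommu_mr, entry);
}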
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Alex Williamson @ 2016-07-29 15:19 UTC
  To: Tian, Kevin
  Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
      qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai, Lv, Zhiyuan,
      bjsdjshi@linux.vnet.ibm.com, Song, Jike, Kirti Wankhede,
      Xiao, Guangrong, Wang, Zhenyu Z

On Thu, 28 Jul 2016 23:47:58 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, July 28, 2016 11:42 PM
> >
> > On Thu, 28 Jul 2016 10:15:24 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > Hi, Alex,
> > >
> > > Along with the recent enhancements to the virtual IOMMU (vIOMMU) in
> > > QEMU, I'm wondering whether there is any issue for mdev in coping with
> > > a vIOMMU. I know that today a VFIO device only works with the PowerPC
> > > IOMMU (someone is enabling VFIO devices with virtual VT-d, but that
> > > work doesn't look complete yet), but it's always good to have the
> > > architecture discussion early. :-)
> > >
> > > The VFIO mdev framework maintains a GPA->HPA mapping, which is queried
> > > by the vendor-specific mdev device model for emulation purposes. For
> > > example, guest GPU PTEs may need to be translated into shadow GPU PTEs,
> > > which requires a GPA->HPA conversion.
> > >
> > > When a virtual IOMMU is exposed to the guest, the guest may use IOVAs
> > > as DMA addresses, which means a guest PTE now contains an IOVA instead
> > > of a GPA, so the device model wants to know the IOVA->HPA mapping.
> > > After checking the current vIOMMU logic within QEMU, this looks like it
> > > is not a problem: the vIOMMU is expected to notify VFIO of any IOVA
> > > change, and the kernel VFIO driver does receive map requests for IOVA
> > > regions. Thus the mapping structure that VFIO maintains is indeed the
> > > IOVA->HPA mapping required by the device model.
> > >
> > > In that case it looks like no further change is required in the
> > > proposed mdev framework to support a vIOMMU. The only thing I'm unsure
> > > about is how QEMU guarantees that IOVA and GPA are mapped exclusively.
> > > I checked that vfio_listener_region_add() initiates map requests for
> > > normal memory regions (which is GPA), and then vfio_iommu_map_notify()
> > > sends map requests for IOVA regions notified through the IOMMU
> > > notifier. I don't think VFIO can cope with both GPA and IOVA map
> > > requests simultaneously, since VFIO doesn't maintain multiple address
> > > spaces for one device. It's not an mdev-specific question, but I have
> > > definitely missed some key point here, since this is assumed to be
> > > working for PowerPC already...
> >
> > I prefer not to distinguish GPA vs. IOVA; the device always operates in
> > the IOVA space. Without a vIOMMU, it just happens to be an identity map
> > into the GPA space. Think about how this works on real hardware: when
> > VT-d is not enabled, there's no translation, so IOVA = GPA. The device
> > interacts directly with system memory, the same as the default case in
> > QEMU now. When VT-d is enabled, the device is placed into an IOMMU
> > domain and the IOVA space is now restricted to the translations defined
> > within that domain. The same is expected to happen with QEMU: all of
> > the GPA-mapped IOVA space is removed via vfio_listener_region_del() and
> > a new IOMMU region is added, enabling the vfio_iommu_map_notify()
>
> Ha, this is exactly the info I was looking for. Can you point me to
> where the above logic is implemented? I only saw the latter part, about
> adding a new IOMMU region...
>
> And I suppose we also have logic to do the reverse - when the guest
> disables the IOMMU, all IOVA mappings are deleted and the GPA-mapped
> IOVA space is replayed?

Since we've never actually seen this work with an assigned device, I
may have misspoken about how it actually works; I think the above is
how it *should* work, though. As it is, vfio_initfn() calls
pci_device_iommu_address_space(), which is what looks for an IOMMU
address space for the device and, when none is found, falls back to
address_space_memory. So at this point we are either strictly using the
IOMMU notifier or strictly using address_space_memory. This seems
hugely inefficient to me, but we don't have any mechanism to change a
device's address space. IMHO it really seems like we need dynamic
address space aliasing support for an IOMMU like VT-d.

So if VT-d emulation actually worked with the IOMMU notifier today, I
believe each vfio device would be placed into a separate container, and
the IOMMU replay mechanism would populate whatever initial translations
are configured for the device, including translations to
address_space_memory if the IOMMU is disabled or configured for
passthrough. It would be through notifier events that this would need
to be de-populated and re-populated with discrete entries if the device
is configured into a more fine-grained domain. I have my doubts whether
QEMU's IOMMU support is really properly designed to handle an IOMMU
like VT-d; it seems better equipped for simple window-based IOMMUs.
Thanks,

Alex
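The address-space selection and replay path described above looks
roughly like the following, pieced together from hw/vfio/pci.c and
hw/vfio/common.c of that era; the helper names here are illustrative
and the call shapes are approximate rather than the exact code:

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "exec/memory.h"
#include "exec/address-spaces.h"
#include "hw/vfio/vfio-common.h"

/* How the device's DMA address space gets chosen at realize time:
 * &address_space_memory when no vIOMMU translates this device (so the
 * listener identity-maps GPA == IOVA), or the device's IOMMU address
 * space, which places it in its own VFIO container. */
static AddressSpace *device_dma_address_space(PCIDevice *pdev)
{
    return pci_device_iommu_address_space(pdev);
}

/* IOMMU branch of the listener: register for map/unmap events, then
 * replay translations that already exist so a freshly created container
 * starts out populated (including a passthrough/identity setup). */
static void hook_and_replay(VFIOGuestIOMMU *giommu)
{
    memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
    memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
}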
Thread overview: 4 messages
  2016-07-28 10:15 [Qemu-devel] VFIO mdev with vIOMMU Tian, Kevin
  2016-07-28 15:41 ` Alex Williamson
  2016-07-28 23:47   ` Tian, Kevin
  2016-07-29 15:19     ` Alex Williamson