* [Qemu-devel] VFIO mdev with vIOMMU
From: Tian, Kevin @ 2016-07-28 10:15 UTC
To: alex.williamson@redhat.com
Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai,
Lv, Zhiyuan, bjsdjshi@linux.vnet.ibm.com, Song, Jike,
Kirti Wankhede, Xiao, Guangrong, Wang, Zhenyu Z
Hi, Alex,

Along with the recent enhancements to the virtual IOMMU (vIOMMU) in Qemu,
I'm wondering whether there is any issue for mdev to cope with a vIOMMU. I
know that today a VFIO device only works with the PowerPC IOMMU (note that
someone is enabling VFIO devices with virtual VT-d, but it doesn't look
complete yet), but it's always good to have the architecture discussion
early. :-)

The VFIO mdev framework maintains a GPA->HPA mapping, which is queried by
the vendor-specific mdev device model for emulation purposes. For example,
guest GPU PTEs may need to be translated into shadow GPU PTEs, which
requires a GPA->HPA conversion.

When a virtual IOMMU is exposed to the guest, an IOVA may be used as the DMA
address by the guest, which means a guest PTE now contains an IOVA instead
of a GPA, so the device model needs to know the IOVA->HPA mapping. After
checking the current vIOMMU logic within Qemu, it looks like this is not a
problem: the vIOMMU is expected to notify VFIO of any IOVA change, and the
kernel VFIO driver does receive map requests for IOVA regions. Thus the
mapping structure that VFIO maintains is indeed the IOVA->HPA mapping
required by the device model.

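To make that dependency concrete, here is a minimal sketch of the kind of
translation a vendor device model performs when shadowing guest GPU PTEs.
The helper and PTE layout below are purely hypothetical (not the mdev API or
any real GPU format); the only point is that with a vIOMMU the lookup key
becomes an IOVA instead of a GPA while the flow stays the same:

#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical lookup into the VFIO-maintained mapping for this mdev
 * (not a real API): without a vIOMMU the key is a GPA, with a vIOMMU
 * it is an IOVA, but the device model's logic is identical either way.
 */
extern bool mdev_translate_dma(void *mdev_state, uint64_t dma_addr,
                               uint64_t *hpa);

#define PTE_ADDR_MASK   0x000ffffffffff000ULL  /* assumed 4K-page address bits */
#define PTE_FLAGS_MASK  0x0000000000000fffULL

/*
 * Shadow one guest GPU PTE: keep the flags, replace the guest DMA address
 * (GPA or IOVA) with the host physical address the real hardware must use.
 */
static bool shadow_gpu_pte(void *mdev_state, uint64_t guest_pte,
                           uint64_t *shadow_pte)
{
    uint64_t dma_addr = guest_pte & PTE_ADDR_MASK;
    uint64_t hpa;

    if (!mdev_translate_dma(mdev_state, dma_addr, &hpa)) {
        return false;               /* no mapping yet: fault in or defer */
    }

    *shadow_pte = (hpa & PTE_ADDR_MASK) | (guest_pte & PTE_FLAGS_MASK);
    return true;
}
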
In this manner it looks like no further change is required to the proposed
mdev framework to support a vIOMMU. The only thing I'm unsure about is how
Qemu guarantees that IOVA and GPA are mapped exclusively. I checked that
vfio_listener_region_add initiates map requests for normal memory regions
(i.e., GPA), and that vfio_iommu_map_notify sends map requests for IOVA
regions notified through the IOMMU notifier. I don't think VFIO can cope
with both GPA and IOVA map requests simultaneously, since VFIO doesn't
maintain multiple address spaces for one device. It's not an mdev-specific
question, but I've definitely missed some key points here, since this is
assumed to be working for PowerPC already...

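For reference, below is my (heavily simplified) reading of the two Qemu-side
paths mentioned above. This is not the actual hw/vfio/common.c code, the
real notifier callback has a different signature, and the *_placeholder
helpers stand in for the real Qemu/VFIO calls:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Placeholders, not real Qemu functions. */
extern bool section_is_iommu_placeholder(MemoryRegionSection *section);
extern void register_iommu_notifier_placeholder(MemoryRegionSection *section,
        void (*notify)(IOMMUTLBEntry *entry));
extern uint64_t section_size_placeholder(MemoryRegionSection *section);
extern void *section_host_ptr_placeholder(MemoryRegionSection *section);
extern void container_dma_map_placeholder(uint64_t iova, uint64_t size,
                                          void *vaddr);

static void vfio_iommu_map_notify_sketch(IOMMUTLBEntry *entry)
{
    /*
     * Invoked when the vIOMMU changes a translation: the IOVA->GPA entry
     * is turned into a VFIO_IOMMU_MAP_DMA / VFIO_IOMMU_UNMAP_DMA request
     * for that IOVA range, so the kernel's IOVA->HPA mapping tracks the
     * guest's programming of the vIOMMU.
     */
}

static void vfio_listener_region_add_sketch(MemoryListener *listener,
                                            MemoryRegionSection *section)
{
    uint64_t iova = section->offset_within_address_space;

    if (section_is_iommu_placeholder(section)) {
        /* vIOMMU path: only register the notifier; mappings arrive later. */
        register_iommu_notifier_placeholder(section,
                                            vfio_iommu_map_notify_sketch);
        return;
    }

    /* Normal RAM path: no translation, IOVA == GPA, map the section now. */
    container_dma_map_placeholder(iova, section_size_placeholder(section),
                                  section_host_ptr_placeholder(section));
}
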
Thanks
Kevin
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Alex Williamson @ 2016-07-28 15:41 UTC
To: Tian, Kevin
Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai,
Lv, Zhiyuan, bjsdjshi@linux.vnet.ibm.com, Song, Jike,
Kirti Wankhede, Xiao, Guangrong, Wang, Zhenyu Z
On Thu, 28 Jul 2016 10:15:24 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> Hi, Alex,
>
> Along with the recent enhancements to the virtual IOMMU (vIOMMU) in Qemu,
> I'm wondering whether there is any issue for mdev to cope with a vIOMMU. I
> know that today a VFIO device only works with the PowerPC IOMMU (note that
> someone is enabling VFIO devices with virtual VT-d, but it doesn't look
> complete yet), but it's always good to have the architecture discussion
> early. :-)
>
> The VFIO mdev framework maintains a GPA->HPA mapping, which is queried by
> the vendor-specific mdev device model for emulation purposes. For example,
> guest GPU PTEs may need to be translated into shadow GPU PTEs, which
> requires a GPA->HPA conversion.
>
> When a virtual IOMMU is exposed to the guest, an IOVA may be used as the DMA
> address by the guest, which means a guest PTE now contains an IOVA instead
> of a GPA, so the device model needs to know the IOVA->HPA mapping. After
> checking the current vIOMMU logic within Qemu, it looks like this is not a
> problem: the vIOMMU is expected to notify VFIO of any IOVA change, and the
> kernel VFIO driver does receive map requests for IOVA regions. Thus the
> mapping structure that VFIO maintains is indeed the IOVA->HPA mapping
> required by the device model.
>
> In this manner it looks like no further change is required to the proposed
> mdev framework to support a vIOMMU. The only thing I'm unsure about is how
> Qemu guarantees that IOVA and GPA are mapped exclusively. I checked that
> vfio_listener_region_add initiates map requests for normal memory regions
> (i.e., GPA), and that vfio_iommu_map_notify sends map requests for IOVA
> regions notified through the IOMMU notifier. I don't think VFIO can cope
> with both GPA and IOVA map requests simultaneously, since VFIO doesn't
> maintain multiple address spaces for one device. It's not an mdev-specific
> question, but I've definitely missed some key points here, since this is
> assumed to be working for PowerPC already...
I prefer not to distinguish GPA vs IOVA, the device always operates in
the IOVA space. Without a vIOMMU, it just happens to be an identity map
into the GPA space. Think about how this works on real hardware, when
VT-d is not enabled, there's no translation, IOVA = GPA. The device
interacts directly with system memory, same as the default case in
QEMU now. When VT-d is enabled, the device is placed into an IOMMU
domain and the IOVA space is now restricted to the translations defined
within that domain. The same is expected to happen with QEMU, all of
the GPA mapped IOVA space is removed via vfio_listener_region_del() and
a new IOMMU region is added, enabling the vfio_iommu_map_notify
callbacks. The fact that we can't have both system memory and an IOMMU
active via vfio_listener_region_add() is a property of the VT-d
emulation. Anyway, I think it's handled correctly, but until VT-d
emulation actually starts interacting correctly with the iommu map
notifier, we won't know if there might be some lingering bugs. Thanks,
Alex
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Tian, Kevin @ 2016-07-28 23:47 UTC
To: Alex Williamson
Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai,
Lv, Zhiyuan, bjsdjshi@linux.vnet.ibm.com, Song, Jike,
Kirti Wankhede, Xiao, Guangrong, Wang, Zhenyu Z
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, July 28, 2016 11:42 PM
>
> On Thu, 28 Jul 2016 10:15:24 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
> > Hi, Alex,
> >
> > Along with the recent enhancements to the virtual IOMMU (vIOMMU) in Qemu,
> > I'm wondering whether there is any issue for mdev to cope with a vIOMMU. I
> > know that today a VFIO device only works with the PowerPC IOMMU (note that
> > someone is enabling VFIO devices with virtual VT-d, but it doesn't look
> > complete yet), but it's always good to have the architecture discussion
> > early. :-)
> >
> > The VFIO mdev framework maintains a GPA->HPA mapping, which is queried by
> > the vendor-specific mdev device model for emulation purposes. For example,
> > guest GPU PTEs may need to be translated into shadow GPU PTEs, which
> > requires a GPA->HPA conversion.
> >
> > When a virtual IOMMU is exposed to the guest, an IOVA may be used as the DMA
> > address by the guest, which means a guest PTE now contains an IOVA instead
> > of a GPA, so the device model needs to know the IOVA->HPA mapping. After
> > checking the current vIOMMU logic within Qemu, it looks like this is not a
> > problem: the vIOMMU is expected to notify VFIO of any IOVA change, and the
> > kernel VFIO driver does receive map requests for IOVA regions. Thus the
> > mapping structure that VFIO maintains is indeed the IOVA->HPA mapping
> > required by the device model.
> >
> > In this manner it looks like no further change is required to the proposed
> > mdev framework to support a vIOMMU. The only thing I'm unsure about is how
> > Qemu guarantees that IOVA and GPA are mapped exclusively. I checked that
> > vfio_listener_region_add initiates map requests for normal memory regions
> > (i.e., GPA), and that vfio_iommu_map_notify sends map requests for IOVA
> > regions notified through the IOMMU notifier. I don't think VFIO can cope
> > with both GPA and IOVA map requests simultaneously, since VFIO doesn't
> > maintain multiple address spaces for one device. It's not an mdev-specific
> > question, but I've definitely missed some key points here, since this is
> > assumed to be working for PowerPC already...
>
> I prefer not to distinguish GPA vs IOVA, the device always operates in
> the IOVA space. Without a vIOMMU, it just happens to be an identity map
> into the GPA space. Think about how this works on real hardware, when
> VT-d is not enabled, there's no translation, IOVA = GPA. The device
> interacts directly with system memory, same as the default case in
> QEMU now. When VT-d is enabled, the device is placed into an IOMMU
> domain and the IOVA space is now restricted to the translations defined
> within that domain. The same is expected to happen with QEMU, all of
> the GPA mapped IOVA space is removed via vfio_listener_region_del() and
> a new IOMMU region is added, enabling the vfio_iommu_map_notify
Ha, that is exactly the info I'm looking for. Can you point me to where the
above logic is implemented? I only saw the latter part about adding a new
IOMMU region...

And I suppose we also have logic for the reverse - when the guest disables
the IOMMU, all IOVA mappings will be deleted and then the GPA-mapped IOVA
space will be replayed?

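To make the question concrete, something like this outline is what I would
naively expect for that reverse path (purely hypothetical - none of these
helpers exist, as far as I know):

/* Hypothetical outline only; none of these helpers exist as-is. */
extern void unregister_iommu_notifier_hyp(void *vdev);
extern void unmap_all_notifier_mappings_hyp(void *vdev);
extern void replay_gpa_identity_mappings_hyp(void *vdev);

static void guest_disables_viommu(void *vdev)
{
    /* 1. Drop the notifier and unmap every IOVA entry it installed. */
    unregister_iommu_notifier_hyp(vdev);
    unmap_all_notifier_mappings_hyp(vdev);

    /*
     * 2. Replay the identity view: re-add the system RAM sections the way
     *    vfio_listener_region_add does, so that IOVA == GPA again for the
     *    device.
     */
    replay_gpa_identity_mappings_hyp(vdev);
}
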
> callbacks. The fact that we can't have both system memory and an IOMMU
> active via vfio_listener_region_add() is a property of the VT-d
> emulation. Anyway, I think it's handled correctly, but until VT-d
> emulation actually starts interacting correctly with the iommu map
> notifier, we won't know if there might be some lingering bugs. Thanks,
>
Thanks
Kevin
* Re: [Qemu-devel] VFIO mdev with vIOMMU
From: Alex Williamson @ 2016-07-29 15:19 UTC
To: Tian, Kevin
Cc: pbonzini@redhat.com, kraxel@redhat.com, cjia@nvidia.com,
qemu-devel@nongnu.org, kvm@vger.kernel.org, Ruan, Shuai,
Lv, Zhiyuan, bjsdjshi@linux.vnet.ibm.com, Song, Jike,
Kirti Wankhede, Xiao, Guangrong, Wang, Zhenyu Z
On Thu, 28 Jul 2016 23:47:58 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, July 28, 2016 11:42 PM
> >
> > On Thu, 28 Jul 2016 10:15:24 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > Hi, Alex,
> > >
> > > Along with the recent enhancements to the virtual IOMMU (vIOMMU) in Qemu,
> > > I'm wondering whether there is any issue for mdev to cope with a vIOMMU. I
> > > know that today a VFIO device only works with the PowerPC IOMMU (note that
> > > someone is enabling VFIO devices with virtual VT-d, but it doesn't look
> > > complete yet), but it's always good to have the architecture discussion
> > > early. :-)
> > >
> > > The VFIO mdev framework maintains a GPA->HPA mapping, which is queried by
> > > the vendor-specific mdev device model for emulation purposes. For example,
> > > guest GPU PTEs may need to be translated into shadow GPU PTEs, which
> > > requires a GPA->HPA conversion.
> > >
> > > When a virtual IOMMU is exposed to the guest, an IOVA may be used as the DMA
> > > address by the guest, which means a guest PTE now contains an IOVA instead
> > > of a GPA, so the device model needs to know the IOVA->HPA mapping. After
> > > checking the current vIOMMU logic within Qemu, it looks like this is not a
> > > problem: the vIOMMU is expected to notify VFIO of any IOVA change, and the
> > > kernel VFIO driver does receive map requests for IOVA regions. Thus the
> > > mapping structure that VFIO maintains is indeed the IOVA->HPA mapping
> > > required by the device model.
> > >
> > > In this manner it looks like no further change is required to the proposed
> > > mdev framework to support a vIOMMU. The only thing I'm unsure about is how
> > > Qemu guarantees that IOVA and GPA are mapped exclusively. I checked that
> > > vfio_listener_region_add initiates map requests for normal memory regions
> > > (i.e., GPA), and that vfio_iommu_map_notify sends map requests for IOVA
> > > regions notified through the IOMMU notifier. I don't think VFIO can cope
> > > with both GPA and IOVA map requests simultaneously, since VFIO doesn't
> > > maintain multiple address spaces for one device. It's not an mdev-specific
> > > question, but I've definitely missed some key points here, since this is
> > > assumed to be working for PowerPC already...
> >
> > I prefer not to distinguish GPA vs IOVA, the device always operates in
> > the IOVA space. Without a vIOMMU, it just happens to be an identity map
> > into the GPA space. Think about how this works on real hardware, when
> > VT-d is not enabled, there's no translation, IOVA = GPA. The device
> > interacts directly with system memory, same as the default case in
> > QEMU now. When VT-d is enabled, the device is placed into an IOMMU
> > domain and the IOVA space is now restricted to the translations defined
> > within that domain. The same is expected to happen with QEMU, all of
> > the GPA mapped IOVA space is removed via vfio_listener_region_del() and
> > a new IOMMU region is added, enabling the vfio_iommu_map_notify
>
> Ha, that is exactly the info I'm looking for. Can you point me to where the
> above logic is implemented? I only saw the latter part about adding a new
> IOMMU region...
>
> And I suppose we also have logic for the reverse - when the guest disables
> the IOMMU, all IOVA mappings will be deleted and then the GPA-mapped IOVA
> space will be replayed?
Since we've never actually seen this work with an assigned device, I may
have misspoken on how it actually works. I think the above is how it
*should* work though. As it is, vfio_initfn() calls
pci_device_iommu_address_space(); this is what looks for an iommu address
space for the device and, when none is found, falls back to
address_space_memory. So at this point we are either strictly using the
iommu notifier or strictly using address_space_memory. This seems hugely
inefficient to me, but we don't have any mechanism to change a device's
address space. IMHO it really seems like we need dynamic address space
aliasing support for an iommu like VT-d. So if VT-d emulation actually
worked with the iommu notifier today, I believe each vfio device would be
placed into a separate container and the iommu replay mechanism would
populate whatever initial translations are configured for the device,
including translations to address_space_memory if the iommu is disabled or
configured for passthrough. It would be through notifier events that this
would need to be de-populated and re-populated with discrete entries if the
device is configured into a more fine-grained domain.

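As a rough sketch of what I mean (simplified, and the attach helper is a
hypothetical stand-in, not an existing function):

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "exec/address-spaces.h"

/* Hypothetical stand-in for attaching the device's container to an AS. */
extern int vfio_attach_to_address_space_sketch(void *vbasedev,
                                               AddressSpace *as);

static int vfio_device_init_sketch(PCIDevice *pdev, void *vbasedev)
{
    /*
     * pci_device_iommu_address_space() returns the iommu address space if
     * the platform/vIOMMU provides one for this device, otherwise
     * address_space_memory. The choice happens once, at init time; there
     * is currently no mechanism to switch a device's address space later.
     */
    AddressSpace *as = pci_device_iommu_address_space(pdev);

    /*
     * With an iommu AS, each device would end up in its own container and
     * an iommu replay would have to populate whatever translations are
     * already configured - including an identity view of guest memory when
     * the iommu is disabled or in passthrough mode. Notifier events would
     * then de-populate/re-populate entries as the domain changes.
     */
    return vfio_attach_to_address_space_sketch(vbasedev, as);
}
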
I have my doubts whether QEMU's iommu support is really designed to handle
an iommu like VT-d; it seems better equipped for simple window-based
iommus. Thanks,
Alex