Linux PCI subsystem development
* How are iommu-mappings set up in guest-OS for dma_alloc_coherent
@ 2025-09-18 17:54 Ajay Garg
  2025-09-19 16:41 ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Ajay Garg @ 2025-09-18 17:54 UTC (permalink / raw)
  To: iommu, linux-pci, Linux Kernel Mailing List

Hi everyone.

Let's say we have the following setup:

i)
x86_64 host-os, booted with the iommu enabled in pass-through mode.

ii)
x86_64 guest-os, booted using vfio+qemu+kvm, with a pci-device attached to it.

iii)
A guest-os-device-driver calls "dma_alloc_coherent", after which the
returned dma-address / iova is programmed into one of the pci-device's
mmio-registers.


In the above case, how are the IOMMU mappings set up during the
guest-os-device-driver's "dma_alloc_coherent" call?
Does :

a)
The VMM / KVM intercept the "dma_alloc_coherent" call, and use the
host-iommu to set up things?

OR

b)
There is no interception from VMM / KVM, but rather the guest-OS
itself has a view of the IOMMU (through the regular ACPI tables
populated during guest boot up)?

OR

c)
Anything else under the hood?


I'd be grateful if someone could clear the haze.


Thanks and Regards,
Ajay


* Re: How are iommu-mappings set up in guest-OS for dma_alloc_coherent
  2025-09-18 17:54 How are iommu-mappings set up in guest-OS for dma_alloc_coherent Ajay Garg
@ 2025-09-19 16:41 ` Alex Williamson
  2025-09-20  3:04   ` Ajay Garg
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2025-09-19 16:41 UTC (permalink / raw)
  To: Ajay Garg; +Cc: iommu, linux-pci, Linux Kernel Mailing List

On Thu, 18 Sep 2025 23:24:19 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> Hi everyone.
> 
> Let's say we have the following setup:
> 
> i)
> x86_64 host-os, booted with the iommu enabled in pass-through mode.
> 
> ii)
> x86_64 guest-os, booted using vfio+qemu+kvm, with a pci-device attached to it.
> 
> iii)
> A guest-os-device-driver calls "dma_alloc_coherent", after which the
> returned dma-address / iova is programmed into one of the pci-device's
> mmio-registers.
> 
> 
> In the above case, how are the IOMMU mappings set up during the
> guest-os-device-driver's "dma_alloc_coherent" call?
> Does :
> 
> a)
> The VMM / KVM intercept the "dma_alloc_coherent" call, and use the
> host-iommu to set up things?
> 
> OR
> 
> b)
> There is no interception from VMM / KVM, but rather the guest-OS
> itself has a view of the IOMMU (through the regular ACPI tables
> populated during guest boot up)?
> 
> OR
> 
> c)
> Anything else under the hood?
> 
> 
> I'd be grateful if someone could clear the haze.


Depends on details not revealed about the VM configuration.
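For concreteness, the two configurations discussed below differ roughly as in this sketch (the host PCI address 0000:01:00.0 and the memory size are placeholders):

```shell
# Case 1: assigned device, no vIOMMU -- all guest RAM is pinned and
# mapped through the physical IOMMU when the VM starts.
qemu-system-x86_64 -machine q35 -enable-kvm -m 4G \
    -device vfio-pci,host=0000:01:00.0

# Case 2: assigned device behind an emulated vIOMMU visible to the
# guest (caching-mode so QEMU observes guest mapping updates).
qemu-system-x86_64 -machine q35,kernel-irqchip=split -enable-kvm -m 4G \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=0000:01:00.0
```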

If the VM is configured without a vIOMMU or the vIOMMU is inactive in
the guest, all of the guest physical memory is pinned and mapped
through the physical IOMMU when the guest is started.  Nothing happens
regarding the IOMMU when a coherent mapping is created in the guest;
it's already set up.

If there is an active vIOMMU in the VM, then the guest's act of
programming the vIOMMU results in mappings propagated through to the
host IOMMU.
This is a result of the IOMMU emulation in the VM.  Thanks,

Alex



* Re: How are iommu-mappings set up in guest-OS for dma_alloc_coherent
  2025-09-19 16:41 ` Alex Williamson
@ 2025-09-20  3:04   ` Ajay Garg
  2025-09-20 14:34     ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Ajay Garg @ 2025-09-20  3:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-pci, Linux Kernel Mailing List

> If the VM is configured without a vIOMMU or the vIOMMU is inactive in
> the guest, all of the guest physical memory is pinned and mapped
> through the physical IOMMU when the guest is started.  Nothing happens
> regarding the IOMMU when a coherent mapping is created in the guest;
> it's already set up.
>

Thanks Alex.

Another question pops up for this scenario.

Let's take a host-OS with two guest-OSes spawned (we can take
everything to be x86_64 for simplicity).
Guest G1 has PCI-device-1 attached to it; Guest G2 has PCI-device-2
attached to it.

a)
We do "dma_alloc_coherent" in G1, which returns GVA1 (a CPU
virtual-address) and GIOVA1 (a device-bus address).
Since no vIOMMU is exposed to the guest, GIOVA1 can (and typically
will) be equal to GPA1 (the guest physical-address).

This GIOVA1 (== GPA1) is then programmed into one of PCI-device-1's
mmio-registers to set up DMA.

b)
Similarly, we do "dma_alloc_coherent" in G2, and program GIOVA2
(== GPA2) into one of PCI-device-2's mmio-registers to set up DMA.

c)
At this point, the physical/host IOMMU will contain mappings for :

    GIOVA1 => HPA1
    GIOVA2 => HPA2

We assume "sufficiently" multi-core systems, so that both guests can
run concurrently, and in general HPA1 != HPA2.
However, since both guests run independently, we could very well land
in a situation where

    GIOVA1 == GIOVA2 (== GPA1 == GPA2).

How do we handle such conflicts?
Does x86-IOMMU-PASID come to the rescue here (implicitly meaning that
PCI-device-1 and PCI-device-2 *must* be PASID-capable)?

Once again, many thanks for your time and help Alex !


Thanks and Regards,
Ajay


* Re: How are iommu-mappings set up in guest-OS for dma_alloc_coherent
  2025-09-20  3:04   ` Ajay Garg
@ 2025-09-20 14:34     ` Alex Williamson
       [not found]       ` <CAHP4M8WOkDvEf6DYe6w+V9PVHkqcu2-8YrKa7jwLBYRAqLVS+g@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2025-09-20 14:34 UTC (permalink / raw)
  To: Ajay Garg; +Cc: iommu, linux-pci, Linux Kernel Mailing List

On Sat, 20 Sep 2025 08:34:41 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> > If the VM is configured without a vIOMMU or the vIOMMU is inactive in
> > the guest, all of the guest physical memory is pinned and mapped
> > through the physical IOMMU when the guest is started.  Nothing happens
> regarding the IOMMU when a coherent mapping is created in the guest;
> it's already set up.
> >  
> 
> Thanks Alex.
> 
> Another question pops up for this scenario.
> 
> Let's take a host-OS with two guest-OSes spawned (we can take
> everything to be x86_64 for simplicity).
> Guest G1 has PCI-device-1 attached to it; Guest G2 has PCI-device-2
> attached to it.
> 
> a)
> We do "dma_alloc_coherent" in G1, which returns GVA1 (a CPU
> virtual-address) and GIOVA1 (a device-bus address).
> Since no vIOMMU is exposed to the guest, GIOVA1 can (and typically
> will) be equal to GPA1 (the guest physical-address).
> 
> This GIOVA1 (== GPA1) is then programmed into one of PCI-device-1's
> mmio-registers to set up DMA.
> 
> b)
> Similarly, we do "dma_alloc_coherent" in G2, and program GIOVA2
> (== GPA2) into one of PCI-device-2's mmio-registers to set up DMA.
> 
> c)
> At this point, the physical/host IOMMU will contain mappings for :
> 
>     GIOVA1 => HPA1
>     GIOVA2 => HPA2
> 
> We assume "sufficiently" multi-core systems, so that both guests can
> run concurrently, and in general HPA1 != HPA2.
> However, since both guests run independently, we could very well land
> in a situation where
> 
>     GIOVA1 == GIOVA2 (== GPA1 == GPA2).
> 
> How do we handle such conflicts?
> Does x86-IOMMU-PASID come to the rescue here (implicitly meaning that
> PCI-device-1 and PCI-device-2 *must* be PASID-capable)?

No, each device has a unique Requester ID (RID).  The IOMMU page tables
are first indexed by the RID; therefore two devices making use of the
same IOVA will use separate page tables resulting in unique HPAs.
PASID provides another level of page table lookup that is not necessary
in the scenario described.  Thanks,

Alex



* Re: How are iommu-mappings set up in guest-OS for dma_alloc_coherent
       [not found]       ` <CAHP4M8WOkDvEf6DYe6w+V9PVHkqcu2-8YrKa7jwLBYRAqLVS+g@mail.gmail.com>
@ 2025-09-22 14:32         ` Alex Williamson
  2025-09-22 16:21           ` Ajay Garg
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2025-09-22 14:32 UTC (permalink / raw)
  To: Ajay Garg; +Cc: iommu, linux-pci, Linux Kernel Mailing List

On Sat, 20 Sep 2025 20:27:54 +0530
Ajay Garg <ajaygargnsit@gmail.com> wrote:

> Thanks Alex for the clarification; really grateful.
> 
> When I had last tried attaching PCI-device to a VM (spawned via VFIO), the
> PCI-device vanished from host-device :)
> 
> So, as of today, if the PCIe device is PASID-capable, can it be :
> 
> a)
> Shared/visible both across a host-os and guest-os?
> 
> b)
> Shared/visible across more than one guest-os?

VFIO doesn't make PCI devices disappear from the host.  Maybe you're
referring to unbinding the host function driver, which might make your
NIC/HBA/GPU device disappear from the host as the PCI device is bound
to vfio-pci instead.

There are ways to multiplex devices between host and guest, SR-IOV is
currently the most common way to do this.  Here you'd have a physical
function (PF) with a host function driver, which can create multiple
virtual functions (VFs), each of which has a unique requester ID and
therefore a unique set of page tables allowing them to operate in
independent IOVA spaces for VMs.  You can imagine here that your PF
remains bound to the host function driver and continues to provide host
services, while the VFs can be assigned to VMs.

PASID is another way to do this and is often described in an SIOV
(Scalable IOV) framework, where we rely more on software to expose an
assignable entity which makes use of the combination of the physical
requester ID along with PASID to create a unique IOVA space through two
levels of IOMMU page tables.

In either case, having an SR-IOV or PASID capability on the device
doesn't automatically enable device multiplexing, there's software
required to enable these features, more so in the direction of SIOV
support as the scalability trade-off is to push more of the basic
device emulation into software.  Thanks,

Alex



* Re: How are iommu-mappings set up in guest-OS for dma_alloc_coherent
  2025-09-22 14:32         ` Alex Williamson
@ 2025-09-22 16:21           ` Ajay Garg
  0 siblings, 0 replies; 6+ messages in thread
From: Ajay Garg @ 2025-09-22 16:21 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, linux-pci, Linux Kernel Mailing List

>
> VFIO doesn't make PCI devices disappear from the host.  Maybe you're
> referring to unbinding the host function driver, which might make your
> NIC/HBA/GPU device disappear from the host as the PCI device is bound
> to vfio-pci instead.
>

Yep Alex, that's what I meant.
I am sorry for (unintentionally) causing ambiguity.

> There are ways to multiplex devices between host and guest, SR-IOV is
> currently the most common way to do this.  Here you'd have a physical
> function (PF) with a host function driver, which can create multiple
> virtual functions (VFs), each of which has a unique requester ID and
> therefore a unique set of page tables allowing them to operate in
> independent IOVA spaces for VMs.  You can imagine here that your PF
> remains bound to the host function driver and continues to provide host
> services, while the VFs can be assigned to VMs.

Perfect, thanks Alex ..

>
> PASID is another way to do this and is often described in an SIOV
> (Scalable IOV) framework, where we rely more on software to expose an
> assignable entity which makes use of the combination of the physical
> requester ID along with PASID to create a unique IOVA space through two
> levels of IOMMU page tables.

Perfect again, many thanks again Alex ..

>
> In either case, having an SR-IOV or PASID capability on the device
> doesn't automatically enable device multiplexing, there's software
> required to enable these features, more so in the direction of SIOV
> support as the scalability trade-off is to push more of the basic
> device emulation into software.

Thanks a ton Alex for all the help !
Thank you for always being there whenever we get stuck .. !!


Thanks and Regards,
Ajay

