How do i know which "iommu" i used on VM? qemu emulated or hardware DMAR?

* How do  i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
@ 2025-12-17  4:20 tugouxp
  2025-12-17 11:54 ` Alex Bennée
  0 siblings, 1 reply; 6+ messages in thread
From: tugouxp @ 2025-12-17  4:20 UTC (permalink / raw)
  To: qemu-devel

Hi folks,

I have a few questions about QEMU device passthrough that I would like some help with.

Both my host and guest OS are Ubuntu 20.04.6. I passed a dedicated NVIDIA MX250 GPU through from the host to the guest. On the host, the GPU is bound to the vfio-pci driver; the guest uses the default Nouveau driver shipped with Ubuntu 20.04.6. I also enabled the IOMMU in the guest and checked the IOMMU group types in sysfs ("/sys/kernel/iommu_groups/xxxx/type"). The passed-through MX250 operates in "DMA" translation mode, which means translation is really in effect. Thanks to your excellent work, the setup went smoothly and everything runs well. However, I have a couple of questions:

Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it share the same IOMMU as the HOST OS?

Given that both the GUEST OS and HOST OS have IOMMU enabled, when the MX250 performs DMA, it should go through two-stage page table translation—first in the GUEST OS and then in the HOST OS—with VFIO-PCI assisting in this process, correct? If so, are both stages handled by hardware? I understand that the second stage is definitely hardware-assisted, but I’m uncertain about the first stage: whether the translation from IOVA to GPA (IPA) within the GUEST OS is also hardware-assisted.

Those are my two questions. Thank you very much for your help!

Some information about my environment:

QEMU command used to launch the VM:
qemu-system-x86_64 -cpu qemu64,+mtrr,+ssse3,sse4.1,+sse4.2 -m 4096 -smp 4 --enable-kvm -drive file=./test-vm-1.qcow2,if=virtio -machine q35,kernel-irqchip=split -device intel-iommu,intremap=on,caching-mode=on -device vfio-pci,host=02:00.0

vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ ls
0000:00:00.0  0000:00:01.0  0000:00:02.0  0000:00:03.0  0000:00:04.0  0000:00:1f.0  0000:00:1f.2  0000:00:1f.3
vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
00:03.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/
0/ 1/ 2/ 3/ 4/ 5/
vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices$ cat /sys/kernel/iommu_groups/*/type
DMA
DMA
DMA
DMA
DMA
DMA
vms@vms-Standard-PC-i440FX-PIIX-1996:/sys/class/iommu/dmar0/devices

BRs
zlc


* Re: How do  i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
  2025-12-17  4:20 How do i know which "iommu" i used on VM? qemu emulated or hardware DMAR? tugouxp
@ 2025-12-17 11:54 ` Alex Bennée
  2025-12-17 14:12   ` CLEMENT MATHIEU--DRIF
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Bennée @ 2025-12-17 11:54 UTC (permalink / raw)
  To: tugouxp
  Cc: qemu-devel, Michael S. Tsirkin, Jason Wang, Yi Liu,
	Clément Mathieu--Drif

tugouxp <13824125580@163.com> writes:

(I'll preface this with my not-an-expert hat on, adding some iommu
maintainers to the CC who might correct my ramblings)

> Hi folks:
>   Hello everyone, I have a few questions regarding the QEMU device passthrough feature that I’d like to ask for help with.
>
> Both my HOST OS and GUEST OS are running Ubuntu 20.4.6. I passed through a dedicated NVIDIA MX250 GPU from the HOST to the
> GUEST OS. On the HOST, I installed the VFIO-PCI driver for passthrough, while the GUEST OS uses the default Nouveau driver from
> Ubuntu 20.4.6. I also enabled IOMMU in the GUEST OS and checked the IOMMU group layout from
> sysfs"/sys/kernel/iommu_group/xxxx/type". The passthrough MX250 operates in “DMA” translation mode,which means the translation
> really work. Thanks to your excellent work, the setup process went smoothly and everything runs well. However, I have a couple of
> questions:
>
> 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
> share the same IOMMU as the HOST OS?

Generally the guest IOMMU is emulated. You do not want the guest to be
able to directly program the host HW because that would open up security
issues. However, for simplicity the IOMMU presented to the guest is
usually the same as the host hardware - whatever the architecturally
mandated IOMMU is.

There are fully virtual IOMMUs (e.g. virtio-iommu) which completely
abstract the host hardware away.

In both these cases it is QEMU's responsibility to take the guest
programming and apply those changes to the host hardware to ensure the
mappings work properly.

There are also host IOMMUs which virtualise some of their interfaces
so the guest can directly program them (within certain bounds) for its
mappings. I have no idea if the intel-iommu is one of these.
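
As a rough aid for the original question, the sysfs locations quoted in this thread can be read programmatically from inside the guest. The following is a minimal Python sketch, not an official tool: the function name `iommu_summary` and its `sysfs_root` parameter are made up for illustration, and it only reports what the guest kernel sees. It cannot by itself tell an emulated IOMMU from a hardware one; with the command line shown in this thread, the `dmar0` unit is the one created by `-device intel-iommu`.

```python
from pathlib import Path

def iommu_summary(sysfs_root="/sys"):
    """Return (units, groups) as seen by the running kernel.

    units:  {iommu unit name: [PCI addresses attached to it]}
    groups: {iommu group number: domain type string, e.g. "DMA"}
    `sysfs_root` is a hypothetical parameter so the walk can be
    pointed at a test directory instead of the real /sys.
    """
    root = Path(sysfs_root)

    units = {}
    class_dir = root / "class" / "iommu"
    if class_dir.is_dir():
        for unit in sorted(class_dir.iterdir()):
            dev_dir = unit / "devices"
            if dev_dir.is_dir():
                units[unit.name] = sorted(p.name for p in dev_dir.iterdir())
            else:
                units[unit.name] = []

    groups = {}
    groups_dir = root / "kernel" / "iommu_groups"
    if groups_dir.is_dir():
        for group in sorted(groups_dir.iterdir(), key=lambda p: p.name):
            type_file = group / "type"
            if type_file.is_file():
                groups[group.name] = type_file.read_text().strip()

    return units, groups
```

Calling `iommu_summary()` with no argument inside the guest walks the real `/sys`; on a system without an IOMMU it simply returns two empty dictionaries.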

>
> 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the MX250 performs DMA, it should go through two-stage
>  page table translation—first in the GUEST OS and then in the HOST OS—with VFIO-PCI assisting in this process, correct? If so, are
>  both stages handled by hardware? I understand that the second stage is definitely hardware-assisted, but I’m uncertain about the
>  first stage: whether the translation from IOVA to GPA (IPA) within
> the GUEST OS is also hardware-assisted.

I think this will depend on the implementation details of the particular
IOMMU.

The guest will create/manage tables to map IOVA -> GPA.

There are two options for QEMU now.

The first is to monitor the guest page tables for changes and then create a
shadow page table that mirrors the guest's but maps the IOVA directly to
the final host physical address (HPA). This would be a single-stage
translation. I think this is how intel-iommu,caching-mode=on works.

The second option is for IOMMUs that support a full two-stage HW
translation (much in the same way as hypervisors have a second stage).
The initial lookup would be via the guest's IOMMU table (IOVA->GPA)
before a second stage controlled by the host would map to the final
address (GPA->HPA). I think two-stage IOMMUs are a requirement if you
are handling nested VMs.
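
To make the two options concrete, here is a minimal Python sketch that models page tables as dictionaries keyed by page number. It is a toy model and not QEMU code: the table contents and the names `shadow_table` and `nested_translate` are made up for illustration.

```python
# Toy model: each page table is a dict mapping a page number to a page number.
# All addresses and function names are hypothetical, for illustration only.

def shadow_table(guest_table, host_table):
    """Option 1: fold IOVA->GPA and GPA->HPA into a single IOVA->HPA table."""
    return {
        iova: host_table[gpa]
        for iova, gpa in guest_table.items()
        if gpa in host_table  # skip guest entries the host has not backed
    }

def nested_translate(iova, guest_table, host_table):
    """Option 2: walk stage 1 (IOVA->GPA), then stage 2 (GPA->HPA)."""
    gpa = guest_table[iova]
    return host_table[gpa]

guest_table = {0x10: 0x200, 0x11: 0x201}     # written by the guest
host_table = {0x200: 0x9000, 0x201: 0x9ABC}  # written by the host

shadow = shadow_table(guest_table, host_table)
```

Both paths produce the same final address for every mapped IOVA. The difference is who does the work: with a shadow table the VMM has to rebuild entries whenever the guest changes its table, whereas with two-stage hardware both lookups happen in the IOMMU itself.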


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


* Re: How do  i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
  2025-12-17 11:54 ` Alex Bennée
@ 2025-12-17 14:12   ` CLEMENT MATHIEU--DRIF
  2025-12-18  1:33     ` tugouxp
  0 siblings, 1 reply; 6+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2025-12-17 14:12 UTC (permalink / raw)
  To: Alex Bennée, tugouxp
  Cc: qemu-devel@nongnu.org, Michael S.Tsirkin, Jason Wang, Yi Liu


On Wed, 2025-12-17 at 11:54 +0000, Alex Bennée wrote:
> tugouxp <13824125580@163.com> writes:
> 
> (I'll preface this with my not-an-expert hat on, adding some iommu  
> maintainers to the CC who might correct my ramblings)
> 
> 
> > Hi folks:
> >   Hello everyone, I have a few questions regarding the QEMU device passthrough feature that I’d like to ask for help with.
> > 
> > Both my HOST OS and GUEST OS are running Ubuntu 20.4.6. I passed through a dedicated NVIDIA MX250 GPU from the HOST to the
> > GUEST OS. On the HOST, I installed the VFIO-PCI driver for passthrough, while the GUEST OS uses the default Nouveau driver from
> > Ubuntu 20.4.6. I also enabled IOMMU in the GUEST OS and checked the IOMMU group layout from
> > sysfs"/sys/kernel/iommu_group/xxxx/type". The passthrough MX250 operates in “DMA” translation mode,which means the translation
> > really work. Thanks to your excellent work, the setup process went smoothly and everything runs well. However, I have a couple of
> > questions:
> > 
> > 1 Is the IOMMU (DMAR) in the GUEST OS emulated by QEMU, or does it
> > share the same IOMMU as the HOST OS?
> 
> 
> Generally the guest IOMMU is emulated. You do not want the guest to be  
> able to directly program the host HW because that would open up security  
> issues. However for simplicity the IOMMU presented to the guest it  
> usually the same as the host hardware - whatever the architecturally  
> mandated IOMMU is.
> 
> There are fully virtual IOMMU's (e.g. virtio-iommu) which completely  
> abstract the host hardware away.
> 
> In both these cases it is QEMUs responsibility to take the guest  
> programming and apply those changes to the host hardware to ensure the  
> mappings work properly.
> 
> There are also host IOMMU's which virtualise some of the interfaces to  
> so the guest can directly program them (within certain bounds) for their  
> mappings. I have no idea if the intel-iommu is one of these.
> 
> 
> 
> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the MX250 performs DMA, it should go through two-stage
> >  page table translation—first in the GUEST OS and then in the HOST OS—with VFIO-PCI assisting in this process, correct? If so, are
> >  both stages handled by hardware? I understand that the second stage is definitely hardware-assisted, but I’m uncertain about the
> >  first stage: whether the translation from IOVA to GPA (IPA) within
> > the GUEST OS is also hardware-assisted.
> 
> 
> I think this will depend on the implementation details of the particular  
> IOMMU.
> 
> The guest will create/manage tables to map IOVA -> GPA.
> 
> There are two options for QEMU now.
> 
> The first is monitor the guest page tables for changes and then create a  
> shadow page table that mirrors the guest but maps the IOVA directly to  
> the final host physical address (HPA). This would be a single stage  
> translation. I think this is how intel-iommu,caching-mode=on works.

Indeed, caching mode allows us to trap and hook where needed to build the shadow page table.

A new mode based on nested translation is under development.  
I recommend reading this if you want more details: https://lists.nongnu.org/archive/html/qemu-devel/2025-12/msg01796.html


* Re:Re: How do  i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
  2025-12-17 14:12   ` CLEMENT MATHIEU--DRIF
@ 2025-12-18  1:33     ` tugouxp
  2025-12-18  6:47       ` Yi Liu
  0 siblings, 1 reply; 6+ messages in thread
From: tugouxp @ 2025-12-18  1:33 UTC (permalink / raw)
  To: CLEMENT MATHIEU--DRIF
  Cc: Alex Bennée, qemu-devel@nongnu.org, Michael S.Tsirkin,
	Jason Wang, Yi Liu

Thanks for your kind help, it is much clearer now!

So it seems that the QEMU options -device intel-iommu and virtio-iommu that you mentioned both implement purely software-emulated IOMMUs, is that correct?

I have another question. Both the Intel IOMMU and the ARM SMMU support two-stage translation, where the second stage is managed by VFIO and handles the translation from IPA to HPA. Who, then, manages the first stage? I find it hard to believe that the first stage is managed directly by the VM OS because, as you mentioned earlier, simultaneous access to the IOMMU hardware by both the VM and the host would pose security issues. It therefore seems likely that the first stage is also managed by QEMU. However, in both QEMU's code and VFIO's code I only see calls that create second-stage IOMMU domains, and I have not been able to trace any call that creates a first-stage IOMMU domain. This is where my understanding gets stuck. Am I misunderstanding something here?

Thank you for your guidance.

BRs
zlcao



* Re: How do i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
  2025-12-18  1:33     ` tugouxp
@ 2025-12-18  6:47       ` Yi Liu
  2025-12-18  7:23         ` tugouxp
  0 siblings, 1 reply; 6+ messages in thread
From: Yi Liu @ 2025-12-18  6:47 UTC (permalink / raw)
  To: tugouxp, CLEMENT MATHIEU--DRIF
  Cc: Alex Bennée, qemu-devel@nongnu.org, Michael S.Tsirkin,
	Jason Wang

On 2025/12/18 09:33, tugouxp wrote:
> 
> Thanks for your kindly help, it seems much clear now!
> 
>        So it seems that the QEMU parameters |-device intel-iommu| and 
> |virtio-iommu| you said both implement purely software-emulated IOMMUs, 
> is that correct? I have another question: Both Intel IOMMU and ARM SMMU 
> support two-stage translation, where the second stage is managed by VFIO 
> to handle the translation from IPA to HPA. Then, who manages the first 
> stage?

In nested translation mode, the guest manages the first stage.

> I find it hard to believe that the first stage is directly 
> managed by the VM OS because, as you mentioned earlier, simultaneous 
> access to the IOMMU hardware by both the VM and the host would pose 
> security issues.

In nested translation, any output of the first-stage translation is
subject to the second stage, and the second stage is under the VMM's
control. So the guest cannot harm the system even though it manages the
first stage.
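
The containment argument can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the behaviour of any real IOMMU driver: page tables are plain dictionaries, and `TranslationFault` and `nested_dma` are hypothetical names. The guest owns `stage1` and may write anything into it; the VMM owns `stage2`.

```python
class TranslationFault(Exception):
    """Raised when a lookup has no mapping, i.e. the DMA is blocked."""

def nested_dma(iova, stage1, stage2):
    """Translate a device DMA address through both stages in order."""
    if iova not in stage1:
        raise TranslationFault(f"stage 1: no mapping for IOVA {iova:#x}")
    gpa = stage1[iova]
    # Every stage-1 output is checked against the VMM-owned stage 2.
    if gpa not in stage2:
        raise TranslationFault(f"stage 2: GPA {gpa:#x} is not mapped for this guest")
    return stage2[gpa]

stage2 = {0x200: 0x9000}              # VMM: only GPA page 0x200 is backed
stage1 = {0x10: 0x200, 0x11: 0x7777}  # guest: second entry points outside its memory
```

Here `nested_dma(0x10, stage1, stage2)` succeeds, while `nested_dma(0x11, stage1, stage2)` raises `TranslationFault`, because the guest-chosen GPA has no second-stage mapping. Whatever the guest writes into its own table, the device can only reach memory that the VMM has mapped in the second stage.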

> Therefore, it is highly likely that the first stage is 
> also managed by QEMU. However, in both QEMU's code and VFIO's code, I 
> only see calls for creating second-stage IOMMU domains, and I haven’t 
> traced any calls related to creating a first-stage IOMMU domain. This is 
> where my understanding gets stuck. Am I misunderstanding something here?

Nested translation mode is work in progress (WIP). You can get the full
picture from the links below.

[1] https://lore.kernel.org/qemu-devel/20251215065046.86991-1-zhenzhong.duan@intel.com/
[2] https://lore.kernel.org/qemu-devel/20251120132213.56581-1-skolothumtho@nvidia.com/

>>> > 2 Given that both the GUEST OS and HOST OS have IOMMU enabled, when the
>>> > MX250 performs DMA, it should go through two-stage page table
>>> > translation, first in the GUEST OS and then in the HOST OS, with VFIO-PCI
>>> > assisting in this process, correct? If so, are both stages handled by
>>> > hardware? I understand that the second stage is definitely
>>> > hardware-assisted, but I’m uncertain about the first stage: whether the
>>> > translation from IOVA to GPA (IPA) within the GUEST OS is also
>>> > hardware-assisted.

Alex has provided a comprehensive response to this question. I'd like to
emphasize one key point in case there are any remaining questions: For
passthrough devices, DMA address translation is invariably handled by
the hardware IOMMU. The VMM is responsible for configuring the
appropriate translation type and establishing the correct page table
mappings.

Regards,
Yi Liu


* Re:Re: How do i know which "iommu" i used on VM? qemu emulated or hardware DMAR?
  2025-12-18  6:47       ` Yi Liu
@ 2025-12-18  7:23         ` tugouxp
  0 siblings, 0 replies; 6+ messages in thread
From: tugouxp @ 2025-12-18  7:23 UTC (permalink / raw)
  To: Yi Liu
  Cc: CLEMENT MATHIEU--DRIF, Alex Bennée, qemu-devel@nongnu.org,
	Michael S.Tsirkin, Jason Wang

Hi Yi,

Thank you very much. After our discussion I think I have a general understanding now. Please help confirm whether it is correct:

In my scenario, I am passing a GPU through to the VM, and the VM OS does not have the IOMMU enabled. As I understand it, current mainline QEMU and the Linux kernel can already do this. However, if I want the passed-through GPU to use both Stage 1 and Stage 2 translation, I must use a special development branch of QEMU, is that correct?

By the way, I mentioned earlier that I was using QEMU 4.2.1. After double-checking, I found that the passed-through GPU had not successfully loaded its own driver; instead, it was using the emulated BOCHS DRM driver. I am not sure whether this is related to the lack of nested IOMMU support, but it seems quite likely.

Thank you very much once again.


