[PATCH 0/2] amd_iommu: Fixes

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] amd_iommu: Fixes
@ 2025-04-10  6:44 Sairaj Kodilkar
  2025-04-10  6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
  2025-04-10  6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
  0 siblings, 2 replies; 10+ messages in thread
From: Sairaj Kodilkar @ 2025-04-10  6:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: suravee.suthikulpanit, alejandro.j.jimenez, joao.m.martins,
	philmd, sarunkod, vasant.hegde

Fix following two issues in the amd viommu
1. The guest fails to setup the device when pt_supported=on, because
   amd iommu enables the no DMA memory region even when device is using DMA
   remapping mode.
2. The guest fails to boot with xtsup=on and <= 255 vCPUs, because amd_iommu
   does not enable x2apic mode.

base commit 56c6e249b6988c1b6edc2dd34ebb0f1e570a1365 (v10.0.0-rc3)

Sairaj Kodilkar (1):
  hw/i386/amd_iommu: Fix device setup failure when PT is on.

Vasant Hegde (1):
  hw/i386/amd_iommu: Fix xtsup when vcpus < 255

 hw/i386/amd_iommu.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-10  6:44 [PATCH 0/2] amd_iommu: Fixes Sairaj Kodilkar
@ 2025-04-10  6:44 ` Sairaj Kodilkar
  2025-04-10  8:01   ` Vasant Hegde
  2025-04-14 20:26   ` Alejandro Jimenez
  2025-04-10  6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
  1 sibling, 2 replies; 10+ messages in thread
From: Sairaj Kodilkar @ 2025-04-10  6:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: suravee.suthikulpanit, alejandro.j.jimenez, joao.m.martins,
	philmd, sarunkod, vasant.hegde

Current amd_iommu enables the iommu_nodma address space when pt_supported
flag is on. This causes device to bypass the IOMMU and use untranslated
address to perform DMA when guest kernel uses DMA mode, resulting in
failure to setup the devices in the guest.

Fix the issue by removing pt_supported check and disabling nodma memory
region. Adding pt_supported requires additional changes and we will look
into it later.

Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
---
 hw/i386/amd_iommu.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 5f9b95279997..df8ba5d39ada 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
     AMDVIState *s = opaque;
     AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
     int bus_num = pci_bus_num(bus);
-    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
     iommu_as = s->address_spaces[bus_num];
 
@@ -1486,15 +1485,8 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
                                             AMDVI_INT_ADDR_FIRST,
                                             &amdvi_dev_as->iommu_ir, 1);
 
-        if (!x86_iommu->pt_supported) {
-            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
-            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
-                                      true);
-        } else {
-            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
-                                      false);
-            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
-        }
+        memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
+        memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), true);
     }
     return &iommu_as[devfn]->as;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255
  2025-04-10  6:44 [PATCH 0/2] amd_iommu: Fixes Sairaj Kodilkar
  2025-04-10  6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
@ 2025-04-10  6:44 ` Sairaj Kodilkar
  2025-04-10  9:16   ` Joao Martins
  2025-04-11  0:56   ` Alejandro Jimenez
  1 sibling, 2 replies; 10+ messages in thread
From: Sairaj Kodilkar @ 2025-04-10  6:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: suravee.suthikulpanit, alejandro.j.jimenez, joao.m.martins,
	philmd, sarunkod, vasant.hegde

From: Vasant Hegde <vasant.hegde@amd.com>

If vCPUs > 255 then x86 common code (x86_cpus_init()) call kvm_enable_x2apic().
But if vCPUs <= 255 then it won't call kvm_enable_x2apic().

Booting guest in x2apic mode, amd-iommu,xtsup=on and <= 255 vCPUs is
broken as it fails to call kvm_enable_x2apic().

Fix this by adding back kvm_enable_x2apic() call when xtsup=on.

Fixes: 8c6619f3e692 ("hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature")
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index df8ba5d39ada..af85706b8a0d 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1649,6 +1649,14 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
         exit(EXIT_FAILURE);
     }
 
+    if (s->xtsup) {
+        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
+            error_report("AMD IOMMU xtsup=on requires x2APIC support on "
+                          "the KVM side");
+            exit(EXIT_FAILURE);
+        }
+    }
+
     pci_setup_iommu(bus, &amdvi_iommu_ops, s);
     amdvi_init(s);
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-10  6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
@ 2025-04-10  8:01   ` Vasant Hegde
  2025-04-14 20:26   ` Alejandro Jimenez
  1 sibling, 0 replies; 10+ messages in thread
From: Vasant Hegde @ 2025-04-10  8:01 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: suravee.suthikulpanit, alejandro.j.jimenez, joao.m.martins,
	philmd, Michael S. Tsirkin


+ Michael,

On 4/10/2025 12:14 PM, Sairaj Kodilkar wrote:
> Current amd_iommu enables the iommu_nodma address space when pt_supported
> flag is on. This causes device to bypass the IOMMU and use untranslated
> address to perform DMA when guest kernel uses DMA mode, resulting in
> failure to setup the devices in the guest.
> 
> Fix the issue by removing pt_supported check and disabling nodma memory
> region. Adding pt_supported requires additional changes and we will look
> into it later.
> 
> Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>

-Vasant




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255
  2025-04-10  6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
@ 2025-04-10  9:16   ` Joao Martins
  2025-04-11  0:56   ` Alejandro Jimenez
  1 sibling, 0 replies; 10+ messages in thread
From: Joao Martins @ 2025-04-10  9:16 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: suravee.suthikulpanit, alejandro.j.jimenez, philmd, vasant.hegde,
	Michael S. Tsirkin, qemu-devel, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

+x86 maintainers (you forgot to CC them)

On 10/04/2025 07:44, Sairaj Kodilkar wrote:
> From: Vasant Hegde <vasant.hegde@amd.com>
> 
> If vCPUs > 255 then x86 common code (x86_cpus_init()) call kvm_enable_x2apic().
> But if vCPUs <= 255 then it won't call kvm_enable_x2apic().
> 
> Booting guest in x2apic mode, amd-iommu,xtsup=on and <= 255 vCPUs is> broken
as it fails to call kvm_enable_x2apic().
> 
> Fix this by adding back kvm_enable_x2apic() call when xtsup=on.
> 
> Fixes: 8c6619f3e692 ("hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature")
> Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

> ---
>  hw/i386/amd_iommu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index df8ba5d39ada..af85706b8a0d 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1649,6 +1649,14 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>          exit(EXIT_FAILURE);
>      }
>  
> +    if (s->xtsup) {
> +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
> +            error_report("AMD IOMMU xtsup=on requires x2APIC support on "
> +                          "the KVM side");
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +
>      pci_setup_iommu(bus, &amdvi_iommu_ops, s);
>      amdvi_init(s);
>  }



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255
  2025-04-10  6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
  2025-04-10  9:16   ` Joao Martins
@ 2025-04-11  0:56   ` Alejandro Jimenez
  1 sibling, 0 replies; 10+ messages in thread
From: Alejandro Jimenez @ 2025-04-11  0:56 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: suravee.suthikulpanit, joao.m.martins, philmd, vasant.hegde



On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
> From: Vasant Hegde <vasant.hegde@amd.com>
> 
> If vCPUs > 255 then x86 common code (x86_cpus_init()) call kvm_enable_x2apic().
> But if vCPUs <= 255 then it won't call kvm_enable_x2apic().
> 
> Booting guest in x2apic mode, amd-iommu,xtsup=on and <= 255 vCPUs is
> broken as it fails to call kvm_enable_x2apic().
> 
> Fix this by adding back kvm_enable_x2apic() call when xtsup=on.
> 
> Fixes: 8c6619f3e692 ("hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature")
> Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>

Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

> ---
>   hw/i386/amd_iommu.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index df8ba5d39ada..af85706b8a0d 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1649,6 +1649,14 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>           exit(EXIT_FAILURE);
>       }
>   
> +    if (s->xtsup) {
> +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
> +            error_report("AMD IOMMU xtsup=on requires x2APIC support on "
> +                          "the KVM side");
> +            exit(EXIT_FAILURE);
> +        }
> +    }
> +
>       pci_setup_iommu(bus, &amdvi_iommu_ops, s);
>       amdvi_init(s);
>   }



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-10  6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
  2025-04-10  8:01   ` Vasant Hegde
@ 2025-04-14 20:26   ` Alejandro Jimenez
  2025-04-15  6:38     ` Sairaj Kodilkar
  1 sibling, 1 reply; 10+ messages in thread
From: Alejandro Jimenez @ 2025-04-14 20:26 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: suravee.suthikulpanit, joao.m.martins, philmd, vasant.hegde

Hi Sairaj,

I'm conflicted by the implementation of the change, so I'd like to make 
sure I fully understand...

On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
> Current amd_iommu enables the iommu_nodma address space when pt_supported
> flag is on. 

As it should, that is the intended purpose of the iommu_nodma memory region.

This causes device to bypass the IOMMU and use untranslated
> address to perform DMA when guest kernel uses DMA mode, resulting in
> failure to setup the devices in the guest.

So the scenario you are describing above is this QEMU cmdline (using 
explicit options):

-device amd-iommu,intremap=on,xtsup=on,pt=on \
-device vfio-pci,host=0000:a1:00.1...

and guest forcing DMA remap mode e.g. 'iommu.passthrough=0'

which will cause failures from QEMU:

qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS 
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
buffer address
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS 
receive buffer address
qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
buffer address

and also errors from the VF driver on the guest. e.g.:

[   69.424937] mlx5_core 0000:00:03.0: mlx5_function_enable:1212:(pid 
2492): enable hca failed
[   69.437048] mlx5_core 0000:00:03.0: probe_one:1994:(pid 2492): 
mlx5_init_one failed with error code -110
[   69.444913] mlx5_core 0000:00:03.0: probe with driver mlx5_core 
failed with error -110

Whereas after your change the guest would fail to launch because VFIO 
will try to register a MAP notifier for the device and fail the check in
amdvi_iommu_notify_flag_changed().

If my above description is correct, then...

> 
> Fix the issue by removing pt_supported check and disabling nodma memory
> region. Adding pt_supported requires additional changes and we will look
> into it later.

I see that you are trying to essentially block a guest from enabling an 
IOMMU feature that is not currently supported by the vIOMMU. Hopefully 
that limitation will be solved soon (shameless plug):
https://lore.kernel.org/qemu-devel/20250414020253.443831-1-alejandro.j.jimenez@oracle.com/

But in the meantime, I think enabling amdvi_dev_as->iommu when DMA 
remapping capability is not available is likely to cause more confusion 
for anyone trying to understand the already convoluted details of the 
memory region setup. To a reader of the code and the commit message, it 
is confusing that to support the "NO DMA" case, the nodma memory region 
must be disabled, which is the opposite of what it is meant to do.

To explain the "trick": this change is always enabling 
amdvi_dev_as->iommu, which is explicitly created as an IOMMU memory 
region (i.e. a memory region with mr->is_iommu == true), and it is meant 
to support DMA remapping. It is relying on the "side effect" that VFIO 
will try to register notifiers for memory regions that are an "IOMMU" 
(i.e. pass the check in memory_region_is_iommu()), and later fail when 
trying to register the notifier.

If this change is merged, I think you should at least include the 
explanation above in the commit message, since it is not obvious to me 
at first reading. That being said, in my opinion, this approach adds 
potential confusion that is not worth the trouble, since most guests 
will not be using AMD vIOMMU at this point. And if they are, they would 
also have to be specifically requesting to enable DMA translation to hit 
the problem. Unfortunately, guests will always have the ability of 
specifying an invalid configuration if they try really hard (or not hard 
at all in this case).

Alejandro

>
> Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> ---
>   hw/i386/amd_iommu.c | 12 ++----------
>   1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 5f9b95279997..df8ba5d39ada 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>       AMDVIState *s = opaque;
>       AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
>       int bus_num = pci_bus_num(bus);
> -    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
>   
>       iommu_as = s->address_spaces[bus_num];
>   
> @@ -1486,15 +1485,8 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>                                               AMDVI_INT_ADDR_FIRST,
>                                               &amdvi_dev_as->iommu_ir, 1);
>   
> -        if (!x86_iommu->pt_supported) {
> -            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
> -            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
> -                                      true);
> -        } else {
> -            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu),
> -                                      false);
> -            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
> -        }
> +        memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
> +        memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as->iommu), true);
>       }
>       return &iommu_as[devfn]->as;
>   }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-14 20:26   ` Alejandro Jimenez
@ 2025-04-15  6:38     ` Sairaj Kodilkar
  2025-04-15 18:28       ` Alejandro Jimenez
  0 siblings, 1 reply; 10+ messages in thread
From: Sairaj Kodilkar @ 2025-04-15  6:38 UTC (permalink / raw)
  To: Alejandro Jimenez, qemu-devel
  Cc: suravee.suthikulpanit, joao.m.martins, philmd, vasant.hegde



Hi Alejandro,

On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:

> Hi Sairaj,
> 
> I'm conflicted by the implementation of the change, so I'd like to make 
> sure I fully understand...
> 
> On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
>> Current amd_iommu enables the iommu_nodma address space when pt_supported
>> flag is on. 
> 
> As it should, that is the intended purpose of the iommu_nodma memory 
> region.
> 
> This causes device to bypass the IOMMU and use untranslated
>> address to perform DMA when guest kernel uses DMA mode, resulting in
>> failure to setup the devices in the guest.
> 
> So the scenario you are describing above is this QEMU cmdline (using 
> explicit options):
> 
> -device amd-iommu,intremap=on,xtsup=on,pt=on \
> -device vfio-pci,host=0000:a1:00.1...
> 
> and guest forcing DMA remap mode e.g. 'iommu.passthrough=0'
> 
> which will cause failures from QEMU:
> 
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
> buffer address
> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS 
> receive buffer address
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
> buffer address
> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS 
> receive buffer address
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list 
> buffer address
> 
> and also errors from the VF driver on the guest. e.g.:
> 
> [   69.424937] mlx5_core 0000:00:03.0: mlx5_function_enable:1212:(pid 
> 2492): enable hca failed
> [   69.437048] mlx5_core 0000:00:03.0: probe_one:1994:(pid 2492): 
> mlx5_init_one failed with error code -110
> [   69.444913] mlx5_core 0000:00:03.0: probe with driver mlx5_core 
> failed with error -110
> 
> 
> Whereas after your change the guest would fail to launch because VFIO 
> will try to register a MAP notifier for the device and fail the check in
> amdvi_iommu_notify_flag_changed().
> 
> If my above description is correct, then...
> 
Yep, The above description is correct. I should have included it in the
cover letter.

>>
>> Fix the issue by removing pt_supported check and disabling nodma memory
>> region. Adding pt_supported requires additional changes and we will look
>> into it later.
> 
> I see that you are trying to essentially block a guest from enabling an 
> IOMMU feature that is not currently supported by the vIOMMU. Hopefully 
> that limitation will be solved soon (shameless plug):
> https://lore.kernel.org/qemu-devel/20250414020253.443831-1- 
> alejandro.j.jimenez@oracle.com/
> 
> But in the meantime, I think enabling amdvi_dev_as->iommu when DMA 
> remapping capability is not available is likely to cause more confusion 
> for anyone trying to understand the already convoluted details of the 
> memory region setup. 

> To a reader of the code and the commit message, it 
> is confusing that to support the "NO DMA" case, the nodma memory region 
> must be disabled, which is the opposite of what it is meant to do.
> 

I dont think that I understand above statement. What do you mean by "NO
DMA" case here ? is it iommu.passthrough=0 ?

Essentially, I am trying to support the "DMA" case that is
iommu.passthrough=0 for the emulated devices, by reverting the changes
that introduced the regression.

If I understand correct -->
The original intent of the flag (in case of Intel) is

1. To turn on the optimization which will use nodma region (dynamically
    enabling it) if guest configures the device with passthrough (pt=1)
    for given context entry.

2. The flag should not enable no_dma region if guest does not configure
    device with pt.

Intel driver does this dynamically (for every context entry update while
guest is running). But for AMD this is static and does not change with
the DTE updates, which is causing this regression.

> To explain the "trick": this change is always enabling amdvi_dev_as- 
>  >iommu, which is explicitly created as an IOMMU memory region (i.e. a 
> memory region with mr->is_iommu == true), and it is meant to support DMA 
> remapping. It is relying on the "side effect" that VFIO will try to 
> register notifiers for memory regions that are an "IOMMU" (i.e. pass the 
> check in memory_region_is_iommu()), and later fail when trying to 
> register the notifier.
> 
> If this change is merged, I think you should at least include the 
> explanation above in the commit message, since it is not obvious to me 
> at first reading. That being said, in my opinion, this approach adds 
> potential confusion that is not worth the trouble, since most guests 
> will not be using AMD vIOMMU at this point. And if they are, they would 
> also have to be specifically requesting to enable DMA translation to hit 
> the problem. Unfortunately, guests will always have the ability of 
> specifying an invalid configuration if they try really hard (or not hard 
> at all in this case).
> 

Yep, I should have explained it in details. Sorry about the confusion
will keep in mind while sending future patches.

Regards
Sairaj

> Alejandro
> 
>>
>> Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 12 ++----------
>>   1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 5f9b95279997..df8ba5d39ada 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus 
>> *bus, void *opaque, int devfn)
>>       AMDVIState *s = opaque;
>>       AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
>>       int bus_num = pci_bus_num(bus);
>> -    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
>>       iommu_as = s->address_spaces[bus_num];
>> @@ -1486,15 +1485,8 @@ static AddressSpace 
>> *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>                                               AMDVI_INT_ADDR_FIRST,
>>                                               &amdvi_dev_as->iommu_ir, 
>> 1);
>> -        if (!x86_iommu->pt_supported) {
>> -            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, 
>> false);
>> -            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as- 
>> >iommu),
>> -                                      true);
>> -        } else {
>> -            memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as- 
>> >iommu),
>> -                                      false);
>> -            memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
>> -        }
>> +        memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
>> +        memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as- 
>> >iommu), true);
>>       }
>>       return &iommu_as[devfn]->as;
>>   }
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-15  6:38     ` Sairaj Kodilkar
@ 2025-04-15 18:28       ` Alejandro Jimenez
  2025-04-17 10:21         ` Sairaj Kodilkar
  0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Jimenez @ 2025-04-15 18:28 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: suravee.suthikulpanit, joao.m.martins, philmd, vasant.hegde



On 4/15/25 2:38 AM, Sairaj Kodilkar wrote:
> 
> 
> Hi Alejandro,
> 
> On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:
> 
>> Hi Sairaj,
>>
>> I'm conflicted by the implementation of the change, so I'd like to 
>> make sure I fully understand...
>>
>> On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:

>>> Fix the issue by removing pt_supported check and disabling nodma memory
>>> region. Adding pt_supported requires additional changes and we will look
>>> into it later.
>>
>> I see that you are trying to essentially block a guest from enabling 
>> an IOMMU feature that is not currently supported by the vIOMMU. 
>> Hopefully that limitation will be solved soon (shameless plug):
>> https://lore.kernel.org/qemu-devel/20250414020253.443831-1- 
>> alejandro.j.jimenez@oracle.com/
>>
>> But in the meantime, I think enabling amdvi_dev_as->iommu when DMA 
>> remapping capability is not available is likely to cause more 
>> confusion for anyone trying to understand the already convoluted 
>> details of the memory region setup. 
> 
>> To a reader of the code and the commit message, it is confusing that 
>> to support the "NO DMA" case, the nodma memory region must be 
>> disabled, which is the opposite of what it is meant to do.
>>
> 
> I dont think that I understand above statement. What do you mean by "NO
> DMA" case here ? is it iommu.passthrough=0 ?

I meant it from the point of view of the vIOMMU configuration (since we 
don't control what the guest can request). So in terms of vIOMMU modes 
and corresponding Memory Regions that must be enabled to support such 
modes, I see it as:

Passthrough(NO DMA) --> MR: iommu_nodma: enabled && iommu: disabled

DMA remap --> MR: iommu: enabled && iommu_nodma: disabled

But I recognize that view/model is probably too rigid for now, although 
it should be the "correct state" once we support DMA remapping.

> 
> Essentially, I am trying to support the "DMA" case that is
> iommu.passthrough=0 for the emulated devices, by reverting the changes> that introduced the regression.

I understand the goal is to make emulated devs to work in more scenarios.

Because of that view that I mention above, is why I don't think of 
c1f46999ef506 ("amd_iommu: Add support for pass though mode") as 
introducing a regression, but more of a prerequisite to support both PT 
and DMA modes.

> 
> If I understand correct -->
> The original intent of the flag (in case of Intel) is
> 
> 1. To turn on the optimization which will use nodma region (dynamically
>     enabling it) if guest configures the device with passthrough (pt=1)
>     for given context entry.

This is why I said I am conflicted with the implementation. Your change 
always disables the iommu_nodma region, where the default for Linux 
guests is to use passthrough mode, which "normally" would result in 
iommu_nodma being enabled. I almost suggested on my first reply that you 
instead forced x86_iommu->pt_supported = 0 in the AMDVi code, but that 
creates a similar type of contradiction.

In short, I understand what you are trying to do, but I think "the 
trick" as I called it below should probably be documented.

> 
> 2. The flag should not enable no_dma region if guest does not configure
>     device with pt.
> 
> Intel driver does this dynamically (for every context entry update while
> guest is running). But for AMD this is static and does not change with
> the DTE updates, which is causing this regression.

hopefully solved soon:
https://lore.kernel.org/qemu-devel/20250414020253.443831-15-alejandro.j.jimenez@oracle.com/

Alejandro

> 
>> To explain the "trick": this change is always enabling amdvi_dev_as- 
>>  >iommu, which is explicitly created as an IOMMU memory region (i.e. a 
>> memory region with mr->is_iommu == true), and it is meant to support 
>> DMA remapping. It is relying on the "side effect" that VFIO will try 
>> to register notifiers for memory regions that are an "IOMMU" (i.e. 
>> pass the check in memory_region_is_iommu()), and later fail when 
>> trying to register the notifier.
>>
>> If this change is merged, I think you should at least include the 
>> explanation above in the commit message, since it is not obvious to me 
>> at first reading. That being said, in my opinion, this approach adds 
>> potential confusion that is not worth the trouble, since most guests 
>> will not be using AMD vIOMMU at this point. And if they are, they 
>> would also have to be specifically requesting to enable DMA 
>> translation to hit the problem. Unfortunately, guests will always have 
>> the ability of specifying an invalid configuration if they try really 
>> hard (or not hard at all in this case).
>>
> 
> Yep, I should have explained it in details. Sorry about the confusion
> will keep in mind while sending future patches.
> 
> Regards
> Sairaj
> 
>> Alejandro




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
  2025-04-15 18:28       ` Alejandro Jimenez
@ 2025-04-17 10:21         ` Sairaj Kodilkar
  0 siblings, 0 replies; 10+ messages in thread
From: Sairaj Kodilkar @ 2025-04-17 10:21 UTC (permalink / raw)
  To: Alejandro Jimenez, qemu-devel
  Cc: suravee.suthikulpanit, joao.m.martins, philmd, vasant.hegde



On 4/15/2025 11:58 PM, Alejandro Jimenez wrote:
> 
> 
> On 4/15/25 2:38 AM, Sairaj Kodilkar wrote:
>>
>>
>> Hi Alejandro,
>>
>> On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:
>>
>>> Hi Sairaj,
>>>
>>> I'm conflicted by the implementation of the change, so I'd like to 
>>> make sure I fully understand...
>>>
>>> On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
> 
>>>> Fix the issue by removing pt_supported check and disabling nodma memory
>>>> region. Adding pt_supported requires additional changes and we will 
>>>> look
>>>> into it later.
>>>
>>> I see that you are trying to essentially block a guest from enabling 
>>> an IOMMU feature that is not currently supported by the vIOMMU. 
>>> Hopefully that limitation will be solved soon (shameless plug):
>>> https://lore.kernel.org/qemu-devel/20250414020253.443831-1- 
>>> alejandro.j.jimenez@oracle.com/
>>>
>>> But in the meantime, I think enabling amdvi_dev_as->iommu when DMA 
>>> remapping capability is not available is likely to cause more 
>>> confusion for anyone trying to understand the already convoluted 
>>> details of the memory region setup. 
>>
>>> To a reader of the code and the commit message, it is confusing that 
>>> to support the "NO DMA" case, the nodma memory region must be 
>>> disabled, which is the opposite of what it is meant to do.
>>>
>>
>> I dont think that I understand above statement. What do you mean by "NO
>> DMA" case here ? is it iommu.passthrough=0 ?
> 
> I meant it from the point of view of the vIOMMU configuration (since we 
> don't control what the guest can request). So in terms of vIOMMU modes 
> and corresponding Memory Regions that must be enabled to support such 
> modes, I see it as:
> 
> Passthrough(NO DMA) --> MR: iommu_nodma: enabled && iommu: disabled
> 
> DMA remap --> MR: iommu: enabled && iommu_nodma: disabled
> 
> But I recognize that view/model is probably too rigid for now, although 
> it should be the "correct state" once we support DMA remapping.
> 
>>
>> Essentially, I am trying to support the "DMA" case that is
>> iommu.passthrough=0 for the emulated devices, by reverting the 
>> changes> that introduced the regression.
> 
> I understand the goal is to make emulated devs to work in more scenarios.
> 
> Because of that view that I mention above, is why I don't think of 
> c1f46999ef506 ("amd_iommu: Add support for pass though mode") as 
> introducing a regression, but more of a prerequisite to support both PT 
> and DMA modes.
> 
>>
>> If I understand correct -->
>> The original intent of the flag (in case of Intel) is
>>
>> 1. To turn on the optimization which will use nodma region (dynamically
>>     enabling it) if guest configures the device with passthrough (pt=1)
>>     for given context entry.
> 
> This is why I said I am conflicted with the implementation. Your change 
> always disables the iommu_nodma region, where the default for Linux 
> guests is to use passthrough mode, which "normally" would result in 
> iommu_nodma being enabled. I almost suggested on my first reply that you 
> instead forced x86_iommu->pt_supported = 0 in the AMDVi code, but that 
> creates a similar type of contradiction.
> 
> In short, I understand what you are trying to do, but I think "the 
> trick" as I called it below should probably be documented.
> 

Yes, I will respin the series with better commit message. I understand
that this patch can cause the confusion.

Regards
Sairaj Kodilkar

>>
>> 2. The flag should not enable no_dma region if guest does not configure
>>     device with pt.
>>
>> Intel driver does this dynamically (for every context entry update while
>> guest is running). But for AMD this is static and does not change with
>> the DTE updates, which is causing this regression.
> 
> hopefully solved soon:
> https://lore.kernel.org/qemu-devel/20250414020253.443831-15- 
> alejandro.j.jimenez@oracle.com/
> 
> Alejandro
> 
>>
>>> To explain the "trick": this change is always enabling amdvi_dev_as- 
>>>  >iommu, which is explicitly created as an IOMMU memory region (i.e. 
>>> a memory region with mr->is_iommu == true), and it is meant to 
>>> support DMA remapping. It is relying on the "side effect" that VFIO 
>>> will try to register notifiers for memory regions that are an 
>>> "IOMMU" (i.e. pass the check in memory_region_is_iommu()), and later 
>>> fail when trying to register the notifier.
>>>
>>> If this change is merged, I think you should at least include the 
>>> explanation above in the commit message, since it is not obvious to 
>>> me at first reading. That being said, in my opinion, this approach 
>>> adds potential confusion that is not worth the trouble, since most 
>>> guests will not be using AMD vIOMMU at this point. And if they are, 
>>> they would also have to be specifically requesting to enable DMA 
>>> translation to hit the problem. Unfortunately, guests will always 
>>> have the ability of specifying an invalid configuration if they try 
>>> really hard (or not hard at all in this case).
>>>
>>
>> Yep, I should have explained it in details. Sorry about the confusion
>> will keep in mind while sending future patches.
>>
>> Regards
>> Sairaj
>>
>>> Alejandro
> 
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-04-17 10:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10  6:44 [PATCH 0/2] amd_iommu: Fixes Sairaj Kodilkar
2025-04-10  6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
2025-04-10  8:01   ` Vasant Hegde
2025-04-14 20:26   ` Alejandro Jimenez
2025-04-15  6:38     ` Sairaj Kodilkar
2025-04-15 18:28       ` Alejandro Jimenez
2025-04-17 10:21         ` Sairaj Kodilkar
2025-04-10  6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
2025-04-10  9:16   ` Joao Martins
2025-04-11  0:56   ` Alejandro Jimenez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).