From: Sairaj Kodilkar <sarunkod@amd.com>
To: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
<qemu-devel@nongnu.org>
Cc: <suravee.suthikulpanit@amd.com>, <joao.m.martins@oracle.com>,
<philmd@linaro.org>, <vasant.hegde@amd.com>
Subject: Re: [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on.
Date: Tue, 15 Apr 2025 12:08:26 +0530 [thread overview]
Message-ID: <63acddc3-bc7b-47e4-9e7c-66bdc40f23d2@amd.com> (raw)
In-Reply-To: <914314b3-611d-4da3-9050-3c8c1b881e40@oracle.com>
Hi Alejandro,
On 4/15/2025 1:56 AM, Alejandro Jimenez wrote:
> Hi Sairaj,
>
> I'm conflicted by the implementation of the change, so I'd like to make
> sure I fully understand...
>
> On 4/10/25 2:44 AM, Sairaj Kodilkar wrote:
>> Current amd_iommu enables the iommu_nodma address space when pt_supported
>> flag is on.
>
> As it should, that is the intended purpose of the iommu_nodma memory
> region.
>
> This causes device to bypass the IOMMU and use untranslated
>> address to perform DMA when guest kernel uses DMA mode, resulting in
>> failure to setup the devices in the guest.
>
> So the scenario you are describing above is this QEMU cmdline (using
> explicit options):
>
> -device amd-iommu,intremap=on,xtsup=on,pt=on \
> -device vfio-pci,host=0000:a1:00.1...
>
> and guest forcing DMA remap mode e.g. 'iommu.passthrough=0'
>
> which will cause failures from QEMU:
>
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
> buffer address
> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
> receive buffer address
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
> buffer address
> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
> receive buffer address
> qemu-system-x86_64: AHCI: Failed to start DMA engine: bad command list
> buffer address
>
> and also errors from the VF driver on the guest. e.g.:
>
> [ 69.424937] mlx5_core 0000:00:03.0: mlx5_function_enable:1212:(pid
> 2492): enable hca failed
> [ 69.437048] mlx5_core 0000:00:03.0: probe_one:1994:(pid 2492):
> mlx5_init_one failed with error code -110
> [ 69.444913] mlx5_core 0000:00:03.0: probe with driver mlx5_core
> failed with error -110
>
>
> Whereas after your change the guest would fail to launch because VFIO
> will try to register a MAP notifier for the device and fail the check in
> amdvi_iommu_notify_flag_changed().
>
> If my above description is correct, then...
>
Yep, The above description is correct. I should have included it in the
cover letter.
>>
>> Fix the issue by removing pt_supported check and disabling nodma memory
>> region. Adding pt_supported requires additional changes and we will look
>> into it later.
>
> I see that you are trying to essentially block a guest from enabling an
> IOMMU feature that is not currently supported by the vIOMMU. Hopefully
> that limitation will be solved soon (shameless plug):
> https://lore.kernel.org/qemu-devel/20250414020253.443831-1-
> alejandro.j.jimenez@oracle.com/
>
> But in the meantime, I think enabling amdvi_dev_as->iommu when DMA
> remapping capability is not available is likely to cause more confusion
> for anyone trying to understand the already convoluted details of the
> memory region setup.
> To a reader of the code and the commit message, it
> is confusing that to support the "NO DMA" case, the nodma memory region
> must be disabled, which is the opposite of what it is meant to do.
>
I dont think that I understand above statement. What do you mean by "NO
DMA" case here ? is it iommu.passthrough=0 ?
Essentially, I am trying to support the "DMA" case that is
iommu.passthrough=0 for the emulated devices, by reverting the changes
that introduced the regression.
If I understand correct -->
The original intent of the flag (in case of Intel) is
1. To turn on the optimization which will use nodma region (dynamically
enabling it) if guest configures the device with passthrough (pt=1)
for given context entry.
2. The flag should not enable no_dma region if guest does not configure
device with pt.
Intel driver does this dynamically (for every context entry update while
guest is running). But for AMD this is static and does not change with
the DTE updates, which is causing this regression.
> To explain the "trick": this change is always enabling amdvi_dev_as-
> >iommu, which is explicitly created as an IOMMU memory region (i.e. a
> memory region with mr->is_iommu == true), and it is meant to support DMA
> remapping. It is relying on the "side effect" that VFIO will try to
> register notifiers for memory regions that are an "IOMMU" (i.e. pass the
> check in memory_region_is_iommu()), and later fail when trying to
> register the notifier.
>
> If this change is merged, I think you should at least include the
> explanation above in the commit message, since it is not obvious to me
> at first reading. That being said, in my opinion, this approach adds
> potential confusion that is not worth the trouble, since most guests
> will not be using AMD vIOMMU at this point. And if they are, they would
> also have to be specifically requesting to enable DMA translation to hit
> the problem. Unfortunately, guests will always have the ability of
> specifying an invalid configuration if they try really hard (or not hard
> at all in this case).
>
Yep, I should have explained it in details. Sorry about the confusion
will keep in mind while sending future patches.
Regards
Sairaj
> Alejandro
>
>>
>> Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>> hw/i386/amd_iommu.c | 12 ++----------
>> 1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 5f9b95279997..df8ba5d39ada 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -1426,7 +1426,6 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus
>> *bus, void *opaque, int devfn)
>> AMDVIState *s = opaque;
>> AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
>> int bus_num = pci_bus_num(bus);
>> - X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
>> iommu_as = s->address_spaces[bus_num];
>> @@ -1486,15 +1485,8 @@ static AddressSpace
>> *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>> AMDVI_INT_ADDR_FIRST,
>> &amdvi_dev_as->iommu_ir,
>> 1);
>> - if (!x86_iommu->pt_supported) {
>> - memory_region_set_enabled(&amdvi_dev_as->iommu_nodma,
>> false);
>> - memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as-
>> >iommu),
>> - true);
>> - } else {
>> - memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as-
>> >iommu),
>> - false);
>> - memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, true);
>> - }
>> + memory_region_set_enabled(&amdvi_dev_as->iommu_nodma, false);
>> + memory_region_set_enabled(MEMORY_REGION(&amdvi_dev_as-
>> >iommu), true);
>> }
>> return &iommu_as[devfn]->as;
>> }
>
next prev parent reply other threads:[~2025-04-15 6:45 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-10 6:44 [PATCH 0/2] amd_iommu: Fixes Sairaj Kodilkar
2025-04-10 6:44 ` [PATCH 1/2] hw/i386/amd_iommu: Fix device setup failure when PT is on Sairaj Kodilkar
2025-04-10 8:01 ` Vasant Hegde
2025-04-14 20:26 ` Alejandro Jimenez
2025-04-15 6:38 ` Sairaj Kodilkar [this message]
2025-04-15 18:28 ` Alejandro Jimenez
2025-04-17 10:21 ` Sairaj Kodilkar
2025-04-10 6:44 ` [PATCH 2/2] hw/i386/amd_iommu: Fix xtsup when vcpus < 255 Sairaj Kodilkar
2025-04-10 9:16 ` Joao Martins
2025-04-11 0:56 ` Alejandro Jimenez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=63acddc3-bc7b-47e4-9e7c-66bdc40f23d2@amd.com \
--to=sarunkod@amd.com \
--cc=alejandro.j.jimenez@oracle.com \
--cc=joao.m.martins@oracle.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=suravee.suthikulpanit@amd.com \
--cc=vasant.hegde@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).