From: Eric Auger <eric.auger@redhat.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: Shameer Kolothum <skolothumtho@nvidia.com>,
"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"peter.maydell@linaro.org" <peter.maydell@linaro.org>,
Jason Gunthorpe <jgg@nvidia.com>,
"ddutile@redhat.com" <ddutile@redhat.com>,
"berrange@redhat.com" <berrange@redhat.com>,
Nathan Chen <nathanc@nvidia.com>, Matt Ochs <mochs@nvidia.com>,
"smostafa@google.com" <smostafa@google.com>,
"wangzhou1@hisilicon.com" <wangzhou1@hisilicon.com>,
"jiangkunkun@huawei.com" <jiangkunkun@huawei.com>,
"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>,
"zhangfei.gao@linaro.org" <zhangfei.gao@linaro.org>,
"zhenzhong.duan@intel.com" <zhenzhong.duan@intel.com>,
"yi.l.liu@intel.com" <yi.l.liu@intel.com>,
Krishnakant Jaju <kjaju@nvidia.com>
Subject: Re: [PATCH v5 15/32] hw/pci/pci: Introduce optional get_msi_address_space() callback
Date: Wed, 5 Nov 2025 08:47:56 +0100 [thread overview]
Message-ID: <413ca488-1301-4f0c-90bf-ab3ef5a0791e@redhat.com> (raw)
In-Reply-To: <aQo8MPCrr82wh3LI@Asurada-Nvidia>
Hi Nicolin,
On 11/4/25 6:47 PM, Nicolin Chen wrote:
> On Tue, Nov 04, 2025 at 05:01:57PM +0100, Eric Auger wrote:
>>>>>> On 10/31/25 11:49 AM, Shameer Kolothum wrote:
>>>>>>> On ARM, devices behind an IOMMU have their MSI doorbell addresses
>>>>>>> translated by the IOMMU. In nested mode, this translation happens in
>>>>>>> two stages (gIOVA → gPA → ITS page).
>>>>>>>
>>>>>>> In accelerated SMMUv3 mode, both stages are handled by hardware, so
>>>>>>> get_address_space() returns the system address space so that VFIO
>>>>>>> can setup stage-2 mappings for system address space.
>>>>>> Sorry but I still don't catch the above. Can you explain (most probably
>>>>>> again) why this is a requirement to return the system as so that VFIO
>>>>>> can setup stage-2 mappings for system address space. I am sorry for
>>>>>> insisting (at the risk of being stubborn or dumb) but I fail to
>>>>>> understand the requirement. As far as I remember the way I integrated it
>>>>>> at the old times did not require that change:
>>>>>> https://lore.kernel.org/all/20210411120912.15770-1-
>>>>>> eric.auger@redhat.com/
>>>>>> I used a vfio_prereg_listener to force the S2 mapping.
>>>>> Yes I remember that.
>>>>>
>>>>>> What has changed that forces us now to have this gym
>>>>> This approach achieves the same outcome, but through a
>>>>> different mechanism. Returning the system address space
>>>>> here ensures that VFIO sets up the Stage-2 mappings for
>>>>> devices behind the accelerated SMMUv3.
>>>>>
>>>>> I think, this makes sense because, in the accelerated case, the
>>>>> device is no longer managed by QEMU’s SMMUv3 model. The
>>>> On the other hand, as we discussed on v4 by returning system as you
>>>> pretend there is no translation in place which is not true. Now we use
>>>> an alias for it but it has not really removed its usage. Also it forces
>>>> use to hack around the MSI mapping and introduce new PCIIOMMUOps.
>>>> Have
>>>> you assessed the feasability of using vfio_prereg_listener to force the
>>>> S2 mapping. Is it simply not relevant anymore or could it be used also
>>>> with the iommufd be integration? Eric
>>> IIUC, the prereg_listener mechanism just enables us to setup the s2
>>> mappings. For MSI, In your version, I see that smmu_find_add_as()
>>> always returns IOMMU as. How is that supposed to work if the Guest
>>> has s1 bypass mode STE for the device?
>> I need to delve into it again as I forgot the details. Will come back to
>> you ...
> We aligned with Intel previously about this system address space.
> You might know these very well, yet here are the breakdowns:
>
> 1. VFIO core has a container that manages an HWPT. By default, it
> allocates a stage-1 normal HWPT, unless vIOMMU requests for a
You may precise this stage-1 normal HWPT is used to map GPA to HPA (so
eventually implements stage 2).
> nesting parent HWPT for accelerated cases.
> 2. VFIO core adds a listener for that HWPT and sets up a handler
> vfio_container_region_add() where it checks the memory region
> whether it is iommu or not.
> a. In case of !IOMMU as (i.e. system address space), it treats
> the address space as a RAM region, and handles all stage-2
> mappings for the core allocated nesting parent HWPT.
> b. In case of IOMMU as (i.e. a translation type) it sets up
> the IOTLB notifier and translation replay while bypassing
> the listener for RAM region.
yes S1+S2 are combined through vfio_iommu_map_notify()
>
> In an accelerated case, we need stage-2 mappings to match with the
> nesting parent HWPT. So, returning system address space or an alias
> of that notifies the vfio core to take the 2.a path.
>
> If we take 2.b path by returning IOMMU as in smmu_find_add_as, the
> VFIO core would no longer listen to the RAM region for us, i.e. no
> stage-2 HWPT nor mappings. vIOMMU would have to allocate a nesting
except if you change the VFIO common.c as I did the past to force the S2
mapping in the nested config.
See
https://lore.kernel.org/all/20210411120912.15770-16-eric.auger@redhat.com/
and vfio_prereg_listener()
Again I do not say this is the right way to do but using system address
space is not the "only" implementation choice I think and it needs to be
properly justified, especially has it has at least 2 side effects:
- somehow abusing the semantic of returned address space and pretends
there is no IOMMU translation in place and
- also impacting the way MSIs are handled (introduction of a new
PCIIOMMUOps).
This kind of explanation you wrote is absolutely needed in the commit
msg for reviewers to understand the design choice I think.
Eric
> parent and manage the stage-2 mappings by adding a listener in its
> own code, which is largely duplicated with the core code.
>
> -------------- so far this works for Intel and ARM--------------
>
> 3. On ARM, vPCI device is programmed with gIOVA, so KVM has to
> follow what the vPCI is told to inject vIRQs. This requires
> a translation at the nested stage-1 address space. Note that
> vSMMU in this case doesn't manage translation as it doesn't
> need to. But there is no other sane way for KVM to know the
> vITS page corresponding to the given gIOVA. So, we invented
> the get_msi_address_space op.
>
> (3) makes sense because there is a complication in the MSI that
> does a 2-stage translation on ARM and KVM must follow the stage-1
> input address, leaving us no choice to have two address spaces.
>
> Thanks
> Nicolin
>
next prev parent reply other threads:[~2025-11-05 7:48 UTC|newest]
Thread overview: 148+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-31 10:49 [PATCH v5 00/32] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 01/32] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 02/32] backends/iommufd: Introduce iommufd_backend_alloc_vdev Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 03/32] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 04/32] hw/arm/smmu-common: Make iommu ops part of SMMUState Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 05/32] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 06/32] hw/arm/smmuv3-accel: Initialize shared system address space Shameer Kolothum
2025-10-31 21:10 ` Nicolin Chen
2025-11-03 14:17 ` Shameer Kolothum
2025-11-03 13:12 ` Jonathan Cameron via
2025-11-03 15:53 ` Shameer Kolothum
2025-11-03 13:39 ` Philippe Mathieu-Daudé
2025-11-03 16:30 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 07/32] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
2025-11-03 13:24 ` Jonathan Cameron via
2025-11-03 16:40 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 08/32] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
2025-11-03 13:30 ` Jonathan Cameron via
2025-11-03 16:47 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 09/32] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
2025-11-03 13:30 ` Jonathan Cameron via
2025-11-03 14:25 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 10/32] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum
2025-11-03 16:51 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 11/32] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum
2025-11-03 16:55 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 12/32] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum
2025-10-31 22:02 ` Nicolin Chen
2025-10-31 22:08 ` Nicolin Chen
2025-11-03 14:19 ` Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 13/32] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support Shameer Kolothum
2025-10-31 23:52 ` Nicolin Chen
2025-11-01 0:20 ` Nicolin Chen
2025-11-03 15:11 ` Shameer Kolothum
2025-11-03 17:32 ` Nicolin Chen
2025-11-04 11:05 ` Eric Auger
2025-11-04 12:26 ` Shameer Kolothum
2025-11-04 13:30 ` Eric Auger
2025-11-04 16:48 ` Nicolin Chen
2025-10-31 10:49 ` [PATCH v5 14/32] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
2025-11-04 13:28 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 15/32] hw/pci/pci: Introduce optional get_msi_address_space() callback Shameer Kolothum
2025-11-04 14:11 ` Eric Auger
2025-11-04 14:20 ` Jason Gunthorpe
2025-11-04 14:42 ` Shameer Kolothum
2025-11-04 14:51 ` Jason Gunthorpe
2025-11-04 14:58 ` Shameer Kolothum
2025-11-04 15:12 ` Jason Gunthorpe
2025-11-04 15:20 ` Shameer Kolothum
2025-11-04 15:35 ` Jason Gunthorpe
2025-11-04 17:11 ` Nicolin Chen
2025-11-04 17:41 ` Jason Gunthorpe
2025-11-04 17:57 ` Nicolin Chen
2025-11-04 18:09 ` Jason Gunthorpe
2025-11-04 18:44 ` Nicolin Chen
2025-11-04 18:56 ` Jason Gunthorpe
2025-11-04 19:31 ` Nicolin Chen
2025-11-04 19:35 ` Jason Gunthorpe
2025-11-04 19:43 ` Nicolin Chen
2025-11-04 19:45 ` Jason Gunthorpe
2025-11-04 19:59 ` Nicolin Chen
2025-11-04 19:46 ` Shameer Kolothum
2025-11-05 12:52 ` Jason Gunthorpe
2025-11-05 17:32 ` Eric Auger
2025-11-04 14:37 ` Shameer Kolothum
2025-11-04 14:44 ` Eric Auger
2025-11-04 15:14 ` Shameer Kolothum
2025-11-04 16:01 ` Eric Auger
2025-11-04 17:47 ` Nicolin Chen
2025-11-05 7:47 ` Eric Auger [this message]
2025-11-05 19:30 ` Nicolin Chen
2025-11-04 19:08 ` Shameer Kolothum
2025-11-05 8:56 ` Eric Auger
2025-11-05 11:41 ` Shameer Kolothum
2025-11-05 17:25 ` Eric Auger
2025-11-05 18:10 ` Jason Gunthorpe
2025-11-05 18:33 ` Nicolin Chen
2025-11-05 18:58 ` Jason Gunthorpe
2025-11-05 19:33 ` Nicolin Chen
2025-11-06 7:42 ` Eric Auger
2025-11-06 11:48 ` Shameer Kolothum
2025-11-06 17:04 ` Eric Auger
2025-11-07 10:27 ` Shameer Kolothum
2025-11-06 14:32 ` Jason Gunthorpe
2025-11-06 15:47 ` Eric Auger
2025-11-05 18:33 ` Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 16/32] hw/arm/smmuv3-accel: Make use of " Shameer Kolothum
2025-10-31 23:57 ` Nicolin Chen
2025-11-03 15:19 ` Shameer Kolothum
2025-11-03 17:34 ` Nicolin Chen
2025-10-31 10:49 ` [PATCH v5 17/32] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host Shameer Kolothum
2025-11-01 0:35 ` Nicolin Chen via
2025-11-03 15:28 ` Shameer Kolothum
2025-11-03 17:43 ` Nicolin Chen
2025-11-03 18:17 ` Shameer Kolothum
2025-11-03 18:51 ` Nicolin Chen
2025-11-04 8:55 ` Eric Auger
2025-11-04 16:41 ` Nicolin Chen
2025-11-03 17:11 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 18/32] hw/arm/smmuv3: Initialize ID registers early during realize() Shameer Kolothum
2025-11-01 0:24 ` Nicolin Chen
2025-11-03 13:57 ` Jonathan Cameron via
2025-11-03 15:11 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 19/32] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
2025-11-01 0:49 ` Nicolin Chen
2025-11-01 14:20 ` Zhangfei Gao
2025-11-03 15:42 ` Shameer Kolothum
2025-11-03 17:16 ` Eric Auger
2025-11-03 14:47 ` Jonathan Cameron via
2025-10-31 10:49 ` [PATCH v5 20/32] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5 Shameer Kolothum
2025-11-03 13:58 ` Jonathan Cameron via
2025-10-31 10:49 ` [PATCH v5 21/32] hw/arm/virt: Set PCI preserve_config for accel SMMUv3 Shameer Kolothum
2025-11-03 14:58 ` Eric Auger
2025-11-03 15:03 ` Eric Auger via
2025-11-03 16:01 ` Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 22/32] tests/qtest/bios-tables-test: Prepare for IORT revison upgrade Shameer Kolothum
2025-11-03 14:48 ` Jonathan Cameron via
2025-11-03 14:59 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 23/32] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum
2025-11-03 14:53 ` Jonathan Cameron via
2025-11-03 15:43 ` Shameer Kolothum
2025-10-31 10:49 ` [PATCH v5 24/32] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade Shameer Kolothum
2025-11-03 14:54 ` Jonathan Cameron via
2025-11-03 15:01 ` Eric Auger
2025-10-31 10:49 ` [PATCH v5 25/32] hw/arm/smmuv3: Add accel property for SMMUv3 device Shameer Kolothum
2025-11-03 14:56 ` Jonathan Cameron via
2025-10-31 10:49 ` [PATCH v5 26/32] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
2025-11-03 15:07 ` Eric Auger
2025-11-03 16:08 ` Shameer Kolothum
2025-11-03 16:25 ` Eric Auger
2025-11-04 9:38 ` Eric Auger
2025-10-31 10:50 ` [PATCH v5 27/32] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
2025-11-04 14:22 ` Eric Auger
2025-10-31 10:50 ` [PATCH v5 28/32] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
2025-11-04 14:35 ` Eric Auger
2025-11-04 14:50 ` Jason Gunthorpe
2025-11-06 7:54 ` Eric Auger
2025-10-31 10:50 ` [PATCH v5 29/32] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
2025-10-31 10:50 ` [PATCH v5 30/32] Extend get_cap() callback to support PASID Shameer Kolothum
2025-11-03 14:58 ` Jonathan Cameron via
2025-11-06 8:45 ` Eric Auger
2025-10-31 10:50 ` [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM Shameer Kolothum
2025-11-03 15:00 ` Jonathan Cameron via
2025-11-06 13:55 ` Eric Auger
2025-11-06 14:27 ` Shameer Kolothum
2025-11-06 15:44 ` Eric Auger
2025-10-31 10:50 ` [PATCH v5 32/32] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
2025-11-06 16:46 ` Eric Auger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=413ca488-1301-4f0c-90bf-ab3ef5a0791e@redhat.com \
--to=eric.auger@redhat.com \
--cc=berrange@redhat.com \
--cc=ddutile@redhat.com \
--cc=jgg@nvidia.com \
--cc=jiangkunkun@huawei.com \
--cc=jonathan.cameron@huawei.com \
--cc=kjaju@nvidia.com \
--cc=mochs@nvidia.com \
--cc=nathanc@nvidia.com \
--cc=nicolinc@nvidia.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=skolothumtho@nvidia.com \
--cc=smostafa@google.com \
--cc=wangzhou1@hisilicon.com \
--cc=yi.l.liu@intel.com \
--cc=zhangfei.gao@linaro.org \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).