From: Yi Liu <yi.l.liu@intel.com>
To: Nicolin Chen <nicolinc@nvidia.com>,
"Duan, Zhenzhong" <zhenzhong.duan@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"clg@redhat.com" <clg@redhat.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"ddutile@redhat.com" <ddutile@redhat.com>,
"jgg@nvidia.com" <jgg@nvidia.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
"clement.mathieu--drif@eviden.com"
<clement.mathieu--drif@eviden.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"Peng, Chao P" <chao.p.peng@intel.com>,
Yi Sun <yi.y.sun@linux.intel.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
Eduardo Habkost <eduardo@habkost.net>
Subject: Re: [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to host
Date: Fri, 23 May 2025 14:26:48 +0800 [thread overview]
Message-ID: <1b6e8e16-cc9f-4ac1-bf98-3d79460ba837@intel.com> (raw)
In-Reply-To: <aC97Kn+M5/9/naY0@Asurada-Nvidia>
On 2025/5/23 03:29, Nicolin Chen wrote:
> On Thu, May 22, 2025 at 06:50:42AM +0000, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>> Subject: Re: [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to
>>> host
>>>
>>> On Wed, May 21, 2025 at 07:14:45PM +0800, Zhenzhong Duan wrote:
>>>> +static const MemoryListener iommufd_s2domain_memory_listener = {
>>>> + .name = "iommufd_s2domain",
>>>> + .priority = 1000,
>>>> + .region_add = iommufd_listener_region_add_s2domain,
>>>> + .region_del = iommufd_listener_region_del_s2domain,
>>>> +};
>>>
>>> Would you mind elaborating When and how vtd does all S2 mappings?
>>
>> When guest trigger pasid cache invalidation, vIOMMU will attach device
>> to stage2 page table if guest's PGTT=PT or nested page table if PGTT=Stage1.
>> All these page tables are dynamically created during attach. We don't use
>> VFIO's shadow page table. The S2 mappings are also created during attach.
>
> OK. That I can understand.
>
> Next question: what does VTD actually map onto the S2 page table?
> The entire guest RAM? Or just a part of that?
>
> On ARM, the VFIO listener would capture the entire RAM, and map it
> on S2 page table. I wonder if VTD would do the same.
>
>>> On ARM, the default vfio_memory_listener could capture the entire
>>> guest RAM and add to the address space. So what we do is basically
>>> reusing the vfio_memory_listener:
>>> https://lore.kernel.org/qemu-devel/20250311141045.66620-13-
>>> shameerali.kolothum.thodi@huawei.com/
>>>
>>> The thing is that when a VFIO device is attached to the container
>>> upon a nesting configuration, the ->get_address_space op should
>>> return the system address space as S1 nested HWPT isn't allocated
>>> yet. Then all the iommu as routines in vfio_listener_region_add()
>>> would be skipped, ending up with mapping the guest RAM in S2 HWPT
>>> correctly. Not until the S1 nested HWPT is allocated by the guest
>>> OS (after guest boots), can the ->get_address_space op return the
>>> iommu address space.
>>
>> When S1 hwpt is allocated by guest, who will notify VFIO to call
>> ->get_address_space op() again to get iommu address space?
>
> Hmm, would you please elaborate why VFIO needs to call that again?
>
> I can see VFIO create the MAP/UNMAP notifiers for an iommu address
> space. However, the device operating in the nested translation mode
> should go through IOMMU HW for these two:
> - S1 page table (MAP) will be created by the guest OS
> - S1 invalidation (UNMAP) will be issued by the guest OS, and then
> trapped by QEMU to forward to the HWPT uAPI to the host kernel.
>
> As you mentioned, there is no need of a shadow page table in this
> mode. What else does VT-d need from an iommu address space?
>
> On ARM, the only reason that we shift address space, is for KVM to
> inject MSI, as it only has the gIOVA and requires the iommu address
> space to translate that to gPA. Refer to kvm_arch_fixup_msi_route()
> in target/arm/kvm.c where it calls pci_device_iommu_address_space.
>
>>> With this address space shift, S2 mappings can be simply captured
>>> and done by vfio_memory_listener. Then, such an s2domain listener
>>> would be largely redundant.
>>
>> I didn't get how arm smmu supports switching address space, will VFIO call
>> ->get_address_space() twice, once to get system address space and the other
>> for iommu address space?
>
> The set_iommu_device() attaches the device to an stage2 page table
hmmm. I'm not sure if this is accurate. I think this set_iommu_device
just acts as setting some handle for this particular device to the vIOMMU
side. It has not idea about what address space nor page table.
> by default, indicating that the device works in the S1 passthrough
> mode (for VTD, that's PGTT=PT) at VM creation. And this is where
> the system address space should be returned by get_address_space().
>
> If the guest kernel sets an S1 Translate mode for the device (for
> VTD, that's PGTT=Stage1), QEMU would trap that and allocate an S1
> HWPT for device to attach. Starting from here, get_address_space()
> can return the iommu address space -- on ARM, we only need it for
> KVM to translate MSI.
>
refer to the last reply. This seems to be different between ARM and VT-d
emulation.
--
Regards,
Yi Liu
next prev parent reply other threads:[~2025-05-23 6:22 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 11:14 [PATCH rfcv3 00/21] intel_iommu: Enable stage-1 translation for passthrough device Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 01/21] backends/iommufd: Add a helper to invalidate user-managed HWPT Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 02/21] vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 03/21] vfio/iommufd: Initialize iommufd specific members in HostIOMMUDeviceIOMMUFD Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 04/21] vfio/iommufd: Implement [at|de]tach_hwpt handlers Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 05/21] vfio/iommufd: Save vendor specific device info Zhenzhong Duan
2025-05-21 21:57 ` Nicolin Chen
2025-05-22 9:21 ` Duan, Zhenzhong
2025-05-22 19:35 ` Nicolin Chen
2025-05-26 12:15 ` Cédric Le Goater
2025-05-27 2:12 ` Duan, Zhenzhong
2025-05-21 11:14 ` [PATCH rfcv3 06/21] iommufd: Implement query of host VTD IOMMU's capability Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 07/21] intel_iommu: Rename vtd_ce_get_rid2pasid_entry to vtd_ce_get_pasid_entry Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 08/21] intel_iommu: Optimize context entry cache utilization Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 09/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 10/21] intel_iommu: Introduce a new structure VTDHostIOMMUDevice Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 11/21] intel_iommu: Introduce two helpers vtd_as_from/to_iommu_pasid_locked Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 12/21] intel_iommu: Handle PASID entry removing and updating Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 13/21] intel_iommu: Handle PASID entry adding Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 14/21] intel_iommu: Introduce a new pasid cache invalidation type FORCE_RESET Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to host Zhenzhong Duan
2025-05-21 22:49 ` Nicolin Chen
2025-05-22 6:50 ` Duan, Zhenzhong
2025-05-22 19:29 ` Nicolin Chen
2025-05-23 6:26 ` Yi Liu [this message]
2025-05-26 3:34 ` Duan, Zhenzhong
2025-05-23 6:22 ` Yi Liu
2025-05-23 6:52 ` Duan, Zhenzhong
2025-05-23 21:12 ` Nicolin Chen
2025-05-26 3:46 ` Duan, Zhenzhong
2025-05-26 7:24 ` Yi Liu
2025-05-26 17:35 ` Nicolin Chen
2025-05-28 7:12 ` Duan, Zhenzhong
2025-06-12 12:53 ` Yi Liu
2025-06-12 14:06 ` Shameerali Kolothum Thodi via
2025-06-16 6:04 ` Nicolin Chen
2025-06-16 3:24 ` Duan, Zhenzhong
2025-06-16 6:34 ` Nicolin Chen
2025-06-16 8:54 ` Duan, Zhenzhong
2025-06-16 9:36 ` Yi Liu
2025-06-16 10:16 ` Duan, Zhenzhong
2025-06-17 7:04 ` Yi Liu
2025-06-16 5:59 ` Nicolin Chen
2025-06-16 7:38 ` Yi Liu
2025-06-17 3:22 ` Nicolin Chen
2025-06-17 6:48 ` Yi Liu
2025-06-16 5:47 ` Nicolin Chen
2025-06-16 8:15 ` Duan, Zhenzhong
2025-06-17 3:14 ` Nicolin Chen
2025-06-17 12:37 ` Jason Gunthorpe
2025-06-17 13:03 ` Yi Liu
2025-06-17 13:11 ` Jason Gunthorpe
2025-06-18 2:51 ` Duan, Zhenzhong
2025-06-18 3:40 ` Yi Liu
2025-06-18 11:43 ` Jason Gunthorpe
2025-05-21 11:14 ` [PATCH rfcv3 16/21] intel_iommu: ERRATA_772415 workaround Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 17/21] intel_iommu: Replay pasid binds after context cache invalidation Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 18/21] intel_iommu: Propagate PASID-based iotlb invalidation to host Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 19/21] intel_iommu: Refresh pasid bind when either SRTP or TE bit is changed Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 20/21] intel_iommu: Bypass replay in stage-1 page table mode Zhenzhong Duan
2025-05-21 11:14 ` [PATCH rfcv3 21/21] intel_iommu: Enable host device when x-flts=on in scalable mode Zhenzhong Duan
2025-05-26 12:19 ` [PATCH rfcv3 00/21] intel_iommu: Enable stage-1 translation for passthrough device Cédric Le Goater
2025-05-27 2:16 ` Duan, Zhenzhong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1b6e8e16-cc9f-4ac1-bf98-3d79460ba837@intel.com \
--to=yi.l.liu@intel.com \
--cc=alex.williamson@redhat.com \
--cc=chao.p.peng@intel.com \
--cc=clement.mathieu--drif@eviden.com \
--cc=clg@redhat.com \
--cc=ddutile@redhat.com \
--cc=eduardo@habkost.net \
--cc=eric.auger@redhat.com \
--cc=jasowang@redhat.com \
--cc=jgg@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=kevin.tian@intel.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=nicolinc@nvidia.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=yi.y.sun@linux.intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).