From: Yi Liu <yi.l.liu@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Nicolin Chen <nicolinc@nvidia.com>
Cc: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
Peter Xu <peterx@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"clg@redhat.com" <clg@redhat.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"ddutile@redhat.com" <ddutile@redhat.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
"clement.mathieu--drif@eviden.com"
<clement.mathieu--drif@eviden.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"Peng, Chao P" <chao.p.peng@intel.com>,
Yi Sun <yi.y.sun@linux.intel.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
Eduardo Habkost <eduardo@habkost.net>
Subject: Re: [PATCH rfcv3 15/21] intel_iommu: Bind/unbind guest page table to host
Date: Tue, 17 Jun 2025 21:03:32 +0800
Message-ID: <de5baefb-515a-47e3-9e4b-16bca4dbec5e@intel.com>
In-Reply-To: <20250617123707.GW1174925@nvidia.com>
On 2025/6/17 20:37, Jason Gunthorpe wrote:
> On Mon, Jun 16, 2025 at 08:14:27PM -0700, Nicolin Chen wrote:
>> On Mon, Jun 16, 2025 at 08:15:11AM +0000, Duan, Zhenzhong wrote:
>>>> IIUIC, the guest kernel cmdline can switch the mode between the
>>>> stage1 (nesting) and stage2 (legacy/emulated VT-d), right?
>>>
>>> Right. E.g., kexec from "intel_iommu=on,sm_on" to "intel_iommu=on,sm_off",
>>> Then first kernel will run in scalable mode and use stage1(nesting) and
>>> second kernel will run in legacy mode and use stage2.
>>
>> In scalable mode, guest kernel has a stage1 (nested) domain and
>> host kernel has a stage2 (nesting parent) domain. In this case,
>> the VFIO container IOAS could be the system AS corresponding to
>> the kernel-managed stage2 domain.
>>
>> In legacy mode, guest kernel has a stage2 (normal) domain while
>> host kernel has a stage2 (shadow) domain? In this case, the VFIO
>> container IOAS should be the iommu AS corresponding to the kernel
>> guest-level stage2 domain (or should it be shadow)?
>
> What you want is to disable HW support for legacy mode in qemu so the
> kernel rejects sm_off operation.

That could be a future direction. :)

> The HW spec is really goofy, we get an ecap_slts but it only applies
> to a PASID table entry (scalable mode). So the HW has to support
> second stage for legacy always but can turn it off for PASID?

Yes. Legacy mode (page tables in the second-stage format) is always
supported.

> IMHO the intention was to allow the VMM to not support shadowing, but
> it seems the execution was mangled.
>
> I suggest fixing the Linux driver to refuse to run in sm_on mode if
> the HW supports scalable mode and ecap_slts = false. That may not be
> 100% spec compliant but it seems like a reasonable approach.

Running sm_on with only ecap_flts==true is what we want here. We want
the guest to use a stage-1 page table so that the HW can use it in
nested translation mode, and this page table is only available in sm_on
mode.

If we want to drop legacy mode usage in virtualization environments, we
could let the Linux iommu driver refuse to run in legacy mode when
ecap_slts is false. I suppose HW is going to advertise both ecap_slts and
ecap_flts, so this would only stop the guest from using legacy mode.
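
As a rough illustration of the policy described above, a mode-selection
check could look like the sketch below. This is not the actual Linux
driver logic: struct vtd_caps, enum vtd_mode and vtd_pick_mode() are
hypothetical names, and the capability bits are modelled as plain
booleans rather than decoded from the real ECAP register.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the capability bits discussed here. */
struct vtd_caps {
    bool smts;  /* scalable mode is supported */
    bool slts;  /* second-stage translation usable via a PASID entry */
    bool flts;  /* first-stage translation usable via a PASID entry */
};

enum vtd_mode {
    VTD_MODE_NONE,         /* refuse to enable translation */
    VTD_MODE_LEGACY,       /* legacy mode, second-stage format tables */
    VTD_MODE_SCALABLE_S1,  /* scalable mode, stage-1 page tables */
    VTD_MODE_SCALABLE_S2,  /* scalable mode, stage-2 page tables */
};

/*
 * want_sm models the guest cmdline choice (sm_on vs sm_off).
 * Assumed policy: with sm_on, use stage-1 when flts is advertised,
 * otherwise fall back to stage-2; with sm_off, refuse legacy mode
 * when the HW has scalable mode but does not advertise slts.
 */
static enum vtd_mode vtd_pick_mode(struct vtd_caps caps, bool want_sm)
{
    if (want_sm && caps.smts) {
        if (caps.flts)
            return VTD_MODE_SCALABLE_S1;
        if (caps.slts)
            return VTD_MODE_SCALABLE_S2;
        return VTD_MODE_NONE;
    }

    /* Legacy mode requested, or scalable mode is not available. */
    if (caps.smts && !caps.slts)
        return VTD_MODE_NONE;

    return VTD_MODE_LEGACY;
}
```

With a vIOMMU that advertises smts and flts but not slts, this sketch
picks stage-1 for sm_on and refuses sm_off, which is the behaviour that
would let the VMM avoid shadowing.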

But this is not necessary so far. As discussed in this thread, we intend
to reuse the GPA HWPT allocated by the VFIO container as well [1]. This
is now aligned with Nic and Shameer.

[1] https://lore.kernel.org/qemu-devel/b3d31287-4de5-4e0e-a81b-99f82edd5bcc@intel.com/

>> The ARM model that Shameer is proposing only allows a nested SMMU
>> when such a legacy mode is off. This simplifies a lot of things.
>> But the difficulty of the VT-d model is that it has to rely on a
>> guest bootcmd during runtime..
>
> ARM is cleaner because it doesn't have these drivers issues. qemu can
> reliably say not to use the S2 and all the existing guest kernels will
> obey that.

Out of curiosity, does SMMU have a legacy mode, or does a given version
of SMMU support only either a legacy mode or a newer mode?

> AMD has the same issues, BTW, arguably even worse as I didn't notice
> any way to specify if the v1 page table is supported :\
>
> Jason
--
Regards,
Yi Liu