qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: alex@shazbot.org, clg@redhat.com, eric.auger@redhat.com,
	mst@redhat.com, jasowang@redhat.com, peterx@redhat.com,
	ddutile@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
	skolothumtho@nvidia.com, joao.m.martins@oracle.com,
	clement.mathieu--drif@eviden.com, kevin.tian@intel.com,
	yi.l.liu@intel.com, chao.p.peng@intel.com,
	Zhenzhong Duan <zhenzhong.duan@intel.com>
Subject: [PATCH v8 21/23] Workaround for ERRATA_772415_SPR17
Date: Mon, 17 Nov 2025 04:37:24 -0500	[thread overview]
Message-ID: <20251117093729.1121324-22-zhenzhong.duan@intel.com> (raw)
In-Reply-To: <20251117093729.1121324-1-zhenzhong.duan@intel.com>

On a system influenced by ERRATA_772415, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17
is repored by IOMMU_DEVICE_GET_HW_INFO. Due to this errata, even the readonly
range mapped on second stage page table could still be written.

Reference from 4th Gen Intel Xeon Processor Scalable Family Specification
Update, Errata Details, SPR17.
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/eagle-stream/sapphire-rapids-specification-update/

Also copied the SPR17 details from above link:
"Problem: When remapping hardware is configured by system software in
scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the
PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended
Access bit if enabled) in first-stage page-table entries even when
second-stage mappings indicate that corresponding first-stage page-table
is Read-Only.

Implication: Due to this erratum, pages mapped as Read-only in second-stage
page-tables may be modified by remapping hardware Access/Dirty bit updates.

Workaround: None identified. System software enabling nested translations
for a VM should ensure that there are no read-only pages in the
corresponding second-stage mappings."

Introduce a helper vfio_device_get_host_iommu_quirk_bypass_ro to check if
readonly mappings should be bypassed.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-container.h |  1 +
 include/hw/vfio/vfio-device.h    |  3 +++
 hw/vfio/device.c                 | 14 ++++++++++++++
 hw/vfio/iommufd.c                |  9 ++++++++-
 hw/vfio/listener.c               |  6 ++++--
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/hw/vfio/vfio-container.h b/include/hw/vfio/vfio-container.h
index 9f6e8cedfc..a7d5c5ed67 100644
--- a/include/hw/vfio/vfio-container.h
+++ b/include/hw/vfio/vfio-container.h
@@ -52,6 +52,7 @@ struct VFIOContainer {
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
     NotifierWithReturn cpr_reboot_notifier;
+    bool bypass_ro;
 };
 
 #define TYPE_VFIO_IOMMU "vfio-iommu"
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 48d00c7bc4..f6f3d0e378 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -268,6 +268,9 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainer *bcontainer,
 void vfio_device_unprepare(VFIODevice *vbasedev);
 
 bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev);
+bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
+                                                uint32_t type, void *caps,
+                                                uint32_t size);
 
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
                                 struct vfio_region_info **info);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 71eb069eb6..290011e154 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -533,6 +533,20 @@ bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev)
     return false;
 }
 
+bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
+                                                uint32_t type, void *caps,
+                                                uint32_t size)
+{
+    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
+
+    if (vdev) {
+        return !!(pci_device_get_host_iommu_quirks(PCI_DEVICE(vdev), type,
+                                                   caps, size) &
+                  HOST_IOMMU_QUIRK_NESTING_PARENT_BYPASS_RO);
+    }
+    return false;
+}
+
 /*
  * Traditional ioctl() based io
  */
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 63f8442865..2a7b0d0c07 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -351,6 +351,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
     VFIOContainer *bcontainer = VFIO_IOMMU(container);
     uint32_t type, flags = 0;
     uint64_t hw_caps;
+    VendorCaps caps;
     VFIOIOASHwpt *hwpt;
     uint32_t hwpt_id;
     int ret;
@@ -396,7 +397,8 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
      * instead.
      */
     if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
-                                         &type, NULL, 0, &hw_caps, errp)) {
+                                         &type, &caps, sizeof(caps), &hw_caps,
+                                         errp)) {
         return false;
     }
 
@@ -411,6 +413,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
      */
     if (vfio_device_get_viommu_flags_want_nesting(vbasedev)) {
         flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
+
+        if (vfio_device_get_host_iommu_quirk_bypass_ro(vbasedev, type,
+                                                       &caps, sizeof(caps))) {
+            bcontainer->bypass_ro = true;
+        }
     }
 
     if (cpr_is_incoming()) {
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index ca2377d860..090f935d30 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -502,7 +502,8 @@ void vfio_container_region_add(VFIOContainer *bcontainer,
     int ret;
     Error *err = NULL;
 
-    if (!vfio_listener_valid_section(section, false, "region_add")) {
+    if (!vfio_listener_valid_section(section, bcontainer->bypass_ro,
+                                     "region_add")) {
         return;
     }
 
@@ -668,7 +669,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
     int ret;
     bool try_unmap = true;
 
-    if (!vfio_listener_valid_section(section, false, "region_del")) {
+    if (!vfio_listener_valid_section(section, bcontainer->bypass_ro,
+                                     "region_del")) {
         return;
     }
 
-- 
2.47.1



  parent reply	other threads:[~2025-11-17  9:41 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-17  9:37 [PATCH v8 00/23] intel_iommu: Enable first stage translation for passthrough device Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 01/23] intel_iommu: Rename vtd_ce_get_rid2pasid_entry to vtd_ce_get_pasid_entry Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 02/23] intel_iommu: Delete RPS capability related supporting code Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 03/23] intel_iommu: Update terminology to match VTD spec Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 04/23] hw/pci: Export pci_device_get_iommu_bus_devfn() and return bool Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 05/23] hw/pci: Introduce pci_device_get_viommu_flags() Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 06/23] intel_iommu: Implement get_viommu_flags() callback Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 07/23] intel_iommu: Introduce a new structure VTDHostIOMMUDevice Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 08/23] vfio/iommufd: Force creating nesting parent HWPT Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 09/23] intel_iommu_accel: Check for compatibility with IOMMUFD backed device when x-flts=on Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 10/23] intel_iommu_accel: Fail passthrough device under PCI bridge if x-flts=on Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 11/23] intel_iommu_accel: Stick to system MR for IOMMUFD backed host device when x-flts=on Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 12/23] intel_iommu: Add some macros and inline functions Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 13/23] intel_iommu_accel: Bind/unbind guest page table to host Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 14/23] intel_iommu_accel: Propagate PASID-based iotlb invalidation " Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 15/23] intel_iommu: Replay all pasid bindings when either SRTP or TE bit is changed Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 16/23] intel_iommu: Replay pasid bindings after context cache invalidation Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 17/23] vfio/listener: Bypass readonly region for dirty tracking Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 18/23] intel_iommu: Add migration support with x-flts=on Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 19/23] hw/pci: Introduce pci_device_get_host_iommu_quirks() Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 20/23] intel_iommu_accel: Implement get_host_iommu_quirks() callback Zhenzhong Duan
2025-11-17  9:37 ` Zhenzhong Duan [this message]
2025-11-17  9:37 ` [PATCH v8 22/23] intel_iommu: Enable host device when x-flts=on in scalable mode Zhenzhong Duan
2025-11-17  9:37 ` [PATCH v8 23/23] docs/devel: Add IOMMUFD nesting documentation Zhenzhong Duan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251117093729.1121324-22-zhenzhong.duan@intel.com \
    --to=zhenzhong.duan@intel.com \
    --cc=alex@shazbot.org \
    --cc=chao.p.peng@intel.com \
    --cc=clement.mathieu--drif@eviden.com \
    --cc=clg@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=nicolinc@nvidia.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=skolothumtho@nvidia.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).