* [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails
@ 2025-01-30 13:43 Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
` (8 more replies)
0 siblings, 9 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
Hello,
Under certain circumstances, a MMIO region of a device fails to map
because the region is outside the supported IOVA ranges of the VM. In
this case, PCI peer-to-peer transactions on BARs are not supported.
This typically occurs when the IOMMU address space width is less than
the physical address width, as can be the case on some consumer
processors or when using a vIOMMU device with default settings.
This series tries to clarify the error message reported to the user.
Thanks,
C.
Changes in v2:
- Removed advices on how to resolve the issue. Diagnostic is enough.
- Introduced helpers
- Checked device type, since this only applies to PCI
- Added cleanup
Cédric Le Goater (9):
util/error: Introduce warn_report_once_err()
vfio/pci: Replace "iommu_device" by "vIOMMU"
vfio: Rephrase comment in vfio_listener_region_add() error path
vfio: Introduce vfio_get_vfio_device()
vfio: Improve error reporting when MMIO region mapping fails
vfio: Remove reports of DMA mapping errors in backends
cpu: Introduce cpu_get_phys_bits()
vfio: Check compatibility of CPU and IOMMU address space width
vfio: Remove superfluous error report in vfio_listener_region_add()
include/hw/core/cpu.h | 9 +++++
include/hw/core/sysemu-cpu-ops.h | 6 ++++
include/hw/vfio/vfio-common.h | 1 +
include/qapi/error.h | 5 +++
backends/iommufd.c | 3 --
cpu-target.c | 5 +++
hw/core/cpu-system.c | 11 ++++++
hw/vfio/common.c | 61 ++++++++++++++++++++++++++------
hw/vfio/container.c | 2 --
hw/vfio/helpers.c | 10 ++++++
hw/vfio/pci.c | 2 +-
target/i386/cpu.c | 6 ++++
util/error.c | 9 +++++
13 files changed, 113 insertions(+), 17 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-01-30 14:25 ` Markus Armbruster
2025-01-30 17:55 ` Alex Williamson
2025-01-30 13:43 ` [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU" Cédric Le Goater
` (7 subsequent siblings)
8 siblings, 2 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater, Markus Armbruster
Depending on the configuration, a passthrough device may produce
recurring DMA mapping errors at runtime and produce a lot of
output. It is useful to report only once.
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
include/qapi/error.h | 5 +++++
util/error.c | 9 +++++++++
2 files changed, 14 insertions(+)
diff --git a/include/qapi/error.h b/include/qapi/error.h
index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
*/
void warn_report_err(Error *err);
+/*
+ * Convenience function to call warn_report_err() once.
+ */
+void warn_report_once_err(Error *err);
+
/*
* Convenience function to error_report() and free @err.
* The report includes hints added with error_append_hint().
diff --git a/util/error.c b/util/error.c
index e5e247209a9e0796074a9794f5598325f22f8d35..b8a211cccaa609a93091b86316144a0ad0a02662 100644
--- a/util/error.c
+++ b/util/error.c
@@ -247,6 +247,15 @@ void warn_report_err(Error *err)
error_free(err);
}
+void warn_report_once_err(Error *err)
+{
+ static bool print_once_;
+ if (!print_once_) {
+ warn_report_err(err);
+ print_once_ = true;
+ }
+}
+
void error_reportf_err(Error *err, const char *fmt, ...)
{
va_list ap;
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU"
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-02-10 14:28 ` Philippe Mathieu-Daudé
2025-01-30 13:43 ` [PATCH v2 3/9] vfio: Rephrase comment in vfio_listener_region_add() error path Cédric Le Goater
` (6 subsequent siblings)
8 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
This is to be consistent with other reported errors related to vIOMMU
devices.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1b90c78c5a1927b1553c59de7a470544bc07788a..90570e9e4ee571d3ec5a7f7b302b1e7e7f0c9a33 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3116,7 +3116,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
if (!vbasedev->mdev &&
!pci_device_set_iommu_device(pdev, vbasedev->hiod, errp)) {
- error_prepend(errp, "Failed to set iommu_device: ");
+ error_prepend(errp, "Failed to set vIOMMU: ");
goto out_teardown;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 3/9] vfio: Rephrase comment in vfio_listener_region_add() error path
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU" Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device() Cédric Le Goater
` (5 subsequent siblings)
8 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
Rephrase a bit the ending comment about how errors are handled
depending on the phase in which vfio_listener_region_add() is called.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/common.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7499a9b7447a7593198e1523c50858b70a8bd85..62af1216fc5a9089fc718c2afe3a405d9381db32 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -683,12 +683,13 @@ fail:
error_reportf_err(err, "PCI p2p may not work: ");
return;
}
- /*
- * On the initfn path, store the first error in the container so we
- * can gracefully fail. Runtime, there's not much we can do other
- * than throw a hardware error.
- */
+
if (!bcontainer->initialized) {
+ /*
+ * At machine init time or when the device is attached to the
+ * VM, store the first error in the container so we can
+ * gracefully fail the device realize routine.
+ */
if (!bcontainer->error) {
error_propagate_prepend(&bcontainer->error, err,
"Region %s: ",
@@ -697,6 +698,10 @@ fail:
error_free(err);
}
} else {
+ /*
+ * At runtime, there's not much we can do other than throw a
+ * hardware error.
+ */
error_report_err(err);
hw_error("vfio: DMA mapping failed, unable to continue");
}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device()
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (2 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 3/9] vfio: Rephrase comment in vfio_listener_region_add() error path Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-02-10 14:32 ` Philippe Mathieu-Daudé
2025-01-30 13:43 ` [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (4 subsequent siblings)
8 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
This helper will be useful in the listener handlers to extract the
VFIO device from a memory region using memory_region_owner(). At the
moment, we only care for PCI passthrough devices. If the need arises,
we will add more.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
include/hw/vfio/vfio-common.h | 1 +
hw/vfio/helpers.c | 10 ++++++++++
2 files changed, 11 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 0c60be5b15c70168f4f94ad7054d9bd750a162d3..ac35136a11051b079cd9d04e6becd344a0e0f7e7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -252,6 +252,7 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
bool vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void vfio_detach_device(VFIODevice *vbasedev);
+VFIODevice *vfio_get_vfio_device(Object *obj);
int vfio_kvm_device_add_fd(int fd, Error **errp);
int vfio_kvm_device_del_fd(int fd, Error **errp);
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 913796f437f84eece8711cb4b4b654a44040d17c..4b255d4f3a9e81f55df00c68fc71da769fd5bd04 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -23,6 +23,7 @@
#include <sys/ioctl.h>
#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/pci.h"
#include "hw/hw.h"
#include "trace.h"
#include "qapi/error.h"
@@ -728,3 +729,12 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
}
+
+VFIODevice *vfio_get_vfio_device(Object *obj)
+{
+ if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
+ return &VFIO_PCI(obj)->vbasedev;
+ } else {
+ return NULL;
+ }
+}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (3 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device() Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-02-10 14:36 ` Philippe Mathieu-Daudé
2025-01-30 13:43 ` [PATCH v2 6/9] vfio: Remove reports of DMA mapping errors in backends Cédric Le Goater
` (3 subsequent siblings)
8 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
When the IOMMU address space width is smaller than the physical
address width, a MMIO region of a device can fail to map because the
region is outside the supported IOVA ranges of the VM. In this case,
PCI peer-to-peer transactions on BARs are not supported.
This can occur with the 39-bit IOMMU address space width, as can be
the case on some consumer processors or when using a vIOMMU device
with default settings. The current error message is unclear, also
change the error report to a warning because it is a non fatal
condition for the VM.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/common.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 62af1216fc5a9089fc718c2afe3a405d9381db32..5c9d8657d746ce30af5ae8f9122101e086a61ef5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -555,6 +555,18 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
return true;
}
+static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
+{
+ /*
+ * MMIO region mapping failures are not fatal but in this case PCI
+ * peer-to-peer transactions are broken.
+ */
+ if (vbasedev && vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
+ error_append_hint(errp, "%s: PCI peer-to-peer transactions "
+ "on BARs are not supported.\n", vbasedev->name);
+ }
+}
+
static void vfio_listener_region_add(MemoryListener *listener,
MemoryRegionSection *section)
{
@@ -670,7 +682,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
strerror(-ret));
if (memory_region_is_ram_device(section->mr)) {
/* Allow unexpected mappings not to be fatal for RAM devices */
- error_report_err(err);
+ VFIODevice *vbasedev =
+ vfio_get_vfio_device(memory_region_owner(section->mr));
+ vfio_device_error_append(vbasedev, &err);
+ warn_report_once_err(err);
return;
}
goto fail;
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 6/9] vfio: Remove reports of DMA mapping errors in backends
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (4 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits() Cédric Le Goater
` (2 subsequent siblings)
8 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
Currently, the IOMMU backend, VFIO IOMMU Type 1 aka. legacy and
IOMMUFD, mapping handlers return an errno and also report an error.
This can produce a lot of output at runtime for recurring DMA mapping
errors and is redundant with the errors reported by the callers in
vfio_container_dma_un/map() routines.
Simply remove to let the callers handle the error. The mapping handler
of the IOMMUFD backend has a comment suggesting MMIO region mapping
failures return EFAULT. I am not sure this is entirely true, so keep
the EFAULT case until the conditions are clarified.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
backends/iommufd.c | 3 ---
hw/vfio/container.c | 2 --
2 files changed, 5 deletions(-)
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 7b4fc8ec460ef635b9ed5ac7b201f124476b512a..d57da44755be3d7fdba74f7dbecfe6d1c89921ba 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -167,8 +167,6 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
/* TODO: Not support mapping hardware PCI BAR region for now. */
if (errno == EFAULT) {
warn_report("IOMMU_IOAS_MAP failed: %m, PCI BAR?");
- } else {
- error_report("IOMMU_IOAS_MAP failed: %m");
}
}
return ret;
@@ -203,7 +201,6 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
if (ret) {
ret = -errno;
- error_report("IOMMU_IOAS_UNMAP failed: %m");
}
return ret;
}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 4ebb5268088d0a2006e0ed04afec0ee746ed2c1d..7c57bdd27b72731db5cf4f9446d954e143b4747e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -159,7 +159,6 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
unmap.size -= 1ULL << ctz64(bcontainer->pgsizes);
continue;
}
- error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
return -errno;
}
@@ -204,7 +203,6 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
return 0;
}
- error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
return -errno;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits()
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (5 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 6/9] vfio: Remove reports of DMA mapping errors in backends Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-02-10 14:40 ` Philippe Mathieu-Daudé
2025-03-06 10:37 ` Philippe Mathieu-Daudé
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 9/9] vfio: Remove superfluous error report in vfio_listener_region_add() Cédric Le Goater
8 siblings, 2 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Richard Henderson,
Paolo Bonzini, Eduardo Habkost, Marcel Apfelbaum,
Philippe Mathieu-Daudé, Yanan Wang, Zhao Liu
The Intel CPU has a complex history regarding setting of the physical
address space width on KVM. A 'phys_bits' field and a "phys-bits"
property were added by commit af45907a1328 ("target-i386: Allow
physical address bits to be set") to tune this value.
In certain circumstances, it is interesting to know this value to
check that all the conditions are met for optimal operation. For
instance, when the system has a 39-bit IOMMU address space width and a
larger CPU physical address space, we expect issues when mapping the
MMIO regions of passthrough devices and it would good to report to the
user. These hybrid HW configs can be found on some consumer grade
processors or when using a vIOMMU device with default settings.
For this purpose, add an helper routine and a CPUClass callback to
return the physical address space width of a CPU.
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: Yanan Wang <wangyanan55@huawei.com>
Cc: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
include/hw/core/cpu.h | 9 +++++++++
include/hw/core/sysemu-cpu-ops.h | 6 ++++++
cpu-target.c | 5 +++++
hw/core/cpu-system.c | 11 +++++++++++
target/i386/cpu.c | 6 ++++++
5 files changed, 37 insertions(+)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fb397cdfc53d12d40d3e4e7f86251fc31c48b9f6..1b3eead102ce62fcee55ab0ed5e0dff327fa2fc5 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -748,6 +748,14 @@ int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs);
*/
bool cpu_virtio_is_big_endian(CPUState *cpu);
+/**
+ * cpu_get_phys_bits:
+ * @cpu: CPU
+ *
+ * Return the physical address space width of the CPU @cpu.
+ */
+uint32_t cpu_get_phys_bits(const CPUState *cpu);
+
#endif /* CONFIG_USER_ONLY */
/**
@@ -1168,6 +1176,7 @@ void cpu_exec_unrealizefn(CPUState *cpu);
void cpu_exec_reset_hold(CPUState *cpu);
const char *target_name(void);
+uint32_t target_phys_bits(void);
#ifdef COMPILING_PER_TARGET
diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index 0df5b058f50073e47d2a6b8286be5204776520d2..210b3ed57985525795b81559e41e0085969210d5 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -81,6 +81,12 @@ typedef struct SysemuCPUOps {
*/
bool (*virtio_is_big_endian)(CPUState *cpu);
+ /**
+ * @get_phys_bits: Callback to return the physical address space
+ * width of a CPU.
+ */
+ uint32_t (*get_phys_bits)(const CPUState *cpu);
+
/**
* @legacy_vmsd: Legacy state for migration.
* Do not use in new targets, use #DeviceClass::vmsd instead.
diff --git a/cpu-target.c b/cpu-target.c
index 667688332c929aa53782c94343def34571272d5f..88158272c06cc42424d435b9701e33735f080239 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -472,3 +472,8 @@ const char *target_name(void)
{
return TARGET_NAME;
}
+
+uint32_t target_phys_bits(void)
+{
+ return TARGET_PHYS_ADDR_SPACE_BITS;
+}
diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
index 6aae28a349a7a377d010ff9dcab5ebc29e1126ca..05067d84f4258facf4252216f17729e390d38eae 100644
--- a/hw/core/cpu-system.c
+++ b/hw/core/cpu-system.c
@@ -60,6 +60,17 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
}
+uint32_t cpu_get_phys_bits(const CPUState *cpu)
+{
+ CPUClass *cc = CPU_GET_CLASS(cpu);
+
+ if (cc->sysemu_ops->get_phys_bits) {
+ return cc->sysemu_ops->get_phys_bits(cpu);
+ }
+
+ return target_phys_bits();
+}
+
hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
{
MemTxAttrs attrs = {};
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b5dd60d2812e0c3d13c1743fd485a9068ab29c4f..01cf9a44963710a415c755c17582730f75233143 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8393,6 +8393,11 @@ static bool x86_cpu_get_paging_enabled(const CPUState *cs)
return cpu->env.cr[0] & CR0_PG_MASK;
}
+
+static uint32_t x86_cpu_get_phys_bits(const CPUState *cs)
+{
+ return X86_CPU(cs)->phys_bits;
+}
#endif /* !CONFIG_USER_ONLY */
static void x86_cpu_set_pc(CPUState *cs, vaddr value)
@@ -8701,6 +8706,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
.get_memory_mapping = x86_cpu_get_memory_mapping,
.get_paging_enabled = x86_cpu_get_paging_enabled,
.get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
+ .get_phys_bits = x86_cpu_get_phys_bits,
.asidx_from_attrs = x86_asidx_from_attrs,
.get_crash_info = x86_cpu_get_crash_info,
.write_elf32_note = x86_cpu_write_elf32_note,
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (6 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits() Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
2025-01-30 18:58 ` Alex Williamson
` (2 more replies)
2025-01-30 13:43 ` [PATCH v2 9/9] vfio: Remove superfluous error report in vfio_listener_region_add() Cédric Le Goater
8 siblings, 3 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Williamson, Cédric Le Goater
Print a warning if IOMMU address space width is smaller than the
physical address width. In this case, PCI peer-to-peer transactions on
BARs are not supported and failures of device MMIO regions are to be
expected.
This can occur with the 39-bit IOMMU address space width as found on
consumer grade processors or when using a vIOMMU device with default
settings.
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/common.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,6 +44,8 @@
#include "migration/qemu-file.h"
#include "system/tpm.h"
+#include "hw/core/cpu.h"
+
VFIODeviceList vfio_device_list =
QLIST_HEAD_INITIALIZER(vfio_device_list);
static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
@@ -1546,12 +1548,28 @@ retry:
return info;
}
+static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
+{
+ uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
+ uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
+
+ if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
+ error_setg(errp, "Host physical address space (%u) is larger than "
+ "the host IOMMU address space (%u).", cpu_aw_bits,
+ iommu_aw_bits);
+ vfio_device_error_append(vbasedev, errp);
+ return false;
+ }
+ return true;
+}
+
bool vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp)
{
const VFIOIOMMUClass *ops =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
HostIOMMUDevice *hiod = NULL;
+ Error *local_err = NULL;
if (vbasedev->iommufd) {
ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
@@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
return false;
}
+ if (!vfio_device_check_address_space(vbasedev, &local_err)) {
+ warn_report_err(local_err);
+ }
return true;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 9/9] vfio: Remove superfluous error report in vfio_listener_region_add()
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
` (7 preceding siblings ...)
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
@ 2025-01-30 13:43 ` Cédric Le Goater
8 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 13:43 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Williamson, Cédric Le Goater, Harsh Prateek Bora,
Shivaprasad G Bhat
For pseries machines, commit 567b5b309abe ("vfio/pci: Relax DMA map
errors for MMIO regions") introduced 2 error reports to notify the
user of MMIO mapping errors. Consolidate both code paths under one.
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Cc: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/common.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e5ef93c589b2bed68f790608868ec3c7779d4cb8..f87214d6439ace74e3a8b2e81207d5a266d5238f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -596,8 +596,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
return;
}
+ /* PPC64/pseries machine only */
if (!vfio_container_add_section_window(bcontainer, section, &err)) {
- goto fail;
+ goto mmio_dma_error;
}
memory_region_ref(section->mr);
@@ -682,6 +683,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
"0x%"HWADDR_PRIx", %p) = %d (%s)",
bcontainer, iova, int128_get64(llsize), vaddr, ret,
strerror(-ret));
+ mmio_dma_error:
if (memory_region_is_ram_device(section->mr)) {
/* Allow unexpected mappings not to be fatal for RAM devices */
VFIODevice *vbasedev =
@@ -696,11 +698,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
return;
fail:
- if (memory_region_is_ram_device(section->mr)) {
- error_reportf_err(err, "PCI p2p may not work: ");
- return;
- }
-
if (!bcontainer->initialized) {
/*
* At machine init time or when the device is attached to the
@@ -808,6 +805,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
memory_region_unref(section->mr);
+ /* PPC64/pseries machine only */
vfio_container_del_section_window(bcontainer, section);
}
--
2.48.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
@ 2025-01-30 14:25 ` Markus Armbruster
2025-01-30 16:03 ` Cédric Le Goater
2025-01-30 17:55 ` Alex Williamson
1 sibling, 1 reply; 35+ messages in thread
From: Markus Armbruster @ 2025-01-30 14:25 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: qemu-devel, Alex Williamson, Markus Armbruster
Cédric Le Goater <clg@redhat.com> writes:
> Depending on the configuration, a passthrough device may produce
> recurring DMA mapping errors at runtime and produce a lot of
> output. It is useful to report only once.
>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> include/qapi/error.h | 5 +++++
> util/error.c | 9 +++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
> */
> void warn_report_err(Error *err);
>
> +/*
> + * Convenience function to call warn_report_err() once.
> + */
> +void warn_report_once_err(Error *err);
> +
> /*
> * Convenience function to error_report() and free @err.
> * The report includes hints added with error_append_hint().
> diff --git a/util/error.c b/util/error.c
> index e5e247209a9e0796074a9794f5598325f22f8d35..b8a211cccaa609a93091b86316144a0ad0a02662 100644
> --- a/util/error.c
> +++ b/util/error.c
> @@ -247,6 +247,15 @@ void warn_report_err(Error *err)
> error_free(err);
> }
>
> +void warn_report_once_err(Error *err)
> +{
> + static bool print_once_;
> + if (!print_once_) {
> + warn_report_err(err);
> + print_once_ = true;
> + }
> +}
Any particular reason for the trailing _ in @print_once_?
The first warning suppresses all subsequent warnings, even if they're
completely different. PATCH 5 uses this to emit a (rather cryptic)
warning about unexpected mappings once. If we later add another use
elsewhere, these uses will suppress each other. Is this what we want?
The related function warn_report_once_cond() takes the flag as a
parameter. Only the calls using the same flag suppress each other.
> +
> void error_reportf_err(Error *err, const char *fmt, ...)
> {
> va_list ap;
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 14:25 ` Markus Armbruster
@ 2025-01-30 16:03 ` Cédric Le Goater
2025-01-30 16:28 ` Cédric Le Goater
0 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 16:03 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Alex Williamson
On 1/30/25 15:25, Markus Armbruster wrote:
> Cédric Le Goater <clg@redhat.com> writes:
>
>> Depending on the configuration, a passthrough device may produce
>> recurring DMA mapping errors at runtime and produce a lot of
>> output. It is useful to report only once.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> include/qapi/error.h | 5 +++++
>> util/error.c | 9 +++++++++
>> 2 files changed, 14 insertions(+)
>>
>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>> index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
>> --- a/include/qapi/error.h
>> +++ b/include/qapi/error.h
>> @@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
>> */
>> void warn_report_err(Error *err);
>>
>> +/*
>> + * Convenience function to call warn_report_err() once.
>> + */
>> +void warn_report_once_err(Error *err);
>> +
>> /*
>> * Convenience function to error_report() and free @err.
>> * The report includes hints added with error_append_hint().
>> diff --git a/util/error.c b/util/error.c
>> index e5e247209a9e0796074a9794f5598325f22f8d35..b8a211cccaa609a93091b86316144a0ad0a02662 100644
>> --- a/util/error.c
>> +++ b/util/error.c
>> @@ -247,6 +247,15 @@ void warn_report_err(Error *err)
>> error_free(err);
>> }
>>
>> +void warn_report_once_err(Error *err)
>> +{
>> + static bool print_once_;
>> + if (!print_once_) {
>> + warn_report_err(err);
>> + print_once_ = true;
>> + }
>> +}
>
> Any particular reason for the trailing _ in @print_once_?>
> The first warning suppresses all subsequent warnings, even if they're
> completely different. PATCH 5 uses this to emit a (rather cryptic)
> warning about unexpected mappings once. If we later add another use
> elsewhere, these uses will suppress each other. Is this what we want?
This is copy paste of a static routine that served one purpose in vfio.
I wanted to make it common and didn't think enough about the implication.
Sorry about that.
> The related function warn_report_once_cond() takes the flag as a
> parameter. Only the calls using the same flag suppress each other.
yep. I wonder if we could use warn_report_once_cond() in a similar
macro.
Thanks,
C.
C.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 16:03 ` Cédric Le Goater
@ 2025-01-30 16:28 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 16:28 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Alex Williamson
>> The related function warn_report_once_cond() takes the flag as a
>> parameter. Only the calls using the same flag suppress each other.
>
> yep. I wonder if we could use warn_report_once_cond() in a similar
> macro.
But vfio uses ->hint too which adds complexity. I will keep the
custom version I have in vfio for now.
Thanks,
C.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
2025-01-30 14:25 ` Markus Armbruster
@ 2025-01-30 17:55 ` Alex Williamson
2025-01-30 21:26 ` Cédric Le Goater
1 sibling, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2025-01-30 17:55 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: qemu-devel, Markus Armbruster
On Thu, 30 Jan 2025 14:43:38 +0100
Cédric Le Goater <clg@redhat.com> wrote:
> Depending on the configuration, a passthrough device may produce
> recurring DMA mapping errors at runtime and produce a lot of
> output. It is useful to report only once.
>
> Cc: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> include/qapi/error.h | 5 +++++
> util/error.c | 9 +++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
> */
> void warn_report_err(Error *err);
>
> +/*
> + * Convenience function to call warn_report_err() once.
> + */
> +void warn_report_once_err(Error *err);
> +
Turning it into a macro would do what you want:
#define warn_report_once_err(err) ({ \
static bool print_once_; \
if (!print_once_) { \
warn_report_err(err); \
print_once_ = true; \
} \
})
So long as we only want once per call site and not once per object,
which would pull in something like warn_report_once_cond(). Thanks,
Alex
> /*
> * Convenience function to error_report() and free @err.
> * The report includes hints added with error_append_hint().
> diff --git a/util/error.c b/util/error.c
> index e5e247209a9e0796074a9794f5598325f22f8d35..b8a211cccaa609a93091b86316144a0ad0a02662 100644
> --- a/util/error.c
> +++ b/util/error.c
> @@ -247,6 +247,15 @@ void warn_report_err(Error *err)
> error_free(err);
> }
>
> +void warn_report_once_err(Error *err)
> +{
> + static bool print_once_;
> + if (!print_once_) {
> + warn_report_err(err);
> + print_once_ = true;
> + }
> +}
> +
> void error_reportf_err(Error *err, const char *fmt, ...)
> {
> va_list ap;
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
@ 2025-01-30 18:58 ` Alex Williamson
2025-01-31 12:42 ` Cédric Le Goater
2025-03-06 10:33 ` Philippe Mathieu-Daudé
2025-09-05 13:04 ` Daniel Kral
2 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2025-01-30 18:58 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: qemu-devel
On Thu, 30 Jan 2025 14:43:45 +0100
Cédric Le Goater <clg@redhat.com> wrote:
> Print a warning if IOMMU address space width is smaller than the
> physical address width. In this case, PCI peer-to-peer transactions on
> BARs are not supported and failures of device MMIO regions are to be
> expected.
>
> This can occur with the 39-bit IOMMU address space width as found on
> consumer grade processors or when using a vIOMMU device with default
> settings.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> hw/vfio/common.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,8 @@
> #include "migration/qemu-file.h"
> #include "system/tpm.h"
>
> +#include "hw/core/cpu.h"
> +
> VFIODeviceList vfio_device_list =
> QLIST_HEAD_INITIALIZER(vfio_device_list);
> static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
> @@ -1546,12 +1548,28 @@ retry:
> return info;
> }
>
> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> +{
> + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> +
> + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
than the vCPU physical address width?
Maybe we're just stuck with an indirect solution currently. AIUI,
we're testing the vCPU physical address width because (obviously) all
devices and memory need to be addressable within that address space.
However, as we've explored offline, there are bare metal systems where
the CPU physical address width exceeds the IOMMU address width and this
is not a fundamental problem. It places restrictions on the maximum
RAM physical address and the location of the 64-bit MMIO space.
RAM tends not to be a problem on these bare metal systems since they
physically cannot hold enough RAM to exceed the IOMMU width and, for
the most part, we map RAM starting from the bottom of the address
space. So long as the VMM doesn't decide to map RAM at the top of the
physical address space, this doesn't become a problem.
However, we've decided to do exactly that for the 64-bit MMIO window.
It's not that the vCPU width being greater than the IOMMU width is a
fundamental problem, it's that we've chosen a 64-bit MMIO policy that
makes this become a problem and QEMU only has a convenient mechanism to
check the host IOMMU width when a vfio device is present. Arguably,
when a vIOMMU is present guest firmware should be enlightened to
understand the address width of that vIOMMU when placing the 64-bit
MMIO window. I'd say the failure to do so is a current firmware bug.
If the vIOMMU address width were honored, we could recommend users set
that to match the host, regardless of the vCPU physical address width.
We also already have a failure condition if the vIOMMU address width
exceeds the vfio-device (ie. indirectly the host) IOMMU width.
Of course without a vIOMMU, given our 64-bit MMIO policy, we don't have
a good solution without specifying the 64-bit window or implementing a
more conservative placement.
Not sure how much of this is immediately solvable and some indication
to the user how they can resolve the issue, such as implemented here, is
better than none, but maybe we can elaborate in a comment that this is
really more of a workaround for the current behavior of firmware
relative to the 64-bit MMIO placement policy. Thanks,
Alex
> + error_setg(errp, "Host physical address space (%u) is larger than "
> + "the host IOMMU address space (%u).", cpu_aw_bits,
> + iommu_aw_bits);
> + vfio_device_error_append(vbasedev, errp);
> + return false;
> + }
> + return true;
> +}
> +
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> const VFIOIOMMUClass *ops =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> HostIOMMUDevice *hiod = NULL;
> + Error *local_err = NULL;
>
> if (vbasedev->iommufd) {
> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> return false;
> }
>
> + if (!vfio_device_check_address_space(vbasedev, &local_err)) {
> + warn_report_err(local_err);
> + }
> return true;
> }
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 17:55 ` Alex Williamson
@ 2025-01-30 21:26 ` Cédric Le Goater
2025-01-31 8:30 ` Markus Armbruster
0 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-30 21:26 UTC (permalink / raw)
To: Alex Williamson; +Cc: qemu-devel, Markus Armbruster
On 1/30/25 18:55, Alex Williamson wrote:
> On Thu, 30 Jan 2025 14:43:38 +0100
> Cédric Le Goater <clg@redhat.com> wrote:
>
>> Depending on the configuration, a passthrough device may produce
>> recurring DMA mapping errors at runtime and produce a lot of
>> output. It is useful to report only once.
>>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> include/qapi/error.h | 5 +++++
>> util/error.c | 9 +++++++++
>> 2 files changed, 14 insertions(+)
>>
>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>> index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
>> --- a/include/qapi/error.h
>> +++ b/include/qapi/error.h
>> @@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
>> */
>> void warn_report_err(Error *err);
>>
>> +/*
>> + * Convenience function to call warn_report_err() once.
>> + */
>> +void warn_report_once_err(Error *err);
>> +
>
> Turning it into a macro would do what you want:
>
> #define warn_report_once_err(err) ({ \
> static bool print_once_; \
> if (!print_once_) { \
> warn_report_err(err); \
> print_once_ = true; \
> } \
> })
>
> So long as we only want once per call site and not once per object,
> which would pull in something like warn_report_once_cond(). Thanks,
yeah. I came up with this :
/*
* TODO: move to util/
*/
static bool warn_report_once_err_cond(bool *printed, Error *err)
{
if (*printed) {
error_free(err);
return false;
}
*printed = true;
warn_report_err(err);
return true;
}
#define warn_report_once_err(err) \
({ \
static bool print_once_; \
warn_report_once_err_cond(&print_once_, err); \
})
I don't know where to put it though. It sits in between qapi/error.h
and qemu/error-report.h.
Thanks,
C.
> Alex
>
>> /*
>> * Convenience function to error_report() and free @err.
>> * The report includes hints added with error_append_hint().
>> diff --git a/util/error.c b/util/error.c
>> index e5e247209a9e0796074a9794f5598325f22f8d35..b8a211cccaa609a93091b86316144a0ad0a02662 100644
>> --- a/util/error.c
>> +++ b/util/error.c
>> @@ -247,6 +247,15 @@ void warn_report_err(Error *err)
>> error_free(err);
>> }
>>
>> +void warn_report_once_err(Error *err)
>> +{
>> + static bool print_once_;
>> + if (!print_once_) {
>> + warn_report_err(err);
>> + print_once_ = true;
>> + }
>> +}
>> +
>> void error_reportf_err(Error *err, const char *fmt, ...)
>> {
>> va_list ap;
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/9] util/error: Introduce warn_report_once_err()
2025-01-30 21:26 ` Cédric Le Goater
@ 2025-01-31 8:30 ` Markus Armbruster
0 siblings, 0 replies; 35+ messages in thread
From: Markus Armbruster @ 2025-01-31 8:30 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: Alex Williamson, qemu-devel
Cédric Le Goater <clg@redhat.com> writes:
> On 1/30/25 18:55, Alex Williamson wrote:
>> On Thu, 30 Jan 2025 14:43:38 +0100
>> Cédric Le Goater <clg@redhat.com> wrote:
>>
>>> Depending on the configuration, a passthrough device may produce
>>> recurring DMA mapping errors at runtime and produce a lot of
>>> output. It is useful to report only once.
>>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>>> ---
>>> include/qapi/error.h | 5 +++++
>>> util/error.c | 9 +++++++++
>>> 2 files changed, 14 insertions(+)
>>>
>>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>>> index 71f8fb2c50eee9a544992d0c05263c9793956fe1..b6ea274882b9788b64d4bb213c3458d7c674a881 100644
>>> --- a/include/qapi/error.h
>>> +++ b/include/qapi/error.h
>>> @@ -448,6 +448,11 @@ void error_free_or_abort(Error **errp);
>>> */
>>> void warn_report_err(Error *err);
>>> +/*
>>> + * Convenience function to call warn_report_err() once.
>>> + */
>>> +void warn_report_once_err(Error *err);
>>> +
>> Turning it into a macro would do what you want:
>> #define warn_report_once_err(err) ({ \
>> static bool print_once_; \
>> if (!print_once_) { \
>> warn_report_err(err); \
>> print_once_ = true; \
>> } \
>> })
>> So long as we only want once per call site and not once per object,
>> which would pull in something like warn_report_once_cond(). Thanks,
>
> yeah. I came up with this :
>
> /*
> * TODO: move to util/
> */
> static bool warn_report_once_err_cond(bool *printed, Error *err)
> {
> if (*printed) {
> error_free(err);
> return false;
> }
> *printed = true;
> warn_report_err(err);
> return true;
> }
>
> #define warn_report_once_err(err) \
> ({ \
> static bool print_once_; \
> warn_report_once_err_cond(&print_once_, err); \
> })
>
>
> I don't know where to put it though. It sits in between qapi/error.h
> and qemu/error-report.h.
Stuff involving the Error type should not go into qemu/error-report.h.
Precedence: warn_report_err() & friends are in qapi/error.h even though
they're straightforward wrappers around warn_report() & friends for easy
use with Error.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-30 18:58 ` Alex Williamson
@ 2025-01-31 12:42 ` Cédric Le Goater
2025-01-31 13:23 ` Gerd Hoffmann
0 siblings, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-31 12:42 UTC (permalink / raw)
To: Alex Williamson; +Cc: qemu-devel, Gerd Hoffmann
+ Gerd for insights regarding vIOMMU support in edk2.
On 1/30/25 19:58, Alex Williamson wrote:
> On Thu, 30 Jan 2025 14:43:45 +0100
> Cédric Le Goater <clg@redhat.com> wrote:
>
>> Print a warning if IOMMU address space width is smaller than the
>> physical address width. In this case, PCI peer-to-peer transactions on
>> BARs are not supported and failures of device MMIO regions are to be
>> expected.
>>
>> This can occur with the 39-bit IOMMU address space width as found on
>> consumer grade processors or when using a vIOMMU device with default
>> settings.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> hw/vfio/common.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -44,6 +44,8 @@
>> #include "migration/qemu-file.h"
>> #include "system/tpm.h"
>>
>> +#include "hw/core/cpu.h"
>> +
>> VFIODeviceList vfio_device_list =
>> QLIST_HEAD_INITIALIZER(vfio_device_list);
>> static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
>> @@ -1546,12 +1548,28 @@ retry:
>> return info;
>> }
>>
>> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
>> +{
>> + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
>> + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
>> +
>> + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
>
> Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> than the vCPU physical address width?
>
> Maybe we're just stuck with an indirect solution currently. AIUI,
> we're testing the vCPU physical address width because (obviously) all
> devices and memory need to be addressable within that address space.
> However, as we've explored offline, there are bare metal systems where
> the CPU physical address width exceeds the IOMMU address width and this
> is not a fundamental problem. It places restrictions on the maximum
> RAM physical address and the location of the 64-bit MMIO space.
>
> RAM tends not to be a problem on these bare metal systems since they
> physically cannot hold enough RAM to exceed the IOMMU width and, for
> the most part, we map RAM starting from the bottom of the address
> space. So long as the VMM doesn't decide to map RAM at the top of the
> physical address space, this doesn't become a problem.
>
> However, we've decided to do exactly that for the 64-bit MMIO window.
> It's not that the vCPU width being greater than the IOMMU width is a
> fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> makes this become a problem and QEMU only has a convenient mechanism to
> check the host IOMMU width when a vfio device is present. Arguably,
> when a vIOMMU is present guest firmware should be enlightened to
> understand the address width of that vIOMMU when placing the 64-bit
> MMIO window. I'd say the failure to do so is a current firmware bug.
>
> If the vIOMMU address width were honored, we could recommend users set
> that to match the host, regardless of the vCPU physical address width.
> We also already have a failure condition if the vIOMMU address width
> exceeds the vfio-device (ie. indirectly the host) IOMMU width.
>
> Of course without a vIOMMU, given our 64-bit MMIO policy, we don't have
> a good solution without specifying the 64-bit window or implementing a
> more conservative placement.
>
> Not sure how much of this is immediately solvable and some indication
> to the user how they can resolve the issue, such as implemented here, is
> better than none, but maybe we can elaborate in a comment that this is
> really more of a workaround for the current behavior of firmware
> relative to the 64-bit MMIO placement policy. Thanks,
Sure. I will improve the commit log in v3.
Thanks,
C.
> Alex
>
>> + error_setg(errp, "Host physical address space (%u) is larger than "
>> + "the host IOMMU address space (%u).", cpu_aw_bits,
>> + iommu_aw_bits);
>> + vfio_device_error_append(vbasedev, errp);
>> + return false;
>> + }
>> + return true;
>> +}
>> +
>> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> const VFIOIOMMUClass *ops =
>> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
>> HostIOMMUDevice *hiod = NULL;
>> + Error *local_err = NULL;
>>
>> if (vbasedev->iommufd) {
>> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
>> @@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> return false;
>> }
>>
>> + if (!vfio_device_check_address_space(vbasedev, &local_err)) {
>> + warn_report_err(local_err);
>> + }
>> return true;
>> }
>>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-31 12:42 ` Cédric Le Goater
@ 2025-01-31 13:23 ` Gerd Hoffmann
2025-01-31 17:03 ` Cédric Le Goater
2025-01-31 22:18 ` Alex Williamson
0 siblings, 2 replies; 35+ messages in thread
From: Gerd Hoffmann @ 2025-01-31 13:23 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: Alex Williamson, qemu-devel
On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
> + Gerd for insights regarding vIOMMU support in edk2.
>
> > > +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> > > +{
> > > + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> > > + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> > > +
> > > + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
> >
> > Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> > than the vCPU physical address width?
Placing the 64-bit mmio window is done by the guest firmware,
so this isn't something qemu can check before boot.
> > However, we've decided to do exactly that for the 64-bit MMIO window.
> > It's not that the vCPU width being greater than the IOMMU width is a
> > fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> > makes this become a problem and QEMU only has a convenient mechanism to
> > check the host IOMMU width when a vfio device is present. Arguably,
> > when a vIOMMU is present guest firmware should be enlightened to
> > understand the address width of that vIOMMU when placing the 64-bit
> > MMIO window. I'd say the failure to do so is a current firmware bug.
edk2 has no iommu driver ...
Is there some simple way to figure what the iommu width is (inside the
guest)?
Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
hint for the guest what the usable address width is. It was added
because there are cases where the guest simply can't figure that it is
not possible to use the full physical address space of the cpu. There
are some non-obvious limitations around 5-level paging. Intel has some
CPUs with phys-bits > 48 but only 4-level EPT for example.
So one option to handle this is to make sure guest-phys-bits is not
larger than the iommu width.
take care,
Gerd
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-31 13:23 ` Gerd Hoffmann
@ 2025-01-31 17:03 ` Cédric Le Goater
2025-02-06 7:54 ` Gerd Hoffmann
2025-01-31 22:18 ` Alex Williamson
1 sibling, 1 reply; 35+ messages in thread
From: Cédric Le Goater @ 2025-01-31 17:03 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: Alex Williamson, qemu-devel
Hello Gerd,
On 1/31/25 14:23, Gerd Hoffmann wrote:
> On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
>> + Gerd for insights regarding vIOMMU support in edk2.
>>
>>>> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
>>>> +{
>>>> + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
>>>> + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
>>>> +
>>>> + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
>>>
>>> Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
>>> than the vCPU physical address width?
>
> Placing the 64-bit mmio window is done by the guest firmware,
> so this isn't something qemu can check before boot.
>
>>> However, we've decided to do exactly that for the 64-bit MMIO window.
>>> It's not that the vCPU width being greater than the IOMMU width is a
>>> fundamental problem, it's that we've chosen a 64-bit MMIO policy that
>>> makes this become a problem and QEMU only has a convenient mechanism to
>>> check the host IOMMU width when a vfio device is present. Arguably,
>>> when a vIOMMU is present guest firmware should be enlightened to
>>> understand the address width of that vIOMMU when placing the 64-bit
>>> MMIO window. I'd say the failure to do so is a current firmware bug.
>
> edk2 has no iommu driver ...
>
> Is there some simple way to figure what the iommu width is (inside the
> guest)?
When using VFIO, vfio_device_get_aw_bits() will return the IOMMU
address space width. There are checks when using a vIOMMU that
the host and vIOMMU address space are compatible. vIOMMU should
be smaller.
> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> hint for the guest what the usable address width is. It was added
> because there are cases where the guest simply can't figure that it is
> not possible to use the full physical address space of the cpu. There
> are some non-obvious limitations around 5-level paging. Intel has some
> CPUs with phys-bits > 48 but only 4-level EPT for example.
>
> So one option to handle this is to make sure guest-phys-bits is not
> larger than the iommu width.
Yes. This is what I am trying to do.
Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
property to use. If you think this is incorrect and not returning the
right information, I will change the proposal with guest-phys-bits.
Thanks,
C.
[1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-31 13:23 ` Gerd Hoffmann
2025-01-31 17:03 ` Cédric Le Goater
@ 2025-01-31 22:18 ` Alex Williamson
2025-02-06 8:22 ` Gerd Hoffmann
1 sibling, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2025-01-31 22:18 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: Cédric Le Goater, qemu-devel
On Fri, 31 Jan 2025 14:23:58 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:
> On Fri, Jan 31, 2025 at 01:42:31PM +0100, Cédric Le Goater wrote:
> > + Gerd for insights regarding vIOMMU support in edk2.
> >
> > > > +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> > > > +{
> > > > + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
> > > > + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> > > > +
> > > > + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
> > >
> > > Should we be testing the 64-bit MMIO window and maximum RAM GPA rather
> > > than the vCPU physical address width?
>
> Placing the 64-bit mmio window is done by the guest firmware,
> so this isn't something qemu can check before boot.
>
> > > However, we've decided to do exactly that for the 64-bit MMIO window.
> > > It's not that the vCPU width being greater than the IOMMU width is a
> > > fundamental problem, it's that we've chosen a 64-bit MMIO policy that
> > > makes this become a problem and QEMU only has a convenient mechanism to
> > > check the host IOMMU width when a vfio device is present. Arguably,
> > > when a vIOMMU is present guest firmware should be enlightened to
> > > understand the address width of that vIOMMU when placing the 64-bit
> > > MMIO window. I'd say the failure to do so is a current firmware bug.
>
> edk2 has no iommu driver ...
>
> Is there some simple way to figure what the iommu width is (inside the
> guest)?
If the guest firmware is exposing a DMAR table (VT-d), there's a host
address width field in that table. Otherwise there are capability
registers on the DRHD units that could be queried. AMD-Vi seems to
have similar information in the IVinfo table linked from the IVRS
table. Maybe an iterative solution is ok, starting with the most
common IOMMU types and assuming others are 64-bits wide until proven
otherwise.
>
> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> hint for the guest what the usable address width is. It was added
> because there are cases where the guest simply can't figure that it is
> not possible to use the full physical address space of the cpu. There
> are some non-obvious limitations around 5-level paging. Intel has some
> CPUs with phys-bits > 48 but only 4-level EPT for example.
>
> So one option to handle this is to make sure guest-phys-bits is not
> larger than the iommu width.
Right, as Cédric notes that's sort of what happens here, but my concern
was that the we're kind of incorrectly linking the cpu address width to
the platform address width, which isn't specifically the requirement.
It's valid to have CPU physical address width great than IOMMU address
width, that exists on bare metal, but guest firmware needs to take
IOMMU address width into account in placing the 64-bit MMIO window if
we want PCI BARs to be DMA addressable. Thanks,
Alex
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-31 17:03 ` Cédric Le Goater
@ 2025-02-06 7:54 ` Gerd Hoffmann
2025-02-06 17:05 ` Cédric Le Goater
0 siblings, 1 reply; 35+ messages in thread
From: Gerd Hoffmann @ 2025-02-06 7:54 UTC (permalink / raw)
To: Cédric Le Goater; +Cc: Alex Williamson, qemu-devel
> > Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
> > hint for the guest what the usable address width is. It was added
> > because there are cases where the guest simply can't figure that it is
> > not possible to use the full physical address space of the cpu. There
> > are some non-obvious limitations around 5-level paging. Intel has some
> > CPUs with phys-bits > 48 but only 4-level EPT for example.
> >
> > So one option to handle this is to make sure guest-phys-bits is not
> > larger than the iommu width.
>
> Yes. This is what I am trying to do.
>
> Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
> property to use. If you think this is incorrect and not returning the
> right information, I will change the proposal with guest-phys-bits.
> [1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/
Yes, guest-phys-bits should be used here (and the helpers renamed too
for consistency, to avoid making all this more complex than it already
is).
take care,
Gerd
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-31 22:18 ` Alex Williamson
@ 2025-02-06 8:22 ` Gerd Hoffmann
0 siblings, 0 replies; 35+ messages in thread
From: Gerd Hoffmann @ 2025-02-06 8:22 UTC (permalink / raw)
To: Alex Williamson; +Cc: Cédric Le Goater, qemu-devel
Hi,
> > Is there some simple way to figure what the iommu width is (inside the
> > guest)?
>
> If the guest firmware is exposing a DMAR table (VT-d), there's a host
> address width field in that table. Otherwise there are capability
> registers on the DRHD units that could be queried. AMD-Vi seems to
> have similar information in the IVinfo table linked from the IVRS
> table. Maybe an iterative solution is ok, starting with the most
> common IOMMU types and assuming others are 64-bits wide until proven
> otherwise.
Hmm. The guest firmware simply exposes the tables it gets from qemu
(without parsing it). Also the acpi tables are loaded after the
firmware created the address space layout, because qemu adjusts them
according to the programming done by the firmware (set pci bus ranges
according to bridge windows etc).
So checking ACPI tables for that information doesn't look very
attractive. The firmware would have to load the tables twice (once
early to parse DMAR, once late for the final tables).
Going for the capability registers might be possible. Can the hardware
be probed for safely? Has AMD capability registers too?
Third option would be using another channel to communicate the
information from qemu to firmware, where using 'guest-phys-bits' would
be one candidate.
> Right, as Cédric notes that's sort of what happens here, but my concern
> was that the we're kind of incorrectly linking the cpu address width to
> the platform address width, which isn't specifically the requirement.
There is a separate 'guest-phys-bits' property for that reason. The
phys-bits of the CPU are not changed.
take care,
Gerd
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-02-06 7:54 ` Gerd Hoffmann
@ 2025-02-06 17:05 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-02-06 17:05 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: Alex Williamson, qemu-devel
On 2/6/25 08:54, Gerd Hoffmann wrote:
>>> Note that there is a 'guest-phys-bits' property for x86 CPUs, which is a
>>> hint for the guest what the usable address width is. It was added
>>> because there are cases where the guest simply can't figure that it is
>>> not possible to use the full physical address space of the cpu. There
>>> are some non-obvious limitations around 5-level paging. Intel has some
>>> CPUs with phys-bits > 48 but only 4-level EPT for example.
>>>
>>> So one option to handle this is to make sure guest-phys-bits is not
>>> larger than the iommu width.
>>
>> Yes. This is what I am trying to do.
>>
>> Patch [1] returns X86_CPU(cs)->phys_bits. I was not sure which *phys*
>> property to use. If you think this is incorrect and not returning the
>> right information, I will change the proposal with guest-phys-bits.
>
>> [1] https://lore.kernel.org/qemu-devel/20250130134346.1754143-8-clg@redhat.com/
>
> Yes, guest-phys-bits should be used here (and the helpers renamed too
> for consistency, to avoid making all this more complex than it already
> is).
yep. I will do that in a dedicated series.
Thanks
C.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU"
2025-01-30 13:43 ` [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU" Cédric Le Goater
@ 2025-02-10 14:28 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-10 14:28 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel; +Cc: Alex Williamson
On 30/1/25 14:43, Cédric Le Goater wrote:
> This is to be consistent with other reported errors related to vIOMMU
> devices.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> hw/vfio/pci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device()
2025-01-30 13:43 ` [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device() Cédric Le Goater
@ 2025-02-10 14:32 ` Philippe Mathieu-Daudé
2025-02-10 16:19 ` Cédric Le Goater
0 siblings, 1 reply; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-10 14:32 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel; +Cc: Alex Williamson
On 30/1/25 14:43, Cédric Le Goater wrote:
> This helper will be useful in the listener handlers to extract the
> VFIO device from a memory region using memory_region_owner(). At the
> moment, we only care for PCI passthrough devices. If the need arises,
> we will add more.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> include/hw/vfio/vfio-common.h | 1 +
> hw/vfio/helpers.c | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 0c60be5b15c70168f4f94ad7054d9bd750a162d3..ac35136a11051b079cd9d04e6becd344a0e0f7e7 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -252,6 +252,7 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void vfio_detach_device(VFIODevice *vbasedev);
> +VFIODevice *vfio_get_vfio_device(Object *obj);
>
> int vfio_kvm_device_add_fd(int fd, Error **errp);
> int vfio_kvm_device_del_fd(int fd, Error **errp);
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index 913796f437f84eece8711cb4b4b654a44040d17c..4b255d4f3a9e81f55df00c68fc71da769fd5bd04 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -23,6 +23,7 @@
> #include <sys/ioctl.h>
>
> #include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
> #include "hw/hw.h"
> #include "trace.h"
> #include "qapi/error.h"
> @@ -728,3 +729,12 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>
> return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
> }
> +
> +VFIODevice *vfio_get_vfio_device(Object *obj)
Can't we take a VFIOPCIDevice argument?
> +{
> + if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
> + return &VFIO_PCI(obj)->vbasedev;
> + } else {
> + return NULL;
> + }
> +}
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails
2025-01-30 13:43 ` [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
@ 2025-02-10 14:36 ` Philippe Mathieu-Daudé
2025-02-10 16:17 ` Cédric Le Goater
0 siblings, 1 reply; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-10 14:36 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel; +Cc: Alex Williamson
On 30/1/25 14:43, Cédric Le Goater wrote:
> When the IOMMU address space width is smaller than the physical
> address width, a MMIO region of a device can fail to map because the
> region is outside the supported IOVA ranges of the VM. In this case,
> PCI peer-to-peer transactions on BARs are not supported.
>
> This can occur with the 39-bit IOMMU address space width, as can be
> the case on some consumer processors or when using a vIOMMU device
> with default settings. The current error message is unclear, also
> change the error report to a warning because it is a non fatal
> condition for the VM.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> hw/vfio/common.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 62af1216fc5a9089fc718c2afe3a405d9381db32..5c9d8657d746ce30af5ae8f9122101e086a61ef5 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -555,6 +555,18 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
> return true;
> }
>
> +static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
> +{
> + /*
> + * MMIO region mapping failures are not fatal but in this case PCI
> + * peer-to-peer transactions are broken.
> + */
> + if (vbasedev && vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
> + error_append_hint(errp, "%s: PCI peer-to-peer transactions "
> + "on BARs are not supported.\n", vbasedev->name);
> + }
> +}
> +
> static void vfio_listener_region_add(MemoryListener *listener,
> MemoryRegionSection *section)
> {
> @@ -670,7 +682,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
> strerror(-ret));
> if (memory_region_is_ram_device(section->mr)) {
> /* Allow unexpected mappings not to be fatal for RAM devices */
> - error_report_err(err);
> + VFIODevice *vbasedev =
> + vfio_get_vfio_device(memory_region_owner(section->mr));
> + vfio_device_error_append(vbasedev, &err);
Having vfio_get_vfio_device() returning NULL and
vfio_device_error_append() also checking for NULL is odd.
Maybe just inline everything here?
> + warn_report_once_err(err);
> return;
> }
> goto fail;
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits()
2025-01-30 13:43 ` [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits() Cédric Le Goater
@ 2025-02-10 14:40 ` Philippe Mathieu-Daudé
2025-03-06 10:37 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-10 14:40 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Richard Henderson, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum, Yanan Wang, Zhao Liu
Hi Cédric,
On 30/1/25 14:43, Cédric Le Goater wrote:
> The Intel CPU has a complex history regarding setting of the physical
> address space width on KVM. A 'phys_bits' field and a "phys-bits"
> property were added by commit af45907a1328 ("target-i386: Allow
> physical address bits to be set") to tune this value.
>
> In certain circumstances, it is interesting to know this value to
> check that all the conditions are met for optimal operation. For
> instance, when the system has a 39-bit IOMMU address space width and a
> larger CPU physical address space, we expect issues when mapping the
> MMIO regions of passthrough devices and it would good to report to the
> user. These hybrid HW configs can be found on some consumer grade
> processors or when using a vIOMMU device with default settings.
>
> For this purpose, add an helper routine and a CPUClass callback to
> return the physical address space width of a CPU.
>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <eduardo@habkost.net>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
> Cc: Yanan Wang <wangyanan55@huawei.com>
> Cc: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> include/hw/core/cpu.h | 9 +++++++++
> include/hw/core/sysemu-cpu-ops.h | 6 ++++++
> cpu-target.c | 5 +++++
> hw/core/cpu-system.c | 11 +++++++++++
> target/i386/cpu.c | 6 ++++++
> 5 files changed, 37 insertions(+)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index fb397cdfc53d12d40d3e4e7f86251fc31c48b9f6..1b3eead102ce62fcee55ab0ed5e0dff327fa2fc5 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -748,6 +748,14 @@ int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs);
> */
> bool cpu_virtio_is_big_endian(CPUState *cpu);
>
> +/**
> + * cpu_get_phys_bits:
> + * @cpu: CPU
> + *
> + * Return the physical address space width of the CPU @cpu.
> + */
> +uint32_t cpu_get_phys_bits(const CPUState *cpu);
> +
> #endif /* CONFIG_USER_ONLY */
>
> /**
> @@ -1168,6 +1176,7 @@ void cpu_exec_unrealizefn(CPUState *cpu);
> void cpu_exec_reset_hold(CPUState *cpu);
>
> const char *target_name(void);
> +uint32_t target_phys_bits(void);
>
> #ifdef COMPILING_PER_TARGET
>
> diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
> index 0df5b058f50073e47d2a6b8286be5204776520d2..210b3ed57985525795b81559e41e0085969210d5 100644
> --- a/include/hw/core/sysemu-cpu-ops.h
> +++ b/include/hw/core/sysemu-cpu-ops.h
> @@ -81,6 +81,12 @@ typedef struct SysemuCPUOps {
> */
> bool (*virtio_is_big_endian)(CPUState *cpu);
>
> + /**
> + * @get_phys_bits: Callback to return the physical address space
> + * width of a CPU.
> + */
> + uint32_t (*get_phys_bits)(const CPUState *cpu);
Using 32-bit isn't really relevant IMHO, I'd return an unsigned type.
> +
> /**
> * @legacy_vmsd: Legacy state for migration.
> * Do not use in new targets, use #DeviceClass::vmsd instead.
> diff --git a/cpu-target.c b/cpu-target.c
> index 667688332c929aa53782c94343def34571272d5f..88158272c06cc42424d435b9701e33735f080239 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -472,3 +472,8 @@ const char *target_name(void)
> {
> return TARGET_NAME;
> }
> +
> +uint32_t target_phys_bits(void)
> +{
> + return TARGET_PHYS_ADDR_SPACE_BITS;
> +}
> diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
> index 6aae28a349a7a377d010ff9dcab5ebc29e1126ca..05067d84f4258facf4252216f17729e390d38eae 100644
> --- a/hw/core/cpu-system.c
> +++ b/hw/core/cpu-system.c
> @@ -60,6 +60,17 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
> return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
> }
>
> +uint32_t cpu_get_phys_bits(const CPUState *cpu)
> +{
> + CPUClass *cc = CPU_GET_CLASS(cpu);
> +
> + if (cc->sysemu_ops->get_phys_bits) {
> + return cc->sysemu_ops->get_phys_bits(cpu);
> + }
> +
> + return target_phys_bits();
With heterogeneous emulation in mind, I'd rather this to be a mandatory
CPUClass handler.
> +}
> +
> hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
> {
> MemTxAttrs attrs = {};
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b5dd60d2812e0c3d13c1743fd485a9068ab29c4f..01cf9a44963710a415c755c17582730f75233143 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -8393,6 +8393,11 @@ static bool x86_cpu_get_paging_enabled(const CPUState *cs)
>
> return cpu->env.cr[0] & CR0_PG_MASK;
> }
> +
> +static uint32_t x86_cpu_get_phys_bits(const CPUState *cs)
> +{
> + return X86_CPU(cs)->phys_bits;
> +}
> #endif /* !CONFIG_USER_ONLY */
>
> static void x86_cpu_set_pc(CPUState *cs, vaddr value)
> @@ -8701,6 +8706,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
> .get_memory_mapping = x86_cpu_get_memory_mapping,
> .get_paging_enabled = x86_cpu_get_paging_enabled,
> .get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
> + .get_phys_bits = x86_cpu_get_phys_bits,
> .asidx_from_attrs = x86_asidx_from_attrs,
> .get_crash_info = x86_cpu_get_crash_info,
> .write_elf32_note = x86_cpu_write_elf32_note,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails
2025-02-10 14:36 ` Philippe Mathieu-Daudé
@ 2025-02-10 16:17 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-02-10 16:17 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel; +Cc: Alex Williamson
On 2/10/25 15:36, Philippe Mathieu-Daudé wrote:
> On 30/1/25 14:43, Cédric Le Goater wrote:
>> When the IOMMU address space width is smaller than the physical
>> address width, a MMIO region of a device can fail to map because the
>> region is outside the supported IOVA ranges of the VM. In this case,
>> PCI peer-to-peer transactions on BARs are not supported.
>>
>> This can occur with the 39-bit IOMMU address space width, as can be
>> the case on some consumer processors or when using a vIOMMU device
>> with default settings. The current error message is unclear, also
>> change the error report to a warning because it is a non fatal
>> condition for the VM.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> hw/vfio/common.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 62af1216fc5a9089fc718c2afe3a405d9381db32..5c9d8657d746ce30af5ae8f9122101e086a61ef5 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -555,6 +555,18 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
>> return true;
>> }
>> +static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
>> +{
>> + /*
>> + * MMIO region mapping failures are not fatal but in this case PCI
>> + * peer-to-peer transactions are broken.
>> + */
>> + if (vbasedev && vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>> + error_append_hint(errp, "%s: PCI peer-to-peer transactions "
>> + "on BARs are not supported.\n", vbasedev->name);
>> + }
>> +}
>> +
>> static void vfio_listener_region_add(MemoryListener *listener,
>> MemoryRegionSection *section)
>> {
>> @@ -670,7 +682,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
>> strerror(-ret));
>> if (memory_region_is_ram_device(section->mr)) {
>> /* Allow unexpected mappings not to be fatal for RAM devices */
>> - error_report_err(err);
>> + VFIODevice *vbasedev =
>> + vfio_get_vfio_device(memory_region_owner(section->mr));
>> + vfio_device_error_append(vbasedev, &err);
>
> Having vfio_get_vfio_device() returning NULL and
> vfio_device_error_append() also checking for NULL is odd.
Shouldn't vfio_device_error_append() check that its arguments are
safe to use ?
> Maybe just inline everything here?
I plan to use it elsewhere. See last patch.
Thanks,
C.
>
>> + warn_report_once_err(err);
>> return;
>> }
>> goto fail;
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device()
2025-02-10 14:32 ` Philippe Mathieu-Daudé
@ 2025-02-10 16:19 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-02-10 16:19 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel; +Cc: Alex Williamson
On 2/10/25 15:32, Philippe Mathieu-Daudé wrote:
> On 30/1/25 14:43, Cédric Le Goater wrote:
>> This helper will be useful in the listener handlers to extract the
>> VFIO device from a memory region using memory_region_owner(). At the
>> moment, we only care for PCI passthrough devices. If the need arises,
>> we will add more.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> include/hw/vfio/vfio-common.h | 1 +
>> hw/vfio/helpers.c | 10 ++++++++++
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 0c60be5b15c70168f4f94ad7054d9bd750a162d3..ac35136a11051b079cd9d04e6becd344a0e0f7e7 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -252,6 +252,7 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp);
>> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp);
>> void vfio_detach_device(VFIODevice *vbasedev);
>> +VFIODevice *vfio_get_vfio_device(Object *obj);
>> int vfio_kvm_device_add_fd(int fd, Error **errp);
>> int vfio_kvm_device_del_fd(int fd, Error **errp);
>> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
>> index 913796f437f84eece8711cb4b4b654a44040d17c..4b255d4f3a9e81f55df00c68fc71da769fd5bd04 100644
>> --- a/hw/vfio/helpers.c
>> +++ b/hw/vfio/helpers.c
>> @@ -23,6 +23,7 @@
>> #include <sys/ioctl.h>
>> #include "hw/vfio/vfio-common.h"
>> +#include "hw/vfio/pci.h"
>> #include "hw/hw.h"
>> #include "trace.h"
>> #include "qapi/error.h"
>> @@ -728,3 +729,12 @@ bool vfio_device_hiod_realize(VFIODevice *vbasedev, Error **errp)
>> return HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp);
>> }
>> +
>> +VFIODevice *vfio_get_vfio_device(Object *obj)
>
> Can't we take a VFIOPCIDevice argument?
It could be some other VFIO (AP, Plateform) device type and we wouldn't
need to do the cast below in that case.
Thanks,
C.
>
>> +{
>> + if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
>> + return &VFIO_PCI(obj)->vbasedev;
>> + } else {
>> + return NULL;
>> + }
>> +}
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
2025-01-30 18:58 ` Alex Williamson
@ 2025-03-06 10:33 ` Philippe Mathieu-Daudé
2025-09-05 13:04 ` Daniel Kral
2 siblings, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-03-06 10:33 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Daniel P. Berrangé
On 30/1/25 14:43, Cédric Le Goater wrote:
> Print a warning if IOMMU address space width is smaller than the
> physical address width. In this case, PCI peer-to-peer transactions on
> BARs are not supported and failures of device MMIO regions are to be
> expected.
>
> This can occur with the 39-bit IOMMU address space width as found on
> consumer grade processors or when using a vIOMMU device with default
> settings.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> hw/vfio/common.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5c9d8657d746ce30af5ae8f9122101e086a61ef5..e5ef93c589b2bed68f790608868ec3c7779d4cb8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -44,6 +44,8 @@
> #include "migration/qemu-file.h"
> #include "system/tpm.h"
>
> +#include "hw/core/cpu.h"
> +
> VFIODeviceList vfio_device_list =
> QLIST_HEAD_INITIALIZER(vfio_device_list);
> static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
> @@ -1546,12 +1548,28 @@ retry:
> return info;
> }
>
> +static bool vfio_device_check_address_space(VFIODevice *vbasedev, Error **errp)
> +{
> + uint32_t cpu_aw_bits = cpu_get_phys_bits(first_cpu);
This 'first_cpu' use is annoying. I understand the device is created
from the CLI, and passing the host physical bits limit as a device
property is not practical. We'll deal with that later...
> + uint32_t iommu_aw_bits = vfio_device_get_aw_bits(vbasedev);
> +
> + if (cpu_aw_bits && cpu_aw_bits > iommu_aw_bits) {
> + error_setg(errp, "Host physical address space (%u) is larger than "
> + "the host IOMMU address space (%u).", cpu_aw_bits,
> + iommu_aw_bits);
> + vfio_device_error_append(vbasedev, errp);
> + return false;
> + }
> + return true;
> +}
> +
> bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> const VFIOIOMMUClass *ops =
> VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
> HostIOMMUDevice *hiod = NULL;
> + Error *local_err = NULL;
>
> if (vbasedev->iommufd) {
> ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
> @@ -1571,6 +1589,9 @@ bool vfio_attach_device(char *name, VFIODevice *vbasedev,
> return false;
> }
>
> + if (!vfio_device_check_address_space(vbasedev, &local_err)) {
> + warn_report_err(local_err);
> + }
> return true;
> }
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits()
2025-01-30 13:43 ` [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits() Cédric Le Goater
2025-02-10 14:40 ` Philippe Mathieu-Daudé
@ 2025-03-06 10:37 ` Philippe Mathieu-Daudé
2025-03-06 14:41 ` Cédric Le Goater
1 sibling, 1 reply; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-03-06 10:37 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson, Richard Henderson, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum, Yanan Wang, Zhao Liu
On 30/1/25 14:43, Cédric Le Goater wrote:
> The Intel CPU has a complex history regarding setting of the physical
> address space width on KVM. A 'phys_bits' field and a "phys-bits"
> property were added by commit af45907a1328 ("target-i386: Allow
> physical address bits to be set") to tune this value.
>
> In certain circumstances, it is interesting to know this value to
> check that all the conditions are met for optimal operation. For
> instance, when the system has a 39-bit IOMMU address space width and a
> larger CPU physical address space, we expect issues when mapping the
> MMIO regions of passthrough devices and it would good to report to the
> user. These hybrid HW configs can be found on some consumer grade
> processors or when using a vIOMMU device with default settings.
>
> For this purpose, add an helper routine and a CPUClass callback to
> return the physical address space width of a CPU.
>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <eduardo@habkost.net>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
> Cc: Yanan Wang <wangyanan55@huawei.com>
> Cc: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> ---
> include/hw/core/cpu.h | 9 +++++++++
> include/hw/core/sysemu-cpu-ops.h | 6 ++++++
> cpu-target.c | 5 +++++
> hw/core/cpu-system.c | 11 +++++++++++
> target/i386/cpu.c | 6 ++++++
> 5 files changed, 37 insertions(+)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index fb397cdfc53d12d40d3e4e7f86251fc31c48b9f6..1b3eead102ce62fcee55ab0ed5e0dff327fa2fc5 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -748,6 +748,14 @@ int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs);
> */
> bool cpu_virtio_is_big_endian(CPUState *cpu);
>
> +/**
> + * cpu_get_phys_bits:
> + * @cpu: CPU
> + *
> + * Return the physical address space width of the CPU @cpu.
> + */
> +uint32_t cpu_get_phys_bits(const CPUState *cpu);
> +
> #endif /* CONFIG_USER_ONLY */
>
> /**
> @@ -1168,6 +1176,7 @@ void cpu_exec_unrealizefn(CPUState *cpu);
> void cpu_exec_reset_hold(CPUState *cpu);
>
> const char *target_name(void);
> +uint32_t target_phys_bits(void);
>
> #ifdef COMPILING_PER_TARGET
>
> diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
> index 0df5b058f50073e47d2a6b8286be5204776520d2..210b3ed57985525795b81559e41e0085969210d5 100644
> --- a/include/hw/core/sysemu-cpu-ops.h
> +++ b/include/hw/core/sysemu-cpu-ops.h
> @@ -81,6 +81,12 @@ typedef struct SysemuCPUOps {
> */
> bool (*virtio_is_big_endian)(CPUState *cpu);
>
> + /**
> + * @get_phys_bits: Callback to return the physical address space
> + * width of a CPU.
> + */
> + uint32_t (*get_phys_bits)(const CPUState *cpu);
> +
> /**
> * @legacy_vmsd: Legacy state for migration.
> * Do not use in new targets, use #DeviceClass::vmsd instead.
> diff --git a/cpu-target.c b/cpu-target.c
> index 667688332c929aa53782c94343def34571272d5f..88158272c06cc42424d435b9701e33735f080239 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -472,3 +472,8 @@ const char *target_name(void)
> {
> return TARGET_NAME;
> }
> +
> +uint32_t target_phys_bits(void)
> +{
> + return TARGET_PHYS_ADDR_SPACE_BITS;
> +}
> diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
> index 6aae28a349a7a377d010ff9dcab5ebc29e1126ca..05067d84f4258facf4252216f17729e390d38eae 100644
> --- a/hw/core/cpu-system.c
> +++ b/hw/core/cpu-system.c
> @@ -60,6 +60,17 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
> return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
> }
>
> +uint32_t cpu_get_phys_bits(const CPUState *cpu)
> +{
> + CPUClass *cc = CPU_GET_CLASS(cpu);
> +
> + if (cc->sysemu_ops->get_phys_bits) {
> + return cc->sysemu_ops->get_phys_bits(cpu);
> + }
> +
> + return target_phys_bits();
I might have suggested to return TARGET_PHYS_ADDR_SPACE_BITS
by default in v1, I'm not sure about it anymore. Maybe default
to 0 and have each target register its helper if necessary?
> +}
> +
> hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
> {
> MemTxAttrs attrs = {};
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b5dd60d2812e0c3d13c1743fd485a9068ab29c4f..01cf9a44963710a415c755c17582730f75233143 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -8393,6 +8393,11 @@ static bool x86_cpu_get_paging_enabled(const CPUState *cs)
>
> return cpu->env.cr[0] & CR0_PG_MASK;
> }
> +
> +static uint32_t x86_cpu_get_phys_bits(const CPUState *cs)
> +{
> + return X86_CPU(cs)->phys_bits;
> +}
> #endif /* !CONFIG_USER_ONLY */
>
> static void x86_cpu_set_pc(CPUState *cs, vaddr value)
> @@ -8701,6 +8706,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
> .get_memory_mapping = x86_cpu_get_memory_mapping,
> .get_paging_enabled = x86_cpu_get_paging_enabled,
> .get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
> + .get_phys_bits = x86_cpu_get_phys_bits,
> .asidx_from_attrs = x86_asidx_from_attrs,
> .get_crash_info = x86_cpu_get_crash_info,
> .write_elf32_note = x86_cpu_write_elf32_note,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits()
2025-03-06 10:37 ` Philippe Mathieu-Daudé
@ 2025-03-06 14:41 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-03-06 14:41 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Alex Williamson, Richard Henderson, Paolo Bonzini,
Eduardo Habkost, Marcel Apfelbaum, Yanan Wang, Zhao Liu
On 3/6/25 11:37, Philippe Mathieu-Daudé wrote:
> On 30/1/25 14:43, Cédric Le Goater wrote:
>> The Intel CPU has a complex history regarding setting of the physical
>> address space width on KVM. A 'phys_bits' field and a "phys-bits"
>> property were added by commit af45907a1328 ("target-i386: Allow
>> physical address bits to be set") to tune this value.
>>
>> In certain circumstances, it is interesting to know this value to
>> check that all the conditions are met for optimal operation. For
>> instance, when the system has a 39-bit IOMMU address space width and a
>> larger CPU physical address space, we expect issues when mapping the
>> MMIO regions of passthrough devices and it would good to report to the
>> user. These hybrid HW configs can be found on some consumer grade
>> processors or when using a vIOMMU device with default settings.
>>
>> For this purpose, add an helper routine and a CPUClass callback to
>> return the physical address space width of a CPU.
>>
>> Cc: Richard Henderson <richard.henderson@linaro.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Eduardo Habkost <eduardo@habkost.net>
>> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
>> Cc: Yanan Wang <wangyanan55@huawei.com>
>> Cc: Zhao Liu <zhao1.liu@intel.com>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>> ---
>> include/hw/core/cpu.h | 9 +++++++++
>> include/hw/core/sysemu-cpu-ops.h | 6 ++++++
>> cpu-target.c | 5 +++++
>> hw/core/cpu-system.c | 11 +++++++++++
>> target/i386/cpu.c | 6 ++++++
>> 5 files changed, 37 insertions(+)
>>
>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>> index fb397cdfc53d12d40d3e4e7f86251fc31c48b9f6..1b3eead102ce62fcee55ab0ed5e0dff327fa2fc5 100644
>> --- a/include/hw/core/cpu.h
>> +++ b/include/hw/core/cpu.h
>> @@ -748,6 +748,14 @@ int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs);
>> */
>> bool cpu_virtio_is_big_endian(CPUState *cpu);
>> +/**
>> + * cpu_get_phys_bits:
>> + * @cpu: CPU
>> + *
>> + * Return the physical address space width of the CPU @cpu.
>> + */
>> +uint32_t cpu_get_phys_bits(const CPUState *cpu);
>> +
>> #endif /* CONFIG_USER_ONLY */
>> /**
>> @@ -1168,6 +1176,7 @@ void cpu_exec_unrealizefn(CPUState *cpu);
>> void cpu_exec_reset_hold(CPUState *cpu);
>> const char *target_name(void);
>> +uint32_t target_phys_bits(void);
>> #ifdef COMPILING_PER_TARGET
>> diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
>> index 0df5b058f50073e47d2a6b8286be5204776520d2..210b3ed57985525795b81559e41e0085969210d5 100644
>> --- a/include/hw/core/sysemu-cpu-ops.h
>> +++ b/include/hw/core/sysemu-cpu-ops.h
>> @@ -81,6 +81,12 @@ typedef struct SysemuCPUOps {
>> */
>> bool (*virtio_is_big_endian)(CPUState *cpu);
>> + /**
>> + * @get_phys_bits: Callback to return the physical address space
>> + * width of a CPU.
>> + */
>> + uint32_t (*get_phys_bits)(const CPUState *cpu);
>> +
>> /**
>> * @legacy_vmsd: Legacy state for migration.
>> * Do not use in new targets, use #DeviceClass::vmsd instead.
>> diff --git a/cpu-target.c b/cpu-target.c
>> index 667688332c929aa53782c94343def34571272d5f..88158272c06cc42424d435b9701e33735f080239 100644
>> --- a/cpu-target.c
>> +++ b/cpu-target.c
>> @@ -472,3 +472,8 @@ const char *target_name(void)
>> {
>> return TARGET_NAME;
>> }
>> +
>> +uint32_t target_phys_bits(void)
>> +{
>> + return TARGET_PHYS_ADDR_SPACE_BITS;
>> +}
>> diff --git a/hw/core/cpu-system.c b/hw/core/cpu-system.c
>> index 6aae28a349a7a377d010ff9dcab5ebc29e1126ca..05067d84f4258facf4252216f17729e390d38eae 100644
>> --- a/hw/core/cpu-system.c
>> +++ b/hw/core/cpu-system.c
>> @@ -60,6 +60,17 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
>> return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
>> }
>> +uint32_t cpu_get_phys_bits(const CPUState *cpu)
>> +{
>> + CPUClass *cc = CPU_GET_CLASS(cpu);
>> +
>> + if (cc->sysemu_ops->get_phys_bits) {
>> + return cc->sysemu_ops->get_phys_bits(cpu);
>> + }
>> +
>> + return target_phys_bits();
>
> I might have suggested to return TARGET_PHYS_ADDR_SPACE_BITS
> by default in v1, I'm not sure about it anymore. Maybe default
> to 0 and have each target register its helper if necessary?
sure. No problem. I can change that.
Thanks,
C.
>
>> +}
>> +
>> hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
>> {
>> MemTxAttrs attrs = {};
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index b5dd60d2812e0c3d13c1743fd485a9068ab29c4f..01cf9a44963710a415c755c17582730f75233143 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -8393,6 +8393,11 @@ static bool x86_cpu_get_paging_enabled(const CPUState *cs)
>> return cpu->env.cr[0] & CR0_PG_MASK;
>> }
>> +
>> +static uint32_t x86_cpu_get_phys_bits(const CPUState *cs)
>> +{
>> + return X86_CPU(cs)->phys_bits;
>> +}
>> #endif /* !CONFIG_USER_ONLY */
>> static void x86_cpu_set_pc(CPUState *cs, vaddr value)
>> @@ -8701,6 +8706,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
>> .get_memory_mapping = x86_cpu_get_memory_mapping,
>> .get_paging_enabled = x86_cpu_get_paging_enabled,
>> .get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
>> + .get_phys_bits = x86_cpu_get_phys_bits,
>> .asidx_from_attrs = x86_asidx_from_attrs,
>> .get_crash_info = x86_cpu_get_crash_info,
>> .write_elf32_note = x86_cpu_write_elf32_note,
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
2025-01-30 18:58 ` Alex Williamson
2025-03-06 10:33 ` Philippe Mathieu-Daudé
@ 2025-09-05 13:04 ` Daniel Kral
2025-09-08 8:35 ` Cédric Le Goater
2 siblings, 1 reply; 35+ messages in thread
From: Daniel Kral @ 2025-09-05 13:04 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel
Cc: Alex Williamson,
qemu-devel-bounces+qemu-devel=archiver.kernel.org
On Thu Jan 30, 2025 at 2:43 PM CET, Cédric Le Goater wrote:
> Print a warning if IOMMU address space width is smaller than the
> physical address width. In this case, PCI peer-to-peer transactions on
> BARs are not supported and failures of device MMIO regions are to be
> expected.
>
> This can occur with the 39-bit IOMMU address space width as found on
> consumer grade processors or when using a vIOMMU device with default
> settings.
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Hi Cédric!
Some of our users are running into this with Proxmox VE, where they get
vfio_container_dma_map(...) = -22 errors, which are likely caused by
this issue of the mismatch mentioned above. Setting the guest-phys-bits
in accordance to the iommu aw-bits seems to fix that for users, e.g.
[0].
Before applying this downstream for pve-qemu, I saw that this patch was
dropped in the v3 [1], but you mentioned that this is addressed in a
later series. I couldn't find a direct follow-up in the archive, are
there any updates on this?
Thanks a lot in advance & have a nice weekend!
Daniel
[0] https://forum.proxmox.com/threads/169586/page-3#post-795813
[1] https://lore.kernel.org/qemu-devel/20250206131438.1505542-1-clg@redhat.com/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width
2025-09-05 13:04 ` Daniel Kral
@ 2025-09-08 8:35 ` Cédric Le Goater
0 siblings, 0 replies; 35+ messages in thread
From: Cédric Le Goater @ 2025-09-08 8:35 UTC (permalink / raw)
To: Daniel Kral, qemu-devel
Cc: Alex Williamson,
qemu-devel-bounces+qemu-devel=archiver.kernel.org
On 9/5/25 15:04, Daniel Kral wrote:
> On Thu Jan 30, 2025 at 2:43 PM CET, Cédric Le Goater wrote:
>> Print a warning if IOMMU address space width is smaller than the
>> physical address width. In this case, PCI peer-to-peer transactions on
>> BARs are not supported and failures of device MMIO regions are to be
>> expected.
>>
>> This can occur with the 39-bit IOMMU address space width as found on
>> consumer grade processors or when using a vIOMMU device with default
>> settings.
>>
>> Signed-off-by: Cédric Le Goater <clg@redhat.com>
>
> Hi Cédric!
>
> Some of our users are running into this with Proxmox VE, where they get
> vfio_container_dma_map(...) = -22 errors, which are likely caused by
> this issue of the mismatch mentioned above. Setting the guest-phys-bits
> in accordance to the iommu aw-bits seems to fix that for users, e.g.
> [0].
>
> Before applying this downstream for pve-qemu, I saw that this patch was
> dropped in the v3 [1], but you mentioned that this is addressed in a
> later series. I couldn't find a direct follow-up in the archive, are
> there any updates on this?
Hello Daniel,
There have been several changes in VFIO since this patch was
submitted. The code will need to be reworked, as it is no longer
possible to check the IOMMU address space width before attaching
the device. At that stage, the IOVA ranges are still unknown.
Since this affects older Intel consumer-grade CPUs (~ 12th gen),
the priority is low. Please open an issue if this support is
important for your users.
That said, vfio_device_get_aw_bits() needs to be improved to be
more robust. I will work on that.
Thanks,
C.
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2025-09-08 8:37 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-30 13:43 [PATCH v2 0/9]vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 1/9] util/error: Introduce warn_report_once_err() Cédric Le Goater
2025-01-30 14:25 ` Markus Armbruster
2025-01-30 16:03 ` Cédric Le Goater
2025-01-30 16:28 ` Cédric Le Goater
2025-01-30 17:55 ` Alex Williamson
2025-01-30 21:26 ` Cédric Le Goater
2025-01-31 8:30 ` Markus Armbruster
2025-01-30 13:43 ` [PATCH v2 2/9] vfio/pci: Replace "iommu_device" by "vIOMMU" Cédric Le Goater
2025-02-10 14:28 ` Philippe Mathieu-Daudé
2025-01-30 13:43 ` [PATCH v2 3/9] vfio: Rephrase comment in vfio_listener_region_add() error path Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 4/9] vfio: Introduce vfio_get_vfio_device() Cédric Le Goater
2025-02-10 14:32 ` Philippe Mathieu-Daudé
2025-02-10 16:19 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 5/9] vfio: Improve error reporting when MMIO region mapping fails Cédric Le Goater
2025-02-10 14:36 ` Philippe Mathieu-Daudé
2025-02-10 16:17 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 6/9] vfio: Remove reports of DMA mapping errors in backends Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 7/9] cpu: Introduce cpu_get_phys_bits() Cédric Le Goater
2025-02-10 14:40 ` Philippe Mathieu-Daudé
2025-03-06 10:37 ` Philippe Mathieu-Daudé
2025-03-06 14:41 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 8/9] vfio: Check compatibility of CPU and IOMMU address space width Cédric Le Goater
2025-01-30 18:58 ` Alex Williamson
2025-01-31 12:42 ` Cédric Le Goater
2025-01-31 13:23 ` Gerd Hoffmann
2025-01-31 17:03 ` Cédric Le Goater
2025-02-06 7:54 ` Gerd Hoffmann
2025-02-06 17:05 ` Cédric Le Goater
2025-01-31 22:18 ` Alex Williamson
2025-02-06 8:22 ` Gerd Hoffmann
2025-03-06 10:33 ` Philippe Mathieu-Daudé
2025-09-05 13:04 ` Daniel Kral
2025-09-08 8:35 ` Cédric Le Goater
2025-01-30 13:43 ` [PATCH v2 9/9] vfio: Remove superfluous error report in vfio_listener_region_add() Cédric Le Goater
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).