* [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device
From: Zhenzhong Duan @ 2024-05-22 6:22 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan
Hi,
Per Jason Wang's suggestion, the iommufd nesting series [1] has been split into
an "Enable stage-1 translation for emulated device" series and an
"Enable stage-1 translation for passthrough device" series.
This series enables stage-1 translation support for emulated devices
in the Intel IOMMU, in what we call "modern" mode.
PATCH1-5: Preparatory work before supporting stage-1 translation
PATCH6-8: Implement stage-1 translation for emulated devices
PATCH9-14: Emulate IOTLB invalidation of stage-1 mappings
PATCH15: Set default aw_bits to 48 in scalable modern mode
PATCH16: Introduce "modern" mode to distinguish it from legacy mode
PATCH17: Add qtest
The QEMU code can be found at [2].
[1] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02740.html
[2] https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_stage1_emu_rfcv2
Thanks
Zhenzhong
Changelog:
v2:
- split from nesting series (Jason)
- merge some commits from Clement
- add qtest (Jason)
Clément Mathieu--Drif (5):
intel_iommu: check if the input address is canonical
intel_iommu: set accessed and dirty bits during first stage
translation
intel_iommu: Extract device IOTLB invalidation logic
intel_iommu: add an internal API to find an address space with PASID
intel_iommu: add support for PASID-based device IOTLB invalidation
Yi Liu (3):
intel_iommu: Rename slpte to pte
intel_iommu: Implement stage-1 translation
intel_iommu: Modify x-scalable-mode to be string option
Yu Zhang (1):
intel_iommu: Update version to 3.0 and add the latest fault reasons
Zhenzhong Duan (8):
intel_iommu: Make pasid entry type check accurate
intel_iommu: Add a placeholder variable for scalable modern mode
intel_iommu: Flush stage-2 cache in PASID-selective PASID-based iotlb
invalidation
intel_iommu: Flush stage-1 cache in iotlb invalidation
intel_iommu: Process PASID-based iotlb invalidation
intel_iommu: piotlb invalidation should notify unmap
intel_iommu: Set default aw_bits to 48 in scalable modern mode
tests/qtest: Add intel-iommu test
MAINTAINERS | 1 +
hw/i386/intel_iommu_internal.h | 60 +++-
include/hw/i386/intel_iommu.h | 5 +-
hw/i386/intel_iommu.c | 639 ++++++++++++++++++++++++++++-----
tests/qtest/intel-iommu-test.c | 63 ++++
tests/qtest/meson.build | 1 +
6 files changed, 676 insertions(+), 93 deletions(-)
create mode 100644 tests/qtest/intel-iommu-test.c
--
2.34.1
* [PATCH rfcv2 01/17] intel_iommu: Update version to 3.0 and add the latest fault reasons
From: Zhenzhong Duan @ 2024-05-22 6:22 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Yu Zhang, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Yu Zhang <yu.c.zhang@linux.intel.com>
Scalable mode was introduced in VT-d spec 3.0. Now that scalable
mode is already supported, bump the version to 3.0.
Spec 3.0 also defines more detailed fault reasons for scalable
mode, so introduce them into the emulation code; see spec section
7.1.2 for details.
The guest kernel should use the version for informational purposes
only, not for feature checks; the cap/ecap bits should be checked
instead. So this change does not impact migration.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 9 ++++++++-
hw/i386/intel_iommu.c | 27 +++++++++++++++++----------
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b800d62ca0..955bc24787 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -312,7 +312,14 @@ typedef enum VTDFaultReason {
* request while disabled */
VTD_FR_IR_SID_ERR = 0x26, /* Invalid Source-ID */
- VTD_FR_PASID_TABLE_INV = 0x58, /*Invalid PASID table entry */
+ /* PASID directory entry access failure */
+ VTD_FR_PASID_DIR_ACCESS_ERR = 0x50,
+ /* The Present(P) field of pasid directory entry is 0 */
+ VTD_FR_PASID_DIR_ENTRY_P = 0x51,
+ VTD_FR_PASID_TABLE_ACCESS_ERR = 0x58, /* PASID table entry access failure */
+ /* The Present(P) field of pasid table entry is 0 */
+ VTD_FR_PASID_ENTRY_P = 0x59,
+ VTD_FR_PASID_TABLE_ENTRY_INV = 0x5b, /* Invalid PASID table entry */
/* Output address in the interrupt address range for scalable mode */
VTD_FR_SM_INTERRUPT_ADDR = 0x87,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 07bfd4f99e..d85aaf4bb8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -779,7 +779,7 @@ static int vtd_get_pdire_from_pdir_table(dma_addr_t pasid_dir_base,
addr = pasid_dir_base + index * entry_size;
if (dma_memory_read(&address_space_memory, addr,
pdire, entry_size, MEMTXATTRS_UNSPECIFIED)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_DIR_ACCESS_ERR;
}
pdire->val = le64_to_cpu(pdire->val);
@@ -797,6 +797,7 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
dma_addr_t addr,
VTDPASIDEntry *pe)
{
+ uint8_t pgtt;
uint32_t index;
dma_addr_t entry_size;
X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
@@ -806,7 +807,7 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
addr = addr + index * entry_size;
if (dma_memory_read(&address_space_memory, addr,
pe, entry_size, MEMTXATTRS_UNSPECIFIED)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_TABLE_ACCESS_ERR;
}
for (size_t i = 0; i < ARRAY_SIZE(pe->val); i++) {
pe->val[i] = le64_to_cpu(pe->val[i]);
@@ -814,11 +815,13 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
/* Do translation type check */
if (!vtd_pe_type_check(x86_iommu, pe)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_TABLE_ENTRY_INV;
}
- if (!vtd_is_level_supported(s, VTD_PE_GET_LEVEL(pe))) {
- return -VTD_FR_PASID_TABLE_INV;
+ pgtt = VTD_PE_GET_TYPE(pe);
+ if (pgtt == VTD_SM_PASID_ENTRY_SLT &&
+ !vtd_is_level_supported(s, VTD_PE_GET_LEVEL(pe))) {
+ return -VTD_FR_PASID_TABLE_ENTRY_INV;
}
return 0;
@@ -859,7 +862,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
}
if (!vtd_pdire_present(&pdire)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_DIR_ENTRY_P;
}
ret = vtd_get_pe_from_pdire(s, pasid, &pdire, pe);
@@ -868,7 +871,7 @@ static int vtd_get_pe_from_pasid_table(IntelIOMMUState *s,
}
if (!vtd_pe_present(pe)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_ENTRY_P;
}
return 0;
@@ -921,7 +924,7 @@ static int vtd_ce_get_pasid_fpd(IntelIOMMUState *s,
}
if (!vtd_pdire_present(&pdire)) {
- return -VTD_FR_PASID_TABLE_INV;
+ return -VTD_FR_PASID_DIR_ENTRY_P;
}
/*
@@ -1778,7 +1781,11 @@ static const bool vtd_qualified_faults[] = {
[VTD_FR_ROOT_ENTRY_RSVD] = false,
[VTD_FR_PAGING_ENTRY_RSVD] = true,
[VTD_FR_CONTEXT_ENTRY_TT] = true,
- [VTD_FR_PASID_TABLE_INV] = false,
+ [VTD_FR_PASID_DIR_ACCESS_ERR] = false,
+ [VTD_FR_PASID_DIR_ENTRY_P] = true,
+ [VTD_FR_PASID_TABLE_ACCESS_ERR] = false,
+ [VTD_FR_PASID_ENTRY_P] = true,
+ [VTD_FR_PASID_TABLE_ENTRY_INV] = true,
[VTD_FR_SM_INTERRUPT_ADDR] = true,
[VTD_FR_MAX] = false,
};
@@ -4138,7 +4145,7 @@ static void vtd_init(IntelIOMMUState *s)
vtd_reset_caches(s);
/* Define registers with default values and bit semantics */
- vtd_define_long(s, DMAR_VER_REG, 0x10UL, 0, 0);
+ vtd_define_long(s, DMAR_VER_REG, 0x30UL, 0, 0);
vtd_define_quad(s, DMAR_CAP_REG, s->cap, 0, 0);
vtd_define_quad(s, DMAR_ECAP_REG, s->ecap, 0, 0);
vtd_define_long(s, DMAR_GCMD_REG, 0, 0xff800000UL, 0);
* [PATCH rfcv2 02/17] intel_iommu: Make pasid entry type check accurate
From: Zhenzhong Duan @ 2024-05-22 6:22 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
When the guest configures Nested Translation (011b) or First-stage Translation
only (001b), the type check incorrectly passes.
Fail the type check in those cases, as their emulation isn't supported yet.
Fixes: fb43cf739e1 ("intel_iommu: scalable mode emulation")
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index d85aaf4bb8..348e3a441e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -742,20 +742,16 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
VTDPASIDEntry *pe)
{
switch (VTD_PE_GET_TYPE(pe)) {
- case VTD_SM_PASID_ENTRY_FLT:
case VTD_SM_PASID_ENTRY_SLT:
- case VTD_SM_PASID_ENTRY_NESTED:
- break;
+ return true;
case VTD_SM_PASID_ENTRY_PT:
- if (!x86_iommu->pt_supported) {
- return false;
- }
- break;
+ return x86_iommu->pt_supported;
+ case VTD_SM_PASID_ENTRY_FLT:
+ case VTD_SM_PASID_ENTRY_NESTED:
default:
/* Unknown type */
return false;
}
- return true;
}
static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
* [PATCH rfcv2 03/17] intel_iommu: Add a placeholder variable for scalable modern mode
From: Zhenzhong Duan @ 2024-05-22 6:22 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
Add a new element, scalable_modern, in IntelIOMMUState to mark scalable
modern mode; this element will eventually be exposed as an intel_iommu
property.
For now, it's only a placeholder, used for cap/ecap initialization and
compatibility checks, and to block host device passthrough until nesting
is supported.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 2 ++
include/hw/i386/intel_iommu.h | 1 +
hw/i386/intel_iommu.c | 37 ++++++++++++++++++++++++----------
3 files changed, 29 insertions(+), 11 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 955bc24787..75aea80942 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -196,6 +196,7 @@
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_SMTS (1ULL << 43)
#define VTD_ECAP_SLTS (1ULL << 46)
+#define VTD_ECAP_FLTS (1ULL << 47)
/* CAP_REG */
/* (offset >> 4) << 24 */
@@ -212,6 +213,7 @@
#define VTD_CAP_SLLPS ((1ULL << 34) | (1ULL << 35))
#define VTD_CAP_DRAIN_WRITE (1ULL << 54)
#define VTD_CAP_DRAIN_READ (1ULL << 55)
+#define VTD_CAP_FS1GP (1ULL << 56)
#define VTD_CAP_DRAIN (VTD_CAP_DRAIN_READ | VTD_CAP_DRAIN_WRITE)
#define VTD_CAP_CM (1ULL << 7)
#define VTD_PASID_ID_SHIFT 20
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 2bbde41e45..9ba9c45015 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -263,6 +263,7 @@ struct IntelIOMMUState {
bool caching_mode; /* RO - is cap CM enabled? */
bool scalable_mode; /* RO - is Scalable Mode supported? */
+ bool scalable_modern; /* RO - is modern SM supported? */
bool snoop_control; /* RO - is SNP filed supported? */
dma_addr_t root; /* Current root table pointer */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 348e3a441e..6d1d94ada3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -738,16 +738,20 @@ static inline bool vtd_is_level_supported(IntelIOMMUState *s, uint32_t level)
}
/* Return true if check passed, otherwise false */
-static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu,
- VTDPASIDEntry *pe)
+static inline bool vtd_pe_type_check(IntelIOMMUState *s, VTDPASIDEntry *pe)
{
+ X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
switch (VTD_PE_GET_TYPE(pe)) {
+ case VTD_SM_PASID_ENTRY_FLT:
+ return s->scalable_modern;
case VTD_SM_PASID_ENTRY_SLT:
- return true;
+ return !s->scalable_modern;
+ case VTD_SM_PASID_ENTRY_NESTED:
+ /* NESTED page table type is not supported yet */
+ return false;
case VTD_SM_PASID_ENTRY_PT:
return x86_iommu->pt_supported;
- case VTD_SM_PASID_ENTRY_FLT:
- case VTD_SM_PASID_ENTRY_NESTED:
default:
/* Unknown type */
return false;
@@ -796,7 +800,6 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
uint8_t pgtt;
uint32_t index;
dma_addr_t entry_size;
- X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
index = VTD_PASID_TABLE_INDEX(pasid);
entry_size = VTD_PASID_ENTRY_SIZE;
@@ -810,7 +813,7 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
}
/* Do translation type check */
- if (!vtd_pe_type_check(x86_iommu, pe)) {
+ if (!vtd_pe_type_check(s, pe)) {
return -VTD_FR_PASID_TABLE_ENTRY_INV;
}
@@ -3839,8 +3842,17 @@ static bool vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
error_setg(errp, "aw-bits %d > host aw-bits %d", s->aw_bits, ret);
return false;
}
-#endif
+
+ if (!s->scalable_modern) {
+ /* All checks requested by VTD non-modern mode pass */
+ return true;
+ }
+
+ error_setg(errp, "host device is unsupported in scalable modern mode yet");
+ return false;
+#else
return true;
+#endif
}
static bool vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
@@ -4076,7 +4088,10 @@ static void vtd_cap_init(IntelIOMMUState *s)
}
/* TODO: read cap/ecap from host to decide which cap to be exposed. */
- if (s->scalable_mode) {
+ if (s->scalable_modern) {
+ s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_FLTS;
+ s->cap |= VTD_CAP_FS1GP;
+ } else if (s->scalable_mode) {
s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS;
}
@@ -4243,9 +4258,9 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
}
}
- /* Currently only address widths supported are 39 and 48 bits */
if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
- (s->aw_bits != VTD_HOST_AW_48BIT)) {
+ (s->aw_bits != VTD_HOST_AW_48BIT) &&
+ !s->scalable_modern) {
error_setg(errp, "Supported values for aw-bits are: %d, %d",
VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
return false;
* [PATCH rfcv2 04/17] intel_iommu: Flush stage-2 cache in PASID-selective PASID-based iotlb invalidation
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
Per spec section 6.5.2.4, a PASID-selective PASID-based IOTLB invalidation
flushes stage-2 IOTLB entries with a matching domain ID and PASID.
With scalable modern mode introduced, the guest can send a PASID-selective
PASID-based IOTLB invalidation to flush both stage-1 and stage-2 entries.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 10 +++++
hw/i386/intel_iommu.c | 78 ++++++++++++++++++++++++++++++++++
2 files changed, 88 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 75aea80942..b0d9b1f986 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -441,6 +441,16 @@ typedef union VTDInvDesc VTDInvDesc;
(0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \
(0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM))
+#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID (2ULL << 4)
+#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID (3ULL << 4)
+
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL0 0xfff000000000ffc0ULL
+#define VTD_INV_DESC_PIOTLB_RSVD_VAL1 0xf80ULL
+
+#define VTD_INV_DESC_PIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PIOTLB_DID(val) (((val) >> 16) & \
+ VTD_DOMAIN_ID_MASK)
+
/* Information about page-selective IOTLB invalidate */
struct VTDIOTLBPageInvInfo {
uint16_t domain_id;
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6d1d94ada3..ed95b5ba2e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2642,6 +2642,80 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
return true;
}
+static gboolean vtd_hash_remove_by_pasid(gpointer key, gpointer value,
+ gpointer user_data)
+{
+ VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+ VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+
+ return ((entry->domain_id == info->domain_id) &&
+ (entry->pasid == info->pasid));
+}
+
+static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
+ uint16_t domain_id, uint32_t pasid)
+{
+ VTDIOTLBPageInvInfo info;
+ VTDAddressSpace *vtd_as;
+ VTDContextEntry ce;
+
+ info.domain_id = domain_id;
+ info.pasid = pasid;
+
+ vtd_iommu_lock(s);
+ g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid,
+ &info);
+ vtd_iommu_unlock(s);
+
+ QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
+ if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+ vtd_as->devfn, &ce) &&
+ domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
+ uint32_t rid2pasid = VTD_CE_GET_RID2PASID(&ce);
+
+ if ((vtd_as->pasid != PCI_NO_PASID || pasid != rid2pasid) &&
+ vtd_as->pasid != pasid) {
+ continue;
+ }
+
+ if (!s->scalable_modern) {
+ vtd_address_space_sync(vtd_as);
+ }
+ }
+ }
+}
+
+static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+ uint16_t domain_id;
+ uint32_t pasid;
+
+ if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) ||
+ (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) {
+ error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64
+ " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+ return false;
+ }
+
+ domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]);
+ pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]);
+ switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) {
+ case VTD_INV_DESC_PIOTLB_ALL_IN_PASID:
+ vtd_piotlb_pasid_invalidate(s, domain_id, pasid);
+ break;
+
+ case VTD_INV_DESC_PIOTLB_PSI_IN_PASID:
+ break;
+
+ default:
+ error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64
+ " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]);
+ return false;
+ }
+ return true;
+}
+
static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
VTDInvDesc *inv_desc)
{
@@ -2752,6 +2826,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
break;
case VTD_INV_DESC_PIOTLB:
+ trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]);
+ if (!vtd_process_piotlb_desc(s, &inv_desc)) {
+ return false;
+ }
break;
case VTD_INV_DESC_WAIT:
* [PATCH rfcv2 05/17] intel_iommu: Rename slpte to pte
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Yi Liu <yi.l.liu@intel.com>
Because both FST (a.k.a. FLT) and SST (a.k.a. SLT) translation will be
supported, rename slpte to pte to make it generic.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 3 ++-
include/hw/i386/intel_iommu.h | 2 +-
hw/i386/intel_iommu.c | 39 +++++++++++++++++-----------------
3 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b0d9b1f986..0e240d6d54 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -553,10 +553,11 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_SL_RW_MASK 3ULL
#define VTD_SL_R 1ULL
#define VTD_SL_W (1ULL << 1)
-#define VTD_SL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
#define VTD_SL_IGN_COM 0xbff0000000000000ULL
#define VTD_SL_TM (1ULL << 62)
+/* Common for both First Level and Second Level */
+#define VTD_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
typedef struct VTDHostIOMMUDevice {
IntelIOMMUState *iommu_state;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 9ba9c45015..011f374883 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -153,7 +153,7 @@ struct VTDIOTLBEntry {
uint64_t gfn;
uint16_t domain_id;
uint32_t pasid;
- uint64_t slpte;
+ uint64_t pte;
uint64_t mask;
uint8_t access_flags;
};
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ed95b5ba2e..544e8f0e40 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -360,7 +360,7 @@ out:
/* Must be with IOMMU lock held */
static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
- uint16_t domain_id, hwaddr addr, uint64_t slpte,
+ uint16_t domain_id, hwaddr addr, uint64_t pte,
uint8_t access_flags, uint32_t level,
uint32_t pasid)
{
@@ -368,7 +368,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
struct vtd_iotlb_key *key = g_malloc(sizeof(*key));
uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
- trace_vtd_iotlb_page_update(source_id, addr, slpte, domain_id);
+ trace_vtd_iotlb_page_update(source_id, addr, pte, domain_id);
if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
trace_vtd_iotlb_reset("iotlb exceeds size limit");
vtd_reset_iotlb_locked(s);
@@ -376,7 +376,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
entry->gfn = gfn;
entry->domain_id = domain_id;
- entry->slpte = slpte;
+ entry->pte = pte;
entry->access_flags = access_flags;
entry->mask = vtd_slpt_level_page_mask(level);
entry->pasid = pasid;
@@ -693,9 +693,9 @@ static inline dma_addr_t vtd_ce_get_slpt_base(VTDContextEntry *ce)
return ce->lo & VTD_CONTEXT_ENTRY_SLPTPTR;
}
-static inline uint64_t vtd_get_slpte_addr(uint64_t slpte, uint8_t aw)
+static inline uint64_t vtd_get_pte_addr(uint64_t slpte, uint8_t aw)
{
- return slpte & VTD_SL_PT_BASE_ADDR_MASK(aw);
+ return slpte & VTD_PT_BASE_ADDR_MASK(aw);
}
/* Whether the pte indicates the address of the page frame */
@@ -1152,11 +1152,11 @@ static int vtd_iova_to_slpte(IntelIOMMUState *s, VTDContextEntry *ce,
*slpte_level = level;
break;
}
- addr = vtd_get_slpte_addr(slpte, aw_bits);
+ addr = vtd_get_pte_addr(slpte, aw_bits);
level--;
}
- xlat = vtd_get_slpte_addr(*slptep, aw_bits);
+ xlat = vtd_get_pte_addr(*slptep, aw_bits);
size = ~vtd_slpt_level_page_mask(level) + 1;
/*
@@ -1343,7 +1343,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
* This is a valid PDE (or even bigger than PDE). We need
* to walk one further level.
*/
- ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+ ret = vtd_page_walk_level(vtd_get_pte_addr(slpte, info->aw),
iova, MIN(iova_next, end), level - 1,
read_cur, write_cur, info);
} else {
@@ -1360,7 +1360,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
event.entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
event.entry.addr_mask = ~subpage_mask;
/* NOTE: this is only meaningful if entry_valid == true */
- event.entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+ event.entry.translated_addr = vtd_get_pte_addr(slpte, info->aw);
event.type = event.entry.perm ? IOMMU_NOTIFIER_MAP :
IOMMU_NOTIFIER_UNMAP;
ret = vtd_page_walk_one(&event, info);
@@ -1883,7 +1883,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
VTDContextEntry ce;
uint8_t bus_num = pci_bus_num(bus);
VTDContextCacheEntry *cc_entry;
- uint64_t slpte, page_mask;
+ uint64_t pte, page_mask;
uint32_t level, pasid = vtd_as->pasid;
uint16_t source_id = PCI_BUILD_BDF(bus_num, devfn);
int ret_fr;
@@ -1904,13 +1904,13 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
cc_entry = &vtd_as->context_cache_entry;
- /* Try to fetch slpte form IOTLB, we don't need RID2PASID logic */
+ /* Try to fetch pte form IOTLB, we don't need RID2PASID logic */
if (!rid2pasid) {
iotlb_entry = vtd_lookup_iotlb(s, source_id, pasid, addr);
if (iotlb_entry) {
- trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+ trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->pte,
iotlb_entry->domain_id);
- slpte = iotlb_entry->slpte;
+ pte = iotlb_entry->pte;
access_flags = iotlb_entry->access_flags;
page_mask = iotlb_entry->mask;
goto out;
@@ -1982,21 +1982,22 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
return true;
}
- /* Try to fetch slpte form IOTLB for RID2PASID slow path */
+ /* Try to fetch pte from IOTLB for RID2PASID slow path */
if (rid2pasid) {
iotlb_entry = vtd_lookup_iotlb(s, source_id, pasid, addr);
if (iotlb_entry) {
- trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+ trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->pte,
iotlb_entry->domain_id);
- slpte = iotlb_entry->slpte;
+ pte = iotlb_entry->pte;
access_flags = iotlb_entry->access_flags;
page_mask = iotlb_entry->mask;
goto out;
}
}
- ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
+ ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
&reads, &writes, s->aw_bits, pasid);
+
if (ret_fr) {
vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
addr, is_write, pasid != PCI_NO_PASID, pasid);
@@ -2006,11 +2007,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
page_mask = vtd_slpt_level_page_mask(level);
access_flags = IOMMU_ACCESS_FLAG(reads, writes);
vtd_update_iotlb(s, source_id, vtd_get_domain_id(s, &ce, pasid),
- addr, slpte, access_flags, level, pasid);
+ addr, pte, access_flags, level, pasid);
out:
vtd_iommu_unlock(s);
entry->iova = addr & page_mask;
- entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
+ entry->translated_addr = vtd_get_pte_addr(pte, s->aw_bits) & page_mask;
entry->addr_mask = ~page_mask;
entry->perm = access_flags;
return true;
* [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Yi Liu <yi.l.liu@intel.com>
This adds stage-1 page table walking to support stage-1-only
translation in scalable modern mode.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 17 +++++
hw/i386/intel_iommu.c | 128 +++++++++++++++++++++++++++++++--
2 files changed, 141 insertions(+), 4 deletions(-)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 0e240d6d54..abfdbd5f65 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -534,6 +534,23 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */
#define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK)
+#define VTD_SM_PASID_ENTRY_FLPM 3ULL
+#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL)
+
+/* Paging Structure common */
+#define VTD_FL_PT_PAGE_SIZE_MASK (1ULL << 7)
+/* Bits to decide the offset for each level */
+#define VTD_FL_LEVEL_BITS 9
+
+/* First Level Paging Structure */
+#define VTD_FL_PT_LEVEL 1
+#define VTD_FL_PT_ENTRY_NR 512
+
+/* Masks for First Level Paging Entry */
+#define VTD_FL_RW_MASK (1ULL << 1)
+#define VTD_FL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
+#define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
+
/* Second Level Page Translation Pointer*/
#define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 544e8f0e40..cf29809bc1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -50,6 +50,8 @@
/* pe operations */
#define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
#define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & VTD_SM_PASID_ENTRY_AW))
+#define VTD_PE_GET_FLPT_LEVEL(pe) \
+ (4 + (((pe)->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM))
/*
* PCI bus number (or SID) is not reliable since the device is usaully
@@ -823,6 +825,11 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
return -VTD_FR_PASID_TABLE_ENTRY_INV;
}
+ if (pgtt == VTD_SM_PASID_ENTRY_FLT &&
+ VTD_PE_GET_FLPT_LEVEL(pe) != 4) {
+ return -VTD_FR_PASID_TABLE_ENTRY_INV;
+ }
+
return 0;
}
@@ -958,7 +965,11 @@ static uint32_t vtd_get_iova_level(IntelIOMMUState *s,
if (s->root_scalable) {
vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
- return VTD_PE_GET_LEVEL(&pe);
+ if (s->scalable_modern) {
+ return VTD_PE_GET_FLPT_LEVEL(&pe);
+ } else {
+ return VTD_PE_GET_LEVEL(&pe);
+ }
}
return vtd_ce_get_level(ce);
@@ -1045,7 +1056,11 @@ static dma_addr_t vtd_get_iova_pgtbl_base(IntelIOMMUState *s,
if (s->root_scalable) {
vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
- return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
+ if (s->scalable_modern) {
+ return pe.val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
+ } else {
+ return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
+ }
}
return vtd_ce_get_slpt_base(ce);
@@ -1847,6 +1862,106 @@ out:
trace_vtd_pt_enable_fast_path(source_id, success);
}
+/* The shift of an addr for a certain level of paging structure */
+static inline uint32_t vtd_flpt_level_shift(uint32_t level)
+{
+ assert(level != 0);
+ return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_FL_LEVEL_BITS;
+}
+
+/*
+ * Given an iova and the level of paging structure, return the offset
+ * of current level.
+ */
+static inline uint32_t vtd_iova_fl_level_offset(uint64_t iova, uint32_t level)
+{
+ return (iova >> vtd_flpt_level_shift(level)) &
+ ((1ULL << VTD_FL_LEVEL_BITS) - 1);
+}
+
+/* Get the content of a flpte located in @base_addr[@index] */
+static uint64_t vtd_get_flpte(dma_addr_t base_addr, uint32_t index)
+{
+ uint64_t flpte;
+
+ assert(index < VTD_FL_PT_ENTRY_NR);
+
+ if (dma_memory_read(&address_space_memory,
+ base_addr + index * sizeof(flpte), &flpte,
+ sizeof(flpte), MEMTXATTRS_UNSPECIFIED)) {
+ flpte = (uint64_t)-1;
+ return flpte;
+ }
+ flpte = le64_to_cpu(flpte);
+ return flpte;
+}
+
+static inline bool vtd_flpte_present(uint64_t flpte)
+{
+ return !!(flpte & 0x1);
+}
+
+/* Whether the pte indicates the address of the page frame */
+static inline bool vtd_is_last_flpte(uint64_t flpte, uint32_t level)
+{
+ return level == VTD_FL_PT_LEVEL || (flpte & VTD_FL_PT_PAGE_SIZE_MASK);
+}
+
+static inline uint64_t vtd_get_flpte_addr(uint64_t flpte, uint8_t aw)
+{
+ return flpte & VTD_FL_PT_BASE_ADDR_MASK(aw);
+}
+
+/*
+ * Given the @iova, get relevant @flptep. @flpte_level will be the last level
+ * of the translation, can be used for deciding the size of large page.
+ */
+static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
+ uint64_t iova, bool is_write,
+ uint64_t *flptep, uint32_t *flpte_level,
+ bool *reads, bool *writes, uint8_t aw_bits,
+ uint32_t pasid)
+{
+ dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
+ uint32_t level = vtd_get_iova_level(s, ce, pasid);
+ uint32_t offset;
+ uint64_t flpte;
+
+ while (true) {
+ offset = vtd_iova_fl_level_offset(iova, level);
+ flpte = vtd_get_flpte(addr, offset);
+ if (flpte == (uint64_t)-1) {
+ if (level == vtd_get_iova_level(s, ce, pasid)) {
+ /* Invalid programming of context-entry */
+ return -VTD_FR_CONTEXT_ENTRY_INV;
+ } else {
+ return -VTD_FR_PAGING_ENTRY_INV;
+ }
+ }
+
+ if (!vtd_flpte_present(flpte)) {
+ *reads = false;
+ *writes = false;
+ return -VTD_FR_PAGING_ENTRY_INV;
+ }
+
+ *reads = true;
+ *writes = (*writes) && (flpte & VTD_FL_RW_MASK);
+ if (is_write && !(flpte & VTD_FL_RW_MASK)) {
+ return -VTD_FR_WRITE;
+ }
+
+ if (vtd_is_last_flpte(flpte, level)) {
+ *flptep = flpte;
+ *flpte_level = level;
+ return 0;
+ }
+
+ addr = vtd_get_flpte_addr(flpte, aw_bits);
+ level--;
+ }
+}
+
static void vtd_report_fault(IntelIOMMUState *s,
int err, bool is_fpd_set,
uint16_t source_id,
@@ -1995,8 +2110,13 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
}
}
- ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
- &reads, &writes, s->aw_bits, pasid);
+ if (s->scalable_modern) {
+ ret_fr = vtd_iova_to_flpte(s, &ce, addr, is_write, &pte, &level,
+ &reads, &writes, s->aw_bits, pasid);
+ } else {
+ ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
+ &reads, &writes, s->aw_bits, pasid);
+ }
if (ret_fr) {
vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
--
2.34.1
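For readers following the walk in vtd_iova_to_flpte(), here is a minimal standalone sketch of a first-stage lookup. Everything named sk_* is hypothetical and not part of QEMU. A flat array stands in for guest memory, the leaf address mask assumes a 52-bit address width, and fault codes are reduced to -1 (entry invalid) and -2 (write through a read-only entry). The level decode mirrors VTD_PE_GET_FLPT_LEVEL (FLPM in bits 3:2 of val[2]). The walk follows the same control flow as the patch: a 9-bit index per level, a present and R/W check, and a leaf at level 1 or when the PS bit (bit 7) is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: constants mirror the patch, not QEMU's headers. */
#define SK_PAGE_SHIFT 12                      /* 4 KiB pages */
#define SK_LEVEL_BITS 9                       /* 512 entries per table */
#define SK_PTE_P      (1ULL << 0)             /* present */
#define SK_PTE_RW     (1ULL << 1)             /* writable */
#define SK_PTE_PS     (1ULL << 7)             /* large page at this level */
#define SK_ADDR_MASK  0x000ffffffffff000ULL   /* assumes 52-bit address width */

/* Toy guest memory: 64-bit words addressed by byte offset. */
typedef struct {
    const uint64_t *words;
    size_t nwords;
} SkMem;

/* Paging levels from the PASID entry's FLPM field, like VTD_PE_GET_FLPT_LEVEL. */
static uint32_t sk_flpt_levels(uint64_t pe_val2)
{
    return 4 + (uint32_t)((pe_val2 >> 2) & 3);
}

/* 9-bit table index used at @level (level 1 is the last level). */
static uint32_t sk_level_offset(uint64_t iova, uint32_t level)
{
    uint32_t shift = SK_PAGE_SHIFT + (level - 1) * SK_LEVEL_BITS;
    return (uint32_t)((iova >> shift) & ((1u << SK_LEVEL_BITS) - 1));
}

/* Read entry @index of the table at byte address @base; UINT64_MAX on a bad read. */
static uint64_t sk_read_pte(const SkMem *m, uint64_t base, uint32_t index)
{
    uint64_t word = base / 8 + index;
    return word < m->nwords ? m->words[word] : UINT64_MAX;
}

/* Walk from @root; on success store the leaf entry and its level. */
static int sk_walk(const SkMem *m, uint64_t root, uint32_t levels,
                   uint64_t iova, bool is_write,
                   uint64_t *pte, uint32_t *pte_level)
{
    uint64_t table = root;

    for (uint32_t level = levels; level >= 1; level--) {
        uint64_t e = sk_read_pte(m, table, sk_level_offset(iova, level));

        if (e == UINT64_MAX || !(e & SK_PTE_P)) {
            return -1;                     /* entry invalid or not present */
        }
        if (is_write && !(e & SK_PTE_RW)) {
            return -2;                     /* write through a read-only entry */
        }
        if (level == 1 || (e & SK_PTE_PS)) {
            *pte = e;                      /* leaf: 4 KiB page or large page */
            *pte_level = level;
            return 0;
        }
        table = e & SK_ADDR_MASK;          /* descend to the next-level table */
    }
    return -1;                             /* levels == 0: nothing to walk */
}
```

With tables at 0x1000..0x4000, IOVA 0x203000 uses index 0 at levels 4 and 3, index 1 at level 2 and index 3 at level 1.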
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH rfcv2 07/17] intel_iommu: check if the input address is canonical
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (5 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 08/17] intel_iommu: set accessed and dirty bits during first stage translation Zhenzhong Duan
` (10 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
First stage translation must fail if the address to translate is
not canonical.
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 2 ++
hw/i386/intel_iommu.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index abfdbd5f65..b6820dbca3 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -323,6 +323,8 @@ typedef enum VTDFaultReason {
VTD_FR_PASID_ENTRY_P = 0x59,
VTD_FR_PASID_TABLE_ENTRY_INV = 0x5b, /*Invalid PASID table entry */
+ VTD_FR_FS_NON_CANONICAL = 0x80, /* SNG.1 : Address for FS not canonical.*/
+
/* Output address in the interrupt address range for scalable mode */
VTD_FR_SM_INTERRUPT_ADDR = 0x87,
VTD_FR_MAX, /* Guard */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index cf29809bc1..1ea030bfbe 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1801,6 +1801,7 @@ static const bool vtd_qualified_faults[] = {
[VTD_FR_PASID_ENTRY_P] = true,
[VTD_FR_PASID_TABLE_ENTRY_INV] = true,
[VTD_FR_SM_INTERRUPT_ADDR] = true,
+ [VTD_FR_FS_NON_CANONICAL] = true,
[VTD_FR_MAX] = false,
};
@@ -1912,6 +1913,20 @@ static inline uint64_t vtd_get_flpte_addr(uint64_t flpte, uint8_t aw)
return flpte & VTD_FL_PT_BASE_ADDR_MASK(aw);
}
+/* Return true if IOVA is canonical, otherwise false. */
+static bool vtd_iova_fl_check_canonical(IntelIOMMUState *s, uint64_t iova,
+ VTDContextEntry *ce, uint32_t pasid)
+{
+ uint64_t iova_limit = vtd_iova_limit(s, ce, s->aw_bits, pasid);
+ uint64_t upper_bits_mask = ~(iova_limit - 1);
+ uint64_t upper_bits = iova & upper_bits_mask;
+ bool msb = ((iova & (iova_limit >> 1)) != 0);
+ return !(
+ (!msb && (upper_bits != 0)) ||
+ (msb && (upper_bits != upper_bits_mask))
+ );
+}
+
/*
* Given the @iova, get relevant @flptep. @flpte_level will be the last level
* of the translation, can be used for deciding the size of large page.
@@ -1927,6 +1942,12 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
uint32_t offset;
uint64_t flpte;
+ if (!vtd_iova_fl_check_canonical(s, iova, ce, pasid)) {
+ error_report_once("%s: detected non-canonical IOVA (iova=0x%" PRIx64 ", "
+ "pasid=0x%" PRIx32 ")", __func__, iova, pasid);
+ return -VTD_FR_FS_NON_CANONICAL;
+ }
+
while (true) {
offset = vtd_iova_fl_level_offset(iova, level);
flpte = vtd_get_flpte(addr, offset);
--
2.34.1
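The canonical-address rule enforced by vtd_iova_fl_check_canonical() can be checked in isolation. The sketch below is a standalone version under stated assumptions. sk_is_canonical() is a hypothetical helper that takes the input address width directly (48 for 4-level paging, 57 for 5-level) instead of deriving it from the context entry through vtd_iova_limit(). An address is canonical when bits 63 down to width-1 are all copies of bit width-1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU API): true if @iova is canonical for an
 * input address width of @width bits (assumes 0 < width < 64), i.e. bits
 * [63:width] all equal bit [width-1].
 */
static bool sk_is_canonical(uint64_t iova, unsigned width)
{
    uint64_t limit = 1ULL << width;
    uint64_t upper_mask = ~(limit - 1);          /* bits [63:width] */
    uint64_t upper = iova & upper_mask;
    bool msb = (iova & (limit >> 1)) != 0;       /* bit [width-1] */

    return msb ? upper == upper_mask : upper == 0;
}
```

For a 48-bit width, 0x00007fffffffffff and 0xffff800000000000 are canonical, while 0x0000800000000000 is not.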
* [PATCH rfcv2 08/17] intel_iommu: set accessed and dirty bits during first stage translation
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (6 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 07/17] intel_iommu: check if the input address is canonical Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation Zhenzhong Duan
` (9 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 3 +++
hw/i386/intel_iommu.c | 25 +++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b6820dbca3..c0a94af820 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -327,6 +327,7 @@ typedef enum VTDFaultReason {
/* Output address in the interrupt address range for scalable mode */
VTD_FR_SM_INTERRUPT_ADDR = 0x87,
+ VTD_FR_FS_BIT_UPDATE_FAILED = 0x91, /* SFS.10 */
VTD_FR_MAX, /* Guard */
} VTDFaultReason;
@@ -547,6 +548,8 @@ typedef struct VTDRootEntry VTDRootEntry;
/* First Level Paging Structure */
#define VTD_FL_PT_LEVEL 1
#define VTD_FL_PT_ENTRY_NR 512
+#define VTD_FL_PTE_A 0x20
+#define VTD_FL_PTE_D 0x40
/* Masks for First Level Paging Entry */
#define VTD_FL_RW_MASK (1ULL << 1)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1ea030bfbe..0801112e2e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1802,6 +1802,7 @@ static const bool vtd_qualified_faults[] = {
[VTD_FR_PASID_TABLE_ENTRY_INV] = true,
[VTD_FR_SM_INTERRUPT_ADDR] = true,
[VTD_FR_FS_NON_CANONICAL] = true,
+ [VTD_FR_FS_BIT_UPDATE_FAILED] = true,
[VTD_FR_MAX] = false,
};
@@ -1927,6 +1928,20 @@ static bool vtd_iova_fl_check_canonical(IntelIOMMUState *s, uint64_t iova,
);
}
+static MemTxResult vtd_set_flag_in_pte(dma_addr_t base_addr, uint32_t index,
+ uint64_t pte, uint64_t flag)
+{
+ if (pte & flag) {
+ return MEMTX_OK;
+ }
+ pte |= flag;
+ pte = cpu_to_le64(pte);
+ return dma_memory_write(&address_space_memory,
+ base_addr + index * sizeof(pte),
+ &pte, sizeof(pte),
+ MEMTXATTRS_UNSPECIFIED);
+}
+
/*
* Given the @iova, get relevant @flptep. @flpte_level will be the last level
* of the translation, can be used for deciding the size of large page.
@@ -1972,7 +1987,17 @@ static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
return -VTD_FR_WRITE;
}
+ if (vtd_set_flag_in_pte(addr, offset, flpte, VTD_FL_PTE_A)
+ != MEMTX_OK) {
+ return -VTD_FR_FS_BIT_UPDATE_FAILED;
+ }
+
if (vtd_is_last_flpte(flpte, level)) {
+ if (is_write &&
+ (vtd_set_flag_in_pte(addr, offset, flpte | VTD_FL_PTE_A, VTD_FL_PTE_D) !=
+ MEMTX_OK)) {
+ return -VTD_FR_FS_BIT_UPDATE_FAILED;
+ }
*flptep = flpte;
*flpte_level = level;
return 0;
--
2.34.1
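The Accessed/Dirty update can be modelled with a toy page table. This is a hypothetical sketch, not QEMU code. An in-memory array replaces dma_memory_write() and the endianness conversion, and a write counter shows that a bit which is already set costs no memory write. Note that the value used for the Dirty update must already include the Accessed bit. Writing back the originally read entry with only D added would clear the A bit that was just set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; names are not QEMU APIs. */
#define SK_FL_PTE_A 0x20ULL   /* bit 5, same value as VTD_FL_PTE_A */
#define SK_FL_PTE_D 0x40ULL   /* bit 6, same value as VTD_FL_PTE_D */

/* Stand-in for guest memory holding one page table. */
typedef struct {
    uint64_t *table;
    size_t n;
    unsigned writes;          /* number of entry write-backs performed */
} SkPtMem;

/* Set @flag in entry @index unless @pte (the caller's copy) already has it. */
static bool sk_set_flag_in_pte(SkPtMem *m, size_t index, uint64_t pte,
                               uint64_t flag)
{
    if (pte & flag) {
        return true;          /* already set: no write needed */
    }
    if (index >= m->n) {
        return false;         /* models a failed DMA write */
    }
    m->table[index] = pte | flag;
    m->writes++;
    return true;
}

/* Mark entry @index accessed, and also dirty when @is_write is set. */
static bool sk_mark_accessed_dirty(SkPtMem *m, size_t index, bool is_write)
{
    if (index >= m->n) {
        return false;
    }
    uint64_t pte = m->table[index];

    if (!sk_set_flag_in_pte(m, index, pte, SK_FL_PTE_A)) {
        return false;
    }
    pte |= SK_FL_PTE_A;       /* keep A in the copy used for the D update */
    if (is_write && !sk_set_flag_in_pte(m, index, pte, SK_FL_PTE_D)) {
        return false;
    }
    return true;
}
```

A write to an entry with neither bit set performs two write-backs and leaves both A and D set.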
* [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (7 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 08/17] intel_iommu: set accessed and dirty bits during first stage translation Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
2024-05-22 6:23 ` [PATCH rfcv2 10/17] intel_iommu: Process PASID-based " Zhenzhong Duan
` (8 subsequent siblings)
17 siblings, 1 reply; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
According to the spec, Page-Selective-within-Domain Invalidation (11b)
behaves as follows:
1. IOTLB entries caching second-stage (PGTT=010b) or pass-through
(PGTT=100b) mappings associated with the specified domain-id and the
input-address range are invalidated.
2. IOTLB entries caching first-stage (PGTT=001b) or nested (PGTT=011b)
mappings associated with the specified domain-id are invalidated.
So, per the spec definition, Page-Selective-within-Domain Invalidation
needs to flush cached first-stage and nested IOTLB entries as well.
Nested translation is not supported yet and pass-through mappings are
never cached, so the IOTLB cache only holds first-stage and second-stage
mappings.
Add a pgtt field to VTDIOTLBEntry to record the PGTT type of each
mapping, and invalidate entries based on it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/i386/intel_iommu.h | 1 +
hw/i386/intel_iommu.c | 20 +++++++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 011f374883..b0d5b5a5be 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -156,6 +156,7 @@ struct VTDIOTLBEntry {
uint64_t pte;
uint64_t mask;
uint8_t access_flags;
+ uint8_t pgtt;
};
/* VT-d Source-ID Qualifier types */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0801112e2e..0078bad9d4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -287,9 +287,21 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
- return (entry->domain_id == info->domain_id) &&
- (((entry->gfn & info->mask) == gfn) ||
- (entry->gfn == gfn_tlb));
+
+ if (entry->domain_id != info->domain_id) {
+ return false;
+ }
+
+ /*
+ * According to spec, IOTLB entries caching first-stage (PGTT=001b) or
+ * nested (PGTT=011b) mappings associated with the specified domain-id are
+ * invalidated. Nested isn't supported yet, so only 001b needs checking.
+ */
+ if (entry->pgtt == VTD_SM_PASID_ENTRY_FLT) {
+ return true;
+ }
+
+ return (entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb;
}
/* Reset all the gen of VTDAddressSpace to zero and set the gen of
@@ -382,6 +394,8 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
entry->access_flags = access_flags;
entry->mask = vtd_slpt_level_page_mask(level);
entry->pasid = pasid;
+ entry->pgtt = s->scalable_modern ? VTD_SM_PASID_ENTRY_FLT
+ : VTD_SM_PASID_ENTRY_SLT;
key->gfn = gfn;
key->sid = source_id;
--
2.34.1
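The matching rule in the updated vtd_hash_remove_by_page() can be exercised on its own. In this hypothetical sketch, SkIotlbEntry and SkPageInv are simplified stand-ins for VTDIOTLBEntry and VTDIOTLBPageInvInfo. The invalidation mask is in 4 KiB frame units, as the caller sets it up, and the entry mask is a byte-address page mask. First-stage entries in the target domain are dropped regardless of address. Second-stage entries are dropped only when their frame falls inside the invalidated range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; types and names are not QEMU APIs. */
#define SK_PAGE_SHIFT 12
#define SK_PGTT_FLT   1   /* first-stage,  PGTT=001b */
#define SK_PGTT_SLT   2   /* second-stage, PGTT=010b */

typedef struct {
    uint16_t domain_id;
    uint8_t pgtt;
    uint64_t gfn;        /* cached frame number */
    uint64_t mask;       /* byte-address page mask, e.g. ~0xfffULL for 4 KiB */
} SkIotlbEntry;

typedef struct {
    uint16_t domain_id;
    uint64_t addr;       /* invalidation base address */
    uint64_t mask;       /* frame mask, ~((1ULL << am) - 1) */
} SkPageInv;

/* Does a Page-Selective-within-Domain invalidation drop @e? */
static bool sk_psi_hits(const SkIotlbEntry *e, const SkPageInv *inv)
{
    uint64_t gfn = (inv->addr >> SK_PAGE_SHIFT) & inv->mask;
    uint64_t gfn_tlb = (inv->addr & e->mask) >> SK_PAGE_SHIFT;

    if (e->domain_id != inv->domain_id) {
        return false;
    }
    if (e->pgtt == SK_PGTT_FLT) {
        return true;     /* first-stage: whole domain, address ignored */
    }
    return (e->gfn & inv->mask) == gfn || e->gfn == gfn_tlb;
}
```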
* [PATCH rfcv2 10/17] intel_iommu: Process PASID-based iotlb invalidation
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (8 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 11/17] intel_iommu: Extract device IOTLB invalidation logic Zhenzhong Duan
` (7 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
The PASID-based IOTLB (piotlb) is used when walking the Intel
VT-d stage-1 page table.
This emulates the stage-1 page table IOTLB invalidation requested
by a PASID-based IOTLB Invalidate Descriptor (P_IOTLB).
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 3 +++
hw/i386/intel_iommu.c | 45 ++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c0a94af820..8a375d038a 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -453,6 +453,9 @@ typedef union VTDInvDesc VTDInvDesc;
#define VTD_INV_DESC_PIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL)
#define VTD_INV_DESC_PIOTLB_DID(val) (((val) >> 16) & \
VTD_DOMAIN_ID_MASK)
+#define VTD_INV_DESC_PIOTLB_ADDR(val) ((val) & ~0xfffULL)
+#define VTD_INV_DESC_PIOTLB_AM(val) ((val) & 0x3fULL)
+#define VTD_INV_DESC_PIOTLB_IH(val) (((val) >> 6) & 0x1)
/* Information about page-selective IOTLB invalidate */
struct VTDIOTLBPageInvInfo {
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 0078bad9d4..f6c429ae4c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -304,6 +304,28 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
return (entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb;
}
+static gboolean vtd_hash_remove_by_page_piotlb(gpointer key, gpointer value,
+ gpointer user_data)
+{
+ VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+ VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+ uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
+ uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
+
+ /*
+ * According to spec, PASID-based-IOTLB Invalidation in page granularity
+ * doesn't invalidate IOTLB entries caching second-stage (PGTT=010b)
+ * or pass-through (PGTT=100b) mappings. Nested isn't supported yet,
+ * so only first-stage (PGTT=001b) mappings need to be checked.
+ */
+ if (entry->pgtt != VTD_SM_PASID_ENTRY_FLT) {
+ return false;
+ }
+
+ return entry->domain_id == info->domain_id && entry->pasid == info->pasid &&
+ ((entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb);
+}
+
/* Reset all the gen of VTDAddressSpace to zero and set the gen of
* IntelIOMMUState to 1. Must be called with IOMMU lock held.
*/
@@ -2866,11 +2888,30 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
}
}
+static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+ uint32_t pasid, hwaddr addr, uint8_t am,
+ bool ih)
+{
+ VTDIOTLBPageInvInfo info;
+
+ info.domain_id = domain_id;
+ info.pasid = pasid;
+ info.addr = addr;
+ info.mask = ~((1ULL << am) - 1);
+
+ vtd_iommu_lock(s);
+ g_hash_table_foreach_remove(s->iotlb,
+ vtd_hash_remove_by_page_piotlb, &info);
+ vtd_iommu_unlock(s);
+}
+
static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
VTDInvDesc *inv_desc)
{
uint16_t domain_id;
uint32_t pasid;
+ uint8_t am;
+ hwaddr addr;
if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) ||
(inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) {
@@ -2887,6 +2928,10 @@ static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
break;
case VTD_INV_DESC_PIOTLB_PSI_IN_PASID:
+ am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]);
+ addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]);
+ vtd_piotlb_page_invalidate(s, domain_id, pasid, addr, am,
+ VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1]));
break;
default:
--
2.34.1
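The fields consumed by vtd_process_piotlb_desc() and vtd_piotlb_page_invalidate() can be decoded as below. This is a hypothetical sketch. The bit positions follow the VTD_INV_DESC_PIOTLB_* macros in this series: DID in bits 31:16 and PASID in bits 51:32 of the first quadword, then ADDR, IH in bit 6 and AM in bits 5:0 of the second. The 16-bit DID mask is an assumption about VTD_DOMAIN_ID_MASK, and granularity and reserved-bit checks are left out. The sketch uses 1ULL when building the AM-derived mask and size, so large AM values cannot overflow an int shift.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; names are not QEMU APIs. */
typedef struct {
    uint16_t did;
    uint32_t pasid;
    uint64_t addr;
    uint8_t am;       /* address mask: 2^am pages */
    bool ih;          /* invalidation hint */
} SkPiotlbInv;

static SkPiotlbInv sk_decode_piotlb(uint64_t val0, uint64_t val1)
{
    SkPiotlbInv d;

    d.did = (uint16_t)((val0 >> 16) & 0xffff);
    d.pasid = (uint32_t)((val0 >> 32) & 0xfffff);
    d.addr = val1 & ~0xfffULL;
    d.am = (uint8_t)(val1 & 0x3f);
    d.ih = (val1 >> 6) & 1;
    return d;
}

/* Frame-number mask, as stored in VTDIOTLBPageInvInfo.mask. */
static uint64_t sk_piotlb_gfn_mask(uint8_t am)
{
    return ~((1ULL << am) - 1);
}

/* Bytes covered by the invalidation (4 KiB pages). */
static uint64_t sk_piotlb_bytes(uint8_t am)
{
    return (1ULL << am) << 12;
}
```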
* [PATCH rfcv2 11/17] intel_iommu: Extract device IOTLB invalidation logic
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (9 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 10/17] intel_iommu: Process PASID-based " Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 12/17] intel_iommu: add an internal API to find an address space with PASID Zhenzhong Duan
` (6 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Philippe Mathieu-Daudé,
Zhenzhong Duan, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
This piece of code can be shared by both device IOTLB invalidation and
PASID-based device IOTLB invalidation.
No functional changes intended.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 57 +++++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 24 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f6c429ae4c..3c14fd85cc 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2955,13 +2955,43 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
return true;
}
+static void do_invalidate_device_tlb(VTDAddressSpace *vtd_dev_as,
+ bool size, hwaddr addr)
+{
+ /*
+ * According to ATS spec table 2.4:
+ * S = 0, bits 15:12 = xxxx range size: 4K
+ * S = 1, bits 15:12 = xxx0 range size: 8K
+ * S = 1, bits 15:12 = xx01 range size: 16K
+ * S = 1, bits 15:12 = x011 range size: 32K
+ * S = 1, bits 15:12 = 0111 range size: 64K
+ * ...
+ */
+
+ IOMMUTLBEvent event;
+ uint64_t sz;
+
+ if (size) {
+ sz = (VTD_PAGE_SIZE * 2) << cto64(addr >> VTD_PAGE_SHIFT);
+ addr &= ~(sz - 1);
+ } else {
+ sz = VTD_PAGE_SIZE;
+ }
+
+ event.type = IOMMU_NOTIFIER_DEVIOTLB_UNMAP;
+ event.entry.target_as = &vtd_dev_as->as;
+ event.entry.addr_mask = sz - 1;
+ event.entry.iova = addr;
+ event.entry.perm = IOMMU_NONE;
+ event.entry.translated_addr = 0;
+ memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
+}
+
static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
VTDInvDesc *inv_desc)
{
VTDAddressSpace *vtd_dev_as;
- IOMMUTLBEvent event;
hwaddr addr;
- uint64_t sz;
uint16_t sid;
bool size;
@@ -2986,28 +3016,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
goto done;
}
- /* According to ATS spec table 2.4:
- * S = 0, bits 15:12 = xxxx range size: 4K
- * S = 1, bits 15:12 = xxx0 range size: 8K
- * S = 1, bits 15:12 = xx01 range size: 16K
- * S = 1, bits 15:12 = x011 range size: 32K
- * S = 1, bits 15:12 = 0111 range size: 64K
- * ...
- */
- if (size) {
- sz = (VTD_PAGE_SIZE * 2) << cto64(addr >> VTD_PAGE_SHIFT);
- addr &= ~(sz - 1);
- } else {
- sz = VTD_PAGE_SIZE;
- }
-
- event.type = IOMMU_NOTIFIER_DEVIOTLB_UNMAP;
- event.entry.target_as = &vtd_dev_as->as;
- event.entry.addr_mask = sz - 1;
- event.entry.iova = addr;
- event.entry.perm = IOMMU_NONE;
- event.entry.translated_addr = 0;
- memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
+ do_invalidate_device_tlb(vtd_dev_as, size, addr);
done:
return true;
--
2.34.1
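The ATS size encoding handled by do_invalidate_device_tlb() is easy to get wrong, so a standalone sketch may help. sk_cto64() is a hypothetical portable stand-in for QEMU's cto64() (count trailing ones), and sk_ats_range() reproduces the computation. With S=0 the range is one 4 KiB page. With S=1, the number of trailing 1 bits starting at bit 12 selects a power-of-two size of 8 KiB or more, and the address is aligned down to that size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; names are not QEMU APIs. */
#define SK_PAGE_SIZE  4096ULL
#define SK_PAGE_SHIFT 12

/* Count trailing one bits (portable stand-in for cto64()). */
static unsigned sk_cto64(uint64_t v)
{
    unsigned n = 0;

    while (v & 1) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Decode an ATS invalidation (S bit + address) into an aligned range. */
static void sk_ats_range(bool s, uint64_t addr, uint64_t *start, uint64_t *size)
{
    uint64_t sz = SK_PAGE_SIZE;

    if (s) {
        sz = (SK_PAGE_SIZE * 2) << sk_cto64(addr >> SK_PAGE_SHIFT);
        addr &= ~(sz - 1);
    }
    *start = addr;
    *size = sz;
}
```

For example, S=1 with bits 15:12 = 0101 (xx01) gives a 16 KiB range, and 0111 gives 64 KiB, matching the table in the comment.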
* [PATCH rfcv2 12/17] intel_iommu: add an internal API to find an address space with PASID
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (10 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 11/17] intel_iommu: Extract device IOTLB invalidation logic Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 13/17] intel_iommu: add support for PASID-based device IOTLB invalidation Zhenzhong Duan
` (5 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
This will be used to implement PASID-based device IOTLB invalidation.
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3c14fd85cc..7ae8df2f49 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,11 @@ struct vtd_as_key {
uint32_t pasid;
};
+struct vtd_as_raw_key {
+ uint16_t sid;
+ uint32_t pasid;
+};
+
struct vtd_iotlb_key {
uint64_t gfn;
uint32_t pasid;
@@ -1856,29 +1861,33 @@ static inline bool vtd_is_interrupt_addr(hwaddr addr)
return VTD_INTERRUPT_ADDR_FIRST <= addr && addr <= VTD_INTERRUPT_ADDR_LAST;
}
-static gboolean vtd_find_as_by_sid(gpointer key, gpointer value,
- gpointer user_data)
+static gboolean vtd_find_as_by_sid_and_pasid(gpointer key, gpointer value,
+ gpointer user_data)
{
struct vtd_as_key *as_key = (struct vtd_as_key *)key;
- uint16_t target_sid = *(uint16_t *)user_data;
+ struct vtd_as_raw_key target = *(struct vtd_as_raw_key *)user_data;
uint16_t sid = PCI_BUILD_BDF(pci_bus_num(as_key->bus), as_key->devfn);
- return sid == target_sid;
+
+ return (as_key->pasid == target.pasid) &&
+ (sid == target.sid);
}
-static VTDAddressSpace *vtd_get_as_by_sid(IntelIOMMUState *s, uint16_t sid)
+static VTDAddressSpace *vtd_get_as_by_sid_and_pasid(IntelIOMMUState *s,
+ uint16_t sid,
+ uint32_t pasid)
{
- uint8_t bus_num = PCI_BUS_NUM(sid);
- VTDAddressSpace *vtd_as = s->vtd_as_cache[bus_num];
-
- if (vtd_as &&
- (sid == PCI_BUILD_BDF(pci_bus_num(vtd_as->bus), vtd_as->devfn))) {
- return vtd_as;
- }
+ struct vtd_as_raw_key key = {
+ .sid = sid,
+ .pasid = pasid
+ };
- vtd_as = g_hash_table_find(s->vtd_address_spaces, vtd_find_as_by_sid, &sid);
- s->vtd_as_cache[bus_num] = vtd_as;
+ return g_hash_table_find(s->vtd_address_spaces,
+ vtd_find_as_by_sid_and_pasid, &key);
+}
- return vtd_as;
+static VTDAddressSpace *vtd_get_as_by_sid(IntelIOMMUState *s, uint16_t sid)
+{
+ return vtd_get_as_by_sid_and_pasid(s, sid, PCI_NO_PASID);
}
static void vtd_pt_enable_fast_path(IntelIOMMUState *s, uint16_t source_id)
--
2.34.1
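The lookup that vtd_get_as_by_sid_and_pasid() performs with g_hash_table_find() amounts to matching both the requester ID and the PASID. This hypothetical sketch makes three substitutions. It replaces the GLib hash table with a plain array. It builds the source ID as bus << 8 | devfn, the layout PCI_BUILD_BDF produces. It uses SK_NO_PASID as an assumed stand-in for PCI_NO_PASID, so a device's default address space is found only when that value is requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; names are not QEMU APIs. */
#define SK_NO_PASID UINT32_MAX   /* assumed stand-in for PCI_NO_PASID */

typedef struct {
    uint8_t bus;
    uint8_t devfn;
    uint32_t pasid;
} SkAsKey;

/* Requester ID: bus in bits 15:8, devfn in bits 7:0. */
static uint16_t sk_build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Return the first address-space key matching both @sid and @pasid, or NULL. */
static const SkAsKey *sk_find_as(const SkAsKey *keys, size_t n,
                                 uint16_t sid, uint32_t pasid)
{
    for (size_t i = 0; i < n; i++) {
        if (keys[i].pasid == pasid &&
            sk_build_bdf(keys[i].bus, keys[i].devfn) == sid) {
            return &keys[i];
        }
    }
    return NULL;
}
```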
* [PATCH rfcv2 13/17] intel_iommu: add support for PASID-based device IOTLB invalidation
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (11 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 12/17] intel_iommu: add an internal API to find an address space with PASID Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 14/17] intel_iommu: piotlb invalidation should notify unmap Zhenzhong Duan
` (4 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 11 ++++++++
hw/i386/intel_iommu.c | 50 ++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 8a375d038a..5831aa4d82 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -378,6 +378,7 @@ typedef union VTDInvDesc VTDInvDesc;
#define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */
#define VTD_INV_DESC_PIOTLB 0x6 /* PASID-IOTLB Invalidate Desc */
#define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */
+#define VTD_INV_DESC_DEV_PIOTLB 0x8 /* PASID-based-DIOTLB inv_desc*/
#define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */
/* Masks for Invalidation Wait Descriptor*/
@@ -421,6 +422,16 @@ typedef union VTDInvDesc VTDInvDesc;
#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_HI 0xffeULL
#define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffff0000ffe0fff8
+/* Mask for PASID Device IOTLB Invalidate Descriptor */
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_ADDR(val) ((val) & \
+ 0xfffffffffffff000ULL)
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_SIZE(val) (((val) >> 11) & 0x1)
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_GLOBAL(val) ((val) & 0x1)
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_SID(val) (((val) >> 16) & 0xffffULL)
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_PASID(val) (((val) >> 32) & 0xfffffULL)
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_RSVD_HI 0x7feULL
+#define VTD_INV_DESC_PASID_DEVICE_IOTLB_RSVD_LO 0xfff000000000f000ULL
+
/* Rsvd field masks for spte */
#define VTD_SPTE_SNP 0x800ULL
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7ae8df2f49..de4e8afcf9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2996,6 +2996,49 @@ static void do_invalidate_device_tlb(VTDAddressSpace *vtd_dev_as,
memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
}
+static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+ uint16_t sid;
+ VTDAddressSpace *vtd_dev_as;
+ bool size;
+ bool global;
+ hwaddr addr;
+ uint32_t pasid;
+
+ if ((inv_desc->hi & VTD_INV_DESC_PASID_DEVICE_IOTLB_RSVD_HI) ||
+ (inv_desc->lo & VTD_INV_DESC_PASID_DEVICE_IOTLB_RSVD_LO)) {
+ error_report_once("%s: invalid pasid-based dev iotlb inv desc: "
+ "hi=0x%" PRIx64 ", lo=0x%" PRIx64 " (reserved nonzero)",
+ __func__, inv_desc->hi, inv_desc->lo);
+ return false;
+ }
+
+ global = VTD_INV_DESC_PASID_DEVICE_IOTLB_GLOBAL(inv_desc->hi);
+ size = VTD_INV_DESC_PASID_DEVICE_IOTLB_SIZE(inv_desc->hi);
+ addr = VTD_INV_DESC_PASID_DEVICE_IOTLB_ADDR(inv_desc->hi);
+ sid = VTD_INV_DESC_PASID_DEVICE_IOTLB_SID(inv_desc->lo);
+ if (global) {
+ QLIST_FOREACH(vtd_dev_as, &s->vtd_as_with_notifiers, next) {
+ if ((vtd_dev_as->pasid != PCI_NO_PASID) &&
+ (PCI_BUILD_BDF(pci_bus_num(vtd_dev_as->bus),
+ vtd_dev_as->devfn) == sid)) {
+ do_invalidate_device_tlb(vtd_dev_as, size, addr);
+ }
+ }
+ } else {
+ pasid = VTD_INV_DESC_PASID_DEVICE_IOTLB_PASID(inv_desc->lo);
+ vtd_dev_as = vtd_get_as_by_sid_and_pasid(s, sid, pasid);
+ if (!vtd_dev_as) {
+ return true;
+ }
+
+ do_invalidate_device_tlb(vtd_dev_as, size, addr);
+ }
+
+ return true;
+}
+
static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
VTDInvDesc *inv_desc)
{
@@ -3090,6 +3133,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
}
break;
+ case VTD_INV_DESC_DEV_PIOTLB:
+ trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo);
+ if (!vtd_process_device_piotlb_desc(s, &inv_desc)) {
+ return false;
+ }
+ break;
+
case VTD_INV_DESC_DEVICE:
trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo);
if (!vtd_process_device_iotlb_desc(s, &inv_desc)) {
--
2.34.1
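The descriptor layout that vtd_process_device_piotlb_desc() relies on can be summarised as a decoder. This hypothetical sketch reuses the bit positions and reserved masks of the VTD_INV_DESC_PASID_DEVICE_IOTLB_* macros added in this patch, and returns false when a reserved bit is set. It does not perform the address-space lookup or the global-invalidation loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; names are not QEMU APIs. */
typedef struct {
    uint64_t addr;     /* hi[63:12] */
    bool size;         /* hi[11], S bit of the ATS size encoding */
    bool global;       /* hi[0], invalidate all PASIDs of the device */
    uint16_t sid;      /* lo[31:16] */
    uint32_t pasid;    /* lo[51:32] */
} SkDevPiotlb;

static bool sk_decode_dev_piotlb(uint64_t lo, uint64_t hi, SkDevPiotlb *d)
{
    if ((hi & 0x7feULL) || (lo & 0xfff000000000f000ULL)) {
        return false;                     /* reserved bits set */
    }
    d->addr = hi & 0xfffffffffffff000ULL;
    d->size = (hi >> 11) & 1;
    d->global = hi & 1;
    d->sid = (uint16_t)((lo >> 16) & 0xffff);
    d->pasid = (uint32_t)((lo >> 32) & 0xfffff);
    return true;
}
```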
* [PATCH rfcv2 14/17] intel_iommu: piotlb invalidation should notify unmap
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (12 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 13/17] intel_iommu: add support for PASID-based device IOTLB invalidation Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modern mode Zhenzhong Duan
` (3 subsequent siblings)
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Yi Sun, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
Some emulated devices cache address translation results. When the
guest issues a piotlb invalidation, those caches should be refreshed,
so send an UNMAP notification to the affected address spaces.
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index de4e8afcf9..e07daaba99 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2890,7 +2890,7 @@ static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s,
continue;
}
- if (!s->scalable_modern) {
+ if (!s->scalable_modern || !vtd_as_has_map_notifier(vtd_as)) {
vtd_address_space_sync(vtd_as);
}
}
@@ -2902,6 +2902,9 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
bool ih)
{
VTDIOTLBPageInvInfo info;
+ VTDAddressSpace *vtd_as;
+ VTDContextEntry ce;
+ hwaddr size = (1 << am) * VTD_PAGE_SIZE;
info.domain_id = domain_id;
info.pasid = pasid;
@@ -2912,6 +2915,36 @@ static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
g_hash_table_foreach_remove(s->iotlb,
vtd_hash_remove_by_page_piotlb, &info);
vtd_iommu_unlock(s);
+
+ QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
+ if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+ vtd_as->devfn, &ce) &&
+ domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid)) {
+ uint32_t rid2pasid = VTD_CE_GET_RID2PASID(&ce);
+ IOMMUTLBEvent event;
+
+ if ((vtd_as->pasid != PCI_NO_PASID || pasid != rid2pasid) &&
+ vtd_as->pasid != pasid) {
+ continue;
+ }
+
+ /*
+ * Page-Selective-within-PASID PASID-based-IOTLB Invalidation
+ * does not flush stage-2 entries. See spec section 6.5.2.4
+ */
+ if (!s->scalable_modern) {
+ continue;
+ }
+
+ event.type = IOMMU_NOTIFIER_UNMAP;
+ event.entry.target_as = &address_space_memory;
+ event.entry.iova = addr;
+ event.entry.perm = IOMMU_NONE;
+ event.entry.addr_mask = size - 1;
+ event.entry.translated_addr = 0;
+ memory_region_notify_iommu(&vtd_as->iommu, 0, event);
+ }
+ }
}
static bool vtd_process_piotlb_desc(IntelIOMMUState *s,
--
2.34.1
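The PASID filter and the unmap range built in vtd_piotlb_page_invalidate() above can be modeled outside QEMU as a minimal sketch. It assumes SKETCH_NO_PASID stands in for PCI_NO_PASID and SKETCH_PAGE_SIZE for a 4 KiB VTD_PAGE_SIZE; the function names are hypothetical, and only the PASID check and the addr_mask computation are modeled, not the context-entry lookup or the notifier call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's PCI_NO_PASID and VTD_PAGE_SIZE (4 KiB). */
#define SKETCH_NO_PASID  UINT32_MAX
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Models the PASID filter in the notifier loop: notify an address space
 * bound to @inv_pasid, or one without a PASID when @inv_pasid equals the
 * RID2PASID value taken from its context entry.
 */
static bool sketch_piotlb_should_notify(uint32_t as_pasid, uint32_t inv_pasid,
                                        uint32_t rid2pasid)
{
    if ((as_pasid != SKETCH_NO_PASID || inv_pasid != rid2pasid) &&
        as_pasid != inv_pasid) {
        return false;
    }
    return true;
}

/* addr_mask of the UNMAP event for an invalidation covering 2^am pages. */
static uint64_t sketch_piotlb_unmap_mask(uint8_t am)
{
    assert(am < 52); /* keep the 64-bit size from overflowing */
    uint64_t size = (1ULL << am) * SKETCH_PAGE_SIZE;
    return size - 1;
}
```

For example, with am = 9 the event covers 2 MiB and addr_mask is 0x1fffff. The sketch shifts 1ULL so the size is computed in 64-bit arithmetic.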
* [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modern mode
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (13 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 14/17] intel_iommu: piotlb invalidation should notify unmap Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-24 13:56 ` CLEMENT MATHIEU--DRIF
2024-05-22 6:23 ` [PATCH rfcv2 16/17] intel_iommu: Modify x-scalable-mode to be string option Zhenzhong Duan
` (2 subsequent siblings)
17 siblings, 1 reply; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
According to the VT-d spec, the stage-1 page table can support 4-level
and 5-level paging.
However, 5-level paging translation emulation is not supported yet,
which means the only supported value for aw_bits is 48.
So default aw_bits to 48 in scalable modern mode. In other cases,
it still defaults to 39 for compatibility.
Add a check to ensure that a user-specified value is 48 in modern mode
for now.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e07daaba99..a4c241ea96 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3748,7 +3748,7 @@ static Property vtd_properties[] = {
ON_OFF_AUTO_AUTO),
DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
- VTD_HOST_ADDRESS_WIDTH),
+ 0xff),
DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
@@ -4663,6 +4663,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
}
}
+ if (s->aw_bits == 0xff) {
+ if (s->scalable_modern) {
+ s->aw_bits = VTD_HOST_AW_48BIT;
+ } else {
+ s->aw_bits = VTD_HOST_AW_39BIT;
+ }
+ }
+
if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
(s->aw_bits != VTD_HOST_AW_48BIT) &&
!s->scalable_modern) {
@@ -4671,6 +4679,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
return false;
}
+ if ((s->aw_bits != VTD_HOST_AW_48BIT) && s->scalable_modern) {
+ error_setg(errp, "Supported values for aw-bits are: %d",
+ VTD_HOST_AW_48BIT);
+ return false;
+ }
+
if (s->scalable_mode && !s->dma_drain) {
error_setg(errp, "Need to set dma_drain for scalable mode");
return false;
--
2.34.1
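Following the review suggestion to give the 0xff sentinel a name, the default-then-validate logic of this patch can be sketched as a standalone function. SKETCH_AW_BITS_AUTO and the other SKETCH_ names are hypothetical, and the error strings are illustrative rather than the exact messages QEMU prints.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AW_39BIT     39
#define SKETCH_AW_48BIT     48
#define SKETCH_AW_BITS_AUTO 0xff /* hypothetical name for "not set by the user" */

/*
 * Resolve the default address width first, then validate it.
 * Returns NULL on success, or an error message on failure.
 */
static const char *sketch_decide_aw_bits(uint8_t *aw_bits, bool modern)
{
    if (*aw_bits == SKETCH_AW_BITS_AUTO) {
        *aw_bits = modern ? SKETCH_AW_48BIT : SKETCH_AW_39BIT;
    }
    if (modern && *aw_bits != SKETCH_AW_48BIT) {
        return "Supported values for aw-bits in modern mode are: 48";
    }
    if (!modern && *aw_bits != SKETCH_AW_39BIT &&
        *aw_bits != SKETCH_AW_48BIT) {
        return "Supported values for aw-bits are: 39, 48";
    }
    return NULL;
}
```

Resolving the default before validating means one set of checks covers both user-supplied and defaulted values.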
* [PATCH rfcv2 16/17] intel_iommu: Modify x-scalable-mode to be string option
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (14 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modern mode Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test Zhenzhong Duan
2024-05-22 8:10 ` [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Jason Wang
17 siblings, 0 replies; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
From: Yi Liu <yi.l.liu@intel.com>
Intel VT-d 3.0 introduces scalable mode, which comes with a number of
capabilities related to scalable-mode translation, so many combinations
are possible. This vIOMMU implementation simplifies the choice for the
user by providing typical combinations, selected with the
"x-scalable-mode" option. The usage is as below:
"-device intel-iommu,x-scalable-mode=["legacy"|"modern"|"off"]"
- "legacy": gives support for stage-2 page table
- "modern": gives support for stage-1 page table
- "off": no scalable mode support
- if not configured: no scalable mode support; any other value is
rejected with an error
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/i386/intel_iommu.h | 1 +
hw/i386/intel_iommu.c | 24 +++++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b0d5b5a5be..dd032b1081 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -264,6 +264,7 @@ struct IntelIOMMUState {
bool caching_mode; /* RO - is cap CM enabled? */
bool scalable_mode; /* RO - is Scalable Mode supported? */
+ char *scalable_mode_str; /* RO - admin's Scalable Mode config */
bool scalable_modern; /* RO - is modern SM supported? */
bool snoop_control; /* RO - is SNP filed supported? */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a4c241ea96..1bd91fcf4c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3750,7 +3750,7 @@ static Property vtd_properties[] = {
DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
0xff),
DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
- DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
+ DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode_str),
DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
DEFINE_PROP_BOOL("x-pasid-mode", IntelIOMMUState, pasid, false),
DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true),
@@ -4663,6 +4663,28 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
}
}
+ if (s->scalable_mode_str &&
+ (strcmp(s->scalable_mode_str, "off") &&
+ strcmp(s->scalable_mode_str, "modern") &&
+ strcmp(s->scalable_mode_str, "legacy"))) {
+ error_setg(errp, "Invalid x-scalable-mode config, "
+ "please use \"modern\", \"legacy\" or \"off\"");
+ return false;
+ }
+
+ if (s->scalable_mode_str &&
+ !strcmp(s->scalable_mode_str, "legacy")) {
+ s->scalable_mode = true;
+ s->scalable_modern = false;
+ } else if (s->scalable_mode_str &&
+ !strcmp(s->scalable_mode_str, "modern")) {
+ s->scalable_mode = true;
+ s->scalable_modern = true;
+ } else {
+ s->scalable_mode = false;
+ s->scalable_modern = false;
+ }
+
if (s->aw_bits == 0xff) {
if (s->scalable_modern) {
s->aw_bits = VTD_HOST_AW_48BIT;
--
2.34.1
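The strcmp() chain in this patch maps each accepted string to a (scalable_mode, scalable_modern) pair. A table-driven variant of the same mapping is sketched here as a hypothetical alternative, not code from this series; the SKETCH_ names are invented for illustration. Like the patch, it treats an unset option (NULL) as "off" and rejects any other string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mapping each accepted value to the two state flags. */
typedef struct {
    const char *name;
    bool scalable_mode;
    bool scalable_modern;
} SketchSMConfig;

static const SketchSMConfig sketch_sm_configs[] = {
    { "off",    false, false },
    { "legacy", true,  false },
    { "modern", true,  true  },
};

/*
 * Returns false for an unrecognized string. An unset option (NULL)
 * behaves like "off".
 */
static bool sketch_parse_scalable_mode(const char *str, bool *mode,
                                       bool *modern)
{
    *mode = false;
    *modern = false;
    if (str == NULL) {
        return true;
    }
    for (size_t i = 0;
         i < sizeof(sketch_sm_configs) / sizeof(sketch_sm_configs[0]); i++) {
        if (strcmp(str, sketch_sm_configs[i].name) == 0) {
            *mode = sketch_sm_configs[i].scalable_mode;
            *modern = sketch_sm_configs[i].scalable_modern;
            return true;
        }
    }
    return false;
}
```

With this layout, adding a mode only needs a new table row, and the error path stays in one place.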
* [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (15 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 16/17] intel_iommu: Modify x-scalable-mode to be string option Zhenzhong Duan
@ 2024-05-22 6:23 ` Zhenzhong Duan
2024-05-22 12:46 ` Thomas Huth
2024-05-22 8:10 ` [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Jason Wang
17 siblings, 1 reply; 29+ messages in thread
From: Zhenzhong Duan @ 2024-05-22 6:23 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Zhenzhong Duan, Thomas Huth,
Laurent Vivier, Paolo Bonzini
Add the framework to test the intel-iommu device.
Currently it only checks that the cap/ecap bits are correct in scalable
modern mode, and that the register contents are identical before
and after a system reset.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
MAINTAINERS | 1 +
tests/qtest/intel-iommu-test.c | 63 ++++++++++++++++++++++++++++++++++
tests/qtest/meson.build | 1 +
3 files changed, 65 insertions(+)
create mode 100644 tests/qtest/intel-iommu-test.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 5dab60bd04..f1ef6128c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3656,6 +3656,7 @@ S: Supported
F: hw/i386/intel_iommu.c
F: hw/i386/intel_iommu_internal.h
F: include/hw/i386/intel_iommu.h
+F: tests/qtest/intel-iommu-test.c
AMD-Vi Emulation
S: Orphan
diff --git a/tests/qtest/intel-iommu-test.c b/tests/qtest/intel-iommu-test.c
new file mode 100644
index 0000000000..e1273bce14
--- /dev/null
+++ b/tests/qtest/intel-iommu-test.c
@@ -0,0 +1,63 @@
+/*
+ * QTest testcase for intel-iommu
+ *
+ * Copyright (c) 2024 Intel, Inc.
+ *
+ * Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+#include "hw/i386/intel_iommu_internal.h"
+
+#define vtd_reg_readl(offset) (readq(Q35_HOST_BRIDGE_IOMMU_ADDR + offset))
+#define CAP_MODERN_FIXED1 (VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | \
+ VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS)
+#define ECAP_MODERN_FIXED1 (VTD_ECAP_QI | VTD_ECAP_IRO | VTD_ECAP_MHMV | \
+ VTD_ECAP_SMTS | VTD_ECAP_FLTS)
+
+static void test_intel_iommu_modern(void)
+{
+ uint8_t init_csr[DMAR_REG_SIZE]; /* register values */
+ uint8_t post_reset_csr[DMAR_REG_SIZE]; /* register values */
+ uint64_t cap, ecap, tmp;
+
+ qtest_start("-M q35 -device intel-iommu,x-scalable-mode=modern");
+
+ g_assert(vtd_reg_readl(DMAR_VER_REG) == 0x30);
+
+ cap = vtd_reg_readl(DMAR_CAP_REG);
+ g_assert((cap & CAP_MODERN_FIXED1) == CAP_MODERN_FIXED1);
+
+ tmp = cap & VTD_CAP_SAGAW_MASK;
+ g_assert(tmp == (VTD_CAP_SAGAW_39bit | VTD_CAP_SAGAW_48bit));
+
+ tmp = VTD_MGAW_FROM_CAP(cap);
+ g_assert(tmp == VTD_HOST_AW_48BIT - 1);
+
+ ecap = vtd_reg_readl(DMAR_ECAP_REG);
+ g_assert((ecap & ECAP_MODERN_FIXED1) == ECAP_MODERN_FIXED1);
+ g_assert(ecap & VTD_ECAP_IR);
+
+ memread(Q35_HOST_BRIDGE_IOMMU_ADDR, init_csr, DMAR_REG_SIZE);
+
+ qobject_unref(qmp("{ 'execute': 'system_reset' }"));
+ qmp_eventwait("RESET");
+
+ memread(Q35_HOST_BRIDGE_IOMMU_ADDR, post_reset_csr, DMAR_REG_SIZE);
+ /* Ensure registers are consistent after hard reset */
+ g_assert(!memcmp(init_csr, post_reset_csr, DMAR_REG_SIZE));
+
+ qtest_end();
+}
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+ qtest_add_func("/q35/intel-iommu/modern", test_intel_iommu_modern);
+
+ return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6f2f594ace..09106739d2 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -79,6 +79,7 @@ qtests_i386 = \
(config_all_devices.has_key('CONFIG_SB16') ? ['fuzz-sb16-test'] : []) + \
(config_all_devices.has_key('CONFIG_SDHCI_PCI') ? ['fuzz-sdcard-test'] : []) + \
(config_all_devices.has_key('CONFIG_ESP_PCI') ? ['am53c974-test'] : []) + \
+ (config_all_devices.has_key('CONFIG_VTD') ? ['intel-iommu-test'] : []) + \
(host_os != 'windows' and \
config_all_devices.has_key('CONFIG_ACPI_ERST') ? ['erst-test'] : []) + \
(config_all_devices.has_key('CONFIG_PCIE_PORT') and \
--
2.34.1
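The SAGAW and MGAW assertions in test_intel_iommu_modern() decode two fields of the VT-d capability register. A standalone model of that encoding is sketched here. The field positions follow the VT-d spec layout (SAGAW in bits 12:8, MGAW in bits 21:16, stored as width minus one), while the SKETCH_ macros and helpers are hypothetical and not taken from QEMU's headers.

```c
#include <stdint.h>

/* CAP_REG field positions per the VT-d spec; SKETCH_ names are hypothetical. */
#define SKETCH_CAP_SAGAW_SHIFT 8
#define SKETCH_CAP_SAGAW_MASK  (0x1fULL << SKETCH_CAP_SAGAW_SHIFT) /* bits 12:8 */
#define SKETCH_CAP_SAGAW_39BIT (0x2ULL << SKETCH_CAP_SAGAW_SHIFT)  /* 3-level */
#define SKETCH_CAP_SAGAW_48BIT (0x4ULL << SKETCH_CAP_SAGAW_SHIFT)  /* 4-level */
#define SKETCH_CAP_MGAW_SHIFT  16                                  /* bits 21:16 */

/* Build a CAP value; MGAW is encoded as (address width - 1). */
static uint64_t sketch_cap_encode(unsigned aw, uint64_t sagaw)
{
    return ((uint64_t)((aw - 1) & 0x3f) << SKETCH_CAP_MGAW_SHIFT) | sagaw;
}

/* Extract the raw MGAW field, i.e. the address width minus one. */
static unsigned sketch_cap_mgaw(uint64_t cap)
{
    return (unsigned)((cap >> SKETCH_CAP_MGAW_SHIFT) & 0x3f);
}
```

Encoding a 48-bit width with both SAGAW bits set yields an MGAW field of 47, which is why the test compares against VTD_HOST_AW_48BIT - 1.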
* Re: [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
` (16 preceding siblings ...)
2024-05-22 6:23 ` [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test Zhenzhong Duan
@ 2024-05-22 8:10 ` Jason Wang
2024-05-23 9:35 ` Duan, Zhenzhong
17 siblings, 1 reply; 29+ messages in thread
From: Jason Wang @ 2024-05-22 8:10 UTC (permalink / raw)
To: Zhenzhong Duan
Cc: qemu-devel, alex.williamson, clg, eric.auger, mst, peterx, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng
On Wed, May 22, 2024 at 2:25 PM Zhenzhong Duan <zhenzhong.duan@intel.com> wrote:
>
> Hi,
>
> Per Jason Wang's suggestion, iommufd nesting series[1] is split into
> "Enable stage-1 translation for emulated device" series and
> "Enable stage-1 translation for passthrough device" series.
>
> This series enables stage-1 translation support for emulated device
> in intel iommu which we called "modern" mode.
Btw, I think we never merge RFC patches, so I guess this series could
be sent as a formal one for the next version.
Thanks
* Re: [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test
2024-05-22 6:23 ` [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test Zhenzhong Duan
@ 2024-05-22 12:46 ` Thomas Huth
2024-05-23 9:46 ` Duan, Zhenzhong
0 siblings, 1 reply; 29+ messages in thread
From: Thomas Huth @ 2024-05-22 12:46 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, eric.auger, mst, peterx, jasowang, jgg,
nicolinc, joao.m.martins, clement.mathieu--drif, kevin.tian,
yi.l.liu, chao.p.peng, Laurent Vivier, Paolo Bonzini,
Michael S. Tsirkin
On 22/05/2024 08.23, Zhenzhong Duan wrote:
> Add the framework to test the intel-iommu device.
>
> Currently only tested cap/ecap bits correctness in scalable
> modern mode. Also tested cap/ecap bits consistency before
> and after system reset.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> MAINTAINERS | 1 +
> tests/qtest/intel-iommu-test.c | 63 ++++++++++++++++++++++++++++++++++
> tests/qtest/meson.build | 1 +
> 3 files changed, 65 insertions(+)
> create mode 100644 tests/qtest/intel-iommu-test.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5dab60bd04..f1ef6128c8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3656,6 +3656,7 @@ S: Supported
> F: hw/i386/intel_iommu.c
> F: hw/i386/intel_iommu_internal.h
> F: include/hw/i386/intel_iommu.h
> +F: tests/qtest/intel-iommu-test.c
>
> AMD-Vi Emulation
> S: Orphan
> diff --git a/tests/qtest/intel-iommu-test.c b/tests/qtest/intel-iommu-test.c
> new file mode 100644
> index 0000000000..e1273bce14
> --- /dev/null
> +++ b/tests/qtest/intel-iommu-test.c
> @@ -0,0 +1,63 @@
> +/*
> + * QTest testcase for intel-iommu
> + *
> + * Copyright (c) 2024 Intel, Inc.
> + *
> + * Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest-single.h"
It's a little bit nicer to write new tests without libqtest-single.h (e.g.
in case you ever add migration tests later, you must not use anything that
uses a global state), so I'd recommend to use "qts = qtest_init(...)"
instead of qtest_start(...) and then to use the functions with the "qtest_"
prefix instead of the other functions from libqtest-single.h ... but it's
only a recommendation, up to you whether you want to respin your patch with
it or not.
Anyway:
Acked-by: Thomas Huth <thuth@redhat.com>
Do you want me to pick this up through the qtest tree, or shall this go
through some x86-related tree instead?
Thomas
* RE: [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device
2024-05-22 8:10 ` [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Jason Wang
@ 2024-05-23 9:35 ` Duan, Zhenzhong
0 siblings, 0 replies; 29+ messages in thread
From: Duan, Zhenzhong @ 2024-05-23 9:35 UTC (permalink / raw)
To: Jason Wang
Cc: qemu-devel@nongnu.org, alex.williamson@redhat.com, clg@redhat.com,
eric.auger@redhat.com, mst@redhat.com, peterx@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@eviden.com, Tian, Kevin, Liu, Yi L,
Peng, Chao P
>-----Original Message-----
>From: Jason Wang <jasowang@redhat.com>
>Subject: Re: [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device
>
>On Wed, May 22, 2024 at 2:25 PM Zhenzhong Duan <zhenzhong.duan@intel.com> wrote:
>>
>> Hi,
>>
>> Per Jason Wang's suggestion, iommufd nesting series[1] is split into
>> "Enable stage-1 translation for emulated device" series and
>> "Enable stage-1 translation for passthrough device" series.
>>
>> This series enables stage-1 translation support for emulated device
>> in intel iommu which we called "modern" mode.
>
>Btw, I think we never merge RFC patches so I guess this series could
>be sent as formal one for the next version.
Got it, will do.
Thanks
Zhenzhong
* RE: [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test
2024-05-22 12:46 ` Thomas Huth
@ 2024-05-23 9:46 ` Duan, Zhenzhong
0 siblings, 0 replies; 29+ messages in thread
From: Duan, Zhenzhong @ 2024-05-23 9:46 UTC (permalink / raw)
To: Thomas Huth, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
clement.mathieu--drif@eviden.com, Tian, Kevin, Liu, Yi L,
Peng, Chao P, Laurent Vivier, Paolo Bonzini, Michael S. Tsirkin
>-----Original Message-----
>From: Thomas Huth <thuth@redhat.com>
>Subject: Re: [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test
>
>On 22/05/2024 08.23, Zhenzhong Duan wrote:
>> Add the framework to test the intel-iommu device.
>>
>> Currently only tested cap/ecap bits correctness in scalable
>> modern mode. Also tested cap/ecap bits consistency before
>> and after system reset.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> MAINTAINERS | 1 +
>> tests/qtest/intel-iommu-test.c | 63 ++++++++++++++++++++++++++++++++++
>> tests/qtest/meson.build | 1 +
>> 3 files changed, 65 insertions(+)
>> create mode 100644 tests/qtest/intel-iommu-test.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 5dab60bd04..f1ef6128c8 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3656,6 +3656,7 @@ S: Supported
>> F: hw/i386/intel_iommu.c
>> F: hw/i386/intel_iommu_internal.h
>> F: include/hw/i386/intel_iommu.h
>> +F: tests/qtest/intel-iommu-test.c
>>
>> AMD-Vi Emulation
>> S: Orphan
>> diff --git a/tests/qtest/intel-iommu-test.c b/tests/qtest/intel-iommu-test.c
>> new file mode 100644
>> index 0000000000..e1273bce14
>> --- /dev/null
>> +++ b/tests/qtest/intel-iommu-test.c
>> @@ -0,0 +1,63 @@
>> +/*
>> + * QTest testcase for intel-iommu
>> + *
>> + * Copyright (c) 2024 Intel, Inc.
>> + *
>> + * Author: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "libqtest-single.h"
>
>It's a little bit nicer to write new tests without libqtest-single.h (e.g.
>in case you ever add migration tests later, you must not use anything that
>uses a global state), so I'd recommend to use "qts = qtest_init(...)"
>instead of qtest_start(...) and then to use the functions with the "qtest_"
>prefix instead of the other functions from libqtest-single.h ... but it's
>only a recommendation, up to you whether you want to respin your patch
>with it or not.
Got it, I'll fix it in next version.
>
>Anyway:
>Acked-by: Thomas Huth <thuth@redhat.com>
>
>Do you want me to pick this up through the qtest tree, or shall this go
>through some x86-related tree instead?
This patch depends on other functional patches in this series,
so maybe going through the x86-related tree with the others is better.
Thanks
Zhenzhong
* Re: [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modern mode
2024-05-22 6:23 ` [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modern mode Zhenzhong Duan
@ 2024-05-24 13:56 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:16 ` Duan, Zhenzhong
0 siblings, 1 reply; 29+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-05-24 13:56 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
kevin.tian@intel.com, yi.l.liu@intel.com, chao.p.peng@intel.com,
Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
Hi Zhenzhong
On 22/05/2024 08:23, Zhenzhong Duan wrote:
> According to VTD spec, stage-1 page table could support 4-level and
> 5-level paging.
>
> However, 5-level paging translation emulation is unsupported yet.
> That means the only supported value for aw_bits is 48.
>
> So default aw_bits to 48 in scalable modern mode. In other cases,
> it is still default to 39 for compatibility.
>
> Add a check to ensure user specified value is 48 in modern mode
> for now.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index e07daaba99..a4c241ea96 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3748,7 +3748,7 @@ static Property vtd_properties[] = {
> ON_OFF_AUTO_AUTO),
> DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
> DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
> - VTD_HOST_ADDRESS_WIDTH),
> + 0xff),
you could define a constant for this invalid value
> DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
> DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
> DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
> @@ -4663,6 +4663,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> }
> }
>
> + if (s->aw_bits == 0xff) {
> + if (s->scalable_modern) {
> + s->aw_bits = VTD_HOST_AW_48BIT;
> + } else {
> + s->aw_bits = VTD_HOST_AW_39BIT;
> + }
> + }
> +
> if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
> (s->aw_bits != VTD_HOST_AW_48BIT) &&
> !s->scalable_modern) {
> @@ -4671,6 +4679,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> return false;
> }
>
> + if ((s->aw_bits != VTD_HOST_AW_48BIT) && s->scalable_modern) {
> + error_setg(errp, "Supported values for aw-bits are: %d",
specify 'in modern mode' in the message?
> + VTD_HOST_AW_48BIT);
> + return false;
> + }
> +
> if (s->scalable_mode && !s->dma_drain) {
> error_setg(errp, "Need to set dma_drain for scalable mode");
> return false;
> --
> 2.34.1
>
* Re: [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation
2024-05-22 6:23 ` [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation Zhenzhong Duan
@ 2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:17 ` Duan, Zhenzhong
0 siblings, 1 reply; 29+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-05-24 13:57 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
kevin.tian@intel.com, yi.l.liu@intel.com, chao.p.peng@intel.com,
Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
Hi Zhenzhong
On 22/05/2024 08:23, Zhenzhong Duan wrote:
> According to spec, Page-Selective-within-Domain Invalidation (11b):
>
> 1. IOTLB entries caching second-stage mappings (PGTT=010b) or pass-through
> (PGTT=100b) mappings associated with the specified domain-id and the
> input-address range are invalidated.
> 2. IOTLB entries caching first-stage (PGTT=001b) or nested (PGTT=011b)
> mapping associated with specified domain-id are invalidated.
>
> So, per the spec, Page-Selective-within-Domain Invalidation
> needs to flush cached first-stage and nested IOTLB entries as well.
>
> Nested translation isn't supported yet and pass-through mappings are
> never cached, so the IOTLB cache only holds first-stage and
> second-stage mappings.
>
> Add a tag pgtt in VTDIOTLBEntry to mark PGTT type of the mapping and
> invalidate entries based on PGTT type.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/i386/intel_iommu.h | 1 +
> hw/i386/intel_iommu.c | 20 +++++++++++++++++---
> 2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 011f374883..b0d5b5a5be 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -156,6 +156,7 @@ struct VTDIOTLBEntry {
> uint64_t pte;
> uint64_t mask;
> uint8_t access_flags;
> + uint8_t pgtt;
> };
>
> /* VT-d Source-ID Qualifier types */
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 0801112e2e..0078bad9d4 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -287,9 +287,21 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
> VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
> uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
> uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
> - return (entry->domain_id == info->domain_id) &&
> - (((entry->gfn & info->mask) == gfn) ||
> - (entry->gfn == gfn_tlb));
> +
> + if (entry->domain_id != info->domain_id) {
> + return false;
> + }
> +
> + /*
> + * According to spec, IOTLB entries caching first-stage (PGTT=001b) or
> + * nested (PGTT=011b) mapping associated with specified domain-id are
> + * invalidated. Nested isn't supported yet, so only need to check 001b.
> + */
> + if (entry->pgtt == VTD_SM_PASID_ENTRY_FLT) {
> + return true;
> + }
> +
> + return (entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb;
> }
>
> /* Reset all the gen of VTDAddressSpace to zero and set the gen of
> @@ -382,6 +394,8 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
> entry->access_flags = access_flags;
> entry->mask = vtd_slpt_level_page_mask(level);
> entry->pasid = pasid;
> + entry->pgtt = s->scalable_modern ? VTD_SM_PASID_ENTRY_FLT
> + : VTD_SM_PASID_ENTRY_SLT;
What about passing pgtt as a parameter so that the translation type
detection is done only once (in vtd_do_iommu_translate)?
>
> key->gfn = gfn;
> key->sid = source_id;
> --
> 2.34.1
>
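The invalidation rule quoted in this patch (first-stage entries of the domain are dropped regardless of address; second-stage entries only when they fall inside the range) can be modeled as a small predicate. This is a simplified sketch: the struct and SKETCH_ names are hypothetical, PGTT is stored as the raw 3-bit field value, and the large-page comparison against gfn_tlb in vtd_hash_remove_by_page() is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw PGTT field values from the quoted spec text. */
#define SKETCH_PGTT_FLT 0x1 /* first-stage, 001b */
#define SKETCH_PGTT_SLT 0x2 /* second-stage, 010b */

typedef struct {
    uint16_t domain_id;
    uint64_t gfn;  /* guest frame number (address >> 12) */
    uint8_t pgtt;  /* translation type that produced this entry */
} SketchIOTLBEntry;

/* Should a Page-Selective-within-Domain invalidation of 2^am pages at @gfn drop @e? */
static bool sketch_page_inv_matches(const SketchIOTLBEntry *e,
                                    uint16_t domain_id, uint64_t gfn,
                                    uint8_t am)
{
    uint64_t mask = ~((1ULL << am) - 1);

    if (e->domain_id != domain_id) {
        return false;
    }
    /* First-stage entries of the domain are invalidated regardless of address. */
    if (e->pgtt == SKETCH_PGTT_FLT) {
        return true;
    }
    /* Second-stage entries only when they fall inside the invalidated range. */
    return (e->gfn & mask) == (gfn & mask);
}
```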
* Re: [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation
2024-05-22 6:23 ` [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation Zhenzhong Duan
@ 2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:17 ` Duan, Zhenzhong
0 siblings, 1 reply; 29+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-05-24 13:57 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
kevin.tian@intel.com, yi.l.liu@intel.com, chao.p.peng@intel.com,
Yi Sun, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
Eduardo Habkost
Hi Zhenzhong,
I already sent you my comments about this patch earlier (a question about
checking pgtt), but here is a style review.
On 22/05/2024 08:23, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> This adds stage-1 page table walking to support stage-1-only
> translation in scalable modern mode.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_internal.h | 17 +++++
> hw/i386/intel_iommu.c | 128 +++++++++++++++++++++++++++++++--
> 2 files changed, 141 insertions(+), 4 deletions(-)
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 0e240d6d54..abfdbd5f65 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -534,6 +534,23 @@ typedef struct VTDRootEntry VTDRootEntry;
> #define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */
> #define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK)
>
> +#define VTD_SM_PASID_ENTRY_FLPM 3ULL
> +#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL)
> +
> +/* Paging Structure common */
> +#define VTD_FL_PT_PAGE_SIZE_MASK (1ULL << 7)
> +/* Bits to decide the offset for each level */
> +#define VTD_FL_LEVEL_BITS 9
> +
> +/* First Level Paging Structure */
> +#define VTD_FL_PT_LEVEL 1
> +#define VTD_FL_PT_ENTRY_NR 512
> +
> +/* Masks for First Level Paging Entry */
> +#define VTD_FL_RW_MASK (1ULL << 1)
> +#define VTD_FL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
> +#define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
> +
> /* Second Level Page Translation Pointer*/
> #define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 544e8f0e40..cf29809bc1 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -50,6 +50,8 @@
> /* pe operations */
> #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
> #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & VTD_SM_PASID_ENTRY_AW))
> +#define VTD_PE_GET_FLPT_LEVEL(pe) \
> + (4 + (((pe)->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM))
>
> /*
> * PCI bus number (or SID) is not reliable since the device is usaully
> @@ -823,6 +825,11 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
> return -VTD_FR_PASID_TABLE_ENTRY_INV;
> }
>
> + if (pgtt == VTD_SM_PASID_ENTRY_FLT &&
> + VTD_PE_GET_FLPT_LEVEL(pe) != 4) {
Maybe you could add a function to check if the level is supported.
It would also be nice to rename vtd_is_level_supported (used just
above these lines) to make it clear that it's only relevant for
second-level translations, to avoid mistakes.
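As a sketch of the first suggestion, the first-stage level check could be factored into a helper next to a decoder for the FLPM field. The helper names are hypothetical; the decoding mirrors VTD_PE_GET_FLPT_LEVEL() from this patch (levels = 4 + FLPM, with FLPM in bits 3:2 of PASID-entry qword 2), and only 4-level tables are accepted because 5-level emulation is not supported yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* FLPM is a 2-bit field in bits 3:2 of PASID-entry qword 2 (val[2]). */
#define SKETCH_PE_FLPM_MASK 3ULL

/* Mirrors VTD_PE_GET_FLPT_LEVEL(): paging levels = 4 + FLPM. */
static uint32_t sketch_pe_get_flpt_level(uint64_t pe_val2)
{
    return 4 + (uint32_t)((pe_val2 >> 2) & SKETCH_PE_FLPM_MASK);
}

/* Hypothetical helper: only 4-level first-stage tables are emulated for now. */
static bool sketch_is_fl_level_supported(uint32_t level)
{
    return level == 4;
}
```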
> + return -VTD_FR_PASID_TABLE_ENTRY_INV;
> + }
> +
> return 0;
> }
>
> @@ -958,7 +965,11 @@ static uint32_t vtd_get_iova_level(IntelIOMMUState *s,
>
> if (s->root_scalable) {
> vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> - return VTD_PE_GET_LEVEL(&pe);
> + if (s->scalable_modern) {
> + return VTD_PE_GET_FLPT_LEVEL(&pe);
> + } else {
> + return VTD_PE_GET_LEVEL(&pe);
same, could be renamed
> + }
> }
>
> return vtd_ce_get_level(ce);
> @@ -1045,7 +1056,11 @@ static dma_addr_t vtd_get_iova_pgtbl_base(IntelIOMMUState *s,
>
> if (s->root_scalable) {
> vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
> - return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
> + if (s->scalable_modern) {
> + return pe.val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
> + } else {
> + return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
> + }
> }
>
> return vtd_ce_get_slpt_base(ce);
> @@ -1847,6 +1862,106 @@ out:
> trace_vtd_pt_enable_fast_path(source_id, success);
> }
>
> +/* The shift of an addr for a certain level of paging structure */
> +static inline uint32_t vtd_flpt_level_shift(uint32_t level)
> +{
> + assert(level != 0);
> + return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_FL_LEVEL_BITS;
> +}
> +
> +/*
> + * Given an iova and the level of paging structure, return the offset
> + * of current level.
> + */
> +static inline uint32_t vtd_iova_fl_level_offset(uint64_t iova, uint32_t level)
> +{
> + return (iova >> vtd_flpt_level_shift(level)) &
> + ((1ULL << VTD_FL_LEVEL_BITS) - 1);
> +}
> +
> +/* Get the content of a flpte located in @base_addr[@index] */
> +static uint64_t vtd_get_flpte(dma_addr_t base_addr, uint32_t index)
> +{
> + uint64_t flpte;
> +
> + assert(index < VTD_FL_PT_ENTRY_NR);
> +
> + if (dma_memory_read(&address_space_memory,
> + base_addr + index * sizeof(flpte), &flpte,
> + sizeof(flpte), MEMTXATTRS_UNSPECIFIED)) {
> + flpte = (uint64_t)-1;
> + return flpte;
> + }
> + flpte = le64_to_cpu(flpte);
> + return flpte;
> +}
> +
> +static inline bool vtd_flpte_present(uint64_t flpte)
> +{
> + return !!(flpte & 0x1);
Shouldn't we use a #define instead of hardcoding the mask?
> +}
> +
> +/* Whether the pte indicates the address of the page frame */
> +static inline bool vtd_is_last_flpte(uint64_t flpte, uint32_t level)
> +{
> + return level == VTD_FL_PT_LEVEL || (flpte & VTD_FL_PT_PAGE_SIZE_MASK);
> +}
> +
> +static inline uint64_t vtd_get_flpte_addr(uint64_t flpte, uint8_t aw)
> +{
> + return flpte & VTD_FL_PT_BASE_ADDR_MASK(aw);
> +}
> +
> +/*
> + * Given the @iova, get relevant @flptep. @flpte_level will be the last level
> + * of the translation, can be used for deciding the size of large page.
> + */
> +static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
> + uint64_t iova, bool is_write,
> + uint64_t *flptep, uint32_t *flpte_level,
> + bool *reads, bool *writes, uint8_t aw_bits,
> + uint32_t pasid)
> +{
> + dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
> + uint32_t level = vtd_get_iova_level(s, ce, pasid);
> + uint32_t offset;
> + uint64_t flpte;
> +
> + while (true) {
> + offset = vtd_iova_fl_level_offset(iova, level);
> + flpte = vtd_get_flpte(addr, offset);
> + if (flpte == (uint64_t)-1) {
> + if (level == vtd_get_iova_level(s, ce, pasid)) {
> + /* Invalid programming of context-entry */
> + return -VTD_FR_CONTEXT_ENTRY_INV;
> + } else {
> + return -VTD_FR_PAGING_ENTRY_INV;
> + }
> + }
> +
> + if (!vtd_flpte_present(flpte)) {
> + *reads = false;
> + *writes = false;
> + return -VTD_FR_PAGING_ENTRY_INV;
> + }
> +
> + *reads = true;
> + *writes = (*writes) && (flpte & VTD_FL_RW_MASK);
> + if (is_write && !(flpte & VTD_FL_RW_MASK)) {
> + return -VTD_FR_WRITE;
> + }
> +
> + if (vtd_is_last_flpte(flpte, level)) {
> + *flptep = flpte;
> + *flpte_level = level;
> + return 0;
> + }
> +
> + addr = vtd_get_flpte_addr(flpte, aw_bits);
> + level--;
> + }
> +}
> +
> static void vtd_report_fault(IntelIOMMUState *s,
> int err, bool is_fpd_set,
> uint16_t source_id,
> @@ -1995,8 +2110,13 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> }
> }
>
> - ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
> - &reads, &writes, s->aw_bits, pasid);
> + if (s->scalable_modern) {
> + ret_fr = vtd_iova_to_flpte(s, &ce, addr, is_write, &pte, &level,
> + &reads, &writes, s->aw_bits, pasid);
> + } else {
> + ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
> + &reads, &writes, s->aw_bits, pasid);
> + }
>
> if (ret_fr) {
> vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
> --
> 2.34.1
>
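A minimal sketch of the two style suggestions above. It assumes a named mask for the first-level Present bit and a dedicated level check for first-stage tables. VTD_FL_P, vtd_is_fl_level_supported() and vtd_pe_val2_to_flpt_level() are hypothetical names that are not part of the posted patch. The FLPM decode only mirrors the arithmetic of VTD_PE_GET_FLPT_LEVEL, applied to a raw val[2] qword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 0 (Present) of a first-level paging entry. */
#define VTD_FL_P                 (1ULL << 0)
/* Copied from the patch: FLPM field mask, read from bits 3:2 of val[2]. */
#define VTD_SM_PASID_ENTRY_FLPM  3ULL

/* Same arithmetic as VTD_PE_GET_FLPT_LEVEL, on a raw PASID-entry val[2]. */
static uint32_t vtd_pe_val2_to_flpt_level(uint64_t val2)
{
    return 4 + ((val2 >> 2) & VTD_SM_PASID_ENTRY_FLPM);
}

/*
 * Hypothetical first-level counterpart of vtd_is_level_supported():
 * this series only emulates 4-level first-stage paging.
 */
static bool vtd_is_fl_level_supported(uint32_t level)
{
    return level == 4;
}

/* vtd_flpte_present() rewritten to use the named mask. */
static inline bool vtd_flpte_present(uint64_t flpte)
{
    return flpte & VTD_FL_P;
}
```

With a helper like this, the check in vtd_get_pe_in_pasid_leaf_table() could read `!vtd_is_fl_level_supported(VTD_PE_GET_FLPT_LEVEL(pe))` instead of comparing against a bare 4.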
* RE: [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modren mode
2024-05-24 13:56 ` CLEMENT MATHIEU--DRIF
@ 2024-05-27 3:16 ` Duan, Zhenzhong
2024-05-27 5:14 ` CLEMENT MATHIEU--DRIF
0 siblings, 1 reply; 29+ messages in thread
From: Duan, Zhenzhong @ 2024-05-27 3:16 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
Tian, Kevin, Liu, Yi L, Peng, Chao P, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
Hi Clement,
>-----Original Message-----
>From: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>Sent: Friday, May 24, 2024 9:57 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu-devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; eric.auger@redhat.com;
>mst@redhat.com; peterx@redhat.com; jasowang@redhat.com;
>jgg@nvidia.com; nicolinc@nvidia.com; joao.m.martins@oracle.com; Tian,
>Kevin <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Peng, Chao P
><chao.p.peng@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Richard
>Henderson <richard.henderson@linaro.org>; Eduardo Habkost
><eduardo@habkost.net>; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>Subject: Re: [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modren mode
>
>Hi Zhenzhong
>
>On 22/05/2024 08:23, Zhenzhong Duan wrote:
>> According to the VT-d spec, stage-1 page tables can support 4-level and
>> 5-level paging.
>>
>> However, 5-level paging translation emulation is not supported yet.
>> That means the only supported value for aw_bits is 48.
>>
>> So default aw_bits to 48 in scalable modern mode. In other cases,
>> it still defaults to 39 for compatibility.
>>
>> Add a check to ensure a user-specified value is 48 in modern mode
>> for now.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index e07daaba99..a4c241ea96 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -3748,7 +3748,7 @@ static Property vtd_properties[] = {
>> ON_OFF_AUTO_AUTO),
>> DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
>> DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
>> - VTD_HOST_ADDRESS_WIDTH),
>> + 0xff),
>you could define a constant for this invalid value
Sure, maybe VTD_HOST_ADDRESS_WIDTH_UNDEFINED?
Thanks
Zhenzhong
>> DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
>> DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
>> DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
>> @@ -4663,6 +4663,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>> }
>> }
>>
>> + if (s->aw_bits == 0xff) {
>> + if (s->scalable_modern) {
>> + s->aw_bits = VTD_HOST_AW_48BIT;
>> + } else {
>> + s->aw_bits = VTD_HOST_AW_39BIT;
>> + }
>> + }
>> +
>> if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
>> (s->aw_bits != VTD_HOST_AW_48BIT) &&
>> !s->scalable_modern) {
>> @@ -4671,6 +4679,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>> return false;
>> }
>>
>> + if ((s->aw_bits != VTD_HOST_AW_48BIT) && s->scalable_modern) {
>> + error_setg(errp, "Supported values for aw-bits are: %d",
>specify 'in modern mode' in the message?
>> + VTD_HOST_AW_48BIT);
>> + return false;
>> + }
>> +
>> if (s->scalable_mode && !s->dma_drain) {
>> error_setg(errp, "Need to set dma_drain for scalable mode");
>> return false;
>> --
>> 2.34.1
>>
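For illustration, a standalone sketch of the aw-bits defaulting logic with the sentinel given a name, as suggested in the review. VTD_HOST_ADDRESS_WIDTH_UNDEFINED follows the name floated in the reply; VTDConfigSketch and vtd_decide_aw_bits() are hypothetical stand-ins for IntelIOMMUState and vtd_decide_config(). Errors go into a plain buffer instead of QEMU's Error **errp, and the message wording is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define VTD_HOST_AW_39BIT 39
#define VTD_HOST_AW_48BIT 48
/* Named sentinel for "aw-bits not set by the user" (0xff in the patch). */
#define VTD_HOST_ADDRESS_WIDTH_UNDEFINED 0xff

/* Hypothetical, trimmed-down stand-in for IntelIOMMUState. */
typedef struct {
    uint8_t aw_bits;
    bool scalable_modern;
} VTDConfigSketch;

/* Returns true if valid; otherwise writes a message into err. */
static bool vtd_decide_aw_bits(VTDConfigSketch *s, char *err, size_t errlen)
{
    /* Pick a mode-dependent default only when the user gave no value. */
    if (s->aw_bits == VTD_HOST_ADDRESS_WIDTH_UNDEFINED) {
        s->aw_bits = s->scalable_modern ? VTD_HOST_AW_48BIT
                                        : VTD_HOST_AW_39BIT;
    }

    if (!s->scalable_modern &&
        s->aw_bits != VTD_HOST_AW_39BIT && s->aw_bits != VTD_HOST_AW_48BIT) {
        snprintf(err, errlen, "Supported values for aw-bits are: %d, %d",
                 VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT);
        return false;
    }

    /* Only 4-level (48-bit) first-stage paging is emulated for now. */
    if (s->scalable_modern && s->aw_bits != VTD_HOST_AW_48BIT) {
        snprintf(err, errlen,
                 "Supported values for aw-bits in modern mode are: %d",
                 VTD_HOST_AW_48BIT);
        return false;
    }
    return true;
}
```

Mentioning "modern mode" in the second message, as the review suggests, tells the user why 39 is accepted in one mode but rejected in the other.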
* RE: [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation
2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
@ 2024-05-27 3:17 ` Duan, Zhenzhong
0 siblings, 0 replies; 29+ messages in thread
From: Duan, Zhenzhong @ 2024-05-27 3:17 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
Tian, Kevin, Liu, Yi L, Peng, Chao P, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
>-----Original Message-----
>From: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>Subject: Re: [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation
>
>Hi Zhenzhong
>
>On 22/05/2024 08:23, Zhenzhong Duan wrote:
>> According to spec, Page-Selective-within-Domain Invalidation (11b):
>>
>> 1. IOTLB entries caching second-stage mappings (PGTT=010b) or pass-through
>> (PGTT=100b) mappings associated with the specified domain-id and the
>> input-address range are invalidated.
>> 2. IOTLB entries caching first-stage (PGTT=001b) or nested (PGTT=011b)
>> mapping associated with specified domain-id are invalidated.
>>
>> So per spec definition the Page-Selective-within-Domain Invalidation
>> needs to flush first-stage and nested cached IOTLB entries as well.
>>
>> We don't support nested yet and pass-through mappings are never cached,
>> so the IOTLB cache only holds first-stage and second-stage mappings.
>>
>> Add a tag pgtt in VTDIOTLBEntry to mark the PGTT type of the mapping and
>> invalidate entries based on the PGTT type.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> include/hw/i386/intel_iommu.h | 1 +
>> hw/i386/intel_iommu.c | 20 +++++++++++++++++---
>> 2 files changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>> index 011f374883..b0d5b5a5be 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -156,6 +156,7 @@ struct VTDIOTLBEntry {
>> uint64_t pte;
>> uint64_t mask;
>> uint8_t access_flags;
>> + uint8_t pgtt;
>> };
>>
>> /* VT-d Source-ID Qualifier types */
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 0801112e2e..0078bad9d4 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -287,9 +287,21 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value,
>> VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
>> uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
>> uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
>> - return (entry->domain_id == info->domain_id) &&
>> - (((entry->gfn & info->mask) == gfn) ||
>> - (entry->gfn == gfn_tlb));
>> +
>> + if (entry->domain_id != info->domain_id) {
>> + return false;
>> + }
>> +
>> + /*
>> + * According to spec, IOTLB entries caching first-stage (PGTT=001b) or
>> + * nested (PGTT=011b) mapping associated with specified domain-id are
>> + * invalidated. Nested isn't supported yet, so only need to check 001b.
>> + */
>> + if (entry->pgtt == VTD_SM_PASID_ENTRY_FLT) {
>> + return true;
>> + }
>> +
>> + return (entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb;
>> }
>>
>> /* Reset all the gen of VTDAddressSpace to zero and set the gen of
>> @@ -382,6 +394,8 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t source_id,
>> entry->access_flags = access_flags;
>> entry->mask = vtd_slpt_level_page_mask(level);
>> entry->pasid = pasid;
>> + entry->pgtt = s->scalable_modern ? VTD_SM_PASID_ENTRY_FLT
>> + : VTD_SM_PASID_ENTRY_SLT;
>What about passing pgtt as a parameter so that the translation type
>detection is done only once (in vtd_do_iommu_translate)?
Good idea, will do.
Thanks
Zhenzhong
>>
>> key->gfn = gfn;
>> key->sid = source_id;
>> --
>> 2.34.1
>>
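A minimal sketch of this patch's invalidation predicate, with pgtt supplied by the caller when the entry is filled (the review suggestion) rather than derived from s->scalable_modern inside vtd_update_iotlb(). IOTLBEntrySketch, PageInvInfoSketch, vtd_fill_iotlb_entry() and vtd_entry_matches_page_inv() are hypothetical, trimmed-down stand-ins for the QEMU types and functions. The PGTT values (001b, 010b) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT_4K       12
/* PGTT encodings quoted in the commit message above. */
#define VTD_SM_PASID_ENTRY_FLT  1   /* 001b: first-stage translation */
#define VTD_SM_PASID_ENTRY_SLT  2   /* 010b: second-stage translation */

/* Trimmed-down VTDIOTLBEntry: only what the predicate reads. */
typedef struct {
    uint16_t domain_id;
    uint64_t gfn;   /* 4 KiB frame number of the cached mapping */
    uint64_t mask;  /* address mask for the mapping's page size */
    uint8_t pgtt;   /* translation type recorded at fill time */
} IOTLBEntrySketch;

/* Trimmed-down VTDIOTLBPageInvInfo. */
typedef struct {
    uint16_t domain_id;
    uint64_t addr;  /* invalidation base address */
    uint64_t mask;  /* gfn mask derived from the address-mask field */
} PageInvInfoSketch;

/* pgtt comes from the caller, which already chose the page-table walker. */
static void vtd_fill_iotlb_entry(IOTLBEntrySketch *entry, uint16_t domain_id,
                                 uint64_t addr, uint64_t mask, uint8_t pgtt)
{
    entry->domain_id = domain_id;
    entry->mask = mask;
    entry->gfn = (addr & mask) >> VTD_PAGE_SHIFT_4K;
    entry->pgtt = pgtt;
}

/* Same decision order as the patched vtd_hash_remove_by_page(). */
static bool vtd_entry_matches_page_inv(const IOTLBEntrySketch *entry,
                                       const PageInvInfoSketch *info)
{
    uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
    uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;

    if (entry->domain_id != info->domain_id) {
        return false;
    }
    /* First-stage entries of the domain are dropped regardless of address. */
    if (entry->pgtt == VTD_SM_PASID_ENTRY_FLT) {
        return true;
    }
    /* Second-stage entries are dropped only if they overlap the range. */
    return (entry->gfn & info->mask) == gfn || entry->gfn == gfn_tlb;
}
```

The design point is only where pgtt is decided: vtd_do_iommu_translate() already selects the stage-1 or stage-2 walker, so passing that choice down avoids testing s->scalable_modern a second time.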
* RE: [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation
2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
@ 2024-05-27 3:17 ` Duan, Zhenzhong
0 siblings, 0 replies; 29+ messages in thread
From: Duan, Zhenzhong @ 2024-05-27 3:17 UTC (permalink / raw)
To: CLEMENT MATHIEU--DRIF, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
Tian, Kevin, Liu, Yi L, Peng, Chao P, Yi Sun, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
>-----Original Message-----
>From: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>Subject: Re: [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation
>
>Hi Zhenzhong,
>
>I already sent you my comments about this patch earlier (question about
>checking pgtt), but here is a style review.
>
>On 22/05/2024 08:23, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> This adds stage-1 page table walking to support stage-1 only
>> translation in scalable modern mode.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu_internal.h | 17 +++++
>> hw/i386/intel_iommu.c | 128 +++++++++++++++++++++++++++++++--
>> 2 files changed, 141 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
>> index 0e240d6d54..abfdbd5f65 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -534,6 +534,23 @@ typedef struct VTDRootEntry VTDRootEntry;
>> #define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */
>> #define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK)
>>
>> +#define VTD_SM_PASID_ENTRY_FLPM 3ULL
>> +#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL)
>> +
>> +/* Paging Structure common */
>> +#define VTD_FL_PT_PAGE_SIZE_MASK (1ULL << 7)
>> +/* Bits to decide the offset for each level */
>> +#define VTD_FL_LEVEL_BITS 9
>> +
>> +/* First Level Paging Structure */
>> +#define VTD_FL_PT_LEVEL 1
>> +#define VTD_FL_PT_ENTRY_NR 512
>> +
>> +/* Masks for First Level Paging Entry */
>> +#define VTD_FL_RW_MASK (1ULL << 1)
>> +#define VTD_FL_PT_BASE_ADDR_MASK(aw) (~(VTD_PAGE_SIZE - 1) & VTD_HAW_MASK(aw))
>> +#define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */
>> +
>> /* Second Level Page Translation Pointer*/
>> #define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 544e8f0e40..cf29809bc1 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -50,6 +50,8 @@
>> /* pe operations */
>> #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
>> #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & VTD_SM_PASID_ENTRY_AW))
>> +#define VTD_PE_GET_FLPT_LEVEL(pe) \
>> + (4 + (((pe)->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM))
>>
>> /*
>> * PCI bus number (or SID) is not reliable since the device is usaully
>> @@ -823,6 +825,11 @@ static int vtd_get_pe_in_pasid_leaf_table(IntelIOMMUState *s,
>> return -VTD_FR_PASID_TABLE_ENTRY_INV;
>> }
>>
>> + if (pgtt == VTD_SM_PASID_ENTRY_FLT &&
>> + VTD_PE_GET_FLPT_LEVEL(pe) != 4) {
>Maybe you could add a function to check if the level is supported.
>And it would also be nice to rename vtd_is_level_supported (used just
>above these lines) to make it clear that it's only relevant for second
>level translations and avoid mistakes
Sure, will do.
Thanks
Zhenzhong
>> + return -VTD_FR_PASID_TABLE_ENTRY_INV;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -958,7 +965,11 @@ static uint32_t vtd_get_iova_level(IntelIOMMUState *s,
>>
>> if (s->root_scalable) {
>> vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
>> - return VTD_PE_GET_LEVEL(&pe);
>> + if (s->scalable_modern) {
>> + return VTD_PE_GET_FLPT_LEVEL(&pe);
>> + } else {
>> + return VTD_PE_GET_LEVEL(&pe);
>same, could be renamed
>> + }
>> }
>>
>> return vtd_ce_get_level(ce);
>> @@ -1045,7 +1056,11 @@ static dma_addr_t vtd_get_iova_pgtbl_base(IntelIOMMUState *s,
>>
>> if (s->root_scalable) {
>> vtd_ce_get_rid2pasid_entry(s, ce, &pe, pasid);
>> - return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
>> + if (s->scalable_modern) {
>> + return pe.val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
>> + } else {
>> + return pe.val[0] & VTD_SM_PASID_ENTRY_SLPTPTR;
>> + }
>> }
>>
>> return vtd_ce_get_slpt_base(ce);
>> @@ -1847,6 +1862,106 @@ out:
>> trace_vtd_pt_enable_fast_path(source_id, success);
>> }
>>
>> +/* The shift of an addr for a certain level of paging structure */
>> +static inline uint32_t vtd_flpt_level_shift(uint32_t level)
>> +{
>> + assert(level != 0);
>> + return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_FL_LEVEL_BITS;
>> +}
>> +
>> +/*
>> + * Given an iova and the level of paging structure, return the offset
>> + * of current level.
>> + */
>> +static inline uint32_t vtd_iova_fl_level_offset(uint64_t iova, uint32_t level)
>> +{
>> + return (iova >> vtd_flpt_level_shift(level)) &
>> + ((1ULL << VTD_FL_LEVEL_BITS) - 1);
>> +}
>> +
>> +/* Get the content of a flpte located in @base_addr[@index] */
>> +static uint64_t vtd_get_flpte(dma_addr_t base_addr, uint32_t index)
>> +{
>> + uint64_t flpte;
>> +
>> + assert(index < VTD_FL_PT_ENTRY_NR);
>> +
>> + if (dma_memory_read(&address_space_memory,
>> + base_addr + index * sizeof(flpte), &flpte,
>> + sizeof(flpte), MEMTXATTRS_UNSPECIFIED)) {
>> + flpte = (uint64_t)-1;
>> + return flpte;
>> + }
>> + flpte = le64_to_cpu(flpte);
>> + return flpte;
>> +}
>> +
>> +static inline bool vtd_flpte_present(uint64_t flpte)
>> +{
>> + return !!(flpte & 0x1);
>Shouldn't we use a #define instead of hardcoding the mask?
>> +}
>> +
>> +/* Whether the pte indicates the address of the page frame */
>> +static inline bool vtd_is_last_flpte(uint64_t flpte, uint32_t level)
>> +{
>> + return level == VTD_FL_PT_LEVEL || (flpte & VTD_FL_PT_PAGE_SIZE_MASK);
>> +}
>> +
>> +static inline uint64_t vtd_get_flpte_addr(uint64_t flpte, uint8_t aw)
>> +{
>> + return flpte & VTD_FL_PT_BASE_ADDR_MASK(aw);
>> +}
>> +
>> +/*
>> + * Given the @iova, get relevant @flptep. @flpte_level will be the last level
>> + * of the translation, can be used for deciding the size of large page.
>> + */
>> +static int vtd_iova_to_flpte(IntelIOMMUState *s, VTDContextEntry *ce,
>> + uint64_t iova, bool is_write,
>> + uint64_t *flptep, uint32_t *flpte_level,
>> + bool *reads, bool *writes, uint8_t aw_bits,
>> + uint32_t pasid)
>> +{
>> + dma_addr_t addr = vtd_get_iova_pgtbl_base(s, ce, pasid);
>> + uint32_t level = vtd_get_iova_level(s, ce, pasid);
>> + uint32_t offset;
>> + uint64_t flpte;
>> +
>> + while (true) {
>> + offset = vtd_iova_fl_level_offset(iova, level);
>> + flpte = vtd_get_flpte(addr, offset);
>> + if (flpte == (uint64_t)-1) {
>> + if (level == vtd_get_iova_level(s, ce, pasid)) {
>> + /* Invalid programming of context-entry */
>> + return -VTD_FR_CONTEXT_ENTRY_INV;
>> + } else {
>> + return -VTD_FR_PAGING_ENTRY_INV;
>> + }
>> + }
>> +
>> + if (!vtd_flpte_present(flpte)) {
>> + *reads = false;
>> + *writes = false;
>> + return -VTD_FR_PAGING_ENTRY_INV;
>> + }
>> +
>> + *reads = true;
>> + *writes = (*writes) && (flpte & VTD_FL_RW_MASK);
>> + if (is_write && !(flpte & VTD_FL_RW_MASK)) {
>> + return -VTD_FR_WRITE;
>> + }
>> +
>> + if (vtd_is_last_flpte(flpte, level)) {
>> + *flptep = flpte;
>> + *flpte_level = level;
>> + return 0;
>> + }
>> +
>> + addr = vtd_get_flpte_addr(flpte, aw_bits);
>> + level--;
>> + }
>> +}
>> +
>> static void vtd_report_fault(IntelIOMMUState *s,
>> int err, bool is_fpd_set,
>> uint16_t source_id,
>> @@ -1995,8 +2110,13 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>> }
>> }
>>
>> - ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
>> - &reads, &writes, s->aw_bits, pasid);
>> + if (s->scalable_modern) {
>> + ret_fr = vtd_iova_to_flpte(s, &ce, addr, is_write, &pte, &level,
>> + &reads, &writes, s->aw_bits, pasid);
>> + } else {
>> + ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &pte, &level,
>> + &reads, &writes, s->aw_bits, pasid);
>> + }
>>
>> if (ret_fr) {
>> vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
>> --
>> 2.34.1
>>
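To make the walk in vtd_iova_to_flpte() easier to follow, here is a reduced sketch that runs the same level-shift/offset arithmetic over a toy in-memory page table. It assumes aw-bits = 48 (4-level paging, so the address mask below stands in for VTD_FL_PT_BASE_ADDR_MASK(48)). sim_read(), which replaces dma_memory_read(), fl_walk(), fl_translate() and VTD_FL_P are hypothetical. The sketch omits the reads/writes accumulation and the context-entry vs. paging-entry fault split of the posted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT_4K        12
#define VTD_FL_LEVEL_BITS        9
#define VTD_FL_PT_ENTRY_NR       512
#define VTD_FL_PT_LEVEL          1
#define VTD_FL_P                 (1ULL << 0)  /* hypothetical name */
#define VTD_FL_RW_MASK           (1ULL << 1)
#define VTD_FL_PT_PAGE_SIZE_MASK (1ULL << 7)
#define VTD_FL_ADDR_MASK         0x0000fffffffff000ULL  /* assumes aw = 48 */

/* Toy guest memory: table page n (1-based) lives at address n << 12. */
#define SIM_TABLES 8
static uint64_t sim_mem[SIM_TABLES][VTD_FL_PT_ENTRY_NR];

/* Stand-in for dma_memory_read(): false if addr is not a table page. */
static bool sim_read(uint64_t table_addr, uint32_t index, uint64_t *out)
{
    uint64_t n = table_addr >> VTD_PAGE_SHIFT_4K;

    if (n == 0 || n > SIM_TABLES || index >= VTD_FL_PT_ENTRY_NR) {
        return false;
    }
    *out = sim_mem[n - 1][index];
    return true;
}

/* Same arithmetic as vtd_flpt_level_shift()/vtd_iova_fl_level_offset(). */
static uint32_t fl_level_shift(uint32_t level)
{
    assert(level != 0);
    return VTD_PAGE_SHIFT_4K + (level - 1) * VTD_FL_LEVEL_BITS;
}

static uint32_t fl_level_offset(uint64_t iova, uint32_t level)
{
    return (iova >> fl_level_shift(level)) & ((1ULL << VTD_FL_LEVEL_BITS) - 1);
}

/* Returns 0 on success, -1 if not present/unreadable, -2 on write fault. */
static int fl_walk(uint64_t root, uint32_t start_level, uint64_t iova,
                   bool is_write, uint64_t *pte_out, uint32_t *level_out)
{
    uint64_t addr = root;
    uint32_t level = start_level;

    for (;;) {
        uint64_t pte;

        if (!sim_read(addr, fl_level_offset(iova, level), &pte) ||
            !(pte & VTD_FL_P)) {
            return -1;
        }
        if (is_write && !(pte & VTD_FL_RW_MASK)) {
            return -2;
        }
        /* Leaf: last level, or a large page (PS bit set). */
        if (level == VTD_FL_PT_LEVEL || (pte & VTD_FL_PT_PAGE_SIZE_MASK)) {
            *pte_out = pte;
            *level_out = level;
            return 0;
        }
        addr = pte & VTD_FL_ADDR_MASK;  /* next-level table */
        level--;
    }
}

/* Combine the leaf's frame address with the in-page offset of iova. */
static uint64_t fl_translate(uint64_t pte, uint32_t level, uint64_t iova)
{
    uint64_t page_mask = (1ULL << fl_level_shift(level)) - 1;

    return (pte & VTD_FL_ADDR_MASK & ~page_mask) | (iova & page_mask);
}
```

For example, with L4/L3/L2 entries pointing at the next table and a read-only 4 KiB leaf, fl_walk() returns 0 at level 1 for a read and -2 for a write. Setting bit 7 (PS) in the L2 entry stops the walk at level 2, which is the 2 MiB page case that @flpte_level exists to report.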
* Re: [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modren mode
2024-05-27 3:16 ` Duan, Zhenzhong
@ 2024-05-27 5:14 ` CLEMENT MATHIEU--DRIF
0 siblings, 0 replies; 29+ messages in thread
From: CLEMENT MATHIEU--DRIF @ 2024-05-27 5:14 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
mst@redhat.com, peterx@redhat.com, jasowang@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com,
Tian, Kevin, Liu, Yi L, Peng, Chao P, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
On 27/05/2024 05:16, Duan, Zhenzhong wrote:
> Hi Clement,
>
>> -----Original Message-----
>> From: CLEMENT MATHIEU--DRIF <clement.mathieu--drif@eviden.com>
>> Sent: Friday, May 24, 2024 9:57 PM
>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu-devel@nongnu.org
>> Cc: alex.williamson@redhat.com; clg@redhat.com; eric.auger@redhat.com;
>> mst@redhat.com; peterx@redhat.com; jasowang@redhat.com;
>> jgg@nvidia.com; nicolinc@nvidia.com; joao.m.martins@oracle.com; Tian,
>> Kevin <kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Peng, Chao P
>> <chao.p.peng@intel.com>; Paolo Bonzini <pbonzini@redhat.com>; Richard
>> Henderson <richard.henderson@linaro.org>; Eduardo Habkost
>> <eduardo@habkost.net>; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>> Subject: Re: [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modren mode
>>
>> Hi Zhenzhong
>>
>> On 22/05/2024 08:23, Zhenzhong Duan wrote:
>>> According to the VT-d spec, stage-1 page tables can support 4-level and
>>> 5-level paging.
>>>
>>> However, 5-level paging translation emulation is not supported yet.
>>> That means the only supported value for aw_bits is 48.
>>>
>>> So default aw_bits to 48 in scalable modern mode. In other cases,
>>> it still defaults to 39 for compatibility.
>>>
>>> Add a check to ensure a user-specified value is 48 in modern mode
>>> for now.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> hw/i386/intel_iommu.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index e07daaba99..a4c241ea96 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -3748,7 +3748,7 @@ static Property vtd_properties[] = {
>>> ON_OFF_AUTO_AUTO),
>>> DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false),
>>> DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits,
>>> - VTD_HOST_ADDRESS_WIDTH),
>>> + 0xff),
>> you could define a constant for this invalid value
> Sure, maybe VTD_HOST_ADDRESS_WIDTH_UNDEFINED?
Yes, fine for me
>
> Thanks
> Zhenzhong
>
>>> DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE),
>>> DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE),
>>> DEFINE_PROP_BOOL("snoop-control", IntelIOMMUState, snoop_control, false),
>>> @@ -4663,6 +4663,14 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>>> }
>>> }
>>>
>>> + if (s->aw_bits == 0xff) {
>>> + if (s->scalable_modern) {
>>> + s->aw_bits = VTD_HOST_AW_48BIT;
>>> + } else {
>>> + s->aw_bits = VTD_HOST_AW_39BIT;
>>> + }
>>> + }
>>> +
>>> if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
>>> (s->aw_bits != VTD_HOST_AW_48BIT) &&
>>> !s->scalable_modern) {
>>> @@ -4671,6 +4679,12 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>>> return false;
>>> }
>>>
>>> + if ((s->aw_bits != VTD_HOST_AW_48BIT) && s->scalable_modern) {
>>> + error_setg(errp, "Supported values for aw-bits are: %d",
>> specify 'in modern mode' in the message?
>>> + VTD_HOST_AW_48BIT);
>>> + return false;
>>> + }
>>> +
>>> if (s->scalable_mode && !s->dma_drain) {
>>> error_setg(errp, "Need to set dma_drain for scalable mode");
>>> return false;
>>> --
>>> 2.34.1
>>>
end of thread
Thread overview: 29+ messages
2024-05-22 6:22 [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Zhenzhong Duan
2024-05-22 6:22 ` [PATCH rfcv2 01/17] intel_iommu: Update version to 3.0 and add the latest fault reasons Zhenzhong Duan
2024-05-22 6:22 ` [PATCH rfcv2 02/17] intel_iommu: Make pasid entry type check accurate Zhenzhong Duan
2024-05-22 6:22 ` [PATCH rfcv2 03/17] intel_iommu: Add a placeholder variable for scalable modern mode Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 04/17] intel_iommu: Flush stage-2 cache in PADID-selective PASID-based iotlb invalidation Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 05/17] intel_iommu: Rename slpte to pte Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 06/17] intel_iommu: Implement stage-1 translation Zhenzhong Duan
2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:17 ` Duan, Zhenzhong
2024-05-22 6:23 ` [PATCH rfcv2 07/17] intel_iommu: check if the input address is canonical Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 08/17] intel_iommu: set accessed and dirty bits during first stage translation Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 09/17] intel_iommu: Flush stage-1 cache in iotlb invalidation Zhenzhong Duan
2024-05-24 13:57 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:17 ` Duan, Zhenzhong
2024-05-22 6:23 ` [PATCH rfcv2 10/17] intel_iommu: Process PASID-based " Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 11/17] intel_iommu: Extract device IOTLB invalidation logic Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 12/17] intel_iommu: add an internal API to find an address space with PASID Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 13/17] intel_iommu: add support for PASID-based device IOTLB invalidation Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 14/17] intel_iommu: piotlb invalidation should notify unmap Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 15/17] intel_iommu: Set default aw_bits to 48 in scalable modren mode Zhenzhong Duan
2024-05-24 13:56 ` CLEMENT MATHIEU--DRIF
2024-05-27 3:16 ` Duan, Zhenzhong
2024-05-27 5:14 ` CLEMENT MATHIEU--DRIF
2024-05-22 6:23 ` [PATCH rfcv2 16/17] intel_iommu: Modify x-scalable-mode to be string option Zhenzhong Duan
2024-05-22 6:23 ` [PATCH rfcv2 17/17] tests/qtest: Add intel-iommu test Zhenzhong Duan
2024-05-22 12:46 ` Thomas Huth
2024-05-23 9:46 ` Duan, Zhenzhong
2024-05-22 8:10 ` [PATCH rfcv2 00/17] intel_iommu: Enable stage-1 translation for emulated device Jason Wang
2024-05-23 9:35 ` Duan, Zhenzhong