* [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU. @ 2018-12-12 13:05 Yu Zhang 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width Yu Zhang ` (3 more replies) 0 siblings, 4 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-12 13:05 UTC (permalink / raw) To: qemu-devel Cc: Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu Intel's upcoming processors will extend the maximum linear address width to 57 bits and introduce 5-level paging for the CPU. Meanwhile, the platform will also extend the maximum guest address width for the IOMMU to 57 bits, thus introducing 5-level paging for 2nd-level translation (see Chapter 3 of Intel Virtualization Technology for Directed I/O). This patch series extends the current logic to support a wider address width. A 5-level-paging-capable IOMMU (for 2nd-level translation) can be rendered with the configuration "-device intel-iommu,x-aw-bits=57". Also, kvm-unit-tests were updated to verify this patch series. The patch for the test was sent out at: https://www.spinics.net/lists/kvm/msg177425.html. Note: this patch series checks the existence of 5-level paging in the host and in the guest, and rejects configurations for 57-bit IOVA if either check fails (VT-d hardware shall not support 57-bit IOVA on platforms without CPU 5-level paging). However, the current vIOMMU implementation still lacks logic to check against the physical IOMMU capability; future enhancements are expected to do this. Changes in V3: - Address comments from Peter Xu: squash the 3rd patch in v2 into the 2nd patch in this version. - Added "Reviewed-by: Peter Xu <peterx@redhat.com>" Changes in V2: - Address comments from Peter Xu: add haw member in vtd_page_walk_info. - Address comments from Peter Xu: only search for 4K/2M/1G mappings in the IOTLB, as only these are meaningful. - Address comments from Peter Xu: cover letter changes (e.g.
mention the test patch in kvm-unit-tests). - Coding style changes. --- Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> --- Yu Zhang (2): intel-iommu: differentiate host address width from IOVA address width. intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. hw/i386/acpi-build.c | 2 +- hw/i386/intel_iommu.c | 96 +++++++++++++++++++++++++++++------------- hw/i386/intel_iommu_internal.h | 10 ++++- include/hw/i386/intel_iommu.h | 10 +++-- 4 files changed, 81 insertions(+), 37 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 57+ messages in thread
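The relation between the address widths named above and the page-table depth follows from the 9 bits of IOVA translated per level on top of a 4 KiB page offset: 39-bit IOVA needs 3 levels, 48-bit needs 4, and 57-bit needs 5. A minimal sketch of that arithmetic (the helper name is illustrative, not an identifier from the series):

```c
#include <assert.h>

/* Each 2nd-level paging level translates 9 bits of IOVA on top of the
 * 12-bit page offset, so levels = (aw - 12) / 9:
 *   39 -> 3 levels, 48 -> 4 levels, 57 -> 5 levels.
 * Illustrative helper; not taken from the patch itself. */
static unsigned vtd_levels_for_aw(unsigned aw_bits)
{
    return (aw_bits - 12) / 9;
}
```

This is why x-aw-bits=57 implies 5-level 2nd-level translation in the series.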
* [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-12 13:05 [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang @ 2018-12-12 13:05 ` Yu Zhang 2018-12-17 13:17 ` Igor Mammedov 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang ` (2 subsequent siblings) 3 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-12 13:05 UTC (permalink / raw) To: qemu-devel Cc: Michael S. Tsirkin, Igor Mammedov, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu Currently, vIOMMU is using the value of the IOVA address width, instead of the host address width (HAW), to calculate the number of reserved bits in data structures such as root entries, context entries, and entries of DMA paging structures etc. However, the values of the IOVA address width and of the HAW may not be equal. For example, a 48-bit IOVA can only be mapped to host addresses no wider than 46 bits. Using 48 instead of 46 to calculate the reserved bits may result in an invalid IOVA being accepted. To fix this, a new field, haw_bits, is introduced in struct IntelIOMMUState, whose value is initialized based on the maximum physical address set for the guest CPU. Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed for clarity. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Cc: "Michael S. 
Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> --- hw/i386/acpi-build.c | 2 +- hw/i386/intel_iommu.c | 55 ++++++++++++++++++++++++------------------- include/hw/i386/intel_iommu.h | 9 +++---- 3 files changed, 37 insertions(+), 29 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 236a20e..b989523 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2431,7 +2431,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker) } dmar = acpi_data_push(table_data, sizeof(*dmar)); - dmar->host_address_width = intel_iommu->aw_bits - 1; + dmar->host_address_width = intel_iommu->haw_bits - 1; dmar->flags = dmar_flags; /* DMAR Remapping Hardware Unit Definition structure */ diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index d97bcbc..0e88c63 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -707,7 +707,8 @@ static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num) */ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write, uint64_t *slptep, uint32_t *slpte_level, - bool *reads, bool *writes, uint8_t aw_bits) + bool *reads, bool *writes, uint8_t aw_bits, + uint8_t haw_bits) { dma_addr_t addr = vtd_ce_get_slpt_base(ce); uint32_t level = vtd_ce_get_level(ce); @@ -760,7 +761,7 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write, *slpte_level = level; return 0; } - addr = vtd_get_slpte_addr(slpte, aw_bits); + addr = vtd_get_slpte_addr(slpte, haw_bits); level--; } } @@ -783,6 +784,7 @@ typedef struct { void *private; bool notify_unmap; uint8_t aw; + uint8_t haw; uint16_t domain_id; } vtd_page_walk_info; @@ -925,7 +927,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start, * This is a valid PDE (or even bigger than 
PDE). We need * to walk one further level. */ - ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw), + ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->haw), iova, MIN(iova_next, end), level - 1, read_cur, write_cur, info); } else { @@ -942,7 +944,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start, entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur); entry.addr_mask = ~subpage_mask; /* NOTE: this is only meaningful if entry_valid == true */ - entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw); + entry.translated_addr = vtd_get_slpte_addr(slpte, info->haw); ret = vtd_page_walk_one(&entry, info); } @@ -1002,7 +1004,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num, return -VTD_FR_ROOT_ENTRY_P; } - if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->aw_bits))) { + if (re.rsvd || (re.val & VTD_ROOT_ENTRY_RSVD(s->haw_bits))) { trace_vtd_re_invalid(re.rsvd, re.val); return -VTD_FR_ROOT_ENTRY_RSVD; } @@ -1019,7 +1021,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num, } if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) || - (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) { + (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->haw_bits))) { trace_vtd_ce_invalid(ce->hi, ce->lo); return -VTD_FR_CONTEXT_ENTRY_RSVD; } @@ -1056,6 +1058,7 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as, .private = (void *)&vtd_as->iommu, .notify_unmap = true, .aw = s->aw_bits, + .haw = s->haw_bits, .as = vtd_as, .domain_id = VTD_CONTEXT_ENTRY_DID(ce->hi), }; @@ -1360,7 +1363,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus, } ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level, - &reads, &writes, s->aw_bits); + &reads, &writes, s->aw_bits, s->haw_bits); if (ret_fr) { ret_fr = -ret_fr; if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) { @@ -1378,7 +1381,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus, out: vtd_iommu_unlock(s); 
entry->iova = addr & page_mask; - entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask; + entry->translated_addr = vtd_get_slpte_addr(slpte, s->haw_bits) & page_mask; entry->addr_mask = ~page_mask; entry->perm = access_flags; return true; @@ -1396,7 +1399,7 @@ static void vtd_root_table_setup(IntelIOMMUState *s) { s->root = vtd_get_quad_raw(s, DMAR_RTADDR_REG); s->root_extended = s->root & VTD_RTADDR_RTT; - s->root &= VTD_RTADDR_ADDR_MASK(s->aw_bits); + s->root &= VTD_RTADDR_ADDR_MASK(s->haw_bits); trace_vtd_reg_dmar_root(s->root, s->root_extended); } @@ -1412,7 +1415,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s) uint64_t value = 0; value = vtd_get_quad_raw(s, DMAR_IRTA_REG); s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1); - s->intr_root = value & VTD_IRTA_ADDR_MASK(s->aw_bits); + s->intr_root = value & VTD_IRTA_ADDR_MASK(s->haw_bits); s->intr_eime = value & VTD_IRTA_EIME; /* Notify global invalidation */ @@ -1689,7 +1692,7 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en) trace_vtd_inv_qi_enable(en); if (en) { - s->iq = iqa_val & VTD_IQA_IQA_MASK(s->aw_bits); + s->iq = iqa_val & VTD_IQA_IQA_MASK(s->haw_bits); /* 2^(x+8) entries */ s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8); s->qi_enabled = true; @@ -2629,7 +2632,7 @@ static Property vtd_properties[] = { ON_OFF_AUTO_AUTO), DEFINE_PROP_BOOL("x-buggy-eim", IntelIOMMUState, buggy_eim, false), DEFINE_PROP_UINT8("x-aw-bits", IntelIOMMUState, aw_bits, - VTD_HOST_ADDRESS_WIDTH), + VTD_ADDRESS_WIDTH), DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE), DEFINE_PROP_END_OF_LIST(), }; @@ -3080,6 +3083,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) .private = (void *)n, .notify_unmap = false, .aw = s->aw_bits, + .haw = s->haw_bits, .as = vtd_as, .domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi), }; @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) static void 
vtd_init(IntelIOMMUState *s) { X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); + CPUState *cs = first_cpu; + X86CPU *cpu = X86_CPU(cs); memset(s->csr, 0, DMAR_REG_SIZE); memset(s->wmask, 0, DMAR_REG_SIZE); @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); - if (s->aw_bits == VTD_HOST_AW_48BIT) { + if (s->aw_bits == VTD_AW_48BIT) { s->cap |= VTD_CAP_SAGAW_48bit; } s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; + s->haw_bits = cpu->phys_bits; /* * Rsvd field masks for spte */ vtd_paging_entry_rsvd_field[0] = ~0ULL; - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); if (x86_iommu->intr_supported) { s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; @@ -3261,10 +3268,10 @@ static bool 
vtd_decide_config(IntelIOMMUState *s, Error **errp) } /* Currently only address widths supported are 39 and 48 bits */ - if ((s->aw_bits != VTD_HOST_AW_39BIT) && - (s->aw_bits != VTD_HOST_AW_48BIT)) { + if ((s->aw_bits != VTD_AW_39BIT) && + (s->aw_bits != VTD_AW_48BIT)) { error_setg(errp, "Supported values for x-aw-bits are: %d, %d", - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); + VTD_AW_39BIT, VTD_AW_48BIT); return false; } diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index ed4e758..820451c 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -47,9 +47,9 @@ #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) #define DMAR_REG_SIZE 0x230 -#define VTD_HOST_AW_39BIT 39 -#define VTD_HOST_AW_48BIT 48 -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT +#define VTD_AW_39BIT 39 +#define VTD_AW_48BIT 48 +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) #define DMAR_REPORT_F_INTR (1) @@ -244,7 +244,8 @@ struct IntelIOMMUState { bool intr_eime; /* Extended interrupt mode enabled */ OnOffAuto intr_eim; /* Toggle for EIM cabability */ bool buggy_eim; /* Force buggy EIM unless eim=off */ - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ + uint8_t aw_bits; /* IOVA address width (in bits) */ + uint8_t haw_bits; /* Hardware address width (in bits) */ /* * Protects IOMMU states in general. Currently it protects the -- 1.9.1 ^ permalink raw reply related [flat|nested] 57+ messages in thread
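The bug this patch fixes can be seen with a small sketch of the reserved-bit check: a page-table entry address that uses bit 47 passes a check built from the 48-bit IOVA width but is correctly rejected by one built from a 46-bit HAW. The helper name below is illustrative and deliberately simplified; the real masks in intel_iommu_internal.h also cover flag and large-page reserved bits.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as in include/hw/i386/intel_iommu.h */
#define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1)

/* Is the address field of a second-level PTE within the given width?
 * Simplified sketch: drops only the low 12 flag/offset bits. */
static int slpte_addr_ok(uint64_t slpte, unsigned width)
{
    uint64_t addr = slpte & ~0xfffULL;          /* 4 KiB-aligned address */
    return (addr & ~VTD_HAW_MASK(width)) == 0;
}
```

With aw_bits = 48 and haw_bits = 46, slpte_addr_ok(1ULL << 47, 48) accepts the entry while slpte_addr_ok(1ULL << 47, 46) rejects it, which is why the patch switches the rsvd-mask setup from s->aw_bits to s->haw_bits.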
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width Yu Zhang @ 2018-12-17 13:17 ` Igor Mammedov 2018-12-18 9:27 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Igor Mammedov @ 2018-12-17 13:17 UTC (permalink / raw) To: Yu Zhang Cc: qemu-devel, Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu On Wed, 12 Dec 2018 21:05:38 +0800 Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > Currently, vIOMMU is using the value of IOVA address width, instead of > the host address width(HAW) to calculate the number of reserved bits in > data structures such as root entries, context entries, and entries of > DMA paging structures etc. > > However values of IOVA address width and of the HAW may not equal. For > example, a 48-bit IOVA can only be mapped to host addresses no wider than > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > in an invalid IOVA being accepted. > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > whose value is initialized based on the maximum physical address set to > guest CPU. > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > to clarify. > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Reviewed-by: Peter Xu <peterx@redhat.com> > --- [...] 
> @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > static void vtd_init(IntelIOMMUState *s) > { > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > + CPUState *cs = first_cpu; > + X86CPU *cpu = X86_CPU(cs); > > memset(s->csr, 0, DMAR_REG_SIZE); > memset(s->wmask, 0, DMAR_REG_SIZE); > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > + if (s->aw_bits == VTD_AW_48BIT) { > s->cap |= VTD_CAP_SAGAW_48bit; > } > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > + s->haw_bits = cpu->phys_bits; Is it possible to avoid accessing CPU fields directly or cpu altogether and set phys_bits when iommu is created? Perhaps Eduardo can suggest better approach, since he's more familiar with phys_bits topic > /* > * Rsvd field masks for spte > */ > vtd_paging_entry_rsvd_field[0] = ~0ULL; > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[5] = 
VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > if (x86_iommu->intr_supported) { > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > } > > /* Currently only address widths supported are 39 and 48 bits */ > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > + if ((s->aw_bits != VTD_AW_39BIT) && > + (s->aw_bits != VTD_AW_48BIT)) { > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > + VTD_AW_39BIT, VTD_AW_48BIT); > return false; > } > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > index ed4e758..820451c 100644 > --- a/include/hw/i386/intel_iommu.h > +++ b/include/hw/i386/intel_iommu.h > @@ -47,9 +47,9 @@ > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > #define DMAR_REG_SIZE 0x230 > -#define VTD_HOST_AW_39BIT 39 > -#define VTD_HOST_AW_48BIT 48 > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > +#define VTD_AW_39BIT 39 > +#define VTD_AW_48BIT 48 > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > #define DMAR_REPORT_F_INTR (1) > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > bool intr_eime; /* Extended interrupt mode enabled */ > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > bool buggy_eim; /* Force buggy EIM unless eim=off */ > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > + uint8_t aw_bits; /* IOVA address width (in bits) */ > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > /* > * Protects IOMMU states in general. Currently it protects the ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-17 13:17 ` Igor Mammedov @ 2018-12-18 9:27 ` Yu Zhang 2018-12-18 14:23 ` Michael S. Tsirkin ` (2 more replies) 0 siblings, 3 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-18 9:27 UTC (permalink / raw) To: Igor Mammedov Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > On Wed, 12 Dec 2018 21:05:38 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > the host address width(HAW) to calculate the number of reserved bits in > > data structures such as root entries, context entries, and entries of > > DMA paging structures etc. > > > > However values of IOVA address width and of the HAW may not equal. For > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > in an invalid IOVA being accepted. > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > whose value is initialized based on the maximum physical address set to > > guest CPU. > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > to clarify. > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Reviewed-by: Peter Xu <peterx@redhat.com> > > --- > [...] 
> > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > static void vtd_init(IntelIOMMUState *s) > > { > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > + CPUState *cs = first_cpu; > > + X86CPU *cpu = X86_CPU(cs); > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > memset(s->wmask, 0, DMAR_REG_SIZE); > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > + if (s->aw_bits == VTD_AW_48BIT) { > > s->cap |= VTD_CAP_SAGAW_48bit; > > } > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > + s->haw_bits = cpu->phys_bits; > Is it possible to avoid accessing CPU fields directly or cpu altogether > and set phys_bits when iommu is created? Thanks for your comments, Igor. Well, I guess you prefer not to query the CPU capabilities while deciding the vIOMMU features. But to me, they are not that irrelevant.:) Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR are referring to the same concept. In VM, both are the maximum guest physical address width. If we do not check the CPU field here, we will still have to check the CPU field in other places such as build_dmar_q35(), and reset the s->haw_bits again. Is this explanation convincing enough? :) > > Perhaps Eduardo > can suggest better approach, since he's more familiar with phys_bits topic @Eduardo, any comments? Thanks! 
> > > /* > > * Rsvd field masks for spte > > */ > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > if (x86_iommu->intr_supported) { > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > } > > > > /* Currently only address widths supported are 39 and 48 bits */ > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > + if ((s->aw_bits != VTD_AW_39BIT) && > > + (s->aw_bits != VTD_AW_48BIT)) { > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > + VTD_AW_39BIT, VTD_AW_48BIT); > > return false; > > } > > > > diff --git 
a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > index ed4e758..820451c 100644 > > --- a/include/hw/i386/intel_iommu.h > > +++ b/include/hw/i386/intel_iommu.h > > @@ -47,9 +47,9 @@ > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > #define DMAR_REG_SIZE 0x230 > > -#define VTD_HOST_AW_39BIT 39 > > -#define VTD_HOST_AW_48BIT 48 > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > +#define VTD_AW_39BIT 39 > > +#define VTD_AW_48BIT 48 > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > #define DMAR_REPORT_F_INTR (1) > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > bool intr_eime; /* Extended interrupt mode enabled */ > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > /* > > * Protects IOMMU states in general. Currently it protects the > > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
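Yu's reference to build_dmar_q35() concerns the ACPI DMAR table, whose Host Address Width field is encoded as the width minus one (matching the `haw_bits - 1` assignment in the patch's acpi-build.c hunk). A sketch of the encoding, with illustrative helper names:

```c
#include <assert.h>
#include <stdint.h>

/* The ACPI DMAR table stores the Host Address Width as (width - 1),
 * e.g. a 46-bit physical address width is encoded as 45.
 * Illustrative helpers, not QEMU functions. */
static uint8_t dmar_encode_haw(uint8_t haw_bits) { return haw_bits - 1; }
static uint8_t dmar_decode_haw(uint8_t field)    { return field + 1; }
```

This is one of the places where the guest-visible value must come from the CPU's physical address width rather than from the IOVA width, hence Yu's argument that the CPU field would have to be consulted there anyway.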
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 9:27 ` Yu Zhang @ 2018-12-18 14:23 ` Michael S. Tsirkin 2018-12-18 14:55 ` Igor Mammedov 2018-12-20 20:58 ` Eduardo Habkost 2 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-18 14:23 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 05:27:23PM +0800, Yu Zhang wrote: > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > On Wed, 12 Dec 2018 21:05:38 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > the host address width(HAW) to calculate the number of reserved bits in > > > data structures such as root entries, context entries, and entries of > > > DMA paging structures etc. > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > in an invalid IOVA being accepted. > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > whose value is initialized based on the maximum physical address set to > > > guest CPU. > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > to clarify. > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > --- > > [...] 
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > static void vtd_init(IntelIOMMUState *s) > > > { > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > + CPUState *cs = first_cpu; > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > } > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > + s->haw_bits = cpu->phys_bits; > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > and set phys_bits when iommu is created? > > Thanks for your comments, Igor. > > Well, I guess you prefer not to query the CPU capabilities while deciding > the vIOMMU features. But to me, they are not that irrelevant.:) > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > are referring to the same concept. In VM, both are the maximum guest physical > address width. If we do not check the CPU field here, we will still have to > check the CPU field in other places such as build_dmar_q35(), and reset the > s->haw_bits again. > > Is this explanation convincing enough? :) So what happens if these don't match? I guess guest can configure the vtd to put data into some memory which isn't then accessible to the cpu, or cpu can use some memory not accessible to devices. I guess some guests might be confused - is this what you observe? If yes some comments that tell people which guests get confused would be nice. Is windows happy? Is linux happy? 
> > > > Perhaps Eduardo > > can suggest better approach, since he's more familiar with phys_bits topic > > @Eduardo, any comments? Thanks! > > > > > > /* > > > * Rsvd field masks for spte > > > */ > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > if (x86_iommu->intr_supported) { > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > } > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > + (s->aw_bits != 
VTD_AW_48BIT)) { > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > return false; > > > } > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > index ed4e758..820451c 100644 > > > --- a/include/hw/i386/intel_iommu.h > > > +++ b/include/hw/i386/intel_iommu.h > > > @@ -47,9 +47,9 @@ > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > #define DMAR_REG_SIZE 0x230 > > > -#define VTD_HOST_AW_39BIT 39 > > > -#define VTD_HOST_AW_48BIT 48 > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > +#define VTD_AW_39BIT 39 > > > +#define VTD_AW_48BIT 48 > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > #define DMAR_REPORT_F_INTR (1) > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > /* > > > * Protects IOMMU states in general. Currently it protects the > > > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 9:27 ` Yu Zhang 2018-12-18 14:23 ` Michael S. Tsirkin @ 2018-12-18 14:55 ` Igor Mammedov 2018-12-18 14:58 ` Michael S. Tsirkin 2018-12-19 2:57 ` Yu Zhang 2018-12-20 20:58 ` Eduardo Habkost 2 siblings, 2 replies; 57+ messages in thread From: Igor Mammedov @ 2018-12-18 14:55 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, 18 Dec 2018 17:27:23 +0800 Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > On Wed, 12 Dec 2018 21:05:38 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > the host address width(HAW) to calculate the number of reserved bits in > > > data structures such as root entries, context entries, and entries of > > > DMA paging structures etc. > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > in an invalid IOVA being accepted. > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > whose value is initialized based on the maximum physical address set to > > > guest CPU. > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > to clarify. > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > --- > > [...] 
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > static void vtd_init(IntelIOMMUState *s) > > > { > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > + CPUState *cs = first_cpu; > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > } > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > + s->haw_bits = cpu->phys_bits; > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > and set phys_bits when iommu is created? > > Thanks for your comments, Igor. > > Well, I guess you prefer not to query the CPU capabilities while deciding > the vIOMMU features. But to me, they are not that irrelevant.:) > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > are referring to the same concept. In VM, both are the maximum guest physical > address width. If we do not check the CPU field here, we will still have to > check the CPU field in other places such as build_dmar_q35(), and reset the > s->haw_bits again. > > Is this explanation convincing enough? :) The current build_dmar_q35() doesn't do it; it's all new code in this series that contains an unacceptable direct access from one device (the IOMMU) to another (the CPU). The proper way would be for the owner of the IOMMU to fish the limits from somewhere and set the values during IOMMU creation. > > > > Perhaps Eduardo > > can suggest better approach, since he's more familiar with phys_bits topic > > @Eduardo, any comments? Thanks!
> > > > > > /* > > > * Rsvd field masks for spte > > > */ > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > if (x86_iommu->intr_supported) { > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > } > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > + 
VTD_AW_39BIT, VTD_AW_48BIT); > > > return false; > > > } > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > index ed4e758..820451c 100644 > > > --- a/include/hw/i386/intel_iommu.h > > > +++ b/include/hw/i386/intel_iommu.h > > > @@ -47,9 +47,9 @@ > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > #define DMAR_REG_SIZE 0x230 > > > -#define VTD_HOST_AW_39BIT 39 > > > -#define VTD_HOST_AW_48BIT 48 > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > +#define VTD_AW_39BIT 39 > > > +#define VTD_AW_48BIT 48 > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > #define DMAR_REPORT_F_INTR (1) > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > /* > > > * Protects IOMMU states in general. Currently it protects the > > > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 14:55 ` Igor Mammedov @ 2018-12-18 14:58 ` Michael S. Tsirkin 2018-12-19 3:03 ` Yu Zhang 2018-12-19 2:57 ` Yu Zhang 1 sibling, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-18 14:58 UTC (permalink / raw) To: Igor Mammedov Cc: Yu Zhang, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > On Tue, 18 Dec 2018 17:27:23 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > data structures such as root entries, context entries, and entries of > > > > DMA paging structures etc. > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > in an invalid IOVA being accepted. > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > whose value is initialized based on the maximum physical address set to > > > > guest CPU. > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > to clarify. > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > --- > > > [...] 
> > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > static void vtd_init(IntelIOMMUState *s) > > > > { > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > + CPUState *cs = first_cpu; > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > } > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > + s->haw_bits = cpu->phys_bits; > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > and set phys_bits when iommu is created? > > > > Thanks for your comments, Igor. > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > are referring to the same concept. In VM, both are the maximum guest physical > > address width. If we do not check the CPU field here, we will still have to > > check the CPU field in other places such as build_dmar_q35(), and reset the > > s->haw_bits again. > > > > Is this explanation convincing enough? :) > current build_dmar_q35() doesn't do it, it's all new code in this series that > contains not acceptable direct access from one device (iommu) to another (cpu). > Proper way would be for the owner of iommu to fish limits from somewhere and set > values during iommu creation. Maybe it's a good idea to add documentation for now. 
It would be nice not to push this stuff up the stack, it's unfortunate that our internal APIs make it hard. > > > > > > Perhaps Eduardo > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > @Eduardo, any comments? Thanks! > > > > > > > > > /* > > > > * Rsvd field masks for spte > > > > */ > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > if (x86_iommu->intr_supported) { > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > } > > > > > > > > /* Currently only address widths supported are 39 and 48 bits 
*/ > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > return false; > > > > } > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > index ed4e758..820451c 100644 > > > > --- a/include/hw/i386/intel_iommu.h > > > > +++ b/include/hw/i386/intel_iommu.h > > > > @@ -47,9 +47,9 @@ > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > -#define VTD_HOST_AW_39BIT 39 > > > > -#define VTD_HOST_AW_48BIT 48 > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > +#define VTD_AW_39BIT 39 > > > > +#define VTD_AW_48BIT 48 > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > /* > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > B.R. > > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 14:58 ` Michael S. Tsirkin @ 2018-12-19 3:03 ` Yu Zhang 2018-12-19 3:12 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-19 3:03 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote: > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > On Tue, 18 Dec 2018 17:27:23 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > data structures such as root entries, context entries, and entries of > > > > > DMA paging structures etc. > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > in an invalid IOVA being accepted. > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > whose value is initialized based on the maximum physical address set to > > > > > guest CPU. > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > to clarify. > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > --- > > > > [...] 
> > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > { > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > + CPUState *cs = first_cpu; > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > } > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > + s->haw_bits = cpu->phys_bits; > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > and set phys_bits when iommu is created? > > > > > > Thanks for your comments, Igor. > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > are referring to the same concept. In VM, both are the maximum guest physical > > > address width. If we do not check the CPU field here, we will still have to > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > s->haw_bits again. > > > > > > Is this explanation convincing enough? :) > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > contains not acceptable direct access from one device (iommu) to another (cpu). > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > values during iommu creation. > > Maybe it's a good idea to add documentation for now. 
Thanks Michael. So what kind of documentation do you refer? > > It would be nice not to push this stuff up the stack, > it's unfortunate that our internal APIs make it hard. Sorry, I do not quite get it. What do you mean "internal APIs make it hard"? :) > > > > > > > > > > Perhaps Eduardo > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > @Eduardo, any comments? Thanks! > > > > > > > > > > > > /* > > > > > * Rsvd field masks for spte > > > > > */ > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > s->ecap |= 
VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > } > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > return false; > > > > > } > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > index ed4e758..820451c 100644 > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > @@ -47,9 +47,9 @@ > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > +#define VTD_AW_39BIT 39 > > > > > +#define VTD_AW_48BIT 48 > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > /* > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > B.R. > > > Yu > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 3:03 ` Yu Zhang @ 2018-12-19 3:12 ` Michael S. Tsirkin 2018-12-19 6:28 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-19 3:12 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote: > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > data structures such as root entries, context entries, and entries of > > > > > > DMA paging structures etc. > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > guest CPU. > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > to clarify. > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > --- > > > > > [...] 
> > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > { > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > + CPUState *cs = first_cpu; > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > } > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > and set phys_bits when iommu is created? > > > > > > > > Thanks for your comments, Igor. > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > address width. If we do not check the CPU field here, we will still have to > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > s->haw_bits again. > > > > > > > > Is this explanation convincing enough? :) > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > contains not acceptable direct access from one device (iommu) to another (cpu). 
> > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > values during iommu creation. > > > > Maybe it's a good idea to add documentation for now. > > Thanks Michael. So what kind of documentation do you refer? The idea would be to have two properties, AW for the CPU and the IOMMU. In the documentation explain that they should normally be set to the same value. > > > > It would be nice not to push this stuff up the stack, > > it's unfortunate that our internal APIs make it hard. > > Sorry, I do not quite get it. What do you mean "internal APIs make it hard"? :) The API doesn't actually guarantee any initialization order. CPU happens to be initialized first but I do not think there's a guarantee that it will keep being the case. This makes it hard to get properties from one device and use in another one. > > > > > > > > > > > > > > Perhaps Eduardo > > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > > > @Eduardo, any comments? Thanks! 
> > > > > > > > > > > > > > > /* > > > > > > * Rsvd field masks for spte > > > > > > */ > > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > } > > > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > > + if ((s->aw_bits != 
VTD_AW_39BIT) && > > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > return false; > > > > > > } > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > index ed4e758..820451c 100644 > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > @@ -47,9 +47,9 @@ > > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > > +#define VTD_AW_39BIT 39 > > > > > > +#define VTD_AW_48BIT 48 > > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > > > /* > > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > > > > > B.R. > > > > Yu > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
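Michael's suggestion above (two explicit width properties, documented to normally match) can already be approximated with existing knobs on the command line. A hedged sketch: the CPU model chosen is arbitrary; `x-aw-bits` on intel-iommu and `phys-bits` on the vCPU are the existing properties being referred to, and whether they must agree is exactly what this thread is debating:

```shell
# Sketch: give the vCPU physical address width and the vIOMMU address
# width the same value explicitly, per the documentation idea above.
# (intremap=on requires the split irqchip on q35.)
qemu-system-x86_64 \
    -machine q35,kernel-irqchip=split \
    -cpu Skylake-Server,phys-bits=48 \
    -device intel-iommu,intremap=on,x-aw-bits=48
```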
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 3:12 ` Michael S. Tsirkin @ 2018-12-19 6:28 ` Yu Zhang 2018-12-19 15:30 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-19 6:28 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 10:12:45PM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote: > > On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote: > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > guest CPU. > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > > to clarify. 
> > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > --- > > > > > > [...] > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > { > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > + CPUState *cs = first_cpu; > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > } > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > s->haw_bits again. > > > > > > > > > > Is this explanation convincing enough? 
:) > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > values during iommu creation. > > > > > > Maybe it's a good idea to add documentation for now. > > > > Thanks Michael. So what kind of documentation do you refer? > > The idea would be to have two properties, AW for the CPU and > the IOMMU. In the documentation explain that they > should normally be set to the same value. > > > > > > > It would be nice not to push this stuff up the stack, > > > it's unfortunate that our internal APIs make it hard. > > > > Sorry, I do not quite get it. What do you mean "internal APIs make it hard"? :) > > The API doesn't actually guarantee any initialization order. > CPU happens to be initialized first but I do not > think there's a guarantee that it will keep being the case. > This makes it hard to get properties from one device > and use in another one. > Oops... Then there can be no easy way at runtime to guarantee this. BTW, could we initialize the CPU before other components? Is it hard to do, or not reasonable to do so? I have a plan to draft a doc in qemu on the 5-level paging topic(maybe after all the enabling is done). But I don't think this is the proper place to put it - as you can see, this fix is not relevant to 5-level paging. So any suggestion about the documentation? > > > > > > > > > > > > > > > > > > > > > > Perhaps Eduardo > > > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > @Eduardo, any comments? Thanks!
> > > > > > > > > > > > > > > > > > /* > > > > > > > * Rsvd field masks for spte > > > > > > > */ > > > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > } > > > > > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > > > - 
(s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > return false; > > > > > > > } > > > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > index ed4e758..820451c 100644 > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > @@ -47,9 +47,9 @@ > > > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > > > +#define VTD_AW_39BIT 39 > > > > > > > +#define VTD_AW_48BIT 48 > > > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > > > > > /* > > > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > Yu > > > > > > > B.R. > > Yu B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 6:28 ` Yu Zhang @ 2018-12-19 15:30 ` Michael S. Tsirkin 0 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-19 15:30 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, Dec 19, 2018 at 02:28:10PM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 10:12:45PM -0500, Michael S. Tsirkin wrote: > > On Wed, Dec 19, 2018 at 11:03:58AM +0800, Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 09:58:35AM -0500, Michael S. Tsirkin wrote: > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > > guest CPU. > > > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. 
are renamed > > > > > > > > to clarify. > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > --- > > > > > > > [...] > > > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > > { > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > } > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > > address width. 
If we do not check the CPU field here, we will still have to > > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > > s->haw_bits again. > > > > > > > > > > > > Is this explanation convincing enough? :) > > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > > values during iommu creation. > > > > > > > > Maybe it's a good idea to add documentation for now. > > > > > > Thanks Michael. So what kind of documentation do you refer? > > > > The idea would be to have two properties, AW for the CPU and > > the IOMMU. In the documentation explain that they > > should normally be set to the same value. > > > > > > > > > > It would be nice not to push this stuff up the stack, > > > > it's unfortunate that our internal APIs make it hard. > > > > > > Sorry, I do not quite get it. What do you mean "internal APIs make it hard"? :) > > > > The API doesn't actually guarantee any initialization order. > > CPU happens to be initialized first but I do not > > think there's a guarantee that it will keep being the case. > > This makes it hard to get properties from one device > > and use in another one. > > > > Oops... > Then there can be no easy way in the runtime to gurantee this. BTW, could we > initialize CPU before other components? Is it hard to do, or not reasonable > to do so? I think we already happen to do it, but we lack a generic way to describe the order of initialization at the QOM level. Instead for a while now we've been trying to remove dependencies between devices. Thus the general reluctance to add another dependency. Given this one is more of a hack I'm not sure it qualifies as a good reason to change that. > I have plan to draft a doc in qemu on 5-level paging topic(maybe after all the > enabling is done). 
But I don't think this is the proper place to put it - as you > can see, this fix is not relevant to 5-level paging. So any suggestion about > the documentation? Documentation for user-visible features generally belongs in the man page. > > > > > > > > > > > > > > > > > > > > > > > Perhaps Eduardo > > > > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > > > > > > > @Eduardo, any comments? Thanks! > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > * Rsvd field masks for spte > > > > > > > > */ > > > > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > +
vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > } > > > > > > > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > return false; > > > > > > > > } > > > > > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > index ed4e758..820451c 100644 > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > @@ -47,9 +47,9 @@ > > > > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > > > > +#define VTD_AW_39BIT 39 > > > > > > > > +#define VTD_AW_48BIT 48 > > > > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > > > > - uint8_t aw_bits; 
/* Host/IOVA address width (in bits) */ > > > > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > > > > > > > /* > > > > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > B.R. > > > Yu > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 14:55 ` Igor Mammedov 2018-12-18 14:58 ` Michael S. Tsirkin @ 2018-12-19 2:57 ` Yu Zhang 2018-12-19 10:40 ` Igor Mammedov 1 sibling, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-19 2:57 UTC (permalink / raw) To: Igor Mammedov Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > On Tue, 18 Dec 2018 17:27:23 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > data structures such as root entries, context entries, and entries of > > > > DMA paging structures etc. > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > in an invalid IOVA being accepted. > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > whose value is initialized based on the maximum physical address set to > > > > guest CPU. > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > to clarify. > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > --- > > > [...] 
> > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > static void vtd_init(IntelIOMMUState *s) > > > > { > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > + CPUState *cs = first_cpu; > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > } > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > + s->haw_bits = cpu->phys_bits; > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > and set phys_bits when iommu is created? > > > > Thanks for your comments, Igor. > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > are referring to the same concept. In VM, both are the maximum guest physical > > address width. If we do not check the CPU field here, we will still have to > > check the CPU field in other places such as build_dmar_q35(), and reset the > > s->haw_bits again. > > > > Is this explanation convincing enough? :) > current build_dmar_q35() doesn't do it, it's all new code in this series that > contains not acceptable direct access from one device (iommu) to another (cpu). > Proper way would be for the owner of iommu to fish limits from somewhere and set > values during iommu creation. Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. 
:) According to the spec, the host address width is the maximum physical address width, yet current implementation is using the DMA address width. For me, this is not only wrong, but also insecure. For this point, I think we all agree this needs to be fixed. As to how to fix it - should we query the cpu fields - I still do not understand why this is not acceptable. :) I had thought of other approaches before, yet I did not choose: 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its physical address width(similar to the "x-aw-bits" for IOVA). But should we check this parameter or not? What if this parameter is set to sth. different than "phys-bits"? 2> Another choice I had thought of is to query the physical iommu. I abandoned this idea because my understanding is that vIOMMU is not a passed-through device, it is emulated. So Igor, may I ask why you think checking against the cpu fields is so unacceptable? :) > > > > > > > Perhaps Eduardo > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > @Eduardo, any comments? Thanks!
> > > > > > > > > /* > > > > * Rsvd field masks for spte > > > > */ > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > if (x86_iommu->intr_supported) { > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > } > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, 
%d", > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > return false; > > > > } > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > index ed4e758..820451c 100644 > > > > --- a/include/hw/i386/intel_iommu.h > > > > +++ b/include/hw/i386/intel_iommu.h > > > > @@ -47,9 +47,9 @@ > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > -#define VTD_HOST_AW_39BIT 39 > > > > -#define VTD_HOST_AW_48BIT 48 > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > +#define VTD_AW_39BIT 39 > > > > +#define VTD_AW_48BIT 48 > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > /* > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > B.R. > > Yu > > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 2:57 ` Yu Zhang @ 2018-12-19 10:40 ` Igor Mammedov 2018-12-19 16:47 ` Michael S. Tsirkin 2018-12-20 21:18 ` Eduardo Habkost 0 siblings, 2 replies; 57+ messages in thread From: Igor Mammedov @ 2018-12-19 10:40 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, 19 Dec 2018 10:57:17 +0800 Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > On Tue, 18 Dec 2018 17:27:23 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > data structures such as root entries, context entries, and entries of > > > > > DMA paging structures etc. > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > in an invalid IOVA being accepted. > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > whose value is initialized based on the maximum physical address set to > > > > > guest CPU. > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > to clarify. > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > --- > > > > [...] 
> > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > { > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > + CPUState *cs = first_cpu; > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > } > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > + s->haw_bits = cpu->phys_bits; > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > and set phys_bits when iommu is created? > > > > > > Thanks for your comments, Igor. > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > are referring to the same concept. In VM, both are the maximum guest physical > > > address width. If we do not check the CPU field here, we will still have to > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > s->haw_bits again. > > > > > > Is this explanation convincing enough? :) > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > contains not acceptable direct access from one device (iommu) to another (cpu). > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > values during iommu creation. 
> > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > According to the spec, the host address width is the maximum physical address width, > > yet current implementation is using the DMA address width. For me, this is not only > > wrong, but also insecure. For this point, I think we all agree this needs to be fixed. > > > > As to how to fix it - should we query the cpu fields - I still do not understand why > > this is not acceptable. :) > > > > I had thought of other approaches before, yet I did not choose: > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > physical address width(similar to the "x-aw-bits" for IOVA). But should we check > > this parameter or not? What if this parameter is set to sth. different than > > "phys-bits"? > > > > 2> Another choice I had thought of is to query the physical iommu. I abandoned this > > idea because my understanding is that vIOMMU is not a passed-through device, it is emulated. > > So Igor, may I ask why you think checking against the cpu fields is so unacceptable? :) Because accessing private fields of one device from another random device is not robust and subject to breaking in unpredictable ways when field meaning or initialization order changes. (analogy to baremetal: one does not solder a wire to a CPU die to let a random device access some piece of data). I've looked at the intel-iommu code and how it's created, so here is a way to do what you need using proper interfaces: 1. add an x-haw_bits property 2. include in your series the patch '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' 3. add your iommu to pc_get_hotplug_handler() to redirect the plug flow to the machine, and let the _pre_plug handler check and set x-haw_bits at the machine level 4. you probably can use the phys-bits/host-phys-bits properties to get the data that you need; also see how ms->possible_cpus is used - that's how you can get access to the CPU from the machine layer.
> > > > > > > > > > Perhaps Eduardo > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > @Eduardo, any comments? Thanks! > > > > > > > > > > > > /* > > > > > * Rsvd field masks for spte > > > > > */ > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > } > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > - if ((s->aw_bits != 
VTD_HOST_AW_39BIT) && > > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > return false; > > > > > } > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > index ed4e758..820451c 100644 > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > @@ -47,9 +47,9 @@ > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > +#define VTD_AW_39BIT 39 > > > > > +#define VTD_AW_48BIT 48 > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > /* > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > B.R. > > > Yu > > > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 10:40 ` Igor Mammedov @ 2018-12-19 16:47 ` Michael S. Tsirkin 2018-12-20 5:59 ` Yu Zhang 2018-12-20 21:18 ` Eduardo Habkost 1 sibling, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-19 16:47 UTC (permalink / raw) To: Igor Mammedov Cc: Yu Zhang, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > On Wed, 19 Dec 2018 10:57:17 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > data structures such as root entries, context entries, and entries of > > > > > > DMA paging structures etc. > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > guest CPU. > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > to clarify. 
> > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > --- > > > > > [...] > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > { > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > + CPUState *cs = first_cpu; > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > } > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > and set phys_bits when iommu is created? > > > > > > > > Thanks for your comments, Igor. > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > address width. If we do not check the CPU field here, we will still have to > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > s->haw_bits again. > > > > > > > > Is this explanation convincing enough? 
:) > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > values during iommu creation. > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > According to the spec, the host address width is the maximum physical address width, > > yet current implementation is using the DMA address width. For me, this is not only > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > this is not acceptable. :) > > > > I had thought of other approaches before, yet I did not choose: > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > or not? > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > Because accessing private fields of device from another random device is not robust > and a subject to breaking in unpredictable manner when field meaning or initialization > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > access some piece of data from random device). > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > you need using proper interfaces: > > 1. add x-haw_bits property > 2. 
include in your series patch > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > machine and let _pre_plug handler to check and set x-haw_bits for machine level > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > also see how ms->possible_cpus, that's how you can get access to CPU from machine > layer. But given it's all actually a hack trying to guess host CPU capabilities, I would rather say 1. add a host kernel interface to get it from VFIO 2. on a host where it's not there, and assuming we want to support old kernels, write a function returning these (do we? why? is the 5 level hardware already so widespread?), and call it at any time. No need to poke at the VCPU. > > > > > > > > > > > > > Perhaps Eduardo > > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > > > @Eduardo, any comments? Thanks! 
> > > > > > > > > > > > > > > /* > > > > > > * Rsvd field masks for spte > > > > > > */ > > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > } > > > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > > - (s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > > + if ((s->aw_bits != 
VTD_AW_39BIT) && > > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > return false; > > > > > > } > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > index ed4e758..820451c 100644 > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > @@ -47,9 +47,9 @@ > > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > > +#define VTD_AW_39BIT 39 > > > > > > +#define VTD_AW_48BIT 48 > > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > > > /* > > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > > > > > B.R. > > > > Yu > > > > > > > > > > B.R. > > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 16:47 ` Michael S. Tsirkin @ 2018-12-20 5:59 ` Yu Zhang 0 siblings, 0 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-20 5:59 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, Dec 19, 2018 at 11:47:23AM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > > On Wed, 19 Dec 2018 10:57:17 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > guest CPU. > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > > to clarify. 
> > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > --- > > > > > > [...] > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > { > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > + CPUState *cs = first_cpu; > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > } > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > s->haw_bits again. > > > > > > > > > > Is this explanation convincing enough? 
:) > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > values during iommu creation. > > > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > > According to the spec, the host address width is the maximum physical address width, > > > yet current implementation is using the DMA address width. For me, this is not only > > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > > this is not acceptable. :) > > > > > > I had thought of other approaches before, yet I did not choose: > > > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > > or not? > > > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. > > > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > > Because accessing private fields of device from another random device is not robust > > and a subject to breaking in unpredictable manner when field meaning or initialization > > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > > access some piece of data from random device). > > > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > > you need using proper interfaces: > > > > 1. add x-haw_bits property > > 2. 
include in your series patch > > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > > machine and let _pre_plug handler to check and set x-haw_bits for machine level > > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > > also see how ms->possible_cpus, that's how you can get access to CPU from machine > > layer. > > > But given it's all actually a hack trying to guess host CPU capabilities, Well, not exactly. :) Unlike the 2nd patch in this series, which I used host cpu capability as a reference(though not the final solution), what this patch cares about is the guest physical address width, which can possibly be different than the physical one. E.g. we can create a VM with 39 bit physical address on hosts whose address width is 46 bits, and in such case, 39 shall be the address limitation in guest DMAR, instead of 46. So I think Igor's proposal can meet all my requirement(I'll study the hotplug handler interface to figure out). > I would rather say > 1. add a host kernel interface to get it from VFIO > 2. on a host where it's not there, and assuming we want to support old kernels, > write a function returning these (do we? why? is the 5 level hardware already > so widespread?), and call it at any time. No need to poke at the VCPU. > > > > > > > > > > > > > > > > > > > Perhaps Eduardo > > > > > > can suggest better approach, since he's more familiar with phys_bits topic > > > > > > > > > > @Eduardo, any comments? Thanks! 
> > > > > > > > > > > > > > > > > > /* > > > > > > > * Rsvd field masks for spte > > > > > > > */ > > > > > > > vtd_paging_entry_rsvd_field[0] = ~0ULL; > > > > > > > - vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->aw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > @@ -3261,10 +3268,10 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > } > > > > > > > > > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > - if ((s->aw_bits != VTD_HOST_AW_39BIT) && > > > > > > > - 
(s->aw_bits != VTD_HOST_AW_48BIT)) { > > > > > > > + if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > + (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > - VTD_HOST_AW_39BIT, VTD_HOST_AW_48BIT); > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > return false; > > > > > > > } > > > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > index ed4e758..820451c 100644 > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > @@ -47,9 +47,9 @@ > > > > > > > #define VTD_SID_TO_DEVFN(sid) ((sid) & 0xff) > > > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > -#define VTD_HOST_AW_39BIT 39 > > > > > > > -#define VTD_HOST_AW_48BIT 48 > > > > > > > -#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT > > > > > > > +#define VTD_AW_39BIT 39 > > > > > > > +#define VTD_AW_48BIT 48 > > > > > > > +#define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > #define DMAR_REPORT_F_INTR (1) > > > > > > > @@ -244,7 +244,8 @@ struct IntelIOMMUState { > > > > > > > bool intr_eime; /* Extended interrupt mode enabled */ > > > > > > > OnOffAuto intr_eim; /* Toggle for EIM cabability */ > > > > > > > bool buggy_eim; /* Force buggy EIM unless eim=off */ > > > > > > > - uint8_t aw_bits; /* Host/IOVA address width (in bits) */ > > > > > > > + uint8_t aw_bits; /* IOVA address width (in bits) */ > > > > > > > + uint8_t haw_bits; /* Hardware address width (in bits) */ > > > > > > > > > > > > > > /* > > > > > > > * Protects IOMMU states in general. Currently it protects the > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > Yu > > > > > > > > > > > > > > B.R. > > > Yu B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-19 10:40 ` Igor Mammedov 2018-12-19 16:47 ` Michael S. Tsirkin @ 2018-12-20 21:18 ` Eduardo Habkost 2018-12-21 14:13 ` Igor Mammedov 1 sibling, 1 reply; 57+ messages in thread From: Eduardo Habkost @ 2018-12-20 21:18 UTC (permalink / raw) To: Igor Mammedov Cc: Yu Zhang, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > On Wed, 19 Dec 2018 10:57:17 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > data structures such as root entries, context entries, and entries of > > > > > > DMA paging structures etc. > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > guest CPU. > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > to clarify. 
> > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > --- > > > > > [...] > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > { > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > + CPUState *cs = first_cpu; > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > } > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > and set phys_bits when iommu is created? > > > > > > > > Thanks for your comments, Igor. > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > address width. If we do not check the CPU field here, we will still have to > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > s->haw_bits again. > > > > > > > > Is this explanation convincing enough? 
:) > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > values during iommu creation. > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > According to the spec, the host address width is the maximum physical address width, > > yet current implementation is using the DMA address width. For me, this is not only > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > this is not acceptable. :) > > > > I had thought of other approaches before, yet I did not choose: > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > or not? > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > Because accessing private fields of device from another random device is not robust > and a subject to breaking in unpredictable manner when field meaning or initialization > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > access some piece of data from random device). > With either the solution below or the one I proposed, we still have a ordering problem: if we want "-cpu ...,phys-bits=..." to affect the IOMMU device, we will need the CPU objects to be created before IOMMU realize. 
At least both proposals make the initialization ordering explicitly a responsibility of the machine code. In either case, I don't think we will start creating all CPU objects after device realize any time soon. > I've looked at intel-iommu code and how it's created so here is a way to do the thing > you need using proper interfaces: > > 1. add x-haw_bits property > 2. include in your series patch > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > machine and let _pre_plug handler to check and set x-haw_bits for machine level Wow, that's a very complex way to pass a single integer from machine code to device code. If this is the only way to do that, we really need to take a step back and rethink our API design. What's wrong with having a simple uint32_t pc_max_phys_bits(PCMachineState*) function? > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > also see how ms->possible_cpus, that's how you can get access to CPU from machine > layer. > [...] -- Eduardo ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-20 21:18 ` Eduardo Habkost @ 2018-12-21 14:13 ` Igor Mammedov 2018-12-21 16:09 ` Yu Zhang 2018-12-27 14:54 ` Eduardo Habkost 0 siblings, 2 replies; 57+ messages in thread From: Igor Mammedov @ 2018-12-21 14:13 UTC (permalink / raw) To: Eduardo Habkost Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Yu Zhang, Paolo Bonzini, Richard Henderson On Thu, 20 Dec 2018 19:18:01 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > > On Wed, 19 Dec 2018 10:57:17 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > guest CPU. > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. 
are renamed > > > > > > > to clarify. > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > --- > > > > > > [...] > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > { > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > + CPUState *cs = first_cpu; > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > } > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > s->haw_bits again. > > > > > > > > > > Is this explanation convincing enough? 
:) > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > values during iommu creation. > > > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > > According to the spec, the host address width is the maximum physical address width, > > > yet current implementation is using the DMA address width. For me, this is not only > > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > > this is not acceptable. :) > > > > > > I had thought of other approaches before, yet I did not choose: > > > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > > or not? > > > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. > > > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > > Because accessing private fields of device from another random device is not robust > > and a subject to breaking in unpredictable manner when field meaning or initialization > > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > > access some piece of data from random device). > > > > With either the solution below or the one I proposed, we still > have a ordering problem: if we want "-cpu ...,phys-bits=..." 
to As Michael said, it's questionable if iommu should rely on guest's phys-bits at all, but that aside we should use proper interfaces and hierarchy to initialize devices, see below why I dislike simplistic pc_max_phys_bits(). > affect the IOMMU device, we will need the CPU objects to be > created before IOMMU realize. > > At least both proposals make the initialization ordering > explicitly a responsibility of the machine code. In either case, > I don't think we will start creating all CPU objects after device > realize any time soon. > > > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > > you need using proper interfaces: > > > > 1. add x-haw_bits property > > 2. include in your series patch > > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > > machine and let _pre_plug handler to check and set x-haw_bits for machine level > > Wow, that's a very complex way to pass a single integer from > machine code to device code. If this is the only way to do that, > we really need to take a step back and rethink our API design. > > What's wrong with having a simple > uint32_t pc_max_phys_bits(PCMachineState*) > function? As suggested, it would be only an aesthetic change for accessing first_cpu from a random device at a random time. The IOMMU would still access the cpu instance directly no matter how many wrappers one would use, so it's still the same hack. If phys_bits were changing during the VM lifecycle and the iommu needed to use the updated value, then using pc_max_phys_bits() might be justified, as we don't have interfaces to handle that, but that's not the case here. I suggested a typical way (albeit a bit complex) to handle device initialization in cases where the bus plug handler is not sufficient.
It follows the proper hierarchy without any layer violations and can fail gracefully even if we start creating CPUs later using only '-device cpufoo', without any need to fix the iommu code to handle that (it would fail to create the iommu with a clear error that the CPU isn't available, and all the user has to do is fix the CLI to make sure that the CPU is created before the iommu). So I'd prefer if we used the existing pattern for device initialization instead of hacks whenever it is possible. > > > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > > also see how ms->possible_cpus, that's how you can get access to CPU from machine > > layer. > > > [...] > PS: Another thing I'd like to draw your attention to (since you recently looked at phys-bits) is about host/guest phys_bits and if it's safe from migration pov between hosts with different limits. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 14:13 ` Igor Mammedov @ 2018-12-21 16:09 ` Yu Zhang 2018-12-21 17:04 ` Michael S. Tsirkin 2018-12-27 14:54 ` Eduardo Habkost 1 sibling, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-21 16:09 UTC (permalink / raw) To: Igor Mammedov Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote: > On Thu, 20 Dec 2018 19:18:01 -0200 > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > > > On Wed, 19 Dec 2018 10:57:17 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > > in an invalid IOVA being accepted. 
> > > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > > guest CPU. > > > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > > > to clarify. > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > --- > > > > > > > [...] > > > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > > { > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > } > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > > the vIOMMU features. 
But to me, they are not that irrelevant.:) > > > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > > s->haw_bits again. > > > > > > > > > > > > Is this explanation convincing enough? :) > > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > > values during iommu creation. > > > > > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > > > According to the spec, the host address width is the maximum physical address width, > > > > yet current implementation is using the DMA address width. For me, this is not only > > > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > > > this is not acceptable. :) > > > > > > > > I had thought of other approaches before, yet I did not choose: > > > > > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > > > or not? > > > > > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. 
> > > > > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > > > Because accessing private fields of device from another random device is not robust > > > and a subject to breaking in unpredictable manner when field meaning or initialization > > > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > > > access some piece of data from random device). > > > > > > > With either the solution below or the one I proposed, we still > > have a ordering problem: if we want "-cpu ...,phys-bits=..." to > As Michael said, it's questionable if iommu should rely on guest's > phys-bits at all, but that aside we should use proper interfaces > and hierarchy to initialize devices, see below why I dislike > simplistic pc_max_phys_bits(). Well, my understanding of the vt-d spec is that the address limitation in DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think there's any different in the native scenario. :) > > > affect the IOMMU device, we will need the CPU objects to be > > created before IOMMU realize. > > > > At least both proposals make the initialization ordering > > explicitly a responsibility of the machine code. In either case, > > I don't think we will start creating all CPU objects after device > > realize any time soon. > > > > > > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > > > you need using proper interfaces: > > > > > > 1. add x-haw_bits property > > > 2. include in your series patch > > > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > > > machine and let _pre_plug handler to check and set x-haw_bits for machine level > > > > Wow, that's a very complex way to pass a single integer from > > machine code to device code. 
If this is the only way to do that, > > we really need to take a step back and rethink our API design. > > > > What's wrong with having a simple > > uint32_t pc_max_phys_bits(PCMachineState*) > > function? > As suggested, it would be only aesthetic change for accessing first_cpu from > random device at random time. IOMMU would still access cpu instance directly > no matter how much wrappers one would use so it's still the same hack. > If phys_bits were changing during VM lifecycle and iommu needed to use > updated value then using pc_max_phys_bits() might be justified as > we don't have interfaces to handle that but that's not the case here. > > I suggested a typical way (albeit a bit complex) to handle device > initialization in cases where bus plug handler is not sufficient. > It follows proper hierarchy without any layer violations and can fail > gracefully even if we start creating CPUs later using only '-device cpufoo' > without need to fix iommu code to handle that (it would fail creating > iommu with clear error that CPU isn't available and all user have to > do is to fix CLI to make sure that CPU is created before iommu). > > So I'd prefer if we used exiting pattern for device initialization > instead of hacks whenever it is possible. Thanks, Igor. I kind of understand your concern here. And I am wondering, the phys-bits shall be a configuration used by the VM, not just vCPU. So, instead of trying to deduce this value from the 1st created vCPU, or to guarantee the order of vCPU & vIOMMU creation, is there any possibility we move a max-phys-bits in the MachineState, and derive the 'phys-bits' in vCPU and 'haw-bits' in vIOMMU from MachineState later in their creation process respectively? > > > > > > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > > > also see how ms->possible_cpus, that's how you can get access to CPU from machine > > > layer. > > > > > [...] 
> > > PS: > Another thing I'd like to draw your attention to (since you recently looked at > phys-bits) is about host/guest phys_bits and if it's safe from migration pov > between hosts with different limits. > Good point, and thanks for the reminder. Eduardo, Paolo, and I discussed this before. And indeed, it is a bit tricky... :) B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 16:09 ` Yu Zhang @ 2018-12-21 17:04 ` Michael S. Tsirkin 2018-12-21 17:37 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-21 17:04 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > Well, my understanding of the vt-d spec is that the address limitation in > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > there's any different in the native scenario. :) I think native machines exist on which the two values are different. Is that true? ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 17:04 ` Michael S. Tsirkin @ 2018-12-21 17:37 ` Yu Zhang 2018-12-21 19:02 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-21 17:37 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > Well, my understanding of the vt-d spec is that the address limitation in > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > there's any different in the native scenario. :) > > I think native machines exist on which the two values are different. > Is that true? I think the answer is no. My understanding is that HAW (host address width) is the maximum physical address width a CPU can detect (via cpuid.0x80000008). I agree there are some addresses the CPU does not touch, but they are still in the physical address space, and there's only one physical address space... B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 17:37 ` Yu Zhang @ 2018-12-21 19:02 ` Michael S. Tsirkin 2018-12-21 20:01 ` Eduardo Habkost 2018-12-22 1:11 ` Yu Zhang 0 siblings, 2 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-21 19:02 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > Well, my understanding of the vt-d spec is that the address limitation in > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > there's any different in the native scenario. :) > > > > I think native machines exist on which the two values are different. > > Is that true? > > I think the answer is not. My understanding is that HAW(host address wdith) is > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > I agree there are some addresses the CPU does not touch, but they are still in > the physical address space, and there's only one physical address space... > > B.R. > Yu Ouch I thought we are talking about the virtual address size. I think I did have a box where VTD's virtual address size was smaller than CPU's. For physical one - we just need to make it as big as max supported memory right? -- MST ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 19:02 ` Michael S. Tsirkin @ 2018-12-21 20:01 ` Eduardo Habkost 2018-12-22 1:11 ` Yu Zhang 1 sibling, 0 replies; 57+ messages in thread From: Eduardo Habkost @ 2018-12-21 20:01 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Yu Zhang, Igor Mammedov, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > there's any different in the native scenario. :) > > > > > > I think native machines exist on which the two values are different. > > > Is that true? > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > I agree there are some addresses the CPU does not touch, but they are still in > > the physical address space, and there's only one physical address space... > > > > B.R. > > Yu > > Ouch I thought we are talking about the virtual address size. > I think I did have a box where VTD's virtual address size was > smaller than CPU's. > For physical one - we just need to make it as big as max supported > memory right? What exactly do you mean by "max supported memory"? -- Eduardo ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 19:02 ` Michael S. Tsirkin 2018-12-21 20:01 ` Eduardo Habkost @ 2018-12-22 1:11 ` Yu Zhang 2018-12-25 16:56 ` Michael S. Tsirkin 1 sibling, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-22 1:11 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > there's any different in the native scenario. :) > > > > > > I think native machines exist on which the two values are different. > > > Is that true? > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > I agree there are some addresses the CPU does not touch, but they are still in > > the physical address space, and there's only one physical address space... > > > > B.R. > > Yu > > Ouch I thought we are talking about the virtual address size. > I think I did have a box where VTD's virtual address size was > smaller than CPU's. > For physical one - we just need to make it as big as max supported > memory right? Well, my understanding of the physical one is the maximum physical address width. Sorry, that explanation may sound like nonsense... I mean, it's not just about the max supported memory, but it also covers MMIO. It should be detectable from cpuid, or from ACPI's DMAR table, instead of being calculated from the max memory size. 
One common usage of this value is to tell the paging structure entries (CPU's or IOMMU's) which bits shall be reserved. There are also some registers, e.g. the apic base reg etc., whose contents are physical addresses and therefore also need to follow a similar requirement for the reserved bits. So I think the correct direction might be to define this property at the machine state level, instead of the CPU level. Is this reasonable to you? > > -- > MST B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-22 1:11 ` Yu Zhang @ 2018-12-25 16:56 ` Michael S. Tsirkin 2018-12-26 5:30 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-25 16:56 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > there's any different in the native scenario. :) > > > > > > > > I think native machines exist on which the two values are different. > > > > Is that true? > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > the physical address space, and there's only one physical address space... > > > > > > B.R. > > > Yu > > > > Ouch I thought we are talking about the virtual address size. > > I think I did have a box where VTD's virtual address size was > > smaller than CPU's. > > For physical one - we just need to make it as big as max supported > > memory right? > > Well, my understanding of the physical one is the maximum physical address > width. Sorry, this explain seems nonsense... I mean, it's not just about > the max supported memory, but also covers MMIO. 
It shall be detectable > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > size. One common usage of this value is to tell the paging structure entries( > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > e.g. apic base reg etc, whose contents are physical addresses, therefore also > need to follow the similar requirement for the reserved bits. > > So I think the correct direction might be to define this property in the > machine status level, instead of the CPU level. Is this reasonable to you? At that level yes. But isn't this already specified by "pci-hole64-end"? > > > > -- > > MST > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-25 16:56 ` Michael S. Tsirkin @ 2018-12-26 5:30 ` Yu Zhang 2018-12-27 15:14 ` Eduardo Habkost 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-26 5:30 UTC (permalink / raw) To: Michael S. Tsirkin, Eduardo Habkost Cc: qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > there's any different in the native scenario. :) > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > Is that true? > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > the physical address space, and there's only one physical address space... > > > > > > > > B.R. > > > > Yu > > > > > > Ouch I thought we are talking about the virtual address size. > > > I think I did have a box where VTD's virtual address size was > > > smaller than CPU's. > > > For physical one - we just need to make it as big as max supported > > > memory right? > > > > Well, my understanding of the physical one is the maximum physical address > > width. Sorry, this explain seems nonsense... 
I mean, it's not just about > > the max supported memory, but also covers MMIO. It shall be detectable > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > size. One common usage of this value is to tell the paging structure entries( > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > need to follow the similar requirement for the reserved bits. > > > > So I think the correct direction might be to define this property in the > > machine status level, instead of the CPU level. Is this reasonable to you? > > At that level yes. But isn't this already specified by "pci-hole64-end"? But this value is set by guest firmware? Will PCI hotplug change this address? @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? Or introduce another property, say "max-phys-bits" in machine status? > > > > > > > > > -- > > > MST > > > > B.R. > > Yu > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-26 5:30 ` Yu Zhang @ 2018-12-27 15:14 ` Eduardo Habkost 2018-12-28 2:32 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Eduardo Habkost @ 2018-12-27 15:14 UTC (permalink / raw) To: Yu Zhang Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote: > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > > there's any different in the native scenario. :) > > > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > > Is that true? > > > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > > the physical address space, and there's only one physical address space... > > > > > > > > > > B.R. > > > > > Yu > > > > > > > > Ouch I thought we are talking about the virtual address size. > > > > I think I did have a box where VTD's virtual address size was > > > > smaller than CPU's. > > > > For physical one - we just need to make it as big as max supported > > > > memory right? 
> > > > > > Well, my understanding of the physical one is the maximum physical address > > > width. Sorry, this explain seems nonsense... I mean, it's not just about > > > the max supported memory, but also covers MMIO. It shall be detectable > > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > > size. One common usage of this value is to tell the paging structure entries( > > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > > need to follow the similar requirement for the reserved bits. > > > > > > So I think the correct direction might be to define this property in the > > > machine status level, instead of the CPU level. Is this reasonable to you? > > > > At that level yes. But isn't this already specified by "pci-hole64-end"? > > But this value is set by guest firmware? Will PCI hotplug change this address? > > @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? > Or introduce another property, say "max-phys-bits" in machine status? I agree it may make sense to make the machine code control phys-bits instead of the CPU object. A machine property sounds like the simplest solution. But I don't think we can have a meaningful discussion about implementation if we don't agree about the command-line interface. We must decide what will happen to the CPU and iommu physical address width in cases like: $QEMU -device intel-iommu $QEMU -cpu ...,phys-bits=50 -device intel-iommu $QEMU -cpu ...,host-phys-bits=on -device intel-iommu $QEMU -machine phys-bits=50 -device intel-iommu $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu -- Eduardo ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-27 15:14 ` Eduardo Habkost @ 2018-12-28 2:32 ` Yu Zhang 2018-12-29 1:29 ` Eduardo Habkost 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-28 2:32 UTC (permalink / raw) To: Eduardo Habkost Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote: > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote: > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > > > there's any different in the native scenario. :) > > > > > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > > > Is that true? > > > > > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > > > the physical address space, and there's only one physical address space... > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > Ouch I thought we are talking about the virtual address size. > > > > > I think I did have a box where VTD's virtual address size was > > > > > smaller than CPU's. 
> > > > > For physical one - we just need to make it as big as max supported > > > > > memory right? > > > > > > > > Well, my understanding of the physical one is the maximum physical address > > > > width. Sorry, this explain seems nonsense... I mean, it's not just about > > > > the max supported memory, but also covers MMIO. It shall be detectable > > > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > > > size. One common usage of this value is to tell the paging structure entries( > > > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > > > need to follow the similar requirement for the reserved bits. > > > > > > > > So I think the correct direction might be to define this property in the > > > > machine status level, instead of the CPU level. Is this reasonable to you? > > > > > > At that level yes. But isn't this already specified by "pci-hole64-end"? > > > > But this value is set by guest firmware? Will PCI hotplug change this address? > > > > @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? > > Or introduce another property, say "max-phys-bits" in machine status? > > I agree it may make sense to make the machine code control > phys-bits instead of the CPU object. A machine property sounds > like the simplest solution. > > But I don't think we can have a meaningful discussion about > implementation if we don't agree about the command-line > interface. We must decide what will happen to the CPU and iommu > physical address width in cases like: Thanks, Eduardo. What about we just use "-machine phys-bits=52", and remove the "phys-bits" from CPU parameter? 
> > $QEMU -device intel-iommu > $QEMU -cpu ...,phys-bits=50 -device intel-iommu > $QEMU -cpu ...,host-phys-bits=on -device intel-iommu > $QEMU -machine phys-bits=50 -device intel-iommu > $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu > > -- > Eduardo > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-28 2:32 ` Yu Zhang @ 2018-12-29 1:29 ` Eduardo Habkost 2019-01-15 7:13 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Eduardo Habkost @ 2018-12-29 1:29 UTC (permalink / raw) To: Yu Zhang Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote: > On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote: > > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote: > > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > > > > there's any different in the native scenario. :) > > > > > > > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > > > > Is that true? > > > > > > > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > > > > the physical address space, and there's only one physical address space... > > > > > > > > > > > > > > B.R. > > > > > > > Yu > > > > > > > > > > > > Ouch I thought we are talking about the virtual address size. 
> > > > > > I think I did have a box where VTD's virtual address size was > > > > > > smaller than CPU's. > > > > > > For physical one - we just need to make it as big as max supported > > > > > > memory right? > > > > > > > > > > Well, my understanding of the physical one is the maximum physical address > > > > > width. Sorry, this explain seems nonsense... I mean, it's not just about > > > > > the max supported memory, but also covers MMIO. It shall be detectable > > > > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > > > > size. One common usage of this value is to tell the paging structure entries( > > > > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > > > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > > > > need to follow the similar requirement for the reserved bits. > > > > > > > > > > So I think the correct direction might be to define this property in the > > > > > machine status level, instead of the CPU level. Is this reasonable to you? > > > > > > > > At that level yes. But isn't this already specified by "pci-hole64-end"? > > > > > > But this value is set by guest firmware? Will PCI hotplug change this address? > > > > > > @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? > > > Or introduce another property, say "max-phys-bits" in machine status? > > > > I agree it may make sense to make the machine code control > > phys-bits instead of the CPU object. A machine property sounds > > like the simplest solution. > > > > But I don't think we can have a meaningful discussion about > > implementation if we don't agree about the command-line > > interface. We must decide what will happen to the CPU and iommu > > physical address width in cases like: > > Thanks, Eduardo. > > What about we just use "-machine phys-bits=52", and remove the > "phys-bits" from CPU parameter? 
Maybe we can deprecate it, but we can't remove it immediately. We still need to decide what to do on the cases below, while the option is still available. > > > > > $QEMU -device intel-iommu > > $QEMU -cpu ...,phys-bits=50 -device intel-iommu > > $QEMU -cpu ...,host-phys-bits=on -device intel-iommu > > $QEMU -machine phys-bits=50 -device intel-iommu > > $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu > > > > -- > > Eduardo > > > > B.R. > Yu -- Eduardo ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-29 1:29 ` Eduardo Habkost @ 2019-01-15 7:13 ` Yu Zhang 2019-01-18 7:10 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2019-01-15 7:13 UTC (permalink / raw) To: Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov, Peter Xu Cc: qemu-devel, Paolo Bonzini, Richard Henderson On Fri, Dec 28, 2018 at 11:29:41PM -0200, Eduardo Habkost wrote: > On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote: > > On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote: > > > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote: > > > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > > > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > > > > > there's any different in the native scenario. :) > > > > > > > > > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > > > > > Is that true? > > > > > > > > > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > > > > > the physical address space, and there's only one physical address space... > > > > > > > > > > > > > > > > B.R. 
> > > > > > > > Yu > > > > > > > > > > > > > > Ouch I thought we are talking about the virtual address size. > > > > > > > I think I did have a box where VTD's virtual address size was > > > > > > > smaller than CPU's. > > > > > > > For physical one - we just need to make it as big as max supported > > > > > > > memory right? > > > > > > > > > > > > Well, my understanding of the physical one is the maximum physical address > > > > > > width. Sorry, this explain seems nonsense... I mean, it's not just about > > > > > > the max supported memory, but also covers MMIO. It shall be detectable > > > > > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > > > > > size. One common usage of this value is to tell the paging structure entries( > > > > > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > > > > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > > > > > need to follow the similar requirement for the reserved bits. > > > > > > > > > > > > So I think the correct direction might be to define this property in the > > > > > > machine status level, instead of the CPU level. Is this reasonable to you? > > > > > > > > > > At that level yes. But isn't this already specified by "pci-hole64-end"? > > > > > > > > But this value is set by guest firmware? Will PCI hotplug change this address? > > > > > > > > @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? > > > > Or introduce another property, say "max-phys-bits" in machine status? > > > > > > I agree it may make sense to make the machine code control > > > phys-bits instead of the CPU object. A machine property sounds > > > like the simplest solution. > > > > > > But I don't think we can have a meaningful discussion about > > > implementation if we don't agree about the command-line > > > interface. 
We must decide what will happen to the CPU and iommu > > > physical address width in cases like: > > > > Thanks, Eduardo. > > > > What about we just use "-machine phys-bits=52", and remove the > > "phys-bits" from CPU parameter? > > Maybe we can deprecate it, but we can't remove it immediately. > We still need to decide what to do on the cases below, while the > option is still available. I saw the ACPI DMAR is initialized in acpi_build(), which is called by pc_machine_done(). I guess this is done after the initialization of vCPU and vIOMMU. So I am wondering, instead of moving "phys-bits" from X86CPU into the MachineState, maybe we could: 1> Define a "phys_bits" in MachineState or PCMachineState(not sure which one is more suitable). 2> Set ms->phys_bits in x86_cpu_realizefn(). 3> Since DMAR is created after vCPU creation, we can build the DMAR table with ms->phys_bits. 4> Also, we can reset the hardware address width for vIOMMU(and the vtd_paging_entry_rsvd_field array) in pc_machine_done(), based on the value of ms->phys_bits, or from the ACPI DMAR table(from spec point of view, the address width limitation of the IOMMU shall come from DMAR, yet I have not figured out any simple approach to probe the ACPI property). This way, we do not need to worry about the initialization sequence of vCPU and vIOMMU, and both the DMAR and IOMMU settings come from the machine level, which follows the spec. Any comments? :) B.R. Yu > > > > > > > > > $QEMU -device intel-iommu > > > $QEMU -cpu ...,phys-bits=50 -device intel-iommu > > > $QEMU -cpu ...,host-phys-bits=on -device intel-iommu > > > $QEMU -machine phys-bits=50 -device intel-iommu > > > $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu > > > > > > -- > > > Eduardo > > > > > > > B.R. > > Yu > > -- > > Eduardo > > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2019-01-15 7:13 ` Yu Zhang @ 2019-01-18 7:10 ` Yu Zhang 0 siblings, 0 replies; 57+ messages in thread From: Yu Zhang @ 2019-01-18 7:10 UTC (permalink / raw) To: Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov, Peter Xu Cc: Paolo Bonzini, qemu-devel, Richard Henderson On Tue, Jan 15, 2019 at 03:13:14PM +0800, Yu Zhang wrote: > On Fri, Dec 28, 2018 at 11:29:41PM -0200, Eduardo Habkost wrote: > > On Fri, Dec 28, 2018 at 10:32:59AM +0800, Yu Zhang wrote: > > > On Thu, Dec 27, 2018 at 01:14:11PM -0200, Eduardo Habkost wrote: > > > > On Wed, Dec 26, 2018 at 01:30:00PM +0800, Yu Zhang wrote: > > > > > On Tue, Dec 25, 2018 at 11:56:19AM -0500, Michael S. Tsirkin wrote: > > > > > > On Sat, Dec 22, 2018 at 09:11:26AM +0800, Yu Zhang wrote: > > > > > > > On Fri, Dec 21, 2018 at 02:02:28PM -0500, Michael S. Tsirkin wrote: > > > > > > > > On Sat, Dec 22, 2018 at 01:37:58AM +0800, Yu Zhang wrote: > > > > > > > > > On Fri, Dec 21, 2018 at 12:04:49PM -0500, Michael S. Tsirkin wrote: > > > > > > > > > > On Sat, Dec 22, 2018 at 12:09:44AM +0800, Yu Zhang wrote: > > > > > > > > > > > Well, my understanding of the vt-d spec is that the address limitation in > > > > > > > > > > > DMAR are referring to the same concept of CPUID.MAXPHYSADDR. I do not think > > > > > > > > > > > there's any different in the native scenario. :) > > > > > > > > > > > > > > > > > > > > I think native machines exist on which the two values are different. > > > > > > > > > > Is that true? > > > > > > > > > > > > > > > > > > I think the answer is not. My understanding is that HAW(host address wdith) is > > > > > > > > > the maximum physical address width a CPU can detects(by cpuid.0x80000008). > > > > > > > > > > > > > > > > > > I agree there are some addresses the CPU does not touch, but they are still in > > > > > > > > > the physical address space, and there's only one physical address space... 
> > > > > > > > > > > > > > > > > > B.R. > > > > > > > > > Yu > > > > > > > > > > > > > > > > Ouch I thought we are talking about the virtual address size. > > > > > > > > I think I did have a box where VTD's virtual address size was > > > > > > > > smaller than CPU's. > > > > > > > > For physical one - we just need to make it as big as max supported > > > > > > > > memory right? > > > > > > > > > > > > > > Well, my understanding of the physical one is the maximum physical address > > > > > > > width. Sorry, this explain seems nonsense... I mean, it's not just about > > > > > > > the max supported memory, but also covers MMIO. It shall be detectable > > > > > > > from cpuid, or ACPI's DMAR table, instead of calculated by the max memory > > > > > > > size. One common usage of this value is to tell the paging structure entries( > > > > > > > CPU's or IOMMU's) which bits shall be reserved. There are also some registers > > > > > > > e.g. apic base reg etc, whose contents are physical addresses, therefore also > > > > > > > need to follow the similar requirement for the reserved bits. > > > > > > > > > > > > > > So I think the correct direction might be to define this property in the > > > > > > > machine status level, instead of the CPU level. Is this reasonable to you? > > > > > > > > > > > > At that level yes. But isn't this already specified by "pci-hole64-end"? > > > > > > > > > > But this value is set by guest firmware? Will PCI hotplug change this address? > > > > > > > > > > @Eduardo, do you have any plan to calculate the phys-bits by "pci-hole64-end"? > > > > > Or introduce another property, say "max-phys-bits" in machine status? > > > > > > > > I agree it may make sense to make the machine code control > > > > phys-bits instead of the CPU object. A machine property sounds > > > > like the simplest solution. 
> > > > > > > > But I don't think we can have a meaningful discussion about > > > > implementation if we don't agree about the command-line > > > > interface. We must decide what will happen to the CPU and iommu > > > > physical address width in cases like: > > > > > > Thanks, Eduardo. > > > > > > What about we just use "-machine phys-bits=52", and remove the > > > "phys-bits" from CPU parameter? > > > > Maybe we can deprecate it, but we can't remove it immediately. > > We still need to decide what to do on the cases below, while the > > option is still available. > > I saw the ACPI DMAR is initialized in acpi_build(), which is called > by pc_machine_done(). I guess this is done after the initialization of > vCPU and vIOMMU. > > So I am wondering, instead of moving "phys-bits" from X86CPU into the > MachineState, maybe we could: > > 1> Define a "phys_bits" in MachineState or PCMachineState(not sure which > one is more suitable). > > 2> Set ms->phys_bits in x86_cpu_realizefn(). > > 3> Since DMAR is created after vCPU creation, we can build DMAR table > with ms->phys_bits. > > 4> Also, we can reset the hardware address width for vIOMMU(and the > vtd_paging_entry_rsvd_field array) in pc_machine_done(), based on the value > of ms->phys_bits, or from ACPI DMAR table(from spec point of view, address > width limitation of IOMMU shall come from DMAR, yet I have not figured out > any simple approach to probe the ACPI property). > > This way, we do not need to worry about the initialization sequence of vCPU > and vIOMMU, and both DMAR and IOMMU setting are from the machine level which > follows the spec. > > Any comments? :) > Ping... Any comments on this proposal? Thanks! :) Yu > B.R. 
> Yu > > > > > > > > > > > > > > $QEMU -device intel-iommu > > > > $QEMU -cpu ...,phys-bits=50 -device intel-iommu > > > > $QEMU -cpu ...,host-phys-bits=on -device intel-iommu > > > > $QEMU -machine phys-bits=50 -device intel-iommu > > > > $QEMU -machine phys-bits=50 -cpu ...,phys-bits=48 -device intel-iommu > > > > > > > > -- > > > > Eduardo > > > > > > > > > > B.R. > > > Yu > > > > -- > > Eduardo > > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-21 14:13 ` Igor Mammedov 2018-12-21 16:09 ` Yu Zhang @ 2018-12-27 14:54 ` Eduardo Habkost 2018-12-28 11:42 ` Igor Mammedov 1 sibling, 1 reply; 57+ messages in thread From: Eduardo Habkost @ 2018-12-27 14:54 UTC (permalink / raw) To: Igor Mammedov Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Yu Zhang, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote: > On Thu, 20 Dec 2018 19:18:01 -0200 > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > > > On Wed, 19 Dec 2018 10:57:17 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > > in an invalid IOVA being accepted. 
> > > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > > guest CPU. > > > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > > > to clarify. > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > --- > > > > > > > [...] > > > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > > { > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > } > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > > the vIOMMU features. 
But to me, they are not that irrelevant.:) > > > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > > s->haw_bits again. > > > > > > > > > > > > Is this explanation convincing enough? :) > > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > > values during iommu creation. > > > > > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > > > According to the spec, the host address width is the maximum physical address width, > > > > yet current implementation is using the DMA address width. For me, this is not only > > > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > > > this is not acceptable. :) > > > > > > > > I had thought of other approaches before, yet I did not choose: > > > > > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > > > or not? > > > > > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. 
> > > > > > > So Igor, may I ask why you think checking against the cpu fields so not acceptable? :) > > > Because accessing private fields of device from another random device is not robust > > > and a subject to breaking in unpredictable manner when field meaning or initialization > > > order changes. (analogy to baremetal: one does not solder wire to a CPU die to let > > > access some piece of data from random device). > > > > > > > With either the solution below or the one I proposed, we still > > have a ordering problem: if we want "-cpu ...,phys-bits=..." to > As Michael said, it's questionable if iommu should rely on guest's > phys-bits at all, Agreed, this is not clear. I don't know yet if we really want to make "-cpu" affect other devices. Probably not. > but that aside we should use proper interfaces > and hierarchy to initialize devices, see below why I dislike > simplistic pc_max_phys_bits(). What do you mean by proper interfaces and hierarchy? pc_max_phys_bits() is simple, and that's supposed to be a good thing. > > > affect the IOMMU device, we will need the CPU objects to be > > created before IOMMU realize. > > > > At least both proposals make the initialization ordering > > explicitly a responsibility of the machine code. In either case, > > I don't think we will start creating all CPU objects after device > > realize any time soon. > > > > > > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > > > you need using proper interfaces: > > > > > > 1. add x-haw_bits property > > > 2. include in your series patch > > > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > > > machine and let _pre_plug handler to check and set x-haw_bits for machine level > > > > Wow, that's a very complex way to pass a single integer from > > machine code to device code. 
If this is the only way to do that, > > we really need to take a step back and rethink our API design. > > > > What's wrong with having a simple > > uint32_t pc_max_phys_bits(PCMachineState*) > > function? > As suggested, it would be only aesthetic change for accessing first_cpu from > random device at random time. IOMMU would still access cpu instance directly > no matter how much wrappers one would use so it's still the same hack. > If phys_bits were changing during VM lifecycle and iommu needed to use > updated value then using pc_max_phys_bits() might be justified as > we don't have interfaces to handle that but that's not the case here. I don't understand what you mean here. Which "interfaces to handle that" are you talking about? > > I suggested a typical way (albeit a bit complex) to handle device > initialization in cases where bus plug handler is not sufficient. > It follows proper hierarchy without any layer violations and can fail > gracefully even if we start creating CPUs later using only '-device cpufoo' > without need to fix iommu code to handle that (it would fail creating > iommu with clear error that CPU isn't available and all user have to > do is to fix CLI to make sure that CPU is created before iommu). What do you mean by "proper hierarchy" and "layer violations"? What exactly is wrong with having device code talking to the machine object? You do have a point about "-device cpufoo": making "-cpu" affect iommu phys-bits is probably not a good idea after all. > > So I'd prefer if we used exiting pattern for device initialization > instead of hacks whenever it is possible. Why do you describe it as a hack? It's just C code calling a C function. I don't see any problem in having device code talking to the machine code to get a bit of information. > > > > > > 4. you probably can use phys-bits/host-phys-bits properties to get data that you need > > > also see how ms->possible_cpus, that's how you can get access to CPU from machine > > > layer. 
> > > > > [...] > > > PS: > Another thing I'd like to draw your attention to (since you recently looked at > phys-bits) is about host/guest phys_bits and if it's safe from migration pov > between hosts with different limits. > > -- Eduardo ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-27 14:54 ` Eduardo Habkost @ 2018-12-28 11:42 ` Igor Mammedov 0 siblings, 0 replies; 57+ messages in thread From: Igor Mammedov @ 2018-12-28 11:42 UTC (permalink / raw) To: Eduardo Habkost Cc: Michael S. Tsirkin, qemu-devel, Peter Xu, Yu Zhang, Paolo Bonzini, Richard Henderson On Thu, 27 Dec 2018 12:54:02 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Fri, Dec 21, 2018 at 03:13:25PM +0100, Igor Mammedov wrote: > > On Thu, 20 Dec 2018 19:18:01 -0200 > > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > > > On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > > > > On Wed, 19 Dec 2018 10:57:17 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > > > > > > > the host address width(HAW) to calculate the number of reserved bits in > > > > > > > > > data structures such as root entries, context entries, and entries of > > > > > > > > > DMA paging structures etc. > > > > > > > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > > > > > > > in an invalid IOVA being accepted. 
> > > > > > > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > > > > > > > whose value is initialized based on the maximum physical address set to > > > > > > > > > guest CPU. > > > > > > > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > > > > to clarify. > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > --- > > > > > > > > [...] > > > > > > > > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > > > > { > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > } > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > + s->haw_bits = cpu->phys_bits; > > > > > > > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > > > > > > > and set phys_bits when iommu is created? > > > > > > > > > > > > > > Thanks for your comments, Igor. > > > > > > > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while deciding > > > > > > > the vIOMMU features. 
But to me, they are not that irrelevant.:) > > > > > > > > > > > > > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > > > > > > > are referring to the same concept. In VM, both are the maximum guest physical > > > > > > > address width. If we do not check the CPU field here, we will still have to > > > > > > > check the CPU field in other places such as build_dmar_q35(), and reset the > > > > > > > s->haw_bits again. > > > > > > > > > > > > > > Is this explanation convincing enough? :) > > > > > > current build_dmar_q35() doesn't do it, it's all new code in this series that > > > > > > contains not acceptable direct access from one device (iommu) to another (cpu). > > > > > > Proper way would be for the owner of iommu to fish limits from somewhere and set > > > > > > values during iommu creation. > > > > > > > > > > Well, current build_dmar_q35() doesn't do it, because it is using the incorrect value. :) > > > > > According to the spec, the host address width is the maximum physical address width, > > > > > yet current implementation is using the DMA address width. For me, this is not only > > > > > wrong, but also unsecure. For this point, I think we all agree this need to be fixed. > > > > > > > > > > As to how to fix it - should we query the cpu fields, I still do not understand why > > > > > this is not acceptable. :) > > > > > > > > > > I had thought of other approaches before, yet I did not choose: > > > > > > > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to limit its > > > > > physical address width(similar to the "x-aw-bits" for IOVA). But what should we check > > > > > this parameter or not? What if this parameter is set to sth. different than the "phys-bits" > > > > > or not? > > > > > > > > > > 2> Another choice I had thought of is, to query the physical iommu. I abandoned this > > > > > idea because my understanding is that vIOMMU is not a passthrued device, it is emulated. 
> > > > > > > > > So Igor, may I ask why you think checking against the cpu fields is so unacceptable? :) > > > > Because accessing private fields of a device from another random device is not robust > > > > and subject to breaking in an unpredictable manner when field meaning or initialization > > > > order changes. (analogy to baremetal: one does not solder a wire to a CPU die to let > > > > some random device access a piece of data). > > > > > > > > > > With either the solution below or the one I proposed, we still > > > have an ordering problem: if we want "-cpu ...,phys-bits=..." to > > As Michael said, it's questionable if iommu should rely on guest's > > phys-bits at all, > > Agreed, this is not clear. I don't know yet if we really want to > make "-cpu" affect other devices. Probably not. > > > but that aside we should use proper interfaces > > and hierarchy to initialize devices, see below why I dislike > > simplistic pc_max_phys_bits(). > > What do you mean by proper interfaces and hierarchy? set properties on the created iommu object by one of its parents (SysBus or machine) > > pc_max_phys_bits() is simple, and that's supposed to be a good > thing. first_cpu->phys_bits is even simpler; shouldn't we use it then? it only seems simple, but with this approach one would end up creating a bunch of custom APIs for every little thing, then trying to generalize them to share with other machine types pushes APIs into the generic machine code, where they are irrelevant for most machines. So one ends up with a lot of hard-to-manage APIs that are called at random times by devices. > > > affect the IOMMU device, we will need the CPU objects to be > > > created before IOMMU realize. > > > > > > At least both proposals make the initialization ordering > > > explicitly a responsibility of the machine code. In either case, > > > I don't think we will start creating all CPU objects after device > > > realize any time soon. 
> > > > > > > > > > I've looked at intel-iommu code and how it's created so here is a way to do the thing > > > > you need using proper interfaces: > > > > > > > > 1. add x-haw_bits property > > > > 2. include in your series patch > > > > '[Qemu-devel] [PATCH] qdev: let machine hotplug handler to override bus hotplug handler' > > > > 3. add your iommu to pc_get_hotpug_handler() to redirect plug flow to > > > > machine and let _pre_plug handler to check and set x-haw_bits for machine level > > > > > > Wow, that's a very complex way to pass a single integer from > > > machine code to device code. If this is the only way to do that, > > > we really need to take a step back and rethink our API design. > > > > > > What's wrong with having a simple > > > uint32_t pc_max_phys_bits(PCMachineState*) > > > function? > > As suggested, it would be only aesthetic change for accessing first_cpu from > > random device at random time. IOMMU would still access cpu instance directly > > no matter how much wrappers one would use so it's still the same hack. > > If phys_bits were changing during VM lifecycle and iommu needed to use > > updated value then using pc_max_phys_bits() might be justified as > > we don't have interfaces to handle that but that's not the case here. > > I don't understand what you mean here. Which "interfaces to > handle that" you are talking about? There is HotplugHandler and its pre_plug() handler to initialize being created device before it's realize() method is called. In iommu case, I suggest machine to set iommu::phys_bits property when device is created at pre_plug() time and fail gracefully if it's not possible. It's a bit more complex than pc_max_phys_bits() but follows QOM device life-cycle just as expected without unnecessary relations. > > I suggested a typical way (albeit a bit complex) to handle device > > initialization in cases where bus plug handler is not sufficient. 
> > It follows proper hierarchy without any layer violations and can fail > > gracefully even if we start creating CPUs later using only '-device cpufoo' > > without need to fix iommu code to handle that (it would fail creating > > iommu with clear error that CPU isn't available and all user have to > > do is to fix CLI to make sure that CPU is created before iommu). > > What do you mean by "proper hierarchy" and "layer violations"? it means that an object shouldn't reach to the parent to fetch a random bit of configuration (ideally child shouldn't be aware of parent's existence at all), and then it's responsibility of parent to configure child during it's creation and set all necessary properties/resources for the child to function properly. That model was used in qdev and it's still true with QOM device models we use now. Difference is that instead of configuring device fields directly, QOM device approach uses more unified object_new/set properties/realize work-flow. > What exactly is wrong with having device code talking to the > machine object? it breaks device abstraction boundaries and introduces unnecessary relationship between devices instead of reusing existing device initialization framework. It will also be a problem if we start isolating device models/backends into separate processes as one would have to add/maintain/secure ABIs for 'simple' APIs where property setting ABI would be sufficient. > You do have a point about "-device cpufoo": making "-cpu" affect > iommu phys-bits is probably not a good idea after all. > > > > > So I'd prefer if we used exiting pattern for device initialization > > instead of hacks whenever it is possible. > > Why do you describe it as a hack? It's just C code calling a C > function. I don't see any problem in having device code talking > to the machine code to get a bit of information. 
(Well, we can get rid of a bunch of properties and query QemuOpts directly from each device whenever configuration info is needed, it's just C code calling C functions after all) Writing and calling a random set of C functions at random times is fine if we give up on modeling devices as QOM objects (QEMU was like that at some point), but that becomes unmanageable as complexity grows. > > > > 4. you probably can use phys-bits/host-phys-bits properties to get the data that you need > > > > also see how ms->possible_cpus is used, that's how you can get access to the CPU from the machine > > > > layer. > > > > [...] > > > > > PS: > > Another thing I'd like to draw your attention to (since you recently looked at > > phys-bits) is about host/guest phys_bits and whether it's safe, from a migration point of view, between hosts with different limits.
* Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width. 2018-12-18 9:27 ` Yu Zhang 2018-12-18 14:23 ` Michael S. Tsirkin 2018-12-18 14:55 ` Igor Mammedov @ 2018-12-20 20:58 ` Eduardo Habkost 2 siblings, 0 replies; 57+ messages in thread From: Eduardo Habkost @ 2018-12-20 20:58 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 05:27:23PM +0800, Yu Zhang wrote: > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > On Wed, 12 Dec 2018 21:05:38 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > the host address width(HAW) to calculate the number of reserved bits in > > > data structures such as root entries, context entries, and entries of > > > DMA paging structures etc. > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > in an invalid IOVA being accepted. > > > > > > To fix this, a new field - haw_bits is introduced in struct IntelIOMMUState, > > > whose value is initialized based on the maximum physical address set to > > > guest CPU. > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > to clarify. > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > --- > > [...] 
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > static void vtd_init(IntelIOMMUState *s) > > > { > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > + CPUState *cs = first_cpu; > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > - if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > + if (s->aw_bits == VTD_AW_48BIT) { > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > } > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > + s->haw_bits = cpu->phys_bits; > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > and set phys_bits when iommu is created? > > Thanks for your comments, Igor. > > Well, I guess you prefer not to query the CPU capabilities while deciding > the vIOMMU features. But to me, they are not that irrelevant. :) > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > are referring to the same concept. In a VM, both are the maximum guest physical > address width. If we do not check the CPU field here, we will still have to > check the CPU field in other places such as build_dmar_q35(), and reset the > s->haw_bits again. > > Is this explanation convincing enough? :) > > > > > Perhaps Eduardo > > can suggest a better approach, since he's more familiar with the phys_bits topic > > @Eduardo, any comments? Thanks! Configuring IOMMU phys-bits automatically depending on the configured CPU is OK, but accessing first_cpu directly in iommu code is not. 
I suggest delegating this to the machine object, e.g.:

uint32_t pc_max_phys_bits(PCMachineState *pcms)
{
    return object_property_get_uint(OBJECT(first_cpu), "phys-bits",
                                    &error_abort);
}

as the machine itself is responsible for creating the CPU objects, and I believe there are other places in PC code where we do physical address calculations that could be affected by the physical address space size. -- Eduardo
* [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-12 13:05 [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width Yu Zhang @ 2018-12-12 13:05 ` Yu Zhang 2018-12-17 13:29 ` Igor Mammedov 2018-12-14 9:17 ` [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang 2019-01-15 4:02 ` Michael S. Tsirkin 3 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-12 13:05 UTC (permalink / raw) To: qemu-devel Cc: Michael S. Tsirkin, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu A 5-level paging capable VM may choose to use 57-bit IOVA address width. E.g. guest applications may prefer to use its VA as IOVA when performing VFIO map/unmap operations, to avoid the burden of managing the IOVA space. This patch extends the current vIOMMU logic to cover the extended address width. When creating a VM with 5-level paging feature, one can choose to create a virtual VTD with 5-level paging capability, with configurations like "-device intel-iommu,x-aw-bits=57". Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Cc: "Michael S. 
Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> --- hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- hw/i386/intel_iommu_internal.h | 10 ++++++-- include/hw/i386/intel_iommu.h | 1 + 3 files changed, 50 insertions(+), 14 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 0e88c63..871110c 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, /* * Rsvd field masks for spte: - * Index [1] to [4] 4k pages - * Index [5] to [8] large pages + * Index [1] to [5] 4k pages + * Index [6] to [10] large pages */ -static uint64_t vtd_paging_entry_rsvd_field[9]; +static uint64_t vtd_paging_entry_rsvd_field[11]; static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) { if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { /* Maybe large page */ - return slpte & vtd_paging_entry_rsvd_field[level + 4]; + return slpte & vtd_paging_entry_rsvd_field[level + 5]; } else { return slpte & vtd_paging_entry_rsvd_field[level]; } @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); if (s->aw_bits == VTD_AW_48BIT) { s->cap |= VTD_CAP_SAGAW_48bit; + } else if (s->aw_bits == VTD_AW_57BIT) { + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; } s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; s->haw_bits = cpu->phys_bits; @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); - vtd_paging_entry_rsvd_field[6] = 
VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); if (x86_iommu->intr_supported) { s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) return &vtd_as->as; } +static bool host_has_la57(void) +{ + uint32_t ecx, unused; + + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); + return ecx & CPUID_7_0_ECX_LA57; +} + +static bool guest_has_la57(void) +{ + CPUState *cs = first_cpu; + X86CPU *cpu = X86_CPU(cs); + CPUX86State *env = &cpu->env; + + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; +} + static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) { X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) } } - /* Currently only address widths supported are 39 and 48 bits */ + /* Currently address widths supported are 39, 48, and 57 bits */ if ((s->aw_bits != VTD_AW_39BIT) && - (s->aw_bits != VTD_AW_48BIT)) { - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", - VTD_AW_39BIT, VTD_AW_48BIT); + (s->aw_bits != VTD_AW_48BIT) && + (s->aw_bits != VTD_AW_57BIT)) { + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); + return false; + } + + if ((s->aw_bits == VTD_AW_57BIT) && + !(host_has_la57() && guest_has_la57())) { + 
error_setg(errp, "Do not support 57-bit DMA address, unless both " + "host and guest are capable of 5-level paging"); return false; } diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index d084099..2b29b6f 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -114,8 +114,8 @@ VTD_INTERRUPT_ADDR_FIRST + 1) /* The shift of source_id in the key of IOTLB hash table */ -#define VTD_IOTLB_SID_SHIFT 36 -#define VTD_IOTLB_LVL_SHIFT 52 +#define VTD_IOTLB_SID_SHIFT 45 +#define VTD_IOTLB_LVL_SHIFT 61 #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ /* IOTLB_REG */ @@ -212,6 +212,8 @@ #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) /* 48-bit AGAW, 4-level page-table */ #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) + /* 57-bit AGAW, 5-level page-table */ +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) /* IQT_REG */ #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 820451c..7474c4f 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -49,6 +49,7 @@ #define 
DMAR_REG_SIZE 0x230 #define VTD_AW_39BIT 39 #define VTD_AW_48BIT 48 +#define VTD_AW_57BIT 57 #define VTD_ADDRESS_WIDTH VTD_AW_39BIT #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) -- 1.9.1 ^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang @ 2018-12-17 13:29 ` Igor Mammedov 2018-12-18 9:47 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Igor Mammedov @ 2018-12-17 13:29 UTC (permalink / raw) To: Yu Zhang Cc: qemu-devel, Eduardo Habkost, Michael S. Tsirkin, Peter Xu, Paolo Bonzini, Richard Henderson On Wed, 12 Dec 2018 21:05:39 +0800 Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > E.g. guest applications may prefer to use its VA as IOVA when performing > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > This patch extends the current vIOMMU logic to cover the extended address > width. When creating a VM with 5-level paging feature, one can choose to > create a virtual VTD with 5-level paging capability, with configurations > like "-device intel-iommu,x-aw-bits=57". > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Reviewed-by: Peter Xu <peterx@redhat.com> > --- > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Eduardo Habkost <ehabkost@redhat.com> > Cc: Peter Xu <peterx@redhat.com> > --- > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > hw/i386/intel_iommu_internal.h | 10 ++++++-- > include/hw/i386/intel_iommu.h | 1 + > 3 files changed, 50 insertions(+), 14 deletions(-) > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > index 0e88c63..871110c 100644 > --- a/hw/i386/intel_iommu.c > +++ b/hw/i386/intel_iommu.c > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > /* > * Rsvd field masks for spte: > - * Index [1] to [4] 4k pages > - * Index [5] to [8] large pages > + * Index [1] to [5] 4k pages > + * Index [6] to [10] large pages > */ > -static uint64_t vtd_paging_entry_rsvd_field[9]; > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > { > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > /* Maybe large page */ > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > } else { > return slpte & vtd_paging_entry_rsvd_field[level]; > } > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > if (s->aw_bits == VTD_AW_48BIT) { > s->cap |= VTD_CAP_SAGAW_48bit; > + } else if (s->aw_bits == VTD_AW_57BIT) { > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > } > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > s->haw_bits = cpu->phys_bits; > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > - 
vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > if (x86_iommu->intr_supported) { > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > return &vtd_as->as; > } > > +static bool host_has_la57(void) > +{ > + uint32_t ecx, unused; > + > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > + return ecx & CPUID_7_0_ECX_LA57; > +} > + > +static bool guest_has_la57(void) > +{ > + CPUState *cs = first_cpu; > + X86CPU *cpu = X86_CPU(cs); > + CPUX86State *env = &cpu->env; > + > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > +} another direct access to CPU fields, I'd suggest to set this value when iommu is created i.e. add 'la57' property and set from iommu owner. 
> static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > { > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > } > } > > - /* Currently only address widths supported are 39 and 48 bits */ > + /* Currently address widths supported are 39, 48, and 57 bits */ > if ((s->aw_bits != VTD_AW_39BIT) && > - (s->aw_bits != VTD_AW_48BIT)) { > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > - VTD_AW_39BIT, VTD_AW_48BIT); > + (s->aw_bits != VTD_AW_48BIT) && > + (s->aw_bits != VTD_AW_57BIT)) { > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > + return false; > + } > + > + if ((s->aw_bits == VTD_AW_57BIT) && > + !(host_has_la57() && guest_has_la57())) { Does iommu supposed to work in TCG mode? If yes then why it should care about host_has_la57()? > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > + "host and guest are capable of 5-level paging"); > return false; > } > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > index d084099..2b29b6f 100644 > --- a/hw/i386/intel_iommu_internal.h > +++ b/hw/i386/intel_iommu_internal.h > @@ -114,8 +114,8 @@ > VTD_INTERRUPT_ADDR_FIRST + 1) > > /* The shift of source_id in the key of IOTLB hash table */ > -#define VTD_IOTLB_SID_SHIFT 36 > -#define VTD_IOTLB_LVL_SHIFT 52 > +#define VTD_IOTLB_SID_SHIFT 45 > +#define VTD_IOTLB_LVL_SHIFT 61 > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > /* IOTLB_REG */ > @@ -212,6 +212,8 @@ > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > /* 48-bit AGAW, 4-level page-table */ > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > + /* 57-bit AGAW, 5-level page-table */ > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > /* IQT_REG */ > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > @@ -379,6 +381,8 @@ typedef 
union VTDInvDesc VTDInvDesc; > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > /* Information about page-selective IOTLB invalidate */ > struct VTDIOTLBPageInvInfo { > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > index 820451c..7474c4f 100644 > --- a/include/hw/i386/intel_iommu.h > +++ b/include/hw/i386/intel_iommu.h > @@ -49,6 +49,7 @@ > #define DMAR_REG_SIZE 0x230 > #define VTD_AW_39BIT 39 > #define VTD_AW_48BIT 48 > +#define VTD_AW_57BIT 57 > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-17 13:29 ` Igor Mammedov @ 2018-12-18 9:47 ` Yu Zhang 2018-12-18 10:01 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-18 9:47 UTC (permalink / raw) To: Igor Mammedov Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > On Wed, 12 Dec 2018 21:05:39 +0800 > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > E.g. guest applications may prefer to use its VA as IOVA when performing > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > This patch extends the current vIOMMU logic to cover the extended address > > width. When creating a VM with 5-level paging feature, one can choose to > > create a virtual VTD with 5-level paging capability, with configurations > > like "-device intel-iommu,x-aw-bits=57". > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > Reviewed-by: Peter Xu <peterx@redhat.com> > > --- > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > Cc: Richard Henderson <rth@twiddle.net> > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > Cc: Peter Xu <peterx@redhat.com> > > --- > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > include/hw/i386/intel_iommu.h | 1 + > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > index 0e88c63..871110c 100644 > > --- a/hw/i386/intel_iommu.c > > +++ b/hw/i386/intel_iommu.c > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > /* > > * Rsvd field masks for spte: > > - * Index [1] to [4] 4k pages > > - * Index [5] to [8] large pages > > + * Index [1] to [5] 4k pages > > + * Index [6] to [10] large pages > > */ > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > { > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > /* Maybe large page */ > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > } else { > > return slpte & vtd_paging_entry_rsvd_field[level]; > > } > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > if (s->aw_bits == VTD_AW_48BIT) { > > s->cap |= VTD_CAP_SAGAW_48bit; > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > } > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > s->haw_bits = cpu->phys_bits; > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > 
> vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > if (x86_iommu->intr_supported) { > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > return &vtd_as->as; > > } > > > > +static bool host_has_la57(void) > > +{ > > + uint32_t ecx, unused; > > + > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > + return ecx & CPUID_7_0_ECX_LA57; > > +} > > + > > +static bool guest_has_la57(void) > > +{ > > + CPUState *cs = first_cpu; > > + X86CPU *cpu = X86_CPU(cs); > > + CPUX86State *env = &cpu->env; > > + > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > +} > another direct access to CPU fields, > I'd suggest to set this value when iommu is created > i.e. add 'la57' property and set from iommu owner. > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need that, because a 5-level capable vIOMMU can be created with properties like "-device intel-iommu,x-aw-bits=57". 
The guest CPU fields are checked to make sure the VM has LA57 CPU feature, because I believe there shall be no 5-level IOMMU on platforms without LA57 CPUs. > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > { > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > } > > } > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > if ((s->aw_bits != VTD_AW_39BIT) && > > - (s->aw_bits != VTD_AW_48BIT)) { > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > - VTD_AW_39BIT, VTD_AW_48BIT); > > + (s->aw_bits != VTD_AW_48BIT) && > > + (s->aw_bits != VTD_AW_57BIT)) { > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > + return false; > > + } > > + > > + if ((s->aw_bits == VTD_AW_57BIT) && > > + !(host_has_la57() && guest_has_la57())) { > Does iommu supposed to work in TCG mode? > If yes then why it should care about host_has_la57()? > Hmm... I did not take TCG mode into consideration. And host_has_la57() is used to guarantee the host have la57 feature so that iommu shadowing works for device assignment. I guess iommu shall work in TCG mode(though I am not quite sure about this). But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe we can: 1> check the 'ms->accel' in vtd_decide_config() and do not care about host capability if it is TCG. 2> Or, we can choose to keep as it is, and add the check when 5-level paging vIOMMU does have usage in TCG? But as to the check of guest capability, I still believe it is necessary. As said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. 
> > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > + "host and guest are capable of 5-level paging"); > > return false; > > } > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > index d084099..2b29b6f 100644 > > --- a/hw/i386/intel_iommu_internal.h > > +++ b/hw/i386/intel_iommu_internal.h > > @@ -114,8 +114,8 @@ > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > /* The shift of source_id in the key of IOTLB hash table */ > > -#define VTD_IOTLB_SID_SHIFT 36 > > -#define VTD_IOTLB_LVL_SHIFT 52 > > +#define VTD_IOTLB_SID_SHIFT 45 > > +#define VTD_IOTLB_LVL_SHIFT 61 > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > /* IOTLB_REG */ > > @@ -212,6 +212,8 @@ > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > /* 48-bit AGAW, 4-level page-table */ > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > + /* 57-bit AGAW, 5-level page-table */ > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > /* IQT_REG */ > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > /* Information about page-selective IOTLB invalidate */ > > struct VTDIOTLBPageInvInfo { > > diff --git 
a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > index 820451c..7474c4f 100644 > > --- a/include/hw/i386/intel_iommu.h > > +++ b/include/hw/i386/intel_iommu.h > > @@ -49,6 +49,7 @@ > > #define DMAR_REG_SIZE 0x230 > > #define VTD_AW_39BIT 39 > > #define VTD_AW_48BIT 48 > > +#define VTD_AW_57BIT 57 > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-18 9:47 ` Yu Zhang @ 2018-12-18 10:01 ` Yu Zhang 2018-12-18 12:43 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-18 10:01 UTC (permalink / raw) To: Igor Mammedov Cc: Eduardo Habkost, Michael S. Tsirkin, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > On Wed, 12 Dec 2018 21:05:39 +0800 > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > width. When creating a VM with 5-level paging feature, one can choose to > > > create a virtual VTD with 5-level paging capability, with configurations > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > --- > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > Cc: Richard Henderson <rth@twiddle.net> > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > Cc: Peter Xu <peterx@redhat.com> > > > --- > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > include/hw/i386/intel_iommu.h | 1 + > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > index 0e88c63..871110c 100644 > > > --- a/hw/i386/intel_iommu.c > > > +++ b/hw/i386/intel_iommu.c > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > /* > > > * Rsvd field masks for spte: > > > - * Index [1] to [4] 4k pages > > > - * Index [5] to [8] large pages > > > + * Index [1] to [5] 4k pages > > > + * Index [6] to [10] large pages > > > */ > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > { > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > /* Maybe large page */ > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > } else { > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > } > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > if (s->aw_bits == VTD_AW_48BIT) { > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > } > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > s->haw_bits = cpu->phys_bits; > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > vtd_paging_entry_rsvd_field[2] = 
VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > if (x86_iommu->intr_supported) { > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > return &vtd_as->as; > > > } > > > > > > +static bool host_has_la57(void) > > > +{ > > > + uint32_t ecx, unused; > > > + > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > + return ecx & CPUID_7_0_ECX_LA57; > > > +} > > > + > > > +static bool guest_has_la57(void) > > > +{ > > > + CPUState *cs = first_cpu; > > > + X86CPU *cpu = X86_CPU(cs); > > > + CPUX86State *env = &cpu->env; > > > + > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > +} > > another direct access to CPU fields, > > I'd suggest to set this value when iommu is created > > i.e. add 'la57' property and set from iommu owner. > > > > Sorry, do you mean "-device intel-iommu,la57"? 
I think we do not need > that, because a 5-level capable vIOMMU can be created with properties > like "-device intel-iommu,x-aw-bits=57". > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > because I believe there shall be no 5-level IOMMU on platforms without LA57 > CPUs. > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > { > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > } > > > } > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > + (s->aw_bits != VTD_AW_48BIT) && > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > + return false; > > > + } > > > + > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > + !(host_has_la57() && guest_has_la57())) { > > Does iommu supposed to work in TCG mode? > > If yes then why it should care about host_has_la57()? > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > used to guarantee the host have la57 feature so that iommu shadowing works > for device assignment. > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > we can: > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > capability if it is TCG. For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter for the remind. :) > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > vIOMMU does have usage in TCG? 
> > But as to the check of guest capability, I still believe it is necessary. As > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > + "host and guest are capable of 5-level paging"); > > > return false; > > > } > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > index d084099..2b29b6f 100644 > > > --- a/hw/i386/intel_iommu_internal.h > > > +++ b/hw/i386/intel_iommu_internal.h > > > @@ -114,8 +114,8 @@ > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > /* IOTLB_REG */ > > > @@ -212,6 +212,8 @@ > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > /* 48-bit AGAW, 4-level page-table */ > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > + /* 57-bit AGAW, 5-level page-table */ > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > /* IQT_REG */ > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > (0x880ULL | 
~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > /* Information about page-selective IOTLB invalidate */ > > > struct VTDIOTLBPageInvInfo { > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > index 820451c..7474c4f 100644 > > > --- a/include/hw/i386/intel_iommu.h > > > +++ b/include/hw/i386/intel_iommu.h > > > @@ -49,6 +49,7 @@ > > > #define DMAR_REG_SIZE 0x230 > > > #define VTD_AW_39BIT 39 > > > #define VTD_AW_48BIT 48 > > > +#define VTD_AW_57BIT 57 > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > B.R. > Yu > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-18 10:01 ` Yu Zhang @ 2018-12-18 12:43 ` Michael S. Tsirkin 2018-12-18 13:45 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-18 12:43 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > Cc: Peter Xu <peterx@redhat.com> > > > > --- > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > include/hw/i386/intel_iommu.h | 1 + > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > index 0e88c63..871110c 100644 > > > > --- a/hw/i386/intel_iommu.c > > > > +++ b/hw/i386/intel_iommu.c > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > /* > > > > * Rsvd field masks for spte: > > > > - * Index [1] to [4] 4k pages > > > > - * Index [5] to [8] large pages > > > > + * Index [1] to [5] 4k pages > > > > + * Index [6] to [10] large pages > > > > */ > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > { > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > /* Maybe large page */ > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > } else { > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > } > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > } > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > s->haw_bits = cpu->phys_bits; > > > > @@ -3139,10 +3141,12 @@ static void 
vtd_init(IntelIOMMUState *s) > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > if (x86_iommu->intr_supported) { > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > return &vtd_as->as; > > > > } > > > > > > > > +static bool host_has_la57(void) > > > > +{ > > > > + uint32_t ecx, unused; > > > > + > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > +} > > > > + > > > > +static bool guest_has_la57(void) > > > > +{ > > > > + CPUState *cs = first_cpu; > > > > + X86CPU *cpu = X86_CPU(cs); > > > > + CPUX86State *env = &cpu->env; > > > > + > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > +} > > > another direct access to CPU fields, > > > I'd suggest to set this value when iommu is created > > > i.e. 
add 'la57' property and set from iommu owner. > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > that, because a 5-level capable vIOMMU can be created with properties > > like "-device intel-iommu,x-aw-bits=57". > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > CPUs. I don't necessarily see why these need to be connected. If yes pls add code to explain. > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > { > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > } > > > > } > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > + return false; > > > > + } > > > > + > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > + !(host_has_la57() && guest_has_la57())) { > > > Does iommu supposed to work in TCG mode? > > > If yes then why it should care about host_has_la57()? > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > used to guarantee the host have la57 feature so that iommu shadowing works > > for device assignment. > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. 
So maybe > > we can: > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > capability if it is TCG. > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > for the remind. :) This needs a big comment with an explanation though. And probably a TODO to make it work under TCG ... > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > vIOMMU does have usage in TCG? > > > > But as to the check of guest capability, I still believe it is necessary. As > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > + "host and guest are capable of 5-level paging"); > > > > return false; > > > > } > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > index d084099..2b29b6f 100644 > > > > --- a/hw/i386/intel_iommu_internal.h > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > @@ -114,8 +114,8 @@ > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > /* IOTLB_REG */ > > > > @@ -212,6 +212,8 @@ > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > /* 48-bit AGAW, 4-level page-table */ > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > /* IQT_REG */ > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ 
> > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > struct VTDIOTLBPageInvInfo { > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > index 820451c..7474c4f 100644 > > > > --- a/include/hw/i386/intel_iommu.h > > > > +++ b/include/hw/i386/intel_iommu.h > > > > @@ -49,6 +49,7 @@ > > > > #define DMAR_REG_SIZE 0x230 > > > > #define VTD_AW_39BIT 39 > > > > #define VTD_AW_48BIT 48 > > > > +#define VTD_AW_57BIT 57 > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > B.R. > > Yu > > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-18 12:43 ` Michael S. Tsirkin @ 2018-12-18 13:45 ` Yu Zhang 2018-12-18 14:49 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-18 13:45 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > --- > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > --- > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > index 0e88c63..871110c 100644 > > > > > --- a/hw/i386/intel_iommu.c > > > > > +++ b/hw/i386/intel_iommu.c > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > /* > > > > > * Rsvd field masks for spte: > > > > > - * Index [1] to [4] 4k pages > > > > > - * Index [5] to [8] large pages > > > > > + * Index [1] to [5] 4k pages > > > > > + * Index [6] to [10] large pages > > > > > */ > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > { > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > /* Maybe large page */ > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > } else { > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > } > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > } > > > > > s->ecap = VTD_ECAP_QI | 
VTD_ECAP_IRO; > > > > > s->haw_bits = cpu->phys_bits; > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > return &vtd_as->as; > > > > > } > > > > > > > > > > +static bool host_has_la57(void) > > > > > +{ > > > > > + uint32_t ecx, unused; > > > > > + > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > +} > > > > > + > > > > > +static bool guest_has_la57(void) > > > > > +{ > > > > > + CPUState *cs = first_cpu; > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > + CPUX86State *env = &cpu->env; > > > > > + > > > > > + return env->features[FEAT_7_0_ECX] & 
CPUID_7_0_ECX_LA57; > > > > > +} > > > > another direct access to CPU fields, > > > > I'd suggest to set this value when iommu is created > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > that, because a 5-level capable vIOMMU can be created with properties > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > CPUs. > > I don't necessarily see why these need to be connected. > If yes pls add code to explain. Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not have LA57 feature? I do not see any direct connection when asked to enable a 5-level vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA value as an IOVA. And if guest has LA57, we should create a 5-level vIOMMU to the VM. But if the VM even does not have LA57, any specific reason we should give it a 5-level vIOMMU? 
> > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > { > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > } > > > > > } > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > + return false; > > > > > + } > > > > > + > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > Does iommu supposed to work in TCG mode? > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > for device assignment. > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > we can: > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > capability if it is TCG. > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > for the remind. :) > > > This needs a big comment with an explanation though. > And probably a TODO to make it work under TCG ... > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test though), and the condition would be sth. 
like: if ((s->aw_bits == VTD_AW_57BIT) && kvm_enabled() && !host_has_la57()) { As you can see, though I remove the check of guest_has_la57(), I still kept the check against host when KVM is enabled. I'm still ready to be convinced for any requirement why we do not need the guest check. :) > > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > > vIOMMU does have usage in TCG? > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > + "host and guest are capable of 5-level paging"); > > > > > return false; > > > > > } > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > index d084099..2b29b6f 100644 > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > @@ -114,8 +114,8 @@ > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > /* IOTLB_REG */ > > > > > @@ -212,6 +212,8 @@ > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > /* IQT_REG */ > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > #define 
VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > struct VTDIOTLBPageInvInfo { > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > index 820451c..7474c4f 100644 > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > @@ -49,6 +49,7 @@ > > > > > #define DMAR_REG_SIZE 0x230 > > > > > #define VTD_AW_39BIT 39 > > > > > #define VTD_AW_48BIT 48 > > > > > +#define VTD_AW_57BIT 57 > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > B.R. > > > Yu > > > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-18 13:45 ` Yu Zhang @ 2018-12-18 14:49 ` Michael S. Tsirkin 2018-12-19 3:40 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-18 14:49 UTC (permalink / raw) To: Yu Zhang Cc: Igor Mammedov, Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Richard Henderson On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > --- > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > --- > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > index 0e88c63..871110c 100644 > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > /* > > > > > > * Rsvd field masks for spte: > > > > > > - * Index [1] to [4] 4k pages > > > > > > - * Index [5] to [8] large pages > > > > > > + * Index [1] to [5] 4k pages > > > > > > + * Index [6] to [10] large pages > > > > > > */ > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > { > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > /* Maybe large page */ > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > } else { > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > } > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > + s->cap |= 
VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > } > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > return &vtd_as->as; > > > > > > } > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > +{ > > > > > > + uint32_t ecx, unused; > > > > > > + > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > +} > > > > > > + > > > > > > +static bool guest_has_la57(void) > > > > > > +{ > > > > > > + CPUState *cs = 
first_cpu; > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > + CPUX86State *env = &cpu->env; > > > > > > + > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > +} > > > > > another direct access to CPU fields, > > > > > I'd suggest to set this value when iommu is created > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > CPUs. > > > > I don't necessarily see why these need to be connected. > > If yes pls add code to explain. > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > value as an IOVA. Right, but then that doesn't work on all hosts either. > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > But if the VM even does not have LA57, any specific reason we should give it a 5-level > vIOMMU? So the example you give is VTD address width < CPU aw. That is known to be problematic for dpdk but not for other software, and maybe dpdk will learn how to cope. Given that such hosts exist, it might be useful to support this at least for debugging. Are there reasons to worry about VTD > CPU?
> > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > { > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > } > > > > > > } > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > + return false; > > > > > > + } > > > > > > + > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > Does iommu supposed to work in TCG mode? > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > for device assignment. > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > we can: > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > capability if it is TCG. > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > for the remind. :) > > > > > > This needs a big comment with an explanation though. > > And probably a TODO to make it work under TCG ... > > > > Thanks, Michael. 
For choice 1, I believe it should work for TCG(will need test > though), and the condition would be sth. like: > > if ((s->aw_bits == VTD_AW_57BIT) && > kvm_enabled() && > !host_has_la57()) { > > As you can see, though I remove the check of guest_has_la57(), I still kept the > check against host when KVM is enabled. I'm still ready to be convinced for any > requirement why we do not need the guest check. :) okay but then (repeating myself, sorry) pls add a comment that explains what happens if you do not add this limitation. > > > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > > > vIOMMU does have usage in TCG? > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > return false; > > > > > > } > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > index d084099..2b29b6f 100644 > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > @@ -114,8 +114,8 @@ > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > @@ -212,6 +212,8 @@ > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > 
> > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > /* IQT_REG */ > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > index 820451c..7474c4f 100644 > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > @@ -49,6 +49,7 @@ > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > #define VTD_AW_39BIT 39 > > > > > > #define VTD_AW_48BIT 48 > > > > > > +#define VTD_AW_57BIT 57 > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > Yu > > > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-18 14:49 ` Michael S. Tsirkin @ 2018-12-19 3:40 ` Yu Zhang 2018-12-19 4:35 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-19 3:40 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > --- > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > --- > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > index 0e88c63..871110c 100644 > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > /* > > > > > > > * Rsvd field masks for spte: > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > - * Index [5] to [8] large pages > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > + * Index [6] to [10] large pages > > > > > > > */ > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > { > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > /* Maybe large page */ > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > } else { > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > } > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > 
+ } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > } > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > return &vtd_as->as; > > > > > > > } > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > +{ > > > > > > > + uint32_t ecx, unused; > > > > > > > + > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > + return ecx & 
CPUID_7_0_ECX_LA57; > > > > > > > +} > > > > > > > + > > > > > > > +static bool guest_has_la57(void) > > > > > > > +{ > > > > > > > + CPUState *cs = first_cpu; > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > + > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > +} > > > > > > another direct access to CPU fields, > > > > > > I'd suggest to set this value when iommu is created > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > CPUs. > > > > > > I don't necessarily see why these need to be connected. > > > If yes pls add code to explain. > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > value as an IOVA. > > Right but then that doesn't work on all hosts either. Oh, the host already has 5-level IOMMU now. So I think DPDK in native shall work with that. > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > vIOMMU? > > So the example you give is VTD address width < CPU aw. That is known > to be problematic for dpdk but not for other software and maybe dpdk > will learns how to cope. Given such hosts exist it might be > useful to support this at least for debugging. > > Are there reasons to worry about VTD > CPU? 
Well, I am not that worried(no usage case is one concern). I am OK to drop the guest check. :) > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > { > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > + return false; > > > > > > > + } > > > > > > > + > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > Does iommu supposed to work in TCG mode? > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > for device assignment. > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > > we can: > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > capability if it is TCG. > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > for the remind. 
:) > > > > > > > > > This needs a big comment with an explanation though. > > > And probably a TODO to make it work under TCG ... > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test > > though), and the condition would be sth. like: > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > kvm_enabled() && > > !host_has_la57()) { > > > > As you can see, though I remove the check of guest_has_la57(), I still kept the > > check against host when KVM is enabled. I'm still ready to be convinced for any > > requirement why we do not need the guest check. :) > > > okay but then (repeating myself, sorry) pls add a comment that explains > what happens if you do not add this limitation. How about the comment below?

/*
 * For KVM guests, the host capability of LA57 shall be available, so
 * that iommu shadowing works for the device assignment scenario. But for
 * TCG mode, we do not need such a restriction.
 */

BTW, I just tested TCG mode; it works (with the restriction on host capability removed).
> > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > return false; > > > > > > > } > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > index d084099..2b29b6f 100644 > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > /* IQT_REG */ > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > @@ -387,6 +391,8 @@ typedef union 
VTDInvDesc VTDInvDesc; > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > index 820451c..7474c4f 100644 > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > Yu > > > > > > > > > B.R. > > Yu > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-19 3:40 ` Yu Zhang @ 2018-12-19 4:35 ` Michael S. Tsirkin 2018-12-19 5:57 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-19 4:35 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > --- > > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > --- > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > /* > > > > > > > > * Rsvd field masks for spte: > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > */ > > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > { > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > /* Maybe large page */ > > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > } else { > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > } > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > if (s->aw_bits 
== VTD_AW_48BIT) { > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > } > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > return &vtd_as->as; > > > > > > > > } > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > +{ > > > > > > > > + uint32_t 
ecx, unused; > > > > > > > > + > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > +} > > > > > > > > + > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > +{ > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > + > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > +} > > > > > > > another direct access to CPU fields, > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > CPUs. > > > > > > > > I don't necessarily see why these need to be connected. > > > > If yes pls add code to explain. > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > > value as an IOVA. > > > > Right but then that doesn't work on all hosts either. > > Oh, the host already has 5-level IOMMU now. So I think DPDK in native shall work with that. > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > vIOMMU? > > > > So the example you give is VTD address width < CPU aw. 
That is known > > to be problematic for dpdk but not for other software and maybe dpdk > > will learns how to cope. Given such hosts exist it might be > > useful to support this at least for debugging. > > > > Are there reasons to worry about VTD > CPU? > > Well, I am not that worried(no usage case is one concern). I am OK to drop the guest check. :) > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > { > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > + return false; > > > > > > > > + } > > > > > > > > + > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > for device assignment. > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). 
> > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > > > we can: > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > capability if it is TCG. > > > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > for the remind. :) > > > > > > > > > > > > This needs a big comment with an explanation though. > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test > > > though), and the condition would be sth. like: > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > kvm_enabled() && > > > !host_has_la57()) { > > > > > > As you can see, though I remove the check of guest_has_la57(), I still kept the > > > check against host when KVM is enabled. I'm still ready to be convinced for any > > > requirement why we do not need the guest check. :) > > > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > what happens if you do not add this limitation. > > How about below comments? > /* > * For KVM guests, the host capability of LA57 shall be available, So why is host CPU LA57 necessary for shadowing? Could you explain pls? > so > * that iommu shadowing works for device assignment scenario. But for > * TCG mode, we do not need such restriction. > */ > > BTW, I just tested the TCG mode, it works(with restriction of host capability removed). > > > > > > > > > > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > > > > > vIOMMU does have usage in TCG? > > > > > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. 
> > > > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > > return false; > > > > > > > > } > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > > index d084099..2b29b6f 100644 > > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > > > /* IQT_REG */ > > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > #define 
VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > index 820451c..7474c4f 100644 > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > > > B.R. > > > Yu > > > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-19 4:35 ` Michael S. Tsirkin @ 2018-12-19 5:57 ` Yu Zhang 2018-12-19 15:23 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-19 5:57 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Tue, Dec 18, 2018 at 11:35:34PM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > --- > > > > > > > > > Cc: "Michael S. 
Tsirkin" <mst@redhat.com> > > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > > --- > > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > > > /* > > > > > > > > > * Rsvd field masks for spte: > > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > > */ > > > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > > { > > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > > /* Maybe large page */ > > > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > > } else { > > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > > } > > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > 
VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > > } > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > > return &vtd_as->as; 
> > > > > > > > > } > > > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > > +{ > > > > > > > > > + uint32_t ecx, unused; > > > > > > > > > + > > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > > +{ > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > > + > > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > > +} > > > > > > > > another direct access to CPU fields, > > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > > CPUs. > > > > > > > > > > I don't necessarily see why these need to be connected. > > > > > If yes pls add code to explain. > > > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > > > value as an IOVA. > > > > > > Right but then that doesn't work on all hosts either. > > > > Oh, the host already has 5-level IOMMU now. So I think DPDK in native shall work with that. > > > > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. 
> > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > > vIOMMU? > > > > > > So the example you give is VTD address width < CPU aw. That is known > > > to be problematic for dpdk but not for other software and maybe dpdk > > > will learns how to cope. Given such hosts exist it might be > > > useful to support this at least for debugging. > > > > > > Are there reasons to worry about VTD > CPU? > > > > Well, I am not that worried(no usage case is one concern). I am OK to drop the guest check. :) > > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > { > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > > + return false; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. 
And host_has_la57() is > > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > > for device assignment. > > > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > > > > we can: > > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > > capability if it is TCG. > > > > > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > > for the remind. :) > > > > > > > > > > > > > > > This needs a big comment with an explanation though. > > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test > > > > though), and the condition would be sth. like: > > > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > > kvm_enabled() && > > > > !host_has_la57()) { > > > > > > > > As you can see, though I remove the check of guest_has_la57(), I still kept the > > > > check against host when KVM is enabled. I'm still ready to be convinced for any > > > > requirement why we do not need the guest check. :) > > > > > > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > > what happens if you do not add this limitation. > > > > How about below comments? > > /* > > * For KVM guests, the host capability of LA57 shall be available, > > So why is host CPU LA57 necessary for shadowing? Could you explain pls? Oh, let me try to explain the background here. :) Currently, the vIOMMU in QEMU does not have logic to check against the hardware IOMMU capability. E.g. when we create an IOMMU with 48-bit DMA address width, QEMU does not check if any physical IOMMU has such support. And the shadow IOMMU logic will have problems if the host IOMMU only supports 39-bit IOVA. 

And we will have the same problem when it comes to 57-bit IOVA. My previous discussion with Peter Xu reached an agreement that for now, we just use the host CPU capability as a reference when trying to create a 5-level vIOMMU, because 57-bit IOMMU hardware will not come until the ICX platform (which includes LA57). And the final correct solution should be to enumerate the capabilities of the hardware IOMMUs used by the assigned device, and reject the configuration if any mismatch is found. Maybe I should add a TODO in the above comments, giving the background explanation. > > > so > > * that iommu shadowing works for device assignment scenario. But for > > * TCG mode, we do not need such restriction. > > */ > > > > BTW, I just tested the TCG mode, it works(with restriction of host capability removed). > > > > > > > > > > > > > > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > > > > > > vIOMMU does have usage in TCG? > > > > > > > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. 
> > > > > > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > > > return false; > > > > > > > > > } > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > > > index d084099..2b29b6f 100644 > > > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > > > > > /* IQT_REG */ > > > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > > > 
(0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > > index 820451c..7474c4f 100644 > > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > > Yu > > > > > > > > > > > > > > > B.R. > > > > Yu > > > > > > > B.R. > > Yu B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-19 5:57 ` Yu Zhang @ 2018-12-19 15:23 ` Michael S. Tsirkin 2018-12-20 5:49 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-19 15:23 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Wed, Dec 19, 2018 at 01:57:43PM +0800, Yu Zhang wrote: > On Tue, Dec 18, 2018 at 11:35:34PM -0500, Michael S. Tsirkin wrote: > > On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". 
> > > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > > --- > > > > > > > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > > > --- > > > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > * Rsvd field masks for spte: > > > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > > > */ > > > > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > > > { > > > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > > > /* Maybe large page */ > > > > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > 
> > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > > > } else { > > > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > > > } > > > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > > > } > > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > + 
vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > > > return &vtd_as->as; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > > > +{ > > > > > > > > > > + uint32_t ecx, unused; > > > > > > > > > > + > > > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > > > +{ > > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > > > + > > > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > > > +} > > > > > > > > > another direct access to CPU fields, > > > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > > > CPUs. > > > > > > > > > > > > I don't necessarily see why these need to be connected. > > > > > > If yes pls add code to explain. 
> > > > > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even if it does not > > > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > > > vIOMMU at first, but I was told (and checked) that DPDK in the VM may choose a VA > > > > > value as an IOVA. > > > > Right but then that doesn't work on all hosts either. > > > Oh, the host already has 5-level IOMMU now. So I think DPDK running natively shall work with that. > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > > > vIOMMU? > > > > So the example you give is VTD address width < CPU aw. That is known > > > > to be problematic for dpdk but not for other software and maybe dpdk > > > > will learn how to cope. Given such hosts exist it might be > > > > useful to support this at least for debugging. > > > > Are there reasons to worry about VTD > CPU? > > > Well, I am not that worried (no usage case is one concern). I am OK to drop the guest check.
:) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > { > > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > } > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > > > + return false; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > > > for device assignment. > > > > > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. 
So maybe > > > > > > > > we can: > > > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > > > capability if it is TCG. > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > > > for the reminder. :) > > > > > > > > > > This needs a big comment with an explanation though. > > > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG (will need testing > > > > > though), and the condition would be something like: > > > > > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > > > kvm_enabled() && > > > > > !host_has_la57()) { > > > > > > > > > > As you can see, though I removed the check of guest_has_la57(), I still kept the > > > > > check against the host when KVM is enabled. I'm still ready to be convinced by any > > > > > argument for why we do not need the guest check. :) > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > > > what happens if you do not add this limitation. > > > How about the comments below? > > > /* > > > * For KVM guests, the host capability of LA57 shall be available, > > So why is host CPU LA57 necessary for shadowing? Could you explain pls? > > Oh, let me try to explain the background here. :) > > Currently, vIOMMU in qemu does not have logic to check against the hardware > IOMMU capability. E.g. when we create an IOMMU with 48 bit DMA address width, > qemu does not check if any physical IOMMU has such support. And the shadow > IOMMU logic will have a problem if the host IOMMU only supports 39 bit IOVA. And > we will have the same problem when it comes to 57 bit IOVA.
> > My previous discussion with Peter Xu reached an agreement that for now, we > > just use the host cpu capability as a reference when trying to create a 5-level > > vIOMMU, because 57 bit IOMMU hardware will not come until the ICX platform (which > > includes LA57). > > > > And the final correct solution should be to enumerate the capabilities of > > hardware IOMMUs used by the assigned device, and reject if any mismatch is > > found. Right. And it's a hack because: 1. CPU AW doesn't always match VTD AW; 2. the limitation only applies to hardware devices, software ones are fine. So we need a patch for the host sysfs to expose the actual IOMMU AW to userspace. QEMU could then look at the actual hardware features. I'd like to see the actual patch doing that, even if we add a hack based on CPU AW for existing systems. But how is it working for TCG? It would seem that VFIO with TCG would be just as broken as with KVM... > Maybe I should add a TODO in the above comments to give the background explanation. > > > > > so > > > * that iommu shadowing works for the device assignment scenario. But for > > > * TCG mode, we do not need such a restriction. > > > */ > > > > > > BTW, I just tested the TCG mode, it works (with the restriction of host capability removed). > > > > > > > > > > > > > > > > > > > 2> Or, we can choose to keep it as it is, and add the check when 5-level paging > > > > > > > > vIOMMU does have usage in TCG? > > > > > > > > > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU.
> > > > > > > > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > > > > return false; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > > > > index d084099..2b29b6f 100644 > > > > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > > > > > > > /* IQT_REG */ > > > > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > 
> > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > > > index 820451c..7474c4f 100644 > > > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > > > Yu > > > > > > > > > > > > > > > > > > B.R. > > > > > Yu > > > > > > > > > > B.R. > > > Yu > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-19 15:23 ` Michael S. Tsirkin @ 2018-12-20 5:49 ` Yu Zhang 2018-12-20 18:28 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-20 5:49 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Wed, Dec 19, 2018 at 10:23:44AM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 19, 2018 at 01:57:43PM +0800, Yu Zhang wrote: > > On Tue, Dec 18, 2018 at 11:35:34PM -0500, Michael S. Tsirkin wrote: > > > On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > > > > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > > > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > > > > width. When creating a VM with 5-level paging feature, one can choose to > > > > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". 
> > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > > > --- > > > > > > > > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > > > > --- > > > > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > > * Rsvd field masks for spte: > > > > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > > > > */ > > > > > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > > > > { > > > > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > > > > /* Maybe large page */ > > > > > > > > > > 
> - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > > > > } else { > > > > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > > > > } > > > > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > > > > } > > > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + 
vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > > > > return &vtd_as->as; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > > > > +{ > > > > > > > > > > > + uint32_t ecx, unused; > > > > > > > > > > > + > > > > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > > > > +} > > > > > > > > > > > + > > > > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > > > > +{ > > > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > > > > + > > > > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > > > > +} > > > > > > > > > > another direct access to CPU fields, > > > > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > > > > CPUs. 
> > > > > > > > > > > > > > > I don't necessarily see why these need to be connected. > > > > > > > If yes pls add code to explain. > > > > > > > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even if it does not > > > > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > > > > vIOMMU at first, but I was told (and checked) that DPDK in the VM may choose a VA > > > > > > value as an IOVA. > > > > > Right but then that doesn't work on all hosts either. > > > > Oh, the host already has 5-level IOMMU now. So I think DPDK running natively shall work with that. > > > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > > > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > > > > vIOMMU? > > > > > So the example you give is VTD address width < CPU aw. That is known > > > > > to be problematic for dpdk but not for other software and maybe dpdk > > > > > will learn how to cope. Given such hosts exist it might be > > > > > useful to support this at least for debugging. > > > > > Are there reasons to worry about VTD > CPU? > > > > Well, I am not that worried (no usage case is one concern). I am OK to drop the guest check.
:) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > { > > > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > } > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > > > > + return false; > > > > > > > > > > > + } > > > > > > > > > > > + > > > > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > > > > for device assignment. > > > > > > > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. 
So maybe > > > > > > > > > we can: > > > > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > > > > capability if it is TCG. > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > > > > for the reminder. :) > > > > > > > > > > > > This needs a big comment with an explanation though. > > > > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG (will need testing > > > > > > though), and the condition would be something like: > > > > > > > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > kvm_enabled() && > > > > > > !host_has_la57()) { > > > > > > > > > > > > As you can see, though I removed the check of guest_has_la57(), I still kept the > > > > > > check against the host when KVM is enabled. I'm still ready to be convinced by any > > > > > > argument for why we do not need the guest check. :) > > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > > > > what happens if you do not add this limitation. > > > > How about the comments below? > > > > /* > > > > * For KVM guests, the host capability of LA57 shall be available, > > > So why is host CPU LA57 necessary for shadowing? Could you explain pls? > > Oh, let me try to explain the background here. :) > > Currently, vIOMMU in qemu does not have logic to check against the hardware > > IOMMU capability. E.g. when we create an IOMMU with 48 bit DMA address width, > > qemu does not check if any physical IOMMU has such support. And the shadow > > IOMMU logic will have a problem if the host IOMMU only supports 39 bit IOVA. And > > we will have the same problem when it comes to 57 bit IOVA.
> > > > My previous discussion with Peter Xu reached an agreement that for now, we > > just use the host cpu capability as a reference when trying to create a 5-level > > vIOMMU, because 57 bit IOMMU hardware will not come until the ICX platform (which > > includes LA57). > > > > And the final correct solution should be to enumerate the capabilities of > > hardware IOMMUs used by the assigned device, and reject if any mismatch is > > found. > > Right. And it's a hack because > 1. CPU AW doesn't always match VTD AW > 2. The limitation only applies to hardware devices, software ones are fine > So we need a patch for the host sysfs to expose the actual IOMMU AW to userspace. > QEMU could then look at the actual hardware features. > I'd like to see the actual patch doing that, even if we > add a hack based on CPU AW for existing systems. > Sure, I plan to do so. And I am wondering if this is a must for the current patchset to be accepted? I mean, after all, we already have the same problem on existing platforms. :) > > But how is it working for TCG? It would seem that > VFIO with TCG would be just as broken as with KVM... Sorry, may I ask why TCG shall be broken? I had thought TCG does not need IOMMU shadowing... > > > Maybe I should add a TODO in the above comments to give the background explanation. > > > > > > > > so > > > > * that iommu shadowing works for the device assignment scenario. But for > > > > * TCG mode, we do not need such a restriction. > > > > */ > > > > > > > > BTW, I just tested the TCG mode, it works (with the restriction of host capability removed). > > > > > > > > > > > > > > > > > > > > > > > 2> Or, we can choose to keep it as it is, and add the check when 5-level paging > > > > > > > > > vIOMMU does have usage in TCG? > > > > > > > > > > > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > > > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU.
> > > > > > > > > > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > > > > > return false; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > > > > > index d084099..2b29b6f 100644 > > > > > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > > > > > > > > > /* IQT_REG */ > > > > > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > 
> > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > > > > index 820451c..7474c4f 100644 > > > > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > > > > Yu > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > > > > B.R. > > > > Yu > > > > B.R. > > Yu B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-20 5:49 ` Yu Zhang @ 2018-12-20 18:28 ` Michael S. Tsirkin 2018-12-21 16:19 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-20 18:28 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Thu, Dec 20, 2018 at 01:49:21PM +0800, Yu Zhang wrote: > On Wed, Dec 19, 2018 at 10:23:44AM -0500, Michael S. Tsirkin wrote: > > On Wed, Dec 19, 2018 at 01:57:43PM +0800, Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 11:35:34PM -0500, Michael S. Tsirkin wrote: > > > > On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > > > > > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > > > > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > > > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > > > > > width. 
When creating a VM with 5-level paging feature, one can choose to > > > > > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > > > > --- > > > > > > > > > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > > > > > --- > > > > > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > > > * Rsvd field masks for spte: > > > > > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > > > > > */ > > > > > > > > > > > > -static uint64_t vtd_paging_entry_rsvd_field[9]; > > > > > > > > > > > > +static uint64_t 
vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > > > > > { > > > > > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > > > > > /* Maybe large page */ > > > > > > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > > > > > } else { > > > > > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > > > > > } > > > > > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > > > > > } > > > > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[5] = 
VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > > > > > return &vtd_as->as; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > > > > > +{ > > > > > > > > > > > > + uint32_t ecx, unused; > > > > > > > > > > > > + > > > > > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > > > > > +} > > > > > > > > > > > > + > > > > > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > > > > > +{ > > > > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > > > > > + > > > > > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > > > > > +} > > > > > > > > > > > another direct access to CPU fields, > > > > > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > > > > > i.e. add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? 
I think we do not need > > > > > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > > > > > CPUs. > > > > > > > > > > > > > > > > I don't necessarily see why these need to be connected. > > > > > > > > If yes pls add code to explain. > > > > > > > > > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > > > > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > > > > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > > > > > > value as an IOVA. > > > > > > > > > > > > Right but then that doesn't work on all hosts either. > > > > > > > > > > Oh, the host already has 5-level IOMMU now. So I think DPDK in native shall work with that. > > > > > > > > > > > > > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > > > > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > > > > > vIOMMU? > > > > > > > > > > > > So the example you give is VTD address width < CPU aw. That is known > > > > > > to be problematic for dpdk but not for other software and maybe dpdk > > > > > > will learns how to cope. Given such hosts exist it might be > > > > > > useful to support this at least for debugging. > > > > > > > > > > > > Are there reasons to worry about VTD > CPU? > > > > > > > > > > Well, I am not that worried(no usage case is one concern). I am OK to drop the guest check. 
:) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > > { > > > > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > > } > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > > > > > + return false; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > > > > > for device assignment. > > > > > > > > > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). 
> > > > > > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > > > > > > > we can: > > > > > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > > > > > capability if it is TCG. > > > > > > > > > > > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > > > > > for the remind. :) > > > > > > > > > > > > > > > > > > > > > > > > This needs a big comment with an explanation though. > > > > > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > > > > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test > > > > > > > though), and the condition would be sth. like: > > > > > > > > > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > kvm_enabled() && > > > > > > > !host_has_la57()) { > > > > > > > > > > > > > > As you can see, though I remove the check of guest_has_la57(), I still kept the > > > > > > > check against host when KVM is enabled. I'm still ready to be convinced for any > > > > > > > requirement why we do not need the guest check. :) > > > > > > > > > > > > > > > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > > > > > what happens if you do not add this limitation. > > > > > > > > > > How about below comments? > > > > > /* > > > > > * For KVM guests, the host capability of LA57 shall be available, > > > > > > > > So why is host CPU LA57 necessary for shadowing? Could you explain pls? > > > > > > Oh, let me try to explain the background here. :) > > > > > > Currently, vIOMMU in qemu does not have logic to check against the hardware > > > IOMMU capability. E.g. when we create an IOMMU with 48 bit DMA address width, > > > qemu does not check if any physical IOMMU has such support. And the shadow > > > IOMMU logic will have problem if host IOMMU only supports 39 bit IOVA. 
And > > > we will have the same problem when it comes to 57 bit IOVA. > > > > > > My previous discussion with Peter Xu reached an agreement that for now, we > > > just use the host cpu capability as a reference when trying to create a 5-level > > > vIOMMU, because 57 bit IOMMU hardware will not come until ICX platform(which > > > includes LA57). > > > > > > And the final correct solution should be to enumerate the capabilities of > > > hardware IOMMUs used by the assigned device, and reject if any mismatch is > > > found. > > > > Right. And it's a hack because > > 1. CPU AW doesn't always match VTD AW > > 2. The limitation only applies to hardware devices, software ones are fine > > So we need a patch for the host sysfs to expose the actual IOMMU AW to userspace. > > QEMU could then look at the actual hardware features. > > I'd like to see the actual patch doing that, even if we > > add a hack based on CPU AW for existing systems. > > > > Sure, I have plan to do so. And I am wondering, if this is a must for current > patchset to be accepted? I mean, after all, we already have the same problem > on existing platform. :) I'd like to avoid poking at the CPU from VTD code. That's all. > > > > But how is it working for TCG? It would seem that > > VFIO with TCG would be just as broken as with KVM... > > Sorry, may I ask why TCG shall be broken? I had thought TCG does not need IOMMU > shadowing... IOMMU shadowing is used for vfio. I do not think it matters whether it's KVM or TCG. > > > > > Maybe I should add a TODO in above comments, give the background explanation. > > > > > > > > > > > > so > > > > > * that iommu shadowing works for device assignment scenario. But for > > > > > * TCG mode, we do not need such restriction. > > > > > */ > > > > > > > > > > BTW, I just tested the TCG mode, it works(with restriction of host capability removed).
> > > > > > > > > > > > > > > > > > > > > > > > > > > 2> Or, we can choose to keep as it is, and add the check when 5-level paging > > > > > > > > > > vIOMMU does have usage in TCG? > > > > > > > > > > > > > > > > > > > > But as to the check of guest capability, I still believe it is necessary. As > > > > > > > > > > said, a VM without LA57 feature shall not see a VT-d with 5-level IOMMU. > > > > > > > > > > > > > > > > > > > > > > + error_setg(errp, "Do not support 57-bit DMA address, unless both " > > > > > > > > > > > > + "host and guest are capable of 5-level paging"); > > > > > > > > > > > > return false; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h > > > > > > > > > > > > index d084099..2b29b6f 100644 > > > > > > > > > > > > --- a/hw/i386/intel_iommu_internal.h > > > > > > > > > > > > +++ b/hw/i386/intel_iommu_internal.h > > > > > > > > > > > > @@ -114,8 +114,8 @@ > > > > > > > > > > > > VTD_INTERRUPT_ADDR_FIRST + 1) > > > > > > > > > > > > > > > > > > > > > > > > /* The shift of source_id in the key of IOTLB hash table */ > > > > > > > > > > > > -#define VTD_IOTLB_SID_SHIFT 36 > > > > > > > > > > > > -#define VTD_IOTLB_LVL_SHIFT 52 > > > > > > > > > > > > +#define VTD_IOTLB_SID_SHIFT 45 > > > > > > > > > > > > +#define VTD_IOTLB_LVL_SHIFT 61 > > > > > > > > > > > > #define VTD_IOTLB_MAX_SIZE 1024 /* Max size of the hash table */ > > > > > > > > > > > > > > > > > > > > > > > > /* IOTLB_REG */ > > > > > > > > > > > > @@ -212,6 +212,8 @@ > > > > > > > > > > > > #define VTD_CAP_SAGAW_39bit (0x2ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > /* 48-bit AGAW, 4-level page-table */ > > > > > > > > > > > > #define VTD_CAP_SAGAW_48bit (0x4ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > + /* 57-bit AGAW, 5-level page-table */ > > > > > > > > > > > > +#define VTD_CAP_SAGAW_57bit (0x8ULL << VTD_CAP_SAGAW_SHIFT) > > > > > > > > > > > > > > > > > > > > > > > > 
/* IQT_REG */ > > > > > > > > > > > > #define VTD_IQT_QT(val) (((val) >> 4) & 0x7fffULL) > > > > > > > > > > > > @@ -379,6 +381,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > #define VTD_SPTE_PAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > +#define VTD_SPTE_PAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > #define VTD_SPTE_LPAGE_L1_RSVD_MASK(aw) \ > > > > > > > > > > > > (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > #define VTD_SPTE_LPAGE_L2_RSVD_MASK(aw) \ > > > > > > > > > > > > @@ -387,6 +391,8 @@ typedef union VTDInvDesc VTDInvDesc; > > > > > > > > > > > > (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > #define VTD_SPTE_LPAGE_L4_RSVD_MASK(aw) \ > > > > > > > > > > > > (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > +#define VTD_SPTE_LPAGE_L5_RSVD_MASK(aw) \ > > > > > > > > > > > > + (0x880ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) > > > > > > > > > > > > > > > > > > > > > > > > /* Information about page-selective IOTLB invalidate */ > > > > > > > > > > > > struct VTDIOTLBPageInvInfo { > > > > > > > > > > > > diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h > > > > > > > > > > > > index 820451c..7474c4f 100644 > > > > > > > > > > > > --- a/include/hw/i386/intel_iommu.h > > > > > > > > > > > > +++ b/include/hw/i386/intel_iommu.h > > > > > > > > > > > > @@ -49,6 +49,7 @@ > > > > > > > > > > > > #define DMAR_REG_SIZE 0x230 > > > > > > > > > > > > #define VTD_AW_39BIT 39 > > > > > > > > > > > > #define VTD_AW_48BIT 48 > > > > > > > > > > > > +#define VTD_AW_57BIT 57 > > > > > > > > > > > > #define VTD_ADDRESS_WIDTH VTD_AW_39BIT > > > > > > > > > > > > #define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1) > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > > > > > Yu > > > > > > > > > > > > > > > > > > > > > > > > B.R. > > > > > > > Yu > > > > > > > > > > > > > > > > B.R. > > > > > Yu > > > > > > B.R. > > > Yu > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-20 18:28 ` Michael S. Tsirkin @ 2018-12-21 16:19 ` Yu Zhang 2018-12-21 17:15 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-21 16:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Thu, Dec 20, 2018 at 01:28:21PM -0500, Michael S. Tsirkin wrote: > On Thu, Dec 20, 2018 at 01:49:21PM +0800, Yu Zhang wrote: > > On Wed, Dec 19, 2018 at 10:23:44AM -0500, Michael S. Tsirkin wrote: > > > On Wed, Dec 19, 2018 at 01:57:43PM +0800, Yu Zhang wrote: > > > > On Tue, Dec 18, 2018 at 11:35:34PM -0500, Michael S. Tsirkin wrote: > > > > > On Wed, Dec 19, 2018 at 11:40:06AM +0800, Yu Zhang wrote: > > > > > > On Tue, Dec 18, 2018 at 09:49:02AM -0500, Michael S. Tsirkin wrote: > > > > > > > On Tue, Dec 18, 2018 at 09:45:41PM +0800, Yu Zhang wrote: > > > > > > > > On Tue, Dec 18, 2018 at 07:43:28AM -0500, Michael S. Tsirkin wrote: > > > > > > > > > On Tue, Dec 18, 2018 at 06:01:16PM +0800, Yu Zhang wrote: > > > > > > > > > > On Tue, Dec 18, 2018 at 05:47:14PM +0800, Yu Zhang wrote: > > > > > > > > > > > On Mon, Dec 17, 2018 at 02:29:02PM +0100, Igor Mammedov wrote: > > > > > > > > > > > > On Wed, 12 Dec 2018 21:05:39 +0800 > > > > > > > > > > > > Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > A 5-level paging capable VM may choose to use 57-bit IOVA address width. > > > > > > > > > > > > > E.g. guest applications may prefer to use its VA as IOVA when performing > > > > > > > > > > > > > VFIO map/unmap operations, to avoid the burden of managing the IOVA space. > > > > > > > > > > > > > > > > > > > > > > > > > > This patch extends the current vIOMMU logic to cover the extended address > > > > > > > > > > > > > width. 
When creating a VM with 5-level paging feature, one can choose to > > > > > > > > > > > > > create a virtual VTD with 5-level paging capability, with configurations > > > > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > > > > > > > > > > > > > Reviewed-by: Peter Xu <peterx@redhat.com> > > > > > > > > > > > > > --- > > > > > > > > > > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > > > > > > > > > > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > > > > > > > > > > > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > > > > > > > > > > > > Cc: Richard Henderson <rth@twiddle.net> > > > > > > > > > > > > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > > > > > > > > > > > > Cc: Peter Xu <peterx@redhat.com> > > > > > > > > > > > > > --- > > > > > > > > > > > > > hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++---------- > > > > > > > > > > > > > hw/i386/intel_iommu_internal.h | 10 ++++++-- > > > > > > > > > > > > > include/hw/i386/intel_iommu.h | 1 + > > > > > > > > > > > > > 3 files changed, 50 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > > > > > > > > > > > index 0e88c63..871110c 100644 > > > > > > > > > > > > > --- a/hw/i386/intel_iommu.c > > > > > > > > > > > > > +++ b/hw/i386/intel_iommu.c > > > > > > > > > > > > > @@ -664,16 +664,16 @@ static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce, > > > > > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > > > > * Rsvd field masks for spte: > > > > > > > > > > > > > - * Index [1] to [4] 4k pages > > > > > > > > > > > > > - * Index [5] to [8] large pages > > > > > > > > > > > > > + * Index [1] to [5] 4k pages > > > > > > > > > > > > > + * Index [6] to [10] large pages > > > > > > > > > > > > > */ > > > > > > > > > > > > > -static uint64_t 
vtd_paging_entry_rsvd_field[9]; > > > > > > > > > > > > > +static uint64_t vtd_paging_entry_rsvd_field[11]; > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) > > > > > > > > > > > > > { > > > > > > > > > > > > > if (slpte & VTD_SL_PT_PAGE_SIZE_MASK) { > > > > > > > > > > > > > /* Maybe large page */ > > > > > > > > > > > > > - return slpte & vtd_paging_entry_rsvd_field[level + 4]; > > > > > > > > > > > > > + return slpte & vtd_paging_entry_rsvd_field[level + 5]; > > > > > > > > > > > > > } else { > > > > > > > > > > > > > return slpte & vtd_paging_entry_rsvd_field[level]; > > > > > > > > > > > > > } > > > > > > > > > > > > > @@ -3127,6 +3127,8 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > > > > > > > > if (s->aw_bits == VTD_AW_48BIT) { > > > > > > > > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > > > > > > > > + } else if (s->aw_bits == VTD_AW_57BIT) { > > > > > > > > > > > > > + s->cap |= VTD_CAP_SAGAW_57bit | VTD_CAP_SAGAW_48bit; > > > > > > > > > > > > > } > > > > > > > > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > > > > > > > > s->haw_bits = cpu->phys_bits; > > > > > > > > > > > > > @@ -3139,10 +3141,12 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > > > > > > > > vtd_paging_entry_rsvd_field[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > vtd_paging_entry_rsvd_field[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > vtd_paging_entry_rsvd_field[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[5] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > - vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > - 
vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[5] = VTD_SPTE_PAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[6] = VTD_SPTE_LPAGE_L1_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[7] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[8] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[9] = VTD_SPTE_LPAGE_L4_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > + vtd_paging_entry_rsvd_field[10] = VTD_SPTE_LPAGE_L5_RSVD_MASK(s->haw_bits); > > > > > > > > > > > > > > > > > > > > > > > > > > if (x86_iommu->intr_supported) { > > > > > > > > > > > > > s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV; > > > > > > > > > > > > > @@ -3241,6 +3245,23 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) > > > > > > > > > > > > > return &vtd_as->as; > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > +static bool host_has_la57(void) > > > > > > > > > > > > > +{ > > > > > > > > > > > > > + uint32_t ecx, unused; > > > > > > > > > > > > > + > > > > > > > > > > > > > + host_cpuid(7, 0, &unused, &unused, &ecx, &unused); > > > > > > > > > > > > > + return ecx & CPUID_7_0_ECX_LA57; > > > > > > > > > > > > > +} > > > > > > > > > > > > > + > > > > > > > > > > > > > +static bool guest_has_la57(void) > > > > > > > > > > > > > +{ > > > > > > > > > > > > > + CPUState *cs = first_cpu; > > > > > > > > > > > > > + X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > > + CPUX86State *env = &cpu->env; > > > > > > > > > > > > > + > > > > > > > > > > > > > + return env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_LA57; > > > > > > > > > > > > > +} > > > > > > > > > > > > another direct access to CPU fields, > > > > > > > > > > > > I'd suggest to set this value when iommu is created > > > > > > > > > > > > i.e. 
add 'la57' property and set from iommu owner. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, do you mean "-device intel-iommu,la57"? I think we do not need > > > > > > > > > > > that, because a 5-level capable vIOMMU can be created with properties > > > > > > > > > > > like "-device intel-iommu,x-aw-bits=57". > > > > > > > > > > > > > > > > > > > > > > The guest CPU fields are checked to make sure the VM has LA57 CPU feature, > > > > > > > > > > > because I believe there shall be no 5-level IOMMU on platforms without LA57 > > > > > > > > > > > CPUs. > > > > > > > > > > > > > > > > > > I don't necessarily see why these need to be connected. > > > > > > > > > If yes pls add code to explain. > > > > > > > > > > > > > > > > Sorry, do you mean the VM shall be able to see a 5-level IOMMU even it does not > > > > > > > > have LA57 feature? I do not see any direct connection when asked to enable a 5-level > > > > > > > > vIOMMU at first, but I was told(and checked) that DPDK in the VM may choose a VA > > > > > > > > value as an IOVA. > > > > > > > > > > > > > > Right but then that doesn't work on all hosts either. > > > > > > > > > > > > Oh, the host already has 5-level IOMMU now. So I think DPDK in native shall work with that. > > > > > > > > > > > > > > > > > > > > > And if guest has LA57, we should create a 5-level vIOMMU to the VM. > > > > > > > > But if the VM even does not have LA57, any specific reason we should give it a 5-level > > > > > > > > vIOMMU? > > > > > > > > > > > > > > So the example you give is VTD address width < CPU aw. That is known > > > > > > > to be problematic for dpdk but not for other software and maybe dpdk > > > > > > > will learns how to cope. Given such hosts exist it might be > > > > > > > useful to support this at least for debugging. > > > > > > > > > > > > > > Are there reasons to worry about VTD > CPU? > > > > > > > > > > > > Well, I am not that worried(no usage case is one concern). 
I am OK to drop the guest check. :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > > > { > > > > > > > > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > > > > > > > > @@ -3267,11 +3288,19 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) > > > > > > > > > > > > > } > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > - /* Currently only address widths supported are 39 and 48 bits */ > > > > > > > > > > > > > + /* Currently address widths supported are 39, 48, and 57 bits */ > > > > > > > > > > > > > if ((s->aw_bits != VTD_AW_39BIT) && > > > > > > > > > > > > > - (s->aw_bits != VTD_AW_48BIT)) { > > > > > > > > > > > > > - error_setg(errp, "Supported values for x-aw-bits are: %d, %d", > > > > > > > > > > > > > - VTD_AW_39BIT, VTD_AW_48BIT); > > > > > > > > > > > > > + (s->aw_bits != VTD_AW_48BIT) && > > > > > > > > > > > > > + (s->aw_bits != VTD_AW_57BIT)) { > > > > > > > > > > > > > + error_setg(errp, "Supported values for x-aw-bits are: %d, %d, %d", > > > > > > > > > > > > > + VTD_AW_39BIT, VTD_AW_48BIT, VTD_AW_57BIT); > > > > > > > > > > > > > + return false; > > > > > > > > > > > > > + } > > > > > > > > > > > > > + > > > > > > > > > > > > > + if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > > > > > > + !(host_has_la57() && guest_has_la57())) { > > > > > > > > > > > > Does iommu supposed to work in TCG mode? > > > > > > > > > > > > If yes then why it should care about host_has_la57()? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hmm... I did not take TCG mode into consideration. And host_has_la57() is > > > > > > > > > > > used to guarantee the host have la57 feature so that iommu shadowing works > > > > > > > > > > > for device assignment. 
> > > > > > > > > > > > > > > > > > > > > > I guess iommu shall work in TCG mode(though I am not quite sure about this). > > > > > > > > > > > But I do not have any usage case of a 5-level vIOMMU in TCG in mind. So maybe > > > > > > > > > > > we can: > > > > > > > > > > > 1> check the 'ms->accel' in vtd_decide_config() and do not care about host > > > > > > > > > > > capability if it is TCG. > > > > > > > > > > > > > > > > > > > > For choice 1, kvm_enabled() might be used instead of ms->accel. Thanks Peter > > > > > > > > > > for the remind. :) > > > > > > > > > > > > > > > > > > > > > > > > > > > This needs a big comment with an explanation though. > > > > > > > > > And probably a TODO to make it work under TCG ... > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, Michael. For choice 1, I believe it should work for TCG(will need test > > > > > > > > though), and the condition would be sth. like: > > > > > > > > > > > > > > > > if ((s->aw_bits == VTD_AW_57BIT) && > > > > > > > > kvm_enabled() && > > > > > > > > !host_has_la57()) { > > > > > > > > > > > > > > > > As you can see, though I remove the check of guest_has_la57(), I still kept the > > > > > > > > check against host when KVM is enabled. I'm still ready to be convinced for any > > > > > > > > requirement why we do not need the guest check. :) > > > > > > > > > > > > > > > > > > > > > okay but then (repeating myself, sorry) pls add a comment that explains > > > > > > > what happens if you do not add this limitation. > > > > > > > > > > > > How about below comments? > > > > > > /* > > > > > > * For KVM guests, the host capability of LA57 shall be available, > > > > > > > > > > So why is host CPU LA57 necessary for shadowing? Could you explain pls? > > > > > > > > Oh, let me try to explain the background here. :) > > > > > > > > Currently, vIOMMU in qemu does not have logic to check against the hardware > > > > IOMMU capability. E.g. 
when we create an IOMMU with a 48-bit DMA address width, > > > > qemu does not check if any physical IOMMU has such support. And the shadow > > > > IOMMU logic will have problems if the host IOMMU only supports 39-bit IOVA. And > > > > we will have the same problem when it comes to 57-bit IOVA. > > > > > > > > My previous discussion with Peter Xu reached an agreement that for now, we > > > > just use the host cpu capability as a reference when trying to create a 5-level > > > > vIOMMU, because 57-bit IOMMU hardware will not come until the ICX platform (which > > > > includes LA57). > > > > > > > > And the final correct solution should be to enumerate the capabilities of > > > > hardware IOMMUs used by the assigned device, and reject if any mismatch is > > > > found. > > > Right. And it's a hack because > > > 1. CPU AW doesn't always match VTD AW > > > 2. The limitation only applies to hardware devices, software ones are fine > > > So we need a patch for the host sysfs to expose the actual IOMMU AW to userspace. > > > QEMU could then look at the actual hardware features. > > > I'd like to see the actual patch doing that, even if we > > > add a hack based on CPU AW for existing systems. > > > > Sure, I plan to do so. And I am wondering whether this is a must for the current > > patchset to be accepted? I mean, after all, we already have the same problem > > on existing platforms. :) > I'd like to avoid poking at the CPU from VTD code. That's all. OK. So for the short term, how about I remove the check of the host CPU, and add a TODO in the comments in vtd_decide_config()? As to the check against hardware IOMMU, Peter once had a proposal in http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html Do you have any comment or suggestion on Peter's proposal? I still do not quite know how to do it for now... [...] B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
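The configuration check discussed above can be condensed into a standalone sketch. The helper names and boolean-return shape here are invented for illustration (the real QEMU code is vtd_decide_config(), which reports failures via error_setg()); the sketch models the proposal in the thread: accept only 39/48/57 for x-aw-bits, and gate the 57-bit case on host LA57 only when KVM is enabled, leaving TCG unconstrained.

```c
#include <stdbool.h>

/* Address widths from the VT-d spec: 3-, 4- and 5-level page tables. */
enum { VTD_AW_39BIT = 39, VTD_AW_48BIT = 48, VTD_AW_57BIT = 57 };

/*
 * Hypothetical stand-in for the vtd_decide_config() checks.
 * 57-bit IOVA is rejected when KVM is in use but the host CPU lacks
 * LA57, mirroring the "kvm_enabled() && !host_has_la57()" condition
 * proposed above; under TCG no host check is applied.
 */
static bool vtd_aw_bits_valid(int aw_bits, bool kvm_enabled, bool host_la57)
{
    if (aw_bits != VTD_AW_39BIT &&
        aw_bits != VTD_AW_48BIT &&
        aw_bits != VTD_AW_57BIT) {
        return false;            /* unsupported x-aw-bits value */
    }
    if (aw_bits == VTD_AW_57BIT && kvm_enabled && !host_la57) {
        return false;            /* shadowing would need host LA57 */
    }
    return true;                 /* TCG, or host capable enough */
}
```

The TCG case (kvm_enabled false) passing unconditionally is exactly the behavior Igor's question pushes toward; whether that is desirable is what the thread is debating.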
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-21 16:19 ` Yu Zhang @ 2018-12-21 17:15 ` Michael S. Tsirkin 2018-12-21 17:34 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-21 17:15 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > I'd like to avoid poking at the CPU from VTD code. That's all. > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > in the comments in vtd_decide_config()? My question would be what happens on an incorrect use? And how does user figure out which values to set? > As to the check against hardware IOMMU, Peter once had a proposal in > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > Do you have any comment or suggestion on Peter's proposal? Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > I still do not quite know > how to do it for now... > > [...] > > > B.R. > Yu -- MST ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-21 17:15 ` Michael S. Tsirkin @ 2018-12-21 17:34 ` Yu Zhang 2018-12-21 18:10 ` Michael S. Tsirkin 2018-12-25 1:59 ` Tian, Kevin 0 siblings, 2 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-21 17:34 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 12:15:26PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > > I'd like to avoid poking at the CPU from VTD code. That's all. > > > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > > in the comments in vtd_decide_config()? > > My question would be what happens on an incorrect use? I believe the vfio_dma_map will return failure for an incorrect use. > And how does user figure out which values to set? Well, for now I don't think user can figure out. E.g. if we expose a vIOMMU with 48-bit IOVA capability, yet host only supports 39-bit IOVA, vfio shall return failure, but the user does not know whose fault it is. > > > As to the check against hardware IOMMU, Peter once had a proposal in > > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > > > Do you have any comment or suggestion on Peter's proposal? > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > I guess on vfio attach? Will need more thinking in it. > > > I still do not quite know > > how to do it for now... > > > > [...] > > > > > > B.R. > > Yu > > > > -- > MST B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-21 17:34 ` Yu Zhang @ 2018-12-21 18:10 ` Michael S. Tsirkin 2018-12-22 0:41 ` Yu Zhang 2018-12-25 1:59 ` Tian, Kevin 1 sibling, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-21 18:10 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 01:34:01AM +0800, Yu Zhang wrote: > On Fri, Dec 21, 2018 at 12:15:26PM -0500, Michael S. Tsirkin wrote: > > On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > > > I'd like to avoid poking at the CPU from VTD code. That's all. > > > > > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > > > in the comments in vtd_decide_config()? > > > > My question would be what happens on an incorrect use? > > I believe the vfio_dma_map will return failure for an incorrect use. > > > And how does user figure out which values to set? > > Well, for now I don't think user can figure out. E.g. if we expose a vIOMMU with > 48-bit IOVA capability, yet host only supports 39-bit IOVA, vfio shall return failure, > but the user does not know whose fault it is. > > > > > As to the check against hardware IOMMU, Peter once had a proposal in > > > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > > > > > Do you have any comment or suggestion on Peter's proposal? > > > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > > > > I guess on vfio attach? Will need more thinking in it. Things like live migration (e.g. after hot removal of the vfio device) are also concerns. > > > > > I still do not quite know > > > how to do it for now... > > > > > > [...] > > > > > > > > > B.R. > > > Yu > > > > > > > > -- > > MST > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-21 18:10 ` Michael S. Tsirkin @ 2018-12-22 0:41 ` Yu Zhang 2018-12-25 17:00 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Yu Zhang @ 2018-12-22 0:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Fri, Dec 21, 2018 at 01:10:13PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 01:34:01AM +0800, Yu Zhang wrote: > > On Fri, Dec 21, 2018 at 12:15:26PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > > > > I'd like to avoid poking at the CPU from VTD code. That's all. > > > > > > > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > > > > in the comments in vtd_decide_config()? > > > > > > My question would be what happens on an incorrect use? > > > > I believe the vfio_dma_map will return failure for an incorrect use. > > > > > And how does user figure out which values to set? > > > > Well, for now I don't think user can figure out. E.g. if we expose a vIOMMU with > > 48-bit IOVA capability, yet host only supports 39-bit IOVA, vfio shall return failure, > > but the user does not know whose fault it is. > > > > > > > As to the check against hardware IOMMU, Peter once had a proposal in > > > > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > > > > > > > Do you have any comment or suggestion on Peter's proposal? > > > > > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > > > > > > > I guess on vfio attach? Will need more thinking in it. > > > Things like live migration (e.g. after hot removal of the vfio device) > are also concerns. Sorry, why live migration shall be a problem? 
I mean, if the DMA address width of vIOMMU does not match the host IOMMU's, we can just stop creating the VM, there's no live migration. > > > > > > > > I still do not quite know > > > > how to do it for now... > > > > > > > > [...] > > > > > > > > > > > > B.R. > > > > Yu > > > > > > > > > > > > -- > > > MST > > > > B.R. > > Yu B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-22 0:41 ` Yu Zhang @ 2018-12-25 17:00 ` Michael S. Tsirkin 2018-12-26 5:58 ` Yu Zhang 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2018-12-25 17:00 UTC (permalink / raw) To: Yu Zhang Cc: Eduardo Habkost, qemu-devel, Peter Xu, Igor Mammedov, Paolo Bonzini, Richard Henderson On Sat, Dec 22, 2018 at 08:41:37AM +0800, Yu Zhang wrote: > On Fri, Dec 21, 2018 at 01:10:13PM -0500, Michael S. Tsirkin wrote: > > On Sat, Dec 22, 2018 at 01:34:01AM +0800, Yu Zhang wrote: > > > On Fri, Dec 21, 2018 at 12:15:26PM -0500, Michael S. Tsirkin wrote: > > > > On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > > > > > I'd like to avoid poking at the CPU from VTD code. That's all. > > > > > > > > > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > > > > > in the comments in vtd_decide_config()? > > > > > > > > My question would be what happens on an incorrect use? > > > > > > I believe the vfio_dma_map will return failure for an incorrect use. > > > > > > > And how does user figure out which values to set? > > > > > > Well, for now I don't think user can figure out. E.g. if we expose a vIOMMU with > > > 48-bit IOVA capability, yet host only supports 39-bit IOVA, vfio shall return failure, > > > but the user does not know whose fault it is. > > > > > > > > > As to the check against hardware IOMMU, Peter once had a proposal in > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > > > > > > > > > Do you have any comment or suggestion on Peter's proposal? > > > > > > > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > > > > > > > > > > I guess on vfio attach? Will need more thinking in it. > > > > > > Things like live migration (e.g. after hot removal of the vfio device) > > are also concerns. > > Sorry, why live migration shall be a problem? 
I mean, if the DMA address > width of vIOMMU does not match the host IOMMU's, we can just stop creating > the VM, there's no live migration. I don't see code like this though. Also management needs to somehow be able to figure out that migration will fail. It's not nice to transfer all memory and then have it fail when viommu is migrated. So from that POV a flag is better. It can be validated against host capabilities. We can still have something like aw=host just like cpu host. > > > > > > > > > > > > > > > I still do not quite know > > > > > > how to do it for now... > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > > > > > > > > > > > -- > > > > > MST > > > > > > > > B.R. > > > > Yu > > B.R. > Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-25 17:00 ` Michael S. Tsirkin @ 2018-12-26 5:58 ` Yu Zhang 0 siblings, 0 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-26 5:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Tue, Dec 25, 2018 at 12:00:08PM -0500, Michael S. Tsirkin wrote: > On Sat, Dec 22, 2018 at 08:41:37AM +0800, Yu Zhang wrote: > > On Fri, Dec 21, 2018 at 01:10:13PM -0500, Michael S. Tsirkin wrote: > > > On Sat, Dec 22, 2018 at 01:34:01AM +0800, Yu Zhang wrote: > > > > On Fri, Dec 21, 2018 at 12:15:26PM -0500, Michael S. Tsirkin wrote: > > > > > On Sat, Dec 22, 2018 at 12:19:20AM +0800, Yu Zhang wrote: > > > > > > > I'd like to avoid poking at the CPU from VTD code. That's all. > > > > > > > > > > > > OK. So for the short term,how about I remove the check of host cpu, and add a TODO > > > > > > in the comments in vtd_decide_config()? > > > > > > > > > > My question would be what happens on an incorrect use? > > > > > > > > I believe the vfio_dma_map will return failure for an incorrect use. > > > > > > > > > And how does user figure out which values to set? > > > > > > > > Well, for now I don't think user can figure out. E.g. if we expose a vIOMMU with > > > > 48-bit IOVA capability, yet host only supports 39-bit IOVA, vfio shall return failure, > > > > but the user does not know whose fault it is. > > > > > > > > > > > As to the check against hardware IOMMU, Peter once had a proposal in > > > > > > http://lists.nongnu.org/archive/html/qemu-devel/2018-11/msg02281.html > > > > > > > > > > > > Do you have any comment or suggestion on Peter's proposal? > > > > > > > > > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > > > > > > > > > > > > > I guess on vfio attach? Will need more thinking in it. > > > > > > > > > Things like live migration (e.g. 
after hot removal of the vfio device) > > > are also concerns. > > > > Sorry, why live migration shall be a problem? I mean, if the DMA address > > width of vIOMMU does not match the host IOMMU's, we can just stop creating > > the VM, there's no live migration. > > I don't see code like this though. > > Also management needs to somehow be able to figure out that migration > will fail. It's not nice to transfer all memory and then have it fail > when viommu is migrated. So from that POV a flag is better. It can be > validated agains host capabilities. > > We can still have something like aw=host just like cpu host. Well, I think vIOMMU's requirement is kind of different: 1> the vIOMMU could be an emulated one, and there can be no physical IOMMU underneath. And the emulated device can still use this vIOMMU; 2> there might be multiple physical IOMMUs on one platform, I am not sure if all these IOMMUs will have the same capability setting. So I think we should have a more generic solution, to check the host capability, e.g. like Kevin's and Peter's suggestion. It's not just about 5-level vIOMMU, existing 4-level vIOMMU and future virtual SVM have similar requirement. :) > > > > > > > > > > > > > > > I still do not quite know > > > > > > how to do it for now... > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > B.R. > > > > > > Yu > > > > > > > > > > > > > > > > > > > > -- > > > > > MST > > > > > > > > B.R. > > > > Yu > > > > B.R. > > Yu > B.R. Yu ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. 2018-12-21 17:34 ` Yu Zhang 2018-12-21 18:10 ` Michael S. Tsirkin @ 2018-12-25 1:59 ` Tian, Kevin 1 sibling, 0 replies; 57+ messages in thread From: Tian, Kevin @ 2018-12-25 1:59 UTC (permalink / raw) To: Yu Zhang, Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel@nongnu.org, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson, Liu, Yi L > From: Yu Zhang > Sent: Saturday, December 22, 2018 1:34 AM > [...] > > > > > As to the check against hardware IOMMU, Peter once had a proposal in > > > http://lists.nongnu.org/archive/html/qemu-devel/2018- > 11/msg02281.html > > > > > > Do you have any comment or suggestion on Peter's proposal? > > > > Sounds reasonable to me. Do we do it on vfio attach or unconditionally? > > > > I guess on vfio attach? Will need more thinking in it. > either way is not perfect. Unconditional check doesn't make sense if there is no vfio device attached, while vfio attach might happen late (e.g. hotplug) after vIOMMU is initialized... Basically there are two checks to be concerned. One is the check at boot time, which decides vIOMMU capabilities. The other is the check at vfio attach, which decides whether attachment can succeed (i.e. whether the vIOMMU capabilities which are used by device are indeed supported by hardware). Possibly we can make boot-time check configurable. If boot-time check is turned on, vIOMMU capabilities are always a subset of pIOMMU regardless of whether vfio device is attached. check on vfio attach may be skipped since it will always pass. virtual devices also bear with same limitation of pIOMMU. If boot-time check is off, vIOMMU capabilities are always specified by end user, which might be different from pIOMMU. virtual devices can use any capability, but vfio attach may fail if required vIOMMU capabilities are not supported by pIOMMU. btw 5 level is just one example of demanding check with pIOMMU. 
There are more when emulating VT-d scalable mode (+Yi). Thanks Kevin ^ permalink raw reply [flat|nested] 57+ messages in thread
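Kevin's two checks — a configurable boot-time check that decides what the vIOMMU may expose, and an attach-time check that decides whether a vfio device can succeed — can be illustrated with a minimal capability-mask sketch. The bit names and helpers are invented for illustration only; real VT-d capabilities are spread across the hardware CAP/ECAP registers and are not a single flat bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define VIOMMU_CAP_AW48   (1u << 0)   /* 4-level, 48-bit IOVA */
#define VIOMMU_CAP_AW57   (1u << 1)   /* 5-level, 57-bit IOVA */
#define VIOMMU_CAP_SVM    (1u << 2)   /* shared virtual memory  */

/*
 * Boot-time check: with strict mode on, the vIOMMU may only advertise
 * a subset of the physical IOMMU's capabilities, so a later vfio
 * attach can never fail on a capability mismatch.
 */
static bool viommu_boot_check(bool strict, uint32_t viommu_caps,
                              uint32_t piommu_caps)
{
    return !strict || (viommu_caps & ~piommu_caps) == 0;
}

/*
 * Attach-time check: a vfio device may attach only if every vIOMMU
 * capability it relies on is actually backed by the physical IOMMU.
 * Emulated devices skip this check entirely.
 */
static bool viommu_attach_check(uint32_t used_caps, uint32_t piommu_caps)
{
    return (used_caps & ~piommu_caps) == 0;
}
```

With the boot-time check off, the user keeps full control of the vIOMMU feature set and only the attach-time check can fail — which is the trade-off Kevin describes.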
* Re: [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU. 2018-12-12 13:05 [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width Yu Zhang 2018-12-12 13:05 ` [Qemu-devel] [PATCH v3 2/2] intel-iommu: extend VTD emulation to allow 57-bit " Yu Zhang @ 2018-12-14 9:17 ` Yu Zhang 2019-01-15 4:02 ` Michael S. Tsirkin 3 siblings, 0 replies; 57+ messages in thread From: Yu Zhang @ 2018-12-14 9:17 UTC (permalink / raw) To: qemu-devel, Eduardo Habkost, Michael S. Tsirkin, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson Sorry, any comments for this series? Thanks. :) B.R. Yu On 12/12/2018 9:05 PM, Yu Zhang wrote: > Intel's upcoming processors will extend maximum linear address width to > 57 bits, and introduce 5-level paging for CPU. Meanwhile, the platform > will also extend the maximum guest address width for IOMMU to 57 bits, > thus introducing the 5-level paging for 2nd level translation(See chapter > 3 in Intel Virtualization Technology for Directed I/O). > > This patch series extends the current logic to support a wider address width. > A 5-level paging capable IOMMU(for 2nd level translation) can be rendered > with configuration "device intel-iommu,x-aw-bits=57". > > Also, kvm-unit-tests were updated to verify this patch series. Patch for > the test was sent out at: https://www.spinics.net/lists/kvm/msg177425.html. > > Note: this patch series checks the existance of 5-level paging in the host > and in the guest, and rejects configurations for 57-bit IOVA if either check > fails(VTD-d hardware shall not support 57-bit IOVA on platforms without CPU > 5-level paging). However, current vIOMMU implementation still lacks logic to > check against the physical IOMMU capability, future enhancements are expected > to do this. 
> > Changes in V3: > - Address comments from Peter Xu: squash the 3rd patch in v2 into the 2nd > patch in this version. > - Added "Reviewed-by: Peter Xu <peterx@redhat.com>" > > Changes in V2: > - Address comments from Peter Xu: add haw member in vtd_page_walk_info. > - Address comments from Peter Xu: only searches for 4K/2M/1G mappings in > iotlb are meaningful. > - Address comments from Peter Xu: cover letter changes(e.g. mention the test > patch in kvm-unit-tests). > - Coding style changes. > --- > Cc: "Michael S. Tsirkin" <mst@redhat.com> > Cc: Igor Mammedov <imammedo@redhat.com> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Eduardo Habkost <ehabkost@redhat.com> > Cc: Peter Xu <peterx@redhat.com> > --- > > Yu Zhang (2): > intel-iommu: differentiate host address width from IOVA address width. > intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. > > hw/i386/acpi-build.c | 2 +- > hw/i386/intel_iommu.c | 96 +++++++++++++++++++++++++++++------------- > hw/i386/intel_iommu_internal.h | 10 ++++- > include/hw/i386/intel_iommu.h | 10 +++-- > 4 files changed, 81 insertions(+), 37 deletions(-) > ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU. 2018-12-12 13:05 [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang ` (2 preceding siblings ...) 2018-12-14 9:17 ` [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU Yu Zhang @ 2019-01-15 4:02 ` Michael S. Tsirkin 2019-01-15 7:27 ` Yu Zhang 3 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2019-01-15 4:02 UTC (permalink / raw) To: Yu Zhang Cc: qemu-devel, Igor Mammedov, Marcel Apfelbaum, Paolo Bonzini, Richard Henderson, Eduardo Habkost, Peter Xu On Wed, Dec 12, 2018 at 09:05:37PM +0800, Yu Zhang wrote: > Intel's upcoming processors will extend maximum linear address width to > 57 bits, and introduce 5-level paging for CPU. Meanwhile, the platform > will also extend the maximum guest address width for IOMMU to 57 bits, > thus introducing the 5-level paging for 2nd level translation(See chapter > 3 in Intel Virtualization Technology for Directed I/O). > > This patch series extends the current logic to support a wider address width. > A 5-level paging capable IOMMU(for 2nd level translation) can be rendered > with configuration "device intel-iommu,x-aw-bits=57". > > Also, kvm-unit-tests were updated to verify this patch series. Patch for > the test was sent out at: https://www.spinics.net/lists/kvm/msg177425.html. > > Note: this patch series checks the existance of 5-level paging in the host > and in the guest, and rejects configurations for 57-bit IOVA if either check > fails(VTD-d hardware shall not support 57-bit IOVA on platforms without CPU > 5-level paging). However, current vIOMMU implementation still lacks logic to > check against the physical IOMMU capability, future enhancements are expected > to do this. > > Changes in V3: > - Address comments from Peter Xu: squash the 3rd patch in v2 into the 2nd > patch in this version. 
> - Added "Reviewed-by: Peter Xu <peterx@redhat.com>" > > Changes in V2: > - Address comments from Peter Xu: add haw member in vtd_page_walk_info. > - Address comments from Peter Xu: only searches for 4K/2M/1G mappings in > iotlb are meaningful. > - Address comments from Peter Xu: cover letter changes(e.g. mention the test > patch in kvm-unit-tests). > - Coding style changes. > --- > Cc: "Michael S. Tsirkin" <mst@redhat.com> > Cc: Igor Mammedov <imammedo@redhat.com> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: Richard Henderson <rth@twiddle.net> > Cc: Eduardo Habkost <ehabkost@redhat.com> > Cc: Peter Xu <peterx@redhat.com> OK, is this going anywhere? How about dropping cpu flags probing for now? You can always revisit it later. It will make things maybe a bit less user-friendly, but OTOH uncontroversial... > --- > > Yu Zhang (2): > intel-iommu: differentiate host address width from IOVA address width. > intel-iommu: extend VTD emulation to allow 57-bit IOVA address width. > > hw/i386/acpi-build.c | 2 +- > hw/i386/intel_iommu.c | 96 +++++++++++++++++++++++++++++------------- > hw/i386/intel_iommu_internal.h | 10 ++++- > include/hw/i386/intel_iommu.h | 10 +++-- > 4 files changed, 81 insertions(+), 37 deletions(-) > > -- > 1.9.1 ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/2] intel-iommu: add support for 5-level virtual IOMMU. 2019-01-15 4:02 ` Michael S. Tsirkin @ 2019-01-15 7:27 ` Yu Zhang 0 siblings, 0 replies; 57+ messages in thread From: Yu Zhang @ 2019-01-15 7:27 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Eduardo Habkost, qemu-devel, Peter Xu, Paolo Bonzini, Igor Mammedov, Richard Henderson On Mon, Jan 14, 2019 at 11:02:28PM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 12, 2018 at 09:05:37PM +0800, Yu Zhang wrote: > > Intel's upcoming processors will extend maximum linear address width to > > 57 bits, and introduce 5-level paging for CPU. Meanwhile, the platform > > will also extend the maximum guest address width for IOMMU to 57 bits, > > thus introducing the 5-level paging for 2nd level translation(See chapter > > 3 in Intel Virtualization Technology for Directed I/O). > > > > This patch series extends the current logic to support a wider address width. > > A 5-level paging capable IOMMU(for 2nd level translation) can be rendered > > with configuration "device intel-iommu,x-aw-bits=57". > > > > Also, kvm-unit-tests were updated to verify this patch series. Patch for > > the test was sent out at: https://www.spinics.net/lists/kvm/msg177425.html. > > > > Note: this patch series checks the existance of 5-level paging in the host > > and in the guest, and rejects configurations for 57-bit IOVA if either check > > fails(VTD-d hardware shall not support 57-bit IOVA on platforms without CPU > > 5-level paging). However, current vIOMMU implementation still lacks logic to > > check against the physical IOMMU capability, future enhancements are expected > > to do this. > > > > Changes in V3: > > - Address comments from Peter Xu: squash the 3rd patch in v2 into the 2nd > > patch in this version. > > - Added "Reviewed-by: Peter Xu <peterx@redhat.com>" > > > > Changes in V2: > > - Address comments from Peter Xu: add haw member in vtd_page_walk_info. 
> > - Address comments from Peter Xu: only searches for 4K/2M/1G mappings in > > iotlb are meaningful. > > - Address comments from Peter Xu: cover letter changes(e.g. mention the test > > patch in kvm-unit-tests). > > - Coding style changes. > > --- > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > > Cc: Igor Mammedov <imammedo@redhat.com> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > Cc: Richard Henderson <rth@twiddle.net> > > Cc: Eduardo Habkost <ehabkost@redhat.com> > > Cc: Peter Xu <peterx@redhat.com> > > > OK is this going anywhere? > How about dropping cpu flags probing for now, you can > always revisit it later. > Will make it maybe a bit less user friendly but OTOH > uncontroversial... Thanks Michael, and sorry for the late reply. Sure. For patch 2/2, I'd like to drop the cpu check. And we are working on another patch to check the host capability. This is supposed to be done via sysfs, similar to Peter's previous suggestion. One exception is that our plan is to use the minimal capability of all host VT-d hardware: for example, only allow a 4-level vIOMMU as long as any VT-d hardware unit does not support 5-level paging, in case we offered a 5-level vIOMMU, only to find later that a hotplugged device is bound to a 4-level VT-d unit. This patch is not ready yet, because we also would like to cover the requirements of scalable mode. So for now, I'm more inclined to just drop the cpu check and add some TODO comments. And as to 1/2, I am proposing to address the initialization problem by resetting the haw in the vIOMMU in pc_machine_done() in another reply of mine. If you are OK with this direction, I'll send out the patch after testing. :-) B.R. Yu > > > --- > > > > Yu Zhang (2): > > intel-iommu: differentiate host address width from IOVA address width. > > intel-iommu: extend VTD emulation to allow 57-bit IOVA address width.
> > > > hw/i386/acpi-build.c | 2 +- > > hw/i386/intel_iommu.c | 96 +++++++++++++++++++++++++++++------------- > > hw/i386/intel_iommu_internal.h | 10 ++++- > > include/hw/i386/intel_iommu.h | 10 +++-- > > 4 files changed, 81 insertions(+), 37 deletions(-) > > > > -- > > 1.9.1 > ^ permalink raw reply [flat|nested] 57+ messages in thread
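The "minimal capability of all host VT-d hardware" plan described above could look roughly like the sketch below. One concrete fact from the VT-d specification: the capability register's MGAW field occupies bits 21:16 and encodes the maximum guest address width minus one, and Linux exposes the register under /sys/class/iommu/dmar*/intel-iommu/cap. The helper names are hypothetical, and the sketch operates on already-read register values rather than doing the sysfs file I/O.

```c
#include <stdint.h>

/*
 * VT-d spec: MGAW lives in bits 21:16 of the capability register; the
 * maximum guest address width it describes is the field value plus one.
 */
static int vtd_cap_mgaw(uint64_t cap)
{
    return (int)((cap >> 16) & 0x3fULL) + 1;
}

/*
 * Minimal supported guest address width across all hardware IOMMU
 * units -- the "use the minimal capability" policy described above.
 * Each cap value would come from one DMAR unit's sysfs cap attribute.
 */
static int host_min_mgaw(const uint64_t *caps, int n)
{
    int min = 64;                    /* no address is wider than this */
    for (int i = 0; i < n; i++) {
        int mgaw = vtd_cap_mgaw(caps[i]);
        if (mgaw < min) {
            min = mgaw;
        }
    }
    return min;
}
```

QEMU could then refuse any x-aw-bits larger than host_min_mgaw(), which handles the hotplug worry: a device bound to the weakest unit is still covered.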