All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-06 10:36 [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup Peter Xu
@ 2016-12-06 10:36 ` Peter Xu
  2016-12-08  3:01   ` Lan Tianyu
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2016-12-06 10:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, kevin.tian, mst, jan.kiszka, bd.aviv, peterx,
	alex.williamson, jasowang

The default replay() don't work for VT-d since vt-d will have a huge
default memory region which covers address range 0-(2^64-1). This will
normally bring a dead loop when guest starts.

The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we found that the page directories are empty.
It'll greatly reduce the time to walk the whole region.

To achieve this, we provided a page walk helper to do that, invoking
corresponding hook function when we found an page we are interested in.
vtd_page_walk_level() is the core logic for the page walking. It's
interface is designed to suite further use case, e.g., to invalidate a
range of addresses.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/i386/trace-events  |   8 ++
 include/exec/memory.h |   2 +
 3 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 46b8a2f..2fcd7af 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -620,6 +620,22 @@ static inline uint32_t vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
     return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
 }
 
+static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
+{
+    uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
+    return 1ULL << MIN(ce_agaw, VTD_MGAW);
+}
+
+/* Return true if IOVA passes range check, otherwise false. */
+static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce)
+{
+    /*
+     * Check if @iova is above 2^X-1, where X is the minimum of MGAW
+     * in CAP_REG and AW in context-entry.
+     */
+    return !(iova & ~(vtd_iova_limit(ce) - 1));
+}
+
 static const uint64_t vtd_paging_entry_rsvd_field[] = {
     [0] = ~0ULL,
     /* For not large page */
@@ -656,13 +672,9 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova,
     uint32_t level = vtd_get_level_from_context_entry(ce);
     uint32_t offset;
     uint64_t slpte;
-    uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
     uint64_t access_right_check = 0;
 
-    /* Check if @iova is above 2^X-1, where X is the minimum of MGAW
-     * in CAP_REG and AW in context-entry.
-     */
-    if (iova & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
+    if (!vtd_iova_range_check(iova, ce)) {
         error_report("IOVA 0x%"PRIx64 " exceeds limits", iova);
         return -VTD_FR_ADDR_BEYOND_MGAW;
     }
@@ -718,6 +730,166 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova,
     }
 }
 
+typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
+
+/**
+ * vtd_page_walk_level - walk over specific level for IOVA range
+ *
+ * @addr: base GPA addr to start the walk
+ * @start: IOVA range start address
+ * @end: IOVA range end address (start <= addr < end)
+ * @hook_fn: hook func to be called when detected page
+ * @private: private data to be passed into hook func
+ * @read: whether parent level has read permission
+ * @write: whether parent level has write permission
+ * @skipped: accumulated skipped ranges
+ * @notify_unmap: whether we should notify invalid entries
+ */
+static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
+                               uint64_t end, vtd_page_walk_hook hook_fn,
+                               void *private, uint32_t level,
+                               bool read, bool write, uint64_t *skipped,
+                               bool notify_unmap)
+{
+    bool read_cur, write_cur, entry_valid;
+    uint32_t offset;
+    uint64_t slpte;
+    uint64_t subpage_size, subpage_mask;
+    IOMMUTLBEntry entry;
+    uint64_t iova = start;
+    uint64_t iova_next;
+    uint64_t skipped_local = 0;
+    int ret = 0;
+
+    trace_vtd_page_walk_level(addr, level, start, end);
+
+    subpage_size = 1ULL << vtd_slpt_level_shift(level);
+    subpage_mask = vtd_slpt_level_page_mask(level);
+
+    while (iova < end) {
+        iova_next = (iova & subpage_mask) + subpage_size;
+
+        offset = vtd_iova_level_offset(iova, level);
+        slpte = vtd_get_slpte(addr, offset);
+
+        /*
+         * When one of the following case happens, we assume the whole
+         * range is invalid:
+         *
+         * 1. read block failed
+         * 3. reserved area non-zero
+         * 2. both read & write flag are not set
+         */
+
+        if (slpte == (uint64_t)-1) {
+            trace_vtd_page_walk_skip_read(iova, iova_next);
+            skipped_local++;
+            goto next;
+        }
+
+        if (vtd_slpte_nonzero_rsvd(slpte, level)) {
+            trace_vtd_page_walk_skip_reserve(iova, iova_next);
+            skipped_local++;
+            goto next;
+        }
+
+        /* Permissions are stacked with parents' */
+        read_cur = read && (slpte & VTD_SL_R);
+        write_cur = write && (slpte & VTD_SL_W);
+
+        /*
+         * As long as we have either read/write permission, this is
+         * a valid entry. The rule works for both page or page tables.
+         */
+        entry_valid = read_cur | write_cur;
+
+        if (vtd_is_last_slpte(slpte, level)) {
+            entry.target_as = &address_space_memory;
+            entry.iova = iova & subpage_mask;
+            /*
+             * This might be meaningless addr if (!read_cur &&
+             * !write_cur), but after all this field will be
+             * meaningless in that case, so let's share the code to
+             * generate the IOTLBs no matter it's an MAP or UNMAP
+             */
+            entry.translated_addr = vtd_get_slpte_addr(slpte);
+            entry.addr_mask = ~subpage_mask;
+            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+            if (!entry_valid && !notify_unmap) {
+                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                skipped_local++;
+                goto next;
+            }
+            trace_vtd_page_walk_one(level, entry.iova, addr,
+                                    entry.addr_mask, entry.perm);
+            if (hook_fn) {
+                ret = hook_fn(&entry, private);
+                if (ret < 0) {
+                    error_report("Detected error in page walk hook "
+                                 "function, stop walk.");
+                    return ret;
+                }
+            }
+        } else {
+            if (!entry_valid) {
+                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                skipped_local++;
+                goto next;
+            }
+            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte), iova,
+                                      MIN(iova_next, end), hook_fn, private,
+                                      level - 1, read_cur, write_cur,
+                                      &skipped_local, notify_unmap);
+            if (ret < 0) {
+                error_report("Detected page walk error on addr 0x%"PRIx64
+                             " level %"PRIu32", stop walk.", addr, level - 1);
+                return ret;
+            }
+        }
+
+next:
+        iova = iova_next;
+    }
+
+    if (skipped) {
+        *skipped += skipped_local;
+    }
+
+    return 0;
+}
+
+/**
+ * vtd_page_walk - walk specific IOVA range, and call the hook
+ *
+ * @ce: context entry to walk upon
+ * @start: IOVA address to start the walk
+ * @end: size of the IOVA range to walk over
+ * @hook_fn: the hook that to be called for each detected area
+ * @private: private data for the hook function
+ */
+static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
+                         vtd_page_walk_hook hook_fn, void *private)
+{
+    dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
+    uint32_t level = vtd_get_level_from_context_entry(ce);
+
+    if (!vtd_iova_range_check(start, ce)) {
+        error_report("IOVA start 0x%"PRIx64 " end 0x%"PRIx64" exceeds limits",
+                     start, end);
+        return -VTD_FR_ADDR_BEYOND_MGAW;
+    }
+
+    if (!vtd_iova_range_check(end, ce)) {
+        /* Fix end so that it reaches the maximum */
+        end = vtd_iova_limit(ce);
+    }
+
+    trace_vtd_page_walk(ce->hi, ce->lo, start, end);
+
+    return vtd_page_walk_level(addr, start, end, hook_fn, private,
+                               level, true, true, NULL, false);
+}
+
 /* Map a device to its corresponding domain (context-entry) */
 static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
                                     uint8_t devfn, VTDContextEntry *ce)
@@ -2430,6 +2602,35 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
     return vtd_dev_as;
 }
 
+static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
+{
+    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    return 0;
+}
+
+static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
+{
+    VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
+    IntelIOMMUState *s = vtd_as->iommu_state;
+    uint8_t bus_n = pci_bus_num(vtd_as->bus);
+    VTDContextEntry ce;
+
+    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
+        /*
+         * Scanned a valid context entry, walk over the pages and
+         * notify when needed.
+         */
+        trace_vtd_replay_ce_valid(bus_n, PCI_SLOT(vtd_as->devfn),
+                                  PCI_FUNC(vtd_as->devfn), ce.hi, ce.lo);
+        vtd_page_walk(&ce, 0, ~0, vtd_replay_hook, (void *)n);
+    } else {
+        trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
+                                    PCI_FUNC(vtd_as->devfn));
+    }
+
+    return;
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -2444,6 +2645,7 @@ static void vtd_init(IntelIOMMUState *s)
 
     s->iommu_ops.translate = vtd_iommu_translate;
     s->iommu_ops.notify_flag_changed = vtd_iommu_notify_flag_changed;
+    s->iommu_ops.replay = vtd_iommu_replay;
     s->root = 0;
     s->root_extended = false;
     s->dmar_enabled = false;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index e64a4e2..922d543 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -28,6 +28,14 @@ vtd_iotlb_page_update(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t doma
 vtd_iotlb_cc_hit(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen) "IOTLB context hit bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32
 vtd_iotlb_cc_update(uint8_t bus, uint8_t devfn, uint64_t high, uint64_t low, uint32_t gen1, uint32_t gen2) "IOTLB context update bus 0x%"PRIx8" devfn 0x%"PRIx8" high 0x%"PRIx64" low 0x%"PRIx64" gen %"PRIu32" -> gen %"PRIu32
 vtd_iotlb_reset(const char *reason) "IOTLB reset (reason: %s)"
+vtd_replay_ce_valid(uint8_t bus, uint8_t dev, uint8_t fn, uint64_t hi, uint64_t lo) "replay valid context device %02"PRIx8":%02"PRIx8".%02"PRIx8" hi 0x%"PRIx64" lo 0x%"PRIx64
+vtd_replay_ce_invalid(uint8_t bus, uint8_t dev, uint8_t fn) "replay invalid context device %02"PRIx8":%02"PRIx8".%02"PRIx8
+vtd_page_walk(uint64_t hi, uint64_t lo, uint64_t start, uint64_t end) "Page walk for ce (0x%"PRIx64", 0x%"PRIx64") iova range 0x%"PRIx64" - 0x%"PRIx64
+vtd_page_walk_level(uint64_t addr, uint32_t level, uint64_t start, uint64_t end) "Page walk (base=0x%"PRIx64", level=%"PRIu32") iova range 0x%"PRIx64" - 0x%"PRIx64
+vtd_page_walk_one(uint32_t level, uint64_t iova, uint64_t gpa, uint64_t mask, int perm) "Page walk detected map level 0x%"PRIx32" iova 0x%"PRIx64" -> gpa 0x%"PRIx64" mask 0x%"PRIx64" perm %d"
+vtd_page_walk_skip_read(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to unable to read"
+vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to perm empty"
+vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
 
 # hw/i386/amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6bdd12c..e66a0d0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -60,6 +60,8 @@ typedef enum {
     IOMMU_NO_FAIL  = 4, /* may not be present, don't repport error to guest */
 } IOMMUAccessFlags;
 
+#define IOMMU_ACCESS_FLAG(r, w) ((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0)
+
 struct IOMMUTLBEntry {
     AddressSpace    *target_as;
     hwaddr           iova;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-06 10:36 ` [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback Peter Xu
@ 2016-12-08  3:01   ` Lan Tianyu
  2016-12-09  1:56     ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Lan Tianyu @ 2016-12-08  3:01 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: kevin.tian, mst, jan.kiszka, bd.aviv, alex.williamson, jasowang

On 2016年12月06日 18:36, Peter Xu wrote:
> +/**
> + * vtd_page_walk - walk specific IOVA range, and call the hook
> + *
> + * @ce: context entry to walk upon
> + * @start: IOVA address to start the walk
> + * @end: size of the IOVA range to walk over


*·@end:·IOVA·range·end·address·(start·<=·addr·<·end)
This comment for vtd_page_walk_level() maybe more accurate here?


-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-08  3:01   ` Lan Tianyu
@ 2016-12-09  1:56     ` Peter Xu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2016-12-09  1:56 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: qemu-devel, kevin.tian, mst, jan.kiszka, bd.aviv, alex.williamson,
	jasowang

On Thu, Dec 08, 2016 at 11:01:53AM +0800, Lan Tianyu wrote:
> On 2016年12月06日 18:36, Peter Xu wrote:
> > +/**
> > + * vtd_page_walk - walk specific IOVA range, and call the hook
> > + *
> > + * @ce: context entry to walk upon
> > + * @start: IOVA address to start the walk
> > + * @end: size of the IOVA range to walk over
> 
> 
> *·@end:·IOVA·range·end·address·(start·<=·addr·<·end)
> This comment for vtd_page_walk_level() maybe more accurate here?

You are right, thanks. :)

-- peterx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
@ 2016-12-29  7:38 Liu, Yi L
  2016-12-29  9:55 ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Liu, Yi L @ 2016-12-29  7:38 UTC (permalink / raw)
  To: ', Peter Xu'
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

> The default replay() don't work for VT-d since vt-d will have a huge
> default memory region which covers address range 0-(2^64-1). This will
> normally bring a dead loop when guest starts.
> 
> The solution is simple - we don't walk over all the regions. Instead, we
> jump over the regions when we found that the page directories are empty.
> It'll greatly reduce the time to walk the whole region.
> 
> To achieve this, we provided a page walk helper to do that, invoking
> corresponding hook function when we found an page we are interested in.
> vtd_page_walk_level() is the core logic for the page walking. It's
> interface is designed to suite further use case, e.g., to invalidate a
> range of addresses.
> 
> Signed-off-by: Peter Xu <address@hidden>
> ---
>  hw/i386/intel_iommu.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  hw/i386/trace-events  |   8 ++
>  include/exec/memory.h |   2 +
>  3 files changed, 217 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 46b8a2f..2fcd7af 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -620,6 +620,22 @@ static inline uint32_t 
> vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
>      return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
>  }
>  
> +static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> +{
> +    uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
> +    return 1ULL << MIN(ce_agaw, VTD_MGAW);
> +}
> +
> +/* Return true if IOVA passes range check, otherwise false. */
> +static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce)
> +{
> +    /*
> +     * Check if @iova is above 2^X-1, where X is the minimum of MGAW
> +     * in CAP_REG and AW in context-entry.
> +     */
> +    return !(iova & ~(vtd_iova_limit(ce) - 1));
> +}
> +
>  static const uint64_t vtd_paging_entry_rsvd_field[] = {
>      [0] = ~0ULL,
>      /* For not large page */
> @@ -656,13 +672,9 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t 
> iova,
>      uint32_t level = vtd_get_level_from_context_entry(ce);
>      uint32_t offset;
>      uint64_t slpte;
> -    uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
>      uint64_t access_right_check = 0;
>  
> -    /* Check if @iova is above 2^X-1, where X is the minimum of MGAW
> -     * in CAP_REG and AW in context-entry.
> -     */
> -    if (iova & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
> +    if (!vtd_iova_range_check(iova, ce)) {
>          error_report("IOVA 0x%"PRIx64 " exceeds limits", iova);
>          return -VTD_FR_ADDR_BEYOND_MGAW;
>      }
> @@ -718,6 +730,166 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, 
> uint64_t iova,
>      }
>  }
>  
> +typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> +
> +/**
> + * vtd_page_walk_level - walk over specific level for IOVA range
> + *
> + * @addr: base GPA addr to start the walk
> + * @start: IOVA range start address
> + * @end: IOVA range end address (start <= addr < end)
> + * @hook_fn: hook func to be called when detected page
> + * @private: private data to be passed into hook func
> + * @read: whether parent level has read permission
> + * @write: whether parent level has write permission
> + * @skipped: accumulated skipped ranges
> + * @notify_unmap: whether we should notify invalid entries
> + */
> +static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> +                               uint64_t end, vtd_page_walk_hook hook_fn,
> +                               void *private, uint32_t level,
> +                               bool read, bool write, uint64_t *skipped,
> +                               bool notify_unmap)
> +{
> +    bool read_cur, write_cur, entry_valid;
> +    uint32_t offset;
> +    uint64_t slpte;
> +    uint64_t subpage_size, subpage_mask;
> +    IOMMUTLBEntry entry;
> +    uint64_t iova = start;
> +    uint64_t iova_next;
> +    uint64_t skipped_local = 0;
> +    int ret = 0;
> +
> +    trace_vtd_page_walk_level(addr, level, start, end);
> +
> +    subpage_size = 1ULL << vtd_slpt_level_shift(level);
> +    subpage_mask = vtd_slpt_level_page_mask(level);
> +
> +    while (iova < end) {
> +        iova_next = (iova & subpage_mask) + subpage_size;
> +
> +        offset = vtd_iova_level_offset(iova, level);
> +        slpte = vtd_get_slpte(addr, offset);
> +
> +        /*
> +         * When one of the following case happens, we assume the whole
> +         * range is invalid:
> +         *
> +         * 1. read block failed
> +         * 3. reserved area non-zero
> +         * 2. both read & write flag are not set
> +         */
> +
> +        if (slpte == (uint64_t)-1) {
> +            trace_vtd_page_walk_skip_read(iova, iova_next);
> +            skipped_local++;
> +            goto next;
> +        }
> +
> +        if (vtd_slpte_nonzero_rsvd(slpte, level)) {
> +            trace_vtd_page_walk_skip_reserve(iova, iova_next);
> +            skipped_local++;
> +            goto next;
> +        }
> +
> +        /* Permissions are stacked with parents' */
> +        read_cur = read && (slpte & VTD_SL_R);
> +        write_cur = write && (slpte & VTD_SL_W);
> +
> +        /*
> +         * As long as we have either read/write permission, this is
> +         * a valid entry. The rule works for both page or page tables.
> +         */
> +        entry_valid = read_cur | write_cur;
> +
> +        if (vtd_is_last_slpte(slpte, level)) {
> +            entry.target_as = &address_space_memory;
> +            entry.iova = iova & subpage_mask;
> +            /*
> +             * This might be meaningless addr if (!read_cur &&
> +             * !write_cur), but after all this field will be
> +             * meaningless in that case, so let's share the code to
> +             * generate the IOTLBs no matter it's an MAP or UNMAP
> +             */
> +            entry.translated_addr = vtd_get_slpte_addr(slpte);
> +            entry.addr_mask = ~subpage_mask;
> +            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +            if (!entry_valid && !notify_unmap) {
> +                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                skipped_local++;
> +                goto next;
> +            }
> +            trace_vtd_page_walk_one(level, entry.iova, addr,
> +                                    entry.addr_mask, entry.perm);
> +            if (hook_fn) {
> +                ret = hook_fn(&entry, private);
> +                if (ret < 0) {
> +                    error_report("Detected error in page walk hook "
> +                                 "function, stop walk.");
> +                    return ret;
> +                }
> +            }
> +        } else {
> +            if (!entry_valid) {
> +                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                skipped_local++;
> +                goto next;
> +            }
> +            ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte), iova,
> +                                      MIN(iova_next, end), hook_fn, private,
> +                                      level - 1, read_cur, write_cur,
> +                                      &skipped_local, notify_unmap);
> +            if (ret < 0) {
> +                error_report("Detected page walk error on addr 0x%"PRIx64
> +                             " level %"PRIu32", stop walk.", addr, level - 1);
> +                return ret;
> +            }
> +        }
> +
> +next:
> +        iova = iova_next;
> +    }
> +
> +    if (skipped) {
> +        *skipped += skipped_local;
> +    }
> +
> +    return 0;
> +}
> +
> +/**
> + * vtd_page_walk - walk specific IOVA range, and call the hook
> + *
> + * @ce: context entry to walk upon
> + * @start: IOVA address to start the walk
> + * @end: size of the IOVA range to walk over
> + * @hook_fn: the hook that to be called for each detected area
> + * @private: private data for the hook function
> + */
> +static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
> +                         vtd_page_walk_hook hook_fn, void *private)
> +{
> +    dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
> +    uint32_t level = vtd_get_level_from_context_entry(ce);
> +
> +    if (!vtd_iova_range_check(start, ce)) {
> +        error_report("IOVA start 0x%"PRIx64 " end 0x%"PRIx64" exceeds limits",
> +                     start, end);
> +        return -VTD_FR_ADDR_BEYOND_MGAW;
> +    }
> +
> +    if (!vtd_iova_range_check(end, ce)) {
> +        /* Fix end so that it reaches the maximum */
> +        end = vtd_iova_limit(ce);
> +    }
> +
> +    trace_vtd_page_walk(ce->hi, ce->lo, start, end);
> +
> +    return vtd_page_walk_level(addr, start, end, hook_fn, private,
> +                               level, true, true, NULL, false);
> +}
> +
>  /* Map a device to its corresponding domain (context-entry) */
>  static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
>                                      uint8_t devfn, VTDContextEntry *ce)
> @@ -2430,6 +2602,35 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
> PCIBus *bus, int devfn)
>      return vtd_dev_as;
>  }
>  
> +static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
> +{
> +    memory_region_notify_one((IOMMUNotifier *)private, entry);
> +    return 0;
> +}
> +
> +static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
> +{
> +    VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
> +    IntelIOMMUState *s = vtd_as->iommu_state;
> +    uint8_t bus_n = pci_bus_num(vtd_as->bus);
> +    VTDContextEntry ce;
> +
> +    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {

Hi Peter,

Again I'd like to give my cheers on this patch set.

Hereby, I have a minor concern on this part.
The replay would be performed in vfio_listener_region_add() when device is
assigned. Since guest is just started, so there will be no valid context entry for the
assigned device. So there will be no vtd_page_walk. Thus no change to SL page
table in host. The SL page table would still be a IOVA->HPA mapping. It's true
that the gIOVA->HPA mapping can be built by page map/unmap when guest
trying to issue dma. So it is just ok without relpay in the beginning. Hot plug should
be all the same. I guess it is the design here?

However, it may be incompatible with a future work in which we expose an
vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
would be expected to have a replay in the time device is assigned.

Regards,
Yi L

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-29  7:38 [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback Liu, Yi L
@ 2016-12-29  9:55 ` Peter Xu
  2016-12-29 10:23   ` Liu, Yi L
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2016-12-29  9:55 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

On Thu, Dec 29, 2016 at 07:38:20AM +0000, Liu, Yi L wrote:

[...]

> > +static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
> > +{
> > +    VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
> > +    IntelIOMMUState *s = vtd_as->iommu_state;
> > +    uint8_t bus_n = pci_bus_num(vtd_as->bus);
> > +    VTDContextEntry ce;
> > +
> > +    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
> 
> Hi Peter,
> 
> Again I'd like to give my cheers on this patch set.

Thanks.

> 
> Hereby, I have a minor concern on this part.
> The replay would be performed in vfio_listener_region_add() when device is
> assigned. Since guest is just started, so there will be no valid context entry for the
> assigned device. So there will be no vtd_page_walk. Thus no change to SL page
> table in host. The SL page table would still be a IOVA->HPA mapping. It's true
> that the gIOVA->HPA mapping can be built by page map/unmap when guest
> trying to issue dma. So it is just ok without relpay in the beginning. Hot plug should
> be all the same. I guess it is the design here?

I am not sure whether I get your point here - I think it's by design.
We don't really need replay if we are sure that the IOMMU address
space has totally no mapping (i.e., when the guest init). Replay is
only useful when there are existing mappings in the IOMMU address
space.

Another thing to mention: IMHO IOVA -> HPA mapping should be built by
page map/unmap when guest issues IOTLB invalidations, not DMA (guest
will first issue IOTLB invalidations, only after that the iova address
would be a valid DMA address).

> 
> However, it may be incompatible with a future work in which we expose an
> vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
> table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
> rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
> would be expected to have a replay in the time device is assigned.

If you need a GPA -> HPA mapping (AFAICT you mean virtualized SVM and
the so-called first level translation), IMHO the default mapping
behavior of vfio_listener_region_add() suites here (when
memory_region_is_iommu(section->mr) == false), which maps the whole
guest physical address space, just like the case when we use vfio
devices without vIOMMU.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-29  9:55 ` Peter Xu
@ 2016-12-29 10:23   ` Liu, Yi L
  2016-12-30  3:43     ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Liu, Yi L @ 2016-12-29 10:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Thursday, December 29, 2016 5:56 PM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Lan, Tianyu <tianyu.lan@intel.com>; 'qemu-
> devel@nongnu.org' <qemu-devel@nongnu.org>; 'bd.aviv@gmail.com'
> <bd.aviv@gmail.com>
> Subject: Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay()
> callback
> 
> On Thu, Dec 29, 2016 at 07:38:20AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
> > > +static void vtd_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
> > > +{
> > > +    VTDAddressSpace *vtd_as = container_of(mr, VTDAddressSpace, iommu);
> > > +    IntelIOMMUState *s = vtd_as->iommu_state;
> > > +    uint8_t bus_n = pci_bus_num(vtd_as->bus);
> > > +    VTDContextEntry ce;
> > > +
> > > +    if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
> >
> > Hi Peter,
> >
> > Again I'd like to give my cheers on this patch set.
> 
> Thanks.
> 
> >
> > Hereby, I have a minor concern on this part.
> > The replay would be performed in vfio_listener_region_add() when device is
> > assigned. Since guest is just started, so there will be no valid context entry for the
> > assigned device. So there will be no vtd_page_walk. Thus no change to SL page
> > table in host. The SL page table would still be a IOVA->HPA mapping. It's true
> > that the gIOVA->HPA mapping can be built by page map/unmap when guest
> > trying to issue dma. So it is just ok without relpay in the beginning. Hot plug should
> > be all the same. I guess it is the design here?
> 
> I am not sure whether I get your point here - I think it's by design.
> We don't really need replay if we are sure that the IOMMU address
> space has totally no mapping (i.e., when the guest init). Replay is
> only useful when there are existing mappings in the IOMMU address
> space.
yes, if we are sure no existing mapping, it is ok. There is another case which may
complain it. If a device is hotplug in, then there should be existing mapping. Just
looked code again. If corresponding context entry is not present, the
vtd_page_walk should be bypassed.

> 
> Another thing to mention: IMHO IOVA -> HPA mapping should be built by
> page map/unmap when guest issues IOTLB invalidations, not DMA (guest
> will first issue IOTLB invalidations, only after that the iova address
> would be a valid DMA address).
Exactly. thx for the correction.

> 
> >
> > However, it may be incompatible with a future work in which we expose an
> > vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
> > table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
> > rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
> > would be expected to have a replay in the time device is assigned.
> 
> If you need a GPA -> HPA mapping (AFAICT you mean virtualized SVM and
> the so-called first level translation), IMHO the default mapping
> behavior of vfio_listener_region_add() suites here (when
> memory_region_is_iommu(section->mr) == false), which maps the whole
> guest physical address space, just like the case when we use vfio
> devices without vIOMMU.
Yes, I mean SVM virtualizaion. To virtualize SVM, an vIOMMU would be exposed to
guest. So memory_region_is_iommu(section->mr) would be true. Then, the original
mapping the whole guest physical address space would not run. Thus no GPA->HPA
mapping would be built. This is what I really concern here.

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-29 10:23   ` Liu, Yi L
@ 2016-12-30  3:43     ` Peter Xu
  2016-12-30  4:39       ` Liu, Yi L
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2016-12-30  3:43 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

On Thu, Dec 29, 2016 at 10:23:21AM +0000, Liu, Yi L wrote:

[...]

> > >
> > > Hereby, I have a minor concern on this part.
> > > The replay would be performed in vfio_listener_region_add() when device is
> > > assigned. Since guest is just started, so there will be no valid context entry for the
> > > assigned device. So there will be no vtd_page_walk. Thus no change to SL page
> > > table in host. The SL page table would still be a IOVA->HPA mapping. It's true
> > > that the gIOVA->HPA mapping can be built by page map/unmap when guest
> > > trying to issue dma. So it is just ok without relpay in the beginning. Hot plug should
> > > be all the same. I guess it is the design here?
> > 
> > I am not sure whether I get your point here - I think it's by design.
> > We don't really need replay if we are sure that the IOMMU address
> > space has totally no mapping (i.e., when the guest init). Replay is
> > only useful when there are existing mappings in the IOMMU address
> > space.
> yes, if we are sure no existing mapping, it is ok. There is another case which may
> complain it. If a device is hotplug in, then there should be existing mapping. Just
> looked code again. If corresponding context entry is not present, the
> vtd_page_walk should be bypassed.

Yes, if the context entry is not present, the replay will be bypassed.
And if it exists, then the replay will work (along with the page walk
process). Shouldn't that the behavior we want?

Please clarify if I misunderstood your question.

[...]

> > >
> > > However, it may be incompatible with a future work in which we expose an
> > > vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
> > > table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
> > > rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
> > > would be expected to have a replay in the time device is assigned.
> > 
> > If you need a GPA -> HPA mapping (AFAICT you mean virtualized SVM and
> > the so-called first level translation), IMHO the default mapping
> > behavior of vfio_listener_region_add() suites here (when
> > memory_region_is_iommu(section->mr) == false), which maps the whole
> > guest physical address space, just like the case when we use vfio
> > devices without vIOMMU.
> Yes, I mean SVM virtualizaion. To virtualize SVM, an vIOMMU would be exposed to
> guest. So memory_region_is_iommu(section->mr) would be true. Then, the original
> mapping the whole guest physical address space would not run. Thus no GPA->HPA
> mapping would be built. This is what I really concern here.

What I meant above was, we might be able to leverage existing
no-viommu logic to build the SLPT for virtualized SVM. For example, we
can switch the if block from:

  if (memory_region_is_iommu(section->mr))

into something similiar like:

  if (memory_region_is_iommu(section->mr) &&
      !FIRST_LEVEL_TRANSLATION_ENABLED(section->mr))

Just a quick thought, not sure whether that would be an applicable
idea. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-30  3:43     ` Peter Xu
@ 2016-12-30  4:39       ` Liu, Yi L
  2016-12-30  9:18         ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Liu, Yi L @ 2016-12-30  4:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, December 30, 2016 11:44 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Lan, Tianyu <tianyu.lan@intel.com>; 'qemu-
> devel@nongnu.org' <qemu-devel@nongnu.org>; 'bd.aviv@gmail.com'
> <bd.aviv@gmail.com>
> Subject: Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay()
> callback
> 
> On Thu, Dec 29, 2016 at 10:23:21AM +0000, Liu, Yi L wrote:
> 
> [...]
> 
> > > >
> > > > Hereby, I have a minor concern on this part.
> > > > The replay would be performed in vfio_listener_region_add() when device is
> > > > assigned. Since guest is just started, so there will be no valid context entry for
> the
> > > > assigned device. So there will be no vtd_page_walk. Thus no change to SL page
> > > > table in host. The SL page table would still be a IOVA->HPA mapping. It's true
> > > > that the gIOVA->HPA mapping can be built by page map/unmap when guest
> > > > trying to issue dma. So it is just ok without relpay in the beginning. Hot plug
> should
> > > > be all the same. I guess it is the design here?
> > >
> > > I am not sure whether I get your point here - I think it's by design.
> > > We don't really need replay if we are sure that the IOMMU address
> > > space has totally no mapping (i.e., when the guest init). Replay is
> > > only useful when there are existing mappings in the IOMMU address
> > > space.
> > yes, if we are sure no existing mapping, it is ok. There is another case which may
> > complain it. If a device is hotplug in, then there should be existing mapping. Just
> > looked code again. If corresponding context entry is not present, the
> > vtd_page_walk should be bypassed.
> 
> Yes, if the context entry is not present, the replay will be bypassed.
> And if it exists, then the replay will work (along with the page walk
> process). Shouldn't that the behavior we want?
> 
> Please clarify if I misunderstood your question.
I was just assuming a replay should be performed when a new device is plug in.
However, according to your statements, it is not required. Is this the conclusion in
the previous discussion?

> [...]
> 
> > > >
> > > > However, it may be incompatible with a future work in which we expose an
> > > > vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
> > > > table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
> > > > rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
> > > > would be expected to have a replay in the time device is assigned.
> > >
> > > If you need a GPA -> HPA mapping (AFAICT you mean virtualized SVM and
> > > the so-called first level translation), IMHO the default mapping
> > > behavior of vfio_listener_region_add() suites here (when
> > > memory_region_is_iommu(section->mr) == false), which maps the whole
> > > guest physical address space, just like the case when we use vfio
> > > devices without vIOMMU.
> > Yes, I mean SVM virtualizaion. To virtualize SVM, an vIOMMU would be exposed to
> > guest. So memory_region_is_iommu(section->mr) would be true. Then, the
> original
> > mapping the whole guest physical address space would not run. Thus no GPA->HPA
> > mapping would be built. This is what I really concern here.
> 
> What I meant above was, we might be able to leverage existing
> no-viommu logic to build the SLPT for virtualized SVM. For example, we
> can switch the if block from:
> 
>   if (memory_region_is_iommu(section->mr))
> 
> into something similiar like:
> 
>   if (memory_region_is_iommu(section->mr) &&
>       !FIRST_LEVEL_TRANSLATION_ENABLED(section->mr))
> 
> Just a quick thought, not sure whether that would be an applicable
> idea. Thanks,
If I remember correctly, with vIOMMU exposed, section->mr here should
not be a ram, the logic would fail in the following code. I agree with you
that map the whole guest physical address space should work with SVM
virtualization. May need to think more on how to make it happen.

hw/i386/common.c, a snippet:
    /* Here we assume that memory_region_is_ram(section->mr)==true */

    vaddr = memory_region_get_ram_ptr(section->mr) +
            section->offset_within_region +
            (iova - section->offset_within_address_space);

Thanks,
Yi L

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback
  2016-12-30  4:39       ` Liu, Yi L
@ 2016-12-30  9:18         ` Peter Xu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2016-12-30  9:18 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Lan, Tianyu, 'qemu-devel@nongnu.org',
	'bd.aviv@gmail.com'

On Fri, Dec 30, 2016 at 04:39:49AM +0000, Liu, Yi L wrote:
> > -----Original Message-----
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Friday, December 30, 2016 11:44 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Cc: Tian, Kevin <kevin.tian@intel.com>; Lan, Tianyu <tianyu.lan@intel.com>; 'qemu-
> > devel@nongnu.org' <qemu-devel@nongnu.org>; 'bd.aviv@gmail.com'
> > <bd.aviv@gmail.com>
> > Subject: Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay()
> > callback
> > 
> > On Thu, Dec 29, 2016 at 10:23:21AM +0000, Liu, Yi L wrote:
> > 
> > [...]
> > 
> > > > >
> > > > > Hereby, I have a minor concern on this part.
> > > > > The replay would be performed in vfio_listener_region_add() when device is
> > > > > assigned. Since guest is just started, so there will be no valid context entry for
> > the
> > > > > assigned device. So there will be no vtd_page_walk. Thus no change to SL page
> > > > > table in host. The SL page table would still be a IOVA->HPA mapping. It's true
> > > > > that the gIOVA->HPA mapping can be built by page map/unmap when guest
> > > > > trying to issue dma. So it is just ok without relpay in the beginning. Hot plug
> > should
> > > > > be all the same. I guess it is the design here?
> > > >
> > > > I am not sure whether I get your point here - I think it's by design.
> > > > We don't really need replay if we are sure that the IOMMU address
> > > > space has totally no mapping (i.e., when the guest init). Replay is
> > > > only useful when there are existing mappings in the IOMMU address
> > > > space.
> > > yes, if we are sure no existing mapping, it is ok. There is another case which may
> > > complain it. If a device is hotplug in, then there should be existing mapping. Just
> > > looked code again. If corresponding context entry is not present, the
> > > vtd_page_walk should be bypassed.
> > 
> > Yes, if the context entry is not present, the replay will be bypassed.
> > And if it exists, then the replay will work (along with the page walk
> > process). Shouldn't that the behavior we want?
> > 
> > Please clarify if I misunderstood your question.
> I was just assuming a replay should be performed when a new device is plug in.
> However, according to your statements, it is not required. Is this the conclusion in
> the previous discussion?

Hmm, I mean whether the replay is needed for the hotplugged device
should depend on which iommu domain it joins when it is plugged. I
think it should belong to nowhere when plugged but I am not quite
sure... IIUC that'll depend on whether the context entry of that
plugged device is valid - and I think it should be an invalid one.

> 
> > [...]
> > 
> > > > >
> > > > > However, it may be incompatible with a future work in which we expose an
> > > > > vIOMMU to guest but guest disable IOVA usage. In this scenario, the SL page
> > > > > table in host would be a GPA->HPA instead of a gIOVA->HPA mapping. It would
> > > > > rely on memory_replay to build a GPA->HPA mapping. If I'm correct, I think it
> > > > > would be expected to have a replay in the time device is assigned.
> > > >
> > > > If you need a GPA -> HPA mapping (AFAICT you mean virtualized SVM and
> > > > the so-called first level translation), IMHO the default mapping
> > > > behavior of vfio_listener_region_add() suites here (when
> > > > memory_region_is_iommu(section->mr) == false), which maps the whole
> > > > guest physical address space, just like the case when we use vfio
> > > > devices without vIOMMU.
> > > Yes, I mean SVM virtualizaion. To virtualize SVM, an vIOMMU would be exposed to
> > > guest. So memory_region_is_iommu(section->mr) would be true. Then, the
> > original
> > > mapping the whole guest physical address space would not run. Thus no GPA->HPA
> > > mapping would be built. This is what I really concern here.
> > 
> > What I meant above was, we might be able to leverage existing
> > no-viommu logic to build the SLPT for virtualized SVM. For example, we
> > can switch the if block from:
> > 
> >   if (memory_region_is_iommu(section->mr))
> > 
> > into something similiar like:
> > 
> >   if (memory_region_is_iommu(section->mr) &&
> >       !FIRST_LEVEL_TRANSLATION_ENABLED(section->mr))
> > 
> > Just a quick thought, not sure whether that would be an applicable
> > idea. Thanks,
> If I remember correctly, with vIOMMU exposed, section->mr here should
> not be a ram, the logic would fail in the following code. I agree with you
> that map the whole guest physical address space should work with SVM
> virtualization. May need to think more on how to make it happen.
> 
> hw/i386/common.c, a snippet:
>     /* Here we assume that memory_region_is_ram(section->mr)==true */
> 
>     vaddr = memory_region_get_ram_ptr(section->mr) +
>             section->offset_within_region +
>             (iova - section->offset_within_address_space);

Right. That code won't work if only with a change in the if() clause -
I wasn't showing any workable codes (even it's not pesudo), only to
show what I meant. :)

-- peterx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-12-30  9:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-29  7:38 [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback Liu, Yi L
2016-12-29  9:55 ` Peter Xu
2016-12-29 10:23   ` Liu, Yi L
2016-12-30  3:43     ` Peter Xu
2016-12-30  4:39       ` Liu, Yi L
2016-12-30  9:18         ` Peter Xu
  -- strict thread matches above, loose matches on Subject: below --
2016-12-06 10:36 [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup Peter Xu
2016-12-06 10:36 ` [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback Peter Xu
2016-12-08  3:01   ` Lan Tianyu
2016-12-09  1:56     ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.