xen-devel.lists.xenproject.org archive mirror
* [PATCH v4 0/2] Refactor ioreq server for better performance.
@ 2015-08-13 10:05 Yu Zhang
  2015-08-13 10:05 ` [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
  2015-08-13 10:05 ` [PATCH v4 2/2] Refactor rangeset structure for better performance Yu Zhang
  0 siblings, 2 replies; 11+ messages in thread
From: Yu Zhang @ 2015-08-13 10:05 UTC (permalink / raw)
  To: xen-devel, Paul.Durrant, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, keir, jbeulich, andrew.cooper3
  Cc: kevin.tian, zhiyuan.lv

XenGT leverages the ioreq server to track and forward accesses to
GPU I/O resources, e.g. the PPGTTs (per-process graphic translation
tables). Currently, the ioreq server uses a rangeset to track the
BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the
rangeset is searched to see whether the I/O range is recorded. However,
traversing the linked list inside the rangeset can be time consuming
when the number of ranges is high. On the HSW platform, the number
of PPGTTs for each vGPU can be several hundred; on BDW, it can be
several thousand. This patch series refactors rangeset to base it
on a red-black tree, so that searching is more efficient.
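
For illustration, the lookup this series optimizes boils down to an
interval search over inclusive [s, e] ranges. A minimal sketch of the
tree-based version (illustrative only - a plain binary search tree
rather than Xen's rb-tree helpers, with made-up names):

    struct range {
        struct range *left, *right;  /* sketch: plain BST, not rb-tree */
        unsigned long s, e;          /* inclusive [s, e] */
    };

    /* Return the range containing s, or NULL. On a balanced tree this
     * is O(log n), versus O(n) for walking a linked list holding the
     * same ranges. */
    static struct range *lookup(struct range *node, unsigned long s)
    {
        while ( node )
        {
            if ( s >= node->s && s <= node->e )
                return node;
            node = (s < node->s) ? node->left : node->right;
        }
        return NULL;
    }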

Besides, this patchset also splits the tracking of MMIO and guest
ram ranges into different rangesets. And to accommodate more ranges,
the limit on the number of ranges in an ioreq server,
MAX_NR_IO_RANGES, is raised - future patches might tune this with
other approaches.


Changes in v4: 
Keep the name HVMOP_IO_RANGE_MEMORY for MMIO resources, and add 
a new one, HVMOP_IO_RANGE_WP_MEM, for write-protected memory.

Changes in v3: 
1> Use a separate rangeset for guest ram pages in ioreq server;
2> Refactor rangeset, instead of introducing a new data structure.

Changes in v2: 
1> Split the original patch into 2;
2> Take Paul Durrant's comments:
  a> Add a name member in the struct rb_rangeset, and use the 'q' 
debug key to dump the ranges in ioreq server;
  b> Keep original routine names for hvm ioreq server;
  c> Commit message changes - mention that a future patch may change
the maximum number of ranges inside the ioreq server.

Yu Zhang (2):
  Differentiate IO/mem resources tracked by ioreq server
  Refactor rangeset structure for better performance.

 tools/libxc/include/xenctrl.h    | 31 +++++++++++++++
 tools/libxc/xc_domain.c          | 55 +++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c           | 26 ++++++++++++-
 xen/common/rangeset.c            | 82 +++++++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/domain.h |  4 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/ioreq.h   |  1 +
 7 files changed, 175 insertions(+), 25 deletions(-)

-- 
1.9.1


* [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-13 10:05 [PATCH v4 0/2] Refactor ioreq server for better performance Yu Zhang
@ 2015-08-13 10:05 ` Yu Zhang
  2015-08-13 10:16   ` Paul Durrant
  2015-08-13 10:05 ` [PATCH v4 2/2] Refactor rangeset structure for better performance Yu Zhang
  1 sibling, 1 reply; 11+ messages in thread
From: Yu Zhang @ 2015-08-13 10:05 UTC (permalink / raw)
  To: xen-devel, Paul.Durrant, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, keir, jbeulich, andrew.cooper3
  Cc: kevin.tian, zhiyuan.lv

Currently in the ioreq server, guest write-protected ram pages are
tracked in the same rangeset as device mmio resources. Yet unlike
device mmio, which can come in big chunks, the guest write-protected
pages may be discrete ranges of 4K bytes each.

This patch uses a separate rangeset for the guest ram pages, and
a new ioreq type, IOREQ_TYPE_WP_MEM, is defined.

Note: previously, a new hypercall or subop was suggested to map
write-protected pages into the ioreq server. However, it turned out
the handler of this new hypercall would be almost the same as that
of the existing pair - HVMOP_[un]map_io_range_to_ioreq_server - and
there's already a type parameter in this hypercall. So no new
hypercall is defined; only a new type is introduced.
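
For example, a device model would drive the new libxc wrappers along
these lines (a sketch only - error handling and the surrounding ioreq
server setup are omitted, and xch, domid, id and gfn are assumed to
be already set up):

    /* Register one write-protected guest page with an existing ioreq
     * server; start/end are inclusive byte addresses. */
    uint64_t start = (uint64_t)gfn << XC_PAGE_SHIFT;
    uint64_t end = start + XC_PAGE_SIZE - 1;

    if ( xc_hvm_map_mem_range_to_ioreq_server(xch, domid, id,
                                              start, end) < 0 )
        perror("could not map wp-mem range");

    /* ... writes to the page now arrive as IOREQ_TYPE_WP_MEM ... */

    xc_hvm_unmap_mem_range_from_ioreq_server(xch, domid, id, start, end);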

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 tools/libxc/include/xenctrl.h    | 31 ++++++++++++++++++++++
 tools/libxc/xc_domain.c          | 55 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c           | 26 ++++++++++++++++++-
 xen/include/asm-x86/hvm/domain.h |  4 +--
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/ioreq.h   |  1 +
 6 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index de3c0ad..53f196d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2010,6 +2010,37 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
                                             int is_mmio,
                                             uint64_t start,
                                             uint64_t end);
+/**
+ * This function registers a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        uint64_t start,
+                                        uint64_t end);
+
+/**
+ * This function deregisters a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            uint64_t start,
+                                            uint64_t end);
 
 /**
  * This function registers a PCI device for config space emulation.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 2ee26fb..42aeba9 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1552,6 +1552,61 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
     return rc;
 }
 
+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->type = HVMOP_IO_RANGE_WP_MEM;
+    arg->start = start;
+    arg->end = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+
+}
+
 int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
                                       ioservid_t id, uint16_t segment,
                                       uint8_t bus, uint8_t device,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c957610..0389c0a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -938,8 +938,9 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 
         rc = asprintf(&name, "ioreq_server %d %s", s->id,
                       (i == HVMOP_IO_RANGE_PORT) ? "port" :
-                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == HVMOP_IO_RANGE_MEMORY) ? "mmio" :
                       (i == HVMOP_IO_RANGE_PCI) ? "pci" :
+                      (i == HVMOP_IO_RANGE_WP_MEM) ? "wp-ed memory" :
                       "");
         if ( rc )
             goto fail;
@@ -1258,6 +1259,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -1309,6 +1311,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
             case HVMOP_IO_RANGE_PORT:
             case HVMOP_IO_RANGE_MEMORY:
             case HVMOP_IO_RANGE_PCI:
+            case HVMOP_IO_RANGE_WP_MEM:
                 r = s->range[type];
                 break;
 
@@ -2523,6 +2526,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     uint32_t cf8;
     uint8_t type;
     uint64_t addr;
+    p2m_type_t p2mt;
+    struct page_info *ram_page;
 
     if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         return NULL;
@@ -2565,6 +2570,17 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     {
         type = p->type;
         addr = p->addr;
+
+        if ( p->type == IOREQ_TYPE_COPY )
+        {
+            ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
+                                         &p2mt, P2M_UNSHARE);
+            if ( p2mt == p2m_mmio_write_dm )
+                type = IOREQ_TYPE_WP_MEM;
+
+            if ( ram_page )
+                put_page(ram_page);
+        }
     }
 
     list_for_each_entry ( s,
@@ -2582,6 +2598,7 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
         BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
         BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
+        BUILD_BUG_ON(IOREQ_TYPE_WP_MEM != HVMOP_IO_RANGE_WP_MEM);
         r = s->range[type];
 
         switch ( type )
@@ -2609,6 +2626,13 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             }
 
             break;
+
+        case IOREQ_TYPE_WP_MEM:
+            end = addr + (p->size * p->count) - 1;
+            if ( rangeset_contains_range(r, addr, end) )
+                return s;
+
+            break;
         }
     }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 992d5d1..b2e4234 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -48,8 +48,8 @@ struct hvm_ioreq_vcpu {
     bool_t           pending;
 };
 
-#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
-#define MAX_NR_IO_RANGES  256
+#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_WP_MEM + 1)
+#define MAX_NR_IO_RANGES  8192
 
 struct hvm_ioreq_server {
     struct list_head       list_entry;
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 014546a..c02c91a 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -331,6 +331,7 @@ struct xen_hvm_io_range {
 # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
 # define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
 # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
+# define HVMOP_IO_RANGE_WP_MEM 3 /* Write-protected memory range */
     uint64_aligned_t start, end; /* IN - inclusive start and end of range */
 };
 typedef struct xen_hvm_io_range xen_hvm_io_range_t;
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index 2e5809b..2b712ac 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -35,6 +35,7 @@
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
 #define IOREQ_TYPE_PCI_CONFIG   2
+#define IOREQ_TYPE_WP_MEM       3
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
-- 
1.9.1


* [PATCH v4 2/2] Refactor rangeset structure for better performance.
  2015-08-13 10:05 [PATCH v4 0/2] Refactor ioreq server for better performance Yu Zhang
  2015-08-13 10:05 ` [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
@ 2015-08-13 10:05 ` Yu Zhang
  2015-08-13 10:33   ` Paul Durrant
  1 sibling, 1 reply; 11+ messages in thread
From: Yu Zhang @ 2015-08-13 10:05 UTC (permalink / raw)
  To: xen-devel, Paul.Durrant, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2, keir, jbeulich, andrew.cooper3
  Cc: kevin.tian, zhiyuan.lv

This patch refactors struct rangeset to base it on the red-black
tree structure, instead of on the current doubly linked list. The
ioreq server leverages rangeset to keep track of the IO/memory
resources to be emulated, yet when the number of ranges inside one
ioreq server is very high, traversing a doubly linked list can be
time consuming. With this patch, the time complexity of searching
a rangeset improves from O(n) to O(log(n)). The interfaces of
rangeset remain the same, and no new APIs are introduced.
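
Because the interfaces are unchanged, existing callers keep working
as-is. A sketch of typical rangeset usage (assuming a valid
struct domain *d; illustrative, not code from the patch):

    struct rangeset *r = rangeset_new(d, "example", 0);
    int rc = rangeset_add_range(r, 0x1000, 0x1fff); /* inclusive [s, e] */

    if ( rc == 0 && rangeset_contains_range(r, 0x1000, 0x1fff) )
    {
        /* The lookup is now O(log n) on the rb-tree rather than O(n). */
    }

    rangeset_destroy(r);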

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/common/rangeset.c | 82 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c..87b6aab 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -10,11 +10,12 @@
 #include <xen/sched.h>
 #include <xen/errno.h>
 #include <xen/rangeset.h>
+#include <xen/rbtree.h>
 #include <xsm/xsm.h>
 
 /* An inclusive range [s,e] and pointer to next range in ascending order. */
 struct range {
-    struct list_head list;
+    struct rb_node node;
     unsigned long s, e;
 };
 
@@ -24,7 +25,7 @@ struct rangeset {
     struct domain   *domain;
 
     /* Ordered list of ranges contained in this set, and protecting lock. */
-    struct list_head range_list;
+    struct rb_root   range_tree;
 
     /* Number of ranges that can be allocated */
     long             nr_ranges;
@@ -45,41 +46,78 @@ struct rangeset {
 static struct range *find_range(
     struct rangeset *r, unsigned long s)
 {
-    struct range *x = NULL, *y;
+    struct rb_node *node;
+    struct range   *x;
+    struct range   *prev = NULL;
 
-    list_for_each_entry ( y, &r->range_list, list )
+    node = r->range_tree.rb_node;
+    while ( node )
     {
-        if ( y->s > s )
-            break;
-        x = y;
+        x = container_of(node, struct range, node);
+        if ( (s >= x->s) && (s <= x->e) )
+            return x;
+        if ( s < x->s )
+            node = node->rb_left;
+        else if ( s > x->s )
+        {
+            prev = x;
+            node = node->rb_right;
+        }
     }
 
-    return x;
+    return prev;
 }
 
 /* Return the lowest range in the set r, or NULL if r is empty. */
 static struct range *first_range(
     struct rangeset *r)
 {
-    if ( list_empty(&r->range_list) )
-        return NULL;
-    return list_entry(r->range_list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_first(&r->range_tree);
+    if ( node != NULL )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Return range following x in ascending order, or NULL if x is the highest. */
 static struct range *next_range(
     struct rangeset *r, struct range *x)
 {
-    if ( x->list.next == &r->range_list )
-        return NULL;
-    return list_entry(x->list.next, struct range, list);
+    struct rb_node *node;
+
+    node = rb_next(&x->node);
+    if ( node )
+        return container_of(node, struct range, node);
+
+    return NULL;
 }
 
 /* Insert range y after range x in r. Insert as first range if x is NULL. */
 static void insert_range(
     struct rangeset *r, struct range *x, struct range *y)
 {
-    list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
+    struct rb_node **node;
+    struct rb_node *parent = NULL;
+
+    if ( x == NULL )
+        node = &r->range_tree.rb_node;
+    else
+    {
+        node = &x->node.rb_right;
+        parent = &x->node;
+    }
+
+    while ( *node )
+    {
+        parent = *node;
+        node = &parent->rb_left;
+    }
+
+    /* Add new node and rebalance the red-black tree. */
+    rb_link_node(&y->node, parent, node);
+    rb_insert_color(&y->node, &r->range_tree);
 }
 
 /* Remove a range from its list and free it. */
@@ -88,7 +126,7 @@ static void destroy_range(
 {
     r->nr_ranges++;
 
-    list_del(&x->list);
+    rb_erase(&x->node, &r->range_tree);
     xfree(x);
 }
 
@@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
 bool_t rangeset_is_empty(
     const struct rangeset *r)
 {
-    return ((r == NULL) || list_empty(&r->range_list));
+    return ((r == NULL) || RB_EMPTY_ROOT(&r->range_tree));
 }
 
 struct rangeset *rangeset_new(
@@ -332,7 +370,7 @@ struct rangeset *rangeset_new(
         return NULL;
 
     rwlock_init(&r->lock);
-    INIT_LIST_HEAD(&r->range_list);
+    r->range_tree = RB_ROOT;
     r->nr_ranges = -1;
 
     BUG_ON(flags & ~RANGESETF_prettyprint_hex);
@@ -410,7 +448,7 @@ void rangeset_domain_destroy(
 
 void rangeset_swap(struct rangeset *a, struct rangeset *b)
 {
-    LIST_HEAD(tmp);
+    struct rb_node* tmp;
 
     if ( a < b )
     {
@@ -423,9 +461,9 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
         write_lock(&a->lock);
     }
 
-    list_splice_init(&a->range_list, &tmp);
-    list_splice_init(&b->range_list, &a->range_list);
-    list_splice(&tmp, &b->range_list);
+    tmp = a->range_tree.rb_node;
+    a->range_tree.rb_node = b->range_tree.rb_node;
+    b->range_tree.rb_node = tmp;
 
     write_unlock(&a->lock);
     write_unlock(&b->lock);
-- 
1.9.1


* Re: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-13 10:05 ` [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
@ 2015-08-13 10:16   ` Paul Durrant
  2015-08-13 10:39     ` Yu, Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Durrant @ 2015-08-13 10:16 UTC (permalink / raw)
  To: Yu Zhang, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 13 August 2015 11:06
> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan.lv@intel.com
> Subject: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq
> server
> 
> Currently in the ioreq server, guest write-protected ram pages are
> tracked in the same rangeset as device mmio resources. Yet unlike
> device mmio, which can come in big chunks, the guest write-protected
> pages may be discrete ranges of 4K bytes each.

Would the interfaces be better using xen_pfn_t rather than using uint64_t to pass addresses round where the bottom 12 bits will always be zero?

  Paul
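
For illustration, the suggestion amounts to a prototype along these
lines (hypothetical - not part of the posted patch):

    int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
                                             domid_t domid,
                                             ioservid_t id,
                                             xen_pfn_t start_pfn,
                                             xen_pfn_t end_pfn);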

> [snip - remainder of quoted patch]


* Re: [PATCH v4 2/2] Refactor rangeset structure for better performance.
  2015-08-13 10:05 ` [PATCH v4 2/2] Refactor rangeset structure for better performance Yu Zhang
@ 2015-08-13 10:33   ` Paul Durrant
  2015-08-13 10:46     ` Yu, Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Durrant @ 2015-08-13 10:33 UTC (permalink / raw)
  To: Yu Zhang, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 13 August 2015 11:06
> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan.lv@intel.com
> Subject: [PATCH v4 2/2] Refactor rangeset structure for better performance.
> 
> This patch refactors struct rangeset to base it on the red-black
> tree structure, instead of on the current doubly linked list. The
> ioreq server leverages rangeset to keep track of the IO/memory
> resources to be emulated, yet when the number of ranges inside one
> ioreq server is very high, traversing a doubly linked list can be
> time consuming. With this patch, the time complexity of searching
> a rangeset improves from O(n) to O(log(n)). The interfaces of
> rangeset remain the same, and no new APIs are introduced.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
>  xen/common/rangeset.c | 82
> +++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 60 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
> index 6c6293c..87b6aab 100644
> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -10,11 +10,12 @@
>  #include <xen/sched.h>
>  #include <xen/errno.h>
>  #include <xen/rangeset.h>
> +#include <xen/rbtree.h>
>  #include <xsm/xsm.h>
> 
>  /* An inclusive range [s,e] and pointer to next range in ascending order. */
>  struct range {
> -    struct list_head list;
> +    struct rb_node node;
>      unsigned long s, e;
>  };
> 
> @@ -24,7 +25,7 @@ struct rangeset {
>      struct domain   *domain;
> 
>      /* Ordered list of ranges contained in this set, and protecting lock. */
> -    struct list_head range_list;
> +    struct rb_root   range_tree;
> 
>      /* Number of ranges that can be allocated */
>      long             nr_ranges;
> @@ -45,41 +46,78 @@ struct rangeset {
>  static struct range *find_range(
>      struct rangeset *r, unsigned long s)
>  {
> -    struct range *x = NULL, *y;
> +    struct rb_node *node;
> +    struct range   *x;
> +    struct range   *prev = NULL;
> 
> -    list_for_each_entry ( y, &r->range_list, list )
> +    node = r->range_tree.rb_node;
> +    while ( node )
>      {
> -        if ( y->s > s )
> -            break;
> -        x = y;
> +        x = container_of(node, struct range, node);
> +        if ( (s >= x->s) && (s <= x->e) )
> +            return x;
> +        if ( s < x->s )
> +            node = node->rb_left;
> +        else if ( s > x->s )

Do you need else if there? I think else would do. s either has to be < x->s or > x->e to get to this point.

> +        {
> +            prev = x;
> +            node = node->rb_right;
> +        }
>      }
> 
> -    return x;
> +    return prev;

Do you not want to return NULL here? It looks like you'd only get here if there was no matching range.

  Paul

> [snip - remainder of quoted patch]


* Re: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-13 10:16   ` Paul Durrant
@ 2015-08-13 10:39     ` Yu, Zhang
  2015-08-14  7:40       ` Yu, Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Yu, Zhang @ 2015-08-13 10:39 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com



On 8/13/2015 6:16 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: 13 August 2015 11:06
>> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
>> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
>> Cc: Kevin Tian; zhiyuan.lv@intel.com
>> Subject: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq
>> server
>>
>> Currently in the ioreq server, guest write-protected ram pages are
>> tracked in the same rangeset as device mmio resources. Yet unlike
>> device mmio, which can come in big chunks, the guest write-protected
>> pages may be discrete ranges of 4K bytes each.
>
> Would the interfaces be better using xen_pfn_t rather than using uint64_t to pass addresses round where the bottom 12 bits will always be zero?
>
>    Paul

Thank you, Paul.
Well, I'm not quite sure whether the bottom 12 bits of the address
are zero. I had thought these addresses are just the guest physical
ones causing the ept violation, and not aligned down to the page
address like the MMIO ones. So, if our handler code has already done
that alignment, using xen_pfn_t would be better (my development
machine encountered a hardware problem; I will check the code
tomorrow) :)

Yu
> [snip - remainder of quoted patch]


* Re: [PATCH v4 2/2] Refactor rangeset structure for better performance.
  2015-08-13 10:33   ` Paul Durrant
@ 2015-08-13 10:46     ` Yu, Zhang
  2015-08-13 11:29       ` Paul Durrant
  0 siblings, 1 reply; 11+ messages in thread
From: Yu, Zhang @ 2015-08-13 10:46 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com



On 8/13/2015 6:33 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: 13 August 2015 11:06
>> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
>> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
>> Cc: Kevin Tian; zhiyuan.lv@intel.com
>> Subject: [PATCH v4 2/2] Refactor rangeset structure for better performance.
>>
>> This patch refactors struct rangeset to base it on the red-black
>> tree structure, instead of on the current doubly linked list. The
>> ioreq server leverages rangeset to keep track of the IO/memory
>> resources to be emulated, yet when the number of ranges inside one
>> ioreq server is very high, traversing a doubly linked list can be
>> time consuming. With this patch, the time complexity of searching
>> a rangeset improves from O(n) to O(log(n)). The interfaces of
>> rangeset remain the same, and no new APIs are introduced.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>>   xen/common/rangeset.c | 82
>> +++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 60 insertions(+), 22 deletions(-)
>>
>> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
>> index 6c6293c..87b6aab 100644
>> --- a/xen/common/rangeset.c
>> +++ b/xen/common/rangeset.c
>> @@ -10,11 +10,12 @@
>>   #include <xen/sched.h>
>>   #include <xen/errno.h>
>>   #include <xen/rangeset.h>
>> +#include <xen/rbtree.h>
>>   #include <xsm/xsm.h>
>>
>>   /* An inclusive range [s,e] and pointer to next range in ascending order. */
>>   struct range {
>> -    struct list_head list;
>> +    struct rb_node node;
>>       unsigned long s, e;
>>   };
>>
>> @@ -24,7 +25,7 @@ struct rangeset {
>>       struct domain   *domain;
>>
>>       /* Ordered list of ranges contained in this set, and protecting lock. */
>> -    struct list_head range_list;
>> +    struct rb_root   range_tree;
>>
>>       /* Number of ranges that can be allocated */
>>       long             nr_ranges;
>> @@ -45,41 +46,78 @@ struct rangeset {
>>   static struct range *find_range(
>>       struct rangeset *r, unsigned long s)
>>   {
>> -    struct range *x = NULL, *y;
>> +    struct rb_node *node;
>> +    struct range   *x;
>> +    struct range   *prev = NULL;
>>
>> -    list_for_each_entry ( y, &r->range_list, list )
>> +    node = r->range_tree.rb_node;
>> +    while ( node )
>>       {
>> -        if ( y->s > s )
>> -            break;
>> -        x = y;
>> +        x = container_of(node, struct range, node);
>> +        if ( (s >= x->s) && (s <= x->e) )
>> +            return x;
>> +        if ( s < x->s )
>> +            node = node->rb_left;
>> +        else if ( s > x->s )
>
> Do you need else if there? I think else would do. s either has to be < x->s or > x->e to get to this point.
Thanks. You are right. :)
>
>> +        {
>> +            prev = x;
>> +            node = node->rb_right;
>> +        }
>>       }
>>
>> -    return x;
>> +    return prev;
>
> Do you not want to return NULL here? It looks like you'd only get here if there was no matching range.
Well, prev can either be NULL (its initial value) or point to the
closest range below s. This keeps the original intention of the
routine find_range(): return the range if found, otherwise the
previous one, or NULL.
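
In other words, the intended contract could be documented with a
comment along these lines (a sketch, not text from the patch):

    /*
     * find_range(r, s) returns:
     *  - the range containing s, if one exists;
     *  - otherwise, the highest range lying entirely below s;
     *  - otherwise (s is below every range, or r is empty), NULL.
     */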

B.R.
Yu
>
>    Paul
>
> [snip - remainder of quoted patch]


* Re: [PATCH v4 2/2] Refactor rangeset structure for better performance.
  2015-08-13 10:46     ` Yu, Zhang
@ 2015-08-13 11:29       ` Paul Durrant
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Durrant @ 2015-08-13 11:29 UTC (permalink / raw)
  To: Yu, Zhang, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 13 August 2015 11:47
> To: Paul Durrant; xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan.lv@intel.com
> Subject: Re: [Xen-devel] [PATCH v4 2/2] Refactor rangeset structure for
> better performance.
> 
> 
> 
> On 8/13/2015 6:33 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> >> Sent: 13 August 2015 11:06
> >> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini;
> Ian
> >> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
> >> Cc: Kevin Tian; zhiyuan.lv@intel.com
> >> Subject: [PATCH v4 2/2] Refactor rangeset structure for better
> performance.
> >>
> >> This patch refactors struct rangeset to base it on a red-black
> >> tree instead of the current doubly linked list. Today, the ioreq
> >> server leverages rangeset to keep track of the I/O and memory
> >> resources to be emulated. Yet when the number of ranges inside one
> >> ioreq server is very high, traversing a doubly linked list can
> >> be time consuming. With this patch, the time complexity of
> >> searching a rangeset improves from O(n) to O(log(n)). The
> >> interfaces of rangeset remain the same, and no new APIs are
> >> introduced.
> >>
> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >> ---
> >>   xen/common/rangeset.c | 82
> >> +++++++++++++++++++++++++++++++++++++--------------
> >>   1 file changed, 60 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
> >> index 6c6293c..87b6aab 100644
> >> --- a/xen/common/rangeset.c
> >> +++ b/xen/common/rangeset.c
> >> @@ -10,11 +10,12 @@
> >>   #include <xen/sched.h>
> >>   #include <xen/errno.h>
> >>   #include <xen/rangeset.h>
> >> +#include <xen/rbtree.h>
> >>   #include <xsm/xsm.h>
> >>
> >>   /* An inclusive range [s,e] and pointer to next range in ascending order.
> */
> >>   struct range {
> >> -    struct list_head list;
> >> +    struct rb_node node;
> >>       unsigned long s, e;
> >>   };
> >>
> >> @@ -24,7 +25,7 @@ struct rangeset {
> >>       struct domain   *domain;
> >>
> >>       /* Ordered list of ranges contained in this set, and protecting lock. */
> >> -    struct list_head range_list;
> >> +    struct rb_root   range_tree;
> >>
> >>       /* Number of ranges that can be allocated */
> >>       long             nr_ranges;
> >> @@ -45,41 +46,78 @@ struct rangeset {
> >>   static struct range *find_range(
> >>       struct rangeset *r, unsigned long s)
> >>   {
> >> -    struct range *x = NULL, *y;
> >> +    struct rb_node *node;
> >> +    struct range   *x;
> >> +    struct range   *prev = NULL;
> >>
> >> -    list_for_each_entry ( y, &r->range_list, list )
> >> +    node = r->range_tree.rb_node;
> >> +    while ( node )
> >>       {
> >> -        if ( y->s > s )
> >> -            break;
> >> -        x = y;
> >> +        x = container_of(node, struct range, node);
> >> +        if ( (s >= x->s) && (s <= x->e) )
> >> +            return x;
> >> +        if ( s < x->s )
> >> +            node = node->rb_left;
> >> +        else if ( s > x->s )
> >
> > Do you need the else if there? I think a plain else would do: to get
> > here, s must be either < x->s or > x->e, and the latter implies
> > s > x->s (since x->e >= x->s).
> Thanks. You are right. :)
> >
> >> +        {
> >> +            prev = x;
> >> +            node = node->rb_right;
> >> +        }
> >>       }
> >>
> >> -    return x;
> >> +    return prev;
> >
> > Do you not want to return NULL here? It looks like you'd only get here if
> there was no matching range.
> Well, prev can either be NULL (its initial value) or point to the
> range preceding s. That keeps the original intention of
> find_range() - return the containing range if found, otherwise the
> previous one, or NULL.

If that's the correct semantic then that's fine.

  Paul
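
For reference, callers do depend on exactly that predecessor semantic.
With ranges [0,5] and [10,15] in the set, find_range(r, 7) yields
[0,5], so a containment test for [7,8] correctly fails (5 < 8), while
find_range(r, 12) yields [10,15] and a test for [12,14] passes. A
sketch of such a caller, following the shape of
rangeset_contains_range() (locking and assertions omitted):

    /* Sketch: containment test built on the predecessor semantic.
     * find_range() returns the range containing s, or the one just
     * before it; in the latter case x->e < s <= e, so the test fails. */
    static bool_t range_covered(struct rangeset *r,
                                unsigned long s, unsigned long e)
    {
        struct range *x = find_range(r, s);

        return (x != NULL) && (x->e >= e);
    }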


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-13 10:39     ` Yu, Zhang
@ 2015-08-14  7:40       ` Yu, Zhang
  2015-08-14  8:46         ` Paul Durrant
  0 siblings, 1 reply; 11+ messages in thread
From: Yu, Zhang @ 2015-08-14  7:40 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com

Hi Paul,

On 8/13/2015 6:39 PM, Yu, Zhang wrote:
>
>
> On 8/13/2015 6:16 PM, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>>> Sent: 13 August 2015 11:06
>>> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano
>>> Stabellini; Ian
>>> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
>>> Cc: Kevin Tian; zhiyuan.lv@intel.com
>>> Subject: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq
>>> server
>>>
>>> Currently in the ioreq server, guest write-protected ram pages are
>>> tracked in the same rangeset as device mmio resources. Yet
>>> unlike device mmio, which can come in big chunks, the guest write-
>>> protected pages may be discrete 4K-byte ranges.
>>
>> Would the interfaces be better using xen_pfn_t rather than
>> uint64_t to pass addresses around, given that the bottom 12 bits
>> will always be zero?
>>
>>    Paul
>
> Thank you, Paul.
> Well, I'm not quite sure the bottom 12 bits of the address are zero.
> I had thought these addresses are just the guest physical ones that
> caused the EPT violation, not aligned down to their page address, as
> with MMIO. So, if our handler code has already done that alignment,
> using xen_pfn_t could be better (my development machine hit a hardware
> problem; I will check the code tomorrow) :)
>
> Yu

I just checked the code in hvm_select_ioreq_server(), and found the
value of the address is not aligned down - the lower 12 bits are not
zero. So I wonder: is it necessary to do the shift to get a gfn, and
what would the benefit be?

Thanks
Yu
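
For context, the hunk quoted below already derives a frame number for
its p2m lookup; the question is only whether the hypercall interface
should carry frame numbers as well. A minimal sketch of the
address/frame relationship, assuming 4K pages:

    /* p->addr is the faulting guest-physical address; its low 12 bits
     * (the byte offset within the page) are generally non-zero. */
    xen_pfn_t     gfn    = p->addr >> PAGE_SHIFT;     /* frame number */
    unsigned long offset = p->addr & (PAGE_SIZE - 1); /* byte in page */

    /* A pfn-keyed rangeset (as suggested) would be consulted per frame,
     * whereas the patch as posted tracks byte addresses: */
    if ( rangeset_contains_singleton(r, gfn) )
        type = IOREQ_TYPE_WP_MEM;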
>>
>>>
>>> This patch uses a separate rangeset for the guest ram pages.
>>> And a new ioreq type, IOREQ_TYPE_WP_MEM, is defined.
>>>
>>> Note: Previously, a new hypercall or subop was suggested to map
>>> write-protected pages into the ioreq server. However, it turned out
>>> the handler of this new hypercall would be almost the same as the
>>> existing pair - HVMOP_[un]map_io_range_to_ioreq_server - and there's
>>> already a type parameter in this hypercall. So no new hypercall is
>>> defined; only a new type is introduced.
>>>
>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>> ---
>>>   tools/libxc/include/xenctrl.h    | 31 ++++++++++++++++++++++
>>>   tools/libxc/xc_domain.c          | 55
>>> ++++++++++++++++++++++++++++++++++++++++
>>>   xen/arch/x86/hvm/hvm.c           | 26 ++++++++++++++++++-
>>>   xen/include/asm-x86/hvm/domain.h |  4 +--
>>>   xen/include/public/hvm/hvm_op.h  |  1 +
>>>   xen/include/public/hvm/ioreq.h   |  1 +
>>>   6 files changed, 115 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/libxc/include/xenctrl.h
>>> b/tools/libxc/include/xenctrl.h
>>> index de3c0ad..53f196d 100644
>>> --- a/tools/libxc/include/xenctrl.h
>>> +++ b/tools/libxc/include/xenctrl.h
>>> @@ -2010,6 +2010,37 @@ int
>>> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
>>>                                               int is_mmio,
>>>                                               uint64_t start,
>>>                                               uint64_t end);
>>> +/**
>>> + * This function registers a range of write-protected memory for
>>> emulation.
>>> + *
>>> + * @parm xch a handle to an open hypervisor interface.
>>> + * @parm domid the domain id to be serviced
>>> + * @parm id the IOREQ Server id.
>>> + * @parm start start of range
>>> + * @parm end end of range (inclusive).
>>> + * @return 0 on success, -1 on failure.
>>> + */
>>> +int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
>>> +                                        domid_t domid,
>>> +                                        ioservid_t id,
>>> +                                        uint64_t start,
>>> +                                        uint64_t end);
>>> +
>>> +/**
>>> + * This function deregisters a range of write-protected memory for
>>> emulation.
>>> + *
>>> + * @parm xch a handle to an open hypervisor interface.
>>> + * @parm domid the domain id to be serviced
>>> + * @parm id the IOREQ Server id.
>>> + * @parm start start of range
>>> + * @parm end end of range (inclusive).
>>> + * @return 0 on success, -1 on failure.
>>> + */
>>> +int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
>>> +                                            domid_t domid,
>>> +                                            ioservid_t id,
>>> +                                            uint64_t start,
>>> +                                            uint64_t end);
>>>
>>>   /**
>>>    * This function registers a PCI device for config space emulation.
>>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>>> index 2ee26fb..42aeba9 100644
>>> --- a/tools/libxc/xc_domain.c
>>> +++ b/tools/libxc/xc_domain.c
>>> @@ -1552,6 +1552,61 @@ int
>>> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t
>>> domid,
>>>       return rc;
>>>   }
>>>
>>> +int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
>>> domid_t domid,
>>> +                                        ioservid_t id, uint64_t
>>> start, uint64_t end)
>>> +{
>>> +    DECLARE_HYPERCALL;
>>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
>>> +    int rc;
>>> +
>>> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
>>> +    if ( arg == NULL )
>>> +        return -1;
>>> +
>>> +    hypercall.op     = __HYPERVISOR_hvm_op;
>>> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
>>> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
>>> +
>>> +    arg->domid = domid;
>>> +    arg->id = id;
>>> +    arg->type = HVMOP_IO_RANGE_WP_MEM;
>>> +    arg->start = start;
>>> +    arg->end = end;
>>> +
>>> +    rc = do_xen_hypercall(xch, &hypercall);
>>> +
>>> +    xc_hypercall_buffer_free(xch, arg);
>>> +    return rc;
>>> +}
>>> +
>>> +int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
>>> domid_t domid,
>>> +                                            ioservid_t id, uint64_t
>>> start, uint64_t end)
>>> +{
>>> +    DECLARE_HYPERCALL;
>>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
>>> +    int rc;
>>> +
>>> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
>>> +    if ( arg == NULL )
>>> +        return -1;
>>> +
>>> +    hypercall.op     = __HYPERVISOR_hvm_op;
>>> +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
>>> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
>>> +
>>> +    arg->domid = domid;
>>> +    arg->id = id;
>>> +    arg->type = HVMOP_IO_RANGE_WP_MEM;
>>> +    arg->start = start;
>>> +    arg->end = end;
>>> +
>>> +    rc = do_xen_hypercall(xch, &hypercall);
>>> +
>>> +    xc_hypercall_buffer_free(xch, arg);
>>> +    return rc;
>>> +
>>> +}
>>> +
>>>   int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t
>>> domid,
>>>                                         ioservid_t id, uint16_t segment,
>>>                                         uint8_t bus, uint8_t device,
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index c957610..0389c0a 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -938,8 +938,9 @@ static int hvm_ioreq_server_alloc_rangesets(struct
>>> hvm_ioreq_server *s,
>>>
>>>           rc = asprintf(&name, "ioreq_server %d %s", s->id,
>>>                         (i == HVMOP_IO_RANGE_PORT) ? "port" :
>>> -                      (i == HVMOP_IO_RANGE_MEMORY) ? "memory" :
>>> +                      (i == HVMOP_IO_RANGE_MEMORY) ? "mmio" :
>>>                         (i == HVMOP_IO_RANGE_PCI) ? "pci" :
>>> +                      (i == HVMOP_IO_RANGE_WP_MEM) ? "wp-ed memory" :
>>>                         "");
>>>           if ( rc )
>>>               goto fail;
>>> @@ -1258,6 +1259,7 @@ static int
>>> hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
>>>               case HVMOP_IO_RANGE_PORT:
>>>               case HVMOP_IO_RANGE_MEMORY:
>>>               case HVMOP_IO_RANGE_PCI:
>>> +            case HVMOP_IO_RANGE_WP_MEM:
>>>                   r = s->range[type];
>>>                   break;
>>>
>>> @@ -1309,6 +1311,7 @@ static int
>>> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>>               case HVMOP_IO_RANGE_PORT:
>>>               case HVMOP_IO_RANGE_MEMORY:
>>>               case HVMOP_IO_RANGE_PCI:
>>> +            case HVMOP_IO_RANGE_WP_MEM:
>>>                   r = s->range[type];
>>>                   break;
>>>
>>> @@ -2523,6 +2526,8 @@ struct hvm_ioreq_server
>>> *hvm_select_ioreq_server(struct domain *d,
>>>       uint32_t cf8;
>>>       uint8_t type;
>>>       uint64_t addr;
>>> +    p2m_type_t p2mt;
>>> +    struct page_info *ram_page;
>>>
>>>       if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
>>>           return NULL;
>>> @@ -2565,6 +2570,17 @@ struct hvm_ioreq_server
>>> *hvm_select_ioreq_server(struct domain *d,
>>>       {
>>>           type = p->type;
>>>           addr = p->addr;
>>> +
>>> +        if ( p->type == IOREQ_TYPE_COPY )
>>> +        {
>>> +            ram_page = get_page_from_gfn(d, p->addr >> PAGE_SHIFT,
>>> +                                         &p2mt, P2M_UNSHARE);
>>> +            if ( p2mt == p2m_mmio_write_dm )
>>> +                type = IOREQ_TYPE_WP_MEM;
>>> +
>>> +            if ( ram_page )
>>> +                put_page(ram_page);
>>> +        }
>>>       }
>>>
>>>       list_for_each_entry ( s,
>>> @@ -2582,6 +2598,7 @@ struct hvm_ioreq_server
>>> *hvm_select_ioreq_server(struct domain *d,
>>>           BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
>>>           BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
>>>           BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG !=
>>> HVMOP_IO_RANGE_PCI);
>>> +        BUILD_BUG_ON(IOREQ_TYPE_WP_MEM !=
>>> HVMOP_IO_RANGE_WP_MEM);
>>>           r = s->range[type];
>>>
>>>           switch ( type )
>>> @@ -2609,6 +2626,13 @@ struct hvm_ioreq_server
>>> *hvm_select_ioreq_server(struct domain *d,
>>>               }
>>>
>>>               break;
>>> +
>>> +        case IOREQ_TYPE_WP_MEM:
>>> +            end = addr + (p->size * p->count) - 1;
>>> +            if ( rangeset_contains_range(r, addr, end) )
>>> +                return s;
>>> +
>>> +            break;
>>>           }
>>>       }
>>>
>>> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
>>> x86/hvm/domain.h
>>> index 992d5d1..b2e4234 100644
>>> --- a/xen/include/asm-x86/hvm/domain.h
>>> +++ b/xen/include/asm-x86/hvm/domain.h
>>> @@ -48,8 +48,8 @@ struct hvm_ioreq_vcpu {
>>>       bool_t           pending;
>>>   };
>>>
>>> -#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
>>> -#define MAX_NR_IO_RANGES  256
>>> +#define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_WP_MEM + 1)
>>> +#define MAX_NR_IO_RANGES  8192
>>>
>>>   struct hvm_ioreq_server {
>>>       struct list_head       list_entry;
>>> diff --git a/xen/include/public/hvm/hvm_op.h
>>> b/xen/include/public/hvm/hvm_op.h
>>> index 014546a..c02c91a 100644
>>> --- a/xen/include/public/hvm/hvm_op.h
>>> +++ b/xen/include/public/hvm/hvm_op.h
>>> @@ -331,6 +331,7 @@ struct xen_hvm_io_range {
>>>   # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
>>>   # define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
>>>   # define HVMOP_IO_RANGE_PCI    2 /* PCI segment/bus/dev/func range */
>>> +# define HVMOP_IO_RANGE_WP_MEM 3 /* Write-protected memory
>>> range */
>>>       uint64_aligned_t start, end; /* IN - inclusive start and end of
>>> range */
>>>   };
>>>   typedef struct xen_hvm_io_range xen_hvm_io_range_t;
>>> diff --git a/xen/include/public/hvm/ioreq.h
>>> b/xen/include/public/hvm/ioreq.h
>>> index 2e5809b..2b712ac 100644
>>> --- a/xen/include/public/hvm/ioreq.h
>>> +++ b/xen/include/public/hvm/ioreq.h
>>> @@ -35,6 +35,7 @@
>>>   #define IOREQ_TYPE_PIO          0 /* pio */
>>>   #define IOREQ_TYPE_COPY         1 /* mmio ops */
>>>   #define IOREQ_TYPE_PCI_CONFIG   2
>>> +#define IOREQ_TYPE_WP_MEM       3
>>>   #define IOREQ_TYPE_TIMEOFFSET   7
>>>   #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
>>>
>>> --
>>> 1.9.1
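
As a usage illustration (not part of the patch): with the additions
above, an emulator would register and later release a write-protected
guest page roughly as follows; the gpa value and the error handler are
made-up examples:

    /* Illustrative only: track one 4K write-protected guest page. */
    uint64_t gpa = 0x12340000ULL;   /* hypothetical page-aligned address */
    int rc;

    rc = xc_hvm_map_mem_range_to_ioreq_server(xch, domid, id,
                                              gpa, gpa + 0xfff);
    if ( rc < 0 )
        handle_error();             /* hypothetical error path */

    /* ... emulate writes to the page via the ioreq ring ... */

    rc = xc_hvm_unmap_mem_range_from_ioreq_server(xch, domid, id,
                                                  gpa, gpa + 0xfff);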

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-14  7:40       ` Yu, Zhang
@ 2015-08-14  8:46         ` Paul Durrant
  2015-08-14  9:30           ` Yu, Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Durrant @ 2015-08-14  8:46 UTC (permalink / raw)
  To: Yu, Zhang, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 14 August 2015 08:41
> To: Paul Durrant; xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeulich@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan.lv@intel.com
> Subject: Re: [Xen-devel] [PATCH v4 1/2] Differentiate IO/mem resources
> tracked by ioreq server
> 
> Hi Paul,
> 
> I just checked the code in hvm_select_ioreq_server(), and found the
> value of the address is not aligned down - the lower 12 bits are not
> zero. So I wonder: is it necessary to do the shift to get a gfn, and
> what would the benefit be?
> 

Well, you can only make whole pages mmio_write_dm, and I was assuming an emulator that did so would be interested in writes anywhere in the page. Maybe that assumption is wrong?

  Paul
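
If the interface were narrowed as suggested here, the libxc prototype
and the hypervisor-side match might look like the sketch below; the
function name and the pfn-based matching are assumptions for
illustration, not what was eventually merged:

    /* Hypothetical pfn-based variant of the registration call. */
    int xc_hvm_map_wp_mem_range_to_ioreq_server(xc_interface *xch,
                                                domid_t domid,
                                                ioservid_t id,
                                                xen_pfn_t start,
                                                xen_pfn_t end);

    /* The hypervisor side would then match whole frames (sketch): */
    static bool_t wp_mem_range_matches(struct rangeset *r,
                                       uint64_t addr, uint64_t size)
    {
        xen_pfn_t first = addr >> PAGE_SHIFT;
        xen_pfn_t last  = (addr + size - 1) >> PAGE_SHIFT;

        return rangeset_contains_range(r, first, last);
    }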


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server
  2015-08-14  8:46         ` Paul Durrant
@ 2015-08-14  9:30           ` Yu, Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Yu, Zhang @ 2015-08-14  9:30 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xen.org, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir (Xen.org),
	jbeulich@suse.com, Andrew Cooper
  Cc: Kevin Tian, zhiyuan.lv@intel.com



On 8/14/2015 4:46 PM, Paul Durrant wrote:
>> I just checked the code in hvm_select_ioreq_server(), and found the
>> value of the address is not aligned down - the lower 12 bits are not
>> zero. So I wonder: is it necessary to do the shift to get a gfn, and
>> what would the benefit be?
>>
>
> Well, you can only make whole pages mmio_write_dm, and I was assuming an emulator that did so would be interested in writes anywhere in the page. Maybe that assumption is wrong?
>
>    Paul

Thank you, Paul. That makes sense. I'll try using xen_pfn_t. :-)

Yu

^ permalink raw reply	[flat|nested] 11+ messages in thread

Thread overview: 11+ messages
2015-08-13 10:05 [PATCH v4 0/2] Refactor ioreq server for better performance Yu Zhang
2015-08-13 10:05 ` [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server Yu Zhang
2015-08-13 10:16   ` Paul Durrant
2015-08-13 10:39     ` Yu, Zhang
2015-08-14  7:40       ` Yu, Zhang
2015-08-14  8:46         ` Paul Durrant
2015-08-14  9:30           ` Yu, Zhang
2015-08-13 10:05 ` [PATCH v4 2/2] Refactor rangeset structure for better performance Yu Zhang
2015-08-13 10:33   ` Paul Durrant
2015-08-13 10:46     ` Yu, Zhang
2015-08-13 11:29       ` Paul Durrant
