xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/2] Refactor ioreq server for better performance.
@ 2015-07-06  6:25 Yu Zhang
  2015-07-06  6:25 ` [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server Yu Zhang
  2015-07-06  6:25 ` [PATCH v2 2/2] Add new data structure to track ranges Yu Zhang
  0 siblings, 2 replies; 24+ messages in thread
From: Yu Zhang @ 2015-07-06  6:25 UTC (permalink / raw)
  To: xen-devel, keir, JBeulich, andrew.cooper3, Paul.Durrant
  Cc: kevin.tian, zhiyuan.lv

XenGT leverages the ioreq server to track and forward accesses to
GPU I/O resources, e.g. the PPGTTs (per-process graphics translation
tables). Currently, the ioreq server uses a rangeset to track the
BDF/PIO/MMIO ranges to be emulated. To select an ioreq server, the
rangeset is searched to see whether the I/O range is recorded.
However, traversing the linked list inside the rangeset can be time
consuming when the number of ranges is high. On the HSW platform,
the number of PPGTTs per vGPU can reach several hundred; on BDW, it
can reach several thousand.

To accommodate more ranges, the limit on the number of ranges in an
ioreq server, MAX_NR_IO_RANGES, is raised - future patches will be
provided to tune this with other approaches. And to improve ioreq
server performance, a new data structure, rb_rangeset, is introduced.
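
For reference, below is a minimal sketch of how the rb_rangeset API
introduced in patch 2 is meant to be used (the range values are made
up and error handling is mostly omitted):

#include <xen/lib.h>
#include <xen/rb_rangeset.h>

static void rb_rangeset_example(void)
{
    char name[] = "example";
    struct rb_rangeset *r = rb_rangeset_new(name);

    if ( !r )
        return;

    rb_rangeset_limit(r, 8192);                    /* i.e. MAX_NR_IO_RANGES */

    /* Callers reject overlaps first, as the ioreq server code does. */
    if ( !rb_rangeset_overlaps_range(r, 0x1000, 0x1fff) )
        rb_rangeset_add_range(r, 0x1000, 0x1fff);  /* one tracked page */
    if ( !rb_rangeset_overlaps_range(r, 0x5000, 0x5fff) )
        rb_rangeset_add_range(r, 0x5000, 0x5fff);  /* a scattered second one */

    /* O(log(n)) lookup on the red-black tree when selecting a server. */
    if ( rb_rangeset_contains_range(r, 0x1000, 0x1fff) )
        printk("range found - this ioreq server would be selected\n");

    rb_rangeset_remove_range(r, 0x1000, 0x1fff);
    rb_rangeset_destroy(r);
}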

Changes in v2:
1> Split the original patch into 2;
2> Address Paul Durrant's comments:
  a> Add a name member to struct rb_rangeset, and use the 'q'
debug key to dump the ranges in the ioreq server;
  b> Keep the original routine names for the hvm ioreq server;
  c> Commit message changes - mention that a future patch will
change the maximum number of ranges inside the ioreq server.

Yu Zhang (2):
  Resize the MAX_NR_IO_RANGES for ioreq server
  Add new data structure to track ranges.

 xen/arch/x86/domain.c            |   3 +
 xen/arch/x86/hvm/hvm.c           |  56 ++++++--
 xen/common/Makefile              |   1 +
 xen/common/rb_rangeset.c         | 281 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |   4 +-
 xen/include/asm-x86/hvm/hvm.h    |   1 +
 xen/include/xen/rb_rangeset.h    |  49 +++++++
 7 files changed, 379 insertions(+), 16 deletions(-)
 create mode 100644 xen/common/rb_rangeset.c
 create mode 100644 xen/include/xen/rb_rangeset.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06  6:25 [PATCH v2 0/2] Refactor ioreq server for better performance Yu Zhang
@ 2015-07-06  6:25 ` Yu Zhang
  2015-07-06 12:35   ` George Dunlap
  2015-07-06  6:25 ` [PATCH v2 2/2] Add new data structure to track ranges Yu Zhang
  1 sibling, 1 reply; 24+ messages in thread
From: Yu Zhang @ 2015-07-06  6:25 UTC (permalink / raw)
  To: xen-devel, keir, JBeulich, andrew.cooper3, Paul.Durrant
  Cc: kevin.tian, zhiyuan.lv

MAX_NR_IO_RANGES is used by the ioreq server as the maximum
number of discrete ranges to be tracked. This patch changes
its value to 8k, so that more ranges can be tracked on the next
generation of Intel platforms in XenGT. Future patches can
extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
can serve as a default limit.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/include/asm-x86/hvm/domain.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index ad68fcf..d62fda9 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -49,7 +49,7 @@ struct hvm_ioreq_vcpu {
 };
 
 #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
-#define MAX_NR_IO_RANGES  256
+#define MAX_NR_IO_RANGES  8192
 
 struct hvm_ioreq_server {
     struct list_head       list_entry;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/2] Add new data structure to track ranges.
  2015-07-06  6:25 [PATCH v2 0/2] Refactor ioreq server for better performance Yu Zhang
  2015-07-06  6:25 ` [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server Yu Zhang
@ 2015-07-06  6:25 ` Yu Zhang
  1 sibling, 0 replies; 24+ messages in thread
From: Yu Zhang @ 2015-07-06  6:25 UTC (permalink / raw)
  To: xen-devel, keir, JBeulich, andrew.cooper3, Paul.Durrant
  Cc: kevin.tian, zhiyuan.lv

This patch introduces a new data structure, struct rb_rangeset,
to represent a group of discrete, non-overlapping ranges, e.g. the
start and end addresses of PIO/MMIO regions. For now, this structure
is intended to help the ioreq server forward I/O requests to backend
device models more efficiently.

The behavior of this new data structure is quite similar to that of
rangeset, the major difference being the search time complexity.
struct rangeset is based on a doubly linked list, so a search takes
O(n) time. struct rb_rangeset is based on a red-black tree, so a
binary search reduces this to O(log(n)), making it better suited to
tracking a large number of discrete ranges.

The ioreq server code is changed to use this new type, and a new
routine, hvm_ioreq_server_dump_range_info, is added to dump all the
ranges tracked by an ioreq server.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 xen/arch/x86/domain.c            |   3 +
 xen/arch/x86/hvm/hvm.c           |  56 ++++++--
 xen/common/Makefile              |   1 +
 xen/common/rb_rangeset.c         | 281 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |   2 +-
 xen/include/asm-x86/hvm/hvm.h    |   1 +
 xen/include/xen/rb_rangeset.h    |  49 +++++++
 7 files changed, 378 insertions(+), 15 deletions(-)
 create mode 100644 xen/common/rb_rangeset.c
 create mode 100644 xen/include/xen/rb_rangeset.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a8fe046..f8a8b80 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2086,6 +2086,9 @@ int domain_relinquish_resources(struct domain *d)
 void arch_dump_domain_info(struct domain *d)
 {
     paging_dump_domain_info(d);
+
+    if ( is_hvm_domain(d) )
+        hvm_ioreq_server_dump_range_info(d);
 }
 
 void arch_dump_vcpu_info(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..c79676e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -37,6 +37,7 @@
 #include <xen/wait.h>
 #include <xen/mem_access.h>
 #include <xen/rangeset.h>
+#include <xen/rb_rangeset.h>
 #include <asm/shadow.h>
 #include <asm/hap.h>
 #include <asm/current.h>
@@ -818,7 +819,7 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
         return;
 
     for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-        rangeset_destroy(s->range[i]);
+        rb_rangeset_destroy(s->range[i]);
 }
 
 static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
@@ -842,8 +843,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         if ( rc )
             goto fail;
 
-        s->range[i] = rangeset_new(s->domain, name,
-                                   RANGESETF_prettyprint_hex);
+        s->range[i] = rb_rangeset_new(name);
 
         xfree(name);
 
@@ -851,7 +851,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
         if ( !s->range[i] )
             goto fail;
 
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+        rb_rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
     }
 
  done:
@@ -1149,7 +1149,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 
         if ( s->id == id )
         {
-            struct rangeset *r;
+            struct rb_rangeset *r;
 
             switch ( type )
             {
@@ -1169,10 +1169,10 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
                 break;
 
             rc = -EEXIST;
-            if ( rangeset_overlaps_range(r, start, end) )
+            if ( rb_rangeset_overlaps_range(r, start, end) )
                 break;
 
-            rc = rangeset_add_range(r, start, end);
+            rc = rb_rangeset_add_range(r, start, end);
             break;
         }
     }
@@ -1200,7 +1200,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
 
         if ( s->id == id )
         {
-            struct rangeset *r;
+            struct rb_rangeset *r;
 
             switch ( type )
             {
@@ -1220,10 +1220,10 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                 break;
 
             rc = -ENOENT;
-            if ( !rangeset_contains_range(r, start, end) )
+            if ( !rb_rangeset_contains_range(r, start, end) )
                 break;
 
-            rc = rangeset_remove_range(r, start, end);
+            rc = rb_rangeset_remove_range(r, start, end);
             break;
         }
     }
@@ -1349,6 +1349,34 @@ static void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
 }
 
+void  hvm_ioreq_server_dump_range_info(struct domain *d)
+{
+    unsigned int i;
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        printk("Domain %u, ranges tracked in ioreq server %u:\n", d->domain_id, s->id);
+
+        for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        {
+            if ( !s->range[i] )
+                continue;
+
+            rb_rangeset_printk(s->range[i]);
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      evtchn_port_t *p_port)
 {
@@ -2465,7 +2493,7 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
                           &d->arch.hvm_domain.ioreq_server.list,
                           list_entry )
     {
-        struct rangeset *r;
+        struct rb_rangeset *r;
 
         if ( s == d->arch.hvm_domain.default_ioreq_server )
             continue;
@@ -2484,18 +2512,18 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
 
         case IOREQ_TYPE_PIO:
             end = addr + p->size - 1;
-            if ( rangeset_contains_range(r, addr, end) )
+            if ( rb_rangeset_contains_range(r, addr, end) )
                 return s;
 
             break;
         case IOREQ_TYPE_COPY:
             end = addr + (p->size * p->count) - 1;
-            if ( rangeset_contains_range(r, addr, end) )
+            if ( rb_rangeset_contains_range(r, addr, end) )
                 return s;
 
             break;
         case IOREQ_TYPE_PCI_CONFIG:
-            if ( rangeset_contains_singleton(r, addr >> 32) )
+            if ( rb_rangeset_contains_range(r, addr>>32, addr>>32) )
             {
                 p->type = type;
                 p->addr = addr;
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1cddebc..c49040f 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -28,6 +28,7 @@ obj-y += random.o
 obj-y += rangeset.o
 obj-y += radix-tree.o
 obj-y += rbtree.o
+obj-y += rb_rangeset.o
 obj-y += rcupdate.o
 obj-y += sched_credit.o
 obj-y += sched_credit2.o
diff --git a/xen/common/rb_rangeset.c b/xen/common/rb_rangeset.c
new file mode 100644
index 0000000..a19df6b
--- /dev/null
+++ b/xen/common/rb_rangeset.c
@@ -0,0 +1,281 @@
+/*
+  Red-black tree based rangeset
+
+  This program is free software; you can redistribute it and/or modify
+  it under the terms of the GNU General Public License as published by
+  the Free Software Foundation; either version 2 of the License, or
+  (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program; if not, write to the Free Software
+  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+*/
+
+#include <xen/lib.h>
+#include <xen/kernel.h>
+#include <xen/errno.h>
+#include <xen/rb_rangeset.h>
+
+static struct rb_range *alloc_and_init_rb_range(
+    struct rb_rangeset *r, unsigned long s, unsigned long e)
+{
+    struct rb_range *range = NULL;
+
+    if ( r->nr_ranges == 0 )
+        return NULL;
+
+    range =  xmalloc(struct rb_range);
+    if ( range )
+    {
+        --r->nr_ranges;
+        range->s = s;
+        range->e = e;
+    }
+    return range;
+}
+
+static void free_rb_range(struct rb_rangeset *r, struct rb_range *range)
+{
+    r->nr_ranges++;
+    rb_erase(&range->node, &r->rbroot);
+    xfree(range);
+}
+
+static struct rb_range *rb_rangeset_find_range(
+    struct rb_rangeset *r, unsigned long s)
+{
+    struct rb_node *node;
+
+    node = r->rbroot.rb_node;
+
+    while ( node )
+    {
+        struct rb_range *range = container_of(node, struct rb_range, node);
+
+        if ( (s >= range->s) && (s <= range->e) )
+            return range;
+        if ( s < range->s )
+            node = node->rb_left;
+        else if ( s > range->s )
+            node = node->rb_right;
+    }
+    return NULL;
+}
+
+bool_t rb_rangeset_overlaps_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e)
+{
+    struct rb_node *node;
+    bool_t rc = 0;
+
+    ASSERT(s <= e);
+
+    node = r->rbroot.rb_node;
+    while ( node )
+    {
+        struct rb_range *range = container_of(node, struct rb_range, node);
+        if ( (s <= range->e) && (e >= range->s) )
+        {
+            rc = 1;
+            break;
+        }
+        else if ( s < range->s )
+            node = node->rb_left;
+        else if ( s > range->s )
+            node = node->rb_right;
+    }
+    return rc;
+}
+
+bool_t rb_rangeset_contains_range(
+    struct rb_rangeset *r, unsigned long s, unsigned long e)
+{
+    struct rb_range *range;
+    bool_t contains;
+
+    ASSERT(s <= e);
+
+    range = rb_rangeset_find_range(r, s);
+    contains = (range && (range->e >= e));
+    return contains;
+}
+
+static void rb_rangeset_insert_range(
+    struct rb_root *root, struct rb_range *range)
+{
+    struct rb_node **new = &(root->rb_node);
+    struct rb_node *parent = NULL;
+
+    /* Figure out where to put new node */
+    while ( *new )
+    {
+        struct rb_range *this = container_of(*new, struct rb_range, node);
+        parent = *new;
+
+        if ( range->s < this->s )
+            new = &((*new)->rb_left);
+        else if ( range->s > this->s )
+            new = &((*new)->rb_right);
+    }
+
+    /* Add new node and rebalance the range tree. */
+    rb_link_node(&range->node, parent, new);
+    rb_insert_color(&range->node, root);
+}
+
+/*
+ * Add a new range into the rb_rangeset, rb_rangeset_overlaps_range()
+ * should be called first, to ensure nodes inside the rb_rangeset will
+ * not interleave.
+ */
+int rb_rangeset_add_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e)
+{
+    struct rb_range *range = NULL;
+    struct rb_range *next = NULL;
+
+    ASSERT(s <= e);
+
+    if ( (s) && (range = rb_rangeset_find_range(r, s - 1)) )
+    {
+        /* range tree overlapped */
+        if ( range->e != (s - 1) )
+            return -EEXIST;
+        range->e = e;
+    }
+    else
+    {
+        range = alloc_and_init_rb_range(r, s, e);
+        if ( !range )
+            return -ENOMEM;
+        rb_rangeset_insert_range(&r->rbroot, range);
+    }
+
+    next = container_of(rb_next(&range->node), struct rb_range, node);
+
+    if ( next && (next->s == (e + 1)) )
+    {
+        range->e = next->e;
+        free_rb_range(r, next);
+    }
+
+    return 0;
+}
+
+int rb_rangeset_remove_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e)
+{
+    struct rb_range *range = NULL;
+    struct rb_range *next = NULL;
+    unsigned long start, end;
+
+    ASSERT(s <= e);
+
+    range = rb_rangeset_find_range(r, s);
+    if ( !range )
+        return -ENOENT;
+
+    start = range->s;
+    end = range->e;
+
+    /* the range to be removed must be contained in one rb_range */
+    if ( end < e )
+        return -ENOENT;
+
+    /* value of start can only be less than or equal to s */
+    if ( start == s )
+    {
+        if ( end > e )
+            range->s = e + 1;
+        else
+            free_rb_range(r, range);
+    }
+    else
+    {
+        if ( end > e )
+        {
+            next = alloc_and_init_rb_range(r, e + 1, end);
+            if ( next == NULL )
+                return -ENOMEM;
+
+            rb_rangeset_insert_range(&r->rbroot, next);
+        }
+        range->e = s - 1;
+    }
+    return 0;
+}
+
+void rb_rangeset_destroy(struct rb_rangeset *r)
+{
+    struct rb_root *root;
+    struct rb_node *node;
+
+    if ( r == NULL )
+        return;
+
+    root = &r->rbroot;
+    node = rb_first(root);
+    while ( node )
+    {
+        struct rb_range *range = container_of(node, struct rb_range, node);
+        free_rb_range(r, range);
+        node = rb_first(root);
+    }
+
+    xfree(r);
+}
+
+struct rb_rangeset *rb_rangeset_new(char *name)
+{
+    struct rb_rangeset *r;
+
+    r = xmalloc(struct rb_rangeset);
+    if ( r == NULL )
+        return NULL;
+
+    r->nr_ranges = -1;
+    if ( name != NULL )
+    {
+        safe_strcpy(r->name, name);
+    }
+    else
+    {
+        snprintf(r->name, sizeof(r->name), "(no name)");
+    }
+
+    r->rbroot = RB_ROOT;
+    return r;
+}
+
+void rb_rangeset_limit(
+    struct rb_rangeset *r, unsigned int limit)
+{
+    r->nr_ranges = limit;
+}
+
+void rb_rangeset_printk(struct rb_rangeset *r)
+{
+    struct rb_node *node;
+
+    printk("  %-10s: {\n", r->name);
+
+    node = rb_first(&r->rbroot);
+
+    while ( node )
+    {
+        struct rb_range *range = container_of(node, struct rb_range, node);
+
+        printk("    [ 0x%lx", range->s);
+        if ( range->e != range->s )
+            printk(", 0x%lx", range->e);
+        printk(" ]\n");
+
+        node = rb_next(node);
+    }
+    printk("  }\n");
+}
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d62fda9..a2f60a8 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -68,7 +68,7 @@ struct hvm_ioreq_server {
     /* Lock to serialize access to buffered ioreq ring */
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
-    struct rangeset        *range[NR_IO_RANGE_TYPES];
+    struct rb_rangeset     *range[NR_IO_RANGE_TYPES];
     bool_t                 enabled;
     bool_t                 bufioreq_atomic;
 };
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 57f9605..2b1ea30 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -230,6 +230,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip);
 
 struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
                                                  ioreq_t *p);
+void  hvm_ioreq_server_dump_range_info(struct domain *d);
 int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
 void hvm_broadcast_assist_req(ioreq_t *p);
 void hvm_complete_assist_req(ioreq_t *p);
diff --git a/xen/include/xen/rb_rangeset.h b/xen/include/xen/rb_rangeset.h
new file mode 100644
index 0000000..4c07e30
--- /dev/null
+++ b/xen/include/xen/rb_rangeset.h
@@ -0,0 +1,49 @@
+/*
+  Red-black tree based rangeset
+
+  This program is free software; you can redistribute it and/or modify
+  it under the terms of the GNU General Public License as published by
+  the Free Software Foundation; either version 2 of the License, or
+  (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU General Public License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program; if not, write to the Free Software
+  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+*/
+
+#ifndef __RB_RANGESET_H__
+#define __RB_RANGESET_H__
+
+#include <xen/rbtree.h>
+
+struct rb_rangeset {
+    char             name[32];
+    long             nr_ranges;
+    struct rb_root   rbroot;
+};
+
+struct rb_range {
+    struct rb_node node;
+    unsigned long s, e;
+};
+
+struct rb_rangeset *rb_rangeset_new(char *name);
+void rb_rangeset_limit(
+    struct rb_rangeset *r, unsigned int limit);
+void rb_rangeset_destroy(struct rb_rangeset *r);
+bool_t rb_rangeset_overlaps_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e);
+bool_t rb_rangeset_contains_range(
+    struct rb_rangeset *r, unsigned long s, unsigned long e);
+int rb_rangeset_add_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e);
+int rb_rangeset_remove_range(struct rb_rangeset *r,
+    unsigned long s, unsigned long e);
+void rb_rangeset_printk(struct rb_rangeset *r);
+
+#endif /* __RB_RANGESET_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06  6:25 ` [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server Yu Zhang
@ 2015-07-06 12:35   ` George Dunlap
  2015-07-06 12:38     ` Paul Durrant
  0 siblings, 1 reply; 24+ messages in thread
From: George Dunlap @ 2015-07-06 12:35 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Tian, Kevin, Keir Fraser, Andrew Cooper, xen-devel@lists.xen.org,
	Paul Durrant, zhiyuan.lv, Jan Beulich

On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> MAX_NR_IO_RANGES is used by ioreq server as the maximum
> number of discrete ranges to be tracked. This patch changes
> its value to 8k, so that more ranges can be tracked on next
> generation of Intel platforms in XenGT. Future patches can
> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> can serve as a default limit.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

I said this at the Hackathon, and I'll say it here:  I think this is
the wrong approach.

The problem here is not that you don't have enough memory ranges.  The
problem is that you are not tracking memory ranges, but individual
pages.

You need to make a new interface that allows you to tag individual
gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
notifications for all such writes.

 -George
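
(A rough, hypothetical sketch of such an interface - the hypercall
argument layout and field names below do not exist in Xen and are only
meant to illustrate tagging individual gfns and routing their writes
to an ioreq server:)

/* Hypothetical hypercall argument: ask that writes to a single gfn be
 * trapped and forwarded to the given ioreq server. */
struct xen_hvm_map_gfn_to_ioreq_server {
    domid_t    domid;   /* target domain */
    ioservid_t id;      /* ioreq server to be notified */
    uint64_t   gfn;     /* guest frame to tag as p2m_mmio_write_dm */
};

/* Inside the hypervisor, the handler would essentially reduce to a p2m
 * type change, e.g. via the existing helper (assuming the page is
 * currently ordinary RAM):
 *
 *     rc = p2m_change_type_one(d, gfn, p2m_ram_rw, p2m_mmio_write_dm);
 */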

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 12:35   ` George Dunlap
@ 2015-07-06 12:38     ` Paul Durrant
  2015-07-06 12:49       ` George Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2015-07-06 12:38 UTC (permalink / raw)
  To: George Dunlap, Yu Zhang
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com, Jan Beulich

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 06 July 2015 13:36
> To: Yu Zhang
> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
> wrote:
> > MAX_NR_IO_RANGES is used by ioreq server as the maximum
> > number of discrete ranges to be tracked. This patch changes
> > its value to 8k, so that more ranges can be tracked on next
> > generation of Intel platforms in XenGT. Future patches can
> > extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> > can serve as a default limit.
> >
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> 
> I said this at the Hackathon, and I'll say it here:  I think this is
> the wrong approach.
> 
> The problem here is not that you don't have enough memory ranges.  The
> problem is that you are not tracking memory ranges, but individual
> pages.
> 
> You need to make a new interface that allows you to tag individual
> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
> notifications for all such writes.
> 

I think that is conflating things. It's quite conceivable that more than one ioreq server will handle write_dm pages. If we had enough types to have two page types per server then I'd agree with you, but we don't.

  Paul

>  -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 12:38     ` Paul Durrant
@ 2015-07-06 12:49       ` George Dunlap
  2015-07-06 13:09         ` Paul Durrant
  0 siblings, 1 reply; 24+ messages in thread
From: George Dunlap @ 2015-07-06 12:49 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 06 July 2015 13:36
>> To: Yu Zhang
>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>> ioreq server
>>
>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>> wrote:
>> > MAX_NR_IO_RANGES is used by ioreq server as the maximum
>> > number of discrete ranges to be tracked. This patch changes
>> > its value to 8k, so that more ranges can be tracked on next
>> > generation of Intel platforms in XenGT. Future patches can
>> > extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>> > can serve as a default limit.
>> >
>> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>
>> I said this at the Hackathon, and I'll say it here:  I think this is
>> the wrong approach.
>>
>> The problem here is not that you don't have enough memory ranges.  The
>> problem is that you are not tracking memory ranges, but individual
>> pages.
>>
>> You need to make a new interface that allows you to tag individual
>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>> notifications for all such writes.
>>
>
> I think that is conflating things. It's quite conceivable that more than one ioreq server will handle write_dm pages. If we had enough types to have two page types per server then I'd agree with you, but we don't.

What's conflating things is using an interface designed for *device
memory ranges* to instead *track writes to gfns*.  Fundamentally the
reason you have this explosion of "device memory ranges" is that what
you're tracking isn't device memory, and it isn't a range.  If your
umbrella isn't very good at hammering in nails, the solution is to go
get a hammer, not to add steel reinforcement to your umbrella.

My suggestion is, short-term, to simply allow the first ioreq server
to register for write_dm notifications to get notifications, and
return an error if a second one tries to do so.

If it becomes important for a single domain to have two ioreq servers
that need this functionality, then we can come up with an internal Xen
structure, *designed for gfns*, to track this.  My suspicion is that
it will never be needed.

 -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 12:49       ` George Dunlap
@ 2015-07-06 13:09         ` Paul Durrant
  2015-07-06 13:23           ` George Dunlap
  2015-07-06 13:28           ` George Dunlap
  0 siblings, 2 replies; 24+ messages in thread
From: Paul Durrant @ 2015-07-06 13:09 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 06 July 2015 13:50
> To: Paul Durrant
> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
> Cooper; Kevin Tian; zhiyuan.lv@intel.com
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 06 July 2015 13:36
> >> To: Yu Zhang
> >> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
> >> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
> >> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
> for
> >> ioreq server
> >>
> >> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
> >> wrote:
> >> > MAX_NR_IO_RANGES is used by ioreq server as the maximum
> >> > number of discrete ranges to be tracked. This patch changes
> >> > its value to 8k, so that more ranges can be tracked on next
> >> > generation of Intel platforms in XenGT. Future patches can
> >> > extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> >> > can serve as a default limit.
> >> >
> >> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >>
> >> I said this at the Hackathon, and I'll say it here:  I think this is
> >> the wrong approach.
> >>
> >> The problem here is not that you don't have enough memory ranges.  The
> >> problem is that you are not tracking memory ranges, but individual
> >> pages.
> >>
> >> You need to make a new interface that allows you to tag individual
> >> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
> >> notifications for all such writes.
> >>
> >
> > I think that is conflating things. It's quite conceivable that more than one
> ioreq server will handle write_dm pages. If we had enough types to have
> two page types per server then I'd agree with you, but we don't.
> 
> What's conflating things is using an interface designed for *device
> memory ranges* to instead *track writes to gfns*.

What's the difference? Are you asserting that all device memory ranges have read side effects and therefore write_dm is not a reasonable optimization to use? I would not want to make that assertion.

  Paul

>  Fundamentally the
> reason you have this explosion of "device memory ranges" is that what
> you're tracking isn't device memory, and it isn't a range.  If your
> umbrella isn't very good at hammering in nails, the solution is to go
> get a hammer, not to add steel reinforcement to your umbrella.
> 
> My suggestion is, short-term, to simply allow the first ioreq server
> to register for write_dm notifications to get notifications, and
> return an error if a second one tries to do so.
> 
> If it becomes important for a single domain to have two ioreq servers
> that need this functionality, then we can come up with an internal Xen
> structure, *designed for gfns*, to track this.  My suspicion is that
> it will never be needed.
> 
>  -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 13:09         ` Paul Durrant
@ 2015-07-06 13:23           ` George Dunlap
  2015-07-06 13:28           ` George Dunlap
  1 sibling, 0 replies; 24+ messages in thread
From: George Dunlap @ 2015-07-06 13:23 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

On 07/06/2015 02:09 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 06 July 2015 13:50
>> To: Paul Durrant
>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>> ioreq server
>>
>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
>> wrote:
>>>> -----Original Message-----
>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>> George Dunlap
>>>> Sent: 06 July 2015 13:36
>>>> To: Yu Zhang
>>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
>>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
>> for
>>>> ioreq server
>>>>
>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>> wrote:
>>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
>>>>> number of discrete ranges to be tracked. This patch changes
>>>>> its value to 8k, so that more ranges can be tracked on next
>>>>> generation of Intel platforms in XenGT. Future patches can
>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>>>>> can serve as a default limit.
>>>>>
>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>
>>>> I said this at the Hackathon, and I'll say it here:  I think this is
>>>> the wrong approach.
>>>>
>>>> The problem here is not that you don't have enough memory ranges.  The
>>>> problem is that you are not tracking memory ranges, but individual
>>>> pages.
>>>>
>>>> You need to make a new interface that allows you to tag individual
>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>>>> notifications for all such writes.
>>>>
>>>
>>> I think that is conflating things. It's quite conceivable that more than one
>> ioreq server will handle write_dm pages. If we had enough types to have
>> two page types per server then I'd agree with you, but we don't.
>>
>> What's conflating things is using an interface designed for *device
>> memory ranges* to instead *track writes to gfns*.
> 
> What's the difference? Are you asserting that all device memory ranges have read side effects and therefore write_dm is not a reasonable optimization to use? I would not want to make that assertion.

The difference is that actual device memory is typically in big chunks,
and each device rarely has more than a handful; whereas guest pfns
allocated by the operating system to a driver are typically singleton
pages scattered all over the address space.

Which is why a handful of memory ranges is sufficient for tracking
devices with hundreds of megabytes of device memory, but thousands of
memory ranges are not sufficient for tracking a megabyte or so of GPU
pagetables: the former is one or two actual ranges of a significant
size; the latter are (apparently) thousands of ranges of one page each.

 -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 13:09         ` Paul Durrant
  2015-07-06 13:23           ` George Dunlap
@ 2015-07-06 13:28           ` George Dunlap
  2015-07-06 13:33             ` Paul Durrant
  1 sibling, 1 reply; 24+ messages in thread
From: George Dunlap @ 2015-07-06 13:28 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

On 07/06/2015 02:09 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 06 July 2015 13:50
>> To: Paul Durrant
>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>> ioreq server
>>
>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
>> wrote:
>>>> -----Original Message-----
>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>> George Dunlap
>>>> Sent: 06 July 2015 13:36
>>>> To: Yu Zhang
>>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
>>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
>> for
>>>> ioreq server
>>>>
>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>> wrote:
>>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
>>>>> number of discrete ranges to be tracked. This patch changes
>>>>> its value to 8k, so that more ranges can be tracked on next
>>>>> generation of Intel platforms in XenGT. Future patches can
>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>>>>> can serve as a default limit.
>>>>>
>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>
>>>> I said this at the Hackathon, and I'll say it here:  I think this is
>>>> the wrong approach.
>>>>
>>>> The problem here is not that you don't have enough memory ranges.  The
>>>> problem is that you are not tracking memory ranges, but individual
>>>> pages.
>>>>
>>>> You need to make a new interface that allows you to tag individual
>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>>>> notifications for all such writes.
>>>>
>>>
>>> I think that is conflating things. It's quite conceivable that more than one
>> ioreq server will handle write_dm pages. If we had enough types to have
>> two page types per server then I'd agree with you, but we don't.
>>
>> What's conflating things is using an interface designed for *device
>> memory ranges* to instead *track writes to gfns*.
> 
> What's the difference? Are you asserting that all device memory ranges have read side effects and therefore write_dm is not a reasonable optimization to use? I would not want to make that assertion.

Using write_dm is not the problem; it's having thousands of memory
"ranges" of 4k each that I object to.

Which is why I suggested adding an interface to request updates to gfns
(by marking them write_dm), rather than abusing the io range interface.

 -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 13:28           ` George Dunlap
@ 2015-07-06 13:33             ` Paul Durrant
  2015-07-06 14:06               ` George Dunlap
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2015-07-06 13:33 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: 06 July 2015 14:28
> To: Paul Durrant; George Dunlap
> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
> Cooper; Kevin Tian; zhiyuan.lv@intel.com
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> On 07/06/2015 02:09 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 06 July 2015 13:50
> >> To: Paul Durrant
> >> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich;
> Andrew
> >> Cooper; Kevin Tian; zhiyuan.lv@intel.com
> >> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
> for
> >> ioreq server
> >>
> >> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
> >> wrote:
> >>>> -----Original Message-----
> >>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >>>> George Dunlap
> >>>> Sent: 06 July 2015 13:36
> >>>> To: Yu Zhang
> >>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
> Cooper;
> >>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
> >>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
> MAX_NR_IO_RANGES
> >> for
> >>>> ioreq server
> >>>>
> >>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
> >>>> wrote:
> >>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
> >>>>> number of discrete ranges to be tracked. This patch changes
> >>>>> its value to 8k, so that more ranges can be tracked on next
> >>>>> generation of Intel platforms in XenGT. Future patches can
> >>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
> >>>>> can serve as a default limit.
> >>>>>
> >>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >>>>
> >>>> I said this at the Hackathon, and I'll say it here:  I think this is
> >>>> the wrong approach.
> >>>>
> >>>> The problem here is not that you don't have enough memory ranges.
> The
> >>>> problem is that you are not tracking memory ranges, but individual
> >>>> pages.
> >>>>
> >>>> You need to make a new interface that allows you to tag individual
> >>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
> >>>> notifications for all such writes.
> >>>>
> >>>
> >>> I think that is conflating things. It's quite conceivable that more than one
> >> ioreq server will handle write_dm pages. If we had enough types to have
> >> two page types per server then I'd agree with you, but we don't.
> >>
> >> What's conflating things is using an interface designed for *device
> >> memory ranges* to instead *track writes to gfns*.
> >
> > What's the difference? Are you asserting that all device memory ranges
> have read side effects and therefore write_dm is not a reasonable
> optimization to use? I would not want to make that assertion.
> 
> Using write_dm is not the problem; it's having thousands of memory
> "ranges" of 4k each that I object to.
> 
> Which is why I suggested adding an interface to request updates to gfns
> (by marking them write_dm), rather than abusing the io range interface.
> 

And it's the assertion that use of write_dm will only be relevant to gfns, and that all such notifications only need go to a single ioreq server, that I have a problem with. Whilst the use of io ranges to track gfn updates is, I agree, not ideal I think the overloading of write_dm is not a step in the right direction.

  Paul

>  -George

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 13:33             ` Paul Durrant
@ 2015-07-06 14:06               ` George Dunlap
  2015-07-07  8:16                 ` Yu, Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: George Dunlap @ 2015-07-06 14:06 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Yu Zhang, zhiyuan.lv@intel.com,
	Jan Beulich

On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
>> Sent: 06 July 2015 14:28
>> To: Paul Durrant; George Dunlap
>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>> ioreq server
>>
>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
>> >> -----Original Message-----
>> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> >> George Dunlap
>> >> Sent: 06 July 2015 13:50
>> >> To: Paul Durrant
>> >> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich;
>> Andrew
>> >> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>> >> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
>> for
>> >> ioreq server
>> >>
>> >> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
>> >> wrote:
>> >>>> -----Original Message-----
>> >>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> >>>> George Dunlap
>> >>>> Sent: 06 July 2015 13:36
>> >>>> To: Yu Zhang
>> >>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>> Cooper;
>> >>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>> >>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
>> MAX_NR_IO_RANGES
>> >> for
>> >>>> ioreq server
>> >>>>
>> >>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>> >>>> wrote:
>> >>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
>> >>>>> number of discrete ranges to be tracked. This patch changes
>> >>>>> its value to 8k, so that more ranges can be tracked on next
>> >>>>> generation of Intel platforms in XenGT. Future patches can
>> >>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>> >>>>> can serve as a default limit.
>> >>>>>
>> >>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> >>>>
>> >>>> I said this at the Hackathon, and I'll say it here:  I think this is
>> >>>> the wrong approach.
>> >>>>
>> >>>> The problem here is not that you don't have enough memory ranges.
>> The
>> >>>> problem is that you are not tracking memory ranges, but individual
>> >>>> pages.
>> >>>>
>> >>>> You need to make a new interface that allows you to tag individual
>> >>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>> >>>> notifications for all such writes.
>> >>>>
>> >>>
>> >>> I think that is conflating things. It's quite conceivable that more than one
>> >> ioreq server will handle write_dm pages. If we had enough types to have
>> >> two page types per server then I'd agree with you, but we don't.
>> >>
>> >> What's conflating things is using an interface designed for *device
>> >> memory ranges* to instead *track writes to gfns*.
>> >
>> > What's the difference? Are you asserting that all device memory ranges
>> have read side effects and therefore write_dm is not a reasonable
>> optimization to use? I would not want to make that assertion.
>>
>> Using write_dm is not the problem; it's having thousands of memory
>> "ranges" of 4k each that I object to.
>>
>> Which is why I suggested adding an interface to request updates to gfns
>> (by marking them write_dm), rather than abusing the io range interface.
>>
>
> And it's the assertion that use of write_dm will only be relevant to gfns, and that all such notifications only need go to a single ioreq server, that I have a problem with. Whilst the use of io ranges to track gfn updates is, I agree, not ideal I think the overloading of write_dm is not a step in the right direction.

So there are two questions here.

First of all, I certainly think that the *interface* should be able to
be transparently extended to support multiple ioreq servers being able
to track gfns.  My suggestion was to add a hypercall that allows an
ioreq server to say, "Please send modifications to gfn N to ioreq
server X"; and that for the time being, only allow one such X to exist
at a time per domain.  That is, if ioreq server Y makes such a call
after ioreq server X has done so, return -EBUSY.  That way we can add
support when we need it.

In fact, you probably already have a problem with two ioreq servers,
because (if I recall correctly) you don't know for sure when a page
has stopped being used as a GPU pagetable.  Consider the following
scenario:
1. Two devices, served by ioreq servers 1 and 2.
2. driver for device served by ioreq server 1 allocates a page, uses
it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
watching.
3. driver frees page back to guest OS; but ioreq server 1 doesn't know
so it doesn't release the range
4. driver for device served by ioreq server 2 allocates a page, which
happens to be the same one used before, and uses it as a pagetable.
ioreq server 2 tries to add that pfn to the ranges it's watching.

Now you have an "overlap in the range" between the two ioreq servers.
What do you do?

Regarding using write_dm for actual device memory ranges: Do you have
any concrete scenarios in mind where you think this will be used?

Fundamentally, write_dm looks to me like it's about tracking gfns --
i.e., things backed by guest RAM -- not IO ranges.  As such, it should
have an interface and an implementation that reflects that.

 -George
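
(Again a purely illustrative sketch of the "first ioreq server claims
write_dm notifications, a second one gets -EBUSY" rule described above;
the write_dm_server field and the helper name are hypothetical:)

static int hvm_claim_write_dm(struct domain *d, struct hvm_ioreq_server *s)
{
    int rc = 0;

    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);

    if ( !d->arch.hvm_domain.write_dm_server )      /* hypothetical field */
        d->arch.hvm_domain.write_dm_server = s;     /* ioreq server X claims */
    else if ( d->arch.hvm_domain.write_dm_server != s )
        rc = -EBUSY;                                /* ioreq server Y refused */

    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);

    return rc;
}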

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-06 14:06               ` George Dunlap
@ 2015-07-07  8:16                 ` Yu, Zhang
  2015-07-07  9:23                   ` Paul Durrant
  0 siblings, 1 reply; 24+ messages in thread
From: Yu, Zhang @ 2015-07-07  8:16 UTC (permalink / raw)
  To: George Dunlap, Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com, Jan Beulich

Thanks a lot, George.

On 7/6/2015 10:06 PM, George Dunlap wrote:
> On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>> -----Original Message-----
>>> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
>>> Sent: 06 July 2015 14:28
>>> To: Paul Durrant; George Dunlap
>>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>>> ioreq server
>>>
>>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
>>>>> -----Original Message-----
>>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>>> George Dunlap
>>>>> Sent: 06 July 2015 13:50
>>>>> To: Paul Durrant
>>>>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich;
>>> Andrew
>>>>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
>>> for
>>>>> ioreq server
>>>>>
>>>>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
>>>>> wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>>>>> George Dunlap
>>>>>>> Sent: 06 July 2015 13:36
>>>>>>> To: Yu Zhang
>>>>>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>>> Cooper;
>>>>>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
>>> MAX_NR_IO_RANGES
>>>>> for
>>>>>>> ioreq server
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
>>>>>>>> number of discrete ranges to be tracked. This patch changes
>>>>>>>> its value to 8k, so that more ranges can be tracked on next
>>>>>>>> generation of Intel platforms in XenGT. Future patches can
>>>>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>>>>>>>> can serve as a default limit.
>>>>>>>>
>>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>>
>>>>>>> I said this at the Hackathon, and I'll say it here:  I think this is
>>>>>>> the wrong approach.
>>>>>>>
>>>>>>> The problem here is not that you don't have enough memory ranges.
>>> The
>>>>>>> problem is that you are not tracking memory ranges, but individual
>>>>>>> pages.
>>>>>>>
>>>>>>> You need to make a new interface that allows you to tag individual
>>>>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>>>>>>> notifications for all such writes.
>>>>>>>
>>>>>>
>>>>>> I think that is conflating things. It's quite conceivable that more than one
>>>>> ioreq server will handle write_dm pages. If we had enough types to have
>>>>> two page types per server then I'd agree with you, but we don't.
>>>>>
>>>>> What's conflating things is using an interface designed for *device
>>>>> memory ranges* to instead *track writes to gfns*.
>>>>
>>>> What's the difference? Are you asserting that all device memory ranges
>>> have read side effects and therefore write_dm is not a reasonable
>>> optimization to use? I would not want to make that assertion.
>>>
>>> Using write_dm is not the problem; it's having thousands of memory
>>> "ranges" of 4k each that I object to.
>>>
>>> Which is why I suggested adding an interface to request updates to gfns
>>> (by marking them write_dm), rather than abusing the io range interface.
>>>
>>
>> And it's the assertion that use of write_dm will only be relevant to gfns, and that all such notifications only need go to a single ioreq server, that I have a problem with. Whilst the use of io ranges to track gfn updates is, I agree, not ideal I think the overloading of write_dm is not a step in the right direction.
>
> So there are two questions here.
>
> First of all, I certainly think that the *interface* should be able to
> be transparently extended to support multiple ioreq servers being able
> to track gfns.  My suggestion was to add a hypercall that allows an
> ioreq server to say, "Please send modifications to gfn N to ioreq
> server X"; and that for the time being, only allow one such X to exist
> at a time per domain.  That is, if ioreq server Y makes such a call
> after ioreq server X has done so, return -EBUSY.  That way we can add
> support when we need it.
>

Well, I also agree the current implementation is probably not optimal.
And yes, it seems promiscuous (hope I did not use the wrong word :))
to mix the device I/O ranges and the guest memory. But forwarding an
ioreq to the backend driver just by a p2m type? Although it would be
easy for XenGT to take this approach, I agree with Paul that this
would weaken the functionality of the ioreq server. Besides, is it
appropriate for a p2m type to be used this way? It seems strange to me.

> In fact, you probably already have a problem with two ioreq servers,
> because (if I recall correctly) you don't know for sure when a page

Fortunately, we do, and these unmapped page tables will be removed
from the rangeset of the ioreq server. So the following scenario won't
happen. :)

> has stopped being used as a GPU pagetable.  Consider the following
> scenario:
> 1. Two devices, served by ioreq servers 1 and 2.
> 2. driver for device served by ioreq server 1 allocates a page, uses
> it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
> watching.
> 3. driver frees page back to guest OS; but ioreq server 1 doesn't know
> so it doesn't release the range
> 4. driver for device served by ioreq server 2 allocates a page, which
> happens to be the same one used before, and uses it as a pagetable.
> ioreq server 2 tries to add that pfn to the ranges it's watching.
>
> Now you have an "overlap in the range" between the two ioreq servers.
> What do you do?
>
> Regarding using write_dm for actual device memory ranges: Do you have
> any concrete scenarios in mind where you think this will be used?
>
> Fundamentally, write_dm looks to me like it's about tracking gfns --
> i.e., things backed by guest RAM -- not IO ranges.  As such, it should
> have an interface and an implementation that reflects that.
>

Here, I guess your major concern about the difference between tracking
gfns and I/O ranges is that the gfns are scattered? And yes, this is
why we need more ranges inside a rangeset. The new value of the limit,
8K, is a practical one for XenGT. In the future, we can either provide
other approaches to configure the maximum number of ranges inside an
ioreq server, or provide some xenheap allocation management routines.
Is this OK?

I thought we had successfully convinced you at the hackathon, and it
seems I was wrong. Anyway, your advice is very much appreciated. :)

>   -George
>

B.R.
Yu
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07  8:16                 ` Yu, Zhang
@ 2015-07-07  9:23                   ` Paul Durrant
  2015-07-07 12:53                     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2015-07-07  9:23 UTC (permalink / raw)
  To: Yu, Zhang
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com, Jan Beulich

> -----Original Message-----
> From: Yu, Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 07 July 2015 09:16
> To: George Dunlap; Paul Durrant
> Cc: Kevin Tian; Keir (Xen.org); Andrew Cooper; George Dunlap; xen-
> devel@lists.xen.org; zhiyuan.lv@intel.com; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> Thanks a lot, George.
> 
> On 7/6/2015 10:06 PM, George Dunlap wrote:
> > On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >>> -----Original Message-----
> >>> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> >>> Sent: 06 July 2015 14:28
> >>> To: Paul Durrant; George Dunlap
> >>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich;
> Andrew
> >>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
> >>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
> MAX_NR_IO_RANGES for
> >>> ioreq server
> >>>
> >>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >>>>> George Dunlap
> >>>>> Sent: 06 July 2015 13:50
> >>>>> To: Paul Durrant
> >>>>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich;
> >>> Andrew
> >>>>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
> >>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
> MAX_NR_IO_RANGES
> >>> for
> >>>>> ioreq server
> >>>>>
> >>>>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant
> <Paul.Durrant@citrix.com>
> >>>>> wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf
> Of
> >>>>>>> George Dunlap
> >>>>>>> Sent: 06 July 2015 13:36
> >>>>>>> To: Yu Zhang
> >>>>>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
> >>> Cooper;
> >>>>>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
> >>>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the
> >>> MAX_NR_IO_RANGES
> >>>>> for
> >>>>>>> ioreq server
> >>>>>>>
> >>>>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang
> <yu.c.zhang@linux.intel.com>
> >>>>>>> wrote:
> >>>>>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
> >>>>>>>> number of discrete ranges to be tracked. This patch changes
> >>>>>>>> its value to 8k, so that more ranges can be tracked on next
> >>>>>>>> generation of Intel platforms in XenGT. Future patches can
> >>>>>>>> extend the limit to be toolstack tunable, and
> MAX_NR_IO_RANGES
> >>>>>>>> can serve as a default limit.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >>>>>>>
> >>>>>>> I said this at the Hackathon, and I'll say it here:  I think this is
> >>>>>>> the wrong approach.
> >>>>>>>
> >>>>>>> The problem here is not that you don't have enough memory
> ranges.
> >>> The
> >>>>>>> problem is that you are not tracking memory ranges, but individual
> >>>>>>> pages.
> >>>>>>>
> >>>>>>> You need to make a new interface that allows you to tag individual
> >>>>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to
> get
> >>>>>>> notifications for all such writes.
> >>>>>>>
> >>>>>>
> >>>>>> I think that is conflating things. It's quite conceivable that more than
> one
> >>>>> ioreq server will handle write_dm pages. If we had enough types to
> have
> >>>>> two page types per server then I'd agree with you, but we don't.
> >>>>>
> >>>>> What's conflating things is using an interface designed for *device
> >>>>> memory ranges* to instead *track writes to gfns*.
> >>>>
> >>>> What's the difference? Are you asserting that all device memory ranges
> >>> have read side effects and therefore write_dm is not a reasonable
> >>> optimization to use? I would not want to make that assertion.
> >>>
> >>> Using write_dm is not the problem; it's having thousands of memory
> >>> "ranges" of 4k each that I object to.
> >>>
> >>> Which is why I suggested adding an interface to request updates to gfns
> >>> (by marking them write_dm), rather than abusing the io range interface.
> >>>
> >>
> >> And it's the assertion that use of write_dm will only be relevant to gfns,
> and that all such notifications only need go to a single ioreq server, that I
> have a problem with. Whilst the use of io ranges to track gfn updates is, I
> agree, not ideal I think the overloading of write_dm is not a step in the right
> direction.
> >
> > So there are two questions here.
> >
> > First of all, I certainly think that the *interface* should be able to
> > be transparently extended to support multiple ioreq servers being able
> > to track gfns.  My suggestion was to add a hypercall that allows an
> > ioreq server to say, "Please send modifications to gfn N to ioreq
> > server X"; and that for the time being, only allow one such X to exist
> > at a time per domain.  That is, if ioreq server Y makes such a call
> > after ioreq server X has done so, return -EBUSY.  That way we can add
> > support when we need it.
> >
> 
> Well, I also agree the current implementation is probably not optimal.
> And yes, it seems promiscuous (hope I did not use the wrong word :) )
> to mix the device I/O ranges and the guest memory. But, forwarding an
> ioreq to the backend driver, just by a p2m type? Although it would be easy
> for XenGT to take this approach, I agree with Paul that this would
> weaken the functionality of the ioreq server. Besides, is it appropriate
> for a p2m type to be used this way? It seems strange to me.
> 
> > In fact, you probably already have a problem with two ioreq servers,
> > because (if I recall correctly) you don't know for sure when a page
> 
> Fortunately, we do, and these unmapped page tables will be removed from
> the ioreq server's rangeset. So the following scenario won't happen. :)
> 
> > has stopped being used as a GPU pagetable.  Consider the following
> > scenario:
> > 1. Two devices, served by ioreq servers 1 and 2.
> > 2. driver for device served by ioreq server 1 allocates a page, uses
> > it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
> > watching.
> > 3. driver frees page back to guest OS; but ioreq server 1 doesn't know
> > so it doesn't release the range
> > 4. driver for device served by ioreq server 2 allocates a page, which
> > happens to be the same one used before, and uses it as a pagetable.
> > ioreq server 2 tries to add that pfn to the ranges it's watching.
> >
> > Now you have an "overlap in the range" between the two ioreq servers.
> > What do you do?
> >
> > Regarding using write_dm for actual device memory ranges: Do you have
> > any concrete scenarios in mind where you think this will be used?
> >
> > Fundamentally, write_dm looks to me like it's about tracking gfns --
> > i.e., things backed by guest RAM -- not IO ranges.  As such, it should
> > have an interface and an implementation that reflects that.
> >
> 
> Here, I guess your major concern about the difference between tracking
> gfns and I/O ranges is that the gfns are scattered? And yes, this is why
> we need more ranges inside a rangeset. Here the new value of the limit,
> 8K, is a practical one for XenGT. In the future, we can either provide
> other approaches to configure the maximum number of ranges inside an
> ioreq server, or provide some xenheap allocation management routines.
> Is this OK?
> 
> I thought we had convinced you at the hackathon, but it seems I was
> wrong. Anyway, your advice is much appreciated. :)
> 

George,

I wonder, would it be sufficient - at this stage - to add a new mapping
sub-op to the HVM op to distinguish mapping of gfns vs. MMIO ranges. That
way we could use the same implementation underneath for now (using the
rb_rangeset, which I think stands on its own merits for MMIO ranges
anyway) but allow them to diverge later... perhaps using a new P2T
(page-to-type) table, which I believe may become necessary as Intel
reclaims bits for h/w use and thus squeezes our existing number of
supported page types.
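
To make that a bit more concrete, here is a minimal sketch (not taken
from any posted patch) of how the existing xen_hvm_io_range layout in
public/hvm/hvm_op.h could carry the distinction; the
HVMOP_IO_RANGE_WP_MEM name and value below are purely illustrative:

struct xen_hvm_io_range {
    domid_t domid;                 /* IN - domain to be serviced */
    ioservid_t id;                 /* IN - server id */
    uint32_t type;                 /* IN - type of range */
# define HVMOP_IO_RANGE_PORT    0  /* I/O ports */
# define HVMOP_IO_RANGE_MEMORY  1  /* MMIO ranges */
# define HVMOP_IO_RANGE_PCI     2  /* PCI segment/bus/device/function */
# define HVMOP_IO_RANGE_WP_MEM  3  /* write-protected guest RAM (illustrative) */
    uint64_aligned_t start, end;   /* IN - inclusive start and end of range */
};

The ioreq server would keep a separate set of tracked ranges per type, so
the gfn tracking could later be re-implemented differently (e.g. via such
a P2T table) without touching the MMIO path.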

  Paul

> >   -George
> >
> 
> B.R.
> Yu
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> >
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07  9:23                   ` Paul Durrant
@ 2015-07-07 12:53                     ` Jan Beulich
  2015-07-07 13:11                       ` Paul Durrant
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2015-07-07 12:53 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

>>> On 07.07.15 at 11:23, <Paul.Durrant@citrix.com> wrote:
> I wonder, would it be sufficient - at this stage - to add a new mapping sub-op 
> to the HVM op to distinguish mapping of gfns vs. MMIO ranges. That
> way we could use the same implementation underneath for now (using the 
> rb_rangeset, which I think stands on its own merits for MMIO ranges anyway) 

Which would be (taking into account the good description of the
differences between RAM and MMIO pages given by George
yesterday [I think])? I continue to not be convinced we need
this new rangeset type (the more that its name seems wrong,
since - as said by George - we're unlikely to deal with ranges
here).

Jan

> but allow them to diverge later... perhaps using a new P2T (page-to-type) 
> table, which I believe may become necessary as Intel reclaims bits for h/w 
> use and thus squeezes our existing number of supported page types.
> 
>   Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 12:53                     ` Jan Beulich
@ 2015-07-07 13:11                       ` Paul Durrant
  2015-07-07 14:04                         ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2015-07-07 13:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 July 2015 13:53
> To: Paul Durrant
> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com; Zhang
> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> >>> On 07.07.15 at 11:23, <Paul.Durrant@citrix.com> wrote:
> > I wonder, would it be sufficient - at this stage - to add a new mapping sub-
> op
> > to the HVM op to distinguish mapping of gfns vs. MMIO ranges.
> That
> > way we could use the same implementation underneath for now (using
> the
> > rb_rangeset, which I think stands on its own merits for MMIO ranges
> anyway)
> 
> Which would be (taking into account the good description of the
> differences between RAM and MMIO pages given by George
> yesterday [I think])? I continue to not be convinced we need
> this new rangeset type (the more that its name seems wrong,
> since - as said by George - we're unlikely to deal with ranges
> here).
> 

I don't see that implementing rangesets on top of an rb tree is a
problem. IMO it's a useful optimization in its own right, since it takes
something that's currently O(n) and makes it O(log n) using an rb tree
implementation that's already there. In fact, perhaps we should just make
the current rangeset implementation use rb trees underneath; then there's
no need for the extra API.
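
For illustration only, an O(log n) containment lookup on top of the
rb-tree code we already have (xen/rbtree.h) might look roughly like the
sketch below; the struct and function names are mine, not necessarily
what the rb_rangeset patch uses:

#include <xen/rbtree.h>

struct rb_range {
    struct rb_node node;
    unsigned long s, e;              /* inclusive start and end */
};

/* Return the tracked range overlapping [s, e], or NULL if none does. */
static struct rb_range *rb_range_find(struct rb_root *root,
                                      unsigned long s, unsigned long e)
{
    struct rb_node *n = root->rb_node;

    while ( n )
    {
        struct rb_range *r = rb_entry(n, struct rb_range, node);

        if ( e < r->s )
            n = n->rb_left;          /* query lies entirely below r */
        else if ( s > r->e )
            n = n->rb_right;         /* query lies entirely above r */
        else
            return r;                /* overlap found */
    }

    return NULL;
}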

  Paul

> Jan
> 
> > but allow them to diverge later... perhaps using a new P2T (page-to-type)
> > table, which I believe may become necessary as Intel reclaims bits for h/w
> > use and thus squeezes our existing number of supported page types.
> >
> >   Paul
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 13:11                       ` Paul Durrant
@ 2015-07-07 14:04                         ` Jan Beulich
  2015-07-07 14:30                           ` Yu, Zhang
  2015-07-07 15:12                           ` Paul Durrant
  0 siblings, 2 replies; 24+ messages in thread
From: Jan Beulich @ 2015-07-07 14:04 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

>>> On 07.07.15 at 15:11, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 07 July 2015 13:53
>> To: Paul Durrant
>> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com; Zhang
>> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
>> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>> ioreq server
>> 
>> >>> On 07.07.15 at 11:23, <Paul.Durrant@citrix.com> wrote:
>> > I wonder, would it be sufficient - at this stage - to add a new mapping sub-
>> op
>> > to the HVM op to distinguish mapping of gfns vs. MMIO ranges.
>> That
>> > way we could use the same implementation underneath for now (using
>> the
>> > rb_rangeset, which I think stands on its own merits for MMIO ranges
>> anyway)
>> 
>> Which would be (taking into account the good description of the
>> differences between RAM and MMIO pages given by George
>> yesterday [I think])? I continue to not be convinced we need
>> this new rangeset type (the more that its name seems wrong,
>> since - as said by George - we're unlikely to deal with ranges
>> here).
>> 
> 
> I don't see that implementing rangesets on top of rb tree is a problem. IMO 
> it's a useful optimization in its own right since it takes something that's 
> currently O(n) and makes it O(log n) using an rb tree implementation that's 
> already there. In fact, perhaps we just make the current rangeset 
> implementation use rb trees underneath, then there's no need for the extra 
> API.

I wouldn't mind such an overall change (provided it doesn't
introduce new issues), but I'm not convinced of having two
rangeset implementations, and I don't see the lookup speed
as an issue with the uses we have for rangesets now. The
arguments for bumping the number of ranges, which would
possibly affect lookup in a negative way, haven't been
convincing to me so far (and XenGT isn't going to make 4.6
anyway).

Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 14:04                         ` Jan Beulich
@ 2015-07-07 14:30                           ` Yu, Zhang
  2015-07-07 14:43                             ` Jan Beulich
  2015-07-07 15:12                           ` Paul Durrant
  1 sibling, 1 reply; 24+ messages in thread
From: Yu, Zhang @ 2015-07-07 14:30 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com

Hi Jan,

On 7/7/2015 10:04 PM, Jan Beulich wrote:
>>>> On 07.07.15 at 15:11, <Paul.Durrant@citrix.com> wrote:
>>>   -----Original Message-----
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: 07 July 2015 13:53
>>> To: Paul Durrant
>>> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com; Zhang
>>> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
>>> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>>> ioreq server
>>>
>>>>>> On 07.07.15 at 11:23, <Paul.Durrant@citrix.com> wrote:
>>>> I wonder, would it be sufficient - at this stage - to add a new mapping sub-
>>> op
>>>> to the HVM op to distinguish mapping of gfns vs. MMIO ranges.
>>> That
>>>> way we could use the same implementation underneath for now (using
>>> the
>>>> rb_rangeset, which I think stands on its own merits for MMIO ranges
>>> anyway)
>>>
>>> Which would be (taking into account the good description of the
>>> differences between RAM and MMIO pages given by George
>>> yesterday [I think])? I continue to not be convinced we need
>>> this new rangeset type (the more that its name seems wrong,
>>> since - as said by George - we're unlikely to deal with ranges
>>> here).
>>>
>>
>> I don't see that implementing rangesets on top of rb tree is a problem. IMO
>> it's a useful optimization in its own right since it takes something that's
>> currently O(n) and makes it O(log n) using an rb tree implementation that's
>> already there. In fact, perhaps we just make the current rangeset
>> implementation use rb trees underneath, then there's no need for the extra
>> API.
>
> I wouldn't mind such an overall change (provided it doesn't
> introduce new issues), but I'm not convinced of having two
> rangeset implementations, and I don't see the lookup speed
> as an issue with the uses we have for rangesets now. The
> arguments for bumping the number of ranges, which would
> possibly affect lookup in a negative way, haven't been
> convincing to me so far (and XenGT isn't going to make 4.6
> anyway).
>
I know that George and you have concerns about the differences
between MMIO and guest page tables, but I do not quite understand
why. :)

Although the granularity of the write-protected memory is 4K in our
case, the trapped address is still a guest physical address, not a
guest page frame number. We can see the range as a 4K-sized one,
right? Besides, in many cases the guest graphic page tables are
contiguous, so the size of a range inside the ioreq server is more than
4K. As you can see, the existing code already tracks the guest graphic
page tables by rangeset; the difference here is that there would be
more page tables to take care of on BDW. Changing the upper limit on
the number of ranges inside an ioreq server does not necessarily mean
there would be that many.
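
As a trivial illustration of why contiguity helps (this assumes the
merging behaviour I read in the existing xen/common/rangeset.c; the
wrapper name below is made up):

#include <xen/rangeset.h>

/*
 * Track one write-protected GTT page.  Since rangeset_add_range()
 * coalesces with adjacent or overlapping entries, tracking N contiguous
 * page-table pages still consumes a single entry; a new entry is needed
 * only when neither neighbouring gfn is already tracked.
 */
static int track_gtt_page(struct rangeset *r, unsigned long gfn)
{
    return rangeset_add_range(r, gfn, gfn);
}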

As for the rb_rangeset I introduced, it is only for the XenGT case,
because only in such cases will a more efficient data structure be
necessary - maybe in the future there will be more uses for this
type.

Yes, XenGT may not catch up with 4.6. And thanks for your frankness. :)
But I would still very much appreciate any advice and direction you
can give.

B.R.
Yu
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 14:30                           ` Yu, Zhang
@ 2015-07-07 14:43                             ` Jan Beulich
  2015-07-07 14:49                               ` Yu, Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2015-07-07 14:43 UTC (permalink / raw)
  To: Zhang Yu
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Paul Durrant, zhiyuan.lv@intel.com

>>> On 07.07.15 at 16:30, <yu.c.zhang@linux.intel.com> wrote:
> I know that George and you have concerns about the differences
> between MMIO and guest page tables, but I do not quite understand
> why. :)

But you read George's very nice description of the differences? I
ask because if you did, I don't see why you re-raise the question
above.

Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 14:43                             ` Jan Beulich
@ 2015-07-07 14:49                               ` Yu, Zhang
  2015-07-07 15:10                                 ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Yu, Zhang @ 2015-07-07 14:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Paul Durrant, zhiyuan.lv@intel.com



On 7/7/2015 10:43 PM, Jan Beulich wrote:
>>>> On 07.07.15 at 16:30, <yu.c.zhang@linux.intel.com> wrote:
>> I know that George and you have concerns about the differences
>> between MMIO and guest page tables, but I do not quite understand
>> why. :)
>
> But you read George's very nice description of the differences? I
> ask because if you did, I don't see why you re-raise the question
> above.
>

Well, yes. I guess you mean this statement:
"the former is one or two actual ranges of a significant size; the
latter are (apparently) thousands of ranges of one page each."?
But I do not understand why this is abusing the io range interface.
Does the number matter so much? :)

B.R.
Yu

> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 14:49                               ` Yu, Zhang
@ 2015-07-07 15:10                                 ` Jan Beulich
  2015-07-07 16:02                                   ` Yu, Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2015-07-07 15:10 UTC (permalink / raw)
  To: Zhang Yu
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Paul Durrant, zhiyuan.lv@intel.com

>>> On 07.07.15 at 16:49, <yu.c.zhang@linux.intel.com> wrote:
> On 7/7/2015 10:43 PM, Jan Beulich wrote:
>>>>> On 07.07.15 at 16:30, <yu.c.zhang@linux.intel.com> wrote:
>>> I know that George and you have concerns about the differences
>>> between MMIO and guest page tables, but I do not quite understand
>>> why. :)
>>
>> But you read George's very nice description of the differences? I
>> ask because if you did, I don't see why you re-raise the question
>> above.
>>
> 
> Well, yes. I guess you mean this statement:
> "the former is one or two actual ranges of a significant size; the
> latter are (apparently) thousands of ranges of one page each."?
> But I do not understand why this is abusing the io range interface.
> Does the number matter so much? :)

Yes, we specifically set it that low so misbehaving tool stacks
(perhaps de-privileged) can't cause the hypervisor to allocate
undue amounts of memory for tracking these ranges. This
concern, btw, applies as much to the rb-rangesets.
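
For scale, a back-of-envelope estimate (assuming the struct range layout
currently used by xen/common/rangeset.c, the 256-entry MAX_NR_IO_RANGES
default as I read it, and ignoring allocator overhead):

#include <xen/list.h>

struct range {
    struct list_head list;
    unsigned long s, e;
};                        /* 32 bytes on x86-64 */

/*
 *   256 entries * 32 bytes ~=   8 KiB of xenheap per fully populated
 *                               rangeset at the current limit;
 *  8192 entries * 32 bytes ~= 256 KiB per rangeset after the proposed
 *                               bump - per range type, per ioreq server
 *                               the toolstack can create.
 */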

Plus the number you bump MAX_NR_IO_RANGES to is - as I
understood it - obtained phenomenologically, i.e. there's no
reason not to assume that some bigger graphics card may
need this to be further bumped. The current count is arbitrary
too, but limiting guests only in so far as there can't be more
than so many (possibly huge) MMIO ranges on the complete set
of devices passed through to it.

And finally, the I/O ranges are called I/O ranges because they
are intended to cover I/O memory. RAM clearly isn't I/O memory,
even if it may be accessed directly by devices.

Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 14:04                         ` Jan Beulich
  2015-07-07 14:30                           ` Yu, Zhang
@ 2015-07-07 15:12                           ` Paul Durrant
  2015-07-07 15:27                             ` Jan Beulich
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2015-07-07 15:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 July 2015 15:04
> To: Paul Durrant
> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com; Zhang
> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> >>> On 07.07.15 at 15:11, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 07 July 2015 13:53
> >> To: Paul Durrant
> >> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com;
> Zhang
> >> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
> >> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
> for
> >> ioreq server
> >>
> >> >>> On 07.07.15 at 11:23, <Paul.Durrant@citrix.com> wrote:
> >> > I wonder, would it be sufficient - at this stage - to add a new mapping
> sub-
> >> op
> >> > to the HVM op to distinguish mapping of gfns vs. MMIO
> ranges.
> >> That
> >> > way we could use the same implementation underneath for now (using
> >> the
> >> > rb_rangeset, which I think stands on its own merits for MMIO ranges
> >> anyway)
> >>
> >> Which would be (taking into account the good description of the
> >> differences between RAM and MMIO pages given by George
> >> yesterday [I think])? I continue to not be convinced we need
> >> this new rangeset type (the more that its name seems wrong,
> >> since - as said by George - we're unlikely to deal with ranges
> >> here).
> >>
> >
> > I don't see that implementing rangesets on top of rb tree is a problem. IMO
> > it's a useful optimization in its own right since it takes something that's
> > currently O(n) and makes it O(log n) using an rb tree implementation that's
> > already there. In fact, perhaps we just make the current rangeset
> > implementation use rb trees underneath, then there's no need for the
> extra
> > API.
> 
> I wouldn't mind such an overall change (provided it doesn't
> introduce new issues), but I'm not convinced of having two
> rangeset implementations, and I don't see the lookup speed
> as an issue with the uses we have for rangesets now. The
> arguments for bumping the number of ranges, which would
> possibly affect lookup in a negative way, haven't been
> convincing to me so far (and XenGT isn't going to make 4.6
> anyway).
> 

I don't think that the Xen parts of XenGT are that far off. I had a list of 3 items:

- 16 byte I/O brokenness, which should be fixed now
- Basic PV-IOMMU implementation, which I think Malcolm should be posting very soon
- Shadow GTT emulation (which is what we're looking at here)

I had a chat with George earlier and wondered whether we might approach things this way:

- Add a new sub op to HVMOP_[un]map_io_range_to_ioreq_server or a new HVM_[un]map_gfn_to_ioreq_server
- Make rb_rangeset the default rangeset implementation (since I believe it can only be an improvement even if it makes no significant difference for small numbers of ranges)
- Use a common implementation for MMIO ranges and GFN ranges *for now* (which does mean increasing the limit) to allow us to be functionally complete for 4.6
- Follow up (post-4.6) with a different ioreq server redirection implementation for mmio-dm and mmio-dm-write pages, possibly based on claiming a range of page types for emulator use, to optimize shadow GTT implementation.

Does that sound plausible?

  Paul

> Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 15:12                           ` Paul Durrant
@ 2015-07-07 15:27                             ` Jan Beulich
  2015-07-07 15:29                               ` Paul Durrant
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2015-07-07 15:27 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

>>> On 07.07.15 at 17:12, <Paul.Durrant@citrix.com> wrote:
> I don't think that the Xen parts of XenGT are that far off. I had a list of 
> 3 items:
> 
> - 16 byte I/O brokenness, which should be fixed now
> - Basic PV-IOMMU implementation, which I think Malcolm should be posting very soon

No matter how soon Malcolm would post that, unless it is in a shape
to go in without any revisions, it will - afaict - miss 4.6 (freezing at
the end of the week). I can't even guarantee that your HVM
emulation updates will have gone in by then, but I very much hope
so.

> - Shadow GTT emulation (which is what we're looking at here)
> 
> I had a chat with George earlier and wondered whether we might approach 
> things this way:
> 
> - Add a new sub op to HVMOP_[un]map_io_range_to_ioreq_server or a new 
> HVM_[un]map_gfn_to_ioreq_server
> - Make rb_rangeset the default rangeset implementation (since I believe it 
> can only be an improvement even if it makes no significant difference for 
> small numbers of ranges)
> - Use a common implementation for MMIO ranges and GFN ranges *for now* (which 
> does mean increasing the limit) to allow us to be functionally complete for 
> 4.6
> - Follow up (post-4.6) with a different ioreq server redirection 
> implementation for mmio-dm and mmio-dm-write pages, possibly based on claiming a 
> range of page types for emulator use, to optimize shadow GTT implementation.
> 
> Does that sound plausible?

Yes, except that I think all of this will be post-4.6.

Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 15:27                             ` Jan Beulich
@ 2015-07-07 15:29                               ` Paul Durrant
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Durrant @ 2015-07-07 15:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Zhang Yu, zhiyuan.lv@intel.com

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 July 2015 16:27
> To: Paul Durrant
> Cc: Andrew Cooper; George Dunlap; Kevin Tian; zhiyuan.lv@intel.com; Zhang
> Yu; xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
> ioreq server
> 
> >>> On 07.07.15 at 17:12, <Paul.Durrant@citrix.com> wrote:
> > I don't think that the Xen parts of XenGT are that far off. I had a list of
> > 3 items:
> >
> > - 16 byte I/O brokenness, which should be fixed now
> > - Basic PV-IOMMU implementation, which I think Malcolm should be
> posting very soon
> 
> No matter how soon Malcolm would post that, unless it is in a shape
> to go in without any revisions, it will - afaict - miss 4.6 (freezing at
> the end of the week). I can't even guarantee that your HVM
> emulation updates will have gone in by then, but I very much hope
> so.
> 
> > - Shadow GTT emulation (which is what we're looking at here)
> >
> > I had a chat with George earlier and wondered whether we might approach
> > things this way:
> >
> > - Add a new sub op to HVMOP_[un]map_io_range_to_ioreq_server or a
> new
> > HVM_[un]map_gfn_to_ioreq_server
> > - Make rb_rangeset the default rangeset implementation (since I believe it
> > can only be an improvement even if it makes no significant difference for
> > small numbers of ranges)
> > - Use a common implementation for MMIO ranges and GFN ranges *for
> now* (which
> > does mean increasing the limit) to allow us to be functionally complete for
> > 4.6
> > - Follow up (post-4.6) with a different ioreq server redirection
> > implementation for mmio-dm and mmio-dm-write pages, possibly based
> on claiming a
> > range of page types for emulator use, to optimize shadow GTT
> implementation.
> >
> > Does that sound plausible?
> 
> Yes, except that I think all of this will be post-4.6.
> 

Ok, if we miss 4.6 then so be it, but we'll go with that plan and live in hope ;-)

  Paul

> Jan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
  2015-07-07 15:10                                 ` Jan Beulich
@ 2015-07-07 16:02                                   ` Yu, Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Yu, Zhang @ 2015-07-07 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir (Xen.org), Andrew Cooper, George Dunlap,
	xen-devel@lists.xen.org, Paul Durrant, zhiyuan.lv@intel.com



On 7/7/2015 11:10 PM, Jan Beulich wrote:
>>>> On 07.07.15 at 16:49, <yu.c.zhang@linux.intel.com> wrote:
>> On 7/7/2015 10:43 PM, Jan Beulich wrote:
>>>>>> On 07.07.15 at 16:30, <yu.c.zhang@linux.intel.com> wrote:
>>>> I know that George and you have concerns about the differences
>>>> between MMIO and guest page tables, but I do not quite understand
>>>> why. :)
>>>
>>> But you read George's very nice description of the differences? I
>>> ask because if you did, I don't see why you re-raise the question
>>> above.
>>>
>>
>> Well, yes. I guess you mean this statement:
>> "the former is one or two actual ranges of a significant size; the
>> latter are (apparently) thousands of ranges of one page each."?
>> But I do not understand why this is abusing the io range interface.
>> Does the number matter so much? :)
>
> Yes, we specifically set it that low so misbehaving tool stacks
> (perhaps de-privileged) can't cause the hypervisor to allocate
> undue amounts of memory for tracking these ranges. This
> concern, btw, applies as much to the rb-rangesets.
>
Thanks for your explanation, Jan. :)
In fact, I have considered adding another patch to make this limit
toolstack tunable.
One problem I encountered is how to guarantee the validity of the
configured value - it must not over-consume the xen heap.
But I do agree there definitely should be more amendment patches.
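
Just to sketch the kind of check I have in mind (the names, the ceiling
and the max_ranges field are all made up, not part of the posted series):

#include <xen/errno.h>
#include <asm/hvm/domain.h>

#define IOREQ_SERVER_RANGE_LIMIT_CEILING  8192   /* illustrative hard cap */

/*
 * Hypothetical toolstack-settable per-server limit, clamped so that a
 * (possibly de-privileged) toolstack cannot make Xen allocate unbounded
 * xenheap for range tracking.
 */
static int hvm_ioreq_server_set_range_limit(struct hvm_ioreq_server *s,
                                            uint32_t limit)
{
    if ( limit == 0 || limit > IOREQ_SERVER_RANGE_LIMIT_CEILING )
        return -EINVAL;

    s->max_ranges = limit;   /* hypothetical field, checked on every add */
    return 0;
}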

B.R.
Yu

> Plus the number you bump MAX_NR_IO_RANGES to is - as I
> understood it - obtained phenomenologically, i.e. there's no
> reason not to assume that some bigger graphics card may
> need this to be further bumped. The current count is arbitrary
> too, but limiting guests only in so far as there can't be more
> than so many (possibly huge) MMIO ranges on the complete set
> of devices passed through to it.
>
> And finally, the I/O ranges are called I/O ranges because they
> are intended to cover I/O memory. RAM clearly isn't I/O memory,
> even if it may be accessed directly by devices.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-07-07 16:02 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-06  6:25 [PATCH v2 0/2] Refactor ioreq server for better performance Yu Zhang
2015-07-06  6:25 ` [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server Yu Zhang
2015-07-06 12:35   ` George Dunlap
2015-07-06 12:38     ` Paul Durrant
2015-07-06 12:49       ` George Dunlap
2015-07-06 13:09         ` Paul Durrant
2015-07-06 13:23           ` George Dunlap
2015-07-06 13:28           ` George Dunlap
2015-07-06 13:33             ` Paul Durrant
2015-07-06 14:06               ` George Dunlap
2015-07-07  8:16                 ` Yu, Zhang
2015-07-07  9:23                   ` Paul Durrant
2015-07-07 12:53                     ` Jan Beulich
2015-07-07 13:11                       ` Paul Durrant
2015-07-07 14:04                         ` Jan Beulich
2015-07-07 14:30                           ` Yu, Zhang
2015-07-07 14:43                             ` Jan Beulich
2015-07-07 14:49                               ` Yu, Zhang
2015-07-07 15:10                                 ` Jan Beulich
2015-07-07 16:02                                   ` Yu, Zhang
2015-07-07 15:12                           ` Paul Durrant
2015-07-07 15:27                             ` Jan Beulich
2015-07-07 15:29                               ` Paul Durrant
2015-07-06  6:25 ` [PATCH v2 2/2] Add new data structure to track ranges Yu Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).