From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Yu, Zhang" Subject: Re: [PATCH] Refactor ioreq server for better performance Date: Tue, 30 Jun 2015 15:11:07 +0800 Message-ID: <5592410B.5040509@linux.intel.com> References: <1435314588-8815-1-git-send-email-yu.c.zhang@linux.intel.com> <9AAE0902D5BC7E449B7C8E4E778ABCD0259742CB@AMSPEX01CL02.citrite.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1Z9pnU-0006GT-2d for xen-devel@lists.xenproject.org; Tue, 30 Jun 2015 07:17:36 +0000 In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD0259742CB@AMSPEX01CL02.citrite.net> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Paul Durrant , "xen-devel@lists.xenproject.org" , Andrew Cooper , "JBeulich@suse.com" , Kevin Tian , "zhiyuan.lv@intel.com" List-Id: xen-devel@lists.xenproject.org Thanks you, Paul. On 6/29/2015 8:12 PM, Paul Durrant wrote: >> -----Original Message----- >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com] >> Sent: 26 June 2015 11:30 >> To: xen-devel@lists.xenproject.org; Paul Durrant; Andrew Cooper; >> JBeulich@suse.com; Kevin Tian; zhiyuan.lv@intel.com >> Subject: [PATCH] Refactor ioreq server for better performance >> >> XenGT leverages ioreq server to track and forward the accesses to >> GPU IO resources, e.g. the PPGTT(per-process graphic translation >> tables). Currently, ioreq server uses rangeset to track the BDF/ >> PIO/MMIO ranges to be emulated. To select an ioreq server, the >> rangeset is searched to see if the IO range is recorded. However, >> traversing the link list inside rangeset could be time consuming >> when number of ranges is too high. On HSW platform, number of PPGTTs >> for each vGPU could be several hundred. On BDW, this value could >> be several thousand. >> To increase the performance, a new data structure, rb_rangeset >> is defined. Compared with rangeset, which is based on doubly linked >> list with O(n) time complexity for searching, this rb_rangeset is >> based on red-black tree with O(log(n)) time complexity. Besides the >> underlying data structure difference with the rangeset, another one >> is that the rb_rangeset does not provide a spinlock, instead, it >> left this to users. >> Besides, NR_IO_RANGE_TYPES is changed to 8192 to accommodate more >> ranges. >> >> Signed-off-by: Yu Zhang >> --- >> xen/arch/x86/hvm/hvm.c | 52 ++++----- >> xen/common/Makefile | 1 + >> xen/common/rb_rangeset.c | 243 >> +++++++++++++++++++++++++++++++++++++++ >> xen/include/asm-x86/hvm/domain.h | 4 +- >> xen/include/xen/rb_rangeset.h | 45 ++++++++ >> 5 files changed, 311 insertions(+), 34 deletions(-) >> create mode 100644 xen/common/rb_rangeset.c >> create mode 100644 xen/include/xen/rb_rangeset.h >> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c >> index 535d622..be70925 100644 >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -37,6 +37,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -809,7 +810,7 @@ static void hvm_ioreq_server_unmap_pages(struct >> hvm_ioreq_server *s, >> } >> } >> >> -static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, >> +static void hvm_ioreq_server_free_rb_rangesets(struct hvm_ioreq_server >> *s, > > Did you need to change the name of the function here? Got it. It's not necessary to change the name. :) > >> bool_t is_default) >> { >> unsigned int i; >> @@ -818,10 +819,10 @@ static void >> hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s, >> return; >> >> for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) >> - rangeset_destroy(s->range[i]); >> + rb_rangeset_destroy(s->range[i]); >> } >> >> -static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, >> +static int hvm_ioreq_server_alloc_rb_rangesets(struct hvm_ioreq_server >> *s, > > Same here. Yes, and thanks. > >> bool_t is_default) >> { >> unsigned int i; >> @@ -832,33 +833,20 @@ static int hvm_ioreq_server_alloc_rangesets(struct >> hvm_ioreq_server *s, >> >> for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) >> { >> - char *name; >> - >> - rc = asprintf(&name, "ioreq_server %d %s", s->id, >> - (i == HVMOP_IO_RANGE_PORT) ? "port" : >> - (i == HVMOP_IO_RANGE_MEMORY) ? "memory" : >> - (i == HVMOP_IO_RANGE_PCI) ? "pci" : >> - ""); >> - if ( rc ) >> - goto fail; >> - >> - s->range[i] = rangeset_new(s->domain, name, >> - RANGESETF_prettyprint_hex); >> - >> - xfree(name); >> + s->range[i] = rb_rangeset_new(); >> > > I think assigning a name to the rangeset and having a debug-key dump is useful. Can you not duplicate that in your new implementation? > Well, I can add some dump routines, e.g. hvm_ioreq_server_dump_range(), to dump the ranges inside each ioreq server of a domain. This routine is similar to the rangeset_domain_printk(). But unlike the rangeset, which is also inserted in domain->rangesets list, the new rb_rangeset is only a member of ioreq server. So we can dump the ranges inside a domain by first access each ioreq server. Do you think this approach duplicated? >> rc = -ENOMEM; >> if ( !s->range[i] ) >> goto fail; >> >> - rangeset_limit(s->range[i], MAX_NR_IO_RANGES); >> + s->range[i]->nr_ranges = MAX_NR_IO_RANGES; > > I'd add a limit function rather than just stooging into the structure fields. > Yes, will do. :) >> } >> >> done: >> return 0; >> >> fail: >> - hvm_ioreq_server_free_rangesets(s, 0); >> + hvm_ioreq_server_free_rb_rangesets(s, 0); >> > > Without the name change this diff is gone. > Yes, and thanks. >> return rc; >> } >> @@ -934,7 +922,7 @@ static int hvm_ioreq_server_init(struct >> hvm_ioreq_server *s, struct domain *d, >> INIT_LIST_HEAD(&s->ioreq_vcpu_list); >> spin_lock_init(&s->bufioreq_lock); >> >> - rc = hvm_ioreq_server_alloc_rangesets(s, is_default); >> + rc = hvm_ioreq_server_alloc_rb_rangesets(s, is_default); > > And this one. > Yes, and thanks. >> if ( rc ) >> return rc; >> >> @@ -960,7 +948,7 @@ static int hvm_ioreq_server_init(struct >> hvm_ioreq_server *s, struct domain *d, >> hvm_ioreq_server_unmap_pages(s, is_default); >> >> fail_map: >> - hvm_ioreq_server_free_rangesets(s, is_default); >> + hvm_ioreq_server_free_rb_rangesets(s, is_default); >> > > And this one. > Yes, and thanks. >> return rc; >> } >> @@ -971,7 +959,7 @@ static void hvm_ioreq_server_deinit(struct >> hvm_ioreq_server *s, >> ASSERT(!s->enabled); >> hvm_ioreq_server_remove_all_vcpus(s); >> hvm_ioreq_server_unmap_pages(s, is_default); >> - hvm_ioreq_server_free_rangesets(s, is_default); >> + hvm_ioreq_server_free_rb_rangesets(s, is_default); > > And this one. > Yes, and thanks. >> } >> >> static ioservid_t next_ioservid(struct domain *d) >> @@ -1149,7 +1137,7 @@ static int >> hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, >> >> if ( s->id == id ) >> { >> - struct rangeset *r; >> + struct rb_rangeset *r; >> >> switch ( type ) >> { >> @@ -1169,10 +1157,10 @@ static int >> hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, >> break; >> >> rc = -EEXIST; >> - if ( rangeset_overlaps_range(r, start, end) ) >> + if ( rb_rangeset_overlaps_range(r, start, end) ) >> break; >> >> - rc = rangeset_add_range(r, start, end); >> + rc = rb_rangeset_add_range(r, start, end); >> break; >> } >> } >> @@ -1200,7 +1188,7 @@ static int >> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, >> >> if ( s->id == id ) >> { >> - struct rangeset *r; >> + struct rb_rangeset *r; >> >> switch ( type ) >> { >> @@ -1220,10 +1208,10 @@ static int >> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, >> break; >> >> rc = -ENOENT; >> - if ( !rangeset_contains_range(r, start, end) ) >> + if ( !rb_rangeset_contains_range(r, start, end) ) >> break; >> >> - rc = rangeset_remove_range(r, start, end); >> + rc = rb_rangeset_remove_range(r, start, end); >> break; >> } >> } >> @@ -2465,7 +2453,7 @@ struct hvm_ioreq_server >> *hvm_select_ioreq_server(struct domain *d, >> &d->arch.hvm_domain.ioreq_server.list, >> list_entry ) >> { >> - struct rangeset *r; >> + struct rb_rangeset *r; >> >> if ( s == d->arch.hvm_domain.default_ioreq_server ) >> continue; >> @@ -2484,18 +2472,18 @@ struct hvm_ioreq_server >> *hvm_select_ioreq_server(struct domain *d, >> >> case IOREQ_TYPE_PIO: >> end = addr + p->size - 1; >> - if ( rangeset_contains_range(r, addr, end) ) >> + if ( rb_rangeset_contains_range(r, addr, end) ) >> return s; >> >> break; >> case IOREQ_TYPE_COPY: >> end = addr + (p->size * p->count) - 1; >> - if ( rangeset_contains_range(r, addr, end) ) >> + if ( rb_rangeset_contains_range(r, addr, end) ) >> return s; >> >> break; >> case IOREQ_TYPE_PCI_CONFIG: >> - if ( rangeset_contains_singleton(r, addr >> 32) ) >> + if ( rb_rangeset_contains_range(r, addr >> 32, addr >> 32) ) >> { >> p->type = type; >> p->addr = addr; >> diff --git a/xen/common/Makefile b/xen/common/Makefile >> index 1cddebc..ec9273e 100644 >> --- a/xen/common/Makefile >> +++ b/xen/common/Makefile >> @@ -26,6 +26,7 @@ obj-$(HAS_PDX) += pdx.o >> obj-y += preempt.o >> obj-y += random.o >> obj-y += rangeset.o >> +obj-y += rb_rangeset.o >> obj-y += radix-tree.o >> obj-y += rbtree.o >> obj-y += rcupdate.o >> diff --git a/xen/common/rb_rangeset.c b/xen/common/rb_rangeset.c >> new file mode 100644 >> index 0000000..d3f3a08 >> --- /dev/null >> +++ b/xen/common/rb_rangeset.c >> @@ -0,0 +1,243 @@ >> +/* >> + Red-black tree based rangeset >> + >> + This program is free software; you can redistribute it and/or modify >> + it under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 2 of the License, or >> + (at your option) any later version. >> + >> + This program is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program; if not, write to the Free Software >> + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA >> +*/ >> + >> +#include >> +#include >> +#include >> +#include >> + >> +static struct rb_range *alloc_and_init_rb_range( >> + struct rb_rangeset *r, unsigned long s, unsigned long e) >> +{ >> + struct rb_range *range = NULL; >> + >> + if ( r->nr_ranges == 0 ) >> + return NULL; >> + >> + range = xmalloc(struct rb_range); >> + if ( range ) >> + { >> + --r->nr_ranges; >> + range->s = s; >> + range->e = e; >> + } >> + return range; >> +} >> + >> +static void free_rb_range(struct rb_rangeset *r, struct rb_range *range) >> +{ >> + r->nr_ranges++; >> + rb_erase(&range->node, &r->rbroot); >> + xfree(range); >> +} >> + >> +static struct rb_range *rb_rangeset_find_range( >> + struct rb_rangeset *r, unsigned long s) >> +{ >> + struct rb_node *node; >> + >> + node = r->rbroot.rb_node; >> + >> + while ( node ) >> + { >> + struct rb_range *range = container_of(node, struct rb_range, node); >> + >> + if ( (s >= range->s) && (s <= range->e) ) >> + return range; >> + if ( s < range->s ) >> + node = node->rb_left; >> + else if ( s > range->s ) >> + node = node->rb_right; >> + } >> + return NULL; >> +} >> + >> +bool_t rb_rangeset_overlaps_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e) >> +{ >> + struct rb_node *node; >> + bool_t rc = 0; >> + >> + ASSERT(s <= e); >> + >> + node = r->rbroot.rb_node; >> + while ( node ) >> + { >> + struct rb_range *range = container_of(node, struct rb_range, node); >> + if ( (s <= range->e) && (e >= range->s) ) >> + { >> + rc = 1; >> + break; >> + } >> + else if ( s < range->s ) >> + node = node->rb_left; >> + else if ( s > range->s ) >> + node = node->rb_right; >> + } >> + return rc; >> +} >> + >> +bool_t rb_rangeset_contains_range( >> + struct rb_rangeset *r, unsigned long s, unsigned long e) >> +{ >> + struct rb_range *range; >> + bool_t contains; >> + >> + ASSERT(s <= e); >> + >> + range = rb_rangeset_find_range(r, s); >> + contains = (range && (range->e >= e)); >> + return contains; >> +} >> + >> +static void rb_rangeset_insert_range( >> + struct rb_root *root, struct rb_range *range) >> +{ >> + struct rb_node **new = &(root->rb_node); >> + struct rb_node *parent = NULL; >> + >> + /* Figure out where to put new node */ >> + while ( *new ) >> + { >> + struct rb_range *this = container_of(*new, struct rb_range, node); >> + parent = *new; >> + >> + if ( range->s < this->s ) >> + new = &((*new)->rb_left); >> + else if ( range->s > this->s ) >> + new = &((*new)->rb_right); >> + } >> + >> + /* Add new node and rebalance the range tree. */ >> + rb_link_node(&range->node, parent, new); >> + rb_insert_color(&range->node, root); >> +} >> + >> +/* >> + * Add a new range into the rb_rangeset, rb_rangeset_overlaps_range() >> + * should be called first, to ensure nodes inside the rb_rangeset will >> + * not interleave. >> + */ >> +int rb_rangeset_add_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e) >> +{ >> + struct rb_range *range = NULL; >> + struct rb_range *next = NULL; >> + >> + ASSERT(s <= e); >> + >> + if ( (s) && (range = rb_rangeset_find_range(r, s - 1)) ) >> + { >> + /* range tree overlapped */ >> + if ( range->e != (s - 1) ) >> + return -EEXIST; >> + range->e = e; >> + } >> + else >> + { >> + range = alloc_and_init_rb_range(r, s, e); >> + if ( !range ) >> + return -ENOMEM; >> + rb_rangeset_insert_range(&r->rbroot, range); >> + } >> + >> + next = container_of(rb_next(&range->node), struct rb_range, node); >> + >> + if ( next && (next->s == (e + 1)) ) >> + { >> + range->e = next->e; >> + free_rb_range(r, next); >> + } >> + >> + return 0; >> +} >> + >> +int rb_rangeset_remove_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e) >> +{ >> + struct rb_range *range = NULL; >> + struct rb_range *next = NULL; >> + unsigned long start, end; >> + >> + ASSERT(s <= e); >> + >> + range = rb_rangeset_find_range(r, s); >> + if ( !range ) >> + return -ENOENT; >> + >> + start = range->s; >> + end = range->e; >> + >> + /* the range to be removed must be contained in one rb_range */ >> + if ( end < e ) >> + return -ENOENT; >> + >> + /* value of start can only be less than or equal to s */ >> + if ( start == s ) >> + { >> + if ( end > e ) >> + range->s = e + 1; >> + else >> + free_rb_range(r, range); >> + } >> + else >> + { >> + if ( end > e ) >> + { >> + next = alloc_and_init_rb_range(r, e + 1, end); >> + if ( next == NULL ) >> + return -ENOMEM; >> + >> + rb_rangeset_insert_range(&r->rbroot, next); >> + } >> + range->e = s - 1; >> + } >> + return 0; >> +} >> + >> +void rb_rangeset_destroy(struct rb_rangeset *r) >> +{ >> + struct rb_root *root; >> + struct rb_node *node; >> + >> + if ( r == NULL ) >> + return; >> + >> + root = &r->rbroot; >> + node = rb_first(root); >> + while ( node ) >> + { >> + struct rb_range *range = container_of(node, struct rb_range, node); >> + free_rb_range(r, range); >> + node = rb_first(root); >> + } >> + >> + xfree(r); >> +} >> + >> +struct rb_rangeset *rb_rangeset_new() >> +{ >> + struct rb_rangeset *r; >> + >> + r = xmalloc(struct rb_rangeset); >> + if ( r == NULL ) >> + return NULL; >> + >> + r->rbroot = RB_ROOT; >> + return r; >> +} >> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm- >> x86/hvm/domain.h >> index ad68fcf..a2f60a8 100644 >> --- a/xen/include/asm-x86/hvm/domain.h >> +++ b/xen/include/asm-x86/hvm/domain.h >> @@ -49,7 +49,7 @@ struct hvm_ioreq_vcpu { >> }; >> >> #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1) >> -#define MAX_NR_IO_RANGES 256 >> +#define MAX_NR_IO_RANGES 8192 > > I this limit enough? I think having something that's toolstack-tunable would be more future-proof. > By now, this value would suffice. I'd prefer this as a default value. As you know, I have also considered a xl param to do so. And one question is that, would a per-domain param appropriate? I mean, each hvm can have multiple ioreq servers. If this is acceptible, I can cook another patch to do so, is this OK? Thanks Yu > Paul > >> >> struct hvm_ioreq_server { >> struct list_head list_entry; >> @@ -68,7 +68,7 @@ struct hvm_ioreq_server { >> /* Lock to serialize access to buffered ioreq ring */ >> spinlock_t bufioreq_lock; >> evtchn_port_t bufioreq_evtchn; >> - struct rangeset *range[NR_IO_RANGE_TYPES]; >> + struct rb_rangeset *range[NR_IO_RANGE_TYPES]; >> bool_t enabled; >> bool_t bufioreq_atomic; >> }; >> diff --git a/xen/include/xen/rb_rangeset.h >> b/xen/include/xen/rb_rangeset.h >> new file mode 100644 >> index 0000000..768230c >> --- /dev/null >> +++ b/xen/include/xen/rb_rangeset.h >> @@ -0,0 +1,45 @@ >> +/* >> + Red-black tree based rangeset >> + >> + This program is free software; you can redistribute it and/or modify >> + it under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 2 of the License, or >> + (at your option) any later version. >> + >> + This program is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program; if not, write to the Free Software >> + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA >> +*/ >> + >> +#ifndef __RB_RANGESET_H__ >> +#define __RB_RANGESET_H__ >> + >> +#include >> + >> +struct rb_rangeset { >> + long nr_ranges; >> + struct rb_root rbroot; >> +}; >> + >> +struct rb_range { >> + struct rb_node node; >> + unsigned long s, e; >> +}; >> + >> +struct rb_rangeset *rb_rangeset_new(void); >> +void rb_rangeset_destroy(struct rb_rangeset *r); >> +bool_t rb_rangeset_overlaps_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e); >> +bool_t rb_rangeset_contains_range( >> + struct rb_rangeset *r, unsigned long s, unsigned long e); >> +int rb_rangeset_add_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e); >> +int rb_rangeset_remove_range(struct rb_rangeset *r, >> + unsigned long s, unsigned long e); >> + >> +#endif /* __RB_RANGESET_H__ */ >> -- >> 1.9.1 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > >