[PATCH v4 0/8] Support for running secondary emulators

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/8] Support for running secondary emulators
@ 2014-04-02 15:11 Paul Durrant
  2014-04-02 15:11 ` [PATCH v4 1/8] ioreq-server: pre-series tidy up Paul Durrant
                   ` (7 more replies)
  0 siblings, 8 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah4'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.

The series has been re-worked since v3. The modifications are now broken
down as follows:

Patch #1 is a pre-series tidy-up. No semantic change.

Patch #2 moves some code around to centralize use of the ioreq_t data
structure.

Patch #3 introduces the new hvm_ioreq_server structure.

Patch #4 defers creation of the ioreq server until something actually
reads one of the HVM parameters concerned with emulation.

Patch #5 makes the single ioreq server of previous patches into the
default ioreq server and introduces an API for creating secondary servers.

Patch #6 adds an enable/disable operation to the API for secondary servers
which makes sure that they cannot be active whilst their shared pages are
present in the guest's P2M.

Patch #7 adds makes handling bufferd ioreqs optional for secondary servers.
This saves a page of memory.

Patch #8 pulls the PCI hotplug controller emulation into Xen. This is
necessary to allow a secondary emulator to hotplug a PCI device into the VM.
The code implements the controller in the same way as upstream QEMU and thus
the variant of the DSDT ASL used for upstream QEMU is retained.

The demo emulator can simply be invoked from a shell and will hotplug its
device onto the PCI bus (and remove it again when it's killed). The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be demo :-)

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

v2:
 - First non-RFC posting

v3:
 - Addressed comments from Jan Beulich

v4:
 - Addressed comments from Ian Campbell and George Dunlap
 - Series heavily re-worked, 2 patches added

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 1/8] ioreq-server: pre-series tidy up
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-07 10:48   ` Jan Beulich
  2014-04-02 15:11 ` [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

This patch tidies up various parts of the code that following patches move
around. If these modifications were combined with the code motion it would
be easy to miss them.

There's also some function renaming to reflect purpose and a single
whitespace fix.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c            |   24 ++++++++++++------------
 xen/arch/x86/hvm/io.c             |   36 +++++++++++++++---------------------
 xen/arch/x86/hvm/stdvga.c         |    2 +-
 xen/include/asm-x86/hvm/io.h      |    2 +-
 xen/include/asm-x86/hvm/support.h |    2 ++
 5 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5e89cf5..69d0a44 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -354,7 +354,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p;
+    ioreq_t *p = get_ioreq(v);
 
     check_wakeup_from_wait();
 
@@ -362,7 +362,7 @@ void hvm_do_resume(struct vcpu *v)
         pt_restore_timer(v);
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         goto check_inject_trap;
 
     while ( p->state != STATE_IOREQ_NONE )
@@ -415,7 +415,7 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
+static void hvm_unmap_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
     spin_lock(&iorp->lock);
@@ -471,7 +471,7 @@ int prepare_ring_for_helper(
     return 0;
 }
 
-static int hvm_set_ioreq_page(
+static int hvm_map_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
 {
     struct page_info *page;
@@ -485,7 +485,7 @@ static int hvm_set_ioreq_page(
 
     if ( (iorp->va != NULL) || d->is_dying )
     {
-        destroy_ring_for_helper(&iorp->va, iorp->page);
+        destroy_ring_for_helper(&va, page);
         spin_unlock(&iorp->lock);
         return -EINVAL;
     }
@@ -641,8 +641,8 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
+    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
 
     msixtbl_pt_cleanup(d);
 
@@ -1416,12 +1416,12 @@ void hvm_vcpu_down(struct vcpu *v)
 
 bool_t hvm_send_assist_req(struct vcpu *v)
 {
-    ioreq_t *p;
+    ioreq_t *p = get_ioreq(v);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    if ( !p )
         return 0;
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
@@ -4118,7 +4118,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             {
             case HVM_PARAM_IOREQ_PFN:
                 iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
+                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
                     break;
                 spin_lock(&iorp->lock);
                 if ( iorp->va != NULL )
@@ -4127,9 +4127,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                         get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
                 spin_unlock(&iorp->lock);
                 break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
+            case HVM_PARAM_BUFIOREQ_PFN:
                 iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_map_ioreq_page(d, iorp, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..5ba38d2 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,10 +46,9 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
+int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 {
-    struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
     buffered_iopage_t *pg = iorp->va;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
@@ -104,22 +103,20 @@ int hvm_buffered_io_send(ioreq_t *p)
         return 0;
     }
     
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
     
     if ( qw )
     {
         bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
     }
 
     /* Make the ioreq_t visible /before/ write_pointer. */
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d,
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
     spin_unlock(&iorp->lock);
     
     return 1;
@@ -127,22 +124,19 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 void send_timeoffset_req(unsigned long timeoff)
 {
-    ioreq_t p[1];
+    ioreq_t p = {
+        .type = IOREQ_TYPE_TIMEOFFSET,
+        .size = 8,
+        .count = 1,
+        .dir = IOREQ_WRITE,
+        .data = timeoff,
+        .state = STATE_IOREQ_READY,
+    };
 
     if ( timeoff == 0 )
         return;
 
-    memset(p, 0, sizeof(*p));
-
-    p->type = IOREQ_TYPE_TIMEOFFSET;
-    p->size = 8;
-    p->count = 1;
-    p->dir = IOREQ_WRITE;
-    p->data = timeoff;
-
-    p->state = STATE_IOREQ_READY;
-
-    if ( !hvm_buffered_io_send(p) )
+    if ( !hvm_buffered_io_send(current->domain, &p) )
         printk("Unsuccessful timeoffset update\n");
 }
 
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index 19e80ed..9e2d28e 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -580,7 +580,7 @@ static int stdvga_intercept_mmio(ioreq_t *p)
         buf = (p->dir == IOREQ_WRITE);
     }
 
-    rc = (buf && hvm_buffered_io_send(p));
+    rc = (buf && hvm_buffered_io_send(d, p));
 
     spin_unlock(&s->lock);
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 86db58d..bfd28c2 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -92,7 +92,7 @@ static inline int hvm_buffered_io_intercept(ioreq_t *p)
 }
 
 int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(ioreq_t *p);
+int hvm_buffered_io_send(struct domain *d, const ioreq_t *p);
 
 static inline void register_portio_handler(
     struct domain *d, unsigned long addr,
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..1dc2f2d 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -31,7 +31,9 @@ static inline ioreq_t *get_ioreq(struct vcpu *v)
 {
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
     return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/8] ioreq-server: pre-series tidy up
  2014-04-02 15:11 ` [PATCH v4 1/8] ioreq-server: pre-series tidy up Paul Durrant
@ 2014-04-07 10:48   ` Jan Beulich
  2014-04-08  9:13     ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-07 10:48 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> @@ -485,7 +485,7 @@ static int hvm_set_ioreq_page(
>  
>      if ( (iorp->va != NULL) || d->is_dying )
>      {
> -        destroy_ring_for_helper(&iorp->va, iorp->page);
> +        destroy_ring_for_helper(&va, page);

This clearly isn't just tidying: The bug fix should at least be mentioned
in the description, but for the purposes of backporting it should
probably be submitted as a separate patch.

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -46,10 +46,9 @@
>  #include <xen/iocap.h>
>  #include <public/hvm/ioreq.h>
>  
> -int hvm_buffered_io_send(ioreq_t *p)
> +int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
>  {
> -    struct vcpu *v = current;
> -    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
> +    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;

This isn't a purely cosmetic change either, especially without an
ASSERT(current->domain == d). It looks to be correct with one minor
exception: There's a gdprintk() in this function, which - if you don't
expect the function to be called for the current domain only - needs to
be altered to not falsely print the current vCPU as subject anymore.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/8] ioreq-server: pre-series tidy up
  2014-04-07 10:48   ` Jan Beulich
@ 2014-04-08  9:13     ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  9:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 April 2014 11:49
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v4 1/8] ioreq-server: pre-series tidy up
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > @@ -485,7 +485,7 @@ static int hvm_set_ioreq_page(
> >
> >      if ( (iorp->va != NULL) || d->is_dying )
> >      {
> > -        destroy_ring_for_helper(&iorp->va, iorp->page);
> > +        destroy_ring_for_helper(&va, page);
> 
> This clearly isn't just tidying: The bug fix should at least be mentioned
> in the description, but for the purposes of backporting it should
> probably be submitted as a separate patch.

Ok. I'll separate it.

> 
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -46,10 +46,9 @@
> >  #include <xen/iocap.h>
> >  #include <public/hvm/ioreq.h>
> >
> > -int hvm_buffered_io_send(ioreq_t *p)
> > +int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
> >  {
> > -    struct vcpu *v = current;
> > -    struct hvm_ioreq_page *iorp = &v->domain-
> >arch.hvm_domain.buf_ioreq;
> > +    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
> 
> This isn't a purely cosmetic change either, especially without an
> ASSERT(current->domain == d). It looks to be correct with one minor
> exception: There's a gdprintk() in this function, which - if you don't
> expect the function to be called for the current domain only - needs to
> be altered to not falsely print the current vCPU as subject anymore.
> 

It will only by called for the current domain - I was just making it more analogous with hvm_send_assist_req() which takes a vcpu as an argument even though, in practice, it's always current. Perhaps I should tidy away that argument rather than introducing a new one here.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
  2014-04-02 15:11 ` [PATCH v4 1/8] ioreq-server: pre-series tidy up Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-03 11:22   ` George Dunlap
  2014-04-07 11:10   ` Jan Beulich
  2014-04-02 15:11 ` [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Eddie Dong, Paul Durrant,
	Jun Nakajima

To simplify creation of the ioreq server abstraction in a subsequent patch,
this patch centralizes all use of the shared ioreq structure and the
buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.

The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume()
because the former may now be passed a data structure on stack, in which
case the barrier is unnecessary.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Eddie Dong <eddie.dong@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/emulate.c        |   70 +++++++++-------------
 xen/arch/x86/hvm/hvm.c            |  118 +++++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/io.c             |  104 +++-----------------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |   13 +++-
 xen/include/asm-x86/hvm/hvm.h     |   15 ++++-
 xen/include/asm-x86/hvm/support.h |   21 ++++---
 6 files changed, 185 insertions(+), 156 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..1c71902 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,15 +160,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state =
         (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -193,38 +171,38 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
-    p->dir = dir;
-    p->data_is_ptr = value_is_ptr;
-    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
-    p->size = size;
-    p->addr = addr;
-    p->count = *reps;
-    p->df = df;
-    p->data = value;
+    p.dir = dir;
+    p.data_is_ptr = value_is_ptr;
+    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
+    p.size = size;
+    p.addr = addr;
+    p.count = *reps;
+    p.df = df;
+    p.data = value;
 
     if ( dir == IOREQ_WRITE )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( is_mmio )
     {
-        rc = hvm_mmio_intercept(p);
+        rc = hvm_mmio_intercept(&p);
         if ( rc == X86EMUL_UNHANDLEABLE )
-            rc = hvm_buffered_io_intercept(p);
+            rc = hvm_buffered_io_intercept(&p);
     }
     else
     {
-        rc = hvm_portio_intercept(p);
+        rc = hvm_portio_intercept(&p);
     }
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
     case X86EMUL_RETRY:
-        *reps = p->count;
-        p->state = STATE_IORESP_READY;
+        *reps = p.count;
+        p.state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist(p);
+            hvm_io_assist(&p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -233,7 +211,7 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
         /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        if ( !hvm_has_dm(curr->domain) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
@@ -241,7 +219,7 @@ static int hvmemul_do_io(
         else
         {
             rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
+            if ( !hvm_send_assist_req(curr, &p) )
                 vio->io_state = HVMIO_none;
             else if ( p_data == NULL )
                 rc = X86EMUL_OKAY;
@@ -260,7 +238,7 @@ static int hvmemul_do_io(
 
  finish_access:
     if ( dir == IOREQ_READ )
-        hvmtrace_io_assist(is_mmio, p);
+        hvmtrace_io_assist(is_mmio, &p);
 
     if ( p_data != NULL )
         memcpy(p_data, &vio->io_data, size);
@@ -1292,3 +1270,13 @@ struct segment_register *hvmemul_get_seg_reg(
         hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
     return &hvmemul_ctxt->seg_reg[seg];
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 69d0a44..573f845 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -352,6 +352,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
+bool_t hvm_io_pending(struct vcpu *v)
+{
+    ioreq_t *p = get_ioreq(v);
+
+    if ( !p )
+        return 0;
+
+    return ( p->state != STATE_IOREQ_NONE );
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p = get_ioreq(v);
@@ -370,11 +390,12 @@ void hvm_do_resume(struct vcpu *v)
         switch ( p->state )
         {
         case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
             hvm_io_assist(p);
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -1414,7 +1435,87 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
+{
+    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir  = p->dir;
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+
+    bp.data = p->data;
+    bp.addr = p->addr;
+
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+
+    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+
+    return 1;
+}
+
+bool_t hvm_has_dm(struct domain *d)
+{
+    return !!d->arch.hvm_domain.ioreq.va;
+}
+
+bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
 {
     ioreq_t *p = get_ioreq(v);
 
@@ -1432,14 +1533,23 @@ bool_t hvm_send_assist_req(struct vcpu *v)
         return 0;
     }
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    p->dir = proto_p->dir;
+    p->data_is_ptr = proto_p->data_is_ptr;
+    p->type = proto_p->type;
+    p->size = proto_p->size;
+    p->addr = proto_p->addr;
+    p->count = proto_p->count;
+    p->df = proto_p->df;
+    p->data = proto_p->data;
+
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(v->domain, p->vp_eport);
 
     return 1;
 }
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 5ba38d2..8db300d 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,82 +46,6 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
-{
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    bp.type = p->type;
-    bp.dir  = p->dir;
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(d,
-            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p = {
@@ -143,26 +67,14 @@ void send_timeoffset_req(unsigned long timeoff)
 /* Ask ioemu mapcache to invalidate mappings. */
 void send_invalidate_req(void)
 {
-    struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
-
-    p->type = IOREQ_TYPE_INVALIDATE;
-    p->size = 4;
-    p->dir = IOREQ_WRITE;
-    p->data = ~0UL; /* flush all */
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req(current, &p);
 }
 
 int handle_mmio(void)
@@ -265,8 +177,6 @@ void hvm_io_assist(ioreq_t *p)
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     enum hvm_io_state io_state;
 
-    rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-
     p->state = STATE_IOREQ_NONE;
 
     io_state = vio->io_state;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 40167d6..0421623 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-    const ioreq_t *ioreq = get_ioreq(v);
 
     /*
      * A pending IO emulation may still be not finished. In this case, no
@@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
+    if ( hvm_io_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
@@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     /* nvcpu.guest_cr is what L2 write to cr actually. */
     __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dcc3483..08a62ea 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -227,7 +228,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
@@ -339,6 +340,8 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx);
 void hvm_migrate_timers(struct vcpu *v);
+bool_t hvm_has_dm(struct domain *d);
+bool_t hvm_io_pending(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
@@ -522,3 +525,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_HVM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 1dc2f2d..05ef5c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,21 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
@@ -144,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
 int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
 
 #endif /* __ASM_X86_HVM_SUPPORT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures
  2014-04-02 15:11 ` [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-04-03 11:22   ` George Dunlap
  2014-04-07 11:10   ` Jan Beulich
  1 sibling, 0 replies; 62+ messages in thread
From: George Dunlap @ 2014-04-03 11:22 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Eddie Dong,
	xen-devel@lists.xen.org, Jun Nakajima

On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> To simplify creation of the ioreq server abstraction in a subsequent patch,
> this patch centralizes all use of the shared ioreq structure and the
> buffered ioreq ring to the source module xen/arch/x86/hvm/hvm.c.
>
> The patch moves an rmb() from inside hvm_io_assist() to hvm_do_resume()
> because the former may now be passed a data structure on stack, in which
> case the barrier is unnecessary.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Eddie Dong <eddie.dong@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>

> ---
>  xen/arch/x86/hvm/emulate.c        |   70 +++++++++-------------
>  xen/arch/x86/hvm/hvm.c            |  118 +++++++++++++++++++++++++++++++++++--
>  xen/arch/x86/hvm/io.c             |  104 +++-----------------------------
>  xen/arch/x86/hvm/vmx/vvmx.c       |   13 +++-
>  xen/include/asm-x86/hvm/hvm.h     |   15 ++++-
>  xen/include/asm-x86/hvm/support.h |   21 ++++---
>  6 files changed, 185 insertions(+), 156 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 868aa1d..1c71902 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -57,24 +57,11 @@ static int hvmemul_do_io(
>      int value_is_ptr = (p_data == NULL);
>      struct vcpu *curr = current;
>      struct hvm_vcpu_io *vio;
> -    ioreq_t *p = get_ioreq(curr);
> -    ioreq_t _ioreq;
> +    ioreq_t p;
>      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
>      p2m_type_t p2mt;
>      struct page_info *ram_page;
>      int rc;
> -    bool_t has_dm = 1;
> -
> -    /*
> -     * Domains without a backing DM, don't have an ioreq page.  Just
> -     * point to a struct on the stack, initialising the state as needed.
> -     */
> -    if ( !p )
> -    {
> -        has_dm = 0;
> -        p = &_ioreq;
> -        p->state = STATE_IOREQ_NONE;
> -    }
>
>      /* Check for paged out page */
>      ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> @@ -173,15 +160,6 @@ static int hvmemul_do_io(
>          return X86EMUL_UNHANDLEABLE;
>      }
>
> -    if ( p->state != STATE_IOREQ_NONE )
> -    {
> -        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
> -                 p->state);
> -        if ( ram_page )
> -            put_page(ram_page);
> -        return X86EMUL_UNHANDLEABLE;
> -    }
> -
>      vio->io_state =
>          (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
>      vio->io_size = size;
> @@ -193,38 +171,38 @@ static int hvmemul_do_io(
>      if ( vio->mmio_retrying )
>          *reps = 1;
>
> -    p->dir = dir;
> -    p->data_is_ptr = value_is_ptr;
> -    p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> -    p->size = size;
> -    p->addr = addr;
> -    p->count = *reps;
> -    p->df = df;
> -    p->data = value;
> +    p.dir = dir;
> +    p.data_is_ptr = value_is_ptr;
> +    p.type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> +    p.size = size;
> +    p.addr = addr;
> +    p.count = *reps;
> +    p.df = df;
> +    p.data = value;
>
>      if ( dir == IOREQ_WRITE )
> -        hvmtrace_io_assist(is_mmio, p);
> +        hvmtrace_io_assist(is_mmio, &p);
>
>      if ( is_mmio )
>      {
> -        rc = hvm_mmio_intercept(p);
> +        rc = hvm_mmio_intercept(&p);
>          if ( rc == X86EMUL_UNHANDLEABLE )
> -            rc = hvm_buffered_io_intercept(p);
> +            rc = hvm_buffered_io_intercept(&p);
>      }
>      else
>      {
> -        rc = hvm_portio_intercept(p);
> +        rc = hvm_portio_intercept(&p);
>      }
>
>      switch ( rc )
>      {
>      case X86EMUL_OKAY:
>      case X86EMUL_RETRY:
> -        *reps = p->count;
> -        p->state = STATE_IORESP_READY;
> +        *reps = p.count;
> +        p.state = STATE_IORESP_READY;
>          if ( !vio->mmio_retry )
>          {
> -            hvm_io_assist(p);
> +            hvm_io_assist(&p);
>              vio->io_state = HVMIO_none;
>          }
>          else
> @@ -233,7 +211,7 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>          /* If there is no backing DM, just ignore accesses */
> -        if ( !has_dm )
> +        if ( !hvm_has_dm(curr->domain) )
>          {
>              rc = X86EMUL_OKAY;
>              vio->io_state = HVMIO_none;
> @@ -241,7 +219,7 @@ static int hvmemul_do_io(
>          else
>          {
>              rc = X86EMUL_RETRY;
> -            if ( !hvm_send_assist_req(curr) )
> +            if ( !hvm_send_assist_req(curr, &p) )
>                  vio->io_state = HVMIO_none;
>              else if ( p_data == NULL )
>                  rc = X86EMUL_OKAY;
> @@ -260,7 +238,7 @@ static int hvmemul_do_io(
>
>   finish_access:
>      if ( dir == IOREQ_READ )
> -        hvmtrace_io_assist(is_mmio, p);
> +        hvmtrace_io_assist(is_mmio, &p);
>
>      if ( p_data != NULL )
>          memcpy(p_data, &vio->io_data, size);
> @@ -1292,3 +1270,13 @@ struct segment_register *hvmemul_get_seg_reg(
>          hvm_get_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
>      return &hvmemul_ctxt->seg_reg[seg];
>  }
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 69d0a44..573f845 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -352,6 +352,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>
> +static ioreq_t *get_ioreq(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> +
> +    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> +
> +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> +}
> +
> +bool_t hvm_io_pending(struct vcpu *v)
> +{
> +    ioreq_t *p = get_ioreq(v);
> +
> +    if ( !p )
> +        return 0;
> +
> +    return ( p->state != STATE_IOREQ_NONE );
> +}
> +
>  void hvm_do_resume(struct vcpu *v)
>  {
>      ioreq_t *p = get_ioreq(v);
> @@ -370,11 +390,12 @@ void hvm_do_resume(struct vcpu *v)
>          switch ( p->state )
>          {
>          case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> +            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
>              hvm_io_assist(p);
>              break;
>          case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
>          case STATE_IOREQ_INPROCESS:
> -            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
> +            wait_on_xen_event_channel(p->vp_eport,
>                                        (p->state != STATE_IOREQ_READY) &&
>                                        (p->state != STATE_IOREQ_INPROCESS));
>              break;
> @@ -1414,7 +1435,87 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>
> -bool_t hvm_send_assist_req(struct vcpu *v)
> +int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
> +{
> +    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
> +    buffered_iopage_t *pg = iorp->va;
> +    buf_ioreq_t bp;
> +    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
> +    int qw = 0;
> +
> +    /* Ensure buffered_iopage fits in a page */
> +    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> +
> +    /*
> +     * Return 0 for the cases we can't deal with:
> +     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> +     *  - we cannot buffer accesses to guest memory buffers, as the guest
> +     *    may expect the memory buffer to be synchronously accessed
> +     *  - the count field is usually used with data_is_ptr and since we don't
> +     *    support data_is_ptr we do not waste space for the count field either
> +     */
> +    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> +        return 0;
> +
> +    bp.type = p->type;
> +    bp.dir  = p->dir;
> +    switch ( p->size )
> +    {
> +    case 1:
> +        bp.size = 0;
> +        break;
> +    case 2:
> +        bp.size = 1;
> +        break;
> +    case 4:
> +        bp.size = 2;
> +        break;
> +    case 8:
> +        bp.size = 3;
> +        qw = 1;
> +        break;
> +    default:
> +        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
> +        return 0;
> +    }
> +
> +    bp.data = p->data;
> +    bp.addr = p->addr;
> +
> +    spin_lock(&iorp->lock);
> +
> +    if ( (pg->write_pointer - pg->read_pointer) >=
> +         (IOREQ_BUFFER_SLOT_NUM - qw) )
> +    {
> +        /* The queue is full: send the iopacket through the normal path. */
> +        spin_unlock(&iorp->lock);
> +        return 0;
> +    }
> +
> +    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
> +
> +    if ( qw )
> +    {
> +        bp.data = p->data >> 32;
> +        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
> +    }
> +
> +    /* Make the ioreq_t visible /before/ write_pointer. */
> +    wmb();
> +    pg->write_pointer += qw ? 2 : 1;
> +
> +    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> +    spin_unlock(&iorp->lock);
> +
> +    return 1;
> +}
> +
> +bool_t hvm_has_dm(struct domain *d)
> +{
> +    return !!d->arch.hvm_domain.ioreq.va;
> +}
> +
> +bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
>  {
>      ioreq_t *p = get_ioreq(v);
>
> @@ -1432,14 +1533,23 @@ bool_t hvm_send_assist_req(struct vcpu *v)
>          return 0;
>      }
>
> -    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> +    p->dir = proto_p->dir;
> +    p->data_is_ptr = proto_p->data_is_ptr;
> +    p->type = proto_p->type;
> +    p->size = proto_p->size;
> +    p->addr = proto_p->addr;
> +    p->count = proto_p->count;
> +    p->df = proto_p->df;
> +    p->data = proto_p->data;
> +
> +    prepare_wait_on_xen_event_channel(p->vp_eport);
>
>      /*
>       * Following happens /after/ blocking and setting up ioreq contents.
>       * prepare_wait_on_xen_event_channel() is an implicit barrier.
>       */
>      p->state = STATE_IOREQ_READY;
> -    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
> +    notify_via_xen_event_channel(v->domain, p->vp_eport);
>
>      return 1;
>  }
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 5ba38d2..8db300d 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -46,82 +46,6 @@
>  #include <xen/iocap.h>
>  #include <public/hvm/ioreq.h>
>
> -int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
> -{
> -    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
> -    buffered_iopage_t *pg = iorp->va;
> -    buf_ioreq_t bp;
> -    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
> -    int qw = 0;
> -
> -    /* Ensure buffered_iopage fits in a page */
> -    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> -
> -    /*
> -     * Return 0 for the cases we can't deal with:
> -     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> -     *  - we cannot buffer accesses to guest memory buffers, as the guest
> -     *    may expect the memory buffer to be synchronously accessed
> -     *  - the count field is usually used with data_is_ptr and since we don't
> -     *    support data_is_ptr we do not waste space for the count field either
> -     */
> -    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> -        return 0;
> -
> -    bp.type = p->type;
> -    bp.dir  = p->dir;
> -    switch ( p->size )
> -    {
> -    case 1:
> -        bp.size = 0;
> -        break;
> -    case 2:
> -        bp.size = 1;
> -        break;
> -    case 4:
> -        bp.size = 2;
> -        break;
> -    case 8:
> -        bp.size = 3;
> -        qw = 1;
> -        break;
> -    default:
> -        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
> -        return 0;
> -    }
> -
> -    bp.data = p->data;
> -    bp.addr = p->addr;
> -
> -    spin_lock(&iorp->lock);
> -
> -    if ( (pg->write_pointer - pg->read_pointer) >=
> -         (IOREQ_BUFFER_SLOT_NUM - qw) )
> -    {
> -        /* The queue is full: send the iopacket through the normal path. */
> -        spin_unlock(&iorp->lock);
> -        return 0;
> -    }
> -
> -    pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
> -
> -    if ( qw )
> -    {
> -        bp.data = p->data >> 32;
> -        pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
> -    }
> -
> -    /* Make the ioreq_t visible /before/ write_pointer. */
> -    wmb();
> -    pg->write_pointer += qw ? 2 : 1;
> -
> -    notify_via_xen_event_channel(d,
> -            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> -    spin_unlock(&iorp->lock);
> -
> -    return 1;
> -}
> -
>  void send_timeoffset_req(unsigned long timeoff)
>  {
>      ioreq_t p = {
> @@ -143,26 +67,14 @@ void send_timeoffset_req(unsigned long timeoff)
>  /* Ask ioemu mapcache to invalidate mappings. */
>  void send_invalidate_req(void)
>  {
> -    struct vcpu *v = current;
> -    ioreq_t *p = get_ioreq(v);
> -
> -    if ( !p )
> -        return;
> -
> -    if ( p->state != STATE_IOREQ_NONE )
> -    {
> -        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
> -                 "already pending (%d)?\n", p->state);
> -        domain_crash(v->domain);
> -        return;
> -    }
> -
> -    p->type = IOREQ_TYPE_INVALIDATE;
> -    p->size = 4;
> -    p->dir = IOREQ_WRITE;
> -    p->data = ~0UL; /* flush all */
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_INVALIDATE,
> +        .size = 4,
> +        .dir = IOREQ_WRITE,
> +        .data = ~0UL, /* flush all */
> +    };
>
> -    (void)hvm_send_assist_req(v);
> +    (void)hvm_send_assist_req(current, &p);
>  }
>
>  int handle_mmio(void)
> @@ -265,8 +177,6 @@ void hvm_io_assist(ioreq_t *p)
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      enum hvm_io_state io_state;
>
> -    rmb(); /* see IORESP_READY /then/ read contents of ioreq */
> -
>      p->state = STATE_IOREQ_NONE;
>
>      io_state = vio->io_state;
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 40167d6..0421623 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1394,7 +1394,6 @@ void nvmx_switch_guest(void)
>      struct vcpu *v = current;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      struct cpu_user_regs *regs = guest_cpu_user_regs();
> -    const ioreq_t *ioreq = get_ioreq(v);
>
>      /*
>       * A pending IO emulation may still be not finished. In this case, no
> @@ -1404,7 +1403,7 @@ void nvmx_switch_guest(void)
>       * don't want to continue as this setup is not implemented nor supported
>       * as of right now.
>       */
> -    if ( !ioreq || ioreq->state != STATE_IOREQ_NONE )
> +    if ( hvm_io_pending(v) )
>          return;
>      /*
>       * a softirq may interrupt us between a virtual vmentry is
> @@ -2522,3 +2521,13 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
>      /* nvcpu.guest_cr is what L2 write to cr actually. */
>      __vmwrite(read_shadow_field, v->arch.hvm_vcpu.nvcpu.guest_cr[cr]);
>  }
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index dcc3483..08a62ea 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -26,6 +26,7 @@
>  #include <asm/hvm/asid.h>
>  #include <public/domctl.h>
>  #include <public/hvm/save.h>
> +#include <public/hvm/ioreq.h>
>  #include <asm/mm.h>
>
>  /* Interrupt acknowledgement sources. */
> @@ -227,7 +228,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
>                              struct page_info **_page, void **_va);
>  void destroy_ring_for_helper(void **_va, struct page_info *page);
>
> -bool_t hvm_send_assist_req(struct vcpu *v);
> +bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
>
>  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
>  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> @@ -339,6 +340,8 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>                                     unsigned int *ecx, unsigned int *edx);
>  void hvm_migrate_timers(struct vcpu *v);
> +bool_t hvm_has_dm(struct domain *d);
> +bool_t hvm_io_pending(struct vcpu *v);
>  void hvm_do_resume(struct vcpu *v);
>  void hvm_migrate_pirqs(struct vcpu *v);
>
> @@ -522,3 +525,13 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
>  enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
>
>  #endif /* __ASM_X86_HVM_HVM_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
> index 1dc2f2d..05ef5c5 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -22,21 +22,10 @@
>  #define __ASM_X86_HVM_SUPPORT_H__
>
>  #include <xen/types.h>
> -#include <public/hvm/ioreq.h>
>  #include <xen/sched.h>
>  #include <xen/hvm/save.h>
>  #include <asm/processor.h>
>
> -static inline ioreq_t *get_ioreq(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> -
> -    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> -
> -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> -}
> -
>  #define HVM_DELIVER_NO_ERROR_CODE  -1
>
>  #ifndef NDEBUG
> @@ -144,3 +133,13 @@ int hvm_mov_to_cr(unsigned int cr, unsigned int gpr);
>  int hvm_mov_from_cr(unsigned int cr, unsigned int gpr);
>
>  #endif /* __ASM_X86_HVM_SUPPORT_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 1.7.10.4
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures
  2014-04-02 15:11 ` [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures Paul Durrant
  2014-04-03 11:22   ` George Dunlap
@ 2014-04-07 11:10   ` Jan Beulich
  2014-04-08  9:18     ` Paul Durrant
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-07 11:10 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, Kevin Tian, EddieDong, Jun Nakajima, xen-devel

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> @@ -173,15 +160,6 @@ static int hvmemul_do_io(
>          return X86EMUL_UNHANDLEABLE;
>      }
>  
> -    if ( p->state != STATE_IOREQ_NONE )
> -    {
> -        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
> -                 p->state);
> -        if ( ram_page )
> -            put_page(ram_page);
> -        return X86EMUL_UNHANDLEABLE;
> -    }
> -

Shouldn't this be replaced with a call to hvm_io_pending() instead of
just getting deleted?

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -352,6 +352,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>  
> +static ioreq_t *get_ioreq(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> +
> +    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> +
> +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> +}
> +
> +bool_t hvm_io_pending(struct vcpu *v)
> +{
> +    ioreq_t *p = get_ioreq(v);
> +
> +    if ( !p )
> +        return 0;
> +
> +    return ( p->state != STATE_IOREQ_NONE );

The parentheses are pointless but a matter of taste (but then again
using them here makes little sense if you don't also use them around
the conditional expression in the function right above), but the blanks
inside them clearly don't belong there according to our coding style.

> @@ -1432,14 +1533,23 @@ bool_t hvm_send_assist_req(struct vcpu *v)
>          return 0;
>      }
>  
> -    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> +    p->dir = proto_p->dir;
> +    p->data_is_ptr = proto_p->data_is_ptr;
> +    p->type = proto_p->type;
> +    p->size = proto_p->size;
> +    p->addr = proto_p->addr;
> +    p->count = proto_p->count;
> +    p->df = proto_p->df;
> +    p->data = proto_p->data;

I realize that you do this piecemeal copying because of wanting to
leave alone ->state. If you didn't have the input pointer const, and
if no caller depended on that field having any specific contents (I'm
sure none of them cares), you could set the input structure's field
first to STATE_IOREQ_NONE and then copy the whole structure in
one go. And that all if the field's value is meaningful at all across the
call to prepare_wait_on_xen_event_channel(), as it gets set to
STATE_IOREQ_READY right afterwards.

Hmm, I see vp_eport also needs to be left untouched. I wonder
whether there shouldn't be a sub-structure with all the fields set
above, and with only that sub-structure getting passed around
after setting up on-stack.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures
  2014-04-07 11:10   ` Jan Beulich
@ 2014-04-08  9:18     ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  9:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir (Xen.org), Kevin Tian, Eddie Dong, Jun Nakajima,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 April 2014 12:10
> To: Paul Durrant
> Cc: Eddie Dong; Jun Nakajima; Kevin Tian; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: Re: [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > @@ -173,15 +160,6 @@ static int hvmemul_do_io(
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > -    if ( p->state != STATE_IOREQ_NONE )
> > -    {
> > -        gdprintk(XENLOG_WARNING, "WARNING: io already pending
> (%d)?\n",
> > -                 p->state);
> > -        if ( ram_page )
> > -            put_page(ram_page);
> > -        return X86EMUL_UNHANDLEABLE;
> > -    }
> > -
> 
> Shouldn't this be replaced with a call to hvm_io_pending() instead of
> just getting deleted?

Yes, that's probably better.

> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -352,6 +352,26 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >      spin_unlock(&d->event_lock);
> >  }
> >
> > +static ioreq_t *get_ioreq(struct vcpu *v)
> > +{
> > +    struct domain *d = v->domain;
> > +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > +
> > +    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > +
> > +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > +}
> > +
> > +bool_t hvm_io_pending(struct vcpu *v)
> > +{
> > +    ioreq_t *p = get_ioreq(v);
> > +
> > +    if ( !p )
> > +        return 0;
> > +
> > +    return ( p->state != STATE_IOREQ_NONE );
> 
> The parentheses are pointless but a matter of taste (but then again
> using them here makes little sense if you don't also use them around
> the conditional expression in the function right above), but the blanks
> inside them clearly don't belong there according to our coding style.
> 

Ok.

> > @@ -1432,14 +1533,23 @@ bool_t hvm_send_assist_req(struct vcpu *v)
> >          return 0;
> >      }
> >
> > -    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> > +    p->dir = proto_p->dir;
> > +    p->data_is_ptr = proto_p->data_is_ptr;
> > +    p->type = proto_p->type;
> > +    p->size = proto_p->size;
> > +    p->addr = proto_p->addr;
> > +    p->count = proto_p->count;
> > +    p->df = proto_p->df;
> > +    p->data = proto_p->data;
> 
> I realize that you do this piecemeal copying because of wanting to
> leave alone ->state. If you didn't have the input pointer const, and
> if no caller depended on that field having any specific contents (I'm
> sure none of them cares), you could set the input structure's field
> first to STATE_IOREQ_NONE and then copy the whole structure in
> one go. And that all if the field's value is meaningful at all across the
> call to prepare_wait_on_xen_event_channel(), as it gets set to
> STATE_IOREQ_READY right afterwards.
> 
> Hmm, I see vp_eport also needs to be left untouched. I wonder
> whether there shouldn't be a sub-structure with all the fields set
> above, and with only that sub-structure getting passed around
> after setting up on-stack.
> 

The layout of ioreq_t is stable (as it's an ABI with QEMU) so I can't change it. I could introduce a new variant without vp_eport or state and change xen to use that internally, but that's going to be quite a big patch. Alternatively I could copy vp_eport aside here an assign the struct - which should lead to a smaller hunk, so I'll do that.

  Paul

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
  2014-04-02 15:11 ` [PATCH v4 1/8] ioreq-server: pre-series tidy up Paul Durrant
  2014-04-02 15:11 ` [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-03 14:49   ` George Dunlap
  2014-04-07 11:36   ` Jan Beulich
  2014-04-02 15:11 ` [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

Collect together data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server manipulation functions. The
lock in the hvm_ioreq_page served two different purposes and has been
replaced by separate locks in the hvm_ioreq_server.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  406 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |   35 +++-
 xen/include/asm-x86/hvm/vcpu.h   |   12 +-
 3 files changed, 322 insertions(+), 131 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 573f845..5f131c4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -352,39 +352,49 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    shared_iopage_t *p = s->ioreq.va;
 
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    /*
+     * Manipulation of the shared ioreq structure (to update the event
+     * channel) is protected by a domain_pause(). So this function should
+     * only ever be executed for the current vcpu or one that is paused.
+     */
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
 
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    return &p->vcpu_ioreq[v->vcpu_id];
 }
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
     return ( p->state != STATE_IOREQ_NONE );
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !p )
+    if ( !s )
         goto check_inject_trap;
 
+    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
+    p = get_ioreq(s, v);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -415,14 +425,6 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
-{
-    memset(iorp, 0, sizeof(*iorp));
-    spin_lock_init(&iorp->lock);
-    domain_pause(d);
-}
-
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -436,16 +438,9 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_unmap_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_unmap_ioreq_page(struct hvm_ioreq_page *iorp)
 {
-    spin_lock(&iorp->lock);
-
-    ASSERT(d->is_dying);
-
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -502,22 +497,15 @@ static int hvm_map_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
-
     if ( (iorp->va != NULL) || d->is_dying )
     {
         destroy_ring_for_helper(&va, page);
-        spin_unlock(&iorp->lock);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -561,8 +549,227 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    spin_lock(&s->lock);
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        struct domain *d = s->domain;
+
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+            s->bufioreq_evtchn;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v, sv->ioreq_evtchn);
+    
+ fail2:
+    xfree(sv);
+
+ fail1:
+    spin_unlock(&s->lock);
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct list_head *entry;
+
+    spin_lock(&s->lock);
+
+    list_for_each ( entry, &s->ioreq_vcpu_list )
+    {
+        struct hvm_ioreq_vcpu *sv = container_of(entry, 
+                                                 struct hvm_ioreq_vcpu, 
+                                                 list_entry);
+
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del_init(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+    s->domid = domid;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_unmap_ioreq_page(&s->bufioreq);
+    hvm_unmap_ioreq_page(&s->ioreq);
+
+    xfree(s);
+}
+
+static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
+                             unsigned long pfn)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    spin_lock(&s->lock);
+
+    rc = hvm_map_ioreq_page(d, iorp, pfn);
+    if ( rc )
+        goto fail;
+
+    if (!buf) {
+        struct list_head *entry;
+
+        list_for_each ( entry, &s->ioreq_vcpu_list )
+        {
+            struct hvm_ioreq_vcpu *sv = container_of(entry,
+                                                     struct hvm_ioreq_vcpu,
+                                                     list_entry);
+
+            hvm_update_ioreq_evtchn(s, sv);
+        }
+    }
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail:
+    spin_unlock(&s->lock);
+    return rc;
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     evtchn_port_t *p_port)
+{
+    evtchn_port_t old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_dm_domain(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    int rc = 0;
+
+    spin_lock(&s->lock);
+    domain_pause(d);
+
+    if ( s->domid != domid ) {
+        struct list_head *entry;
+
+        list_for_each ( entry, &s->ioreq_vcpu_list )
+        {
+            struct hvm_ioreq_vcpu *sv = container_of(entry,
+                                                     struct hvm_ioreq_vcpu,
+                                                     list_entry);
+            struct vcpu *v = sv->vcpu;
+
+            if ( v->vcpu_id == 0 ) {
+                rc = hvm_replace_event_channel(v, domid,
+                                               &s->bufioreq_evtchn);
+                if ( rc )
+                    break;
+
+                d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                    s->bufioreq_evtchn;
+            }
+
+            rc = hvm_replace_event_channel(v, domid, &sv->ioreq_evtchn);
+            if ( rc )
+                break;
+
+            hvm_update_ioreq_evtchn(s, sv);
+        }
+
+        s->domid = domid;
+    }
+
+    domain_unpause(d);
+    spin_unlock(&s->lock);
+
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
+    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -628,17 +835,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    rc = hvm_create_ioreq_server(d, domid);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -662,8 +873,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_unmap_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1296,7 +1506,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1339,30 +1549,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-    if ( rc < 0 )
+    rc = hvm_ioreq_server_add_vcpu(s, v);
+    if ( rc != 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1395,6 +1585,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1406,9 +1601,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1437,8 +1629,9 @@ void hvm_vcpu_down(struct vcpu *v)
 
 int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 {
-    struct hvm_ioreq_page *iorp = &d->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
@@ -1446,6 +1639,12 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    if ( !s )
+        return 0;
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1482,13 +1681,13 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     bp.data = p->data;
     bp.addr = p->addr;
 
-    spin_lock(&iorp->lock);
+    spin_lock(&s->bufioreq_lock);
 
     if ( (pg->write_pointer - pg->read_pointer) >=
          (IOREQ_BUFFER_SLOT_NUM - qw) )
     {
         /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
+        spin_unlock(&s->bufioreq_lock);
         return 0;
     }
 
@@ -1504,32 +1703,36 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(d, d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
 
     return 1;
 }
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq.va;
+    return !!d->arch.hvm_domain.ioreq_server;
 }
 
 bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
 {
-    ioreq_t *p = get_ioreq(v);
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !p )
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
@@ -4167,21 +4370,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4194,7 +4382,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
         struct domain *d;
         struct vcpu *v;
 
@@ -4227,19 +4414,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_map_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_pfn(d, 0, a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN:
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_map_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_pfn(d, 1, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4294,31 +4472,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_dm_domain(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..1f6eaec 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -36,14 +36,34 @@
 #include <public/hvm/save.h>
 
 struct hvm_ioreq_page {
-    spinlock_t lock;
     struct page_info *page;
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+};
+
+struct hvm_ioreq_server {
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+    struct domain          *domain;
+
+    /* Domain id of emulating domain */
+    domid_t                domid;
     struct hvm_ioreq_page  ioreq;
-    struct hvm_ioreq_page  buf_ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+};
+
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
 
     struct pl_time         pl_time;
 
@@ -100,3 +120,12 @@ struct hvm_domain {
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
 
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 122ab0d..08e98fb 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,8 +138,6 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
-
     bool_t              flag_dr_dirty;
     bool_t              debug_state_latch;
     bool_t              single_step;
@@ -186,3 +184,13 @@ struct hvm_vcpu {
 };
 
 #endif /* __ASM_X86_HVM_VCPU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-02 15:11 ` [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-04-03 14:49   ` George Dunlap
  2014-04-03 15:43     ` Paul Durrant
  2014-04-07 11:36   ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: George Dunlap @ 2014-04-03 14:49 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org

On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> Collect together data structures concerning device emulation together into
> a new struct hvm_ioreq_server.
>
> Code that deals with the shared and buffered ioreq pages is extracted from
> functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
> and consolidated into a set of hvm_ioreq_server manipulation functions. The
> lock in the hvm_ioreq_page served two different purposes and has been
> replaced by separate locks in the hvm_ioreq_server.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  xen/arch/x86/hvm/hvm.c           |  406 ++++++++++++++++++++++++++------------
>  xen/include/asm-x86/hvm/domain.h |   35 +++-
>  xen/include/asm-x86/hvm/vcpu.h   |   12 +-
>  3 files changed, 322 insertions(+), 131 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 573f845..5f131c4 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -352,39 +352,49 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>
> -static ioreq_t *get_ioreq(struct vcpu *v)
> +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
>  {
> -    struct domain *d = v->domain;
> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> +    shared_iopage_t *p = s->ioreq.va;
>
> -    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> +    /*
> +     * Manipulation of the shared ioreq structure (to update the event
> +     * channel) is protected by a domain_pause(). So this function should
> +     * only ever be executed for the current vcpu or one that is paused.
> +     */

What on earth is "manipulation of the shared ioreq structure is
protected by domain_pause()" supposed to mean?  Do you mean that the
only time there may be a race is between something in the emulation
code writing to it, and something in the resume path reading it?  That
there are never any other races to access the structure?  And that
since the resume path won't be taken when the domain is paused, there
can be no races, and therefore we do not need to be locked?

 -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-03 14:49   ` George Dunlap
@ 2014-04-03 15:43     ` Paul Durrant
  2014-04-03 15:48       ` George Dunlap
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-03 15:43 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir (Xen.org), Jan Beulich, xen-devel@lists.xen.org

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 03 April 2014 15:50
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v4 3/8] ioreq-server: create basic ioreq
> server abstraction.
> 
> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > Collect together data structures concerning device emulation together into
> > a new struct hvm_ioreq_server.
> >
> > Code that deals with the shared and buffered ioreq pages is extracted from
> > functions such as hvm_domain_initialise, hvm_vcpu_initialise and
> do_hvm_op
> > and consolidated into a set of hvm_ioreq_server manipulation functions.
> The
> > lock in the hvm_ioreq_page served two different purposes and has been
> > replaced by separate locks in the hvm_ioreq_server.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Keir Fraser <keir@xen.org>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c           |  406 ++++++++++++++++++++++++++----
> --------
> >  xen/include/asm-x86/hvm/domain.h |   35 +++-
> >  xen/include/asm-x86/hvm/vcpu.h   |   12 +-
> >  3 files changed, 322 insertions(+), 131 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 573f845..5f131c4 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -352,39 +352,49 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >      spin_unlock(&d->event_lock);
> >  }
> >
> > -static ioreq_t *get_ioreq(struct vcpu *v)
> > +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
> >  {
> > -    struct domain *d = v->domain;
> > -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > +    shared_iopage_t *p = s->ioreq.va;
> >
> > -    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > +    /*
> > +     * Manipulation of the shared ioreq structure (to update the event
> > +     * channel) is protected by a domain_pause(). So this function should
> > +     * only ever be executed for the current vcpu or one that is paused.
> > +     */
> 
> What on earth is "manipulation of the shared ioreq structure is
> protected by domain_pause()" supposed to mean?  Do you mean that the
> only time there may be a race is between something in the emulation
> code writing to it, and something in the resume path reading it?  That
> there are never any other races to access the structure?  And that
> since the resume path won't be taken when the domain is paused, there
> can be no races, and therefore we do not need to be locked?
> 

The sentiment I'm trying to express is that the shared structure can never be in use in the emulation path whilst it is being modified as the code that modifies always pauses the domain before doing so, so the assertion is that either v == current (in which case the domain is clearly not paused and we're in the emulation path) or !vcpu_runnable(v) (in which case the domain is paused and we're making a change).

  Paul

>  -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-03 15:43     ` Paul Durrant
@ 2014-04-03 15:48       ` George Dunlap
  2014-04-03 15:54         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: George Dunlap @ 2014-04-03 15:48 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap
  Cc: Keir (Xen.org), Jan Beulich, xen-devel@lists.xen.org

On 04/03/2014 04:43 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 03 April 2014 15:50
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v4 3/8] ioreq-server: create basic ioreq
>> server abstraction.
>>
>> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
>> wrote:
>>> Collect together data structures concerning device emulation together into
>>> a new struct hvm_ioreq_server.
>>>
>>> Code that deals with the shared and buffered ioreq pages is extracted from
>>> functions such as hvm_domain_initialise, hvm_vcpu_initialise and
>> do_hvm_op
>>> and consolidated into a set of hvm_ioreq_server manipulation functions.
>> The
>>> lock in the hvm_ioreq_page served two different purposes and has been
>>> replaced by separate locks in the hvm_ioreq_server.
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: Keir Fraser <keir@xen.org>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> ---
>>>   xen/arch/x86/hvm/hvm.c           |  406 ++++++++++++++++++++++++++----
>> --------
>>>   xen/include/asm-x86/hvm/domain.h |   35 +++-
>>>   xen/include/asm-x86/hvm/vcpu.h   |   12 +-
>>>   3 files changed, 322 insertions(+), 131 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 573f845..5f131c4 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -352,39 +352,49 @@ void hvm_migrate_pirqs(struct vcpu *v)
>>>       spin_unlock(&d->event_lock);
>>>   }
>>>
>>> -static ioreq_t *get_ioreq(struct vcpu *v)
>>> +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
>>>   {
>>> -    struct domain *d = v->domain;
>>> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
>>> +    shared_iopage_t *p = s->ioreq.va;
>>>
>>> -    ASSERT((v == current) || spin_is_locked(&d-
>>> arch.hvm_domain.ioreq.lock));
>>> +    /*
>>> +     * Manipulation of the shared ioreq structure (to update the event
>>> +     * channel) is protected by a domain_pause(). So this function should
>>> +     * only ever be executed for the current vcpu or one that is paused.
>>> +     */
>> What on earth is "manipulation of the shared ioreq structure is
>> protected by domain_pause()" supposed to mean?  Do you mean that the
>> only time there may be a race is between something in the emulation
>> code writing to it, and something in the resume path reading it?  That
>> there are never any other races to access the structure?  And that
>> since the resume path won't be taken when the domain is paused, there
>> can be no races, and therefore we do not need to be locked?
>>
> The sentiment I'm trying to express is that the shared structure can never be in use in the emulation path whilst it is being modified as the code that modifies always pauses the domain before doing so, so the assertion is that either v == current (in which case the domain is clearly not paused and we're in the emulation path) or !vcpu_runnable(v) (in which case the domain is paused and we're making a change).

Sure, but is there a risk of two different invocations of the "code that 
modifies" happening at the same time?  (Perhaps, for instance, because 
of a buggy toolstack that makes two calls on the same ioreq server?)

  -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-03 15:48       ` George Dunlap
@ 2014-04-03 15:54         ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-03 15:54 UTC (permalink / raw)
  To: George Dunlap; +Cc: Keir (Xen.org), Jan Beulich, xen-devel@lists.xen.org

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: 03 April 2014 16:49
> To: Paul Durrant; George Dunlap
> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v4 3/8] ioreq-server: create basic ioreq
> server abstraction.
> 
> On 04/03/2014 04:43 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 03 April 2014 15:50
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich
> >> Subject: Re: [Xen-devel] [PATCH v4 3/8] ioreq-server: create basic ioreq
> >> server abstraction.
> >>
> >> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
> >> wrote:
> >>> Collect together data structures concerning device emulation together
> into
> >>> a new struct hvm_ioreq_server.
> >>>
> >>> Code that deals with the shared and buffered ioreq pages is extracted
> from
> >>> functions such as hvm_domain_initialise, hvm_vcpu_initialise and
> >> do_hvm_op
> >>> and consolidated into a set of hvm_ioreq_server manipulation functions.
> >> The
> >>> lock in the hvm_ioreq_page served two different purposes and has
> been
> >>> replaced by separate locks in the hvm_ioreq_server.
> >>>
> >>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >>> Cc: Keir Fraser <keir@xen.org>
> >>> Cc: Jan Beulich <jbeulich@suse.com>
> >>> ---
> >>>   xen/arch/x86/hvm/hvm.c           |  406
> ++++++++++++++++++++++++++----
> >> --------
> >>>   xen/include/asm-x86/hvm/domain.h |   35 +++-
> >>>   xen/include/asm-x86/hvm/vcpu.h   |   12 +-
> >>>   3 files changed, 322 insertions(+), 131 deletions(-)
> >>>
> >>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >>> index 573f845..5f131c4 100644
> >>> --- a/xen/arch/x86/hvm/hvm.c
> >>> +++ b/xen/arch/x86/hvm/hvm.c
> >>> @@ -352,39 +352,49 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >>>       spin_unlock(&d->event_lock);
> >>>   }
> >>>
> >>> -static ioreq_t *get_ioreq(struct vcpu *v)
> >>> +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
> >>>   {
> >>> -    struct domain *d = v->domain;
> >>> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> >>> +    shared_iopage_t *p = s->ioreq.va;
> >>>
> >>> -    ASSERT((v == current) || spin_is_locked(&d-
> >>> arch.hvm_domain.ioreq.lock));
> >>> +    /*
> >>> +     * Manipulation of the shared ioreq structure (to update the event
> >>> +     * channel) is protected by a domain_pause(). So this function should
> >>> +     * only ever be executed for the current vcpu or one that is paused.
> >>> +     */
> >> What on earth is "manipulation of the shared ioreq structure is
> >> protected by domain_pause()" supposed to mean?  Do you mean that
> the
> >> only time there may be a race is between something in the emulation
> >> code writing to it, and something in the resume path reading it?  That
> >> there are never any other races to access the structure?  And that
> >> since the resume path won't be taken when the domain is paused, there
> >> can be no races, and therefore we do not need to be locked?
> >>
> > The sentiment I'm trying to express is that the shared structure can never
> be in use in the emulation path whilst it is being modified as the code that
> modifies always pauses the domain before doing so, so the assertion is that
> either v == current (in which case the domain is clearly not paused and we're
> in the emulation path) or !vcpu_runnable(v) (in which case the domain is
> paused and we're making a change).
> 
> Sure, but is there a risk of two different invocations of the "code that
> modifies" happening at the same time?  (Perhaps, for instance, because
> of a buggy toolstack that makes two calls on the same ioreq server?)
> 

No, hvm_ioreq_server->lock prevents that. (See comment in structure definition).

  Paul

>   -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-02 15:11 ` [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction Paul Durrant
  2014-04-03 14:49   ` George Dunlap
@ 2014-04-07 11:36   ` Jan Beulich
  2014-04-08  9:32     ` Paul Durrant
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-07 11:36 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
> +                                     struct vcpu *v)
> +{
> +    struct hvm_ioreq_vcpu *sv;
> +    int rc;
> +
> +    spin_lock(&s->lock);
> +
> +    sv = xzalloc(struct hvm_ioreq_vcpu);
> +
> +    rc = -ENOMEM;
> +    if ( !sv )
> +        goto fail1;

I don't see why this allocation needs to be done with the lock already
held. For the other (event channel) allocations further down I would
also prefer if you allocated them without holding the lock yet, even
if that means freeing the per-domain one if you find it already set.

> +
> +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> +    if ( rc < 0 )
> +        goto fail2;
> +
> +    sv->ioreq_evtchn = rc;
> +
> +    if ( v->vcpu_id == 0 )
> +    {

I generally dislike needless dependencies on vCPU 0 being the first
one to make it into any specific function. Can't you check emptiness
of s->ioreq_vcpu_list instead?

> +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
> +                                         struct vcpu *v)
> +{
> +    struct list_head *entry;
> +
> +    spin_lock(&s->lock);
> +
> +    list_for_each ( entry, &s->ioreq_vcpu_list )
> +    {
> +        struct hvm_ioreq_vcpu *sv = container_of(entry, 
> +                                                 struct hvm_ioreq_vcpu, 
> +                                                 list_entry);
> +
> +        if ( sv->vcpu != v )
> +            continue;
> +
> +        list_del_init(&sv->list_entry);
> +
> +        if ( v->vcpu_id == 0 )
> +            free_xen_event_channel(v, s->bufioreq_evtchn);
> +
> +        free_xen_event_channel(v, sv->ioreq_evtchn);
> +
> +        xfree(sv);
> +        break;

Similar comments as above: Try to avoid depending on vCPU 0 being
the last one to be cleaned up (I'm not even certain this is the case),
and try freeing stuff with the lock already dropped.

> +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
> +                             unsigned long pfn)
> +{
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> +    int rc;
> +
> +    spin_lock(&s->lock);
> +
> +    rc = hvm_map_ioreq_page(d, iorp, pfn);

While I realize that at this point there's only one server per domain,
the locking and operation still look to be out of sync at the first
glance: As this can't remain that way anyway till the end of the series,
can't this be brought back in sync here right away (whether that's by
passing s instead of d into the function or acquiring the lock only after
the call I don't know offhand)?

> +    if ( rc )
> +        goto fail;
> +
> +    if (!buf) {

Coding style.

> +static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
> +                                     evtchn_port_t *p_port)
> +{
> +    evtchn_port_t old_port, new_port;
> +
> +    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
> +    if ( new_port < 0 )

evtchn_port_t is an unsigned type, so this check won't work.

> +static int hvm_set_dm_domain(struct domain *d, domid_t domid)
> +{
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    int rc = 0;
> +
> +    spin_lock(&s->lock);
> +    domain_pause(d);

The other way around perhaps?

> +
> +    if ( s->domid != domid ) {

Coding style again.

> +        struct list_head *entry;
> +
> +        list_for_each ( entry, &s->ioreq_vcpu_list )
> +        {
> +            struct hvm_ioreq_vcpu *sv = container_of(entry,
> +                                                     struct hvm_ioreq_vcpu,
> +                                                     list_entry);
> +            struct vcpu *v = sv->vcpu;
> +
> +            if ( v->vcpu_id == 0 ) {

And again; won't make further remarks to this effect.

>  int hvm_domain_initialise(struct domain *d)
>  {
> +    domid_t domid;

Do you really need this new variable, being used just once? (There
is at least one more similar case elsewhere.)

> @@ -1339,30 +1549,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
>           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: 
> nestedhvm_vcpu_destroy */
>          goto fail5;
>  
> -    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> -
> -    /* Create ioreq event channel. */
> -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
> -    if ( rc < 0 )
> +    rc = hvm_ioreq_server_add_vcpu(s, v);
> +    if ( rc != 0 )

Can this really be > 0 now, and if so is this being handled correctly in
the caller(s)?

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-07 11:36   ` Jan Beulich
@ 2014-04-08  9:32     ` Paul Durrant
  2014-04-08  9:47       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  9:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 April 2014 12:37
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
> > +                                     struct vcpu *v)
> > +{
> > +    struct hvm_ioreq_vcpu *sv;
> > +    int rc;
> > +
> > +    spin_lock(&s->lock);
> > +
> > +    sv = xzalloc(struct hvm_ioreq_vcpu);
> > +
> > +    rc = -ENOMEM;
> > +    if ( !sv )
> > +        goto fail1;
> 
> I don't see why this allocation needs to be done with the lock already
> held. For the other (event channel) allocations further down I would
> also prefer if you allocated them without holding the lock yet, even
> if that means freeing the per-domain one if you find it already set.
> 

Ok.

> > +
> > +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> > +    if ( rc < 0 )
> > +        goto fail2;
> > +
> > +    sv->ioreq_evtchn = rc;
> > +
> > +    if ( v->vcpu_id == 0 )
> > +    {
> 
> I generally dislike needless dependencies on vCPU 0 being the first
> one to make it into any specific function. Can't you check emptiness
> of s->ioreq_vcpu_list instead?
> 

That will need a bit more code. I could create the buffered channel on first cpu addition but, I'd then need to track which cpu that  was and re-plumb the event channel if that cpu disappears. Also, the default server has always bound the buffered channel to cpu 0 so would I not risk a compatibility issue by changing this?

> > +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
> > +                                         struct vcpu *v)
> > +{
> > +    struct list_head *entry;
> > +
> > +    spin_lock(&s->lock);
> > +
> > +    list_for_each ( entry, &s->ioreq_vcpu_list )
> > +    {
> > +        struct hvm_ioreq_vcpu *sv = container_of(entry,
> > +                                                 struct hvm_ioreq_vcpu,
> > +                                                 list_entry);
> > +
> > +        if ( sv->vcpu != v )
> > +            continue;
> > +
> > +        list_del_init(&sv->list_entry);
> > +
> > +        if ( v->vcpu_id == 0 )
> > +            free_xen_event_channel(v, s->bufioreq_evtchn);
> > +
> > +        free_xen_event_channel(v, sv->ioreq_evtchn);
> > +
> > +        xfree(sv);
> > +        break;
> 
> Similar comments as above: Try to avoid depending on vCPU 0 being
> the last one to be cleaned up (I'm not even certain this is the case),
> and try freeing stuff with the lock already dropped.
> 
> > +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
> > +                             unsigned long pfn)
> > +{
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> > +    int rc;
> > +
> > +    spin_lock(&s->lock);
> > +
> > +    rc = hvm_map_ioreq_page(d, iorp, pfn);
> 
> While I realize that at this point there's only one server per domain,
> the locking and operation still look to be out of sync at the first
> glance: As this can't remain that way anyway till the end of the series,
> can't this be brought back in sync here right away (whether that's by
> passing s instead of d into the function or acquiring the lock only after
> the call I don't know offhand)?

I don't follow what you mean by 'out of sync' here: The lock is taken, the pfn is mapped, the lock is dropped. What am I missing?

> 
> > +    if ( rc )
> > +        goto fail;
> > +
> > +    if (!buf) {
> 
> Coding style.
> 

Ok.

> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> > +                                     evtchn_port_t *p_port)
> > +{
> > +    evtchn_port_t old_port, new_port;
> > +
> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
> NULL);
> > +    if ( new_port < 0 )
> 
> evtchn_port_t is an unsigned type, so this check won't work.
> 

Yes, it should be an int. Over-zealous type cleanup.

> > +static int hvm_set_dm_domain(struct domain *d, domid_t domid)
> > +{
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    int rc = 0;
> > +
> > +    spin_lock(&s->lock);
> > +    domain_pause(d);
> 
> The other way around perhaps?

Yes.

> 
> > +
> > +    if ( s->domid != domid ) {
> 
> Coding style again.
> 
> > +        struct list_head *entry;
> > +
> > +        list_for_each ( entry, &s->ioreq_vcpu_list )
> > +        {
> > +            struct hvm_ioreq_vcpu *sv = container_of(entry,
> > +                                                     struct hvm_ioreq_vcpu,
> > +                                                     list_entry);
> > +            struct vcpu *v = sv->vcpu;
> > +
> > +            if ( v->vcpu_id == 0 ) {
> 
> And again; won't make further remarks to this effect.
> 
> >  int hvm_domain_initialise(struct domain *d)
> >  {
> > +    domid_t domid;
> 
> Do you really need this new variable, being used just once? (There
> is at least one more similar case elsewhere.)

This is to avoid a line becoming very long.

> 
> > @@ -1339,30 +1549,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
> > nestedhvm_vcpu_destroy */
> >          goto fail5;
> >
> > -    dm_domid = d-
> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> > -
> > -    /* Create ioreq event channel. */
> > -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /*
> teardown: none */
> > -    if ( rc < 0 )
> > +    rc = hvm_ioreq_server_add_vcpu(s, v);
> > +    if ( rc != 0 )
> 
> Can this really be > 0 now, and if so is this being handled correctly in
> the caller(s)?

It's not the same function being called. hvm_ioreq_server_add_vcpu () can only return 0 or a -ve errno. I can test for < 0 but IIRC you objected to doing that in a review of a previous patch if the function never returns > 0. Would you prefer 'if ( rc )'?

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-08  9:32     ` Paul Durrant
@ 2014-04-08  9:47       ` Jan Beulich
  2014-04-08 10:06         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-08  9:47 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

>>> On 08.04.14 at 11:32, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
>> > +
>> > +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
>> > +    if ( rc < 0 )
>> > +        goto fail2;
>> > +
>> > +    sv->ioreq_evtchn = rc;
>> > +
>> > +    if ( v->vcpu_id == 0 )
>> > +    {
>> 
>> I generally dislike needless dependencies on vCPU 0 being the first
>> one to make it into any specific function. Can't you check emptiness
>> of s->ioreq_vcpu_list instead?
>> 
> 
> That will need a bit more code. I could create the buffered channel on first 
> cpu addition but, I'd then need to track which cpu that  was and re-plumb the 
> event channel if that cpu disappears. Also, the default server has always 
> bound the buffered channel to cpu 0 so would I not risk a compatibility issue 
> by changing this?

Hmm, good point. Albeit I still wonder what would happen if vCPU 0
went away.

But yes, considering that this is code effectively getting moved here
(i.e. having special cased vCPU 0 already before), I guess I withdraw
my change request for now.

>> > +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
>> > +                             unsigned long pfn)
>> > +{
>> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
>> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>> > +    int rc;
>> > +
>> > +    spin_lock(&s->lock);
>> > +
>> > +    rc = hvm_map_ioreq_page(d, iorp, pfn);
>> 
>> While I realize that at this point there's only one server per domain,
>> the locking and operation still look to be out of sync at the first
>> glance: As this can't remain that way anyway till the end of the series,
>> can't this be brought back in sync here right away (whether that's by
>> passing s instead of d into the function or acquiring the lock only after
>> the call I don't know offhand)?
> 
> I don't follow what you mean by 'out of sync' here: The lock is taken, the 
> pfn is mapped, the lock is dropped. What am I missing?

You lock "s" but operate on "d".

>> > @@ -1339,30 +1549,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
>> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
>> > nestedhvm_vcpu_destroy */
>> >          goto fail5;
>> >
>> > -    dm_domid = d-
>> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
>> > -
>> > -    /* Create ioreq event channel. */
>> > -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /*
>> teardown: none */
>> > -    if ( rc < 0 )
>> > +    rc = hvm_ioreq_server_add_vcpu(s, v);
>> > +    if ( rc != 0 )
>> 
>> Can this really be > 0 now, and if so is this being handled correctly in
>> the caller(s)?
> 
> It's not the same function being called. hvm_ioreq_server_add_vcpu () can 
> only return 0 or a -ve errno.

Oh, sorry, didn't pay close enough attention.

> I can test for < 0 but IIRC you objected to doing 
> that in a review of a previous patch if the function never returns > 0. Would 
> you prefer 'if ( rc )'?

Yes, "if ( rc )" would seem better, but with the called function changing
it doesn't really matter all that much.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction.
  2014-04-08  9:47       ` Jan Beulich
@ 2014-04-08 10:06         ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08 10:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 08 April 2014 10:47
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH v4 3/8] ioreq-server: create basic ioreq server
> abstraction.
> 
> >>> On 08.04.14 at 11:32, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> >> > +
> >> > +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> >> > +    if ( rc < 0 )
> >> > +        goto fail2;
> >> > +
> >> > +    sv->ioreq_evtchn = rc;
> >> > +
> >> > +    if ( v->vcpu_id == 0 )
> >> > +    {
> >>
> >> I generally dislike needless dependencies on vCPU 0 being the first
> >> one to make it into any specific function. Can't you check emptiness
> >> of s->ioreq_vcpu_list instead?
> >>
> >
> > That will need a bit more code. I could create the buffered channel on first
> > cpu addition but, I'd then need to track which cpu that  was and re-plumb
> the
> > event channel if that cpu disappears. Also, the default server has always
> > bound the buffered channel to cpu 0 so would I not risk a compatibility
> issue
> > by changing this?
> 
> Hmm, good point. Albeit I still wonder what would happen if vCPU 0
> went away.
> 
> But yes, considering that this is code effectively getting moved here
> (i.e. having special cased vCPU 0 already before), I guess I withdraw
> my change request for now.
> 
> >> > +static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
> >> > +                             unsigned long pfn)
> >> > +{
> >> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> >> > +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> >> > +    int rc;
> >> > +
> >> > +    spin_lock(&s->lock);
> >> > +
> >> > +    rc = hvm_map_ioreq_page(d, iorp, pfn);
> >>
> >> While I realize that at this point there's only one server per domain,
> >> the locking and operation still look to be out of sync at the first
> >> glance: As this can't remain that way anyway till the end of the series,
> >> can't this be brought back in sync here right away (whether that's by
> >> passing s instead of d into the function or acquiring the lock only after
> >> the call I don't know offhand)?
> >
> > I don't follow what you mean by 'out of sync' here: The lock is taken, the
> > pfn is mapped, the lock is dropped. What am I missing?
> 
> You lock "s" but operate on "d".

Ah, I see what you mean. I'll mod. the function to take s rather than d then.

> 
> >> > @@ -1339,30 +1549,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
> >> > nestedhvm_vcpu_destroy */
> >> >          goto fail5;
> >> >
> >> > -    dm_domid = d-
> >> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> >> > -
> >> > -    /* Create ioreq event channel. */
> >> > -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /*
> >> teardown: none */
> >> > -    if ( rc < 0 )
> >> > +    rc = hvm_ioreq_server_add_vcpu(s, v);
> >> > +    if ( rc != 0 )
> >>
> >> Can this really be > 0 now, and if so is this being handled correctly in
> >> the caller(s)?
> >
> > It's not the same function being called. hvm_ioreq_server_add_vcpu () can
> > only return 0 or a -ve errno.
> 
> Oh, sorry, didn't pay close enough attention.
> 
> > I can test for < 0 but IIRC you objected to doing
> > that in a review of a previous patch if the function never returns > 0. Would
> > you prefer 'if ( rc )'?
> 
> Yes, "if ( rc )" would seem better, but with the called function changing
> it doesn't really matter all that much.

Ok. I'll check for consistency with what I did elsewhere.

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-04-02 15:11 ` [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-07 11:50   ` Jan Beulich
  2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

This patch only creates the ioreq server when the legacy HVM parameters
are read (by an emulator).

A lock is introduced to protect access to the ioreq server by multiple
emulator/tool invocations should such an eventuality arise. The guest is
protected by creation of the ioreq server only being done whilst the
domain is paused.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |  262 +++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/domain.h |    1 +
 2 files changed, 215 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5f131c4..4ecbede 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -383,40 +383,38 @@ void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( !s )
-        goto check_inject_trap;
-
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(s, v);
-    while ( p->state != STATE_IOREQ_NONE )
+    if ( s )
     {
-        switch ( p->state )
+        ioreq_t *p = get_ioreq(s, v);
+
+        while ( p->state != STATE_IOREQ_NONE )
         {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            rmb(); /* see IORESP_READY /then/ read contents of ioreq */
-            hvm_io_assist(p);
-            break;
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(p->vp_eport,
-                                      (p->state != STATE_IOREQ_READY) &&
-                                      (p->state != STATE_IOREQ_INPROCESS));
-            break;
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
-            return; /* bail */
+            switch ( p->state )
+            {
+            case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+                rmb(); /* see IORESP_READY /then/ read contents of ioreq */
+                hvm_io_assist(p);
+                break;
+            case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+            case STATE_IOREQ_INPROCESS:
+                wait_on_xen_event_channel(p->vp_eport,
+                                          (p->state != STATE_IOREQ_READY) &&
+                                          (p->state != STATE_IOREQ_INPROCESS));
+                break;
+            default:
+                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
+                domain_crash(d);
+                return; /* bail */
+            }
         }
     }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -645,13 +643,68 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
     spin_unlock(&s->lock);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s;
+    struct list_head *entry, *next;
 
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
+    spin_lock(&s->lock);
+
+    list_for_each_safe ( entry, next, &s->ioreq_vcpu_list )
+    {
+        struct hvm_ioreq_vcpu *sv = container_of(entry, 
+                                                 struct hvm_ioreq_vcpu, 
+                                                 list_entry);
+        struct vcpu *v = sv->vcpu;
+
+        list_del_init(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 )
+            free_xen_event_channel(v, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+    unsigned long pfn;
+    int rc;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    rc = hvm_map_ioreq_page(d, &s->ioreq, pfn);
+    if ( rc )
+        goto fail1;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    rc = hvm_map_ioreq_page(d, &s->bufioreq, pfn);
+    if ( rc )
+        goto fail2;
+
+    return 0;
+
+fail2:
+    hvm_unmap_ioreq_page(&s->ioreq);
+
+fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_page(&s->bufioreq);
+    hvm_unmap_ioreq_page(&s->ioreq);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
+                                 domid_t domid)
+{
+    struct vcpu *v;
+    int rc;
 
     s->domain = d;
     s->domid = domid;
@@ -660,29 +713,112 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
+    rc = hvm_ioreq_server_map_pages(s);
+    if ( rc )
+        return rc;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    return 0;
+
+ fail:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+}
+
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -EEXIST;
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        goto fail1;
+ 
+    rc = -ENOMEM;
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        goto fail2;
+
+    domain_pause(d);
+
+    rc = hvm_ioreq_server_init(s, d, domid);
+    if ( rc )
+        goto fail3;
+
     d->arch.hvm_domain.ioreq_server = s;
+
+    domain_unpause(d);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return 0;
+
+ fail3:
+    domain_unpause(d);
+
+    xfree(s);
+
+ fail2:
+ fail1:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    return rc;
 }
 
 static void hvm_destroy_ioreq_server(struct domain *d)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
-    hvm_unmap_ioreq_page(&s->bufioreq);
-    hvm_unmap_ioreq_page(&s->ioreq);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
+    d->arch.hvm_domain.ioreq_server = NULL;
+
+    domain_pause(d);
+
+    hvm_ioreq_server_deinit(s);
+
+    domain_unpause(d);
 
     xfree(s);
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 }
 
 static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
                              unsigned long pfn)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_page *iorp;
     int rc;
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
     spin_lock(&s->lock);
 
+    iorp = buf ? &s->bufioreq : &s->ioreq;
     rc = hvm_map_ioreq_page(d, iorp, pfn);
     if ( rc )
         goto fail;
@@ -701,10 +837,14 @@ static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
     }
 
     spin_unlock(&s->lock);
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return 0;
 
  fail:
     spin_unlock(&s->lock);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
@@ -725,9 +865,15 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
 
 static int hvm_set_dm_domain(struct domain *d, domid_t domid)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
     int rc = 0;
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
     spin_lock(&s->lock);
     domain_pause(d);
 
@@ -764,12 +910,13 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
     domain_unpause(d);
     spin_unlock(&s->lock);
 
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
 int hvm_domain_initialise(struct domain *d)
 {
-    domid_t domid;
     int rc;
 
     if ( !hvm_enabled )
@@ -795,6 +942,7 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -835,21 +983,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-    rc = hvm_create_ioreq_server(d, domid);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_destroy_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1506,7 +1647,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1549,7 +1690,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     if ( rc != 0 )
         goto fail6;
 
@@ -1586,9 +1734,15 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -4464,7 +4618,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 domctl_lock_release();
                 break;
             case HVM_PARAM_DM_DOMAIN:
-                /* Not reflexive, as we must domain_pause(). */
+                /* Not reflexive, as we may need to domain_pause(). */
                 rc = -EPERM;
                 if ( curr_d == d )
                     break;
@@ -4570,6 +4724,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                domid_t domid;
+                
+                /* May need to create server */
+                domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+                rc = hvm_create_ioreq_server(d, domid);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+                /*FALLTHRU*/
+            }
             default:
                 a.value = d->arch.hvm_domain.params[a.index];
                 break;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 1f6eaec..b6911f9 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -63,6 +63,7 @@ struct hvm_ioreq_server {
 };
 
 struct hvm_domain {
+    spinlock_t              ioreq_server_lock;
     struct hvm_ioreq_server *ioreq_server;
 
     struct pl_time         pl_time;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
  2014-04-02 15:11 ` [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-04-07 11:50   ` Jan Beulich
  2014-04-08  9:35     ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-07 11:50 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir Fraser, xen-devel

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> @@ -645,13 +643,68 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
>      spin_unlock(&s->lock);
>  }
>  
> -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>  {
> -    struct hvm_ioreq_server *s;
> +    struct list_head *entry, *next;
>  
> -    s = xzalloc(struct hvm_ioreq_server);
> -    if ( !s )
> -        return -ENOMEM;
> +    spin_lock(&s->lock);
> +
> +    list_for_each_safe ( entry, next, &s->ioreq_vcpu_list )
> +    {
> +        struct hvm_ioreq_vcpu *sv = container_of(entry, 
> +                                                 struct hvm_ioreq_vcpu, 
> +                                                 list_entry);

list_for_each_entry_safe() avoids the need for the explicit use of
container_of(), making the code easier to read.

> +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -EEXIST;
> +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> +        goto fail1;
> + 
> +    rc = -ENOMEM;
> +    s = xzalloc(struct hvm_ioreq_server);

Similar comment as on an earlier patch: Please try to avoid allocations
with lock held.

> +    if ( !s )
> +        goto fail2;
> +
> +    domain_pause(d);

And with that adjusted I would then again wonder whether taking
the lock after pausing the domain wouldn't be the better model.

> @@ -4570,6 +4724,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              case HVM_PARAM_ACPI_S_STATE:
>                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
>                  break;
> +            case HVM_PARAM_IOREQ_PFN:
> +            case HVM_PARAM_BUFIOREQ_PFN:
> +            case HVM_PARAM_BUFIOREQ_EVTCHN: {
> +                domid_t domid;
> +                
> +                /* May need to create server */
> +                domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> +                rc = hvm_create_ioreq_server(d, domid);

Pretty odd that you do this on reads, but not on writes. What's the
rationale behind this?

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
  2014-04-07 11:50   ` Jan Beulich
@ 2014-04-08  9:35     ` Paul Durrant
  2014-04-08  9:51       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  9:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 April 2014 12:51
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq
> server
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > @@ -645,13 +643,68 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
> >      spin_unlock(&s->lock);
> >  }
> >
> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +static void hvm_ioreq_server_remove_all_vcpus(struct
> hvm_ioreq_server *s)
> >  {
> > -    struct hvm_ioreq_server *s;
> > +    struct list_head *entry, *next;
> >
> > -    s = xzalloc(struct hvm_ioreq_server);
> > -    if ( !s )
> > -        return -ENOMEM;
> > +    spin_lock(&s->lock);
> > +
> > +    list_for_each_safe ( entry, next, &s->ioreq_vcpu_list )
> > +    {
> > +        struct hvm_ioreq_vcpu *sv = container_of(entry,
> > +                                                 struct hvm_ioreq_vcpu,
> > +                                                 list_entry);
> 
> list_for_each_entry_safe() avoids the need for the explicit use of
> container_of(), making the code easier to read.
> 

It also expands the scope of sv.

> > +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -EEXIST;
> > +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +        goto fail1;
> > +
> > +    rc = -ENOMEM;
> > +    s = xzalloc(struct hvm_ioreq_server);
> 
> Similar comment as on an earlier patch: Please try to avoid allocations
> with lock held.
> 

Ok.

> > +    if ( !s )
> > +        goto fail2;
> > +
> > +    domain_pause(d);
> 
> And with that adjusted I would then again wonder whether taking
> the lock after pausing the domain wouldn't be the better model.
> 

Yes, it would.

> > @@ -4570,6 +4724,18 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
> >                  break;
> > +            case HVM_PARAM_IOREQ_PFN:
> > +            case HVM_PARAM_BUFIOREQ_PFN:
> > +            case HVM_PARAM_BUFIOREQ_EVTCHN: {
> > +                domid_t domid;
> > +
> > +                /* May need to create server */
> > +                domid = d-
> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> > +                rc = hvm_create_ioreq_server(d, domid);
> 
> Pretty odd that you do this on reads, but not on writes. What's the
> rationale behind this?
> 

The default server does not actually need to be there until something (i.e. QEMU) looks for it by reading one of these params. In future I hope that QEMU can be modified to use the explicit ioreq server creation API and we can eventually drop the default server.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
  2014-04-08  9:35     ` Paul Durrant
@ 2014-04-08  9:51       ` Jan Beulich
  2014-04-08 10:11         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-08  9:51 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

>>> On 08.04.14 at 11:35, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
>> > @@ -645,13 +643,68 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
>> >      spin_unlock(&s->lock);
>> >  }
>> >
>> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>> > +static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
>> >  {
>> > -    struct hvm_ioreq_server *s;
>> > +    struct list_head *entry, *next;
>> >
>> > -    s = xzalloc(struct hvm_ioreq_server);
>> > -    if ( !s )
>> > -        return -ENOMEM;
>> > +    spin_lock(&s->lock);
>> > +
>> > +    list_for_each_safe ( entry, next, &s->ioreq_vcpu_list )
>> > +    {
>> > +        struct hvm_ioreq_vcpu *sv = container_of(entry,
>> > +                                                 struct hvm_ioreq_vcpu,
>> > +                                                 list_entry);
>> 
>> list_for_each_entry_safe() avoids the need for the explicit use of
>> container_of(), making the code easier to read.
>> 
> 
> It also expands the scope of sv.

Which does no harm afaict. "entry" (which effectively is "sv") has the
same wider scope already.

>> > @@ -4570,6 +4724,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>> >              case HVM_PARAM_ACPI_S_STATE:
>> >                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
>> >                  break;
>> > +            case HVM_PARAM_IOREQ_PFN:
>> > +            case HVM_PARAM_BUFIOREQ_PFN:
>> > +            case HVM_PARAM_BUFIOREQ_EVTCHN: {
>> > +                domid_t domid;
>> > +
>> > +                /* May need to create server */
>> > +                domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
>> > +                rc = hvm_create_ioreq_server(d, domid);
>> 
>> Pretty odd that you do this on reads, but not on writes. What's the
>> rationale behind this?
>> 
> 
> The default server does not actually need to be there until something (i.e. 
> QEMU) looks for it by reading one of these params. In future I hope that QEMU 
> can be modified to use the explicit ioreq server creation API and we can 
> eventually drop the default server.

Oh, okay - the writer of these values just drops them in without
caring what Xen (or another consumer) does with them?

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
  2014-04-08  9:51       ` Jan Beulich
@ 2014-04-08 10:11         ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08 10:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir (Xen.org), xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 08 April 2014 10:51
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server
> 
> >>> On 08.04.14 at 11:35, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> >> > @@ -645,13 +643,68 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
> >> >      spin_unlock(&s->lock);
> >> >  }
> >> >
> >> > -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> >> > +static void hvm_ioreq_server_remove_all_vcpus(struct
> hvm_ioreq_server *s)
> >> >  {
> >> > -    struct hvm_ioreq_server *s;
> >> > +    struct list_head *entry, *next;
> >> >
> >> > -    s = xzalloc(struct hvm_ioreq_server);
> >> > -    if ( !s )
> >> > -        return -ENOMEM;
> >> > +    spin_lock(&s->lock);
> >> > +
> >> > +    list_for_each_safe ( entry, next, &s->ioreq_vcpu_list )
> >> > +    {
> >> > +        struct hvm_ioreq_vcpu *sv = container_of(entry,
> >> > +                                                 struct hvm_ioreq_vcpu,
> >> > +                                                 list_entry);
> >>
> >> list_for_each_entry_safe() avoids the need for the explicit use of
> >> container_of(), making the code easier to read.
> >>
> >
> > It also expands the scope of sv.
> 
> Which does no harm afaict. "entry" (which effectively is "sv") has the
> same wider scope already.
> 
> >> > @@ -4570,6 +4724,18 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >> >              case HVM_PARAM_ACPI_S_STATE:
> >> >                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
> >> >                  break;
> >> > +            case HVM_PARAM_IOREQ_PFN:
> >> > +            case HVM_PARAM_BUFIOREQ_PFN:
> >> > +            case HVM_PARAM_BUFIOREQ_EVTCHN: {
> >> > +                domid_t domid;
> >> > +
> >> > +                /* May need to create server */
> >> > +                domid = d-
> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> >> > +                rc = hvm_create_ioreq_server(d, domid);
> >>
> >> Pretty odd that you do this on reads, but not on writes. What's the
> >> rationale behind this?
> >>
> >
> > The default server does not actually need to be there until something (i.e.
> > QEMU) looks for it by reading one of these params. In future I hope that
> QEMU
> > can be modified to use the explicit ioreq server creation API and we can
> > eventually drop the default server.
> 
> Oh, okay - the writer of these values just drops them in without
> caring what Xen (or another consumer) does with them?
> 

Yes, that's right. setup_guest() in xc_hvm_build_x86.c writes the pfns in , along with the other specials (like store, etc.). In the case of these, nothing actually needs to happen until QEMU goes looking for what the tools wrote.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-04-02 15:11 ` [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-03 15:32   ` George Dunlap
                     ` (2 more replies)
  2014-04-02 15:11 ` [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled Paul Durrant
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich,
	Stefano Stabellini

The previous single ioreq server that was created on demand now
becomes the default server and an API is created to allow secondary
servers, which handle specific IO ranges or PCI devices, to be added.

When the guest issues an IO the list of secondary servers is checked
for a matching IO range or PCI device. If none is found then the IO
is passed to the default server.

Secondary servers use guest pages to communicate with emulators, in
the same way as the default server. These pages need to be in the
guest physmap otherwise there is no suitable reference that can be
queried by an emulator in order to map them. Therefore a pool of
pages in the current E820 reserved region, just below the special
pages is used. Secondary servers allocate from and free to this pool
as they are created and destroyed.

The size of the pool is currently hardcoded in the domain build at a
value of 8. This should be sufficient for now and both the location and
size of the pool can be modified in future without any need to change the
API.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |  175 +++++++
 tools/libxc/xc_domain_restore.c  |   27 +
 tools/libxc/xc_domain_save.c     |   24 +
 tools/libxc/xc_hvm_build_x86.c   |   30 +-
 tools/libxc/xenctrl.h            |   52 ++
 tools/libxc/xg_save_restore.h    |    2 +
 xen/arch/x86/hvm/hvm.c           | 1035 +++++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/include/asm-x86/hvm/domain.h |   34 +-
 xen/include/asm-x86/hvm/hvm.h    |    3 +-
 xen/include/asm-x86/hvm/io.h     |    2 +-
 xen/include/public/hvm/hvm_op.h  |   70 +++
 xen/include/public/hvm/ioreq.h   |    1 +
 xen/include/public/hvm/params.h  |    5 +-
 14 files changed, 1383 insertions(+), 79 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 369c3f3..8cec171 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    rc = do_xen_hypercall(xch, &hypercall);
+    *id = arg->id;
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( ioreq_pfn )
+        *ioreq_pfn = arg->ioreq_pfn;
+
+    if ( bufioreq_pfn )
+        *bufioreq_pfn = arg->bufioreq_pfn;
+
+    if ( bufioreq_port )
+        *bufioreq_port = arg->bufioreq_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    arg->end = end;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index bcb0ae0..7350b31 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -740,6 +740,8 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t ioreq_server_pfn;
+    uint64_t nr_ioreq_server_pages;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -990,6 +992,26 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_IOREQ_SERVER_PFN:
+        /* Skip padding 4 bytes then read the ioreq server gmfn base. */
+        if ( RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->ioreq_server_pfn, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn base");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES:
+        /* Skip padding 4 bytes then read the ioreq server gmfn count. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_server_pages, sizeof(uint64_t)) )
+        {
+            PERROR("error read the ioreq server gmfn count");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1748,6 +1770,11 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     pagebuf.ioreq_server_pfn);
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
+                     pagebuf.nr_ioreq_server_pages);
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 71f9b59..acf3685 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1737,6 +1737,30 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_IOREQ_SERVER_PFN;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn base");
+            goto out;
+        }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the ioreq server gmfn count");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index dd3b522..3564e8b 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -49,6 +49,9 @@
 #define NR_SPECIAL_PAGES     8
 #define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
 
+#define NR_IOREQ_SERVER_PAGES 8
+#define ioreq_server_pfn(x) (special_pfn(0) - NR_IOREQ_SERVER_PAGES + (x))
+
 #define VGA_HOLE_SIZE (0x20)
 
 static int modules_init(struct xc_hvm_build_args *args,
@@ -114,7 +117,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -502,6 +505,31 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_SHARING));
 
     /*
+     * Allocate and clear additional ioreq server pages. The default
+     * server will use the IOREQ and BUFIOREQ special pages above.
+     */
+    for ( i = 0; i < NR_IOREQ_SERVER_PAGES; i++ )
+    {
+        xen_pfn_t pfn = ioreq_server_pfn(i);
+
+        rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
+        if ( rc != 0 )
+        {
+            PERROR("Could not allocate %d'th ioreq server page.", i);
+            goto error_out;
+        }
+
+        if ( xc_clear_domain_page(xch, dom, pfn) )
+            goto error_out;
+    }
+
+    /* Tell the domain where the pages are and how many there are */
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
+                     ioreq_server_pfn(0));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     NR_IOREQ_SERVER_PAGES);
+
+    /*
      * Identity-map page table is required for running with CR0.PG=0 when
      * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
      */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index e3a32f2..3b0c678 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1801,6 +1801,48 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ server API
+ */
+
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id);
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *ioreq_pfn,
+                                 xen_pfn_t *bufioreq_pfn,
+                                 evtchn_port_t *bufioreq_port);
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+                                        domid_t domid,
+                                        ioservid_t id,
+                                        int is_mmio,
+                                        uint64_t start,
+                                        uint64_t end);
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+                                            domid_t domid,
+                                            ioservid_t id,
+                                            int is_mmio,
+                                            uint64_t start);
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+                                      domid_t domid,
+                                      ioservid_t id,
+                                      uint16_t bdf);
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t bdf);
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
@@ -2425,3 +2467,13 @@ int xc_kexec_load(xc_interface *xch, uint8_t type, uint16_t arch,
 int xc_kexec_unload(xc_interface *xch, int type);
 
 #endif /* XENCTRL_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..f1ec7f5 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,8 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+#define XC_SAVE_ID_HVM_IOREQ_SERVER_PFN -19
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVER_PAGES -20
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4ecbede..5af01b0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -369,28 +369,39 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 
 bool_t hvm_io_pending(struct vcpu *v)
 {
-    struct hvm_ioreq_server *s = v->domain->arch.hvm_domain.ioreq_server;
-    ioreq_t *p;
-
-    if ( !s )
-        return 0;
+    struct domain *d = v->domain;
+    struct list_head *entry;
+ 
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        ioreq_t *p = get_ioreq(s, v);
+ 
+        p = get_ioreq(s, v);
+        if ( p->state != STATE_IOREQ_NONE )
+            return 1;
+    }
 
-    p = get_ioreq(s, v);
-    return ( p->state != STATE_IOREQ_NONE );
+    return 0;
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct list_head *entry;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    if ( s )
+    list_for_each ( entry, &d->arch.hvm_domain.ioreq_server_list )
     {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
         ioreq_t *p = get_ioreq(s, v);
 
         while ( p->state != STATE_IOREQ_NONE )
@@ -408,7 +419,8 @@ void hvm_do_resume(struct vcpu *v)
                                           (p->state != STATE_IOREQ_INPROCESS));
                 break;
             default:
-                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
+                gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n",
+                         p->state);
                 domain_crash(d);
                 return; /* bail */
             }
@@ -423,6 +435,31 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
+static int hvm_alloc_ioreq_gmfn(struct domain *d, unsigned long *gmfn)
+{
+    unsigned int i;
+    int rc;
+
+    rc = -ENOMEM;
+    for ( i = 0; i < d->arch.hvm_domain.ioreq_gmfn_count; i++ )
+    {
+        if ( !test_and_set_bit(i, &d->arch.hvm_domain.ioreq_gmfn_mask) ) {
+            *gmfn = d->arch.hvm_domain.ioreq_gmfn_base + i;
+            rc = 0;
+            break;
+        }
+    }
+
+    return rc;
+}
+
+static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)
+{
+    unsigned int i = gmfn - d->arch.hvm_domain.ioreq_gmfn_base;
+
+    clear_bit(i, &d->arch.hvm_domain.ioreq_gmfn_mask);
+}
+
 void destroy_ring_for_helper(
     void **_va, struct page_info *page)
 {
@@ -503,6 +540,7 @@ static int hvm_map_ioreq_page(
 
     iorp->va = va;
     iorp->page = page;
+    iorp->gmfn = gmfn;
 
     return 0;
 }
@@ -533,6 +571,83 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( dir == IOREQ_WRITE )
+    {
+        switch ( bytes )
+        {
+        case 4:
+            hd->pci_cf8 = *val;
+            break;
+
+        case 2:
+        {
+            uint32_t mask = 0xffff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+            
+        case 1:
+        {
+            uint32_t mask = 0xff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+
+        default:
+            break;
+        }
+
+        /* We always need to fall through to the catch all emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        switch ( bytes )
+        {
+        case 4:
+            *val = hd->pci_cf8;
+            rc = X86EMUL_OKAY;
+            break;
+
+        case 2:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
+            rc = X86EMUL_OKAY;
+            break;
+            
+        case 1:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
+            rc = X86EMUL_OKAY;
+            break;
+
+        default:
+            rc = X86EMUL_UNHANDLEABLE;
+            break;
+        }
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -561,7 +676,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
 }
 
 static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
+                                     bool_t is_default, struct vcpu *v)
 {
     struct hvm_ioreq_vcpu *sv;
     int rc;
@@ -589,8 +704,9 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
             goto fail3;
 
         s->bufioreq_evtchn = rc;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
-            s->bufioreq_evtchn;
+        if ( is_default )
+            d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
+                s->bufioreq_evtchn;
     }
 
     sv->vcpu = v;
@@ -669,57 +785,90 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
+                                      bool_t is_default)
 {
     struct domain *d = s->domain;
-    unsigned long pfn;
+    unsigned long ioreq_pfn, bufioreq_pfn;
     int rc;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
-    rc = hvm_map_ioreq_page(d, &s->ioreq, pfn);
+    if ( is_default ) {
+        ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+        bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    } else {
+        rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
+        if ( rc )
+            goto fail1;
+
+        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+        if ( rc )
+            goto fail2;
+    }
+
+    rc = hvm_map_ioreq_page(d, &s->ioreq, ioreq_pfn);
     if ( rc )
-        goto fail1;
+        goto fail3;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
-    rc = hvm_map_ioreq_page(d, &s->bufioreq, pfn);
+    rc = hvm_map_ioreq_page(d, &s->bufioreq, bufioreq_pfn);
     if ( rc )
-        goto fail2;
+        goto fail4;
 
     return 0;
 
-fail2:
+fail4:
     hvm_unmap_ioreq_page(&s->ioreq);
 
+fail3:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, bufioreq_pfn);
+
+fail2:
+    if ( !is_default )
+        hvm_free_ioreq_gmfn(d, ioreq_pfn);
+
 fail1:
     return rc;
 }
 
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s, 
+                                         bool_t is_default)
 {
+    struct domain *d = s->domain;
+
     hvm_unmap_ioreq_page(&s->bufioreq);
     hvm_unmap_ioreq_page(&s->ioreq);
+
+    if ( !is_default ) {
+        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
+    }
 }
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
-                                 domid_t domid)
+                                 domid_t domid, bool_t is_default,
+                                 ioservid_t id)
 {
     struct vcpu *v;
     int rc;
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
+    INIT_LIST_HEAD(&s->mmio_range_list);
+    INIT_LIST_HEAD(&s->portio_range_list);
+    INIT_LIST_HEAD(&s->pcidev_list);
 
     spin_lock_init(&s->lock);
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    rc = hvm_ioreq_server_map_pages(s);
+    rc = hvm_ioreq_server_map_pages(s, is_default);
     if ( rc )
         return rc;
 
     for_each_vcpu ( d, v )
     {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
         if ( rc )
             goto fail;
     }
@@ -728,18 +877,53 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
 
  fail:
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
 
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return rc;
 }
 
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
+                                    bool_t is_default)
 {
+    struct list_head *entry;
+
+    list_for_each ( entry,
+                    &s->mmio_range_list )
+    {
+        struct hvm_io_range *x = list_entry(entry,
+                                            struct hvm_io_range,
+                                            list_entry);
+
+        xfree(x);
+    }
+
+    list_for_each ( entry,
+                    &s->portio_range_list )
+    {
+        struct hvm_io_range *x = list_entry(entry,
+                                            struct hvm_io_range,
+                                            list_entry);
+
+        xfree(x);
+    }
+
+    list_for_each ( entry,
+                    &s->mmio_range_list )
+    {
+        struct hvm_pcidev *x = list_entry(entry,
+                                          struct hvm_pcidev,
+                                          list_entry);
+
+        xfree(x);
+    }
+
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_unmap_pages(s, is_default);
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
+                                   bool_t is_default, ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -747,7 +931,7 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
     rc = -EEXIST;
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
+    if ( is_default && d->arch.hvm_domain.default_ioreq_server != NULL )
         goto fail1;
  
     rc = -ENOMEM;
@@ -757,14 +941,23 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
 
     domain_pause(d);
 
-    rc = hvm_ioreq_server_init(s, d, domid);
+    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+                               d->arch.hvm_domain.ioreq_server_id++);
     if ( rc )
         goto fail3;
 
-    d->arch.hvm_domain.ioreq_server = s;
+    list_add(&s->list_entry,
+             &d->arch.hvm_domain.ioreq_server_list);
+    d->arch.hvm_domain.ioreq_server_count++;
+
+    if ( is_default )
+        d->arch.hvm_domain.default_ioreq_server = s;
 
     domain_unpause(d);
 
+    if (id != NULL)
+        *id = s->id;
+
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
     return 0;
 
@@ -779,27 +972,363 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct list_head *entry;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        if ( s->id != id )
+            continue;
+
+        domain_pause(d);
+
+        if ( is_default )
+            d->arch.hvm_domain.default_ioreq_server = NULL;
+
+        --d->arch.hvm_domain.ioreq_server_count;
+        list_del_init(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, is_default);
+
+        domain_unpause(d);
+
+        xfree(s);
+
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                                     unsigned long *ioreq_pfn,
+                                     unsigned long *bufioreq_pfn,
+                                     evtchn_port_t *bufioreq_port)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        if ( s->id != id )
+            continue;
+
+        *ioreq_pfn = s->ioreq.gmfn;
+        *bufioreq_pfn = s->bufioreq.gmfn;
+        *bufioreq_port = s->bufioreq_evtchn;
+
+        rc = 0;
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            int is_mmio, uint64_t start,
+                                            uint64_t end)
 {
     struct hvm_ioreq_server *s;
+    struct hvm_io_range *x;
+    struct list_head *list;
+    int rc;
+
+    x = xmalloc(struct hvm_io_range);
+    if ( x == NULL )
+        return -ENOMEM;
 
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        goto done;
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
 
-    d->arch.hvm_domain.ioreq_server = NULL;
+    goto fail;
 
-    domain_pause(d);
+ found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->start = start;
+    x->end = end;
 
-    hvm_ioreq_server_deinit(s);
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+    list_add_rcu(&x->list_entry, list);
 
-    domain_unpause(d);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    xfree(s);
+    return 0;
+
+ fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void free_io_range(struct rcu_head *rcu)
+{
+    struct hvm_io_range *x;
+
+    x = container_of (rcu, struct hvm_io_range, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, 
+                                                ioservid_t id,
+                                                int is_mmio, uint64_t start)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *list, *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+ found:
+    list = ( is_mmio ) ? &s->mmio_range_list : &s->portio_range_list;
+
+    list_for_each ( entry,
+                    list )
+    {
+        struct hvm_io_range *x = list_entry(entry,
+                                            struct hvm_io_range,
+                                            list_entry);
+
+        if ( start == x->start )
+        {
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_io_range);
+
+            rc = 0;
+            break;
+        }
+    }
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
+                                          uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_pcidev *x;
+    int rc;
+
+    x = xmalloc(struct hvm_pcidev);
+    if ( x == NULL )
+        return -ENOMEM;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+ found:
+    INIT_RCU_HEAD(&x->rcu);
+    x->bdf = bdf;
+
+    list_add_rcu(&x->list_entry, &s->pcidev_list);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+ fail:
+    xfree(x);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void free_pcidev(struct rcu_head *rcu)
+{
+    struct hvm_pcidev *x;
+
+    x = container_of (rcu, struct hvm_pcidev, rcu);
+
+    xfree(x);
+}
+
+static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
+                                              uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto done;
+
+ found:
+    list_for_each ( entry,
+                    &s->pcidev_list )
+    {
+        struct hvm_pcidev *x = list_entry(entry,
+                                          struct hvm_pcidev,
+                                          list_entry);
+
+        if ( bdf == x->bdf )
+        {
+            list_del_rcu(&x->list_entry);
+            call_rcu(&x->rcu, free_pcidev);
+
+            rc = 0;
+            break;
+        }
+    }
+
+ done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        rc = hvm_ioreq_server_add_vcpu(s, is_default, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+ fail:
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct list_head *entry, *next;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each_safe ( entry,
+                         next,
+                         &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        bool_t is_default = ( s == d->arch.hvm_domain.default_ioreq_server);
+
+        domain_pause(d);
+
+        if ( is_default )
+            d->arch.hvm_domain.default_ioreq_server = NULL;
+
+        --d->arch.hvm_domain.ioreq_server_count;
+        list_del_init(&s->list_entry);
+        
+        hvm_ioreq_server_deinit(s, is_default);
+
+        domain_unpause(d);
+
+        xfree(s);
+    }
 
- done:
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 }
 
@@ -812,7 +1341,7 @@ static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
 
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    s = d->arch.hvm_domain.default_ioreq_server;
     if ( !s )
         goto done;
 
@@ -870,7 +1399,7 @@ static int hvm_set_dm_domain(struct domain *d, domid_t domid)
 
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    s = d->arch.hvm_domain.default_ioreq_server;
     if ( !s )
         goto done;
 
@@ -943,6 +1472,8 @@ int hvm_domain_initialise(struct domain *d)
     }
 
     spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
+    spin_lock_init(&d->arch.hvm_domain.pci_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -984,6 +1515,7 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -1014,7 +1546,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1647,7 +2179,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1690,14 +2221,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
     if ( rc != 0 )
         goto fail6;
 
@@ -1734,15 +2258,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1781,9 +2298,105 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
+static DEFINE_RCU_READ_LOCK(ioreq_server_rcu_lock);
+
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                        ioreq_t *p)
+{
+#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
+
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( d->arch.hvm_domain.ioreq_server_count == 1 &&
+         d->arch.hvm_domain.default_ioreq_server )
+        goto done;
+
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc )
+    { 
+        /* PCI config data cycle */
+        type = IOREQ_TYPE_PCI_CONFIG;
+
+        spin_lock(&d->arch.hvm_domain.pci_lock);
+        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
+        spin_unlock(&d->arch.hvm_domain.pci_lock);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    rcu_read_lock(&ioreq_server_rcu_lock);
+
+    switch ( type )
+    {
+    case IOREQ_TYPE_COPY:
+    case IOREQ_TYPE_PIO:
+    case IOREQ_TYPE_PCI_CONFIG:
+        break;
+    default:
+        goto done;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          list_entry )
+    {
+        switch ( type )
+        {
+            case IOREQ_TYPE_COPY:
+            case IOREQ_TYPE_PIO: {
+                struct list_head *list;
+                struct hvm_io_range *x;
+
+                list = ( type == IOREQ_TYPE_COPY ) ?
+                    &s->mmio_range_list :
+                    &s->portio_range_list;
+
+                list_for_each_entry ( x,
+                                      list,
+                                      list_entry )
+                {
+                    if ( (addr >= x->start) && (addr <= x->end) )
+                        goto found;
+                }
+                break;
+            }
+            case IOREQ_TYPE_PCI_CONFIG: {
+                struct hvm_pcidev *x;
+
+                list_for_each_entry ( x,
+                                      &s->pcidev_list,
+                                      list_entry )
+                {
+                    if ( BDF(addr) == x->bdf ) {
+                        p->type = type;
+                        p->addr = addr;
+                        goto found;
+                    }
+                }
+                break;
+            }
+        }
+    }
+
+ done:
+    s = d->arch.hvm_domain.default_ioreq_server;
+
+ found:
+    rcu_read_unlock(&ioreq_server_rcu_lock);
+
+    return s;
+
+#undef BDF
+}
+
+int hvm_buffered_io_send(struct domain *d, ioreq_t *p)
 {
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(d, p);
     struct hvm_ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp;
@@ -1865,21 +2478,19 @@ int hvm_buffered_io_send(struct domain *d, const ioreq_t *p)
 
 bool_t hvm_has_dm(struct domain *d)
 {
-    return !!d->arch.hvm_domain.ioreq_server;
+    return !list_empty(&d->arch.hvm_domain.ioreq_server_list);
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
+bool_t hvm_send_assist_req_to_ioreq_server(struct hvm_ioreq_server *s,
+                                           struct vcpu *v,
+                                           ioreq_t *proto_p)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
     ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !s )
-        return 0;
-
     p = get_ioreq(s, v);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
@@ -1911,6 +2522,33 @@ bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *proto_p)
     return 1;
 }
 
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = hvm_select_ioreq_server(d, p);
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_ioreq_server(s, v, p);
+}
+
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct list_head *entry;
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+
+        (void) hvm_send_assist_req_to_ioreq_server(s, v, p);
+    }
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4524,6 +5162,195 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    if ( rc != 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_info(d, op.id,
+                                         &op.ioreq_pfn,
+                                         &op.bufioreq_pfn, 
+                                         &op.bufioreq_port)) < 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_io_range_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
+                                          op.start, op.end);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_io_range_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
+                                              op.start);
+    
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_pcidev_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pcidev_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_pcidev_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    hvm_destroy_ioreq_server(d, op.id);
+    rc = 0;
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4532,6 +5359,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+    
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+    
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_map_pcidev_to_ioreq_server:
+        rc = hvmop_map_pcidev_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_pcidev_from_ioreq_server:
+        rc = hvmop_unmap_pcidev_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+    
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4626,7 +5488,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_set_dm_domain(d, a.value);
+                rc = hvm_create_ioreq_server(d, a.value, 1, NULL);
+                if ( rc == -EEXIST )
+                    rc = hvm_set_dm_domain(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4691,6 +5555,25 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn_base = a.value;
+                break;
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
+                if ( a.value == 0 ||
+                     a.value > sizeof(unsigned long) * 8 ) {
+                    rc = -EINVAL;
+                    break;
+                }
+                d->arch.hvm_domain.ioreq_gmfn_count = a.value;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4724,6 +5607,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
+            case HVM_PARAM_IOREQ_SERVER_PFN:
+            case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+                if ( d == current->domain ) {
+                    rc = -EPERM;
+                    break;
+                }
             case HVM_PARAM_IOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN: {
@@ -4731,7 +5620,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid);
+                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 8db300d..8461cc3 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -74,7 +74,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    (void)hvm_send_assist_req(current, &p);
+    hvm_broadcast_assist_req(current, &p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b6911f9..d7a73ce 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -36,10 +36,23 @@
 #include <public/hvm/save.h>
 
 struct hvm_ioreq_page {
+    unsigned long gmfn;
     struct page_info *page;
     void *va;
 };
 
+struct hvm_io_range {
+    struct list_head    list_entry;
+    uint64_t            start, end;
+    struct rcu_head     rcu;
+};	
+
+struct hvm_pcidev {
+    struct list_head    list_entry;
+    uint16_t            bdf;
+    struct rcu_head     rcu;
+};	
+
 struct hvm_ioreq_vcpu {
     struct list_head list_entry;
     struct vcpu      *vcpu;
@@ -47,6 +60,9 @@ struct hvm_ioreq_vcpu {
 };
 
 struct hvm_ioreq_server {
+    struct list_head       list_entry;
+    ioservid_t             id;
+
     /* Lock to serialize toolstack modifications */
     spinlock_t             lock;
     struct domain          *domain;
@@ -60,11 +76,27 @@ struct hvm_ioreq_server {
     /* Lock to serialize access to buffered ioreq ring */
     spinlock_t             bufioreq_lock;
     evtchn_port_t          bufioreq_evtchn;
+    struct list_head       mmio_range_list;
+    struct list_head       portio_range_list;
+    struct list_head       pcidev_list;
 };
 
 struct hvm_domain {
+    /* Guest page range used for non-default ioreq servers */
+    unsigned long           ioreq_gmfn_base;
+    unsigned int            ioreq_gmfn_count;
+    unsigned long           ioreq_gmfn_mask;
+
+    /* Lock protects all other values in the following block */
     spinlock_t              ioreq_server_lock;
-    struct hvm_ioreq_server *ioreq_server;
+    ioservid_t              ioreq_server_id;
+    struct list_head        ioreq_server_list;
+    unsigned int            ioreq_server_count;
+    struct hvm_ioreq_server *default_ioreq_server;
+
+    /* Cached CF8 for guest PCI config cycles */
+    uint32_t                pci_cf8;
+    spinlock_t              pci_lock;
 
     struct pl_time         pl_time;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 08a62ea..6c4530e 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index bfd28c2..be6546d 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -92,7 +92,7 @@ static inline int hvm_buffered_io_intercept(ioreq_t *p)
 }
 
 int hvm_mmio_intercept(ioreq_t *p);
-int hvm_buffered_io_send(struct domain *d, const ioreq_t *p);
+int hvm_buffered_io_send(struct domain *d, ioreq_t *p);
 
 static inline void register_portio_handler(
     struct domain *d, unsigned long addr,
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a9aab4b..c6ceea5 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+typedef uint32_t ioservid_t;
+
+DEFINE_XEN_GUEST_HANDLE(ioservid_t);
+
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - server id */
+    xen_pfn_t ioreq_pfn;         /* OUT - sync ioreq pfn */
+    xen_pfn_t bufioreq_pfn;      /* OUT - buffered ioreq pfn */
+    evtchn_port_t bufioreq_port; /* OUT - buffered ioreq port */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+#define HVMOP_map_io_range_to_ioreq_server 19
+struct xen_hvm_map_io_range_to_ioreq_server {
+    domid_t domid;                  /* IN - domain to be serviced */
+    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
+    int is_mmio;                    /* IN - MMIO or port IO? */
+    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
+
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_unmap_io_range_from_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint8_t is_mmio;        /* IN - MMIO or port IO? */
+    uint64_aligned_t start; /* IN - start address of the range to remove */
+};
+typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
+
+#define HVMOP_map_pcidev_to_ioreq_server 21
+struct xen_hvm_map_pcidev_to_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
+
+#define HVMOP_unmap_pcidev_from_ioreq_server 22
+struct xen_hvm_unmap_pcidev_from_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
+
+#define HVMOP_destroy_ioreq_server 23
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..e84fa75 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,6 +34,7 @@
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..f830bdd 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,9 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_IOREQ_SERVER_PFN 32
+#define HVM_PARAM_NR_IOREQ_SERVER_PAGES 33
+
+#define HVM_NR_PARAMS          34
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-04-03 15:32   ` George Dunlap
  2014-04-03 15:39     ` Paul Durrant
  2014-04-07 15:57   ` Ian Campbell
  2014-04-09 12:43   ` Jan Beulich
  2 siblings, 1 reply; 62+ messages in thread
From: George Dunlap @ 2014-04-03 15:32 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com> wrote:
> The previous single ioreq server that was created on demand now
> becomes the default server and an API is created to allow secondary
> servers, which handle specific IO ranges or PCI devices, to be added.
>
> When the guest issues an IO the list of secondary servers is checked
> for a matching IO range or PCI device. If none is found then the IO
> is passed to the default server.
>
> Secondary servers use guest pages to communicate with emulators, in
> the same way as the default server. These pages need to be in the
> guest physmap otherwise there is no suitable reference that can be
> queried by an emulator in order to map them. Therefore a pool of
> pages in the current E820 reserved region, just below the special
> pages is used. Secondary servers allocate from and free to this pool
> as they are created and destroyed.
>
> The size of the pool is currently hardcoded in the domain build at a
> value of 8. This should be sufficient for now and both the location and
> size of the pool can be modified in future without any need to change the
> API.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/libxc/xc_domain.c          |  175 +++++++
>  tools/libxc/xc_domain_restore.c  |   27 +
>  tools/libxc/xc_domain_save.c     |   24 +
>  tools/libxc/xc_hvm_build_x86.c   |   30 +-
>  tools/libxc/xenctrl.h            |   52 ++
>  tools/libxc/xg_save_restore.h    |    2 +
>  xen/arch/x86/hvm/hvm.c           | 1035 +++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/io.c            |    2 +-
>  xen/include/asm-x86/hvm/domain.h |   34 +-
>  xen/include/asm-x86/hvm/hvm.h    |    3 +-
>  xen/include/asm-x86/hvm/io.h     |    2 +-
>  xen/include/public/hvm/hvm_op.h  |   70 +++
>  xen/include/public/hvm/ioreq.h   |    1 +
>  xen/include/public/hvm/params.h  |    5 +-
>  14 files changed, 1383 insertions(+), 79 deletions(-)
>
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..8cec171 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>      return rc;
>  }
>
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    *id = arg->id;
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}

Sorry if I missed it, but was there anywhere the 8-server limit is
checked?  What happens if someone calls xc_hvm_create_ioreq_server() 9
times?

> @@ -728,18 +877,53 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
>
>   fail:
>      hvm_ioreq_server_remove_all_vcpus(s);
> -    hvm_ioreq_server_unmap_pages(s);
> +    hvm_ioreq_server_unmap_pages(s, is_default);
>
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>      return rc;
>  }
>
> -static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
> +static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
> +                                    bool_t is_default)
>  {
> +    struct list_head *entry;
> +
> +    list_for_each ( entry,
> +                    &s->mmio_range_list )
> +    {
> +        struct hvm_io_range *x = list_entry(entry,
> +                                            struct hvm_io_range,
> +                                            list_entry);
> +
> +        xfree(x);

Hang on, isn't x still actually on mmio_range_list at this point, and
doesn't entry equal &(x->list_entry)?  So the next time around
list_for_each(), you're using x after you've freed it?

I think you're missing a list_del_entry(), here and in the other 2
loops in this function.

I haven't gone through the rest of it with a fine-tooth comb, but
interface-wise it looks good.

 -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-03 15:32   ` George Dunlap
@ 2014-04-03 15:39     ` Paul Durrant
  2014-04-03 15:43       ` George Dunlap
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-03 15:39 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> George Dunlap
> Sent: 03 April 2014 16:33
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Ian Campbell; Jan Beulich; Stefano
> Stabellini
> Subject: Re: [Xen-devel] [PATCH v4 5/8] ioreq-server: add support for
> multiple servers
> 
> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > The previous single ioreq server that was created on demand now
> > becomes the default server and an API is created to allow secondary
> > servers, which handle specific IO ranges or PCI devices, to be added.
> >
> > When the guest issues an IO the list of secondary servers is checked
> > for a matching IO range or PCI device. If none is found then the IO
> > is passed to the default server.
> >
> > Secondary servers use guest pages to communicate with emulators, in
> > the same way as the default server. These pages need to be in the
> > guest physmap otherwise there is no suitable reference that can be
> > queried by an emulator in order to map them. Therefore a pool of
> > pages in the current E820 reserved region, just below the special
> > pages is used. Secondary servers allocate from and free to this pool
> > as they are created and destroyed.
> >
> > The size of the pool is currently hardcoded in the domain build at a
> > value of 8. This should be sufficient for now and both the location and
> > size of the pool can be modified in future without any need to change the
> > API.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  tools/libxc/xc_domain.c          |  175 +++++++
> >  tools/libxc/xc_domain_restore.c  |   27 +
> >  tools/libxc/xc_domain_save.c     |   24 +
> >  tools/libxc/xc_hvm_build_x86.c   |   30 +-
> >  tools/libxc/xenctrl.h            |   52 ++
> >  tools/libxc/xg_save_restore.h    |    2 +
> >  xen/arch/x86/hvm/hvm.c           | 1035
> +++++++++++++++++++++++++++++++++++---
> >  xen/arch/x86/hvm/io.c            |    2 +-
> >  xen/include/asm-x86/hvm/domain.h |   34 +-
> >  xen/include/asm-x86/hvm/hvm.h    |    3 +-
> >  xen/include/asm-x86/hvm/io.h     |    2 +-
> >  xen/include/public/hvm/hvm_op.h  |   70 +++
> >  xen/include/public/hvm/ioreq.h   |    1 +
> >  xen/include/public/hvm/params.h  |    5 +-
> >  14 files changed, 1383 insertions(+), 79 deletions(-)
> >
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..8cec171 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle,
> domid_t dom, int param, unsigned long
> >      return rc;
> >  }
> >
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +                               domid_t domid,
> > +                               ioservid_t *id)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    *id = arg->id;
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> 
> Sorry if I missed it, but was there anywhere the 8-server limit is
> checked?  What happens if someone calls xc_hvm_create_ioreq_server() 9
> times?
> 
> > @@ -728,18 +877,53 @@ static int hvm_ioreq_server_init(struct
> hvm_ioreq_server *s, struct domain *d,
> >
> >   fail:
> >      hvm_ioreq_server_remove_all_vcpus(s);
> > -    hvm_ioreq_server_unmap_pages(s);
> > +    hvm_ioreq_server_unmap_pages(s, is_default);
> >
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >      return rc;
> >  }
> >
> > -static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
> > +static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
> > +                                    bool_t is_default)
> >  {
> > +    struct list_head *entry;
> > +
> > +    list_for_each ( entry,
> > +                    &s->mmio_range_list )
> > +    {
> > +        struct hvm_io_range *x = list_entry(entry,
> > +                                            struct hvm_io_range,
> > +                                            list_entry);
> > +
> > +        xfree(x);
> 
> Hang on, isn't x still actually on mmio_range_list at this point, and
> doesn't entry equal &(x->list_entry)?  So the next time around
> list_for_each(), you're using x after you've freed it?
> 

Good catch, I should be using list_for_each_safe().

> I think you're missing a list_del_entry(), here and in the other 2
> loops in this function.
> 

I don't the del as this is a full teardown, but I do need to avoid invalidating my iterator :-)

  Paul

> I haven't gone through the rest of it with a fine-tooth comb, but
> interface-wise it looks good.
> 
>  -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-03 15:39     ` Paul Durrant
@ 2014-04-03 15:43       ` George Dunlap
  2014-04-03 15:46         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: George Dunlap @ 2014-04-03 15:43 UTC (permalink / raw)
  To: Paul Durrant, George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

On 04/03/2014 04:39 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 03 April 2014 16:33
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; Ian Jackson; Ian Campbell; Jan Beulich; Stefano
>> Stabellini
>> Subject: Re: [Xen-devel] [PATCH v4 5/8] ioreq-server: add support for
>> multiple servers
>>
>> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
>> wrote:
>> +++++++++++++++++++++++++++++++++++---
>>>
>>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>>> index 369c3f3..8cec171 100644
>>> --- a/tools/libxc/xc_domain.c
>>> +++ b/tools/libxc/xc_domain.c
>>> @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle,
>> domid_t dom, int param, unsigned long
>>>       return rc;
>>>   }
>>>
>>> +int xc_hvm_create_ioreq_server(xc_interface *xch,
>>> +                               domid_t domid,
>>> +                               ioservid_t *id)
>>> +{
>>> +    DECLARE_HYPERCALL;
>>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
>>> +    int rc;
>>> +
>>> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
>>> +    if ( arg == NULL )
>>> +        return -1;
>>> +
>>> +    hypercall.op     = __HYPERVISOR_hvm_op;
>>> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
>>> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
>>> +    arg->domid = domid;
>>> +    rc = do_xen_hypercall(xch, &hypercall);
>>> +    *id = arg->id;
>>> +    xc_hypercall_buffer_free(xch, arg);
>>> +    return rc;
>>> +}
>> Sorry if I missed it, but was there anywhere the 8-server limit is
>> checked?  What happens if someone calls xc_hvm_create_ioreq_server() 9
>> times?

Just checking -- did you miss this question, or are you going to come 
back to it later? :-)

  -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-03 15:43       ` George Dunlap
@ 2014-04-03 15:46         ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-03 15:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: 03 April 2014 16:43
> To: Paul Durrant; George Dunlap
> Cc: xen-devel@lists.xen.org; Ian Jackson; Ian Campbell; Jan Beulich; Stefano
> Stabellini
> Subject: Re: [Xen-devel] [PATCH v4 5/8] ioreq-server: add support for
> multiple servers
> 
> On 04/03/2014 04:39 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 03 April 2014 16:33
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org; Ian Jackson; Ian Campbell; Jan Beulich;
> Stefano
> >> Stabellini
> >> Subject: Re: [Xen-devel] [PATCH v4 5/8] ioreq-server: add support for
> >> multiple servers
> >>
> >> On Wed, Apr 2, 2014 at 4:11 PM, Paul Durrant <paul.durrant@citrix.com>
> >> wrote:
> >> +++++++++++++++++++++++++++++++++++---
> >>>
> >>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> >>> index 369c3f3..8cec171 100644
> >>> --- a/tools/libxc/xc_domain.c
> >>> +++ b/tools/libxc/xc_domain.c
> >>> @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface
> *handle,
> >> domid_t dom, int param, unsigned long
> >>>       return rc;
> >>>   }
> >>>
> >>> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> >>> +                               domid_t domid,
> >>> +                               ioservid_t *id)
> >>> +{
> >>> +    DECLARE_HYPERCALL;
> >>> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t,
> arg);
> >>> +    int rc;
> >>> +
> >>> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> >>> +    if ( arg == NULL )
> >>> +        return -1;
> >>> +
> >>> +    hypercall.op     = __HYPERVISOR_hvm_op;
> >>> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> >>> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> >>> +    arg->domid = domid;
> >>> +    rc = do_xen_hypercall(xch, &hypercall);
> >>> +    *id = arg->id;
> >>> +    xc_hypercall_buffer_free(xch, arg);
> >>> +    return rc;
> >>> +}
> >> Sorry if I missed it, but was there anywhere the 8-server limit is
> >> checked?  What happens if someone calls xc_hvm_create_ioreq_server()
> 9
> >> times?
> 
> Just checking -- did you miss this question, or are you going to come
> back to it later? :-)
> 

Sorry, I did miss that. The limit of 8 is not the number of servers, it's the number of pages. If someone tries to create too many servers the page allocation code will fail and thus the creation will fail with an ENOMEM.

  Paul

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
  2014-04-03 15:32   ` George Dunlap
@ 2014-04-07 15:57   ` Ian Campbell
  2014-04-08  8:32     ` Paul Durrant
  2014-04-09 12:43   ` Jan Beulich
  2 siblings, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-07 15:57 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> The previous single ioreq server that was created on demand now
> becomes the default server and an API is created to allow secondary
> servers, which handle specific IO ranges or PCI devices, to be added.
> 
> When the guest issues an IO the list of secondary servers is checked
> for a matching IO range or PCI device. If none is found then the IO
> is passed to the default server.
> 
> Secondary servers use guest pages to communicate with emulators, in
> the same way as the default server. These pages need to be in the
> guest physmap otherwise there is no suitable reference that can be
> queried by an emulator in order to map them. Therefore a pool of
> pages in the current E820 reserved region, just below the special
> pages is used. Secondary servers allocate from and free to this pool
> as they are created and destroyed.
> 
> The size of the pool is currently hardcoded in the domain build at a
> value of 8. This should be sufficient for now and both the location and
> size of the pool can be modified in future without any need to change the
> API.

A pool of 8 implies 4 servers with a buffered and unbuffered page each?
Or later on some combination of servers using both or one or the other?

> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/libxc/xc_domain.c          |  175 +++++++
>  tools/libxc/xc_domain_restore.c  |   27 +
>  tools/libxc/xc_domain_save.c     |   24 +
>  tools/libxc/xc_hvm_build_x86.c   |   30 +-
>  tools/libxc/xenctrl.h            |   52 ++
>  tools/libxc/xg_save_restore.h    |    2 +
>  xen/arch/x86/hvm/hvm.c           | 1035 +++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/io.c            |    2 +-
>  xen/include/asm-x86/hvm/domain.h |   34 +-
>  xen/include/asm-x86/hvm/hvm.h    |    3 +-
>  xen/include/asm-x86/hvm/io.h     |    2 +-
>  xen/include/public/hvm/hvm_op.h  |   70 +++
>  xen/include/public/hvm/ioreq.h   |    1 +
>  xen/include/public/hvm/params.h  |    5 +-
>  14 files changed, 1383 insertions(+), 79 deletions(-)
> 
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 369c3f3..8cec171 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>      return rc;
>  }
>  
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);

Can you add some newlines please.

  Here

> +    arg->domid = domid;

  Here

> +    rc = do_xen_hypercall(xch, &hypercall);

  Here

> +    *id = arg->id;

  Here

> +    xc_hypercall_buffer_free(xch, arg);

  Here

> +    return rc;
> +}
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *ioreq_pfn,
> +                                 xen_pfn_t *bufioreq_pfn,
> +                                 evtchn_port_t *bufioreq_port)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);

  Here

> +    arg->domid = domid;
> +    arg->id = id;

  Here

> +    rc = do_xen_hypercall(xch, &hypercall);
> +    if ( rc != 0 )
> +        goto done;
> +
> +    if ( ioreq_pfn )
> +        *ioreq_pfn = arg->ioreq_pfn;
> +
> +    if ( bufioreq_pfn )
> +        *bufioreq_pfn = arg->bufioreq_pfn;
> +
> +    if ( bufioreq_port )
> +        *bufioreq_port = arg->bufioreq_port;
> +
> +done:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                        ioservid_t id, int is_mmio,
> +                                        uint64_t start, uint64_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);

  Here

> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    arg->end = end;

  Here
> +    rc = do_xen_hypercall(xch, &hypercall);

  Here 
> +    xc_hypercall_buffer_free(xch, arg);

  Here

Well, you get the picture, it applies for the rest of these bindings
too. The actual code looks correct.

[...]

> @@ -1748,6 +1770,11 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      if (pagebuf.viridian != 0)
>          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
>  
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> +                     pagebuf.ioreq_server_pfn);
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVER_PAGES, 
> +                     pagebuf.nr_ioreq_server_pages);

When migrating into this from the previous version of Xen both of these
will be zero. Does that do the right thing? I didn't see any special
handling on the hypervisor side. In fact it looks like it will EINVAL.

> +
>      if (pagebuf.acpi_ioport_location == 1) {
>          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);

> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index e3a32f2..3b0c678 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1801,6 +1801,48 @@ void xc_clear_last_error(xc_interface *xch);
>  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
>  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
>  
> +/*
> + * IOREQ server API
> + */

Hopefully xen/include/public has the associated docs for what all these
functions do.

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-07 15:57   ` Ian Campbell
@ 2014-04-08  8:32     ` Paul Durrant
  2014-04-08  8:40       ` Ian Campbell
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 April 2014 16:58
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
> 
> On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > The previous single ioreq server that was created on demand now
> > becomes the default server and an API is created to allow secondary
> > servers, which handle specific IO ranges or PCI devices, to be added.
> >
> > When the guest issues an IO the list of secondary servers is checked
> > for a matching IO range or PCI device. If none is found then the IO
> > is passed to the default server.
> >
> > Secondary servers use guest pages to communicate with emulators, in
> > the same way as the default server. These pages need to be in the
> > guest physmap otherwise there is no suitable reference that can be
> > queried by an emulator in order to map them. Therefore a pool of
> > pages in the current E820 reserved region, just below the special
> > pages is used. Secondary servers allocate from and free to this pool
> > as they are created and destroyed.
> >
> > The size of the pool is currently hardcoded in the domain build at a
> > value of 8. This should be sufficient for now and both the location and
> > size of the pool can be modified in future without any need to change the
> > API.
> 
> A pool of 8 implies 4 servers with a buffered and unbuffered page each?
> Or later on some combination of servers using both or one or the other?
> 

Yes, it seems like enough room for now.

> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  tools/libxc/xc_domain.c          |  175 +++++++
> >  tools/libxc/xc_domain_restore.c  |   27 +
> >  tools/libxc/xc_domain_save.c     |   24 +
> >  tools/libxc/xc_hvm_build_x86.c   |   30 +-
> >  tools/libxc/xenctrl.h            |   52 ++
> >  tools/libxc/xg_save_restore.h    |    2 +
> >  xen/arch/x86/hvm/hvm.c           | 1035
> +++++++++++++++++++++++++++++++++++---
> >  xen/arch/x86/hvm/io.c            |    2 +-
> >  xen/include/asm-x86/hvm/domain.h |   34 +-
> >  xen/include/asm-x86/hvm/hvm.h    |    3 +-
> >  xen/include/asm-x86/hvm/io.h     |    2 +-
> >  xen/include/public/hvm/hvm_op.h  |   70 +++
> >  xen/include/public/hvm/ioreq.h   |    1 +
> >  xen/include/public/hvm/params.h  |    5 +-
> >  14 files changed, 1383 insertions(+), 79 deletions(-)
> >
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 369c3f3..8cec171 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1284,6 +1284,181 @@ int xc_get_hvm_param(xc_interface *handle,
> domid_t dom, int param, unsigned long
> >      return rc;
> >  }
> >
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +                               domid_t domid,
> > +                               ioservid_t *id)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> 
> Can you add some newlines please.
> 

Sure.

[snip]
> > @@ -1748,6 +1770,11 @@ int xc_domain_restore(xc_interface *xch, int
> io_fd, uint32_t dom,
> >      if (pagebuf.viridian != 0)
> >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> >
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> > +                     pagebuf.ioreq_server_pfn);
> > +    xc_set_hvm_param(xch, dom,
> HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> > +                     pagebuf.nr_ioreq_server_pages);
> 
> When migrating into this from the previous version of Xen both of these
> will be zero. Does that do the right thing? I didn't see any special
> handling on the hypervisor side. In fact it looks like it will EINVAL.
> 

Yes, it will be EINVAL which is why the return code is deliberately not checked. I can special-case if you think that would be clearer, or stick '(void)' in front of these calls to show the return code is being deliberately ignored.

> > +
> >      if (pagebuf.acpi_ioport_location == 1) {
> >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> >          xc_set_hvm_param(xch, dom,
> HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> 
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index e3a32f2..3b0c678 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1801,6 +1801,48 @@ void xc_clear_last_error(xc_interface *xch);
> >  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long value);
> >  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long *value);
> >
> > +/*
> > + * IOREQ server API
> > + */
> 
> Hopefully xen/include/public has the associated docs for what all these
> functions do.

Not yet. Do think this would best handled in a header or in some markdown in docs/misc?

  Paul

> 
> Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-08  8:32     ` Paul Durrant
@ 2014-04-08  8:40       ` Ian Campbell
  2014-04-08  8:45         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-08  8:40 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

On Tue, 2014-04-08 at 09:32 +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 07 April 2014 16:58
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > Subject: Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
> > 
> > On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > > The previous single ioreq server that was created on demand now
> > > becomes the default server and an API is created to allow secondary
> > > servers, which handle specific IO ranges or PCI devices, to be added.
> > >
> > > When the guest issues an IO the list of secondary servers is checked
> > > for a matching IO range or PCI device. If none is found then the IO
> > > is passed to the default server.
> > >
> > > Secondary servers use guest pages to communicate with emulators, in
> > > the same way as the default server. These pages need to be in the
> > > guest physmap otherwise there is no suitable reference that can be
> > > queried by an emulator in order to map them. Therefore a pool of
> > > pages in the current E820 reserved region, just below the special
> > > pages is used. Secondary servers allocate from and free to this pool
> > > as they are created and destroyed.
> > >
> > > The size of the pool is currently hardcoded in the domain build at a
> > > value of 8. This should be sufficient for now and both the location and
> > > size of the pool can be modified in future without any need to change the
> > > API.
> > 
> > A pool of 8 implies 4 servers with a buffered and unbuffered page each?
> > Or later on some combination of servers using both or one or the other?
> > 
> 
> Yes, it seems like enough room for now.

What I was asking is can I have 3 servers with both buffered and
unbuffered plus 2 with only unbuffered, for a total of 8 pages?

> [snip]
> > > @@ -1748,6 +1770,11 @@ int xc_domain_restore(xc_interface *xch, int
> > io_fd, uint32_t dom,
> > >      if (pagebuf.viridian != 0)
> > >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> > >
> > > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> > > +                     pagebuf.ioreq_server_pfn);
> > > +    xc_set_hvm_param(xch, dom,
> > HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> > > +                     pagebuf.nr_ioreq_server_pages);
> > 
> > When migrating into this from the previous version of Xen both of these
> > will be zero. Does that do the right thing? I didn't see any special
> > handling on the hypervisor side. In fact it looks like it will EINVAL.
> > 
> 
> Yes, it will be EINVAL which is why the return code is deliberately
> not checked. I can special-case if you think that would be clearer, or
> stick '(void)' in front of these calls to show the return code is
> being deliberately ignored.

At the very least the behaviour needs to be written down somewhere.

But I think being explicit in the restore code about these cases would
be clearer than relying on EINVAL from the hypervisor, i.e. remember if
you saw that chunk or not and either make the call or not.

As for not checking the return code -- what if it fails for some other
reason. perhaps the migration stream is corrupt?

How does everything agree on the location of the fallback ioreq PFN in
this case since it isn't set here?

> 
> > > +
> > >      if (pagebuf.acpi_ioport_location == 1) {
> > >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> > >          xc_set_hvm_param(xch, dom,
> > HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> > 
> > > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > > index e3a32f2..3b0c678 100644
> > > --- a/tools/libxc/xenctrl.h
> > > +++ b/tools/libxc/xenctrl.h
> > > @@ -1801,6 +1801,48 @@ void xc_clear_last_error(xc_interface *xch);
> > >  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param,
> > unsigned long value);
> > >  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param,
> > unsigned long *value);
> > >
> > > +/*
> > > + * IOREQ server API
> > > + */
> > 
> > Hopefully xen/include/public has the associated docs for what all these
> > functions do.
> 
> Not yet. Do think this would best handled in a header or in some
> markdown in docs/misc?

For hypercalls in the headers is best, they will be exported into the
docs subtree by the build. See
http://xenbits.xen.org/docs/unstable/hypercall/x86_64/index.html.

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-08  8:40       ` Ian Campbell
@ 2014-04-08  8:45         ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 08 April 2014 09:41
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
> 
> On Tue, 2014-04-08 at 09:32 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 07 April 2014 16:58
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > > Subject: Re: [PATCH v4 5/8] ioreq-server: add support for multiple
> servers
> > >
> > > On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > > > The previous single ioreq server that was created on demand now
> > > > becomes the default server and an API is created to allow secondary
> > > > servers, which handle specific IO ranges or PCI devices, to be added.
> > > >
> > > > When the guest issues an IO the list of secondary servers is checked
> > > > for a matching IO range or PCI device. If none is found then the IO
> > > > is passed to the default server.
> > > >
> > > > Secondary servers use guest pages to communicate with emulators, in
> > > > the same way as the default server. These pages need to be in the
> > > > guest physmap otherwise there is no suitable reference that can be
> > > > queried by an emulator in order to map them. Therefore a pool of
> > > > pages in the current E820 reserved region, just below the special
> > > > pages is used. Secondary servers allocate from and free to this pool
> > > > as they are created and destroyed.
> > > >
> > > > The size of the pool is currently hardcoded in the domain build at a
> > > > value of 8. This should be sufficient for now and both the location and
> > > > size of the pool can be modified in future without any need to change
> the
> > > > API.
> > >
> > > A pool of 8 implies 4 servers with a buffered and unbuffered page each?
> > > Or later on some combination of servers using both or one or the other?
> > >
> >
> > Yes, it seems like enough room for now.
> 
> What I was asking is can I have 3 servers with both buffered and
> unbuffered plus 2 with only unbuffered, for a total of 8 pages?
> 

Yes.

> > [snip]
> > > > @@ -1748,6 +1770,11 @@ int xc_domain_restore(xc_interface *xch, int
> > > io_fd, uint32_t dom,
> > > >      if (pagebuf.viridian != 0)
> > > >          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
> > > >
> > > > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_SERVER_PFN,
> > > > +                     pagebuf.ioreq_server_pfn);
> > > > +    xc_set_hvm_param(xch, dom,
> > > HVM_PARAM_NR_IOREQ_SERVER_PAGES,
> > > > +                     pagebuf.nr_ioreq_server_pages);
> > >
> > > When migrating into this from the previous version of Xen both of these
> > > will be zero. Does that do the right thing? I didn't see any special
> > > handling on the hypervisor side. In fact it looks like it will EINVAL.
> > >
> >
> > Yes, it will be EINVAL which is why the return code is deliberately
> > not checked. I can special-case if you think that would be clearer, or
> > stick '(void)' in front of these calls to show the return code is
> > being deliberately ignored.
> 
> At the very least the behaviour needs to be written down somewhere.
> 
> But I think being explicit in the restore code about these cases would
> be clearer than relying on EINVAL from the hypervisor, i.e. remember if
> you saw that chunk or not and either make the call or not.
> 
> As for not checking the return code -- what if it fails for some other
> reason. perhaps the migration stream is corrupt?
> 

EPERM is the only other possible failure, and that shouldn't happen for a legit domain restore. I'll avoid making the call if the save record is not there though, as you suggest.

> How does everything agree on the location of the fallback ioreq PFN in
> this case since it isn't set here?
> 

The QEMU PFNs don't move, so no change there. If the params aren't set properly then secondary servers cannot be created after migration.

> >
> > > > +
> > > >      if (pagebuf.acpi_ioport_location == 1) {
> > > >          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
> > > >          xc_set_hvm_param(xch, dom,
> > > HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> > >
> > > > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > > > index e3a32f2..3b0c678 100644
> > > > --- a/tools/libxc/xenctrl.h
> > > > +++ b/tools/libxc/xenctrl.h
> > > > @@ -1801,6 +1801,48 @@ void xc_clear_last_error(xc_interface *xch);
> > > >  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param,
> > > unsigned long value);
> > > >  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int
> param,
> > > unsigned long *value);
> > > >
> > > > +/*
> > > > + * IOREQ server API
> > > > + */
> > >
> > > Hopefully xen/include/public has the associated docs for what all these
> > > functions do.
> >
> > Not yet. Do think this would best handled in a header or in some
> > markdown in docs/misc?
> 
> For hypercalls in the headers is best, they will be exported into the
> docs subtree by the build. See
> http://xenbits.xen.org/docs/unstable/hypercall/x86_64/index.html.
> 

Ok. Ta,

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
  2014-04-03 15:32   ` George Dunlap
  2014-04-07 15:57   ` Ian Campbell
@ 2014-04-09 12:43   ` Jan Beulich
  2014-04-09 12:49     ` Ian Campbell
  2014-04-09 13:32     ` Paul Durrant
  2 siblings, 2 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 12:43 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> Secondary servers use guest pages to communicate with emulators, in
> the same way as the default server. These pages need to be in the
> guest physmap otherwise there is no suitable reference that can be
> queried by an emulator in order to map them. Therefore a pool of
> pages in the current E820 reserved region, just below the special
> pages is used. Secondary servers allocate from and free to this pool
> as they are created and destroyed.

Ah, here is the answer to the question I raised on patch 6 - somehow
I managed to look at them in wrong order. Nevertheless, and also in
the context of the discussion we had with Stefano yesterday, we may
want/need to think of a way to allow pages to be trackable without
being mapped in the physmap.

> @@ -60,11 +76,27 @@ struct hvm_ioreq_server {
>      /* Lock to serialize access to buffered ioreq ring */
>      spinlock_t             bufioreq_lock;
>      evtchn_port_t          bufioreq_evtchn;
> +    struct list_head       mmio_range_list;
> +    struct list_head       portio_range_list;
> +    struct list_head       pcidev_list;

Wouldn't these better be range sets? I realize this might conflict with
the RCU manipulation of the entries, but perhaps the rangesets could
get their interface extended if this is strictly a requirement (otoh hand
I can't see why you couldn't get away with "normal" freeing, since
changes to these lists shouldn't be frequent, and hence not be
performance critical).

Also, I didn't see a limit being enforced on the number of elements
that can be added to these lists, yet allowing this to be unlimited is
a latent security issue.

>  struct hvm_domain {
> +    /* Guest page range used for non-default ioreq servers */
> +    unsigned long           ioreq_gmfn_base;
> +    unsigned int            ioreq_gmfn_count;
> +    unsigned long           ioreq_gmfn_mask;
> +
> +    /* Lock protects all other values in the following block */
>      spinlock_t              ioreq_server_lock;
> -    struct hvm_ioreq_server *ioreq_server;
> +    ioservid_t              ioreq_server_id;
> +    struct list_head        ioreq_server_list;
> +    unsigned int            ioreq_server_count;
> +    struct hvm_ioreq_server *default_ioreq_server;
> +
> +    /* Cached CF8 for guest PCI config cycles */
> +    uint32_t                pci_cf8;
> +    spinlock_t              pci_lock;

Please consider padding when adding new fields here - try grouping
64-bit quantities together rather than alternating between 32- and
64-bit ones.

> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
>                              struct page_info **_page, void **_va);
>  void destroy_ring_for_helper(void **_va, struct page_info *page);
>  
> -bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
> +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);

Any reason you couldn't avoid adding the const in one of the earlier
patches?

> +#define HVMOP_map_io_range_to_ioreq_server 19
> +struct xen_hvm_map_io_range_to_ioreq_server {
> +    domid_t domid;                  /* IN - domain to be serviced */
> +    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> +    int is_mmio;                    /* IN - MMIO or port IO? */
> +    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> +};
> +typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_io_range_from_ioreq_server 20
> +struct xen_hvm_unmap_io_range_from_ioreq_server {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
> +    uint8_t is_mmio;        /* IN - MMIO or port IO? */

Please use uint8_t above too, and move the field ahead of "id" for
better packing. I'm also not sure the "is_" prefix is really useful here.
And for this to be usable with other architectures that may have
address spaces other than memory and I/O ports it would seem
desirable to not consider this a boolean, but an enumerator. In the
end a third address space could immediately be PCI space, thus
eliminating the need for the two special ops below. I.e. this could
follow more closely ACPI's address space handling - there's nothing
inherently wrong with an MSR based I/O interface for example.

> +#define HVMOP_map_pcidev_to_ioreq_server 21
> +struct xen_hvm_map_pcidev_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> +};
> +typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_pcidev_from_ioreq_server 22
> +struct xen_hvm_unmap_pcidev_from_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */

Both of these need a PCI segment/domain added. Also what's the
point of having two identical structures of map and unmap?

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 12:43   ` Jan Beulich
@ 2014-04-09 12:49     ` Ian Campbell
  2014-04-09 13:15       ` Jan Beulich
  2014-04-09 13:32     ` Paul Durrant
  1 sibling, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-09 12:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant, Ian Jackson, Stefano Stabellini

On Wed, 2014-04-09 at 13:43 +0100, Jan Beulich wrote:
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > Secondary servers use guest pages to communicate with emulators, in
> > the same way as the default server. These pages need to be in the
> > guest physmap otherwise there is no suitable reference that can be
> > queried by an emulator in order to map them. Therefore a pool of
> > pages in the current E820 reserved region, just below the special
> > pages is used. Secondary servers allocate from and free to this pool
> > as they are created and destroyed.
> 
> Ah, here is the answer to the question I raised on patch 6 - somehow
> I managed to look at them in wrong order. Nevertheless, and also in
> the context of the discussion we had with Stefano yesterday, we may
> want/need to think of a way to allow pages to be trackable without
> being mapped in the physmap.

Is what is wanted a new XENMAPSPACE which could be used via
xen_add_to_physmap_batch to map foreign_dom.idx where idx for that
mapspace is the ioreq page index rather than a gfn?

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 12:49     ` Ian Campbell
@ 2014-04-09 13:15       ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 13:15 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Paul Durrant, Ian Jackson, Stefano Stabellini

>>> On 09.04.14 at 14:49, <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-04-09 at 13:43 +0100, Jan Beulich wrote:
>> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
>> > Secondary servers use guest pages to communicate with emulators, in
>> > the same way as the default server. These pages need to be in the
>> > guest physmap otherwise there is no suitable reference that can be
>> > queried by an emulator in order to map them. Therefore a pool of
>> > pages in the current E820 reserved region, just below the special
>> > pages is used. Secondary servers allocate from and free to this pool
>> > as they are created and destroyed.
>> 
>> Ah, here is the answer to the question I raised on patch 6 - somehow
>> I managed to look at them in wrong order. Nevertheless, and also in
>> the context of the discussion we had with Stefano yesterday, we may
>> want/need to think of a way to allow pages to be trackable without
>> being mapped in the physmap.
> 
> Is what is wanted a new XENMAPSPACE which could be used via
> xen_add_to_physmap_batch to map foreign_dom.idx where idx for that
> mapspace is the ioreq page index rather than a gfn?

Something along those lines at least; the precise one you talk about
would help the other (qemu) issue - we'd really need a "free floating"
map space, with indexes assigned as they get removed from physmap
or allocated without putting them into the physmap.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 12:43   ` Jan Beulich
  2014-04-09 12:49     ` Ian Campbell
@ 2014-04-09 13:32     ` Paul Durrant
  2014-04-09 13:46       ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-09 13:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 April 2014 13:43
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > Secondary servers use guest pages to communicate with emulators, in
> > the same way as the default server. These pages need to be in the
> > guest physmap otherwise there is no suitable reference that can be
> > queried by an emulator in order to map them. Therefore a pool of
> > pages in the current E820 reserved region, just below the special
> > pages is used. Secondary servers allocate from and free to this pool
> > as they are created and destroyed.
> 
> Ah, here is the answer to the question I raised on patch 6 - somehow
> I managed to look at them in wrong order. Nevertheless, and also in
> the context of the discussion we had with Stefano yesterday, we may
> want/need to think of a way to allow pages to be trackable without
> being mapped in the physmap.
> 

Possibly. That would be rather a large amount of scope-creep to these patches though.

> > @@ -60,11 +76,27 @@ struct hvm_ioreq_server {
> >      /* Lock to serialize access to buffered ioreq ring */
> >      spinlock_t             bufioreq_lock;
> >      evtchn_port_t          bufioreq_evtchn;
> > +    struct list_head       mmio_range_list;
> > +    struct list_head       portio_range_list;
> > +    struct list_head       pcidev_list;
> 
> Wouldn't these better be range sets? I realize this might conflict with
> the RCU manipulation of the entries, but perhaps the rangesets could
> get their interface extended if this is strictly a requirement (otoh hand
> I can't see why you couldn't get away with "normal" freeing, since
> changes to these lists shouldn't be frequent, and hence not be
> performance critical).
> 

The lists are accessed without lock by a vcpu requesting emulation so they need to be RCU. I'll look at the range set code and see if that is feasible.

> Also, I didn't see a limit being enforced on the number of elements
> that can be added to these lists, yet allowing this to be unlimited is
> a latent security issue.
> 

Guest domains cannot add to the lists, only the emulating domain, but if that is unprivileged then, yes, that is a security issue.

> >  struct hvm_domain {
> > +    /* Guest page range used for non-default ioreq servers */
> > +    unsigned long           ioreq_gmfn_base;
> > +    unsigned int            ioreq_gmfn_count;
> > +    unsigned long           ioreq_gmfn_mask;
> > +
> > +    /* Lock protects all other values in the following block */
> >      spinlock_t              ioreq_server_lock;
> > -    struct hvm_ioreq_server *ioreq_server;
> > +    ioservid_t              ioreq_server_id;
> > +    struct list_head        ioreq_server_list;
> > +    unsigned int            ioreq_server_count;
> > +    struct hvm_ioreq_server *default_ioreq_server;
> > +
> > +    /* Cached CF8 for guest PCI config cycles */
> > +    uint32_t                pci_cf8;
> > +    spinlock_t              pci_lock;
> 
> Please consider padding when adding new fields here - try grouping
> 64-bit quantities together rather than alternating between 32- and
> 64-bit ones.

Why do we need to care about padding? Re-ordering for efficiency of space is reasonable.

> 
> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d,
> unsigned long gmfn,
> >                              struct page_info **_page, void **_va);
> >  void destroy_ring_for_helper(void **_va, struct page_info *page);
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> 
> Any reason you couldn't avoid adding the const in one of the earlier
> patches?
> 

You asked for it in a previous review. I'm happy to lose the const again.

> > +#define HVMOP_map_io_range_to_ioreq_server 19
> > +struct xen_hvm_map_io_range_to_ioreq_server {
> > +    domid_t domid;                  /* IN - domain to be serviced */
> > +    ioservid_t id;                  /* IN - handle from
> HVMOP_register_ioreq_server */
> > +    int is_mmio;                    /* IN - MMIO or port IO? */
> > +    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> > +};
> > +typedef struct xen_hvm_map_io_range_to_ioreq_server
> xen_hvm_map_io_range_to_ioreq_server_t;
> >
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_
> t);
> > +
> > +#define HVMOP_unmap_io_range_from_ioreq_server 20
> > +struct xen_hvm_unmap_io_range_from_ioreq_server {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint8_t is_mmio;        /* IN - MMIO or port IO? */
> 
> Please use uint8_t above too, and move the field ahead of "id" for
> better packing. I'm also not sure the "is_" prefix is really useful here.

Ok.

> And for this to be usable with other architectures that may have
> address spaces other than memory and I/O ports it would seem
> desirable to not consider this a boolean, but an enumerator.

Maybe it would be better to consolidate io ranges and pci devs then and the existing ioreq type values in the interface. I.e:

#define IOREQ_TYPE_PIO          0 /* pio */
#define IOREQ_TYPE_COPY         1 /* mmio ops */
#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */

>  In the
> end a third address space could immediately be PCI space, thus
> eliminating the need for the two special ops below. I.e. this could
> follow more closely ACPI's address space handling - there's nothing
> inherently wrong with an MSR based I/O interface for example.
> 

Ah, I see you had the same thought :-)

> > +#define HVMOP_map_pcidev_to_ioreq_server 21
> > +struct xen_hvm_map_pcidev_to_ioreq_server {
> > +    domid_t domid;      /* IN - domain to be serviced */
> > +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> > +};
> > +typedef struct xen_hvm_map_pcidev_to_ioreq_server
> xen_hvm_map_pcidev_to_ioreq_server_t;
> >
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
> > +
> > +#define HVMOP_unmap_pcidev_from_ioreq_server 22
> > +struct xen_hvm_unmap_pcidev_from_ioreq_server {
> > +    domid_t domid;      /* IN - domain to be serviced */
> > +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> > +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> 
> Both of these need a PCI segment/domain added. 

Ok.

> Also what's the
> point of having two identical structures of map and unmap?
> 

Good point. I'll collapse them down.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 13:32     ` Paul Durrant
@ 2014-04-09 13:46       ` Jan Beulich
  2014-04-09 13:51         ` Paul Durrant
  2014-04-09 14:42         ` Ian Campbell
  0 siblings, 2 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 13:46 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

>>> On 09.04.14 at 15:32, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
>> Also, I didn't see a limit being enforced on the number of elements
>> that can be added to these lists, yet allowing this to be unlimited is
>> a latent security issue.
>> 
> 
> Guest domains cannot add to the lists, only the emulating domain, but if 
> that is unprivileged then, yes, that is a security issue.

And hence it needs to be fixed, or the operation added to the list of
disaggregation-unsafe ones (which XSA-77 added). I'd clearly favor
the former...

>> >  struct hvm_domain {
>> > +    /* Guest page range used for non-default ioreq servers */
>> > +    unsigned long           ioreq_gmfn_base;
>> > +    unsigned int            ioreq_gmfn_count;
>> > +    unsigned long           ioreq_gmfn_mask;
>> > +
>> > +    /* Lock protects all other values in the following block */
>> >      spinlock_t              ioreq_server_lock;
>> > -    struct hvm_ioreq_server *ioreq_server;
>> > +    ioservid_t              ioreq_server_id;
>> > +    struct list_head        ioreq_server_list;
>> > +    unsigned int            ioreq_server_count;
>> > +    struct hvm_ioreq_server *default_ioreq_server;
>> > +
>> > +    /* Cached CF8 for guest PCI config cycles */
>> > +    uint32_t                pci_cf8;
>> > +    spinlock_t              pci_lock;
>> 
>> Please consider padding when adding new fields here - try grouping
>> 64-bit quantities together rather than alternating between 32- and
>> 64-bit ones.
> 
> Why do we need to care about padding? Re-ordering for efficiency of space is 
> reasonable.

That's what I meant - try to avoid unnecessary padding.

>> > --- a/xen/include/asm-x86/hvm/hvm.h
>> > +++ b/xen/include/asm-x86/hvm/hvm.h
>> > @@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d,
>> unsigned long gmfn,
>> >                              struct page_info **_page, void **_va);
>> >  void destroy_ring_for_helper(void **_va, struct page_info *page);
>> >
>> > -bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
>> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
>> 
>> Any reason you couldn't avoid adding the const in one of the earlier
>> patches?
>> 
> 
> You asked for it in a previous review. I'm happy to lose the const again.

If it doesn't persist through the series, there's little point in adding it.

>> And for this to be usable with other architectures that may have
>> address spaces other than memory and I/O ports it would seem
>> desirable to not consider this a boolean, but an enumerator.
> 
> Maybe it would be better to consolidate io ranges and pci devs then and the 
> existing ioreq type values in the interface. I.e:
> 
> #define IOREQ_TYPE_PIO          0 /* pio */
> #define IOREQ_TYPE_COPY         1 /* mmio ops */

Right, except that "COPY" is sort of odd here - why not "MMIO"?

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 13:46       ` Jan Beulich
@ 2014-04-09 13:51         ` Paul Durrant
  2014-04-09 14:42         ` Ian Campbell
  1 sibling, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-09 13:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 April 2014 14:47
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v4 5/8] ioreq-server: add support for multiple servers
> 
> >>> On 09.04.14 at 15:32, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> >> Also, I didn't see a limit being enforced on the number of elements
> >> that can be added to these lists, yet allowing this to be unlimited is
> >> a latent security issue.
> >>
> >
> > Guest domains cannot add to the lists, only the emulating domain, but if
> > that is unprivileged then, yes, that is a security issue.
> 
> And hence it needs to be fixed, or the operation added to the list of
> disaggregation-unsafe ones (which XSA-77 added). I'd clearly favor
> the former...
> 
> >> >  struct hvm_domain {
> >> > +    /* Guest page range used for non-default ioreq servers */
> >> > +    unsigned long           ioreq_gmfn_base;
> >> > +    unsigned int            ioreq_gmfn_count;
> >> > +    unsigned long           ioreq_gmfn_mask;
> >> > +
> >> > +    /* Lock protects all other values in the following block */
> >> >      spinlock_t              ioreq_server_lock;
> >> > -    struct hvm_ioreq_server *ioreq_server;
> >> > +    ioservid_t              ioreq_server_id;
> >> > +    struct list_head        ioreq_server_list;
> >> > +    unsigned int            ioreq_server_count;
> >> > +    struct hvm_ioreq_server *default_ioreq_server;
> >> > +
> >> > +    /* Cached CF8 for guest PCI config cycles */
> >> > +    uint32_t                pci_cf8;
> >> > +    spinlock_t              pci_lock;
> >>
> >> Please consider padding when adding new fields here - try grouping
> >> 64-bit quantities together rather than alternating between 32- and
> >> 64-bit ones.
> >
> > Why do we need to care about padding? Re-ordering for efficiency of
> space is
> > reasonable.
> 
> That's what I meant - try to avoid unnecessary padding.
> 
> >> > --- a/xen/include/asm-x86/hvm/hvm.h
> >> > +++ b/xen/include/asm-x86/hvm/hvm.h
> >> > @@ -228,7 +228,8 @@ int prepare_ring_for_helper(struct domain *d,
> >> unsigned long gmfn,
> >> >                              struct page_info **_page, void **_va);
> >> >  void destroy_ring_for_helper(void **_va, struct page_info *page);
> >> >
> >> > -bool_t hvm_send_assist_req(struct vcpu *v, const ioreq_t *p);
> >> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> >>
> >> Any reason you couldn't avoid adding the const in one of the earlier
> >> patches?
> >>
> >
> > You asked for it in a previous review. I'm happy to lose the const again.
> 
> If it doesn't persist through the series, there's little point in adding it.
> 
> >> And for this to be usable with other architectures that may have
> >> address spaces other than memory and I/O ports it would seem
> >> desirable to not consider this a boolean, but an enumerator.
> >
> > Maybe it would be better to consolidate io ranges and pci devs then and
> the
> > existing ioreq type values in the interface. I.e:
> >
> > #define IOREQ_TYPE_PIO          0 /* pio */
> > #define IOREQ_TYPE_COPY         1 /* mmio ops */
> 
> Right, except that "COPY" is sort of odd here - why not "MMIO"?
> 

Good question. Historical I imagine. I prefer the term MMIO but changing the ioreq header would probably break many things so I's aim to #include it and then do something like

#define RANGE_TYPE_PORTIO	(IOREQ_TYPE_PIO)
#define RANGE_TYPE_MMIO	(IOREQ_TYPE_COPY)

Etc.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 5/8] ioreq-server: add support for multiple servers
  2014-04-09 13:46       ` Jan Beulich
  2014-04-09 13:51         ` Paul Durrant
@ 2014-04-09 14:42         ` Ian Campbell
  1 sibling, 0 replies; 62+ messages in thread
From: Ian Campbell @ 2014-04-09 14:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Paul Durrant, Stefano Stabellini,
	xen-devel@lists.xen.org

On Wed, 2014-04-09 at 14:46 +0100, Jan Beulich wrote:
> >>> On 09.04.14 at 15:32, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> >> Also, I didn't see a limit being enforced on the number of elements
> >> that can be added to these lists, yet allowing this to be unlimited is
> >> a latent security issue.
> >> 
> > 
> > Guest domains cannot add to the lists, only the emulating domain, but if 
> > that is unprivileged then, yes, that is a security issue.
> 
> And hence it needs to be fixed, or the operation added to the list of
> disaggregation-unsafe ones (which XSA-77 added). I'd clearly favor
> the former...

and I will require it.

Quoting from the changelog of the XSA-77 patch:
    It is expected that these lists will be whittled away as each interface is
    audited for safety.
    
    New interfaces should be expected to be safe when introduced (IOW the list
    should never be expanded).
    
Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-07 16:00   ` Ian Campbell
  2014-04-09 12:20   ` Jan Beulich
  2014-04-02 15:11 ` [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional Paul Durrant
  2014-04-02 15:11 ` [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  7 siblings, 2 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich,
	Stefano Stabellini

For secondary servers, add a hvm op to enable/disable the server. The
server will not accept IO until it is enabled and the act of enabling
the server removes its pages from the guest p2m, thus preventing the guest
from directly mapping the pages and synthesizing ioreqs.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c          |   24 ++++++++
 tools/libxc/xenctrl.h            |    5 ++
 xen/arch/x86/hvm/hvm.c           |  115 +++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/domain.h |    1 +
 xen/include/public/hvm/hvm_op.h  |   33 +++++++----
 5 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 8cec171..67829c5 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1459,6 +1459,30 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->enabled = enabled;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 3b0c678..1f8d490 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1843,6 +1843,11 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id);
 
+int xc_hvm_set_ioreq_server_state(xc_interface *xch,
+                                  domid_t domid,
+                                  ioservid_t id,
+                                  int enabled);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5af01b0..ba9b304 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -545,6 +545,20 @@ static int hvm_map_ioreq_page(
     return 0;
 }
 
+static void hvm_remove_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    guest_physmap_remove_page(d, iorp->gmfn, 
+                              page_to_mfn(iorp->page), 0);
+}
+
+static int hvm_add_ioreq_gmfn(
+    struct domain *d, struct hvm_ioreq_page *iorp)
+{
+    return guest_physmap_add_page(d, iorp->gmfn,
+                                  page_to_mfn(iorp->page), 0);
+}
+
 static int hvm_print_line(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -844,6 +858,26 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s,
     }
 }
 
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    hvm_remove_ioreq_gmfn(d, &s->ioreq);
+    hvm_remove_ioreq_gmfn(d, &s->bufioreq);
+
+    s->enabled = 1;
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    hvm_add_ioreq_gmfn(d, &s->bufioreq);
+    hvm_add_ioreq_gmfn(d, &s->ioreq);
+
+    s->enabled = 0;
+}
+
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
                                  ioservid_t id)
@@ -888,6 +922,9 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
 {
     struct list_head *entry;
 
+    if ( !is_default && s->enabled )
+        hvm_ioreq_server_disable(s);
+
     list_for_each ( entry,
                     &s->mmio_range_list )
     {
@@ -950,8 +987,10 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
              &d->arch.hvm_domain.ioreq_server_list);
     d->arch.hvm_domain.ioreq_server_count++;
 
-    if ( is_default )
+    if ( is_default ) {
+        s->enabled = 1;
         d->arch.hvm_domain.default_ioreq_server = s;
+    }
 
     domain_unpause(d);
 
@@ -996,7 +1035,7 @@ static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
         --d->arch.hvm_domain.ioreq_server_count;
         list_del_init(&s->list_entry);
-        
+
         hvm_ioreq_server_deinit(s, is_default);
 
         domain_unpause(d);
@@ -1240,6 +1279,44 @@ static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                                      bool_t enabled)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                list_entry);
+        if ( s->id != id )
+            continue;
+
+        rc = 0;
+        if ( s->enabled == enabled )
+            break;
+
+        domain_pause(d);
+
+        if ( enabled )
+            hvm_ioreq_server_enable(s);
+        else
+            hvm_ioreq_server_disable(s);
+
+        domain_unpause(d);
+        break;
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
 static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
     struct list_head *entry;
@@ -2345,6 +2422,9 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
                           &d->arch.hvm_domain.ioreq_server_list,
                           list_entry )
     {
+        if ( !s->enabled )
+            continue;
+
         switch ( type )
         {
             case IOREQ_TYPE_COPY:
@@ -2389,6 +2469,7 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
  found:
     rcu_read_unlock(&ioreq_server_rcu_lock);
 
+    ASSERT(!s || s->enabled);
     return s;
 
 #undef BDF
@@ -5325,6 +5406,31 @@ static int hvmop_unmap_pcidev_from_ioreq_server(
     return rc;
 }
 
+static int hvmop_set_ioreq_server_state(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
+{
+    xen_hvm_set_ioreq_server_state_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_set_ioreq_server_state(d, op.id, op.enabled);
+
+ out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_destroy_ioreq_server(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
 {
@@ -5388,6 +5494,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = hvmop_unmap_pcidev_from_ioreq_server(
             guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
         break;
+
+    case HVMOP_set_ioreq_server_state:
+        rc = hvmop_set_ioreq_server_state(
+            guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
+        break;
     
     case HVMOP_destroy_ioreq_server:
         rc = hvmop_destroy_ioreq_server(
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index d7a73ce..36cb7ec 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -79,6 +79,7 @@ struct hvm_ioreq_server {
     struct list_head       mmio_range_list;
     struct list_head       portio_range_list;
     struct list_head       pcidev_list;
+    bool_t                 enabled;
 };
 
 struct hvm_domain {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index c6ceea5..a39290e 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -296,10 +296,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
 
 #define HVMOP_map_io_range_to_ioreq_server 19
 struct xen_hvm_map_io_range_to_ioreq_server {
-    domid_t domid;                  /* IN - domain to be serviced */
-    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
-    int is_mmio;                    /* IN - MMIO or port IO? */
-    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
+    domid_t domid;               /* IN - domain to be serviced */
+    ioservid_t id;               /* IN - handle from HVMOP_register_ioreq_server */
+    int is_mmio;                 /* IN - MMIO or port IO? */
+    uint64_aligned_t start, end; /* IN - inclusive start and end of range */
 };
 typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
@@ -316,30 +316,39 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
 
 #define HVMOP_map_pcidev_to_ioreq_server 21
 struct xen_hvm_map_pcidev_to_ioreq_server {
-    domid_t domid;      /* IN - domain to be serviced */
-    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
-    uint16_t bdf;       /* IN - PCI bus/dev/func */
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;  /* IN - PCI bus/dev/func */
 };
 typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
 
 #define HVMOP_unmap_pcidev_from_ioreq_server 22
 struct xen_hvm_unmap_pcidev_from_ioreq_server {
-    domid_t domid;      /* IN - domain to be serviced */
-    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
-    uint16_t bdf;       /* IN - PCI bus/dev/func */
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;  /* IN - PCI bus/dev/func */
 };
 typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
 
 #define HVMOP_destroy_ioreq_server 23
 struct xen_hvm_destroy_ioreq_server {
-    domid_t domid;          /* IN - domain to be serviced */
-    ioservid_t id;          /* IN - server id */
+    domid_t domid; /* IN - domain to be serviced */
+    ioservid_t id; /* IN - server id */
 };
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+#define HVMOP_set_ioreq_server_state 24
+struct xen_hvm_set_ioreq_server_state {
+    domid_t domid;   /* IN - domain to be serviced */
+    ioservid_t id;   /* IN - server id */
+    uint8_t enabled; /* IN - enabled? */    
+};
+typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-02 15:11 ` [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-04-07 16:00   ` Ian Campbell
  2014-04-08  8:33     ` Paul Durrant
  2014-04-09 12:20   ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-07 16:00 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> For secondary servers, add a hvm op to enable/disable the server. The
> server will not accept IO until it is enabled and the act of enabling
> the server removes its pages from the guest p2m, thus preventing the guest
> from directly mapping the pages and synthesizing ioreqs.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/libxc/xc_domain.c          |   24 ++++++++
>  tools/libxc/xenctrl.h            |    5 ++
>  xen/arch/x86/hvm/hvm.c           |  115 +++++++++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/domain.h |    1 +
>  xen/include/public/hvm/hvm_op.h  |   33 +++++++----
>  5 files changed, 164 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 8cec171..67829c5 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1459,6 +1459,30 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_hvm_set_ioreq_server_state(xc_interface *xch,
> +                                  domid_t domid,
> +                                  ioservid_t id,
> +                                  int enabled)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->enabled = enabled;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}

Newlines for clarity please.

I'm not really looking at the actual interface here. I'm assuming they
are a pretty straight exposure of the underlying hypercall and relying
on the hypervisor guys to agree that is a sane interface.
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index c6ceea5..a39290e 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -296,10 +296,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
>  
>  #define HVMOP_map_io_range_to_ioreq_server 19
>  struct xen_hvm_map_io_range_to_ioreq_server {
> -    domid_t domid;                  /* IN - domain to be serviced */
> -    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> -    int is_mmio;                    /* IN - MMIO or port IO? */
> -    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> +    domid_t domid;               /* IN - domain to be serviced */
> +    ioservid_t id;               /* IN - handle from HVMOP_register_ioreq_server */
> +    int is_mmio;                 /* IN - MMIO or port IO? */
> +    uint64_aligned_t start, end; /* IN - inclusive start and end of range */

There seems to be a lot of gratuitous whitespace changes in this patch.
Shouldn't most of these be folded into the previous patch which
introduced things so that happens with the correct indentation?

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-07 16:00   ` Ian Campbell
@ 2014-04-08  8:33     ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 April 2014 17:01
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is
> enabled
> 
> On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > For secondary servers, add a hvm op to enable/disable the server. The
> > server will not accept IO until it is enabled and the act of enabling
> > the server removes its pages from the guest p2m, thus preventing the
> guest
> > from directly mapping the pages and synthesizing ioreqs.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  tools/libxc/xc_domain.c          |   24 ++++++++
> >  tools/libxc/xenctrl.h            |    5 ++
> >  xen/arch/x86/hvm/hvm.c           |  115
> +++++++++++++++++++++++++++++++++++++-
> >  xen/include/asm-x86/hvm/domain.h |    1 +
> >  xen/include/public/hvm/hvm_op.h  |   33 +++++++----
> >  5 files changed, 164 insertions(+), 14 deletions(-)
> >
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 8cec171..67829c5 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1459,6 +1459,30 @@ int xc_hvm_destroy_ioreq_server(xc_interface
> *xch,
> >      return rc;
> >  }
> >
> > +int xc_hvm_set_ioreq_server_state(xc_interface *xch,
> > +                                  domid_t domid,
> > +                                  ioservid_t id,
> > +                                  int enabled)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->enabled = enabled;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> 
> Newlines for clarity please.
> 

Sure.

> I'm not really looking at the actual interface here. I'm assuming they
> are a pretty straight exposure of the underlying hypercall and relying
> on the hypervisor guys to agree that is a sane interface.
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index c6ceea5..a39290e 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -296,10 +296,10 @@
> DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> >
> >  #define HVMOP_map_io_range_to_ioreq_server 19
> >  struct xen_hvm_map_io_range_to_ioreq_server {
> > -    domid_t domid;                  /* IN - domain to be serviced */
> > -    ioservid_t id;                  /* IN - handle from
> HVMOP_register_ioreq_server */
> > -    int is_mmio;                    /* IN - MMIO or port IO? */
> > -    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> > +    domid_t domid;               /* IN - domain to be serviced */
> > +    ioservid_t id;               /* IN - handle from HVMOP_register_ioreq_server
> */
> > +    int is_mmio;                 /* IN - MMIO or port IO? */
> > +    uint64_aligned_t start, end; /* IN - inclusive start and end of range */
> 
> There seems to be a lot of gratuitous whitespace changes in this patch.
> Shouldn't most of these be folded into the previous patch which
> introduced things so that happens with the correct indentation?
> 

Indeed they should. Emacs must have been playing tricks on me.

  Paul

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-02 15:11 ` [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled Paul Durrant
  2014-04-07 16:00   ` Ian Campbell
@ 2014-04-09 12:20   ` Jan Beulich
  2014-04-09 13:36     ` Paul Durrant
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 12:20 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> For secondary servers, add a hvm op to enable/disable the server. The
> server will not accept IO until it is enabled and the act of enabling
> the server removes its pages from the guest p2m, thus preventing the guest
> from directly mapping the pages and synthesizing ioreqs.

So why do these pages get put into the physmap in the first place?

> +int xc_hvm_set_ioreq_server_state(xc_interface *xch,
> +                                  domid_t domid,
> +                                  ioservid_t id,
> +                                  int enabled)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->enabled = enabled;

Truncating int to uint8_t.

> @@ -996,7 +1035,7 @@ static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>  
>          --d->arch.hvm_domain.ioreq_server_count;
>          list_del_init(&s->list_entry);
> -        
> +

Stray white space cleanup (should be done right in the patch adding
that code).

> +static int hvmop_set_ioreq_server_state(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t) uop)
> +{
> +    xen_hvm_set_ioreq_server_state_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_set_ioreq_server_state(d, op.id, op.enabled);

So you're converting uint8_t to bool_t here, which presently seems
to do what you want. But I think you'd be better of using !! here.

Also, you're pretty consistently naming the field/variable "enabled"
rather than "enable", despite it being a transition you're invoking
rather than obtaining state.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-09 12:20   ` Jan Beulich
@ 2014-04-09 13:36     ` Paul Durrant
  2014-04-09 13:50       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-09 13:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 April 2014 13:21
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is
> enabled
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > For secondary servers, add a hvm op to enable/disable the server. The
> > server will not accept IO until it is enabled and the act of enabling
> > the server removes its pages from the guest p2m, thus preventing the
> guest
> > from directly mapping the pages and synthesizing ioreqs.
> 
> So why do these pages get put into the physmap in the first place?
> 
> > +int xc_hvm_set_ioreq_server_state(xc_interface *xch,
> > +                                  domid_t domid,
> > +                                  ioservid_t id,
> > +                                  int enabled)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_set_ioreq_server_state_t,
> arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_set_ioreq_server_state;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->id = id;
> > +    arg->enabled = enabled;
> 
> Truncating int to uint8_t.
> 
> > @@ -996,7 +1035,7 @@ static void hvm_destroy_ioreq_server(struct
> domain *d, ioservid_t id)
> >
> >          --d->arch.hvm_domain.ioreq_server_count;
> >          list_del_init(&s->list_entry);
> > -
> > +
> 
> Stray white space cleanup (should be done right in the patch adding
> that code).
> 
> > +static int hvmop_set_ioreq_server_state(
> > +    XEN_GUEST_HANDLE_PARAM(xen_hvm_set_ioreq_server_state_t)
> uop)
> > +{
> > +    xen_hvm_set_ioreq_server_state_t op;
> > +    struct domain *d;
> > +    int rc;
> > +
> > +    if ( copy_from_guest(&op, uop, 1) )
> > +        return -EFAULT;
> > +
> > +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> > +    if ( rc != 0 )
> > +        return rc;
> > +
> > +    rc = -EINVAL;
> > +    if ( !is_hvm_domain(d) )
> > +        goto out;
> > +
> > +    rc = hvm_set_ioreq_server_state(d, op.id, op.enabled);
> 
> So you're converting uint8_t to bool_t here, which presently seems
> to do what you want. But I think you'd be better of using !! here.
> 

Ok.

> Also, you're pretty consistently naming the field/variable "enabled"
> rather than "enable", despite it being a transition you're invoking
> rather than obtaining state.
> 

Yes, because I'm setting whether the server state is 'enabled' or not. The value of the boolean is the end state not the transition, so it's correct to use the adjective rather than the verb.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled
  2014-04-09 13:36     ` Paul Durrant
@ 2014-04-09 13:50       ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 13:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

>>> On 09.04.14 at 15:36, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Also, you're pretty consistently naming the field/variable "enabled"
>> rather than "enable", despite it being a transition you're invoking
>> rather than obtaining state.
> 
> Yes, because I'm setting whether the server state is 'enabled' or not. The 
> value of the boolean is the end state not the transition, so it's correct to 
> use the adjective rather than the verb.

Hmm, that's not my way of thinking with operations like this. To me,
the operation is to enable (or disable) the server, not to set its state
to enabled (or disabled). But yes, one may view it your way too,
even if I would think that's not commonly done (and I think I saw
"enable"s too somewhere in the series, and maybe even in the same
patch, so I'd be inclined to ask for consistency even if our ways of
thinking of these operations differ).

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
                   ` (5 preceding siblings ...)
  2014-04-02 15:11 ` [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-07 16:06   ` Ian Campbell
  2014-04-02 15:11 ` [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  7 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich,
	Stefano Stabellini

Some emulators will only register regions that require non-buffered
access. (In practice the only region that a guest uses buffered access
for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
makes allocation of the buffered ioreq page and event channel optional for
secondary ioreq servers.

If a guest attempts buffered access to an ioreq server that does not
support it, the access will be handled via the normal synchronous path.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain.c         |    2 ++
 tools/libxc/xenctrl.h           |    1 +
 xen/arch/x86/hvm/hvm.c          |   74 +++++++++++++++++++++++++++------------
 xen/include/public/hvm/hvm_op.h |    5 +--
 4 files changed, 58 insertions(+), 24 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 67829c5..6eacce6 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1286,6 +1286,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
 
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id)
 {
     DECLARE_HYPERCALL;
@@ -1300,6 +1301,7 @@ int xc_hvm_create_ioreq_server(xc_interface *xch,
     hypercall.arg[0] = HVMOP_create_ioreq_server;
     hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
     arg->domid = domid;
+    arg->handle_bufioreq = handle_bufioreq;
     rc = do_xen_hypercall(xch, &hypercall);
     *id = arg->id;
     xc_hypercall_buffer_free(xch, arg);
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1f8d490..cc0dab9 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1807,6 +1807,7 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
 
 int xc_hvm_create_ioreq_server(xc_interface *xch,
                                domid_t domid,
+                               int handle_bufioreq,
                                ioservid_t *id);
 
 int xc_hvm_get_ioreq_server_info(xc_interface *xch,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ba9b304..6a117e8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -709,7 +709,7 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
 
     sv->ioreq_evtchn = rc;
 
-    if ( v->vcpu_id == 0 )
+    if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
     {
         struct domain *d = s->domain;
 
@@ -761,7 +761,7 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
 
         list_del_init(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -788,7 +788,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 
         list_del_init(&sv->list_entry);
 
-        if ( v->vcpu_id == 0 )
+        if ( v->vcpu_id == 0 && s->bufioreq.va != NULL )
             free_xen_event_channel(v, s->bufioreq_evtchn);
 
         free_xen_event_channel(v, sv->ioreq_evtchn);
@@ -800,7 +800,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 }
 
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
-                                      bool_t is_default)
+                                      bool_t is_default, bool_t handle_bufioreq)
 {
     struct domain *d = s->domain;
     unsigned long ioreq_pfn, bufioreq_pfn;
@@ -808,24 +808,34 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
 
     if ( is_default ) {
         ioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+        /*
+         * The default ioreq server must handle buffered ioreqs, for
+         * backwards compatibility.
+         */
+        ASSERT(handle_bufioreq);
         bufioreq_pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
     } else {
         rc = hvm_alloc_ioreq_gmfn(d, &ioreq_pfn);
         if ( rc )
             goto fail1;
 
-        rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
-        if ( rc )
-            goto fail2;
+        if ( handle_bufioreq ) {
+            rc = hvm_alloc_ioreq_gmfn(d, &bufioreq_pfn);
+            if ( rc )
+                goto fail2;
+        }
     }
 
     rc = hvm_map_ioreq_page(d, &s->ioreq, ioreq_pfn);
     if ( rc )
         goto fail3;
 
-    rc = hvm_map_ioreq_page(d, &s->bufioreq, bufioreq_pfn);
-    if ( rc )
-        goto fail4;
+    if ( handle_bufioreq ) {
+        rc = hvm_map_ioreq_page(d, &s->bufioreq, bufioreq_pfn);
+        if ( rc )
+            goto fail4;
+    }
 
     return 0;
 
@@ -833,7 +843,7 @@ fail4:
     hvm_unmap_ioreq_page(&s->ioreq);
 
 fail3:
-    if ( !is_default )
+    if ( !is_default && handle_bufioreq )
         hvm_free_ioreq_gmfn(d, bufioreq_pfn);
 
 fail2:
@@ -848,12 +858,17 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s,
                                          bool_t is_default)
 {
     struct domain *d = s->domain;
+    bool_t handle_bufioreq = ( s->bufioreq.va != NULL );
+
+    if ( handle_bufioreq )
+        hvm_unmap_ioreq_page(&s->bufioreq);
 
-    hvm_unmap_ioreq_page(&s->bufioreq);
     hvm_unmap_ioreq_page(&s->ioreq);
 
     if ( !is_default ) {
-        hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+        if ( handle_bufioreq )
+            hvm_free_ioreq_gmfn(d, s->bufioreq.gmfn);
+
         hvm_free_ioreq_gmfn(d, s->ioreq.gmfn);
     }
 }
@@ -880,7 +895,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
 
 static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
                                  domid_t domid, bool_t is_default,
-                                 ioservid_t id)
+                                 bool_t handle_bufioreq, ioservid_t id)
 {
     struct vcpu *v;
     int rc;
@@ -896,7 +911,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    rc = hvm_ioreq_server_map_pages(s, is_default);
+    rc = hvm_ioreq_server_map_pages(s, is_default, handle_bufioreq);
     if ( rc )
         return rc;
 
@@ -960,7 +975,8 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s,
 }
 
 static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
-                                   bool_t is_default, ioservid_t *id)
+                                   bool_t is_default, bool_t handle_bufioreq,
+                                   ioservid_t *id)
 {
     struct hvm_ioreq_server *s;
     int rc;
@@ -978,7 +994,7 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid,
 
     domain_pause(d);
 
-    rc = hvm_ioreq_server_init(s, d, domid, is_default,
+    rc = hvm_ioreq_server_init(s, d, domid, is_default, handle_bufioreq,
                                d->arch.hvm_domain.ioreq_server_id++);
     if ( rc )
         goto fail3;
@@ -1070,8 +1086,11 @@ static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
             continue;
 
         *ioreq_pfn = s->ioreq.gmfn;
-        *bufioreq_pfn = s->bufioreq.gmfn;
-        *bufioreq_port = s->bufioreq_evtchn;
+
+        if ( s->bufioreq.va != NULL ) {
+            *bufioreq_pfn = s->bufioreq.gmfn;
+            *bufioreq_port = s->bufioreq_evtchn;
+        }
 
         rc = 0;
         break;
@@ -1425,6 +1444,13 @@ static int hvm_set_ioreq_pfn(struct domain *d, bool_t buf,
     spin_lock(&s->lock);
 
     iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    /*
+     * There must already be mapped page, set up when the
+     * ioreq server was created.
+     */
+    hvm_unmap_ioreq_page(iorp);
+
     rc = hvm_map_ioreq_page(d, iorp, pfn);
     if ( rc )
         goto fail;
@@ -2493,6 +2519,9 @@ int hvm_buffered_io_send(struct domain *d, ioreq_t *p)
     iorp = &s->bufioreq;
     pg = iorp->va;
 
+    if ( !pg )
+        return 0;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -5262,7 +5291,8 @@ static int hvmop_create_ioreq_server(
     if ( !is_hvm_domain(d) )
         goto out;
 
-    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, &op.id);
+    rc = hvm_create_ioreq_server(d, curr_d->domain_id, 0, op.handle_bufioreq,
+                                 &op.id);
     if ( rc != 0 )
         goto out;
 
@@ -5599,7 +5629,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_create_ioreq_server(d, a.value, 1, NULL);
+                rc = hvm_create_ioreq_server(d, a.value, 1, 1, NULL);
                 if ( rc == -EEXIST )
                     rc = hvm_set_dm_domain(d, a.value);
                 break;
@@ -5731,7 +5761,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 
                 /* May need to create server */
                 domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
-                rc = hvm_create_ioreq_server(d, domid, 1, NULL);
+                rc = hvm_create_ioreq_server(d, domid, 1, 1, NULL);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
                 /*FALLTHRU*/
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a39290e..0ecb492 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -277,8 +277,9 @@ DEFINE_XEN_GUEST_HANDLE(ioservid_t);
 
 #define HVMOP_create_ioreq_server 17
 struct xen_hvm_create_ioreq_server {
-    domid_t domid;  /* IN - domain to be serviced */
-    ioservid_t id;  /* OUT - server id */
+    domid_t domid;           /* IN - domain to be serviced */
+    uint8_t handle_bufioreq; /* IN - should server handle buffered ioreqs */
+    ioservid_t id;           /* OUT - server id */
 };
 typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional
  2014-04-02 15:11 ` [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional Paul Durrant
@ 2014-04-07 16:06   ` Ian Campbell
  2014-04-08  8:35     ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-07 16:06 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> Some emulators will only register regions that require non-buffered
> access. (In practice the only region that a guest uses buffered access
> for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> makes allocation of the buffered ioreq page and event channel optional for
> secondary ioreq servers.
> 
> If a guest attempts buffered access to an ioreq server that does not
> support it, the access will be handled via the normal synchronous path.

In terms of the guest PFN space do the magic PFNs get packed together or
are there holes? (Mostly just curious...)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional
  2014-04-07 16:06   ` Ian Campbell
@ 2014-04-08  8:35     ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:35 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 April 2014 17:07
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 7/8] ioreq-server: make buffered ioreq handling
> optional
> 
> On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > Some emulators will only register regions that require non-buffered
> > access. (In practice the only region that a guest uses buffered access
> > for today is the VGA aperture from 0xa0000-0xbffff). This patch therefore
> > makes allocation of the buffered ioreq page and event channel optional for
> > secondary ioreq servers.
> >
> > If a guest attempts buffered access to an ioreq server that does not
> > support it, the access will be handled via the normal synchronous path.
> 
> In terms of the guest PFN space do the magic PFNs get packed together or
> are there holes? (Mostly just curious...)
> 

There can be holes. The allocator is just a dead-simple bitmap test-and-set/clear so which actual PFNs get assigned to which ioreq server depends on the sequence of creation/destruction.

  Paul

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
                   ` (6 preceding siblings ...)
  2014-04-02 15:11 ` [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional Paul Durrant
@ 2014-04-02 15:11 ` Paul Durrant
  2014-04-07 16:14   ` Ian Campbell
  2014-04-09 13:34   ` Jan Beulich
  7 siblings, 2 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-02 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Ian Jackson, Ian Campbell, Jan Beulich,
	Stefano Stabellini

Because we may now have more than one emulator, the implementation of the
PCI hotplug controller needs to be done by Xen. Happily the code is very
short and simple and it also removes the need for a different ACPI DSDT
when using different variants of QEMU.

As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
and hotplug controller implementations to avoid the possibility of it
raising an SCI which will never be cleared.

VMs started on an older host and then migrated in will not use the in-Xen
controller as the AML may still point at QEMU traditional's hotplug
controller implementation. This means hotplug ops will fail with EOPNOTSUPP
and it is up to the caller to decide whether this is a problem or not.
libxl will ignore EOPNOTSUPP as it is always hotplugging via QEMU so it does
not matter whether it is Xen or QEMU providing the implementation.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 tools/firmware/hvmloader/acpi/mk_dsdt.c |  191 +++++--------------------
 tools/libxc/xc_domain.c                 |   24 ++++
 tools/libxc/xenctrl.h                   |    9 ++
 tools/libxl/libxl_pci.c                 |   15 ++
 xen/arch/x86/hvm/Makefile               |    1 +
 xen/arch/x86/hvm/hotplug.c              |  231 +++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  |   40 ++++++
 xen/include/asm-x86/hvm/domain.h        |   11 ++
 xen/include/asm-x86/hvm/io.h            |    8 +-
 xen/include/public/hvm/hvm_op.h         |    9 ++
 xen/include/public/hvm/ioreq.h          |    4 +
 11 files changed, 387 insertions(+), 156 deletions(-)
 create mode 100644 xen/arch/x86/hvm/hotplug.c

diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
index a4b693b..7c4dd45 100644
--- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
+++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
@@ -8,11 +8,6 @@
 
 static unsigned int indent_level;
 
-typedef enum dm_version {
-    QEMU_XEN_TRADITIONAL,
-    QEMU_XEN,
-} dm_version;
-
 static void indent(void)
 {
     unsigned int i;
@@ -58,28 +53,6 @@ static void pop_block(void)
     printf("}\n");
 }
 
-static void pci_hotplug_notify(unsigned int slt)
-{
-    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
-}
-
-static void decision_tree(
-    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
-{
-    if ( s == (e-1) )
-    {
-        (*leaf)(s);
-        return;
-    }
-
-    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
-    decision_tree((s+e)/2, e, var, leaf);
-    pop_block();
-    push_block("Else", NULL);
-    decision_tree(s, (s+e)/2, var, leaf);
-    pop_block();
-}
-
 static struct option options[] = {
     { "maxcpu", 1, 0, 'c' },
     { "dm-version", 1, 0, 'q' },
@@ -89,7 +62,6 @@ static struct option options[] = {
 int main(int argc, char **argv)
 {
     unsigned int slot, dev, intx, link, cpu, max_cpus = HVM_MAX_VCPUS;
-    dm_version dm_version = QEMU_XEN_TRADITIONAL;
 
     for ( ; ; )
     {
@@ -116,14 +88,7 @@ int main(int argc, char **argv)
             break;
         }
         case 'q':
-            if (strcmp(optarg, "qemu-xen") == 0) {
-                dm_version = QEMU_XEN;
-            } else if (strcmp(optarg, "qemu-xen-traditional") == 0) {
-                dm_version = QEMU_XEN_TRADITIONAL;
-            } else {
-                fprintf(stderr, "Unknown device model version `%s'.\n", optarg);
-                return -1;
-            }
+            /* qemu version - no longer used */
             break;
         default:
             return -1;
@@ -222,11 +187,8 @@ int main(int argc, char **argv)
 
     /* Define GPE control method. */
     push_block("Scope", "\\_GPE");
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L02");
-    } else {
-        push_block("Method", "_E02");
-    }
+    push_block("Method", "_L02");
+
     stmt("Return", "\\_SB.PRSC()");
     pop_block();
     pop_block();
@@ -237,23 +199,17 @@ int main(int argc, char **argv)
     push_block("Scope", "\\_SB.PCI0");
 
     /*
-     * Reserve the IO port ranges [0x10c0, 0x1101] and [0xb044, 0xb047].
-     * Or else, for a hotplugged-in device, the port IO BAR assigned
-     * by guest OS may conflict with the ranges here.
+     * Reserve the IO port ranges used by PCI hotplug controller or else,
+     * for a hotplugged-in device, the port IO BAR assigned by guest OS may
+     * conflict with the ranges here.
      */
     push_block("Device", "HP0"); {
         stmt("Name", "_HID, EISAID(\"PNP0C02\")");
-        if (dm_version == QEMU_XEN_TRADITIONAL) {
-            stmt("Name", "_CRS, ResourceTemplate() {"
-                 "  IO (Decode16, 0x10c0, 0x10c0, 0x00, 0x82)"
-                 "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
-                 "}");
-        } else {
-            stmt("Name", "_CRS, ResourceTemplate() {"
-                 "  IO (Decode16, 0xae00, 0xae00, 0x00, 0x10)"
-                 "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
-                 "}");
-        }
+        stmt("Name", "_CRS, ResourceTemplate() {"
+             "  IO (Decode16, 0x10c0, 0x10c0, 0x00, 0x82)"
+             "  IO (Decode16, 0xb044, 0xb044, 0x00, 0x04)"
+             "  IO (Decode16, 0xae00, 0xae00, 0x00, 0x10)"
+             "}");
     } pop_block();
 
     /*** PCI-ISA link definitions ***/
@@ -322,64 +278,21 @@ int main(int argc, char **argv)
                    dev, intx, ((dev*4+dev/8+intx)&31)+16);
     printf("})\n");
 
-    /*
-     * Each PCI hotplug slot needs at least two methods to handle
-     * the ACPI event:
-     *  _EJ0: eject a device
-     *  _STA: return a device's status, e.g. enabled or removed
-     * 
-     * Eject button would generate a general-purpose event, then the
-     * control method for this event uses Notify() to inform OSPM which
-     * action happened and on which device.
-     *
-     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
-     * in ACPI spec 3.0b for details.
-     *
-     * QEMU provides a simple hotplug controller with some I/O to handle
-     * the hotplug action and status, which is beyond the ACPI scope.
-     */
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        for ( slot = 0; slot < 0x100; slot++ )
-        {
-            push_block("Device", "S%02X", slot);
-            /* _ADR == dev:fn (16:16) */
-            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
-            /* _SUN == dev */
-            stmt("Name", "_SUN, 0x%08x", slot >> 3);
-            push_block("Method", "_EJ0, 1");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x88, \\_GPE.DPT2");
-            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
-                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
-            pop_block();
-            push_block("Method", "_STA, 0");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x89, \\_GPE.DPT2");
-            if ( slot & 1 )
-                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-            else
-                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
-            stmt("Return", "Local1"); /* IN status as the _STA */
-            pop_block();
-            pop_block();
-        }
-    } else {
-        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
-        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("B0EJ, 32,\n");
-        pop_block();
+    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
+    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("B0EJ, 32,\n");
+    pop_block();
 
-        /* hotplug_slot */
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("Device", "S%i", slot); {
-                stmt("Name", "_ADR, %#06x0000", slot);
-                push_block("Method", "_EJ0,1"); {
-                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
-                    stmt("Return", "0x0");
-                } pop_block();
-                stmt("Name", "_SUN, %i", slot);
+    /* hotplug_slot */
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("Device", "S%i", slot); {
+            stmt("Name", "_ADR, %#06x0000", slot);
+            push_block("Method", "_EJ0,1"); {
+                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
+                stmt("Return", "0x0");
             } pop_block();
-        }
+            stmt("Name", "_SUN, %i", slot);
+        } pop_block();
     }
 
     pop_block();
@@ -389,26 +302,11 @@ int main(int argc, char **argv)
     /**** GPE start ****/
     push_block("Scope", "\\_GPE");
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
-
-        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
-        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
-        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
-        for ( slot = 0; slot < 0x100; slot += 2 )
-        {
-            indent();
-            /* Each hotplug control register manages a pair of pci functions. */
-            printf("PH%02X, 8,\n", slot);
-        }
-        pop_block();
-    } else {
-        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
-        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("PCIU, 32,\n");
-        indent(); printf("PCID, 32,\n");
-        pop_block();
-    }
+    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
+    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("PCIU, 32,\n");
+    indent(); printf("PCID, 32,\n");
+    pop_block();
 
     stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
 
@@ -416,33 +314,16 @@ int main(int argc, char **argv)
     indent(); printf("DPT1, 8, DPT2, 8\n");
     pop_block();
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L03, 0, Serialized");
-        /* Detect slot and event (remove/add). */
-        stmt("Name", "SLT, 0x0");
-        stmt("Name", "EVT, 0x0");
-        stmt("Store", "PSTA, Local1");
-        stmt("And", "Local1, 0xf, EVT");
-        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
-        stmt("And", "Local1, 0xff, SLT");
-        /* Debug */
-        stmt("Store", "SLT, DPT1");
-        stmt("Store", "EVT, DPT2");
-        /* Decision tree */
-        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
+    push_block("Method", "_E01");
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
         pop_block();
-    } else {
-        push_block("Method", "_E01");
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
-            pop_block();
-            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
-            pop_block();
-        }
+        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
         pop_block();
     }
+    pop_block();
 
     pop_block();
     /**** GPE end ****/
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 6eacce6..0f3c56b 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1485,6 +1485,30 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_pci_hotplug(xc_interface *xch,
+                       domid_t domid,
+                       uint32_t slot,
+                       int enable)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->slot = slot;
+    arg->enable = enable;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index cc0dab9..1eee77b 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1849,6 +1849,15 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
                                   ioservid_t id,
                                   int enabled);
 
+/*
+ * Hotplug controller API
+ */
+
+int xc_hvm_pci_hotplug(xc_interface *xch,
+                       domid_t domid,
+                       uint32_t slot,
+                       int enable);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 44d0453..968cd5a 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         if ( rc )
             return ERROR_FAIL;
+
+        rc = xc_hvm_pci_hotplug(ctx->xch, domid, pcidev->dev, 1);
+        if (rc < 0 && errno != EOPNOTSUPP) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug enable failed");
+            return ERROR_FAIL;
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
@@ -1188,6 +1195,14 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
                                          NULL, NULL, NULL) < 0)
             goto out_fail;
 
+        rc = xc_hvm_pci_hotplug(ctx->xch, domid, pcidev->dev, 0);
+        if (rc < 0 && errno != EOPNOTSUPP) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                             "Error: xc_hvm_pci_hotplug disable failed");
+            rc = ERROR_FAIL;
+            goto out_fail;
+        }
+
         switch (libxl__device_model_version_running(gc, domid)) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
             rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..48efddb 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -3,6 +3,7 @@ subdir-y += vmx
 
 obj-y += asid.o
 obj-y += emulate.o
+obj-y += hotplug.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += i8254.o
diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
new file mode 100644
index 0000000..a70ef3f
--- /dev/null
+++ b/xen/arch/x86/hvm/hotplug.c
@@ -0,0 +1,231 @@
+/*
+ * hvm/hotplug.c
+ *
+ * Copyright (c) 2014, Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/xmalloc.h>
+#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+
+#define SCI_IRQ 9
+
+/* pci status bit: \_GPE._L02 - i.e. level sensitive, bit 2 */
+#define GPE_PCI_HOTPLUG_STATUS  2
+
+#define PCI_UP      0
+#define PCI_DOWN    4
+#define PCI_EJECT   8
+
+static void gpe_update_sci(struct hvm_hotplug *hp)
+{
+    struct domain *d;
+
+    d = container_of(
+            container_of(
+                container_of(hp, struct hvm_domain, hotplug),
+                struct arch_domain, hvm_domain),
+            struct domain, arch);
+
+    if ( (hp->gpe_sts[0] & hp->gpe_en[0]) & GPE_PCI_HOTPLUG_STATUS )
+        hvm_isa_irq_assert(d, SCI_IRQ);
+    else
+        hvm_isa_irq_deassert(d, SCI_IRQ);
+}
+
+static int handle_gpe_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= ACPI_GPE0_BLK_ADDRESS_V1;
+
+    if ( dir == IOREQ_READ )
+    {
+        if ( port < ACPI_GPE0_BLK_LEN_V1 / 2 )
+        {
+            *val = hp->gpe_sts[port];
+        }
+        else
+        {
+            port -= ACPI_GPE0_BLK_LEN_V1 / 2;
+            *val = hp->gpe_en[port];
+        }
+    } else {
+        if ( port < ACPI_GPE0_BLK_LEN_V1 / 2 )
+        {
+            hp->gpe_sts[port] &= ~*val;
+        }
+        else
+        {
+            port -= ACPI_GPE0_BLK_LEN_V1 / 2;
+            hp->gpe_en[port] = *val;
+        }
+
+        gpe_update_sci(hp);
+    }
+
+ done:
+    return X86EMUL_OKAY;
+}
+
+static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
+{
+    int slot = ffs(mask) - 1;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
+
+    hp->slot_down &= ~(1u  << slot);
+    hp->slot_up &= ~(1u  << slot);
+}
+
+static int handle_pci_hotplug_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= ACPI_PCI_HOTPLUG_ADDRESS_V1;
+
+    if ( dir == IOREQ_READ )
+    {
+        switch ( port )
+        {
+        case PCI_UP:
+            *val = hp->slot_up;
+            break;
+        case PCI_DOWN:
+            *val = hp->slot_down;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {   
+        switch ( port )
+        {
+        case PCI_EJECT:
+            pci_hotplug_eject(hp, *val);
+            break;
+        default:
+            break;
+        }
+    }
+
+ done:
+    return X86EMUL_OKAY;
+}
+
+static int null_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    /* Make it look like this IO range is non-existent */
+    if ( dir == IOREQ_READ )
+        *val = ~0u;
+
+    return X86EMUL_OKAY;
+}
+
+int pci_hotplug(struct domain *d, int slot, bool_t enable)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( !hp->gpe_sts )
+        return -EOPNOTSUPP;
+
+    if ( enable )
+        hp->slot_up |= (1u << slot);
+    else
+        hp->slot_down |= (1u << slot);
+
+    hp->gpe_sts[0] |= GPE_PCI_HOTPLUG_STATUS;
+    gpe_update_sci(hp);
+
+    return 0;
+}
+
+int gpe_init(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    hp->gpe_sts = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
+    if ( hp->gpe_sts == NULL )
+        goto fail1;
+
+    hp->gpe_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
+    if ( hp->gpe_en == NULL )
+        goto fail2;
+
+    register_portio_handler(d, ACPI_GPE0_BLK_ADDRESS_V1,
+                            ACPI_GPE0_BLK_LEN_V1, handle_gpe_io);
+    register_portio_handler(d, ACPI_PCI_HOTPLUG_ADDRESS_V1,
+                            ACPI_PCI_HOTPLUG_LEN_V1, handle_pci_hotplug_io);
+
+    /*
+     * We should make sure that the old GPE and hotplug controller ranges
+     * used by qemu trad are obscured to avoid confusion.
+     */
+    register_portio_handler(d, ACPI_GPE0_BLK_ADDRESS_V0,
+                            ACPI_GPE0_BLK_LEN_V0, null_io);
+    register_portio_handler(d, ACPI_PCI_HOTPLUG_ADDRESS_V0,
+                            ACPI_PCI_HOTPLUG_LEN_V0, null_io);
+
+
+    return 0;
+
+ fail2:
+    xfree(hp->gpe_sts);
+
+ fail1:
+    return -ENOMEM;
+}
+
+void gpe_deinit(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    xfree(hp->gpe_en);
+    xfree(hp->gpe_sts);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * c-tab-always-indent: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6a117e8..ce3d90a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1670,6 +1670,7 @@ void hvm_domain_destroy(struct domain *d)
         return;
 
     hvm_funcs.domain_destroy(d);
+    gpe_deinit(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
@@ -5487,6 +5488,31 @@ static int hvmop_destroy_ioreq_server(
     return rc;
 }
 
+static int hvmop_pci_hotplug(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
+{
+    xen_hvm_pci_hotplug_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = pci_hotplug(d, op.slot, op.enable);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -5535,6 +5561,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
         break;
     
+    case HVMOP_pci_hotplug:
+        rc = hvmop_pci_hotplug(
+            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -5714,6 +5745,15 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                     break;
                 }
                 d->arch.hvm_domain.ioreq_gmfn_count = a.value;
+
+                /*
+                 * Since secondary emulators are now possible, enable
+                 * the PCI hotplug controller.
+                 */
+                rc = gpe_init(d);
+                if ( rc == -EEXIST )
+                    rc = 0;
+
                 break;
             }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 36cb7ec..5840983 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -82,6 +82,15 @@ struct hvm_ioreq_server {
     bool_t                 enabled;
 };
 
+struct hvm_hotplug {
+    uint8_t         *gpe_sts;
+    uint8_t         *gpe_en;
+
+    /* PCI hotplug */
+    uint32_t        slot_up;
+    uint32_t        slot_down;
+};
+
 struct hvm_domain {
     /* Guest page range used for non-default ioreq servers */
     unsigned long           ioreq_gmfn_base;
@@ -99,6 +108,8 @@ struct hvm_domain {
     uint32_t                pci_cf8;
     spinlock_t              pci_lock;
 
+    struct hvm_hotplug      hotplug;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index be6546d..6631b9a 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -25,7 +25,7 @@
 #include <public/hvm/ioreq.h>
 #include <public/event_channel.h>
 
-#define MAX_IO_HANDLER             16
+#define MAX_IO_HANDLER             32
 
 #define HVM_PORTIO                  0
 #define HVM_BUFFERED_IO             2
@@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+int gpe_init(struct domain *d);
+void gpe_deinit(struct domain *d);
+
+int pci_hotplug(struct domain *d, int slot, bool_t enable);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 0ecb492..59c1c5b 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -350,6 +350,15 @@ struct xen_hvm_set_ioreq_server_state {
 typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
+#define HVMOP_pci_hotplug 25
+struct xen_hvm_pci_hotplug {
+    domid_t domid;          /* IN - domain to be serviced */
+    uint8_t enable;         /* IN - enable or disable? */
+    uint32_t slot;          /* IN - slot to enable/disable */
+};
+typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index e84fa75..44b1b94 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -94,6 +94,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V0   (ACPI_PM1A_EVT_BLK_ADDRESS_V0 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V0     (ACPI_PM_TMR_BLK_ADDRESS_V0 + 0x20)
 #define ACPI_GPE0_BLK_LEN_V0         0x08
+#define ACPI_PCI_HOTPLUG_ADDRESS_V0  0x10c0
+#define ACPI_PCI_HOTPLUG_LEN_V0      0x82 /* NR_PHP_SLOT_REG in piix4acpi.c */
 
 /* Version 1: Locations preferred by modern Qemu. */
 #define ACPI_PM1A_EVT_BLK_ADDRESS_V1 0xb000
@@ -101,6 +103,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
 #define ACPI_GPE0_BLK_LEN_V1         0x04
+#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
+#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
 
 /* Compatibility definitions for the default location (version 0). */
 #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-02 15:11 ` [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-04-07 16:14   ` Ian Campbell
  2014-04-08  8:25     ` Paul Durrant
  2014-04-09 13:34   ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-07 16:14 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Stefano Stabellini, Ian Jackson, Jan Beulich, xen-devel

On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> Because we may now have more than one emulator, the implementation of the
> PCI hotplug controller needs to be done by Xen. Happily the code is very
> short and simple and it also removes the need for a different ACPI DSDT
> when using different variants of QEMU.
> 
> As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
> and hotplug controller implementations to avoid the possibility of it
> raising an SCI which will never be cleared.
> 
> VMs started on an older host and then migrated in will not use the in-Xen
> controller as the AML may still point at QEMU traditional's hotplug
> controller implementation. This means hotplug ops will fail with EOPNOTSUPP
> and it is up to the caller to decide whether this is a problem or not.
> libxl will ignore EOPNOTSUPP as it is always hotplugging via QEMU so it does
> not matter whether it is Xen or QEMU providing the implementation.

I don't follow the second half of this paragraph.

If it is always hotplugging via qemu where does it see EOPNOTSUPP from?

Also, if you are obscuring those regions now how does it continue to
work?

> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> ---
>  tools/firmware/hvmloader/acpi/mk_dsdt.c |  191 +++++--------------------
>  tools/libxc/xc_domain.c                 |   24 ++++
>  tools/libxc/xenctrl.h                   |    9 ++
>  tools/libxl/libxl_pci.c                 |   15 ++
>  xen/arch/x86/hvm/Makefile               |    1 +
>  xen/arch/x86/hvm/hotplug.c              |  231 +++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c                  |   40 ++++++
>  xen/include/asm-x86/hvm/domain.h        |   11 ++
>  xen/include/asm-x86/hvm/io.h            |    8 +-
>  xen/include/public/hvm/hvm_op.h         |    9 ++
>  xen/include/public/hvm/ioreq.h          |    4 +
>  11 files changed, 387 insertions(+), 156 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/hotplug.c
> 
> @@ -116,14 +88,7 @@ int main(int argc, char **argv)
>              break;
>          }
>          case 'q':
> -            if (strcmp(optarg, "qemu-xen") == 0) {
> -                dm_version = QEMU_XEN;
> -            } else if (strcmp(optarg, "qemu-xen-traditional") == 0) {
> -                dm_version = QEMU_XEN_TRADITIONAL;
> -            } else {
> -                fprintf(stderr, "Unknown device model version `%s'.\n", optarg);
> -                return -1;
> -            }
> +            /* qemu version - no longer used */

No need to keep this sort of legacy stuff in tools used solely by the
build system. All uses of this option should be removed and it should be
an error to use it.

> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 6eacce6..0f3c56b 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1485,6 +1485,30 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_hvm_pci_hotplug(xc_interface *xch,
> +                       domid_t domid,
> +                       uint32_t slot,
> +                       int enable)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_pci_hotplug;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->slot = slot;
> +    arg->enable = enable;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;

Newlines for clarity please.

> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index cc0dab9..1eee77b 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1849,6 +1849,15 @@ int xc_hvm_set_ioreq_server_state(xc_interface *xch,
>                                    ioservid_t id,
>                                    int enabled);
>  
> +/*
> + * Hotplug controller API
> + */
> +

According to the commit message the error codes from this function have
some pretty interesting semantics which should be documented IMHO.
Either here or in xen/include/public.

> +int xc_hvm_pci_hotplug(xc_interface *xch,
> +                       domid_t domid,
> +                       uint32_t slot,
> +                       int enable);
> +
>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 44d0453..968cd5a 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>          }
>          if ( rc )
>              return ERROR_FAIL;
> +
> +        rc = xc_hvm_pci_hotplug(ctx->xch, domid, pcidev->dev, 1);

CTX->xch is somewhat preferred for new code.

> +        if (rc < 0 && errno != EOPNOTSUPP) {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug enable failed");

You can use the LOGE macro to shorten this line. 

Hrm, so I didn't see anything on the restore side which handles the
registration or not of the thing which would lead to EOPNOTSUPP vs
success on a guest started on an older Xen. How does all that actually
hang together?

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-07 16:14   ` Ian Campbell
@ 2014-04-08  8:25     ` Paul Durrant
  2014-04-08  8:45       ` Ian Campbell
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:25 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 07 April 2014 17:14
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Wed, 2014-04-02 at 16:11 +0100, Paul Durrant wrote:
> > Because we may now have more than one emulator, the implementation
> of the
> > PCI hotplug controller needs to be done by Xen. Happily the code is very
> > short and simple and it also removes the need for a different ACPI DSDT
> > when using different variants of QEMU.
> >
> > As a precaution, we obscure the IO ranges used by QEMU traditional's gpe
> > and hotplug controller implementations to avoid the possibility of it
> > raising an SCI which will never be cleared.
> >
> > VMs started on an older host and then migrated in will not use the in-Xen
> > controller as the AML may still point at QEMU traditional's hotplug
> > controller implementation. This means hotplug ops will fail with
> EOPNOTSUPP
> > and it is up to the caller to decide whether this is a problem or not.
> > libxl will ignore EOPNOTSUPP as it is always hotplugging via QEMU so it does
> > not matter whether it is Xen or QEMU providing the implementation.
> 
> I don't follow the second half of this paragraph.
> 
> If it is always hotplugging via qemu where does it see EOPNOTSUPP from?
> 

Sorry, I should have said hotplug ops via libxc.

> Also, if you are obscuring those regions now how does it continue to
> work?
> 

We only obscure the old regions if we create the in-xen hotplug controller. This doesn't happen for migrated-in guests.

> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > ---
> >  tools/firmware/hvmloader/acpi/mk_dsdt.c |  191 +++++--------------------
> >  tools/libxc/xc_domain.c                 |   24 ++++
> >  tools/libxc/xenctrl.h                   |    9 ++
> >  tools/libxl/libxl_pci.c                 |   15 ++
> >  xen/arch/x86/hvm/Makefile               |    1 +
> >  xen/arch/x86/hvm/hotplug.c              |  231
> +++++++++++++++++++++++++++++++
> >  xen/arch/x86/hvm/hvm.c                  |   40 ++++++
> >  xen/include/asm-x86/hvm/domain.h        |   11 ++
> >  xen/include/asm-x86/hvm/io.h            |    8 +-
> >  xen/include/public/hvm/hvm_op.h         |    9 ++
> >  xen/include/public/hvm/ioreq.h          |    4 +
> >  11 files changed, 387 insertions(+), 156 deletions(-)
> >  create mode 100644 xen/arch/x86/hvm/hotplug.c
> >
> > @@ -116,14 +88,7 @@ int main(int argc, char **argv)
> >              break;
> >          }
> >          case 'q':
> > -            if (strcmp(optarg, "qemu-xen") == 0) {
> > -                dm_version = QEMU_XEN;
> > -            } else if (strcmp(optarg, "qemu-xen-traditional") == 0) {
> > -                dm_version = QEMU_XEN_TRADITIONAL;
> > -            } else {
> > -                fprintf(stderr, "Unknown device model version `%s'.\n", optarg);
> > -                return -1;
> > -            }
> > +            /* qemu version - no longer used */
> 
> No need to keep this sort of legacy stuff in tools used solely by the
> build system. All uses of this option should be removed and it should be
> an error to use it.
> 

Ok.

> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 6eacce6..0f3c56b 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -1485,6 +1485,30 @@ int
> xc_hvm_set_ioreq_server_state(xc_interface *xch,
> >      return rc;
> >  }
> >
> > +int xc_hvm_pci_hotplug(xc_interface *xch,
> > +                       domid_t domid,
> > +                       uint32_t slot,
> > +                       int enable)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_pci_hotplug;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->slot = slot;
> > +    arg->enable = enable;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> 
> Newlines for clarity please.

Ok.

> 
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index cc0dab9..1eee77b 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1849,6 +1849,15 @@ int
> xc_hvm_set_ioreq_server_state(xc_interface *xch,
> >                                    ioservid_t id,
> >                                    int enabled);
> >
> > +/*
> > + * Hotplug controller API
> > + */
> > +
> 
> According to the commit message the error codes from this function have
> some pretty interesting semantics which should be documented IMHO.
> Either here or in xen/include/public.
> 

Well, the fact that the hotplug controller is not necessarily created (and thus functions may return with errno==EOPNOTSUPP) is worth mentioning, so I'll add a comment here.

> > +int xc_hvm_pci_hotplug(xc_interface *xch,
> > +                       domid_t domid,
> > +                       uint32_t slot,
> > +                       int enable);
> > +
> >  /* HVM guest pass-through */
> >  int xc_assign_device(xc_interface *xch,
> >                       uint32_t domid,
> > diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> > index 44d0453..968cd5a 100644
> > --- a/tools/libxl/libxl_pci.c
> > +++ b/tools/libxl/libxl_pci.c
> > @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t
> domid, libxl_device_pci *pcidev, i
> >          }
> >          if ( rc )
> >              return ERROR_FAIL;
> > +
> > +        rc = xc_hvm_pci_hotplug(ctx->xch, domid, pcidev->dev, 1);
> 
> CTX->xch is somewhat preferred for new code.
> 

Ok.

> > +        if (rc < 0 && errno != EOPNOTSUPP) {
> > +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error:
> xc_hvm_pci_hotplug enable failed");
> 
> You can use the LOGE macro to shorten this line.
> 

Ok.

> Hrm, so I didn't see anything on the restore side which handles the
> registration or not of the thing which would lead to EOPNOTSUPP vs
> success on a guest started on an older Xen. How does all that actually
> hang together?
> 

If the guest was started on an older xen then HVM_PARAM_NR_IOREQ_SERVER_PAGES would not have been set, so when xc_domain_restore runs it would find no corresponding save record. Thus that param would not be set on restore and hence the controller would not be created. I could add another hvm op to make hotplug controller creation explicit if you like, but it seemed like rather a lot of extra code.

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-08  8:25     ` Paul Durrant
@ 2014-04-08  8:45       ` Ian Campbell
  2014-04-08  8:49         ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-08  8:45 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

On Tue, 2014-04-08 at 09:25 +0100, Paul Durrant wrote:

> > Also, if you are obscuring those regions now how does it continue to
> > work?
> > 
> 
> We only obscure the old regions if we create the in-xen hotplug
> controller. This doesn't happen for migrated-in guests.

I presume it does happen for migrated in guests from new Xen with this
support?

> > Hrm, so I didn't see anything on the restore side which handles the
> > registration or not of the thing which would lead to EOPNOTSUPP vs
> > success on a guest started on an older Xen. How does all that actually
> > hang together?
> > 
> 
> If the guest was started on an older xen then
> HVM_PARAM_NR_IOREQ_SERVER_PAGES would not have been set, so when
> xc_domain_restore runs it would find no corresponding save record.
> Thus that param would not be set on restore and hence the controller
> would not be created. I could add another hvm op to make hotplug
> controller creation explicit if you like, but it seemed like rather a
> lot of extra code.

I think what wasn't obvious was that the use of the ioreq server
interfaces also implicitly causes the hotplug controller to be enabled
within Xen instead of elsewhere. I can see the chain of events which
leads to this now that it has been pointed out, and with that I can see
where the commit message implies it, but I think it could do with being
made explicit.

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-08  8:45       ` Ian Campbell
@ 2014-04-08  8:49         ` Paul Durrant
  2014-04-08  8:57           ` Ian Campbell
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  8:49 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 08 April 2014 09:45
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Tue, 2014-04-08 at 09:25 +0100, Paul Durrant wrote:
> 
> > > Also, if you are obscuring those regions now how does it continue to
> > > work?
> > >
> >
> > We only obscure the old regions if we create the in-xen hotplug
> > controller. This doesn't happen for migrated-in guests.
> 
> I presume it does happen for migrated in guests from new Xen with this
> support?

Sorry, yes. I meant migrated-in guests from old xen; I'll fix the text.

> 
> > > Hrm, so I didn't see anything on the restore side which handles the
> > > registration or not of the thing which would lead to EOPNOTSUPP vs
> > > success on a guest started on an older Xen. How does all that actually
> > > hang together?
> > >
> >
> > If the guest was started on an older xen then
> > HVM_PARAM_NR_IOREQ_SERVER_PAGES would not have been set, so
> when
> > xc_domain_restore runs it would find no corresponding save record.
> > Thus that param would not be set on restore and hence the controller
> > would not be created. I could add another hvm op to make hotplug
> > controller creation explicit if you like, but it seemed like rather a
> > lot of extra code.
> 
> I think what wasn't obvious was that the use of the ioreq server
> interfaces also implicitly causes the hotplug controller to be enabled
> within Xen instead of elsewhere. I can see the chain of events which
> leads to this now that it has been pointed out, and with that I can see
> where the commit message implies it, but I think it could do with being
> made explicit.
> 

Ok. I can shortcut some code-churn by using an HVM param (e.g. setting it to non-zero creates the controller) or would you prefer a new HVMOP?

  Paul

> Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-08  8:49         ` Paul Durrant
@ 2014-04-08  8:57           ` Ian Campbell
  2014-04-08  9:00             ` Paul Durrant
  0 siblings, 1 reply; 62+ messages in thread
From: Ian Campbell @ 2014-04-08  8:57 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

On Tue, 2014-04-08 at 09:49 +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 08 April 2014 09:45
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> > implementation into Xen
> > 
> > On Tue, 2014-04-08 at 09:25 +0100, Paul Durrant wrote:
> > 
> > > > Also, if you are obscuring those regions now how does it continue to
> > > > work?
> > > >
> > >
> > > We only obscure the old regions if we create the in-xen hotplug
> > > controller. This doesn't happen for migrated-in guests.
> > 
> > I presume it does happen for migrated in guests from new Xen with this
> > support?
> 
> Sorry, yes. I meant migrated-in guests from old xen; I'll fix the text.
> 
> > 
> > > > Hrm, so I didn't see anything on the restore side which handles the
> > > > registration or not of the thing which would lead to EOPNOTSUPP vs
> > > > success on a guest started on an older Xen. How does all that actually
> > > > hang together?
> > > >
> > >
> > > If the guest was started on an older xen then
> > > HVM_PARAM_NR_IOREQ_SERVER_PAGES would not have been set, so
> > when
> > > xc_domain_restore runs it would find no corresponding save record.
> > > Thus that param would not be set on restore and hence the controller
> > > would not be created. I could add another hvm op to make hotplug
> > > controller creation explicit if you like, but it seemed like rather a
> > > lot of extra code.
> > 
> > I think what wasn't obvious was that the use of the ioreq server
> > interfaces also implicitly causes the hotplug controller to be enabled
> > within Xen instead of elsewhere. I can see the chain of events which
> > leads to this now that it has been pointed out, and with that I can see
> > where the commit message implies it, but I think it could do with being
> > made explicit.
> > 
> 
> Ok. I can shortcut some code-churn by using an HVM param (e.g. setting
> it to non-zero creates the controller) or would you prefer a new
> HVMOP?

Sorry, I meant explicit in the docs/comments etc, I don't think an
explicit HVPOP is necessarily worth it (although the hypervisor side
people may disagree and I wouldn't object to adding it).

It's possible that just making the support for migration from older
Xen's more explicit in the restore code as we were discussing on another
and having a suitable comment at that point will be sufficient.

e.g. 

	/* If we are migrating from blah then register the blah blah,
	 * this will also enable the in-Xen hotplug controller. etc etc
	 * If we are migrating from an older Xen then this chunk won't
	 * be present and the hotplug controller is provided by qemu 
	 * (sometimes?) and the ioreq pfns are ...  */
	if (we got the new chunk)
	{
		xc_hvm_whatever(...)
	}

(fill in the details ;-))

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-08  8:57           ` Ian Campbell
@ 2014-04-08  9:00             ` Paul Durrant
  0 siblings, 0 replies; 62+ messages in thread
From: Paul Durrant @ 2014-04-08  9:00 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ian Jackson, Stefano Stabellini, Jan Beulich,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Ian Campbell
> Sent: 08 April 2014 09:57
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> On Tue, 2014-04-08 at 09:49 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 08 April 2014 09:45
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Jan Beulich
> > > Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> > > implementation into Xen
> > >
> > > On Tue, 2014-04-08 at 09:25 +0100, Paul Durrant wrote:
> > >
> > > > > Also, if you are obscuring those regions now how does it continue to
> > > > > work?
> > > > >
> > > >
> > > > We only obscure the old regions if we create the in-xen hotplug
> > > > controller. This doesn't happen for migrated-in guests.
> > >
> > > I presume it does happen for migrated in guests from new Xen with this
> > > support?
> >
> > Sorry, yes. I meant migrated-in guests from old xen; I'll fix the text.
> >
> > >
> > > > > Hrm, so I didn't see anything on the restore side which handles the
> > > > > registration or not of the thing which would lead to EOPNOTSUPP vs
> > > > > success on a guest started on an older Xen. How does all that actually
> > > > > hang together?
> > > > >
> > > >
> > > > If the guest was started on an older xen then
> > > > HVM_PARAM_NR_IOREQ_SERVER_PAGES would not have been set,
> so
> > > when
> > > > xc_domain_restore runs it would find no corresponding save record.
> > > > Thus that param would not be set on restore and hence the controller
> > > > would not be created. I could add another hvm op to make hotplug
> > > > controller creation explicit if you like, but it seemed like rather a
> > > > lot of extra code.
> > >
> > > I think what wasn't obvious was that the use of the ioreq server
> > > interfaces also implicitly causes the hotplug controller to be enabled
> > > within Xen instead of elsewhere. I can see the chain of events which
> > > leads to this now that it has been pointed out, and with that I can see
> > > where the commit message implies it, but I think it could do with being
> > > made explicit.
> > >
> >
> > Ok. I can shortcut some code-churn by using an HVM param (e.g. setting
> > it to non-zero creates the controller) or would you prefer a new
> > HVMOP?
> 
> Sorry, I meant explicit in the docs/comments etc, I don't think an
> explicit HVPOP is necessarily worth it (although the hypervisor side
> people may disagree and I wouldn't object to adding it).
> 
> It's possible that just making the support for migration from older
> Xen's more explicit in the restore code as we were discussing on another
> and having a suitable comment at that point will be sufficient.
> 

Ok. I'll do that then :-)

  Paul

> e.g.
> 
> 	/* If we are migrating from blah then register the blah blah,
> 	 * this will also enable the in-Xen hotplug controller. etc etc
> 	 * If we are migrating from an older Xen then this chunk won't
> 	 * be present and the hotplug controller is provided by qemu
> 	 * (sometimes?) and the ioreq pfns are ...  */
> 	if (we got the new chunk)
> 	{
> 		xc_hvm_whatever(...)
> 	}
> 
> (fill in the details ;-))
> 
> Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-02 15:11 ` [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
  2014-04-07 16:14   ` Ian Campbell
@ 2014-04-09 13:34   ` Jan Beulich
  2014-04-09 13:42     ` Paul Durrant
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 13:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

>>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> +static int handle_gpe_io(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *v = current;
> +    struct domain *d = v->domain;
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    if ( bytes != 1 )

Is this really a valid restriction?

> +int gpe_init(struct domain *d)
> +{
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    hp->gpe_sts = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> +    if ( hp->gpe_sts == NULL )
> +        goto fail1;
> +
> +    hp->gpe_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> +    if ( hp->gpe_en == NULL )
> +        goto fail2;

I'd like to ask (also elsewhere in this series) to try to limit the number
of "goto"s to the absolute minimum required to help code readability.
There's no need for them here: Allocate both blocks, then check both
pointers, and if either is NULL free them both in a single error path.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 13:34   ` Jan Beulich
@ 2014-04-09 13:42     ` Paul Durrant
  2014-04-09 13:53       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-09 13:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 April 2014 14:34
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> > +static int handle_gpe_io(
> > +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> > +{
> > +    struct vcpu *v = current;
> > +    struct domain *d = v->domain;
> > +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> > +
> > +    if ( bytes != 1 )
> 
> Is this really a valid restriction?
> 

Hmm, I believe so but I guess it would be more pragmatic to handle word and double-word access too.

> > +int gpe_init(struct domain *d)
> > +{
> > +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> > +
> > +    hp->gpe_sts = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> > +    if ( hp->gpe_sts == NULL )
> > +        goto fail1;
> > +
> > +    hp->gpe_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> > +    if ( hp->gpe_en == NULL )
> > +        goto fail2;
> 
> I'd like to ask (also elsewhere in this series) to try to limit the number
> of "goto"s to the absolute minimum required to help code readability.

Personally I find using forward jumps to fail labels the most readable form of error exit - I wish they were used more widely.

> There's no need for them here: Allocate both blocks, then check both
> pointers, and if either is NULL free them both in a single error path.
> 

I'll alloc a single array and carve it in half.

  Paul

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 13:42     ` Paul Durrant
@ 2014-04-09 13:53       ` Jan Beulich
  2014-04-09 14:25         ` Paul Durrant
  2014-04-09 14:59         ` Ian Jackson
  0 siblings, 2 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 13:53 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

>>> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
>> > +int gpe_init(struct domain *d)
>> > +{
>> > +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
>> > +
>> > +    hp->gpe_sts = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
>> > +    if ( hp->gpe_sts == NULL )
>> > +        goto fail1;
>> > +
>> > +    hp->gpe_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
>> > +    if ( hp->gpe_en == NULL )
>> > +        goto fail2;
>> 
>> I'd like to ask (also elsewhere in this series) to try to limit the number
>> of "goto"s to the absolute minimum required to help code readability.
> 
> Personally I find using forward jumps to fail labels the most readable form 
> of error exit - I wish they were used more widely.

Interesting - almost everyone/-thing involved in educating me in
programming skills recommended to try to get away without goto
altogether, unless programming Fortran, Basic or some such.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 13:53       ` Jan Beulich
@ 2014-04-09 14:25         ` Paul Durrant
  2014-04-09 14:47           ` Jan Beulich
  2014-04-09 14:59         ` Ian Jackson
  1 sibling, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2014-04-09 14:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 09 April 2014 14:54
> To: Paul Durrant
> Cc: Ian Campbell; Ian Jackson; Stefano Stabellini; xen-devel@lists.xen.org
> Subject: RE: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller
> implementation into Xen
> 
> >>> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 02.04.14 at 17:11, <paul.durrant@citrix.com> wrote:
> >> > +int gpe_init(struct domain *d)
> >> > +{
> >> > +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> >> > +
> >> > +    hp->gpe_sts = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> >> > +    if ( hp->gpe_sts == NULL )
> >> > +        goto fail1;
> >> > +
> >> > +    hp->gpe_en = xzalloc_array(uint8_t, ACPI_GPE0_BLK_LEN_V1 / 2);
> >> > +    if ( hp->gpe_en == NULL )
> >> > +        goto fail2;
> >>
> >> I'd like to ask (also elsewhere in this series) to try to limit the number
> >> of "goto"s to the absolute minimum required to help code readability.
> >
> > Personally I find using forward jumps to fail labels the most readable form
> > of error exit - I wish they were used more widely.
> 
> Interesting - almost everyone/-thing involved in educating me in
> programming skills recommended to try to get away without goto
> altogether, unless programming Fortran, Basic or some such.
> 

In my case it comes from programming for SPARC where forward branches were (are?) statically predicted untaken, so much better to code your error path as a single forward branch. Don't have to worry about such things for x86, but I'm used to reading code that looks like that :-/

  Paul

> Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 14:25         ` Paul Durrant
@ 2014-04-09 14:47           ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 14:47 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Jackson, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

>>> On 09.04.14 at 16:25, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
>> > Personally I find using forward jumps to fail labels the most readable form
>> > of error exit - I wish they were used more widely.
>> 
>> Interesting - almost everyone/-thing involved in educating me in
>> programming skills recommended to try to get away without goto
>> altogether, unless programming Fortran, Basic or some such.
>> 
> 
> In my case it comes from programming for SPARC where forward branches were 
> (are?) statically predicted untaken, so much better to code your error path 
> as a single forward branch. Don't have to worry about such things for x86, 
> but I'm used to reading code that looks like that :-/

The static prediction rules are similar on x86, but there's no guarantee
that what is a forward branch at the source level would end up being
one in the translated code. That's what we've got likely()/unlikely() for.

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 13:53       ` Jan Beulich
  2014-04-09 14:25         ` Paul Durrant
@ 2014-04-09 14:59         ` Ian Jackson
  2014-04-09 15:06           ` Jan Beulich
  2014-04-10 16:04           ` George Dunlap
  1 sibling, 2 replies; 62+ messages in thread
From: Ian Jackson @ 2014-04-09 14:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Paul Durrant, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

Jan Beulich writes ("RE: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen"):
> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
> > Personally I find using forward jumps to fail labels the most
> > readable form of error exit - I wish they were used more widely.
> 
> Interesting - almost everyone/-thing involved in educating me in
> programming skills recommended to try to get away without goto
> altogether, unless programming Fortran, Basic or some such.

I think the hatred of goto is a "lie to children".  It is easy to
misuse goto and make spaghetti.

My view is that use of goto should normally be restricted to certain
very specific patterns.  Examples:
  * Providing a uniform cleanup path (eg error exit) from a function
  * "goto continue_foobar" or "goto break_foobar" for emulating
    "continue" or "break" on an outer loop from within an inner loop
  * Usages embedded in structural macros

Certainly goto should not be used if another control construct can do
the job (without excessive circumlocution or repetition).  In libxl we
have a few "goto retry_transaction" which I think should be abolished.
IME goto should not be used to construct loops.

But goto _should_ be used to avoid repetition.  The exit path pattern
is a particularly good example.  In functions which use this pattern:
  - all variables which refer to resources are initialised to
    "unallocated" (0, -1, whatever) at declaration
  - there is only one use of "return"
  - the return is preceded by a single copy of the cleanup for
    all the allocated resources
The result is that it is difficult to accidentally leak or
double-free resources.  Failure to initialise one of the resource
variables to "empty" is normally detected by the compiler.

In this pattern the label should have a conventional name.  In libxl
we use "out".

Ian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 14:59         ` Ian Jackson
@ 2014-04-09 15:06           ` Jan Beulich
  2014-04-10 16:04           ` George Dunlap
  1 sibling, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2014-04-09 15:06 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Paul Durrant, Stefano Stabellini, Ian Campbell,
	xen-devel@lists.xen.org

>>> On 09.04.14 at 16:59, <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("RE: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug 
> controller implementation into Xen"):
>> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
>> > Personally I find using forward jumps to fail labels the most
>> > readable form of error exit - I wish they were used more widely.
>> 
>> Interesting - almost everyone/-thing involved in educating me in
>> programming skills recommended to try to get away without goto
>> altogether, unless programming Fortran, Basic or some such.
> 
> I think the hatred of goto is a "lie to children".  It is easy to
> misuse goto and make spaghetti.

Right, and when I see goto I unconditionally see spaghetti (which
in many cases I'm right with, but I learned to accept cases where
otherwise e.g. deep indentation would make code hard to read/
follow).

Jan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-04-09 14:59         ` Ian Jackson
  2014-04-09 15:06           ` Jan Beulich
@ 2014-04-10 16:04           ` George Dunlap
  1 sibling, 0 replies; 62+ messages in thread
From: George Dunlap @ 2014-04-10 16:04 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Paul Durrant, Stefano Stabellini, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

On Wed, Apr 9, 2014 at 3:59 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("RE: [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen"):
>> On 09.04.14 at 15:42, <Paul.Durrant@citrix.com> wrote:
>> > Personally I find using forward jumps to fail labels the most
>> > readable form of error exit - I wish they were used more widely.
>>
>> Interesting - almost everyone/-thing involved in educating me in
>> programming skills recommended to try to get away without goto
>> altogether, unless programming Fortran, Basic or some such.
>
> I think the hatred of goto is a "lie to children".  It is easy to
> misuse goto and make spaghetti.

I read the original "Goto Considered Harmful" paper many years ago,
and the vast majority of the objections are handled in a modern
language like C.

break, contine, and switch statement are all just very specific
versions of "goto".  But it's not possible (nor desirable) for a
language to implement all uses of "goto"; Ian mentions some other uses
below.  In this case, cleaning up partial acquisition of resources
(e.g., a malloc failure after several malloc successes) is a very
common and well understood idiom in OS design.

 -George

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2014-04-10 16:04 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-02 15:11 [PATCH v4 0/8] Support for running secondary emulators Paul Durrant
2014-04-02 15:11 ` [PATCH v4 1/8] ioreq-server: pre-series tidy up Paul Durrant
2014-04-07 10:48   ` Jan Beulich
2014-04-08  9:13     ` Paul Durrant
2014-04-02 15:11 ` [PATCH v4 2/8] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-04-03 11:22   ` George Dunlap
2014-04-07 11:10   ` Jan Beulich
2014-04-08  9:18     ` Paul Durrant
2014-04-02 15:11 ` [PATCH v4 3/8] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-04-03 14:49   ` George Dunlap
2014-04-03 15:43     ` Paul Durrant
2014-04-03 15:48       ` George Dunlap
2014-04-03 15:54         ` Paul Durrant
2014-04-07 11:36   ` Jan Beulich
2014-04-08  9:32     ` Paul Durrant
2014-04-08  9:47       ` Jan Beulich
2014-04-08 10:06         ` Paul Durrant
2014-04-02 15:11 ` [PATCH v4 4/8] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-04-07 11:50   ` Jan Beulich
2014-04-08  9:35     ` Paul Durrant
2014-04-08  9:51       ` Jan Beulich
2014-04-08 10:11         ` Paul Durrant
2014-04-02 15:11 ` [PATCH v4 5/8] ioreq-server: add support for multiple servers Paul Durrant
2014-04-03 15:32   ` George Dunlap
2014-04-03 15:39     ` Paul Durrant
2014-04-03 15:43       ` George Dunlap
2014-04-03 15:46         ` Paul Durrant
2014-04-07 15:57   ` Ian Campbell
2014-04-08  8:32     ` Paul Durrant
2014-04-08  8:40       ` Ian Campbell
2014-04-08  8:45         ` Paul Durrant
2014-04-09 12:43   ` Jan Beulich
2014-04-09 12:49     ` Ian Campbell
2014-04-09 13:15       ` Jan Beulich
2014-04-09 13:32     ` Paul Durrant
2014-04-09 13:46       ` Jan Beulich
2014-04-09 13:51         ` Paul Durrant
2014-04-09 14:42         ` Ian Campbell
2014-04-02 15:11 ` [PATCH v4 6/8] ioreq-server: remove p2m entries when server is enabled Paul Durrant
2014-04-07 16:00   ` Ian Campbell
2014-04-08  8:33     ` Paul Durrant
2014-04-09 12:20   ` Jan Beulich
2014-04-09 13:36     ` Paul Durrant
2014-04-09 13:50       ` Jan Beulich
2014-04-02 15:11 ` [PATCH v4 7/8] ioreq-server: make buffered ioreq handling optional Paul Durrant
2014-04-07 16:06   ` Ian Campbell
2014-04-08  8:35     ` Paul Durrant
2014-04-02 15:11 ` [PATCH v4 8/8] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
2014-04-07 16:14   ` Ian Campbell
2014-04-08  8:25     ` Paul Durrant
2014-04-08  8:45       ` Ian Campbell
2014-04-08  8:49         ` Paul Durrant
2014-04-08  8:57           ` Ian Campbell
2014-04-08  9:00             ` Paul Durrant
2014-04-09 13:34   ` Jan Beulich
2014-04-09 13:42     ` Paul Durrant
2014-04-09 13:53       ` Jan Beulich
2014-04-09 14:25         ` Paul Durrant
2014-04-09 14:47           ` Jan Beulich
2014-04-09 14:59         ` Ian Jackson
2014-04-09 15:06           ` Jan Beulich
2014-04-10 16:04           ` George Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).