[RFC PATCH 1/5] Support for running secondary emulators

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 1/5] Support for running secondary emulators
@ 2014-01-30 14:19 Paul Durrant
  2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel

This patch series adds the ioreq server interface which I mentioned in
my talk at the Xen developer summit in Edinburgh at the end of last year.
The code is based on work originally done by Julien Grall but has been
re-written to allow existing versions of QEMU to work unmodified.

The code is available in my xen.git [1] repo on xenbits, under the 'savannah'
branch, and I have also written a demo emulator to test the code, which can
be found in my demu.git [2] repo.

The modifications are broken down as follows:

Patch #1 basically just moves some code around to make subsequent patches
more obvious. The patch also removes the has_dm flag in hvememul_do_io() as
it is no longer necessary to special-case PVH domains in this way. (The I/O
can be completed by hvm_send_assist_req() later, which it is discovered there
is no shared ioreq page).

Patch #2 again is largely code movement, from various places into a new
hvm_ioreq_server structure. There should be no functional change at this
stage as the ioreq server is still created at domain initialisation time (as
were its contents prior to this patch).

Patch #3 is the first functional change. The ioreq server struct
initialisation is now deferred until something actually tries to play with
the HVM parameters which reference it. In practice this is QEMU, which
needs to read the ioreq pfns so it can map them.

Patch #4 is the big one. This moves from a single ioreq server per domain
to a list. The server that is created when the HVM parameters are reference
is given id 0 and is considered to be the 'catch all' server which is, after
all, how QEMU is used. Any secondary emulator, created using the new API
in xenctrl.h, will have id 1 or above and only gets ioreqs when I/O hits one
of its registered IO ranges or PCI devices.

Patch #5 pulls the PCI hotplug controller emulation into Xen. This is
necessary to allow a secondary emulator to hotplug a PCI device into the VM.
The code implements the controller in the same way as upstream QEMU and thus
the variant of the DSDT ASL used for upstream QEMU is retained.

There are no modifications to libxl to actually invoke a secondary emulator
at this stage. The only changes made are simply to increase the number of
special pages reserved for a VM to allow the use of more than one emulator
and call the new PCI hotplug API when attaching or detaching PCI devices.
The demo emulator can simply be invoked from a shell and will hotplug its
device onto the PCI bus (and remove it again when it's killed). The emulated
device is not an awful lot of use at this stage - it appears as a SCSI
controller with one IO BAR and one MEM BAR and has no intrinsic
functionality... but then it is only supposed to be demo :-)

  Paul

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
[2] http://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
@ 2014-01-30 14:19 ` Paul Durrant
  2014-01-30 14:32   ` Andrew Cooper
  2014-02-07  4:53   ` Matt Wilson
  2014-01-30 14:19 ` [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction Paul Durrant
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

To simplify creation of the ioreq server abstraction in a
subsequent patch, this patch centralizes all use of the shared
ioreq structure and the buffered ioreq ring to the source module
xen/arch/x86/hvm/hvm.c.
Also, re-work hvm_send_assist_req() slightly to complete IO
immediately in the case where there is no emulator (i.e. the shared
IOREQ ring has not been set). This should handle the case currently
covered by has_dm in hvmemul_do_io().

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |   40 +++------------
 xen/arch/x86/hvm/hvm.c            |   98 ++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/io.c             |   94 +----------------------------------
 xen/include/asm-x86/hvm/hvm.h     |    3 +-
 xen/include/asm-x86/hvm/support.h |    9 ----
 5 files changed, 108 insertions(+), 136 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 868aa1d..d1d3a6f 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -57,24 +57,11 @@ static int hvmemul_do_io(
     int value_is_ptr = (p_data == NULL);
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
-    ioreq_t *p = get_ioreq(curr);
-    ioreq_t _ioreq;
+    ioreq_t p[1];
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
-    bool_t has_dm = 1;
-
-    /*
-     * Domains without a backing DM, don't have an ioreq page.  Just
-     * point to a struct on the stack, initialising the state as needed.
-     */
-    if ( !p )
-    {
-        has_dm = 0;
-        p = &_ioreq;
-        p->state = STATE_IOREQ_NONE;
-    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -173,15 +160,6 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
-                 p->state);
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
     vio->io_state =
         (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
@@ -193,6 +171,7 @@ static int hvmemul_do_io(
     if ( vio->mmio_retrying )
         *reps = 1;
 
+    p->state = STATE_IOREQ_NONE;
     p->dir = dir;
     p->data_is_ptr = value_is_ptr;
     p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
@@ -232,20 +211,15 @@ static int hvmemul_do_io(
             vio->io_state = HVMIO_handle_mmio_awaiting_completion;
         break;
     case X86EMUL_UNHANDLEABLE:
-        /* If there is no backing DM, just ignore accesses */
-        if ( !has_dm )
+        rc = X86EMUL_RETRY;
+        if ( !hvm_send_assist_req(curr, p) )
         {
             rc = X86EMUL_OKAY;
             vio->io_state = HVMIO_none;
         }
-        else
-        {
-            rc = X86EMUL_RETRY;
-            if ( !hvm_send_assist_req(curr) )
-                vio->io_state = HVMIO_none;
-            else if ( p_data == NULL )
-                rc = X86EMUL_OKAY;
-        }
+        else if ( p_data == NULL )
+            rc = X86EMUL_OKAY;
+
         break;
     default:
         BUG();
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 69f7e74..71a44db 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,6 +345,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
+static ioreq_t *get_ioreq(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
+    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+}
+
 void hvm_do_resume(struct vcpu *v)
 {
     ioreq_t *p;
@@ -1287,7 +1295,86 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v)
+int hvm_buffered_io_send(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
+    buffered_iopage_t *pg = iorp->va;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir  = p->dir;
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return 0;
+    }
+    
+    bp.data = p->data;
+    bp.addr = p->addr;
+    
+    spin_lock(&iorp->lock);
+
+    if ( (pg->write_pointer - pg->read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&iorp->lock);
+        return 0;
+    }
+    
+    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
+           &bp, sizeof(bp));
+    
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
+               &bp, sizeof(bp));
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    wmb();
+    pg->write_pointer += qw ? 2 : 1;
+
+    notify_via_xen_event_channel(v->domain,
+            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    spin_unlock(&iorp->lock);
+    
+    return 1;
+}
+
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
 {
     ioreq_t *p;
 
@@ -1305,6 +1392,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
         return 0;
     }
 
+    p->dir = proto_p->dir;
+    p->data_is_ptr = proto_p->data_is_ptr;
+    p->type = proto_p->type;
+    p->size = proto_p->size;
+    p->addr = proto_p->addr;
+    p->count = proto_p->count;
+    p->df = proto_p->df;
+    p->data = proto_p->data;
+
     prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
 
     /*
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf6309d..576641c 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,85 +46,6 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
-{
-    struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
-    buf_ioreq_t bp;
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    bp.type = p->type;
-    bp.dir  = p->dir;
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return 0;
-    }
-    
-    bp.data = p->data;
-    bp.addr = p->addr;
-    
-    spin_lock(&iorp->lock);
-
-    if ( (pg->write_pointer - pg->read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&iorp->lock);
-        return 0;
-    }
-    
-    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
-           &bp, sizeof(bp));
-    
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
-               &bp, sizeof(bp));
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    wmb();
-    pg->write_pointer += qw ? 2 : 1;
-
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
-    spin_unlock(&iorp->lock);
-    
-    return 1;
-}
-
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p[1];
@@ -150,25 +71,14 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
-
-    if ( !p )
-        return;
-
-    if ( p->state != STATE_IOREQ_NONE )
-    {
-        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
-                 "already pending (%d)?\n", p->state);
-        domain_crash(v->domain);
-        return;
-    }
+    ioreq_t p[1];
 
     p->type = IOREQ_TYPE_INVALIDATE;
     p->size = 4;
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v);
+    (void)hvm_send_assist_req(v, p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index ccca5df..4e8fee8 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -26,6 +26,7 @@
 #include <asm/hvm/asid.h>
 #include <public/domctl.h>
 #include <public/hvm/save.h>
+#include <public/hvm/ioreq.h>
 #include <asm/mm.h>
 
 /* Interrupt acknowledgement sources. */
@@ -223,7 +224,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
                             struct page_info **_page, void **_va);
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
-bool_t hvm_send_assist_req(struct vcpu *v);
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 3529499..b6af3c5 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -22,19 +22,10 @@
 #define __ASM_X86_HVM_SUPPORT_H__
 
 #include <xen/types.h>
-#include <public/hvm/ioreq.h>
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-static inline ioreq_t *get_ioreq(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
-}
-
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures
  2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-01-30 14:32   ` Andrew Cooper
  2014-01-30 14:35     ` Paul Durrant
  2014-02-07  4:53   ` Matt Wilson
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-01-30 14:32 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 30/01/14 14:19, Paul Durrant wrote:
> To simplify creation of the ioreq server abstraction in a
> subsequent patch, this patch centralizes all use of the shared
> ioreq structure and the buffered ioreq ring to the source module
> xen/arch/x86/hvm/hvm.c.
> Also, re-work hvm_send_assist_req() slightly to complete IO
> immediately in the case where there is no emulator (i.e. the shared
> IOREQ ring has not been set). This should handle the case currently
> covered by has_dm in hvmemul_do_io().
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  xen/arch/x86/hvm/emulate.c        |   40 +++------------
>  xen/arch/x86/hvm/hvm.c            |   98 ++++++++++++++++++++++++++++++++++++-
>  xen/arch/x86/hvm/io.c             |   94 +----------------------------------
>  xen/include/asm-x86/hvm/hvm.h     |    3 +-
>  xen/include/asm-x86/hvm/support.h |    9 ----
>  5 files changed, 108 insertions(+), 136 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 868aa1d..d1d3a6f 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -57,24 +57,11 @@ static int hvmemul_do_io(
>      int value_is_ptr = (p_data == NULL);
>      struct vcpu *curr = current;
>      struct hvm_vcpu_io *vio;
> -    ioreq_t *p = get_ioreq(curr);
> -    ioreq_t _ioreq;
> +    ioreq_t p[1];

I know it will make the patch sightly larger by modifying the
indirection of p, but having an array of 1 item on the stack is seems silly.

>      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
>      p2m_type_t p2mt;
>      struct page_info *ram_page;
>      int rc;
> -    bool_t has_dm = 1;
> -
> -    /*
> -     * Domains without a backing DM, don't have an ioreq page.  Just
> -     * point to a struct on the stack, initialising the state as needed.
> -     */
> -    if ( !p )
> -    {
> -        has_dm = 0;
> -        p = &_ioreq;
> -        p->state = STATE_IOREQ_NONE;
> -    }
>  
>      /* Check for paged out page */
>      ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> @@ -173,15 +160,6 @@ static int hvmemul_do_io(
>          return X86EMUL_UNHANDLEABLE;
>      }
>  
> -    if ( p->state != STATE_IOREQ_NONE )
> -    {
> -        gdprintk(XENLOG_WARNING, "WARNING: io already pending (%d)?\n",
> -                 p->state);
> -        if ( ram_page )
> -            put_page(ram_page);
> -        return X86EMUL_UNHANDLEABLE;
> -    }
> -
>      vio->io_state =
>          (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
>      vio->io_size = size;
> @@ -193,6 +171,7 @@ static int hvmemul_do_io(
>      if ( vio->mmio_retrying )
>          *reps = 1;
>  
> +    p->state = STATE_IOREQ_NONE;
>      p->dir = dir;
>      p->data_is_ptr = value_is_ptr;
>      p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> @@ -232,20 +211,15 @@ static int hvmemul_do_io(
>              vio->io_state = HVMIO_handle_mmio_awaiting_completion;
>          break;
>      case X86EMUL_UNHANDLEABLE:
> -        /* If there is no backing DM, just ignore accesses */
> -        if ( !has_dm )
> +        rc = X86EMUL_RETRY;
> +        if ( !hvm_send_assist_req(curr, p) )
>          {
>              rc = X86EMUL_OKAY;
>              vio->io_state = HVMIO_none;
>          }
> -        else
> -        {
> -            rc = X86EMUL_RETRY;
> -            if ( !hvm_send_assist_req(curr) )
> -                vio->io_state = HVMIO_none;
> -            else if ( p_data == NULL )
> -                rc = X86EMUL_OKAY;
> -        }
> +        else if ( p_data == NULL )
> +            rc = X86EMUL_OKAY;
> +
>          break;
>      default:
>          BUG();
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 69f7e74..71a44db 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -345,6 +345,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>  
> +static ioreq_t *get_ioreq(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;

newline here...

> +    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));

.. and here.  (I realise that this is just code motion, but might as
well take the opportunity to fix the style.)

> +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> +}
> +
>  void hvm_do_resume(struct vcpu *v)
>  {
>      ioreq_t *p;
> @@ -1287,7 +1295,86 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> -bool_t hvm_send_assist_req(struct vcpu *v)
> +int hvm_buffered_io_send(ioreq_t *p)
> +{
> +    struct vcpu *v = current;
> +    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
> +    buffered_iopage_t *pg = iorp->va;
> +    buf_ioreq_t bp;
> +    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
> +    int qw = 0;
> +
> +    /* Ensure buffered_iopage fits in a page */
> +    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> +
> +    /*
> +     * Return 0 for the cases we can't deal with:
> +     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> +     *  - we cannot buffer accesses to guest memory buffers, as the guest
> +     *    may expect the memory buffer to be synchronously accessed
> +     *  - the count field is usually used with data_is_ptr and since we don't
> +     *    support data_is_ptr we do not waste space for the count field either
> +     */
> +    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> +        return 0;
> +
> +    bp.type = p->type;
> +    bp.dir  = p->dir;
> +    switch ( p->size )
> +    {
> +    case 1:
> +        bp.size = 0;
> +        break;
> +    case 2:
> +        bp.size = 1;
> +        break;
> +    case 4:
> +        bp.size = 2;
> +        break;
> +    case 8:
> +        bp.size = 3;
> +        qw = 1;
> +        break;
> +    default:
> +        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
> +        return 0;
> +    }
> +    
> +    bp.data = p->data;
> +    bp.addr = p->addr;
> +    
> +    spin_lock(&iorp->lock);
> +
> +    if ( (pg->write_pointer - pg->read_pointer) >=
> +         (IOREQ_BUFFER_SLOT_NUM - qw) )
> +    {
> +        /* The queue is full: send the iopacket through the normal path. */
> +        spin_unlock(&iorp->lock);
> +        return 0;
> +    }
> +    
> +    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
> +           &bp, sizeof(bp));
> +    
> +    if ( qw )
> +    {
> +        bp.data = p->data >> 32;
> +        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
> +               &bp, sizeof(bp));
> +    }
> +
> +    /* Make the ioreq_t visible /before/ write_pointer. */
> +    wmb();
> +    pg->write_pointer += qw ? 2 : 1;
> +
> +    notify_via_xen_event_channel(v->domain,
> +            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> +    spin_unlock(&iorp->lock);
> +    
> +    return 1;
> +}
> +
> +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
>  {
>      ioreq_t *p;
>  
> @@ -1305,6 +1392,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
>          return 0;
>      }
>  
> +    p->dir = proto_p->dir;
> +    p->data_is_ptr = proto_p->data_is_ptr;
> +    p->type = proto_p->type;
> +    p->size = proto_p->size;
> +    p->addr = proto_p->addr;
> +    p->count = proto_p->count;
> +    p->df = proto_p->df;
> +    p->data = proto_p->data;
> +
>      prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
>  
>      /*
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index bf6309d..576641c 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -46,85 +46,6 @@
>  #include <xen/iocap.h>
>  #include <public/hvm/ioreq.h>
>  
> -int hvm_buffered_io_send(ioreq_t *p)
> -{
> -    struct vcpu *v = current;
> -    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
> -    buffered_iopage_t *pg = iorp->va;
> -    buf_ioreq_t bp;
> -    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
> -    int qw = 0;
> -
> -    /* Ensure buffered_iopage fits in a page */
> -    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> -
> -    /*
> -     * Return 0 for the cases we can't deal with:
> -     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> -     *  - we cannot buffer accesses to guest memory buffers, as the guest
> -     *    may expect the memory buffer to be synchronously accessed
> -     *  - the count field is usually used with data_is_ptr and since we don't
> -     *    support data_is_ptr we do not waste space for the count field either
> -     */
> -    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> -        return 0;
> -
> -    bp.type = p->type;
> -    bp.dir  = p->dir;
> -    switch ( p->size )
> -    {
> -    case 1:
> -        bp.size = 0;
> -        break;
> -    case 2:
> -        bp.size = 1;
> -        break;
> -    case 4:
> -        bp.size = 2;
> -        break;
> -    case 8:
> -        bp.size = 3;
> -        qw = 1;
> -        break;
> -    default:
> -        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
> -        return 0;
> -    }
> -    
> -    bp.data = p->data;
> -    bp.addr = p->addr;
> -    
> -    spin_lock(&iorp->lock);
> -
> -    if ( (pg->write_pointer - pg->read_pointer) >=
> -         (IOREQ_BUFFER_SLOT_NUM - qw) )
> -    {
> -        /* The queue is full: send the iopacket through the normal path. */
> -        spin_unlock(&iorp->lock);
> -        return 0;
> -    }
> -    
> -    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
> -           &bp, sizeof(bp));
> -    
> -    if ( qw )
> -    {
> -        bp.data = p->data >> 32;
> -        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) % IOREQ_BUFFER_SLOT_NUM],
> -               &bp, sizeof(bp));
> -    }
> -
> -    /* Make the ioreq_t visible /before/ write_pointer. */
> -    wmb();
> -    pg->write_pointer += qw ? 2 : 1;
> -
> -    notify_via_xen_event_channel(v->domain,
> -            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> -    spin_unlock(&iorp->lock);
> -    
> -    return 1;
> -}
> -
>  void send_timeoffset_req(unsigned long timeoff)
>  {
>      ioreq_t p[1];
> @@ -150,25 +71,14 @@ void send_timeoffset_req(unsigned long timeoff)
>  void send_invalidate_req(void)
>  {
>      struct vcpu *v = current;
> -    ioreq_t *p = get_ioreq(v);
> -
> -    if ( !p )
> -        return;
> -
> -    if ( p->state != STATE_IOREQ_NONE )
> -    {
> -        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with something "
> -                 "already pending (%d)?\n", p->state);
> -        domain_crash(v->domain);
> -        return;
> -    }
> +    ioreq_t p[1];

This can all be reduced to a single item, and even using C structure
initialisation rather than 4 explicit assignments.

~Andrew

>  
>      p->type = IOREQ_TYPE_INVALIDATE;
>      p->size = 4;
>      p->dir = IOREQ_WRITE;
>      p->data = ~0UL; /* flush all */
>  
> -    (void)hvm_send_assist_req(v);
> +    (void)hvm_send_assist_req(v, p);
>  }
>  
>  int handle_mmio(void)
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index ccca5df..4e8fee8 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -26,6 +26,7 @@
>  #include <asm/hvm/asid.h>
>  #include <public/domctl.h>
>  #include <public/hvm/save.h>
> +#include <public/hvm/ioreq.h>
>  #include <asm/mm.h>
>  
>  /* Interrupt acknowledgement sources. */
> @@ -223,7 +224,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
>                              struct page_info **_page, void **_va);
>  void destroy_ring_for_helper(void **_va, struct page_info *page);
>  
> -bool_t hvm_send_assist_req(struct vcpu *v);
> +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
>  
>  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
>  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
> index 3529499..b6af3c5 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -22,19 +22,10 @@
>  #define __ASM_X86_HVM_SUPPORT_H__
>  
>  #include <xen/types.h>
> -#include <public/hvm/ioreq.h>
>  #include <xen/sched.h>
>  #include <xen/hvm/save.h>
>  #include <asm/processor.h>
>  
> -static inline ioreq_t *get_ioreq(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> -    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> -}
> -
>  #define HVM_DELIVER_NO_ERROR_CODE  -1
>  
>  #ifndef NDEBUG

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures
  2014-01-30 14:32   ` Andrew Cooper
@ 2014-01-30 14:35     ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 30 January 2014 14:32
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 1/5] ioreq-server: centralize access to
> ioreq structures
> 
> On 30/01/14 14:19, Paul Durrant wrote:
> > To simplify creation of the ioreq server abstraction in a
> > subsequent patch, this patch centralizes all use of the shared
> > ioreq structure and the buffered ioreq ring to the source module
> > xen/arch/x86/hvm/hvm.c.
> > Also, re-work hvm_send_assist_req() slightly to complete IO
> > immediately in the case where there is no emulator (i.e. the shared
> > IOREQ ring has not been set). This should handle the case currently
> > covered by has_dm in hvmemul_do_io().
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  xen/arch/x86/hvm/emulate.c        |   40 +++------------
> >  xen/arch/x86/hvm/hvm.c            |   98
> ++++++++++++++++++++++++++++++++++++-
> >  xen/arch/x86/hvm/io.c             |   94 +----------------------------------
> >  xen/include/asm-x86/hvm/hvm.h     |    3 +-
> >  xen/include/asm-x86/hvm/support.h |    9 ----
> >  5 files changed, 108 insertions(+), 136 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> > index 868aa1d..d1d3a6f 100644
> > --- a/xen/arch/x86/hvm/emulate.c
> > +++ b/xen/arch/x86/hvm/emulate.c
> > @@ -57,24 +57,11 @@ static int hvmemul_do_io(
> >      int value_is_ptr = (p_data == NULL);
> >      struct vcpu *curr = current;
> >      struct hvm_vcpu_io *vio;
> > -    ioreq_t *p = get_ioreq(curr);
> > -    ioreq_t _ioreq;
> > +    ioreq_t p[1];
> 
> I know it will make the patch sightly larger by modifying the
> indirection of p, but having an array of 1 item on the stack is seems silly.
> 

I'm following the style adopted in io.c and it is entirely to keep the patch as small as possible :-)
I agree it's a bit silly but I guess it would be better to keep such a change in a separate patch. I can add that to the sequence when I come to submit the patches for real.

  Paul

> >      unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> >      p2m_type_t p2mt;
> >      struct page_info *ram_page;
> >      int rc;
> > -    bool_t has_dm = 1;
> > -
> > -    /*
> > -     * Domains without a backing DM, don't have an ioreq page.  Just
> > -     * point to a struct on the stack, initialising the state as needed.
> > -     */
> > -    if ( !p )
> > -    {
> > -        has_dm = 0;
> > -        p = &_ioreq;
> > -        p->state = STATE_IOREQ_NONE;
> > -    }
> >
> >      /* Check for paged out page */
> >      ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt,
> P2M_UNSHARE);
> > @@ -173,15 +160,6 @@ static int hvmemul_do_io(
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > -    if ( p->state != STATE_IOREQ_NONE )
> > -    {
> > -        gdprintk(XENLOG_WARNING, "WARNING: io already pending
> (%d)?\n",
> > -                 p->state);
> > -        if ( ram_page )
> > -            put_page(ram_page);
> > -        return X86EMUL_UNHANDLEABLE;
> > -    }
> > -
> >      vio->io_state =
> >          (p_data == NULL) ? HVMIO_dispatched :
> HVMIO_awaiting_completion;
> >      vio->io_size = size;
> > @@ -193,6 +171,7 @@ static int hvmemul_do_io(
> >      if ( vio->mmio_retrying )
> >          *reps = 1;
> >
> > +    p->state = STATE_IOREQ_NONE;
> >      p->dir = dir;
> >      p->data_is_ptr = value_is_ptr;
> >      p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
> > @@ -232,20 +211,15 @@ static int hvmemul_do_io(
> >              vio->io_state = HVMIO_handle_mmio_awaiting_completion;
> >          break;
> >      case X86EMUL_UNHANDLEABLE:
> > -        /* If there is no backing DM, just ignore accesses */
> > -        if ( !has_dm )
> > +        rc = X86EMUL_RETRY;
> > +        if ( !hvm_send_assist_req(curr, p) )
> >          {
> >              rc = X86EMUL_OKAY;
> >              vio->io_state = HVMIO_none;
> >          }
> > -        else
> > -        {
> > -            rc = X86EMUL_RETRY;
> > -            if ( !hvm_send_assist_req(curr) )
> > -                vio->io_state = HVMIO_none;
> > -            else if ( p_data == NULL )
> > -                rc = X86EMUL_OKAY;
> > -        }
> > +        else if ( p_data == NULL )
> > +            rc = X86EMUL_OKAY;
> > +
> >          break;
> >      default:
> >          BUG();
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 69f7e74..71a44db 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -345,6 +345,14 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >      spin_unlock(&d->event_lock);
> >  }
> >
> > +static ioreq_t *get_ioreq(struct vcpu *v)
> > +{
> > +    struct domain *d = v->domain;
> > +    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> 
> newline here...
> 
> > +    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> 
> .. and here.  (I realise that this is just code motion, but might as
> well take the opportunity to fix the style.)
> 
> > +    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > +}
> > +
> >  void hvm_do_resume(struct vcpu *v)
> >  {
> >      ioreq_t *p;
> > @@ -1287,7 +1295,86 @@ void hvm_vcpu_down(struct vcpu *v)
> >      }
> >  }
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v)
> > +int hvm_buffered_io_send(ioreq_t *p)
> > +{
> > +    struct vcpu *v = current;
> > +    struct hvm_ioreq_page *iorp = &v->domain-
> >arch.hvm_domain.buf_ioreq;
> > +    buffered_iopage_t *pg = iorp->va;
> > +    buf_ioreq_t bp;
> > +    /* Timeoffset sends 64b data, but no address. Use two consecutive
> slots. */
> > +    int qw = 0;
> > +
> > +    /* Ensure buffered_iopage fits in a page */
> > +    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> > +
> > +    /*
> > +     * Return 0 for the cases we can't deal with:
> > +     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> > +     *  - we cannot buffer accesses to guest memory buffers, as the guest
> > +     *    may expect the memory buffer to be synchronously accessed
> > +     *  - the count field is usually used with data_is_ptr and since we don't
> > +     *    support data_is_ptr we do not waste space for the count field
> either
> > +     */
> > +    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> > +        return 0;
> > +
> > +    bp.type = p->type;
> > +    bp.dir  = p->dir;
> > +    switch ( p->size )
> > +    {
> > +    case 1:
> > +        bp.size = 0;
> > +        break;
> > +    case 2:
> > +        bp.size = 1;
> > +        break;
> > +    case 4:
> > +        bp.size = 2;
> > +        break;
> > +    case 8:
> > +        bp.size = 3;
> > +        qw = 1;
> > +        break;
> > +    default:
> > +        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p-
> >size);
> > +        return 0;
> > +    }
> > +
> > +    bp.data = p->data;
> > +    bp.addr = p->addr;
> > +
> > +    spin_lock(&iorp->lock);
> > +
> > +    if ( (pg->write_pointer - pg->read_pointer) >=
> > +         (IOREQ_BUFFER_SLOT_NUM - qw) )
> > +    {
> > +        /* The queue is full: send the iopacket through the normal path. */
> > +        spin_unlock(&iorp->lock);
> > +        return 0;
> > +    }
> > +
> > +    memcpy(&pg->buf_ioreq[pg->write_pointer %
> IOREQ_BUFFER_SLOT_NUM],
> > +           &bp, sizeof(bp));
> > +
> > +    if ( qw )
> > +    {
> > +        bp.data = p->data >> 32;
> > +        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) %
> IOREQ_BUFFER_SLOT_NUM],
> > +               &bp, sizeof(bp));
> > +    }
> > +
> > +    /* Make the ioreq_t visible /before/ write_pointer. */
> > +    wmb();
> > +    pg->write_pointer += qw ? 2 : 1;
> > +
> > +    notify_via_xen_event_channel(v->domain,
> > +            v->domain-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> > +    spin_unlock(&iorp->lock);
> > +
> > +    return 1;
> > +}
> > +
> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
> >  {
> >      ioreq_t *p;
> >
> > @@ -1305,6 +1392,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
> >          return 0;
> >      }
> >
> > +    p->dir = proto_p->dir;
> > +    p->data_is_ptr = proto_p->data_is_ptr;
> > +    p->type = proto_p->type;
> > +    p->size = proto_p->size;
> > +    p->addr = proto_p->addr;
> > +    p->count = proto_p->count;
> > +    p->df = proto_p->df;
> > +    p->data = proto_p->data;
> > +
> >      prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> >
> >      /*
> > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> > index bf6309d..576641c 100644
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -46,85 +46,6 @@
> >  #include <xen/iocap.h>
> >  #include <public/hvm/ioreq.h>
> >
> > -int hvm_buffered_io_send(ioreq_t *p)
> > -{
> > -    struct vcpu *v = current;
> > -    struct hvm_ioreq_page *iorp = &v->domain-
> >arch.hvm_domain.buf_ioreq;
> > -    buffered_iopage_t *pg = iorp->va;
> > -    buf_ioreq_t bp;
> > -    /* Timeoffset sends 64b data, but no address. Use two consecutive
> slots. */
> > -    int qw = 0;
> > -
> > -    /* Ensure buffered_iopage fits in a page */
> > -    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> > -
> > -    /*
> > -     * Return 0 for the cases we can't deal with:
> > -     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> > -     *  - we cannot buffer accesses to guest memory buffers, as the guest
> > -     *    may expect the memory buffer to be synchronously accessed
> > -     *  - the count field is usually used with data_is_ptr and since we don't
> > -     *    support data_is_ptr we do not waste space for the count field either
> > -     */
> > -    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
> > -        return 0;
> > -
> > -    bp.type = p->type;
> > -    bp.dir  = p->dir;
> > -    switch ( p->size )
> > -    {
> > -    case 1:
> > -        bp.size = 0;
> > -        break;
> > -    case 2:
> > -        bp.size = 1;
> > -        break;
> > -    case 4:
> > -        bp.size = 2;
> > -        break;
> > -    case 8:
> > -        bp.size = 3;
> > -        qw = 1;
> > -        break;
> > -    default:
> > -        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p-
> >size);
> > -        return 0;
> > -    }
> > -
> > -    bp.data = p->data;
> > -    bp.addr = p->addr;
> > -
> > -    spin_lock(&iorp->lock);
> > -
> > -    if ( (pg->write_pointer - pg->read_pointer) >=
> > -         (IOREQ_BUFFER_SLOT_NUM - qw) )
> > -    {
> > -        /* The queue is full: send the iopacket through the normal path. */
> > -        spin_unlock(&iorp->lock);
> > -        return 0;
> > -    }
> > -
> > -    memcpy(&pg->buf_ioreq[pg->write_pointer %
> IOREQ_BUFFER_SLOT_NUM],
> > -           &bp, sizeof(bp));
> > -
> > -    if ( qw )
> > -    {
> > -        bp.data = p->data >> 32;
> > -        memcpy(&pg->buf_ioreq[(pg->write_pointer+1) %
> IOREQ_BUFFER_SLOT_NUM],
> > -               &bp, sizeof(bp));
> > -    }
> > -
> > -    /* Make the ioreq_t visible /before/ write_pointer. */
> > -    wmb();
> > -    pg->write_pointer += qw ? 2 : 1;
> > -
> > -    notify_via_xen_event_channel(v->domain,
> > -            v->domain-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> > -    spin_unlock(&iorp->lock);
> > -
> > -    return 1;
> > -}
> > -
> >  void send_timeoffset_req(unsigned long timeoff)
> >  {
> >      ioreq_t p[1];
> > @@ -150,25 +71,14 @@ void send_timeoffset_req(unsigned long timeoff)
> >  void send_invalidate_req(void)
> >  {
> >      struct vcpu *v = current;
> > -    ioreq_t *p = get_ioreq(v);
> > -
> > -    if ( !p )
> > -        return;
> > -
> > -    if ( p->state != STATE_IOREQ_NONE )
> > -    {
> > -        gdprintk(XENLOG_ERR, "WARNING: send invalidate req with
> something "
> > -                 "already pending (%d)?\n", p->state);
> > -        domain_crash(v->domain);
> > -        return;
> > -    }
> > +    ioreq_t p[1];
> 
> This can all be reduced to a single item, and even using C structure
> initialisation rather than 4 explicit assignments.
> 
> ~Andrew
> 
> >
> >      p->type = IOREQ_TYPE_INVALIDATE;
> >      p->size = 4;
> >      p->dir = IOREQ_WRITE;
> >      p->data = ~0UL; /* flush all */
> >
> > -    (void)hvm_send_assist_req(v);
> > +    (void)hvm_send_assist_req(v, p);
> >  }
> >
> >  int handle_mmio(void)
> > diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-
> x86/hvm/hvm.h
> > index ccca5df..4e8fee8 100644
> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -26,6 +26,7 @@
> >  #include <asm/hvm/asid.h>
> >  #include <public/domctl.h>
> >  #include <public/hvm/save.h>
> > +#include <public/hvm/ioreq.h>
> >  #include <asm/mm.h>
> >
> >  /* Interrupt acknowledgement sources. */
> > @@ -223,7 +224,7 @@ int prepare_ring_for_helper(struct domain *d,
> unsigned long gmfn,
> >                              struct page_info **_page, void **_va);
> >  void destroy_ring_for_helper(void **_va, struct page_info *page);
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v);
> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> >
> >  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
> >  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> > diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-
> x86/hvm/support.h
> > index 3529499..b6af3c5 100644
> > --- a/xen/include/asm-x86/hvm/support.h
> > +++ b/xen/include/asm-x86/hvm/support.h
> > @@ -22,19 +22,10 @@
> >  #define __ASM_X86_HVM_SUPPORT_H__
> >
> >  #include <xen/types.h>
> > -#include <public/hvm/ioreq.h>
> >  #include <xen/sched.h>
> >  #include <xen/hvm/save.h>
> >  #include <asm/processor.h>
> >
> > -static inline ioreq_t *get_ioreq(struct vcpu *v)
> > -{
> > -    struct domain *d = v->domain;
> > -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > -    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > -}
> > -
> >  #define HVM_DELIVER_NO_ERROR_CODE  -1
> >
> >  #ifndef NDEBUG

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures
  2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
  2014-01-30 14:32   ` Andrew Cooper
@ 2014-02-07  4:53   ` Matt Wilson
  2014-02-07  9:24     ` Paul Durrant
  1 sibling, 1 reply; 25+ messages in thread
From: Matt Wilson @ 2014-02-07  4:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Thu, Jan 30, 2014 at 02:19:46PM +0000, Paul Durrant wrote:
> To simplify creation of the ioreq server abstraction in a
> subsequent patch, this patch centralizes all use of the shared
> ioreq structure and the buffered ioreq ring to the source module
> xen/arch/x86/hvm/hvm.c.
> Also, re-work hvm_send_assist_req() slightly to complete IO
> immediately in the case where there is no emulator (i.e. the shared
> IOREQ ring has not been set). This should handle the case currently
> covered by has_dm in hvmemul_do_io().
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

[...]

> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
> index 3529499..b6af3c5 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -22,19 +22,10 @@
>  #define __ASM_X86_HVM_SUPPORT_H__
>  
>  #include <xen/types.h>
> -#include <public/hvm/ioreq.h>
>  #include <xen/sched.h>
>  #include <xen/hvm/save.h>
>  #include <asm/processor.h>
>  
> -static inline ioreq_t *get_ioreq(struct vcpu *v)
> -{
> -    struct domain *d = v->domain;
> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> -    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> -}
> -
>  #define HVM_DELIVER_NO_ERROR_CODE  -1
>  
>  #ifndef NDEBUG

Seems like this breaks nested VMX:

vvmx.c: In function 'nvmx_switch_guest':
vvmx.c:1403: error: implicit declaration of function 'get_ioreq'
vvmx.c:1403: error: nested extern declaration of 'get_ioreq'
vvmx.c:1403: error: invalid type argument of '->' (have 'int')

--msw

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures
  2014-02-07  4:53   ` Matt Wilson
@ 2014-02-07  9:24     ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-02-07  9:24 UTC (permalink / raw)
  To: Matt Wilson; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt Wilson
> Sent: 07 February 2014 04:54
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 1/5] ioreq-server: centralize access to
> ioreq structures
> 
> On Thu, Jan 30, 2014 at 02:19:46PM +0000, Paul Durrant wrote:
> > To simplify creation of the ioreq server abstraction in a
> > subsequent patch, this patch centralizes all use of the shared
> > ioreq structure and the buffered ioreq ring to the source module
> > xen/arch/x86/hvm/hvm.c.
> > Also, re-work hvm_send_assist_req() slightly to complete IO
> > immediately in the case where there is no emulator (i.e. the shared
> > IOREQ ring has not been set). This should handle the case currently
> > covered by has_dm in hvmemul_do_io().
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> 
> [...]
> 
> > diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-
> x86/hvm/support.h
> > index 3529499..b6af3c5 100644
> > --- a/xen/include/asm-x86/hvm/support.h
> > +++ b/xen/include/asm-x86/hvm/support.h
> > @@ -22,19 +22,10 @@
> >  #define __ASM_X86_HVM_SUPPORT_H__
> >
> >  #include <xen/types.h>
> > -#include <public/hvm/ioreq.h>
> >  #include <xen/sched.h>
> >  #include <xen/hvm/save.h>
> >  #include <asm/processor.h>
> >
> > -static inline ioreq_t *get_ioreq(struct vcpu *v)
> > -{
> > -    struct domain *d = v->domain;
> > -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > -    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > -}
> > -
> >  #define HVM_DELIVER_NO_ERROR_CODE  -1
> >
> >  #ifndef NDEBUG
> 
> Seems like this breaks nested VMX:
> 
> vvmx.c: In function 'nvmx_switch_guest':
> vvmx.c:1403: error: implicit declaration of function 'get_ioreq'
> vvmx.c:1403: error: nested extern declaration of 'get_ioreq'
> vvmx.c:1403: error: invalid type argument of '->' (have 'int')
> 

Thanks Matt. That'll teach me to rebase just before I post. I tripped across this a couple of days ago myself and I've incorporated another couple of changes in v2 of this patch to fix it. I'll re-post the series once I've done some more testing with a secondary emulator that actually does something :-)

  Cheers,

  Paul

> --msw

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction.
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
  2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
@ 2014-01-30 14:19 ` Paul Durrant
  2014-01-30 15:03   ` Andrew Cooper
  2014-01-30 14:19 ` [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server Paul Durrant
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Collect together data structures concerning device emulation together into
a new struct hvm_ioreq_server.

Code that deals with the shared and buffered ioreq pages is extracted from
functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
and consolidated into a set of hvm_ioreq_server_XXX functions.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/hvm.c           |  318 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |    9 +-
 xen/include/asm-x86/hvm/vcpu.h   |    2 +-
 3 files changed, 229 insertions(+), 100 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 71a44db..a0eaadb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -345,16 +345,16 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-static ioreq_t *get_ioreq(struct vcpu *v)
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
+    shared_iopage_t *p = s->ioreq.va;
+    ASSERT(p != NULL);
+    return &p->vcpu_ioreq[id];
 }
 
 void hvm_do_resume(struct vcpu *v)
 {
+    struct hvm_ioreq_server *s;
     ioreq_t *p;
 
     check_wakeup_from_wait();
@@ -362,10 +362,14 @@ void hvm_do_resume(struct vcpu *v)
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    if ( !(p = get_ioreq(v)) )
+    s = v->arch.hvm_vcpu.ioreq_server;
+    v->arch.hvm_vcpu.ioreq_server = NULL;
+
+    if ( !s )
         goto check_inject_trap;
 
+    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
+    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -375,7 +379,7 @@ void hvm_do_resume(struct vcpu *v)
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -398,7 +402,6 @@ void hvm_do_resume(struct vcpu *v)
 static void hvm_init_ioreq_page(
     struct domain *d, struct hvm_ioreq_page *iorp)
 {
-    memset(iorp, 0, sizeof(*iorp));
     spin_lock_init(&iorp->lock);
     domain_pause(d);
 }
@@ -541,6 +544,167 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
+static int hvm_init_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    int i;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    s->domain = d;
+
+    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
+        s->ioreq_evtchn[i] = -1;
+    s->buf_ioreq_evtchn = -1;
+
+    hvm_init_ioreq_page(d, &s->ioreq);
+    hvm_init_ioreq_page(d, &s->buf_ioreq);
+
+    d->arch.hvm_domain.ioreq_server = s;
+    return 0;
+}
+
+static void hvm_deinit_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_destroy_ioreq_page(d, &s->ioreq);
+    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
+
+    xfree(s);
+}
+
+static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->domain;
+
+    if ( s->ioreq.va != NULL )
+    {
+        shared_iopage_t *p = s->ioreq.va;
+        struct vcpu *v;
+
+        for_each_vcpu ( d, v )
+            p->vcpu_ioreq[v->vcpu_id].vp_eport = s->ioreq_evtchn[v->vcpu_id];
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    int rc;
+
+    /* Create ioreq event channel. */
+    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+    if ( rc < 0 )
+        goto done;
+
+    /* Register ioreq event channel. */
+    s->ioreq_evtchn[v->vcpu_id] = rc;
+
+    if ( v->vcpu_id == 0 )
+    {
+        /* Create bufioreq event channel. */
+        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
+        if ( rc < 0 )
+            goto done;
+
+        s->buf_ioreq_evtchn = rc;
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+    rc = 0;
+
+done:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    if ( v->vcpu_id == 0 )
+    {
+        if ( s->buf_ioreq_evtchn >= 0 )
+        {
+            free_xen_event_channel(v, s->buf_ioreq_evtchn);
+            s->buf_ioreq_evtchn = -1;
+        }
+    }
+
+    if ( s->ioreq_evtchn[v->vcpu_id] >= 0 )
+    {
+        free_xen_event_channel(v, s->ioreq_evtchn[v->vcpu_id]);
+        s->ioreq_evtchn[v->vcpu_id] = -1;
+    }
+}
+
+static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
+                                     int *p_port)
+{
+    int old_port, new_port;
+
+    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
+    if ( new_port < 0 )
+        return new_port;
+
+    /* xchg() ensures that only we call free_xen_event_channel(). */
+    old_port = xchg(p_port, new_port);
+    free_xen_event_channel(v, old_port);
+    return 0;
+}
+
+static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+{
+    struct domain *d = s->domain;
+    struct vcpu *v;
+    int rc = 0;
+
+    domain_pause(d);
+
+    if ( d->vcpu[0] )
+    {
+        rc = hvm_replace_event_channel(d->vcpu[0], domid, &s->buf_ioreq_evtchn);
+        if ( rc < 0 )
+            goto done;
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v->vcpu_id]);
+        if ( rc < 0 )
+            goto done;
+    }
+
+    hvm_update_ioreq_server_evtchn(s);
+
+    s->domid = domid;
+
+done:
+    domain_unpause(d);
+
+    return rc;
+}
+
+static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    struct domain *d = s->domain;
+    int rc;
+
+    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
+    if ( rc < 0 )
+        return rc;
+
+    hvm_update_ioreq_server_evtchn(s);
+
+    return 0;
+}
+
+static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
+{
+    struct domain *d = s->domain;
+
+    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -608,17 +772,20 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    rc = hvm_init_ioreq_server(d);
+    if ( rc != 0 )
+        goto fail2;
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    hvm_deinit_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -642,8 +809,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
-    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
+    hvm_deinit_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1155,7 +1321,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid;
+    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
@@ -1198,30 +1364,12 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    s = d->arch.hvm_domain.ioreq_server;
 
-    /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
+    rc = hvm_ioreq_server_add_vcpu(s, v);
     if ( rc < 0 )
         goto fail6;
 
-    /* Register ioreq event channel. */
-    v->arch.hvm_vcpu.xen_port = rc;
-
-    if ( v->vcpu_id == 0 )
-    {
-        /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
-        if ( rc < 0 )
-            goto fail6;
-        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
-    }
-
-    spin_lock(&d->arch.hvm_domain.ioreq.lock);
-    if ( d->arch.hvm_domain.ioreq.va != NULL )
-        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1255,6 +1403,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
 void hvm_vcpu_destroy(struct vcpu *v)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+
+    hvm_ioreq_server_remove_vcpu(s, v);
+
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -1266,9 +1419,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
         vlapic_destroy(v);
 
     hvm_funcs.vcpu_destroy(v);
-
-    /* Event channel is already freed by evtchn_destroy(). */
-    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
 }
 
 void hvm_vcpu_down(struct vcpu *v)
@@ -1298,8 +1448,10 @@ void hvm_vcpu_down(struct vcpu *v)
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
@@ -1307,6 +1459,13 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        return 0;
+
+    iorp = &s->buf_ioreq;
+    pg = iorp->va;
+
     /*
      * Return 0 for the cases we can't deal with:
      *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
@@ -1367,8 +1526,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     wmb();
     pg->write_pointer += qw ? 2 : 1;
 
-    notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+    notify_via_xen_event_channel(d, s->buf_ioreq_evtchn);
     spin_unlock(&iorp->lock);
     
     return 1;
@@ -1376,22 +1534,29 @@ int hvm_buffered_io_send(ioreq_t *p)
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
 {
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
     ioreq_t *p;
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    if ( !(p = get_ioreq(v)) )
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
         return 0;
 
+    p = get_ioreq(s, v->vcpu_id);
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model. Crash the domain. */
         gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
-        domain_crash(v->domain);
+        domain_crash(d);
         return 0;
     }
 
+    v->arch.hvm_vcpu.ioreq_server = s;
+
     p->dir = proto_p->dir;
     p->data_is_ptr = proto_p->data_is_ptr;
     p->type = proto_p->type;
@@ -1401,14 +1566,14 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
     p->df = proto_p->df;
     p->data = proto_p->data;
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+    notify_via_xen_event_channel(d, p->vp_eport);
 
     return 1;
 }
@@ -3995,21 +4160,6 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
-static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
-                                     int *p_port)
-{
-    int old_port, new_port;
-
-    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
-    if ( new_port < 0 )
-        return new_port;
-
-    /* xchg() ensures that only we call free_xen_event_channel(). */
-    old_port = xchg(p_port, new_port);
-    free_xen_event_channel(v, old_port);
-    return 0;
-}
-
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4022,7 +4172,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_page *iorp;
+        struct hvm_ioreq_server *s;
         struct domain *d;
         struct vcpu *v;
 
@@ -4048,6 +4198,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( rc )
             goto param_fail;
 
+        s = d->arch.hvm_domain.ioreq_server;
+
         if ( op == HVMOP_set_param )
         {
             rc = 0;
@@ -4055,19 +4207,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             switch ( a.index )
             {
             case HVM_PARAM_IOREQ_PFN:
-                iorp = &d->arch.hvm_domain.ioreq;
-                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
-                    break;
-                spin_lock(&iorp->lock);
-                if ( iorp->va != NULL )
-                    /* Initialise evtchn port info if VCPUs already created. */
-                    for_each_vcpu ( d, v )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                spin_unlock(&iorp->lock);
+                rc = hvm_set_ioreq_server_pfn(s, a.value);
                 break;
             case HVM_PARAM_BUFIOREQ_PFN: 
-                iorp = &d->arch.hvm_domain.buf_ioreq;
-                rc = hvm_set_ioreq_page(d, iorp, a.value);
+                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
                 break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
@@ -4122,31 +4265,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = 0;
-                domain_pause(d); /* safe to change per-vcpu xen_port */
-                if ( d->vcpu[0] )
-                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
-                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
-                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
-                if ( rc )
-                {
-                    domain_unpause(d);
-                    break;
-                }
-                iorp = &d->arch.hvm_domain.ioreq;
-                for_each_vcpu ( d, v )
-                {
-                    rc = hvm_replace_event_channel(v, a.value,
-                                                   &v->arch.hvm_vcpu.xen_port);
-                    if ( rc )
-                        break;
-
-                    spin_lock(&iorp->lock);
-                    if ( iorp->va != NULL )
-                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
-                    spin_unlock(&iorp->lock);
-                }
-                domain_unpause(d);
+                rc = hvm_set_ioreq_server_domid(s, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4241,6 +4360,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         {
             switch ( a.index )
             {
+            case HVM_PARAM_BUFIOREQ_EVTCHN:
+                a.value = s->buf_ioreq_evtchn;
+                break;
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
                 break;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..4c039f8 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,10 +41,17 @@ struct hvm_ioreq_page {
     void *va;
 };
 
-struct hvm_domain {
+struct hvm_ioreq_server {
+    struct domain          *domain;
+    domid_t                domid;
     struct hvm_ioreq_page  ioreq;
+    int                    ioreq_evtchn[MAX_HVM_VCPUS];
     struct hvm_ioreq_page  buf_ioreq;
+    int                    buf_ioreq_evtchn;
+};
 
+struct hvm_domain {
+    struct hvm_ioreq_server *ioreq_server;
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 122ab0d..4c9d7ee 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,7 +138,7 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    int                 xen_port;
+    struct hvm_ioreq_server *ioreq_server;
 
     bool_t              flag_dr_dirty;
     bool_t              debug_state_latch;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction.
  2014-01-30 14:19 ` [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-01-30 15:03   ` Andrew Cooper
  2014-01-30 15:17     ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-01-30 15:03 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 30/01/14 14:19, Paul Durrant wrote:
> Collect together data structures concerning device emulation together into
> a new struct hvm_ioreq_server.
>
> Code that deals with the shared and buffered ioreq pages is extracted from
> functions such as hvm_domain_initialise, hvm_vcpu_initialise and do_hvm_op
> and consolidated into a set of hvm_ioreq_server_XXX functions.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  xen/arch/x86/hvm/hvm.c           |  318 ++++++++++++++++++++++++++------------
>  xen/include/asm-x86/hvm/domain.h |    9 +-
>  xen/include/asm-x86/hvm/vcpu.h   |    2 +-
>  3 files changed, 229 insertions(+), 100 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 71a44db..a0eaadb 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -345,16 +345,16 @@ void hvm_migrate_pirqs(struct vcpu *v)
>      spin_unlock(&d->event_lock);
>  }
>  
> -static ioreq_t *get_ioreq(struct vcpu *v)
> +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
>  {
> -    struct domain *d = v->domain;
> -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> -    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
> -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> +    shared_iopage_t *p = s->ioreq.va;
> +    ASSERT(p != NULL);
> +    return &p->vcpu_ioreq[id];
>  }
>  
>  void hvm_do_resume(struct vcpu *v)
>  {
> +    struct hvm_ioreq_server *s;
>      ioreq_t *p;
>  
>      check_wakeup_from_wait();
> @@ -362,10 +362,14 @@ void hvm_do_resume(struct vcpu *v)
>      if ( is_hvm_vcpu(v) )
>          pt_restore_timer(v);
>  
> -    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> -    if ( !(p = get_ioreq(v)) )
> +    s = v->arch.hvm_vcpu.ioreq_server;

This assignment can be part of the declaration of 's' (and likewise in
most later examples).

> +    v->arch.hvm_vcpu.ioreq_server = NULL;
> +
> +    if ( !s )
>          goto check_inject_trap;
>  
> +    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> +    p = get_ioreq(s, v->vcpu_id);
>      while ( p->state != STATE_IOREQ_NONE )
>      {
>          switch ( p->state )
> @@ -375,7 +379,7 @@ void hvm_do_resume(struct vcpu *v)
>              break;
>          case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
>          case STATE_IOREQ_INPROCESS:
> -            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
> +            wait_on_xen_event_channel(p->vp_eport,
>                                        (p->state != STATE_IOREQ_READY) &&
>                                        (p->state != STATE_IOREQ_INPROCESS));
>              break;
> @@ -398,7 +402,6 @@ void hvm_do_resume(struct vcpu *v)
>  static void hvm_init_ioreq_page(
>      struct domain *d, struct hvm_ioreq_page *iorp)
>  {
> -    memset(iorp, 0, sizeof(*iorp));

Is it worth keeping this function?  the two back to back
domain_pause()'s from the callers are redundant.

>      spin_lock_init(&iorp->lock);
>      domain_pause(d);
>  }
> @@ -541,6 +544,167 @@ static int handle_pvh_io(
>      return X86EMUL_OKAY;
>  }
>  
> +static int hvm_init_ioreq_server(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s;
> +    int i;
> +
> +    s = xzalloc(struct hvm_ioreq_server);
> +    if ( !s )
> +        return -ENOMEM;
> +
> +    s->domain = d;
> +
> +    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> +        s->ioreq_evtchn[i] = -1;
> +    s->buf_ioreq_evtchn = -1;
> +
> +    hvm_init_ioreq_page(d, &s->ioreq);
> +    hvm_init_ioreq_page(d, &s->buf_ioreq);
> +
> +    d->arch.hvm_domain.ioreq_server = s;
> +    return 0;
> +}
> +
> +static void hvm_deinit_ioreq_server(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +
> +    hvm_destroy_ioreq_page(d, &s->ioreq);
> +    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> +
> +    xfree(s);
> +}
> +
> +static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
> +{
> +    struct domain *d = s->domain;
> +
> +    if ( s->ioreq.va != NULL )
> +    {
> +        shared_iopage_t *p = s->ioreq.va;
> +        struct vcpu *v;
> +
> +        for_each_vcpu ( d, v )
> +            p->vcpu_ioreq[v->vcpu_id].vp_eport = s->ioreq_evtchn[v->vcpu_id];
> +    }
> +}
> +
> +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
> +{
> +    int rc;
> +
> +    /* Create ioreq event channel. */
> +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> +    if ( rc < 0 )
> +        goto done;
> +
> +    /* Register ioreq event channel. */
> +    s->ioreq_evtchn[v->vcpu_id] = rc;
> +
> +    if ( v->vcpu_id == 0 )
> +    {
> +        /* Create bufioreq event channel. */
> +        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> +        if ( rc < 0 )
> +            goto done;

skipping hvm_update_ioreq_server_evtchn() even in the case of a
successful ioreq event channel?

> +
> +        s->buf_ioreq_evtchn = rc;
> +    }
> +
> +    hvm_update_ioreq_server_evtchn(s);
> +    rc = 0;
> +
> +done:
> +    return rc;
> +}
> +
> +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
> +{
> +    if ( v->vcpu_id == 0 )
> +    {
> +        if ( s->buf_ioreq_evtchn >= 0 )
> +        {
> +            free_xen_event_channel(v, s->buf_ioreq_evtchn);
> +            s->buf_ioreq_evtchn = -1;
> +        }
> +    }
> +
> +    if ( s->ioreq_evtchn[v->vcpu_id] >= 0 )
> +    {
> +        free_xen_event_channel(v, s->ioreq_evtchn[v->vcpu_id]);
> +        s->ioreq_evtchn[v->vcpu_id] = -1;
> +    }
> +}
> +
> +static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
> +                                     int *p_port)
> +{
> +    int old_port, new_port;
> +
> +    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
> +    if ( new_port < 0 )
> +        return new_port;
> +
> +    /* xchg() ensures that only we call free_xen_event_channel(). */
> +    old_port = xchg(p_port, new_port);
> +    free_xen_event_channel(v, old_port);
> +    return 0;
> +}
> +
> +static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
> +{
> +    struct domain *d = s->domain;
> +    struct vcpu *v;
> +    int rc = 0;
> +
> +    domain_pause(d);
> +
> +    if ( d->vcpu[0] )
> +    {
> +        rc = hvm_replace_event_channel(d->vcpu[0], domid, &s->buf_ioreq_evtchn);
> +        if ( rc < 0 )
> +            goto done;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v->vcpu_id]);
> +        if ( rc < 0 )
> +            goto done;
> +    }
> +
> +    hvm_update_ioreq_server_evtchn(s);
> +
> +    s->domid = domid;
> +
> +done:
> +    domain_unpause(d);
> +
> +    return rc;
> +}
> +
> +static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> +{
> +    struct domain *d = s->domain;
> +    int rc;
> +
> +    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> +    if ( rc < 0 )
> +        return rc;
> +
> +    hvm_update_ioreq_server_evtchn(s);
> +
> +    return 0;
> +}
> +
> +static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> +{
> +    struct domain *d = s->domain;
> +
> +    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);

Double space.

> +}
> +
>  int hvm_domain_initialise(struct domain *d)
>  {
>      int rc;
> @@ -608,17 +772,20 @@ int hvm_domain_initialise(struct domain *d)
>  
>      rtc_init(d);
>  
> -    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
> -    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
> +    rc = hvm_init_ioreq_server(d);
> +    if ( rc != 0 )
> +        goto fail2;
>  
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
>  
>      rc = hvm_funcs.domain_initialise(d);
>      if ( rc != 0 )
> -        goto fail2;
> +        goto fail3;
>  
>      return 0;
>  
> + fail3:
> +    hvm_deinit_ioreq_server(d);
>   fail2:
>      rtc_deinit(d);
>      stdvga_deinit(d);
> @@ -642,8 +809,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
>      if ( hvm_funcs.nhvm_domain_relinquish_resources )
>          hvm_funcs.nhvm_domain_relinquish_resources(d);
>  
> -    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
> -    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
> +    hvm_deinit_ioreq_server(d);
>  
>      msixtbl_pt_cleanup(d);
>  
> @@ -1155,7 +1321,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  {
>      int rc;
>      struct domain *d = v->domain;
> -    domid_t dm_domid;
> +    struct hvm_ioreq_server *s;
>  
>      hvm_asid_flush_vcpu(v);
>  
> @@ -1198,30 +1364,12 @@ int hvm_vcpu_initialise(struct vcpu *v)
>           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
>          goto fail5;
>  
> -    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> +    s = d->arch.hvm_domain.ioreq_server;
>  
> -    /* Create ioreq event channel. */
> -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
> +    rc = hvm_ioreq_server_add_vcpu(s, v);
>      if ( rc < 0 )
>          goto fail6;
>  
> -    /* Register ioreq event channel. */
> -    v->arch.hvm_vcpu.xen_port = rc;
> -
> -    if ( v->vcpu_id == 0 )
> -    {
> -        /* Create bufioreq event channel. */
> -        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
> -        if ( rc < 0 )
> -            goto fail6;
> -        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
> -    }
> -
> -    spin_lock(&d->arch.hvm_domain.ioreq.lock);
> -    if ( d->arch.hvm_domain.ioreq.va != NULL )
> -        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> -    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
> -
>      if ( v->vcpu_id == 0 )
>      {
>          /* NB. All these really belong in hvm_domain_initialise(). */
> @@ -1255,6 +1403,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  
>  void hvm_vcpu_destroy(struct vcpu *v)
>  {
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +
> +    hvm_ioreq_server_remove_vcpu(s, v);
> +
>      nestedhvm_vcpu_destroy(v);
>  
>      free_compat_arg_xlat(v);
> @@ -1266,9 +1419,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
>          vlapic_destroy(v);
>  
>      hvm_funcs.vcpu_destroy(v);
> -
> -    /* Event channel is already freed by evtchn_destroy(). */
> -    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
>  }
>  
>  void hvm_vcpu_down(struct vcpu *v)
> @@ -1298,8 +1448,10 @@ void hvm_vcpu_down(struct vcpu *v)
>  int hvm_buffered_io_send(ioreq_t *p)
>  {
>      struct vcpu *v = current;
> -    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
> -    buffered_iopage_t *pg = iorp->va;
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +    struct hvm_ioreq_page *iorp;
> +    buffered_iopage_t *pg;
>      buf_ioreq_t bp;
>      /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
>      int qw = 0;
> @@ -1307,6 +1459,13 @@ int hvm_buffered_io_send(ioreq_t *p)
>      /* Ensure buffered_iopage fits in a page */
>      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
>  
> +    s = d->arch.hvm_domain.ioreq_server;
> +    if ( !s )
> +        return 0;
> +
> +    iorp = &s->buf_ioreq;
> +    pg = iorp->va;
> +
>      /*
>       * Return 0 for the cases we can't deal with:
>       *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> @@ -1367,8 +1526,7 @@ int hvm_buffered_io_send(ioreq_t *p)
>      wmb();
>      pg->write_pointer += qw ? 2 : 1;
>  
> -    notify_via_xen_event_channel(v->domain,
> -            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> +    notify_via_xen_event_channel(d, s->buf_ioreq_evtchn);
>      spin_unlock(&iorp->lock);
>      
>      return 1;
> @@ -1376,22 +1534,29 @@ int hvm_buffered_io_send(ioreq_t *p)
>  
>  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
>  {
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
>      ioreq_t *p;
>  
>      if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
>          return 0; /* implicitly bins the i/o operation */
>  
> -    if ( !(p = get_ioreq(v)) )
> +    s = d->arch.hvm_domain.ioreq_server;
> +    if ( !s )
>          return 0;
>  
> +    p = get_ioreq(s, v->vcpu_id);
> +
>      if ( unlikely(p->state != STATE_IOREQ_NONE) )
>      {
>          /* This indicates a bug in the device model. Crash the domain. */
>          gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p->state);
> -        domain_crash(v->domain);
> +        domain_crash(d);
>          return 0;
>      }
>  
> +    v->arch.hvm_vcpu.ioreq_server = s;
> +
>      p->dir = proto_p->dir;
>      p->data_is_ptr = proto_p->data_is_ptr;
>      p->type = proto_p->type;
> @@ -1401,14 +1566,14 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
>      p->df = proto_p->df;
>      p->data = proto_p->data;
>  
> -    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> +    prepare_wait_on_xen_event_channel(p->vp_eport);
>  
>      /*
>       * Following happens /after/ blocking and setting up ioreq contents.
>       * prepare_wait_on_xen_event_channel() is an implicit barrier.
>       */
>      p->state = STATE_IOREQ_READY;
> -    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
> +    notify_via_xen_event_channel(d, p->vp_eport);
>  
>      return 1;
>  }
> @@ -3995,21 +4160,6 @@ static int hvmop_flush_tlb_all(void)
>      return 0;
>  }
>  
> -static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
> -                                     int *p_port)
> -{
> -    int old_port, new_port;
> -
> -    new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL);
> -    if ( new_port < 0 )
> -        return new_port;
> -
> -    /* xchg() ensures that only we call free_xen_event_channel(). */
> -    old_port = xchg(p_port, new_port);
> -    free_xen_event_channel(v, old_port);
> -    return 0;
> -}
> -
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>  {
> @@ -4022,7 +4172,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case HVMOP_get_param:
>      {
>          struct xen_hvm_param a;
> -        struct hvm_ioreq_page *iorp;
> +        struct hvm_ioreq_server *s;
>          struct domain *d;
>          struct vcpu *v;
>  
> @@ -4048,6 +4198,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( rc )
>              goto param_fail;
>  
> +        s = d->arch.hvm_domain.ioreq_server;
> +

This should be reduced in lexical scope, and I would have said that it
can just be 'inlined' into each of the 4 uses later.

>          if ( op == HVMOP_set_param )
>          {
>              rc = 0;
> @@ -4055,19 +4207,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              switch ( a.index )
>              {
>              case HVM_PARAM_IOREQ_PFN:
> -                iorp = &d->arch.hvm_domain.ioreq;
> -                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
> -                    break;
> -                spin_lock(&iorp->lock);
> -                if ( iorp->va != NULL )
> -                    /* Initialise evtchn port info if VCPUs already created. */
> -                    for_each_vcpu ( d, v )
> -                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> -                spin_unlock(&iorp->lock);
> +                rc = hvm_set_ioreq_server_pfn(s, a.value);
>                  break;
>              case HVM_PARAM_BUFIOREQ_PFN: 
> -                iorp = &d->arch.hvm_domain.buf_ioreq;
> -                rc = hvm_set_ioreq_page(d, iorp, a.value);
> +                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
>                  break;
>              case HVM_PARAM_CALLBACK_IRQ:
>                  hvm_set_callback_via(d, a.value);
> @@ -4122,31 +4265,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value == DOMID_SELF )
>                      a.value = curr_d->domain_id;
>  
> -                rc = 0;
> -                domain_pause(d); /* safe to change per-vcpu xen_port */
> -                if ( d->vcpu[0] )
> -                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
> -                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
> -                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
> -                if ( rc )
> -                {
> -                    domain_unpause(d);
> -                    break;
> -                }
> -                iorp = &d->arch.hvm_domain.ioreq;
> -                for_each_vcpu ( d, v )
> -                {
> -                    rc = hvm_replace_event_channel(v, a.value,
> -                                                   &v->arch.hvm_vcpu.xen_port);
> -                    if ( rc )
> -                        break;
> -
> -                    spin_lock(&iorp->lock);
> -                    if ( iorp->va != NULL )
> -                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> -                    spin_unlock(&iorp->lock);
> -                }
> -                domain_unpause(d);
> +                rc = hvm_set_ioreq_server_domid(s, a.value);
>                  break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  /* Not reflexive, as we must domain_pause(). */
> @@ -4241,6 +4360,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          {
>              switch ( a.index )
>              {
> +            case HVM_PARAM_BUFIOREQ_EVTCHN:
> +                a.value = s->buf_ioreq_evtchn;
> +                break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
>                  break;
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index b1e3187..4c039f8 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -41,10 +41,17 @@ struct hvm_ioreq_page {
>      void *va;
>  };
>  
> -struct hvm_domain {
> +struct hvm_ioreq_server {
> +    struct domain          *domain;
> +    domid_t                domid;
>      struct hvm_ioreq_page  ioreq;
> +    int                    ioreq_evtchn[MAX_HVM_VCPUS];
>      struct hvm_ioreq_page  buf_ioreq;
> +    int                    buf_ioreq_evtchn;
> +};
>  
> +struct hvm_domain {
> +    struct hvm_ioreq_server *ioreq_server;
>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
> index 122ab0d..4c9d7ee 100644
> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -138,7 +138,7 @@ struct hvm_vcpu {
>      spinlock_t          tm_lock;
>      struct list_head    tm_list;
>  
> -    int                 xen_port;
> +    struct hvm_ioreq_server *ioreq_server;
>  

Why do both hvm_vcpu and hvm_domain need ioreq_server pointers?  I cant
spot anything which actually uses the vcpu one.

~Andrew

>      bool_t              flag_dr_dirty;
>      bool_t              debug_state_latch;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction.
  2014-01-30 15:03   ` Andrew Cooper
@ 2014-01-30 15:17     ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 15:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 30 January 2014 15:04
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 2/5] ioreq-server: create basic ioreq
> server abstraction.
> 
> On 30/01/14 14:19, Paul Durrant wrote:
> > Collect together data structures concerning device emulation together into
> > a new struct hvm_ioreq_server.
> >
> > Code that deals with the shared and buffered ioreq pages is extracted from
> > functions such as hvm_domain_initialise, hvm_vcpu_initialise and
> do_hvm_op
> > and consolidated into a set of hvm_ioreq_server_XXX functions.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c           |  318 ++++++++++++++++++++++++++----
> --------
> >  xen/include/asm-x86/hvm/domain.h |    9 +-
> >  xen/include/asm-x86/hvm/vcpu.h   |    2 +-
> >  3 files changed, 229 insertions(+), 100 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 71a44db..a0eaadb 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -345,16 +345,16 @@ void hvm_migrate_pirqs(struct vcpu *v)
> >      spin_unlock(&d->event_lock);
> >  }
> >
> > -static ioreq_t *get_ioreq(struct vcpu *v)
> > +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
> >  {
> > -    struct domain *d = v->domain;
> > -    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
> > -    ASSERT((v == current) || spin_is_locked(&d-
> >arch.hvm_domain.ioreq.lock));
> > -    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
> > +    shared_iopage_t *p = s->ioreq.va;
> > +    ASSERT(p != NULL);
> > +    return &p->vcpu_ioreq[id];
> >  }
> >
> >  void hvm_do_resume(struct vcpu *v)
> >  {
> > +    struct hvm_ioreq_server *s;
> >      ioreq_t *p;
> >
> >      check_wakeup_from_wait();
> > @@ -362,10 +362,14 @@ void hvm_do_resume(struct vcpu *v)
> >      if ( is_hvm_vcpu(v) )
> >          pt_restore_timer(v);
> >
> > -    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > -    if ( !(p = get_ioreq(v)) )
> > +    s = v->arch.hvm_vcpu.ioreq_server;
> 
> This assignment can be part of the declaration of 's' (and likewise in
> most later examples).
> 

Whilst that's true it would make the subsequent patch where we moved to using lists less obvious.

> > +    v->arch.hvm_vcpu.ioreq_server = NULL;
> > +
> > +    if ( !s )
> >          goto check_inject_trap;
> >
> > +    /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > +    p = get_ioreq(s, v->vcpu_id);
> >      while ( p->state != STATE_IOREQ_NONE )
> >      {
> >          switch ( p->state )
> > @@ -375,7 +379,7 @@ void hvm_do_resume(struct vcpu *v)
> >              break;
> >          case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
> IORESP_READY */
> >          case STATE_IOREQ_INPROCESS:
> > -            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
> > +            wait_on_xen_event_channel(p->vp_eport,
> >                                        (p->state != STATE_IOREQ_READY) &&
> >                                        (p->state != STATE_IOREQ_INPROCESS));
> >              break;
> > @@ -398,7 +402,6 @@ void hvm_do_resume(struct vcpu *v)
> >  static void hvm_init_ioreq_page(
> >      struct domain *d, struct hvm_ioreq_page *iorp)
> >  {
> > -    memset(iorp, 0, sizeof(*iorp));
> 
> Is it worth keeping this function?  the two back to back
> domain_pause()'s from the callers are redundant.
> 

It actually becomes just the spin_lock_init in a subsequent patch. I left it this was as I did not want to make too many steps in one go.

> >      spin_lock_init(&iorp->lock);
> >      domain_pause(d);
> >  }
> > @@ -541,6 +544,167 @@ static int handle_pvh_io(
> >      return X86EMUL_OKAY;
> >  }
> >
> > +static int hvm_init_ioreq_server(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int i;
> > +
> > +    s = xzalloc(struct hvm_ioreq_server);
> > +    if ( !s )
> > +        return -ENOMEM;
> > +
> > +    s->domain = d;
> > +
> > +    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> > +        s->ioreq_evtchn[i] = -1;
> > +    s->buf_ioreq_evtchn = -1;
> > +
> > +    hvm_init_ioreq_page(d, &s->ioreq);
> > +    hvm_init_ioreq_page(d, &s->buf_ioreq);
> > +
> > +    d->arch.hvm_domain.ioreq_server = s;
> > +    return 0;
> > +}
> > +
> > +static void hvm_deinit_ioreq_server(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    hvm_destroy_ioreq_page(d, &s->ioreq);
> > +    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> > +
> > +    xfree(s);
> > +}
> > +
> > +static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server
> *s)
> > +{
> > +    struct domain *d = s->domain;
> > +
> > +    if ( s->ioreq.va != NULL )
> > +    {
> > +        shared_iopage_t *p = s->ioreq.va;
> > +        struct vcpu *v;
> > +
> > +        for_each_vcpu ( d, v )
> > +            p->vcpu_ioreq[v->vcpu_id].vp_eport = s->ioreq_evtchn[v-
> >vcpu_id];
> > +    }
> > +}
> > +
> > +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
> struct vcpu *v)
> > +{
> > +    int rc;
> > +
> > +    /* Create ioreq event channel. */
> > +    rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> > +    if ( rc < 0 )
> > +        goto done;
> > +
> > +    /* Register ioreq event channel. */
> > +    s->ioreq_evtchn[v->vcpu_id] = rc;
> > +
> > +    if ( v->vcpu_id == 0 )
> > +    {
> > +        /* Create bufioreq event channel. */
> > +        rc = alloc_unbound_xen_event_channel(v, s->domid, NULL);
> > +        if ( rc < 0 )
> > +            goto done;
> 
> skipping hvm_update_ioreq_server_evtchn() even in the case of a
> successful ioreq event channel?
> 

Yes, because the vcpu creation will fail.

> > +
> > +        s->buf_ioreq_evtchn = rc;
> > +    }
> > +
> > +    hvm_update_ioreq_server_evtchn(s);
> > +    rc = 0;
> > +
> > +done:
> > +    return rc;
> > +}
> > +
> > +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
> struct vcpu *v)
> > +{
> > +    if ( v->vcpu_id == 0 )
> > +    {
> > +        if ( s->buf_ioreq_evtchn >= 0 )
> > +        {
> > +            free_xen_event_channel(v, s->buf_ioreq_evtchn);
> > +            s->buf_ioreq_evtchn = -1;
> > +        }
> > +    }
> > +
> > +    if ( s->ioreq_evtchn[v->vcpu_id] >= 0 )
> > +    {
> > +        free_xen_event_channel(v, s->ioreq_evtchn[v->vcpu_id]);
> > +        s->ioreq_evtchn[v->vcpu_id] = -1;
> > +    }
> > +}
> > +
> > +static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> > +                                     int *p_port)
> > +{
> > +    int old_port, new_port;
> > +
> > +    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
> NULL);
> > +    if ( new_port < 0 )
> > +        return new_port;
> > +
> > +    /* xchg() ensures that only we call free_xen_event_channel(). */
> > +    old_port = xchg(p_port, new_port);
> > +    free_xen_event_channel(v, old_port);
> > +    return 0;
> > +}
> > +
> > +static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s,
> domid_t domid)
> > +{
> > +    struct domain *d = s->domain;
> > +    struct vcpu *v;
> > +    int rc = 0;
> > +
> > +    domain_pause(d);
> > +
> > +    if ( d->vcpu[0] )
> > +    {
> > +        rc = hvm_replace_event_channel(d->vcpu[0], domid, &s-
> >buf_ioreq_evtchn);
> > +        if ( rc < 0 )
> > +            goto done;
> > +    }
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        rc = hvm_replace_event_channel(v, domid, &s->ioreq_evtchn[v-
> >vcpu_id]);
> > +        if ( rc < 0 )
> > +            goto done;
> > +    }
> > +
> > +    hvm_update_ioreq_server_evtchn(s);
> > +
> > +    s->domid = domid;
> > +
> > +done:
> > +    domain_unpause(d);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > +{
> > +    struct domain *d = s->domain;
> > +    int rc;
> > +
> > +    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> > +    if ( rc < 0 )
> > +        return rc;
> > +
> > +    hvm_update_ioreq_server_evtchn(s);
> > +
> > +    return 0;
> > +}
> > +
> > +static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > +{
> > +    struct domain *d = s->domain;
> > +
> > +    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
> 
> Double space.
> 
> > +}
> > +
> >  int hvm_domain_initialise(struct domain *d)
> >  {
> >      int rc;
> > @@ -608,17 +772,20 @@ int hvm_domain_initialise(struct domain *d)
> >
> >      rtc_init(d);
> >
> > -    hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq);
> > -    hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
> > +    rc = hvm_init_ioreq_server(d);
> > +    if ( rc != 0 )
> > +        goto fail2;
> >
> >      register_portio_handler(d, 0xe9, 1, hvm_print_line);
> >
> >      rc = hvm_funcs.domain_initialise(d);
> >      if ( rc != 0 )
> > -        goto fail2;
> > +        goto fail3;
> >
> >      return 0;
> >
> > + fail3:
> > +    hvm_deinit_ioreq_server(d);
> >   fail2:
> >      rtc_deinit(d);
> >      stdvga_deinit(d);
> > @@ -642,8 +809,7 @@ void hvm_domain_relinquish_resources(struct
> domain *d)
> >      if ( hvm_funcs.nhvm_domain_relinquish_resources )
> >          hvm_funcs.nhvm_domain_relinquish_resources(d);
> >
> > -    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq);
> > -    hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq);
> > +    hvm_deinit_ioreq_server(d);
> >
> >      msixtbl_pt_cleanup(d);
> >
> > @@ -1155,7 +1321,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >  {
> >      int rc;
> >      struct domain *d = v->domain;
> > -    domid_t dm_domid;
> > +    struct hvm_ioreq_server *s;
> >
> >      hvm_asid_flush_vcpu(v);
> >
> > @@ -1198,30 +1364,12 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
> nestedhvm_vcpu_destroy */
> >          goto fail5;
> >
> > -    dm_domid = d-
> >arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
> > +    s = d->arch.hvm_domain.ioreq_server;
> >
> > -    /* Create ioreq event channel. */
> > -    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /*
> teardown: none */
> > +    rc = hvm_ioreq_server_add_vcpu(s, v);
> >      if ( rc < 0 )
> >          goto fail6;
> >
> > -    /* Register ioreq event channel. */
> > -    v->arch.hvm_vcpu.xen_port = rc;
> > -
> > -    if ( v->vcpu_id == 0 )
> > -    {
> > -        /* Create bufioreq event channel. */
> > -        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /*
> teardown: none */
> > -        if ( rc < 0 )
> > -            goto fail6;
> > -        d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] =
> rc;
> > -    }
> > -
> > -    spin_lock(&d->arch.hvm_domain.ioreq.lock);
> > -    if ( d->arch.hvm_domain.ioreq.va != NULL )
> > -        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> > -    spin_unlock(&d->arch.hvm_domain.ioreq.lock);
> > -
> >      if ( v->vcpu_id == 0 )
> >      {
> >          /* NB. All these really belong in hvm_domain_initialise(). */
> > @@ -1255,6 +1403,11 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >
> >  void hvm_vcpu_destroy(struct vcpu *v)
> >  {
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    hvm_ioreq_server_remove_vcpu(s, v);
> > +
> >      nestedhvm_vcpu_destroy(v);
> >
> >      free_compat_arg_xlat(v);
> > @@ -1266,9 +1419,6 @@ void hvm_vcpu_destroy(struct vcpu *v)
> >          vlapic_destroy(v);
> >
> >      hvm_funcs.vcpu_destroy(v);
> > -
> > -    /* Event channel is already freed by evtchn_destroy(). */
> > -    /*free_xen_event_channel(v, v->arch.hvm_vcpu.xen_port);*/
> >  }
> >
> >  void hvm_vcpu_down(struct vcpu *v)
> > @@ -1298,8 +1448,10 @@ void hvm_vcpu_down(struct vcpu *v)
> >  int hvm_buffered_io_send(ioreq_t *p)
> >  {
> >      struct vcpu *v = current;
> > -    struct hvm_ioreq_page *iorp = &v->domain-
> >arch.hvm_domain.buf_ioreq;
> > -    buffered_iopage_t *pg = iorp->va;
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +    struct hvm_ioreq_page *iorp;
> > +    buffered_iopage_t *pg;
> >      buf_ioreq_t bp;
> >      /* Timeoffset sends 64b data, but no address. Use two consecutive
> slots. */
> >      int qw = 0;
> > @@ -1307,6 +1459,13 @@ int hvm_buffered_io_send(ioreq_t *p)
> >      /* Ensure buffered_iopage fits in a page */
> >      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> >
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    if ( !s )
> > +        return 0;
> > +
> > +    iorp = &s->buf_ioreq;
> > +    pg = iorp->va;
> > +
> >      /*
> >       * Return 0 for the cases we can't deal with:
> >       *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
> > @@ -1367,8 +1526,7 @@ int hvm_buffered_io_send(ioreq_t *p)
> >      wmb();
> >      pg->write_pointer += qw ? 2 : 1;
> >
> > -    notify_via_xen_event_channel(v->domain,
> > -            v->domain-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
> > +    notify_via_xen_event_channel(d, s->buf_ioreq_evtchn);
> >      spin_unlock(&iorp->lock);
> >
> >      return 1;
> > @@ -1376,22 +1534,29 @@ int hvm_buffered_io_send(ioreq_t *p)
> >
> >  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
> >  {
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> >      ioreq_t *p;
> >
> >      if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> >          return 0; /* implicitly bins the i/o operation */
> >
> > -    if ( !(p = get_ioreq(v)) )
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    if ( !s )
> >          return 0;
> >
> > +    p = get_ioreq(s, v->vcpu_id);
> > +
> >      if ( unlikely(p->state != STATE_IOREQ_NONE) )
> >      {
> >          /* This indicates a bug in the device model. Crash the domain. */
> >          gdprintk(XENLOG_ERR, "Device model set bad IO state %d.\n", p-
> >state);
> > -        domain_crash(v->domain);
> > +        domain_crash(d);
> >          return 0;
> >      }
> >
> > +    v->arch.hvm_vcpu.ioreq_server = s;
> > +
> >      p->dir = proto_p->dir;
> >      p->data_is_ptr = proto_p->data_is_ptr;
> >      p->type = proto_p->type;
> > @@ -1401,14 +1566,14 @@ bool_t hvm_send_assist_req(struct vcpu *v,
> ioreq_t *proto_p)
> >      p->df = proto_p->df;
> >      p->data = proto_p->data;
> >
> > -    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
> > +    prepare_wait_on_xen_event_channel(p->vp_eport);
> >
> >      /*
> >       * Following happens /after/ blocking and setting up ioreq contents.
> >       * prepare_wait_on_xen_event_channel() is an implicit barrier.
> >       */
> >      p->state = STATE_IOREQ_READY;
> > -    notify_via_xen_event_channel(v->domain, v-
> >arch.hvm_vcpu.xen_port);
> > +    notify_via_xen_event_channel(d, p->vp_eport);
> >
> >      return 1;
> >  }
> > @@ -3995,21 +4160,6 @@ static int hvmop_flush_tlb_all(void)
> >      return 0;
> >  }
> >
> > -static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> > -                                     int *p_port)
> > -{
> > -    int old_port, new_port;
> > -
> > -    new_port = alloc_unbound_xen_event_channel(v, remote_domid,
> NULL);
> > -    if ( new_port < 0 )
> > -        return new_port;
> > -
> > -    /* xchg() ensures that only we call free_xen_event_channel(). */
> > -    old_port = xchg(p_port, new_port);
> > -    free_xen_event_channel(v, old_port);
> > -    return 0;
> > -}
> > -
> >  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void)
> arg)
> >
> >  {
> > @@ -4022,7 +4172,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >      case HVMOP_get_param:
> >      {
> >          struct xen_hvm_param a;
> > -        struct hvm_ioreq_page *iorp;
> > +        struct hvm_ioreq_server *s;
> >          struct domain *d;
> >          struct vcpu *v;
> >
> > @@ -4048,6 +4198,8 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >          if ( rc )
> >              goto param_fail;
> >
> > +        s = d->arch.hvm_domain.ioreq_server;
> > +
> 
> This should be reduced in lexical scope, and I would have said that it
> can just be 'inlined' into each of the 4 uses later.
>

Again. It's done this way to make the patch sequencing work better together.
 
> >          if ( op == HVMOP_set_param )
> >          {
> >              rc = 0;
> > @@ -4055,19 +4207,10 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >              switch ( a.index )
> >              {
> >              case HVM_PARAM_IOREQ_PFN:
> > -                iorp = &d->arch.hvm_domain.ioreq;
> > -                if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 )
> > -                    break;
> > -                spin_lock(&iorp->lock);
> > -                if ( iorp->va != NULL )
> > -                    /* Initialise evtchn port info if VCPUs already created. */
> > -                    for_each_vcpu ( d, v )
> > -                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> > -                spin_unlock(&iorp->lock);
> > +                rc = hvm_set_ioreq_server_pfn(s, a.value);
> >                  break;
> >              case HVM_PARAM_BUFIOREQ_PFN:
> > -                iorp = &d->arch.hvm_domain.buf_ioreq;
> > -                rc = hvm_set_ioreq_page(d, iorp, a.value);
> > +                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
> >                  break;
> >              case HVM_PARAM_CALLBACK_IRQ:
> >                  hvm_set_callback_via(d, a.value);
> > @@ -4122,31 +4265,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  if ( a.value == DOMID_SELF )
> >                      a.value = curr_d->domain_id;
> >
> > -                rc = 0;
> > -                domain_pause(d); /* safe to change per-vcpu xen_port */
> > -                if ( d->vcpu[0] )
> > -                    rc = hvm_replace_event_channel(d->vcpu[0], a.value,
> > -                             (int *)&d->vcpu[0]->domain->arch.hvm_domain.params
> > -                                     [HVM_PARAM_BUFIOREQ_EVTCHN]);
> > -                if ( rc )
> > -                {
> > -                    domain_unpause(d);
> > -                    break;
> > -                }
> > -                iorp = &d->arch.hvm_domain.ioreq;
> > -                for_each_vcpu ( d, v )
> > -                {
> > -                    rc = hvm_replace_event_channel(v, a.value,
> > -                                                   &v->arch.hvm_vcpu.xen_port);
> > -                    if ( rc )
> > -                        break;
> > -
> > -                    spin_lock(&iorp->lock);
> > -                    if ( iorp->va != NULL )
> > -                        get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
> > -                    spin_unlock(&iorp->lock);
> > -                }
> > -                domain_unpause(d);
> > +                rc = hvm_set_ioreq_server_domid(s, a.value);
> >                  break;
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  /* Not reflexive, as we must domain_pause(). */
> > @@ -4241,6 +4360,9 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >          {
> >              switch ( a.index )
> >              {
> > +            case HVM_PARAM_BUFIOREQ_EVTCHN:
> > +                a.value = s->buf_ioreq_evtchn;
> > +                break;
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
> >                  break;
> > diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> > index b1e3187..4c039f8 100644
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -41,10 +41,17 @@ struct hvm_ioreq_page {
> >      void *va;
> >  };
> >
> > -struct hvm_domain {
> > +struct hvm_ioreq_server {
> > +    struct domain          *domain;
> > +    domid_t                domid;
> >      struct hvm_ioreq_page  ioreq;
> > +    int                    ioreq_evtchn[MAX_HVM_VCPUS];
> >      struct hvm_ioreq_page  buf_ioreq;
> > +    int                    buf_ioreq_evtchn;
> > +};
> >
> > +struct hvm_domain {
> > +    struct hvm_ioreq_server *ioreq_server;
> >      struct pl_time         pl_time;
> >
> >      struct hvm_io_handler *io_handler;
> > diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-
> x86/hvm/vcpu.h
> > index 122ab0d..4c9d7ee 100644
> > --- a/xen/include/asm-x86/hvm/vcpu.h
> > +++ b/xen/include/asm-x86/hvm/vcpu.h
> > @@ -138,7 +138,7 @@ struct hvm_vcpu {
> >      spinlock_t          tm_lock;
> >      struct list_head    tm_list;
> >
> > -    int                 xen_port;
> > +    struct hvm_ioreq_server *ioreq_server;
> >
> 
> Why do both hvm_vcpu and hvm_domain need ioreq_server pointers?  I
> cant
> spot anything which actually uses the vcpu one.
> 

Your first comment is about one of those uses!

To explain... The reference is copied into the vcpu struct when the ioreq is sent to the emulator and removed when the response comes back. Now, this is strictly not necessary when dealing with one a single emulator per domain but once we move to a list of emulators then usually only one of them is in use for a particular vcpu at a time (except for the one case of a broadcast ioreq - for mapcache invalidate). This is why we need to track ioreq servers per vcpu as well as per domain.

  Paul

> ~Andrew
> 
> >      bool_t              flag_dr_dirty;
> >      bool_t              debug_state_latch;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
  2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
  2014-01-30 14:19 ` [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction Paul Durrant
@ 2014-01-30 14:19 ` Paul Durrant
  2014-01-30 15:21   ` Andrew Cooper
  2014-01-30 14:19 ` [RFC PATCH 4/5] ioreq-server: add support for multiple servers Paul Durrant
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This patch only creates the ioreq server when the legacy HVM parameters
are touched by an emulator. It also lays some groundwork for supporting
multiple IOREQ servers. For instance, it introduces ioreq server reference
counting which is not strictly necessary at this stage but will become so
when ioreq servers can be destroyed prior the domain dying.

There is a significant change in the layout of the special pages reserved
in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
moving pages such as the xenstore page when building a domain that can
support more than one emulator.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c   |   41 ++--
 xen/arch/x86/hvm/hvm.c           |  409 ++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |    3 +-
 3 files changed, 314 insertions(+), 139 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 77bd365..f24f2a1 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -41,13 +41,12 @@
 #define SPECIALPAGE_PAGING   0
 #define SPECIALPAGE_ACCESS   1
 #define SPECIALPAGE_SHARING  2
-#define SPECIALPAGE_BUFIOREQ 3
-#define SPECIALPAGE_XENSTORE 4
-#define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+#define SPECIALPAGE_XENSTORE 3
+#define SPECIALPAGE_IDENT_PT 4
+#define SPECIALPAGE_CONSOLE  5
+#define SPECIALPAGE_IOREQ    6
+#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define special_pfn(x) (0xff000u - (x))
 
 static int modules_init(struct xc_hvm_build_args *args,
                         uint64_t vend, struct elf_binary *elf,
@@ -112,7 +111,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -463,6 +462,24 @@ static int setup_guest(xc_interface *xch,
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
+
+     DPRINTF("%d SPECIAL PAGES:\n"
+            "  PAGING:    %"PRI_xen_pfn"\n"
+            "  ACCESS:    %"PRI_xen_pfn"\n"
+            "  SHARING:   %"PRI_xen_pfn"\n"
+            "  STORE:     %"PRI_xen_pfn"\n"
+            "  IDENT_PT:  %"PRI_xen_pfn"\n"
+            "  CONSOLE:   %"PRI_xen_pfn"\n"
+            "  IOREQ:     %"PRI_xen_pfn"\n",
+            NR_SPECIAL_PAGES,
+            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
+            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
+
     for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
@@ -478,10 +495,6 @@ static int setup_guest(xc_interface *xch,
 
     xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_BUFIOREQ));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
                      special_pfn(SPECIALPAGE_CONSOLE));
     xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
@@ -490,6 +503,10 @@ static int setup_guest(xc_interface *xch,
                      special_pfn(SPECIALPAGE_ACCESS));
     xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
+                     special_pfn(SPECIALPAGE_IOREQ) - 1);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a0eaadb..d9874fb 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -352,24 +352,9 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
     return &p->vcpu_ioreq[id];
 }
 
-void hvm_do_resume(struct vcpu *v)
+static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 {
-    struct hvm_ioreq_server *s;
-    ioreq_t *p;
-
-    check_wakeup_from_wait();
-
-    if ( is_hvm_vcpu(v) )
-        pt_restore_timer(v);
-
-    s = v->arch.hvm_vcpu.ioreq_server;
-    v->arch.hvm_vcpu.ioreq_server = NULL;
-
-    if ( !s )
-        goto check_inject_trap;
-
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(s, v->vcpu_id);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -385,12 +370,32 @@ void hvm_do_resume(struct vcpu *v)
             break;
         default:
             gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
-            domain_crash(v->domain);
+            domain_crash(d);
             return; /* bail */
         }
     }
+}
+
+void hvm_do_resume(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+
+    check_wakeup_from_wait();
+
+    if ( is_hvm_vcpu(v) )
+        pt_restore_timer(v);
+
+    s = v->arch.hvm_vcpu.ioreq_server;
+    v->arch.hvm_vcpu.ioreq_server = NULL;
+
+    if ( s )
+    {
+        ioreq_t *p = get_ioreq(s, v->vcpu_id);
+
+        hvm_wait_on_io(d, p);
+    }
 
- check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
     {
@@ -399,11 +404,13 @@ void hvm_do_resume(struct vcpu *v)
     }
 }
 
-static void hvm_init_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
 {
+    struct hvm_ioreq_page *iorp;
+
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
+
     spin_lock_init(&iorp->lock);
-    domain_pause(d);
 }
 
 void destroy_ring_for_helper(
@@ -419,16 +426,13 @@ void destroy_ring_for_helper(
     }
 }
 
-static void hvm_destroy_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, int buf)
 {
-    spin_lock(&iorp->lock);
+    struct hvm_ioreq_page *iorp;
 
-    ASSERT(d->is_dying);
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
 
     destroy_ring_for_helper(&iorp->va, iorp->page);
-
-    spin_unlock(&iorp->lock);
 }
 
 int prepare_ring_for_helper(
@@ -476,8 +480,10 @@ int prepare_ring_for_helper(
 }
 
 static int hvm_set_ioreq_page(
-    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
+    struct hvm_ioreq_server *s, int buf, unsigned long gmfn)
 {
+    struct domain *d = s->domain;
+    struct hvm_ioreq_page *iorp;
     struct page_info *page;
     void *va;
     int rc;
@@ -485,22 +491,17 @@ static int hvm_set_ioreq_page(
     if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
         return rc;
 
-    spin_lock(&iorp->lock);
+    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
 
     if ( (iorp->va != NULL) || d->is_dying )
     {
-        destroy_ring_for_helper(&iorp->va, iorp->page);
-        spin_unlock(&iorp->lock);
+        destroy_ring_for_helper(&va, page);
         return -EINVAL;
     }
 
     iorp->va = va;
     iorp->page = page;
 
-    spin_unlock(&iorp->lock);
-
-    domain_unpause(d);
-
     return 0;
 }
 
@@ -544,38 +545,6 @@ static int handle_pvh_io(
     return X86EMUL_OKAY;
 }
 
-static int hvm_init_ioreq_server(struct domain *d)
-{
-    struct hvm_ioreq_server *s;
-    int i;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    s->domain = d;
-
-    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
-        s->ioreq_evtchn[i] = -1;
-    s->buf_ioreq_evtchn = -1;
-
-    hvm_init_ioreq_page(d, &s->ioreq);
-    hvm_init_ioreq_page(d, &s->buf_ioreq);
-
-    d->arch.hvm_domain.ioreq_server = s;
-    return 0;
-}
-
-static void hvm_deinit_ioreq_server(struct domain *d)
-{
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
-
-    hvm_destroy_ioreq_page(d, &s->ioreq);
-    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
-
-    xfree(s);
-}
-
 static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
 {
     struct domain *d = s->domain;
@@ -637,6 +606,152 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     }
 }
 
+static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+{
+    struct hvm_ioreq_server *s;
+    int i;
+    unsigned long pfn;
+    struct vcpu *v;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -EEXIST;
+    if ( d->arch.hvm_domain.ioreq_server != NULL )
+        goto fail_exist;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+
+    rc = -ENOMEM;
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        goto fail_alloc;
+
+    s->domain = d;
+    s->domid = domid;
+
+    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
+        s->ioreq_evtchn[i] = -1;
+    s->buf_ioreq_evtchn = -1;
+
+    /* Initialize shared pages */
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 0);
+    if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
+        goto fail_set_ioreq;
+
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+
+    hvm_init_ioreq_page(s, 1);
+    if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
+        goto fail_set_buf_ioreq;
+
+    for_each_vcpu ( d, v )
+    {
+        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
+            goto fail_add_vcpu;
+    }
+
+    d->arch.hvm_domain.ioreq_server = s;
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+fail_add_vcpu:
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+    hvm_destroy_ioreq_page(s, 1);
+fail_set_buf_ioreq:
+    hvm_destroy_ioreq_page(s, 0);
+fail_set_ioreq:
+    xfree(s);
+fail_alloc:
+fail_exist:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    return rc;
+}
+
+static void hvm_destroy_ioreq_server(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    struct vcpu *v;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+
+    s = d->arch.hvm_domain.ioreq_server;
+    if ( !s )
+        goto done;
+
+    domain_pause(d);
+
+    d->arch.hvm_domain.ioreq_server = NULL;
+
+    for_each_vcpu ( d, v )
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    hvm_destroy_ioreq_page(s, 1);
+    hvm_destroy_ioreq_page(s, 0);
+
+    xfree(s);
+
+    domain_unpause(d);
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+
+    rc = -ENOENT;
+    if ( !s )
+        goto done;
+
+    *port = s->buf_ioreq_evtchn;
+    rc = 0;
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+
+    rc = -ENOENT;
+    if ( !s )
+        goto done;
+
+    if ( buf )
+        *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    else
+        *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+
+    rc = 0;
+
+done:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      int *p_port)
 {
@@ -652,13 +767,24 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
 {
-    struct domain *d = s->domain;
+    struct hvm_ioreq_server *s;
     struct vcpu *v;
     int rc = 0;
 
     domain_pause(d);
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    s = d->arch.hvm_domain.ioreq_server;
+
+    rc = -ENOENT;
+    if ( !s )
+        goto done;
+
+    rc = 0;
+    if ( s->domid == domid )
+        goto done;
 
     if ( d->vcpu[0] )
     {
@@ -680,31 +806,11 @@ static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
 
 done:
     domain_unpause(d);
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
     return rc;
 }
 
-static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    struct domain *d = s->domain;
-    int rc;
-
-    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
-    if ( rc < 0 )
-        return rc;
-
-    hvm_update_ioreq_server_evtchn(s);
-
-    return 0;
-}
-
-static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
-{
-    struct domain *d = s->domain;
-
-    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
-}
-
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -732,6 +838,7 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -772,20 +879,14 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
-    rc = hvm_init_ioreq_server(d);
-    if ( rc != 0 )
-        goto fail2;
-
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail3;
+        goto fail2;
 
     return 0;
 
- fail3:
-    hvm_deinit_ioreq_server(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -809,7 +910,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_deinit_ioreq_server(d);
+    hvm_destroy_ioreq_server(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1364,11 +1465,16 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
     s = d->arch.hvm_domain.ioreq_server;
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    rc = hvm_ioreq_server_add_vcpu(s, v);
-    if ( rc < 0 )
-        goto fail6;
+    if ( s )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc < 0 )
+            goto fail6;
+    }
 
     if ( v->vcpu_id == 0 )
     {
@@ -1404,9 +1510,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    s = d->arch.hvm_domain.ioreq_server;
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    hvm_ioreq_server_remove_vcpu(s, v);
+    if ( s )
+        hvm_ioreq_server_remove_vcpu(s, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1459,7 +1570,10 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
     s = d->arch.hvm_domain.ioreq_server;
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
     if ( !s )
         return 0;
 
@@ -1532,20 +1646,12 @@ int hvm_buffered_io_send(ioreq_t *p)
     return 1;
 }
 
-bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
+static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
+                                            struct vcpu *v,
+                                            ioreq_t *proto_p)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-    ioreq_t *p;
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
-        return 0; /* implicitly bins the i/o operation */
-
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        return 0;
-
-    p = get_ioreq(s, v->vcpu_id);
+    ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
@@ -1578,6 +1684,26 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
     return 1;
 }
 
+bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+
+    ASSERT(v->arch.hvm_vcpu.ioreq_server == NULL);
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
+        return 0;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    s = d->arch.hvm_domain.ioreq_server;
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    if ( !s )
+        return 0;
+
+    return hvm_send_assist_req_to_server(s, v, p);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4172,7 +4298,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case HVMOP_get_param:
     {
         struct xen_hvm_param a;
-        struct hvm_ioreq_server *s;
         struct domain *d;
         struct vcpu *v;
 
@@ -4198,20 +4323,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( rc )
             goto param_fail;
 
-        s = d->arch.hvm_domain.ioreq_server;
-
         if ( op == HVMOP_set_param )
         {
             rc = 0;
 
             switch ( a.index )
             {
-            case HVM_PARAM_IOREQ_PFN:
-                rc = hvm_set_ioreq_server_pfn(s, a.value);
-                break;
-            case HVM_PARAM_BUFIOREQ_PFN: 
-                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
-                break;
             case HVM_PARAM_CALLBACK_IRQ:
                 hvm_set_callback_via(d, a.value);
                 hvm_latch_shinfo_size(d);
@@ -4265,7 +4382,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_set_ioreq_server_domid(s, a.value);
+                rc = hvm_create_ioreq_server(d, a.value);
+                if ( rc == -EEXIST )
+                    rc = hvm_set_ioreq_server_domid(d, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4360,8 +4479,46 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         {
             switch ( a.index )
             {
+            case HVM_PARAM_IOREQ_PFN:
+            case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN:
-                a.value = s->buf_ioreq_evtchn;
+                /* May need to create server */
+                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                if ( rc != 0 && rc != -EEXIST )
+                    goto param_fail;
+
+                switch ( a.index )
+                {
+                case HVM_PARAM_IOREQ_PFN: {
+                    xen_pfn_t pfn;
+
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
+                        goto param_fail;
+
+                    a.value = pfn;
+                    break;
+                }
+                case HVM_PARAM_BUFIOREQ_PFN: {
+                    xen_pfn_t pfn;
+
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
+                        goto param_fail;
+
+                    a.value = pfn;
+                    break;
+                }
+                case HVM_PARAM_BUFIOREQ_EVTCHN: {
+                    evtchn_port_t port;
+
+                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
+                        goto param_fail;
+
+                    a.value = port;
+                    break;
+                }
+                default:
+                    BUG();
+                }
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 4c039f8..e750ef0 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -52,6 +52,8 @@ struct hvm_ioreq_server {
 
 struct hvm_domain {
     struct hvm_ioreq_server *ioreq_server;
+    spinlock_t              ioreq_server_lock;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
@@ -106,4 +108,3 @@ struct hvm_domain {
 #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
 
 #endif /* __ASM_X86_HVM_DOMAIN_H__ */
-
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server
  2014-01-30 14:19 ` [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-01-30 15:21   ` Andrew Cooper
  2014-01-30 15:32     ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-01-30 15:21 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 30/01/14 14:19, Paul Durrant wrote:
> This patch only creates the ioreq server when the legacy HVM parameters
> are touched by an emulator. It also lays some groundwork for supporting
> multiple IOREQ servers. For instance, it introduces ioreq server reference
> counting which is not strictly necessary at this stage but will become so
> when ioreq servers can be destroyed prior the domain dying.
>
> There is a significant change in the layout of the special pages reserved
> in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards without
> moving pages such as the xenstore page when building a domain that can
> support more than one emulator.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  tools/libxc/xc_hvm_build_x86.c   |   41 ++--
>  xen/arch/x86/hvm/hvm.c           |  409 ++++++++++++++++++++++++++------------
>  xen/include/asm-x86/hvm/domain.h |    3 +-
>  3 files changed, 314 insertions(+), 139 deletions(-)
>
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 77bd365..f24f2a1 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -41,13 +41,12 @@
>  #define SPECIALPAGE_PAGING   0
>  #define SPECIALPAGE_ACCESS   1
>  #define SPECIALPAGE_SHARING  2
> -#define SPECIALPAGE_BUFIOREQ 3
> -#define SPECIALPAGE_XENSTORE 4
> -#define SPECIALPAGE_IOREQ    5
> -#define SPECIALPAGE_IDENT_PT 6
> -#define SPECIALPAGE_CONSOLE  7
> -#define NR_SPECIAL_PAGES     8
> -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> +#define SPECIALPAGE_XENSTORE 3
> +#define SPECIALPAGE_IDENT_PT 4
> +#define SPECIALPAGE_CONSOLE  5
> +#define SPECIALPAGE_IOREQ    6
> +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> +#define special_pfn(x) (0xff000u - (x))
>  
>  static int modules_init(struct xc_hvm_build_args *args,
>                          uint64_t vend, struct elf_binary *elf,
> @@ -112,7 +111,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      /* Memory parameters. */
>      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> +    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
>  
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> @@ -463,6 +462,24 @@ static int setup_guest(xc_interface *xch,
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
> +
> +     DPRINTF("%d SPECIAL PAGES:\n"
> +            "  PAGING:    %"PRI_xen_pfn"\n"
> +            "  ACCESS:    %"PRI_xen_pfn"\n"
> +            "  SHARING:   %"PRI_xen_pfn"\n"
> +            "  STORE:     %"PRI_xen_pfn"\n"
> +            "  IDENT_PT:  %"PRI_xen_pfn"\n"
> +            "  CONSOLE:   %"PRI_xen_pfn"\n"
> +            "  IOREQ:     %"PRI_xen_pfn"\n",
> +            NR_SPECIAL_PAGES,
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
> +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> +

I realise I am being quite picky here, but from a daemon point of view
trying to log to facilities like syslog, multi-line single debugging
messages are a pain.  Would it be possible to do this as 8 DPRINTF()s?

>      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
>      {
>          xen_pfn_t pfn = special_pfn(i);
> @@ -478,10 +495,6 @@ static int setup_guest(xc_interface *xch,
>  
>      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
>                       special_pfn(SPECIALPAGE_XENSTORE));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
>                       special_pfn(SPECIALPAGE_CONSOLE));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> @@ -490,6 +503,10 @@ static int setup_guest(xc_interface *xch,
>                       special_pfn(SPECIALPAGE_ACCESS));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
>                       special_pfn(SPECIALPAGE_SHARING));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ));
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> +                     special_pfn(SPECIALPAGE_IOREQ) - 1);
>  
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index a0eaadb..d9874fb 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -352,24 +352,9 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, int id)
>      return &p->vcpu_ioreq[id];
>  }
>  
> -void hvm_do_resume(struct vcpu *v)
> +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>  {
> -    struct hvm_ioreq_server *s;
> -    ioreq_t *p;
> -
> -    check_wakeup_from_wait();
> -
> -    if ( is_hvm_vcpu(v) )
> -        pt_restore_timer(v);
> -
> -    s = v->arch.hvm_vcpu.ioreq_server;
> -    v->arch.hvm_vcpu.ioreq_server = NULL;
> -
> -    if ( !s )
> -        goto check_inject_trap;
> -
>      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
> -    p = get_ioreq(s, v->vcpu_id);
>      while ( p->state != STATE_IOREQ_NONE )
>      {
>          switch ( p->state )
> @@ -385,12 +370,32 @@ void hvm_do_resume(struct vcpu *v)
>              break;
>          default:
>              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p->state);
> -            domain_crash(v->domain);
> +            domain_crash(d);
>              return; /* bail */
>          }
>      }
> +}
> +
> +void hvm_do_resume(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +
> +    check_wakeup_from_wait();
> +
> +    if ( is_hvm_vcpu(v) )
> +        pt_restore_timer(v);
> +
> +    s = v->arch.hvm_vcpu.ioreq_server;
> +    v->arch.hvm_vcpu.ioreq_server = NULL;
> +
> +    if ( s )
> +    {
> +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> +
> +        hvm_wait_on_io(d, p);
> +    }
>  
> - check_inject_trap:
>      /* Inject pending hw/sw trap */
>      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 ) 
>      {
> @@ -399,11 +404,13 @@ void hvm_do_resume(struct vcpu *v)
>      }
>  }
>  
> -static void hvm_init_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
>  {
> +    struct hvm_ioreq_page *iorp;
> +
> +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
> +

Brackets are redundant.

>      spin_lock_init(&iorp->lock);
> -    domain_pause(d);
>  }
>  
>  void destroy_ring_for_helper(
> @@ -419,16 +426,13 @@ void destroy_ring_for_helper(
>      }
>  }
>  
> -static void hvm_destroy_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp)
> +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, int buf)
>  {
> -    spin_lock(&iorp->lock);
> +    struct hvm_ioreq_page *iorp;
>  
> -    ASSERT(d->is_dying);
> +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
>  
>      destroy_ring_for_helper(&iorp->va, iorp->page);
> -
> -    spin_unlock(&iorp->lock);
>  }
>  
>  int prepare_ring_for_helper(
> @@ -476,8 +480,10 @@ int prepare_ring_for_helper(
>  }
>  
>  static int hvm_set_ioreq_page(
> -    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
> +    struct hvm_ioreq_server *s, int buf, unsigned long gmfn)
>  {
> +    struct domain *d = s->domain;
> +    struct hvm_ioreq_page *iorp;
>      struct page_info *page;
>      void *va;
>      int rc;
> @@ -485,22 +491,17 @@ static int hvm_set_ioreq_page(
>      if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
>          return rc;
>  
> -    spin_lock(&iorp->lock);
> +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
>  
>      if ( (iorp->va != NULL) || d->is_dying )
>      {
> -        destroy_ring_for_helper(&iorp->va, iorp->page);
> -        spin_unlock(&iorp->lock);
> +        destroy_ring_for_helper(&va, page);
>          return -EINVAL;
>      }
>  
>      iorp->va = va;
>      iorp->page = page;
>  
> -    spin_unlock(&iorp->lock);
> -
> -    domain_unpause(d);
> -
>      return 0;
>  }
>  
> @@ -544,38 +545,6 @@ static int handle_pvh_io(
>      return X86EMUL_OKAY;
>  }
>  
> -static int hvm_init_ioreq_server(struct domain *d)
> -{
> -    struct hvm_ioreq_server *s;
> -    int i;
> -
> -    s = xzalloc(struct hvm_ioreq_server);
> -    if ( !s )
> -        return -ENOMEM;
> -
> -    s->domain = d;
> -
> -    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> -        s->ioreq_evtchn[i] = -1;
> -    s->buf_ioreq_evtchn = -1;
> -
> -    hvm_init_ioreq_page(d, &s->ioreq);
> -    hvm_init_ioreq_page(d, &s->buf_ioreq);
> -
> -    d->arch.hvm_domain.ioreq_server = s;
> -    return 0;
> -}
> -
> -static void hvm_deinit_ioreq_server(struct domain *d)
> -{
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> -
> -    hvm_destroy_ioreq_page(d, &s->ioreq);
> -    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> -
> -    xfree(s);
> -}
> -
>  static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server *s)
>  {
>      struct domain *d = s->domain;
> @@ -637,6 +606,152 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
>      }
>  }
>  
> +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +{
> +    struct hvm_ioreq_server *s;
> +    int i;
> +    unsigned long pfn;
> +    struct vcpu *v;
> +    int rc;

i and rc can be declared together.

> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -EEXIST;
> +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> +        goto fail_exist;
> +
> +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +
> +    rc = -ENOMEM;
> +    s = xzalloc(struct hvm_ioreq_server);
> +    if ( !s )
> +        goto fail_alloc;
> +
> +    s->domain = d;
> +    s->domid = domid;
> +
> +    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> +        s->ioreq_evtchn[i] = -1;
> +    s->buf_ioreq_evtchn = -1;
> +
> +    /* Initialize shared pages */
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> +
> +    hvm_init_ioreq_page(s, 0);
> +    if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
> +        goto fail_set_ioreq;
> +
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> +
> +    hvm_init_ioreq_page(s, 1);
> +    if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
> +        goto fail_set_buf_ioreq;
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
> +            goto fail_add_vcpu;
> +    }
> +
> +    d->arch.hvm_domain.ioreq_server = s;
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return 0;
> +
> +fail_add_vcpu:
> +    for_each_vcpu ( d, v )
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    hvm_destroy_ioreq_page(s, 1);
> +fail_set_buf_ioreq:
> +    hvm_destroy_ioreq_page(s, 0);
> +fail_set_ioreq:
> +    xfree(s);
> +fail_alloc:
> +fail_exist:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +    return rc;
> +}
> +
> +static void hvm_destroy_ioreq_server(struct domain *d)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct vcpu *v;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +
> +    s = d->arch.hvm_domain.ioreq_server;
> +    if ( !s )
> +        goto done;
> +
> +    domain_pause(d);
> +
> +    d->arch.hvm_domain.ioreq_server = NULL;
> +
> +    for_each_vcpu ( d, v )
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +
> +    hvm_destroy_ioreq_page(s, 1);
> +    hvm_destroy_ioreq_page(s, 0);
> +
> +    xfree(s);
> +
> +    domain_unpause(d);
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +}
> +
> +static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    s = d->arch.hvm_domain.ioreq_server;
> +
> +    rc = -ENOENT;
> +    if ( !s )
> +        goto done;
> +
> +    *port = s->buf_ioreq_evtchn;
> +    rc = 0;
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    s = d->arch.hvm_domain.ioreq_server;
> +
> +    rc = -ENOENT;
> +    if ( !s )
> +        goto done;
> +
> +    if ( buf )
> +        *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> +    else
> +        *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];

This can be reduced and use "params[buf ? HVM_PARAM_BUFIOREQ_PFN :
HVM_PARAM_IOREQ_PFN]", although that is perhaps not as clear.

> +
> +    rc = 0;
> +
> +done:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
>  static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
>                                       int *p_port)
>  {
> @@ -652,13 +767,24 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
>      return 0;
>  }
>  
> -static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
> +static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
>  {
> -    struct domain *d = s->domain;
> +    struct hvm_ioreq_server *s;
>      struct vcpu *v;
>      int rc = 0;
>  
>      domain_pause(d);
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    s = d->arch.hvm_domain.ioreq_server;
> +
> +    rc = -ENOENT;
> +    if ( !s )
> +        goto done;
> +
> +    rc = 0;
> +    if ( s->domid == domid )
> +        goto done;
>  
>      if ( d->vcpu[0] )
>      {
> @@ -680,31 +806,11 @@ static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s, domid_t domid)
>  
>  done:
>      domain_unpause(d);
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);

Mismatched order of pause/unpause and lock/unlock pairs.  The unlock
should ideally be before the unpause.

>  
>      return rc;
>  }
>  
> -static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> -{
> -    struct domain *d = s->domain;
> -    int rc;
> -
> -    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> -    if ( rc < 0 )
> -        return rc;
> -
> -    hvm_update_ioreq_server_evtchn(s);
> -
> -    return 0;
> -}
> -
> -static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s, unsigned long pfn)
> -{
> -    struct domain *d = s->domain;
> -
> -    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
> -}
> -
>  int hvm_domain_initialise(struct domain *d)
>  {
>      int rc;
> @@ -732,6 +838,7 @@ int hvm_domain_initialise(struct domain *d)
>  
>      }
>  
> +    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
>  
> @@ -772,20 +879,14 @@ int hvm_domain_initialise(struct domain *d)
>  
>      rtc_init(d);
>  
> -    rc = hvm_init_ioreq_server(d);
> -    if ( rc != 0 )
> -        goto fail2;
> -
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
>  
>      rc = hvm_funcs.domain_initialise(d);
>      if ( rc != 0 )
> -        goto fail3;
> +        goto fail2;
>  
>      return 0;
>  
> - fail3:
> -    hvm_deinit_ioreq_server(d);
>   fail2:
>      rtc_deinit(d);
>      stdvga_deinit(d);
> @@ -809,7 +910,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
>      if ( hvm_funcs.nhvm_domain_relinquish_resources )
>          hvm_funcs.nhvm_domain_relinquish_resources(d);
>  
> -    hvm_deinit_ioreq_server(d);
> +    hvm_destroy_ioreq_server(d);
>  
>      msixtbl_pt_cleanup(d);
>  
> @@ -1364,11 +1465,16 @@ int hvm_vcpu_initialise(struct vcpu *v)
>           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
>          goto fail5;
>  
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>      s = d->arch.hvm_domain.ioreq_server;
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    rc = hvm_ioreq_server_add_vcpu(s, v);
> -    if ( rc < 0 )
> -        goto fail6;
> +    if ( s )
> +    {
> +        rc = hvm_ioreq_server_add_vcpu(s, v);
> +        if ( rc < 0 )
> +            goto fail6;
> +    }
>  
>      if ( v->vcpu_id == 0 )
>      {
> @@ -1404,9 +1510,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  void hvm_vcpu_destroy(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +    s = d->arch.hvm_domain.ioreq_server;
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    hvm_ioreq_server_remove_vcpu(s, v);
> +    if ( s )
> +        hvm_ioreq_server_remove_vcpu(s, v);
>  
>      nestedhvm_vcpu_destroy(v);
>  
> @@ -1459,7 +1570,10 @@ int hvm_buffered_io_send(ioreq_t *p)
>      /* Ensure buffered_iopage fits in a page */
>      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
>  
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>      s = d->arch.hvm_domain.ioreq_server;
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
>      if ( !s )
>          return 0;
>  
> @@ -1532,20 +1646,12 @@ int hvm_buffered_io_send(ioreq_t *p)
>      return 1;
>  }
>  
> -bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
> +static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
> +                                            struct vcpu *v,
> +                                            ioreq_t *proto_p)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s;
> -    ioreq_t *p;
> -
> -    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> -        return 0; /* implicitly bins the i/o operation */
> -
> -    s = d->arch.hvm_domain.ioreq_server;
> -    if ( !s )
> -        return 0;
> -
> -    p = get_ioreq(s, v->vcpu_id);
> +    ioreq_t *p = get_ioreq(s, v->vcpu_id);
>  
>      if ( unlikely(p->state != STATE_IOREQ_NONE) )
>      {
> @@ -1578,6 +1684,26 @@ bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
>      return 1;
>  }
>  
> +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
> +{
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +
> +    ASSERT(v->arch.hvm_vcpu.ioreq_server == NULL);
> +
> +    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> +        return 0;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +    s = d->arch.hvm_domain.ioreq_server;
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);

What is the purpose of talking the server lock just to read the
ioreq_server pointer?

> +
> +    if ( !s )
> +        return 0;
> +
> +    return hvm_send_assist_req_to_server(s, v, p);
> +}
> +
>  void hvm_hlt(unsigned long rflags)
>  {
>      struct vcpu *curr = current;
> @@ -4172,7 +4298,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case HVMOP_get_param:
>      {
>          struct xen_hvm_param a;
> -        struct hvm_ioreq_server *s;
>          struct domain *d;
>          struct vcpu *v;
>  
> @@ -4198,20 +4323,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( rc )
>              goto param_fail;
>  
> -        s = d->arch.hvm_domain.ioreq_server;
> -
>          if ( op == HVMOP_set_param )
>          {
>              rc = 0;
>  
>              switch ( a.index )
>              {
> -            case HVM_PARAM_IOREQ_PFN:
> -                rc = hvm_set_ioreq_server_pfn(s, a.value);
> -                break;
> -            case HVM_PARAM_BUFIOREQ_PFN: 
> -                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
> -                break;
>              case HVM_PARAM_CALLBACK_IRQ:
>                  hvm_set_callback_via(d, a.value);
>                  hvm_latch_shinfo_size(d);
> @@ -4265,7 +4382,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value == DOMID_SELF )
>                      a.value = curr_d->domain_id;
>  
> -                rc = hvm_set_ioreq_server_domid(s, a.value);
> +                rc = hvm_create_ioreq_server(d, a.value);
> +                if ( rc == -EEXIST )
> +                    rc = hvm_set_ioreq_server_domid(d, a.value);
>                  break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  /* Not reflexive, as we must domain_pause(). */
> @@ -4360,8 +4479,46 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          {
>              switch ( a.index )
>              {
> +            case HVM_PARAM_IOREQ_PFN:
> +            case HVM_PARAM_BUFIOREQ_PFN:
>              case HVM_PARAM_BUFIOREQ_EVTCHN:
> -                a.value = s->buf_ioreq_evtchn;
> +                /* May need to create server */
> +                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> +                if ( rc != 0 && rc != -EEXIST )
> +                    goto param_fail;
> +
> +                switch ( a.index )
> +                {
> +                case HVM_PARAM_IOREQ_PFN: {
> +                    xen_pfn_t pfn;
> +
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> +                        goto param_fail;
> +
> +                    a.value = pfn;
> +                    break;
> +                }
> +                case HVM_PARAM_BUFIOREQ_PFN: {
> +                    xen_pfn_t pfn;
> +
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> +                        goto param_fail;
> +
> +                    a.value = pfn;
> +                    break;
> +                }
> +                case HVM_PARAM_BUFIOREQ_EVTCHN: {
> +                    evtchn_port_t port;
> +
> +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> +                        goto param_fail;
> +
> +                    a.value = port;
> +                    break;
> +                }
> +                default:
> +                    BUG();
> +                }
>                  break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 4c039f8..e750ef0 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -52,6 +52,8 @@ struct hvm_ioreq_server {
>  
>  struct hvm_domain {
>      struct hvm_ioreq_server *ioreq_server;
> +    spinlock_t              ioreq_server_lock;
> +
>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> @@ -106,4 +108,3 @@ struct hvm_domain {
>  #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
>  
>  #endif /* __ASM_X86_HVM_DOMAIN_H__ */
> -

Spurious whitespace change

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server
  2014-01-30 15:21   ` Andrew Cooper
@ 2014-01-30 15:32     ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 15:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 30 January 2014 15:22
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 3/5] ioreq-server: on-demand creation
> of ioreq server
> 
> On 30/01/14 14:19, Paul Durrant wrote:
> > This patch only creates the ioreq server when the legacy HVM parameters
> > are touched by an emulator. It also lays some groundwork for supporting
> > multiple IOREQ servers. For instance, it introduces ioreq server reference
> > counting which is not strictly necessary at this stage but will become so
> > when ioreq servers can be destroyed prior the domain dying.
> >
> > There is a significant change in the layout of the special pages reserved
> > in xc_hvm_build_x86.c. This is so that we can 'grow' them downwards
> without
> > moving pages such as the xenstore page when building a domain that can
> > support more than one emulator.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> >  tools/libxc/xc_hvm_build_x86.c   |   41 ++--
> >  xen/arch/x86/hvm/hvm.c           |  409 ++++++++++++++++++++++++++----
> --------
> >  xen/include/asm-x86/hvm/domain.h |    3 +-
> >  3 files changed, 314 insertions(+), 139 deletions(-)
> >
> > diff --git a/tools/libxc/xc_hvm_build_x86.c
> b/tools/libxc/xc_hvm_build_x86.c
> > index 77bd365..f24f2a1 100644
> > --- a/tools/libxc/xc_hvm_build_x86.c
> > +++ b/tools/libxc/xc_hvm_build_x86.c
> > @@ -41,13 +41,12 @@
> >  #define SPECIALPAGE_PAGING   0
> >  #define SPECIALPAGE_ACCESS   1
> >  #define SPECIALPAGE_SHARING  2
> > -#define SPECIALPAGE_BUFIOREQ 3
> > -#define SPECIALPAGE_XENSTORE 4
> > -#define SPECIALPAGE_IOREQ    5
> > -#define SPECIALPAGE_IDENT_PT 6
> > -#define SPECIALPAGE_CONSOLE  7
> > -#define NR_SPECIAL_PAGES     8
> > -#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
> > +#define SPECIALPAGE_XENSTORE 3
> > +#define SPECIALPAGE_IDENT_PT 4
> > +#define SPECIALPAGE_CONSOLE  5
> > +#define SPECIALPAGE_IOREQ    6
> > +#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server
> needs 2 pages */
> > +#define special_pfn(x) (0xff000u - (x))
> >
> >  static int modules_init(struct xc_hvm_build_args *args,
> >                          uint64_t vend, struct elf_binary *elf,
> > @@ -112,7 +111,7 @@ static void build_hvm_info(void *hvm_info_page,
> uint64_t mem_size,
> >      /* Memory parameters. */
> >      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
> >      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> > -    hvm_info->reserved_mem_pgstart = special_pfn(0);
> > +    hvm_info->reserved_mem_pgstart = special_pfn(0) -
> NR_SPECIAL_PAGES;
> >
> >      /* Finish with the checksum. */
> >      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> > @@ -463,6 +462,24 @@ static int setup_guest(xc_interface *xch,
> >      munmap(hvm_info_page, PAGE_SIZE);
> >
> >      /* Allocate and clear special pages. */
> > +
> > +     DPRINTF("%d SPECIAL PAGES:\n"
> > +            "  PAGING:    %"PRI_xen_pfn"\n"
> > +            "  ACCESS:    %"PRI_xen_pfn"\n"
> > +            "  SHARING:   %"PRI_xen_pfn"\n"
> > +            "  STORE:     %"PRI_xen_pfn"\n"
> > +            "  IDENT_PT:  %"PRI_xen_pfn"\n"
> > +            "  CONSOLE:   %"PRI_xen_pfn"\n"
> > +            "  IOREQ:     %"PRI_xen_pfn"\n",
> > +            NR_SPECIAL_PAGES,
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
> > +            (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> > +
> 
> I realise I am being quite picky here, but from a daemon point of view
> trying to log to facilities like syslog, multi-line single debugging
> messages are a pain.  Would it be possible to do this as 8 DPRINTF()s?
> 

Yes, of course.

> >      for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> >      {
> >          xen_pfn_t pfn = special_pfn(i);
> > @@ -478,10 +495,6 @@ static int setup_guest(xc_interface *xch,
> >
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
> >                       special_pfn(SPECIALPAGE_XENSTORE));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_BUFIOREQ));
> > -    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
> >                       special_pfn(SPECIALPAGE_CONSOLE));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
> > @@ -490,6 +503,10 @@ static int setup_guest(xc_interface *xch,
> >                       special_pfn(SPECIALPAGE_ACCESS));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
> >                       special_pfn(SPECIALPAGE_SHARING));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ));
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > +                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index a0eaadb..d9874fb 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -352,24 +352,9 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server
> *s, int id)
> >      return &p->vcpu_ioreq[id];
> >  }
> >
> > -void hvm_do_resume(struct vcpu *v)
> > +static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
> >  {
> > -    struct hvm_ioreq_server *s;
> > -    ioreq_t *p;
> > -
> > -    check_wakeup_from_wait();
> > -
> > -    if ( is_hvm_vcpu(v) )
> > -        pt_restore_timer(v);
> > -
> > -    s = v->arch.hvm_vcpu.ioreq_server;
> > -    v->arch.hvm_vcpu.ioreq_server = NULL;
> > -
> > -    if ( !s )
> > -        goto check_inject_trap;
> > -
> >      /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE).
> */
> > -    p = get_ioreq(s, v->vcpu_id);
> >      while ( p->state != STATE_IOREQ_NONE )
> >      {
> >          switch ( p->state )
> > @@ -385,12 +370,32 @@ void hvm_do_resume(struct vcpu *v)
> >              break;
> >          default:
> >              gdprintk(XENLOG_ERR, "Weird HVM iorequest state %d.\n", p-
> >state);
> > -            domain_crash(v->domain);
> > +            domain_crash(d);
> >              return; /* bail */
> >          }
> >      }
> > +}
> > +
> > +void hvm_do_resume(struct vcpu *v)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +
> > +    check_wakeup_from_wait();
> > +
> > +    if ( is_hvm_vcpu(v) )
> > +        pt_restore_timer(v);
> > +
> > +    s = v->arch.hvm_vcpu.ioreq_server;
> > +    v->arch.hvm_vcpu.ioreq_server = NULL;
> > +
> > +    if ( s )
> > +    {
> > +        ioreq_t *p = get_ioreq(s, v->vcpu_id);
> > +
> > +        hvm_wait_on_io(d, p);
> > +    }
> >
> > - check_inject_trap:
> >      /* Inject pending hw/sw trap */
> >      if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
> >      {
> > @@ -399,11 +404,13 @@ void hvm_do_resume(struct vcpu *v)
> >      }
> >  }
> >
> > -static void hvm_init_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > +static void hvm_init_ioreq_page(struct hvm_ioreq_server *s, int buf)
> >  {
> > +    struct hvm_ioreq_page *iorp;
> > +
> > +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
> > +
> 
> Brackets are redundant.
> 

...but good style IMO.

> >      spin_lock_init(&iorp->lock);
> > -    domain_pause(d);
> >  }
> >
> >  void destroy_ring_for_helper(
> > @@ -419,16 +426,13 @@ void destroy_ring_for_helper(
> >      }
> >  }
> >
> > -static void hvm_destroy_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp)
> > +static void hvm_destroy_ioreq_page(struct hvm_ioreq_server *s, int buf)
> >  {
> > -    spin_lock(&iorp->lock);
> > +    struct hvm_ioreq_page *iorp;
> >
> > -    ASSERT(d->is_dying);
> > +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
> >
> >      destroy_ring_for_helper(&iorp->va, iorp->page);
> > -
> > -    spin_unlock(&iorp->lock);
> >  }
> >
> >  int prepare_ring_for_helper(
> > @@ -476,8 +480,10 @@ int prepare_ring_for_helper(
> >  }
> >
> >  static int hvm_set_ioreq_page(
> > -    struct domain *d, struct hvm_ioreq_page *iorp, unsigned long gmfn)
> > +    struct hvm_ioreq_server *s, int buf, unsigned long gmfn)
> >  {
> > +    struct domain *d = s->domain;
> > +    struct hvm_ioreq_page *iorp;
> >      struct page_info *page;
> >      void *va;
> >      int rc;
> > @@ -485,22 +491,17 @@ static int hvm_set_ioreq_page(
> >      if ( (rc = prepare_ring_for_helper(d, gmfn, &page, &va)) )
> >          return rc;
> >
> > -    spin_lock(&iorp->lock);
> > +    iorp = ( buf ) ? &s->buf_ioreq : &s->ioreq;
> >
> >      if ( (iorp->va != NULL) || d->is_dying )
> >      {
> > -        destroy_ring_for_helper(&iorp->va, iorp->page);
> > -        spin_unlock(&iorp->lock);
> > +        destroy_ring_for_helper(&va, page);
> >          return -EINVAL;
> >      }
> >
> >      iorp->va = va;
> >      iorp->page = page;
> >
> > -    spin_unlock(&iorp->lock);
> > -
> > -    domain_unpause(d);
> > -
> >      return 0;
> >  }
> >
> > @@ -544,38 +545,6 @@ static int handle_pvh_io(
> >      return X86EMUL_OKAY;
> >  }
> >
> > -static int hvm_init_ioreq_server(struct domain *d)
> > -{
> > -    struct hvm_ioreq_server *s;
> > -    int i;
> > -
> > -    s = xzalloc(struct hvm_ioreq_server);
> > -    if ( !s )
> > -        return -ENOMEM;
> > -
> > -    s->domain = d;
> > -
> > -    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> > -        s->ioreq_evtchn[i] = -1;
> > -    s->buf_ioreq_evtchn = -1;
> > -
> > -    hvm_init_ioreq_page(d, &s->ioreq);
> > -    hvm_init_ioreq_page(d, &s->buf_ioreq);
> > -
> > -    d->arch.hvm_domain.ioreq_server = s;
> > -    return 0;
> > -}
> > -
> > -static void hvm_deinit_ioreq_server(struct domain *d)
> > -{
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > -
> > -    hvm_destroy_ioreq_page(d, &s->ioreq);
> > -    hvm_destroy_ioreq_page(d, &s->buf_ioreq);
> > -
> > -    xfree(s);
> > -}
> > -
> >  static void hvm_update_ioreq_server_evtchn(struct hvm_ioreq_server
> *s)
> >  {
> >      struct domain *d = s->domain;
> > @@ -637,6 +606,152 @@ static void
> hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
> >      }
> >  }
> >
> > +static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int i;
> > +    unsigned long pfn;
> > +    struct vcpu *v;
> > +    int rc;
> 
> i and rc can be declared together.
> 
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    rc = -EEXIST;
> > +    if ( d->arch.hvm_domain.ioreq_server != NULL )
> > +        goto fail_exist;
> > +
> > +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> > +
> > +    rc = -ENOMEM;
> > +    s = xzalloc(struct hvm_ioreq_server);
> > +    if ( !s )
> > +        goto fail_alloc;
> > +
> > +    s->domain = d;
> > +    s->domid = domid;
> > +
> > +    for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> > +        s->ioreq_evtchn[i] = -1;
> > +    s->buf_ioreq_evtchn = -1;
> > +
> > +    /* Initialize shared pages */
> > +    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> > +
> > +    hvm_init_ioreq_page(s, 0);
> > +    if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
> > +        goto fail_set_ioreq;
> > +
> > +    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> > +
> > +    hvm_init_ioreq_page(s, 1);
> > +    if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
> > +        goto fail_set_buf_ioreq;
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
> > +            goto fail_add_vcpu;
> > +    }
> > +
> > +    d->arch.hvm_domain.ioreq_server = s;
> > +
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return 0;
> > +
> > +fail_add_vcpu:
> > +    for_each_vcpu ( d, v )
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> > +    hvm_destroy_ioreq_page(s, 1);
> > +fail_set_buf_ioreq:
> > +    hvm_destroy_ioreq_page(s, 0);
> > +fail_set_ioreq:
> > +    xfree(s);
> > +fail_alloc:
> > +fail_exist:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    return rc;
> > +}
> > +
> > +static void hvm_destroy_ioreq_server(struct domain *d)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    struct vcpu *v;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> > +
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    if ( !s )
> > +        goto done;
> > +
> > +    domain_pause(d);
> > +
> > +    d->arch.hvm_domain.ioreq_server = NULL;
> > +
> > +    for_each_vcpu ( d, v )
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> > +
> > +    hvm_destroy_ioreq_page(s, 1);
> > +    hvm_destroy_ioreq_page(s, 0);
> > +
> > +    xfree(s);
> > +
> > +    domain_unpause(d);
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +}
> > +
> > +static int hvm_get_ioreq_server_buf_port(struct domain *d,
> evtchn_port_t *port)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    rc = -ENOENT;
> > +    if ( !s )
> > +        goto done;
> > +
> > +    *port = s->buf_ioreq_evtchn;
> > +    rc = 0;
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> > +static int hvm_get_ioreq_server_pfn(struct domain *d, int buf,
> xen_pfn_t *pfn)
> > +{
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    rc = -ENOENT;
> > +    if ( !s )
> > +        goto done;
> > +
> > +    if ( buf )
> > +        *pfn = d-
> >arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> > +    else
> > +        *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> 
> This can be reduced and use "params[buf ? HVM_PARAM_BUFIOREQ_PFN :
> HVM_PARAM_IOREQ_PFN]", although that is perhaps not as clear.
> 

Indeed. Yuck.

> > +
> > +    rc = 0;
> > +
> > +done:
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    return rc;
> > +}
> > +
> >  static int hvm_replace_event_channel(struct vcpu *v, domid_t
> remote_domid,
> >                                       int *p_port)
> >  {
> > @@ -652,13 +767,24 @@ static int hvm_replace_event_channel(struct
> vcpu *v, domid_t remote_domid,
> >      return 0;
> >  }
> >
> > -static int hvm_set_ioreq_server_domid(struct hvm_ioreq_server *s,
> domid_t domid)
> > +static int hvm_set_ioreq_server_domid(struct domain *d, domid_t
> domid)
> >  {
> > -    struct domain *d = s->domain;
> > +    struct hvm_ioreq_server *s;
> >      struct vcpu *v;
> >      int rc = 0;
> >
> >      domain_pause(d);
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +
> > +    rc = -ENOENT;
> > +    if ( !s )
> > +        goto done;
> > +
> > +    rc = 0;
> > +    if ( s->domid == domid )
> > +        goto done;
> >
> >      if ( d->vcpu[0] )
> >      {
> > @@ -680,31 +806,11 @@ static int hvm_set_ioreq_server_domid(struct
> hvm_ioreq_server *s, domid_t domid)
> >
> >  done:
> >      domain_unpause(d);
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> 
> Mismatched order of pause/unpause and lock/unlock pairs.  The unlock
> should ideally be before the unpause.
> 

Ok.

> >
> >      return rc;
> >  }
> >
> > -static int hvm_set_ioreq_server_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > -{
> > -    struct domain *d = s->domain;
> > -    int rc;
> > -
> > -    rc = hvm_set_ioreq_page(d, &s->ioreq, pfn);
> > -    if ( rc < 0 )
> > -        return rc;
> > -
> > -    hvm_update_ioreq_server_evtchn(s);
> > -
> > -    return 0;
> > -}
> > -
> > -static int hvm_set_ioreq_server_buf_pfn(struct hvm_ioreq_server *s,
> unsigned long pfn)
> > -{
> > -    struct domain *d = s->domain;
> > -
> > -    return  hvm_set_ioreq_page(d, &s->buf_ioreq, pfn);
> > -}
> > -
> >  int hvm_domain_initialise(struct domain *d)
> >  {
> >      int rc;
> > @@ -732,6 +838,7 @@ int hvm_domain_initialise(struct domain *d)
> >
> >      }
> >
> > +    spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
> >      spin_lock_init(&d->arch.hvm_domain.irq_lock);
> >      spin_lock_init(&d->arch.hvm_domain.uc_lock);
> >
> > @@ -772,20 +879,14 @@ int hvm_domain_initialise(struct domain *d)
> >
> >      rtc_init(d);
> >
> > -    rc = hvm_init_ioreq_server(d);
> > -    if ( rc != 0 )
> > -        goto fail2;
> > -
> >      register_portio_handler(d, 0xe9, 1, hvm_print_line);
> >
> >      rc = hvm_funcs.domain_initialise(d);
> >      if ( rc != 0 )
> > -        goto fail3;
> > +        goto fail2;
> >
> >      return 0;
> >
> > - fail3:
> > -    hvm_deinit_ioreq_server(d);
> >   fail2:
> >      rtc_deinit(d);
> >      stdvga_deinit(d);
> > @@ -809,7 +910,7 @@ void hvm_domain_relinquish_resources(struct
> domain *d)
> >      if ( hvm_funcs.nhvm_domain_relinquish_resources )
> >          hvm_funcs.nhvm_domain_relinquish_resources(d);
> >
> > -    hvm_deinit_ioreq_server(d);
> > +    hvm_destroy_ioreq_server(d);
> >
> >      msixtbl_pt_cleanup(d);
> >
> > @@ -1364,11 +1465,16 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown:
> nestedhvm_vcpu_destroy */
> >          goto fail5;
> >
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> >      s = d->arch.hvm_domain.ioreq_server;
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> > -    rc = hvm_ioreq_server_add_vcpu(s, v);
> > -    if ( rc < 0 )
> > -        goto fail6;
> > +    if ( s )
> > +    {
> > +        rc = hvm_ioreq_server_add_vcpu(s, v);
> > +        if ( rc < 0 )
> > +            goto fail6;
> > +    }
> >
> >      if ( v->vcpu_id == 0 )
> >      {
> > @@ -1404,9 +1510,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
> >  void hvm_vcpu_destroy(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s = d->arch.hvm_domain.ioreq_server;
> > +    struct hvm_ioreq_server *s;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> >
> > -    hvm_ioreq_server_remove_vcpu(s, v);
> > +    if ( s )
> > +        hvm_ioreq_server_remove_vcpu(s, v);
> >
> >      nestedhvm_vcpu_destroy(v);
> >
> > @@ -1459,7 +1570,10 @@ int hvm_buffered_io_send(ioreq_t *p)
> >      /* Ensure buffered_iopage fits in a page */
> >      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
> >
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> >      s = d->arch.hvm_domain.ioreq_server;
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> > +
> >      if ( !s )
> >          return 0;
> >
> > @@ -1532,20 +1646,12 @@ int hvm_buffered_io_send(ioreq_t *p)
> >      return 1;
> >  }
> >
> > -bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *proto_p)
> > +static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server
> *s,
> > +                                            struct vcpu *v,
> > +                                            ioreq_t *proto_p)
> >  {
> >      struct domain *d = v->domain;
> > -    struct hvm_ioreq_server *s;
> > -    ioreq_t *p;
> > -
> > -    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> > -        return 0; /* implicitly bins the i/o operation */
> > -
> > -    s = d->arch.hvm_domain.ioreq_server;
> > -    if ( !s )
> > -        return 0;
> > -
> > -    p = get_ioreq(s, v->vcpu_id);
> > +    ioreq_t *p = get_ioreq(s, v->vcpu_id);
> >
> >      if ( unlikely(p->state != STATE_IOREQ_NONE) )
> >      {
> > @@ -1578,6 +1684,26 @@ bool_t hvm_send_assist_req(struct vcpu *v,
> ioreq_t *proto_p)
> >      return 1;
> >  }
> >
> > +bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct hvm_ioreq_server *s;
> > +
> > +    ASSERT(v->arch.hvm_vcpu.ioreq_server == NULL);
> > +
> > +    if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
> > +        return 0;
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> > +    s = d->arch.hvm_domain.ioreq_server;
> > +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> 
> What is the purpose of talking the server lock just to read the
> ioreq_server pointer?
> 

The lock supposed to be there to eventually wrap a list walk, but as that's done in a separate function the lock is probably not particularly illustrative here - I'll ditch it.

> > +
> > +    if ( !s )
> > +        return 0;
> > +
> > +    return hvm_send_assist_req_to_server(s, v, p);
> > +}
> > +
> >  void hvm_hlt(unsigned long rflags)
> >  {
> >      struct vcpu *curr = current;
> > @@ -4172,7 +4298,6 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >      case HVMOP_get_param:
> >      {
> >          struct xen_hvm_param a;
> > -        struct hvm_ioreq_server *s;
> >          struct domain *d;
> >          struct vcpu *v;
> >
> > @@ -4198,20 +4323,12 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >          if ( rc )
> >              goto param_fail;
> >
> > -        s = d->arch.hvm_domain.ioreq_server;
> > -
> >          if ( op == HVMOP_set_param )
> >          {
> >              rc = 0;
> >
> >              switch ( a.index )
> >              {
> > -            case HVM_PARAM_IOREQ_PFN:
> > -                rc = hvm_set_ioreq_server_pfn(s, a.value);
> > -                break;
> > -            case HVM_PARAM_BUFIOREQ_PFN:
> > -                rc = hvm_set_ioreq_server_buf_pfn(s, a.value);
> > -                break;
> >              case HVM_PARAM_CALLBACK_IRQ:
> >                  hvm_set_callback_via(d, a.value);
> >                  hvm_latch_shinfo_size(d);
> > @@ -4265,7 +4382,9 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  if ( a.value == DOMID_SELF )
> >                      a.value = curr_d->domain_id;
> >
> > -                rc = hvm_set_ioreq_server_domid(s, a.value);
> > +                rc = hvm_create_ioreq_server(d, a.value);
> > +                if ( rc == -EEXIST )
> > +                    rc = hvm_set_ioreq_server_domid(d, a.value);
> >                  break;
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  /* Not reflexive, as we must domain_pause(). */
> > @@ -4360,8 +4479,46 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >          {
> >              switch ( a.index )
> >              {
> > +            case HVM_PARAM_IOREQ_PFN:
> > +            case HVM_PARAM_BUFIOREQ_PFN:
> >              case HVM_PARAM_BUFIOREQ_EVTCHN:
> > -                a.value = s->buf_ioreq_evtchn;
> > +                /* May need to create server */
> > +                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> > +                if ( rc != 0 && rc != -EEXIST )
> > +                    goto param_fail;
> > +
> > +                switch ( a.index )
> > +                {
> > +                case HVM_PARAM_IOREQ_PFN: {
> > +                    xen_pfn_t pfn;
> > +
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> > +                        goto param_fail;
> > +
> > +                    a.value = pfn;
> > +                    break;
> > +                }
> > +                case HVM_PARAM_BUFIOREQ_PFN: {
> > +                    xen_pfn_t pfn;
> > +
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> > +                        goto param_fail;
> > +
> > +                    a.value = pfn;
> > +                    break;
> > +                }
> > +                case HVM_PARAM_BUFIOREQ_EVTCHN: {
> > +                    evtchn_port_t port;
> > +
> > +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> > +                        goto param_fail;
> > +
> > +                    a.value = port;
> > +                    break;
> > +                }
> > +                default:
> > +                    BUG();
> > +                }
> >                  break;
> >              case HVM_PARAM_ACPI_S_STATE:
> >                  a.value = d->arch.hvm_domain.is_s3_suspended ? 3 : 0;
> > diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> > index 4c039f8..e750ef0 100644
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -52,6 +52,8 @@ struct hvm_ioreq_server {
> >
> >  struct hvm_domain {
> >      struct hvm_ioreq_server *ioreq_server;
> > +    spinlock_t              ioreq_server_lock;
> > +
> >      struct pl_time         pl_time;
> >
> >      struct hvm_io_handler *io_handler;
> > @@ -106,4 +108,3 @@ struct hvm_domain {
> >  #define hap_enabled(d)  ((d)->arch.hvm_domain.hap_enabled)
> >
> >  #endif /* __ASM_X86_HVM_DOMAIN_H__ */
> > -
> 
> Spurious whitespace change
> 

Ok.

  Paul

> ~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 4/5] ioreq-server: add support for multiple servers
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
                   ` (2 preceding siblings ...)
  2014-01-30 14:19 ` [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server Paul Durrant
@ 2014-01-30 14:19 ` Paul Durrant
  2014-01-30 15:46   ` Andrew Cooper
  2014-01-30 14:19 ` [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

The legacy 'catch-all' server is always created with id 0. Secondary
servers will have an id ranging from 1 to a limit set by the toolstack
via the 'max_emulators' build info field. This defaults to 1 so ordinarily
no extra special pages are reserved for secondary emulators. It may be
increased using the secondary_device_emulators parameter in xl.cfg(5).

Because of the re-arrangement of the special pages in a previous patch we
only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
the layout of the shared pages for multiple emulators. Guests migrated in
from hosts without this patch will be lacking the save record which stores
the new parameter and so the guest is assumed to only have had a single
emulator.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 docs/man/xl.cfg.pod.5            |    7 +
 tools/libxc/xc_domain.c          |  175 ++++++++
 tools/libxc/xc_domain_restore.c  |   20 +
 tools/libxc/xc_domain_save.c     |   12 +
 tools/libxc/xc_hvm_build_x86.c   |   25 +-
 tools/libxc/xenctrl.h            |   41 ++
 tools/libxc/xenguest.h           |    2 +
 tools/libxc/xg_save_restore.h    |    1 +
 tools/libxl/libxl.h              |    8 +
 tools/libxl/libxl_create.c       |    3 +
 tools/libxl/libxl_dom.c          |    1 +
 tools/libxl/libxl_types.idl      |    1 +
 tools/libxl/xl_cmdimpl.c         |    3 +
 xen/arch/x86/hvm/hvm.c           |  916 +++++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/io.c            |    2 +-
 xen/include/asm-x86/hvm/domain.h |   21 +-
 xen/include/asm-x86/hvm/hvm.h    |    1 +
 xen/include/asm-x86/hvm/vcpu.h   |    2 +-
 xen/include/public/hvm/hvm_op.h  |   70 +++
 xen/include/public/hvm/ioreq.h   |    1 +
 xen/include/public/hvm/params.h  |    4 +-
 21 files changed, 1230 insertions(+), 86 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 9941395..9aa9958 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1277,6 +1277,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
 This parameter only takes effect when device_model_version=qemu-xen.
 See F<docs/misc/pci-device-reservations.txt> for more information.
 
+=item B<secondary_device_emulators=NUMBER>
+
+If a number of secondary device emulators (i.e. in addition to
+qemu-xen or qemu-xen-traditional) are to be invoked to support the
+guest then this parameter can be set with the count of how many are
+to be used. The default value is zero.
+
 =back
 
 =head2 Device-Model Options
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index c2fdd74..c64d15a 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1246,6 +1246,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
     return rc;
 }
 
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+                               domid_t domid,
+                               ioservid_t *id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_create_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    rc = do_xen_hypercall(xch, &hypercall);
+    *id = arg->id;
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+                                 domid_t domid,
+                                 ioservid_t id,
+                                 xen_pfn_t *pfn,
+                                 xen_pfn_t *buf_pfn,
+                                 evtchn_port_t *buf_port)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( rc != 0 )
+        goto done;
+
+    if ( pfn )
+        *pfn = arg->pfn;
+
+    if ( buf_pfn )
+        *buf_pfn = arg->buf_pfn;
+
+    if ( buf_port )
+        *buf_port = arg->buf_port;
+
+done:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    arg->end = end;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t start)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->start = start;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                      ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
+                                          ioservid_t id, uint16_t bdf)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    arg->bdf = bdf;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+                                domid_t domid,
+                                ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index ca2fb51..305e4b8 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -746,6 +746,7 @@ typedef struct {
     uint64_t acpi_ioport_location;
     uint64_t viridian;
     uint64_t vm_generationid_addr;
+    uint64_t nr_ioreq_servers;
 
     struct toolstack_data_t tdata;
 } pagebuf_t;
@@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
         DPRINTF("read generation id buffer address");
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
+        /* Skip padding 4 bytes then read the acpi ioport location. */
+        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
+             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
+        {
+            PERROR("error reading the number of IOREQ servers");
+            return -1;
+        }
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
     if (pagebuf.viridian != 0)
         xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
 
+    if ( hvm ) {
+        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
+
+        if ( nr_ioreq_servers == 0 )
+            nr_ioreq_servers = 1;
+
+        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
+    }
+
     if (pagebuf.acpi_ioport_location == 1) {
         DBGPRINTF("Use new firmware ioport from the checkpoint\n");
         xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index 42c4752..3293e29 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
             PERROR("Error when writing the viridian flag");
             goto out;
         }
+
+        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
+        chunk.data = 0;
+        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                         (unsigned long *)&chunk.data);
+
+        if ( (chunk.data != 0) &&
+             wrexact(io_fd, &chunk, sizeof(chunk)) )
+        {
+            PERROR("Error when writing the number of IOREQ servers");
+            goto out;
+        }
     }
 
     if ( callbacks != NULL && callbacks->toolstack_save != NULL )
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index f24f2a1..bbe5def 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -45,7 +45,7 @@
 #define SPECIALPAGE_IDENT_PT 4
 #define SPECIALPAGE_CONSOLE  5
 #define SPECIALPAGE_IOREQ    6
-#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
+#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
 #define special_pfn(x) (0xff000u - (x))
 
 static int modules_init(struct xc_hvm_build_args *args,
@@ -83,7 +83,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           int max_emulators)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -111,7 +112,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
+    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES(max_emulators);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -254,6 +255,10 @@ static int setup_guest(xc_interface *xch,
         stat_1gb_pages = 0;
     int pod_mode = 0;
     int claim_enabled = args->claim_enabled;
+    int max_emulators = args->max_emulators;
+
+    if ( max_emulators < 1 )
+        goto error_out;
 
     if ( nr_pages > target_pages )
         pod_mode = XENMEMF_populate_on_demand;
@@ -458,7 +463,8 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
+                   max_emulators);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
@@ -470,17 +476,18 @@ static int setup_guest(xc_interface *xch,
             "  STORE:     %"PRI_xen_pfn"\n"
             "  IDENT_PT:  %"PRI_xen_pfn"\n"
             "  CONSOLE:   %"PRI_xen_pfn"\n"
-            "  IOREQ:     %"PRI_xen_pfn"\n",
-            NR_SPECIAL_PAGES,
+            "  IOREQ(%02d): %"PRI_xen_pfn"\n",
+            NR_SPECIAL_PAGES(max_emulators),
             (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
             (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
             (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
             (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
             (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
             (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
+            max_emulators * 2,
             (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
 
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
     {
         xen_pfn_t pfn = special_pfn(i);
         rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
@@ -506,7 +513,9 @@ static int setup_guest(xc_interface *xch,
     xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
                      special_pfn(SPECIALPAGE_IOREQ));
     xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ) - 1);
+                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
+    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
+                     max_emulators);
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 13f816b..142aaea 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/*
+ * IOREQ server API
+ */
+int xc_hvm_create_ioreq_server(xc_interface *xch,
+			       domid_t domid,
+			       ioservid_t *id);
+
+int xc_hvm_get_ioreq_server_info(xc_interface *xch,
+				 domid_t domid,
+				 ioservid_t id,
+				 xen_pfn_t *pfn,
+				 xen_pfn_t *buf_pfn,
+				 evtchn_port_t *buf_port);
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
+					domid_t domid,
+                                        ioservid_t id,
+					int is_mmio,
+                                        uint64_t start,
+					uint64_t end);
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
+					    domid_t domid,
+                                            ioservid_t id,
+					    int is_mmio,
+                                            uint64_t start);
+
+int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
+				      domid_t domid,
+                                      ioservid_t id,
+				      uint16_t bdf);
+
+int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
+					  domid_t domid,
+					  ioservid_t id,
+					  uint16_t bdf);
+
+int xc_hvm_destroy_ioreq_server(xc_interface *xch,
+				domid_t domid,
+				ioservid_t id);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index a0e30e1..8930ac0 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -234,6 +234,8 @@ struct xc_hvm_build_args {
     struct xc_hvm_firmware_module smbios_module;
     /* Whether to use claim hypercall (1 - enable, 0 - disable). */
     int claim_enabled;
+    /* Maximum number of emulators for VM */
+    int max_emulators;
 };
 
 /**
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index f859621..5170b7f 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -259,6 +259,7 @@
 #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
 #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
 #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
+#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 12d6c31..b679957 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -95,6 +95,14 @@
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
 /*
+ * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
+ * max_emulators field is present in the hvm sections of
+ * libxl_domain_build_info. This field can be used to reserve
+ * extra special pages for secondary device emulators.
+ */
+#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a604cd8..cce93d9 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
         libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
 
+        if (b_info->u.hvm.max_emulators < 1)
+            b_info->u.hvm.max_emulators = 1;
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
         libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 55f74b2..9de06f9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
     args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
     args.claim_enabled = libxl_defbool_val(info->claim_mode);
+    args.max_emulators = info->u.hvm.max_emulators;
     if (libxl__domain_firmware(gc, info, &args)) {
         LOG(ERROR, "initializing domain firmware failed");
         goto out;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 649ce50..b707159 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -372,6 +372,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("xen_platform_pci", libxl_defbool),
                                        ("usbdevice_list",   libxl_string_list),
                                        ("vendor_device",    libxl_vendor_device),
+                                       ("max_emulators",    integer),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index aff6f90..c65f4f4 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1750,6 +1750,9 @@ skip_vfb:
 
             b_info->u.hvm.vendor_device = d;
         }
+ 
+        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
+            b_info->u.hvm.max_emulators = l + 1;
     }
 
     xlu_cfg_destroy(config);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d9874fb..5f9e728 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -379,21 +379,23 @@ static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
 void hvm_do_resume(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
+    struct list_head *entry, *next;
 
     check_wakeup_from_wait();
 
     if ( is_hvm_vcpu(v) )
         pt_restore_timer(v);
 
-    s = v->arch.hvm_vcpu.ioreq_server;
-    v->arch.hvm_vcpu.ioreq_server = NULL;
-
-    if ( s )
+    list_for_each_safe ( entry, next, &v->arch.hvm_vcpu.ioreq_server_list )
     {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                vcpu_list_entry[v->vcpu_id]);
         ioreq_t *p = get_ioreq(s, v->vcpu_id);
 
         hvm_wait_on_io(d, p);
+
+        list_del_init(entry);
     }
 
     /* Inject pending hw/sw trap */
@@ -531,6 +533,83 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int hvm_access_cf8(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *curr = current;
+    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
+    int rc;
+
+    BUG_ON(port < 0xcf8);
+    port -= 0xcf8;
+
+    spin_lock(&hd->pci_lock);
+
+    if ( dir == IOREQ_WRITE )
+    {
+        switch ( bytes )
+        {
+        case 4:
+            hd->pci_cf8 = *val;
+            break;
+
+        case 2:
+        {
+            uint32_t mask = 0xffff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+            
+        case 1:
+        {
+            uint32_t mask = 0xff << (port * 8);
+            uint32_t subval = *val << (port * 8);
+
+            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
+                          (subval & mask);
+            break;
+        }
+
+        default:
+            break;
+        }
+
+        /* We always need to fall through to the catch all emulator */
+        rc = X86EMUL_UNHANDLEABLE;
+    }
+    else
+    {
+        switch ( bytes )
+        {
+        case 4:
+            *val = hd->pci_cf8;
+            rc = X86EMUL_OKAY;
+            break;
+
+        case 2:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
+            rc = X86EMUL_OKAY;
+            break;
+            
+        case 1:
+            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
+            rc = X86EMUL_OKAY;
+            break;
+
+        default:
+            rc = X86EMUL_UNHANDLEABLE;
+            break;
+        }
+    }
+
+    spin_unlock(&hd->pci_lock);
+
+    return rc;
+}
+
 static int handle_pvh_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val)
 {
@@ -590,6 +669,8 @@ done:
 
 static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
 {
+    list_del_init(&s->vcpu_list_entry[v->vcpu_id]);
+
     if ( v->vcpu_id == 0 )
     {
         if ( s->buf_ioreq_evtchn >= 0 )
@@ -606,7 +687,7 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
     }
 }
 
-static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
+static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
 {
     struct hvm_ioreq_server *s;
     int i;
@@ -614,34 +695,47 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
     struct vcpu *v;
     int rc;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
     rc = -EEXIST;
-    if ( d->arch.hvm_domain.ioreq_server != NULL )
-        goto fail_exist;
+    list_for_each_entry ( s, 
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto fail_exist;
+    }
 
-    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     rc = -ENOMEM;
     s = xzalloc(struct hvm_ioreq_server);
     if ( !s )
         goto fail_alloc;
 
+    s->id = id;
     s->domain = d;
     s->domid = domid;
+    INIT_LIST_HEAD(&s->domain_list_entry);
 
     for ( i = 0; i < MAX_HVM_VCPUS; i++ )
+    {
         s->ioreq_evtchn[i] = -1;
+        INIT_LIST_HEAD(&s->vcpu_list_entry[i]);
+    }
     s->buf_ioreq_evtchn = -1;
 
     /* Initialize shared pages */
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 0);
     if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
         goto fail_set_ioreq;
 
-    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
+    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
 
     hvm_init_ioreq_page(s, 1);
     if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
@@ -653,7 +747,8 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
             goto fail_add_vcpu;
     }
 
-    d->arch.hvm_domain.ioreq_server = s;
+    list_add(&s->domain_list_entry,
+             &d->arch.hvm_domain.ioreq_server_list);
 
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
@@ -673,22 +768,30 @@ fail_exist:
     return rc;
 }
 
-static void hvm_destroy_ioreq_server(struct domain *d)
+static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
-    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_server *s, *next;
     struct vcpu *v;
 
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
+    list_for_each_entry_safe ( s,
+                               next,
+                               &d->arch.hvm_domain.ioreq_server_list,
+                               domain_list_entry)
+    {
+        if ( s->id == id )
+            goto found;
+    }
 
-    s = d->arch.hvm_domain.ioreq_server;
-    if ( !s )
-        goto done;
+    goto done;
+
+found:
+    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
 
     domain_pause(d);
 
-    d->arch.hvm_domain.ioreq_server = NULL;
+    list_del_init(&s->domain_list_entry);
 
     for_each_vcpu ( d, v )
         hvm_ioreq_server_remove_vcpu(s, v);
@@ -704,21 +807,186 @@ done:
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 }
 
-static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
+static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t id, evtchn_port_t *port)
+{
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        if ( s->id == id )
+        {
+            *port = s->buf_ioreq_evtchn;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, int buf, xen_pfn_t *pfn)
+{
+    struct list_head *entry;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    rc = -ENOENT;
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        if ( s->id == id )
+        {
+            if ( buf )
+                *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
+            else
+                *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
+
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                            int is_mmio, uint64_t start, uint64_t end)
 {
     struct hvm_ioreq_server *s;
+    struct hvm_io_range *x;
     int rc;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_io_range);
+    if ( x == NULL )
+        return -ENOMEM;
+
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+found:
+    x->start = start;
+    x->end = end;
+
+    if ( is_mmio )
+    {
+        x->next = s->mmio_range_list;
+        s->mmio_range_list = x;
+    }
+    else
+    {
+        x->next = s->portio_range_list;
+        s->portio_range_list = x;
+    }
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
+             d->domain_id,
+             s->id,
+             ( is_mmio ) ? "MMIO" : "PORTIO",
+             x->start,
+             x->end);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+fail:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    xfree(x);
+
+    return rc;
+}
+
+static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                                int is_mmio, uint64_t start)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_io_range *x, **xp;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
     rc = -ENOENT;
-    if ( !s )
-        goto done;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
 
-    *port = s->buf_ioreq_evtchn;
-    rc = 0;
+    goto done;
+
+found:
+    if ( is_mmio )
+    {
+        x = s->mmio_range_list;
+        xp = &s->mmio_range_list;
+    }
+    else
+    {
+        x = s->portio_range_list;
+        xp = &s->portio_range_list;
+    }
+
+    while ( (x != NULL) && (start != x->start) )
+    {
+        xp = &x->next;
+        x = x->next;
+    }
+
+    if ( (x != NULL) )
+    {
+        gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
+                 d->domain_id,
+                 s->id,
+                 ( is_mmio ) ? "MMIO" : "PORTIO",
+                 x->start,
+                 x->end);
+
+        *xp = x->next;
+        xfree(x);
+        rc = 0;
+    }
 
 done:
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
@@ -726,25 +994,98 @@ done:
     return rc;
 }
 
-static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
+static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
+                                          uint16_t bdf)
 {
     struct hvm_ioreq_server *s;
+    struct hvm_pcidev *x;
     int rc;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    x = xmalloc(struct hvm_pcidev);
+    if ( x == NULL )
+        return -ENOMEM;
+
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
+
+    goto fail;
+
+found:
+    x->bdf = bdf;
+
+    x->next = s->pcidev_list;
+    s->pcidev_list = x;
+
+    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
+             d->domain_id,
+             s->id,
+             x->bdf);
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+fail:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    xfree(x);
+
+    return rc;
+}
+
+static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
+                                              uint16_t bdf)
+{
+    struct hvm_ioreq_server *s;
+    struct hvm_pcidev *x, **xp;
+    int rc;
+
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
     rc = -ENOENT;
-    if ( !s )
-        goto done;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
 
-    if ( buf )
-        *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
-    else
-        *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+    goto done;
 
-    rc = 0;
+found:
+    x = s->pcidev_list;
+    xp = &s->pcidev_list;
+
+    while ( (x != NULL) && (bdf != x->bdf) )
+    {
+        xp = &x->next;
+        x = x->next;
+    }
+    if ( (x != NULL) )
+    {
+        gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
+                 d->domain_id,
+                 s->id,
+                 x->bdf);
+
+         *xp = x->next;
+        xfree(x);
+        rc = 0;
+    }
 
 done:
     spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
@@ -752,6 +1093,73 @@ done:
     return rc;
 }
 
+static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+    int rc;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
+            goto fail;
+    }
+        
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return 0;
+
+fail:
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
+static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct list_head *entry;
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
+static void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    ioservid_t id;
+
+    for ( id = 0;
+          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+        hvm_destroy_ioreq_server(d, id);
+}
+
 static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
                                      int *p_port)
 {
@@ -767,21 +1175,30 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
-static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
+static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
 {
     struct hvm_ioreq_server *s;
     struct vcpu *v;
     int rc = 0;
 
+    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
+        return -EINVAL;
+
     domain_pause(d);
     spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    s = d->arch.hvm_domain.ioreq_server;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == id )
+            goto found;
+    }
 
     rc = -ENOENT;
-    if ( !s )
-        goto done;
+    goto done;
 
+found:
     rc = 0;
     if ( s->domid == domid )
         goto done;
@@ -838,7 +1255,9 @@ int hvm_domain_initialise(struct domain *d)
 
     }
 
+    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
     spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
+    spin_lock_init(&d->arch.hvm_domain.pci_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
 
@@ -880,6 +1299,7 @@ int hvm_domain_initialise(struct domain *d)
     rtc_init(d);
 
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
@@ -910,7 +1330,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
-    hvm_destroy_ioreq_server(d);
+    hvm_destroy_all_ioreq_servers(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1422,13 +1842,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
 
     hvm_asid_flush_vcpu(v);
 
     spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
     INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vcpu.ioreq_server_list);
+
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
         goto fail1;
@@ -1465,16 +1886,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
          && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
         goto fail5;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-    s = d->arch.hvm_domain.ioreq_server;
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
-    if ( s )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc < 0 )
-            goto fail6;
-    }
+    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
+    if ( rc < 0 )
+        goto fail6;
 
     if ( v->vcpu_id == 0 )
     {
@@ -1510,14 +1924,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
 void hvm_vcpu_destroy(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-    s = d->arch.hvm_domain.ioreq_server;
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
 
-    if ( s )
-        hvm_ioreq_server_remove_vcpu(s, v);
+    hvm_all_ioreq_servers_remove_vcpu(d, v);
 
     nestedhvm_vcpu_destroy(v);
 
@@ -1556,6 +1964,101 @@ void hvm_vcpu_down(struct vcpu *v)
     }
 }
 
+static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
+{
+#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
+
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc )
+    { 
+        /* PCI config data cycle */
+        type = IOREQ_TYPE_PCI_CONFIG;
+
+        spin_lock(&d->arch.hvm_domain.pci_lock);
+        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
+        spin_unlock(&d->arch.hvm_domain.pci_lock);
+    }
+    else
+    {
+        type = p->type;
+        addr = p->addr;
+    }
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    switch ( type )
+    {
+    case IOREQ_TYPE_COPY:
+    case IOREQ_TYPE_PIO:
+    case IOREQ_TYPE_PCI_CONFIG:
+        break;
+    default:
+        goto done;
+    }
+
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        switch ( type )
+        {
+            case IOREQ_TYPE_COPY:
+            case IOREQ_TYPE_PIO: {
+                struct hvm_io_range *x;
+
+                x = (type == IOREQ_TYPE_COPY) ?
+                    s->mmio_range_list :
+                    s->portio_range_list;
+
+                for ( ; x; x = x->next )
+                {
+                    if ( (addr >= x->start) && (addr <= x->end) )
+                        goto found;
+                }
+                break;
+            }
+            case IOREQ_TYPE_PCI_CONFIG: {
+                struct hvm_pcidev *x;
+
+                x = s->pcidev_list;
+
+                for ( ; x; x = x->next )
+                {
+                    if ( BDF(addr) == x->bdf ) {
+                        p->type = type;
+                        p->addr = addr;
+                        goto found;
+                    }
+                }
+                break;
+            }
+        }
+    }
+
+done:
+    /* The catch-all server has id 0 */
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server_list,
+                          domain_list_entry )
+    {
+        if ( s->id == 0 )
+            goto found;
+    }
+
+    s = NULL;
+
+found:
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+    return s;
+
+#undef BDF
+}
+
 int hvm_buffered_io_send(ioreq_t *p)
 {
     struct vcpu *v = current;
@@ -1570,10 +2073,7 @@ int hvm_buffered_io_send(ioreq_t *p)
     /* Ensure buffered_iopage fits in a page */
     BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-    s = d->arch.hvm_domain.ioreq_server;
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
+    s = hvm_select_ioreq_server(v, p);
     if ( !s )
         return 0;
 
@@ -1661,7 +2161,9 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
         return 0;
     }
 
-    v->arch.hvm_vcpu.ioreq_server = s;
+    ASSERT(list_empty(&s->vcpu_list_entry[v->vcpu_id]));
+    list_add(&s->vcpu_list_entry[v->vcpu_id],
+             &v->arch.hvm_vcpu.ioreq_server_list); 
 
     p->dir = proto_p->dir;
     p->data_is_ptr = proto_p->data_is_ptr;
@@ -1686,24 +2188,42 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
 {
-    struct domain *d = v->domain;
     struct hvm_ioreq_server *s;
 
-    ASSERT(v->arch.hvm_vcpu.ioreq_server == NULL);
+    ASSERT(list_empty(&v->arch.hvm_vcpu.ioreq_server_list));
 
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0;
 
-    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
-    s = d->arch.hvm_domain.ioreq_server;
-    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
-
+    s = hvm_select_ioreq_server(v, p);
     if ( !s )
         return 0;
 
     return hvm_send_assist_req_to_server(s, v, p);
 }
 
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
+{
+    struct domain *d = v->domain;
+    struct list_head *entry;
+
+    ASSERT(list_empty(&v->arch.hvm_vcpu.ioreq_server_list));
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    list_for_each ( entry,
+                    &d->arch.hvm_domain.ioreq_server_list )
+    {
+        struct hvm_ioreq_server *s = list_entry(entry,
+                                                struct hvm_ioreq_server,
+                                                domain_list_entry);
+
+        (void) hvm_send_assist_req_to_server(s, v, p);
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+}
+
 void hvm_hlt(unsigned long rflags)
 {
     struct vcpu *curr = current;
@@ -4286,6 +4806,215 @@ static int hvmop_flush_tlb_all(void)
     return 0;
 }
 
+static int hvmop_create_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
+{
+    struct domain *curr_d = current->domain;
+    xen_hvm_create_ioreq_server_t op;
+    struct domain *d;
+    ioservid_t id;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = -ENOSPC;
+    for ( id = 1;
+          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
+          id++ )
+    {
+        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
+        if ( rc == -EEXIST )
+            continue;
+
+        break;
+    }
+
+    if ( rc == -EEXIST )
+        rc = -ENOSPC;
+
+    if ( rc < 0 )
+        goto out;
+
+    op.id = id;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_get_ioreq_server_info(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
+{
+    xen_hvm_get_ioreq_server_info_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
+        goto out;
+
+    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
+        goto out;
+
+    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_io_range_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_io_range_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
+                                          op.start, op.end);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_io_range_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_io_range_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
+                                              op.start);
+    
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_map_pcidev_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pcidev_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_unmap_pcidev_from_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
+{
+    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
+static int hvmop_destroy_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
+{
+    xen_hvm_destroy_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    hvm_destroy_ioreq_server(d, op.id);
+    rc = 0;
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -4294,6 +5023,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( op )
     {
+    case HVMOP_create_ioreq_server:
+        rc = hvmop_create_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
+        break;
+    
+    case HVMOP_get_ioreq_server_info:
+        rc = hvmop_get_ioreq_server_info(
+            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
+        break;
+    
+    case HVMOP_map_io_range_to_ioreq_server:
+        rc = hvmop_map_io_range_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_io_range_from_ioreq_server:
+        rc = hvmop_unmap_io_range_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_map_pcidev_to_ioreq_server:
+        rc = hvmop_map_pcidev_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
+        break;
+    
+    case HVMOP_unmap_pcidev_from_ioreq_server:
+        rc = hvmop_unmap_pcidev_from_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
+        break;
+    
+    case HVMOP_destroy_ioreq_server:
+        rc = hvmop_destroy_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
+        break;
+    
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
@@ -4382,9 +5146,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value == DOMID_SELF )
                     a.value = curr_d->domain_id;
 
-                rc = hvm_create_ioreq_server(d, a.value);
+                rc = hvm_create_ioreq_server(d, 0, a.value);
                 if ( rc == -EEXIST )
-                    rc = hvm_set_ioreq_server_domid(d, a.value);
+                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
                 break;
             case HVM_PARAM_ACPI_S_STATE:
                 /* Not reflexive, as we must domain_pause(). */
@@ -4449,6 +5213,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 if ( a.value > SHUTDOWN_MAX )
                     rc = -EINVAL;
                 break;
+            case HVM_PARAM_NR_IOREQ_SERVERS:
+                if ( d == current->domain )
+                    rc = -EPERM;
+                break;
             }
 
             if ( rc == 0 ) 
@@ -4483,7 +5251,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             case HVM_PARAM_BUFIOREQ_PFN:
             case HVM_PARAM_BUFIOREQ_EVTCHN:
                 /* May need to create server */
-                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
+                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
                 if ( rc != 0 && rc != -EEXIST )
                     goto param_fail;
 
@@ -4492,7 +5260,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_IOREQ_PFN: {
                     xen_pfn_t pfn;
 
-                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
                         goto param_fail;
 
                     a.value = pfn;
@@ -4501,7 +5269,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_BUFIOREQ_PFN: {
                     xen_pfn_t pfn;
 
-                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
                         goto param_fail;
 
                     a.value = pfn;
@@ -4510,7 +5278,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                 case HVM_PARAM_BUFIOREQ_EVTCHN: {
                     evtchn_port_t port;
 
-                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
+                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
                         goto param_fail;
 
                     a.value = port;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 576641c..a0d76b2 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -78,7 +78,7 @@ void send_invalidate_req(void)
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
 
-    (void)hvm_send_assist_req(v, p);
+    hvm_broadcast_assist_req(v, p);
 }
 
 int handle_mmio(void)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index e750ef0..93dcec1 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -41,19 +41,38 @@ struct hvm_ioreq_page {
     void *va;
 };
 
+struct hvm_io_range {
+    struct hvm_io_range *next;
+    uint64_t            start, end;
+};	
+
+struct hvm_pcidev {
+    struct hvm_pcidev *next;
+    uint16_t          bdf;
+};	
+
 struct hvm_ioreq_server {
+    struct list_head       domain_list_entry;
+    struct list_head       vcpu_list_entry[MAX_HVM_VCPUS];
+    ioservid_t             id;
     struct domain          *domain;
     domid_t                domid;
     struct hvm_ioreq_page  ioreq;
     int                    ioreq_evtchn[MAX_HVM_VCPUS];
     struct hvm_ioreq_page  buf_ioreq;
     int                    buf_ioreq_evtchn;
+    struct hvm_io_range    *mmio_range_list;
+    struct hvm_io_range    *portio_range_list;
+    struct hvm_pcidev      *pcidev_list;
 };
 
 struct hvm_domain {
-    struct hvm_ioreq_server *ioreq_server;
+    struct list_head        ioreq_server_list;
     spinlock_t              ioreq_server_lock;
 
+    uint32_t                pci_cf8;
+    spinlock_t              pci_lock;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 4e8fee8..1c3854f 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -225,6 +225,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
 void destroy_ring_for_helper(void **_va, struct page_info *page);
 
 bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
+void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
 
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 4c9d7ee..211ebfd 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -138,7 +138,7 @@ struct hvm_vcpu {
     spinlock_t          tm_lock;
     struct list_head    tm_list;
 
-    struct hvm_ioreq_server *ioreq_server;
+    struct list_head    ioreq_server_list;
 
     bool_t              flag_dr_dirty;
     bool_t              debug_state_latch;
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a9aab4b..6b31189 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,7 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param           0
@@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
 typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
 
+typedef uint32_t ioservid_t;
+
+DEFINE_XEN_GUEST_HANDLE(ioservid_t);
+
+#define HVMOP_create_ioreq_server 17
+struct xen_hvm_create_ioreq_server {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* OUT - server id */
+};
+typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
+
+#define HVMOP_get_ioreq_server_info 18
+struct xen_hvm_get_ioreq_server_info {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+    xen_pfn_t pfn;          /* OUT - ioreq pfn */
+    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
+    evtchn_port_t buf_port; /* OUT - buf ioreq port */
+};
+typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
+
+#define HVMOP_map_io_range_to_ioreq_server 19
+struct xen_hvm_map_io_range_to_ioreq_server {
+    domid_t domid;                  /* IN - domain to be serviced */
+    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
+    int is_mmio;                    /* IN - MMIO or port IO? */
+    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
+
+#define HVMOP_unmap_io_range_from_ioreq_server 20
+struct xen_hvm_unmap_io_range_from_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint8_t is_mmio;        /* IN - MMIO or port IO? */
+    uint64_aligned_t start; /* IN - start address of the range to remove */
+};
+typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
+
+#define HVMOP_map_pcidev_to_ioreq_server 21
+struct xen_hvm_map_pcidev_to_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
+
+#define HVMOP_unmap_pcidev_from_ioreq_server 22
+struct xen_hvm_unmap_pcidev_from_ioreq_server {
+    domid_t domid;      /* IN - domain to be serviced */
+    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
+    uint16_t bdf;       /* IN - PCI bus/dev/func */
+};
+typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
+
+#define HVMOP_destroy_ioreq_server 23
+struct xen_hvm_destroy_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - server id */
+};
+typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index f05d130..e84fa75 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,6 +34,7 @@
 
 #define IOREQ_TYPE_PIO          0 /* pio */
 #define IOREQ_TYPE_COPY         1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
 #define IOREQ_TYPE_TIMEOFFSET   7
 #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 517a184..4109b11 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -145,6 +145,8 @@
 /* SHUTDOWN_* action in case of a triple fault */
 #define HVM_PARAM_TRIPLE_FAULT_REASON 31
 
-#define HVM_NR_PARAMS          32
+#define HVM_PARAM_NR_IOREQ_SERVERS 32
+
+#define HVM_NR_PARAMS          33
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 4/5] ioreq-server: add support for multiple servers
  2014-01-30 14:19 ` [RFC PATCH 4/5] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-01-30 15:46   ` Andrew Cooper
  2014-01-30 15:56     ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-01-30 15:46 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 30/01/14 14:19, Paul Durrant wrote:
> The legacy 'catch-all' server is always created with id 0. Secondary
> servers will have an id ranging from 1 to a limit set by the toolstack
> via the 'max_emulators' build info field. This defaults to 1 so ordinarily
> no extra special pages are reserved for secondary emulators. It may be
> increased using the secondary_device_emulators parameter in xl.cfg(5).
>
> Because of the re-arrangement of the special pages in a previous patch we
> only need the addition of parameter HVM_PARAM_NR_IOREQ_SERVERS to determine
> the layout of the shared pages for multiple emulators. Guests migrated in
> from hosts without this patch will be lacking the save record which stores
> the new parameter and so the guest is assumed to only have had a single
> emulator.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  docs/man/xl.cfg.pod.5            |    7 +
>  tools/libxc/xc_domain.c          |  175 ++++++++
>  tools/libxc/xc_domain_restore.c  |   20 +
>  tools/libxc/xc_domain_save.c     |   12 +
>  tools/libxc/xc_hvm_build_x86.c   |   25 +-
>  tools/libxc/xenctrl.h            |   41 ++
>  tools/libxc/xenguest.h           |    2 +
>  tools/libxc/xg_save_restore.h    |    1 +
>  tools/libxl/libxl.h              |    8 +
>  tools/libxl/libxl_create.c       |    3 +
>  tools/libxl/libxl_dom.c          |    1 +
>  tools/libxl/libxl_types.idl      |    1 +
>  tools/libxl/xl_cmdimpl.c         |    3 +
>  xen/arch/x86/hvm/hvm.c           |  916 +++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/io.c            |    2 +-
>  xen/include/asm-x86/hvm/domain.h |   21 +-
>  xen/include/asm-x86/hvm/hvm.h    |    1 +
>  xen/include/asm-x86/hvm/vcpu.h   |    2 +-
>  xen/include/public/hvm/hvm_op.h  |   70 +++
>  xen/include/public/hvm/ioreq.h   |    1 +
>  xen/include/public/hvm/params.h  |    4 +-
>  21 files changed, 1230 insertions(+), 86 deletions(-)
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 9941395..9aa9958 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1277,6 +1277,13 @@ specified, enabling the use of XenServer PV drivers in the guest.
>  This parameter only takes effect when device_model_version=qemu-xen.
>  See F<docs/misc/pci-device-reservations.txt> for more information.
>  
> +=item B<secondary_device_emulators=NUMBER>
> +
> +If a number of secondary device emulators (i.e. in addition to
> +qemu-xen or qemu-xen-traditional) are to be invoked to support the
> +guest then this parameter can be set with the count of how many are
> +to be used. The default value is zero.
> +
>  =back
>  
>  =head2 Device-Model Options
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index c2fdd74..c64d15a 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1246,6 +1246,181 @@ int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long
>      return rc;
>  }
>  
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +                               domid_t domid,
> +                               ioservid_t *id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_create_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_create_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    *id = arg->id;
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +                                 domid_t domid,
> +                                 ioservid_t id,
> +                                 xen_pfn_t *pfn,
> +                                 xen_pfn_t *buf_pfn,
> +                                 evtchn_port_t *buf_port)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_info_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_get_ioreq_server_info;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    if ( rc != 0 )
> +        goto done;
> +
> +    if ( pfn )
> +        *pfn = arg->pfn;
> +
> +    if ( buf_pfn )
> +        *buf_pfn = arg->buf_pfn;
> +
> +    if ( buf_port )
> +        *buf_port = arg->buf_port;
> +
> +done:
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                        ioservid_t id, int is_mmio,
> +                                        uint64_t start, uint64_t end)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    arg->end = end;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                            ioservid_t id, int is_mmio,
> +                                            uint64_t start)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->is_mmio = is_mmio;
> +    arg->start = start;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
> +                                      ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pcidev_to_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_map_pcidev_to_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
> +                                          ioservid_t id, uint16_t bdf)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_pcidev_from_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_unmap_pcidev_from_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    arg->bdf = bdf;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> +                                domid_t domid,
> +                                ioservid_t id)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_destroy_ioreq_server_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_destroy_ioreq_server;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->id = id;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_domain_setdebugging(xc_interface *xch,
>                             uint32_t domid,
>                             unsigned int enable)
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index ca2fb51..305e4b8 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -746,6 +746,7 @@ typedef struct {
>      uint64_t acpi_ioport_location;
>      uint64_t viridian;
>      uint64_t vm_generationid_addr;
> +    uint64_t nr_ioreq_servers;
>  
>      struct toolstack_data_t tdata;
>  } pagebuf_t;
> @@ -996,6 +997,16 @@ static int pagebuf_get_one(xc_interface *xch, struct restore_ctx *ctx,
>          DPRINTF("read generation id buffer address");
>          return pagebuf_get_one(xch, ctx, buf, fd, dom);
>  
> +    case XC_SAVE_ID_HVM_NR_IOREQ_SERVERS:
> +        /* Skip padding 4 bytes then read the acpi ioport location. */
> +        if ( RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint32_t)) ||
> +             RDEXACT(fd, &buf->nr_ioreq_servers, sizeof(uint64_t)) )
> +        {
> +            PERROR("error reading the number of IOREQ servers");
> +            return -1;
> +        }
> +        return pagebuf_get_one(xch, ctx, buf, fd, dom);
> +
>      default:
>          if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
>              ERROR("Max batch size exceeded (%d). Giving up.", count);
> @@ -1755,6 +1766,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>      if (pagebuf.viridian != 0)
>          xc_set_hvm_param(xch, dom, HVM_PARAM_VIRIDIAN, 1);
>  
> +    if ( hvm ) {
> +        int nr_ioreq_servers = pagebuf.nr_ioreq_servers;
> +
> +        if ( nr_ioreq_servers == 0 )
> +            nr_ioreq_servers = 1;
> +
> +        xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS, nr_ioreq_servers);
> +    }
> +
>      if (pagebuf.acpi_ioport_location == 1) {
>          DBGPRINTF("Use new firmware ioport from the checkpoint\n");
>          xc_set_hvm_param(xch, dom, HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
> diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
> index 42c4752..3293e29 100644
> --- a/tools/libxc/xc_domain_save.c
> +++ b/tools/libxc/xc_domain_save.c
> @@ -1731,6 +1731,18 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
>              PERROR("Error when writing the viridian flag");
>              goto out;
>          }
> +
> +        chunk.id = XC_SAVE_ID_HVM_NR_IOREQ_SERVERS;
> +        chunk.data = 0;
> +        xc_get_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                         (unsigned long *)&chunk.data);
> +
> +        if ( (chunk.data != 0) &&
> +             wrexact(io_fd, &chunk, sizeof(chunk)) )
> +        {
> +            PERROR("Error when writing the number of IOREQ servers");
> +            goto out;
> +        }
>      }
>  
>      if ( callbacks != NULL && callbacks->toolstack_save != NULL )
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index f24f2a1..bbe5def 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -45,7 +45,7 @@
>  #define SPECIALPAGE_IDENT_PT 4
>  #define SPECIALPAGE_CONSOLE  5
>  #define SPECIALPAGE_IOREQ    6
> -#define NR_SPECIAL_PAGES     SPECIALPAGE_IOREQ + 2 /* ioreq server needs 2 pages */
> +#define NR_SPECIAL_PAGES(n)  SPECIALPAGE_IOREQ + (2 * n) /* ioreq server needs 2 pages */
>  #define special_pfn(x) (0xff000u - (x))
>  
>  static int modules_init(struct xc_hvm_build_args *args,
> @@ -83,7 +83,8 @@ static int modules_init(struct xc_hvm_build_args *args,
>  }
>  
>  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
> -                           uint64_t mmio_start, uint64_t mmio_size)
> +                           uint64_t mmio_start, uint64_t mmio_size,
> +                           int max_emulators)
>  {
>      struct hvm_info_table *hvm_info = (struct hvm_info_table *)
>          (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
> @@ -111,7 +112,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
>      /* Memory parameters. */
>      hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
>      hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
> -    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES;
> +    hvm_info->reserved_mem_pgstart = special_pfn(0) - NR_SPECIAL_PAGES(max_emulators);
>  
>      /* Finish with the checksum. */
>      for ( i = 0, sum = 0; i < hvm_info->length; i++ )
> @@ -254,6 +255,10 @@ static int setup_guest(xc_interface *xch,
>          stat_1gb_pages = 0;
>      int pod_mode = 0;
>      int claim_enabled = args->claim_enabled;
> +    int max_emulators = args->max_emulators;
> +
> +    if ( max_emulators < 1 )
> +        goto error_out;

Is there a sane upper bound for emulators?

>  
>      if ( nr_pages > target_pages )
>          pod_mode = XENMEMF_populate_on_demand;
> @@ -458,7 +463,8 @@ static int setup_guest(xc_interface *xch,
>                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
>                HVM_INFO_PFN)) == NULL )
>          goto error_out;
> -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
> +                   max_emulators);
>      munmap(hvm_info_page, PAGE_SIZE);
>  
>      /* Allocate and clear special pages. */
> @@ -470,17 +476,18 @@ static int setup_guest(xc_interface *xch,
>              "  STORE:     %"PRI_xen_pfn"\n"
>              "  IDENT_PT:  %"PRI_xen_pfn"\n"
>              "  CONSOLE:   %"PRI_xen_pfn"\n"
> -            "  IOREQ:     %"PRI_xen_pfn"\n",
> -            NR_SPECIAL_PAGES,
> +            "  IOREQ(%02d): %"PRI_xen_pfn"\n",
> +            NR_SPECIAL_PAGES(max_emulators),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
>              (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
> +            max_emulators * 2,
>              (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
>  
> -    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> +    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
>      {
>          xen_pfn_t pfn = special_pfn(i);
>          rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> @@ -506,7 +513,9 @@ static int setup_guest(xc_interface *xch,
>      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
>                       special_pfn(SPECIALPAGE_IOREQ));
>      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
> +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> +                     max_emulators);
>  
>      /*
>       * Identity-map page table is required for running with CR0.PG=0 when
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 13f816b..142aaea 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
>  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
>  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
>  
> +/*
> + * IOREQ server API
> + */
> +int xc_hvm_create_ioreq_server(xc_interface *xch,
> +			       domid_t domid,
> +			       ioservid_t *id);
> +
> +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> +				 domid_t domid,
> +				 ioservid_t id,
> +				 xen_pfn_t *pfn,
> +				 xen_pfn_t *buf_pfn,
> +				 evtchn_port_t *buf_port);
> +
> +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
> +					domid_t domid,
> +                                        ioservid_t id,
> +					int is_mmio,
> +                                        uint64_t start,
> +					uint64_t end);
> +
> +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> +					    domid_t domid,
> +                                            ioservid_t id,
> +					    int is_mmio,
> +                                            uint64_t start);
> +
> +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
> +				      domid_t domid,
> +                                      ioservid_t id,
> +				      uint16_t bdf);
> +
> +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> +					  domid_t domid,
> +					  ioservid_t id,
> +					  uint16_t bdf);
> +
> +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> +				domid_t domid,
> +				ioservid_t id);
> +

There are tab/space issues in this hunk.

>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
> index a0e30e1..8930ac0 100644
> --- a/tools/libxc/xenguest.h
> +++ b/tools/libxc/xenguest.h
> @@ -234,6 +234,8 @@ struct xc_hvm_build_args {
>      struct xc_hvm_firmware_module smbios_module;
>      /* Whether to use claim hypercall (1 - enable, 0 - disable). */
>      int claim_enabled;
> +    /* Maximum number of emulators for VM */
> +    int max_emulators;
>  };
>  
>  /**
> diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> index f859621..5170b7f 100644
> --- a/tools/libxc/xg_save_restore.h
> +++ b/tools/libxc/xg_save_restore.h
> @@ -259,6 +259,7 @@
>  #define XC_SAVE_ID_HVM_ACCESS_RING_PFN  -16
>  #define XC_SAVE_ID_HVM_SHARING_RING_PFN -17
>  #define XC_SAVE_ID_TOOLSTACK          -18 /* Optional toolstack specific info */
> +#define XC_SAVE_ID_HVM_NR_IOREQ_SERVERS -19
>  
>  /*
>  ** We process save/restore/migrate in batches of pages; the below
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 12d6c31..b679957 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -95,6 +95,14 @@
>  #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
>  
>  /*
> + * LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS indicates that the
> + * max_emulators field is present in the hvm sections of
> + * libxl_domain_build_info. This field can be used to reserve
> + * extra special pages for secondary device emulators.
> + */
> +#define LIBXL_HAVE_BUILDINFO_HVM_MAX_EMULATORS 1
> +
> +/*
>   * libxl ABI compatibility
>   *
>   * The only guarantee which libxl makes regarding ABI compatibility
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index a604cd8..cce93d9 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -330,6 +330,9 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>  
>          libxl_defbool_setdefault(&b_info->u.hvm.gfx_passthru, false);
>  
> +        if (b_info->u.hvm.max_emulators < 1)
> +            b_info->u.hvm.max_emulators = 1;
> +
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>          libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 55f74b2..9de06f9 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -637,6 +637,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      args.mem_size = (uint64_t)(info->max_memkb - info->video_memkb) << 10;
>      args.mem_target = (uint64_t)(info->target_memkb - info->video_memkb) << 10;
>      args.claim_enabled = libxl_defbool_val(info->claim_mode);
> +    args.max_emulators = info->u.hvm.max_emulators;
>      if (libxl__domain_firmware(gc, info, &args)) {
>          LOG(ERROR, "initializing domain firmware failed");
>          goto out;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 649ce50..b707159 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -372,6 +372,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                         ("xen_platform_pci", libxl_defbool),
>                                         ("usbdevice_list",   libxl_string_list),
>                                         ("vendor_device",    libxl_vendor_device),
> +                                       ("max_emulators",    integer),
>                                         ])),
>                   ("pv", Struct(None, [("kernel", string),
>                                        ("slack_memkb", MemKB),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index aff6f90..c65f4f4 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1750,6 +1750,9 @@ skip_vfb:
>  
>              b_info->u.hvm.vendor_device = d;
>          }
> + 
> +        if (!xlu_cfg_get_long (config, "secondary_device_emulators", &l, 0))
> +            b_info->u.hvm.max_emulators = l + 1;
>      }
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d9874fb..5f9e728 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -379,21 +379,23 @@ static void hvm_wait_on_io(struct domain *d, ioreq_t *p)
>  void hvm_do_resume(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s;
> +    struct list_head *entry, *next;
>  
>      check_wakeup_from_wait();
>  
>      if ( is_hvm_vcpu(v) )
>          pt_restore_timer(v);
>  
> -    s = v->arch.hvm_vcpu.ioreq_server;
> -    v->arch.hvm_vcpu.ioreq_server = NULL;
> -
> -    if ( s )
> +    list_for_each_safe ( entry, next, &v->arch.hvm_vcpu.ioreq_server_list )
>      {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                vcpu_list_entry[v->vcpu_id]);
>          ioreq_t *p = get_ioreq(s, v->vcpu_id);
>  
>          hvm_wait_on_io(d, p);
> +
> +        list_del_init(entry);
>      }
>  
>      /* Inject pending hw/sw trap */
> @@ -531,6 +533,83 @@ static int hvm_print_line(
>      return X86EMUL_OKAY;
>  }
>  
> +static int hvm_access_cf8(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *curr = current;
> +    struct hvm_domain *hd = &curr->domain->arch.hvm_domain;
> +    int rc;
> +
> +    BUG_ON(port < 0xcf8);
> +    port -= 0xcf8;
> +
> +    spin_lock(&hd->pci_lock);
> +
> +    if ( dir == IOREQ_WRITE )
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            hd->pci_cf8 = *val;
> +            break;
> +
> +        case 2:
> +        {
> +            uint32_t mask = 0xffff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +            
> +        case 1:
> +        {
> +            uint32_t mask = 0xff << (port * 8);
> +            uint32_t subval = *val << (port * 8);
> +
> +            hd->pci_cf8 = (hd->pci_cf8 & ~mask) |
> +                          (subval & mask);
> +            break;
> +        }
> +
> +        default:
> +            break;
> +        }
> +
> +        /* We always need to fall through to the catch all emulator */
> +        rc = X86EMUL_UNHANDLEABLE;
> +    }
> +    else
> +    {
> +        switch ( bytes )
> +        {
> +        case 4:
> +            *val = hd->pci_cf8;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        case 2:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xffff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +            
> +        case 1:
> +            *val = (hd->pci_cf8 >> (port * 8)) & 0xff;
> +            rc = X86EMUL_OKAY;
> +            break;
> +
> +        default:
> +            rc = X86EMUL_UNHANDLEABLE;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&hd->pci_lock);
> +
> +    return rc;
> +}
> +
>  static int handle_pvh_io(
>      int dir, uint32_t port, uint32_t bytes, uint32_t *val)
>  {
> @@ -590,6 +669,8 @@ done:
>  
>  static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu *v)
>  {
> +    list_del_init(&s->vcpu_list_entry[v->vcpu_id]);
> +
>      if ( v->vcpu_id == 0 )
>      {
>          if ( s->buf_ioreq_evtchn >= 0 )
> @@ -606,7 +687,7 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, struct vcpu
>      }
>  }
>  
> -static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
> +static int hvm_create_ioreq_server(struct domain *d, ioservid_t id, domid_t domid)
>  {
>      struct hvm_ioreq_server *s;
>      int i;
> @@ -614,34 +695,47 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>      struct vcpu *v;
>      int rc;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
>      rc = -EEXIST;
> -    if ( d->arch.hvm_domain.ioreq_server != NULL )
> -        goto fail_exist;
> +    list_for_each_entry ( s, 
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto fail_exist;
> +    }
>  
> -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
>  
>      rc = -ENOMEM;
>      s = xzalloc(struct hvm_ioreq_server);
>      if ( !s )
>          goto fail_alloc;
>  
> +    s->id = id;
>      s->domain = d;
>      s->domid = domid;
> +    INIT_LIST_HEAD(&s->domain_list_entry);
>  
>      for ( i = 0; i < MAX_HVM_VCPUS; i++ )
> +    {
>          s->ioreq_evtchn[i] = -1;
> +        INIT_LIST_HEAD(&s->vcpu_list_entry[i]);
> +    }
>      s->buf_ioreq_evtchn = -1;
>  
>      /* Initialize shared pages */
> -    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
>  
>      hvm_init_ioreq_page(s, 0);
>      if ( (rc = hvm_set_ioreq_page(s, 0, pfn)) < 0 )
>          goto fail_set_ioreq;
>  
> -    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> +    pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
>  
>      hvm_init_ioreq_page(s, 1);
>      if ( (rc = hvm_set_ioreq_page(s, 1, pfn)) < 0 )
> @@ -653,7 +747,8 @@ static int hvm_create_ioreq_server(struct domain *d, domid_t domid)
>              goto fail_add_vcpu;
>      }
>  
> -    d->arch.hvm_domain.ioreq_server = s;
> +    list_add(&s->domain_list_entry,
> +             &d->arch.hvm_domain.ioreq_server_list);
>  
>      spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> @@ -673,22 +768,30 @@ fail_exist:
>      return rc;
>  }
>  
> -static void hvm_destroy_ioreq_server(struct domain *d)
> +static void hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>  {
> -    struct hvm_ioreq_server *s;
> +    struct hvm_ioreq_server *s, *next;
>      struct vcpu *v;
>  
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, d->domain_id);
> +    list_for_each_entry_safe ( s,
> +                               next,
> +                               &d->arch.hvm_domain.ioreq_server_list,
> +                               domain_list_entry)
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
>  
> -    s = d->arch.hvm_domain.ioreq_server;
> -    if ( !s )
> -        goto done;
> +    goto done;
> +
> +found:
> +    gdprintk(XENLOG_INFO, "%s: %d:%d\n", __func__, d->domain_id, id);
>  
>      domain_pause(d);
>  
> -    d->arch.hvm_domain.ioreq_server = NULL;
> +    list_del_init(&s->domain_list_entry);
>  
>      for_each_vcpu ( d, v )
>          hvm_ioreq_server_remove_vcpu(s, v);
> @@ -704,21 +807,186 @@ done:
>      spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  }
>  
> -static int hvm_get_ioreq_server_buf_port(struct domain *d, evtchn_port_t *port)
> +static int hvm_get_ioreq_server_buf_port(struct domain *d, ioservid_t id, evtchn_port_t *port)
> +{
> +    struct list_head *entry;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        if ( s->id == id )
> +        {
> +            *port = s->buf_ioreq_evtchn;
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_get_ioreq_server_pfn(struct domain *d, ioservid_t id, int buf, xen_pfn_t *pfn)
> +{
> +    struct list_head *entry;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        if ( s->id == id )
> +        {
> +            if ( buf )
> +                *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] - s->id;
> +            else
> +                *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] - s->id;
> +
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                            int is_mmio, uint64_t start, uint64_t end)
>  {
>      struct hvm_ioreq_server *s;
> +    struct hvm_io_range *x;
>      int rc;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    x = xmalloc(struct hvm_io_range);
> +    if ( x == NULL )
> +        return -ENOMEM;
> +
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    s = d->arch.hvm_domain.ioreq_server;
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto fail;
> +
> +found:
> +    x->start = start;
> +    x->end = end;
> +
> +    if ( is_mmio )
> +    {
> +        x->next = s->mmio_range_list;
> +        s->mmio_range_list = x;
> +    }
> +    else
> +    {
> +        x->next = s->portio_range_list;
> +        s->portio_range_list = x;
> +    }
> +
> +    gdprintk(XENLOG_DEBUG, "%d:%d: +%s %"PRIX64" - %"PRIX64"\n",
> +             d->domain_id,
> +             s->id,
> +             ( is_mmio ) ? "MMIO" : "PORTIO",
> +             x->start,
> +             x->end);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return 0;
> +
> +fail:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +    xfree(x);
> +
> +    return rc;
> +}
> +
> +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                                int is_mmio, uint64_t start)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct hvm_io_range *x, **xp;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
>      rc = -ENOENT;
> -    if ( !s )
> -        goto done;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
>  
> -    *port = s->buf_ioreq_evtchn;
> -    rc = 0;
> +    goto done;
> +
> +found:
> +    if ( is_mmio )
> +    {
> +        x = s->mmio_range_list;
> +        xp = &s->mmio_range_list;
> +    }
> +    else
> +    {
> +        x = s->portio_range_list;
> +        xp = &s->portio_range_list;
> +    }
> +
> +    while ( (x != NULL) && (start != x->start) )
> +    {
> +        xp = &x->next;
> +        x = x->next;
> +    }
> +
> +    if ( (x != NULL) )
> +    {
> +        gdprintk(XENLOG_DEBUG, "%d:%d: -%s %"PRIX64" - %"PRIX64"\n",
> +                 d->domain_id,
> +                 s->id,
> +                 ( is_mmio ) ? "MMIO" : "PORTIO",
> +                 x->start,
> +                 x->end);
> +
> +        *xp = x->next;
> +        xfree(x);
> +        rc = 0;
> +    }
>  
>  done:
>      spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> @@ -726,25 +994,98 @@ done:
>      return rc;
>  }
>  
> -static int hvm_get_ioreq_server_pfn(struct domain *d, int buf, xen_pfn_t *pfn)
> +static int hvm_map_pcidev_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                          uint16_t bdf)
>  {
>      struct hvm_ioreq_server *s;
> +    struct hvm_pcidev *x;
>      int rc;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    x = xmalloc(struct hvm_pcidev);
> +    if ( x == NULL )
> +        return -ENOMEM;
> +
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    s = d->arch.hvm_domain.ioreq_server;
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
> +
> +    goto fail;
> +
> +found:
> +    x->bdf = bdf;
> +
> +    x->next = s->pcidev_list;
> +    s->pcidev_list = x;
> +
> +    gdprintk(XENLOG_DEBUG, "%d:%d: +PCIDEV %04X\n",
> +             d->domain_id,
> +             s->id,
> +             x->bdf);
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return 0;
> +
> +fail:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +    xfree(x);
> +
> +    return rc;
> +}
> +
> +static int hvm_unmap_pcidev_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                              uint16_t bdf)
> +{
> +    struct hvm_ioreq_server *s;
> +    struct hvm_pcidev *x, **xp;
> +    int rc;
> +
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
>      rc = -ENOENT;
> -    if ( !s )
> -        goto done;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
>  
> -    if ( buf )
> -        *pfn = d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN];
> -    else
> -        *pfn = d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
> +    goto done;
>  
> -    rc = 0;
> +found:
> +    x = s->pcidev_list;
> +    xp = &s->pcidev_list;
> +
> +    while ( (x != NULL) && (bdf != x->bdf) )
> +    {
> +        xp = &x->next;
> +        x = x->next;
> +    }
> +    if ( (x != NULL) )
> +    {
> +        gdprintk(XENLOG_DEBUG, "%d:%d: -PCIDEV %04X\n",
> +                 d->domain_id,
> +                 s->id,
> +                 x->bdf);
> +
> +         *xp = x->next;
> +        xfree(x);
> +        rc = 0;
> +    }
>  
>  done:
>      spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> @@ -752,6 +1093,73 @@ done:
>      return rc;
>  }
>  
> +static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
> +{
> +    struct list_head *entry;
> +    int rc;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        if ( (rc = hvm_ioreq_server_add_vcpu(s, v)) < 0 )
> +            goto fail;
> +    }
> +        
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return 0;
> +
> +fail:
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    return rc;
> +}
> +
> +static void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
> +{
> +    struct list_head *entry;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        hvm_ioreq_server_remove_vcpu(s, v);
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +}
> +
> +static void hvm_destroy_all_ioreq_servers(struct domain *d)
> +{
> +    ioservid_t id;
> +
> +    for ( id = 0;
> +          id < d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> +          id++ )
> +        hvm_destroy_ioreq_server(d, id);
> +}
> +
>  static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
>                                       int *p_port)
>  {
> @@ -767,21 +1175,30 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
>      return 0;
>  }
>  
> -static int hvm_set_ioreq_server_domid(struct domain *d, domid_t domid)
> +static int hvm_set_ioreq_server_domid(struct domain *d, ioservid_t id, domid_t domid)
>  {
>      struct hvm_ioreq_server *s;
>      struct vcpu *v;
>      int rc = 0;
>  
> +    if ( id >= d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS] )
> +        return -EINVAL;
> +
>      domain_pause(d);
>      spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    s = d->arch.hvm_domain.ioreq_server;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == id )
> +            goto found;
> +    }
>  
>      rc = -ENOENT;
> -    if ( !s )
> -        goto done;
> +    goto done;
>  
> +found:
>      rc = 0;
>      if ( s->domid == domid )
>          goto done;
> @@ -838,7 +1255,9 @@ int hvm_domain_initialise(struct domain *d)
>  
>      }
>  
> +    INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server_list);
>      spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock);
> +    spin_lock_init(&d->arch.hvm_domain.pci_lock);
>      spin_lock_init(&d->arch.hvm_domain.irq_lock);
>      spin_lock_init(&d->arch.hvm_domain.uc_lock);
>  
> @@ -880,6 +1299,7 @@ int hvm_domain_initialise(struct domain *d)
>      rtc_init(d);
>  
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
> +    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>  
>      rc = hvm_funcs.domain_initialise(d);
>      if ( rc != 0 )
> @@ -910,7 +1330,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
>      if ( hvm_funcs.nhvm_domain_relinquish_resources )
>          hvm_funcs.nhvm_domain_relinquish_resources(d);
>  
> -    hvm_destroy_ioreq_server(d);
> +    hvm_destroy_all_ioreq_servers(d);
>  
>      msixtbl_pt_cleanup(d);
>  
> @@ -1422,13 +1842,14 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  {
>      int rc;
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s;
>  
>      hvm_asid_flush_vcpu(v);
>  
>      spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
>      INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vcpu.ioreq_server_list);
> +
>      rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
>      if ( rc != 0 )
>          goto fail1;
> @@ -1465,16 +1886,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
>           && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
>          goto fail5;
>  
> -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> -    s = d->arch.hvm_domain.ioreq_server;
> -    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> -
> -    if ( s )
> -    {
> -        rc = hvm_ioreq_server_add_vcpu(s, v);
> -        if ( rc < 0 )
> -            goto fail6;
> -    }
> +    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
> +    if ( rc < 0 )
> +        goto fail6;
>  
>      if ( v->vcpu_id == 0 )
>      {
> @@ -1510,14 +1924,8 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  void hvm_vcpu_destroy(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> -    struct hvm_ioreq_server *s;
> -
> -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> -    s = d->arch.hvm_domain.ioreq_server;
> -    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
>  
> -    if ( s )
> -        hvm_ioreq_server_remove_vcpu(s, v);
> +    hvm_all_ioreq_servers_remove_vcpu(d, v);
>  
>      nestedhvm_vcpu_destroy(v);
>  
> @@ -1556,6 +1964,101 @@ void hvm_vcpu_down(struct vcpu *v)
>      }
>  }
>  
> +static struct hvm_ioreq_server *hvm_select_ioreq_server(struct vcpu *v, ioreq_t *p)
> +{
> +#define BDF(cf8) (((cf8) & 0x00ffff00) >> 8)
> +
> +    struct domain *d = v->domain;
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +
> +    if ( p->type == IOREQ_TYPE_PIO &&
> +         (p->addr & ~3) == 0xcfc )
> +    { 
> +        /* PCI config data cycle */
> +        type = IOREQ_TYPE_PCI_CONFIG;
> +
> +        spin_lock(&d->arch.hvm_domain.pci_lock);
> +        addr = d->arch.hvm_domain.pci_cf8 + (p->addr & 3);
> +        spin_unlock(&d->arch.hvm_domain.pci_lock);
> +    }
> +    else
> +    {
> +        type = p->type;
> +        addr = p->addr;
> +    }
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    switch ( type )
> +    {
> +    case IOREQ_TYPE_COPY:
> +    case IOREQ_TYPE_PIO:
> +    case IOREQ_TYPE_PCI_CONFIG:
> +        break;
> +    default:
> +        goto done;
> +    }
> +
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        switch ( type )
> +        {
> +            case IOREQ_TYPE_COPY:
> +            case IOREQ_TYPE_PIO: {
> +                struct hvm_io_range *x;
> +
> +                x = (type == IOREQ_TYPE_COPY) ?
> +                    s->mmio_range_list :
> +                    s->portio_range_list;
> +
> +                for ( ; x; x = x->next )
> +                {
> +                    if ( (addr >= x->start) && (addr <= x->end) )
> +                        goto found;
> +                }
> +                break;
> +            }
> +            case IOREQ_TYPE_PCI_CONFIG: {
> +                struct hvm_pcidev *x;
> +
> +                x = s->pcidev_list;
> +
> +                for ( ; x; x = x->next )
> +                {
> +                    if ( BDF(addr) == x->bdf ) {
> +                        p->type = type;
> +                        p->addr = addr;
> +                        goto found;
> +                    }
> +                }
> +                break;
> +            }
> +        }
> +    }
> +
> +done:
> +    /* The catch-all server has id 0 */
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server_list,
> +                          domain_list_entry )
> +    {
> +        if ( s->id == 0 )
> +            goto found;
> +    }
> +
> +    s = NULL;
> +
> +found:
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +    return s;
> +
> +#undef BDF
> +}
> +
>  int hvm_buffered_io_send(ioreq_t *p)
>  {
>      struct vcpu *v = current;
> @@ -1570,10 +2073,7 @@ int hvm_buffered_io_send(ioreq_t *p)
>      /* Ensure buffered_iopage fits in a page */
>      BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
>  
> -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> -    s = d->arch.hvm_domain.ioreq_server;
> -    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> -
> +    s = hvm_select_ioreq_server(v, p);
>      if ( !s )
>          return 0;
>  
> @@ -1661,7 +2161,9 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
>          return 0;
>      }
>  
> -    v->arch.hvm_vcpu.ioreq_server = s;
> +    ASSERT(list_empty(&s->vcpu_list_entry[v->vcpu_id]));
> +    list_add(&s->vcpu_list_entry[v->vcpu_id],
> +             &v->arch.hvm_vcpu.ioreq_server_list); 
>  
>      p->dir = proto_p->dir;
>      p->data_is_ptr = proto_p->data_is_ptr;
> @@ -1686,24 +2188,42 @@ static bool_t hvm_send_assist_req_to_server(struct hvm_ioreq_server *s,
>  
>  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p)
>  {
> -    struct domain *d = v->domain;
>      struct hvm_ioreq_server *s;
>  
> -    ASSERT(v->arch.hvm_vcpu.ioreq_server == NULL);
> +    ASSERT(list_empty(&v->arch.hvm_vcpu.ioreq_server_list));
>  
>      if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
>          return 0;
>  
> -    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> -    s = d->arch.hvm_domain.ioreq_server;
> -    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> -
> +    s = hvm_select_ioreq_server(v, p);
>      if ( !s )
>          return 0;
>  
>      return hvm_send_assist_req_to_server(s, v, p);
>  }
>  
> +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p)
> +{
> +    struct domain *d = v->domain;
> +    struct list_head *entry;
> +
> +    ASSERT(list_empty(&v->arch.hvm_vcpu.ioreq_server_list));
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
> +
> +    list_for_each ( entry,
> +                    &d->arch.hvm_domain.ioreq_server_list )
> +    {
> +        struct hvm_ioreq_server *s = list_entry(entry,
> +                                                struct hvm_ioreq_server,
> +                                                domain_list_entry);
> +
> +        (void) hvm_send_assist_req_to_server(s, v, p);
> +    }
> +
> +    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
> +}
> +
>  void hvm_hlt(unsigned long rflags)
>  {
>      struct vcpu *curr = current;
> @@ -4286,6 +4806,215 @@ static int hvmop_flush_tlb_all(void)
>      return 0;
>  }
>  
> +static int hvmop_create_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_create_ioreq_server_t) uop)
> +{
> +    struct domain *curr_d = current->domain;
> +    xen_hvm_create_ioreq_server_t op;
> +    struct domain *d;
> +    ioservid_t id;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = -ENOSPC;
> +    for ( id = 1;
> +          id <  d->arch.hvm_domain.params[HVM_PARAM_NR_IOREQ_SERVERS];
> +          id++ )
> +    {
> +        rc = hvm_create_ioreq_server(d, id, curr_d->domain_id);
> +        if ( rc == -EEXIST )
> +            continue;
> +
> +        break;
> +    }
> +
> +    if ( rc == -EEXIST )
> +        rc = -ENOSPC;
> +
> +    if ( rc < 0 )
> +        goto out;
> +
> +    op.id = id;
> +
> +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_get_ioreq_server_info(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_get_ioreq_server_info_t) uop)
> +{
> +    xen_hvm_get_ioreq_server_info_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 0, &op.pfn)) < 0 )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_pfn(d, op.id, 1, &op.buf_pfn)) < 0 )
> +        goto out;
> +
> +    if ( (rc = hvm_get_ioreq_server_buf_port(d, op.id, &op.buf_port)) < 0 )
> +        goto out;
> +
> +    rc = copy_to_guest(uop, &op, 1) ? -EFAULT : 0;
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_map_io_range_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_io_range_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_io_range_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_map_io_range_to_ioreq_server(d, op.id, op.is_mmio,
> +                                          op.start, op.end);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_unmap_io_range_from_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_io_range_from_ioreq_server_t) uop)
> +{
> +    xen_hvm_unmap_io_range_from_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_unmap_io_range_from_ioreq_server(d, op.id, op.is_mmio,
> +                                              op.start);
> +    
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_map_pcidev_to_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pcidev_to_ioreq_server_t) uop)
> +{
> +    xen_hvm_map_pcidev_to_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_map_pcidev_to_ioreq_server(d, op.id, op.bdf);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_unmap_pcidev_from_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_unmap_pcidev_from_ioreq_server_t) uop)
> +{
> +    xen_hvm_unmap_pcidev_from_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = hvm_unmap_pcidev_from_ioreq_server(d, op.id, op.bdf);
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
> +static int hvmop_destroy_ioreq_server(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
> +{
> +    xen_hvm_destroy_ioreq_server_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    hvm_destroy_ioreq_server(d, op.id);
> +    rc = 0;
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>  {
> @@ -4294,6 +5023,41 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>      switch ( op )
>      {
> +    case HVMOP_create_ioreq_server:
> +        rc = hvmop_create_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_create_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_get_ioreq_server_info:
> +        rc = hvmop_get_ioreq_server_info(
> +            guest_handle_cast(arg, xen_hvm_get_ioreq_server_info_t));
> +        break;
> +    
> +    case HVMOP_map_io_range_to_ioreq_server:
> +        rc = hvmop_map_io_range_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_io_range_to_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_unmap_io_range_from_ioreq_server:
> +        rc = hvmop_unmap_io_range_from_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_unmap_io_range_from_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_map_pcidev_to_ioreq_server:
> +        rc = hvmop_map_pcidev_to_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_map_pcidev_to_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_unmap_pcidev_from_ioreq_server:
> +        rc = hvmop_unmap_pcidev_from_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_unmap_pcidev_from_ioreq_server_t));
> +        break;
> +    
> +    case HVMOP_destroy_ioreq_server:
> +        rc = hvmop_destroy_ioreq_server(
> +            guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
> +        break;
> +    
>      case HVMOP_set_param:
>      case HVMOP_get_param:
>      {
> @@ -4382,9 +5146,9 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value == DOMID_SELF )
>                      a.value = curr_d->domain_id;
>  
> -                rc = hvm_create_ioreq_server(d, a.value);
> +                rc = hvm_create_ioreq_server(d, 0, a.value);
>                  if ( rc == -EEXIST )
> -                    rc = hvm_set_ioreq_server_domid(d, a.value);
> +                    rc = hvm_set_ioreq_server_domid(d, 0, a.value);
>                  break;
>              case HVM_PARAM_ACPI_S_STATE:
>                  /* Not reflexive, as we must domain_pause(). */
> @@ -4449,6 +5213,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  if ( a.value > SHUTDOWN_MAX )
>                      rc = -EINVAL;
>                  break;
> +            case HVM_PARAM_NR_IOREQ_SERVERS:
> +                if ( d == current->domain )
> +                    rc = -EPERM;
> +                break;

Is this correct? Security-wise, it should be restricted more.

Having said that, I can't see anything good to come from being able to
change this value on the fly.  Is it possible to make a domain creation
parameters?

>              }
>  
>              if ( rc == 0 ) 
> @@ -4483,7 +5251,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              case HVM_PARAM_BUFIOREQ_PFN:
>              case HVM_PARAM_BUFIOREQ_EVTCHN:
>                  /* May need to create server */
> -                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> +                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
>                  if ( rc != 0 && rc != -EEXIST )
>                      goto param_fail;
>  
> @@ -4492,7 +5260,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_IOREQ_PFN: {
>                      xen_pfn_t pfn;
>  
> -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
>                          goto param_fail;
>  
>                      a.value = pfn;
> @@ -4501,7 +5269,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_BUFIOREQ_PFN: {
>                      xen_pfn_t pfn;
>  
> -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
>                          goto param_fail;
>  
>                      a.value = pfn;
> @@ -4510,7 +5278,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  case HVM_PARAM_BUFIOREQ_EVTCHN: {
>                      evtchn_port_t port;
>  
> -                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
>                          goto param_fail;
>  
>                      a.value = port;
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index 576641c..a0d76b2 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -78,7 +78,7 @@ void send_invalidate_req(void)
>      p->dir = IOREQ_WRITE;
>      p->data = ~0UL; /* flush all */
>  
> -    (void)hvm_send_assist_req(v, p);
> +    hvm_broadcast_assist_req(v, p);
>  }
>  
>  int handle_mmio(void)
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index e750ef0..93dcec1 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -41,19 +41,38 @@ struct hvm_ioreq_page {
>      void *va;
>  };
>  
> +struct hvm_io_range {
> +    struct hvm_io_range *next;
> +    uint64_t            start, end;
> +};	
> +
> +struct hvm_pcidev {
> +    struct hvm_pcidev *next;
> +    uint16_t          bdf;
> +};	
> +
>  struct hvm_ioreq_server {
> +    struct list_head       domain_list_entry;
> +    struct list_head       vcpu_list_entry[MAX_HVM_VCPUS];

Given that this has to be initialised anyway, would it be better to have
it dynamically sized on the d->max_cpus, which is almost always be far
smaller?

~Andrew

> +    ioservid_t             id;
>      struct domain          *domain;
>      domid_t                domid;
>      struct hvm_ioreq_page  ioreq;
>      int                    ioreq_evtchn[MAX_HVM_VCPUS];
>      struct hvm_ioreq_page  buf_ioreq;
>      int                    buf_ioreq_evtchn;
> +    struct hvm_io_range    *mmio_range_list;
> +    struct hvm_io_range    *portio_range_list;
> +    struct hvm_pcidev      *pcidev_list;
>  };
>  
>  struct hvm_domain {
> -    struct hvm_ioreq_server *ioreq_server;
> +    struct list_head        ioreq_server_list;
>      spinlock_t              ioreq_server_lock;
>  
> +    uint32_t                pci_cf8;
> +    spinlock_t              pci_lock;
> +
>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 4e8fee8..1c3854f 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -225,6 +225,7 @@ int prepare_ring_for_helper(struct domain *d, unsigned long gmfn,
>  void destroy_ring_for_helper(void **_va, struct page_info *page);
>  
>  bool_t hvm_send_assist_req(struct vcpu *v, ioreq_t *p);
> +void hvm_broadcast_assist_req(struct vcpu *v, ioreq_t *p);
>  
>  void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
>  int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
> index 4c9d7ee..211ebfd 100644
> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -138,7 +138,7 @@ struct hvm_vcpu {
>      spinlock_t          tm_lock;
>      struct list_head    tm_list;
>  
> -    struct hvm_ioreq_server *ioreq_server;
> +    struct list_head    ioreq_server_list;
>  
>      bool_t              flag_dr_dirty;
>      bool_t              debug_state_latch;
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index a9aab4b..6b31189 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -23,6 +23,7 @@
>  
>  #include "../xen.h"
>  #include "../trace.h"
> +#include "../event_channel.h"
>  
>  /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
>  #define HVMOP_set_param           0
> @@ -270,6 +271,75 @@ struct xen_hvm_inject_msi {
>  typedef struct xen_hvm_inject_msi xen_hvm_inject_msi_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_msi_t);
>  
> +typedef uint32_t ioservid_t;
> +
> +DEFINE_XEN_GUEST_HANDLE(ioservid_t);
> +
> +#define HVMOP_create_ioreq_server 17
> +struct xen_hvm_create_ioreq_server {
> +    domid_t domid;  /* IN - domain to be serviced */
> +    ioservid_t id;  /* OUT - server id */
> +};
> +typedef struct xen_hvm_create_ioreq_server xen_hvm_create_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_create_ioreq_server_t);
> +
> +#define HVMOP_get_ioreq_server_info 18
> +struct xen_hvm_get_ioreq_server_info {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - server id */
> +    xen_pfn_t pfn;          /* OUT - ioreq pfn */
> +    xen_pfn_t buf_pfn;      /* OUT - buf ioreq pfn */
> +    evtchn_port_t buf_port; /* OUT - buf ioreq port */
> +};
> +typedef struct xen_hvm_get_ioreq_server_info xen_hvm_get_ioreq_server_info_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_info_t);
> +
> +#define HVMOP_map_io_range_to_ioreq_server 19
> +struct xen_hvm_map_io_range_to_ioreq_server {
> +    domid_t domid;                  /* IN - domain to be serviced */
> +    ioservid_t id;                  /* IN - handle from HVMOP_register_ioreq_server */
> +    int is_mmio;                    /* IN - MMIO or port IO? */
> +    uint64_aligned_t start, end;    /* IN - inclusive start and end of range */
> +};
> +typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_io_range_from_ioreq_server 20
> +struct xen_hvm_unmap_io_range_from_ioreq_server {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
> +    uint8_t is_mmio;        /* IN - MMIO or port IO? */
> +    uint64_aligned_t start; /* IN - start address of the range to remove */
> +};
> +typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
> +
> +#define HVMOP_map_pcidev_to_ioreq_server 21
> +struct xen_hvm_map_pcidev_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> +};
> +typedef struct xen_hvm_map_pcidev_to_ioreq_server xen_hvm_map_pcidev_to_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pcidev_to_ioreq_server_t);
> +
> +#define HVMOP_unmap_pcidev_from_ioreq_server 22
> +struct xen_hvm_unmap_pcidev_from_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - handle from HVMOP_register_ioreq_server */
> +    uint16_t bdf;       /* IN - PCI bus/dev/func */
> +};
> +typedef struct xen_hvm_unmap_pcidev_from_ioreq_server xen_hvm_unmap_pcidev_from_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_pcidev_from_ioreq_server_t);
> +
> +#define HVMOP_destroy_ioreq_server 23
> +struct xen_hvm_destroy_ioreq_server {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    ioservid_t id;          /* IN - server id */
> +};
> +typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index f05d130..e84fa75 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -34,6 +34,7 @@
>  
>  #define IOREQ_TYPE_PIO          0 /* pio */
>  #define IOREQ_TYPE_COPY         1 /* mmio ops */
> +#define IOREQ_TYPE_PCI_CONFIG   2 /* pci config ops */
>  #define IOREQ_TYPE_TIMEOFFSET   7
>  #define IOREQ_TYPE_INVALIDATE   8 /* mapcache */
>  
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 517a184..4109b11 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -145,6 +145,8 @@
>  /* SHUTDOWN_* action in case of a triple fault */
>  #define HVM_PARAM_TRIPLE_FAULT_REASON 31
>  
> -#define HVM_NR_PARAMS          32
> +#define HVM_PARAM_NR_IOREQ_SERVERS 32
> +
> +#define HVM_NR_PARAMS          33
>  
>  #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 4/5] ioreq-server: add support for multiple servers
  2014-01-30 15:46   ` Andrew Cooper
@ 2014-01-30 15:56     ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 15:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
[snip]
> > +
> > +    if ( max_emulators < 1 )
> > +        goto error_out;
> 
> Is there a sane upper bound for emulators?
> 

I imagine there probably needs to be. I haven't work it out yet, but it will be when the special pages  start to run into something else no doubt.

> >
> >      if ( nr_pages > target_pages )
> >          pod_mode = XENMEMF_populate_on_demand;
> > @@ -458,7 +463,8 @@ static int setup_guest(xc_interface *xch,
> >                xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
> >                HVM_INFO_PFN)) == NULL )
> >          goto error_out;
> > -    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
> > +    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size,
> > +                   max_emulators);
> >      munmap(hvm_info_page, PAGE_SIZE);
> >
> >      /* Allocate and clear special pages. */
> > @@ -470,17 +476,18 @@ static int setup_guest(xc_interface *xch,
> >              "  STORE:     %"PRI_xen_pfn"\n"
> >              "  IDENT_PT:  %"PRI_xen_pfn"\n"
> >              "  CONSOLE:   %"PRI_xen_pfn"\n"
> > -            "  IOREQ:     %"PRI_xen_pfn"\n",
> > -            NR_SPECIAL_PAGES,
> > +            "  IOREQ(%02d): %"PRI_xen_pfn"\n",
> > +            NR_SPECIAL_PAGES(max_emulators),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_PAGING),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_ACCESS),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_SHARING),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_XENSTORE),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_IDENT_PT),
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_CONSOLE),
> > +            max_emulators * 2,
> >              (xen_pfn_t)special_pfn(SPECIALPAGE_IOREQ));
> >
> > -    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
> > +    for ( i = 0; i < NR_SPECIAL_PAGES(max_emulators); i++ )
> >      {
> >          xen_pfn_t pfn = special_pfn(i);
> >          rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
> > @@ -506,7 +513,9 @@ static int setup_guest(xc_interface *xch,
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
> >                       special_pfn(SPECIALPAGE_IOREQ));
> >      xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
> > -                     special_pfn(SPECIALPAGE_IOREQ) - 1);
> > +                     special_pfn(SPECIALPAGE_IOREQ) - max_emulators);
> > +    xc_set_hvm_param(xch, dom, HVM_PARAM_NR_IOREQ_SERVERS,
> > +                     max_emulators);
> >
> >      /*
> >       * Identity-map page table is required for running with CR0.PG=0 when
> > diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> > index 13f816b..142aaea 100644
> > --- a/tools/libxc/xenctrl.h
> > +++ b/tools/libxc/xenctrl.h
> > @@ -1801,6 +1801,47 @@ void xc_clear_last_error(xc_interface *xch);
> >  int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long value);
> >  int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param,
> unsigned long *value);
> >
> > +/*
> > + * IOREQ server API
> > + */
> > +int xc_hvm_create_ioreq_server(xc_interface *xch,
> > +			       domid_t domid,
> > +			       ioservid_t *id);
> > +
> > +int xc_hvm_get_ioreq_server_info(xc_interface *xch,
> > +				 domid_t domid,
> > +				 ioservid_t id,
> > +				 xen_pfn_t *pfn,
> > +				 xen_pfn_t *buf_pfn,
> > +				 evtchn_port_t *buf_port);
> > +
> > +int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch,
> > +					domid_t domid,
> > +                                        ioservid_t id,
> > +					int is_mmio,
> > +                                        uint64_t start,
> > +					uint64_t end);
> > +
> > +int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
> > +					    domid_t domid,
> > +                                            ioservid_t id,
> > +					    int is_mmio,
> > +                                            uint64_t start);
> > +
> > +int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch,
> > +				      domid_t domid,
> > +                                      ioservid_t id,
> > +				      uint16_t bdf);
> > +
> > +int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
> > +					  domid_t domid,
> > +					  ioservid_t id,
> > +					  uint16_t bdf);
> > +
> > +int xc_hvm_destroy_ioreq_server(xc_interface *xch,
> > +				domid_t domid,
> > +				ioservid_t id);
> > +
> 
> There are tab/space issues in this hunk.
> 

So there are. Probably some missing emacs boilerplate.

[snip]
> > +            case HVM_PARAM_NR_IOREQ_SERVERS:
> > +                if ( d == current->domain )
> > +                    rc = -EPERM;
> > +                break;
> 
> Is this correct? Security-wise, it should be restricted more.
> 
> Having said that, I can't see anything good to come from being able to
> change this value on the fly.  Is it possible to make a domain creation
> parameters?
> 

I don't know. Maybe we can have one-time settable params? The other 'legacy' ioreq params seem quite insecure too.

> >              }
> >
> >              if ( rc == 0 )
> > @@ -4483,7 +5251,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >              case HVM_PARAM_BUFIOREQ_PFN:
> >              case HVM_PARAM_BUFIOREQ_EVTCHN:
> >                  /* May need to create server */
> > -                rc = hvm_create_ioreq_server(d, curr_d->domain_id);
> > +                rc = hvm_create_ioreq_server(d, 0, curr_d->domain_id);
> >                  if ( rc != 0 && rc != -EEXIST )
> >                      goto param_fail;
> >
> > @@ -4492,7 +5260,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_IOREQ_PFN: {
> >                      xen_pfn_t pfn;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, &pfn)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 0, &pfn)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = pfn;
> > @@ -4501,7 +5269,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_BUFIOREQ_PFN: {
> >                      xen_pfn_t pfn;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_pfn(d, 1, &pfn)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_pfn(d, 0, 1, &pfn)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = pfn;
> > @@ -4510,7 +5278,7 @@ long do_hvm_op(unsigned long op,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  case HVM_PARAM_BUFIOREQ_EVTCHN: {
> >                      evtchn_port_t port;
> >
> > -                    if ( (rc = hvm_get_ioreq_server_buf_port(d, &port)) < 0 )
> > +                    if ( (rc = hvm_get_ioreq_server_buf_port(d, 0, &port)) < 0 )
> >                          goto param_fail;
> >
> >                      a.value = port;
> > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> > index 576641c..a0d76b2 100644
> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -78,7 +78,7 @@ void send_invalidate_req(void)
> >      p->dir = IOREQ_WRITE;
> >      p->data = ~0UL; /* flush all */
> >
> > -    (void)hvm_send_assist_req(v, p);
> > +    hvm_broadcast_assist_req(v, p);
> >  }
> >
> >  int handle_mmio(void)
> > diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-
> x86/hvm/domain.h
> > index e750ef0..93dcec1 100644
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -41,19 +41,38 @@ struct hvm_ioreq_page {
> >      void *va;
> >  };
> >
> > +struct hvm_io_range {
> > +    struct hvm_io_range *next;
> > +    uint64_t            start, end;
> > +};
> > +
> > +struct hvm_pcidev {
> > +    struct hvm_pcidev *next;
> > +    uint16_t          bdf;
> > +};
> > +
> >  struct hvm_ioreq_server {
> > +    struct list_head       domain_list_entry;
> > +    struct list_head       vcpu_list_entry[MAX_HVM_VCPUS];
> 
> Given that this has to be initialised anyway, would it be better to have
> it dynamically sized on the d->max_cpus, which is almost always be far
> smaller?
> 

Can vcpu ids be sparse? If not then that would seem fine.

  Paul

> ~Andrew
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
                   ` (3 preceding siblings ...)
  2014-01-30 14:19 ` [RFC PATCH 4/5] ioreq-server: add support for multiple servers Paul Durrant
@ 2014-01-30 14:19 ` Paul Durrant
  2014-01-30 15:55   ` Andrew Cooper
  2014-01-30 14:23 ` [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
  2014-03-01 22:24 ` Matt Wilson
  6 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

Because we may now have more than one emulator, the implementation of the
PCI hotplug controller needs to be done by Xen. Happily the code is very
short and simple and it also removes the need for a different ACPI DSDT
when using different variants of QEMU.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 tools/firmware/hvmloader/acpi/mk_dsdt.c |  147 ++++------------------
 tools/libxc/xc_domain.c                 |   46 +++++++
 tools/libxc/xenctrl.h                   |   11 ++
 tools/libxl/libxl_pci.c                 |   15 +++
 xen/arch/x86/hvm/Makefile               |    1 +
 xen/arch/x86/hvm/hotplug.c              |  207 +++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c                  |   40 +++++-
 xen/include/asm-x86/hvm/domain.h        |   12 ++
 xen/include/asm-x86/hvm/io.h            |    6 +
 xen/include/public/hvm/hvm_op.h         |    9 ++
 xen/include/public/hvm/ioreq.h          |    2 +
 11 files changed, 373 insertions(+), 123 deletions(-)
 create mode 100644 xen/arch/x86/hvm/hotplug.c

diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
index a4b693b..6408b44 100644
--- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
+++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
@@ -58,28 +58,6 @@ static void pop_block(void)
     printf("}\n");
 }
 
-static void pci_hotplug_notify(unsigned int slt)
-{
-    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
-}
-
-static void decision_tree(
-    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
-{
-    if ( s == (e-1) )
-    {
-        (*leaf)(s);
-        return;
-    }
-
-    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
-    decision_tree((s+e)/2, e, var, leaf);
-    pop_block();
-    push_block("Else", NULL);
-    decision_tree(s, (s+e)/2, var, leaf);
-    pop_block();
-}
-
 static struct option options[] = {
     { "maxcpu", 1, 0, 'c' },
     { "dm-version", 1, 0, 'q' },
@@ -322,64 +300,21 @@ int main(int argc, char **argv)
                    dev, intx, ((dev*4+dev/8+intx)&31)+16);
     printf("})\n");
 
-    /*
-     * Each PCI hotplug slot needs at least two methods to handle
-     * the ACPI event:
-     *  _EJ0: eject a device
-     *  _STA: return a device's status, e.g. enabled or removed
-     * 
-     * Eject button would generate a general-purpose event, then the
-     * control method for this event uses Notify() to inform OSPM which
-     * action happened and on which device.
-     *
-     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
-     * in ACPI spec 3.0b for details.
-     *
-     * QEMU provides a simple hotplug controller with some I/O to handle
-     * the hotplug action and status, which is beyond the ACPI scope.
-     */
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        for ( slot = 0; slot < 0x100; slot++ )
-        {
-            push_block("Device", "S%02X", slot);
-            /* _ADR == dev:fn (16:16) */
-            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
-            /* _SUN == dev */
-            stmt("Name", "_SUN, 0x%08x", slot >> 3);
-            push_block("Method", "_EJ0, 1");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x88, \\_GPE.DPT2");
-            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
-                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
-            pop_block();
-            push_block("Method", "_STA, 0");
-            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
-            stmt("Store", "0x89, \\_GPE.DPT2");
-            if ( slot & 1 )
-                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-            else
-                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
-            stmt("Return", "Local1"); /* IN status as the _STA */
-            pop_block();
-            pop_block();
-        }
-    } else {
-        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
-        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("B0EJ, 32,\n");
-        pop_block();
+    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
+    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("B0EJ, 32,\n");
+    pop_block();
 
-        /* hotplug_slot */
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("Device", "S%i", slot); {
-                stmt("Name", "_ADR, %#06x0000", slot);
-                push_block("Method", "_EJ0,1"); {
-                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
-                    stmt("Return", "0x0");
-                } pop_block();
-                stmt("Name", "_SUN, %i", slot);
+    /* hotplug_slot */
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("Device", "S%i", slot); {
+            stmt("Name", "_ADR, %#06x0000", slot);
+            push_block("Method", "_EJ0,1"); {
+                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
+                stmt("Return", "0x0");
             } pop_block();
-        }
+            stmt("Name", "_SUN, %i", slot);
+        } pop_block();
     }
 
     pop_block();
@@ -389,26 +324,11 @@ int main(int argc, char **argv)
     /**** GPE start ****/
     push_block("Scope", "\\_GPE");
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
-
-        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
-        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
-        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
-        for ( slot = 0; slot < 0x100; slot += 2 )
-        {
-            indent();
-            /* Each hotplug control register manages a pair of pci functions. */
-            printf("PH%02X, 8,\n", slot);
-        }
-        pop_block();
-    } else {
-        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
-        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
-        indent(); printf("PCIU, 32,\n");
-        indent(); printf("PCID, 32,\n");
-        pop_block();
-    }
+    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
+    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
+    indent(); printf("PCIU, 32,\n");
+    indent(); printf("PCID, 32,\n");
+    pop_block();
 
     stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
 
@@ -416,33 +336,16 @@ int main(int argc, char **argv)
     indent(); printf("DPT1, 8, DPT2, 8\n");
     pop_block();
 
-    if (dm_version == QEMU_XEN_TRADITIONAL) {
-        push_block("Method", "_L03, 0, Serialized");
-        /* Detect slot and event (remove/add). */
-        stmt("Name", "SLT, 0x0");
-        stmt("Name", "EVT, 0x0");
-        stmt("Store", "PSTA, Local1");
-        stmt("And", "Local1, 0xf, EVT");
-        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
-        stmt("And", "Local1, 0xff, SLT");
-        /* Debug */
-        stmt("Store", "SLT, DPT1");
-        stmt("Store", "EVT, DPT2");
-        /* Decision tree */
-        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
+    push_block("Method", "_E01");
+    for (slot = 1; slot <= 31; slot++) {
+        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
         pop_block();
-    } else {
-        push_block("Method", "_E01");
-        for (slot = 1; slot <= 31; slot++) {
-            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
-            pop_block();
-            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
-            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
-            pop_block();
-        }
+        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
+        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
         pop_block();
     }
+    pop_block();
 
     pop_block();
     /**** GPE end ****/
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index c64d15a..c89068e 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1421,6 +1421,52 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
     return rc;
 }
 
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+                              domid_t domid,
+                              uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 1;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+                               domid_t domid,
+                               uint32_t slot)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
+    int rc;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+    if ( arg == NULL )
+        return -1;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_pci_hotplug;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+    arg->domid = domid;
+    arg->enable = 0;
+    arg->slot = slot;
+    rc = do_xen_hypercall(xch, &hypercall);
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_domain_setdebugging(xc_interface *xch,
                            uint32_t domid,
                            unsigned int enable)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 142aaea..c3e35a9 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1842,6 +1842,17 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 				domid_t domid,
 				ioservid_t id);
 
+/*
+ * PCI hotplug API
+ */
+int xc_hvm_pci_hotplug_enable(xc_interface *xch,
+			      domid_t domid,
+			      uint32_t slot);
+
+int xc_hvm_pci_hotplug_disable(xc_interface *xch,
+			       domid_t domid,
+			       uint32_t slot);
+
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 2e52470..4176440 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
         }
         if ( rc )
             return ERROR_FAIL;
+
+        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug_enable failed");
+            return ERROR_FAIL;
+        }
+
         break;
     case LIBXL_DOMAIN_TYPE_PV:
     {
@@ -1182,6 +1189,14 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
                                          NULL, NULL, NULL) < 0)
             goto out_fail;
 
+        rc = xc_hvm_pci_hotplug_disable(ctx->xch, domid, pcidev->dev);
+        if (rc < 0) {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
+                             "Error: xc_hvm_pci_hotplug_disable failed");
+            rc = ERROR_FAIL;
+            goto out_fail;
+        }
+
         switch (libxl__device_model_version_running(gc, domid)) {
         case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
             rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..48efddb 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -3,6 +3,7 @@ subdir-y += vmx
 
 obj-y += asid.o
 obj-y += emulate.o
+obj-y += hotplug.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += i8254.o
diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
new file mode 100644
index 0000000..253d435
--- /dev/null
+++ b/xen/arch/x86/hvm/hotplug.c
@@ -0,0 +1,207 @@
+/*
+ * hvm/hotplug.c
+ *
+ * Copyright (c) 2013, Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <xen/types.h>
+#include <xen/spinlock.h>
+#include <xen/xmalloc.h>
+#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+
+#define SCI_IRQ 9
+
+#define GPE_BASE            (ACPI_GPE0_BLK_ADDRESS_V1)
+#define GPE_LEN             (ACPI_GPE0_BLK_LEN_V1)
+
+#define GPE_PCI_HOTPLUG_STATUS  2
+
+#define PCI_HOTPLUG_BASE    (ACPI_PCI_HOTPLUG_ADDRESS_V1)
+#define PCI_HOTPLUG_LEN     (ACPI_PCI_HOTPLUG_LEN_V1)
+
+#define PCI_UP      0
+#define PCI_DOWN    4
+#define PCI_EJECT   8
+
+static void gpe_update_sci(struct hvm_hotplug *hp)
+{
+    if ( (hp->gpe_sts[0] & hp->gpe_en[0]) & GPE_PCI_HOTPLUG_STATUS )
+        hvm_isa_irq_assert(hp->domain, SCI_IRQ);
+    else
+        hvm_isa_irq_deassert(hp->domain, SCI_IRQ);
+}
+
+static int handle_gpe_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= GPE_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        if ( port < GPE_LEN / 2 )
+        {
+            *val = hp->gpe_sts[port];
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            *val = hp->gpe_en[port];
+        }
+    } else {
+        if ( port < GPE_LEN / 2 )
+        {
+            hp->gpe_sts[port] &= ~*val;
+        }
+        else
+        {
+            port -= GPE_LEN / 2;
+            hp->gpe_en[port] = *val;
+        }
+
+        gpe_update_sci(hp);
+    }
+
+done:
+    return X86EMUL_OKAY;
+}
+
+static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
+{
+    int slot = ffs(mask) - 1;
+
+    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
+
+    hp->slot_down &= ~(1u  << slot);
+    hp->slot_up &= ~(1u  << slot);
+}
+
+static int handle_pci_hotplug_io(
+    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    if ( bytes != 4 )
+    {
+        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
+        goto done;
+    }
+
+    port -= PCI_HOTPLUG_BASE;
+
+    if ( dir == IOREQ_READ )
+    {
+        switch ( port )
+        {
+        case PCI_UP:
+            *val = hp->slot_up;
+            break;
+        case PCI_DOWN:
+            *val = hp->slot_down;
+            break;
+        default:
+            break;
+        }
+    }
+    else
+    {   
+        switch ( port )
+        {
+        case PCI_EJECT:
+            pci_hotplug_eject(hp, *val);
+            break;
+        default:
+            break;
+        }
+    }
+
+done:
+    return X86EMUL_OKAY;
+}
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    gdprintk(XENLOG_INFO, "%s: %s %d\n", __func__,
+             ( enable ) ? "enable" : "disable", slot);
+
+    if ( enable )
+        hp->slot_up |= (1u << slot);
+    else
+        hp->slot_down |= (1u << slot);
+
+    hp->gpe_sts[0] |= GPE_PCI_HOTPLUG_STATUS;
+    gpe_update_sci(hp);
+}
+
+int gpe_init(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    hp->domain = d;
+
+    hp->gpe_sts = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_sts == NULL )
+        goto fail1;
+
+    hp->gpe_en = xzalloc_array(uint8_t, GPE_LEN / 2);
+    if ( hp->gpe_en == NULL )
+        goto fail2;
+
+    register_portio_handler(d, GPE_BASE, GPE_LEN, handle_gpe_io);
+    register_portio_handler(d, PCI_HOTPLUG_BASE, PCI_HOTPLUG_LEN,
+                            handle_pci_hotplug_io);
+
+    return 0;
+
+fail2:
+    xfree(hp->gpe_sts);
+
+fail1:
+    return -ENOMEM;
+}
+
+void gpe_deinit(struct domain *d)
+{
+    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
+
+    xfree(hp->gpe_en);
+    xfree(hp->gpe_sts);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * c-tab-always-indent: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5f9e728..ff7b259 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1298,15 +1298,21 @@ int hvm_domain_initialise(struct domain *d)
 
     rtc_init(d);
 
+    rc = gpe_init(d);
+    if ( rc != 0 )
+        goto fail2;
+
     register_portio_handler(d, 0xe9, 1, hvm_print_line);
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
     rc = hvm_funcs.domain_initialise(d);
     if ( rc != 0 )
-        goto fail2;
+        goto fail3;
 
     return 0;
 
+ fail3:
+    gpe_deinit(d);
  fail2:
     rtc_deinit(d);
     stdvga_deinit(d);
@@ -1352,6 +1358,7 @@ void hvm_domain_destroy(struct domain *d)
         return;
 
     hvm_funcs.domain_destroy(d);
+    gpe_deinit(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
@@ -5015,6 +5022,32 @@ out:
     return rc;
 }
 
+static int hvmop_pci_hotplug(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
+{
+    xen_hvm_pci_hotplug_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    pci_hotplug(d, op.slot, op.enable);
+    rc = 0;
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -5058,6 +5091,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
         break;
     
+    case HVMOP_pci_hotplug:
+        rc = hvmop_pci_hotplug(
+            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
+        break;
+
     case HVMOP_set_param:
     case HVMOP_get_param:
     {
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 93dcec1..13dd24d 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -66,6 +66,16 @@ struct hvm_ioreq_server {
     struct hvm_pcidev      *pcidev_list;
 };
 
+struct hvm_hotplug {
+    struct domain   *domain;
+    uint8_t         *gpe_sts;
+    uint8_t         *gpe_en;
+
+    /* PCI hotplug */
+    uint32_t        slot_up;
+    uint32_t        slot_down;
+};
+
 struct hvm_domain {
     struct list_head        ioreq_server_list;
     spinlock_t              ioreq_server_lock;
@@ -73,6 +83,8 @@ struct hvm_domain {
     uint32_t                pci_cf8;
     spinlock_t              pci_lock;
 
+    struct hvm_hotplug      hotplug;
+
     struct pl_time         pl_time;
 
     struct hvm_io_handler *io_handler;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 86db58d..072bfe7 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+
+int gpe_init(struct domain *d);
+void gpe_deinit(struct domain *d);
+
+void pci_hotplug(struct domain *d, int slot, bool_t enable);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index 6b31189..20a53ab 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -340,6 +340,15 @@ struct xen_hvm_destroy_ioreq_server {
 typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
 
+#define HVMOP_pci_hotplug 24
+struct xen_hvm_pci_hotplug {
+    domid_t domid;          /* IN - domain to be serviced */
+    uint8_t enable;         /* IN - enable or disable? */
+    uint32_t slot;          /* IN - slot to enable/disable */
+};
+typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index e84fa75..40bfa61 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
 #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
 #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
 #define ACPI_GPE0_BLK_LEN_V1         0x04
+#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
+#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
 
 /* Compatibility definitions for the default location (version 0). */
 #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-01-30 14:19 ` [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-01-30 15:55   ` Andrew Cooper
  2014-01-30 16:06     ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-01-30 15:55 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On 30/01/14 14:19, Paul Durrant wrote:
> Because we may now have more than one emulator, the implementation of the
> PCI hotplug controller needs to be done by Xen. Happily the code is very
> short and simple and it also removes the need for a different ACPI DSDT
> when using different variants of QEMU.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>  tools/firmware/hvmloader/acpi/mk_dsdt.c |  147 ++++------------------
>  tools/libxc/xc_domain.c                 |   46 +++++++
>  tools/libxc/xenctrl.h                   |   11 ++
>  tools/libxl/libxl_pci.c                 |   15 +++
>  xen/arch/x86/hvm/Makefile               |    1 +
>  xen/arch/x86/hvm/hotplug.c              |  207 +++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c                  |   40 +++++-
>  xen/include/asm-x86/hvm/domain.h        |   12 ++
>  xen/include/asm-x86/hvm/io.h            |    6 +
>  xen/include/public/hvm/hvm_op.h         |    9 ++
>  xen/include/public/hvm/ioreq.h          |    2 +
>  11 files changed, 373 insertions(+), 123 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/hotplug.c
>
> diff --git a/tools/firmware/hvmloader/acpi/mk_dsdt.c b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> index a4b693b..6408b44 100644
> --- a/tools/firmware/hvmloader/acpi/mk_dsdt.c
> +++ b/tools/firmware/hvmloader/acpi/mk_dsdt.c
> @@ -58,28 +58,6 @@ static void pop_block(void)
>      printf("}\n");
>  }
>  
> -static void pci_hotplug_notify(unsigned int slt)
> -{
> -    stmt("Notify", "\\_SB.PCI0.S%02X, EVT", slt);
> -}
> -
> -static void decision_tree(
> -    unsigned int s, unsigned int e, char *var, void (*leaf)(unsigned int))
> -{
> -    if ( s == (e-1) )
> -    {
> -        (*leaf)(s);
> -        return;
> -    }
> -
> -    push_block("If", "And(%s, 0x%02x)", var, (e-s)/2);
> -    decision_tree((s+e)/2, e, var, leaf);
> -    pop_block();
> -    push_block("Else", NULL);
> -    decision_tree(s, (s+e)/2, var, leaf);
> -    pop_block();
> -}
> -
>  static struct option options[] = {
>      { "maxcpu", 1, 0, 'c' },
>      { "dm-version", 1, 0, 'q' },
> @@ -322,64 +300,21 @@ int main(int argc, char **argv)
>                     dev, intx, ((dev*4+dev/8+intx)&31)+16);
>      printf("})\n");
>  
> -    /*
> -     * Each PCI hotplug slot needs at least two methods to handle
> -     * the ACPI event:
> -     *  _EJ0: eject a device
> -     *  _STA: return a device's status, e.g. enabled or removed
> -     * 
> -     * Eject button would generate a general-purpose event, then the
> -     * control method for this event uses Notify() to inform OSPM which
> -     * action happened and on which device.
> -     *
> -     * Pls. refer "6.3 Device Insertion, Removal, and Status Objects"
> -     * in ACPI spec 3.0b for details.
> -     *
> -     * QEMU provides a simple hotplug controller with some I/O to handle
> -     * the hotplug action and status, which is beyond the ACPI scope.
> -     */
> -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> -        for ( slot = 0; slot < 0x100; slot++ )
> -        {
> -            push_block("Device", "S%02X", slot);
> -            /* _ADR == dev:fn (16:16) */
> -            stmt("Name", "_ADR, 0x%08x", ((slot & ~7) << 13) | (slot & 7));
> -            /* _SUN == dev */
> -            stmt("Name", "_SUN, 0x%08x", slot >> 3);
> -            push_block("Method", "_EJ0, 1");
> -            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
> -            stmt("Store", "0x88, \\_GPE.DPT2");
> -            stmt("Store", "0x%02x, \\_GPE.PH%02X", /* eject */
> -                 (slot & 1) ? 0x10 : 0x01, slot & ~1);
> -            pop_block();
> -            push_block("Method", "_STA, 0");
> -            stmt("Store", "0x%02x, \\_GPE.DPT1", slot);
> -            stmt("Store", "0x89, \\_GPE.DPT2");
> -            if ( slot & 1 )
> -                stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
> -            else
> -                stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
> -            stmt("Return", "Local1"); /* IN status as the _STA */
> -            pop_block();
> -            pop_block();
> -        }
> -    } else {
> -        stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
> -        push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
> -        indent(); printf("B0EJ, 32,\n");
> -        pop_block();
> +    stmt("OperationRegion", "SEJ, SystemIO, 0xae08, 0x04");
> +    push_block("Field", "SEJ, DWordAcc, NoLock, WriteAsZeros");
> +    indent(); printf("B0EJ, 32,\n");
> +    pop_block();
>  
> -        /* hotplug_slot */
> -        for (slot = 1; slot <= 31; slot++) {
> -            push_block("Device", "S%i", slot); {
> -                stmt("Name", "_ADR, %#06x0000", slot);
> -                push_block("Method", "_EJ0,1"); {
> -                    stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
> -                    stmt("Return", "0x0");
> -                } pop_block();
> -                stmt("Name", "_SUN, %i", slot);
> +    /* hotplug_slot */
> +    for (slot = 1; slot <= 31; slot++) {
> +        push_block("Device", "S%i", slot); {
> +            stmt("Name", "_ADR, %#06x0000", slot);
> +            push_block("Method", "_EJ0,1"); {
> +                stmt("Store", "ShiftLeft(1, %#06x), B0EJ", slot);
> +                stmt("Return", "0x0");
>              } pop_block();
> -        }
> +            stmt("Name", "_SUN, %i", slot);
> +        } pop_block();
>      }
>  
>      pop_block();
> @@ -389,26 +324,11 @@ int main(int argc, char **argv)
>      /**** GPE start ****/
>      push_block("Scope", "\\_GPE");
>  
> -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> -        stmt("OperationRegion", "PHP, SystemIO, 0x10c0, 0x82");
> -
> -        push_block("Field", "PHP, ByteAcc, NoLock, Preserve");
> -        indent(); printf("PSTA, 8,\n"); /* hotplug controller event reg */
> -        indent(); printf("PSTB, 8,\n"); /* hotplug controller slot reg */
> -        for ( slot = 0; slot < 0x100; slot += 2 )
> -        {
> -            indent();
> -            /* Each hotplug control register manages a pair of pci functions. */
> -            printf("PH%02X, 8,\n", slot);
> -        }
> -        pop_block();
> -    } else {
> -        stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
> -        push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
> -        indent(); printf("PCIU, 32,\n");
> -        indent(); printf("PCID, 32,\n");
> -        pop_block();
> -    }
> +    stmt("OperationRegion", "PCST, SystemIO, 0xae00, 0x08");
> +    push_block("Field", "PCST, DWordAcc, NoLock, WriteAsZeros");
> +    indent(); printf("PCIU, 32,\n");
> +    indent(); printf("PCID, 32,\n");
> +    pop_block();
>  
>      stmt("OperationRegion", "DG1, SystemIO, 0xb044, 0x04");
>  
> @@ -416,33 +336,16 @@ int main(int argc, char **argv)
>      indent(); printf("DPT1, 8, DPT2, 8\n");
>      pop_block();
>  
> -    if (dm_version == QEMU_XEN_TRADITIONAL) {
> -        push_block("Method", "_L03, 0, Serialized");
> -        /* Detect slot and event (remove/add). */
> -        stmt("Name", "SLT, 0x0");
> -        stmt("Name", "EVT, 0x0");
> -        stmt("Store", "PSTA, Local1");
> -        stmt("And", "Local1, 0xf, EVT");
> -        stmt("Store", "PSTB, Local1"); /* XXX: Store (PSTB, SLT) ? */
> -        stmt("And", "Local1, 0xff, SLT");
> -        /* Debug */
> -        stmt("Store", "SLT, DPT1");
> -        stmt("Store", "EVT, DPT2");
> -        /* Decision tree */
> -        decision_tree(0x00, 0x100, "SLT", pci_hotplug_notify);
> +    push_block("Method", "_E01");
> +    for (slot = 1; slot <= 31; slot++) {
> +        push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
> +        stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
>          pop_block();
> -    } else {
> -        push_block("Method", "_E01");
> -        for (slot = 1; slot <= 31; slot++) {
> -            push_block("If", "And(PCIU, ShiftLeft(1, %i))", slot);
> -            stmt("Notify", "\\_SB.PCI0.S%i, 1", slot);
> -            pop_block();
> -            push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
> -            stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
> -            pop_block();
> -        }
> +        push_block("If", "And(PCID, ShiftLeft(1, %i))", slot);
> +        stmt("Notify", "\\_SB.PCI0.S%i, 3", slot);
>          pop_block();
>      }
> +    pop_block();
>  
>      pop_block();
>      /**** GPE end ****/
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index c64d15a..c89068e 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1421,6 +1421,52 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>      return rc;
>  }
>  
> +int xc_hvm_pci_hotplug_enable(xc_interface *xch,
> +                              domid_t domid,
> +                              uint32_t slot)

Take enable as a parameter and save having 2 almost identical functions?

> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_pci_hotplug;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->enable = 1;
> +    arg->slot = slot;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
> +int xc_hvm_pci_hotplug_disable(xc_interface *xch,
> +                               domid_t domid,
> +                               uint32_t slot)
> +{
> +    DECLARE_HYPERCALL;
> +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> +    int rc;
> +
> +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +    if ( arg == NULL )
> +        return -1;
> +
> +    hypercall.op     = __HYPERVISOR_hvm_op;
> +    hypercall.arg[0] = HVMOP_pci_hotplug;
> +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +    arg->domid = domid;
> +    arg->enable = 0;
> +    arg->slot = slot;
> +    rc = do_xen_hypercall(xch, &hypercall);
> +    xc_hypercall_buffer_free(xch, arg);
> +    return rc;
> +}
> +
>  int xc_domain_setdebugging(xc_interface *xch,
>                             uint32_t domid,
>                             unsigned int enable)
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> index 142aaea..c3e35a9 100644
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -1842,6 +1842,17 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>  				domid_t domid,
>  				ioservid_t id);
>  
> +/*
> + * PCI hotplug API
> + */
> +int xc_hvm_pci_hotplug_enable(xc_interface *xch,
> +			      domid_t domid,
> +			      uint32_t slot);
> +
> +int xc_hvm_pci_hotplug_disable(xc_interface *xch,
> +			       domid_t domid,
> +			       uint32_t slot);
> +

tabs/spaces

>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 2e52470..4176440 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -867,6 +867,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>          }
>          if ( rc )
>              return ERROR_FAIL;
> +
> +        rc = xc_hvm_pci_hotplug_enable(ctx->xch, domid, pcidev->dev);
> +        if (rc < 0) {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Error: xc_hvm_pci_hotplug_enable failed");
> +            return ERROR_FAIL;
> +        }
> +
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>      {
> @@ -1182,6 +1189,14 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid,
>                                           NULL, NULL, NULL) < 0)
>              goto out_fail;
>  
> +        rc = xc_hvm_pci_hotplug_disable(ctx->xch, domid, pcidev->dev);
> +        if (rc < 0) {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
> +                             "Error: xc_hvm_pci_hotplug_disable failed");
> +            rc = ERROR_FAIL;
> +            goto out_fail;
> +        }
> +
>          switch (libxl__device_model_version_running(gc, domid)) {
>          case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
>              rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force);
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index eea5555..48efddb 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -3,6 +3,7 @@ subdir-y += vmx
>  
>  obj-y += asid.o
>  obj-y += emulate.o
> +obj-y += hotplug.o
>  obj-y += hpet.o
>  obj-y += hvm.o
>  obj-y += i8254.o
> diff --git a/xen/arch/x86/hvm/hotplug.c b/xen/arch/x86/hvm/hotplug.c
> new file mode 100644
> index 0000000..253d435
> --- /dev/null
> +++ b/xen/arch/x86/hvm/hotplug.c
> @@ -0,0 +1,207 @@
> +/*
> + * hvm/hotplug.c
> + *
> + * Copyright (c) 2013, Citrix Systems Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <xen/types.h>
> +#include <xen/spinlock.h>
> +#include <xen/xmalloc.h>
> +#include <asm/hvm/io.h>
> +#include <asm/hvm/support.h>
> +
> +#define SCI_IRQ 9
> +
> +#define GPE_BASE            (ACPI_GPE0_BLK_ADDRESS_V1)
> +#define GPE_LEN             (ACPI_GPE0_BLK_LEN_V1)
> +
> +#define GPE_PCI_HOTPLUG_STATUS  2
> +
> +#define PCI_HOTPLUG_BASE    (ACPI_PCI_HOTPLUG_ADDRESS_V1)
> +#define PCI_HOTPLUG_LEN     (ACPI_PCI_HOTPLUG_LEN_V1)
> +
> +#define PCI_UP      0
> +#define PCI_DOWN    4
> +#define PCI_EJECT   8
> +
> +static void gpe_update_sci(struct hvm_hotplug *hp)
> +{
> +    if ( (hp->gpe_sts[0] & hp->gpe_en[0]) & GPE_PCI_HOTPLUG_STATUS )
> +        hvm_isa_irq_assert(hp->domain, SCI_IRQ);
> +    else
> +        hvm_isa_irq_deassert(hp->domain, SCI_IRQ);
> +}
> +
> +static int handle_gpe_io(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *v = current;
> +    struct domain *d = v->domain;
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    if ( bytes != 1 )
> +    {
> +        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
> +        goto done;
> +    }
> +
> +    port -= GPE_BASE;
> +
> +    if ( dir == IOREQ_READ )
> +    {
> +        if ( port < GPE_LEN / 2 )
> +        {
> +            *val = hp->gpe_sts[port];
> +        }
> +        else
> +        {
> +            port -= GPE_LEN / 2;
> +            *val = hp->gpe_en[port];
> +        }
> +    } else {
> +        if ( port < GPE_LEN / 2 )
> +        {
> +            hp->gpe_sts[port] &= ~*val;
> +        }
> +        else
> +        {
> +            port -= GPE_LEN / 2;
> +            hp->gpe_en[port] = *val;
> +        }
> +
> +        gpe_update_sci(hp);
> +    }
> +
> +done:
> +    return X86EMUL_OKAY;
> +}
> +
> +static void pci_hotplug_eject(struct hvm_hotplug *hp, uint32_t mask)
> +{
> +    int slot = ffs(mask) - 1;
> +
> +    gdprintk(XENLOG_INFO, "%s: %d\n", __func__, slot);
> +
> +    hp->slot_down &= ~(1u  << slot);
> +    hp->slot_up &= ~(1u  << slot);
> +}
> +
> +static int handle_pci_hotplug_io(
> +    int dir, uint32_t port, uint32_t bytes, uint32_t *val)
> +{
> +    struct vcpu *v = current;
> +    struct domain *d = v->domain;
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    if ( bytes != 4 )
> +    {
> +        gdprintk(XENLOG_WARNING, "%s: bad access\n", __func__);
> +        goto done;
> +    }
> +
> +    port -= PCI_HOTPLUG_BASE;
> +
> +    if ( dir == IOREQ_READ )
> +    {
> +        switch ( port )
> +        {
> +        case PCI_UP:
> +            *val = hp->slot_up;
> +            break;
> +        case PCI_DOWN:
> +            *val = hp->slot_down;
> +            break;
> +        default:
> +            break;
> +        }
> +    }
> +    else
> +    {   
> +        switch ( port )
> +        {
> +        case PCI_EJECT:
> +            pci_hotplug_eject(hp, *val);
> +            break;
> +        default:
> +            break;
> +        }
> +    }
> +
> +done:
> +    return X86EMUL_OKAY;
> +}
> +
> +void pci_hotplug(struct domain *d, int slot, bool_t enable)
> +{
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    gdprintk(XENLOG_INFO, "%s: %s %d\n", __func__,
> +             ( enable ) ? "enable" : "disable", slot);
> +
> +    if ( enable )
> +        hp->slot_up |= (1u << slot);
> +    else
> +        hp->slot_down |= (1u << slot);
> +
> +    hp->gpe_sts[0] |= GPE_PCI_HOTPLUG_STATUS;
> +    gpe_update_sci(hp);
> +}
> +
> +int gpe_init(struct domain *d)
> +{
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    hp->domain = d;
> +
> +    hp->gpe_sts = xzalloc_array(uint8_t, GPE_LEN / 2);

This size is known at compile time - what about arrays inside
hvm_hotplug and forgo the small memory allocations?

> +    if ( hp->gpe_sts == NULL )
> +        goto fail1;
> +
> +    hp->gpe_en = xzalloc_array(uint8_t, GPE_LEN / 2);
> +    if ( hp->gpe_en == NULL )
> +        goto fail2;
> +
> +    register_portio_handler(d, GPE_BASE, GPE_LEN, handle_gpe_io);
> +    register_portio_handler(d, PCI_HOTPLUG_BASE, PCI_HOTPLUG_LEN,
> +                            handle_pci_hotplug_io);
> +
> +    return 0;
> +
> +fail2:
> +    xfree(hp->gpe_sts);
> +
> +fail1:
> +    return -ENOMEM;
> +}
> +
> +void gpe_deinit(struct domain *d)
> +{
> +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> +
> +    xfree(hp->gpe_en);
> +    xfree(hp->gpe_sts);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * c-tab-always-indent: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5f9e728..ff7b259 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1298,15 +1298,21 @@ int hvm_domain_initialise(struct domain *d)
>  
>      rtc_init(d);
>  
> +    rc = gpe_init(d);
> +    if ( rc != 0 )
> +        goto fail2;
> +
>      register_portio_handler(d, 0xe9, 1, hvm_print_line);
>      register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>  
>      rc = hvm_funcs.domain_initialise(d);
>      if ( rc != 0 )
> -        goto fail2;
> +        goto fail3;
>  
>      return 0;
>  
> + fail3:
> +    gpe_deinit(d);
>   fail2:
>      rtc_deinit(d);
>      stdvga_deinit(d);
> @@ -1352,6 +1358,7 @@ void hvm_domain_destroy(struct domain *d)
>          return;
>  
>      hvm_funcs.domain_destroy(d);
> +    gpe_deinit(d);
>      rtc_deinit(d);
>      stdvga_deinit(d);
>      vioapic_deinit(d);
> @@ -5015,6 +5022,32 @@ out:
>      return rc;
>  }
>  
> +static int hvmop_pci_hotplug(
> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_pci_hotplug_t) uop)
> +{
> +    xen_hvm_pci_hotplug_t op;
> +    struct domain *d;
> +    int rc;
> +
> +    if ( copy_from_guest(&op, uop, 1) )
> +        return -EFAULT;
> +
> +    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
> +    if ( rc != 0 )
> +        return rc;
> +
> +    rc = -EINVAL;
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    pci_hotplug(d, op.slot, op.enable);
> +    rc = 0;
> +
> +out:
> +    rcu_unlock_domain(d);
> +    return rc;
> +}
> +
>  long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>  {
> @@ -5058,6 +5091,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>              guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
>          break;
>      
> +    case HVMOP_pci_hotplug:
> +        rc = hvmop_pci_hotplug(
> +            guest_handle_cast(arg, xen_hvm_pci_hotplug_t));
> +        break;
> +
>      case HVMOP_set_param:
>      case HVMOP_get_param:
>      {
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 93dcec1..13dd24d 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -66,6 +66,16 @@ struct hvm_ioreq_server {
>      struct hvm_pcidev      *pcidev_list;
>  };
>  
> +struct hvm_hotplug {
> +    struct domain   *domain;

This appears to be found by using container_of(), which will help keep
the size of struct domain down.

> +    uint8_t         *gpe_sts;
> +    uint8_t         *gpe_en;
> +
> +    /* PCI hotplug */
> +    uint32_t        slot_up;
> +    uint32_t        slot_down;
> +};
> +
>  struct hvm_domain {
>      struct list_head        ioreq_server_list;
>      spinlock_t              ioreq_server_lock;
> @@ -73,6 +83,8 @@ struct hvm_domain {
>      uint32_t                pci_cf8;
>      spinlock_t              pci_lock;
>  
> +    struct hvm_hotplug      hotplug;
> +
>      struct pl_time         pl_time;
>  
>      struct hvm_io_handler *io_handler;
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index 86db58d..072bfe7 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
>  void stdvga_deinit(struct domain *d);
>  
>  extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
> +
> +int gpe_init(struct domain *d);
> +void gpe_deinit(struct domain *d);
> +
> +void pci_hotplug(struct domain *d, int slot, bool_t enable);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
>  
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 6b31189..20a53ab 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -340,6 +340,15 @@ struct xen_hvm_destroy_ioreq_server {
>  typedef struct xen_hvm_destroy_ioreq_server xen_hvm_destroy_ioreq_server_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
>  
> +#define HVMOP_pci_hotplug 24
> +struct xen_hvm_pci_hotplug {
> +    domid_t domid;          /* IN - domain to be serviced */
> +    uint8_t enable;         /* IN - enable or disable? */
> +    uint32_t slot;          /* IN - slot to enable/disable */

Reordering these two will make the structure smaller.

~Andrew

> +};
> +typedef struct xen_hvm_pci_hotplug xen_hvm_pci_hotplug_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_pci_hotplug_t);
> +
>  #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
>  
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index e84fa75..40bfa61 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -101,6 +101,8 @@ typedef struct buffered_iopage buffered_iopage_t;
>  #define ACPI_PM_TMR_BLK_ADDRESS_V1   (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x08)
>  #define ACPI_GPE0_BLK_ADDRESS_V1     0xafe0
>  #define ACPI_GPE0_BLK_LEN_V1         0x04
> +#define ACPI_PCI_HOTPLUG_ADDRESS_V1  0xae00
> +#define ACPI_PCI_HOTPLUG_LEN_V1      0x10
>  
>  /* Compatibility definitions for the default location (version 0). */
>  #define ACPI_PM1A_EVT_BLK_ADDRESS    ACPI_PM1A_EVT_BLK_ADDRESS_V0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-01-30 15:55   ` Andrew Cooper
@ 2014-01-30 16:06     ` Paul Durrant
  2014-01-30 16:38       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 16:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
[snip]
> > +int xc_hvm_pci_hotplug_enable(xc_interface *xch,
> > +                              domid_t domid,
> > +                              uint32_t slot)
> 
> Take enable as a parameter and save having 2 almost identical functions?
> 

I was in two minds. Internally it's a single HVMOP with an enable/disable parameter (as we can see below) but I thought it was neater to keep the separation at the API.

> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_pci_hotplug;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->enable = 1;
> > +    arg->slot = slot;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
> > +int xc_hvm_pci_hotplug_disable(xc_interface *xch,
> > +                               domid_t domid,
> > +                               uint32_t slot)
> > +{
> > +    DECLARE_HYPERCALL;
> > +    DECLARE_HYPERCALL_BUFFER(xen_hvm_pci_hotplug_t, arg);
> > +    int rc;
> > +
> > +    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> > +    if ( arg == NULL )
> > +        return -1;
> > +
> > +    hypercall.op     = __HYPERVISOR_hvm_op;
> > +    hypercall.arg[0] = HVMOP_pci_hotplug;
> > +    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> > +    arg->domid = domid;
> > +    arg->enable = 0;
> > +    arg->slot = slot;
> > +    rc = do_xen_hypercall(xch, &hypercall);
> > +    xc_hypercall_buffer_free(xch, arg);
> > +    return rc;
> > +}
> > +
[snip]
> > +int xc_hvm_pci_hotplug_disable(xc_interface *xch,
> > +			       domid_t domid,
> > +			       uint32_t slot);
> > +
> 
> tabs/spaces
> 

Yep.

[snip]
> > +int gpe_init(struct domain *d)
> > +{
> > +    struct hvm_hotplug  *hp = &d->arch.hvm_domain.hotplug;
> > +
> > +    hp->domain = d;
> > +
> > +    hp->gpe_sts = xzalloc_array(uint8_t, GPE_LEN / 2);
> 
> This size is known at compile time - what about arrays inside
> hvm_hotplug and forgo the small memory allocations?
> 

Yes, that seems reasonable.

[snip]
> > +struct hvm_hotplug {
> > +    struct domain   *domain;
> 
> This appears to be found by using container_of(), which will help keep
> the size of struct domain down.
> 

Sure.

> > +    uint8_t         *gpe_sts;
> > +    uint8_t         *gpe_en;
> > +
> > +    /* PCI hotplug */
> > +    uint32_t        slot_up;
> > +    uint32_t        slot_down;
> > +};
> > +
> >  struct hvm_domain {
> >      struct list_head        ioreq_server_list;
> >      spinlock_t              ioreq_server_lock;
> > @@ -73,6 +83,8 @@ struct hvm_domain {
> >      uint32_t                pci_cf8;
> >      spinlock_t              pci_lock;
> >
> > +    struct hvm_hotplug      hotplug;
> > +
> >      struct pl_time         pl_time;
> >
> >      struct hvm_io_handler *io_handler;
> > diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-
> x86/hvm/io.h
> > index 86db58d..072bfe7 100644
> > --- a/xen/include/asm-x86/hvm/io.h
> > +++ b/xen/include/asm-x86/hvm/io.h
> > @@ -142,5 +142,11 @@ void stdvga_init(struct domain *d);
> >  void stdvga_deinit(struct domain *d);
> >
> >  extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
> > +
> > +int gpe_init(struct domain *d);
> > +void gpe_deinit(struct domain *d);
> > +
> > +void pci_hotplug(struct domain *d, int slot, bool_t enable);
> > +
> >  #endif /* __ASM_X86_HVM_IO_H__ */
> >
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index 6b31189..20a53ab 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -340,6 +340,15 @@ struct xen_hvm_destroy_ioreq_server {
> >  typedef struct xen_hvm_destroy_ioreq_server
> xen_hvm_destroy_ioreq_server_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_hvm_destroy_ioreq_server_t);
> >
> > +#define HVMOP_pci_hotplug 24
> > +struct xen_hvm_pci_hotplug {
> > +    domid_t domid;          /* IN - domain to be serviced */
> > +    uint8_t enable;         /* IN - enable or disable? */
> > +    uint32_t slot;          /* IN - slot to enable/disable */
> 
> Reordering these two will make the structure smaller.
> 

It will indeed.

   Paul

> ~Andrew
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-01-30 16:06     ` Paul Durrant
@ 2014-01-30 16:38       ` Jan Beulich
  2014-01-30 16:42         ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2014-01-30 16:38 UTC (permalink / raw)
  To: Andrew Cooper, Paul Durrant; +Cc: xen-devel@lists.xen.org

>>> On 30.01.14 at 17:06, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> > +struct xen_hvm_pci_hotplug {
>> > +    domid_t domid;          /* IN - domain to be serviced */
>> > +    uint8_t enable;         /* IN - enable or disable? */
>> > +    uint32_t slot;          /* IN - slot to enable/disable */
>> 
>> Reordering these two will make the structure smaller.
>> 
> 
> It will indeed.

Now I'm confused: domid_t being 16 bits, afaict re-ordering would
make it larger (from 8 to 12 bytes) rather than smaller.

What I'd certainly recommend is filling the 1 byte that's currently
unused (either by widening "enabled" or with a padding field) such
that eventual future extensions (flags?) could be added (i.e. the
field would need to be checked to be zero).

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen
  2014-01-30 16:38       ` Jan Beulich
@ 2014-01-30 16:42         ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 16:42 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 30 January 2014 16:39
> To: Andrew Cooper; Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug
> controller implementation into Xen
> 
> >>> On 30.01.14 at 17:06, Paul Durrant <Paul.Durrant@citrix.com> wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> > +struct xen_hvm_pci_hotplug {
> >> > +    domid_t domid;          /* IN - domain to be serviced */
> >> > +    uint8_t enable;         /* IN - enable or disable? */
> >> > +    uint32_t slot;          /* IN - slot to enable/disable */
> >>
> >> Reordering these two will make the structure smaller.
> >>
> >
> > It will indeed.
> 
> Now I'm confused: domid_t being 16 bits, afaict re-ordering would
> make it larger (from 8 to 12 bytes) rather than smaller.
> 

Sorry, I had it in my head that domids were 32-bits. You are correct... which is probably why I used that ordering in the first place ;-)

  Paul

> What I'd certainly recommend is filling the 1 byte that's currently
> unused (either by widening "enabled" or with a padding field) such
> that eventual future extensions (flags?) could be added (i.e. the
> field would need to be checked to be zero).
> 
> Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] Support for running secondary emulators
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
                   ` (4 preceding siblings ...)
  2014-01-30 14:19 ` [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
@ 2014-01-30 14:23 ` Paul Durrant
  2014-03-01 22:24 ` Matt Wilson
  6 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-01-30 14:23 UTC (permalink / raw)
  To: Paul Durrant, xen-devel@lists.xen.org

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 30 January 2014 14:20
> To: xen-devel@lists.xen.org
> Subject: [Xen-devel] [RFC PATCH 1/5] Support for running secondary
> emulators
> 

That was, of course, supposed to read RFC PATCH 0/5.

  Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] Support for running secondary emulators
  2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
                   ` (5 preceding siblings ...)
  2014-01-30 14:23 ` [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
@ 2014-03-01 22:24 ` Matt Wilson
  2014-03-03 13:34   ` Paul Durrant
  6 siblings, 1 reply; 25+ messages in thread
From: Matt Wilson @ 2014-03-01 22:24 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

On Thu, Jan 30, 2014 at 02:19:45PM +0000, Paul Durrant wrote:
> This patch series adds the ioreq server interface which I mentioned in
> my talk at the Xen developer summit in Edinburgh at the end of last year.
> The code is based on work originally done by Julien Grall but has been
> re-written to allow existing versions of QEMU to work unmodified.
> 
[...]

Hi Paul,

I'm coming back to play with this after a few weeks and I'm having
trouble getting things going. It seems that I'm crashing early when
hvmloader is programming the PCI-ISA bridge link routes.

(XEN) hvm.c:712:d0 hvm_create_ioreq_server: 9:0
(d9) HVM Loader
(d9) Detected Xen v4.4-rc2
(d9) Xenbus rings @0xfeffd000, event channel 1
(d9) System requested SeaBIOS
(d9) CPU speed is 1995 MHz
(d9) Relocating guest memory for lowmem MMIO space disabled
(XEN) io.c:170:d9 Weird HVM ioemulation status 1.
(XEN) domain_crash called from io.c:171
(XEN) Domain 9 (vcpu#0) crashed on cpu#15:
(XEN) ----[ Xen-4.4-rc2  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    15
(XEN) RIP:    0018:[<0000000000100965>]
(XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
(XEN) rax: 0000000000000005   rbx: 0000000000000001   rcx: 000000000000000a
(XEN) rdx: 0000000000000cfc   rsi: 0000000000000000   rdi: 0000000000000005
(XEN) rbp: 0000000000185d6c   rsp: 0000000000185d6c   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
(XEN) cr3: 0000000000000000   cr2: 0000000000000000
(XEN) ds: 0020   es: 0020   fs: 0020   gs: 0020   ss: 0020   cs: 0018
(XEN) hvm.c:790:d0 hvm_destroy_ioreq_server: 9:0

Any quick ideas before I go instrumenting things?

--msw

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] Support for running secondary emulators
  2014-03-01 22:24 ` Matt Wilson
@ 2014-03-03 13:34   ` Paul Durrant
  2014-03-03 22:41     ` Matt Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Durrant @ 2014-03-03 13:34 UTC (permalink / raw)
  To: Matt Wilson; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt Wilson
> Sent: 01 March 2014 22:25
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 1/5] Support for running secondary
> emulators
> 
> On Thu, Jan 30, 2014 at 02:19:45PM +0000, Paul Durrant wrote:
> > This patch series adds the ioreq server interface which I mentioned in
> > my talk at the Xen developer summit in Edinburgh at the end of last year.
> > The code is based on work originally done by Julien Grall but has been
> > re-written to allow existing versions of QEMU to work unmodified.
> >
> [...]
> 
> Hi Paul,
> 
> I'm coming back to play with this after a few weeks and I'm having
> trouble getting things going. It seems that I'm crashing early when
> hvmloader is programming the PCI-ISA bridge link routes.
> 
> (XEN) hvm.c:712:d0 hvm_create_ioreq_server: 9:0
> (d9) HVM Loader
> (d9) Detected Xen v4.4-rc2
> (d9) Xenbus rings @0xfeffd000, event channel 1
> (d9) System requested SeaBIOS
> (d9) CPU speed is 1995 MHz
> (d9) Relocating guest memory for lowmem MMIO space disabled
> (XEN) io.c:170:d9 Weird HVM ioemulation status 1.
> (XEN) domain_crash called from io.c:171
> (XEN) Domain 9 (vcpu#0) crashed on cpu#15:
> (XEN) ----[ Xen-4.4-rc2  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    15
> (XEN) RIP:    0018:[<0000000000100965>]
> (XEN) RFLAGS: 0000000000000046   CONTEXT: hvm guest
> (XEN) rax: 0000000000000005   rbx: 0000000000000001   rcx: 000000000000000a
> (XEN) rdx: 0000000000000cfc   rsi: 0000000000000000   rdi: 0000000000000005
> (XEN) rbp: 0000000000185d6c   rsp: 0000000000185d6c   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
> (XEN) cr3: 0000000000000000   cr2: 0000000000000000
> (XEN) ds: 0020   es: 0020   fs: 0020   gs: 0020   ss: 0020   cs: 0018
> (XEN) hvm.c:790:d0 hvm_destroy_ioreq_server: 9:0
> 
> Any quick ideas before I go instrumenting things?
> 


Matt,

  I've re-worked the patches a bit and am about to submit them non-RFC this time. I had a look at the shared page setup given what you saw and I can't find a memset anywhere to init the shared ioreq state so I suspect your problem is just uninitialized mem (and I guess I may not have seen it as I usually only run up a couple of VMs before I need to do a host reboot so I'm probably getting freshly scrubbed pages every time). So, I'll stick a memset in my new patch and hopefully you can give the new code a spin once I've posted it.

  Thanks for the test! Cheers,

    Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] Support for running secondary emulators
  2014-03-03 13:34   ` Paul Durrant
@ 2014-03-03 22:41     ` Matt Wilson
  2014-03-04 10:11       ` Paul Durrant
  0 siblings, 1 reply; 25+ messages in thread
From: Matt Wilson @ 2014-03-03 22:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel@lists.xen.org

On Mon, Mar 03, 2014 at 01:34:41PM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt Wilson
> > Sent: 01 March 2014 22:25
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [RFC PATCH 1/5] Support for running secondary
> > emulators
> > 
> > On Thu, Jan 30, 2014 at 02:19:45PM +0000, Paul Durrant wrote:
> > > This patch series adds the ioreq server interface which I mentioned in
> > > my talk at the Xen developer summit in Edinburgh at the end of last year.
> > > The code is based on work originally done by Julien Grall but has been
> > > re-written to allow existing versions of QEMU to work unmodified.
> > >
> > [...]
> > 
> > Hi Paul,
> > 
> > I'm coming back to play with this after a few weeks and I'm having
> > trouble getting things going. It seems that I'm crashing early when
> > hvmloader is programming the PCI-ISA bridge link routes.

[...]

> > Any quick ideas before I go instrumenting things?
> > 
> 
> 
> Matt,
> 
> I've re-worked the patches a bit and am about to submit them non-RFC
> this time. I had a look at the shared page setup given what you saw
> and I can't find a memset anywhere to init the shared ioreq state so
> I suspect your problem is just uninitialized mem (and I guess I may
> not have seen it as I usually only run up a couple of VMs before I
> need to do a host reboot so I'm probably getting freshly scrubbed
> pages every time). So, I'll stick a memset in my new patch and
> hopefully you can give the new code a spin once I've posted it.

I'm not sure... I tossed a scrub_one_page(page); in hvm_set_ioreq_page()
but that didn't seem to help.

I'll give your next series a spin when you post them (or push to your
git repository).

--msw

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 1/5] Support for running secondary emulators
  2014-03-03 22:41     ` Matt Wilson
@ 2014-03-04 10:11       ` Paul Durrant
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Durrant @ 2014-03-04 10:11 UTC (permalink / raw)
  To: Matt Wilson; +Cc: xen-devel@lists.xen.org

> -----Original Message-----
> From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt Wilson
> Sent: 03 March 2014 22:41
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [RFC PATCH 1/5] Support for running secondary
> emulators
> 
> On Mon, Mar 03, 2014 at 01:34:41PM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt
> Wilson
> > > Sent: 01 March 2014 22:25
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [RFC PATCH 1/5] Support for running secondary
> > > emulators
> > >
> > > On Thu, Jan 30, 2014 at 02:19:45PM +0000, Paul Durrant wrote:
> > > > This patch series adds the ioreq server interface which I mentioned in
> > > > my talk at the Xen developer summit in Edinburgh at the end of last
> year.
> > > > The code is based on work originally done by Julien Grall but has been
> > > > re-written to allow existing versions of QEMU to work unmodified.
> > > >
> > > [...]
> > >
> > > Hi Paul,
> > >
> > > I'm coming back to play with this after a few weeks and I'm having
> > > trouble getting things going. It seems that I'm crashing early when
> > > hvmloader is programming the PCI-ISA bridge link routes.
> 
> [...]
> 
> > > Any quick ideas before I go instrumenting things?
> > >
> >
> >
> > Matt,
> >
> > I've re-worked the patches a bit and am about to submit them non-RFC
> > this time. I had a look at the shared page setup given what you saw
> > and I can't find a memset anywhere to init the shared ioreq state so
> > I suspect your problem is just uninitialized mem (and I guess I may
> > not have seen it as I usually only run up a couple of VMs before I
> > need to do a host reboot so I'm probably getting freshly scrubbed
> > pages every time). So, I'll stick a memset in my new patch and
> > hopefully you can give the new code a spin once I've posted it.
> 
> I'm not sure... I tossed a scrub_one_page(page); in hvm_set_ioreq_page()
> but that didn't seem to help.
> 

Looking more closely, I'm guessing (given the log messages) that the IO in question was the first PCI config space write, which should hit the new cf8 handler in xen (which will indeed set the io emulation status to unhandleable) but it should then find the ioreq server (which was clearly created) and QEMU should handle the IO. Looking at the code in hvmemul_do_io though, if it gets as far as actually running the cf8 handler, I don't see how it can return with unhandleable which suggests maybe the vcpu emulation status is already hosed before the config space IO was attempted.
I've not been able to repro in my testing so far though. What sort of guest were you kicking off?

  Paul

> I'll give your next series a spin when you post them (or push to your
> git repository).
> 
> --msw

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-03-04 10:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-30 14:19 [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
2014-01-30 14:19 ` [RFC PATCH 1/5] ioreq-server: centralize access to ioreq structures Paul Durrant
2014-01-30 14:32   ` Andrew Cooper
2014-01-30 14:35     ` Paul Durrant
2014-02-07  4:53   ` Matt Wilson
2014-02-07  9:24     ` Paul Durrant
2014-01-30 14:19 ` [RFC PATCH 2/5] ioreq-server: create basic ioreq server abstraction Paul Durrant
2014-01-30 15:03   ` Andrew Cooper
2014-01-30 15:17     ` Paul Durrant
2014-01-30 14:19 ` [RFC PATCH 3/5] ioreq-server: on-demand creation of ioreq server Paul Durrant
2014-01-30 15:21   ` Andrew Cooper
2014-01-30 15:32     ` Paul Durrant
2014-01-30 14:19 ` [RFC PATCH 4/5] ioreq-server: add support for multiple servers Paul Durrant
2014-01-30 15:46   ` Andrew Cooper
2014-01-30 15:56     ` Paul Durrant
2014-01-30 14:19 ` [RFC PATCH 5/5] ioreq-server: bring the PCI hotplug controller implementation into Xen Paul Durrant
2014-01-30 15:55   ` Andrew Cooper
2014-01-30 16:06     ` Paul Durrant
2014-01-30 16:38       ` Jan Beulich
2014-01-30 16:42         ` Paul Durrant
2014-01-30 14:23 ` [RFC PATCH 1/5] Support for running secondary emulators Paul Durrant
2014-03-01 22:24 ` Matt Wilson
2014-03-03 13:34   ` Paul Durrant
2014-03-03 22:41     ` Matt Wilson
2014-03-04 10:11       ` Paul Durrant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).